mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Closes a systematic +0.02..+0.07 SAP over-prediction on every triple-
glazed cert in cohort 2 (13 of 38) and removes a silent-default
failure mode flagged via cert 3336-2825-9400-0512-8292 (+0.0674 Δ).
Root cause: `_map_elmhurst_window` (datatypes/epc/domain/mapper.py)
was passing the Elmhurst-lodged glazing-type string verbatim into
`SapWindow.glazing_type` (declared `Union[int, str]`). The §5 (66)..
(67) daylight-factor cascade at
`domain/sap10_calculator/worksheet/internal_gains.py:512` requires
`isinstance(w.glazing_type, int)` to look up Table 6b col light g_L —
string lodgings silently fell through to the `_G_LIGHT_DEFAULT = 0.80`
(double-glazed) branch. Cert 3336 (Triple glazed, worksheet "Window,
Triple glazed") got g_L = 0.80 instead of the correct 0.70, deflating
C_daylight from the correct 1.072 to 1.041 → lighting under-predicted
by 4.53 kWh/yr → total fuel cost under by 1.17 GBP → ECF Δ −0.0049 →
SAP continuous over by +0.0674.
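The fall-through described above can be sketched as follows. This is a minimal illustration, not the code in `internal_gains.py`: the function name `g_light` and the lookup dict `_G_LIGHT_BY_CODE` are hypothetical, and the dict holds only the g_L values quoted in this message (single 0.90, double variants 0.80, triple 0.70).

```python
from typing import Union

# Hypothetical stand-in for the Table 6b "light" column lookup. Only the
# values quoted in this commit message are included.
_G_LIGHT_BY_CODE = {1: 0.90, 2: 0.80, 3: 0.80, 5: 0.80, 6: 0.70}
_G_LIGHT_DEFAULT = 0.80  # double-glazed default named in the message


def g_light(glazing_type: Union[int, str]) -> float:
    """Return g_L for a window, mimicking the int-only lookup."""
    if isinstance(glazing_type, int):
        return _G_LIGHT_BY_CODE.get(glazing_type, _G_LIGHT_DEFAULT)
    # A string lodging never reaches the table: it silently gets the default.
    return _G_LIGHT_DEFAULT


print(g_light("Triple post or during 2022"))  # 0.8 (silent default)
print(g_light(6))                             # 0.7 (correct triple value)
```

The string path returns without raising, which is why the error only showed up as a small SAP offset rather than a failure.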
Fix: `_ELMHURST_GLAZING_LABEL_TO_SAP10` dict +
`_elmhurst_glazing_type_code` helper translate the Elmhurst Summary §11
lodged strings to the SAP 10.2 Table U2 integer codes the cascade keys
on:
    "Single"                                          → 1
    "Double pre 2002"                                 → 2
    "Double between 2002 and 2021"                    → 3
    "Double with unknown install date"                → 3
    "Double with unknown 16 mm or install date more"  → 3
    "Double post or during 2022"                      → 5
    "Triple post or during 2022"                      → 6
    "Triple post or during"                           → 6 (year-trunc.)
    "Secondary"                                       → 7
Two regex passes strip the layout noise the extractor sometimes folds
into the glazing-type token: a `(?:Part )?value value Proofed Shutters`
prefix (from adjacent column headers) and a ` Summary Information` /
` Alternative wall…` suffix. Verified against the union of cohort-1
(7 certs) + cohort-2 (38 certs) + test-fixture (9 PDFs) glazing
labels: 18 distinct surface forms, all closed by the dict + noise
patterns; one window in cert 2636's Summary_000898.pdf lodged the
year-truncated "Triple post or during" — added as an alias for code 6
per worksheet "Triple glazed" lodging.
Strict-enum gate: `_elmhurst_glazing_type_code` raises
`UnmappedElmhurstLabel("glazing_type", label)` (Slice S0380.15
pattern, extended to the new helper) when the label is None or not
in the dict — surfaces mapper-coverage gaps at extraction time rather
than masking them as a SAP precision floor.
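A minimal sketch of the lookup, noise stripping, and strict-enum gate described above. The label-to-code pairs are the ones listed in this message; the function name `glazing_type_code`, the exception's constructor body, and the exact regex shapes are assumptions (in particular the suffix pattern simply drops everything from ` Summary Information` or ` Alternative wall` onward, because the message elides the rest of that suffix).

```python
import re
from typing import Optional


class UnmappedElmhurstLabel(ValueError):
    """Illustrative stand-in for the mapper's exception."""

    def __init__(self, field: str, label: Optional[str]) -> None:
        super().__init__(f"unmapped Elmhurst {field} label: {label!r}")
        self.field = field
        self.label = label


_LABEL_TO_CODE = {
    "Single": 1,
    "Double pre 2002": 2,
    "Double between 2002 and 2021": 3,
    "Double with unknown install date": 3,
    "Double with unknown 16 mm or install date more": 3,
    "Double post or during 2022": 5,
    "Triple post or during 2022": 6,
    "Triple post or during": 6,  # year-truncated alias
    "Secondary": 7,
}

# Assumed regex shapes for the two noise passes.
_PREFIX_NOISE = re.compile(r"^(?:Part )?value value Proofed Shutters\s*")
_SUFFIX_NOISE = re.compile(r"\s+(?:Summary Information|Alternative wall).*$")


def glazing_type_code(label: Optional[str]) -> int:
    """Translate a lodged glazing label to its integer code, or raise."""
    if label is None:
        raise UnmappedElmhurstLabel("glazing_type", label)
    cleaned = _SUFFIX_NOISE.sub("", _PREFIX_NOISE.sub("", label)).strip()
    try:
        return _LABEL_TO_CODE[cleaned]
    except KeyError:
        raise UnmappedElmhurstLabel("glazing_type", label) from None


print(glazing_type_code("Triple post or during 2022"))  # 6
```

Raising on an unknown label, rather than returning a default, is the point of the gate: a new surface form fails loudly at mapping time.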
Cohort-2 Summary-path delta progression (38 certs):

    bucket          before slice 2   after slice 2
    exact (<1e-4)         11               11
    <0.005                 0                5   ← 9421 +0.0012, 2536 +0.0016,
                                                  9370 +0.0017, 0100 +0.0028,
                                                  2800 +0.0044
    0.005-0.07            15               10   ← all triple-glazed
    0.07-0.5               5                5
    0.5-1                  4                4
    1-5                    1                1
    5+                     2                2
    RAISES                 0                0
3336 (user's flag) closes from +0.0674 → +0.0400 — the residual is
the remaining systematic offset the next slice will investigate.
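For reference, a tally like the one in the table can be reproduced from per-cert deltas with a few lines. This is a hedged sketch: the bucket edges are read off the table labels, the handling of values exactly on an edge is an assumption, and `bucket` is a hypothetical helper rather than anything in the repo.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of each bucket, taken from the table's row labels.
_EDGES = [1e-4, 0.005, 0.07, 0.5, 1.0, 5.0]
_LABELS = ["exact (<1e-4)", "<0.005", "0.005-0.07", "0.07-0.5", "0.5-1", "1-5", "5+"]


def bucket(delta: float) -> str:
    """Return the table row that |delta| falls into."""
    return _LABELS[bisect_right(_EDGES, abs(delta))]


# The six post-slice deltas quoted in this message.
deltas = {"9421": 0.0012, "2536": 0.0016, "9370": 0.0017,
          "0100": 0.0028, "2800": 0.0044, "3336": 0.0400}
print(Counter(bucket(d) for d in deltas.values()))
```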
Tests added (3):
- `test_summary_3336_triple_glazed_windows_route_to_code_6` — pins
  the mapper output for the user's flagged cert.
- `test_summary_000474_double_glazed_windows_route_to_code_3` —
  exercises the DG branch + the year-unknown alias mapping.
- `test_summary_mapper_raises_on_unmapped_glazing_type_label` —
  strict-enum coverage gate via mutated site notes.
Tests updated (1):
- `test_first_window_glazing_type` (test_elmhurst_end_to_end.py):
  asserts int code 5 (DG low-E argon — "Double post or during 2022")
  not the string verbatim. The string-passthrough behaviour was
  always a latent bug; this test was the only direct pin on it.
Pyright net-zero per file:
- datatypes/epc/domain/mapper.py: 32 (baseline 32)
- backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0
- backend/documents_parser/tests/test_elmhurst_end_to_end.py: 0
Regression baseline: 694 pass + 10 fail (= prior 691 + 10 + 3 new).
Triple-glazed original-cohort certs are now closer to worksheet too;
the ±0.07 chain tests on the original cohort still hold, and a future
slice tightens them once the next-largest residual is closed.
Spec refs:
- SAP 10.2 Table U2 — glazing-type integer enum.
- SAP 10.2 Table 6b col light — light-transmission g_L by glazing
  type (triple 0.70, double-glazed variants 0.80, single 0.90).
- RdSAP 10 §11 Windows — Summary lodging of glazing type as a
  type+install-date phrase.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2007 lines
90 KiB
Python
"""End-to-end validation for the Elmhurst Summary→EpcPropertyData chain.

The 6 Elmhurst worksheet fixtures in `domain.sap10_calculator.worksheet.tests`
build their `EpcPropertyData` synthetically — they validate the
calculator + cascade in isolation from the mapper. This file pins
the OTHER half of the chain: `from_elmhurst_site_notes` must produce
a calculator-equivalent `EpcPropertyData` when fed the Summary PDF
the worksheet was generated from. Together with the worksheet
cascade tests, this closes the loop: extractor + mapper + cascade
+ calculator validated end-to-end against the authoritative
Elmhurst documents.

Status: GREEN. For cert U985-0001-000474, this pipeline produces an
unrounded SAP within 0.5 of the worksheet PDF's `62.2584` (line 257).
The cascade itself reproduces Elmhurst's calculator exactly on
hand-built inputs (handbuilt → 62.2584 to 4 d.p.); the remaining
sub-half-point gap from the mapped path is non-load-bearing field
drift (e.g. central_heating_pump_age the Summary PDF doesn't lodge).

Preprocessing: the existing `ElmhurstSiteNotesExtractor` was written
against Textract-style output (label\\nvalue pairs in spatial
reading order). We don't have Textract in the test environment, so
this helper converts `pdftotext -layout` output (label-whitespace-
value on a single line) into the Textract-style sequence the
extractor expects. Test-only preprocessing; production runs through
Textract directly.
"""

from __future__ import annotations

import dataclasses
import json
import re
import subprocess
from pathlib import Path
from typing import cast

import pytest

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import (
    EpcPropertyDataMapper,
    UnmappedElmhurstLabel,
)
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
from domain.sap10_calculator.rdsap.cert_to_inputs import SAP_10_2_SPEC_PRICES, cert_to_inputs
from domain.sap10_calculator.worksheet.tests import (
    _elmhurst_worksheet_000474 as _w000474,
    _elmhurst_worksheet_000477 as _w000477,
    _elmhurst_worksheet_000480 as _w000480,
    _elmhurst_worksheet_000487 as _w000487,
    _elmhurst_worksheet_000490 as _w000490,
    _elmhurst_worksheet_000516 as _w000516,
)

_FIXTURES = Path(__file__).parent / "fixtures"
_SUMMARY_000474_PDF = _FIXTURES / "Summary_000474.pdf"
_SUMMARY_000477_PDF = _FIXTURES / "Summary_000477.pdf"
_SUMMARY_000480_PDF = _FIXTURES / "Summary_000480.pdf"
_SUMMARY_000487_PDF = _FIXTURES / "Summary_000487.pdf"
_SUMMARY_000490_PDF = _FIXTURES / "Summary_000490.pdf"
_SUMMARY_000516_PDF = _FIXTURES / "Summary_000516.pdf"
_SUMMARY_001479_PDF = _FIXTURES / "Summary_001479.pdf"
_SUMMARY_000897_PDF = _FIXTURES / "Summary_000897.pdf"
_SUMMARY_000784_PDF = _FIXTURES / "Summary_000784.pdf"
_SUMMARY_000899_PDF = _FIXTURES / "Summary_000899.pdf"
_SUMMARY_000903_PDF = _FIXTURES / "Summary_000903.pdf"
_SUMMARY_000901_PDF = _FIXTURES / "Summary_000901.pdf"  # cert 3800
_SUMMARY_000904_PDF = _FIXTURES / "Summary_000904.pdf"  # cert 9285
_SUMMARY_000900_PDF = _FIXTURES / "Summary_000900.pdf"  # cert 2225
_SUMMARY_000898_PDF = _FIXTURES / "Summary_000898.pdf"  # cert 2636
_SUMMARY_000902_PDF = _FIXTURES / "Summary_000902.pdf"  # cert 9418
_SUMMARY_000889_PDF = _FIXTURES / "Summary_000889.pdf"  # cert 2536 (Normal cylinder)
_SUMMARY_000884_PDF = _FIXTURES / "Summary_000884.pdf"  # cert 9421 (Normal cylinder)

# GOV.UK EPB API JSON for cert 001479 — the API-path counterpart of the
# Summary_001479.pdf fixture. Together they drive the API ≡ Summary
# parity workstream; Layer 4 of the validation stack is "API cascade SAP
# matches worksheet continuous SAP at 1e-4".
_API_001479_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "0535-9020-6509-0821-6222.json"
)


def _summary_pdf_to_textract_style_pages(pdf_path: Path) -> list[str]:
    """Convert a Summary PDF into the per-page text format the existing
    `ElmhurstSiteNotesExtractor` expects (label\\nvalue sequences).

    `pdftotext -layout` preserves the spatial pairing of label and value
    on each line; we split each line on 2+ spaces to surface the
    label/value tokens, then concatenate them back into a single
    newline-delimited stream per page.
    """
    info = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True
    ).stdout
    m = re.search(r"Pages:\s+(\d+)", info)
    if m is None:
        raise RuntimeError(f"Could not parse page count from {pdf_path}")
    page_count = int(m.group(1))

    pages: list[str] = []
    for i in range(1, page_count + 1):
        layout = subprocess.run(
            [
                "pdftotext", "-layout", "-f", str(i), "-l", str(i),
                str(pdf_path), "-",
            ],
            capture_output=True, text=True, check=True,
        ).stdout
        tokens: list[str] = []
        for line in layout.splitlines():
            if not line.strip():
                tokens.append("")
                continue
            parts = [p for p in re.split(r"\s{2,}", line.strip()) if p]
            tokens.extend(parts)
        pages.append("\n".join(tokens))
    return pages


def test_summary_000474_mapper_produces_three_building_parts() -> None:
    # Arrange — cert U985-0001-000474 is a mid-terrace with 3 building
    # parts (Main + 2 extensions) per the hand-built worksheet fixture
    # at domain/sap10_calculator/worksheet/tests/
    # _elmhurst_worksheet_000474.py. Routing the Summary PDF through
    # extractor + mapper must yield the same count.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert len(epc.sap_building_parts) == 3


def test_summary_000474_mapper_extracts_seven_windows() -> None:
    # Arrange — cert U985-0001-000474's §11 table lodges 7 windows
    # across Main + 1st Extension + 2nd Extension. The legacy Textract-
    # style window parser couldn't anchor on the Summary PDF's tabular
    # layout; the new W/H/Area-plus-Manufacturer anchor pair picks them
    # all up.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert len(epc.sap_windows) == 7


# Cohort chain SAP-pin tests follow. NOTE: certs 000474, 000480, 000487,
# 000490 previously had chain tests here pinning their cascade SAP
# against the U985 worksheet PDF — those tests were removed because
# their worksheets violate RdSAP 10 §5 (12) "Floor infiltration
# (suspended timber ground floor only)". Our cascade applies the spec
# rule (via `cert_to_inputs._has_suspended_timber_floor_per_spec`);
# the worksheet does not. So the spec-correct chain SAP for those
# certs can't match the worksheet SAP — by design, not by mapper bug.
# The Layer 1 hand-built fixtures for those 4 certs absorb the
# worksheet quirk by lodging `has_suspended_timber_floor=False`
# explicitly (overriding the spec inference) — so Layer 1 cascade pins
# still pin the worksheet value exactly. The chain tests below remain
# only for 000477, 000516 (and 001479 further down), where the
# worksheet IS spec-correct.


def test_summary_000477_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange — cert U985-0001-000477 is a single-bp mid-terrace with
    # a 15.06 m² Room-in-Roof storey and zero baths lodged. Worksheet
    # PDF lodges unrounded SAP 65.0057. Drives the chain through the
    # `RoomInRoof.detailed_surfaces` cascade with stud walls @ 100mm
    # Mineral, two uninsulated slopes, two party gable walls, plus the
    # RR/storey-area suspended-timber-floor heuristic (RIR < storey →
    # 0.2 ACH floor infiltration).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000477_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert
    worksheet_unrounded_sap = 65.0057
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_summary_000516_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange — cert U985-0001-000516 is a mid-terrace with main bp +
    # 19.02 m² room-in-roof. Worksheet PDF lodges unrounded SAP 62.7937.
    # The §11 table mixes 5 vertical windows (U=2.80) with 1 roof
    # window (U=3.10 in cert, U=3.40 Table 24 raw); the mapper
    # discriminates by `U > 3.0` and routes the high-U entry to
    # `sap_roof_windows` so its solar gains feed §6 with the right
    # pitch (45°) and Table-24 U-value.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000516_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert
    worksheet_unrounded_sap = 62.7937
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_summary_001479_mapper_extensions_count_matches_extension_bps() -> None:
    # Arrange — cert 0535-9020-6509-0821-6222 (Summary_001479) is the first
    # cohort cert with an actual GOV.UK API counterpart. Worksheet PDF
    # lodges Main + Extension 1 + Extension 2 (3 building parts, 2
    # extensions). Pre-slice the Elmhurst mapper hard-coded
    # `extensions_count=0` regardless of survey.extensions; this asserts
    # the count flows through.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.extensions_count == 2
    assert len(epc.sap_building_parts) == 3


def test_summary_001479_main_party_wall_construction_is_cavity_unfilled() -> None:
    # Arrange — cert 001479 Main §7 Walls lodges "Party Wall Type: CU
    # Cavity masonry unfilled". The Elmhurst leading-code map previously
    # only knew "S" and "C"; "CU" fell through to None, which made the
    # cascade default to U=0.25 instead of the worksheet's lodged U=0.50.
    # The fix adds "CU" → SAP10 wall_construction code 4 (WALL_CAVITY),
    # which `u_party_wall` resolves to U=0.50 — matching the worksheet's
    # §3 `Party walls Main … 0.50` row.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_building_parts[0].party_wall_construction == 4


def test_summary_001479_ext2_floor_is_exposed_to_external_air() -> None:
    # Arrange — cert 001479 Ext2 §9 lodges "Location: E To external air"
    # — a cantilevered exposed timber floor (the upper-storey extension
    # over the back garden). The worksheet's §3 row `Exposed floor Ext2
    # … 1.92, 1.20, 1.20` pins this as U=1.20 via Table 20. Pre-slice the
    # mapper only routed "U Above unheated space" through `is_exposed_
    # floor=True`; "E To external air" fell through to the BS EN ISO
    # 13370 ground-floor cascade, dropping the lodged exposure entirely.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    ext2 = epc.sap_building_parts[2]
    assert ext2.floor_type == "To external air"
    assert ext2.sap_floor_dimensions[0].is_exposed_floor is True


def test_summary_001479_ext2_sloping_ceiling_roof_uninsulated_for_pre_1950() -> None:
    # Arrange — cert 001479 Ext2 §8 lodges "Type: PS Pitched, sloping
    # ceiling" + "Insulation Thickness: As Built" + age band C (1930-49).
    # Original 1930s construction had no sloping-ceiling insulation;
    # worksheet §3 `External roof Ext2 … 2.30` pins U=2.30 (uninsulated
    # Table 16 row 0). Pre-slice the mapper passed thickness=None through,
    # routing to `u_roof`'s pitched-roof Table 18 col 1 default (0.40 for
    # age C, assumes loft-joist retrofit) — wrong geometry for PS.
    # Ext1's PS roof at age M leaves thickness=None (modern build,
    # cascade default U=0.15 matches worksheet).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_building_parts[2].roof_insulation_thickness == 0
    assert epc.sap_building_parts[1].roof_insulation_thickness is None


def test_summary_001479_secondary_heating_routes_mains_gas_fuel() -> None:
    # Arrange — cert 001479 §14.1 Main Heating2 lodges "Secondary Heating
    # Code: SAP code 605, Flush fitting live effect gas fire, sealed to
    # chimney". The Summary surfaces only the SAP code (605); the fuel
    # type 26 (mains gas) must be derived from the code range so the
    # `_fuel_cost` orchestrator's `secondary_high_rate_gbp_per_kwh`
    # picks up Table 32's gas tariff (£0.0348/kWh) rather than the
    # default standard-electricity tariff (£0.132/kWh). Worksheet line
    # (242) "Space heating - secondary … 3.4800 70.5022" confirms gas
    # pricing.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.secondary_heating_type == 605
    assert epc.sap_heating.secondary_fuel_type == 26


def test_summary_9501_flat_has_no_built_form_in_summary_pdf() -> None:
    # Arrange — cert 9501 (Summary_000784.pdf) is a flat. The Elmhurst
    # Summary's §1.0 "Property type" section lodges the built-form
    # descriptor (e.g. "M Mid-Terrace", "D Detached") only for houses;
    # flats have no built-form line — the §2.0 "Number of Storeys"
    # section follows immediately after the "F Flat" property type.
    #
    # The extractor's `_extract_attachment` regex previously captured
    # the line immediately after the property-type value
    # unconditionally, so cert 9501 ends up with attachment
    # "2.0 Number of Storeys:" — pure section-header noise that the
    # mapper then surfaces on EpcPropertyData.built_form, breaking the
    # cascade's flat-exposure routing downstream.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000784_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert — built_form is empty for flats. Houses set it to their
    # attachment descriptor; flats lodge no attachment.
    assert epc.built_form == ""


def test_summary_9501_dwelling_type_is_top_floor_flat() -> None:
    # Arrange — cert 9501's worksheet treats the cert as a TOP-floor
    # flat: §3 (28a) "Ground floor Main … U=0.0" because the floor
    # sits over "Another dwelling below" (worksheet line 9.0 Floor
    # location); §3 (30) has both an external roof + RR contributions
    # so the roof IS exposed. The cascade's `_dwelling_exposure`
    # function does prefix matching on `dwelling_type.lower()` to gate
    # which surfaces are party — without "top-floor flat" the cert
    # falls through to fully-exposed houses (Δ +9.25 W/K on floor).
    #
    # Floor-position inference rules:
    #   - floor.location indicates "Another dwelling below"
    #     → not ground floor (rules out ground-floor flat)
    #   - room_in_roof OR external roof present
    #     → roof exposed (rules out mid-floor flat)
    #   - therefore → top-floor flat
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000784_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.dwelling_type is not None
    assert epc.dwelling_type.lower().startswith("top-floor")


def test_summary_9501_rr_gable_walls_route_to_external_walls_hlc() -> None:
    # Arrange — cert 9501's worksheet §3 lodges "Roof room Main Gable
    # Wall 1" + "Gable Wall 2" as line (29a) entries (external walls)
    # at the main-wall U (= 1.70 for age B Solid Brick): 13.50×1.70 +
    # 15.95×1.70 = 50.07 W/K added on top of the regular external-walls
    # 168.74 → 218.81 W/K total.
    #
    # The Summary mapper currently lodges these as
    # `SapRoomInRoofSurface(kind='gable_wall', ...)` — the cascade's
    # cohort-house default which routes to party walls at U=0.25
    # (Table 4 row 2). For a top-floor flat in a mid-terrace block,
    # the gables sit at the ends of the building (no neighbour above)
    # — they're EXTERNAL not party. Surface them as
    # `gable_wall_external` so the cascade's (29a) sum picks them up.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000784_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    from domain.sap10_calculator.rdsap.cert_to_inputs import (
        heat_transmission_section_from_cert,
    )
    ht = heat_transmission_section_from_cert(epc)

    # Assert — worksheet (29a) total walls = 168.7420 (main) +
    # 22.95 (Gable 1) + 27.115 (Gable 2) = 218.807 W/K. Tolerance
    # 1e-2 absorbs the 2-d.p. rounding of the underlying U/area
    # products; the 1e-4 chain test downstream will tighten this
    # to the cascade-internal rounding floor.
    worksheet_walls_w_per_k = 218.807
    assert abs(ht.walls_w_per_k - worksheet_walls_w_per_k) <= 1e-2


def test_summary_9501_pv_array_surfaced_from_elmhurst_section_19() -> None:
    # Arrange — cert 9501's Elmhurst §19.0 PV section lodges measured
    # array detail (2.36 kWp, South-West orientation, 45° elevation,
    # "None Or Little" overshading). The worksheet's §10a PV credit
    # of -250.02 GBP (-129.49 used in dwelling + -120.53 exported)
    # depends on Appendix M / Appendix U3.3 reading these from the
    # cascade's `SapEnergySource.photovoltaic_arrays` list. Without
    # the array surfacing the cascade computes total cost +£250 too
    # high → ECF 2.92 vs worksheet 2.26 → SAP 59.26 vs 68.53 (current
    # Δ -9.27 after Slice 99c closed the fabric heat loss).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000784_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    arrays = epc.sap_energy_source.photovoltaic_arrays
    assert arrays is not None
    assert len(arrays) == 1
    assert abs(arrays[0].peak_power - 2.36) <= 1e-4
    assert arrays[0].orientation == 6  # SAP octant: South-West
    assert arrays[0].pitch == 3  # RdSAP §11.1 pitch enum: code 3 = 45°
    assert arrays[0].overshading == 1  # RdSAP code: None or very little


def test_summary_9501_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange — cert 9501-3059-8202-7356-0204 (Summary_000784.pdf /
    # dr87-0001-000784.pdf) is the third boiler validation cert and
    # the first FLAT in the per-cert mapper validation cohort.
    # Mains-gas Vaillant PCDB idx 19007, mid-terrace top-floor flat
    # with Room-in-Roof + measured PV (2.36 kWp SW @ 45°). TFA 113.08
    # m². Worksheet PDF "SAP value" line lodges unrounded SAP
    # **68.5252**.
    #
    # Slices 99a-99e jointly closed the Summary path from Δ -5.25 to
    # 1e-4: 99a extractor attachment fix (built_form=''), 99b dwelling
    # _type identifies top-floor flat (cascade exposure routing), 99c
    # RR gables external for flats + SO Solid Brick wall code, 99d
    # surface PV array from §19.0, 99e PV pitch enum-not-degrees.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000784_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert — 1e-4 pin (project memory `feedback_zero_error_strict`).
    worksheet_unrounded_sap = 68.5252
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange — cert 001479 (Summary_001479.pdf / P960-0001-001479.pdf)
    # is the first cohort cert with a real GOV.UK EPB API counterpart
    # (cert ref 0535-9020-6509-0821-6222). Worksheet PDF line "SAP value"
    # lodges unrounded SAP **69.0094** (rating C 69, also the API-
    # published integer). This is the load-bearing forcing function for
    # the API↔Elmhurst parity workstream: any drift from 1e-4 means a
    # mapper gap, not a calculator bug — the cohort 6 cert cascades all
    # reproduce Elmhurst exactly at 1e-4 on hand-built fixtures.
    #
    # Source-data caveat (documented for future debuggers): Summary §3
    # lodges Ext1 age band as "M 2023 onwards"; the worksheet header
    # records "Ext1: L". Likely assessor data-entry inconsistency. The
    # mapper trusts the Summary (its source of truth); accept whatever
    # residual the M vs L disagreement produces.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_001479_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert — 1e-4 pin, no widening, no xfail (project memory
    # `feedback_zero_error_strict`).
    worksheet_unrounded_sap = 69.0094
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_summary_0330_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange — cert 0330-2249-8150-2326-4121 (Summary_000897.pdf /
    # dr87-0001-000897.pdf) is the second boiler cert under per-cert
    # mapper validation: mains-gas boiler (PCDB idx 10241), mid-terrace
    # 2-bp dwelling, TFA 69.14 m². Worksheet PDF "SAP value" line lodges
    # unrounded SAP **61.5993**. Same load-bearing role as cert 001479
    # (the first boiler) — Summary path proves itself against the
    # worksheet, then becomes the canonical reference for the API path.
    # Expected RED at Δ +0.4667 at handover-baseline (Summary mapper
    # cascade SAP 62.0660); mapper gaps to close are §11 glazing_type=14
    # (windows HLC +6.71 W/K) and the §4 hot-water cascade (kWh +1060).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000897_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert — 1e-4 pin, no widening, no xfail (project memory
    # `feedback_zero_error_strict`).
    worksheet_unrounded_sap = 61.5993
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_summary_0380_main_heating_category_is_heat_pump() -> None:
    # Arrange — cert 0380's Summary lodges main heating as a PCDB-
    # indexed Mitsubishi PUZ-WM50VHA (idx 104568), which lives in
    # PCDB Table 362 (heat pumps only). The Elmhurst mapper must
    # surface `main_heating_category=4` so the cascade routes the
    # cert through the Appendix N3.6/N3.7 heat-pump path instead of
    # falling through to the default boiler-ish branches that key off
    # `main_heating_category in {1, 2}`. Spec ref: SAP 10.2 Table 4a
    # (main heating category code 4 = heat pump).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.main_heating_details, "no main heating details surfaced"
    main = epc.sap_heating.main_heating_details[0]
    assert main.main_heating_index_number == 104568
    assert main.main_heating_category == 4


def test_summary_0380_filled_cavity_plus_external_insulation_routes_to_code_6() -> None:
    # Arrange — cert 0380's Summary lodges main walls as
    # `wall_type = "CA Cavity"` and `insulation = "FE Filled Cavity +
    # External"` (a cavity wall with subsequent external-insulation
    # upgrade). The cascade enum `wall_insulation_type=6` is
    # "filled cavity + external insulation" (per
    # `domain.sap10_ml.rdsap_uvalues` lines 120-131); without it the
    # cascade defaults to the as-built routing and overstates walls
    # heat loss by +58 W/K on cert 0380 (Summary 69.69 vs API 11.62
    # at HEAD before this slice). API path EPC for cert 0380 surfaces
    # `wall_insulation_type=6` and is the ground-truth pin here.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_building_parts, "no building parts surfaced"
    main = epc.sap_building_parts[0]
    assert main.wall_construction == 4  # 4 = Cavity ('CA')
    assert main.wall_insulation_type == 6  # 6 = filled cavity + external


def test_summary_0380_surfaces_wall_insulation_thickness_100mm() -> None:
    # Arrange — cert 0380's Summary §7.0 Walls block lodges the
    # composite-wall insulation thickness on the line pair
    # "Insulation Thickness" / "100 mm". Without surfacing this to
    # `wall_insulation_thickness`, the heat-transmission cascade
    # falls through `_parse_thickness_mm(None) → None` and the
    # composite filled-cavity-plus-external U-value calc uses its
    # default thickness rather than the lodged 100 mm — leaving cert
    # 0380's `walls_w_per_k` at 24.62 vs API's 11.62 even with
    # `wall_insulation_type=6` set (Slice S0380.3). Mirror of the
    # existing `_roof_details_from_lines` reader that surfaces roof
    # `insulation_thickness_mm` from the same "Insulation Thickness"
    # label.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert — match the API mapper's "100mm" string (the EPC schema
    # type is `Optional[str]`; the cascade's `_parse_thickness_mm`
    # strips non-digit trailers).
    main = epc.sap_building_parts[0]
    assert main.wall_insulation_thickness == "100mm"


def test_summary_0380_surfaces_insulated_door_u_value_1_2() -> None:
    # Arrange: cert 0380's Summary §10 Doors block lodges the door
    # U-value on the "Average U-value" / "1.20" line pair. The dr87
    # worksheet line ref (26) confirms the spec value: "Doors
    # insulated 1, NetArea 3.7000 m², U-value 1.2000, A×U 4.4400 W/K".
    # Without surfacing the lodged U-value the cascade defaults the
    # door U and overstates `doors_w_per_k` to 5.18 vs worksheet
    # 4.44 W/K. The comment at
    # `datatypes/epc/domain/epc_property_data.py:585` claimed the
    # value was "not available in site notes"; that assertion is
    # outdated for Elmhurst Summary PDFs which lodge it explicitly.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert: float compare with small tolerance (Summary lodges
    # "1.20" which parses cleanly to 1.2; API lodges 1.2 directly).
    assert epc.insulated_door_u_value is not None
    assert abs(epc.insulated_door_u_value - 1.2) < 1e-6


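# Worked check of the door figures quoted in the test above: a
# minimal sketch, not the cascade's own code. `door_heat_loss_w_per_k`
# is a hypothetical helper; the implied default U-value is inferred
# here from the quoted 5.18 W/K and is not stated in the source.

```python
def door_heat_loss_w_per_k(net_area_m2: float, u_value: float) -> float:
    """Fabric heat loss of one element: area times U-value (W/K)."""
    return net_area_m2 * u_value


# Worksheet line (26): 3.70 m² at the lodged U = 1.20.
lodged_w_per_k = door_heat_loss_w_per_k(3.7, 1.2)

# The overstated 5.18 W/K on the same area implies this default U-value.
implied_default_u = 5.18 / 3.7

print(f"lodged: {lodged_w_per_k:.2f} W/K, implied default U: {implied_default_u:.2f}")
```
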
def test_summary_0380_cylinder_block_surfaces_full_15_1_lodging() -> None:
    # Arrange: cert 0380's Summary §15.1 Hot Water Cylinder block
    # lodges (L 340-347):
    #   Cylinder Size          Medium
    #   Insulated              Foam
    #   Insulation Thickness   50 mm
    #   Cylinder Thermostat    Yes
    # The dr87 worksheet pins these as:
    #   (47) Cylinder Volume 160.00 L          → cascade enum 3
    #   "Cylinder Insulation Type Foam"        → cascade enum 1 (factory)
    #   "Cylinder Insulation Thickness 50 mm"  → 50
    #   "Cylinder Stat Yes"                    → 'Y'
    # Worksheet (47) 160 × (51) 0.0152 × (52) 0.9086 × (53) 0.5400
    # = daily storage loss 1.193 kWh/day → (56) annual ~435 kWh. This
    # is exact only when ALL FOUR fields are surfaced together:
    # insulation_type + thickness key the Table 2 loss factor (51),
    # volume keys the Table 2a volume factor (52), and
    # cylinder_thermostat keys the Table 2b temperature factor (53).
    # Without cylinder_thermostat='Y' the cascade uses the no-stat
    # temperature factor (~0.9 instead of 0.54) and HW storage loss
    # over-counts by ~300 kWh/yr.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.cylinder_size == 3
    assert epc.sap_heating.cylinder_insulation_type == 1
    assert epc.sap_heating.cylinder_insulation_thickness_mm == 50
    assert epc.sap_heating.cylinder_thermostat == "Y"


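# Worked check of the cylinder storage-loss arithmetic quoted in the
# test above: a minimal sketch using only the worksheet figures from
# that comment. `daily_storage_loss_kwh` is a hypothetical helper, not
# the cascade's function, and the 365-day year is an assumption.

```python
def daily_storage_loss_kwh(
    volume_litres: float,
    loss_factor: float,
    volume_factor: float,
    temperature_factor: float,
) -> float:
    """Product of worksheet lines (47) x (51) x (52) x (53), in kWh/day."""
    return volume_litres * loss_factor * volume_factor * temperature_factor


# Cert 0380: 160 L cylinder, 50 mm foam, cylinder thermostat present.
daily_kwh = daily_storage_loss_kwh(160.0, 0.0152, 0.9086, 0.54)
annual_kwh = daily_kwh * 365  # assumed 365-day year

print(f"{daily_kwh:.3f} kWh/day, {annual_kwh:.1f} kWh/yr")
```
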
def test_summary_0350_surfaces_two_pv_arrays() -> None:
    # Arrange: cert 0350's Summary §19.0 Photovoltaic Panel block
    # lodges TWO arrays (L 503-510):
    #   1.50 kWp / South-East / 45° / None Or Little
    #   1.50 kWp / North-West / 45° / None Or Little
    # The Elmhurst extractor's `_extract_pv_array_detail` hardcodes a
    # single 4-value reader (loop breaks at `len(values) == 4`) and
    # the `Renewables` dataclass exposes only 4 scalar PV fields;
    # together they cap output at one array regardless of how many the
    # PDF lodges. Cert 0380 (single-array) is unaffected; cert 0350
    # is the first multi-array cohort cert. Without both arrays the
    # cascade halves the PV export credit and the SAP score drops.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000903_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_energy_source is not None
    arrays = epc.sap_energy_source.photovoltaic_arrays
    assert arrays is not None
    assert len(arrays) == 2
    # Both arrays at 1.5 kWp; order matches PDF row order.
    assert arrays[0].peak_power == 1.5
    assert arrays[1].peak_power == 1.5


def test_summary_0350_ext1_inherits_main_wall_insulation_thickness() -> None:
    # Arrange: cert 0350-2968-2650-2796-5255 is a multi-bp dwelling
    # (Main + 1st Extension). Its Summary §7 Walls block lodges
    # "1st Extension / As Main Wall / Yes", so the extension's walls
    # inherit Main's lodgings (CA Cavity, FE Filled Cavity + External,
    # 100 mm). The `_extract_extensions` "As Main Wall" inheritance
    # at `elmhurst_extractor.py:559-567` builds a new WallDetails by
    # copying Main's fields, but the field set it copies was frozen
    # before Slice S0380.4 added `insulation_thickness_mm`, so the
    # extension's `WallDetails.insulation_thickness_mm` falls through
    # to its dataclass default (None), and the mapper surfaces
    # `wall_insulation_thickness=None` on bp[1]. The cascade then
    # routes Ext1's composite walls off the lodged-thickness path,
    # over-stating Ext1 `external_walls_w_per_k` against worksheet
    # line ref (29a) "External walls Ext1 5.21 0.25 1.3025".
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000903_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert: Ext1 inherits Main's 100 mm thickness and the EPC
    # surfaces "100mm" on bp[1] (matching bp[0]).
    assert len(epc.sap_building_parts) == 2
    main_bp, ext1_bp = epc.sap_building_parts
    assert main_bp.wall_insulation_thickness == "100mm"
    assert ext1_bp.wall_insulation_thickness == "100mm"


def test_summary_0350_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 0350-2968-2650-2796-5255 (Summary_000903.pdf /
    # dr87-0001-000903.pdf) is the second heat-pump cert under per-cert
    # Summary-path mapper validation and the first multi-bp cohort
    # cert: Mitsubishi PUZ-WM50VHA ASHP (PCDB index 104568), main
    # dwelling + 1 extension, 2 PV arrays (2x 1.5 kWp at SE / NW).
    # Worksheet PDF "SAP value" line lodges unrounded SAP **84.1367**.
    #
    # First-attempt closure (validating the structural-debt-amortizes
    # hypothesis): after Slices S0380.2..S0380.6 (which were forced by
    # cert 0380) the cohort HP routing + cylinder block were already
    # in place; cert 0350 needed only TWO new slices:
    #   - Slice S0380.8: extension "As Main Wall" inheritance copies
    #     `insulation_thickness_mm` (cert 0380 was single-bp, didn't
    #     exercise the inheritance path).
    #   - Slice S0380.9: refactor Elmhurst `Renewables` to support
    #     multiple PV arrays per dwelling (cert 0380 was single-array,
    #     didn't exercise multi-array PV).
    # Both fixes are structural and apply cohort-wide.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000903_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 84.1367
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_2636_alt_wall_window_parses_alternative_wall_location() -> None:
    # Arrange: cert 2636-0525-2600-0401-2296's §11 Windows block lodges
    # one alt-wall window (the 1.19 m² north-facing one): the row's
    # "Alternative wall" string appears BEFORE the W×H×A line, not
    # after the frame_factor (the normal position for "External wall").
    # The extractor's `_parse_window_from_anchors` was only scanning
    # the post-frame_factor `middle` slice for wall-location tokens →
    # defaulted to "External wall" for the alt-wall row → cascade
    # allocated the window to the main wall instead of the alt-wall,
    # leaving Main external walls W/K under-deducted by ~0.54 vs
    # worksheet (29a). Fix: also scan the PRE-data slice
    # `lines[before_start:data_idx]` for wall tokens.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000898_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert: the 1.19 m² window is recorded with wall_type =
    # "Alternative wall"; all other windows stay on "External wall".
    by_area = {round(w.window_width, 2): w.window_wall_type for w in epc.sap_windows}
    assert by_area[1.19] == "Alternative wall"
    assert by_area[2.25] == "External wall"  # main-wall windows unchanged


def test_summary_2225_no_showers_lodged_resolves_to_zero_counts() -> None:
    # Arrange: cert 2225-3062-8205-2856-7204's Summary §1x Baths and
    # Showers block lodges 0 baths and ZERO showers (no shower rows at
    # all). The Summary mapper's existing logic at
    # `mapper.py:3536-3537` predicates the count assignment on
    # `has_electric_shower`: when no electric shower is detected the
    # counts collapse to None, but cert 2225 has no showers at all,
    # not "non-electric showers". The None values then drive the
    # cascade's default-1-mixer assumption, over-counting HW kWh.
    # Same disposition the API path received in slice 102f-prep.8
    # (commit 1d5183c6: "API mapper resolves shower_outlets=None →
    # 0 mixers").
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000900_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    # Pre-condition: §1x lodges zero showers (proves the test sees
    # the same no-showers fixture the cascade does).
    assert len(site_notes.baths_and_showers.showers) == 0

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert: zero-shower lodgings resolve to explicit 0 counts (not
    # None) so the cascade does not default-assume a mixer.
    assert epc.sap_heating.electric_shower_count == 0
    assert epc.sap_heating.mixer_shower_count == 0


def test_summary_2225_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 2225-3062-8205-2856-7204 (Summary_000900.pdf):
    # Mitsubishi PUZ-WM50VHA, single-bp single-array PV (3.28 kWp SE),
    # ZERO showers lodged. Worksheet "SAP value" 88.7921. Slice
    # S0380.11 closed the zero-shower defaulting bug (None → 0 mixers
    # for cohort certs that lodge no showers); cert 2225 was the
    # forcing function. Same disposition the API path received in
    # slice 102f-prep.8 (commit 1d5183c6).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000900_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 88.7921
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_2636_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 2636-0525-2600-0401-2296 (Summary_000898.pdf):
    # Mitsubishi PUZ-WM50VHA, mid-terrace house with **alt-wall +
    # cantilever**, the most complex geometry in the ASHP cohort.
    # Worksheet "SAP value" lodges 86.2641.
    #
    # Closed by two combined slices:
    #   - S0380.12: alt-wall window-location parser fix (walls W/K
    #     20.5595 → 20.0240 = worksheet exact).
    #   - S0380.13: cantilever gate accepts "House" descriptive form
    #     in addition to the schema enum "0" (allowing the Summary
    #     mapper's descriptive property_type to trigger the cantilever
    #     detection that slice 102f-prep.9 added on the API path).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000898_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 86.2641
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_mapper_raises_on_unmapped_cylinder_size_label() -> None:
    # Arrange: start from a real cohort cert (any extracted site
    # notes) and inject an unmapped §15.1 "Cylinder Size" label
    # ("Tiny", not in the lookup dict). `from_elmhurst_site_notes`
    # must raise `UnmappedElmhurstLabel` rather than silently
    # returning None for `cylinder_size` (the failure mode that hid
    # cert 9418's "Large" miss until Slice S0380.14 surfaced it as
    # a Δ +2.60 SAP gap).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    site_notes.water_heating.cylinder_size_label = "Tiny"

    # Act / Assert
    with pytest.raises(UnmappedElmhurstLabel) as excinfo:
        EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    assert excinfo.value.field == "cylinder_size"
    assert excinfo.value.value == "Tiny"


def test_summary_mapper_raises_on_unmapped_cylinder_insulation_label() -> None:
    # Arrange: mirror test for the §15.1 "Insulated" label dict.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    site_notes.water_heating.cylinder_insulation_label = "Polyester wool"

    # Act / Assert
    with pytest.raises(UnmappedElmhurstLabel) as excinfo:
        EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    assert excinfo.value.field == "cylinder_insulation"
    assert excinfo.value.value == "Polyester wool"


def test_all_seven_ashp_cohort_certs_extract_without_unmapped_label_raise() -> None:
    # Arrange: coverage forcing function. Every cohort cert must
    # extract through `from_elmhurst_site_notes` without triggering an
    # `UnmappedElmhurstLabel` raise from any strict helper. New cohort
    # certs added in subsequent slices fall under the same gate, and
    # any future Elmhurst-PDF variant with an unmapped label fails
    # this test until the missing dict entry is added.
    cohort_pdfs = (
        _SUMMARY_000899_PDF, _SUMMARY_000903_PDF, _SUMMARY_000900_PDF,
        _SUMMARY_000898_PDF, _SUMMARY_000901_PDF, _SUMMARY_000904_PDF,
        _SUMMARY_000902_PDF,
    )

    # Act / Assert
    for pdf in cohort_pdfs:
        pages = _summary_pdf_to_textract_style_pages(pdf)
        site_notes = ElmhurstSiteNotesExtractor(pages).extract()
        # Strict mapper run: raises if any cylinder helper hits an
        # unknown label.
        EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)


def test_summary_3336_triple_glazed_windows_route_to_code_6() -> None:
    # Arrange: cert 3336-2825-9400-0512-8292's Summary §11 lodges
    # "Triple post or during 2022" on every window; dr87-0001-000888
    # confirms "Window, Triple glazed" on every line. The Elmhurst
    # mapper must surface SAP 10.2 Table U2 code 6 so the §5 (66)..
    # (67) daylight factor uses Table 6b col light g_L = 0.70 instead
    # of the default DG g_L = 0.80. The +0.0674 SAP over-prediction
    # that this slice closes is driven by the daylight-factor offset
    # that the silent default-DG fallback masked.
    pages = _summary_pdf_to_textract_style_pages(
        _FIXTURES / "Summary_000888.pdf"
    )
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert: every window on cert 3336 is triple-glazed → code 6.
    assert epc.sap_windows, "expected windows on cert 3336"
    for w in epc.sap_windows:
        assert w.glazing_type == 6


def test_summary_000474_double_glazed_windows_route_to_code_3() -> None:
    # Arrange: boiler-cohort cert (Summary_000474.pdf) lodges
    # "Double between 2002 and 2021" / "Double with unknown install
    # date" on every window. Both route to SAP 10.2 Table U2 code 3
    # (DG air-filled post-2002) per the
    # `_ELMHURST_GLAZING_LABEL_TO_SAP10` dict. Code 3 carries the
    # same Table 6b col light g_L = 0.80 as the default, so the
    # cascade SAP is unchanged for these certs, but the integer pin
    # guards against future cascade consumers that key on the subcode
    # (e.g. a U-value default lookup for absent
    # `WindowTransmissionDetails`).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_windows, "expected windows on cert 000474"
    for w in epc.sap_windows:
        assert w.glazing_type == 3, (
            f"expected DG post-2002 code 3, got {w.glazing_type!r}"
        )


def test_summary_mapper_raises_on_unmapped_glazing_type_label() -> None:
    # Arrange: same strict-coverage gate as the cylinder-size helper
    # (Slice S0380.15 + S0380.16): silently routing an unknown glazing
    # variant to a SAP default int hid the +0.05 SAP regression on 13
    # triple-glazed certs until the cohort-2 first-attempt probe. After
    # this slice, an unrecognised lodging surfaces immediately at
    # extraction time.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    # Mutate the first window's glazing_type to an unmapped string.
    site_notes.windows[0].glazing_type = "Quintuple glazed with helium"

    # Act / Assert
    with pytest.raises(UnmappedElmhurstLabel) as excinfo:
        EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    assert excinfo.value.field == "glazing_type"
    assert excinfo.value.value == "Quintuple glazed with helium"


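# The strict-lookup pattern these tests gate can be sketched as below.
# This is an illustration only: `UnmappedLabelError`,
# `GLAZING_LABEL_TO_CODE` and `glazing_type_code` are hypothetical
# stand-ins for `UnmappedElmhurstLabel`, the mapper's dict and its
# helper. The label-to-code pairs are copied from the commit message;
# the regex noise-stripping passes it describes are omitted.

```python
class UnmappedLabelError(ValueError):
    """Raised when a lodged label has no entry in the lookup dict."""

    def __init__(self, field: str, value: str) -> None:
        super().__init__(f"unmapped {field} label: {value!r}")
        self.field = field
        self.value = value


GLAZING_LABEL_TO_CODE = {
    "Single": 1,
    "Double pre 2002": 2,
    "Double between 2002 and 2021": 3,
    "Double with unknown install date": 3,
    "Double with unknown 16 mm or install date more": 3,
    "Double post or during 2022": 5,
    "Triple post or during 2022": 6,
    "Triple post or during": 6,  # year-truncated alias
    "Secondary": 7,
}


def glazing_type_code(label: str) -> int:
    """Translate a lodged label to its integer code, failing loudly."""
    try:
        return GLAZING_LABEL_TO_CODE[label.strip()]
    except KeyError:
        # No silent default: an unknown label must surface immediately.
        raise UnmappedLabelError("glazing_type", label) from None
```
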
def test_summary_2536_normal_cylinder_routes_to_code_2() -> None:
    # Arrange: cert 2536-2525-0600-0788-2292's Summary §15.1 lodges
    # "Cylinder Size: Normal". The dr87 worksheet lodges "Cylinder
    # Volume 110.00" L on line ref (47); the cascade lookup
    # `_CYLINDER_SIZE_CODE_TO_LITRES` now maps code 2 → 110 L per
    # RdSAP 10 §10.5 Table 28's Normal (90-130 L) band midpoint.
    # First cohort cert to exercise the "Normal" cylinder lodging.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000889_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.cylinder_size == 2


def test_summary_9421_normal_cylinder_routes_to_code_2() -> None:
    # Arrange: cert 9421-3045-3205-1646-6200's Summary §15.1 also
    # lodges "Cylinder Size: Normal" (same 110 L cylinder as cert
    # 2536). Second cohort cert exercising the "Normal" mapping,
    # pinned to guard against silent regression of either the mapper
    # dict entry OR the cascade volume default.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000884_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.cylinder_size == 2


def test_summary_9418_large_cylinder_routes_to_code_4() -> None:
    # Arrange: cert 9418-3062-8205-3566-7200's Summary §15.1 lodges
    # "Cylinder Size: Large". The dr87 worksheet lodges "Cylinder
    # Volume 210.00" L, and the cascade lookup
    # `_CYLINDER_SIZE_CODE_TO_LITRES = {3: 160.0, 4: 210.0}` maps code
    # 4 → 210 L. Cert 9418 is the first cohort cert to exercise the
    # "Large" cylinder lodging (every other cohort cert is "Medium").
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000902_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()

    # Act
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Assert
    assert epc.sap_heating.cylinder_size == 4


def test_summary_9418_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 9418-3062-8205-3566-7200 (Summary_000902.pdf):
    # **Daikin EDLQ05CAV3 ASHP** (PCDB index 102421, distinct from
    # the rest of the cohort's Mitsubishi 104568), end-terrace house
    # with TWO 1.64 kWp PV arrays (N + S), 210 L cylinder.
    # `heating_duration_code='24'` per Table N4 (continuous heating).
    # Worksheet "SAP value" lodges 84.6305.
    #
    # Closes the cohort: the final ASHP cert. The only Summary-mapper
    # gap was the missing "Large" → 4 mapping in
    # `_ELMHURST_CYLINDER_SIZE_LABEL_TO_SAP10` (Slice S0380.14, this
    # commit); multi-array PV + Large-cylinder were the variants
    # cert 9418 uniquely exercises.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000902_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 84.6305
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_3800_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 3800-8515-0922-3398-3563 (Summary_000901.pdf /
    # dr87-0001-000901.pdf) is the third ASHP cohort cert to close on
    # the Summary path: Mitsubishi PUZ-WM50VHA ASHP (PCDB 104568).
    # Worksheet "SAP value" lodges 86.1458.
    #
    # **First-try closure: zero new mapper slices required**. The
    # structural work shipped in slices S0380.2..S0380.9 (HP routing,
    # cylinder block, composite walls, multi-array PV, extension
    # inheritance) was already sufficient for cert 3800's variant set.
    # Strong evidence that the Summary mapper has reached completeness
    # for the standard single-bp / single-array ASHP shape.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000901_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 86.1458
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_9285_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 9285-3062-0205-7766-7200 (Summary_000904.pdf /
    # dr87-0001-000904.pdf) is the fourth ASHP cohort cert to close on
    # the Summary path: Mitsubishi PUZ-WM50VHA ASHP (PCDB 104568).
    # Worksheet "SAP value" lodges 84.1369. Same "first-try closure,
    # zero new slices" disposition as cert 3800; the cohort's
    # structural mapper completeness is the load-bearing claim.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000904_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance.
    worksheet_unrounded_sap = 84.1369
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_summary_0380_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Arrange: cert 0380-2471-3250-2596-8761 (Summary_000899.pdf /
    # dr87-0001-000899.pdf) is the first heat-pump cert under per-cert
    # Summary-path mapper validation: Mitsubishi PUZ-WM50VHA ASHP
    # (PCDB index 104568), semi-detached bungalow age D, TFA 60.43 m².
    # Worksheet PDF "SAP value" line lodges unrounded SAP **88.5104**.
    # Slices S0380.2..S0380.6 closed the Summary path from Δ -54.7184
    # to Δ +0.0594, the same Appendix N3.6 PSR-interpolation
    # precision floor at which the API path closes (commit c0086660
    # slice 102f wired this floor for the full 7-cert ASHP cohort at
    # the same ±0.07 tolerance). Closing further requires calculator
    # work on the PSR interpolation step, not mapper work: the
    # Summary EPC and API EPC produce IDENTICAL cascade outputs at
    # this point (HW kWh, fabric W/K, HLC all match at 1e-4), so the
    # +0.0594 residual is structural to the calculator's HP path for
    # this fixture's PSR.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: ±0.07 ASHP-cohort spec-floor tolerance (matches API
    # path's slice 102f disposition; `_ASHP_COHORT_CHAIN_TOLERANCE`
    # is defined alongside the API-path equivalents below).
    worksheet_unrounded_sap = 88.5104
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < _ASHP_COHORT_CHAIN_TOLERANCE


_API_0330_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "0330-2249-8150-2326-4121.json"
)

_API_9501_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "9501-3059-8202-7356-0204.json"
)


def test_api_9501_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange: cert 9501 is the third Layer 4 production gate (after
    # cert 001479 and cert 0330): API path → from_api_response →
    # cert_to_inputs → calculate_sap_from_inputs must hit the worksheet
    # SAP at 1e-4. Cert 9501 is the FIRST flat in the production gate
    # set: mid-terrace top-floor flat with RR + measured PV (2.36 kWp
    # SW @ 45°). Worksheet target unrounded SAP **68.5252**.
    #
    # Slices 100a-100c jointly closed the API path from Δ -14.82 to
    # 1e-4: 100a `room_in_roof_details` schema + Detailed-RR surface
    # population (HLC 382.19 → 297.54 W/K vs worksheet 296.68); 100b
    # per-bp TFA includes RR floor area (TFA 81.28 → 113.08); 100c
    # `photovoltaic_supply.pv_arrays` schema + gap-aware glazing
    # lookup (DG pre-2002 16+ → U=2.7 per RdSAP 10 Table 24).
    doc = json.loads(_API_9501_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: 1e-4 pin against the worksheet's continuous SAP.
    worksheet_unrounded_sap = 68.5252
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_api_9501_photovoltaic_array_surfaced() -> None:
    # Arrange: cert 9501's API JSON lodges measured PV under
    # `sap_energy_source.photovoltaic_supply.pv_arrays`. Two real-API
    # PV shapes coexist: cohort cert 2130 lodges the outer wrapper as
    # a nested list `[[{...}], ...]`; cert 9501 lodges a dict
    # `{"pv_arrays": [{...}]}`. The existing schema models only the
    # legacy `none_or_no_details` field on `PhotovoltaicSupply`, so
    # cert 9501's `pv_arrays` payload was silently dropped, leaving
    # `photovoltaic_arrays=None` and the cascade missing the worksheet's
    # £250.02 PV credit.
    doc = json.loads(_API_9501_JSON.read_text())

    # Act
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Assert: single array with the lodged kWp/pitch/orientation/
    # overshading values.
    arrays = epc.sap_energy_source.photovoltaic_arrays
    assert arrays is not None
    assert len(arrays) == 1
    assert abs(arrays[0].peak_power - 2.36) <= 1e-4
    assert arrays[0].pitch == 3  # RdSAP §11.1 enum: 3 = 45°
    assert arrays[0].orientation == 6  # SAP octant: SW
    assert arrays[0].overshading == 1  # RdSAP: None or very little


_API_0380_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "0380-2471-3250-2596-8761.json"
)


def test_api_0380_glazing_type_14_resolves_to_post_2022_dg_u_value() -> None:
    # Arrange: cert 0380 (ASHP semi-detached bungalow, worksheet SAP
    # 88.5104) lodges glazing_type=14 on all windows. The worksheet
    # uses U=1.3258 (post-curtain) for line (27), which back-calculates
    # to a raw U=1.40, the RdSAP 10 Table 24 row for "Double or triple
    # glazed, 2022 or later". Code 13 in our existing dict carries the
    # same U/g values; code 14 is the schema sibling for the same
    # post-2022 product family (DG sealed-unit variants differ in
    # the cert lodgement but agree on the spec U-value).
    doc = json.loads(_API_0380_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act: pick any window (cert 0380 lodges only glazing_type=14).
    w = epc.sap_windows[0]
    td = w.window_transmission_details

    # Assert
    assert td is not None
    assert abs(td.u_value - 1.40) <= 1e-4
    assert abs(td.solar_transmittance - 0.72) <= 1e-4


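# Worked check of the "post-curtain" back-calculation quoted in the
# test above: a minimal sketch assuming the usual SAP curtain
# adjustment, which adds a thermal resistance of 0.04 m²K/W to the
# window. Both helper names are hypothetical, not cascade functions.

```python
CURTAIN_RESISTANCE_M2K_PER_W = 0.04  # assumed SAP curtain allowance


def curtain_adjusted_u(raw_u: float) -> float:
    """Effective window U-value after adding the curtain resistance."""
    return 1.0 / (1.0 / raw_u + CURTAIN_RESISTANCE_M2K_PER_W)


def raw_u_from_adjusted(adjusted_u: float) -> float:
    """Invert the adjustment to recover the raw (lodged) U-value."""
    return 1.0 / (1.0 / adjusted_u - CURTAIN_RESISTANCE_M2K_PER_W)


# Raw U = 1.40 should land on the worksheet's line (27) value.
print(f"{curtain_adjusted_u(1.40):.4f}")
# Going the other way from the worksheet's rounded 1.3258.
print(f"{raw_u_from_adjusted(1.3258):.2f}")
```
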
def test_api_0380_wall_with_external_insulation_routes_to_filled_cavity_u() -> None:
    # Arrange: cert 0380's top-level walls[0].description lodges
    # "Cavity wall, filled cavity and external insulation". The
    # worksheet uses U=0.25 for the (29a) external-walls entry, the
    # very-low-U "filled cavity + external insulation" composite that
    # RdSAP 10 §5 routes through Table 6's filled-cavity row (with a
    # further EWI reduction). Our cascade was computing U=0.32 via
    # the as-built Table 13 bucketed cascade because
    # `_described_as_insulated` only matches the past-participle
    # "insulated"; "insulation" (noun) on its own falls through to
    # False. Cert 0380's lodgement uses the noun form.
    #
    # Fix: `_described_as_insulated` should also match the noun
    # "insulation" (excluding the existing "no insulation" hard
    # negation), so cavity walls described as carrying insulation
    # route to the cascade's Filled-cavity branch.
    doc = json.loads(_API_0380_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    from domain.sap10_calculator.rdsap.cert_to_inputs import (
        heat_transmission_section_from_cert,
    )
    ht = heat_transmission_section_from_cert(epc)

    # Assert: main-wall HLC ≈ 46.46 m² × 0.25 = 11.62 W/K (worksheet
    # exact). Tolerance 1e-2 absorbs sub-component rounding; the
    # 1e-4 chain test downstream tightens to the cascade floor.
    worksheet_walls_w_per_k = 11.62
    assert abs(ht.walls_w_per_k - worksheet_walls_w_per_k) <= 1e-2


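# Worked check of the wall figures quoted in the test above: a minimal
# sketch using only the area and U-values from that comment.
# `wall_heat_loss_w_per_k` is a hypothetical helper, not the cascade.

```python
def wall_heat_loss_w_per_k(net_area_m2: float, u_value: float) -> float:
    """Fabric heat loss of one wall: area times U-value (W/K)."""
    return net_area_m2 * u_value


WORKSHEET_WALLS_W_PER_K = 11.62
TOLERANCE = 1e-2

# Filled cavity + external insulation routing (U = 0.25).
fixed = wall_heat_loss_w_per_k(46.46, 0.25)
# As-built routing the cascade used before the fix (U = 0.32).
as_built = wall_heat_loss_w_per_k(46.46, 0.32)

print(f"fixed: {fixed:.3f} W/K, as-built: {as_built:.3f} W/K")
```

# Only the U = 0.25 routing lands inside the test's 1e-2 tolerance.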
def test_api_0380_heat_pump_no_secondary_heating_per_table_11() -> None:
|
||
# Arrange — SAP 10.2 Table 11 explicitly notes "Cat 4 (heat pump):
|
||
# 0.00 (HP eff includes any secondary)" — heat pumps don't apply a
|
||
# Table 11 secondary fraction even when the cert lodges a secondary
|
||
# heating type, because the HP efficiency already incorporates any
|
||
# supplementary heat source. The `_SECONDARY_HEATING_FRACTION_BY_
|
||
# CATEGORY` dict in cert_to_inputs.py had entries for categories
|
||
# 1/2/3/5/6/7/10 but DID NOT include cat 4 — so HP certs with a
|
||
# lodged secondary fell through to the DEFAULT 0.10, billing 10%
|
||
# of space-heating cost as "secondary" (cert 0380: £72 secondary
|
||
# vs worksheet £0).
|
||
#
|
||
# Cert 0380 lodges secondary_heating_type=691 + main_heating_
|
||
# category=4 (HP, PCDB idx 104568). Worksheet line (242) "Space
|
||
# heating - secondary" shows 0.0 kWh; cascade was producing
|
||
# 547.30 kWh. Fix: dict entry `4: 0.0`.
|
||
doc = json.loads(_API_0380_JSON.read_text())
|
||
epc = EpcPropertyDataMapper.from_api_response(doc)
|
||
|
||
# Act
|
||
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
|
||
from domain.sap10_calculator.rdsap.cert_to_inputs import (
|
||
cert_to_inputs, SAP_10_2_SPEC_PRICES,
|
||
)
|
||
result = calculate_sap_from_inputs(
|
||
cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
|
||
)
|
||
|
||
# Assert — secondary heating contributes 0 kWh / £0 on HP certs.
|
||
assert result.secondary_heating_fuel_kwh_per_yr == 0.0
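# Illustrative aside, not part of the test module: a minimal sketch of
# the silent-default failure mode described above, assuming the lookup
# is a plain `dict.get` with a fallback. Only the 0.10 default and the
# `4: 0.0` fix come from the comment; the category-2 entry and all
# names below are placeholders.

```python
DEFAULT_SECONDARY_FRACTION = 0.10

# Placeholder tables: category 2 is illustrative, category 4 is the fix.
fraction_by_category_before: dict[int, float] = {2: 0.10}
fraction_by_category_after: dict[int, float] = {2: 0.10, 4: 0.0}


def secondary_fraction(table: dict[int, float], category: int) -> float:
    """Look up the secondary-heating fraction, falling back to the default."""
    return table.get(category, DEFAULT_SECONDARY_FRACTION)


before = secondary_fraction(fraction_by_category_before, 4)  # missing key
after = secondary_fraction(fraction_by_category_after, 4)    # explicit zero
```

# A missing key and an explicit 0.0 look identical at the call site,
# which is why the gap only showed up as a worksheet mismatch.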


def test_api_0380_heat_pump_no_pumps_fans_kwh_per_table_4f() -> None:
    # Arrange: SAP 10.2 Table 4f lists annual pumps + fans electricity
    # consumption by main heating category. Gas-fired boilers (cat 2)
    # use 160 kWh/yr (115 central heating pump + 45 flue fan). Heat
    # pumps (cat 4) have NO additional pumps/fans contribution because
    # the HP system's circulation pump and fans are already
    # incorporated into the system COP.
    #
    # The cascade's `_PUMPS_FANS_KWH_BY_MAIN_CATEGORY` dict only had a
    # cat-2 entry; cat-4 HP certs fell through to the DEFAULT 130
    # kWh/yr (~£17 at 13.19 p/kWh), whereas the worksheet line (249)
    # "Pumps, fans and electric keep-hot" shows 0.0000 kWh/yr for
    # cert 0380.
    doc = json.loads(_API_0380_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    from domain.sap10_calculator.calculator import calculate_sap_from_inputs
    from domain.sap10_calculator.rdsap.cert_to_inputs import (
        cert_to_inputs, SAP_10_2_SPEC_PRICES,
    )
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert
    assert result.pumps_fans_kwh_per_yr == 0.0


_API_9418_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "9418-3062-8205-3566-7200.json"
)


_API_2225_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "2225-3062-8205-2856-7204.json"
)

_API_2636_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "2636-0525-2600-0401-2296.json"
)


def test_api_2636_cantilever_floor_surfaces_as_exposed_floor() -> None:
    # Arrange: cert 2636 (Mitsubishi ASHP, semi-detached, 2 storeys,
    # property_type=0) has BP0 floor 0 area 39.18 m² and floor 1 area
    # 42.92 m². The 3.74 m² difference is an upper-floor cantilever;
    # worksheet (28b) "Exposed floor Main: 3.74 × 1.20 = 4.4880" treats
    # it per RdSAP Table 20 U_exposed_floor at age-D + no insulation
    # = 1.20 W/m²K.
    #
    # Without the cantilever surfaced, cert 2636 cascade SAP =
    # 86.7514 vs worksheet 86.2641 (Δ +0.49, by far the largest
    # outlier in the 7-cert ASHP cohort, where the other 6 cluster
    # at ±0.06). Pre-fix HLC drift was -4.51 W/K = 3.74 × 1.20 +
    # 0.15 × 3.74 thermal-bridging contribution on the extra exposed
    # area. Tolerance ±0.07 covers the residual PSR/HLC drift that
    # this cert shares with the 7-cohort cluster (after the slice
    # 102f-prep.10 alt-wall-allocation fix, this cert moves from the
    # near-zero cancellation state into the cohort cluster).
    doc = json.loads(_API_2636_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act: full cert→inputs→calculator cascade
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: SAP within 0.07 of worksheet 86.2641.
    assert abs(result.sap_score_continuous - 86.2641) < 0.07, (
        f"cascade SAP={result.sap_score_continuous:.4f} vs worksheet 86.2641"
    )
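# Illustrative aside, not part of the test module: a minimal sketch of
# the worksheet (28b) arithmetic quoted above, assuming the cantilever
# area is simply the upper-floor area minus the lower-floor area. The
# function name and signature are hypothetical, not the cascade's API.

```python
def cantilever_exposed_floor_w_per_k(
    lower_floor_area_m2: float,
    upper_floor_area_m2: float,
    u_exposed_floor: float,
) -> float:
    """Heat loss (W/K) through the part of the upper floor that overhangs."""
    overhang_m2 = max(0.0, upper_floor_area_m2 - lower_floor_area_m2)
    return overhang_m2 * u_exposed_floor


# Cert 2636 figures from the comment above: 39.18 m², 42.92 m², U = 1.20.
cert_2636_exposed_floor = cantilever_exposed_floor_w_per_k(39.18, 42.92, 1.20)
```

# With the cert 2636 figures this reproduces the worksheet's
# 3.74 × 1.20 = 4.488 W/K; an upper floor no larger than the lower
# floor contributes nothing.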


def test_api_2636_alt_wall_openings_deducted_from_alt_not_main() -> None:
    # Arrange: cert 2636 has BP0 with `sap_alternative_wall_1`
    # (area 12.76 m², cavity unfilled at age D → U=0.70) and 7
    # windows. One window (1.14 × 1.04 ≈ 1.19 m²) lodges
    # `window_wall_type=2` → it sits on the alt wall, not main.
    #
    # Per RdSAP §1.4.2 wall openings deduct from the wall they
    # pierce. Worksheet (29a):
    #   Main:  gross 61.73, openings 14.03, net 47.70 → 0.25 × 47.70 = 11.925
    #   Alt.1: gross 12.76, openings 1.19, net 11.57 → 0.70 × 11.57 = 8.099
    #   Total walls (29a) = 20.024
    #
    # Pre-fix cascade subtracted ALL openings from the (main+alt)
    # gross then routed the alt at its FULL gross: over-counting
    # alt's contribution by 1.19 × 0.70 ≈ 0.833 W/K and
    # under-counting main by 1.19 × 0.25 ≈ 0.298 W/K, for a net
    # 1.19 × (0.70 − 0.25) ≈ +0.535 W/K.
    doc = json.loads(_API_2636_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act: full cascade so windows + doors are read from the cert.
    inputs = cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)

    # Assert: worksheet sum 11.925 + 8.099 = 20.024 at 1e-3.
    assert abs(inputs.heat_transmission.walls_w_per_k - 20.024) < 1e-3, (
        f"cascade walls={inputs.heat_transmission.walls_w_per_k:.4f} "
        f"vs worksheet 20.024"
    )
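# Illustrative aside, not part of the test module: a minimal sketch
# comparing per-wall opening deduction with the pooled pre-fix
# behaviour, using only the worksheet (29a) figures quoted above. The
# `Wall` dataclass and both function names are hypothetical, not the
# cascade's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wall:
    gross_m2: float
    openings_m2: float
    u_value: float


def walls_w_per_k_per_wall(walls: list[Wall]) -> float:
    """Deduct each wall's own openings before applying its U-value."""
    return sum((w.gross_m2 - w.openings_m2) * w.u_value for w in walls)


def walls_w_per_k_pooled(main: Wall, alt: Wall) -> float:
    """Pre-fix behaviour: all openings come off main, alt stays at full gross."""
    main_net = main.gross_m2 - main.openings_m2 - alt.openings_m2
    return main_net * main.u_value + alt.gross_m2 * alt.u_value


main_wall = Wall(gross_m2=61.73, openings_m2=14.03, u_value=0.25)
alt_wall = Wall(gross_m2=12.76, openings_m2=1.19, u_value=0.70)

per_wall = walls_w_per_k_per_wall([main_wall, alt_wall])
pooled = walls_w_per_k_pooled(main_wall, alt_wall)
overcount = pooled - per_wall
```

# The per-wall total matches the worksheet's 20.024 W/K, and the gap to
# the pooled total is 1.19 × (0.70 − 0.25), the pre-fix over-count.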


def test_api_2225_no_mixer_lodged_uses_zero_showers_per_worksheet() -> None:
    # Arrange: cert 2225 lodges `mixer_shower_count = None` (the field
    # is unlodged in the API JSON, not "0"). The worksheet (42a) "Hot
    # water usage for mixer showers" shows 0.0000 every month; the
    # Elmhurst convention is "absent ⇒ no shower". Cascade previously
    # defaulted to a single 7 L/min vented mixer when unlodged, which
    # raised (44) daily HW use from 122.89 → 130.56 l/day (Jan) and
    # added ~113 kWh/yr to (62) HW demand. The cohort-modal lodging
    # is 0 (5/7 certs lodge mixer=0 explicitly).
    doc = json.loads(_API_2225_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    inputs = cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)

    # Assert: HW fuel kWh tracks worksheet (247) 1634.04 at 1e-1
    # (η_water = 172.85% implies demand 2824.44 kWh; fuel = demand / η).
    worksheet_hw_fuel_kwh = 1634.04
    assert abs(inputs.hot_water_kwh_per_yr - worksheet_hw_fuel_kwh) <= 0.1


def test_api_9418_daikin_24h_duration_mean_internal_temp_matches_worksheet_92() -> None:
    # Arrange: cert 9418 (Daikin Altherma EDLQ05CAV3, PCDB 102421)
    # lodges `heating_duration_code = "24"`. Per SAP 10.2 Table N4 (PDF
    # p.107) this means N24,9 = 365 (all days operate at 24-hour
    # heating, no off-period). Worksheet (87) MIT_living = 21.0 every
    # month (= Th1, no off period), worksheet (90) MIT_elsewhere
    # collapses to Th2 directly. Worksheet (92) blended at fLA = 0.30.
    #
    # Pre-slice-102f-prep.7 the helper's "V"-only gate returned None
    # for this duration → bimodal cascade gave MIT ~17.8-19.8 (off by
    # ~2°C). After Table N4 wiring the cascade lands at 1e-3.
    doc = json.loads(_API_9418_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    inputs = cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)

    # Assert: worksheet (92) "MIT" 12-tuple at 1e-3 per month.
    worksheet_mit_92 = (
        19.8400, 19.8445, 19.8489, 19.8697, 19.8736, 19.8920,
        19.8920, 19.8954, 19.8849, 19.8736, 19.8657, 19.8574,
    )
    for m, (cascade, ws) in enumerate(zip(
        inputs.mean_internal_temp_monthly_c, worksheet_mit_92
    )):
        assert abs(cascade - ws) < 1e-3, (
            f"month {m + 1}: cascade={cascade:.4f} vs worksheet={ws:.4f}"
        )
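# Illustrative aside, not part of the test module: a minimal sketch of
# the worksheet (92) blend referred to above, MIT = fLA × T1 +
# (1 − fLA) × T2, plus its inverse for back-solving the elsewhere-zone
# temperature from a lodged MIT. Function names are hypothetical; the
# January inputs (19.8400, Th1 = 21.0, fLA = 0.30) come from the test.

```python
def blended_mit(
    t_living_c: float, t_elsewhere_c: float, living_area_fraction: float,
) -> float:
    """Whole-dwelling mean internal temperature from the two zone means."""
    return (
        living_area_fraction * t_living_c
        + (1.0 - living_area_fraction) * t_elsewhere_c
    )


def implied_t_elsewhere(
    mit_c: float, t_living_c: float, living_area_fraction: float,
) -> float:
    """Invert the blend to recover the elsewhere-zone mean temperature."""
    return (mit_c - living_area_fraction * t_living_c) / (
        1.0 - living_area_fraction
    )


jan_t_elsewhere = implied_t_elsewhere(19.8400, 21.0, 0.30)
jan_round_trip = blended_mit(21.0, jan_t_elsewhere, 0.30)
```

# Back-solving January this way gives an elsewhere-zone temperature of
# about 19.34°C, a quick sanity check when a monthly MIT pin fails.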


def test_api_0380_mean_internal_temp_matches_worksheet_92_within_1e_3() -> None:
    # Arrange: SAP 10.2 Appendix N3.5 (PDF p.107) replaces Table 9c
    # steps 3-4 for heat-pump packages with PCDB data: each month
    # blends Th, T_unimodal, T_bimodal via Equation N5.
    #
    # Cert 0380 (Mitsubishi PUZ-WM50VHA, PCDB 104568, PSR ≈ 1.43)
    # lands on Table N5 row "1.2 or more" → annual totals (3, 38) →
    # Jan(3, 28) + Dec(0, 10) extended days.
    #
    # Pre-slice-102f-prep.6 the cold-month MIT drifted +0.008°C due to
    # `internal_gains_from_cert` injecting the central-heating pump's
    # heating-season gain (~7 W) on HP certs. SAP 10.2 Table 4f
    # specifies zero pump/fan gains on HP packages (cert 0380's
    # worksheet line 70 = 0.0 every month); that gating drops the
    # spurious gain and tightens the MIT cascade against worksheet
    # (92) to 1e-3 per month.
    doc = json.loads(_API_0380_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    inputs = cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)

    # Assert: pin against worksheet line (92) "MIT" 12-tuple.
    worksheet_mit_92 = (
        18.9539, 18.0081, 18.3466, 18.8491, 19.3582, 19.8174,
        20.0288, 20.0064, 19.6975, 19.0702, 18.3966, 18.1573,
    )
    for m, (cascade, ws) in enumerate(zip(
        inputs.mean_internal_temp_monthly_c, worksheet_mit_92
    )):
        assert abs(cascade - ws) < 1e-3, (
            f"month {m + 1}: cascade={cascade:.4f} vs worksheet={ws:.4f}"
        )


def test_api_9501_room_in_roof_surfaces_populated() -> None:
    # Arrange: cert 9501's API JSON lodges measured RR detail under
    # `sap_room_in_roof.room_in_roof_details`: two gable walls
    # (5.51 m × 2.45 m + 6.51 m × 2.45 m) and a flat ceiling (5.5 m ×
    # 1.0 m, 300 mm insulation). The schema's `SapRoomInRoof` dataclass
    # exposed the inner block under the wrong field name
    # `room_in_roof_type_1` (the legacy Simplified Type 1 wrapper),
    # so `from_dict` parsed the inner block as None; the API mapper
    # then built `SapRoomInRoof` with no per-surface area data, and
    # the cascade defaulted to the Simplified Type 2 "all elements"
    # branch (RR floor_area × Table 18 col(4) age-B U=2.30) for the
    # whole RR → roof HLC 149.43 vs worksheet 18.10 (Δ +131).
    doc = json.loads(_API_9501_JSON.read_text())

    # Act
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Assert: RR surfaces present and match worksheet element table:
    # Gable Wall 1 = 13.50 m², Gable Wall 2 = 15.95 m², Flat Ceiling 1
    # = 5.50 m² (per worksheet §3 element table).
    rir = epc.sap_building_parts[0].sap_room_in_roof
    assert rir is not None
    assert rir.detailed_surfaces is not None
    kinds_by_area = sorted((s.kind, s.area_m2) for s in rir.detailed_surfaces)
    assert kinds_by_area == [
        ("flat_ceiling", 5.5),
        ("gable_wall_external", 13.50),
        ("gable_wall_external", 15.95),
    ]


def test_api_0330_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange: cert 0330-2249-8150-2326-4121 (second boiler validation
    # cert: mains-gas Vaillant PCDB idx 10241, mid-terrace 2-bp dwelling,
    # TFA 90.56 m²) has both an Elmhurst Summary PDF and a GOV.UK EPB API
    # JSON. The Summary path lands at 1e-4 vs worksheet SAP 61.5993
    # above; this Layer 4 production gate asserts the API path matches
    # the worksheet to the same 1e-4 tolerance (same forcing function
    # as cert 001479's Layer 4 test, applied to the second boiler cert).
    #
    # Slices 96-99 (flat-roof Table 18 col (3) U-values + glazing_type=2
    # surfacing + shower-outlets list normalisation + window-area
    # rounding alignment) jointly closed the API path from
    # Δ +2.1453 → Δ -0.000011 vs worksheet 61.5993.
    doc = json.loads(_API_0330_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: 1e-4 pin against the worksheet's continuous SAP.
    worksheet_unrounded_sap = 61.5993
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


def test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Arrange: cert 001479 has both an Elmhurst Summary PDF and a GOV.UK
    # EPB API JSON (ref 0535-9020-6509-0821-6222). The Summary cascade
    # already pins at worksheet's 69.0094 ± 1e-4 above; this test is the
    # Layer 4 production-path gate: API JSON → from_api_response →
    # cert_to_inputs → calculate_sap_from_inputs must also hit 69.0094
    # at 1e-4. Identical inputs must produce identical outputs; the
    # calculator is deterministic, so any drift is a mapper coverage gap.
    doc = json.loads(_API_001479_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: 1e-4 pin against the worksheet's continuous SAP. ±0.5 is
    # the API-only fallback (project memory
    # `feedback_api_tolerance_1e_minus_4`); when the worksheet is
    # available, identical-inputs-must-produce-identical-outputs is
    # the bar.
    worksheet_unrounded_sap = 69.0094
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4


# ============================================================================
# Layer 4 chain tests: 7-cert ASHP cohort
# ============================================================================
# These pin the API → from_api_response → cert_to_inputs →
# calculate_sap_from_inputs cascade against each cert's Elmhurst dr87
# worksheet unrounded SAP. Tolerance is 0.07 (NOT 1e-4 like the boiler
# cohort above); see HANDOVER_CERT_0380_MIT_CASCADE.md for the
# investigation: BRE web confirmed max_output_kw matches cascade
# exactly (4.39 / 3.933), cascade (39) annual HLC matches worksheet
# at 4 dp, but back-solving worksheet η_space implies ~0.15% drift
# in Elmhurst's internal interpolation precision (likely a vendor
# rounding convention not in the public SAP 10.2 spec). The 7 certs
# cluster within +0.030..+0.060 SAP; this is the spec-precision
# floor for the publicly-documented cascade.
#
# At rounded (integer SAP) precision, all 7 cascade integers match
# the lodged values exactly (residual = 0, pinned in
# `_GOLDEN_EXPECTATIONS`).

_API_0350_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "0350-2968-2650-2796-5255.json"
)
_API_3800_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "3800-8515-0922-3398-3563.json"
)
_API_9285_JSON = (
    Path(__file__).parents[3]
    / "domain/sap10_calculator/rdsap/tests/fixtures/golden"
    / "9285-3062-0205-7766-7200.json"
)

_ASHP_COHORT_CHAIN_TOLERANCE: float = 0.07
"""SAP-precision floor for the 7-cert ASHP cohort; see handover."""


def test_api_0380_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568, semi-detached bungalow age D.
    doc = json.loads(_API_0380_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 88.5104) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_0350_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568.
    doc = json.loads(_API_0350_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 84.1367) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_2225_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568, with PV. Slice 102f-prep.8
    # closed the shower_outlets=None default.
    doc = json.loads(_API_2225_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 88.7921) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_2636_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568, with cantilever + alt wall.
    # Slice 102f-prep.9 (cantilever) + 102f-prep.10 (alt-wall openings).
    doc = json.loads(_API_2636_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 86.2641) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_3800_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568.
    doc = json.loads(_API_3800_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 86.1458) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_9285_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Mitsubishi PUZ-WM50VHA PCDB 104568.
    doc = json.loads(_API_9285_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 84.1369) < _ASHP_COHORT_CHAIN_TOLERANCE


def test_api_9418_full_chain_sap_within_spec_floor_of_worksheet() -> None:
    # Daikin Altherma EDLQ05CAV3 PCDB 102421, heating_duration_code='24'
    # (continuous, all days at Th). Slice 102f-prep.7 closed Table N4.
    doc = json.loads(_API_9418_JSON.read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )
    assert abs(result.sap_score_continuous - 84.6305) < _ASHP_COHORT_CHAIN_TOLERANCE


# ============================================================================
# Mapper-vs-hand-built EpcPropertyData diff tests
# ============================================================================
# The 6 cohort hand-builts (_elmhurst_worksheet_NNNNNN.build_epc) are the
# 100%-correct calculator-input ground truth; each cascades to its
# worksheet PDF's lodged SAP at 1e-4. The chain tests above only assert
# cascade-output equivalence; the mapper can pass them by producing a
# *different* EpcPropertyData that happens to cascade to the same number.
#
# These tests pin the missing layer: the mapper's EpcPropertyData must
# match the hand-built's load-bearing fields exactly. Every divergence
# surfaced here is a mapper coverage gap to close as its own slice.
#
# "Load-bearing" = the subset of EpcPropertyData fields that drive the
# SAP cascade or carry semantic cross-mapper meaning. Cert-metadata
# fields (address, registration dates, descriptive EnergyElement lists,
# tariff strings) are excluded because they don't change calculator
# output and vary by mapper pathway (the API publishes some, the
# Elmhurst Summary publishes others) without semantic disagreement.

# SapWindow sub-fields the cascade doesn't read (descriptive Union[int,
# str] codes lodged differently by each mapper). The cascade reads
# window_width / window_height / orientation / window_location /
# frame_factor /
# window_transmission_details.{u_value,solar_transmittance}. Those WILL
# still be diffed; everything else on SapWindow is metadata and excluded
# to avoid noise from the int/str dual encoding (API mapper produces int
# codes; Elmhurst mapper surfaces the Summary's lodged strings).
_NON_LOAD_BEARING_WINDOW_SUBFIELDS: frozenset[str] = frozenset({
    "frame_material",
    "glazing_gap",
    "window_type",
    "glazing_type",
    "window_wall_type",
    "draught_proofed",
    "permanent_shutters_present",
    "permanent_shutters_insulated",
})


def _is_excluded_path(path: str) -> bool:
    """Return True for paths the diff should silently skip:
    non-cascade-affecting Union[int, str] encoding differences between
    the API and Elmhurst mapper outputs that cohort hand-built fixtures
    don't pin."""
    if path.startswith("sap_windows[") and "]." in path:
        suffix = path.split("].", 1)[1]
        if suffix in _NON_LOAD_BEARING_WINDOW_SUBFIELDS:
            return True
        if suffix == "window_transmission_details.data_source":
            return True
    # `roof_construction_type` is set by the Elmhurst mapper from
    # `roof.roof_type` (e.g. "Pitched (slates/tiles), access to loft") and
    # left None by the cohort hand-builts. The cascade in
    # `heat_transmission.py:562` only dispatches on the "sloping ceiling"
    # substring (RdSAP §3.8); none of the cohort certs lodge
    # pitched-sloping-ceiling roofs, so both values produce identical
    # cascade output. Exclude from the diff to avoid flagging
    # informational drift.
    if path.startswith("sap_building_parts[") and path.endswith(".roof_construction_type"):
        return True
    # `sap_ventilation.has_suspended_timber_floor` and
    # `sap_ventilation.suspended_timber_floor_sealed` are set explicitly
    # on the hand-builts (to mirror the cohort U985 worksheets' (12)
    # infiltration values) but left None by the Elmhurst mapper because
    # the Summary PDF doesn't surface floor-construction in a parseable
    # form. When None,
    # `cert_to_inputs._has_suspended_timber_floor_per_spec` infers the
    # value mechanically from per-bp floor-construction data, producing
    # the same cascade output the explicit-bool hand-built path produces
    # for cohort 000477 / 000516 (where the spec inference and the
    # worksheet agree). Where the spec inference and worksheet disagree
    # (cohort 000474, 000480, 000487, 000490), the chain SAP-pin tests
    # fail separately; that's a known Elmhurst-worksheet-vs-RdSAP-10 §5
    # (12) divergence, not a mapper diff issue.
    if path == "sap_ventilation.has_suspended_timber_floor":
        return True
    if path == "sap_ventilation.suspended_timber_floor_sealed":
        return True
    return False


_LOAD_BEARING_FIELDS: tuple[str, ...] = (
    # Cascade-driving structural fields
    "sap_building_parts",
    "sap_windows",
    "sap_roof_windows",
    "sap_heating",
    "sap_ventilation",
    "sap_energy_source",
    "total_floor_area_m2",
    # Building-classification fields driving default cascades
    "dwelling_type",
    "built_form",
    "property_type",
    "country_code",
    "postcode",
    # Counts and openings
    "door_count",
    "insulated_door_count",
    "insulated_door_u_value",
    "habitable_rooms_count",
    "heated_rooms_count",
    "wet_rooms_count",
    "extensions_count",
    "open_chimneys_count",
    "blocked_chimneys_count",
    "extract_fans_count",
    # Lighting
    "cfl_fixed_lighting_bulbs_count",
    "led_fixed_lighting_bulbs_count",
    "incandescent_fixed_lighting_bulbs_count",
    "low_energy_fixed_lighting_bulbs_count",
    "fixed_lighting_outlets_count",
    "low_energy_fixed_lighting_outlets_count",
    # HW / appliances
    "solar_water_heating",
    "has_hot_water_cylinder",
    "has_fixed_air_conditioning",
    "has_conservatory",
    "has_heated_separate_conservatory",
    # Envelope drivers
    "percent_draughtproofed",
    "mechanical_ventilation",
    "pressure_test",
    # Construction-detail flags
    "addendum",
    "lzc_energy_sources",
    "any_unheated_rooms",
    "number_of_storeys",
    "sap_flat_details",
)


def _diff_load_bearing(
    mapped: object, hand_built: object, path: str = "",
) -> list[str]:
    """Recursive field diff; returns one line per leaf divergence between
    mapped EpcPropertyData and the hand-built fixture. Int/float type
    differences with the same numeric value are not flagged.

    Strict-pyright posture: arguments typed `object` so each branch
    narrows via `isinstance` rather than threading `Any` through the
    recursion (which pyright can't reason about under
    `strict`/`typeCheckingMode = strict`)."""
    out: list[str] = []
    if type(mapped) is not type(hand_built):
        # Mixed int/float pairs fall through to the value comparison below.
        if not (isinstance(mapped, (int, float)) and isinstance(hand_built, (int, float))):
            if not _is_excluded_path(path):
                out.append(
                    f"{path}: TYPE {type(mapped).__name__} vs "
                    f"{type(hand_built).__name__} mapped={mapped!r} "
                    f"handbuilt={hand_built!r}"
                )
            return out
    if dataclasses.is_dataclass(mapped) and not isinstance(mapped, type) \
            and dataclasses.is_dataclass(hand_built) and not isinstance(hand_built, type):
        for fld in dataclasses.fields(mapped):
            out.extend(_diff_load_bearing(
                getattr(mapped, fld.name),
                getattr(hand_built, fld.name),
                f"{path}.{fld.name}" if path else fld.name,
            ))
        return out
    if isinstance(mapped, list) and isinstance(hand_built, list):
        mapped_list = cast("list[object]", mapped)
        hand_built_list = cast("list[object]", hand_built)
        if len(mapped_list) != len(hand_built_list):
            out.append(f"{path}: LEN {len(mapped_list)} vs {len(hand_built_list)}")
            return out
        for i, (m_item, h_item) in enumerate(zip(mapped_list, hand_built_list)):
            out.extend(_diff_load_bearing(m_item, h_item, f"{path}[{i}]"))
        return out
    if mapped != hand_built:
        if not _is_excluded_path(path):
            out.append(f"{path}: mapped={mapped!r} handbuilt={hand_built!r}")
    return out
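# Illustrative aside, not part of the test module: a minimal,
# self-contained sketch of the recursive diff idea above on two toy
# dataclasses. `Window`, `Dwelling`, `SKIPPED_SUFFIXES` and
# `diff_fields` are hypothetical stand-ins for EpcPropertyData,
# `_is_excluded_path` and `_diff_load_bearing`; the type-mismatch branch
# is omitted for brevity.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class Window:
    width_m: float
    glazing_type: object  # int code from one mapper, lodged string from the other


@dataclass
class Dwelling:
    total_floor_area_m2: float
    windows: list[Window] = field(default_factory=list)


SKIPPED_SUFFIXES = (".glazing_type",)  # hypothetical exclusion list


def diff_fields(a: object, b: object, path: str = "") -> list[str]:
    """Return one line per leaf that differs and is not excluded."""
    if (
        dataclasses.is_dataclass(a)
        and not isinstance(a, type)
        and type(a) is type(b)
    ):
        out: list[str] = []
        for f in dataclasses.fields(a):
            child = f"{path}.{f.name}" if path else f.name
            out.extend(diff_fields(getattr(a, f.name), getattr(b, f.name), child))
        return out
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [f"{path}: LEN {len(a)} vs {len(b)}"]
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_fields(x, y, f"{path}[{i}]"))
        return out
    if a != b and not path.endswith(SKIPPED_SUFFIXES):
        return [f"{path}: mapped={a!r} handbuilt={b!r}"]
    return []


mapped = Dwelling(81, [Window(1.14, "Triple post or during 2022")])
hand_built = Dwelling(81.0, [Window(1.20, 6)])
divergences = diff_fields(mapped, hand_built)
```

# Here the int/float floor areas compare equal, the glazing encoding
# difference is skipped by path suffix, and only the window width is
# reported.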


def test_from_elmhurst_site_notes_matches_hand_built_000474() -> None:
    # Arrange: _elmhurst_worksheet_000474.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000474; it cascades
    # to the worksheet PDF's `SAP value 62.2584` at 1e-4 (cohort
    # SAP-result pin). Routing the corresponding Summary PDF through the
    # Elmhurst mapper MUST produce a load-bearing-field-equivalent
    # EpcPropertyData; any divergence is a mapper-coverage gap.
    #
    # Tracer-bullet scope: cert 000474 only. Once GREEN, parametrize
    # over the 5 other cohort fixtures and add cert 001479 (after
    # `_elmhurst_worksheet_001479` lands at 1e-4 via Slice 62 iteration).
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000474.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000474:\n " +
        "\n ".join(diffs)
    )


def test_from_elmhurst_site_notes_matches_hand_built_000477() -> None:
    # Arrange: _elmhurst_worksheet_000477.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000477 (single-bp
    # mid-terrace, age band B, RIR with stud walls + party gables, no
    # extension); it cascades to the worksheet PDF's `SAP value 65.0057`
    # at 1e-4. Routing the Summary PDF through the Elmhurst mapper MUST
    # produce a load-bearing-field-equivalent EpcPropertyData; any
    # divergence is a mapper-coverage gap to close as its own slice.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000477_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000477.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000477:\n " +
        "\n ".join(diffs)
    )


def test_from_elmhurst_site_notes_matches_hand_built_000480() -> None:
    # Arrange: _elmhurst_worksheet_000480.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000480 (mid-terrace
    # with main + 1 extension + 19.83 m² RIR, gas combi); it cascades
    # to the worksheet PDF's `SAP value 61.2986` at 1e-4. Routing the
    # Summary PDF through the Elmhurst mapper MUST produce a
    # load-bearing-field-equivalent EpcPropertyData; any divergence is a
    # mapper-coverage gap to close as its own slice.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000480_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000480.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000480:\n " +
        "\n ".join(diffs)
    )


def test_from_elmhurst_site_notes_matches_hand_built_000487() -> None:
    # Arrange: _elmhurst_worksheet_000487.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000487 (Enclosed
    # Mid-Terrace, main + 1 extension + 21.03 m² RIR with explicit-U
    # gable_wall_external, gas combi, 1 electric shower, 1.43 m²
    # timber-frame alt wall on the extension); it cascades to the
    # worksheet PDF's `SAP value 61.6431` at 1e-4. Routing the Summary
    # PDF through the Elmhurst mapper MUST produce a
    # load-bearing-field-equivalent EpcPropertyData; any divergence is a
    # mapper-coverage gap to close as its own slice.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000487_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000487.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000487:\n " +
        "\n ".join(diffs)
    )
|
||
|
||
|
||
def test_from_elmhurst_site_notes_matches_hand_built_000490() -> None:
    # Arrange: _elmhurst_worksheet_000490.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000490 (End-Terrace,
    # main + 1 extension, gas combi + gas-secondary; sheltered_sides=1
    # per RdSAP §S5); it cascades to the worksheet PDF's `SAP value
    # 57.3979` at 1e-4. Routing the Summary PDF through the Elmhurst
    # mapper MUST produce a load-bearing-field-equivalent
    # EpcPropertyData; any divergence is a mapper-coverage gap.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000490_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000490.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000490:\n " +
        "\n ".join(diffs)
    )


def test_from_elmhurst_site_notes_matches_hand_built_000516() -> None:
    # Arrange: _elmhurst_worksheet_000516.build_epc() is the canonical
    # hand-built EpcPropertyData for cert U985-0001-000516 (Mid-Terrace,
    # main + 19.02 m² RIR, 5 vertical windows + 1 roof window which the
    # mapper routes to `sap_roof_windows` per `U > 3.0` discrimination);
    # it cascades to the worksheet PDF's `SAP value 62.7937` at 1e-4.
    # Routing the Summary PDF through the Elmhurst mapper MUST produce
    # a load-bearing-field-equivalent EpcPropertyData.
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000516_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    hand_built = _w000516.build_epc()

    # Act
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(mapped, field_name, None),
            getattr(hand_built, field_name, None),
            field_name,
        ))

    # Assert
    assert not diffs, (
        f"{len(diffs)} load-bearing divergence(s) between mapped and "
        f"hand-built EpcPropertyData for cohort cert 000516:\n " +
        "\n ".join(diffs)
    )
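
The tests above all lean on a `_diff_load_bearing` helper whose body is not shown in this section. As a rough sketch only, a field-wise comparison of that kind might look like the following. Everything here is an assumption made for illustration: the name `diff_values`, the `1e-4` absolute tolerance (borrowed from the tolerance quoted in the test comments), the recursion rules, and the toy `Window` dataclass. It is not the repository's implementation.

```python
from dataclasses import dataclass, fields, is_dataclass
import math


def diff_values(actual: object, expected: object, path: str,
                tol: float = 1e-4) -> list[str]:
    """Hypothetical stand-in for a load-bearing-field diff helper.

    Returns one human-readable string per divergence; an empty list
    means the two values are equivalent under these (assumed) rules.
    """
    # Numeric leaves: compare within an absolute tolerance.
    # bool is excluded because it is a subclass of int.
    numeric = (int, float)
    if (isinstance(actual, numeric) and isinstance(expected, numeric)
            and not isinstance(actual, bool)
            and not isinstance(expected, bool)):
        if math.isclose(actual, expected, rel_tol=0.0, abs_tol=tol):
            return []
        return [f"{path}: {actual!r} != {expected!r}"]

    # Dataclass instances of the same type: recurse field by field.
    if (is_dataclass(actual) and not isinstance(actual, type)
            and type(actual) is type(expected)):
        out: list[str] = []
        for f in fields(actual):
            out.extend(diff_values(getattr(actual, f.name),
                                   getattr(expected, f.name),
                                   f"{path}.{f.name}", tol))
        return out

    # Sequences: report a length mismatch, else recurse element-wise.
    if isinstance(actual, (list, tuple)) and isinstance(expected, (list, tuple)):
        if len(actual) != len(expected):
            return [f"{path}: length {len(actual)} != {len(expected)}"]
        out = []
        for i, (a, e) in enumerate(zip(actual, expected)):
            out.extend(diff_values(a, e, f"{path}[{i}]", tol))
        return out

    # Everything else (str, None, mismatched types): plain equality.
    return [] if actual == expected else [f"{path}: {actual!r} != {expected!r}"]


@dataclass
class Window:
    """Toy record for the demo; not the project's SapWindow."""
    area: float
    glazing_type: int


mapped_windows = [Window(area=1.2, glazing_type=6)]
hand_built_windows = [Window(area=1.2, glazing_type=3)]
demo_diffs = diff_values(mapped_windows, hand_built_windows, "sap_windows")
print("\n".join(demo_diffs))
```

Returning a list of path-qualified strings, rather than raising on the first mismatch, is what lets the assertion message in the tests list every divergence for a cert at once.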