test(worksheet): pin simulated case 52 — regular gas boiler + cylinder

Adds the mapper-driven e2e cascade pin for "simulated case 52" (000565
semi + regular non-combi mains-gas boiler SAP 102 + 160 L foam cylinder
heated from the main, no cylinder stat, uninsulated primary pipework,
standard tariff). Routes the Summary PDF through extractor + mapper +
calculator like the other 000565 / 001431_case* fixtures.

This closes the last untested branch of the cylinder/water chain: the
SAP 10.2 §4 cylinder storage loss (Table 2/2a/2b lines 51-55) + the
Table 3 PRIMARY circuit loss (59, uninsulated pipework + no stat) that
combi/immersion fixtures don't reach. All 11 SAP-result fields reconcile
to the U985 worksheet EXACTLY with no calculator change — SAP 57.2904
(=57), cost £911.1973, water 3929.7635 kWh — confirming the cylinder-loss
derivation is correct.
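
The storage-loss line can be sanity-checked by hand. This is a minimal sketch, not the calculator's code. It assumes worksheet line (55) is the cylinder volume multiplied by factors (51), (52) and (53), and that the Table 2a volume factor is (120 / V)^(1/3). The factor values are the rounded figures quoted in the case-52 fixture docstring, so the product agrees with the printed 1.8466 only to about three decimal places.

```python
# Values quoted in the case-52 fixture docstring (printed to 4 d.p.).
CYLINDER_VOLUME_L = 160.0
LOSS_FACTOR = 0.0181             # (51), Table 2
VOLUME_FACTOR = 0.9086           # (52), Table 2a
TEMPERATURE_FACTOR = 0.7020      # (53), Table 2b, no cylinder thermostat
WORKSHEET_STORAGE_LOSS = 1.8466  # (55)

# Assumption: the Table 2a volume factor is (120 / V) ** (1/3).
derived_volume_factor = (120.0 / CYLINDER_VOLUME_L) ** (1.0 / 3.0)

# Assumption: (55) = volume x (51) x (52) x (53).
storage_loss = (
    CYLINDER_VOLUME_L * LOSS_FACTOR * VOLUME_FACTOR * TEMPERATURE_FACTOR
)

print(f"volume factor: {derived_volume_factor:.4f}")
print(f"storage loss:  {storage_loss:.4f}")
```

The rounded inputs give roughly 1.847 rather than 1.8466, presumably because the worksheet carries more digits than it prints. A hand check like this can confirm the shape of the chain but not the abs=1e-4 pins.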

Summary mirrored to the tracked fixtures dir so the test doesn't depend
on the unstaged `sap worksheets/` workspace.

pyright strict gate not run locally (pyright not installed in this container).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-24 08:50:19 +00:00
parent cd5113abf2
commit f3e3494bf7
3 changed files with 129 additions and 0 deletions

Binary file not shown.

@@ -0,0 +1,110 @@
"""Mapper-driven cascade pin against the Elmhurst U985-0001-000565
"simulated case 52" worksheet: a REGULAR (non-combi) mains-gas boiler
feeding a hot-water CYLINDER, on standard tariff.

Like 000565 / the _rr cases, this fixture does NOT hand-build the
EpcPropertyData: it routes the Summary PDF through
ElmhurstSiteNotesExtractor + from_elmhurst_site_notes so the SAP-result
pin grid exercises the WHOLE extractor + mapper + calculator pipeline.

This case was hand-built (Khalim) to ground-truth the LAST untested
branch of the cylinder/water-heating chain that the combi/immersion
fixtures don't reach: a regular boiler + cylinder heated from the main,
exercising SAP 10.2 §4:

- Cylinder storage loss via Table 2 loss factor (51) 0.0181, Table 2a
  volume factor (52) 0.9086, Table 2b temperature factor (53) 0.7020
  (NO cylinder thermostat), giving (55) 1.8466.
- PRIMARY circuit loss (59) 128.3772 (winter): the Table 3 path for
  UNINSULATED primary pipework + no cylinder stat. Case 50 (immersion,
  no boiler) couldn't reach this branch.
- Combi loss (61) correctly 0 (regular boiler, not a combi).

The whole chain reconciles to the U985 worksheet EXACTLY with no
calculator change, so it pins the cylinder-loss derivation as correct.

Cert shape: 000565 semi shell, single main = mains-gas REGULAR boiler
(SAP code 102, control 2106: programmer + room stat + TRVs), water
heating from main (WHC 901) via a 160 L foam-insulated cylinder (no
cylinder stat, uninsulated primary pipework, in heated space), one
instantaneous electric shower, no secondary, no PV, standard tariff.

Source: user-simulated PDFs at `sap worksheets/golden fixture debugging/
simulated case 52/`. The Summary is mirrored into the tracked
`backend/documents_parser/tests/fixtures/Summary_000565_case52.pdf` so the
test runs without depending on the unstaged workspace.

Worksheet pin targets (U985-0001-000565 block 1, existing dwelling SAP):

- SAP rating 57 (258); continuous 57.2904; ECF 3.0616 (257)
- Total fuel cost £911.1973 (255)
- Total CO2 3834.8434 kg/year (272)
- Space heating 10563.5170 kWh/year ((98c))
- Main 1 fuel 13371.5405 kWh/year (211)
- Secondary fuel 0.0 kWh/year (215)
- Hot water fuel 3929.7635 kWh/year (219)
- Lighting 435.3204 kWh/year (232)
- Pumps/fans 401.6384 kWh/year (231)

Per [[feedback-zero-error-strict]] + [[feedback-e2e-validation-
philosophy]]: pins are abs=1e-4 against the worksheet PDF. The pin
values live in `test_e2e_elmhurst_sap_score._FIXTURE_PINS`.
"""
from __future__ import annotations

import re
import subprocess
from pathlib import Path
from typing import Final

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.epc_property_data import EpcPropertyData
from datatypes.epc.domain.mapper import EpcPropertyDataMapper

# parents[0]=worksheet/, [1]=sap10_calculator/, [2]=domain/, [3]=tests/,
# [4]=repo root.
_SUMMARY_PDF: Final[Path] = (
    Path(__file__).resolve().parents[4]
    / "backend" / "documents_parser" / "tests" / "fixtures"
    / "Summary_000565_case52.pdf"
)


def _summary_pdf_to_textract_style_pages(pdf_path: Path) -> list[str]:
    """Convert a Summary PDF into the per-page text format the
    ElmhurstSiteNotesExtractor expects (label\\nvalue sequences). Mirror
    of the helper in the other `_elmhurst_worksheet_*` fixtures.
    """
    info = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Pages:\s+(\d+)", info)
    if m is None:
        raise RuntimeError(f"Could not parse page count from {pdf_path}")
    page_count = int(m.group(1))
    pages: list[str] = []
    for i in range(1, page_count + 1):
        layout = subprocess.run(
            [
                "pdftotext", "-layout", "-f", str(i), "-l", str(i),
                str(pdf_path), "-",
            ],
            capture_output=True, text=True, check=True,
        ).stdout
        tokens: list[str] = []
        for line in layout.splitlines():
            if not line.strip():
                tokens.append("")
                continue
            parts = [p for p in re.split(r"\s{2,}", line.strip()) if p]
            tokens.extend(parts)
        pages.append("\n".join(tokens))
    return pages


def build_epc() -> EpcPropertyData:
    """Route the simulated case-52 Summary through extractor + mapper.

    No hand-built EpcPropertyData: the extractor and mapper are part of
    the test target.
    """
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    return EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
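
The tokenising step in `_summary_pdf_to_textract_style_pages` can be tried without `pdfinfo` or `pdftotext` installed. The sketch below applies the same split-on-two-or-more-spaces rule to an in-memory string. The function name `layout_text_to_tokens` and the sample text are hypothetical stand-ins for real `pdftotext -layout` output, not content from the case-52 Summary.

```python
import re


def layout_text_to_tokens(layout: str) -> str:
    """Turn column-aligned layout text into newline-separated tokens."""
    tokens: list[str] = []
    for line in layout.splitlines():
        if not line.strip():
            # Keep blank lines as empty tokens so block boundaries survive.
            tokens.append("")
            continue
        # Runs of two or more spaces separate a label from its value.
        tokens.extend(p for p in re.split(r"\s{2,}", line.strip()) if p)
    return "\n".join(tokens)


# Hypothetical sample, made up for illustration only.
sample = (
    "Property type      Semi-detached house\n"
    "\n"
    "Cylinder volume    160 litres\n"
)
page_text = layout_text_to_tokens(sample)
print(page_text)
```

Single spaces inside a label or value are preserved, so "Property type" stays one token while the wide gap before the value becomes a line break.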

@@ -48,6 +48,7 @@ from tests.domain.sap10_calculator.worksheet import (
    _elmhurst_worksheet_001431_case6 as _w001431_case6,
    _elmhurst_worksheet_001431_case7 as _w001431_case7,
    _elmhurst_worksheet_001431_case20 as _w001431_case20,
    _elmhurst_worksheet_000565_case52 as _w000565_case52,
)
from tests.domain.sap10_calculator.worksheet._elmhurst_fixtures import (
    ALL_FIXTURES as _ELMHURST_FIXTURES,
@@ -296,6 +297,23 @@ _FIXTURE_PINS: Final[dict[str, FixtureCascadePins]] = {
        lighting_kwh_per_yr=246.3083,
        pumps_fans_kwh_per_yr=0.0,
    ),
    # Mapper-driven: Summary_000565_case52.pdf → extractor → mapper →
    # calculator. Regular (non-combi) mains-gas boiler (SAP 102) + a
    # 160 L foam cylinder heated from the main (WHC 901), no cylinder
    # stat + uninsulated primary pipework, standard tariff. Validates the
    # cylinder storage loss (51-55) + PRIMARY loss (59) chain, the
    # branch immersion/combi fixtures can't reach. Reconciles to the
    # worksheet EXACTLY with no calculator change.
    "000565_case52": FixtureCascadePins(
        sap_score=57, sap_score_continuous=57.2904, ecf=3.0616,
        total_fuel_cost_gbp=911.1973, co2_kg_per_yr=3834.8434,
        space_heating_kwh_per_yr=10563.5170,
        main_heating_fuel_kwh_per_yr=13371.5405,
        secondary_heating_fuel_kwh_per_yr=0.0,
        hot_water_kwh_per_yr=3929.7635,
        lighting_kwh_per_yr=435.3204,
        pumps_fans_kwh_per_yr=401.6384,
    ),
}
@@ -315,6 +333,7 @@ _FIXTURE_MODULES: Final[dict[str, ModuleType]] = {
    "001431_case6": _w001431_case6,
    "001431_case7": _w001431_case7,
    "001431_case20": _w001431_case20,
    "000565_case52": _w000565_case52,
}
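
The three headline pins for case 52 can be cross-checked against each other. This is a rough sketch under one stated assumption: that the SAP 10.2 rating relation is `100 - 13.95 * ECF` for ECF below 3.5 and `117 - 121 * log10(ECF)` otherwise. The helper `sap_rating_from_ecf` is hypothetical and is not the calculator's function. Because the pinned ECF is rounded to four decimals, the check uses a looser tolerance than the test's abs=1e-4.

```python
import math

# Values copied from the "000565_case52" pin entry.
SAP_SCORE = 57
SAP_SCORE_CONTINUOUS = 57.2904
ECF = 3.0616


def sap_rating_from_ecf(ecf: float) -> float:
    """Continuous SAP rating from the energy cost factor.

    Assumption: the SAP 10.2 relation described in the lead-in.
    """
    if ecf >= 3.5:
        return 117.0 - 121.0 * math.log10(ecf)
    return 100.0 - 13.95 * ecf


rating = sap_rating_from_ecf(ECF)
print(f"rating from rounded ECF: {rating:.4f}")

# The integer score is the continuous rating rounded to the nearest whole number.
assert round(SAP_SCORE_CONTINUOUS) == SAP_SCORE
# A 4 d.p. ECF only reproduces the continuous rating to about 1e-3.
assert math.isclose(rating, SAP_SCORE_CONTINUOUS, abs_tol=1e-3)
```

A check like this only shows that the pinned values are mutually consistent. The abs=1e-4 comparison against the worksheet still has to come from the full extractor + mapper + calculator run.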