S0380.191: pin simulated 001431 gas-combi end-to-end at 1e-4 (e2e harness)

Adds the user-simulated 001431 case (the cert that drove S0380.189/.190)
as an Elmhurst-only e2e fixture: Summary PDF → extractor → mapper →
calculator, every Block-1 SapResult field pinned against the
P960-0001-001431 worksheet at abs=1e-4. All 11 pins pass with zero
residual — the case is clean, confirming the S0380.190 gas-combi fuel
derivation closes the Summary path natively.
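
A minimal sketch of what "pinned at abs=1e-4" amounts to, assuming a plain dataclass of expected values. `Pins` and `pin_residuals` are hypothetical names used only for illustration; they are not the harness's FixtureCascadePins or its actual assertion helper, and only three of the 11 fields are shown:

```python
import math
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Pins:
    """Hypothetical stand-in for a small subset of the pinned fields."""
    sap_score_continuous: float
    total_fuel_cost_gbp: float
    co2_kg_per_yr: float


def pin_residuals(expected: Pins, actual: Pins, abs_tol: float = 1e-4) -> dict[str, float]:
    """Return {field: actual - expected} for every field outside abs_tol."""
    residuals: dict[str, float] = {}
    for f in fields(expected):
        want = getattr(expected, f.name)
        got = getattr(actual, f.name)
        # rel_tol=0.0 makes this a purely absolute comparison.
        if not math.isclose(got, want, rel_tol=0.0, abs_tol=abs_tol):
            residuals[f.name] = got - want
    return residuals


expected = Pins(77.6147, 660.9750, 3000.1664)
drifted = Pins(77.6144, 660.9750, 3000.1664)  # continuous SAP off by 3e-4

print(pin_residuals(expected, expected))
print(sorted(pin_residuals(expected, drifted)))
```

An empty result corresponds to "zero residual"; any non-empty result names the field that drifted and by how much.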

Verified the handover's flagged "+0.0007 SAP" was a target artifact, not
a cascade gap: the worksheet displays ECF (257) rounded to 1.6047 and
integer SAP (258)=78; the cascade's continuous SAP is computed from the
UNROUNDED ECF = (255)*(256)/((4)+45) = 660.9750*0.4200/173.0, giving
77.6147 — which matches the worksheet's own unrounded value. Pinning the
continuous SAP from the display-rounded ECF (→ 77.6144) was the wrong
target. Block-1 line refs all match exactly: (211) 10699.7225, (219)
3327.1592, (231) 86.0, (232) 283.2229, (255) 660.9750, (272) 3000.1664,
Σ(98) 8987.7669.
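
The rounding effect can be reproduced from the quoted line refs alone. This is a worked check, not the cascade's code: the variable names are illustrative, and the `100 - 13.95 * ECF` form is the one quoted in the fixture module docstring.

```python
# Worksheet line refs quoted in this message.
line_255 = 660.9750   # total fuel cost
line_256 = 0.4200     # multiplier applied to (255)
line_4 = 128.0        # TFA, so (4) + 45 = 173.0

# ECF as the cascade carries it (unrounded) and as the worksheet displays it.
ecf_unrounded = line_255 * line_256 / (line_4 + 45.0)
ecf_displayed = round(ecf_unrounded, 4)

# Continuous SAP reconstructed from each of the two ECF values.
sap_from_unrounded = 100.0 - 13.95 * ecf_unrounded
sap_from_displayed = 100.0 - 13.95 * ecf_displayed

print(f"ECF unrounded={ecf_unrounded:.6f} displayed={ecf_displayed:.4f}")
print(f"SAP from unrounded={sap_from_unrounded:.4f} from displayed={sap_from_displayed:.4f}")
```

The two reconstructions differ by more than the 1e-4 pin tolerance, which is why the choice of target matters.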

Summary mirrored into the tracked fixtures dir as
Summary_001431_gas_combi.pdf (distinct name — the corpus reuses cert
001431 across every heating variant); source Summary + worksheet tracked
under sap worksheets/golden fixture debugging/ as the pin ground truth.

2302 passed (+11), 0 failed; pyright net-zero on new/changed files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-02 22:44:32 +00:00 committed by Jun-te Kim
parent 306dd4c0c9
commit 846952f7cd
5 changed files with 147 additions and 0 deletions


@ -0,0 +1,126 @@
"""Mapper-driven cascade pin against the Elmhurst P960-0001-001431
"simulated case 1" worksheet (gas-combi archetype).

Like 000565, this fixture does NOT hand-build the EpcPropertyData. It
routes the Summary PDF through ElmhurstSiteNotesExtractor +
EpcPropertyDataMapper.from_elmhurst_site_notes so the SAP-result pin
grid exercises the WHOLE extractor + mapper + calculator pipeline.

This is the cert that motivated S0380.190: the newer Elmhurst export
lodges the gas combi as §14.0 "Fuel Type" EMPTY + "Main Heating SAP
Code" 104 (condensing combi, EES "BGW"), with the carrier ("Mains
gas") only in §15.0 "Water Heating Fuel Type". Before S0380.190 the
mapper left `main_fuel_type=''`, so `cert_to_inputs` raised
`MissingMainFuelType`; `_elmhurst_gas_boiler_main_fuel` now derives
mains gas (code 26) from §15.0 per SAP 10.2 Table 4b (rows 101-119 are
gas-family boilers; the §15.0 fuel disambiguates the carrier because
the combi heats space + water from one appliance).

It is also the cert that motivated S0380.189 (thermal mass parameter
per RdSAP 10 §5.16 Table 22): solid brick WITH internal insulation
gives TMP 100, not the previously-hardcoded 250.

Source: user-simulated PDFs at `sap worksheets/golden fixture
debugging/simulated case 1/` (Summary_001431 (1).pdf input +
P960-0001-001431 - 2026-06-02T221203.958.pdf worksheet). The Summary
is mirrored into the tracked
`backend/documents_parser/tests/fixtures/Summary_001431_gas_combi.pdf`
(distinct name, because the corpus reuses cert 001431 across every
heating variant) so the test runs without depending on the unstaged
workspace.

Cert shape (Summary §1-19): gas-combi mid-terrace, TFA 128 m², solid
brick WITH internal insulation (Table 22 TMP 100), no PV, no
secondary heating, no cylinder (combi instantaneous HW, WHC HWP / SAP
code 901). Condensing combi SAP code 104, EES "BGW".

Worksheet pin targets (P960-0001-001431 958.pdf, Block 1 energy
rating, lines 115-410; the second "FOR IMPROVED DWELLING" block is the
potential rating and is NOT pinned):

- SAP rating 78 (line 258)
- Energy cost factor 1.6047 (line 257; cascade carries it unrounded as
  (255)*(256)/((4)+45) = 660.9750*0.4200/173.0; the continuous SAP
  100 - 13.95*ECF is reconstructed from the unrounded ECF, NOT the
  display-rounded 1.6047, so sap_score_continuous = 77.6147)
- Total fuel cost £660.9750 (line 255)
- CO2 3000.1664 kg/year (line 272)
- Space heating 8987.7669 kWh/year (Σ monthly (98))
- Main 1 fuel 10699.7225 kWh/year (line 211), mains gas
- Secondary fuel 0.0 (line 215)
- Hot water fuel 3327.1592 kWh/year (line 219), combi
- Lighting 283.2229 kWh/year (line 232)
- Pumps/fans 86.0 kWh/year (line 231)

Per [[feedback-zero-error-strict]] + [[feedback-e2e-validation-philosophy]]:
pins are abs=1e-4 against the worksheet PDF. Failing pins are named
extractor / mapper / calculator gaps to fix.
"""
from __future__ import annotations

import re
import subprocess
from pathlib import Path
from typing import Final

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.epc_property_data import EpcPropertyData
from datatypes.epc.domain.mapper import EpcPropertyDataMapper

# parents[0]=worksheet/, [1]=sap10_calculator/, [2]=domain/, [3]=tests/,
# [4]=repo root.
_SUMMARY_PDF: Final[Path] = (
    Path(__file__).resolve().parents[4]
    / "backend" / "documents_parser" / "tests" / "fixtures"
    / "Summary_001431_gas_combi.pdf"
)


def _summary_pdf_to_textract_style_pages(pdf_path: Path) -> list[str]:
    """Convert a Summary PDF into the per-page text format the
    ElmhurstSiteNotesExtractor expects (label\\nvalue sequences).

    Mirror of the helper in `backend/documents_parser/tests/
    test_summary_pdf_mapper_chain.py::_summary_pdf_to_textract_style_
    pages` (and `_elmhurst_worksheet_000565.py`). `pdftotext -layout`
    preserves the spatial label/value pairing on each line; we split on
    2+ spaces to surface the tokens, then rejoin newline-delimited.
    """
    info = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Pages:\s+(\d+)", info)
    if m is None:
        raise RuntimeError(f"Could not parse page count from {pdf_path}")
    page_count = int(m.group(1))
    pages: list[str] = []
    for i in range(1, page_count + 1):
        layout = subprocess.run(
            [
                "pdftotext", "-layout", "-f", str(i), "-l", str(i),
                str(pdf_path), "-",
            ],
            capture_output=True, text=True, check=True,
        ).stdout
        tokens: list[str] = []
        for line in layout.splitlines():
            if not line.strip():
                tokens.append("")
                continue
            parts = [p for p in re.split(r"\s{2,}", line.strip()) if p]
            tokens.extend(parts)
        pages.append("\n".join(tokens))
    return pages


def build_epc() -> EpcPropertyData:
    """Route the simulated 001431 Summary through extractor + mapper.

    No hand-built EpcPropertyData; the extractor and mapper are part of
    the test target. Exercises the S0380.190 gas-combi fuel derivation
    (§14.0 Fuel Type empty + SAP code 104 → mains gas via §15.0).
    """
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    return EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
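
The token-splitting step in `_summary_pdf_to_textract_style_pages` can be tried without poppler installed. The sketch below applies the same split to an in-memory string; `layout_text_to_tokens` is a hypothetical name, and the sample layout is invented for illustration (only the labels and values are taken from the docstring above), so it does not reproduce real `pdftotext -layout` output.

```python
import re


def layout_text_to_tokens(layout: str) -> str:
    """Split each layout line on runs of 2+ whitespace characters and
    rejoin all tokens newline-delimited; blank lines become empty tokens."""
    tokens: list[str] = []
    for line in layout.splitlines():
        if not line.strip():
            tokens.append("")
            continue
        tokens.extend(p for p in re.split(r"\s{2,}", line.strip()) if p)
    return "\n".join(tokens)


# Invented sample: an empty "Fuel Type" value, then two label/value pairs.
sample = (
    "Fuel Type            \n"
    "Main Heating SAP Code      104\n"
    "\n"
    "Water Heating Fuel Type    Mains gas\n"
)

print(layout_text_to_tokens(sample))
```

Single spaces inside a label such as "Main Heating SAP Code" do not split, which is what keeps each label and value intact as one token.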


@ -37,6 +37,7 @@ from tests.domain.sap10_calculator.worksheet import (
    _elmhurst_worksheet_000490 as _w000490,
    _elmhurst_worksheet_000516 as _w000516,
    _elmhurst_worksheet_000565 as _w000565,
    _elmhurst_worksheet_001431 as _w001431,
)
from tests.domain.sap10_calculator.worksheet._elmhurst_fixtures import (
    ALL_FIXTURES as _ELMHURST_FIXTURES,
@ -147,6 +148,25 @@ _FIXTURE_PINS: Final[dict[str, FixtureCascadePins]] = {
        lighting_kwh_per_yr=1384.8353,
        pumps_fans_kwh_per_yr=252.5159,
    ),
    # Mapper-driven cohort entry: Summary_001431_gas_combi.pdf →
    # extractor → mapper → calculator. Gas-combi mid-terrace, TFA 128,
    # solid brick WITH internal insulation (Table 22 TMP 100), no PV /
    # secondary / cylinder. The cert that motivated S0380.190 (gas-combi
    # fuel from §15.0 when §14.0 Fuel Type is empty + SAP code 104) and
    # S0380.189 (thermal mass parameter). Pins are worksheet Block 1
    # (energy rating) line refs. sap_score_continuous is reconstructed
    # from the UNROUNDED ECF ((255)*(256)/((4)+45)), not the
    # display-rounded (257)=1.6047; see the fixture module docstring.
    "001431": FixtureCascadePins(
        sap_score=78, sap_score_continuous=77.6147, ecf=1.6047,
        total_fuel_cost_gbp=660.9750, co2_kg_per_yr=3000.1664,
        space_heating_kwh_per_yr=8987.7669,
        main_heating_fuel_kwh_per_yr=10699.7225,
        secondary_heating_fuel_kwh_per_yr=0.0,
        hot_water_kwh_per_yr=3327.1592,
        lighting_kwh_per_yr=283.2229,
        pumps_fans_kwh_per_yr=86.0,
    ),
}
@ -158,6 +178,7 @@ _FIXTURE_MODULES: Final[dict[str, ModuleType]] = {
"000490": _w000490,
"000516": _w000516,
"000565": _w000565,
"001431": _w001431,
}