S0380.194: pin sim case 3 (near-exact 6035 replica) e2e at 1e-4

Adds the user-simulated case-3 worksheet as e2e fixture `001431_rr8` —
Main + Extension + Simplified room-in-roof with 8 windows (≈14.15 m²,
reproducing golden cert 6035's glazing) and Main ground-floor HLP 15.99.
All 11 Block-1 line refs pin at abs=1e-4 against the worksheet (SAP 68,
cost 951.3425, CO2 4767.4862, space 16086.3557, main fuel 19150.4235,
HW 3307.2639, lighting 262.0885).

This is the third independent 1e-4 confirmation that the cascade
reproduces the spec engine for the 6035 archetype (after S0380.192
Simplified-RR + S0380.193 suspended-floor). It differs from 6035 in one
input only — the Main first-floor HLP (15.99 here vs 6035's 8.32) — so
6035's +19 PE vs the lodged register is lodged-register divergence, not
a calculator gap. A byte-identical 6035 replica (first-floor HLP 8.32)
would let 6035 itself be pinned directly to close that out.

2330 passed (+11), 0 failed; pyright net-zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-03 09:46:56 +00:00
parent 62fc27a5cc
commit e7a0c9885e
5 changed files with 131 additions and 0 deletions

Binary file not shown.


@ -0,0 +1,112 @@
"""Mapper-driven cascade pin against the Elmhurst P960-0001-001431
"simulated case 3" worksheet, a near-exact replica of golden cert
6035 (Main + Extension + Simplified room-in-roof, 8 windows).

Like 000565 / sim case 1 / sim case 2, this fixture does NOT hand-build
the EpcPropertyData: it routes the Summary PDF through
ElmhurstSiteNotesExtractor + from_elmhurst_site_notes so the SAP-result
pin grid exercises the WHOLE extractor + mapper + calculator pipeline.

Purpose: prove the calculator is spec-correct for the 6035 archetype
(after S0380.192 Simplified-RR + S0380.193 suspended-floor fixes). This
cert reproduces 6035's 8 windows (≈14.15 m²) and Main ground-floor
heat-loss perimeter (15.99 m). It still differs from 6035 in ONE input:
the Main FIRST-floor HLP is 15.99 here vs 6035's 8.32 (6035's upper
storey has less exposed perimeter), so it is not yet byte-identical to
6035. All 11 Block-1 line refs nonetheless pin at abs=1e-4 against this
cert's OWN worksheet, confirming the cascade reproduces the spec engine
exactly for this Main+Ext+RR+suspended-floor+gas-combi shape, so 6035's
residual +19 PE vs the lodged register is lodged-register divergence,
not a cascade gap.

Cert shape: Main + Extension 1, both solid brick, WITH internal
insulation (Main) / as-built (Ext1), 3 storeys, Simplified room-in-roof
on the Main (floor 29.75 m², exposed + party gables), suspended
uninsulated ground floors, gas-combi SAP code 104, 8 windows, no PV.

Source: user-simulated PDFs at `sap worksheets/golden fixture
debugging/simulated case 3/`. The Summary is mirrored into the tracked
`backend/documents_parser/tests/fixtures/Summary_001431_rr8w.pdf`
(distinct name: the corpus reuses cert 001431).

Worksheet pin targets (P960-0001-001431, Block 1 energy rating):
- SAP rating 68 (line 258), ECF 2.3146 (line 257)
- Total fuel cost £951.3425 (line 255)
- CO2 4767.4862 kg/year (line 272)
- Space heating 16086.3557 kWh/year (Σ monthly (98))
- Main 1 fuel 19150.4235 kWh/year (line 211)
- Secondary fuel 0.0 (line 215)
- Hot water fuel 3307.2639 kWh/year (line 219)
- Lighting 262.0885 kWh/year (line 232)
- Pumps/fans 86.0 kWh/year (line 231)

Per [[feedback-zero-error-strict]] + [[feedback-e2e-validation-
philosophy]]: pins are abs=1e-4 against the worksheet PDF.
"""
from __future__ import annotations
import re
import subprocess
from pathlib import Path
from typing import Final
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.epc_property_data import EpcPropertyData
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
# parents[0]=worksheet/, [1]=sap10_calculator/, [2]=domain/, [3]=tests/,
# [4]=repo root.
_SUMMARY_PDF: Final[Path] = (
    Path(__file__).resolve().parents[4]
    / "backend" / "documents_parser" / "tests" / "fixtures"
    / "Summary_001431_rr8w.pdf"
)


def _summary_pdf_to_textract_style_pages(pdf_path: Path) -> list[str]:
    """Convert a Summary PDF into the per-page text format the
    ElmhurstSiteNotesExtractor expects (label\\nvalue sequences).

    Mirror of the helper in `test_summary_pdf_mapper_chain.py` /
    `_elmhurst_worksheet_000565.py`.
    """
    info = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Pages:\s+(\d+)", info)
    if m is None:
        raise RuntimeError(f"Could not parse page count from {pdf_path}")
    page_count = int(m.group(1))
    pages: list[str] = []
    for i in range(1, page_count + 1):
        layout = subprocess.run(
            [
                "pdftotext", "-layout", "-f", str(i), "-l", str(i),
                str(pdf_path), "-",
            ],
            capture_output=True, text=True, check=True,
        ).stdout
        tokens: list[str] = []
        for line in layout.splitlines():
            if not line.strip():
                tokens.append("")
                continue
            parts = [p for p in re.split(r"\s{2,}", line.strip()) if p]
            tokens.extend(parts)
        pages.append("\n".join(tokens))
    return pages


def build_epc() -> EpcPropertyData:
    """Route the simulated case-3 Summary through extractor + mapper.

    No hand-built EpcPropertyData: the extractor and mapper are part of
    the test target. Exercises the S0380.192 Simplified-RR fix and the
    S0380.193 suspended-floor sealed-rule fix.
    """
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    return EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
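
Reviewer sketch (not part of the commit): the text-shaping logic in `_summary_pdf_to_textract_style_pages` can be tried without `pdfinfo` or `pdftotext` installed by feeding it strings directly. The helper names `parse_page_count` and `layout_text_to_tokens`, and both sample strings, are hypothetical and only illustrate the regex and the two-or-more-spaces split; they are not the real Summary layout.

```python
import re


def parse_page_count(pdfinfo_stdout: str) -> int:
    """Pull the integer after 'Pages:' out of pdfinfo-style output."""
    m = re.search(r"Pages:\s+(\d+)", pdfinfo_stdout)
    if m is None:
        raise ValueError("no 'Pages:' line found")
    return int(m.group(1))


def layout_text_to_tokens(layout: str) -> str:
    """Split each layout line on runs of 2+ spaces; keep blank lines as ''."""
    tokens: list[str] = []
    for line in layout.splitlines():
        if not line.strip():
            tokens.append("")  # a blank line survives as an empty token
            continue
        tokens.extend(p for p in re.split(r"\s{2,}", line.strip()) if p)
    return "\n".join(tokens)


# Made-up inputs, for illustration only.
SAMPLE_INFO = "Title:          Summary\nPages:          3\n"
SAMPLE_LAYOUT = "Property type      House\n\nTotal floor area   29.75 m2\n"

page_count = parse_page_count(SAMPLE_INFO)
page_text = layout_text_to_tokens(SAMPLE_LAYOUT)
```

A single space inside a value (as in `29.75 m2`) does not split, so a label and its value land on consecutive lines while multi-word labels stay intact.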


@ -39,6 +39,7 @@ from tests.domain.sap10_calculator.worksheet import (
    _elmhurst_worksheet_000565 as _w000565,
    _elmhurst_worksheet_001431 as _w001431,
    _elmhurst_worksheet_001431_rr as _w001431_rr,
    _elmhurst_worksheet_001431_rr8 as _w001431_rr8,
)
from tests.domain.sap10_calculator.worksheet._elmhurst_fixtures import (
ALL_FIXTURES as _ELMHURST_FIXTURES,
@ -183,6 +184,23 @@ _FIXTURE_PINS: Final[dict[str, FixtureCascadePins]] = {
        lighting_kwh_per_yr=282.6414,
        pumps_fans_kwh_per_yr=86.0,
    ),
    # Mapper-driven cohort entry: Summary_001431_rr8w.pdf → extractor →
    # mapper → calculator. Near-exact 6035 replica: Main + Extension +
    # Simplified room-in-roof, 8 windows (≈14.15 m², matching 6035),
    # suspended uninsulated floors. Differs from 6035 only in the Main
    # first-floor HLP (15.99 here vs 6035's 8.32). Pins at 1e-4 confirm
    # the cascade is spec-correct for the archetype → 6035's +19 PE vs
    # the lodged register is lodged-register divergence, not a calc gap.
    "001431_rr8": FixtureCascadePins(
        sap_score=68, sap_score_continuous=67.7118, ecf=2.3146,
        total_fuel_cost_gbp=951.3425, co2_kg_per_yr=4767.4862,
        space_heating_kwh_per_yr=16086.3557,
        main_heating_fuel_kwh_per_yr=19150.4235,
        secondary_heating_fuel_kwh_per_yr=0.0,
        hot_water_kwh_per_yr=3307.2639,
        lighting_kwh_per_yr=262.0885,
        pumps_fans_kwh_per_yr=86.0,
    ),
}
@ -196,6 +214,7 @@ _FIXTURE_MODULES: Final[dict[str, ModuleType]] = {
    "000565": _w000565,
    "001431": _w001431,
    "001431_rr": _w001431_rr,
    "001431_rr8": _w001431_rr8,
}
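
Reviewer sketch (not part of the commit): one way an abs=1e-4 pin comparison could work, assuming each pin is a plain float compared with a purely absolute tolerance. `PinsSketch` is a hypothetical three-field stand-in for `FixtureCascadePins`, and `pin_mismatches` is an illustrative helper; the real test harness is not shown in this diff and may differ.

```python
import math
from dataclasses import dataclass, fields

ABS_TOL = 1e-4  # tolerance quoted in the commit message


@dataclass(frozen=True)
class PinsSketch:
    """Hypothetical, cut-down stand-in for FixtureCascadePins."""
    total_fuel_cost_gbp: float
    co2_kg_per_yr: float
    space_heating_kwh_per_yr: float


def pin_mismatches(expected: PinsSketch, actual: PinsSketch,
                   abs_tol: float = ABS_TOL) -> list[str]:
    """Return the names of fields whose |actual - expected| exceeds abs_tol."""
    bad: list[str] = []
    for f in fields(expected):
        want = getattr(expected, f.name)
        got = getattr(actual, f.name)
        # rel_tol=0.0 makes this a purely absolute comparison.
        if not math.isclose(got, want, rel_tol=0.0, abs_tol=abs_tol):
            bad.append(f.name)
    return bad


# Expected values copied from the "001431_rr8" entry above.
worksheet = PinsSketch(951.3425, 4767.4862, 16086.3557)
# Illustrative calculator outputs, not real results.
within = PinsSketch(951.34254, 4767.48615, 16086.3557)
outside = PinsSketch(951.3427, 4767.4862, 16086.3557)
```

Returning field names instead of a bare boolean keeps a failing pin easy to trace back to its worksheet line.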