mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Slice S0380.52: cert 000565 Elmhurst-only mapper-driven cascade pin + glazing-label coverage
User pivot at end of prior session: don't hand-build EpcPropertyData
fixtures — route Summary PDFs through `EpcPropertyDataMapper.from_
elmhurst_site_notes` so the pin grid exercises extractor + mapper +
calculator, and each new Elmhurst doc grows mapper coverage instead
of bespoke fixture code.
New fixture cert 000565 is a stress-test cert (5 building parts, age
mix A→J, conservatory with heaters, curtain wall, basement walls,
mixed party-wall constructions) that surfaces many uncommon cascade
paths absent from the cohort-2 + ASHP corpus.
Mapper coverage extended for 3 Elmhurst §11 glazing labels surfaced
on this cert (per RdSAP-Schema-21.0.1, `datatypes/epc/domain/
epc_codes.csv` glazed_type rows):
"Triple between 2002 and 2021": 9 (RdSAP-21 schema row 9 — triple
glazing, installed 2002-2022 in EAW; `_G_PERPENDICULAR_BY_
GLAZING_TYPE[9] = 0.68`, `_G_LIGHT_BY_GLAZING_CODE[9] = 0.70`)
"Single glazing": 1 (alias of bare "Single"; cascade
g_L = 0.90, g⊥ = 0.85 per SAP 10.2 Table 6b)
"Double glazing, known data": 3 (Elmhurst lodgement of RdSAP-21
schema row 7 "double, known data"; manufacturer U-value and
g-value lodged via WindowTransmissionDetails override the
cascade's defaults — grouped under code 3 with other unknown-
date DG variants for cascade-equivalence on g_L/g⊥)
Per [[feedback-e2e-validation-philosophy]] + [[feedback-zero-error-
strict]]: pin tolerances are abs=1e-4 against U985-0001-000565.pdf
Block 1 line refs (pinned: SAP int + SAP continuous + ECF + total
fuel cost + CO2 + space heating + main 1 fuel + secondary fuel +
hot water + lighting + pumps/fans).
Outcome: 1/11 pin green (`secondary_heating_fuel_kwh_per_yr = 0`);
10 pins are now named calculator-gap residuals to fix in subsequent
slices:
main_heating_fuel_kwh_per_yr +27,665.01 kWh/yr (heat-pump SAP code
224 + gas combi via WHC 914 "from second main"; cascade probably
runs ASHP for DHW instead of routing through gas combi)
hot_water_kwh_per_yr +164.88 kWh/yr (FGHRS / solar HW /
Table 3a no-keep-hot for the gas combi DHW path)
lighting_kwh_per_yr -236.19 kWh/yr (RdSAP §12-1 bulb-
count cascade; 27 total / 7 low-energy / 20 incandescent lodged)
pumps_fans_kwh_per_yr -122.52 kWh/yr (cascade defaults
to 130; expected 252.52 = MEV PCDF 500755 + flue + solar pump)
Cohort regression check: 472 pass + 10 expected 000565 failures.
Pyright net-zero (32 errors before, 32 after).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
fc84c6d49a
commit
e51fcb74ca
4 changed files with 162 additions and 0 deletions
BIN
backend/documents_parser/tests/fixtures/Summary_000565.pdf
vendored
Normal file
BIN
backend/documents_parser/tests/fixtures/Summary_000565.pdf
vendored
Normal file
Binary file not shown.
|
|
@ -3724,10 +3724,23 @@ def _elmhurst_cylinder_insulation_code(
|
|||
# fixing the upstream extractor is deferred to a future slice.
|
||||
_ELMHURST_GLAZING_LABEL_TO_SAP10: Dict[str, int] = {
|
||||
"Single": 1,
|
||||
# Elmhurst §11 lodgement variant of the bare "Single" form — surfaced
|
||||
# on cert 000565 Window 3 (Wood frame, U=3.35, g=0.85). Same enum as
|
||||
# "Single" per Table U2 code 1; g_L=0.90 / g⊥=0.85.
|
||||
"Single glazing": 1,
|
||||
"Double pre 2002": 2,
|
||||
"Double between 2002 and 2021": 3,
|
||||
"Double with unknown install date": 3,
|
||||
"Double with unknown 16 mm or install date more": 3,
|
||||
# Elmhurst §11 lodgement of RdSAP-21 schema row 7 "double, known
|
||||
# data" — manufacturer U-value and g-value are lodged on the
|
||||
# SapWindow's `WindowTransmissionDetails` so the cascade reads
|
||||
# those directly. The glazing_type code only affects the §5
|
||||
# (66)..(67) daylight factor where g_L=0.80 across all DG variants
|
||||
# ({2, 3, 13}) — grouped under code 3 with the other unknown-date
|
||||
# DG variants for cascade-equivalence. First seen on cert 000565
|
||||
# Window 6 (Main, U=2.00, g=0.72).
|
||||
"Double glazing, known data": 3,
|
||||
"Double post or during 2022": 5,
|
||||
"Triple post or during 2022": 6,
|
||||
# One window in cert 2636 (Summary_000898.pdf) lodges the year-
|
||||
|
|
@ -3737,6 +3750,13 @@ _ELMHURST_GLAZING_LABEL_TO_SAP10: Dict[str, int] = {
|
|||
# Treated as the same enum as the full form per worksheet
|
||||
# "Triple glazed" lodging on cert 2636's dr87-0001-000898.pdf.
|
||||
"Triple post or during": 6,
|
||||
# RdSAP-Schema-21 row "triple glazing, installed 2002-2022 in EAW"
|
||||
# (epc_codes.csv code 9 — RdSAP-21 schema extension). First seen on
|
||||
# cert 000565 Window 2 (Summary_000565.pdf §11 row 2, manufacturer
|
||||
# U=2.00, g=0.72). Cascade's `_G_PERPENDICULAR_BY_GLAZING_TYPE`
|
||||
# row 9 returns Table 6b triple-glazed g⊥=0.68; the lodged
|
||||
# solar_transmittance=0.72 overrides per worksheet-pinned value.
|
||||
"Triple between 2002 and 2021": 9,
|
||||
"Secondary": 7,
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,126 @@
|
|||
"""Mapper-driven cascade pin against Elmhurst U985-0001-000565.
|
||||
|
||||
Unlike the 6 cohort fixtures (000474/000477/000480/000487/000490/
|
||||
000516), this fixture does NOT hand-build the EpcPropertyData. It
|
||||
routes the Summary_000565.pdf through ElmhurstSiteNotesExtractor +
|
||||
EpcPropertyDataMapper.from_elmhurst_site_notes so the SAP-result pin
|
||||
grid exercises the WHOLE extractor + mapper + calculator pipeline.
|
||||
|
||||
Failing SAP-result pins surface gaps in any of the three layers:
|
||||
- Extractor: lodgement fields not parsed from the Summary PDF
|
||||
- Mapper: code-to-int translations missing from the dispatch dicts
|
||||
- Calculator: cascade gaps (e.g. CF cavity-filled party-wall U=0.20
|
||||
from Table 15 row 3 has no SAP10 wall_construction code today)
|
||||
|
||||
Each failing pin localises to one of the three and becomes its own
|
||||
slice. As more Elmhurst Summary PDFs land, the mapper will handle
|
||||
them automatically rather than per-cert hand-building.
|
||||
|
||||
Source: PDF supplied by user 2026-05-28 at `sap worksheets/extended
|
||||
test case/`; mirrored into the tracked
|
||||
`backend/documents_parser/tests/fixtures/Summary_000565.pdf` so the
|
||||
test runs without depending on the unstaged user workspace.
|
||||
|
||||
Cert shape (Summary §1-19): House, Enclosed End-Terrace, 4 heated
|
||||
storeys, TFA 319.91 m², 5 building parts (Main + 4 extensions). Age
|
||||
mix A→J. Heat pump SAP code 224 + gas combi (PCDB 15100 Vaillant
|
||||
Ecotec plus 415) providing DHW only via water_heating_code 914
|
||||
("from second main system"). Solar HW (3 m² flat-panel, W,
|
||||
30° elevation), FGHRS (Zenex SuperFlow index 60063), MEV
|
||||
decentralised (PCDF 500755). Conservatory thermally separated WITH
|
||||
fixed heaters. Curtain Wall Post-2023 (Ext2), basement walls
|
||||
(Ext3+Ext4), CF cavity-filled party wall (Ext1), CU cavity-unfilled
|
||||
party wall (Main). RR on every part with mixed age bands.
|
||||
|
||||
Worksheet pin targets (U985-0001-000565.pdf, Block 1 — energy rating):
|
||||
- SAP value 28.5087 (line 257) → SAP rating 29 (line 258)
|
||||
- Energy cost factor 5.3866 (line 257)
|
||||
- Total fuel cost £4680.2593 (line 255, Table 12 prices)
|
||||
- CO2 6447.6263 kg/year (line 272)
|
||||
- Space heating 59008.3499 kWh/year (line 98c)
|
||||
- Main 1 fuel 34710.7941 kWh/year (line 211) — ASHP electricity
|
||||
- Secondary fuel 0.0 (line 215)
|
||||
- Hot water fuel 3755.0288 kWh/year (line 219) — gas combi via WHC 914
|
||||
- Lighting 1384.8353 kWh/year (line 232)
|
||||
- Pumps/fans 252.5159 kWh/year (line 231) — MEV 127.5 + flue 45 + solar 80
|
||||
|
||||
Per [[feedback-zero-error-strict]] + [[feedback-e2e-validation-
|
||||
philosophy]]: pins are abs=1e-4 against the worksheet PDF. Failing
|
||||
pins are named extractor / mapper / calculator gaps to fix.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
from typing import Final
|
||||
|
||||
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
|
||||
from datatypes.epc.domain.epc_property_data import EpcPropertyData
|
||||
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
|
||||
|
||||
|
||||
# Repo root → backend fixtures. parents[0]=tests/, parents[1]=worksheet/,
|
||||
# parents[2]=sap10_calculator/, parents[3]=domain/, parents[4]=repo root.
|
||||
_SUMMARY_PDF: Final[Path] = (
|
||||
Path(__file__).resolve().parents[4]
|
||||
/ "backend" / "documents_parser" / "tests" / "fixtures"
|
||||
/ "Summary_000565.pdf"
|
||||
)
|
||||
|
||||
|
||||
def _summary_pdf_to_textract_style_pages(pdf_path: Path) -> list[str]:
|
||||
"""Convert a Summary PDF into the per-page text format the
|
||||
ElmhurstSiteNotesExtractor expects (label\\nvalue sequences).
|
||||
|
||||
Mirror of the helper in `backend/documents_parser/tests/
|
||||
test_summary_pdf_mapper_chain.py::_summary_pdf_to_textract_style_
|
||||
pages`. Duplicated here rather than imported across the test/
|
||||
fixture boundary; the canonical version lives next to its callers
|
||||
and this fixture module is the only e2e harness consumer.
|
||||
|
||||
`pdftotext -layout` preserves the spatial pairing of label and
|
||||
value on each line; we split each line on 2+ spaces to surface
|
||||
the label/value tokens, then concatenate them back into a single
|
||||
newline-delimited stream per page.
|
||||
"""
|
||||
info = subprocess.run(
|
||||
["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True,
|
||||
).stdout
|
||||
m = re.search(r"Pages:\s+(\d+)", info)
|
||||
if m is None:
|
||||
raise RuntimeError(f"Could not parse page count from {pdf_path}")
|
||||
page_count = int(m.group(1))
|
||||
|
||||
pages: list[str] = []
|
||||
for i in range(1, page_count + 1):
|
||||
layout = subprocess.run(
|
||||
[
|
||||
"pdftotext", "-layout", "-f", str(i), "-l", str(i),
|
||||
str(pdf_path), "-",
|
||||
],
|
||||
capture_output=True, text=True, check=True,
|
||||
).stdout
|
||||
tokens: list[str] = []
|
||||
for line in layout.splitlines():
|
||||
if not line.strip():
|
||||
tokens.append("")
|
||||
continue
|
||||
parts = [p for p in re.split(r"\s{2,}", line.strip()) if p]
|
||||
tokens.extend(parts)
|
||||
pages.append("\n".join(tokens))
|
||||
return pages
|
||||
|
||||
|
||||
def build_epc() -> EpcPropertyData:
|
||||
"""Route Summary_000565.pdf through extractor + mapper.
|
||||
|
||||
No hand-built EpcPropertyData. The Elmhurst extractor and the
|
||||
mapper are part of the test target — failing SAP-result pins
|
||||
surface gaps in any of the three layers (extractor, mapper,
|
||||
calculator). Each gap becomes its own follow-up slice.
|
||||
"""
|
||||
pages = _summary_pdf_to_textract_style_pages(_SUMMARY_PDF)
|
||||
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
|
||||
return EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
|
||||
|
|
@ -33,6 +33,7 @@ from domain.sap10_calculator.worksheet.tests import (
|
|||
_elmhurst_worksheet_000487 as _w000487,
|
||||
_elmhurst_worksheet_000490 as _w000490,
|
||||
_elmhurst_worksheet_000516 as _w000516,
|
||||
_elmhurst_worksheet_000565 as _w000565,
|
||||
)
|
||||
from domain.sap10_calculator.worksheet.tests._elmhurst_fixtures import (
|
||||
ALL_FIXTURES as _ELMHURST_FIXTURES,
|
||||
|
|
@ -129,6 +130,20 @@ _FIXTURE_PINS: Final[dict[str, FixtureCascadePins]] = {
|
|||
lighting_kwh_per_yr=230.8853,
|
||||
pumps_fans_kwh_per_yr=160.0,
|
||||
),
|
||||
# Mapper-driven cohort entry — Summary_000565.pdf → extractor →
|
||||
# mapper → calculator. 5 BPs, heat pump + gas combi DHW via WHC 914,
|
||||
# solar HW, FGHRS, conservatory with heaters, curtain wall, basement
|
||||
# walls. Pins are worksheet PDF Block 1 (energy-rating) line refs.
|
||||
"000565": FixtureCascadePins(
|
||||
sap_score=29, sap_score_continuous=28.5087, ecf=5.3866,
|
||||
total_fuel_cost_gbp=4680.2593, co2_kg_per_yr=6447.6263,
|
||||
space_heating_kwh_per_yr=59008.3499,
|
||||
main_heating_fuel_kwh_per_yr=34710.7941,
|
||||
secondary_heating_fuel_kwh_per_yr=0.0,
|
||||
hot_water_kwh_per_yr=3755.0288,
|
||||
lighting_kwh_per_yr=1384.8353,
|
||||
pumps_fans_kwh_per_yr=252.5159,
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -139,6 +154,7 @@ _FIXTURE_MODULES: Final[dict[str, ModuleType]] = {
|
|||
"000487": _w000487,
|
||||
"000490": _w000490,
|
||||
"000516": _w000516,
|
||||
"000565": _w000565,
|
||||
}
|
||||
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue