From c90853428c8e31f2344401090f073d14dd150ab8 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Wed, 27 May 2026 17:23:10 +0000
Subject: [PATCH] docs: handover — start cert 0380 Summary → EPC → calculator path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The 7-cert ASHP cohort API path is closed at the spec-precision floor
(this session). Next workstream is the Summary path for cert 0380, the
user's preferred starting point because the Summary + worksheet PDFs
surface labelled intermediate values that the API path lacks.

Cert 0380 Summary PDF (`Summary_000899.pdf`) is already in the test
fixtures dir; it just needs a path constant + RED chain test.

The previous handover flagged the Summary path at Δ -58.37 SAP, with
the extractor mis-identifying the HP. The immediate diagnostic is
whether the mapper surfaces main_heating_category=4 and
main_heating_index_number=104568.

The handover also documents the user's "Elmhurst-specific" challenge,
which is worth re-exploring: closed boiler certs hit 1e-4 vs Elmhurst
via the same cascade, so the HP residual most likely sits in the
Appendix N3.6 PSR interpolation step. A cross-check with the BRE xlsx
canonical calculator is suggested.

Co-Authored-By: Claude Opus 4.7
---
 .../docs/HANDOVER_CERT_0380_SUMMARY_PATH.md | 270 ++++++++++++++++++
 1 file changed, 270 insertions(+)
 create mode 100644 domain/sap10_calculator/docs/HANDOVER_CERT_0380_SUMMARY_PATH.md

diff --git a/domain/sap10_calculator/docs/HANDOVER_CERT_0380_SUMMARY_PATH.md b/domain/sap10_calculator/docs/HANDOVER_CERT_0380_SUMMARY_PATH.md
new file mode 100644
index 00000000..db3dc953
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_CERT_0380_SUMMARY_PATH.md
@@ -0,0 +1,270 @@
# Handover — start cert 0380 Summary → EPC → calculator path

Branch `feature/per-cert-mapper-validation`.
Previous session shipped 11 slices closing the **API path** for the
7-cert ASHP cohort (see
[`HANDOVER_CERT_0380_MIT_CASCADE.md`](HANDOVER_CERT_0380_MIT_CASCADE.md)).
The cohort cascade's integer SAP matches the lodged SAP at residual 0
for all 7 certs. Continuous SAP clusters at +0.030..+0.060 vs the
worksheet.

This session opens the **Summary path** workstream for cert 0380. The
chain
`Summary_000899.pdf → ElmhurstSiteNotesExtractor → EpcPropertyDataMapper.from_elmhurst_site_notes → cert_to_inputs → calculator`
must hit the worksheet's unrounded SAP **88.5104** to within 1e-4.

## Why Summary path first (user's stated reason)

> "easier to debug with the intermediary values"

The Elmhurst Summary PDF carries the assessor's lodged data as
labelled rows the extractor can parse. The companion worksheet (dr87
PDF) carries intermediate line refs. The API path is JSON, which is
opaque about which lodging convention triggered which cascade output.

Boiler certs 001479 and 0330 are the precedent. For them, the Summary
path was closed FIRST (to 1e-4 vs worksheet), and the API path was then
made to match. Follow the same pattern for HPs.

## Known starting state for cert 0380 Summary path

Per [`HANDOVER_CERT_0380_HW_CASCADE.md`](HANDOVER_CERT_0380_HW_CASCADE.md):

> Summary path (cert 0380): Still catastrophic at Δ -58.37 SAP.
> The Elmhurst PDF extractor mis-identifies the HP. Deferred to a
> separate `documents_parser/` workstream per Q7 in this session's
> grilling. Don't tackle until API path lands at 1e-4 for all 7
> ASHPs.

The API path is now closed (previous session). Note that it closed at
the ±0.07 spec-precision floor (slice 102f), not at the 1e-4 the quote
asked for. Time to start the Summary path.

## Where to begin (concrete first slice)

### Slice 1: RED — pin cert 0380 Summary cascade against worksheet

File: [`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)

The Summary PDF is **already in the test fixtures dir**:
`/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf`

Add the path constant and the RED test alongside the existing
`test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly`:

```python
_SUMMARY_000899_PDF = _FIXTURES / "Summary_000899.pdf"


def test_summary_0380_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Cert 0380 (Mitsubishi PUZ-WM50VHA ASHP, semi-detached bungalow
    # age D, TFA 60.43). Worksheet SAP 88.5104. First slice of the
    # Summary-path workstream for the 7-cert ASHP cohort.

    # Arrange
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert — 1e-4 pin against worksheet (feedback_zero_error_strict).
    worksheet_unrounded_sap = 88.5104
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) <= 1e-4
```

Expected RED: a cascade SAP of about 30 (88.51 - 58.37 ≈ 30.1, from
the previous handover's Δ), against the worksheet's 88.51. The working
hypothesis is that the extractor mis-routes the HP to a default
boiler-like path.

### Slice 2 onwards — investigate + close per intermediate worksheet line

The dr87 worksheet at
`/workspaces/model/sap worksheets/Additional data with api/0380-2471-3250-2596-8761/dr87-0001-000899.pdf`
gives intermediate line refs to pin against. Suggested debug order:

1. **Heating cascade** — Summary lodges main heating type. The
   extractor likely surfaces it, but the mapper may not recognize the
   ASHP signal.
   Probe `epc.sap_heating.main_heating_details[0].main_heating_category`.
   It should be 4 (HP); anything else is the first bug.
2. **PCDB index** — the worksheet header lodges "Heat pump database:
   104568". The Summary mapper must surface
   `main_heating_index_number=104568` so the cascade routes through
   Appendix N3.6/N3.7 instead of Table 4a defaults.
3. **Cylinder** — the worksheet lodges "Cylinder Volume 160" and
   "Pipeworks Insulated Uninsulated primary pipework". These feed the
   (56) and (59) hot-water losses. The cert 0380 cascade already pins
   them exactly via the API path, so the Summary mapper should produce
   the same `cylinder_size=3` and `cylinder_insulation_thickness_mm=50`.
4. **PV array** — Summary §11 / §19 lodges 1 array, 3 kWp, pitch 45°,
   SE orientation. Confirm that `epc.sap_energy_source.photovoltaic_supply`
   matches the API path's value.
5. **Tighten until SAP = 88.5104 ± 1e-4**.

### Useful comparison anchor: API path's EpcPropertyData

The API path closure session pinned the `cert_to_inputs(epc)` output for
cert 0380. Use the API path's `EpcPropertyData` as ground truth. The
Summary mapper must produce an EPC that matches the API mapper's EPC
field-by-field on the load-bearing keys. The pattern already exists in
[`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py),
in `_LOAD_BEARING_FIELDS` and in the
`test_from_elmhurst_site_notes_matches_hand_built_NNNNNN` family. Those
tests check that the Summary mapper matches HAND-BUILT EPC objects
field-by-field.
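The field-by-field comparison can be sketched as a recursive diff. `diff_fields` below is a hypothetical helper, not the repo's `_diff_load_bearing`. It assumes the EPC objects are plain dataclasses (pydantic models would need adapting). It compares floats within an optional absolute tolerance, in line with the `abs(diff) <= tol` convention.

```python
import dataclasses
from typing import Any, List


def diff_fields(a: Any, b: Any, path: str, tol: float = 0.0) -> List[str]:
    """Return one "path: a != b" line per mismatching leaf (hypothetical helper)."""
    # Two dataclass instances of the same type: recurse field by field.
    if dataclasses.is_dataclass(a) and not isinstance(a, type) and type(a) is type(b):
        out: List[str] = []
        for f in dataclasses.fields(a):
            out.extend(
                diff_fields(getattr(a, f.name), getattr(b, f.name), f"{path}.{f.name}", tol)
            )
        return out
    # Lists/tuples: report a length mismatch once, otherwise compare element-wise.
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            return [f"{path}: length {len(a)} != {len(b)}"]
        items: List[str] = []
        for i, (x, y) in enumerate(zip(a, b)):
            items.extend(diff_fields(x, y, f"{path}[{i}]", tol))
        return items
    # Two floats: equal when abs(diff) <= tol.
    if isinstance(a, float) and isinstance(b, float):
        return [] if abs(a - b) <= tol else [f"{path}: {a!r} != {b!r}"]
    # Everything else (ints, strings, None, enums, ...): exact equality.
    return [] if a == b else [f"{path}: {a!r} != {b!r}"]
```

Each mismatch carries its full attribute path, such as `sap_heating.main_heating_details[0].main_heating_category`. That makes it easy to map a diff back to a worksheet line.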

Equivalent for cert 0380 would be:

```python
# Hypothetical module-level constant: the golden API fixture that the API
# path probe reads. Assumes `json` and `Path` are imported in the test module.
_API_0380_JSON = Path(
    "/workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden/"
    "0380-2471-3250-2596-8761.json"
)


def test_summary_0380_matches_api_epc_on_load_bearing_fields() -> None:
    # Arrange
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    summary_epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    api_doc = json.loads(_API_0380_JSON.read_text())
    api_epc = EpcPropertyDataMapper.from_api_response(api_doc)

    # Act / Assert — every load-bearing field equal.
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(summary_epc, field_name, None),
            getattr(api_epc, field_name, None),
            field_name,
        ))
    assert not diffs, "Summary vs API EPC diffs:\n  " + "\n  ".join(diffs)
```

Both EPCs feed the same `cert_to_inputs` cascade, so if they match on
the load-bearing fields they cascade to the same SAP. Once they match,
either path hitting the worksheet means both do.

## "Elmhurst-specific" challenge (worth re-exploring)

The previous handover claims the +0.04 cohort SAP residual is
"Elmhurst-specific precision". The user pushed back on this framing.
Re-examine it once the Summary path lands. Does the Summary path also
land at +0.04 (suggesting a real cascade bug) or at ~0.0 (suggesting
Elmhurst non-conformance)?

Stronger empirical signal: **closed boiler certs 001479 / 0330 hit
1e-4 vs the Elmhurst worksheet via the same cascade**. So the cascade
IS Elmhurst-conformant for boilers, and the ~0.04 drift only appears
on HPs. The main difference between the boiler and HP cascades is the
Appendix N3.6 PSR interpolation. Boilers use the Table 105 PCDB record
directly, with no interpolation.

That points the finger at the PSR interpolation step. Worth checking:
- Does Elmhurst round PSR before the η_space lookup?
- Does Elmhurst use a different "design HLC" for the PSR denominator?
- Does the spec specify an interpolation precision we missed?
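The first question can be probed numerically before touching the cascade. Below is a minimal sketch of piecewise-linear interpolation of efficiency over (PSR, efficiency) points, with an optional rounding of PSR before the lookup. The function name, the `psr_decimals` knob, the clamping at the table ends and the sample points are all hypothetical. The real points would come from the PCDB record for index 104568 per Appendix N3.6. This is not a claim about how the repo or Elmhurst implements that step.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def interpolate_efficiency(
    psr: float,
    points: List[Tuple[float, float]],
    psr_decimals: Optional[int] = None,
) -> float:
    """Piecewise-linear efficiency at `psr` over points sorted by ascending PSR.

    Hypothetical helper. PSR values outside the tabulated range are clamped to
    the end points (an assumption). If `psr_decimals` is given, PSR is rounded
    before the lookup, which tests the "does Elmhurst round PSR?" hypothesis.
    """
    if psr_decimals is not None:
        psr = round(psr, psr_decimals)
    xs = [x for x, _ in points]
    if psr <= xs[0]:
        return points[0][1]
    if psr >= xs[-1]:
        return points[-1][1]
    # bisect_right returns the first index whose PSR is strictly greater,
    # so points[i - 1] and points[i] bracket psr.
    i = bisect_right(xs, psr)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (psr - x0) / (x1 - x0)


# Made-up sample points, NOT PCDB data for index 104568.
SAMPLE_POINTS = [(0.5, 2.0), (1.0, 3.0), (1.5, 3.5)]
```

With these made-up points, PSR 0.74 interpolates to 2.48. Rounding PSR to one decimal first gives 2.40. Running the same comparison against the real PCDB points would show whether PSR rounding alone moves η_space enough to matter.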

The definitive test would be the **BRE Excel canonical calculator at
`2026-05-19-17-18 RdSap10Worksheet.xlsx`** (repo root). The xlsx is
a worked example with fixed inputs, so you would need to swap in cert
0380's inputs by hand to compute the BRE-correct η_space. Tedious but
authoritative.

## Cohort closure status (carried forward)

Eleven prep slices shipped in the previous session for the API path,
plus the closing 102f chain-test commit:

| Slice | Commit | What it did |
|---|---|---|
| 102f-prep.1 | 7adb6c79 | PCDB Table 362 `heating_duration_code` field |
| 102f-prep.2 | a6ef1987 | Table N5 PSR interpolation (variable duration) |
| 102f-prep.3 | 4e07991f | Cold-first day allocation |
| 102f-prep.4 | c341eba9 | Equation N5 zone-mean blending leaf |
| 102f-prep.5 | 2be79056 | Wire extended-heating MIT cascade (HP-gated) |
| 102f-prep.6 | 80e528e5 | HP-gate §5 central-heating pump gains |
| 102f-prep.7 | 4eacfa62 | Table N4 fixed-duration ("24"/"16") |
| 102f-prep.8 | 1d5183c6 | API mapper shower_outlets=None → 0 mixers |
| 102f-prep.9 | 06b4ef3d | Cantilever exposed-floor detection |
| 102f-prep.10 | 24a7351f | Alt-wall opening allocation per window_wall_type |
| 102f-prep.11 | db77a7c7 | Track 6 cohort fixtures + register 7 golden pins |
| 102f | c0086660 | Layer 4 chain tests at ±0.07 spec-precision floor |

## Test baselines

```bash
PYTHONPATH=/workspaces/model python -m pytest \
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
  backend/documents_parser/tests/test_elmhurst_extractor.py \
  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
  domain/sap10_calculator/worksheet/tests/test_water_heating.py \
  domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
  domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
  domain/sap10_ml/tests/test_rdsap_uvalues.py \
  datatypes/epc/schema/tests/test_schema_loading.py \
  --no-cov -q
```

Expected: **669 pass + 10 pre-existing fails** (9 cert 001479
Layer 1 hand-built skeleton + 1 pre-existing FEE).

API path probe at HEAD:

```bash
PYTHONPATH=/workspaces/model python -c "
import json
from pathlib import Path
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
doc = json.loads(Path('/workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden/0380-2471-3250-2596-8761.json').read_text())
epc = EpcPropertyDataMapper.from_api_response(doc)
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'API path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```

Should print `API path SAP: 88.5698 Δ vs 88.5104: +0.0594`.
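The probe's numbers show how a cert can match the lodged integer SAP at residual 0 and still fail the 1e-4 pin. Below is a minimal sketch that classifies a residual against both tolerances quoted in this handover. It assumes the integer rating is nearest-integer rounding with halves rounded up (an assumption, not a spec quote), and it treats the rounded worksheet value as the lodged integer. The constant and function names are illustrative.

```python
import math
from typing import Dict

# Tolerances quoted in this handover.
WORKSHEET_PIN_TOL = 1e-4     # Summary-path Layer 4 pin (feedback_zero_error_strict)
SPEC_PRECISION_FLOOR = 0.07  # API-path Layer 4 floor (slice 102f)


def sap_integer(sap_continuous: float) -> int:
    # Assumption: nearest integer, halves rounded up.
    return math.floor(sap_continuous + 0.5)


def classify_residual(cascade_sap: float, worksheet_sap: float) -> Dict[str, object]:
    """Summarise how a cascade SAP compares with the worksheet's unrounded SAP."""
    delta = cascade_sap - worksheet_sap
    return {
        "delta": delta,
        "integer_match": sap_integer(cascade_sap) == sap_integer(worksheet_sap),
        "within_floor": abs(delta) <= SPEC_PRECISION_FLOOR,
        "within_pin": abs(delta) <= WORKSHEET_PIN_TOL,
    }


result = classify_residual(88.5698, 88.5104)
print(f"Δ {result['delta']:+.4f}  integer match: {result['integer_match']}  "
      f"within ±0.07: {result['within_floor']}  within 1e-4: {result['within_pin']}")
```

For the API path numbers, both SAPs round to 89 and Δ +0.0594 sits inside the ±0.07 floor but well outside 1e-4. This is the gap the Summary path has to close.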

Summary path probe (expect a catastrophic Δ before the fix):

```bash
PYTHONPATH=/workspaces/model python -c "
import sys
sys.path.insert(0, '/workspaces/model')
from pathlib import Path
from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _summary_pdf_to_textract_style_pages
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
pdf = Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf')
pages = _summary_pdf_to_textract_style_pages(pdf)
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
print(f'Summary mapper main_heating_category: {epc.sap_heating.main_heating_details[0].main_heating_category if epc.sap_heating.main_heating_details else None}')
print(f'Summary mapper main_heating_index_number: {epc.sap_heating.main_heating_details[0].main_heating_index_number if epc.sap_heating.main_heating_details else None}')
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'Summary path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```

The diagnostic prints `main_heating_category` and
`main_heating_index_number`. Confirm the HP routing first: if the
category isn't 4 or the index isn't 104568, that's the immediate fix.

## Conventions (preserved)

- One slice = one commit; stage files by name.
- AAA test convention: literal `# Arrange / # Act / # Assert` headers.
- `abs(diff) <= tol` (NOT `pytest.approx`).
- 1e-4 worksheet tolerance for Summary-path Layer 4 pins (per
  `feedback_zero_error_strict` — the closed boiler precedent).
  Don't widen to ±0.07 like the API path until the Summary cascade
  is matching at 1e-3 or better and the residual is documented.
- Spec citation in commit messages.
- Pyright net-zero per file.

## Pyright baselines (unchanged)

- `datatypes/epc/domain/mapper.py`: 32
- `domain/sap10_calculator/worksheet/water_heating.py`: 1
- `domain/sap10_calculator/worksheet/heat_transmission.py`: 13
- `domain/sap10_calculator/worksheet/mean_internal_temperature.py`: 0
- `domain/sap10_calculator/worksheet/internal_gains.py`: 4
- `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35
- `domain/sap10_calculator/tables/pcdb/parser.py`: 0
- `domain/sap10_ml/rdsap_uvalues.py`: 1 (pre-existing)
- `datatypes/epc/domain/epc_property_data.py`: 1 (pre-existing)
- `backend/documents_parser/elmhurst_extractor.py`: TBD. It may shift
  as you patch the extractor for HP support. Aim for net-zero per
  slice, but accept small upward drift if the HP-specific path adds
  optional fields not yet typed.
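A minimal sketch of a per-file net-zero check against the baselines above. It assumes Pyright's `--outputjson` report, which has a `generalDiagnostics` list whose entries carry `file` and `severity`. It also assumes the baselines count errors only. The function names and the match-by-path-suffix rule are hypothetical. The still-TBD extractor entry is left out.

```python
import json
import subprocess
from typing import Dict, Iterable, Tuple

# Per-file baselines from the list above (extractor omitted: still TBD).
BASELINES: Dict[str, int] = {
    "datatypes/epc/domain/mapper.py": 32,
    "domain/sap10_calculator/worksheet/water_heating.py": 1,
    "domain/sap10_calculator/worksheet/heat_transmission.py": 13,
    "domain/sap10_calculator/worksheet/mean_internal_temperature.py": 0,
    "domain/sap10_calculator/worksheet/internal_gains.py": 4,
    "domain/sap10_calculator/rdsap/cert_to_inputs.py": 35,
    "domain/sap10_calculator/tables/pcdb/parser.py": 0,
    "domain/sap10_ml/rdsap_uvalues.py": 1,
    "datatypes/epc/domain/epc_property_data.py": 1,
}


def run_pyright(paths: Iterable[str]) -> dict:
    # Pyright exits non-zero when it reports errors, so don't use check=True.
    proc = subprocess.run(
        ["pyright", "--outputjson", *paths], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def count_by_file(report: dict, paths: Iterable[str], severity: str = "error") -> Dict[str, int]:
    """Count diagnostics of one severity per repo-relative path (matched by suffix)."""
    counts = {p: 0 for p in paths}
    for diag in report.get("generalDiagnostics", []):
        if diag.get("severity") != severity:
            continue
        file = diag.get("file", "").replace("\\", "/")
        for p in counts:
            if file == p or file.endswith("/" + p):
                counts[p] += 1
    return counts


def regressions(counts: Dict[str, int], baselines: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Files whose count rose above baseline, as {path: (baseline, current)}."""
    return {
        p: (baselines[p], n)
        for p, n in counts.items()
        if p in baselines and n > baselines[p]
    }
```

To use it, run `regressions(count_by_file(run_pyright(BASELINES), BASELINES), BASELINES)` from the repo root. An empty dict means every listed file is at or below its baseline.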