docs: handover — start cert 0380 Summary → EPC → calculator path

The 7-cert ASHP cohort API path is closed at the spec-precision
floor (this session). Next workstream is the Summary path for cert
0380 — the user's preferred starting point because the Summary +
worksheet PDFs surface labelled intermediate values that the API
path lacks.

Cert 0380 Summary PDF (`Summary_000899.pdf`) is already in the
test fixtures dir; just needs a path constant + RED chain test.
Previous handover flagged the extractor at Δ -58.37 SAP for HPs
— the immediate diagnostic is whether the mapper surfaces
main_heating_category=4 and main_heating_index_number=104568.

The handover also documents the user's challenge to the
"Elmhurst-specific" framing, which is worth re-exploring: closed
boiler certs hit 1e-4 vs Elmhurst via the same cascade, so the
residual most likely sits at the Appendix N3.6 PSR interpolation
step. A cross-check against the BRE xlsx canonical calculator is
suggested.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-27 17:23:10 +00:00
parent c00866607b
commit c90853428c


@ -0,0 +1,270 @@
# Handover — start cert 0380 Summary → EPC → calculator path
Branch `feature/per-cert-mapper-validation`. Previous session shipped
11 slices closing the **API path** for the 7-cert ASHP cohort
(see [`HANDOVER_CERT_0380_MIT_CASCADE.md`](HANDOVER_CERT_0380_MIT_CASCADE.md)).
Cohort cascade SAP integer matches lodged at residual 0 for all 7;
continuous SAP clusters at +0.030..+0.060 vs worksheet.
This session opens the **Summary path** workstream for cert 0380:
`Summary_000899.pdf → ElmhurstSiteNotesExtractor → EpcPropertyDataMapper.from_elmhurst_site_notes → cert_to_inputs → calculator`
must hit worksheet's unrounded SAP **88.5104** at 1e-4.
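
The difference between the integer pin (residual 0) and the continuous pin (1e-4) can be illustrated with cert 0380's own figures. This is a minimal sketch, assuming the lodged rating is the continuous score rounded half up to the nearest integer; `rounded_sap` is a hypothetical helper, not a function from the calculator.

```python
import math

WORKSHEET_SAP = 88.5104  # worksheet's unrounded SAP for cert 0380
API_PATH_SAP = 88.5698   # API-path cascade output reported in this handover


def rounded_sap(continuous: float) -> int:
    """Round half up to the nearest integer (assumed rating convention)."""
    return math.floor(continuous + 0.5)


# Both values round to the same integer, so the integer residual is zero
# even though the continuous residual is far outside a 1e-4 pin.
integer_residual = rounded_sap(API_PATH_SAP) - rounded_sap(WORKSHEET_SAP)
continuous_residual = API_PATH_SAP - WORKSHEET_SAP

print(integer_residual)               # 0
print(f"{continuous_residual:+.4f}")  # +0.0594
```

An integer match is therefore a much weaker signal than the continuous pin this workstream targets.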
## Why Summary path first (user's stated reason)
> "easier to debug with the intermediary values"
The Elmhurst Summary PDF carries the assessor's lodged data with
labelled rows the extractor can parse and a worksheet (dr87 PDF)
with intermediate line refs. The API path is JSON — opaque about
which lodging convention triggered which cascade output.
Boiler certs 001479 and 0330 are precedent: Summary path was
closed FIRST (to 1e-4 vs worksheet), then API path was made to
match. Same pattern for HPs.
## Known starting state for cert 0380 Summary path
Per [`HANDOVER_CERT_0380_HW_CASCADE.md`](HANDOVER_CERT_0380_HW_CASCADE.md):
> Summary path (cert 0380): Still catastrophic at Δ -58.37 SAP.
> The Elmhurst PDF extractor mis-identifies the HP. Deferred to a
> separate `documents_parser/` workstream per Q7 in this session's
> grilling. Don't tackle until API path lands at 1e-4 for all 7
> ASHPs.
API path is now closed (previous session). Time to start Summary.
## Where to begin (concrete first slice)
### Slice 1: RED — pin cert 0380 Summary cascade against worksheet
File: [`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)
The Summary PDF is **already in the test fixtures dir**:
`/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf`
Add the path constant + RED test alongside the existing
`test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly`:
```python
_SUMMARY_000899_PDF = _FIXTURES / "Summary_000899.pdf"


def test_summary_0380_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
    # Cert 0380 (Mitsubishi PUZ-WM50VHA ASHP, semi-detached bungalow
    # age D, TFA 60.43). Worksheet SAP 88.5104. First slice of the
    # Summary-path workstream for the 7-cert ASHP cohort.
    # Arrange
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)

    # Act
    result = calculate_sap_from_inputs(
        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
    )

    # Assert: 1e-4 pin against worksheet (feedback_zero_error_strict).
    worksheet_unrounded_sap = 88.5104
    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4
```
Expected RED: cascade SAP probably ~30.1 (88.5104 - 58.37 = 30.14 if
the previously flagged Δ still holds, vs worksheet 88.5), because the
extractor mis-routes the HP to a default boiler-ish path.
### Slice 2 onwards — investigate + close per intermediate worksheet line
The dr87 worksheet at
`/workspaces/model/sap worksheets/Additional data with api/0380-2471-3250-2596-8761/dr87-0001-000899.pdf`
gives intermediate line refs to pin against. Suggested debug order:
1. **Heating cascade** — Summary lodges main heating type. The
extractor likely surfaces it but the mapper may not recognize the
ASHP signal. Probe `epc.sap_heating.main_heating_details[0].main_heating_category`
— should be 4 (HP). If anything else, that's the first bug.
2. **PCDB index** — the worksheet header lodges "Heat pump database:
104568". The Summary mapper must surface
`main_heating_index_number=104568` so the cascade routes through
Appendix N3.6/N3.7 instead of Table 4a defaults.
3. **Cylinder** — the worksheet lodges "Cylinder Volume 160" +
"Pipeworks Insulated Uninsulated primary pipework" — these feed
the (56)+(59) HW losses. Cert 0380 cascade already pins these
exactly via the API path; Summary mapper should produce identical
`cylinder_size=3`, `cylinder_insulation_thickness_mm=50`.
4. **PV array** — Summary §11 / §19 lodges 1 array, 3 kWp, pitch 45°,
SE orientation. Confirm `epc.sap_energy_source.photovoltaic_supply`
surfaces identically to the API path.
5. **Tighten until SAP = 88.5104 ± 1e-4**.
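
The first two checks in the debug order above can be sketched as a small routing probe. This is an illustration only: `MainHeatingStub` is a hypothetical stand-in for one `main_heating_details` entry, `hp_routing_problems` is not a repo helper, and the integer type of the index is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

HEAT_PUMP_CATEGORY = 4         # main_heating_category expected for an HP
CERT_0380_PCDB_INDEX = 104568  # "Heat pump database" index on the worksheet


@dataclass
class MainHeatingStub:
    """Hypothetical stand-in for one main_heating_details entry."""
    main_heating_category: Optional[int]
    main_heating_index_number: Optional[int]


def hp_routing_problems(detail: Optional[MainHeatingStub]) -> list[str]:
    """Return the reasons the cascade would miss the heat-pump path."""
    if detail is None:
        return ["no main_heating_details entry surfaced"]
    problems: list[str] = []
    if detail.main_heating_category != HEAT_PUMP_CATEGORY:
        problems.append(
            f"main_heating_category={detail.main_heating_category!r}, "
            f"expected {HEAT_PUMP_CATEGORY}"
        )
    if detail.main_heating_index_number != CERT_0380_PCDB_INDEX:
        problems.append(
            f"main_heating_index_number={detail.main_heating_index_number!r}, "
            f"expected {CERT_0380_PCDB_INDEX}"
        )
    return problems


# An empty list means both HP routing signals are present.
print(hp_routing_problems(MainHeatingStub(4, 104568)))  # []
```

Reporting every failed signal at once, rather than stopping at the first, shows in one run whether the category and the PCDB index are both missing.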
### Useful comparison anchor: API path's EpcPropertyData
The API path closure session pinned `cert_to_inputs(epc)` output for
cert 0380. Use the API path's `EpcPropertyData` as ground truth —
the Summary mapper must produce an EPC that matches the API mapper's
EPC field-by-field for the load-bearing keys. The pattern is in
[`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`](../../../backend/documents_parser/tests/test_summary_pdf_mapper_chain.py)
under `_LOAD_BEARING_FIELDS` and the
`test_from_elmhurst_site_notes_matches_hand_built_NNNNNN` family,
which tests that the Summary mapper matches hand-built EPC objects
field-by-field.
Equivalent for cert 0380 would be:
```python
def test_summary_0380_matches_api_epc_on_load_bearing_fields() -> None:
    # _API_0380_JSON is assumed to be a path constant for the cert 0380
    # API fixture, defined alongside _SUMMARY_000899_PDF.
    # Arrange
    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000899_PDF)
    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
    summary_epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
    api_doc = json.loads(_API_0380_JSON.read_text())
    api_epc = EpcPropertyDataMapper.from_api_response(api_doc)

    # Act / Assert: every load-bearing field equal.
    diffs: list[str] = []
    for field_name in _LOAD_BEARING_FIELDS:
        diffs.extend(_diff_load_bearing(
            getattr(summary_epc, field_name, None),
            getattr(api_epc, field_name, None),
            field_name,
        ))
    assert not diffs, "Summary vs API EPC diffs:\n  " + "\n  ".join(diffs)
```
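Both EPCs feed the same `cert_to_inputs` cascade, so if they match on
load-bearing fields they cascade to the same SAP, and closing either
path against the worksheet closes both. With the API path currently at
+0.0594, a field-identical Summary EPC would be expected to inherit
that residual.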
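
A field-by-field comparison of this kind can be sketched as a recursive diff over dataclasses and lists. This is not the repo's `_diff_load_bearing`, whose implementation is not shown here; `diff_values` and `HeatingStub` are hypothetical, and the stub values are invented purely to show the output shape.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


def diff_values(left: Any, right: Any, path: str) -> list[str]:
    """Return one line per mismatch, recursing into dataclasses and lists."""
    if is_dataclass(left) and type(left) is type(right):
        diffs: list[str] = []
        for f in fields(left):
            diffs.extend(diff_values(
                getattr(left, f.name), getattr(right, f.name), f"{path}.{f.name}"
            ))
        return diffs
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            return [f"{path}: length {len(left)} != {len(right)}"]
        diffs = []
        for i, (a, b) in enumerate(zip(left, right)):
            diffs.extend(diff_values(a, b, f"{path}[{i}]"))
        return diffs
    # Leaf values (or mismatched types) are compared directly.
    return [] if left == right else [f"{path}: {left!r} != {right!r}"]


@dataclass
class HeatingStub:
    """Hypothetical stand-in for one main_heating_details entry."""
    main_heating_category: Optional[int]
    main_heating_index_number: Optional[int]


# Invented values: a mis-routed Summary side against an HP-routed API side.
summary_side = [HeatingStub(2, None)]
api_side = [HeatingStub(4, 104568)]
for line in diff_values(summary_side, api_side, "main_heating_details"):
    print(line)
```

Emitting a dotted path per mismatch keeps the assertion message readable when several nested fields differ at once.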
## "Elmhurst-specific" challenge (worth re-exploring)
The previous handover claims the +0.04 cohort SAP residual is
"Elmhurst-specific precision". The user pushed back on this framing.
Worth re-examining if the Summary path also lands at +0.04 (suggesting
a real cascade bug) vs ~0.0 (suggesting Elmhurst non-conformance).
Stronger empirical signal: **closed boiler certs 001479 / 0330 hit
1e-4 vs Elmhurst worksheet via the same cascade**. So the cascade IS
Elmhurst-conformant for boilers. The ~0.04 drift only appears on HPs.
The difference between boilers and HPs is precisely Appendix N3.6
PSR interpolation (boilers use Table 105 PCDB directly, no
interpolation).
That points the finger at the PSR interpolation step. Worth checking:
- Does Elmhurst round PSR before η_space lookup?
- Does Elmhurst use a different "design HLC" for the PSR denominator?
- Does the spec specify an interpolation precision we missed?
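
The first question above (rounding PSR before the lookup) can be made concrete with a small sensitivity sketch. Everything here is illustrative: the table points and the PSR value are placeholders rather than PCDB data, the end-clamping is a simplification, and `interpolate_efficiency` is a hypothetical function, not the cascade's implementation.

```python
from bisect import bisect_right

# Placeholder (PSR, efficiency %) points for illustration only. NOT PCDB data.
PSR_TABLE = [(0.5, 250.0), (1.0, 300.0), (1.5, 280.0)]


def interpolate_efficiency(psr: float, table=PSR_TABLE) -> float:
    """Linear interpolation between the bracketing PSR points.

    Values outside the table are clamped to the end points (a simplification).
    """
    xs = [x for x, _ in table]
    if psr <= xs[0]:
        return table[0][1]
    if psr >= xs[-1]:
        return table[-1][1]
    hi = bisect_right(xs, psr)  # index of the first point above psr
    (x0, y0), (x1, y1) = table[hi - 1], table[hi]
    return y0 + (y1 - y0) * (psr - x0) / (x1 - x0)


psr = 0.8437  # hypothetical plant size ratio
exact = interpolate_efficiency(psr)
rounded_first = interpolate_efficiency(round(psr, 2))

# Rounding PSR to 2 d.p. before the lookup shifts the interpolated value.
print(f"exact={exact:.4f} rounded_first={rounded_first:.4f}")
```

With these placeholder numbers the two results differ by a few tenths of a percentage point of efficiency, which shows that the order of rounding and interpolation alone can move the result. Whether it accounts for the ~0.04 SAP drift is something only the real PCDB data and the BRE calculator can settle.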
Definitive test would be the **BRE Excel canonical calculator at
`2026-05-19-17-18 RdSap10Worksheet.xlsx`** (repo root). The xlsx is
a worked example with fixed inputs; you'd need to manually swap in
cert 0380's inputs to compute the BRE-correct η_space. Tedious but
authoritative.
## Cohort closure status (carried forward)
11 prep slices plus the closing 102f chain-test commit shipped in the
previous session for the API path:
| Slice | Commit | What it did |
|---|---|---|
| 102f-prep.1 | 7adb6c79 | PCDB Table 362 `heating_duration_code` field |
| 102f-prep.2 | a6ef1987 | Table N5 PSR interpolation (variable duration) |
| 102f-prep.3 | 4e07991f | Cold-first day allocation |
| 102f-prep.4 | c341eba9 | Equation N5 zone-mean blending leaf |
| 102f-prep.5 | 2be79056 | Wire extended-heating MIT cascade (HP-gated) |
| 102f-prep.6 | 80e528e5 | HP-gate §5 central-heating pump gains |
| 102f-prep.7 | 4eacfa62 | Table N4 fixed-duration ("24"/"16") |
| 102f-prep.8 | 1d5183c6 | API mapper shower_outlets=None → 0 mixers |
| 102f-prep.9 | 06b4ef3d | Cantilever exposed-floor detection |
| 102f-prep.10 | 24a7351f | Alt-wall opening allocation per window_wall_type |
| 102f-prep.11 | db77a7c7 | Track 6 cohort fixtures + register 7 golden pins |
| 102f | c0086660 | Layer 4 chain tests at ±0.07 spec-precision floor |
## Test baselines
```bash
PYTHONPATH=/workspaces/model python -m pytest \
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
backend/documents_parser/tests/test_elmhurst_extractor.py \
backend/documents_parser/tests/test_elmhurst_end_to_end.py \
domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
domain/sap10_calculator/worksheet/tests/test_water_heating.py \
domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
domain/sap10_ml/tests/test_rdsap_uvalues.py \
datatypes/epc/schema/tests/test_schema_loading.py \
--no-cov -q
```
Expected: **669 pass + 10 pre-existing fails** (9 cert 001479
Layer 1 hand-built skeleton + 1 pre-existing FEE).
API path probe at HEAD:
```bash
PYTHONPATH=/workspaces/model python -c "
import json
from pathlib import Path
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
doc = json.loads(Path('/workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden/0380-2471-3250-2596-8761.json').read_text())
epc = EpcPropertyDataMapper.from_api_response(doc)
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'API path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```
Should print `API path SAP: 88.5698 Δ vs 88.5104: +0.0594`.
Summary path probe (will fail catastrophically pre-fix):
```bash
PYTHONPATH=/workspaces/model python -c "
import sys
sys.path.insert(0, '/workspaces/model')
from pathlib import Path
from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _summary_pdf_to_textract_style_pages
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
pdf = Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_000899.pdf')
pages = _summary_pdf_to_textract_style_pages(pdf)
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
print(f'Summary mapper main_heating_category: {epc.sap_heating.main_heating_details[0].main_heating_category if epc.sap_heating.main_heating_details else None}')
print(f'Summary mapper main_heating_index_number: {epc.sap_heating.main_heating_details[0].main_heating_index_number if epc.sap_heating.main_heating_details else None}')
result = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
print(f'Summary path SAP: {result.sap_score_continuous:.4f} Δ vs 88.5104: {result.sap_score_continuous-88.5104:+.4f}')"
```
The diagnostic prints `main_heating_category` and
`main_heating_index_number`. The first thing to confirm is the HP
routing: if category isn't 4 or index isn't 104568, that's the
immediate fix.
## Conventions (preserved)
- One slice = one commit; stage by name.
- AAA test convention: literal `# Arrange / # Act / # Assert` headers.
- `abs(diff) <= tol` (NOT `pytest.approx`).
- 1e-4 worksheet tolerance for Summary-path Layer 4 pins (per
`feedback_zero_error_strict` — the closed boiler precedent).
Don't widen to ±0.07 like the API path until the Summary cascade
is matching at 1e-3 or better and the residual is documented.
- Spec citation in commit messages.
- Pyright net-zero per file.
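
The tolerance conventions above can be sketched as a small classifier. This is a hedged illustration: `pin_status` and its labels are hypothetical, and only the 1e-4 and 1e-3 thresholds come from the conventions themselves.

```python
STRICT_TOL = 1e-4       # Summary-path Layer 4 pin
WIDEN_THRESHOLD = 1e-3  # must be at least this close before widening is discussed


def pin_status(actual: float, expected: float) -> str:
    """Classify a residual using the abs(diff) <= tol convention."""
    diff = actual - expected
    if abs(diff) <= STRICT_TOL:
        return "pinned"
    if abs(diff) <= WIDEN_THRESHOLD:
        return "near: document the residual before widening"
    return "open"


print(pin_status(88.5104, 88.5104))  # pinned
print(pin_status(88.5698, 88.5104))  # open
```

Writing the comparison as an explicit `abs(diff) <= tol` keeps the tolerance visible in the assertion itself.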
## Pyright baselines (unchanged)
- `datatypes/epc/domain/mapper.py`: 32
- `domain/sap10_calculator/worksheet/water_heating.py`: 1
- `domain/sap10_calculator/worksheet/heat_transmission.py`: 13
- `domain/sap10_calculator/worksheet/mean_internal_temperature.py`: 0
- `domain/sap10_calculator/worksheet/internal_gains.py`: 4
- `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35
- `domain/sap10_calculator/tables/pcdb/parser.py`: 0
- `domain/sap10_ml/rdsap_uvalues.py`: 1 (pre-existing)
- `datatypes/epc/domain/epc_property_data.py`: 1 (pre-existing)
- `backend/documents_parser/elmhurst_extractor.py`: TBD — may shift
as you patch the extractor for HP support; aim net-zero per slice
but accept small upward drift if the HP-specific path adds optional
fields not yet typed.