From 7fed541efa03c7c83e79508cd8f84c41b3875198 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Tue, 26 May 2026 22:15:14 +0000 Subject: [PATCH] =?UTF-8?q?docs:=20update=20handover=20=E2=80=94=20cert=20?= =?UTF-8?q?9501=20closed,=20HP=20workstream=20still=20next?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cert 9501 (top-floor flat + RR + measured PV) is now CLOSED on both Summary and API paths at 1e-4 vs worksheet 68.5252 (Slices 99a-99e on Summary + 100a-100c on API). Three boiler certs in total now have Layer 4 production gates. Updated handover lists the 7-cert ASHP workstream (still deferred), the 8 cohort certs without worksheets (residuals tightened by Slice 100c's gap-aware DG-pre-2002 glazing lookup), and captures the 7 key learnings from cert 9501 closure as guidance for the HP workstream. Co-Authored-By: Claude Opus 4.7 --- .../docs/HANDOVER_CERT_9501_AND_HEATPUMPS.md | 289 ++++++++---------- 1 file changed, 127 insertions(+), 162 deletions(-) diff --git a/domain/sap10_calculator/docs/HANDOVER_CERT_9501_AND_HEATPUMPS.md b/domain/sap10_calculator/docs/HANDOVER_CERT_9501_AND_HEATPUMPS.md index 5bb95954..3e05df3c 100644 --- a/domain/sap10_calculator/docs/HANDOVER_CERT_9501_AND_HEATPUMPS.md +++ b/domain/sap10_calculator/docs/HANDOVER_CERT_9501_AND_HEATPUMPS.md @@ -1,168 +1,74 @@ -# Handover — Cert 9501 flat-exposure + heat-pump workstream +# Handover — Heat-pump workstream + remaining boiler audits You're picking up branch `feature/per-cert-mapper-validation` after -the cert 0330 boiler workflow landed (Layer 4 1e-4 gate GREEN on both -Summary and API paths, mirroring cert 001479). Two boiler certs are -now validated end-to-end against worksheets at 1e-4. The third boiler -cert (9501) is staged but RED at Δ -5.25 SAP because it surfaces a -new class of mapper gap: **flat-specific exposure**. +three boiler certs (001479, 0330, 9501) landed Layer 4 1e-4 chain +gates on BOTH the Summary and API paths.
The boiler workflow is now +proven on three independent shapes — house mid-terrace (001479), +house mid-terrace with single extension (0330), top-floor flat with +RR + measured PV (9501). The next pieces are the 7 ASHP certs and +the 8 cohort golden certs that don't yet have worksheets. ## State at session start -Recent commits: +Most recent commits (cert 9501 closure): ``` +7992154f Slice 100c: API path — surface PV arrays + gap-aware glazing lookup +814ae798 Slice 100b: API TFA — include per-bp RR floor area in continuous TFA +7d460183 Slice 100a: API path — surface Detailed-RR per-surface areas +0735c7e8 Slice 99e: PV pitch enum-not-degrees + cert 9501 Layer 2 chain test +4264e0ad Slice 99d: surface PV array from Elmhurst Summary §19.0 +e9575b52 Slice 99c: Elmhurst mapper — RR gables external for flats + SO wall code +2cdaefcd Slice 99b: Elmhurst mapper — flat floor-position from floor.location +a76af2ec Slice 99a: Elmhurst extractor — no attachment line for flats +158c08f1 docs: handover for cert 9501 (flat exposure) + HP workstream +5d1778ac chore: stage cert 9501 fixtures 8443c770 Slice 98: API path shower-counts + window-rounding → cert 0330 1e-4 -aa6645e3 Slice 97: API glazing_type=2 → RdSAP 10 Table 24 (DG 2002-2021) +aa6645e3 Slice 97: API glazing_type=2 → RdSAP 10 Table 24 da5e7196 Slice 96: flat-roof U-value defaults — RdSAP 10 §5.11 Table 18 col (3) -5d1778ac chore: stage cert 9501 fixtures (second boiler validation cert) -17646c8a chore: stage cert 0380 fixtures (HP pilot — deferred workstream) -460f1735 chore: stage cert 0330 fixtures (boiler pilot) ``` -Test baselines you should see (197 pass + 9 pre-existing 001479 +Test baselines you should see (429 pass + 9 pre-existing 001479 Layer 1 fails): ```bash PYTHONPATH=/workspaces/model python -m pytest \ backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ - domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ + backend/documents_parser/tests/test_elmhurst_extractor.py \ 
+ backend/documents_parser/tests/test_elmhurst_end_to_end.py \ domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ domain/sap10_ml/tests/test_rdsap_uvalues.py \ datatypes/epc/schema/tests/test_schema_loading.py \ --no-cov -q ``` -Layer 4 1e-4 gates passing: +**Layer 4 1e-4 production gates passing (3 boiler certs, dual-path):** -- `test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` -- `test_api_0330_full_chain_sap_matches_worksheet_pdf_exactly` ← landed this session -- `test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly` -- `test_summary_0330_full_chain_sap_matches_worksheet_pdf_exactly` ← landed this session -- 000477 / 000516 cohort chain tests +| Cert | Heating | Dwelling | Worksheet SAP | Summary | API | +|---|---|---|---|---|---| +| 001479 (0535-...-6222) | Mains gas boiler | Mid-terrace house | 69.0094 | ✓ | ✓ | +| 0330-2249-... | Mains gas boiler | Mid-terrace house + ext | 61.5993 | ✓ | ✓ | +| 9501-3059-... | Mains gas boiler | Top-floor flat + RR + PV | 68.5252 | ✓ | ✓ | -## Cert 9501 — staged but RED (Δ -5.25 SAP) +## Outstanding workstreams (in priority order) -Fixtures committed in `5d1778ac`: -- API JSON: `domain/sap10_calculator/rdsap/tests/fixtures/golden/9501-3059-8202-7356-0204.json` -- Summary PDF: `backend/documents_parser/tests/fixtures/Summary_000784.pdf` -- Worksheet (reference): `sap worksheets/Additional data with api/9501-3059-8202-7356-0204/dr87-0001-000784.pdf` +### 1. Heat-pump workstream — 7 ASHP certs (DEFERRED until go-ahead) -Cert shape (per worksheet header): -- Property type: **Flat, Mid-Terrace** (mid-floor — `not` top-floor as - the prior handover claimed) -- Storeys (building): 4 -- Age band: B -- TFA: 113.08 m² -- Heating: mains-gas boiler, PCDB idx 19007 (Vaillant) -- Worksheet target unrounded SAP: **68.5252** +Cert refs (per the prior handover): 0350, 0380, 2225, 2636, 3800, +9285, 9418. Predominantly PCDB index 104568 (one model 102421). 
The +mapper has never been validated against a heat-pump cert. -### Cascade-component diff (Summary path vs worksheet) +Cert 0380 fixtures are already staged (commit `17646c8a`). Original +probe showed: -``` -TFA: 113.08 = 113.08 ✓ -walls: 148.89 vs 218.81 (Δ -69.92 ← BIG — missing RR gables) -roof: 18.10 = 18.10 ✓ (Table 18 age B col-(3) + - col-(1) compound — fine) -floor: 9.25 vs 0.00 (Δ +9.25 ← FLAT GROUND-FLOOR PARTY) -windows: 25.83 = 25.83 ✓ -doors: 5.55 = 5.55 ✓ -party: 7.36 vs 0.00 (Δ +7.36 ← worksheet U_party=0 for flat) -bridges: 25.00 vs 28.39 (Δ -3.39 ← downstream of (31) shrink) -(37) tot: 239.98 vs 296.68 (Δ -56.70 ← composite) - -ECF: 2.6326 vs 2.2563 (too high; SAP too low by 5.25) -``` - -### Worksheet element decomposition (line 187-205) - -``` -Element Net Area U A x U -(26) Doors uninsulated 1 1.85 3.00 5.55 -(27) Windows 1 10.60 2.44 25.83 -(28a) Ground floor Main 67.58 0.00 0.00 ← PARTY -(29a) External walls Main 99.26 1.70 168.74 -(29a) Roof room Main Gable Wall 1 13.50 1.70 22.95 ← RR -(29a) Roof room Main Gable Wall 2 15.95 1.70 27.12 ← RR -(30) Roof room Main Flat Ceiling 1 5.50 0.19 1.045 ← RR -(30) External roof Main 42.63 0.40 17.05 -(31) Total net area = 189.29 m² -(33) Fabric heat loss = 268.28 -(32) Party walls Main 52.54 0.00 0.00 ← PARTY -(32d) Dwelling below Main 6.85 — — ← PARTY -(35) TMP = 250 -(36) Bridges (0.150 × 189.29) 28.39 -(37) Total fabric heat loss 296.68 -``` - -### Localised mapper gaps - -The Summary path's `EpcPropertyData` has these load-bearing wrong -or missing fields: - -| Field | Currently | Should be | +| Path | Cascade SAP | Δ vs worksheet 88.5104 | |---|---|---| -| `dwelling_type` | `"Number of Storeys: flat"` (mangled by extractor) | `"Flat"` | -| `built_form` | `"Number of Storeys:"` (mangled) | `"Mid-Terrace"` (or similar) | -| `sap_flat_details` | `None` | populated with the cert's flat position | -| `sap_building_parts[0].sap_room_in_roof` | likely None | populated with the RR's gable walls + flat 
ceiling areas | +| Summary mapper | 18.08 | **-70.43** (catastrophic — Summary identifies HP as 80% boiler) | +| API mapper | 70.14 | **-18.37** | -**Order of attack** (each is a slice candidate): - -1. **Fix the Elmhurst extractor's `dwelling_type` / `built_form` - parsing** for this Summary PDF format. Some other section of the - PDF is bleeding into the parsed value (the "Number of Storeys:" - prefix). The extractor's anchor for `built_form` is likely - matching too eagerly; check `ElmhurstSiteNotesExtractor`. Don't - guess — read the Summary_000784.pdf header section + compare to - what `ElmhurstSiteNotesExtractor` returns. - -2. **Populate `sap_flat_details`** in `EpcPropertyDataMapper. - from_elmhurst_site_notes`. The cascade's `_dwelling_exposure` - reads from this field (see - `domain/sap10_calculator/rdsap/cert_to_inputs.py`) to gate - floor/roof contributions per RdSAP 10 §5. For cert 9501 (mid- - floor flat), both floor (party with dwelling below) and roof - (party with dwelling above) should be excluded — but the cert - does have an RR with gable walls and flat ceiling exposed - externally, so the dwelling has SOME exposed roof. - -3. **Populate `sap_room_in_roof`** with the RR-specific geometry: - gable walls 13.50 + 15.95 m², flat ceiling 5.50 m². Worksheet - lodges these as part of the Main bp's (29a) walls + (30) roof. - Cascade reads from `sap_room_in_roof.detailed_surfaces` — - check `worksheet/heat_transmission.py` for the surfacing - convention. - -4. **Re-pin or remove cert 9501 from Layer 4 tracking** once - Summary path lands at 1e-4. The RED test was NOT committed this - session (working-tree-only) — add the equivalent of - `test_summary_0330_full_chain_sap_matches_worksheet_pdf_exactly` - for cert 9501 once the gap closes. - -### API path expected gaps (after Summary lands) - -The API JSON for cert 9501 lodges `property_type=2` (Flat) and -`built_form=NR`. 
The API mapper needs to populate `sap_flat_details` -from `floors[]` + `roofs[]` + the GOV.UK schema's flat-specific -fields. Probable additional gaps (same pattern as Summary): -- `sap_flat_details` mid-floor exposure routing -- RR detection from cert's `roofs[].description` if the cert lodges - an attic-style roof - -## Heat-pump workstream (cert 0380 + 6 sibling ASHPs) — DEFERRED - -Per the user's direction, the 7 ASHP certs are deferred until the -boiler workflow is proven. Status: - -- Cert 0380 fixtures staged in commit `17646c8a` (worksheet target - SAP 88.5104). Original probe showed catastrophic Δ -70 SAP on - Summary path and Δ -18 SAP on API path — the Summary mapper - identified the HP as an 80%-efficient boiler. -- 6 other ASHPs share PCDB index 104568 (one uses 102421) — work - is likely shared across them. - -Work sketch (from the prior handover): +The Summary mapper is fundamentally broken on heat pumps; the API +mapper is partially-broken. Likely 15-30 slice workstream. Sketch +(from the prior handover, unchanged): 1. **API mapper**: surface `main_heating_index_number`, set `main_heating_category` for HPs, `main_fuel_type=29` (electric @@ -178,45 +84,104 @@ Work sketch (from the prior handover): 5. **Summary mapper**: separate slice — needs to identify HPs from the Summary PDF's heating section. -Do NOT start HP slices without an explicit go-ahead from the user. +**Do NOT start HP slices without an explicit go-ahead from the user.** -## Conventions (preserved) +### 2. 8 cohort golden certs without worksheets + +The 8 cert refs currently in `test_golden_fixtures.py` (0240, 0300, +0390-2954, 6035, 7536, 8135, 2130, 0390-2254) are API-only with +integer SAP residual pins. Some have non-trivial residuals +(0240=-14, 0390-2954=-6, 6035=-6) that suggest mapper coverage gaps. + +If worksheets become available for any of them, migrate to Layer 4 +1e-4 chain pins (cleanest forcing function). Until then, the +residual pins are the only gate. 
+ +The recent gap-aware DG-pre-2002 glazing lookup (Slice 100c) tightened +PE / CO2 residuals on 5 of these 8 certs by surfacing the correct +spec-table U per `glazing_gap`. Other coverage gaps probably surface +similarly — gap-aware lookups for glazing_type=2 (DG 2002+) and 13 +(DG argon post-2022) are candidates the next time a residual drifts. + +### 3. Solar battery storage (user-flagged) + +User question this session: "do we handle solar battery?" — Partial +coverage: the data model has `SapEnergySource.pv_battery_count` + +`SapEnergySource.pv_batteries: Optional[PvBatteries]`, the API mapper +extracts both, but the Elmhurst Summary mapper hardcodes +`pv_battery_count=0` and doesn't parse battery details from the PDF. +The cascade's Appendix M battery-storage adjustment (PV self- +consumption fraction with battery) hasn't been audited. None of the +three closed boiler certs lodge a battery so it's not blocking — but +it's a known gap. + +## Key learnings from cert 9501 closure (replicate for HP workstream) + +1. **Two RR JSON shapes coexist**: `room_in_roof_type_1` (Simplified + Type 1, cohort certs) and `room_in_roof_details` (Detailed RR, + newer certs). The schema must model both; the API mapper picks + whichever block is populated. Slice 100a added the new dataclass + alongside the legacy one. + +2. **Two PV JSON shapes coexist**: `photovoltaic_supply` as a nested + list (cohort cert 2130) vs `{"pv_arrays": [{...}]}` dict wrapper + (cert 9501). Schema needs the `pv_arrays` field, mapper dispatcher + handles both shapes. Slice 100c. + +3. **RR floor area lives under `sap_room_in_roof.floor_area`, NOT + `sap_floor_dimensions`**: the per-bp TFA helper must add it + explicitly. Cohort certs (e.g. 0240 with 83.2 m² RR floor) were + silently dropping the RR area from TFA — Slice 100b fixed this + and tightened cohort 0240 SAP residual -15 → -14, 6035 PE + +49.51 → +47.85. + +4. 
**API `PhotovoltaicArray.pitch` is the RdSAP enum (1-5), NOT + degrees**: codes 1=0°, 2=30°, 3=45°, 4=60°, 5=90°. Summary mapper + needs `_elmhurst_pv_pitch_code` to snap-to-nearest. The wrong-by- + one-unit shift inflates PV generation ~2.5% (Slice 99e). + +5. **Glazing U-value is type+gap-aware in RdSAP 10 Table 24**: + `glazing_type=3` (DG pre-2002) has U=3.1 (6mm), 2.8 (12mm), 2.7 + (16+). 5/8 cohort certs use 16+ — flat lookup at the type-only + default U=2.8 was wrong for 5 of them. Slice 100c. + +6. **Flats with RR have external gable walls, not party walls**: + Top-floor flats sit at the building's end (no neighbour above); + the gables are exposed external (U = main-wall U) not party + (U=0.25). Threading `is_flat=True` through the RR surface + mapper picks `gable_wall_external` for un-typed gables. Slice 99c + (Summary) + Slice 100a (API). + +7. **`dwelling_type` floor-position prefix gates exposure routing**: + For flats, `_dwelling_exposure` in cert_to_inputs.py prefix- + matches "top-floor" / "mid-floor" / "ground-floor". The Elmhurst + mapper composes the position from `floor.location` ("dwelling + below" → not ground) + RR presence (→ top vs mid). Slice 99b. + +## Conventions (preserved — unchanged this session) - **One slice = one commit** — stage by name. - **AAA test convention** — literal `# Arrange / # Act / # Assert`. -- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright clean). +- **`abs(diff) <= tol`** not `pytest.approx`. - **1e-4 worksheet tolerance** when worksheet is available. -- **Spec citation** in commit messages when a slice implements a - spec rule (quote RdSAP 10 / SAP 10.2/10.3 page reference). -- **Pyright net-zero per file**. Updated baselines: - - `datatypes/epc/domain/mapper.py`: 33 +- **Spec citation** in commit messages when implementing a spec rule. +- **Pyright net-zero per file**. 
Updated baselines (Slice 100c + improved mapper.py by 1): + - `datatypes/epc/domain/mapper.py`: **32** (was 33; extracting + `_api_sap_window` resolved one) - `domain/sap10_calculator/worksheet/heat_transmission.py`: 13 - `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35 - `datatypes/epc/domain/epc_property_data.py`: 1 (pre-existing) - `domain/sap10_ml/rdsap_uvalues.py`: 1 (pre-existing) -## Tooling shortcuts (unchanged) - -- EPC fetch: `OPEN_EPC_API_TOKEN` (NOT `EPC_AUTH_TOKEN`) in - `backend/.env`. -- Worksheet SAP: `pdftotext -layout -` then grep. -- Cascade-component probe: reuse the inline pattern from this - handover's "Cascade-component diff" section above. - -## Open items / known gaps from prior session +## Open items / known gaps (carried forward) - Pre-existing `test_roof_insulated_assumed_with_ni_thickness_uses_ 50mm_per_section_5_11_4` in `test_heat_transmission.py` fails with `229.99 vs 68.0 ± 2` — verified pre-existing (stash test - showed same failure without my changes). Not addressed this - session; address separately when the §5.11.4 50mm-rule cascade - path is touched. -- 8 cohort golden certs (0240, 0300, 0390-2954, 6035, 7536, 8135, - 2130, 0390-2254) are API-only with integer SAP residual pins — - if worksheets become available for any of them, migrate to - Layer 4 1e-4 chain pins (cleanest forcing function). + showed same failure without cert 9501 changes). The §5.11.4 + 50mm-rule cascade path needs a separate audit. -Good luck. The diagnostic methodology (Summary path → worksheet 1e-4 -first, then API path catches up) is now proven on 2 boiler certs; -cert 9501 should land in ~3-5 slices once the flat-exposure plumbing -is in place. +Good luck with the HP workstream when the user gives the go-ahead. +Each cert pair has been closing in 3-5 slices using the methodology +proven over 8 slices (99a-100c) on cert 9501.