From c783a15ff1aaa0fe8e2d6aca5ffc989475e7a259 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Tue, 26 May 2026 17:36:56 +0000 Subject: [PATCH] docs: handover for per-cert mapper validation workflow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rewrites the cert 001479 closure handover into a forward-looking brief for the new workstream: validating the API EpcPropertyDataMapper against 9 newly-staged (Summary + worksheet + API) cert triples. Key contents: - User's stated workflow (verbatim): Summary path proves itself against the worksheet → becomes canonical reference for API parity. - Folder-structure changes since the prior handover were written (packages/domain/ removed; sap10_calculator + sap10_ml now at the repo root under a PEP 420 namespace; docs/sap-spec/ moved into domain/sap10_calculator/docs/; PCDB data into tables/pcdb/data/). - New test data layout: `sap worksheets/Additional data with api/ /{Summary_NNNNNN.pdf, dr87-0001-NNNNNN.pdf}`. - Cert reference table with heating type, PCDB index, worksheet SAP, TFA, bp count, dwelling type for all 9 triples. - Major scope discovery: 7 of 9 are Air Source Heat Pumps (PCDB 104568 / 102421). The mapper has never been validated against HPs; cert 0380 pilot showed catastrophic deltas (Summary -70 / API -18 SAP vs worksheet). Recommended deferring HP certs until boiler workflow is proven. - Cert 0330 (mid-terrace gas boiler) pilot status: fixtures staged uncommitted; Summary path +0.47 SAP, API path +2.15 SAP vs worksheet 61.5993. Cascade-component diff localises 2 specific gaps (windows HLC +6.71 W/K likely from glazing_type=14 missing from Slice 93's transmission map; HW kWh +1060 needs §4 subsystem probe). - Tooling shortcut: use OPEN_EPC_API_TOKEN (not EPC_AUTH_TOKEN) in backend/.env with EpcClientService._fetch_certificate(cert_ref) to fetch raw JSON. - First actions for next agent: confirm baseline, commit cert 0330 fixtures, add RED Layer 2 test, iterate. 
Lesson preserved: cohort hand-builts encode non-spec quirks (e.g. has_suspended_timber_floor=False to override §(12) spec inference and match the non-spec worksheet). Cross-check against spec-inferred mapper output before trusting hand-built fields. Co-Authored-By: Claude Opus 4.7 --- .../docs/NEXT_AGENT_PROMPT.md | 602 ++++++++++-------- 1 file changed, 333 insertions(+), 269 deletions(-) diff --git a/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md index 569eac5a..f885b9cf 100644 --- a/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md +++ b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md @@ -1,301 +1,365 @@ -# Handover — API mapper at 1e-4 on cert 001479; investigating goldens +# Handover — Per-cert validation workflow, 9 new triples staged -You are picking up branch `ara-backend-design-prd`. The cert 001479 API -path now hits the worksheet's continuous SAP 69.0094 **at < 1e-4** -(Slice 95). Layer 4 production goal is MET. Remaining work: investigate -golden cert residual outliers (especially cert 0240's -15 SAP) and -process any new (Summary + API) cert pairs the user sources. +You are picking up branch `feature/per-cert-mapper-validation` +(off main at `7fba27a7`, where the prior `ara-backend-design-prd` +work was merged via PR #1123). The user has shifted focus from +"close cert 001479 to 1e-4" (done — Slice 95) to "validate the +API mapper against more cert pairs to surface remaining mapping +gaps". 9 new (Summary + worksheet + API) triples have been +provided. The mapping is acknowledged-incomplete; expect many +mapper-completion slices. 
-## The end goal (re-confirmed by the user) +## The user's stated workflow (verbatim) -> **Production goal: `API JSON → EpcPropertyDataMapper.from_api_ -> response → SAP10 calculator → SAP rating` must match the SAP value -> the calculator emitted at lodge time to within 1e-4.** +> we pick one [cert], we then pass the Elmhurst summary document to +> `EpcPropertyDataMapper` to map the site notes data to +> `EpcPropertyData`, we then pass to the SAP calculator. If the +> output of the SAP calculator matches the SAP worksheet correctly, +> we know we have correctly mapped the EpcPropertyData. We then get +> the API response, map to `EpcPropertyData` using +> `EpcPropertyDataMapper`, then check if we have the same +> `EpcPropertyData` as the summary report (or same for the fields we +> care about). We also check we get the same result. > -> The acceptance tolerance is **1e-4 against the worksheet's -> continuous SAP value**, not ±0.5 against the published integer. -> ±0.5 only applies when no worksheet is available (the 8 cohort -> golden certs we have as API-only); when we have both API + worksheet -> (cert 001479), the 1e-4 bar is the bar. +> The `EpcPropertyData` objects matching is our signal that we've +> done things correctly. So this validates our mapping. -The earlier handover stated ±0.5 — that was wrong. The user -emphasised this twice: the calc is mechanical, identical inputs must -produce identical outputs, so when we have the continuous worksheet -value we should hit it exactly. See the conversation thread that led -to Slice 87. +Translation: Summary path proves itself against the worksheet → +becomes the canonical reference for the API path. This is Layer 2 + +Layer 3 + Layer 4 of the validation stack. 
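The workflow quoted above can be sketched as three checks per cert. This is a minimal illustration only: `ToyEpc`, `field_diffs`, `validate_cert`, and `toy_calc` are hypothetical stand-ins, not the real `EpcPropertyData`, mapper, or SAP calculator, and the numbers in the demo are made up.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Optional

TOL = 1e-4  # worksheet tolerance used in this handover


@dataclass
class ToyEpc:
    """Hypothetical two-field stand-in for EpcPropertyData."""
    total_floor_area_m2: float
    glazing_type: Optional[int]


def field_diffs(mapped: Any, reference: Any) -> list:
    """Return names of dataclass fields whose values differ."""
    return [
        f.name
        for f in fields(reference)
        if getattr(mapped, f.name) != getattr(reference, f.name)
    ]


def validate_cert(summary_epc: Any, api_epc: Any,
                  calc: Callable[[Any], float], worksheet_sap: float) -> dict:
    """Run the Layer 2 / Layer 3 / Layer 4 checks for one cert."""
    summary_sap = calc(summary_epc)
    api_sap = calc(api_epc)
    return {
        # Layer 2: Summary path reproduces the worksheet.
        "layer2_summary_matches_worksheet": abs(summary_sap - worksheet_sap) <= TOL,
        # Layer 3: API-mapped data equals Summary-mapped data, field by field.
        "layer3_field_diffs": field_diffs(api_epc, summary_epc),
        # Layer 4: API path reproduces the worksheet.
        "layer4_api_matches_worksheet": abs(api_sap - worksheet_sap) <= TOL,
    }


def toy_calc(epc: ToyEpc) -> float:
    """Made-up calculator: a missing glazing type costs 2 SAP points."""
    return 60.0 if epc.glazing_type is not None else 58.0


report = validate_cert(
    summary_epc=ToyEpc(69.14, 14),
    api_epc=ToyEpc(69.14, None),
    calc=toy_calc,
    worksheet_sap=60.0,
)
print(report)
```

In this toy run the Summary path passes, the API path fails, and the field diff names the field responsible, which is the localisation signal the workflow relies on.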
-## Validation layers (current state) +## State at session start (this handover's baseline) + +Most recent commits (`sap10_calculator` + `sap10_ml` are now at the +repo root; `packages/domain/src/domain/` was removed): ``` -Layer 4: API mapper cascade SAP = worksheet SAP at 1e-4 (production goal) - └── Layer 3: API mapper EpcPropertyData ≡ Elmhurst mapper EpcPropertyData - └── Layer 2: Elmhurst-mapped EpcPropertyData → cascade SAP = worksheet SAP at 1e-4 - └── Layer 1: hand-built EpcPropertyData → cascade SAP = worksheet SAP at 1e-4 +6dc11e4d fix: resolve 10 remaining test_summary_pdf_mapper_chain failures +09fb6f1b fix: address 22 project-wide test failures from previous sweep +a7b08a4e refactor: move docs/sap-spec/ contents into domain/sap10_calculator/ +960130b0 deleted redundant packages folder +68401c51 refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml +29ac35cc refactor: lift-and-shift packages/domain/src/domain/sap → domain/sap10_calculator +... (87b6045c "fixed merge conflicts from main", 168e7f18, 94975f3b deletions) +a75052dc chore: commit cert 001479 fixture + RdSAP/PCDF spec PDFs +f502db8c Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding ``` -| Layer | Status | -|---|---| -| **1 — hand-built cascade pin** | ✅ 6 cohort certs (000474, 000477, 000480, 000487, 000490, 000516) GREEN at 1e-4; cert 001479 hand-built skeleton (Slice 62) still RED (2 of 11 pins green, hand-built has its own bugs — orthogonal to the production path) | -| **2 — Elmhurst-mapped path** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 89); cohort: 2 GREEN (000477, 000516), 4 RED (000474, 000480, 000487, 000490 — Elmhurst U985 worksheets violate the RdSAP 10 §5 (12) spec; orthogonal to the production goal) | -| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match at 1e-4 (Slice 95); field-level diff test not yet written but lower priority since cascade-output gate exists | -| **4 — API path cascade SAP** | ✅ **Cert 001479 
GREEN at 1e-4** (Slice 95). `test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` formalises the gate. 8 other golden certs pinned at residual-from-integer at tolerance 0 | - -## Cumulative API SAP delta progression (cert 001479) - -The big breakthrough: implementing the RdSAP 10 §5 (12) spec rule -(`Floor infiltration (suspended timber ground floor only)` — page 29 -of `domain/sap10_calculator/docs/specs/RdSAP 10 Specification 10-06-2025.pdf`) revealed a -series of API-mapper coverage gaps that all needed fixing for the -spec rule's premise to be met. Each slice closed one gap: - -| Slice | Fix | API SAP delta | -|---|---|---| -| baseline | broken party wall enum, no descriptive strings | **+3.0752** | -| 87 | RdSAP 10 §5 (12) spec rule + Elmhurst-mapper switch to None | — | -| 88 | thread `bp.floor_construction_type` into `u_floor` cascade | — | -| 89 | PS pitched-sloping-ceiling roof area `÷ cos(30°)` (added `roof_construction_type` field on `SapBuildingPart`) | — | -| 90 | API `party_wall_construction` enum → SAP10 `u_party_wall` codes (1→3 Solid, 2→4 Cavity, etc.) 
| +1.5298 | -| 91 | descriptive strings via int→str lookups (`floor_construction_type`, `roof_construction_type`) + pre-1950 PS sloping → thickness=0 + per-bp roof description fix | +1.0970 | -| 92 | upper-floor `room_height_m += 0.25` + `is_exposed_floor` from `floor_heat_loss==1` + `floor_insulation_thickness="NI"→None` | +1.0022 | -| 93 | `window_transmission_details` from `glazing_type` int (code 3 → U=2.8/g=0.76, code 13 → U=1.4/g=0.72) | +1.1846 | -| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | +0.0006 | -| 95 | API mapper `total_floor_area_m2` = Σ per-bp dims (worksheet-precise 68.51 not lodged-rounded 69) + RdSAP 10 §15 p.66 window 2dp area rounding in solar_gains/internal_gains | **< 1e-4** | - -Fabric breakdown for cert 001479 API path is now COMPLETELY EXACT -(all 6 components match worksheet to 4 d.p.): - -| Component | Cascade | Worksheet target | -|---|---|---| -| walls | 39.7652 | 39.7652 ✓ | -| party walls | 17.0700 | 17.0700 ✓ | -| roof | 10.3438 | 10.3438 ✓ | -| floor | 23.1705 | 23.1705 ✓ | -| windows | 43.5962 | 43.5962 ✓ | -| doors | 5.5500 | 5.5500 ✓ | -| **fabric total** | **139.4957** | **139.4957 ✓** | - -## What's left (queue, in priority order) - -### 1. Close cert 001479's residual 0.0006 SAP gap (1-3 slices) - -The remaining gap is non-fabric. 
Diff against the Summary path's -intermediate cascade values (which lands at 1e-4 GREEN): +Folder structure post-migration: ``` -Σ internal_gains_monthly_w: API 5339.27 Sum 5313.55 delta +25.72 -Σ solar_gains_monthly_w: API 5510.10 Sum 5508.60 delta +1.50 -Σ mean_internal_temp_monthly_c: API 214.87 Sum 213.51 delta +1.35 -Σ monthly_infiltration_ach: API 8.95 Sum 10.91 delta -1.96 -hot_water_kwh_per_yr: API 2365.00 Sum 2358.31 delta +6.69 +domain/ (PEP 420 namespace; no __init__.py) +├── addresses/, postcode.py, tasks/ +├── sap10_calculator/ ← was packages/domain/src/domain/sap/ +│ ├── calculator.py, climate/, rdsap/, tables/, validation/, worksheet/ +│ ├── docs/ ← was docs/sap-spec/ +│ │ ├── HANDOVER_NEXT.md, SAP_CALCULATOR.md +│ │ ├── NEXT_AGENT_PROMPT.md ← this file +│ │ └── specs/ ← RdSAP 10, SAP 10.2 + 10.3, PCDF spec PDFs +│ └── tables/pcdb/data/ ← pcdb10.dat + 7× pcdb_table_*.jsonl +└── sap10_ml/ ← was packages/domain/src/domain/ml/ ``` -Specifically: -- **Infiltration is still under by ~2 ACH/year**. The (12) spec rule - applies on both paths now (after Slice 87), so it's something else - — possibly `has_draught_lobby` (API=None, Summary=False; cascade - treats both as False so it shouldn't matter; verify) or `(13) - draught_lobby_ach`. Or storey count. Probe with - `ventilation_from_cert(api_mapped)` vs `ventilation_from_cert(sum_ - mapped)`. -- **HW kWh +6.7** suggests a small Appendix J §1a occupancy - difference, or a different Tcold series, or shower outlets. -- **Internal gains +25.7 W·months** — probably a pumps_fans count or - lighting bulb count mismatch. +`Path(__file__).parents[N]` indices were rebased through the move +(delta of 3); see `Dockerfile.test` (poppler-utils now installed for +test_summary_pdf_mapper_chain.py). 
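As a worked check of the "delta of 3", the sketch below counts how many `parents[...]` hops separate a module from the repo root before and after the move. It assumes `/workspaces/model` is the repo root (as in the `PYTHONPATH` used in this handover) and uses `calculator.py` purely as an example file; its pre-move location is an assumption based on the lift-and-shift described above.

```python
from pathlib import PurePosixPath

# Assumed repo root, taken from the PYTHONPATH used in this handover.
repo_root = PurePosixPath("/workspaces/model")

# Example module before and after the lift-and-shift (illustrative paths).
old = PurePosixPath("/workspaces/model/packages/domain/src/domain/sap/calculator.py")
new = PurePosixPath("/workspaces/model/domain/sap10_calculator/calculator.py")

# Index N such that path.parents[N] is the repo root.
old_n = list(old.parents).index(repo_root)
new_n = list(new.parents).index(repo_root)

print(old_n, new_n, old_n - new_n)  # 5 2 3
```

Any `parents[N]` that previously resolved to the repo root from inside the calculator package therefore needs `N - 3` after the move.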
+ +## Test baselines you should see at HEAD `6dc11e4d` -Run the diff probe (the one from the conversation) to localise: ```bash -PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src python -c " -from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _diff_load_bearing, _LOAD_BEARING_FIELDS, _summary_pdf_to_textract_style_pages -from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor -from datatypes.epc.domain.mapper import EpcPropertyDataMapper -import json, dataclasses -from pathlib import Path - -api = json.loads(Path('/workspaces/model/domain/sap10_calculator/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json').read_text()) -api_mapped = EpcPropertyDataMapper.from_api_response(api) -pages = _summary_pdf_to_textract_style_pages(Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_001479.pdf')) -sn = ElmhurstSiteNotesExtractor(pages).extract() -sum_mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(sn) -diffs = [] -for f in _LOAD_BEARING_FIELDS: - diffs.extend(_diff_load_bearing(getattr(api_mapped, f, None), getattr(sum_mapped, f, None), f)) -print(f'{len(diffs)} load-bearing divergences') -for d in diffs[:40]: print(f' {d}') -" +PYTHONPATH=/workspaces/model python -m pytest \ + backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ + domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ + domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ + --no-cov -q +# Expect: 17/0 in mapper-chain + Layer 1 baseline + golden residual baseline ``` -(NB: the original `_diff_load_bearing` was written for cohort -diff tests; the helper signature is `mapped, hand_built, path` — pass -api_mapped as `mapped` and sum_mapped as `hand_built` to surface API -gaps.) +Wider domain sweep (1654 / 20 baseline): 9 hand-built 001479 +skeleton + 10 cohort Layer 1 pins + 1 heat_transmission edge case += 20 RED, all pre-existing and orthogonal to mapper work. -### 2. 
Layer 3 — write the API ≡ Elmhurst diff test (1 slice) +**Layer 4 production gate**: +`test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` — +**GREEN at < 1e-4**. Keep it green. -Add `test_from_api_response_matches_from_elmhurst_site_notes_001479` -in `backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`, -mirroring the cohort `test_from_elmhurst_site_notes_matches_hand_ -built_NNNNNN` pattern. Use `_diff_load_bearing` with `_LOAD_BEARING_ -FIELDS`. This formalises Layer 3 as a 1e-4 gate (zero load-bearing -divergences between the two mapper outputs). +## The new test data -This test will start RED with the residual diffs from step 1; closing -those slices brings it to GREEN. +Location: `sap worksheets/Additional data with api//` -### 3. More cert pairs (user is sourcing — pause for new data) +Each folder is named by the GOV.UK EPB certificate number. Contains: -The user has agreed to source 2-3 more (Elmhurst worksheet + GOV.UK -API JSON) pairs to validate the mapper isn't 001479-overfit. -Suggested diversity: +- `Summary_NNNNNN.pdf` — Elmhurst-format site notes +- `dr87-0001-NNNNNN.pdf` — worksheet (`dr87-` prefix is a Domna-tool + variant; same shape as the `P960-` worksheet for cert 001479) -- **Detached + RR** (would fix cert 0240's -14 residual which has a - Type-1 RR the mapper doesn't extract). -- **Mid-terrace with cavity-filled party walls** (API party_wall_ - construction=3 → spec U=0.2; currently mapped to SAP10 code 4 - which gives U=0.5; needs cascade extension at - `u_party_wall`). -- **Flat / maisonette** (party wall U=0 path; cert 9390 is one but - no worksheet). -- **Different age band** (E, J, K, L) to exercise the (12) spec - rule's age boundaries. +The API JSON is **not** in the folder — fetch from GOV.UK EPB using +the cert-ref: -Each new pair lands as a 1e-4 cascade-pin test. Pattern: ~3-5 new -mapper bugs per cert pair (similar to Slice 87-94 on 001479). Each -becomes its own slice. 
Stage by name; one slice = one commit. - -### 4. Investigate goldens with shifted residuals after Slices 87-95 - -Slices 87-94 shifted residuals on 7 of 10 API-only golden certs; -Slice 95 (precise TFA + window 2dp area rounding) shifted 5 more -(0240, 6035, 8135, 2130, 0390-2254). All residuals are re-pinned. -Current outliers and what we now know: - -- **0240** (-15 SAP, +17.8 PE): Detached age J + RR + 11 windows. The - earlier handover claim of "RR mapper gap" is **partly stale**: - - `room_in_roof_type_1.gable_wall_length_1/2` ARE extracted by the - 21.0.1 mapper (see mapper.py:1349-1369 — must have landed in - Slices 71-86). Cert 0240's RR cascades through with floor_area= - 83.2, gables 6.4 + 6.4, age J → U_RR = 0.30 W/m²K. - - `'Roof room(s), insulated (assumed)'` description NOT parsed — - but the spec basis for parsing it is unclear: age J's Table 18 - col(4) default already models insulation (U=0.30), and unlike - the regular-roof "insulated (assumed)" → 50 mm bucket rule - (RdSAP §5.11.4), no equivalent rule for RR has been identified. - - The -15 SAP residual is a mix, not a single RR gap. Subsystem - breakdown for cert 0240 (via cert_to_inputs cascade): - - walls 22.95, party 0, roof 76.93 (incl RR ~18.5), floor 29.43, - windows 41.55, doors 11.10, bridging 39.64; total HLC 221.6 W/K - - **windows_w_per_k = 41.55 is the most leverageable**: 11 - windows × 18.28 m² × U_default ≈ 2.27 W/m²K. Cert lodges - `glazing_type=2` for all windows but Slice 93's - `_API_GLAZING_TYPE_TO_TRANSMISSION` only covers codes 3 and 13; - surfacing code 2 would land a measurable U (likely ~1.8-2.0) - and close several W/K of fabric loss. - - Other potential gains: BP[0] non-RR ceiling lodges "Pitched, - 400+ mm loft insulation" (should U ~0.10); verify cascade - gives it that. - - **Net**: cert 0240 is not a single-slice fix; it's 3-5 - progressive mapper improvements (glazing_type 2 surfacing, - possibly more glazing codes, possibly RR description nuance). 
-- **0390-2954** (-6 SAP, -26.5 PE): large detached F (TFA 360), oil - PCDB-listed. Undocumented. PE going more negative than SAP suggests - the cost cascade is hitting harder than energy — possibly oil - price/efficiency interaction. -- **6035** (-6 SAP, +49.5 PE): mid-terrace age A + RR. Probably has - the same glazing_type-default-U issue as 0240 plus an age-A- - specific gap. - -### 5. (deferred) Cohort chain test RED triage - -4 cohort chain tests (000474, 000480, 000487, 000490) are RED -because the Elmhurst U985 worksheets emit (12) values that don't -follow RdSAP 10 §5 — see the conversation re: identical Summary §9 -lodgements producing different worksheet (12) for cohort 000477 vs -000480. The cascade is now spec-correct; the Elmhurst tool isn't. -Options: (a) mark as known-Elmhurst-non-spec, (b) add per-cert -override field, (c) wait for more cert pairs to confirm pattern. -**Not blocking the production goal.** - -## Key conventions (project memory) - -- **AAA test convention** — every new test uses literal `# Arrange / - # Act / # Assert` headers. -- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright partial- - unknown). -- **One slice = one commit** — stage by name (`git add `). -- **1e-4 tolerance** for the worksheet-comparable paths (Elmhurst - Summary + API both have worksheets for cert 001479). No widening, - no xfail. -- **Strict pyright net-zero** per file. Baselines: `mapper.py` 33, - `heat_transmission.py` 13, `cert_to_inputs.py` 35, - `epc_property_data.py` 0. -- **Spec citation in commit messages** — when a slice implements a - spec rule, quote the spec text (RdSAP 10 page reference). User - asked us to confirm against docs. - -## Cached artefacts - -- `domain/sap10_calculator/rdsap/tests/fixtures/golden/0535- - 9020-6509-0821-6222.json` — API JSON for cert 001479 (RdSAP-Schema- - 21.0.1). -- `backend/documents_parser/tests/fixtures/Summary_001479.pdf` — - Elmhurst site-notes PDF for cert 001479. 
-- `sap worksheets/lodged example/P960-0001-001479.pdf` — Domna's - worksheet output for cert 001479 (Continuous SAP 69.0094). -- `sap worksheets/U985-0001-NNNNNN.pdf` × 6 — cohort Elmhurst - worksheets (000474, 000477, 000480, 000487, 000490, 000516). -- `sap worksheets/U985-0001-NNNNNN.txt` × 6 — text exports of above. - -## Recent slice history (Slices 87-95, current branch) - -``` -f502db8c Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding — cert 001479 to 1e-4 -03203418 Slice 94: API mapper sheltered_sides + floor_type — cert 001479 to 1e-3 -7281b7b3 Slice 93: API mapper window_transmission_details from glazing_type -8e752e57 Slice 92: API mapper floor dimensions (SAP +0.25m + exposed-floor + NI→None) -2cebba28 Slice 91: API mapper descriptive strings + roof description per-bp fix -fbbdca49 Slice 90: API mapper translates party_wall_construction → SAP10 enum -006e9842 Slice 89: PS pitched-sloping-ceiling roof area uses inclined surface -c40679d1 Slice 88: thread bp.floor_construction_type into u_floor cascade -aff331ff Slice 87: implement RdSAP 10 §5 (12) spec rule for suspended timber floor -2d3355ee Slice 86: 1:1 windows expansion in cohort 000516 (2 → 5 entries) -f863598d Slice 85: bulk-update cohort 000516 hand-built for Cat A diff parity +```python +from backend.epc_client.epc_client_service import EpcClientService +from dotenv import load_dotenv +import os +load_dotenv('/workspaces/model/backend/.env') # OPEN_EPC_API_TOKEN +svc = EpcClientService(auth_token=os.environ['OPEN_EPC_API_TOKEN']) +raw = svc._fetch_certificate('') # raw JSON dict ``` -Earlier slice context (71-86 closed cohort Layer 2) is in the prior -handover at commit `86eff23f` (`domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md` -before this rewrite). +Note: use `OPEN_EPC_API_TOKEN` not `EPC_AUTH_TOKEN` (the latter is +for a different/legacy API). -## First action +### 9 cert references + heating type + worksheet SAP -1. 
Confirm branch state — Slice 95 (`f502db8c`) closed cert 001479 to - < 1e-4 (was +0.0006 after Slice 94). Layer 4 is GREEN. -2. Run the full sweep: +| Cert ref | Worksheet | Heating | PCDB idx | Worksheet SAP | TFA | bps | Dwelling | +|---|---|---|---|---|---|---|---| +| `0330-2249-8150-2326-4121` | 000897 | **Mains gas boiler** | 10241 | 61.5993 | 69.14 | 2 | Mid-terrace house | +| `0350-2968-2650-2796-5255` | 000903 | ASHP | 104568 | 84.1367 | 90.54 | 2 | Mid-terrace house | +| `0380-2471-3250-2596-8761` | 000899 | ASHP | 104568 | 88.5104 | 60.43 | 1 | Semi-detached bungalow | +| `2225-3062-8205-2856-7204` | 000900 | ASHP | 104568 | 88.7921 | 82.49 | 1 | End-terrace house | +| `2636-0525-2600-0401-2296` | 000901 | ASHP | 104568 | 86.2641 | 82.10 | 1 | Mid-terrace house | +| `3800-8515-0922-3398-3563` | 000898 | ASHP | 104568 | 86.1458 | 81.34 | 2 | Mid-terrace house | +| `9285-3062-0205-7766-7200` | 000902 | ASHP | 104568 | 84.1369 | 85.90 | 1 | End-terrace house | +| `9418-3062-8205-3566-7200` | 000896 | ASHP | 102421 | 84.6305 | 74.37 | 3 | End-terrace house | +| `9501-3059-8202-7356-0204` | (RR cert — newest, added late in session) | **Mains gas boiler** | 19007 | (not measured) | — | — | Top-floor flat | + +**Heating-type split**: +- 2 mains gas boilers: 0330, 9501 (validated mapper territory) +- 7 ASHPs: 0350, 0380, 2225, 2636, 3800, 9285, 9418 (**brand-new + mapper territory — never validated**) + +One earlier mismatch — cert 0330's folder originally held the wrong +property's Summary/worksheet (17 vs 21 Summerfield Road); the user +fixed mid-session and Summary_000897/dr87-0001-000897 now match +cert ref 0330 correctly. The other 8 were audited and match. + +## Major scope discovery — Heat Pumps + +7 of the 9 new certs are Air Source Heat Pumps (predominantly PCDB +index 104568, one model 102421). The mapper has never been +validated against a heat-pump cert — cohort certs + cert 001479 are +all mains-gas boilers. 
+ +**Cert 0380 (initial pilot attempt) showed catastrophic failures**: + +| Path | Cascade SAP | Δ vs worksheet 88.5104 | +|---|---|---| +| Summary mapper | 18.08 | **-70.43** | +| API mapper | 70.14 | **-18.37** | + +Diff: Summary identified the heat pump as an 80%-efficient boiler +(catastrophic); API correctly identified it as a heat pump with +COP=2.3 but cascade output still −18 SAP below worksheet (fabric +HLC 104 vs probably ~50 needed). The Summary mapper is +fundamentally broken on heat pumps; the API mapper is +partially-broken. + +**Recommendation**: defer the heat-pump certs until the boiler +workflow is proven. Closing 7 ASHP certs is plausibly a 15-30 slice +workstream (new mapper plumbing for PCDB COP, electric tariff +costing for HW + space heating, Appendix N heat-pump efficiency +adjustments, etc.). Cert 0380 (smallest TFA bungalow, single bp) +is the pilot HP cert once boiler workflow is proven. + +## Pilot status — cert 0330 (mains-gas mid-terrace boiler) + +Same shape as cert 001479 (proven). API JSON staged at +`domain/sap10_calculator/rdsap/tests/fixtures/golden/ +0330-2249-8150-2326-4121.json` (**uncommitted**). Summary PDF +copied to +`backend/documents_parser/tests/fixtures/Summary_000897.pdf` +(**uncommitted**). 
+ +### Cascade SAP comparison + +| Path | Cascade SAP | Δ vs worksheet 61.5993 | +|---|---|---| +| Summary mapper | 62.0660 | **+0.4667** (just under 0.5) | +| API mapper | 63.7446 | **+2.1453** (≥2 SAP off) | +| Δ API↔Summary | +1.6786 | (mapper paths disagree) | + +### Cascade-component diff (API vs Summary) + +``` +TFA: 90.56 = 90.56 ✓ +storeys: 2 = 2 ✓ +HLC walls: 113.535 ≈ 113.520 (Δ +0.015 — negligible) +HLC roof: 7.323 = 7.323 ✓ +HLC floor: 30.705 = 30.705 ✓ +HLC windows: 36.455 vs 29.741 (Δ +6.71 ← BIG) +HLC doors: 11.100 = 11.100 ✓ +HLC party: 11.357 = 11.357 ✓ +HLC bridge: 28.347 = 28.347 ✓ +HLC total: 238.822 vs 232.093 (Δ +6.73 — all from windows) +Inf ACH: 0.7382 = 0.7382 ✓ +HW kWh: 3172.65 vs 2112.00 (Δ +1060 ← BIG) +Lighting kWh: 207.92 = 207.92 ✓ +Main eff: 0.8850 = 0.8850 ✓ +``` + +Two specific gaps to investigate as separate slices: + +1. **Windows HLC +6.71 W/K** — likely `glazing_type=14` (cert 0330) + not in Slice 93's `_API_GLAZING_TYPE_TO_TRANSMISSION` (only codes + 3 and 13 are mapped). Same shape as cert 0240's + `glazing_type=2` issue; extending the dict should close this. + Affects multiple certs that use code 14. + +2. **HW kWh +1060 (API 3172 vs Summary 2112)** — substantial + divergence in §4 hot water cascade. Needs probe of which + subsystem (occupancy N, shower outlets, electric_shower_count, + cylinder, etc.) the API mapper is reading wrong. Cert 0330 + doesn't have the +0.25m upper-storey adjustment quirk cert 001479 + needed (Slice 92), so different root cause likely. + +(The user observed: "the mapping is very much incomplete (hence we +have some non 0 matches to elmhurst summary matches)" — non-1e-4 +matches are expected and tractable.) + +### 116 field-level divergences (API vs Summary) + +Most are cascade-equivalent surfacing differences (Slice 91-era +descriptive strings + int/None vs explicit-bool patterns) — the +same shape `_is_excluded_path` already handles for the cohort +certs. 
New specific concrete diffs that DO affect the cascade: + +- `sap_windows[*].window_transmission_details` — Summary has + explicit U/g/data_source; API has None for `glazing_type=14` + (cascade falls back to default U → too high) +- `sap_windows[*].frame_factor` — Summary 0.7, API None +- `sap_windows[*].window_width / window_height` — same w*h area + rounding pattern as cert 001479 (handled in Slice 95) + +## Workflow recommendation for next slice queue + +For each new cert (after cert 0330 pilot lands): + +1. **Stage**: fetch API JSON, copy Summary PDF into fixtures +2. **Probe**: run the cascade-component diff (recreate the inline + pattern; the probe takes both `summary_epc` and `api_epc`, lowers + via `cert_to_inputs`, diffs each subsystem) +3. **Localise** the biggest cascade-component delta +4. **Fix** the mapper to close it; one fix = one slice +5. **Add Layer 4 1e-4 test** when both Summary and API paths hit + worksheet at 1e-4 (cert may pass Summary path first, then + iterate API mapper to catch up) +6. **Commit**: stage by name (`git add `), cite spec page + when implementing a spec rule + +### Cohort-style fixture pattern + +If a cert benefits from a hand-built fixture (Layer 1), mirror the +cohort pattern at +`domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py` +— with prefix `_dr87_worksheet_NNNNNN.py` for the new Domna-tool +worksheet variant. + +**WARNING (lesson from previous session)**: the cohort hand-builts +encode non-spec quirks (e.g. `has_suspended_timber_floor=False` to +mirror the worksheet's non-spec §(12) behaviour for 4 certs). Don't +blindly trust the hand-builts as spec-correct; cross-check against +the mapper's spec-inference output before committing. + +## Conventions (preserved from previous handover) + +- **One slice = one commit** — stage by name. +- **AAA test convention** — literal `# Arrange / # Act / # Assert` + headers in every new test. +- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright clean). 
+- **1e-4 worksheet tolerance** when worksheet is available; ±0.5 + fallback only for API-only goldens. +- **Spec citation** in commit messages when a slice implements a + spec rule (quote RdSAP 10 / SAP 10.2/10.3 page reference). +- **Pyright net-zero per file**. Baselines (re-verify at session + start): + - `datatypes/epc/domain/mapper.py`: 33 + - `domain/sap10_calculator/worksheet/heat_transmission.py`: 13 + - `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35 + - `datatypes/epc/domain/epc_property_data.py`: 0 + +## First actions for the next agent + +1. Confirm HEAD: `git log --oneline -1` → `6dc11e4d`. +2. Re-baseline: ```bash - PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src \ - python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ - domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ - domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ - --no-cov -q + PYTHONPATH=/workspaces/model python -m pytest \ + backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ + domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ + domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ + --no-cov -q ``` - Expect **99 passed / 19 failed**. All 19 failures pre-existing: - 9× hand-built 001479 skeleton (`test_sap_result_pin[001479-*]`), - 6× cohort diff (`test_from_elmhurst_site_notes_matches_hand_built_*`), - 4× cohort chain (000474/000480/000487/000490 — Elmhurst non-spec). -3. Production goal is met for cert 001479. Next work focuses on the - golden cert residual outliers (§4 above) and new (Summary + API) - cert pairs from the user. The diff-probe methodology from Slice 95 - (cascade-component diff API vs Summary path; localise; fix mapper) - works for any new (Summary + API) pair — worksheet not required - when Summary path is established as canonical. -4. 
Don't lose sight of Layer 4: **API → SAP within 1e-4 of worksheet - continuous on cert 001479** is the production goal. **MET as of - Slice 95** — `test_api_001479_full_chain_sap_matches_worksheet_pdf_ - exactly` formalises this gate. +3. Pick up cert 0330 pilot. Either continue from where I left off + (fixtures staged uncommitted, 2 specific gaps identified above) + OR pivot to a different boiler cert if 0330 turns out + problematic (cert 9501 is the other boiler — top-floor flat with + PCDB idx 19007). +4. Commit cert 0330's fixtures (API JSON + Summary PDF) as the + foundation slice before working any mapper fixes: + ```bash + git add domain/sap10_calculator/rdsap/tests/fixtures/golden/0330-2249-8150-2326-4121.json + git add backend/documents_parser/tests/fixtures/Summary_000897.pdf + git commit -m "chore: stage cert 0330 fixtures (boiler pilot, worksheet SAP 61.5993)" + ``` +5. Add a RED Layer 2 test (Summary mapper cascade SAP at 1e-4 + vs 61.5993) — establishes the failing target. Then fix the + Summary path mapper bugs slice-by-slice. +6. Once Summary path is GREEN, do the same for the API path (Layer + 4). The API mapper may need additional fixes Summary doesn't + need — they're independent paths into the same `EpcPropertyData` + shape. +7. After cert 0330 lands as a clean Layer 4 1e-4 pin, repeat for + cert 9501 (the other boiler). 2 boiler certs proven is much + stronger evidence than 1. +8. Then plan the heat-pump workstream. Six of the 7 ASHP certs share + PCDB index 104568 (cert 9418 uses 102421), so much of the fix is likely shared. Write + a follow-up handover for that workstream specifically. -The user is sourcing more cert pairs in parallel; when they arrive, -each one will surface ~3-5 mapper bugs along the same pattern as -Slices 87-95. 
The diagnostic methodology (diff Summary-mapper vs -API-mapper; localise by cascade component; fix the API mapper to -mirror the Summary's surfacing) works for any new (Summary + API) -pair — worksheet not required when Summary path is canonical (cert -001479 proves it is). +## Heat-pump workstream sketch (deferred) + +When the user gives the go-ahead, work order: + +1. **API mapper**: surface `main_heating_index_number`, set + `main_heating_category` for HPs, `main_fuel_type=29` (electric + heat pump). +2. **Cascade**: ensure `cert_to_inputs._main_heating_efficiency` + reads PCDB HP COP correctly. Investigate Table 4a/4b vs PCDB + precedence for HPs. +3. **Fuel cost**: HW + space heating on electricity tariffs + (Table 12) — check if the cascade has electric-tariff fuel-cost + plumbing wired up. +4. **Appendix N**: HP-specific efficiency adjustments (climate + + flow temperature). Likely the biggest cascade-side gap. +5. **Summary mapper**: separate slice — needs to identify HPs from + the Summary PDF's heating section. + +## Open items / known gaps not yet addressed + +- 8 API-only golden cert residuals still range from 0 to -15 SAP + delta (cert 0240 is the outlier — see prior handover §4 and + `test_golden_fixtures.py` notes). The user's stated end goal is + <0.5 SAP error on all goldens; cert 0240 needs RR-description + parsing (or Room-in-Roof mapping investigation) + glazing_type=2 + surfacing. +- Layer 3 field-parity test + (`test_from_api_response_matches_from_elmhurst_site_notes_001479`) + still not written. Lower priority since cascade-output Layer 4 + already gates parity. +- The 4 cohort chain tests for non-spec §(12) certs were deleted + this session; if the user later sources spec-compliant + worksheets for 000474/000480/000487/000490, those tests can be + restored (with the spec-correct hand-builts). + +## Tooling shortcuts + +- **EPC fetch**: `OPEN_EPC_API_TOKEN` (NOT `EPC_AUTH_TOKEN`) in + `backend/.env`. 
`EpcClientService._fetch_certificate(cert_ref)` + returns the raw JSON dict. +- **Worksheet SAP extract**: `pdftotext -layout -` + then `grep -E "SAP value\s+[0-9]+\.[0-9]+"`. Works for all + `dr87-`, `P960-`, and `U985-` worksheet variants. +- **Cascade-component probe template**: see the cert-0330 probe + inline above; same shape as the cert-001479 probe. + +Good luck. The methodology is proven on cert 001479 and partially +on cert 0330 (boiler pilot 95% closed). Each new cert pair should +land in 1-5 mapper slices. Stage by name; one slice = one commit; +cite spec when implementing a spec rule.
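For the cascade-component probe listed under Tooling shortcuts, the diffing step might look like the sketch below. It is a minimal illustration, not the probe used in the session: `component_diff` is a hypothetical helper, the dict keys are free-form labels, and the sample values are a subset copied from the cert 0330 table in this handover. Lowering each `EpcPropertyData` via `cert_to_inputs` into such dicts is left out.

```python
def component_diff(api: dict, summary: dict, tol: float = 1e-4) -> list:
    """Return (name, api - summary) for components that differ by more than tol,
    largest absolute delta first."""
    rows = []
    for name, api_value in api.items():
        delta = api_value - summary[name]
        if abs(delta) > tol:
            rows.append((name, delta))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Subset of the cert 0330 cascade-component values quoted in this handover.
api_components = {
    "HLC walls": 113.535,
    "HLC roof": 7.323,
    "HLC windows": 36.455,
    "HW kWh": 3172.65,
    "Main eff": 0.8850,
}
summary_components = {
    "HLC walls": 113.520,
    "HLC roof": 7.323,
    "HLC windows": 29.741,
    "HW kWh": 2112.00,
    "Main eff": 0.8850,
}

rows = component_diff(api_components, summary_components)
for name, delta in rows:
    print(f"{name:12s} {delta:+.3f}")
```

Sorting by absolute delta puts the hot-water and window gaps first, matching the order in which the handover suggests attacking them.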