docs: handoff for expanding the real-life cert SAP-accuracy corpus

Strategy/context companion to the validate-cert-sap-accuracy skill: the per-cert loop, how to read the gov-API-vs-Elmhurst comparison, the code->value gotchas (immersion/cylinder/party-wall/baths/off-peak), known mapper gaps to chase (alt-wall drop), cert-selection for coverage, guardrails (corpus gauge, no tuning to one cert, no tolerance widening), and the current corpus state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-15 15:28:40 +00:00 · 2026-06-15 15:28:40 +00:00 · e289c1449b
commit e289c1449b
parent 5c11fd35c8
1 changed file with 131 additions and 0 deletions
--- a/docs/HANDOVER_REAL_LIFE_CERT_ACCURACY.md
+++ b/docs/HANDOVER_REAL_LIFE_CERT_ACCURACY.md
@@ -0,0 +1,131 @@
+# Handoff — Real-life cert SAP accuracy (validate → fix → expand)
+
+**Purpose.** Grow `real_life_examples` into a trustworthy regression corpus that
+validates this repo's SAP calculator against accredited **Elmhurst Energy**,
+one real certificate at a time — and use each cert to *improve the mapper and
+calculator* and *add test coverage*. This is the strategy/context doc; the
+step-by-step procedure is the **`/validate-cert-sap-accuracy`** skill.
+
+---
+
+## TL;DR — the loop per cert
+
+Run **`/validate-cert-sap-accuracy <uprn>`**. It drives:
+
+1. `scripts/fetch_real_life_epc_sample.py <uprn>` → saves
+   `backend/epc_api/json_samples/real_life_examples/<schema>/uprn_<uprn>/epc.json`,
+   prints schema + lodged rating + our engine's SAP.
+2. `/epc-to-elmhurst-rdsap-inputs <uprn>` → writes `elmhurst_inputs.md` (page-by-page
+   Elmhurst entry sheet with code→value mappings).
+3. **You** build it in Elmhurst, export the **Summary** and **SAP-10.2 worksheet**
+   PDFs → save as `elmhurst_summary.pdf` / `elmhurst_worksheet.pdf` in the sample dir.
+4. `scripts/compare_epc_paths.py <uprn>` → builds `EpcPropertyData` from BOTH the
+   gov-API json and the Elmhurst summary, deep-diffs them, runs BOTH through
+   `Sap10Calculator`, and prints Elmhurst's worksheet SAP.
+5. **Reconcile** the field diffs to convergence (see "Reading the comparison").
+6. **Pin** the agreed score: add a `RealCertExpectation` to
+   `tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py`; the sample
+   dir is already the corpus entry, so the pin is what activates it.
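
The loop above leaves a fixed set of files in each sample directory, so a small checker can report which step a cert has reached. This is a minimal sketch, not code from the repo: `sample_dir` and `missing_artefacts` are hypothetical helpers, and only the root path and file names are taken from steps 1 to 3.

```python
from pathlib import Path

# Root of the corpus, as named in step 1 of the loop.
CORPUS_ROOT = Path("backend/epc_api/json_samples/real_life_examples")

# Artefact -> the loop step that produces it (file names from steps 1-3).
ARTEFACTS = {
    "epc.json": "1: scripts/fetch_real_life_epc_sample.py",
    "elmhurst_inputs.md": "2: /epc-to-elmhurst-rdsap-inputs",
    "elmhurst_summary.pdf": "3: Elmhurst Summary export",
    "elmhurst_worksheet.pdf": "3: Elmhurst SAP-10.2 worksheet export",
}


def sample_dir(schema: str, uprn: str, root: Path = CORPUS_ROOT) -> Path:
    """Return the sample directory for one cert (hypothetical helper)."""
    return root / schema / f"uprn_{uprn}"


def missing_artefacts(directory: Path) -> list[str]:
    """List the artefacts not yet present, in loop order."""
    return [name for name in ARTEFACTS if not (directory / name).is_file()]
```

An empty list from `missing_artefacts` means the cert is ready for `scripts/compare_epc_paths.py`; anything else names the step still to do via `ARTEFACTS[name]`.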
+
+## Reading the comparison (the core skill)
+
+- **Our engine on Elmhurst inputs ≈ Elmhurst's worksheet SAP** → the *calculator*
+  is correct. (Proven repeatedly — it matches Elmhurst's fuel cost to the penny.)
+- **gov-API SAP vs Elmhurst-PDF SAP gap** → *input* differences only. Triage each
+  field diff into:
+  - **Elmhurst data-entry error** (swapped floor dims, wrong cylinder/immersion,
+    missing baths, wrong postcode) → fix in Elmhurst, re-export, re-compare.
+  - **gov-API mapper gap** → a real per-cert-mapper fix (improve the mapper). Flag
+    it; **don't tune to mask it**.
+  - **Ground-truth question** (what the property *actually* is) → you settle it;
+    align both sides to the lodged data.
+- Ignore cosmetic diffs: codes vs strings (tenure, region), empty `EnergyElement`
+  lists (the Elmhurst path stores construction in `sap_building_parts`).
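
The triage above amounts to a deep diff followed by a filter for known-cosmetic fields. The sketch below illustrates the idea on plain dicts; it is not what `scripts/compare_epc_paths.py` does internally. `deep_diff`, `material_diffs`, and the field names are assumptions chosen for illustration.

```python
from typing import Any

# Fields whose diffs are codes-vs-strings only (illustrative names).
COSMETIC_FIELDS = {"tenure", "region"}


def deep_diff(a: Any, b: Any, path: str = "") -> dict[str, tuple[Any, Any]]:
    """Return {dotted.path: (a_value, b_value)} for every differing leaf."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: dict[str, tuple[Any, Any]] = {}
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else str(key)
            out.update(deep_diff(a.get(key), b.get(key), sub))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = {}
        for i, (x, y) in enumerate(zip(a, b)):
            out.update(deep_diff(x, y, f"{path}[{i}]"))
        return out
    # Leaves, mismatched types, or lists of different length.
    return {} if a == b else {path: (a, b)}


def material_diffs(diffs: dict[str, tuple[Any, Any]]) -> dict[str, tuple[Any, Any]]:
    """Drop diffs whose last path component is a known-cosmetic field."""
    return {
        p: v for p, v in diffs.items()
        if p.rsplit(".", 1)[-1] not in COSMETIC_FIELDS
    }
```

Whatever survives `material_diffs` is what needs triaging into the three buckets above; the filter list should stay short so it cannot hide a real mapper gap.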
+
+## Mental model (hard-won)
+
+- **The calculator is essentially exact.** Fed identical inputs it reproduces
+  accredited Elmhurst. So accuracy work is almost entirely **mapper fidelity** —
+  making the gov-API `EpcPropertyData` match what an assessor would key in.
+- **Lodged `energy_rating_current` is NOT a clean target** for pre-SAP10 schemas
+  (17.x–19.0 lodge SAP-2012 ratings — a different methodology). Use **Elmhurst on
+  the lodged inputs** as ground truth; cite lodged only as context.
+- **Pin the observed gov-API engine score**, not the lodged or Elmhurst number —
+  the test guards the production path. Record the Elmhurst-validated value + what
+  reconciled it in the comment.
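
The pinning rule can be pictured as a small record plus an exact-match check. This is a sketch under stated assumptions: `PinnedCert` and `check_pin` are hypothetical and are not the repo's `RealCertExpectation`, whose actual fields are not shown here. The example values come from the corpus table in this doc.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PinnedCert:
    """Hypothetical stand-in for a pinned corpus entry."""
    uprn: str
    schema: str
    pinned_sap: int            # observed gov-API engine score
    lodged_sap: Optional[int]  # context only, not the target
    note: str                  # what reconciled the cert


def check_pin(pin: PinnedCert, observed_sap: int) -> None:
    """Exact integer match: no tolerance band around the pin."""
    assert observed_sap == pin.pinned_sap, (
        f"uprn {pin.uprn}: engine gave {observed_sap}, pinned {pin.pinned_sap}"
    )


EXAMPLE = PinnedCert(
    uprn="10002468137",
    schema="RdSAP-17.1",
    pinned_sap=61,
    lodged_sap=55,
    note="Elmhurst-validated (dual immersion, 110 L, 2 baths)",
)
```

Keeping `lodged_sap` as a separate, unchecked field mirrors the point above: the lodged rating is recorded for context but never asserted against.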
+
+## Code→value cheatsheet (the gotchas that bit us)
+
+Full table: `.claude/skills/epc-to-elmhurst-rdsap-inputs/reference/mapping.md`. The
+ones that cost us time:
+
+| Field | Mapping | Note |
+|---|---|---|
+| `immersion_heating_type` | **1 = DUAL, 2 = SINGLE** | flips Table 13 eqn; swung cert 10002468137 by 4 SAP |
+| `cylinder_size` | **2 = 110 L, 3 = 160 L, 4 = 210 L** | pick the litres in Elmhurst, not the label |
+| `party_wall_construction` | 1 = solid (U 0), 2 = cavity unfilled (U 0.5), 3 = cavity filled (U 0.2), 4/5 = unknown (U 0.25) | code 1 ≠ "unable to determine" |
+| `cylinder_insulation_type` | 1=Foam, 2=Jacket | — |
+| Number of baths | `rooms_with_bath_and_or_shower + rooms_with_bath_and_mixer_shower` | Elmhurst WWHRS sub-tab, defaults to 0 |
+| Off-peak fuel (`29`) | space-heat 100% low rate (correct for storage heaters); water-heat = Table 13 split | meter = Economy-7/Dual |
+| `water_heating_code` 903 | Electric immersion off-peak → Elmhurst **"Water Heater"** category | not "Boiler Circulator" (901) |
+| Windows (reduced-field) | area = 0.148 × TFA × band; raw U from glazing code via `u_window` (RdSAP Table 24) | not real geometry |
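
Several rows of the cheatsheet are plain lookups, so they can be written as tables in code. The sketch below copies the values from the rows above; the constant and function names are hypothetical and do not come from the mapper, and `band` is assumed to be a simple multiplicative factor as the formula suggests.

```python
# Code -> value lookups copied from the cheatsheet rows above.
IMMERSION_HEATING_TYPE = {1: "dual", 2: "single"}
CYLINDER_SIZE_LITRES = {2: 110, 3: 160, 4: 210}
CYLINDER_INSULATION_TYPE = {1: "foam", 2: "jacket"}
# Party-wall U-values; codes 4 and 5 both mean "unknown".
PARTY_WALL_U_VALUE = {1: 0.0, 2: 0.5, 3: 0.2, 4: 0.25, 5: 0.25}


def number_of_baths(rooms_with_bath_and_or_shower: int,
                    rooms_with_bath_and_mixer_shower: int) -> int:
    """Bath count to key into Elmhurst's WWHRS sub-tab."""
    return rooms_with_bath_and_or_shower + rooms_with_bath_and_mixer_shower


def reduced_field_window_area(tfa_m2: float, band: float) -> float:
    """Window area for reduced-field certs: 0.148 x TFA x band."""
    return 0.148 * tfa_m2 * band
```

For example, `IMMERSION_HEATING_TYPE[1]` is `"dual"`, which is the mapping that swung cert 10002468137 by 4 SAP when read the wrong way round.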
+
+## Known mapper gaps to chase (improve the mapper)
+
+- **Lodged alt-wall dropped**: `sap_building_parts[].sap_alternative_wall_1` is
+  `None` on the gov-API path even when the cert lodges one (Elmhurst keeps it).
+  It costs only about £1 / 0.06 SAP on cert 10002468137, but it is a real gap.
+  (per-cert-mapper / Khalim's domain.)
+- Add more as new certs surface them — that's the point of expanding the corpus.
+
+## Picking certs for coverage
+
+Maximise variety so each cert exercises new mapper/calculator paths:
+- **Heating**: gas combi, gas boiler + cylinder, oil, LPG, solid fuel, heat pump
+  (ASHP/GSHP), storage heaters (done), electric boiler, community/heat-network.
+- **Hot water**: combi, cylinder (foam/jacket), immersion (single/dual), solar HW,
+  WWHRS, instantaneous electric.
+- **Schema**: 17.0, 17.1 (done), 18.0 (done), 19.0, 20.0.0, 21.0.0, 21.0.1.
+- **Geometry**: flats (ground/mid/top floor), bungalow, extensions, room-in-roof,
+  conservatory, basement.
+- **Tariff/region**: mains gas, off-peak electric (done), 10/18/24-hour, varied regions.
+- **Tech**: PV (export/non-export), wind, FGHRS.
+
+## Guardrails
+
+- **RdSAP-21.0.1 corpus gauge** (`tests/infrastructure/epc_client/test_sap_accuracy_corpus.py`,
+  currently 66.9% within-0.5 SAP) is the broad regression net for any mapper/calc
+  change. **Ratchet thresholds up, never loosen.** Re-run it after every change.
+- **Don't tune the mapper to one cert** — fix generically and confirm against the
+  gauge. A single-cert tweak that regresses the corpus is net-negative.
+- **No tolerance widening** in the real-cert test — pin the observed integer SAP;
+  if a known engine bug blocks a cert, use `known_bug_xfail="…"` (strict xfail).
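
The "ratchet up, never loosen" rule can be sketched as a fraction-within-tolerance check against a floor. This is an illustration rather than the contents of `test_sap_accuracy_corpus.py`: the function names are hypothetical, the floor is the 66.9% figure quoted above, and treating a gap of exactly 0.5 as a hit is an assumption.

```python
from typing import Sequence, Tuple

# Current gauge level quoted above; only ever raise this.
GAUGE_FLOOR = 0.669


def within_tolerance_fraction(pairs: Sequence[Tuple[float, float]],
                              tolerance: float = 0.5) -> float:
    """Fraction of (engine_sap, reference_sap) pairs within tolerance."""
    if not pairs:
        raise ValueError("corpus is empty")
    # Assumption: the boundary is inclusive.
    hits = sum(1 for ours, ref in pairs if abs(ours - ref) <= tolerance)
    return hits / len(pairs)


def check_gauge(pairs: Sequence[Tuple[float, float]],
                floor: float = GAUGE_FLOOR) -> float:
    """Fail if the corpus accuracy has dropped below the floor."""
    fraction = within_tolerance_fraction(pairs)
    assert fraction >= floor, (
        f"gauge regressed: {fraction:.1%} is below the floor {floor:.1%}"
    )
    return fraction
```

When a mapper fix lifts the fraction, raise `GAUGE_FLOOR` to the new level in the same change so that later work cannot silently give the gain back.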
+
+## Current corpus
+
+| Sample | Schema | Pin | Status |
+|---|---|---|---|
+| `uprn_100020450179` | RdSAP-18.0 | 73 | matches lodged 73 |
+| `uprn_10002468137` | RdSAP-17.1 | 61 | Elmhurst-validated (dual immersion, 110 L, 2 baths); lodged 55 = old schema |
+| `uprn_10092973954` | SAP-17.1 (full SAP) | 77 | full-SAP mapper partial; pinned to observed (not lodged 83) |
+
+## Open threads
+
+- **Full-SAP mapper WIP** (`_sap_door_aggregates` D2 door slice) is parked in
+  `git stash` (`hyde-wip-before-main-merge`, `stash@{0}`). It is not my work; whoever
+  picks up the full-SAP effort should inspect it with `git stash show -p stash@{0}`
+  and re-apply it. Full-SAP (`SAP-Schema-*`) support is incomplete; RdSAP schemas
+  are the solid path.
+- The off-peak water-heating (Table 13) fix and the per-cert-mapper accuracy work
+  landed on `main` via PR #1217 — pull `main` before starting new certs.
+
+## Key files
+
+| Concern | Path |
+|---|---|
+| Per-cert loop (procedure) | skill `validate-cert-sap-accuracy` |
+| Cert → Elmhurst input sheet | skill `epc-to-elmhurst-rdsap-inputs` (+ `reference/mapping.md`) |
+| Capture a cert | `scripts/fetch_real_life_epc_sample.py` |
+| Compare the two paths | `scripts/compare_epc_paths.py` |
+| The accuracy test (pins) | `tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py` |
+| Corpus regression gauge | `tests/infrastructure/epc_client/test_sap_accuracy_corpus.py` |
+| Mapper (improve here) | `datatypes/epc/domain/mapper.py`, `domain/sap10_calculator/rdsap/cert_to_inputs.py` |
+| Calculator | `domain/sap10_calculator/` |