# Handover: wide-scale API accuracy study + next steps

Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the 1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.

- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `9521d524`. Next SAP slice: **S0380.235**.
- **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/` → green (2412 passed, 1 skipped). Pre-existing out-of-scope failures unchanged (stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).

## Shipped this session (S0380.232-234: the case-19 PV closure)

The PV diverter (the prior handover's S0380.232 ask) needed two prerequisite spec bugs fixed first; all three landed:

| slice | commit | spec | what |
|---|---|---|---|
| **S0380.232** | `212b0c92` | App M1 §3a (p.93, l.5470-5476) | D_PV excludes the LOW-rate portion of an off-peak electric main: `(211)` is only PV-eligible where its §10a code ∈ {30,32,34,35,38}. Storage heaters on 7-hr charge wholly at low rate → fraction 0.0 → excluded. β_Jan 0.894→0.792 (ws 0.791). New `_main_space_heating_high_rate_fraction`. |
| **S0380.233** | `d4a8c02b` | App M1 §6 (p.94, l.5510-5513) | PV-used-in-dwelling credited at the Table 12a ALL_OTHER_USES **weighted** rate (7-hr 14.311 p/kWh), not the bare low rate (5.50). Was under-crediting onsite PV on every off-peak PV cert. Delegates to `_other_fuel_cost_gbp_per_kwh`; STANDARD unchanged. |
| **S0380.234** | `9521d524` | Appendix G4 (p.72-73) | The PV diverter. 3 layers: extractor `Diverter present` + schema `pv_diverter` → `pv_diverter_present` flag (Elmhurst + API mappers) → `_pv_diverter_monthly_kwh` (SPV = export×0.8×0.9, clamp ≤ (62)+(63a), → (63b)m); `cert_to_inputs` recomputes (219) + PV export, β fixed pre-diverter. |

**Case 19 now: SAP cont 50.33 → 51.34** (ws 51.2221; both round to lodged **51**), cost (255) 1847.5→1812.3 (ws 1816.6), CO2 3331→3120 (ws 3126), (233a) dwelling 1280.6 (ws **1280.4**; the β fix pins it). The diverter formula is **exact in summer** (Jun SPV 186.07 = export×0.72, matches ws (63b)).

**The remaining +0.11 SAP on case 19 = two separate, still-open causes:**

1. **Winter Appendix-M monthly EPV shape.** Our annual EPV (2684.17) matches the worksheet exactly and Jun-Sep match per-month exactly, but Jan-May/Oct-Dec our EPV is ~9-11% LOW (worksheet Jan 68.2 vs ours 62.5). Back-solve: ws EPV_m = |(233a)_m| + |(63b)_m|/0.72. This under-diverts in winter → export (233b) 280.7 vs ws 184.2, and (219) 3322 vs ws 3188. **A two-array PV apportionment issue (case 19 has SE + NW arrays with different overshading): chase in §M / Appendix U monthly radiation, NOT the diverter (which is validated).**
2. **Fabric (33) +1.0 W/K** (ours 305.04 vs ws 304.04): a single element off by exactly 1.0; floor=25.000 is suspiciously round. Walk the per-element §3 breakdown.

The **eval headline is flat** (42.9→43.0% <0.5; cat-7 5.25→4.93), as expected: the diverter is rare and the β/price effects are small on the rounded SAP. The value was pinning the worksheet-validated case 19 + fixing three real spec bugs that the curated cohort masked.

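
The weighted-rate crediting from S0380.233 reduces to a two-rate blend. The sketch below is illustrative only: `weighted_rate_p_per_kwh` is a hypothetical helper (not the repo's `_other_fuel_cost_gbp_per_kwh`), and the 0.90 high-rate fraction is an assumption back-solved from the 14.311, 15.29 and 5.50 p/kWh figures quoted in this note rather than read from Table 12a.

```python
def weighted_rate_p_per_kwh(high_rate: float, low_rate: float, high_fraction: float) -> float:
    """Blend high and low unit prices by the fraction of use billed at the high rate."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must lie in [0, 1]")
    return high_fraction * high_rate + (1.0 - high_fraction) * low_rate


# 7-hr tariff unit prices quoted in this note (p/kWh).
HIGH_7HR = 15.29
LOW_7HR = 5.50

# ASSUMPTION: back-solved so that the blend reproduces the quoted 14.311 p/kWh.
ASSUMED_ALL_OTHER_USES_HIGH_FRACTION = 0.90

# Rate at which PV used in the dwelling would be credited under this assumption.
pv_credit_rate = weighted_rate_p_per_kwh(
    HIGH_7HR, LOW_7HR, ASSUMED_ALL_OTHER_USES_HIGH_FRACTION
)

# A high fraction of 1.00 (the worksheet's "1.00*15.29 + 0.00*5.50") gives the bare high rate.
direct_acting_rate = weighted_rate_p_per_kwh(HIGH_7HR, LOW_7HR, 1.00)
```

Crediting at the bare low rate instead would be the `high_fraction = 0.0` case, which is the under-crediting that the slice removed.
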
## Headline now (1,000-cert 2026 API sample, HEAD `f326e4eb`)

| metric | value | was (handover baseline `9c0a373f`) |
|---|---|---|
| computed | 882 / 1000 | 882 |
| **% \|err\| < 0.5** | **42.9%** | 41.8% |
| % < 1 / < 2 / < 5 | 56.7% / 74.6% / 90.1% | 54.9 / 71.9 / 87.8 |
| median / mean \|err\| | 0.73 / **2.04** | 0.79 / ~2.4 |
| mean signed | −0.41 | +0.2 |

**Error by heating cluster** (the load-bearing cut; re-run `analyse_api_sap_clusters.py`):

| cluster | n | mean \|err\| | %<0.5 | note |
|---|---|---|---|---|
| cat 2 gas boiler + PCDB | 639 | 1.27 | 49.6% | well-trodden |
| cat 2 gas, NO PCDB idx | 91 | 3.18 | 35.2% | non-PCDB Table-4b boilers |
| cat 6 community | 45 | 2.59 | 31.1% | known-hard |
| cat 7 electric storage | 40 | **5.25** | 10.0% | was 7.33 → S0380.227-229 |
| cat 10 electric room heaters | 48 | **5.26** | 16.7% | was 9.49 → S0380.230-231 (bias gone) |
| cat 4 HP + PCDB | 8 | 6.11 | 12.5% | small n, APM |
| Flats (any) | 282 | 2.57 | 30.5% | geometry / communal |
| real PV | 45 | 3.90 | 26.7% | Appendix M |

**Worst individual offenders** (the long tail; `eval` TOP 40):

- `2100-5421-0922-1622-3463` (−60.8, our SAP **negative** −24.8 vs lodged 36): a flat, 2 bps, cat-2; the single worst, likely a geometry/communal blow-up. START a per-cert dig here.
- `2958-8008` (+32, age 6=tiny).
- `9836-5829` (−29.5, cat-10 tail).
- Several cat-7/cat-10 in the −20s.

## Work shipped (this session: S0380.227-231 + 3 mapper commits)

| commit | what |
|---|---|
| **S0380.227** | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9; TF (53) 0.54→0.60, (59) h=3→5 |
| **S0380.228** | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (1.00 high-frac), not 100% low |
| **S0380.229** | dedicated water boiler/circulator (WHC 911-931) feeds cylinder via primary loop → Table 3 primary loss applies |
| **S0380.230** | electric room heaters (cat 10) on off-peak → `OTHER_DIRECT_ACTING_ELECTRIC` (mirror of .228 for the MAIN). cat-10 9.49→7.11 |
| **S0380.231** | Dual-meter electric room heaters → 10-hour tariff (RdSAP §12 Rule 3; codes 691-694,699). cat-10 7.11→5.26, bias +5.08→−0.86 |
| `bd25a3c7` | SY system-built vs B basement: code 6 stays system-built; basement → explicit `wall_is_basement`/`is_basement` flag. `system_build` is a derived property (wall type). API path post-processes via addendum. (issue #1177; see `docs/PR_NOTE_system_built_basement_1177.md`: field-vs-property merge landmine) |
| `f326e4eb` | Elmhurst path now populates `roof_construction` (int) via `_elmhurst_roof_construction_int` for cross-mapper parity (API set it, Elmhurst didn't) |

- **Toolkit (committed):** `scripts/fetch_2026_epc_sample.py`, `scripts/eval_api_sap_accuracy.py`, `scripts/analyse_api_sap_clusters.py`. The 1,000 cached JSONs live in `/tmp/epc_2026_sample/` (gitignored scratch; re-fetch with the sampler; `EPC_SAMPLE_CACHE` overrides the dir). Re-run the eval after any mapper/calculator change to watch the headline move.


---

## What this study did

Fetched a **random 1,000-cert sample of domestic EPCs lodged Jan–May 2026** from the GOV.UK EPB register (the `/api/domestic/search` date-windowed endpoint to enumerate cert numbers across random pages → `/api/certificate` per cert for the full schema-21 JSON), ran each through the **API path** (`from_api_response → cert_to_inputs → continuous SAP`), and compared to the lodged rounded `energy_rating_current`. **This is the first measurement of raw-API behaviour on an unbiased population**; the curated golden cohort (~exact) masked it.

### Reproduce

- Sampler/fetcher: `/tmp/sample_fetch_2026.py` → caches JSONs to `/tmp/epc_2026_sample/`.
- Evaluator: `/tmp/eval_sap_accuracy.py` → per-cert CSV + summary (`% <0.5`, buckets, worst-40, raise breakdown). Cluster analysis: `/tmp/analyze2.py`. (Token in `backend/.env` `OPEN_EPC_API_TOKEN`; `date_end` must be < today.)
- **These scripts are uncommitted (in /tmp).** Worth promoting to `scripts/` if this becomes a recurring measurement.

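
The summary figures the evaluator reports can be reproduced from the per-cert errors alone. The sketch below is a minimal illustration, not the contents of the evaluator script: `summarise_errors` is a hypothetical helper, the sample errors are made up, and the sign convention (continuous SAP minus lodged rating) is an assumption.

```python
from statistics import mean, median


def summarise_errors(signed_errors: list[float]) -> dict[str, float]:
    """Headline metrics over per-cert errors (assumed: our continuous SAP minus lodged rating)."""
    if not signed_errors:
        raise ValueError("need at least one computed cert")
    abs_errors = [abs(e) for e in signed_errors]
    n = len(abs_errors)

    def pct_below(threshold: float) -> float:
        # Share of computed certs whose absolute error is strictly under the threshold.
        return 100.0 * sum(a < threshold for a in abs_errors) / n

    return {
        "pct_lt_0.5": pct_below(0.5),
        "pct_lt_1": pct_below(1.0),
        "pct_lt_2": pct_below(2.0),
        "pct_lt_5": pct_below(5.0),
        "median_abs": median(abs_errors),
        "mean_abs": mean(abs_errors),
        "mean_signed": mean(signed_errors),
    }


# Made-up errors for illustration only; these are not real certificates.
sample_errors = [0.1, -0.4, 0.7, -1.5, 3.0, -6.0, 0.2, -0.3]
summary = summarise_errors(sample_errors)
```
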

---

## Headline (at HEAD `9c0a373f`)

| metric | value |
|---|---|
| computed | **882 / 1000** (100 unsupported pre-21 schema; 18 still raise) |
| **% \|err\| < 0.5** (of computed) | **41.8%** |
| % < 1.0 / < 2.0 / < 5.0 | 54.9% / 71.9% / 87.8% |
| median / mean \|err\| | 0.79 / ~2.4 |
| mean signed err | +0.2 (slight over-rate) |

**Accuracy is dominated by heating type** (the load-bearing cut):

| main_heating_category | n | mean \|err\| | %<0.5 | status |
|---|---|---|---|---|
| 2 = gas boiler (PCDB-indexed) | 579 | 1.30 | 48% | the well-trodden path |
| **7 = electric storage heaters** | 39 | **7.33** | **3%** | **broken: #1 lever** |
| **10 = electric room heaters** | 43 | **10.26** | **9%** | **broken: #2 lever** |
| 6 = community scheme | 38 | 2.28 | 34% | known-hard |
| Flats (any heating) | 242 | 3.19 | 29% | geometry + communal |

---

## Work shipped this session (S0380.219–225)

Coverage unblocked **788 → 882 computed (+94)**; one real accuracy bug fixed (+22 certs).

| slice | fix | certs |
|---|---|---|
| S0380.219 | floor_construction 3 → "Suspended, not timber" (RdSAP 10 field 3-1) | ~44 |
| S0380.220 | floor_construction 0 → None (Table 19 unknown; proven inert) | 37 |
| S0380.221 | default missing `post_town` (unused metadata) | 1 |
| S0380.222 | roof_construction 6 (thatched) + 7 (dwelling above) → None (inert) | 5 |
| S0380.223 | `_part_geometry` early-return key contract (RR KeyError) | 5 |
| **S0380.224** | **loose-jacket cylinder storage loss (Table 2 Note 1)**; was None'd out → zero loss | **22** (mean err +2.29 → +0.45) |
| S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |
| S0380.226 | Elmhurst "Jacket" cylinder insulation → loose-jacket code 2 (Summary path) | (unblocked case 19) |

Headline at HEAD: **882 / 1000 computed, 41.8% < 0.5** (re-run the eval to refresh).


---

## ★ Active worksheet: simulated case 19, the electric-storage-heater debug

The user generated `sap worksheets/golden fixture debugging/simulated case 19/` (`Summary_001431 (2).pdf` + `P960-0001-001431 - 2026-06-04T174437.228.pdf`), purpose-built to hit the #1 cluster. It exercises **electric storage heaters** (SAP code 402, control 2402 auto-charge, 7-hr off-peak tariff) + a **loose-jacket 210 L cylinder** + **WHS 911** (gas boiler for water only) + **room-in-roof gables (Party + Exposed) + an alternative wall + exposed floor + electric secondary**. **S0380.226 unblocked extraction** (the "Jacket" label was raising).

The worksheet has FOUR blocks: **block 1 = rating** (UK-avg region 0; cost (255)=1816.58, SAP (258)=51, TF (53)=0.60, (51)=0.0330), **block 2 = demand** (postcode; CO2 (272)=3125.85, PE (286)=30271.76), blocks 3/4 = the potential/improved variants. Pin the rating block for SAP/cost, the demand block for PE/CO2. Worksheet header line 116 lodges **"Separate Time Control: No"** (NOT in the Summary §15 PDF, only in the P960 header).

**Three slices shipped (S0380.227–229)**: closed the +9 cluster signature; SAP cont 60.2 → **50.33** (worksheet ~51.22):

| slice | line ref | fix | SAP cont |
|---|---|---|---|
| **S0380.227** | TF (53) 0.54→**0.60**; (59) h=3→**h=5** | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9 (RdSAP 10 §10.5.1). `_separately_timed_dhw` gated on WHC ∈ {901,902,914}. Worksheet-pins S0380.224's loose-jacket (51)=0.0330/(53)=0.60/(55)=3.4531/(56-57)Jan=107.0456 at 1e-4. | 60.2→60.1 |
| **S0380.228** | cost (255) | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (7-hr high-frac **1.00** = £0.1529), not the flat off-peak low (£0.0550). Worksheet (242): "1.00*15.29 + 0.00*5.50". THE primary cost driver (−340). | 60.1→**50.67** |
| **S0380.229** | (62) 2493.30→**3169.98** | dedicated water-heating boiler/circulator (WHC 911-931) feeds the cylinder via a primary loop → Table 3 row 1 primary loss applies (keyed off `water_heating_code`, since `_water_heating_main` returns the electric SPACE main). Restored the missing (59)=676.68 kWh/yr. | 50.67→50.33 |

**The ONE remaining case-19 cause, the PV diverter (63b), is S0380.232.** Worksheet header line 124 "Diverter = Yes"; Summary §19 "Diverter present: Yes". Per **SAP 10.2 Appendix G4 (PDF p.72-73)** surplus PV is diverted to the cylinder immersion: `S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9`, clamped to ≤ (62)m + (63a)m, entered as a NEGATIVE (63b)m. (64)m = (62)m + (63a)m + (63b)m + … → (219)m = (64)m / eff. All four G4 inclusion conditions are met (PV connected to dwelling; cylinder 210 L > (43)=74.24; no solar HW; no battery). Worksheet (63b) annual ≈ −1097.67 kWh → (64) drops 3169.98 → 2072.31, (219) 4876.9 → 3188.17. It ALSO changes the PV β-split (export drops: worksheet dwelling 1280.39 / exported 184.16 vs our 1496.20 / 1187.98 with no diverter). This is a 3-layer feature (extractor `Diverter present` → mapper flag → calculator (63b) + β-split interaction); implement as one focused slice. Spec note p.5485: for the β calc, (219)m must EXCLUDE the diverter saving.

Smaller residuals after the diverter lands: main fuel (211) ours 20250.22 vs ws 19910.30 (+340), secondary (215) 3573.57 vs 3513.58 (+60), fabric (33) +1.0 (gable/alt-wall). Current demand block: CO2 (272) 3331.04 vs 3125.85, PE (286) 31653.23 vs 30271.76; both will drop with the diverter (less grid import).

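
The Appendix G4 arithmetic quoted above can be sketched per month. This is an illustration under stated assumptions, not the repo's `_pv_diverter_monthly_kwh`: the function names, argument names and sample inputs are hypothetical, β is taken as already fixed before the diverter is applied, and the trailing "…" terms of (64)m are left out.

```python
# The two multipliers from the quoted formula S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9.
G4_FACTOR_A = 0.8
G4_FACTOR_B = 0.9


def pv_diverter_63b_kwh(epv_kwh: float, beta: float,
                        line_62_kwh: float, line_63a_kwh: float) -> float:
    """Return a monthly (63b) value: the diverted PV as a NEGATIVE number of kWh."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    surplus_kwh = epv_kwh * (1.0 - beta)          # PV not used in the dwelling
    s_pv_kwh = surplus_kwh * G4_FACTOR_A * G4_FACTOR_B
    cap_kwh = max(line_62_kwh + line_63a_kwh, 0.0)  # clamp to <= (62)m + (63a)m
    return -min(s_pv_kwh, cap_kwh)


def line_64_kwh(line_62_kwh: float, line_63a_kwh: float, line_63b_kwh: float) -> float:
    """(64)m = (62)m + (63a)m + (63b)m, with the further terms omitted in this sketch."""
    return line_62_kwh + line_63a_kwh + line_63b_kwh


# Hypothetical month with plenty of hot-water demand: the clamp does not bite.
unclamped_63b = pv_diverter_63b_kwh(epv_kwh=400.0, beta=0.5, line_62_kwh=300.0, line_63a_kwh=0.0)

# Hypothetical month with little hot-water demand: the clamp limits the diversion.
clamped_63b = pv_diverter_63b_kwh(epv_kwh=400.0, beta=0.5, line_62_kwh=100.0, line_63a_kwh=0.0)
```
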

**Debug recipe** (reuse the `/tmp/case19*.py` throwaways or rebuild):

```python
pages → ElmhurstSiteNotesExtractor(...).extract() → from_elmhurst_site_notes
      → cert_to_inputs / cert_to_demand_inputs → calculate_sap_from_inputs
# CI._cylinder_storage_loss_override(epc, main) → (57)m; CI._primary_loss_override(epc, age) → (59)m
# CI._water_heating_worksheet_and_gains(epc=…, water_efficiency_pct=0.65, is_instantaneous=False,
#     primary_age=, pcdb_record=None) → wh_result with (45)/(46)/(57)/(59)/(62)/(64)
```

---

## Remaining work, prioritised

### A. Accuracy clusters (highest value)

1. **PV diverter (S0380.232)**: closes case 19 to 1e-4 AND helps the real-PV cluster (45 certs, mean 3.90). Fully spec'd in the case-19 section above (Appendix G4). **Has a worksheet** → 1e-4 bar. Do this first: it's the one open cause on a validated worksheet.
2. **Electric storage heaters (cat 7, 40 certs, mean 5.25).** S0380.227-229 took it 7.33→5.25; the case-19 PV diverter will help further. Beyond that the tail is per-cert; a **dedicated cat-7 worksheet** (no PV, no diverter) would let you pin charge-control / responsiveness at 1e-4 instead of the ±0.5 lodged fallback.
3. **Electric room heaters (cat 10, 48 certs, mean 5.26).** S0380.230-231 fixed the systematic tariff bias (mean 9.49→5.26, signed +5.08→−0.86); the residual is now scattered per-cert (e.g. `9836-5829` −29.5, an under-rater). A **cat-10 worksheet** pins the tail at 1e-4.
4. **Non-PCDB gas boilers (cat 2, no idx, 91 certs, mean 3.18)** and **Flats (282, mean 2.57)**: the next volume levers once the electric clusters are worksheet-pinned. Flats = geometry / communal; start with the worst (`2100-5421` negative SAP).
   - **`2100-5421-0922-1622-3463` diagnosed (S0380.234 session):** NOT a flat. It is `property_type 0`, a **352 m² 2-storey uninsulated solid-wall** dwelling (wall_constr 3 / wall_ins 4 as-built; roof_type 4, no roof insulation). Our space-heating demand is **71,084 kWh/yr** → (37)=995.93 W/K → SAP −24.8 (lodged 36), cost £14,045. This is the **`as-built insulated-assumed`** U-value front ([[project_as_built_insulated_assumed_bug]]; S0380.209 fixed walls, "roof next"): the uninsulated-roof / as-built U over-estimates demand on big old dwellings. API-only (no worksheet → ±0.5 lodged fallback only); needs a generated worksheet or a roof-U spec audit to pin. It is one outlier, not a cluster-wide flats bug.

### B. Remaining raises (16 certs: all U-value / heat-loss-sensitive, NOT enum guesses)

- **`gable_wall_type` 2 & 3 (14 certs).** RdSAP 10 **Table 4** RR walls: 0=Party (U=0.25), 1=Exposed (U=common wall), 2/3 = **Sheltered (U=external×R0.5)** + **Adjacent-to-heated (U=0)**, code↔type order unconfirmed (schema says "not yet seen"). Needs (i) a worksheet to pin which code is which + the U-values, and (ii) **calculator support**: the cascade only has `gable_wall`/`gable_wall_external` kinds; Sheltered (R=0.5) and Adjacent (U=0) are new. Best real example: `2818-3053-3203-2655-9204` lodges BOTH gable 2 and 3.
- **`main_heating_category: 9` = warm air, mains gas (1 cert).** Needs §9 warm-air dispatch.
- **`wall_insulation_thermal_conductivity` 3 (1 cert).** Verified it shifts wall U (53.96→51.61 across λ) → worksheet-backed (the resolver's own discipline).
- **`floor_heat_loss` 8 (2 certs).** Semantically unconfirmed; inert for the 2 observed (non-Main bp) but potentially "heated space below" (→ should exclude the floor, a calculator change). Don't guess.

The clean mapper-enum raises are **exhausted**: every remaining raise changes the answer, which is what the strict-raise guard exists to prevent.

---

## ★ Additional worksheets that would help most (the user will generate these on request)

The two electric clusters are now systematic-bias-free (S0380.227-231) but their TAILS sit at the ±0.5-vs-lodged fallback bar because **no worksheet validates them at 1e-4**.
The three highest-value worksheets to ask the user for:

1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602/2603, Dual meter): pins the cat-10 tail (48 certs, mean 5.26) at 1e-4. Make it PV-free + cylinder-free to isolate the space-heat path from the diverter/HW.
2. **An electric STORAGE-heater dwelling distinct from case 19** (no PV, no WHS-911): pins the cat-7 tail (40 certs, mean 5.25): charge control (2401/2402), 7-hr vs 24-hr, responsiveness.
3. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types beyond Party/Exposed): closes the `gable_wall_type` 2/3 raise (14 certs) and pins the Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.

Per worksheet send BOTH the **Summary PDF** (input) and the **P960/dr87 worksheet PDF** (the `(1)..(286)` ground truth). Drop them in `sap worksheets/golden fixture debugging//` and run the case-19 debug recipe. The original "design one property" guidance (kept below for reference) is what case 19 was built from.

## What to generate: the single most productive worksheet (reference)

Heating is one-per-property, so one worksheet can't cover all four broken heating types. But **fabric is independent of heating**, so the highest-ROI single artifact bundles the #1 accuracy cluster with the fabric that closes the gable raises and pins the loose-jacket fix.

**Build (in Elmhurst, a simulated case is fine; same as the existing `simulated case N` worksheets) ONE property:**

> **A house heated by ELECTRIC STORAGE HEATERS, with a room-in-roof and a hot-water cylinder:**
> - **Heating:** electric storage heaters (off-peak / Economy-7 tariff), with a clear control type. *This is the load-bearing choice: it validates the 39-cert cat-7 cluster.*
> - **Hot water:** a cylinder with a **loose-jacket** insulation (not factory foam), a stated jacket thickness, and a cylinder thermostat. *Pins S0380.224's loose-jacket storage loss (56)m at 1e-4; currently only direction-validated.*
> - **Room-in-roof** with **two gable walls of different types**, ideally one **"Sheltered"** and one **"Adjacent to another heated space"** (plus, if the tool allows, a Party and an Exposed gable). *Gives the Table 4 U-values for gable_wall_type 2 & 3, disambiguates the code order, and closes the 14-cert raise.*
> - **An extension (2nd building part)** with a different floor exposure (e.g. over unheated space or "to external air"). *Exercises multi-bp geometry + floor-exposure handling.*

From that single worksheet I can pin, at 1e-4: the electric-storage space-heating lines ((210)/(211)/space-heat), the loose-jacket storage loss (56)m, the RR gable U-values (30)/(32), and the multi-bp fabric (27)–(37). That's **one cluster + one fix-validation + the biggest raise + fabric**, all in one document.

**If you'd rather do two:** add a second worksheet that is identical but with **electric room heaters** instead of storage heaters; together they cover cat 7 + cat 10 (≈ 82 certs, the two worst clusters). A third for a **community-heating flat** would cover cat 6 + the flat geometry cluster.

### Then send me, per worksheet

The **Summary PDF** (the Elmhurst input/site-notes) + the **worksheet PDF** (the `(1)..(286)` ground truth). With those I run both front-ends through the cascade and pin each line ref at 1e-4, exactly as for the `with api 3` pair (S0380.218).

---

## Conventions (unchanged)

One cause = one slice = one commit; spec citation (page+line) in the message; AAA tests (`# Arrange / # Act / # Assert`); `abs(x - y) <= tol` (not `pytest.approx`); SAP 10.2 only; no tolerance widening / xfail / rel-tol. New code passes pyright strict with ZERO NEW errors (baseline-compare with `git stash`; mapper.py / cert_to_inputs.py / heat_transmission.py carry pre-existing errors, so compare counts).
Stage files by name (the tree has unrelated `pytest.ini`/`scripts/` changes that must NOT be staged). `Co-Authored-By: Claude Opus 4.8 `.
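
The test conventions above might look like this in practice. This is a minimal sketch, not a test from the suite: `line_64_kwh` and the test name are hypothetical, the pinned figures are the case-19 annual values quoted earlier in this note, and (63a) = 0 is an assumption inferred from (64) equalling (62) before the diverter.

```python
WORKSHEET_TOL = 1e-4  # the absolute worksheet bar; no relative tolerance


def line_64_kwh(line_62_kwh: float, line_63a_kwh: float, line_63b_kwh: float) -> float:
    """Hypothetical stand-in for the code under test: (64) = (62) + (63a) + (63b)."""
    return line_62_kwh + line_63a_kwh + line_63b_kwh


def test_line_64_drops_by_the_diverter_saving() -> None:
    # Arrange
    line_62 = 3169.98
    line_63a = 0.0  # ASSUMPTION: no (63a) contribution on case 19
    line_63b = -1097.67
    expected = 2072.31

    # Act
    actual = line_64_kwh(line_62, line_63a, line_63b)

    # Assert: plain absolute difference, not pytest.approx
    assert abs(actual - expected) <= WORKSHEET_TOL
```
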