Records the three PV slices shipped (D_PV off-peak exclusion, weighted dwelling import price, Appendix G4 diverter), the resulting case-19 state (SAP 50.33→51.34, rounds to lodged 51), and the two remaining case-19 causes (winter Appendix-M EPV monthly shape; fabric (33) +1.0). Adds the `2100-5421` worst-offender diagnosis (a 352 m² uninsulated solid-wall dwelling on the as-built-insulated-assumed roof-U front, not a flats bug). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Handover — wide-scale API accuracy study + next steps
Point-in-time note. Start from AGENT_GUIDE.md for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
- Branch: feature/per-cert-mapper-validation
- HEAD: 9521d524. Next SAP slice: S0380.235.
- Baseline (§4 suite): tests/domain/sap10_calculator/ backend/documents_parser/tests/ → green (2412 passed, 1 skipped). Pre-existing out-of-scope failures unchanged (stone-§5.6 in domain/sap10_ml/tests/; test_from_rdsap_schema.py::...test_total_floor_area).
Shipped this session (S0380.232-234 — the case-19 PV closure)
The PV diverter (the prior handover's S0380.232 ask) needed two prerequisite spec bugs fixed first; all three landed:
| slice | commit | spec | what |
|---|---|---|---|
| S0380.232 | 212b0c92 | App M1 §3a (p.93, l.5470-5476) | D_PV excludes the LOW-rate portion of an off-peak electric main: (211) is only PV-eligible where its §10a code ∈ {30,32,34,35,38}. Storage heaters on 7-hr charge wholly at low rate → fraction 0.0 → excluded. β_Jan 0.894→0.792 (ws 0.791). New _main_space_heating_high_rate_fraction. |
| S0380.233 | d4a8c02b | App M1 §6 (p.94, l.5510-5513) | PV-used-in-dwelling credited at the Table 12a ALL_OTHER_USES weighted rate (7-hr 14.311 p/kWh), not the bare low rate (5.50). Was under-crediting onsite PV on every off-peak PV cert. Delegates to _other_fuel_cost_gbp_per_kwh; STANDARD unchanged. |
| S0380.234 | 9521d524 | Appendix G4 (p.72-73) | The PV diverter. 3 layers: extractor Diverter present + schema pv_diverter → pv_diverter_present flag (Elmhurst + API mappers) → _pv_diverter_monthly_kwh (SPV = export×0.8×0.9, clamp ≤ (62)+(63a), → (63b)m); cert_to_inputs recomputes (219) + PV export, β fixed pre-diverter. |
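The weighted-rate arithmetic behind S0380.233 can be sketched as follows. This is a minimal illustration, not the calculator's code: `weighted_rate_p_per_kwh` is a hypothetical helper, the 15.29 and 5.50 p/kWh prices are the 7-hr high and low rates quoted elsewhere in this note, and the 0.90 high-rate fraction is an assumption back-solved from the 14.311 p/kWh figure rather than read from Table 12a here.

```python
def weighted_rate_p_per_kwh(high_fraction: float, high_rate: float, low_rate: float) -> float:
    """Blend high/low tariff prices by the fraction of use billed at the high rate."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError(f"high_fraction must be in [0, 1], got {high_fraction}")
    return high_fraction * high_rate + (1.0 - high_fraction) * low_rate


HIGH_7HR = 15.29  # p/kWh, 7-hr high rate quoted in this note
LOW_7HR = 5.50    # p/kWh, 7-hr low rate quoted in this note

# Assumed 0.90 high-rate fraction for ALL_OTHER_USES (back-solved, see lead-in).
all_other_uses = weighted_rate_p_per_kwh(0.90, HIGH_7HR, LOW_7HR)

# The pre-fix behaviour: everything credited at the bare low rate.
bare_low = weighted_rate_p_per_kwh(0.0, HIGH_7HR, LOW_7HR)
```

Under those assumptions the blend reproduces the 14.311 p/kWh weighted rate, while a fraction of 0.0 falls back to the bare 5.50 that was under-crediting onsite PV.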
Case 19 now: SAP cont 50.33 → 51.34 (ws 51.2221; both round to lodged 51), cost (255) 1847.5→1812.3 (ws 1816.6), CO2 3331→3120 (ws 3126), (233a) dwelling 1280.6 (ws 1280.4 — the β fix pins it). The diverter formula is exact in summer (Jun SPV 186.07 = export×0.72, matches ws (63b)).
The remaining +0.11 SAP on case 19 = two separate, still-open causes:
- Winter Appendix-M monthly EPV shape. Our annual EPV (2684.17) matches the worksheet exactly and Jun-Sep match per-month exactly, but Jan-May/Oct-Dec our EPV is ~9-11% LOW (worksheet Jan 68.2 vs ours 62.5). Back-solve: ws EPV_m = |(233a)_m| + |(63b)_m|/0.72. This under-diverts in winter → export (233b) 280.7 vs ws 184.2, and (219) 3322 vs ws 3188. A two-array PV apportionment issue (case 19 has SE + NW arrays with different overshading) — chase in §M / Appendix U monthly radiation, NOT the diverter (which is validated).
- Fabric (33) +1.0 W/K (ours 305.04 vs ws 304.04) — a single element off by exactly 1.0; floor=25.000 is suspiciously round. Walk the per-element §3 breakdown.
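The EPV back-solve in the first bullet can be written out as a small helper. This is a sketch under one stated assumption: the diverter clamp is not binding in the month, so |(63b)m| equals the pre-diverter export times 0.8×0.9. `back_solve_epv` is a hypothetical name and the example inputs are illustrative, not worksheet values.

```python
DIVERTER_FACTOR = 0.8 * 0.9  # the Appendix G4 multipliers, i.e. 0.72


def back_solve_epv(dwelling_233a_kwh: float, diverted_63b_kwh: float) -> float:
    """Recover monthly EPV from the worksheet's (233a)m and (63b)m.

    Assumes the diverter was not clamped that month, so the pre-diverter
    export is |(63b)m| / 0.72 and EPV is dwelling use plus that export.
    """
    pre_diverter_export = abs(diverted_63b_kwh) / DIVERTER_FACTOR
    return abs(dwelling_233a_kwh) + pre_diverter_export


# Illustrative month: 40 kWh used in the dwelling, 14.4 kWh diverted.
epv_example = back_solve_epv(-40.0, -14.4)
```

Comparing this back-solved series against our own monthly EPV, month by month, is one way to localise the winter shortfall to the radiation apportionment rather than the diverter.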
The eval headline is flat (42.9→43.0% <0.5; cat-7 5.25→4.93) — expected: the diverter is rare and the β/price effects are small on the rounded SAP. The value was pinning the worksheet-validated case 19 + fixing three real spec bugs that the curated cohort masked.
Headline now (1,000-cert 2026 API sample, HEAD f326e4eb)
| metric | value | was (handover baseline 9c0a373f) |
|---|---|---|
| computed | 882 / 1000 | 882 |
| % \|err\| < 0.5 | 42.9% | 41.8% |
| % < 1 / < 2 / < 5 | 56.7% / 74.6% / 90.1% | 54.9 / 71.9 / 87.8 |
| median / mean \|err\| | 0.73 / 2.04 | 0.79 / ~2.4 |
| mean signed | −0.41 | +0.2 |
Error by heating cluster (the load-bearing cut — re-run analyse_api_sap_clusters.py):
| cluster | n | mean \|err\| | %<0.5 | note |
|---|---|---|---|---|
| cat 2 gas boiler + PCDB | 639 | 1.27 | 49.6% | well-trodden |
| cat 2 gas, NO PCDB idx | 91 | 3.18 | 35.2% | non-PCDB Table-4b boilers |
| cat 6 community | 45 | 2.59 | 31.1% | known-hard |
| cat 7 electric storage | 40 | 5.25 | 10.0% | was 7.33 → S0380.227-229 |
| cat 10 electric room heaters | 48 | 5.26 | 16.7% | was 9.49 → S0380.230-231 (bias gone) |
| cat 4 HP + PCDB | 8 | 6.11 | 12.5% | small n, APM |
| Flats (any) | 282 | 2.57 | 30.5% | geometry / communal |
| real PV | 45 | 3.90 | 26.7% | Appendix M |
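A per-cluster cut like the table above can be reproduced from per-cert rows with a few lines. This is a hedged sketch, not the contents of analyse_api_sap_clusters.py: `cluster_breakdown` and the `(cluster, signed_error)` row shape are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def cluster_breakdown(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group signed SAP errors by cluster and report n, mean |err| and % |err| < 0.5."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for cluster, signed_error in rows:
        groups[cluster].append(abs(signed_error))
    return {
        cluster: {
            "n": len(errs),
            "mean_abs_err": mean(errs),
            "pct_lt_0_5": 100.0 * sum(e < 0.5 for e in errs) / len(errs),
        }
        for cluster, errs in groups.items()
    }


# Illustrative rows only (not sample data).
demo = cluster_breakdown([("cat 7", 0.2), ("cat 7", -5.0), ("cat 2", 0.1)])
```

A cert can belong to several overlapping cuts (for example a flat with real PV), so a full analysis would emit one row per cut a cert falls into rather than a single label.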
Worst individual offenders (the long tail — eval TOP 40): 2100-5421-0922-1622-3463
(−60.8, our SAP negative −24.8 vs lodged 36 — a flat, 2 bps, cat-2; the single worst, likely
a geometry/communal blow-up — START a per-cert dig here), 2958-8008 (+32, age 6=tiny),
9836-5829 (−29.5, cat-10 tail), several cat-7/cat-10 in the −20s.
Work shipped (this session — S0380.227-231 + 3 mapper commits)
| commit | what |
|---|---|
| S0380.227 | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9; TF (53) 0.54→0.60, (59) h=3→5 |
| S0380.228 | electric SECONDARY on off-peak bills at Table 12a OTHER_DIRECT_ACTING_ELECTRIC (1.00 high-frac), not 100% low |
| S0380.229 | dedicated water boiler/circulator (WHC 911-931) feeds cylinder via primary loop → Table 3 primary loss applies |
| S0380.230 | electric room heaters (cat 10) on off-peak → OTHER_DIRECT_ACTING_ELECTRIC (mirror of .228 for the MAIN). cat-10 9.49→7.11 |
| S0380.231 | Dual-meter electric room heaters → 10-hour tariff (RdSAP §12 Rule 3; codes 691-694,699). cat-10 7.11→5.26, bias +5.08→−0.86 |
| bd25a3c7 | SY system-built vs B basement: code 6 stays system-built; basement → explicit wall_is_basement/is_basement flag. system_build is a derived property (wall type). API path post-processes via addendum. (issue #1177; see docs/PR_NOTE_system_built_basement_1177.md: field-vs-property merge landmine) |
| f326e4eb | Elmhurst path now populates roof_construction (int) via _elmhurst_roof_construction_int for cross-mapper parity (API set it, Elmhurst didn't) |
- Toolkit (committed): scripts/fetch_2026_epc_sample.py, scripts/eval_api_sap_accuracy.py, scripts/analyse_api_sap_clusters.py. The 1,000 cached JSONs live in /tmp/epc_2026_sample/ (gitignored scratch; re-fetch with the sampler; EPC_SAMPLE_CACHE overrides the dir). Re-run the eval after any mapper/calculator change to watch the headline move.
What this study did
Fetched a random 1,000-cert sample of domestic EPCs lodged Jan–May 2026 from the
GOV.UK EPB register (the /api/domestic/search date-windowed endpoint to enumerate cert
numbers across random pages → /api/certificate per cert for the full schema-21 JSON), ran
each through the API path (from_api_response → cert_to_inputs → continuous SAP), and
compared to the lodged rounded energy_rating_current.
This is the first measurement of raw-API behaviour on an unbiased population — the curated golden cohort (~exact) masked it.
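The headline metrics reported in this note follow directly from the per-cert comparison. A minimal sketch, assuming the error is defined as our continuous SAP minus the lodged rounded rating (consistent with the signed values quoted for the worst offenders); `summarise` is a hypothetical helper, not the evaluator's actual code.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarise(computed_sap: Sequence[float], lodged_sap: Sequence[int]) -> Dict[str, float]:
    """Summarise continuous-vs-lodged SAP error over the computed certs."""
    if len(computed_sap) != len(lodged_sap) or not computed_sap:
        raise ValueError("need two non-empty series of equal length")
    signed = [ours - lodged for ours, lodged in zip(computed_sap, lodged_sap)]
    absolute = [abs(e) for e in signed]
    n = len(signed)
    return {
        "n": n,
        "pct_abs_lt_0_5": 100.0 * sum(e < 0.5 for e in absolute) / n,
        "median_abs": median(absolute),
        "mean_abs": mean(absolute),
        "mean_signed": mean(signed),  # positive means we over-rate on average
    }


# Illustrative values only.
demo_summary = summarise([51.34, 50.0, 36.4], [51, 52, 36])
```

Because the lodged value is rounded, an exact calculation can still show up to 0.5 of error, which is why the 0.5 bucket is the headline rather than a tighter one.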
Reproduce
- Sampler/fetcher: /tmp/sample_fetch_2026.py → caches JSONs to /tmp/epc_2026_sample/.
- Evaluator: /tmp/eval_sap_accuracy.py → per-cert CSV + summary (% <0.5, buckets, worst-40, raise breakdown). Cluster analysis: /tmp/analyze2.py. (Token in backend/.env OPEN_EPC_API_TOKEN; date_end must be < today.)
- These scripts are uncommitted (in /tmp). Worth promoting to scripts/ if this becomes a recurring measurement.
Headline (at HEAD 9c0a373f)
| metric | value |
|---|---|
| computed | 882 / 1000 (100 unsupported pre-21 schema; 18 still raise) |
| % \|err\| < 0.5 (of computed) | 41.8% |
| % < 1.0 / < 2.0 / < 5.0 | 54.9% / 71.9% / 87.8% |
| median / mean \|err\| | 0.79 / ~2.4 |
| mean signed err | +0.2 (slight over-rate) |
Accuracy is dominated by heating type (the load-bearing cut):
| main_heating_category | n | mean \|err\| | %<0.5 | status |
|---|---|---|---|---|
| 2 = gas boiler (PCDB-indexed) | 579 | 1.30 | 48% | the well-trodden path |
| 7 = electric storage heaters | 39 | 7.33 | 3% | broken — #1 lever |
| 10 = electric room heaters | 43 | 10.26 | 9% | broken — #2 lever |
| 6 = community scheme | 38 | 2.28 | 34% | known-hard |
| Flats (any heating) | 242 | 3.19 | 29% | geometry + communal |
Work shipped this session (S0380.219–225)
Coverage unblocked 788 → 882 computed (+94); one real accuracy bug fixed (+22 certs).
| slice | fix | certs |
|---|---|---|
| S0380.219 | floor_construction 3 → "Suspended, not timber" (RdSAP 10 field 3-1) | ~44 |
| S0380.220 | floor_construction 0 → None (Table 19 unknown; proven inert) | 37 |
| S0380.221 | default missing post_town (unused metadata) | 1 |
| S0380.222 | roof_construction 6 (thatched) + 7 (dwelling above) → None (inert) | 5 |
| S0380.223 | _part_geometry early-return key contract (RR KeyError) | 5 |
| S0380.224 | loose-jacket cylinder storage loss (Table 2 Note 1) — was None'd out → zero loss | 22 (mean err +2.29 → +0.45) |
| S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |
| S0380.226 | Elmhurst "Jacket" cylinder insulation → loose-jacket code 2 (Summary path) | (unblocked case 19) |
Headline at HEAD: 882 / 1000 computed, 41.8% < 0.5 (re-run the eval to refresh).
★ Active worksheet: simulated case 19 — the electric-storage-heater debug
The user generated sap worksheets/golden fixture debugging/simulated case 19/
(Summary_001431 (2).pdf + P960-0001-001431 - 2026-06-04T174437.228.pdf), purpose-built to
hit the #1 cluster. It exercises electric storage heaters (SAP code 402, control 2402
auto-charge, 7-hr off-peak tariff) + a loose-jacket 210 L cylinder + WHS 911 (gas
boiler for water only) + room-in-roof gables (Party + Exposed) + an alternative wall +
exposed floor + electric secondary.
S0380.226 unblocked extraction (the "Jacket" label was raising). The worksheet has FOUR blocks: block 1 = rating (UK-avg region 0; cost (255)=1816.58, SAP (258)=51, TF (53)=0.60, (51)=0.0330), block 2 = demand (postcode; CO2 (272)=3125.85, PE (286)=30271.76), blocks 3/4 = the potential/improved variants. Pin the rating block for SAP/cost, the demand block for PE/CO2. Worksheet header line 116 lodges "Separate Time Control: No" (NOT in the Summary §15 PDF — only in the P960 header).
Three slices shipped (S0380.227–229) — closed the +9 cluster signature; SAP cont 60.2 → 50.33 (worksheet ~51.22):
| slice | line ref | fix | SAP cont |
|---|---|---|---|
| S0380.227 | TF (53) 0.54→0.60; (59) h=3→h=5 | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9 (RdSAP 10 §10.5.1). _separately_timed_dhw gated on WHC ∈ {901,902,914}. Worksheet-pins S0380.224's loose-jacket (51)=0.0330/(53)=0.60/(55)=3.4531/(56-57)Jan=107.0456 at 1e-4. | 60.2→60.1 |
| S0380.228 | cost (255) | electric SECONDARY on off-peak bills at Table 12a OTHER_DIRECT_ACTING_ELECTRIC (7-hr high-frac 1.00 = £0.1529), not the flat off-peak low (£0.0550). Worksheet (242): "1.00×15.29 + 0.00×5.50". THE primary cost driver (−340). | 60.1→50.67 |
| S0380.229 | (62) 2493.30→3169.98 | dedicated water-heating boiler/circulator (WHC 911-931) feeds the cylinder via a primary loop → Table 3 row 1 primary loss applies (keyed off water_heating_code, since _water_heating_main returns the electric SPACE main). Restored the missing (59)=676.68 kWh/yr. | 50.67→50.33 |
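The S0380.227 gating can be illustrated with a short sketch. It assumes the Table 2b ×0.9 multiplier is the only thing distinguishing the two temperature-factor values quoted (0.60 × 0.9 = 0.54); `temperature_factor` is a hypothetical function, not the calculator's `_separately_timed_dhw` implementation.

```python
SEPARATELY_TIMED_WHC = frozenset({901, 902, 914})  # codes named in the S0380.227 row
TABLE_2B_TIMED_MULTIPLIER = 0.9


def temperature_factor(base_factor: float, water_heating_code: int) -> float:
    """Apply the Table 2b x0.9 only when the DHW is treated as separately timed."""
    if water_heating_code in SEPARATELY_TIMED_WHC:
        return base_factor * TABLE_2B_TIMED_MULTIPLIER
    return base_factor


# Case 19 lodges WHS 911 (a dedicated DHW-only system): no multiplier.
tf_case_19 = temperature_factor(0.60, 911)

# A code inside the gated set would take the multiplier.
tf_timed = temperature_factor(0.60, 901)
```

Keying the gate on an explicit code set keeps the pre-fix behaviour (0.54) reachable for the codes that really are separately timed while leaving every other code at the worksheet's 0.60.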
The ONE remaining case-19 cause — the PV diverter (63b) — is S0380.232. Worksheet
header line 124 "Diverter = Yes"; Summary §19 "Diverter present: Yes". Per SAP 10.2 Appendix
G4 (PDF p.72-73) surplus PV is diverted to the cylinder immersion:
S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9, clamped to ≤ (62)m + (63a)m, entered as a
NEGATIVE (63b)m. (64)m = (62)m + (63a)m + (63b)m + … → (219)m = (64)m / eff. All four G4
inclusion conditions are met (PV connected to dwelling; cylinder 210 L > (43)=74.24; no solar
HW; no battery). Worksheet (63b) annual ≈ −1097.67 kWh → (64) drops 3169.98 → 2072.31, (219)
4876.9 → 3188.17. It ALSO changes the PV β-split (export drops: worksheet dwelling 1280.39 /
exported 184.16 vs our 1496.20 / 1187.98 with no diverter). This is a 3-layer feature
(extractor Diverter present → mapper flag → calculator (63b) + β-split interaction) —
implement as one focused slice. Spec note l.5485: for the β calc, (219)m must EXCLUDE the
diverter saving.
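The Appendix G4 arithmetic described above can be sketched as a standalone function. This is an illustration of the formula as stated in this note, not the shipped `_pv_diverter_monthly_kwh`: the name `sketch_diverter_63b` and its argument layout are assumptions, and the β values passed in are taken to be the pre-diverter split.

```python
from typing import List, Sequence

G4_FACTOR = 0.8 * 0.9  # the two Appendix G4 multipliers quoted above


def sketch_diverter_63b(
    epv: Sequence[float],       # EPV,m: monthly PV generation, kWh
    beta: Sequence[float],      # beta_m: pre-diverter fraction of PV used in the dwelling
    line_62: Sequence[float],   # (62)m, kWh
    line_63a: Sequence[float],  # (63a)m, kWh
) -> List[float]:
    """Return monthly (63b)m values, entered as negatives."""
    if not (len(epv) == len(beta) == len(line_62) == len(line_63a)):
        raise ValueError("all monthly series must have the same length")
    result = []
    for e, b, d62, d63a in zip(epv, beta, line_62, line_63a):
        surplus = e * (1.0 - b) * G4_FACTOR
        diverted = min(surplus, d62 + d63a)  # clamp to (62)m + (63a)m
        result.append(-diverted)
    return result


# Illustrative months: the first is unclamped, the second hits the clamp.
demo_63b = sketch_diverter_63b([100.0, 1000.0], [0.5, 0.0], [200.0, 150.0], [0.0, 0.0])
```

The first month diverts 100 × 0.5 × 0.72 = 36 kWh; the second would divert 720 kWh but is capped at the 150 kWh of hot-water demand. The sketch covers only the formula, not the four inclusion conditions.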
Smaller residuals after the diverter lands: main fuel (211) ours 20250.22 vs ws 19910.30 (+340), secondary (215) 3573.57 vs 3513.58 (+60), fabric (33) +1.0 (gable/alt-wall). Current demand block: CO2 (272) 3331.04 vs 3125.85, PE (286) 31653.23 vs 30271.76 — both will drop with the diverter (less grid import).
Debug recipe (reuse the /tmp/case19*.py throwaways or rebuild):
pages → ElmhurstSiteNotesExtractor(...).extract() → from_elmhurst_site_notes
→ cert_to_inputs / cert_to_demand_inputs → calculate_sap_from_inputs
# CI._cylinder_storage_loss_override(epc, main) → (57)m; CI._primary_loss_override(epc, age) → (59)m
# CI._water_heating_worksheet_and_gains(epc=…, water_efficiency_pct=0.65, is_instantaneous=False,
# primary_age=<band>, pcdb_record=None) → wh_result with (45)/(46)/(57)/(59)/(62)/(64)
Remaining work, prioritised
A. Accuracy clusters (highest value)
- PV diverter (S0380.232) — closes case 19 to 1e-4 AND helps the real-PV cluster (45 certs, mean 3.90). Fully spec'd in the case-19 section above (Appendix G4). Has a worksheet → 1e-4 bar. Do this first: it's the one open cause on a validated worksheet.
- Electric storage heaters (cat 7, 40 certs, mean 5.25). S0380.227-229 took it 7.33→5.25; the case-19 PV diverter will help further. Beyond that the tail is per-cert — a dedicated cat-7 worksheet (no PV, no diverter) would let you pin charge-control / responsiveness at 1e-4 instead of the ±0.5 lodged fallback.
- Electric room heaters (cat 10, 48 certs, mean 5.26). S0380.230-231 fixed the systematic tariff bias (mean 9.49→5.26, signed +5.08→−0.86); the residual is now scattered per-cert (e.g. 9836-5829 −29.5, an under-rater). A cat-10 worksheet pins the tail at 1e-4.
- Non-PCDB gas boilers (cat 2, no idx, 91 certs, mean 3.18) and Flats (282, mean 2.57): the next volume levers once the electric clusters are worksheet-pinned. Flats = geometry / communal; start with the worst (2100-5421 negative SAP).
- 2100-5421-0922-1622-3463 diagnosed (S0380.234 session): NOT a flat. It is property_type 0, a 352 m² 2-storey uninsulated solid-wall dwelling (wall_constr 3 / wall_ins 4 as-built; roof_type 4, no roof insulation). Our space-heating demand is 71,084 kWh/yr → (37)=995.93 W/K → SAP −24.8 (lodged 36), cost £14,045. This is the as-built insulated-assumed U-value front (project_as_built_insulated_assumed_bug; S0380.209 fixed walls, "roof next"): the uninsulated-roof / as-built U over-estimates demand on big old dwellings. API-only (no worksheet → ±0.5 lodged fallback only); needs a generated worksheet or a roof-U spec audit to pin. It is one outlier, not a cluster-wide flats bug.
B. Remaining raises (16 certs — all U-value / heat-loss-sensitive, NOT enum guesses)
- gable_wall_type 2 & 3 (14 certs). RdSAP 10 Table 4 RR walls: 0=Party (U=0.25), 1=Exposed (U=common wall), 2/3 = Sheltered (U=external×R0.5) + Adjacent-to-heated (U=0), code↔type order unconfirmed (schema says "not yet seen"). Needs (i) a worksheet to pin which code is which + the U-values, and (ii) calculator support: the cascade only has gable_wall / gable_wall_external kinds; Sheltered (R=0.5) and Adjacent (U=0) are new. Best real example: 2818-3053-3203-2655-9204 lodges BOTH gable 2 and 3.
- main_heating_category: 9 = warm air, mains gas (1 cert). Needs §9 warm-air dispatch.
- wall_insulation_thermal_conductivity 3 (1 cert). Verified it shifts wall U (53.96→51.61 across λ) → worksheet-backed (the resolver's own discipline).
- floor_heat_loss 8 (2 certs). Semantically unconfirmed; inert for the 2 observed (non-Main bp) but potentially "heated space below" (→ should exclude the floor, a calculator change). Don't guess.
The clean mapper-enum raises are exhausted — every remaining raise changes the answer, which is what the strict-raise guard exists to prevent.
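The strict-raise discipline can be pictured with a small sketch. It is an assumption-laden illustration, not the mapper's code: `UnmappedEnumError` and `map_floor_construction` are hypothetical names, and the table holds only the two floor_construction codes this note confirms (3 and 0).

```python
from typing import Dict, Optional


class UnmappedEnumError(ValueError):
    """Raised when a lodged code has no confirmed meaning."""


# Only codes confirmed in this note; every other code must raise.
_FLOOR_CONSTRUCTION: Dict[int, Optional[str]] = {
    0: None,                     # Table 19 unknown, proven inert (S0380.220)
    3: "Suspended, not timber",  # RdSAP 10 field 3-1 (S0380.219)
}


def map_floor_construction(code: int) -> Optional[str]:
    """Map a lodged code, refusing to guess at unconfirmed values."""
    if code not in _FLOOR_CONSTRUCTION:
        raise UnmappedEnumError(
            f"floor_construction {code} is unconfirmed; pin it with a worksheet first"
        )
    return _FLOOR_CONSTRUCTION[code]
```

The membership check (rather than `dict.get`) matters here because `None` is itself a legitimate mapped value, so a missing key and a deliberately inert code must stay distinguishable.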
★ Additional worksheets that would help most (the user will generate these on request)
The two electric clusters are now systematic-bias-free (S0380.227-231) but their TAILS sit at the ±0.5-vs-lodged fallback bar because no worksheet validates them at 1e-4. The three highest-value worksheets to ask the user for:
- An electric ROOM-heater dwelling (SAP code ~691, control 2601/2602/2603, Dual meter) — pins the cat-10 tail (48 certs, mean 5.26) at 1e-4. Make it PV-free + cylinder-free to isolate the space-heat path from the diverter/HW.
- An electric STORAGE-heater dwelling distinct from case 19 (no PV, no WHS-911) — pins the cat-7 tail (40 certs, mean 5.25): charge control (2401/2402), 7-hr vs 24-hr, responsiveness.
- A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable (Table 4 types beyond Party/Exposed): closes the gable_wall_type 2/3 raise (14 certs) and pins the Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.
Per worksheet send BOTH the Summary PDF (input) and the P960/dr87 worksheet PDF (the
(1)..(286) ground truth). Drop them in sap worksheets/golden fixture debugging/<name>/ and
run the case-19 debug recipe.
The original "design one property" guidance (kept below for reference) is what case 19 was built from.
What to generate — the single most productive worksheet (reference)
Heating is one-per-property, so one worksheet can't cover all four broken heating types. But fabric is independent of heating, so the highest-ROI single artifact bundles the #1 accuracy cluster with the fabric that closes the gable raises and pins the loose-jacket fix.
Build (in Elmhurst, a simulated case is fine — same as the existing simulated case N
worksheets) ONE property:
A house heated by ELECTRIC STORAGE HEATERS, with a room-in-roof and a hot-water cylinder:
- Heating: electric storage heaters (off-peak / Economy-7 tariff), with a clear control type. This is the load-bearing choice — it validates the 39-cert cat-7 cluster.
- Hot water: a cylinder with a loose-jacket insulation (not factory foam), a stated jacket thickness, and a cylinder thermostat. Pins S0380.224's loose-jacket storage loss (56)m at 1e-4 — currently only direction-validated.
- Room-in-roof with two gable walls of different types — ideally one "Sheltered" and one "Adjacent to another heated space" (plus, if the tool allows, a Party and an Exposed gable). Gives the Table 4 U-values for gable_wall_type 2 & 3 and disambiguates the code order — closes the 14-cert raise.
- An extension (2nd building part) with a different floor exposure (e.g. over unheated space or "to external air"). Exercises multi-bp geometry + floor-exposure handling.
From that single worksheet I can pin, at 1e-4: the electric-storage space-heating lines ((210)/(211)/space-heat), the loose-jacket storage loss (56)m, the RR gable U-values (30)/(32), and the multi-bp fabric (27)–(37). That's one cluster + one fix-validation + the biggest raise + fabric, all in one document.
If you'd rather do two: add a second worksheet that is identical but with electric room heaters instead of storage heaters — together they cover cat 7 + cat 10 (≈ 82 certs, the two worst clusters). A third for a community-heating flat would cover cat 6 + the flat geometry cluster.
Then send me, per worksheet
The Summary PDF (the Elmhurst input/site-notes) + the worksheet PDF (the (1)..(286)
ground truth). With those I run both front-ends through the cascade and pin each line ref at
1e-4, exactly as for the with api 3 pair (S0380.218).
Conventions (unchanged)
One cause = one slice = one commit; spec citation (page+line) in the message; AAA tests
(# Arrange / # Act / # Assert); abs(x - y) <= tol (not pytest.approx); SAP 10.2 only; no
tolerance widening / xfail / rel-tol. New code passes pyright strict with ZERO NEW errors
(baseline-compare with git stash; mapper.py / cert_to_inputs.py / heat_transmission.py carry
pre-existing errors — compare counts). Stage files by name (the tree has unrelated
pytest.ini/scripts/ changes that must NOT be staged).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>.
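For reference, a test in the shape these conventions describe might look like the sketch below. The function under test, `diverted_kwh`, is a hypothetical stand-in written for this example; only the AAA layout and the explicit absolute-tolerance assertion are taken from the conventions.

```python
def diverted_kwh(export_kwh: float) -> float:
    """Hypothetical function under test: applies the Appendix G4 0.8 x 0.9 factors."""
    return export_kwh * 0.8 * 0.9


def test_diverted_kwh_applies_g4_factors() -> None:
    # Arrange
    export_kwh = 100.0
    expected = 72.0
    tol = 1e-4

    # Act
    actual = diverted_kwh(export_kwh)

    # Assert
    assert abs(actual - expected) <= tol  # explicit absolute tolerance, not pytest.approx
```

Writing the tolerance as a plain comparison keeps the 1e-4 bar visible in the test itself and avoids the relative tolerance that `pytest.approx` applies by default.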