diff --git a/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md b/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
index 2aa3ff68..b47ff6ed 100644
--- a/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
+++ b/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
@@ -4,10 +4,50 @@ Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodolog
 1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
 
 - **Branch:** `feature/per-cert-mapper-validation`
-- **HEAD:** `0f6b4023` (S0380.229). Next slice: **S0380.230**.
+- **HEAD:** `f326e4eb`. Next SAP slice: **S0380.232**.
 - **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
-  → green (2397 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
+  → green (2407 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
   (stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).
+
+## Headline now (1,000-cert 2026 API sample, HEAD `f326e4eb`)
+
+| metric | value | was (handover baseline `9c0a373f`) |
+|---|---|---|
+| computed | 882 / 1000 | 882 |
+| **% \|err\| < 0.5** | **42.9%** | 41.8% |
+| % < 1 / < 2 / < 5 | 56.7% / 74.6% / 90.1% | 54.9 / 71.9 / 87.8 |
+| median / mean \|err\| | 0.73 / **2.04** | 0.79 / ~2.4 |
+| mean signed | −0.41 | +0.2 |
+
+**Error by heating cluster** (the load-bearing cut — re-run `analyse_api_sap_clusters.py`):
+
+| cluster | n | mean \|err\| | %<0.5 | note |
+|---|---|---|---|---|
+| cat 2 gas boiler + PCDB | 639 | 1.27 | 49.6% | well-trodden |
+| cat 2 gas, NO PCDB idx | 91 | 3.18 | 35.2% | non-PCDB Table-4b boilers |
+| cat 6 community | 45 | 2.59 | 31.1% | known-hard |
+| cat 7 electric storage | 40 | **5.25** | 10.0% | was 7.33 → S0380.227-229 |
+| cat 10 electric room heaters | 48 | **5.26** | 16.7% | was 9.49 → S0380.230-231 (bias gone) |
+| cat 4 HP + PCDB | 8 | 6.11 | 12.5% | small n, APM |
+| Flats (any) | 282 | 2.57 | 30.5% | geometry / communal |
+| real PV | 45 | 3.90 | 26.7% | Appendix M |
+
+**Worst individual offenders** (the long tail — `eval` TOP 40): `2100-5421-0922-1622-3463`
+(−60.8, our SAP **negative** −24.8 vs lodged 36 — a flat, 2 bps, cat-2; the single worst, likely
+a geometry/communal blow-up — START a per-cert dig here), `2958-8008` (+32, age 6=tiny),
+`9836-5829` (−29.5, cat-10 tail), several cat-7/cat-10 in the −20s.
+
+## Work shipped (this session — S0380.227-231 + 3 mapper commits)
+
+| commit | what |
+|---|---|
+| **S0380.227** | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9; TF (53) 0.54→0.60, (59) h=3→5 |
+| **S0380.228** | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (1.00 high-frac), not 100% low |
+| **S0380.229** | dedicated water boiler/circulator (WHC 911-931) feeds cylinder via primary loop → Table 3 primary loss applies |
+| **S0380.230** | electric room heaters (cat 10) on off-peak → `OTHER_DIRECT_ACTING_ELECTRIC` (mirror of .228 for the MAIN). cat-10 9.49→7.11 |
+| **S0380.231** | Dual-meter electric room heaters → 10-hour tariff (RdSAP §12 Rule 3; codes 691-694,699). cat-10 7.11→5.26, bias +5.08→−0.86 |
+| `bd25a3c7` | SY system-built vs B basement: code 6 stays system-built; basement → explicit `wall_is_basement`/`is_basement` flag. `system_build` is a derived property (wall type). API path post-processes via addendum. (issue #1177 — see `docs/PR_NOTE_system_built_basement_1177.md`: field-vs-property merge landmine) |
+| `f326e4eb` | Elmhurst path now populates `roof_construction` (int) via `_elmhurst_roof_construction_int` for cross-mapper parity (API set it, Elmhurst didn't) |
 - **Toolkit (committed):** `scripts/fetch_2026_epc_sample.py`, `scripts/eval_api_sap_accuracy.py`,
   `scripts/analyse_api_sap_clusters.py`. The 1,000 cached JSONs live in
   `/tmp/epc_2026_sample/` (gitignored scratch — re-fetch with the sampler;
@@ -103,7 +143,7 @@ PDF — only in the P960 header).
 | **S0380.228** | cost (255) | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (7-hr high-frac **1.00** = £0.1529), not the flat off-peak low (£0.0550). Worksheet (242): "1.00*15.29 + 0.00*5.50". THE primary cost driver (−340). | 60.1→**50.67** |
 | **S0380.229** | (62) 2493.30→**3169.98** | dedicated water-heating boiler/circulator (WHC 911-931) feeds the cylinder via a primary loop → Table 3 row 1 primary loss applies (keyed off `water_heating_code`, since `_water_heating_main` returns the electric SPACE main). Restored the missing (59)=676.68 kWh/yr. | 50.67→50.33 |
 
-**The ONE remaining case-19 cause — the PV diverter (63b) — is S0380.230 (next).** Worksheet
+**The ONE remaining case-19 cause — the PV diverter (63b) — is S0380.232.** Worksheet
 header line 124 "Diverter = Yes"; Summary §19 "Diverter present: Yes". Per **SAP 10.2
 Appendix G4 (PDF p.72-73)** surplus PV is diverted to the cylinder immersion:
 `S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9`, clamped to ≤ (62)m + (63a)m, entered as a
@@ -134,15 +174,22 @@ pages → ElmhurstSiteNotesExtractor(...).extract() → from_elmhurst_site_notes
 
 ## Remaining work, prioritised
 
-### A. Accuracy clusters (highest value — 80+ certs, mean err 7–10)
-1. **Electric storage heaters (cat 7, 39 certs).** Distinct cascade — off-peak tariff split,
-   charge control (2401/2402), 7-hr/24-hr charge, Table 4a efficiency, responsiveness. **No
-   worksheet currently validates this path.** Errs both directions (−27..+16).
-2. **Electric room heaters (cat 10, 43 certs).** Likewise (controls 2601/2602/2603). Worst
-   cluster by mean (10.26).
-3. **Flats (242, 29% <0.5)** and **PV (40, 28%)** — secondary.
+### A. Accuracy clusters (highest value)
+1. **PV diverter (S0380.232)** — closes case 19 to 1e-4 AND helps the real-PV cluster (45 certs,
+   mean 3.90). Fully spec'd in the case-19 section above (Appendix G4). **Has a worksheet** →
+   1e-4 bar. Do this first: it's the one open cause on a validated worksheet.
+2. **Electric storage heaters (cat 7, 40 certs, mean 5.25).** S0380.227-229 took it 7.33→5.25;
+   the case-19 PV diverter will help further. Beyond that the tail is per-cert — a **dedicated
+   cat-7 worksheet** (no PV, no diverter) would let you pin charge-control / responsiveness at
+   1e-4 instead of the ±0.5 lodged fallback.
+3. **Electric room heaters (cat 10, 48 certs, mean 5.26).** S0380.230-231 fixed the systematic
+   tariff bias (mean 9.49→5.26, signed +5.08→−0.86); the residual is now scattered per-cert
+   (e.g. `9836-5829` −29.5, an under-rater). A **cat-10 worksheet** pins the tail at 1e-4.
+4. **Non-PCDB gas boilers (cat 2, no idx, 91 certs, mean 3.18)** and **Flats (282, mean 2.57)** —
+   the next volume levers once the electric clusters are worksheet-pinned. Flats = geometry /
+   communal; start with the worst (`2100-5421` negative SAP).
 
-### B. Remaining raises (18 certs — all U-value / heat-loss-sensitive, NOT enum guesses)
+### B. Remaining raises (16 certs — all U-value / heat-loss-sensitive, NOT enum guesses)
 - **`gable_wall_type` 2 & 3 (14 certs).** RdSAP 10 **Table 4** RR walls: 0=Party (U=0.25),
   1=Exposed (U=common wall), 2/3 = **Sheltered (U=external×R0.5)** + **Adjacent-to-heated
   (U=0)**, code↔type order unconfirmed (schema says "not yet seen"). Needs (i) a worksheet to
@@ -161,16 +208,24 @@ is what the strict-raise guard exists to prevent.
 
 ---
 
-## ★ Additional worksheets that would help most
+## ★ Additional worksheets that would help most (the user will generate these on request)
 
-Case 19 (above) already covers electric storage heaters + loose-jacket cylinder + RR. The two
-that would add the most NEW coverage:
-1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602) — the **cat-10
-   cluster (43 certs, worst by mean error 10.26)**, which case 19 does not touch.
-2. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types
+The two electric clusters are now systematic-bias-free (S0380.227-231) but their TAILS sit at
+the ±0.5-vs-lodged fallback bar because **no worksheet validates them at 1e-4**. The three
+highest-value worksheets to ask the user for:
+1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602/2603, Dual meter) —
+   pins the cat-10 tail (48 certs, mean 5.26) at 1e-4. Make it PV-free + cylinder-free to
+   isolate the space-heat path from the diverter/HW.
+2. **An electric STORAGE-heater dwelling distinct from case 19** (no PV, no WHS-911) — pins the
+   cat-7 tail (40 certs, mean 5.25): charge control (2401/2402), 7-hr vs 24-hr, responsiveness.
+3. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types
    beyond Party/Exposed) — closes the `gable_wall_type` 2/3 raise (14 certs) and pins the
    Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.
 
+Per worksheet send BOTH the **Summary PDF** (input) and the **P960/dr87 worksheet PDF** (the
+`(1)..(286)` ground truth). Drop them in `sap worksheets/golden fixture debugging//` and
+run the case-19 debug recipe.
+
 The original "design one property" guidance (kept below for reference) is what case 19 was
 built from.
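
Review note on S0380.232: the Appendix G4 diverter expression quoted in the case-19 hunk (`S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9`, clamped to ≤ (62)m + (63a)m) can be illustrated with a minimal sketch. It is not the calculator's implementation. The function name `pv_diverter_monthly` and its parameter names are hypothetical, the inputs are assumed to be per-month kWh sequences, and how the result is entered on the worksheet is left out because the quoted sentence is cut off at the hunk boundary.

```python
from typing import List, Sequence


def pv_diverter_monthly(
    e_pv: Sequence[float],      # EPV,m: monthly PV generation (assumed kWh)
    beta: Sequence[float],      # βm: monthly factor from the quoted formula
    line_62: Sequence[float],   # worksheet (62)m, taken as given
    line_63a: Sequence[float],  # worksheet (63a)m, taken as given
) -> List[float]:
    """Hypothetical helper: S_PV,diverter,m with the upper clamp from the note."""
    if not (len(e_pv) == len(beta) == len(line_62) == len(line_63a)):
        raise ValueError("all monthly sequences must have the same length")

    diverted = []
    for e, b, l62, l63a in zip(e_pv, beta, line_62, line_63a):
        # Unclamped surplus: EPV,m × (1 − βm) × 0.8 × 0.9
        raw = e * (1.0 - b) * 0.8 * 0.9
        # Upper clamp quoted in the note: ≤ (62)m + (63a)m
        cap = l62 + l63a
        diverted.append(min(raw, cap))
    return diverted
```

Only the upper clamp stated in the note is applied; any lower bound or sign convention for (63b) would need to be confirmed against the worksheet before S0380.232 relies on it.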