diff --git a/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md b/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
index 7a7094c3..afbc4b6f 100644
--- a/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
+++ b/domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
@@ -4,10 +4,15 @@ Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodolog
 1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
 
 - **Branch:** `feature/per-cert-mapper-validation`
-- **HEAD:** `9c0a373f` (S0380.225). Next slice: **S0380.226**.
+- **HEAD:** `c236aa58` (S0380.226). Next slice: **S0380.227**.
 - **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
-  → green (2395 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
+  → green (2397 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
   (stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).
+- **Toolkit (committed):** `scripts/fetch_2026_epc_sample.py`,
+  `scripts/eval_api_sap_accuracy.py`, `scripts/analyse_api_sap_clusters.py`. The 1,000 cached
+  JSONs live in `/tmp/epc_2026_sample/` (gitignored scratch — re-fetch with the sampler;
+  `EPC_SAMPLE_CACHE` overrides the dir). Re-run the eval after any mapper/calculator change to
+  watch the headline move.
 
 ---
 
@@ -67,10 +72,45 @@ Coverage unblocked **788 → 882 computed (+94)**; one real accuracy bug fixed (
 | S0380.223 | `_part_geometry` early-return key contract (RR KeyError) | 5 |
 | **S0380.224** | **loose-jacket cylinder storage loss (Table 2 Note 1)** — was None'd out → zero loss | **22** (mean err +2.29 → +0.45) |
 | S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |
+| S0380.226 | Elmhurst "Jacket" cylinder insulation → loose-jacket code 2 (Summary path) | (unblocked case 19) |
 
-**S0380.224 is only DIRECTION-validated** (the 22 certs moved toward lodged + §4/golden stayed
-green) — it has **no worksheet pin on the loose-jacket magnitude**. A worksheet with a
-loose-jacket cylinder would close that (see "What to generate" below).
+Headline at HEAD: **882 / 1000 computed, 41.8% < 0.5** (re-run the eval to refresh).
+
+---
+
+## ★ Active worksheet: simulated case 19 — the electric-storage-heater debug
+
+The user generated `sap worksheets/golden fixture debugging/simulated case 19/`
+(`Summary_001431 (2).pdf` + `P960-0001-001431 - 2026-06-04T174437.228.pdf`), purpose-built to
+hit the #1 cluster. It exercises **electric storage heaters** (SAP code 402, control 2402
+auto-charge, 7-hr off-peak tariff) + a **loose-jacket 210 L cylinder** + **WHS 911** (gas
+boiler for water only) + **room-in-roof gables (Party + Exposed) + an alternative wall +
+exposed floor + electric secondary**.
+
+**S0380.226 unblocked extraction** (the "Jacket" label was raising). Running the Summary path
+through the cascade vs the worksheet (rating block) then exposes the cat-7 cluster bugs — our
+**SAP cont 60.2 vs worksheet ~51.2 (+9, the cluster signature)**:
+
+| line ref | ours | worksheet | gap / cause |
+|---|---|---|---|
+| **cost (255)** | 1482.12 | **1816.58** | **−334 → the primary +9 SAP driver.** Likely the Economy-7 off-peak tariff cost split (SAP 10.2 Table 12a / §10c high-rate vs low-rate). START HERE. |
+| Table 2b TF (53) | 0.54 | **0.60** | we apply the ×0.9 separately-timed multiplier, but the Summary lodges "Separate Time Control: No" → should be 0.60. Check `_table_2b_note_b_multiplier_applies` / the override's `separately_timed_dhw` for a WHS-911 + storage-heater dwelling. |
+| HW fuel (219) | ~3642 | 3188.17 | +454 HW over. Tied to the TF bug + how WHS-911 routes storage vs combi loss (the WaterHeating result showed a spurious `combi_loss` and `solar_storage=0` via the section helper — verify against the FULL cascade, the section helper may not mirror it). |
+| fabric (33) | 305.04 | 304.04 | +1.0 W/K — minor; gable / alternative-wall rounding. Low priority. |
+| CO2 (272) | 3124.67 | 3125.85 | ≈ exact. |
+
+**Debug recipe** (reuse the `/tmp/case19*.py` throwaways or rebuild):
+```python
+pages → ElmhurstSiteNotesExtractor(...).extract() → from_elmhurst_site_notes
+→ cert_to_inputs / cert_to_demand_inputs → calculate_sap_from_inputs
+# section helpers: water_heating_section_from_cert / heat_transmission_section_from_cert
+# CI._cylinder_storage_loss_override(epc, main) returns (57)m directly — useful to bisect.
+```
+The worksheet's rating block is block 1 (UK-avg, region 0); the demand block (postcode) is
+block 2. Pin the rating block for SAP/cost, the demand block for PE/CO2.
+
+S0380.224's loose-jacket magnitude can be **worksheet-pinned at 1e-4 here** once the TF bug is
+fixed (worksheet (51)=0.0330, (52)=0.8298, (53)=0.6000, (55)=3.4531, (56)Jan=107.0456).
 
 ---
 
@@ -103,7 +143,20 @@ is what the strict-raise guard exists to prevent.
 
 ---
 
-## ★ What to generate — the single most productive worksheet
+## ★ Additional worksheets that would help most
+
+Case 19 (above) already covers electric storage heaters + loose-jacket cylinder + RR. The two
+that would add the most NEW coverage:
+1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602) — the **cat-10
+   cluster (43 certs, worst by mean error 10.26)**, which case 19 does not touch.
+2. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types
+   beyond Party/Exposed) — closes the `gable_wall_type` 2/3 raise (14 certs) and pins the
+   Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.
+
+The original "design one property" guidance (kept below for reference) is what case 19 was
+built from.
+
+## What to generate — the single most productive worksheet (reference)
 
 Heating is one-per-property, so one worksheet can't cover all four broken heating types. But
 **fabric is independent of heating**, so the highest-ROI single artifact bundles the #1