docs: handover — fold in S0380.224-226 + simulated case 19 debug state

Bump HEAD/next-slice/baseline, note the committed scripts toolkit, and add
the active "simulated case 19" section: the electric-storage-heater +
loose-jacket worksheet the user generated, what S0380.226 unblocked, and
the prioritised cluster bugs it exposed (cost (255) -334 = the +9 SAP
driver; Table 2b TF x0.9; WHS-911 storage-vs-combi routing; fabric +1.0).
Updated the "what to generate" ask to the two highest-value follow-ups
(electric room heaters; Sheltered/Adjacent RR gables).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-04 17:14:05 +00:00
parent c236aa5836
commit 796dce9d69


@ -4,10 +4,15 @@ Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodolog
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `9c0a373f` (S0380.225). Next slice: **S0380.226**.
- **HEAD:** `c236aa58` (S0380.226). Next slice: **S0380.227**.
- **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
→ green (2395 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
→ green (2397 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
(stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).
- **Toolkit (committed):** `scripts/fetch_2026_epc_sample.py`,
`scripts/eval_api_sap_accuracy.py`, `scripts/analyse_api_sap_clusters.py`. The 1,000 cached
JSONs live in `/tmp/epc_2026_sample/` (gitignored scratch — re-fetch with the sampler;
`EPC_SAMPLE_CACHE` overrides the dir). Re-run the eval after any mapper/calculator change to
watch the headline move.
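A minimal sketch of how the cache directory could be resolved, assuming the scripts read `EPC_SAMPLE_CACHE` from the environment and fall back to `/tmp/epc_2026_sample/`. The helper names `resolve_sample_cache` and `cached_json_files` are hypothetical; the committed scripts may do this differently.

```python
import os
from pathlib import Path

# Default scratch dir named in this note (gitignored, safe to delete and re-fetch).
DEFAULT_CACHE = Path("/tmp/epc_2026_sample")


def resolve_sample_cache(env=None) -> Path:
    """Return EPC_SAMPLE_CACHE if it is set and non-empty, else the default dir."""
    env = os.environ if env is None else env
    override = env.get("EPC_SAMPLE_CACHE")
    return Path(override) if override else DEFAULT_CACHE


def cached_json_files(cache_dir: Path) -> list[Path]:
    """List cached certificate JSONs; an absent dir simply yields an empty list."""
    return sorted(cache_dir.glob("*.json")) if cache_dir.is_dir() else []
```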
---
@ -67,10 +72,45 @@ Coverage unblocked **788 → 882 computed (+94)**; one real accuracy bug fixed (
| S0380.223 | `_part_geometry` early-return key contract (RR KeyError) | 5 |
| **S0380.224** | **loose-jacket cylinder storage loss (Table 2 Note 1)** — was None'd out → zero loss | **22** (mean err +2.29 → +0.45) |
| S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |
| S0380.226 | Elmhurst "Jacket" cylinder insulation → loose-jacket code 2 (Summary path) | (unblocked case 19) |
**S0380.224 is only DIRECTION-validated** (the 22 certs moved toward lodged + §4/golden stayed
green) — it has **no worksheet pin on the loose-jacket magnitude**. A worksheet with a
loose-jacket cylinder would close that (see "What to generate" below).
Headline at HEAD: **882 / 1000 computed, 41.8% < 0.5** (re-run the eval to refresh).
---
## ★ Active worksheet: simulated case 19 — the electric-storage-heater debug
The user generated `sap worksheets/golden fixture debugging/simulated case 19/`
(`Summary_001431 (2).pdf` + `P960-0001-001431 - 2026-06-04T174437.228.pdf`), purpose-built to
hit the #1 cluster. It exercises **electric storage heaters** (SAP code 402, control 2402
auto-charge, 7-hr off-peak tariff) + a **loose-jacket 210 L cylinder** + **WHS 911** (gas
boiler for water only) + **room-in-roof gables (Party + Exposed) + an alternative wall +
exposed floor + electric secondary**.
**S0380.226 unblocked extraction** (the "Jacket" label was raising an error). Running the Summary path
through the cascade vs the worksheet (rating block) then exposes the cat-7 cluster bugs — our
**SAP cont 60.2 vs worksheet ~51.2 (+9, the cluster signature)**:
| line ref | ours | worksheet | gap / cause |
|---|---|---|---|
| **cost (255)** | 1482.12 | **1816.58** | **-334 → the primary +9 SAP driver.** Likely the Economy-7 off-peak tariff cost split (SAP 10.2 Table 12a / §10c high-rate vs low-rate). START HERE. |
| Table 2b TF (53) | 0.54 | **0.60** | we apply the ×0.9 separately-timed multiplier, but the Summary lodges "Separate Time Control: No" → should be 0.60. Check `_table_2b_note_b_multiplier_applies` / the override's `separately_timed_dhw` for a WHS-911 + storage-heater dwelling. |
| HW fuel (219) | ~3642 | 3188.17 | +454 HW over. Tied to the TF bug + how WHS-911 routes storage vs combi loss (the WaterHeating result showed a spurious `combi_loss` and `solar_storage=0` via the section helper — verify against the FULL cascade, the section helper may not mirror it). |
| fabric (33) | 305.04 | 304.04 | +1.0 W/K — minor; gable / alternative-wall rounding. Low priority. |
| CO2 (272) | 3124.67 | 3125.85 | ≈ exact. |
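Taking each gap as ours minus worksheet, the figures can be recomputed from the values quoted in the table. The sketch below does only that arithmetic (the "ours" value for (219) is approximate in this note, so its gap is too) and checks that 0.54 on line (53) is the worksheet's 0.60 with the ×0.9 multiplier applied. The names `ROWS`, `gaps` and `GAPS` are illustrative only.

```python
# line ref -> (ours, worksheet), copied from the table above.
ROWS = {
    "cost (255)": (1482.12, 1816.58),
    "Table 2b TF (53)": (0.54, 0.60),
    "HW fuel (219)": (3642.0, 3188.17),  # "ours" is approximate (~3642)
    "fabric (33)": (305.04, 304.04),
    "CO2 (272)": (3124.67, 3125.85),
}


def gaps(rows):
    """Signed gap per line ref, as ours minus worksheet, rounded to 2 dp."""
    return {ref: round(ours - sheet, 2) for ref, (ours, sheet) in rows.items()}


GAPS = gaps(ROWS)
for ref, gap in GAPS.items():
    print(f"{ref:<18} {gap:+.2f}")

# The TF discrepancy is the separately-timed multiplier and nothing else.
assert abs(0.60 * 0.9 - 0.54) < 1e-9
```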
**Debug recipe** (reuse the `/tmp/case19*.py` throwaways or rebuild):
```python
# Pipeline (pseudocode; argument lists are elided, as in the throwaway scripts):
#   pages
#     -> ElmhurstSiteNotesExtractor(...).extract()
#     -> from_elmhurst_site_notes
#     -> cert_to_inputs / cert_to_demand_inputs
#     -> calculate_sap_from_inputs
#
# Section helpers: water_heating_section_from_cert / heat_transmission_section_from_cert
# CI._cylinder_storage_loss_override(epc, main) returns (57)m directly; useful for bisecting.
```
The worksheet's rating block is block 1 (UK-avg, region 0); the demand block (postcode) is
block 2. Pin the rating block for SAP/cost, the demand block for PE/CO2.
S0380.224's loose-jacket magnitude can be **worksheet-pinned at 1e-4 here** once the TF bug is
fixed (worksheet (51)=0.0330, (52)=0.8298, (53)=0.6000, (55)=3.4531, (56)Jan=107.0456).
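The worksheet figures above are mutually consistent with the SAP Table 2 loose-jacket loss factor and the Table 2a volume factor, which suggests what the pin could look like. The sketch below is standalone arithmetic, not the calculator's API: the 50 mm jacket thickness is an assumption inferred from (51)=0.0330 (it is not stated in this note), and the function names are hypothetical.

```python
import math

TOL = 1e-4  # the suite's 1e-4 bar


def loose_jacket_loss_factor(thickness_mm: float) -> float:
    """SAP Table 2 loose-jacket cylinder loss factor, kWh/litre/day."""
    return 0.005 + 1.76 / (thickness_mm + 12.8)


def volume_factor(volume_l: float) -> float:
    """SAP Table 2a volume factor."""
    return (120.0 / volume_l) ** (1.0 / 3.0)


VOLUME_L = 210.0      # cylinder volume from case 19
THICKNESS_MM = 50.0   # ASSUMPTION: inferred from worksheet (51), not lodged in this note
TF = 0.60             # worksheet (53), i.e. without the x0.9 multiplier
DAYS_JAN = 31

line_51 = loose_jacket_loss_factor(THICKNESS_MM)
line_52 = volume_factor(VOLUME_L)
line_55 = VOLUME_L * line_51 * line_52 * TF  # kWh/day
line_56_jan = line_55 * DAYS_JAN             # kWh/month

# Pin each line against the worksheet values quoted above.
assert math.isclose(line_51, 0.0330, abs_tol=TOL)
assert math.isclose(line_52, 0.8298, abs_tol=TOL)
assert math.isclose(line_55, 3.4531, abs_tol=TOL)
assert math.isclose(line_56_jan, 107.0456, abs_tol=TOL)
```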
---
@ -103,7 +143,20 @@ is what the strict-raise guard exists to prevent.
---
## ★ What to generate — the single most productive worksheet
## ★ Additional worksheets that would help most
Case 19 (above) already covers electric storage heaters + loose-jacket cylinder + RR. The two
that would add the most NEW coverage:
1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602) — the **cat-10
cluster (43 certs, worst by mean error 10.26)**, which case 19 does not touch.
2. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types
beyond Party/Exposed) — closes the `gable_wall_type` 2/3 raise (14 certs) and pins the
Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.
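One possible reading of the Sheltered and Adjacent rules in item 2, sketched as code. It assumes "ext×R0.5" means the exposed-wall U-value with an extra 0.5 m²K/W resistance added in series, which should be confirmed against the spec before pinning. `gable_u_value` and its string labels are hypothetical and deliberately avoid guessing the integer `gable_wall_type` codes; Party gables are left out because this note does not state their U-value.

```python
SHELTER_RESISTANCE = 0.5  # m2K/W; ASSUMED meaning of "R0.5" in the note above


def gable_u_value(kind: str, u_exposed: float) -> float:
    """U-value (W/m2K) for a room-in-roof gable, given the exposed-wall U-value."""
    if kind == "exposed":
        return u_exposed
    if kind == "sheltered":
        # Add the sheltering resistance in series with the wall's own resistance.
        return 1.0 / (1.0 / u_exposed + SHELTER_RESISTANCE)
    if kind == "adjacent":
        # Adjacent to a heated space: no heat loss through the gable.
        return 0.0
    # Mirror the strict-raise approach: unknown types fail loudly.
    raise ValueError(f"unsupported gable type: {kind!r}")


# Example: an exposed-wall U of 2.0 becomes 1 / (0.5 + 0.5) = 1.0 when sheltered.
example_sheltered = gable_u_value("sheltered", 2.0)
```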
The original "design one property" guidance (kept below for reference) is what case 19 was
built from.
## What to generate — the single most productive worksheet (reference)
Heating is one-per-property, so one worksheet can't cover all four broken heating types. But
**fabric is independent of heating**, so the highest-ROI single artifact bundles the #1