# Handover — wide-scale API accuracy study + next steps

Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.

- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `9521d524`. Next SAP slice: **S0380.235**.
- **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
  → green (2412 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
  (stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).

## Shipped this session (S0380.232-234 — the case-19 PV closure)

The PV diverter (the prior handover's S0380.232 ask) needed two prerequisite
spec bugs fixed first; all three landed:

| slice | commit | spec | what |
|---|---|---|---|
| **S0380.232** | `212b0c92` | App M1 §3a (p.93, l.5470-5476) | D_PV excludes the LOW-rate portion of an off-peak electric main: `(211)` is only PV-eligible where its §10a code ∈ {30,32,34,35,38}. Storage heaters on 7-hr charge wholly at low rate → fraction 0.0 → excluded. β_Jan 0.894→0.792 (ws 0.791). New `_main_space_heating_high_rate_fraction`. |
| **S0380.233** | `d4a8c02b` | App M1 §6 (p.94, l.5510-5513) | PV-used-in-dwelling credited at the Table 12a ALL_OTHER_USES **weighted** rate (7-hr 14.311 p/kWh), not the bare low rate (5.50). Was under-crediting onsite PV on every off-peak PV cert. Delegates to `_other_fuel_cost_gbp_per_kwh`; STANDARD unchanged. |
| **S0380.234** | `9521d524` | Appendix G4 (p.72-73) | The PV diverter. 3 layers: extractor `Diverter present` + schema `pv_diverter` → `pv_diverter_present` flag (Elmhurst + API mappers) → `_pv_diverter_monthly_kwh` (SPV = export×0.8×0.9, clamp ≤ (62)+(63a), → (63b)m); `cert_to_inputs` recomputes (219) + PV export, β fixed pre-diverter. |

**Case 19 now: SAP cont 50.33 → 51.34** (ws 51.2221; both round to lodged **51**),
cost (255) 1847.5→1812.3 (ws 1816.6), CO2 3331→3120 (ws 3126), (233a) dwelling
1280.6 (ws **1280.4** — the β fix pins it). The diverter formula is **exact in
summer** (Jun SPV 186.07 = export×0.72, matches ws (63b)).

**The remaining +0.11 SAP on case 19 = two separate, still-open causes:**
1. **Winter Appendix-M monthly EPV shape.** Our annual EPV (2684.17) matches the
   worksheet exactly and Jun-Sep match per-month exactly, but Jan-May/Oct-Dec our
   EPV is ~9-11% LOW (worksheet Jan 68.2 vs ours 62.5). Back-solve: ws EPV_m =
   |(233a)_m| + |(63b)_m|/0.72. This under-diverts in winter → export (233b) 280.7
   vs ws 184.2, and (219) 3322 vs ws 3188. **A two-array PV apportionment issue
   (case 19 has SE + NW arrays with different overshading) — chase in §M / Appendix U
   monthly radiation, NOT the diverter (which is validated).**
2. **Fabric (33) +1.0 W/K** (ours 305.04 vs ws 304.04) — a single element off by
   exactly 1.0; floor=25.000 is suspiciously round. Walk the per-element §3 breakdown.
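
The back-solve in item 1 can be sketched as below. This is a minimal sketch, assuming the relation quoted above holds month by month (that is, the diverter clamp is not binding in that month). The function names and the first sample month are illustrative, not worksheet values.

```python
DIVERTER_FACTOR = 0.8 * 0.9  # = 0.72, the diverter factors quoted above


def backsolve_epv_kwh(used_in_dwelling_233a: float, diverted_63b: float) -> float:
    """Monthly EPV implied by worksheet lines (233a) and (63b).

    Implements the note's relation EPV_m = |(233a)_m| + |(63b)_m| / 0.72.
    abs() mirrors the |...| in that relation, so either sign convention works.
    """
    return abs(used_in_dwelling_233a) + abs(diverted_63b) / DIVERTER_FACTOR


def worksheet_excess_pct(ours: float, worksheet: float) -> float:
    """How far the worksheet EPV sits above ours, as a percentage of ours."""
    return (worksheet - ours) / ours * 100.0


# Hypothetical month: 40 kWh used in the dwelling, 18 kWh diverted.
print(backsolve_epv_kwh(-40.0, -18.0))
# January figures quoted above: ours 62.5 vs worksheet 68.2.
print(round(worksheet_excess_pct(62.5, 68.2), 1))
```

Running the second helper over all twelve months is a quick way to confirm that the gap is confined to Jan-May and Oct-Dec before digging into the per-array radiation.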

The **eval headline is flat** (42.9→43.0% <0.5; cat-7 5.25→4.93) — expected: the
diverter is rare and the β/price effects are small on the rounded SAP. The value
was pinning the worksheet-validated case 19 + fixing three real spec bugs that the
curated cohort masked.

## Headline now (1,000-cert 2026 API sample, HEAD `f326e4eb`)

| metric | value | was (handover baseline `9c0a373f`) |
|---|---|---|
| computed | 882 / 1000 | 882 |
| **% \|err\| < 0.5** | **42.9%** | 41.8% |
| % < 1 / < 2 / < 5 | 56.7% / 74.6% / 90.1% | 54.9 / 71.9 / 87.8 |
| median / mean \|err\| | 0.73 / **2.04** | 0.79 / ~2.4 |
| mean signed | −0.41 | +0.2 |
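
For reference, the headline metrics above can be reproduced from a list of per-cert signed errors. This is a minimal sketch, not the committed evaluator: `summarise` and its dictionary keys are hypothetical names, and the error is assumed to be our continuous SAP minus the lodged rounded SAP (the sign convention the worst-offender figures imply).

```python
from statistics import mean, median


def summarise(errors: list[float]) -> dict[str, float]:
    """Headline metrics for signed errors (our continuous SAP minus lodged SAP)."""
    if not errors:
        raise ValueError("no computed certs to summarise")
    abs_err = [abs(e) for e in errors]
    n = len(errors)

    def pct_below(threshold: float) -> float:
        # Share of certs whose absolute error is strictly below the threshold.
        return 100.0 * sum(a < threshold for a in abs_err) / n

    return {
        "n": n,
        "pct_lt_0.5": pct_below(0.5),
        "pct_lt_1": pct_below(1.0),
        "pct_lt_2": pct_below(2.0),
        "pct_lt_5": pct_below(5.0),
        "median_abs": median(abs_err),
        "mean_abs": mean(abs_err),
        "mean_signed": mean(errors),
    }


# Illustrative errors only, not taken from the sample.
print(summarise([0.2, -0.4, 1.5, -6.0]))
```

Keeping the signed mean next to the absolute mean is what exposes a systematic bias (such as the cat-10 tariff bias) separately from scatter.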

**Error by heating cluster** (the load-bearing cut — re-run `analyse_api_sap_clusters.py`):

| cluster | n | mean \|err\| | %<0.5 | note |
|---|---|---|---|---|
| cat 2 gas boiler + PCDB | 639 | 1.27 | 49.6% | well-trodden |
| cat 2 gas, NO PCDB idx | 91 | 3.18 | 35.2% | non-PCDB Table-4b boilers |
| cat 6 community | 45 | 2.59 | 31.1% | known-hard |
| cat 7 electric storage | 40 | **5.25** | 10.0% | was 7.33 → S0380.227-229 |
| cat 10 electric room heaters | 48 | **5.26** | 16.7% | was 9.49 → S0380.230-231 (bias gone) |
| cat 4 HP + PCDB | 8 | 6.11 | 12.5% | small n, APM |
| Flats (any) | 282 | 2.57 | 30.5% | geometry / communal |
| real PV | 45 | 3.90 | 26.7% | Appendix M |

**Worst individual offenders** (the long tail, `eval` TOP 40): `2100-5421-0922-1622-3463`
(−60.8, our SAP **negative** −24.8 vs lodged 36; the single worst. First read as a cat-2 flat with
2 bps and a likely geometry/communal blow-up, but the §A.4 diagnosis below shows it is NOT a flat),
`2958-8008` (+32, age 6=tiny), `9836-5829` (−29.5, cat-10 tail), several cat-7/cat-10 in the −20s.

## Work shipped (this session — S0380.227-231 + 3 mapper commits)

| commit | what |
|---|---|
| **S0380.227** | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9; TF (53) 0.54→0.60, (59) h=3→5 |
| **S0380.228** | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (1.00 high-frac), not 100% low |
| **S0380.229** | dedicated water boiler/circulator (WHC 911-931) feeds cylinder via primary loop → Table 3 primary loss applies |
| **S0380.230** | electric room heaters (cat 10) on off-peak → `OTHER_DIRECT_ACTING_ELECTRIC` (mirror of .228 for the MAIN). cat-10 9.49→7.11 |
| **S0380.231** | Dual-meter electric room heaters → 10-hour tariff (RdSAP §12 Rule 3; codes 691-694,699). cat-10 7.11→5.26, bias +5.08→−0.86 |
| `bd25a3c7` | SY system-built vs B basement: code 6 stays system-built; basement → explicit `wall_is_basement`/`is_basement` flag. `system_build` is a derived property (wall type). API path post-processes via addendum. (issue #1177 — see `docs/PR_NOTE_system_built_basement_1177.md`: field-vs-property merge landmine) |
| `f326e4eb` | Elmhurst path now populates `roof_construction` (int) via `_elmhurst_roof_construction_int` for cross-mapper parity (API set it, Elmhurst didn't) |

- **Toolkit (committed):** `scripts/fetch_2026_epc_sample.py`,
  `scripts/eval_api_sap_accuracy.py`, `scripts/analyse_api_sap_clusters.py`. The 1,000 cached
  JSONs live in `/tmp/epc_2026_sample/` (gitignored scratch — re-fetch with the sampler;
  `EPC_SAMPLE_CACHE` overrides the dir). Re-run the eval after any mapper/calculator change to
  watch the headline move.

---

## What this study did

Fetched a **random 1,000-cert sample of domestic EPCs lodged Jan–May 2026** from the
GOV.UK EPB register (the `/api/domestic/search` date-windowed endpoint to enumerate cert
numbers across random pages → `/api/certificate` per cert for the full schema-21 JSON), ran
each through the **API path** (`from_api_response → cert_to_inputs → continuous SAP`), and
compared to the lodged rounded `energy_rating_current`.

**This is the first measurement of raw-API behaviour on an unbiased population** — the curated
golden cohort (~exact) masked it.

### Reproduce
- Sampler/fetcher: `/tmp/sample_fetch_2026.py` → caches JSONs to `/tmp/epc_2026_sample/`.
- Evaluator: `/tmp/eval_sap_accuracy.py` → per-cert CSV + summary (`% <0.5`, buckets, worst-40,
  raise breakdown). Cluster analysis: `/tmp/analyze2.py`. (Token in `backend/.env`
  `OPEN_EPC_API_TOKEN`; `date_end` must be < today.)
- **These scripts were uncommitted (in /tmp) when this study ran.** Committed versions now live
  under `scripts/` (see the Toolkit bullet above).

---

## Headline (at HEAD `9c0a373f`)

| metric | value |
|---|---|
| computed | **882 / 1000** (100 unsupported pre-21 schema; 18 still raise) |
| **% \|err\| < 0.5** (of computed) | **41.8%** |
| % < 1.0 / < 2.0 / < 5.0 | 54.9% / 71.9% / 87.8% |
| median / mean \|err\| | 0.79 / ~2.4 |
| mean signed err | +0.2 (slight over-rate) |

**Accuracy is dominated by heating type** (the load-bearing cut):

| main_heating_category | n | mean \|err\| | %<0.5 | status |
|---|---|---|---|---|
| 2 = gas boiler (PCDB-indexed) | 579 | 1.30 | 48% | the well-trodden path |
| **7 = electric storage heaters** | 39 | **7.33** | **3%** | **broken — #1 lever** |
| **10 = electric room heaters** | 43 | **10.26** | **9%** | **broken — #2 lever** |
| 6 = community scheme | 38 | 2.28 | 34% | known-hard |
| Flats (any heating) | 242 | 3.19 | 29% | geometry + communal |

---

## Work shipped this session (S0380.219–226)

Coverage unblocked **788 → 882 computed (+94)**; one real accuracy bug fixed (+22 certs).

| slice | fix | certs |
|---|---|---|
| S0380.219 | floor_construction 3 → "Suspended, not timber" (RdSAP 10 field 3-1) | ~44 |
| S0380.220 | floor_construction 0 → None (Table 19 unknown; proven inert) | 37 |
| S0380.221 | default missing `post_town` (unused metadata) | 1 |
| S0380.222 | roof_construction 6 (thatched) + 7 (dwelling above) → None (inert) | 5 |
| S0380.223 | `_part_geometry` early-return key contract (RR KeyError) | 5 |
| **S0380.224** | **loose-jacket cylinder storage loss (Table 2 Note 1)** — was None'd out → zero loss | **22** (mean err +2.29 → +0.45) |
| S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |
| S0380.226 | Elmhurst "Jacket" cylinder insulation → loose-jacket code 2 (Summary path) | (unblocked case 19) |

Headline at HEAD: **882 / 1000 computed, 41.8% < 0.5** (re-run the eval to refresh).

---

## ★ Active worksheet: simulated case 19 — the electric-storage-heater debug

The user generated `sap worksheets/golden fixture debugging/simulated case 19/`
(`Summary_001431 (2).pdf` + `P960-0001-001431 - 2026-06-04T174437.228.pdf`), purpose-built to
hit the #1 cluster. It exercises **electric storage heaters** (SAP code 402, control 2402
auto-charge, 7-hr off-peak tariff) + a **loose-jacket 210 L cylinder** + **WHS 911** (gas
boiler for water only) + **room-in-roof gables (Party + Exposed) + an alternative wall +
exposed floor + electric secondary**.

**S0380.226 unblocked extraction** (the "Jacket" label was raising). The worksheet has FOUR
blocks: **block 1 = rating** (UK-avg region 0; cost (255)=1816.58, SAP (258)=51, TF (53)=0.60,
(51)=0.0330), **block 2 = demand** (postcode; CO2 (272)=3125.85, PE (286)=30271.76), blocks 3/4
= the potential/improved variants. Pin the rating block for SAP/cost, the demand block for
PE/CO2. Worksheet header line 116 lodges **"Separate Time Control: No"** (NOT in the Summary §15
PDF — only in the P960 header).

**Three slices shipped (S0380.227–229)** — closed the +9 cluster signature; SAP cont
60.2 → **50.33** (worksheet ~51.22):

| slice | line ref | fix | SAP cont |
|---|---|---|---|
| **S0380.227** | TF (53) 0.54→**0.60**; (59) h=3→**h=5** | dedicated DHW-only system (WHS 911) is NOT separately timed → no Table 2b ×0.9 (RdSAP 10 §10.5.1). `_separately_timed_dhw` gated on WHC ∈ {901,902,914}. Worksheet-pins S0380.224's loose-jacket (51)=0.0330/(53)=0.60/(55)=3.4531/(56-57)Jan=107.0456 at 1e-4. | 60.2→60.1 |
| **S0380.228** | cost (255) | electric SECONDARY on off-peak bills at Table 12a `OTHER_DIRECT_ACTING_ELECTRIC` (7-hr high-frac **1.00** = £0.1529), not the flat off-peak low (£0.0550). Worksheet (242): "1.00*15.29 + 0.00*5.50". THE primary cost driver (−340). | 60.1→**50.67** |
| **S0380.229** | (62) 2493.30→**3169.98** | dedicated water-heating boiler/circulator (WHC 911-931) feeds the cylinder via a primary loop → Table 3 row 1 primary loss applies (keyed off `water_heating_code`, since `_water_heating_main` returns the electric SPACE main). Restored the missing (59)=676.68 kWh/yr. | 50.67→50.33 |
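
The off-peak pricing behind S0380.228 is a two-rate blend, which the sketch below makes explicit. It assumes the 7-hr high and low rates printed on worksheet line (242); `weighted_rate_p_per_kwh` is a hypothetical name, not the calculator's helper. The 0.90 fraction in the second call is an inference: it is the value that reproduces the 14.311 p/kWh ALL_OTHER_USES rate quoted for S0380.233, not a figure read from Table 12a here.

```python
HIGH_RATE_P_PER_KWH = 15.29  # 7-hr tariff high rate, from worksheet line (242)
LOW_RATE_P_PER_KWH = 5.50    # 7-hr tariff low rate, from worksheet line (242)


def weighted_rate_p_per_kwh(high_fraction: float) -> float:
    """Blend the high and low rates by a high-rate fraction in [0, 1]."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError(f"high_fraction out of range: {high_fraction}")
    low_fraction = 1.0 - high_fraction
    return high_fraction * HIGH_RATE_P_PER_KWH + low_fraction * LOW_RATE_P_PER_KWH


# Direct-acting electric secondary: worksheet (242) "1.00*15.29 + 0.00*5.50".
secondary_rate = weighted_rate_p_per_kwh(1.00)

# ASSUMPTION: 0.90 is back-solved from the quoted 14.311 p/kWh, not read from the spec.
other_uses_rate = weighted_rate_p_per_kwh(0.90)

print(round(secondary_rate, 3), round(other_uses_rate, 3))
```

Pricing the same kWh at the bare low rate (5.50) instead of the blend is the failure mode both S0380.228 and S0380.233 fixed, in opposite directions (cost versus credit).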

**The ONE remaining case-19 cause at this point was the PV diverter (63b): scoped as S0380.232, shipped as S0380.234 (see the top section).** Worksheet
header line 124 "Diverter = Yes"; Summary §19 "Diverter present: Yes". Per **SAP 10.2 Appendix
G4 (PDF p.72-73)** surplus PV is diverted to the cylinder immersion:
`S_PV,diverter,m = EPV,m × (1 − βm) × 0.8 × 0.9`, clamped to ≤ (62)m + (63a)m, entered as a
NEGATIVE (63b)m. (64)m = (62)m + (63a)m + (63b)m + … → (219)m = (64)m / eff. All four G4
inclusion conditions are met (PV connected to dwelling; cylinder 210 L > (43)=74.24; no solar
HW; no battery). Worksheet (63b) annual ≈ −1097.67 kWh → (64) drops 3169.98 → 2072.31, (219)
4876.9 → 3188.17. It ALSO changes the PV β-split (export drops: worksheet dwelling 1280.39 /
exported 184.16 vs our 1496.20 / 1187.98 with no diverter). This is a 3-layer feature
(extractor `Diverter present` → mapper flag → calculator (63b) + β-split interaction) —
implement as one focused slice. Spec note (l.5485): for the β calc, (219)m must EXCLUDE the
diverter saving.
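
The G4 arithmetic above can be sketched as follows. This is a minimal sketch under stated assumptions, not the shipped `_pv_diverter_monthly_kwh`: the function and parameter names are hypothetical, (63a) is passed signed as on the worksheet (zero here, since there is no solar HW), the remaining (64) terms elided by "…" are left out, and β is taken as already fixed before the diverter is applied.

```python
DIVERTER_FACTOR = 0.8 * 0.9  # the two Appendix G4 factors quoted above


def diverter_63b_kwh(
    epv_kwh: float,
    beta: float,
    demand_62_kwh: float,
    solar_63a_kwh: float = 0.0,
) -> float:
    """Return (63b)m: the diverted PV for one month, as a negative number."""
    surplus_kwh = epv_kwh * (1.0 - beta)          # PV not used in the dwelling
    diverted_kwh = surplus_kwh * DIVERTER_FACTOR
    cap_kwh = max(demand_62_kwh + solar_63a_kwh, 0.0)  # clamp to (62)m + (63a)m
    return -min(diverted_kwh, cap_kwh)


def water_fuel_219_kwh(
    demand_62_kwh: float,
    solar_63a_kwh: float,
    diverter_63b: float,
    efficiency: float,
) -> float:
    """(219) = (64) / eff, with (64) = (62) + (63a) + (63b); other terms omitted."""
    output_64_kwh = demand_62_kwh + solar_63a_kwh + diverter_63b
    return output_64_kwh / efficiency


# Annual cross-check with the figures quoted above; 0.65 is the
# water_efficiency_pct used in this section's debug recipe.
print(round(water_fuel_219_kwh(3169.98, 0.0, -1097.67, 0.65), 2))  # 3188.17
```

The same call without the diverter term gives the pre-diverter (219) of about 4876.9, which is the value the β calculation has to keep using.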

Smaller residuals after the diverter lands: main fuel (211) ours 20250.22 vs ws 19910.30
(+340), secondary (215) 3573.57 vs 3513.58 (+60), fabric (33) +1.0 (gable/alt-wall). Current
demand block: CO2 (272) 3331.04 vs 3125.85, PE (286) 31653.23 vs 30271.76 — both will drop with
the diverter (less grid import).

**Debug recipe** (reuse the `/tmp/case19*.py` throwaways or rebuild):
```python
# Pipeline (pseudocode: the arrows show data flow, not Python syntax):
#   pages -> ElmhurstSiteNotesExtractor(...).extract() -> from_elmhurst_site_notes
#         -> cert_to_inputs / cert_to_demand_inputs -> calculate_sap_from_inputs
#
# Per-line probes:
#   CI._cylinder_storage_loss_override(epc, main) -> (57)m
#   CI._primary_loss_override(epc, age)           -> (59)m
#   CI._water_heating_worksheet_and_gains(epc=…, water_efficiency_pct=0.65, is_instantaneous=False,
#       primary_age=<band>, pcdb_record=None) -> wh_result with (45)/(46)/(57)/(59)/(62)/(64)
```

---

## Remaining work, prioritised

### A. Accuracy clusters (highest value)
1. **PV diverter: shipped as S0380.234** (scoped as S0380.232; prerequisites S0380.232-233, see
   the top section). It pinned the case-19 worksheet but left the eval headline flat. The two
   residual case-19 causes (winter EPV shape, fabric (33) +1.0) are now the open items on that
   validated worksheet, so they keep the 1e-4 bar.
2. **Electric storage heaters (cat 7, 40 certs, mean 5.25 at the `f326e4eb` headline).**
   S0380.227-229 took it 7.33→5.25; the PV-diverter session moved it to 4.93. Beyond that the
   tail is per-cert: a **dedicated cat-7 worksheet** (no PV, no diverter) would let you pin
   charge-control / responsiveness at 1e-4 instead of the ±0.5 lodged fallback.
3. **Electric room heaters (cat 10, 48 certs, mean 5.26).** S0380.230-231 fixed the systematic
   tariff bias (mean 9.49→5.26, signed +5.08→−0.86); the residual is now scattered per-cert
   (e.g. `9836-5829` −29.5, an under-rater). A **cat-10 worksheet** pins the tail at 1e-4.
4. **Non-PCDB gas boilers (cat 2, no idx, 91 certs, mean 3.18)** and **Flats (282, mean 2.57)** —
   the next volume levers once the electric clusters are worksheet-pinned. Flats = geometry /
   communal; start with the worst (`2100-5421` negative SAP).
   - **`2100-5421-0922-1622-3463` diagnosed (S0380.234 session):** NOT a flat — `property_type 0`,
     a **352 m² 2-storey uninsulated solid-wall** dwelling (wall_constr 3 / wall_ins 4 as-built;
     roof_type 4, no roof insulation). Our space-heating demand is **71,084 kWh/yr** → (37)=995.93
     W/K → SAP −24.8 (lodged 36), cost £14,045. This is the **`as-built insulated-assumed`**
     U-value front ([[project_as_built_insulated_assumed_bug]]; S0380.209 fixed walls, "roof next"):
     the uninsulated-roof / as-built U over-estimates demand on big old dwellings. API-only (no
     worksheet → ±0.5 lodged fallback only); needs a generated worksheet or a roof-U spec audit to
     pin. It is one outlier, not a cluster-wide flats bug.

### B. Remaining raises (18 certs; all U-value / heat-loss-sensitive, NOT enum guesses)
- **`gable_wall_type` 2 & 3 (14 certs).** RdSAP 10 **Table 4** RR walls: 0=Party (U=0.25),
  1=Exposed (U=common wall), 2/3 = **Sheltered (U=external×R0.5)** + **Adjacent-to-heated
  (U=0)**, code↔type order unconfirmed (schema says "not yet seen"). Needs (i) a worksheet to
  pin which code is which + the U-values, and (ii) **calculator support** — the cascade only
  has `gable_wall`/`gable_wall_external` kinds; Sheltered (R=0.5) and Adjacent (U=0) are new.
  Best real example: `2818-3053-3203-2655-9204` lodges BOTH gable 2 and 3.
- **`main_heating_category: 9` = warm air, mains gas (1 cert).** Needs §9 warm-air dispatch.
- **`wall_insulation_thermal_conductivity` 3 (1 cert).** Verified it shifts wall U
  (53.96→51.61 across λ) → worksheet-backed (the resolver's own discipline).
- **`floor_heat_loss` 8 (2 certs).** Semantically unconfirmed; inert for the 2 observed
  (non-Main bp) but potentially "heated space below" (→ should exclude the floor, a calculator
  change). Don't guess.

The clean mapper-enum raises are **exhausted** — every remaining raise changes the answer, which
is what the strict-raise guard exists to prevent.
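
The strict-raise discipline for the gable codes can be sketched like this. It is a minimal sketch, not the mapper's code: all names are hypothetical, codes 2 and 3 are deliberately left unmapped because their order is unconfirmed, and reading "external×R0.5" as an added 0.5 m²K/W resistance is an assumption that a worksheet would need to confirm.

```python
from enum import Enum


class GableType(Enum):
    PARTY = "party"
    EXPOSED = "exposed"
    SHELTERED = "sheltered"
    ADJACENT_HEATED = "adjacent_to_heated"


# Only codes 0 and 1 are confirmed; 2 and 3 stay out until a worksheet pins them.
CONFIRMED_GABLE_CODES = {0: GableType.PARTY, 1: GableType.EXPOSED}


def gable_type_from_code(code: int) -> GableType:
    """Map a lodged code to a type, raising rather than guessing."""
    try:
        return CONFIRMED_GABLE_CODES[code]
    except KeyError:
        raise ValueError(f"gable_wall_type {code} is not worksheet-confirmed") from None


def gable_u_value(kind: GableType, wall_u: float) -> float:
    """U-value per the Table 4 summary above, given the relevant wall U."""
    if kind is GableType.PARTY:
        return 0.25
    if kind is GableType.EXPOSED:
        return wall_u
    if kind is GableType.SHELTERED:
        # ASSUMPTION: "R0.5" means an extra 0.5 m2K/W in series with the wall.
        return 1.0 / (1.0 / wall_u + 0.5)
    return 0.0  # adjacent to another heated space: no heat loss


print(gable_u_value(GableType.SHELTERED, 2.0))
```

Keying the U-value on the type rather than the raw code means the only change needed once the code order is confirmed is two new entries in the lookup table.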

---

## ★ Additional worksheets that would help most (the user will generate these on request)

The two electric clusters are now systematic-bias-free (S0380.227-231) but their TAILS sit at
the ±0.5-vs-lodged fallback bar because **no worksheet validates them at 1e-4**. The three
highest-value worksheets to ask the user for:
1. **An electric ROOM-heater dwelling** (SAP code ~691, control 2601/2602/2603, Dual meter) —
   pins the cat-10 tail (48 certs, mean 5.26) at 1e-4. Make it PV-free + cylinder-free to
   isolate the space-heat path from the diverter/HW.
2. **An electric STORAGE-heater dwelling distinct from case 19** (no PV, no WHS-911) — pins the
   cat-7 tail (40 certs, mean 5.25): charge control (2401/2402), 7-hr vs 24-hr, responsiveness.
3. **A room-in-roof with a SHELTERED gable and an ADJACENT-TO-HEATED gable** (Table 4 types
   beyond Party/Exposed) — closes the `gable_wall_type` 2/3 raise (14 certs) and pins the
   Sheltered (U=ext×R0.5) / Adjacent (U=0) U-values the calculator must add.

Per worksheet send BOTH the **Summary PDF** (input) and the **P960/dr87 worksheet PDF** (the
`(1)..(286)` ground truth). Drop them in `sap worksheets/golden fixture debugging/<name>/` and
run the case-19 debug recipe.

The original "design one property" guidance (kept below for reference) is what case 19 was
built from.

## What to generate — the single most productive worksheet (reference)

Heating is one-per-property, so one worksheet can't cover all four broken heating types. But
**fabric is independent of heating**, so the highest-ROI single artifact bundles the #1
accuracy cluster with the fabric that closes the gable raises and pins the loose-jacket fix.

**Build (in Elmhurst, a simulated case is fine — same as the existing `simulated case N`
worksheets) ONE property:**

> **A house heated by ELECTRIC STORAGE HEATERS, with a room-in-roof and a hot-water cylinder:**
> - **Heating:** electric storage heaters (off-peak / Economy-7 tariff), with a clear control
>   type. *This is the load-bearing choice — it validates the 39-cert cat-7 cluster.*
> - **Hot water:** a cylinder with a **loose-jacket** insulation (not factory foam), a stated
>   jacket thickness, and a cylinder thermostat. *Pins S0380.224's loose-jacket storage loss
>   (56)m at 1e-4 — currently only direction-validated.*
> - **Room-in-roof** with **two gable walls of different types** — ideally one **"Sheltered"**
>   and one **"Adjacent to another heated space"** (plus, if the tool allows, a Party and an
>   Exposed gable). *Gives the Table 4 U-values for gable_wall_type 2 & 3 and disambiguates the
>   code order — closes the 14-cert raise.*
> - **An extension (2nd building part)** with a different floor exposure (e.g. over unheated
>   space or "to external air"). *Exercises multi-bp geometry + floor-exposure handling.*

From that single worksheet I can pin, at 1e-4: the electric-storage space-heating lines
((210)/(211)/space-heat), the loose-jacket storage loss (56)m, the RR gable U-values (30)/(32),
and the multi-bp fabric (27)–(37). That's **one cluster + one fix-validation + the biggest
raise + fabric**, all in one document.

**If you'd rather do two:** add a second worksheet that is identical but with **electric room
heaters** instead of storage heaters — together they cover cat 7 + cat 10 (≈ 82 certs, the
two worst clusters). A third for a **community-heating flat** would cover cat 6 + the flat
geometry cluster.

### Then send me, per worksheet
The **Summary PDF** (the Elmhurst input/site-notes) + the **worksheet PDF** (the `(1)..(286)`
ground truth). With those I run both front-ends through the cascade and pin each line ref at
1e-4, exactly as for the `with api 3` pair (S0380.218).

---

## Conventions (unchanged)
One cause = one slice = one commit; spec citation (page+line) in the message; AAA tests
(`# Arrange / # Act / # Assert`); `abs(x - y) <= tol` (not `pytest.approx`); SAP 10.2 only; no
tolerance widening / xfail / rel-tol. New code passes pyright strict with ZERO NEW errors
(baseline-compare with `git stash`; mapper.py / cert_to_inputs.py / heat_transmission.py carry
pre-existing errors — compare counts). Stage files by name (the tree has unrelated
`pytest.ini`/`scripts/` changes that must NOT be staged).
Commit trailer: `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
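
A test in the shape these conventions describe might look like the sketch below. The helper `cylinder_loss_factor_under_test` is a hypothetical stand-in for a real calculator call; the pinned value is the case-19 worksheet (51) quoted earlier.

```python
TOL = 1e-4  # the worksheet bar; never widened


def cylinder_loss_factor_under_test() -> float:
    """Hypothetical stand-in for the calculator call being pinned."""
    return 0.0330


def test_cylinder_loss_factor_matches_worksheet_line_51() -> None:
    # Arrange
    worksheet_value = 0.0330  # case-19 worksheet (51)

    # Act
    actual = cylinder_loss_factor_under_test()

    # Assert: absolute tolerance, not pytest.approx and not a relative tolerance
    assert abs(actual - worksheet_value) <= TOL


test_cylinder_loss_factor_matches_worksheet_line_51()
```

The explicit `abs(x - y) <= tol` form keeps the tolerance visible in the diff, so any attempt to widen it shows up in review.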