Mirror of https://github.com/Hestia-Homes/Model.git (synced 2026-06-08 11:17:27 +00:00)
docs: handover — 1000-cert API accuracy study + next-steps + worksheet ask
Captures the wide-scale 2026-register study (41.8% <0.5, heating-driven cluster table), the 7 slices shipped (S0380.219-225), the prioritised remaining work (electric-heating clusters + worksheet-backed raises), and the single highest-ROI worksheet to generate: an electric-storage-heater house with a loose-jacket cylinder + a room-in-roof with Sheltered/Adjacent gables + an extension. That one document validates the #1 accuracy cluster, pins the S0380.224 loose-jacket fix at 1e-4, closes the gable_wall_type Table 4 raise, and exercises multi-bp fabric.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 9c0a373f7d
commit 19ed29e13c
1 changed file with 152 additions and 0 deletions

domain/sap10_calculator/docs/HANDOVER_API_SAMPLE_ACCURACY.md
@@ -0,0 +1,152 @@
# Handover: wide-scale API accuracy study + next steps

Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.

- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `9c0a373f` (S0380.225). Next slice: **S0380.226**.
- **Baseline (§4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
  → green (2395 passed, 1 skipped). Pre-existing out-of-scope failures unchanged
  (stone-§5.6 in `domain/sap10_ml/tests/`; `test_from_rdsap_schema.py::...test_total_floor_area`).

---

## What this study did

Fetched a **random 1,000-cert sample of domestic EPCs lodged Jan–May 2026** from the
GOV.UK EPB register: the date-windowed `/api/domestic/search` endpoint enumerates cert
numbers across random pages, then `/api/certificate` returns the full schema-21 JSON per cert.
Each cert was run through the **API path** (`from_api_response → cert_to_inputs → continuous SAP`)
and the result compared with the lodged, rounded `energy_rating_current`.

**This is the first measurement of raw-API behaviour on an unbiased population.** The curated
golden cohort (~exact) had masked it.

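
The enumerate-then-fetch flow above can be sketched roughly as follows. This is a minimal illustration, not the uncommitted `/tmp` sampler: `list_page` is a hypothetical callable standing in for one `/api/domestic/search` request, whose real parameters and response shape are not shown in this note.

```python
import random
from typing import Callable, Iterable


def sample_cert_numbers(
    list_page: Callable[[int], Iterable[str]],  # hypothetical: page number -> cert numbers
    total_pages: int,
    target: int,
    seed: int = 0,
) -> list[str]:
    """Visit search pages in random order until `target` unique cert numbers are collected."""
    rng = random.Random(seed)
    pages = list(range(1, total_pages + 1))
    rng.shuffle(pages)

    seen: set[str] = set()
    sample: list[str] = []
    for page in pages:
        for cert_number in list_page(page):
            if cert_number in seen:
                continue  # the same cert can appear on more than one page
            seen.add(cert_number)
            sample.append(cert_number)
            if len(sample) == target:
                return sample
    return sample  # fewer than `target` certs were available
```

Fixing the seed keeps the sample reproducible if the measurement is repeated later.
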

### Reproduce

- Sampler/fetcher: `/tmp/sample_fetch_2026.py` → caches JSONs to `/tmp/epc_2026_sample/`.
- Evaluator: `/tmp/eval_sap_accuracy.py` → per-cert CSV + summary (`% <0.5`, buckets, worst-40,
  raise breakdown). Cluster analysis: `/tmp/analyze2.py`. (The token is `OPEN_EPC_API_TOKEN` in
  `backend/.env`; `date_end` must be < today.)
- **These scripts are uncommitted (in /tmp).** Worth promoting to `scripts/` if this becomes
  a recurring measurement.

---

## Headline (at HEAD `9c0a373f`)

| metric | value |
|---|---|
| computed | **882 / 1000** (100 unsupported pre-21 schema; 18 still raise) |
| **% \|err\| < 0.5** (of computed) | **41.8%** |
| % < 1.0 / < 2.0 / < 5.0 | 54.9% / 71.9% / 87.8% |
| median / mean \|err\| | 0.79 / ~2.4 |
| mean signed err | +0.2 (slight over-rate) |

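
For orientation, the headline figures are plain statistics over per-cert errors. The sketch below assumes the error is `computed - lodged` (consistent with a positive mean being described as an over-rate); the function name and input shape are illustrative and not taken from `/tmp/eval_sap_accuracy.py`.

```python
from statistics import mean, median


def summarise_errors(pairs: list[tuple[float, int]]) -> dict[str, float]:
    """Summarise (computed continuous SAP, lodged rounded rating) pairs."""
    if not pairs:
        raise ValueError("no computed certs to summarise")

    signed = [computed - lodged for computed, lodged in pairs]  # assumed sign convention
    absolute = [abs(e) for e in signed]

    def pct_below(threshold: float) -> float:
        # Share of certs whose absolute error is strictly under the threshold.
        return 100.0 * sum(e < threshold for e in absolute) / len(absolute)

    return {
        "pct_lt_0.5": pct_below(0.5),
        "pct_lt_1.0": pct_below(1.0),
        "pct_lt_2.0": pct_below(2.0),
        "pct_lt_5.0": pct_below(5.0),
        "median_abs_err": median(absolute),
        "mean_abs_err": mean(absolute),
        "mean_signed_err": mean(signed),
    }
```
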

**Accuracy is dominated by heating type** (the load-bearing cut):

| main_heating_category | n | mean \|err\| | %<0.5 | status |
|---|---|---|---|---|
| 2 = gas boiler (PCDB-indexed) | 579 | 1.30 | 48% | the well-trodden path |
| **7 = electric storage heaters** | 39 | **7.33** | **3%** | **broken: #1 lever** |
| **10 = electric room heaters** | 43 | **10.26** | **9%** | **broken: #2 lever** |
| 6 = community scheme | 38 | 2.28 | 34% | known-hard |
| Flats (any heating) | 242 | 3.19 | 29% | geometry + communal |

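
The heating-type cut is a group-by over the same per-cert errors. A minimal sketch, assuming each record carries a `main_heating_category` code and an absolute error under the key `abs_err` (that record layout is hypothetical, not the CSV schema the evaluator writes):

```python
from collections import defaultdict


def cluster_by_category(records: list[dict]) -> dict[int, dict[str, float]]:
    """Group absolute errors by heating category and summarise each group."""
    groups: dict[int, list[float]] = defaultdict(list)
    for rec in records:
        groups[rec["main_heating_category"]].append(rec["abs_err"])

    summary: dict[int, dict[str, float]] = {}
    for category, errs in sorted(groups.items()):
        summary[category] = {
            "n": len(errs),
            "mean_abs_err": sum(errs) / len(errs),
            "pct_lt_0.5": 100.0 * sum(e < 0.5 for e in errs) / len(errs),
        }
    return summary
```
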

---

## Work shipped this session (S0380.219–225)

Coverage rose from **788 to 882 computed (+94)**; one real accuracy bug fixed (+22 certs).

| slice | fix | certs |
|---|---|---|
| S0380.219 | floor_construction 3 → "Suspended, not timber" (RdSAP 10 field 3-1) | ~44 |
| S0380.220 | floor_construction 0 → None (Table 19 unknown; proven inert) | 37 |
| S0380.221 | default missing `post_town` (unused metadata) | 1 |
| S0380.222 | roof_construction 6 (thatched) + 7 (dwelling above) → None (inert) | 5 |
| S0380.223 | `_part_geometry` early-return key contract (RR KeyError) | 5 |
| **S0380.224** | **loose-jacket cylinder storage loss (Table 2 Note 1)**: was None'd out → zero loss | **22** (mean err +2.29 → +0.45) |
| S0380.225 | §10.7 no-water-heating default A-F → 12mm loose jacket | 2 |

**S0380.224 is only DIRECTION-validated** (the 22 certs moved toward their lodged ratings, and
the §4 and golden suites stayed green); it has **no worksheet pin on the loose-jacket
magnitude**. A worksheet with a loose-jacket cylinder would close that (see "What to generate"
below).

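
To show the size of what is at stake, here is a sketch of the cylinder loss factor. It assumes the SAP Table 2 note formulas are 0.005 + 1.76/(t + 12.8) kWh/litre/day for a loose jacket and 0.005 + 0.55/(t + 4.0) for factory insulation, with t in mm. Those coefficients are quoted from memory and must be checked against the SAP 10.2 text before anything relies on them. The function names are illustrative, not the calculator's own, and the Table 2a/2b volume and temperature factors are passed in rather than derived.

```python
def loss_factor_kwh_per_litre_day(thickness_mm: float, loose_jacket: bool) -> float:
    """Cylinder loss factor; coefficients are an assumption to verify against SAP Table 2."""
    if thickness_mm < 0:
        raise ValueError("insulation thickness cannot be negative")
    if loose_jacket:
        return 0.005 + 1.76 / (thickness_mm + 12.8)
    return 0.005 + 0.55 / (thickness_mm + 4.0)


def monthly_storage_loss_kwh(
    volume_litres: float,
    loss_factor: float,
    volume_factor: float,       # Table 2a value, supplied by the caller
    temperature_factor: float,  # Table 2b value, supplied by the caller
    days_in_month: int,
) -> float:
    """Storage loss for one month as the product of the daily loss and the day count."""
    daily_loss = volume_litres * loss_factor * volume_factor * temperature_factor
    return daily_loss * days_in_month


# Under these assumed coefficients a 12 mm loose jacket loses roughly twice
# what 12 mm of factory foam does, so a zeroed-out loss is far from negligible.
loose_12 = loss_factor_kwh_per_litre_day(12, loose_jacket=True)
foam_12 = loss_factor_kwh_per_litre_day(12, loose_jacket=False)
```
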

---

## Remaining work, prioritised

### A. Accuracy clusters (highest value: 80+ certs, mean err 7–10)

1. **Electric storage heaters (cat 7, 39 certs).** Distinct cascade: off-peak tariff split,
   charge control (2401/2402), 7-hr/24-hr charge, Table 4a efficiency, responsiveness. **No
   worksheet currently validates this path.** Errors run in both directions (−27 to +16).
2. **Electric room heaters (cat 10, 43 certs).** Same situation (controls 2601/2602/2603). Worst
   cluster by mean (10.26).
3. **Flats (242, 29% <0.5)** and **PV (40, 28%)** are secondary.

### B. Remaining raises (18 certs, all U-value / heat-loss-sensitive, NOT enum guesses)

- **`gable_wall_type` 2 & 3 (14 certs).** RdSAP 10 **Table 4** RR walls: 0=Party (U=0.25),
  1=Exposed (U=common wall), 2/3 = **Sheltered (U = external wall adjusted by R=0.5)** +
  **Adjacent-to-heated (U=0)**, code↔type order unconfirmed (schema says "not yet seen").
  Needs (i) a worksheet to pin which code is which + the U-values, and (ii) **calculator
  support**: the cascade only has `gable_wall`/`gable_wall_external` kinds; Sheltered (R=0.5)
  and Adjacent (U=0) are new. Best real example: `2818-3053-3203-2655-9204` lodges BOTH gable 2
  and 3.
- **`main_heating_category: 9` = warm air, mains gas (1 cert).** Needs §9 warm-air dispatch.
- **`wall_insulation_thermal_conductivity` 3 (1 cert).** Verified it shifts wall U
  (53.96→51.61 across λ), so it must be worksheet-backed (the resolver's own discipline).
- **`floor_heat_loss` 8 (2 certs).** Semantically unconfirmed; inert for the 2 observed
  (non-Main bp) but potentially "heated space below" (→ should exclude the floor, a calculator
  change). Don't guess.

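
One plausible reading of the Sheltered entry is the usual series-resistance adjustment, where sheltering adds R = 0.5 m²K/W to the external wall, giving U_sheltered = 1 / (1/U_external + 0.5). That interpretation is an assumption until a worksheet confirms it, and the helper name below is hypothetical.

```python
def sheltered_u_value(u_external: float, added_resistance: float = 0.5) -> float:
    """Add a thermal resistance in series with a wall of known U-value (assumed reading)."""
    if u_external <= 0:
        raise ValueError("external U-value must be positive")
    if added_resistance < 0:
        raise ValueError("added resistance cannot be negative")
    # Resistances in series add; U is the reciprocal of the total resistance.
    return 1.0 / (1.0 / u_external + added_resistance)
```
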

The clean mapper-enum raises are **exhausted**: every remaining raise would change the answer,
and guessing at such values is exactly what the strict-raise guard exists to prevent.

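
The strict-raise discipline can be pictured as a resolver that maps only confirmed codes and refuses the rest. This is a hedged sketch: the function, exception, and returned labels are hypothetical, and only the `gable_wall_type` field name and the meanings of codes 0 and 1 come from the notes above.

```python
class UnsupportedCodeError(ValueError):
    """Raised when a lodged code has no worksheet-confirmed meaning."""


# Only codes whose meaning is confirmed are listed; 2 and 3 are deliberately absent.
_CONFIRMED_GABLE_TYPES = {0: "party", 1: "exposed"}


def resolve_gable_wall_type(code: int) -> str:
    """Return the confirmed label for a code, or raise instead of guessing."""
    try:
        return _CONFIRMED_GABLE_TYPES[code]
    except KeyError:
        raise UnsupportedCodeError(
            f"gable_wall_type {code!r} is unconfirmed; needs a worksheet pin"
        ) from None
```
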

---

## ★ What to generate: the single most productive worksheet

Heating is one-per-property, so one worksheet can't cover all four broken heating types. But
**fabric is independent of heating**, so the highest-ROI single artifact bundles the #1
accuracy cluster with the fabric that closes the gable raises and pins the loose-jacket fix.

**Build (in Elmhurst; a simulated case is fine, same as the existing `simulated case N`
worksheets) ONE property:**

> **A house heated by ELECTRIC STORAGE HEATERS, with a room-in-roof and a hot-water cylinder:**
> - **Heating:** electric storage heaters (off-peak / Economy-7 tariff), with a clear control
>   type. *This is the load-bearing choice: it validates the 39-cert cat-7 cluster.*
> - **Hot water:** a cylinder with **loose-jacket** insulation (not factory foam), a stated
>   jacket thickness, and a cylinder thermostat. *Pins S0380.224's loose-jacket storage loss
>   (56)m at 1e-4; it is currently only direction-validated.*
> - **Room-in-roof** with **two gable walls of different types**, ideally one **"Sheltered"**
>   and one **"Adjacent to another heated space"** (plus, if the tool allows, a Party and an
>   Exposed gable). *Gives the Table 4 U-values for gable_wall_type 2 & 3 and disambiguates the
>   code order, closing the 14-cert raise.*
> - **An extension (2nd building part)** with a different floor exposure (e.g. over unheated
>   space or "to external air"). *Exercises multi-bp geometry + floor-exposure handling.*


From that single worksheet I can pin, at 1e-4: the electric-storage space-heating lines
((210)/(211)/space-heat), the loose-jacket storage loss (56)m, the RR gable U-values (30)/(32),
and the multi-bp fabric (27)–(37). That's **one cluster + one fix-validation + the biggest
raise + fabric**, all in one document.

**If you'd rather do two:** add a second worksheet that is identical but with **electric room
heaters** instead of storage heaters. Together they cover cat 7 + cat 10 (82 certs, the
two worst clusters). A third for a **community-heating flat** would cover cat 6 + the flat
geometry cluster.

### Then send me, per worksheet

The **Summary PDF** (the Elmhurst input/site-notes) + the **worksheet PDF** (the `(1)..(286)`
ground truth). With those I run both front-ends through the cascade and pin each line ref at
1e-4, exactly as for the `with api 3` pair (S0380.218).

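
Pinning each line ref at 1e-4 amounts to a per-line absolute comparison between the worksheet and the cascade output. A minimal sketch, assuming both are available as mappings from a line ref such as `"(30)"` to a single value; the function name and data layout are illustrative only.

```python
from typing import Optional

TOL = 1e-4


def find_mismatches(
    worksheet: dict[str, float],
    computed: dict[str, float],
    tol: float = TOL,
) -> dict[str, tuple[float, Optional[float]]]:
    """Return {line_ref: (expected, actual)} for refs that are missing or off by more than tol."""
    mismatches: dict[str, tuple[float, Optional[float]]] = {}
    for ref, expected in worksheet.items():
        actual = computed.get(ref)
        # Absolute tolerance only, in line with the no-rel-tol convention.
        if actual is None or abs(actual - expected) > tol:
            mismatches[ref] = (expected, actual)
    return mismatches
```
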

---

## Conventions (unchanged)

One cause = one slice = one commit; spec citation (page+line) in the message; AAA tests
(`# Arrange / # Act / # Assert`); `abs(x - y) <= tol` (not `pytest.approx`); SAP 10.2 only; no
tolerance widening / xfail / rel-tol. New code passes pyright strict with ZERO NEW errors
(baseline-compare with `git stash`; mapper.py / cert_to_inputs.py / heat_transmission.py carry
pre-existing errors, so compare counts). Stage files by name (the tree has unrelated
`pytest.ini`/`scripts/` changes that must NOT be staged).
Commit trailer: `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.

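
For illustration, a test in the shape these conventions describe: Arrange/Act/Assert comments and an explicit absolute tolerance rather than `pytest.approx`. The function under test is a stand-in defined inline, not a calculator helper.

```python
TOL = 1e-4


def total_floor_area(part_areas: list[float]) -> float:
    """Stand-in function under test (hypothetical)."""
    return sum(part_areas)


def test_total_floor_area_sums_building_parts() -> None:
    # Arrange
    part_areas = [45.5, 12.25]
    expected = 57.75

    # Act
    actual = total_floor_area(part_areas)

    # Assert
    assert abs(actual - expected) <= TOL
```
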