Mirror of https://github.com/Hestia-Homes/Model.git, synced 2026-06-08 11:17:27 +00:00.
# Handover — API SAP accuracy (session 2): fabric + tariff fixes, and why we now need worksheets

**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch: **NEVER PR to
main**; the user pushes/PRs when ready). **HEAD `4d1a58b8`**, local-only, ahead of origin.

**READ ALSO:** `docs/HANDOVER_COST_DECOMPOSITION.md` (the decomposition method + price
calibration) and the auto-memory `project_per_cert_mapper_validation_state` (full slice log +
deproven approaches).

## THE GOAL (unchanged, and we are FAR from it)

100% of API records with a lodged SAP must compute within **0.5 SAP** of the API's
`energy_rating_current`. Headline from `scripts/eval_api_sap_accuracy.py` (905 computed certs):

| metric | session-2 start | now (`4d1a58b8`) |
|--------|-----------------|------------------|
| **% \|err\| < 0.5** | 43.8% | **45.0%** |
| % \|err\| < 1.0 | n/a | 59.4% |
| % \|err\| < 2.0 | n/a | 77.6% |
| mean \|err\| | 2.01 | 1.757 |
| **mean signed** | −0.31 | **+0.019** |
| p99 \|err\| | n/a | 17.2 |
| max \|err\| | n/a | 61.4 |

**Be honest about where this is: 45% within 0.5 is poor.** The headline barely moved
(+1.2pp) across 6 fixes because each clean cause is small (10-30 certs). What DID change
decisively is the **signed bias: −0.31 → +0.02**. The systematic under-rating that defined
the sample at session start is gone. The remaining error is **bidirectional scatter**: ~55%
of certs are >0.5 off, split across BOTH directions, and there is **no single lever left that
moves the headline by more than ~0.3pp.** Further progress is per-cause, and it increasingly
needs worksheet ground truth (see "Why we need worksheets" below).
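
The headline rows can be recomputed from per-cert signed errors. The sketch below is minimal and is not the code in `scripts/eval_api_sap_accuracy.py`. It makes three assumptions: err = computed SAP − lodged `energy_rating_current`, so a negative mean signed error means under-rating, as at session start; the bands use a strict `<`; and p99 is a nearest-rank percentile. `Headline`, `headline` and `_nearest_rank` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Headline:
    n: int
    pct_within_0_5: float
    pct_within_1_0: float
    pct_within_2_0: float
    mean_abs: float
    mean_signed: float
    p99_abs: float
    max_abs: float


def _nearest_rank(sorted_vals: list[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending list (no interpolation)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def headline(errors: list[float]) -> Headline:
    """Summarise signed per-cert errors (assumed: computed SAP minus lodged SAP)."""
    if not errors:
        raise ValueError("need at least one computed cert")
    n = len(errors)
    abs_errs = sorted(abs(e) for e in errors)

    def pct_within(tol: float) -> float:
        # Strict '<', matching the "% |err| < 0.5" wording of the table.
        return 100.0 * sum(1 for a in abs_errs if a < tol) / n

    return Headline(
        n=n,
        pct_within_0_5=pct_within(0.5),
        pct_within_1_0=pct_within(1.0),
        pct_within_2_0=pct_within(2.0),
        mean_abs=sum(abs_errs) / n,
        mean_signed=sum(errors) / n,  # < 0 means we under-rate on average
        p99_abs=_nearest_rank(abs_errs, 99),
        max_abs=abs_errs[-1],
    )
```

The eval script may define p99 differently (for example with interpolation), so any differences should be confined to that row.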

## WHAT SHIPPED THIS SESSION (7 commits, all green, pyright net-zero)

1. `98f71d25` **decomposition tool** `scripts/decompose_api_cost_error.py`: calibrates the
   consumer price from accurate gas certs (gas £0.0809/kWh, elec £0.2839/kWh), predicts each
   component's cost, and clusters certs by (component × direction). **CAVEAT: it uses the
   STANDARD elec price, so it MIS-FLAGS off-peak-heated certs as `heat:high`.** For electric
   certs, compare against the cascade's own cost intermediates
   (`SapResult.intermediate['main_heating_cost_gbp']` etc.), not the decomposition.
2. `bb830741` **sloping ceiling**: `roof_construction=8` carries
   `sloping_ceiling_insulation_thickness` ("100mm"), which the mapper dropped. It is now fed to
   Table 17 col (1a). Cert 9884: −5.5 → +0.06.
3. `6b045146` **gas-boiler fuel from the §14.2 mains-gas meter** (Summary/Elmhurst path): a
   Table-4b gas boiler with a SEPARATE electric immersion (§15 "Electricity") used to raise
   `MissingMainFuelType`; it now falls back to the "Main gas: Yes" meter flag → mains gas.
4. `3aed8f85` **floor "another dwelling below" (code 6)**: a party floor has no heat loss
   (mirror of the roof's "another dwelling above" override). Cert 2115: floor 47.85 → 0 W/K,
   error −23 → −4.
5. `a64e857b` **roof "Unknown insulation" → Table 18** (§5.11.4): "NI" = Not Indicated
   (undetermined), not zero insulation, so it now routes to the age-band default instead of
   2.30. Cluster mean|err| 7.8 → 1.8.
6. `678aa7af` **main-roof U ignores the Room-in-Roof "no insulation" leak**:
   `_joined_descriptions` concatenated ALL `roofs[]`, so an RR "no insulation" description
   contaminated the main-roof U. It now drops "Roof room(s)" entries when building the
   main-roof U (RR shell unaffected; golden 6035 safe).
7. `4d1a58b8` **Unknown meter + storage/CPSU → off-peak tariff** (§12): storage heaters charge
   overnight, so an Unknown (code-3) meter no longer bills their charge at the standard 13.19p.
   `rdsap_tariff_for_cert` infers off-peak for Rule-1 CPSU / Rule-2 storage only, and
   `_fuel_cost` now uses `_rdsap_tariff` (not the raw `tariff_from_meter_type`). Cert 7336:
   −26 → −0.16.
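
Item 1's decomposition idea can be sketched as follows: calibrate a £/kWh price on accurate certs, predict each component's cost from our kWh, then label the cert by component × direction. This is a hypothetical reconstruction, not the script itself. The ratio-of-sums calibration, the ±10% band, and the rule that the single worst component names the cluster are all assumptions, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentCost:
    component: str          # hypothetical key, e.g. "heat" or "hw"
    our_kwh: float          # our delivered energy for this component
    lodged_cost_gbp: float  # the API's consumer-price cost for this component


def calibrate_price(kwh: list[float], cost_gbp: list[float]) -> float:
    """Ratio-of-sums £/kWh over certs already known to be accurate (assumed method)."""
    return sum(cost_gbp) / sum(kwh)


def cluster_label(
    components: list[ComponentCost], price_gbp_per_kwh: float, band: float = 0.10
) -> str:
    """Label a cert by the component whose predicted cost deviates most beyond ±band."""
    label, worst = "balanced", band
    for c in components:
        if c.lodged_cost_gbp <= 0:
            continue  # nothing to compare against
        predicted = c.our_kwh * price_gbp_per_kwh
        rel_dev = (predicted - c.lodged_cost_gbp) / c.lodged_cost_gbp
        if abs(rel_dev) > worst:
            worst = abs(rel_dev)
            # "high" = our predicted cost is above the lodged cost (we over-state).
            label = f"{c.component}:{'high' if rel_dev > 0 else 'low'}"
    return label
```

If an off-peak-heated cert goes through this with the standard elec price, `predicted` is inflated for the heating component. That is exactly how the `heat:high` mis-flag in item 1's caveat arises.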

## DEPROVEN: do NOT retry (empirically failed this session)

- **roof `'ND'` (Not Determined) → Table 18.** `'ND'` is on ~305/905 certs, and for many of
  them the lodged calc genuinely uses the description's high U; routing all `'ND'` to the
  age-band default broke 9 certs (some 0 → +15) for zero net gain. The description is
  load-bearing even with `'ND'`. (The narrower "**unknown**" wording IS a clean signal; that's
  slice `a64e857b`.)
- **Broad "all §12 Rule-3 electric → off-peak on Unknown meters".** Net-NEGATIVE (44.9 → 44.8,
  bias flipped to +0.16). Room-heater dwellings (code 691) are over-credited when forced
  off-peak (their electric-immersion HW goes off-peak too). Direct-boiler 191 alone gains +0.1,
  but that needs a 191-vs-691 split that is NOT spec-grounded (both are Rule 3), i.e. a
  population data-fit. It is left unshipped on purpose (the user's principle: RdSAP is
  deterministic, no overfitting).
- **RR shell U from Table 17 at 50mm** (from session 1, still true): golden 6035 disproves it.
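
The shipped tariff rule (slice `4d1a58b8`) differs from the deproven broad variant by one branch. Below is a minimal sketch under stated assumptions. `infer_tariff`, the enums, and the `known_meter_tariff` parameter are hypothetical, and are not the real `rdsap_tariff_for_cert` signature. `known_meter_tariff` stands in for whatever `tariff_from_meter_type` yields on a known meter.

```python
from enum import Enum
from typing import Optional


class Tariff(Enum):
    STANDARD = "standard"
    OFF_PEAK = "off_peak"


class Section12Rule(Enum):
    # Hypothetical labels for the §12 rules named in this handover.
    RULE_1_CPSU = 1
    RULE_2_STORAGE = 2
    RULE_3_OTHER_ELECTRIC = 3


UNKNOWN_METER_CODE = 3  # the "Unknown (code-3) meter"


def infer_tariff(
    meter_code: int, rule: Optional[Section12Rule], known_meter_tariff: Tariff
) -> Tariff:
    """Sketch of the shipped rule: only CPSU/storage imply off-peak on an Unknown meter."""
    if meter_code != UNKNOWN_METER_CODE:
        # Known meters keep whatever the meter type already implies.
        return known_meter_tariff
    if rule in (Section12Rule.RULE_1_CPSU, Section12Rule.RULE_2_STORAGE):
        # CPSU / storage heaters charge overnight, so bill them off-peak.
        return Tariff.OFF_PEAK
    # Rule 3 (room heaters 691, direct boiler 191, ...) stays STANDARD; the broad
    # "all Rule-3 -> off-peak" variant was net-negative and is deliberately not used.
    return Tariff.STANDARD
```

Widening the middle `if` to include Rule 3 reproduces the net-negative experiment above. Splitting 191 from 691 inside that branch would be the population data-fit the user rejects.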

## THE REMAINING CLUSTER MAP (where the error lives now)

Run `scripts/decompose_api_cost_error.py` for the live table. As of `4d1a58b8`:

| cluster | n | % within 0.5 | note |
|---------|---|--------------|------|
| `heat:high` | 319 | 39% | we over-state heating energy (or off-peak is mis-priced) |
| `heat:low` | 229 | 47% | we under-state heating energy |
| `hw:low` | 161 | 50% | |
| `hw:high` | 120 | 43% | |
| `balanced` | 76 | 55% | |

By dwelling type / system (from `_results.csv`):

- **Flats (prop 2): 283 certs, 31% within 0.5**: still the worst segment by far (houses 50%,
  bungalows 59%). Signed −0.24. The fabric/tariff fixes helped, but flats remain the hardest.
- **Heat pumps (cat 4): 20 certs, 45% within 0.5, mean signed +1.43, mean|err| 3.81**: a
  distinct OVER-rating cluster, UNTOUCHED this session. These have PCDB indices (e.g. 9472 +15.0
  idx 104351, 2789 +13.4 idx 104632, 4135 +10.0 idx 106465). Likely an Appendix-N / PCDB
  efficiency or HW-from-HP issue. **Good next target: it's a coherent over-rate cluster, and HPs
  may be pinnable from a worksheet.**
- **Top single offenders** (see eval TOP-40): 2100 −61 (n_bps=2, electric, prop 0), 2958 +32
  (single-bp electric), 0390 −29 (flat, "Flat no insulation" + ND roof: the deproven path),
  2080 −25 (electric direct-boiler flat, mixed cause), 7921 −23 (gas, PCDB idx 16814).
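
A per-segment breakdown like the one above can be recomputed from the eval output. This sketch assumes `_results.csv` has a property-type column and a signed-error column. The names `prop_type` and `err` are guesses, so check the header the eval script actually writes. `segment_summary` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def segment_summary(
    csv_text: str, key: str = "prop_type", err_col: str = "err"
) -> dict[str, tuple[int, float, float]]:
    """Return {segment: (n, % within 0.5, mean signed err)}. Column names are assumptions."""
    errs: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        errs[row[key]].append(float(row[err_col]))
    out: dict[str, tuple[int, float, float]] = {}
    for seg, vals in errs.items():
        n = len(vals)
        within = 100.0 * sum(1 for v in vals if abs(v) < 0.5) / n
        out[seg] = (n, within, sum(vals) / n)
    return out
```

For example, `segment_summary(Path("/tmp/epc_2026_sample/_results.csv").read_text())["2"]` would give the flats row (n, % within 0.5, mean signed), assuming flats are stored as `2`, as "prop 2" above suggests. This needs `from pathlib import Path`.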

## WHY WE NEED WORKSHEETS NOW (the user has accepted this)

The decomposition method fixed the directional bias (under-rating → balanced), but it is now
**exhausted for the bidirectional scatter** because:

1. For **electric/off-peak certs**, the consumer-price `*_cost_current` fields diverge from the
   SAP Table-12 prices that the rating actually uses: the lodged total cost can EXCEED ours
   while the lodged SAP is also HIGHER. So we cannot back-calculate a reliable kWh/cost target.
2. The remaining causes (HW immersion off-peak charge-vs-on-demand split; HP Appendix-N
   efficiency + HP-DHW; per-cert fabric like 2100's −61) are **sub-component values that the
   ±10% calibration cannot resolve**; they need a line-ref pin.
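
A simple two-rate model shows both point 1 and the decomposition's off-peak caveat. For illustration only, assume a fraction $f$ of a component's true electricity use $E$ is billed at a low rate $p_L$ and the rest at the standard rate $p_H$ (with $p_L < p_H$). The decomposition prices everything at $p_H$:

```math
\begin{aligned}
C_{\text{lodged}} &= E\,\bigl[f\,p_L + (1-f)\,p_H\bigr] \\
\hat{C} &= E\,p_H \\
\hat{C} - C_{\text{lodged}} &= E\,f\,(p_H - p_L) \;>\; 0 \quad \text{for } f > 0 \\
\hat{E} = \frac{C_{\text{lodged}}}{p_H} &= E\,\Bigl[1 - f\Bigl(1 - \frac{p_L}{p_H}\Bigr)\Bigr] \;<\; E
\end{aligned}
```

So a cert whose kWh we reproduce exactly still shows up as over-costed (`heat:high`). Inverting the lodged cost also under-states its kWh by the factor $1 - f(1 - p_L/p_H)$. This single-$f$ model is a simplification of the real Table 12a splits.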

**What to generate (in priority order):** Elmhurst worksheets (P960 + Summary) for:

- **A heat pump cat-4 cert that over-rates**, e.g. `9472-3052-6202-0766-7200`
  (+15.0, idx 104351) or `2789-8331-7179-3314-1150` (+13.4). Pin the §9b HP efficiency
  (Appendix N / Table 4a), the (206)/(207) seasonal efficiency, and HW-from-HP. This is the
  cleanest coherent cluster.
- **A meter-3 electric flat with electric-immersion HW**, e.g. `2474-3059-4202-4496-3200`
  (−13.3, cat-2 direct-boiler 191) or `2080` (−25.5). Pin EXACTLY how RdSAP bills the
  electric-immersion HW (§4 + Table 12a) and direct-acting heating on an off-peak tariff. This
  resolves whether Rule-3 electric on Unknown meters should be off-peak (the unshipped 191
  question) and the HW off-peak split.
- (Optional) **2100-5421-0922-1622-3463** (−61, the worst): 2 building parts, electric; a
  worksheet would localise whether it's a §3 geometry or a heating blowup.

The faithful-reproduction rule still holds: **use the cert's OWN data** (its API JSON is in
`/tmp/epc_2026_sample/<cert>.json`; generate the Elmhurst worksheet from the same property),
NOT a template-edited 001431. Template edits drift (session-1 lesson).
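
The sketch below loads a cert's own cached JSON and honours the `EPC_SAMPLE_CACHE` override listed under Tools & Conventions. It assumes the variable holds a directory path and that files are named `<cert>.json`. `sample_dir` and `load_cert` are hypothetical helpers, not functions from the repo.

```python
import json
import os
from pathlib import Path
from typing import Any

DEFAULT_SAMPLE_DIR = "/tmp/epc_2026_sample"


def sample_dir() -> Path:
    # EPC_SAMPLE_CACHE, when set, overrides the default cache directory.
    return Path(os.environ.get("EPC_SAMPLE_CACHE", DEFAULT_SAMPLE_DIR))


def load_cert(cert: str) -> dict[str, Any]:
    """Load the cert's OWN cached API JSON (<sample_dir>/<cert>.json), not a template."""
    path = sample_dir() / f"{cert}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```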

## TOOLS & CONVENTIONS

- `PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`: headline, TOP-40,
  and per-cert `/tmp/epc_2026_sample/_results.csv`.
- `PYTHONPATH=/workspaces/model python scripts/decompose_api_cost_error.py`: component
  clusters + `_cost_decomposition.csv` (remember the off-peak caveat above).
- Sample: ~1009 cached API JSONs at `/tmp/epc_2026_sample` (override via `EPC_SAMPLE_CACHE`).
- **Conventions (non-negotiable):** one cause = one slice = one commit; a **spec citation
  (page + line)** in the commit message; AAA test headers; `abs(x-y)<=tol`, not
  `pytest.approx`; SAP 10.2 only; **no tolerance-widening / xfail**; pyright strict
  **net-zero** (baseline-compare via `git stash`); **stage files BY NAME** (the tree carries
  unrelated `scripts/` and "sap worksheets/" changes; never `git add -A`); RdSAP is
  **deterministic**, so every fix must be a spec rule, not a population data-fit (the user is
  firm on this); commit trailer `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
- **REGRESSION after any calculator change:** `tests/domain/sap10_calculator/`,
  `backend/documents_parser/tests/`, `datatypes/epc/`, and the golden fixtures (esp. **6035**).
- **Pre-existing failures to IGNORE** (they fail on the stashed baseline too; NOT yours):
  `test_from_rdsap_schema.py::…::test_total_floor_area`, and the 2 stone-wall U tests in
  `domain/sap10_ml/tests/test_rdsap_uvalues.py` (`…stone_granite_thin_wall_age_a_120mm…`,
  `…stone_sandstone…`). These are likely fallout from the §5.7 wall-U slice `27375d93`; worth
  a separate fix, but not yours to count against net-zero.
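
Here are the test conventions above in a self-contained example. It assumes "AAA test headers" means `# Arrange / # Act / # Assert` sections. `floor_heat_loss_w_per_k` is a hypothetical stand-in for the calculator, not its real API. It computes H = U × A and returns zero for the code-6 "another dwelling below" party floor from slice `3aed8f85`.

```python
# Hypothetical floor code for "another dwelling below" (code 6 per slice 3aed8f85).
ANOTHER_DWELLING_BELOW = 6


def floor_heat_loss_w_per_k(area_m2: float, u_value: float, floor_code: int) -> float:
    """Hypothetical stand-in: H = U x A, except a party floor loses no heat."""
    if floor_code == ANOTHER_DWELLING_BELOW:
        return 0.0
    return area_m2 * u_value


def test_floor_over_another_dwelling_has_no_heat_loss() -> None:
    # Arrange
    area_m2, u_value, tol = 50.0, 0.5, 1e-9
    other_code = 1  # any non-party floor code (illustrative)

    # Act
    party = floor_heat_loss_w_per_k(area_m2, u_value, ANOTHER_DWELLING_BELOW)
    exposed = floor_heat_loss_w_per_k(area_m2, u_value, other_code)

    # Assert (explicit abs(x - y) <= tol, never pytest.approx)
    assert abs(party - 0.0) <= tol
    assert abs(exposed - area_m2 * u_value) <= tol
```

Run it under pytest as usual. With bare `abs(...) <= tol` assertions, the tolerance stays explicit and visible in each check.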

## ARCHITECTURE NOTES THAT COST TIME (so you don't re-discover them)

- The API cost path uses `inputs.fuel_cost` (the Table-32/12a **precompute**, `_fuel_cost`),
  NOT the scalar `space_heating_fuel_cost_gbp_per_kwh`. `calculator.py:540` picks the
  precompute when it is populated, ELSE the legacy scalar fields. `_fuel_cost` returns a ZERO
  sentinel for any off-peak tariff, so the calculator then falls back to the legacy scalar
  fields (which DO carry the off-peak rate from `_space_heating_fuel_cost_gbp_per_kwh`).
  Consequently, a tariff change only bites if it flips `_fuel_cost`'s tariff off STANDARD.
- `_table_12a_system_for_main` maps cat-10 room heaters → `OTHER_DIRECT_ACTING_ELECTRIC`, but
  leaves storage (401-409; correct: → None → 100% low rate) and **direct-boiler 191 / CPSU as
  TODO** (→ None → pure low rate, which OVER-credits 191 on off-peak). Wiring the 191/CPSU rows
  is a prerequisite if you ever revisit Rule-3-on-Unknown.
- Fuel codes stored on `SapResult` are the RAW API enum (26 = mains gas), not Table-12 codes;
  translate via `table_12.API_FUEL_TO_TABLE_12` (the decomposition script does this).
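
The precompute-versus-legacy selection in the first note above reduces to a small rule. This sketch assumes the ZERO sentinel is treated exactly like "not populated". `FuelPrices` and `effective_space_heating_price` are hypothetical names, not the code at `calculator.py:540`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuelPrices:
    # Hypothetical container for the two price sources described above.
    precompute_gbp_per_kwh: Optional[float]  # from _fuel_cost; None = not populated, 0.0 = off-peak sentinel
    legacy_scalar_gbp_per_kwh: float         # e.g. space_heating_fuel_cost_gbp_per_kwh


def effective_space_heating_price(p: FuelPrices) -> float:
    # Use the precompute only when it carries a real (non-zero) price.
    if p.precompute_gbp_per_kwh:  # None and the 0.0 sentinel are both falsy
        return p.precompute_gbp_per_kwh
    # Otherwise fall back to the legacy scalar, which carries the off-peak rate.
    return p.legacy_scalar_gbp_per_kwh
```

A tariff change that leaves `_fuel_cost` on STANDARD therefore has no effect, because the non-zero precompute still wins.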