Code 3 = "(other premises below)" = above partially heated space (§3.12 → U=0.7), confirmed 9/9 on single-BP certs (the diagnostic that dodged the lossy-floors[] contamination). Records the 7536 re-pin and the lesson that "irreducible residual" golden notes can mask a real mapper bug. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Handover — API SAP accuracy (session 3): raises cleared, now profile-driven

**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch; **NEVER PR to
main**; the user pushes/PRs when ready). **HEAD `a8e5563a`+** (the profiler commit), local-only,
ahead of origin.

**READ ALSO:** the auto-memory `project_per_cert_mapper_validation_state` (full slice log +
disproven approaches + the meter/shower data-fidelity findings), and the earlier
`docs/HANDOVER_API_ACCURACY_S2.md` (session-2 method).

## THE GOAL (unchanged)
100% of API records with a lodged SAP must compute to within **0.5 SAP** of the API's
`energy_rating_current`. Headline gauge:
`PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.

| metric | now (`a8e5563a`) |
|--------|------------------|
| **% \|err\| < 0.5** | **45.1%** |
| % \|err\| < 1.0 | 59.4% |
| mean \|err\| | 1.702 |
| mean signed | −0.006 (balanced) |
| computed / raises | **909 / 0** |
| unsupported_schema | 100 (deferred, see below) |

45% is still poor. The systematic bias is gone; the remaining error is per-cert scatter plus the
profile-surfaced buckets below.
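
For orientation, the headline rows can be derived from a list of signed errors roughly as follows. This is a minimal sketch, not the code in `scripts/eval_api_sap_accuracy.py`: the function name `headline_metrics` is hypothetical, and it assumes signed error means computed SAP minus lodged SAP (positive = over-rate).

```python
from statistics import mean


def headline_metrics(signed_errors):
    """Summarise signed SAP errors (assumed: computed minus lodged)."""
    if not signed_errors:
        raise ValueError("no computed certs to summarise")
    abs_errors = [abs(e) for e in signed_errors]
    n = len(abs_errors)
    return {
        # Share of certs strictly inside each tolerance band.
        "pct_within_0_5": 100.0 * sum(a < 0.5 for a in abs_errors) / n,
        "pct_within_1_0": 100.0 * sum(a < 1.0 for a in abs_errors) / n,
        # Scatter (always >= 0) versus bias (can cancel out to ~0).
        "mean_abs_err": mean(abs_errors),
        "mean_signed_err": mean(signed_errors),
    }


# Toy data for illustration only, not taken from the eval run.
example = headline_metrics([0.2, -0.4, 1.5, -3.0])
print(example)
```

A near-zero mean signed error alongside a large mean |err| is the "balanced but scattered" state the table describes.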

## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
2. `2bc73fb0` **HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split.** Together (1)+(2)
   killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
3. `449d8c5b` **direct-acting electric boiler (191) → zero primary circuit loss** (SAP Table 3
   p.160 zero list names it verbatim).
4. `f4048588` **wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04.** (See KEY
   INSIGHT below: the gov field is RdSAP *output*, not an input.)
5. `1c5675a0` **floor_heat_loss=8 → no floor heat loss** (extension floor over a heated space;
   RdSAP §3, like code 6).
6. `a8e5563a` **main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10.**
   (4)(5)(6) cleared **all 4 raises**; the eval now has zero raises.
7. `(profiler)` **`scripts/profile_api_error.py`**: the new diagnostic (below).

## SESSION-4 UPDATE (HEAD `8741fbdf`): read before re-working the leads below
- **Lead #1 `floor_codes=3` RESOLVED: the code IS authoritative.** The diagnostic that cracked
  it: join each **single-BP** cert's `floor_heat_loss` code to its independent
  `floors[].description` (the multi-BP tally was contaminated because a cert's `floors[]` summary
  is LOSSY: it drops some BPs' descriptions). Single-BP gives a perfect 1:1 enum: code 1↔"To
  external air" (exposed), 2↔"To unheated space" (semi-exposed), **3↔"(other premises below)"
  (9/9)**, 6↔"(another dwelling below)" (party), 7↔Solid/Suspended (ground). Per RdSAP §3.12
  (p.25) code 3 = "above a partially heated space" (non-domestic premises below) → §5.14 constant
  **U=0.7** (NOT Table-20 semi-exposed, NOT ground). SHIPPED `8741fbdf`.
- **SHIPPED `b40e0f67`:** exposed-floor-on-flats (code 1) area fix, §3.12. A flat's code-1
  floor was area-zeroed by `_dwelling_exposure`; now the per-BP `is_exposed_floor` overrides the
  flat suppression upward (mirrors the "another dwelling below" party override).
- **SHIPPED `8741fbdf`:** code 3 → `is_above_partially_heated_space` (U=0.7) + area override.
  **RE-PINNED golden 7536-3827**: its Ext2 (bp3) code-3 floor was mis-read as "ground U=1.12" by
  a prior agent (the lossy `floors[]` dropped its description), who declared the residual an
  "irreducible register-rounding artifact, DO NOT chase". It was this bug: U 1.12→0.70, PE/CO2
  residuals moved toward 0. **LESSON: "irreducible residual" golden notes are suspect; a real
  mapper bug can hide there.** Eval (both slices): 45.1→45.3%, mean|err| 1.702→1.659, <1.0
  59.5→60.2%. User is generating a fresh `0380-2087-8190-2996-3075` worksheet to independently
  confirm U=0.7 (0380 now −0.63); validate when it lands.
- **Leads re-checked, NOT clean:** `immersion_type=2` (+1.86) is high-scatter (mean|err| 3.71,
  bidirectional). `main_control=2107` (+1.63) is correctly mapped ("Programmer, TRVs and bypass",
  type 2, Table 4c(2)); the over-rate is diffuse gas-boiler/flat-fabric, not a dispatch bug.
  The broad `roof_codes=1` bucket is mean −0.15 (the −1.78 came from top-floor-electric-flat
  outliers at −29/−25). Remaining gains need per-cert worksheets (start with the code-3 certs) or
  the unsupported-schema ticket.
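
The single-BP join from Lead #1 can be sketched as below. This is an illustration under assumptions, not the script that was actually run: the key `building_parts` and the flat record shape are hypothetical stand-ins for the raw API JSON layout; only `floor_heat_loss` and `floors[].description` come from the notes above.

```python
from collections import Counter


def tally_single_bp_floor_codes(certs):
    """Count (floor_heat_loss code, floor description) pairs over single-BP certs only."""
    pairs = Counter()
    for cert in certs:
        parts = cert.get("building_parts", [])  # hypothetical key name
        floors = cert.get("floors", [])
        # Skip multi-BP certs: their floors[] summary can drop descriptions,
        # so a code could be paired with another building part's description.
        if len(parts) != 1 or len(floors) != 1:
            continue
        code = parts[0].get("floor_heat_loss")
        description = floors[0].get("description")
        pairs[(code, description)] += 1
    return pairs


# Toy records for illustration only.
toy_certs = [
    {"building_parts": [{"floor_heat_loss": 3}],
     "floors": [{"description": "(other premises below)"}]},
    {"building_parts": [{"floor_heat_loss": 1}],
     "floors": [{"description": "To external air"}]},
    # Multi-BP cert with a lossy floors[] summary: excluded from the tally.
    {"building_parts": [{"floor_heat_loss": 7}, {"floor_heat_loss": 3}],
     "floors": [{"description": "Solid"}]},
]
tally = tally_single_bp_floor_codes(toy_certs)
for (code, description), count in sorted(tally.items()):
    print(code, description, count)
```

A clean enum shows up as each code pairing with exactly one description; a code that pairs with several descriptions means the join is still contaminated.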

## KEY INSIGHT (load-bearing, from the user)
**The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input.**
So any API field Elmhurst doesn't expose as an *input* is register metadata the RdSAP10 method
does **not** consume: route it to the spec default, don't try to "use" it. This is exactly why
`wall_insulation_thermal_conductivity` (slice 4) → always λ=0.04. Apply the same lens to any
new "extra" API field before wiring it.
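
As a rough illustration of that lens (not the mapper's actual structure), an output-only field can be short-circuited to its spec default before anything reads the API value. The names `OUTPUT_ONLY_DEFAULTS` and `resolve_field` are hypothetical; the only value taken from this handover is λ=0.04 for `wall_insulation_thermal_conductivity`.

```python
# Fields the register publishes but the RdSAP method does not take as inputs,
# mapped to the spec default that should be used instead (assumed structure).
OUTPUT_ONLY_DEFAULTS = {
    "wall_insulation_thermal_conductivity": 0.04,  # §5.8 default λ, per slice 4
}


def resolve_field(api_doc, field):
    """Return the value the calculation should see for `field`."""
    if field in OUTPUT_ONLY_DEFAULTS:
        # Deliberately ignore whatever the API carries for this field.
        return OUTPUT_ONLY_DEFAULTS[field]
    return api_doc.get(field)


doc = {"wall_insulation_thermal_conductivity": 0.025, "floor_heat_loss": 3}
print(resolve_field(doc, "wall_insulation_thermal_conductivity"))
print(resolve_field(doc, "floor_heat_loss"))
```

Keeping the ignored fields in one table makes the "is this an Elmhurst input?" decision explicit and reviewable for each new field.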

## THE NEW DIAGNOSTIC: `scripts/profile_api_error.py` (run this first)
`PYTHONPATH=/workspaces/model python scripts/profile_api_error.py` joins each computed cert's
signed error with a rich feature set from its **raw API JSON** (not the mapped EpcPropertyData),
and ranks (feature, value) buckets by error carried + by |mean signed| bias. This is how to find
"silly API-path handling" gaps. `--min-n N` sets the bucket floor.
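
The bucket ranking described above can be sketched like this. It is a simplified stand-in, not the contents of `scripts/profile_api_error.py`: the function name, the row shape, and the definition of "error carried" as the sum of |err| in a bucket are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def profile_buckets(rows, min_n=10):
    """Rank (feature, value) buckets by |mean signed error|.

    rows: iterable of (features, signed_error), where features maps a
    feature name to a hashable value pulled from the raw API JSON.
    """
    buckets = defaultdict(list)
    for features, signed_error in rows:
        for feature, value in features.items():
            buckets[(feature, value)].append(signed_error)

    report = []
    for (feature, value), errors in buckets.items():
        if len(errors) < min_n:  # bucket floor, like --min-n
            continue
        abs_errors = [abs(e) for e in errors]
        report.append({
            "feature": feature,
            "value": value,
            "n": len(errors),
            "mean_signed": mean(errors),
            "mean_abs": mean(abs_errors),
            "error_carried": sum(abs_errors),  # assumed definition
        })
    report.sort(key=lambda b: abs(b["mean_signed"]), reverse=True)
    return report


# Toy rows for illustration only.
rows = [
    ({"floor_codes": 3, "property_type": 2}, 5.0),
    ({"floor_codes": 3, "property_type": 1}, 6.0),
    ({"floor_codes": 1, "property_type": 2}, -1.0),
]
report = profile_buckets(rows, min_n=2)
for bucket in report:
    print(bucket)
```

Small buckets with a large |mean signed| are the mapper-bug candidates; large buckets with high mean |err| but near-zero bias are scatter.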

### PRIORITISED LEADS (from the run at `a8e5563a`; verify with the profiler, they'll shift)
Cleanest "API-path handling" candidates first (small, biased buckets = likely a mapper/dispatch
bug, not noise):

1. **`floor_codes=3` → mean signed +5.37 (n=10).** We map API `floor_heat_loss=3` → "To unheated
   space" (same as code 2). The +5.37 over-rate says that's wrong: code 3 likely isn't "unheated
   space" (or its U is wrong). Pull the n=10 certs and check what code 3 really is (ask the user
   for the Elmhurst floor dropdown options; this is the API=output lens). **Highest bias,
   smallest scope = start here.**
2. **Control-code biases:** `main_control=2306` −2.96 (n=11), `2602` +2.49 (n=14), `2107` +1.65
   (n=38), `2402` +1.14 (n=10), `2307` +0.74 (n=11). Several control codes carry systematic bias
   → Table 4c/4e control dispatch gaps. `2107`/`2602` are the biggest. Check
   `_CONTROL_TYPE_BY_CODE` + the Table 4c efficiency-adjustment / Table 4e control coverage.
3. **`immersion_type=2` (dual immersion) → +2.00 (n=43, mean|err| 3.85).** RdSAP §12 lists "dual
   electric immersion" as an off-peak trigger; the cascade does NOT consume `immersion_heating_type`
   for tariff (verified: only comments reference it). Wiring the §12 dual-immersion → off-peak
   rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
4. **`roof_codes=1` −1.78 (n=27)** (flat roof under-rate) and **`roof_insulation_thickness=None`
   −1.18 (n=52)**: flat-roof / no-thickness roof handling.
5. **`main_data_source=2` / `has_pcdb_main=False` → 28% within 0.5, mean|err| 3.17 (n≈242).**
   Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. Likely
   Table 4b default-efficiency or fabric, but worth a look: it's 1/4 of the sample.

### Big scattered segments (need worksheets, NOT clean single fixes)
- **`whc=903` (electric immersion HW): 13% within 0.5, n=84.** Looks like the worst bucket, but
  it's the electric **storage (cat-7) + room-heater (cat-10)** segment compounding (worst certs
  span −29…+32, bidirectional). Not one bug.
- **`mains_gas=N` (electric): 21% within 0.5, mean|err| 4.27 (n=145).** The hardest segment;
  per-cert fabric/tariff scatter.
- **Flats (`property_type=2`): 31% within 0.5 (n=283).** Still the worst dwelling type.
- **cat-7 storage (+0.75) / cat-10 room heaters (+0.75).** Both net over-rate; bidirectional.

## DISPROVEN: do NOT retry (empirically failed in earlier sessions; details in memory)
- Routing **roof `'ND'` → Table 18** (description is load-bearing even with 'ND').
- Broad **"all Unknown(meter 3) electric → off-peak"** (over-credits room heaters). NOTE: the
  meter-3 under-rate is partly an **irreducible data-fidelity artifact**: the register stores
  meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter (cert
  2474: lodged 78 needs 18-hour, but API says Unknown → spec-faithful ~68). Don't chase those to
  the lodged value.
- **RR shell U Table-17-50mm** (golden 6035 disproves it).
- **Shower enum is settled (non-bug):** API `shower_outlet_type` 1=non-electric(mixer)/2=electric
  (cohort 2636/0330 validate at 1e-4); types 3/4/5 are finer gov-output sub-types (type 3 is all
  on unsupported schema 19.1.0; type 4 already accurate). `shower_wwhrs` 1/2/3/4 = none /
  inst-WWHRS-1 / inst-WWHRS-2 / storage. Low headline value; not worth pursuing.

## THE 100 unsupported_schema CERTS (deferred, bigger ticket)
SAP-Schema-19.1.0 (and other pre-21). The user is planning a separate big piece: map old schemas
→ new + **predict missing fields from similar-looking properties** (needs an EPC-prediction
method). That needs its own grilling session; do NOT start it here.

## WORKSHEET WORKFLOW (the user generates them on request)
For per-cert scatter that needs ground truth, ask the user to generate **P960 + Summary**
worksheets from the cert's OWN API JSON (`/tmp/epc_2026_sample/<cert>.json`). **Describe the cert
field-by-field first** (the user reproduces in Elmhurst; their repros are approximate, so confirm
SAP matches lodged before pinning). Worksheets land under `sap worksheets/golden fixture
debugging/simulated case NN/` or `sap worksheets/additional with api 2/<cert>/`. Pin the cascade
to the P960 §3/§4/§9a/§10a line refs at abs=1e-4. **Caveat:** the user's repros often diverge
(wrong system / approximate inputs), so validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically
against the lodged SAP; don't blindly pin to a non-faithful repro.

## TOOLS & CONVENTIONS (non-negotiable)
- `scripts/eval_api_sap_accuracy.py`: headline + TOP-40 + `_results.csv`.
- `scripts/profile_api_error.py`: raw-API characteristic profiling (NEW, run first).
- `scripts/decompose_api_cost_error.py`: per-component cost decomposition (off-peak caveat: uses
  STANDARD elec price, mis-flags off-peak certs).
- ~1009 cached API JSONs at `/tmp/epc_2026_sample` (`EPC_SAMPLE_CACHE` overrides).
- **one cause = one slice = one commit**; **spec citation (page+line)** in the message; AAA test
  headers (`# Arrange/# Act/# Assert`); `abs(x-y)<=tol` not `pytest.approx`; **SAP 10.2 only**;
  **no tolerance-widening / xfail**; RdSAP is **deterministic**: every fix is a spec rule, not a
  population data-fit (the user is firm); pyright strict **net-zero** (baseline-compare via
  `git stash`); **stage files BY NAME** (tree carries unrelated `scripts/` + `sap worksheets/`
  changes, so never `git add -A`); `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
- **REGRESSION after any calc/mapper change:** `tests/domain/sap10_calculator/`,
  `backend/documents_parser/tests/`, `datatypes/epc/`, golden fixtures (esp. **6035**).
- **Pre-existing failures to IGNORE** (fail on the stashed baseline too): `test_total_floor_area`
  and the 2 stone-wall U tests in `domain/sap10_ml/tests/test_rdsap_uvalues.py`.
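
A test in the house style (AAA headers, explicit `abs(x-y)<=tol`, tolerance pinned at 1e-4) might look like the sketch below. `floor_u_value_for_code_3` is a hypothetical stand-in for whatever cascade call is under test; only the U=0.7 figure and the conventions come from this handover.

```python
ABS_TOL = 1e-4  # pin tolerance; never widened to make a test pass


def floor_u_value_for_code_3():
    """Hypothetical stand-in for the real cascade call under test."""
    return 0.7


def test_code_3_floor_uses_constant_u_value():
    # Arrange
    expected_u = 0.7  # §5.14 constant for a floor above a partially heated space

    # Act
    actual_u = floor_u_value_for_code_3()

    # Assert
    assert abs(actual_u - expected_u) <= ABS_TOL
```

The explicit comparison keeps the tolerance visible at the assertion site, which is the point of preferring it over `pytest.approx` here.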

## ARCHITECTURE NOTES (so you don't re-discover them)
- API path: `EpcPropertyDataMapper.from_api_response(doc)` →
  `cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)` →
  `calculate_sap_from_inputs(...).sap_score_continuous`.
- Cost path uses `inputs.fuel_cost` (Table-32/12a precompute); `_fuel_cost` returns a ZERO
  sentinel for off-peak → calculator falls back to the legacy scalar
  `_space_heating_fuel_cost_gbp_per_kwh` (which DOES carry the off-peak rate). SapResult fuel
  codes are RAW API enums; translate via `table_12.API_FUEL_TO_TABLE_12`.
- Heating efficiency: `_main_heating_detail_efficiency` → PCDB Table 105 winter eff (if PCDB
  index) else `seasonal_efficiency(code, cat, fuel)` (Table 4a/4b, in
  `domain/sap10_ml/sap_efficiencies.py`). Warm-air Table 4a code→eff map already covers 501-520.
- `sap10_ml/` is marked for eventual migration to `sap10_calculator/` but is still the live
  u-value/efficiency path.
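
The zero-sentinel fallback on the cost path can be pictured as below. This is a toy sketch of the behaviour described in the notes above, not the calculator's code: the function name, its parameters, and the prices are hypothetical.

```python
def space_heating_price_gbp_per_kwh(precomputed_price, legacy_scalar_price):
    """Pick the unit price, treating 0.0 as the 'off-peak, not precomputed' sentinel."""
    if precomputed_price == 0.0:
        # Off-peak certs: the precompute declines, so use the legacy scalar,
        # which is the value that carries the off-peak rate.
        return legacy_scalar_price
    return precomputed_price


# Illustrative prices only, not SAP table values.
print(space_heating_price_gbp_per_kwh(0.0, 0.09))   # falls back to the legacy scalar
print(space_heating_price_gbp_per_kwh(0.16, 0.09))  # uses the precomputed price
```

Anything that reads the precomputed value directly, without this fallback, will see a zero price for off-peak certs.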