Model/docs/HANDOVER_API_PROFILING.md
Khalim Conn-Kowlessar d0f57a0e94 docs: session-4 handover — floor_heat_loss=3 resolved (U=0.7), 7536 re-pinned
Code 3 = "(other premises below)" = above partially heated space (§3.12 →
U=0.7), confirmed 9/9 on single-BP certs (the diagnostic that dodged the
lossy-floors[] contamination). Records the 7536 re-pin and the lesson that
"irreducible residual" golden notes can mask a real mapper bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:26:21 +00:00

# Handover — API SAP accuracy (session 3): raises cleared, now profile-driven
**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch — **NEVER PR to
main**; the user pushes/PRs when ready). **HEAD `a8e5563a`+** (the profiler commit), local-only
ahead of origin.
**READ ALSO:** the auto-memory `project_per_cert_mapper_validation_state` (full slice log +
deproven approaches + the meter/shower data-fidelity findings), and the earlier
`docs/HANDOVER_API_ACCURACY_S2.md` (session-2 method).
## THE GOAL (unchanged)
100% of API records with a lodged SAP compute within **0.5 SAP** of the API's
`energy_rating_current`. Headline gauge:
`PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.
| metric | now (`a8e5563a`) |
|--------|------------------|
| **% \|err\| < 0.5** | **45.1%** |
| % \|err\| < 1.0 | 59.4% |
| mean \|err\| | 1.702 |
| mean signed | 0.006 (balanced) |
| computed / raises | **909 / 0** |
| unsupported_schema | 100 (deferred, see below) |
45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
profile-surfaced buckets below.
## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
2. `2bc73fb0` **HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split.** Together (1)+(2)
killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
3. `449d8c5b` **direct-acting electric boiler (191) → zero primary circuit loss** (SAP Table 3
p.160 zero list names it verbatim).
4. `f4048588` **wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04.** (See KEY
INSIGHT below: the gov field is RdSAP *output*, not an input.)
5. `1c5675a0` **floor_heat_loss=8 → no floor heat loss** (extension floor over a heated space;
RdSAP §3, like code 6).
6. `a8e5563a` **main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10.**
(4)(5)(6) cleared **all 4 raises**; eval now has zero raises.
7. `(profiler)` **`scripts/profile_api_error.py`**: the new diagnostic (below).
## SESSION-4 UPDATE (HEAD `8741fbdf`) — read before re-working the leads below
- **Lead #1 `floor_codes=3` RESOLVED: the code IS authoritative.** The diagnostic that cracked
it: join each **single-BP** cert's `floor_heat_loss` code to its independent
`floors[].description` (the multi-BP tally was contaminated because a cert's `floors[]` summary
is LOSSY: it drops some BPs' descriptions). Single-BP gives a perfect 1:1 enum: code 1↔"To
external air" (exposed), 2↔"To unheated space" (semi-exposed), **3↔"(other premises below)"
(9/9)**, 6↔"(another dwelling below)" (party), 7↔Solid/Suspended (ground). Per RdSAP §3.12
(p.25) code 3 = "above a partially heated space" (non-domestic premises below) → §5.14 constant
**U=0.7** (NOT Table-20 semi-exposed, NOT ground). SHIPPED `8741fbdf`.
- **SHIPPED `b40e0f67`:** exposed-floor-on-flats (code 1) area fix, §3.12. A flat's code-1
floor was area-zeroed by `_dwelling_exposure`; now the per-BP `is_exposed_floor` overrides the
flat suppression upward (mirrors the "another dwelling below" party override).
- **SHIPPED `8741fbdf`:** code 3 → `is_above_partially_heated_space` (U=0.7) + area override.
**RE-PINNED golden 7536-3827:** its Ext2(bp3) code-3 floor was mis-read as "ground U=1.12" by
a prior agent (the lossy floors[] dropped its description), who declared the residual an
"irreducible register-rounding artifact, DO NOT chase". It was this bug: U 1.12 → 0.70, PE/CO2
residuals moved toward 0. **LESSON: "irreducible residual" golden notes are suspect; a real
mapper bug can hide there.** Eval (both slices): 45.1 → 45.3%, mean|err| 1.702 → 1.659, <1.0
59.5 → 60.2%. User is generating a fresh `0380-2087-8190-2996-3075` worksheet to independently
confirm U=0.7 (0380 now 0.63); validate when it lands.
- **Leads re-checked, NOT clean:** `immersion_type=2` (+1.86) is high-scatter (mean|err| 3.71,
bidirectional). `main_control=2107` (+1.63) is correctly mapped ("Programmer, TRVs and bypass" →
type 2, Table 4c(2)); the over-rate is diffuse gas-boiler/flat-fabric, not a dispatch bug.
`roof_codes=1` broad bucket is mean 0.15 (the -1.78 was top-floor-electric-flat outliers
-29/-25). Remaining gains need per-cert worksheets (start code-3) or the unsupported-schema ticket.
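
The single-BP join described in the first bullet can be sketched as follows. This is a minimal illustration, not the script that was actually run: the `building_parts` key and the record layout are assumptions, and the sample certs are made up.

```python
from collections import Counter


def tally_single_bp(certs: list[dict]) -> Counter:
    """Count (floor_heat_loss code, floor description) pairs on single-BP certs."""
    tally: Counter = Counter()
    for cert in certs:
        parts = cert.get("building_parts", [])  # hypothetical key name
        floors = cert.get("floors", [])
        # Skip multi-BP certs: their floors[] summary can drop descriptions,
        # so pairing a code with a description is not reliable there.
        if len(parts) != 1 or len(floors) != 1:
            continue
        tally[(parts[0]["floor_heat_loss"], floors[0]["description"])] += 1
    return tally


# Made-up records for illustration only.
certs = [
    {"building_parts": [{"floor_heat_loss": 3}],
     "floors": [{"description": "(other premises below)"}]},
    {"building_parts": [{"floor_heat_loss": 3}],
     "floors": [{"description": "(other premises below)"}]},
    {"building_parts": [{"floor_heat_loss": 7}, {"floor_heat_loss": 3}],
     "floors": [{"description": "Solid"}]},  # multi-BP with lossy floors[]: skipped
    {"building_parts": [{"floor_heat_loss": 1}],
     "floors": [{"description": "To external air"}]},
]
tally = tally_single_bp(certs)
```

If each code maps to exactly one description in the resulting tally, the enum can be read off directly; a code that maps to several descriptions would point back at contamination.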
## KEY INSIGHT (load-bearing, from the user)
**The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input.**
So any API field Elmhurst doesn't expose as an *input* is register metadata the RdSAP10 method
does **not** consume: route it to the spec default, don't try to "use" it. This is exactly why
`wall_insulation_thermal_conductivity` (slice 4) always maps to λ=0.04. Apply the same lens to
any new "extra" API field before wiring it.
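
One way to picture this lens in code is an allowlist of fields that correspond to real Elmhurst inputs, with everything else routed to a spec default. The sketch below is hypothetical: `ELMHURST_INPUT_FIELDS`, `SPEC_DEFAULTS` and `resolve_field` are illustrative names, not identifiers from the mapper, and the allowlist contents are an assumption.

```python
# Spec defaults for fields the method does not consume (λ=0.04 is from slice 4).
SPEC_DEFAULTS = {"wall_insulation_thermal_conductivity": 0.04}

# Hypothetical allowlist of API fields that map to genuine Elmhurst inputs.
ELMHURST_INPUT_FIELDS = frozenset({"floor_heat_loss", "main_heating_category"})


def resolve_field(api_doc: dict, name: str):
    """Use the API value only for genuine inputs; otherwise the spec default."""
    if name in ELMHURST_INPUT_FIELDS:
        return api_doc[name]
    # Register metadata: ignore whatever the API reports.
    return SPEC_DEFAULTS[name]


# Made-up API document for illustration.
doc = {"wall_insulation_thermal_conductivity": 0.025, "floor_heat_loss": 3}
lam = resolve_field(doc, "wall_insulation_thermal_conductivity")
code = resolve_field(doc, "floor_heat_loss")
```

Here the reported 0.025 is deliberately discarded in favour of the default, while the floor code passes through unchanged.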
## THE NEW DIAGNOSTIC — `scripts/profile_api_error.py` (run this first)
`PYTHONPATH=/workspaces/model python scripts/profile_api_error.py` joins each computed cert's
signed error with a rich feature set from its **raw API JSON** (not the mapped EpcPropertyData),
and ranks (feature, value) buckets by error carried and by |mean signed| bias. This is how to
find "silly API-path handling" gaps. `--min-n N` sets the minimum bucket size.
### PRIORITISED LEADS (from the run at `a8e5563a` — verify with the profiler, they'll shift)
Cleanest "API-path handling" candidates first (small, biased buckets = likely a mapper/dispatch
bug, not noise):
1. **`floor_codes=3`: mean signed +5.37 (n=10).** We map API `floor_heat_loss=3` → "To unheated
space" (same as code 2). The +5.37 over-rate says that's wrong: code 3 likely isn't "unheated
space" (or its U is wrong). Pull the n=10 certs, check what code 3 really is (ask the user for the
Elmhurst floor dropdown, per the API=output lens). **Highest bias, smallest scope = start here.**
2. **Control-code biases:** `main_control=2306` -2.96 (n=11), `2602` +2.49 (n=14), `2107` +1.65
(n=38), `2402` +1.14 (n=10), `2307` +0.74 (n=11). Several control codes carry systematic bias,
pointing to Table 4c/4e control dispatch gaps. `2107`/`2602` are the biggest. Check
`_CONTROL_TYPE_BY_CODE` + the Table 4c efficiency-adjustment / Table 4e control coverage.
3. **`immersion_type=2` (dual immersion): +2.00 (n=43, mean|err| 3.85).** RdSAP §12 lists "dual
electric immersion" as an off-peak trigger; the cascade does NOT consume `immersion_heating_type`
for tariff (verified: only comments reference it). Wiring the §12 dual-immersion off-peak
rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
4. **`roof_codes=1` -1.78 (n=27)** (flat roof under-rate) and **`roof_insulation_thickness=None`
-1.18 (n=52)**: flat-roof / no-thickness roof handling.
5. **`main_data_source=2` / `has_pcdb_main=False`: 28% within 0.5, mean|err| 3.17 (n≈242).**
Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. Likely
Table 4b default-efficiency or fabric, but worth a look: it's 1/4 of the sample.
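
The bucket ranking behind the leads above can be sketched roughly as below. This is an assumption-laden illustration rather than the logic of `profile_api_error.py`: "error carried" is taken here to mean the sum of absolute errors in a bucket, `rank_buckets` is a hypothetical name, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean


def rank_buckets(rows, min_n=10):
    """rows: iterable of (features dict, signed error). Returns two rankings."""
    buckets = defaultdict(list)
    for features, err in rows:
        for key, value in features.items():
            buckets[(key, value)].append(err)

    summary = []
    for (key, value), errs in buckets.items():
        if len(errs) < min_n:  # bucket floor, like --min-n
            continue
        summary.append({
            "feature": key,
            "value": value,
            "n": len(errs),
            "mean_signed": mean(errs),
            "error_carried": sum(abs(e) for e in errs),  # assumed definition
        })

    by_carried = sorted(summary, key=lambda b: b["error_carried"], reverse=True)
    by_bias = sorted(summary, key=lambda b: abs(b["mean_signed"]), reverse=True)
    return by_carried, by_bias


# Made-up rows for illustration only.
rows = [
    ({"floor_codes": 3, "mains_gas": "Y"}, 5.0),
    ({"floor_codes": 3, "mains_gas": "N"}, 6.0),
    ({"floor_codes": 1, "mains_gas": "N"}, -4.0),
    ({"floor_codes": 1, "mains_gas": "Y"}, 4.0),
    ({"floor_codes": 1, "mains_gas": "Y"}, 0.5),
]
by_carried, by_bias = rank_buckets(rows, min_n=2)
```

The two rankings answer different questions: a bucket with large error carried but near-zero mean signed error is scatter, while a small bucket with a large signed mean is the kind of candidate mapper bug the leads prioritise.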
### Big scattered segments (need worksheets, NOT clean single fixes)
- **`whc=903` (electric immersion HW): 13% within 0.5, n=84.** Looks like the worst bucket, but
it's the electric **storage(cat-7)+room-heater(cat-10)** segment compounding (worst certs span
-29…+32, bidirectional). Not one bug.
- **`mains_gas=N` (electric): 21% within 0.5, mean|err| 4.27 (n=145).** The hardest segment;
per-cert fabric/tariff scatter.
- **Flats (`property_type=2`): 31% within 0.5 (n=283).** Still the worst dwelling type.
- **cat-7 storage (+0.75) / cat-10 room heaters (+0.75):** both net over-rate; bidirectional.
## DEPROVEN — do NOT retry (empirically failed in earlier sessions; details in memory)
- Routing **roof `'ND'` → Table 18** (description is load-bearing even with 'ND').
- Broad **"all Unknown(meter 3) electric off-peak"** (over-credits room heaters). NOTE: the
meter-3 under-rate is partly an **irreducible data-fidelity artifact**: the register stores
meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter (cert
2474: lodged 78 needs 18-hour, but API says Unknown → spec-faithful ~68). Don't chase those to
the lodged value.
- **RR shell U Table-17-50mm** (golden 6035 disproves it).
- **Shower enum is settled (non-bug):** API `shower_outlet_type` 1=non-electric(mixer)/2=electric
(cohort 2636/0330 validate at 1e-4); types 3/4/5 are finer gov-output sub-types (type 3 is all
on unsupported schema 19.1.0; type 4 already accurate). `shower_wwhrs` 1/2/3/4 = none /
inst-WWHRS-1 / inst-WWHRS-2 / storage. Low headline value, not worth pursuing.
## THE 100 unsupported_schema CERTS (deferred — bigger ticket)
SAP-Schema-19.1.0 (and other pre-21). The user is planning a separate big piece: map old schemas
→ new + **predict missing fields from similar-looking properties** (needs an EPC-prediction
method). That needs its own grilling session; do NOT start it here.
## WORKSHEET WORKFLOW (the user generates them on request)
For per-cert scatter that needs ground truth, ask the user to generate **P960 + Summary**
worksheets from the cert's OWN API JSON (`/tmp/epc_2026_sample/<cert>.json`). **Describe the cert
field-by-field first** (the user reproduces in Elmhurst; their repros are approximate, so confirm
SAP matches lodged before pinning). Worksheets land under `sap worksheets/golden fixture
debugging/simulated case NN/` or `sap worksheets/additional with api 2/<cert>/`. Pin the cascade
to the P960 §349a10a line refs at abs=1e-4. **Caveat:** the user's repros often diverge
(wrong system / approximate inputs): validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically
against the lodged SAP, don't blindly pin to a non-faithful repro.
## TOOLS & CONVENTIONS (non-negotiable)
- `scripts/eval_api_sap_accuracy.py`: headline + TOP-40 + `_results.csv`.
- `scripts/profile_api_error.py`: raw-API characteristic profiling (NEW, run first).
- `scripts/decompose_api_cost_error.py`: per-component cost decomposition (off-peak caveat: uses
STANDARD elec price, mis-flags off-peak certs).
- ~1009 cached API JSONs at `/tmp/epc_2026_sample` (`EPC_SAMPLE_CACHE` overrides).
- **one cause = one slice = one commit**; **spec citation (page+line)** in the message; AAA test
headers (`# Arrange/# Act/# Assert`); `abs(x-y)<=tol` not `pytest.approx`; **SAP 10.2 only**;
**no tolerance-widening / xfail**; RdSAP is **deterministic**, so every fix is a spec rule, not a
population data-fit (the user is firm); pyright strict **net-zero** (baseline-compare via
`git stash`); **stage files BY NAME** (tree carries unrelated `scripts/` + `sap worksheets/`
changes, so never `git add -A`); `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
- **REGRESSION after any calc/mapper change:** `tests/domain/sap10_calculator/`,
`backend/documents_parser/tests/`, `datatypes/epc/`, golden fixtures (esp. **6035**).
- **Pre-existing failures to IGNORE** (fail on the stashed baseline too): `test_total_floor_area`
and the 2 stone-wall U tests in `domain/sap10_ml/tests/test_rdsap_uvalues.py`.
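
A test written to the conventions above (AAA headers, explicit `abs(x-y)<=tol`, no `pytest.approx`) might look like the sketch below. `floor_u_value` is a hypothetical stand-in for the mapper path under test, not a function from the codebase; only the code-3 constant U=0.7 comes from this document.

```python
def floor_u_value(floor_heat_loss_code: int) -> float:
    """Hypothetical stand-in for the mapper path under test."""
    if floor_heat_loss_code == 3:
        # Above a partially heated space: constant U (RdSAP §5.14, per this doc).
        return 0.7
    raise NotImplementedError(floor_heat_loss_code)


def test_floor_code_3_uses_partially_heated_constant():
    # Arrange
    code = 3
    tol = 1e-4

    # Act
    u = floor_u_value(code)

    # Assert
    assert abs(u - 0.7) <= tol
```

The explicit tolerance keeps the pin visible in the test body, which makes any later attempt at tolerance-widening obvious in review.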
## ARCHITECTURE NOTES (so you don't re-discover them)
- API path: `EpcPropertyDataMapper.from_api_response(doc)` →
`cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)` →
`calculate_sap_from_inputs(...).sap_score_continuous`.
- Cost path uses `inputs.fuel_cost` (Table-32/12a precompute); `_fuel_cost` returns a ZERO
sentinel for off-peak → calculator falls back to the legacy scalar
`_space_heating_fuel_cost_gbp_per_kwh` (which DOES carry the off-peak rate). SapResult fuel codes
are RAW API enums; translate via `table_12.API_FUEL_TO_TABLE_12`.
- Heating efficiency: `_main_heating_detail_efficiency` → PCDB Table 105 winter eff (if PCDB
index) else `seasonal_efficiency(code, cat, fuel)` (Table 4a/4b, in
`domain/sap10_ml/sap_efficiencies.py`). Warm-air Table 4a code→eff map already covers 501-520.
- `sap10_ml/` is marked for eventual migration to `sap10_calculator/` but is still the live
u-value/efficiency path.
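
The zero-sentinel fallback in the cost path can be pictured with the toy model below. It is a sketch under stated assumptions: `CostInputs` and `effective_space_heating_price` are hypothetical names, the real calculator's structures will differ, and the prices are placeholders rather than Table 12 values.

```python
from dataclasses import dataclass

ZERO_SENTINEL = 0.0  # assumed representation of "no precomputed cost"


@dataclass(frozen=True)
class CostInputs:
    fuel_cost_gbp_per_kwh: float             # precomputed; 0.0 for off-peak
    legacy_space_heating_gbp_per_kwh: float  # legacy scalar, carries off-peak rate


def effective_space_heating_price(inputs: CostInputs) -> float:
    """Prefer the precomputed cost; fall back to the legacy scalar on the sentinel."""
    if inputs.fuel_cost_gbp_per_kwh == ZERO_SENTINEL:
        return inputs.legacy_space_heating_gbp_per_kwh
    return inputs.fuel_cost_gbp_per_kwh


# Placeholder prices for illustration only.
standard = CostInputs(fuel_cost_gbp_per_kwh=0.05, legacy_space_heating_gbp_per_kwh=0.05)
off_peak = CostInputs(fuel_cost_gbp_per_kwh=0.0, legacy_space_heating_gbp_per_kwh=0.03)
```

A sentinel of exactly zero is only safe because a genuine fuel price is never zero; any tool that reads the precomputed cost directly, without the fallback, would mis-price off-peak certs.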