# Handover: API SAP accuracy (session 3), raises cleared, now profile-driven

**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch; **NEVER PR to main**, the user pushes/PRs when ready). **HEAD `a8e5563a`+** (the profiler commit), local-only, ahead of origin.

**READ ALSO:** the auto-memory `project_per_cert_mapper_validation_state` (full slice log, deproven approaches, and the meter/shower data-fidelity findings), and the earlier `docs/HANDOVER_API_ACCURACY_S2.md` (session-2 method).

## THE GOAL (unchanged)

100% of API records with a lodged SAP compute within **0.5 SAP** of the API's `energy_rating_current`. Headline gauge: `PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.

| metric | now (`a8e5563a`) |
|--------|------------------|
| **% \|err\| < 0.5** | **45.1%** |
| % \|err\| < 1.0 | 59.4% |
| mean \|err\| | 1.702 |
| mean signed | −0.006 (balanced) |
| computed / raises | **909 / 0** |
| unsupported_schema | 100 (deferred, see below) |

45% is still poor. The systematic bias is gone; the remaining error is per-cert scatter plus the profile-surfaced buckets below.

## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)

1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
2. `2bc73fb0` **HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split.** Together, (1) and (2) killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
3. `449d8c5b` **direct-acting electric boiler (191) → zero primary circuit loss** (the SAP Table 3 zero list on p.160 names it verbatim).
4. `f4048588` **wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04.** (See KEY INSIGHT below: the gov field is RdSAP *output*, not an input.)
5. `1c5675a0` **floor_heat_loss=8 → no floor heat loss** (extension floor over a heated space; RdSAP §3, like code 6).
6. `a8e5563a` **main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10.** Slices (4), (5) and (6) cleared **all 4 raises**; the eval now has zero raises.
7. `(profiler)` **`scripts/profile_api_error.py`**: the new diagnostic (see below).

## SESSION-4 UPDATE (HEAD `8741fbdf`): read before re-working the leads below

- **Lead #1 `floor_codes=3` RESOLVED: the code IS authoritative.** The diagnostic that cracked it was to join each **single-BP** cert's `floor_heat_loss` code to its independent `floors[].description`. (The multi-BP tally was contaminated because a cert's `floors[]` summary is LOSSY and drops some BPs' descriptions.) Single-BP certs give a perfect 1:1 enum:
  - 1 ↔ "To external air" (exposed)
  - 2 ↔ "To unheated space" (semi-exposed)
  - **3 ↔ "(other premises below)" (9/9)**
  - 6 ↔ "(another dwelling below)" (party)
  - 7 ↔ Solid/Suspended (ground)

  Per RdSAP §3.12 (p.25), code 3 = "above a partially heated space" (non-domestic premises below) → §5.14 constant **U=0.7** (NOT Table-20 semi-exposed, NOT ground). SHIPPED `8741fbdf`.
- **SHIPPED `b40e0f67`:** exposed-floor-on-flats (code 1) area fix, per §3.12. A flat's code-1 floor was area-zeroed by `_dwelling_exposure`. Now the per-BP `is_exposed_floor` overrides the flat suppression upward, mirroring the "another dwelling below" party override.
- **SHIPPED `8741fbdf`:** code 3 → `is_above_partially_heated_space` (U=0.7) + area override. **RE-PINNED golden 7536-3827.** A prior agent had mis-read its Ext2 (bp3) code-3 floor as "ground U=1.12", because the lossy `floors[]` dropped its description. That agent declared the residual an "irreducible register-rounding artifact, DO NOT chase". It was this bug: U went 1.12 → 0.70, and the PE/CO2 residuals moved toward 0. **LESSON: "irreducible residual" golden notes are suspect; a real mapper bug can hide there.** Eval (both slices): 45.1 → 45.3%, mean|err| 1.702 → 1.659, <1.0 59.5 → 60.2%. The user is generating a fresh `0380-2087-8190-2996-3075` worksheet to independently confirm U=0.7 (0380 is now at −0.63); validate it when it lands.
- **Leads re-checked, NOT clean:**
  - `immersion_type=2` (+1.86) is high-scatter (mean|err| 3.71, bidirectional).
  - `main_control=2107` (+1.63) is correctly mapped ("Programmer, TRVs and bypass", type 2, Table 4c(2)). The over-rate is diffuse gas-boiler/flat-fabric error, not a dispatch bug.
  - The broad `roof_codes=1` bucket has a mean of −0.15. The −1.78 came from top-floor electric-flat outliers at −29/−25.

  Remaining gains need per-cert worksheets (start with code 3) or the unsupported-schema ticket.

## KEY INSIGHT (load-bearing, from the user)

**The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input.** Any API field that Elmhurst doesn't expose as an *input* is register metadata that the RdSAP10 method does **not** consume. Route it to the spec default; don't try to "use" it. This is exactly why `wall_insulation_thermal_conductivity` (slice 4) always maps to λ=0.04. Apply the same lens to any new "extra" API field before wiring it.

## THE NEW DIAGNOSTIC: `scripts/profile_api_error.py` (run this first)

`PYTHONPATH=/workspaces/model python scripts/profile_api_error.py` joins each computed cert's signed error with a rich feature set from its **raw API JSON** (not the mapped EpcPropertyData). It then ranks (feature, value) buckets by error carried and by |mean signed| bias. This is how to find "silly API-path handling" gaps. `--min-n N` sets the bucket floor.

### PRIORITISED LEADS (from the run at `a8e5563a`; verify with the profiler, they'll shift)

Cleanest "API-path handling" candidates come first. Small, biased buckets usually point to a mapper/dispatch bug rather than noise.

1. **`floor_codes=3` → mean signed +5.37 (n=10).** We map API `floor_heat_loss=3` → "To unheated space" (same as code 2). The +5.37 over-rate says that's wrong: code 3 likely isn't "unheated space", or its U is wrong. Pull the n=10 certs and check what code 3 really is. Ask the user for the Elmhurst floor dropdown (the API=output lens). **Highest bias, smallest scope: start here.** *RESOLVED in session 4 (code 3 → U=0.7); see SESSION-4 UPDATE above.*
2. **Control-code biases:** `main_control=2306` −2.96 (n=11), `2602` +2.49 (n=14), `2107` +1.65 (n=38), `2402` +1.14 (n=10), `2307` +0.74 (n=11). Several control codes carry systematic bias, which points to Table 4c/4e control-dispatch gaps. `2107` and `2602` are the biggest. Check `_CONTROL_TYPE_BY_CODE`, plus the coverage of the Table 4c efficiency adjustment and the Table 4e controls.
3. **`immersion_type=2` (dual immersion) → +2.00 (n=43, mean|err| 3.85).** RdSAP §12 lists "dual electric immersion" as an off-peak trigger. The cascade does NOT consume `immersion_heating_type` for tariff (verified: only comments reference it). Wiring the §12 dual-immersion → off-peak rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
4. **`roof_codes=1` −1.78 (n=27)** (flat-roof under-rate) and **`roof_insulation_thickness=None` −1.18 (n=52)**. Both point at flat-roof / no-thickness roof handling.
5. **`main_data_source=2` / `has_pcdb_main=False` → 28% within 0.5, mean|err| 3.17 (n≈242).** Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. The cause is likely Table 4b default efficiency or fabric, but it's worth a look because it's 1/4 of the sample.

### Big scattered segments (need worksheets, NOT clean single fixes)

- **`whc=903` (electric immersion HW): 13% within 0.5, n=84.** It looks like the worst bucket, but it's really the electric **storage (cat-7) + room-heater (cat-10)** segment compounding (the worst certs span −29…+32, bidirectional). Not one bug.
- **`mains_gas=N` (electric): 21% within 0.5, mean|err| 4.27 (n=145).** The hardest segment, with per-cert fabric/tariff scatter.
- **Flats (`property_type=2`): 31% within 0.5 (n=283).** Still the worst dwelling type.
- **cat-7 storage (+0.75) / cat-10 room heaters (+0.75):** both net over-rate; bidirectional.

## DEPROVEN: do NOT retry (empirically failed in earlier sessions; details in memory)

- Routing **roof `'ND'` → Table 18** (the description is load-bearing even with 'ND').
- Broad **"all Unknown (meter 3) electric → off-peak"** (over-credits room heaters). NOTE: the meter-3 under-rate is partly an **irreducible data-fidelity artifact**. The register stores meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter. For example, cert 2474 is lodged at 78, which needs 18-hour, but the API says Unknown, so the spec-faithful result is ~68. Don't chase those to the lodged value.
- **RR shell U Table-17-50mm** (golden 6035 disproves it).
- **Shower enum is settled (non-bug).**
  - API `shower_outlet_type`: 1 = non-electric (mixer), 2 = electric; cohort 2636/0330 validates at 1e-4.
  - Types 3/4/5 are finer gov-output sub-types. Type 3 only appears on unsupported schema 19.1.0, and type 4 is already accurate.
  - `shower_wwhrs`: 1/2/3/4 = none / inst-WWHRS-1 / inst-WWHRS-2 / storage.

  Low headline value; not worth pursuing.

## THE 100 unsupported_schema CERTS (deferred; bigger ticket)

These are SAP-Schema-19.1.0 (and other pre-21 schemas). The user is planning a separate big piece of work: map old schemas → new and **predict missing fields from similar-looking properties**, which needs an EPC-prediction method. That needs its own grilling session; do NOT start it here.

## WORKSHEET WORKFLOW (the user generates them on request)

For per-cert scatter that needs ground truth, ask the user to generate **P960 + Summary** worksheets from the cert's OWN API JSON (`/tmp/epc_2026_sample/.json`). **Describe the cert field-by-field first**, because the user reproduces it in Elmhurst. Their repros are approximate, so confirm the SAP matches the lodged value before pinning. Worksheets land under `sap worksheets/golden fixture debugging/simulated case NN/` or `sap worksheets/additional with api 2//`. Pin the cascade to the P960 §3/§4/§9a/§10a line refs at abs=1e-4.

**Caveat:** the user's repros often diverge (wrong system, approximate inputs). Validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically against the lodged SAP; don't blindly pin to a non-faithful repro.
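
Below is a minimal sketch of pinning to P960 line refs under the TOOLS & CONVENTIONS rules: AAA headers, a plain `abs(x-y)<=tol` check at 1e-4, and no `pytest.approx`. The helper `worksheet_mismatches` is hypothetical, and so are the line refs and all the numbers. A real test would take `computed` from the cascade and `worksheet` from the user's P960.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch: `worksheet_mismatches`, the line refs and every number
# below are placeholders, not values from a real P960 or from the cascade.
P960_ABS_TOL = 1e-4

Mismatches = Dict[str, Tuple[Optional[float], float]]


def worksheet_mismatches(
    computed: Dict[str, float],
    worksheet: Dict[str, float],
    tol: float = P960_ABS_TOL,
) -> Mismatches:
    """Return {line_ref: (computed, expected)} for every worksheet line not pinned within tol."""
    mismatches: Mismatches = {}
    for line_ref, expected in worksheet.items():
        actual = computed.get(line_ref)
        # Plain abs-diff check (no pytest.approx); a missing line counts as a mismatch.
        if actual is None or not abs(actual - expected) <= tol:
            mismatches[line_ref] = (actual, expected)
    return mismatches


def test_cascade_pins_to_p960_lines() -> None:
    # Arrange
    worksheet = {"line_x": 12.3456, "line_y": 210.0}
    computed = {"line_x": 12.34565, "line_y": 210.00002}

    # Act
    mismatches = worksheet_mismatches(computed, worksheet)

    # Assert
    assert mismatches == {}, mismatches
```

The helper collects every mismatch instead of asserting line by line. That way a single failure message lists all the diverging line refs at once.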

## TOOLS & CONVENTIONS (non-negotiable)

- `scripts/eval_api_sap_accuracy.py`: headline + TOP-40 + `_results.csv`.
- `scripts/profile_api_error.py`: raw-API characteristic profiling (NEW, run first).
- `scripts/decompose_api_cost_error.py`: per-component cost decomposition. Off-peak caveat: it uses the STANDARD electricity price, so it mis-flags off-peak certs.
- ~1009 cached API JSONs at `/tmp/epc_2026_sample` (`EPC_SAMPLE_CACHE` overrides the path).
- Conventions:
  - **One cause = one slice = one commit**, with the **spec citation (page+line)** in the commit message.
  - AAA test headers (`# Arrange` / `# Act` / `# Assert`); use `abs(x-y)<=tol`, not `pytest.approx`.
  - **SAP 10.2 only**; **no tolerance-widening / xfail**.
  - RdSAP is **deterministic**: every fix is a spec rule, not a population data-fit (the user is firm on this).
  - pyright strict **net-zero** (baseline-compare via `git stash`).
  - **Stage files BY NAME.** The tree carries unrelated `scripts/` + `sap worksheets/` changes, so never `git add -A`.
  - Commit trailer: `Co-Authored-By: Claude Opus 4.8 `.
- **REGRESSION after any calc/mapper change:** `tests/domain/sap10_calculator/`, `backend/documents_parser/tests/`, `datatypes/epc/`, golden fixtures (esp. **6035**).
- **Pre-existing failures to IGNORE** (they fail on the stashed baseline too): `test_total_floor_area` and the 2 stone-wall U tests in `domain/sap10_ml/tests/test_rdsap_uvalues.py`.

## ARCHITECTURE NOTES (so you don't re-discover them)

- API path: `EpcPropertyDataMapper.from_api_response(doc)` → `cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)` → `calculate_sap_from_inputs(...).sap_score_continuous`.
- The cost path uses `inputs.fuel_cost` (the Table-32/12a precompute). `_fuel_cost` returns a ZERO sentinel for off-peak, so the calculator falls back to the legacy scalar `_space_heating_fuel_cost_gbp_per_kwh`, which DOES carry the off-peak rate. SapResult fuel codes are RAW API enums; translate them via `table_12.API_FUEL_TO_TABLE_12`.
- Heating efficiency: `_main_heating_detail_efficiency` uses the PCDB Table 105 winter efficiency when a PCDB index is present. Otherwise it uses `seasonal_efficiency(code, cat, fuel)` (Table 4a/4b, in `domain/sap10_ml/sap_efficiencies.py`). The warm-air Table 4a code→efficiency map already covers 501-520.
- `sap10_ml/` is marked for eventual migration to `sap10_calculator/` but is still the live U-value/efficiency path.
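
Two dispatch rules from these notes can be sketched as below: PCDB-first heating efficiency, and the off-peak zero-sentinel price fallback. Everything here is a hypothetical stand-in. The names, the types and the `0.0` sentinel value are assumptions for illustration, not the real signatures of `_main_heating_detail_efficiency` or `_fuel_cost`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical stand-ins for illustration only: NOT the real cascade types
# or the real signatures of `_main_heating_detail_efficiency` / `_fuel_cost`.

# Assumed value the precompute returns for off-peak certs.
OFF_PEAK_ZERO_SENTINEL = 0.0


@dataclass(frozen=True)
class MainHeating:
    pcdb_index: Optional[int]  # None when the system has no PCDB entry
    sap_code: int
    category: int
    fuel: int


def main_heating_efficiency(
    heating: MainHeating,
    pcdb_winter_efficiency: Mapping[int, float],
    seasonal_efficiency: Callable[[int, int, int], float],
) -> float:
    """PCDB Table 105 winter efficiency if indexed, else the Table 4a/4b lookup."""
    if heating.pcdb_index is not None:
        return pcdb_winter_efficiency[heating.pcdb_index]
    return seasonal_efficiency(heating.sap_code, heating.category, heating.fuel)


def space_heating_price_gbp_per_kwh(
    precomputed_price: float, legacy_scalar_price: float
) -> float:
    """Prefer the Table-32/12a precompute; on the off-peak zero sentinel,
    fall back to the legacy scalar, which carries the off-peak rate."""
    if precomputed_price == OFF_PEAK_ZERO_SENTINEL:
        return legacy_scalar_price
    return precomputed_price
```

In this sketch, a zero precompute is a sentinel rather than a real price. Code that skipped the fallback would price off-peak space heating at zero.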