From ae34ca4d744cad7028ba3ff89ce79e27a65b4c09 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sun, 7 Jun 2026 20:39:52 +0000
Subject: [PATCH] docs: session-3 API-profiling handover (raises cleared, profiler-driven leads)

Co-Authored-By: Claude Opus 4.8
---
 docs/HANDOVER_API_PROFILING.md | 142 +++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 docs/HANDOVER_API_PROFILING.md

diff --git a/docs/HANDOVER_API_PROFILING.md b/docs/HANDOVER_API_PROFILING.md
new file mode 100644
index 00000000..b3491452
--- /dev/null
+++ b/docs/HANDOVER_API_PROFILING.md
@@ -0,0 +1,142 @@
+# Handover — API SAP accuracy (session 3): raises cleared, now profile-driven
+
+**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch — **NEVER PR to
+main**; the user pushes/PRs when ready). **HEAD `a8e5563a`+** (the profiler commit), local-only
+ahead of origin.
+
+**READ ALSO:** the auto-memory `project_per_cert_mapper_validation_state` (full slice log +
+deproven approaches + the meter/shower data-fidelity findings), and the earlier
+`docs/HANDOVER_API_ACCURACY_S2.md` (session-2 method).
+
+## THE GOAL (unchanged)
+100% of API records with a lodged SAP compute within **0.5 SAP** of the API's
+`energy_rating_current`. Headline gauge:
+`PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.
+
+| metric | now (`a8e5563a`) |
+|--------|------------------|
+| **% \|err\| < 0.5** | **45.1%** |
+| % \|err\| < 1.0 | 59.4% |
+| mean \|err\| | 1.702 |
+| mean signed | −0.006 (balanced) |
+| computed / raises | **909 / 0** |
+| unsupported_schema | 100 (deferred — see below) |
+
+45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
+profile-surfaced buckets below.
+
+## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
+1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
+2. `2bc73fb0` **HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split.** Together (1)+(2)
+   killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
+3. `449d8c5b` **direct-acting electric boiler (191) → zero primary circuit loss** (SAP Table 3
+   p.160 zero list names it verbatim).
+4. `f4048588` **wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04.** (See KEY
+   INSIGHT below — the gov field is RdSAP *output*, not an input.)
+5. `1c5675a0` **floor_heat_loss=8 → no floor heat loss** (extension floor over a heated space;
+   RdSAP §3, like code 6).
+6. `a8e5563a` **main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10.**
+   (4)(5)(6) cleared **all 4 raises** — eval now has zero raises.
+7. `(profiler)` **`scripts/profile_api_error.py`** — the new diagnostic (below).
+
+## KEY INSIGHT (load-bearing, from the user)
+**The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input.**
+So any API field Elmhurst doesn't expose as an *input* is register metadata the RdSAP10 method
+does **not** consume — route it to the spec default, don't try to "use" it. This is exactly why
+`wall_insulation_thermal_conductivity` (slice 4) → always λ=0.04. Apply the same lens to any
+new "extra" API field before wiring it.
+
+## THE NEW DIAGNOSTIC — `scripts/profile_api_error.py` (run this first)
+`PYTHONPATH=/workspaces/model python scripts/profile_api_error.py` joins each computed cert's
+signed error with a rich feature set from its **raw API JSON** (not the mapped EpcPropertyData),
+and ranks (feature, value) buckets by error carried + by |mean signed| bias. This is how to find
+"silly API-path handling" gaps. `--min-n N` sets the bucket floor.
+
+### PRIORITISED LEADS (from the run at `a8e5563a` — verify with the profiler, they'll shift)
+Cleanest "API-path handling" candidates first (small, biased buckets = likely a mapper/dispatch
+bug, not noise):
+
+1. **`floor_codes=3` → mean signed +5.37 (n=10).** We map API `floor_heat_loss=3` → "To unheated
+   space" (same as code 2). The +5.37 over-rate says that's wrong — code 3 likely isn't "unheated
+   space" (or its U is wrong). Pull the n=10 certs, check what code 3 really is (ask the user the
+   Elmhurst floor dropdown — the API=output lens). **Highest bias, smallest scope = start here.**
+2. **Control-code biases:** `main_control=2306` −2.96 (n=11), `2602` +2.49 (n=14), `2107` +1.65
+   (n=38), `2402` +1.14 (n=10), `2307` +0.74 (n=11). Several control codes carry systematic bias
+   → Table 4c/4e control dispatch gaps. `2107`/`2602` are the biggest. Check
+   `_CONTROL_TYPE_BY_CODE` + the Table 4c efficiency-adjustment / Table 4e control coverage.
+3. **`immersion_type=2` (dual immersion) → +2.00 (n=43, mean|err| 3.85).** RdSAP §12 lists "dual
+   electric immersion" as an off-peak trigger; the cascade does NOT consume `immersion_heating_type`
+   for tariff (verified — only comments reference it). Wiring the §12 dual-immersion → off-peak
+   rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
+4. **`roof_codes=1` −1.78 (n=27)** (flat roof under-rate) and **`roof_insulation_thickness=None`
+   −1.18 (n=52)** — flat-roof / no-thickness roof handling.
+5. **`main_data_source=2` / `has_pcdb_main=False` → 28% within 0.5, mean|err| 3.17 (n≈242).**
+   Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. Likely
+   Table 4b default-efficiency or fabric, but worth a look — it's 1/4 of the sample.
+
+### Big scattered segments (need worksheets, NOT clean single fixes)
+- **`whc=903` (electric immersion HW): 13% within 0.5, n=84** — looks like the worst bucket but
+  it's the electric **storage(cat-7)+room-heater(cat-10)** segment compounding (worst certs span
+  −29…+32, bidirectional). Not one bug.
+- **`mains_gas=N` (electric): 21% within 0.5, mean|err| 4.27 (n=145)** — the hardest segment;
+  per-cert fabric/tariff scatter.
+- **Flats (`property_type=2`): 31% within 0.5 (n=283)** — still the worst dwelling type.
+- **cat-7 storage (+0.75) / cat-10 room heaters (+0.75)** — both net over-rate; bidirectional.
+
+## DEPROVEN — do NOT retry (empirically failed in earlier sessions; details in memory)
+- Routing **roof `'ND'` → Table 18** (description is load-bearing even with 'ND').
+- Broad **"all Unknown(meter 3) electric → off-peak"** (over-credits room heaters). NOTE: the
+  meter-3 under-rate is partly an **irreducible data-fidelity artifact** — the register stores
+  meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter (cert
+  2474: lodged 78 needs 18-hour, but API says Unknown → spec-faithful ~68). Don't chase those to
+  the lodged value.
+- **RR shell U Table-17-50mm** (golden 6035 disproves it).
+- **Shower enum is settled (non-bug):** API `shower_outlet_type` 1=non-electric(mixer)/2=electric
+  (cohort 2636/0330 validate at 1e-4); types 3/4/5 are finer gov-output sub-types (type 3 is all
+  on unsupported schema 19.1.0; type 4 already accurate). `shower_wwhrs` 1/2/3/4 = none / inst-
+  WWHRS-1 / inst-WWHRS-2 / storage. Low headline value — not worth pursuing.
+
+## THE 100 unsupported_schema CERTS (deferred — bigger ticket)
+SAP-Schema-19.1.0 (and other pre-21). The user is planning a separate big piece: map old schemas
+→ new + **predict missing fields from similar-looking properties** (needs an EPC-prediction
+method). That needs its own grilling session — do NOT start it here.
+
+## WORKSHEET WORKFLOW (the user generates them on request)
+For per-cert scatter that needs ground truth, ask the user to generate **P960 + Summary**
+worksheets from the cert's OWN API JSON (`/tmp/epc_2026_sample/.json`). **Describe the cert
+field-by-field first** (the user reproduces in Elmhurst; their repros are approximate — confirm
+SAP matches lodged before pinning). Worksheets land under `sap worksheets/golden fixture
+debugging/simulated case NN/` or `sap worksheets/additional with api 2//`. Pin the cascade
+to the P960 §3/§4/§9a/§10a line refs at abs=1e-4. **Caveat:** the user's repros often diverge
+(wrong system / approximate inputs) — validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically
+against the lodged SAP, don't blindly pin to a non-faithful repro.
+
+## TOOLS & CONVENTIONS (non-negotiable)
+- `scripts/eval_api_sap_accuracy.py` — headline + TOP-40 + `_results.csv`.
+- `scripts/profile_api_error.py` — raw-API characteristic profiling (NEW, run first).
+- `scripts/decompose_api_cost_error.py` — per-component cost decomposition (off-peak caveat: uses
+  STANDARD elec price, mis-flags off-peak certs).
+- ~1009 cached API JSONs at `/tmp/epc_2026_sample` (`EPC_SAMPLE_CACHE` overrides).
+- **one cause = one slice = one commit**; **spec citation (page+line)** in the message; AAA test
+  headers (`# Arrange/# Act/# Assert`); `abs(x-y)<=tol` not `pytest.approx`; **SAP 10.2 only**;
+  **no tolerance-widening / xfail**; RdSAP is **deterministic** — every fix is a spec rule, not a
+  population data-fit (the user is firm); pyright strict **net-zero** (baseline-compare via
+  `git stash`); **stage files BY NAME** (tree carries unrelated `scripts/` + `sap worksheets/`
+  changes — never `git add -A`); `Co-Authored-By: Claude Opus 4.8 `.
+- **REGRESSION after any calc/mapper change:** `tests/domain/sap10_calculator/`,
+  `backend/documents_parser/tests/`, `datatypes/epc/`, golden fixtures (esp. **6035**).
+- **Pre-existing failures to IGNORE** (fail on the stashed baseline too): `test_total_floor_area`
+  and the 2 stone-wall U tests in `domain/sap10_ml/tests/test_rdsap_uvalues.py`.
+
+## ARCHITECTURE NOTES (so you don't re-discover them)
+- API path: `EpcPropertyDataMapper.from_api_response(doc)` → `cert_to_inputs(epc, prices=
+  SAP_10_2_SPEC_PRICES)` → `calculate_sap_from_inputs(...).sap_score_continuous`.
+- Cost path uses `inputs.fuel_cost` (Table-32/12a precompute); `_fuel_cost` returns a ZERO
+  sentinel for off-peak → calculator falls back to the legacy scalar `_space_heating_fuel_cost_
+  gbp_per_kwh` (which DOES carry the off-peak rate). SapResult fuel codes are RAW API enums —
+  translate via `table_12.API_FUEL_TO_TABLE_12`.
+- Heating efficiency: `_main_heating_detail_efficiency` → PCDB Table 105 winter eff (if PCDB
+  index) else `seasonal_efficiency(code, cat, fuel)` (Table 4a/4b, in `domain/sap10_ml/
+  sap_efficiencies.py`). Warm-air Table 4a code→eff map already covers 501-520.
+- `sap10_ml/` is marked for eventual migration to `sap10_calculator/` but is still the live
+  u-value/efficiency path.
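
Reviewer sketch (outside the hunk, not part of the committed file): the bucket ranking described under THE NEW DIAGNOSTIC could look roughly like the following. This is a minimal illustration under stated assumptions, not the contents of `scripts/profile_api_error.py`. `CertError`, `Bucket`, `rank_buckets`, and the sample rows are hypothetical names and made-up data; the sign convention (computed minus lodged, so positive means over-rating) is inferred from the handover's "+5.37 over-rate" wording. Only the idea comes from the handover: group signed errors by (feature, value), drop buckets below a minimum n, and rank by |mean signed| bias.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class CertError:
    """Hypothetical record: one computed cert's signed error plus raw-API features."""
    signed_error: float          # assumed: computed SAP minus lodged SAP
    features: dict[str, object]  # e.g. {"floor_codes": 3, "main_control": 2107}


@dataclass(frozen=True)
class Bucket:
    feature: str
    value: object
    n: int
    mean_signed: float
    mean_abs: float
    within_half: float  # share of certs in the bucket with |err| < 0.5


def rank_buckets(rows: list[CertError], min_n: int = 10) -> list[Bucket]:
    """Group errors by (feature, value); rank by |mean signed| bias, largest first."""
    grouped: dict[tuple[str, object], list[float]] = defaultdict(list)
    for row in rows:
        for feature, value in row.features.items():
            grouped[(feature, value)].append(row.signed_error)

    buckets = [
        Bucket(
            feature=feature,
            value=value,
            n=len(errs),
            mean_signed=fmean(errs),
            mean_abs=fmean([abs(e) for e in errs]),
            within_half=sum(abs(e) < 0.5 for e in errs) / len(errs),
        )
        for (feature, value), errs in grouped.items()
        if len(errs) >= min_n  # plays the role of a bucket floor such as --min-n
    ]
    return sorted(buckets, key=lambda b: abs(b.mean_signed), reverse=True)


# Made-up rows purely for illustration; not taken from the eval sample.
rows = [
    CertError(5.0, {"floor_codes": 3}),
    CertError(6.0, {"floor_codes": 3}),
    CertError(-0.2, {"floor_codes": 1}),
    CertError(0.3, {"floor_codes": 1}),
    CertError(0.1, {"floor_codes": 1}),
]
for b in rank_buckets(rows, min_n=2):
    print(f"{b.feature}={b.value}: n={b.n} mean_signed={b.mean_signed:+.2f} "
          f"within_0.5={b.within_half:.0%}")
```

Ranking by |mean signed| rather than mean |err| is what separates a small, consistently biased bucket (a likely mapper or dispatch gap) from a large bidirectional one where the signed errors cancel, which is the distinction the handover draws between the prioritised leads and the scattered segments.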