From ae34ca4d744cad7028ba3ff89ce79e27a65b4c09 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar <kconnkowlessar@gmail.com>
Date: Sun, 7 Jun 2026 20:39:52 +0000
Subject: [PATCH] docs: session-3 API-profiling handover (raises cleared,
 profiler-driven leads)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/HANDOVER_API_PROFILING.md | 142 +++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 docs/HANDOVER_API_PROFILING.md

diff --git a/docs/HANDOVER_API_PROFILING.md b/docs/HANDOVER_API_PROFILING.md
new file mode 100644
index 00000000..b3491452
--- /dev/null
+++ b/docs/HANDOVER_API_PROFILING.md
@@ -0,0 +1,142 @@
+# Handover — API SAP accuracy (session 3): raises cleared, now profile-driven
+
+**Branch:** `feature/per-cert-mapper-validation` (long-lived working branch — **NEVER PR to
+main**; the user pushes/PRs when ready). **HEAD `a8e5563a`+** (the profiler commit), local-only
+ahead of origin.
+
+**READ ALSO:** the auto-memory `project_per_cert_mapper_validation_state` (full slice log +
+disproven approaches + the meter/shower data-fidelity findings), and the earlier
+`docs/HANDOVER_API_ACCURACY_S2.md` (session-2 method).
+
+## THE GOAL (unchanged)
+100% of API records with a lodged SAP compute within **0.5 SAP** of the API's
+`energy_rating_current`. Headline gauge:
+`PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.
+
+| metric | now (`a8e5563a`) |
+|--------|------------------|
+| **% \|err\| < 0.5** | **45.1%** |
+| % \|err\| < 1.0 | 59.4% |
+| mean \|err\| | 1.702 |
+| mean signed | −0.006 (balanced) |
+| computed / raises | **909 / 0** |
+| unsupported_schema | 100 (deferred — see below) |
+
+45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
+profile-surfaced buckets below.
+
+## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
+1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
+2. `2bc73fb0` **HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split.** Together (1)+(2)
+   killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
+3. `449d8c5b` **direct-acting electric boiler (191) → zero primary circuit loss** (SAP Table 3
+   p.160 zero list names it verbatim).
+4. `f4048588` **wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04.** (See KEY
+   INSIGHT below — the gov field is RdSAP *output*, not an input.)
+5. `1c5675a0` **floor_heat_loss=8 → no floor heat loss** (extension floor over a heated space;
+   RdSAP §3, like code 6).
+6. `a8e5563a` **main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10.**
+   (4)(5)(6) cleared **all 4 raises** — eval now has zero raises.
+7. `(profiler)` **`scripts/profile_api_error.py`** — the new diagnostic (below).
+
+## KEY INSIGHT (load-bearing, from the user)
+**The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input.**
+So any API field Elmhurst doesn't expose as an *input* is register metadata the RdSAP10 method
+does **not** consume — route it to the spec default, don't try to "use" it. This is exactly why
+`wall_insulation_thermal_conductivity` (slice 4) → always λ=0.04. Apply the same lens to any
+new "extra" API field before wiring it.
+
+## THE NEW DIAGNOSTIC — `scripts/profile_api_error.py` (run this first)
+`PYTHONPATH=/workspaces/model python scripts/profile_api_error.py` joins each computed cert's
+signed error with a rich feature set from its **raw API JSON** (not the mapped EpcPropertyData),
+and ranks (feature, value) buckets by error carried + by |mean signed| bias. This is how to find
+"silly API-path handling" gaps. `--min-n N` sets the bucket floor.
+
+### PRIORITISED LEADS (from the run at `a8e5563a` — verify with the profiler, they'll shift)
+Cleanest "API-path handling" candidates first (small, biased buckets = likely a mapper/dispatch
+bug, not noise):
+
+1. **`floor_codes=3` → mean signed +5.37 (n=10).** We map API `floor_heat_loss=3` → "To unheated
+   space" (same as code 2). The +5.37 over-rate says that's wrong — code 3 likely isn't "unheated
+   space" (or its U is wrong). Pull the n=10 certs and check what code 3 really is (ask the user
+   for Elmhurst's floor dropdown, per the API=output lens). **Highest bias, smallest scope: start here.**
+2. **Control-code biases:** `main_control=2306` −2.96 (n=11), `2602` +2.49 (n=14), `2107` +1.65
+   (n=38), `2402` +1.14 (n=10), `2307` +0.74 (n=11). Several control codes carry systematic bias
+   → Table 4c/4e control dispatch gaps. `2107`/`2602` are the biggest. Check
+   `_CONTROL_TYPE_BY_CODE` + the Table 4c efficiency-adjustment / Table 4e control coverage.
+3. **`immersion_type=2` (dual immersion) → +2.00 (n=43, mean|err| 3.85).** RdSAP §12 lists "dual
+   electric immersion" as an off-peak trigger; the cascade does NOT consume `immersion_heating_type`
+   for tariff (verified — only comments reference it). Wiring the §12 dual-immersion → off-peak
+   rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
+4. **`roof_codes=1` −1.78 (n=27)** (flat roof under-rate) and **`roof_insulation_thickness=None`
+   −1.18 (n=52)** — flat-roof / no-thickness roof handling.
+5. **`main_data_source=2` / `has_pcdb_main=False` → 28% within 0.5, mean|err| 3.17 (n≈242).**
+   Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. Likely
+   Table 4b default-efficiency or fabric, but worth a look — it's 1/4 of the sample.
+
+### Big scattered segments (need worksheets, NOT clean single fixes)
+- **`whc=903` (electric immersion HW): 13% within 0.5, n=84** — looks like the worst bucket but
+  it's the electric **storage(cat-7)+room-heater(cat-10)** segment compounding (worst certs span
+  −29…+32, bidirectional). Not one bug.
+- **`mains_gas=N` (electric): 21% within 0.5, mean|err| 4.27 (n=145)** — the hardest segment;
+  per-cert fabric/tariff scatter.
+- **Flats (`property_type=2`): 31% within 0.5 (n=283)** — still the worst dwelling type.
+- **cat-7 storage (+0.75) / cat-10 room heaters (+0.75)** — both net over-rate; bidirectional.
+
+## DISPROVEN (do NOT retry; empirically failed in earlier sessions, details in memory)
+- Routing **roof `'ND'` → Table 18** (description is load-bearing even with 'ND').
+- Broad **"all Unknown(meter 3) electric → off-peak"** (over-credits room heaters). NOTE: the
+  meter-3 under-rate is partly an **irreducible data-fidelity artifact** — the register stores
+  meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter (cert
+  2474: lodged 78 needs 18-hour, but API says Unknown → spec-faithful ~68). Don't chase those to
+  the lodged value.
+- **RR shell U Table-17-50mm** (golden 6035 disproves it).
+- **Shower enum is settled (non-bug):** API `shower_outlet_type` 1=non-electric(mixer)/2=electric
+  (cohort 2636/0330 validate at 1e-4); types 3/4/5 are finer gov-output sub-types (type 3 occurs
+  only on unsupported schema 19.1.0; type 4 is already accurate). `shower_wwhrs` 1/2/3/4 = none /
+  inst-WWHRS-1 / inst-WWHRS-2 / storage. Low headline value, not worth pursuing.
+
+## THE 100 unsupported_schema CERTS (deferred — bigger ticket)
+SAP-Schema-19.1.0 (and other pre-21). The user is planning a separate big piece: map old schemas
+→ new + **predict missing fields from similar-looking properties** (needs an EPC-prediction
+method). That needs its own grilling session — do NOT start it here.
+
+## WORKSHEET WORKFLOW (the user generates them on request)
+For per-cert scatter that needs ground truth, ask the user to generate **P960 + Summary**
+worksheets from the cert's OWN API JSON (`/tmp/epc_2026_sample/<cert>.json`). **Describe the cert
+field-by-field first** (the user reproduces in Elmhurst; their repros are approximate — confirm
+SAP matches lodged before pinning). Worksheets land under `sap worksheets/golden fixture
+debugging/simulated case NN/` or `sap worksheets/additional with api 2/<cert>/`. Pin the cascade
+to the P960 §3/§4/§9a/§10a line refs at abs=1e-4. **Caveat:** the user's repros often diverge
+(wrong system / approximate inputs) — validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically
+against the lodged SAP, don't blindly pin to a non-faithful repro.
+
+## TOOLS & CONVENTIONS (non-negotiable)
+- `scripts/eval_api_sap_accuracy.py` — headline + TOP-40 + `_results.csv`.
+- `scripts/profile_api_error.py` — raw-API characteristic profiling (NEW, run first).
+- `scripts/decompose_api_cost_error.py` — per-component cost decomposition (off-peak caveat: uses
+  STANDARD elec price, mis-flags off-peak certs).
+- ~1009 cached API JSONs at `/tmp/epc_2026_sample` (`EPC_SAMPLE_CACHE` overrides).
+- **one cause = one slice = one commit**; **spec citation (page+line)** in the message; AAA test
+  headers (`# Arrange/# Act/# Assert`); `abs(x-y)<=tol` not `pytest.approx`; **SAP 10.2 only**;
+  **no tolerance-widening / xfail**; RdSAP is **deterministic** — every fix is a spec rule, not a
+  population data-fit (the user is firm); pyright strict **net-zero** (baseline-compare via
+  `git stash`); **stage files BY NAME** (tree carries unrelated `scripts/` + `sap worksheets/`
+  changes — never `git add -A`); `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
+- **REGRESSION after any calc/mapper change:** `tests/domain/sap10_calculator/`,
+  `backend/documents_parser/tests/`, `datatypes/epc/`, golden fixtures (esp. **6035**).
+- **Pre-existing failures to IGNORE** (fail on the stashed baseline too): `test_total_floor_area`
+  and the 2 stone-wall U tests in `domain/sap10_ml/tests/test_rdsap_uvalues.py`.
+
+## ARCHITECTURE NOTES (so you don't re-discover them)
+- API path: `EpcPropertyDataMapper.from_api_response(doc)` → `cert_to_inputs(epc, prices=
+  SAP_10_2_SPEC_PRICES)` → `calculate_sap_from_inputs(...).sap_score_continuous`.
+- Cost path uses `inputs.fuel_cost` (Table-32/12a precompute); `_fuel_cost` returns a ZERO
+  sentinel for off-peak → calculator falls back to the legacy scalar
+  `_space_heating_fuel_cost_gbp_per_kwh` (which DOES carry the off-peak rate). SapResult fuel
+  codes are RAW API enums; translate via `table_12.API_FUEL_TO_TABLE_12`.
+- Heating efficiency: `_main_heating_detail_efficiency` → PCDB Table 105 winter eff (if PCDB
+  index) else `seasonal_efficiency(code, cat, fuel)` (Table 4a/4b, in
+  `domain/sap10_ml/sap_efficiencies.py`). Warm-air Table 4a code→eff map already covers 501-520.
+- `sap10_ml/` is marked for eventual migration to `sap10_calculator/` but is still the live
+  u-value/efficiency path.