Model/docs/HANDOVER_API_PROFILING.md
Khalim Conn-Kowlessar 75ef250ec8 docs: session-4 handover — exposed-floor fix shipped, floor-3 enum unconfirmed
Record that profiler lead #1 (floor_codes=3) is not a clean single cause
(bimodal + confounded), that the paired worksheet certs confirm only
codes 1/6/7 (code 3 unmapped → needs a worksheet for 0380-2087), and that
immersion_type=2 / main_control=2107 / roof_codes=1 are scatter, not
dispatch bugs. The exposed-floor-on-flats fix (§3.12) shipped at b40e0f67.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:53:25 +00:00

Handover — API SAP accuracy (session 3): raises cleared, now profile-driven

Branch: feature/per-cert-mapper-validation (long-lived working branch — NEVER PR to main; the user pushes/PRs when ready). HEAD a8e5563a+ (the profiler commit), local-only ahead of origin.

READ ALSO: the auto-memory project_per_cert_mapper_validation_state (full slice log + deproven approaches + the meter/shower data-fidelity findings), and the earlier docs/HANDOVER_API_ACCURACY_S2.md (session-2 method).

THE GOAL (unchanged)

100% of API records with a lodged SAP compute within 0.5 SAP of the API's energy_rating_current. Headline gauge: PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py.

metric               now (a8e5563a)
% |err| < 0.5        45.1%
% |err| < 1.0        59.4%
mean |err|           1.702
mean signed          0.006 (balanced)
computed / raises    909 / 0
unsupported_schema   100 (deferred; see below)

45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the profile-surfaced buckets below.
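
As a reference for reading the gauge, here is a minimal sketch of how the four headline numbers can be derived from per-cert signed errors. It assumes error = computed SAP minus lodged energy_rating_current (so positive means over-rate, matching the usage in this doc); the function name headline_metrics is hypothetical and is not taken from eval_api_sap_accuracy.py.

```python
from statistics import mean


def headline_metrics(signed_errors: list[float]) -> dict[str, float]:
    """Summarise per-cert signed errors (computed - lodged, an assumed convention)."""
    if not signed_errors:
        raise ValueError("no computed certs to summarise")
    abs_errors = [abs(e) for e in signed_errors]
    n = len(signed_errors)
    return {
        # Share of certs inside the 0.5 SAP target band (strict inequality).
        "pct_within_0_5": 100.0 * sum(a < 0.5 for a in abs_errors) / n,
        "pct_within_1_0": 100.0 * sum(a < 1.0 for a in abs_errors) / n,
        # Scatter: insensitive to sign, so over- and under-rates do not cancel.
        "mean_abs_err": mean(abs_errors),
        # Bias: near zero means over- and under-rates balance out.
        "mean_signed": mean(signed_errors),
    }
```

A mean signed value near zero alongside a large mean |err| is the "balanced but scattered" situation described above: the bias is gone, the per-cert noise is not.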

WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)

  1. e41a0bc0 PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split (0.80 high-rate).
  2. 2bc73fb0 HP-DHW (WHC 901/902/914 + PCDB HP) → Table 12a WH 0.70 split. Together (1)+(2) killed the cat-4 heat-pump over-rating bias (+1.43 → +0.06).
  3. 449d8c5b direct-acting electric boiler (191) → zero primary circuit loss (SAP Table 3 p.160 zero list names it verbatim).
  4. f4048588 wall_insulation_thermal_conductivity ignored → §5.8 default λ=0.04. (See KEY INSIGHT below — the gov field is RdSAP output, not an input.)
  5. 1c5675a0 floor_heat_loss=8 → no floor heat loss (extension floor over a heated space; RdSAP §3, like code 6).
  6. a8e5563a main_heating_category=9 (warm air) → Table 11 secondary fraction 0.10. (4)(5)(6) cleared all 4 raises — eval now has zero raises.
  7. (profiler) scripts/profile_api_error.py — the new diagnostic (below).

SESSION-4 UPDATE (HEAD b40e0f67) — read before re-working the leads below

  • Lead #1 floor_codes=3 is NOT clean — worked it, enum UNCONFIRMED. It's bimodal (mid-floor flats over-rate, top-floor maisonettes 0434/0761 want ~zero floor loss) and confounded (9763 +19.48 is a WALL bug: walls=8.19 W/K for 59.8 m²). The 48 paired API+Summary worksheet certs (sap worksheets/Additional data with api/ + additional with api 2/, folder=API cert id) confirm code 7↔"G Ground", code 1↔"E To external air", code 6↔no-heat-loss but NONE cover code 2 or 3 → code 3 is genuinely unmapped. RdSAP §3.12 (p.25) flat floor categories: exposed→Table20, semi-exposed(unheated)→Table20, above-partial(non-domestic)→0.7, ground→ISO13370. NEXT: get a worksheet for 0380-2087-8190-2996-3075 (mid-floor flat, single BP, roof correctly party so the floor is the ONLY heat-loss unknown; lodged 66, we +3.71). Tried Table-20 for codes 2/3: overshoots (9494 +0.56→-3.67), reverted — and picking ground-U-vs-Table20 by eval score is a data-fit.
  • SHIPPED b40e0f67: exposed-floor-on-flats (code 1) area fix — §3.12. A flat's code-1 floor was area-zeroed by _dwelling_exposure; now the per-BP is_exposed_floor overrides the flat suppression upward (mirrors the "another dwelling below" party override). 45.1→45.3%.
  • Leads re-checked, NOT clean: immersion_type=2 (+1.86) is high-scatter (mean|err| 3.71, bidirectional). main_control=2107 (+1.63) is correctly mapped ("Programmer, TRVs and bypass" type 2 Table 4c(2)); the over-rate is diffuse gas-boiler/flat-fabric, not a dispatch bug. roof_codes=1 broad bucket is mean 0.15 (the -1.78 was top-floor-electric-flat outliers -29/-25). Remaining gains need per-cert worksheets (start with code 3) or the unsupported-schema ticket.

KEY INSIGHT (load-bearing, from the user)

The gov EPC API JSON is the published OUTPUT of RdSAP software (Elmhurst), not its input. So any API field Elmhurst doesn't expose as an input is register metadata the RdSAP10 method does not consume — route it to the spec default, don't try to "use" it. This is exactly why wall_insulation_thermal_conductivity (slice 4) → always λ=0.04. Apply the same lens to any new "extra" API field before wiring it.
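
The lens can be made explicit in code. The sketch below is an illustration only: SPEC_DEFAULT_FIELDS and resolve_api_field are hypothetical names, not the mapper's real API. The single entry reflects slice 4 (λ=0.04 per §5.8); any further entry would need its own spec citation.

```python
from typing import Any, Mapping

# Hypothetical registry of API fields that are register metadata (RdSAP output),
# each routed to its spec default instead of being consumed as an input.
SPEC_DEFAULT_FIELDS: dict[str, float] = {
    "wall_insulation_thermal_conductivity": 0.04,  # RdSAP §5.8 default λ (slice 4)
}


def resolve_api_field(name: str, api_doc: Mapping[str, Any]) -> Any:
    """Return the spec default for output-only fields, else the raw API value."""
    if name in SPEC_DEFAULT_FIELDS:
        # Deliberately ignore whatever the register stored for this field.
        return SPEC_DEFAULT_FIELDS[name]
    return api_doc.get(name)
```

Keeping the defaults in one table makes the "is this an Elmhurst input?" decision visible and reviewable before a new field is wired in.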

THE NEW DIAGNOSTIC — scripts/profile_api_error.py (run this first)

PYTHONPATH=/workspaces/model python scripts/profile_api_error.py joins each computed cert's signed error with a rich feature set from its raw API JSON (not the mapped EpcPropertyData), and ranks (feature, value) buckets by error carried + by |mean signed| bias. This is how to find "silly API-path handling" gaps. --min-n N sets the bucket floor.
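
For orientation, the bucket ranking can be sketched as follows. This is not the script's actual code: the names Bucket, profile_buckets, rank_by_error_carried and rank_by_bias are hypothetical, and "error carried" is assumed here to mean the sum of |err| over the bucket.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Hashable, Iterable, Mapping


@dataclass(frozen=True)
class Bucket:
    feature: str
    value: Hashable
    n: int
    mean_signed: float
    mean_abs: float
    error_carried: float  # assumed definition: sum of |err| in the bucket


def profile_buckets(
    rows: Iterable[tuple[Mapping[str, Hashable], float]], min_n: int = 10
) -> list[Bucket]:
    """Group signed errors by (feature, value); drop buckets smaller than min_n."""
    grouped: dict[tuple[str, Hashable], list[float]] = defaultdict(list)
    for features, signed_error in rows:
        for feature, value in features.items():
            grouped[(feature, value)].append(signed_error)
    buckets = []
    for (feature, value), errs in grouped.items():
        n = len(errs)
        if n < min_n:
            continue
        abs_sum = sum(abs(e) for e in errs)
        buckets.append(Bucket(feature, value, n, sum(errs) / n, abs_sum / n, abs_sum))
    return buckets


def rank_by_error_carried(buckets: list[Bucket]) -> list[Bucket]:
    """Largest total |err| first: where the headline error actually sits."""
    return sorted(buckets, key=lambda b: b.error_carried, reverse=True)


def rank_by_bias(buckets: list[Bucket]) -> list[Bucket]:
    """Largest |mean signed| first: small, one-sided buckets suggest a mapping bug."""
    return sorted(buckets, key=lambda b: abs(b.mean_signed), reverse=True)
```

The two rankings answer different questions: a bucket with high error carried but near-zero mean signed error is scatter, while a small bucket with a large one-sided mean is the kind of lead listed below.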

PRIORITISED LEADS (from the run at a8e5563a — verify with the profiler, they'll shift)

Cleanest "API-path handling" candidates first (small, biased buckets = likely a mapper/dispatch bug, not noise):

  1. floor_codes=3 → mean signed +5.37 (n=10). We map API floor_heat_loss=3 → "To unheated space" (same as code 2). The +5.37 over-rate says that's wrong: code 3 likely isn't "unheated space" (or its U is wrong). Pull the n=10 certs and check what code 3 really is (ask the user what the Elmhurst floor dropdown offers, per the API=output lens). Highest bias, smallest scope = start here.
  2. Control-code biases: main_control=2306 -2.96 (n=11), 2602 +2.49 (n=14), 2107 +1.65 (n=38), 2402 +1.14 (n=10), 2307 +0.74 (n=11). Several control codes carry systematic bias → Table 4c/4e control dispatch gaps. 2107 and 2602 carry the most total error (n × |mean|). Check _CONTROL_TYPE_BY_CODE + the Table 4c efficiency-adjustment / Table 4e control coverage.
  3. immersion_type=2 (dual immersion) → +2.00 (n=43, mean|err| 3.85). RdSAP §12 lists "dual electric immersion" as an off-peak trigger; the cascade does NOT consume immersion_heating_type for tariff (verified — only comments reference it). Wiring the §12 dual-immersion → off-peak rule for Unknown meters is a clean spec slice. (1=single, 2=dual per the Elmhurst Summary.)
  4. roof_codes=1 -1.78 (n=27) (flat roof under-rate) and roof_insulation_thickness=None -1.18 (n=52): flat-roof / no-thickness roof handling.
  5. main_data_source=2 / has_pcdb_main=False → 28% within 0.5, mean|err| 3.17 (n≈242). Non-PCDB heating systems (SAP-table efficiency) are a big under-rating cluster. Likely Table 4b default-efficiency or fabric, but worth a look — it's 1/4 of the sample.

Big scattered segments (need worksheets, NOT clean single fixes)

  • whc=903 (electric immersion HW): 13% within 0.5, n=84. It looks like the worst bucket, but it is the electric storage (cat-7) + room-heater (cat-10) segment compounding (worst certs span -29…+32, bidirectional). Not one bug.
  • mains_gas=N (electric): 21% within 0.5, mean|err| 4.27 (n=145) — the hardest segment; per-cert fabric/tariff scatter.
  • Flats (property_type=2): 31% within 0.5 (n=283) — still the worst dwelling type.
  • cat-7 storage (+0.75) / cat-10 room heaters (+0.75) — both net over-rate; bidirectional.

DEPROVEN — do NOT retry (empirically failed in earlier sessions; details in memory)

  • Routing roof 'ND' → Table 18 (description is load-bearing even with 'ND').
  • Broad "all Unknown(meter 3) electric → off-peak" (over-credits room heaters). NOTE: the meter-3 under-rate is partly an irreducible data-fidelity artifact — the register stores meter_type=3 ("Unknown") on certs whose lodged rating actually used an off-peak meter (cert 2474: lodged 78 needs 18-hour, but API says Unknown → spec-faithful ~68). Don't chase those to the lodged value.
  • RR shell U Table-17-50mm (golden 6035 disproves it).
  • Shower enum is settled (non-bug): API shower_outlet_type 1=non-electric(mixer)/2=electric (cohort 2636/0330 validate at 1e-4); types 3/4/5 are finer gov-output sub-types (type 3 is all on unsupported schema 19.1.0; type 4 already accurate). shower_wwhrs 1/2/3/4 = none / inst-WWHRS-1 / inst-WWHRS-2 / storage. Low headline value, not worth pursuing.

THE 100 unsupported_schema CERTS (deferred — bigger ticket)

SAP-Schema-19.1.0 (and other pre-21). The user is planning a separate big piece: map old schemas → new + predict missing fields from similar-looking properties (needs an EPC-prediction method). That needs its own grilling session — do NOT start it here.

WORKSHEET WORKFLOW (the user generates them on request)

For per-cert scatter that needs ground truth, ask the user to generate P960 + Summary worksheets from the cert's OWN API JSON (/tmp/epc_2026_sample/<cert>.json). Describe the cert field-by-field first (the user reproduces in Elmhurst; their repros are approximate — confirm SAP matches lodged before pinning). Worksheets land under sap worksheets/golden fixture debugging/simulated case NN/ or sap worksheets/additional with api 2/<cert>/. Pin the cascade to the P960 §3/§4/§9a/§10a line refs at abs=1e-4. Caveat: the user's repros often diverge (wrong system / approximate inputs) — validate the BEHAVIOUR (e.g. λ, no-heat-loss) empirically against the lodged SAP, don't blindly pin to a non-faithful repro.

TOOLS & CONVENTIONS (non-negotiable)

  • scripts/eval_api_sap_accuracy.py — headline + TOP-40 + _results.csv.
  • scripts/profile_api_error.py — raw-API characteristic profiling (NEW, run first).
  • scripts/decompose_api_cost_error.py — per-component cost decomposition (off-peak caveat: uses STANDARD elec price, mis-flags off-peak certs).
  • ~1009 cached API JSONs at /tmp/epc_2026_sample (EPC_SAMPLE_CACHE overrides).
  • one cause = one slice = one commit; spec citation (page+line) in the message; AAA test headers (# Arrange/# Act/# Assert); abs(x-y)<=tol not pytest.approx; SAP 10.2 only; no tolerance-widening / xfail; RdSAP is deterministic — every fix is a spec rule, not a population data-fit (the user is firm); pyright strict net-zero (baseline-compare via git stash); stage files BY NAME (tree carries unrelated scripts/ + sap worksheets/ changes — never git add -A); Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>.
  • REGRESSION after any calc/mapper change: tests/domain/sap10_calculator/, backend/documents_parser/tests/, datatypes/epc/, golden fixtures (esp. 6035).
  • Pre-existing failures to IGNORE (fail on the stashed baseline too): test_total_floor_area and the 2 stone-wall U tests in domain/sap10_ml/tests/test_rdsap_uvalues.py.
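
The test conventions above (AAA headers, abs(x-y)<=tol instead of pytest.approx) look like this in practice. This is a sketch only: table_11_secondary_fraction is a hypothetical stand-in for the real lookup, holding just the slice-6 value (category 9 → 0.10), and the test name is illustrative.

```python
def table_11_secondary_fraction(main_heating_category: int) -> float:
    """Hypothetical stand-in for the real lookup; holds only the slice-6 value."""
    fractions = {9: 0.10}  # warm air, Table 11 secondary fraction (slice 6)
    return fractions[main_heating_category]


def test_warm_air_uses_table_11_secondary_fraction() -> None:
    # Arrange
    main_heating_category = 9
    expected = 0.10
    tol = 1e-4

    # Act
    actual = table_11_secondary_fraction(main_heating_category)

    # Assert
    assert abs(actual - expected) <= tol
```

The explicit tolerance keeps the pinned precision visible in the test body, which matters when pinning to worksheet line refs at abs=1e-4.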

ARCHITECTURE NOTES (so you don't re-discover them)

  • API path: EpcPropertyDataMapper.from_api_response(doc) → cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES) → calculate_sap_from_inputs(...).sap_score_continuous.
  • Cost path uses inputs.fuel_cost (Table-32/12a precompute); _fuel_cost returns a ZERO sentinel for off-peak → calculator falls back to the legacy scalar _space_heating_fuel_cost_gbp_per_kwh (which DOES carry the off-peak rate). SapResult fuel codes are RAW API enums — translate via table_12.API_FUEL_TO_TABLE_12.
  • Heating efficiency: _main_heating_detail_efficiency → PCDB Table 105 winter eff (if PCDB index) else seasonal_efficiency(code, cat, fuel) (Table 4a/4b, in domain/sap10_ml/sap_efficiencies.py). Warm-air Table 4a code→eff map already covers 501-520.
  • sap10_ml/ is marked for eventual migration to sap10_calculator/ but is still the live u-value/efficiency path.
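
The cost-path fallback in the second note can be pictured as below. This is a simplified illustration, not the calculator's code: effective_space_heating_price and its parameters are hypothetical, and the only behaviour taken from the note is "a zero precomputed cost means use the legacy scalar".

```python
def effective_space_heating_price(
    precomputed_gbp_per_kwh: float, legacy_gbp_per_kwh: float
) -> float:
    """Pick the unit price for space heating (hypothetical helper).

    A precomputed value of exactly 0.0 is treated as the off-peak sentinel,
    so the legacy scalar (which carries the off-peak rate) is used instead.
    """
    if precomputed_gbp_per_kwh == 0.0:
        return legacy_gbp_per_kwh
    return precomputed_gbp_per_kwh
```

Because zero doubles as the sentinel, any tool that reads the precomputed value directly, without the fallback, will misprice off-peak certs.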