From 590cb97ef6f1e601a495f15c297f94024bb5bcfb Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Tue, 9 Jun 2026 14:54:08 +0000
Subject: [PATCH] docs: session-9 close-out + session-10 handover (summary-report-based audit)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Session 9 ran five independent data-driven audits (profiler,
dropped-field scan, CO2/PE reconciliation, cross-provider LIG parity,
HW-demand reconciliation) — all converged on diffuse remaining gap —
and shipped glazing Table-24 (+16 certs) + HW-only heat-network DLF,
taking 54.90% -> 56.8% within-0.5. The data-driven seam is exhausted;
session 10 switches to worksheet-level ground truth via the
summary-report-based per-cert audit. New agent prompt at
HANDOVER_SUMMARY_AUDIT.md with method, starter candidate certs,
ruled-out list, and conventions.

Co-Authored-By: Claude Opus 4.8
---
 docs/HANDOVER_API_PROFILING.md |   6 ++
 docs/HANDOVER_SUMMARY_AUDIT.md | 145 +++++++++++++++++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 docs/HANDOVER_SUMMARY_AUDIT.md

diff --git a/docs/HANDOVER_API_PROFILING.md b/docs/HANDOVER_API_PROFILING.md
index fd7e0195..0b37765b 100644
--- a/docs/HANDOVER_API_PROFILING.md
+++ b/docs/HANDOVER_API_PROFILING.md
@@ -1,5 +1,11 @@
 # Handover — API SAP accuracy (session 3): raises cleared, now profile-driven
 
+> **➡️ SESSION 10 STARTS HERE: `docs/HANDOVER_SUMMARY_AUDIT.md`.** HEAD `872bc585`, **56.8%
+> within-0.5** (909 computed / 0 raises). Session 9 ran FIVE data-driven audit angles — all
+> converged on "remaining gap is diffuse" — and shipped the glazing Table-24 win (+16 certs) +
+> HW-only heat-network DLF. The data-driven seam is mined out; **session 10 switches to the
+> summary-report-based per-cert worksheet audit.** Read that doc first.
+
 **Branch:** `feature/per-cert-mapper-validation` (long-lived working branch — **NEVER PR to main**; the user pushes/PRs when ready).
 **HEAD `a8e5563a`+** (the profiler commit), local-only ahead of origin.
 
diff --git a/docs/HANDOVER_SUMMARY_AUDIT.md b/docs/HANDOVER_SUMMARY_AUDIT.md
new file mode 100644
index 00000000..20d44be1
--- /dev/null
+++ b/docs/HANDOVER_SUMMARY_AUDIT.md
@@ -0,0 +1,145 @@
+# Handover — API SAP accuracy, SESSION 10: summary-report-based per-cert audit
+
+You're continuing API→SAP accuracy work on branch **`feature/per-cert-mapper-validation`** in
+`/workspaces/model`, **HEAD `872bc585`**. This is a **long-lived working branch — NEVER PR to
+main**; the user pushes/PRs when ready. 31 commits ahead of `origin`, unpushed.
+
+## THE GOAL (measurable, unchanged)
+100% of API records with a lodged SAP compute **within 0.5 SAP** of the API's
+`energy_rating_current`. Headline gauge:
+```
+PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py
+```
+**Current: 56.8% within-0.5** (within-1.0 72.2%, within-2.0 84.8%, mean|err| 1.197, median 0.438,
+signed −0.229, **909 computed / 0 raises**, 100 unsupported_schema). Writes `_results.csv` to the
+cache. Re-profile with `scripts/profile_api_error.py --min-n 12`; component decomposition with
+`scripts/decompose_api_cost_error.py`. ~1009 cached API JSONs at `/tmp/epc_2026_sample`
+(`EPC_SAMPLE_CACHE` overrides).
+
+## ⚠️ THE PIVOT — why this session is DIFFERENT
+The previous session (9) ran **FIVE independent data-driven audit angles**, all of which converged
+on the same conclusion — the clean systematic levers are harvested and the remaining gap is
+**diffuse** (data-fidelity matching the reference software, data-composition, per-cert scatter):
+1. **Error-bucket profiler** → scatter, no clean residual bias.
+2. **Dropped-field scan** (raw-JSON field present but mapped-None) → every field plumbed.
+3. **CO2/PE reconciliation** vs lodged → systematic +15% but it's a **factor-basis** difference
+   (our SAP 10.2 Table 12 vs the lodged EPC's published basis), NOT demand (cost-SAP matches),
+   NOT scope (our CO2/PE correctly exclude appliances/cooking per spec line 326). **Off-goal** —
+   CO2/PE don't feed the cost-based SAP rating.
+4. **Cross-provider parity** (LIG-21.0 vs 21.0.1, same builder): LIG under-rates −0.59, cleanest in
+   cavity (LIG −0.45 vs standard −0.01) — but the wall U computes CORRECTLY; the cause is diffuse
+   (composition: more solid-brick/system-built/non-PCDB mains, all under-rating in BOTH datasets;
+   plus per-cert scatter). No recoverable LIG-specific mapping bug.
+5. **HW-demand reconciliation** vs lodged HW cost → median residual ≈ £0, well-calibrated. The
+   high-HW/m² certs are small flats (SAP HW floor effect) and are ACCURATE.
+
+**The data-driven seam is mined out.** The user (correctly — they drove the glazing find) wants to
+switch to **worksheet-level ground truth**: the **summary-report-based audit**. Do NOT re-run the
+five angles above expecting a new clean bug; pursue per-cert worksheet pins instead.
+
+## THE METHOD — summary-report-based audit (this session's loop)
+For a chosen cert, the **user generates two Elmhurst worksheets from the cert's OWN API JSON**
+(`/tmp/epc_2026_sample/.json`): the **P960** (full SAP worksheet, line refs `(1a)..(486)`)
+and the **Summary**. Your loop:
+1. **Describe the cert field-by-field FIRST** (so the user can reproduce it in Elmhurst): dwelling
+   type, TFA, age band, every building part (wall/roof/floor construction + insulation + thickness),
+   windows, the heating system (sap_main_heating_code, category, control, emitter, fuel, PCDB index),
+   water heating (whc, fuel, cylinder), ventilation, PV. Use the mapper to dump the *mapped*
+   `EpcPropertyData` so the description matches what we actually compute on.
+2. **Pin the cascade to the worksheet line refs at abs=1e-4** — `heat_transmission_section_from_cert`
+   for §3 (26)..(37), the water-heating/§4, §9a/§10a etc. Localise the divergence to a specific
+   line ref → extractor / mapper / calculator gap.
+3. **VALIDATE BEHAVIOUR against the LODGED SAP, not blindly against the user's repro.** The user's
+   Elmhurst repros are APPROXIMATE (they often pick a slightly-wrong system / inputs). Confirm the
+   repro's continuous SAP ≈ the lodged `energy_rating_current` BEFORE trusting its line refs; if the
+   repro diverges from lodged, the repro is the problem, not our cascade. (See
+   `reference_elmhurst_only_test_pattern` + the `_elmhurst_worksheet_000565` prototype for the
+   mapper-driven cascade-fixture shape.)
+4. One confirmed cause = one TDD slice = one commit (conventions below).
+
+### Starter candidate certs (clean gas, single-building-part, schema 21.0.1 — NOT electric-fabric
+### tail, NOT LIG, NOT deproven). |err| in 0.7–6, good worksheet targets:
+```
+8700-1771-0622-8501-3963 -5.77 gas cat2 whc=903 (electric immersion HW on a gas main — odd)
+2135-2729-0509-0142-6226 +5.29 sapcode 119
+4700-6865-0122-1501-3963 -5.19 detached gas boiler
+0700-6754-0922-3505-3963 +4.35 whc=911 (gas boiler/circulator for water only)
+9093-3060-2207-6506-0204 +3.60 sapcode 502 cat9 (WARM AIR main) + whc=950 — see SESSION-9 below
+0330-2817-5590-2096-7831 -2.95 gas cat2 mid-terrace
+```
+The full list (81 clean candidates) regenerates from the profiler/`_results.csv`. The user may pick
+their own certs from domain knowledge — let them drive selection; your job is the field-by-field
+description + the line-ref pin.
+
+## WHAT SHIPPED IN SESSION 9 (don't redo) — 54.90% → 56.8%
+- **`a0432977` glazing single/secondary/triple per RdSAP 10 Table 24 (THE BIG WIN, +16 certs).**
+  `_API_GLAZING_TYPE_TO_TRANSMISSION` only mapped double-glazing [1,2,3,13,14]; single (5/15, U 4.8),
+  secondary (4/11/12), triple (6/8/9/10) returned None → silent u_window default U=2.5. Single glazing
+  at half its real heat loss was the killer. Method that found it: profile `sap_windows[].glazing_type`,
+  decode vs `epc_codes.csv` `glazed_type`.
+- **`872bc585` HW-only heat-network DLF (whc 950/951/952).** The Table 12c distribution loss fired
+  only for `_is_heat_network_main AND whc∈{901,902,914}`; HW-only heat networks missed it entirely.
+  Added a whc-gated branch `water_eff = plant_eff / DLF` (RdSAP §10, spec p.36). All 3 corpus whc=950
+  certs improved in |err|; cert **9093 still +3.60** — its residual is the **warm-air main (sapcode
+  502, cat 9)**, a SEPARATE cause and a good worksheet candidate.
+- **`7878a969` fuel strict-raise** at the Table-12 factor boundary (the cert-8536 collision class).
+- **`49fb6c1b` glazing g remap** (codes 4/5 → correct cascade g-slots) — correctness, 0 SAP impact.
+- **`a7990edb` ROBUSTNESS GUARDS** (forcing functions): `_api_glazing_transmission` +
+  `_api_cascade_glazing_type` raise `UnmappedApiCode` on present-but-unmapped glazing;
+  `seasonal_efficiency` + `water_heating_efficiency` raise `UnmappedSapCode` on present-but-unmapped
+  codes (was the blind 0.80/0.78 default). **0 current-corpus impact (tables complete) — these are
+  guards.** KEY for this session: **if a worksheet-audit cert RAISES, that's the guard surfacing a
+  real gap — map the code.** Also re-verify: efficiency table already covers WHC 908 (multi-point
+  gas) / 950 (HW heat network) — those are NOT unmapped bugs.
+
+## RULED OUT — do NOT re-chase (verified this session + DEPROVEN list in HANDOVER_API_PROFILING.md)
+- **The 100 `unsupported_schema` certs are full-SAP NEW BUILDS** (`assessment_type="SAP"`, mean
+  rating 86, transaction_type 6 = new dwelling). Structurally different (sap_walls/sap_roofs/
+  sap_openings with measured U-values, DER, construction_year). **Out of scope for a retrofit
+  product — do NOT build a parallel pipeline.** They're already excluded from the 56.8%.
+- **Solid brick** (gas, −0.52): spec-faithful — `u_wall` applies RdSAP §5.7 Table 13 thickness;
+  direction wrong for a thickness gap. Data-fidelity (old houses outperform as-built).
+- **Roof code-8 sloping-ceiling "insulated"-no-thickness** (cert 7921 −23): data-fidelity, we ≡
+  Elmhurst at uninsulated. **meter_type=3** (Unknown meter): data-fidelity. Orientation code-9 drop:
+  the East/West "fix" HURTS the gauge; conservatory-only spec rule; leave it.
+- LIG-21.0 divergence, CO2/PE +15%, HW-demand over-estimate: all diffuse / off-goal (see THE PIVOT).
+
+## CONVENTIONS (non-negotiable)
+One cause = one slice = one commit; **spec citation (page+line) in the message** (the user
+explicitly asks us to confirm against the SAP 10.2 / RdSAP 10 PDFs in `domain/sap10_calculator/
+docs/specs/` before claiming a fix — see `feedback_spec_citation_in_commits`); AAA test headers
+(`# Arrange / # Act / # Assert`); **`abs(x-y)<=tol` not `pytest.approx`** (strict-pyright);
+private-symbol test imports single-line with `# pyright: ignore[reportPrivateUsage]`; **SAP 10.2
+only** (ignore the 10.3 PDF); no tolerance-widening / xfail; RdSAP is deterministic — every fix is a
+spec rule, apply uniformly even when it unmasks offsetting errors, **but flag any within-0.5
+regression to the user**; **pyright strict net-zero** (baseline-compare via `git stash`; avoid
+`**dict` unpacking into `make_minimal_sap10_epc` — explodes pyright); **stage files BY NAME** (the
+tree carries unrelated `scripts/` + `sap worksheets/` changes — never `git add -A`); end commit
+messages with `Co-Authored-By: Claude Opus 4.8 `.
+
+**Regression gate** after any calc/mapper change (goldens esp. 6035 + 000565 are the gate):
+```
+PYTHONPATH=/workspaces/model python -m pytest tests/domain/sap10_calculator/ \
+  domain/sap10_ml/tests/ datatypes/epc/ backend/documents_parser/tests/ -q
+```
+**IGNORE these pre-existing fails** (not yours): `test_total_floor_area`, the 2 stone-wall U tests
+in `test_rdsap_uvalues.py`, the flaky `test_other_client_error_propagates` (passes in isolation).
+
+## ARCHITECTURE (quick map)
+API path = `EpcPropertyDataMapper.from_api_response(doc)` → `from_rdsap_schema_21_0_1` →
+`cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)` → `calculate_sap_from_inputs(...)`. Fabric U via
+`domain/sap10_ml/rdsap_uvalues.py` (`u_wall/u_roof/u_floor/u_window`) feeding
+`worksheet/heat_transmission.py` (per-BP loop). HW in `cert_to_inputs` §4 + `worksheet/water_heating.py`.
+Efficiency in `domain/sap10_ml/sap_efficiencies.py` (`seasonal_efficiency` / `water_heating_efficiency`,
+now strict-raising). Fuel cost/CO2/PE: `tables/table_12.py` + `tables/table_32.py`. SAP equation:
+`worksheet/rating.py` (ECF = 0.42·cost/(TFA+45)). The §3 breakdown helper for pins:
+`cert_to_inputs.heat_transmission_section_from_cert(epc)` → `HeatTransmission` (every (26)..(37) line
+ref). **KEY INSIGHT: the gov-API JSON is the published OUTPUT of RdSAP software, not its input —
+route fields Elmhurst doesn't consume to the spec default.**
+
+## READ ALSO
+- `docs/HANDOVER_API_PROFILING.md` — the full SESSION-3..9 log + the load-bearing **DEPROVEN** list.
+- Auto-memories: `project_per_cert_mapper_validation_state`, `reference_unmapped_sap_code`,
+  `reference_unmapped_api_code`, `reference_fuel_code_collision`, `feedback_software_no_special_handling`,
+  `feedback_spec_citation_in_commits`, `feedback_worksheet_not_api_reference`,
+  `reference_elmhurst_only_test_pattern`.
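
Reviewer sketch, not part of the patch: step 2 of THE METHOD (pin the cascade to worksheet line refs at abs=1e-4) together with the `abs(x-y)<=tol` convention could look roughly like the helper below. This is a minimal illustration under stated assumptions. `first_divergence` is a hypothetical name, the inputs are plain dicts keyed by line ref rather than the repo's `HeatTransmission` object, and the numbers in the demo are made up.

```python
from typing import Mapping, Optional, Tuple

# Absolute tolerance quoted in THE METHOD, step 2.
TOL = 1e-4


def first_divergence(
    computed: Mapping[str, float],
    worksheet: Mapping[str, float],
    tol: float = TOL,
) -> Optional[Tuple[str, float, float]]:
    """Return (line_ref, computed, expected) for the first pinned line ref
    whose values differ by more than `tol`, or None if every pin holds.

    Hypothetical helper: line refs are walked in the worksheet dict's
    insertion order, and a ref missing from `computed` raises KeyError so a
    missing value is never silently treated as a match.
    """
    for ref, expected in worksheet.items():
        got = computed[ref]
        # Same comparison shape as the convention: abs(x - y) <= tol.
        if not abs(got - expected) <= tol:
            return ref, got, expected
    return None


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any real cert.
    worksheet = {"(26)": 3.2, "(29a)": 42.5, "(33)": 61.54}
    computed = {"(26)": 3.2, "(29a)": 44.1, "(33)": 63.14}
    print(first_divergence(computed, worksheet))
```

Returning only the first diverging ref keeps the focus on the earliest line in the cascade, since later lines such as totals usually inherit the same error.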