diff --git a/docs/HANDOVER_API_PROFILING.md b/docs/HANDOVER_API_PROFILING.md
index 516abed2..c9c7bec3 100644
--- a/docs/HANDOVER_API_PROFILING.md
+++ b/docs/HANDOVER_API_PROFILING.md
@@ -13,17 +13,20 @@ deproven approaches + the meter/shower data-fidelity findings), and the earlier
 `energy_rating_current`. Headline gauge:
 `PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.
 
-| metric | now (`a8e5563a`) |
-|--------|------------------|
-| **% \|err\| < 0.5** | **45.1%** |
-| % \|err\| < 1.0 | 59.4% |
-| mean \|err\| | 1.702 |
-| mean signed | −0.006 (balanced) |
-| computed / raises | **909 / 0** |
-| unsupported_schema | 100 (deferred — see below) |
+| metric | session-3 (`a8e5563a`) | **session-4 (`faf29942`)** |
+|--------|------------------|------------------|
+| **% \|err\| < 0.5** | 45.1% | **47.6%** |
+| % \|err\| < 1.0 | 59.4% | **62.6%** |
+| % \|err\| < 2.0 | 77.7% | **79.6%** |
+| mean \|err\| | 1.702 | **1.586** |
+| computed / raises | 909 / 0 | **909 / 0** |
+| unsupported_schema | 100 (deferred) | 100 (deferred) |
 
-45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
-profile-surfaced buckets below.
+**SESSION-4 shipped (45.1 → 47.6%):** four spec-grounded fixes + closed one false lead.
+See the `## SESSION-4 …` blocks below and the auto-memory for full detail. The systematic bias
+is gone; the winning method this session was the **description-vs-code audit** + an
+**outlier-robust categorical sweep** (rank by net directional skew + MEDIAN, not mean — the
+mean-based metric is fooled by multi-cause outliers). 47.6% is still the target's halfway point.
 
 ## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
 1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
@@ -39,7 +42,42 @@ profile-surfaced buckets below.
    (4)(5)(6) cleared **all 4 raises** — eval now has zero raises.
 7. `(profiler)` **`scripts/profile_api_error.py`** — the new diagnostic (below).
 
-## SESSION-4 UPDATE (HEAD `8741fbdf`) — read before re-working the leads below
+## SESSION-4 UPDATE (HEAD `faf29942`) — read before re-working the leads below
+
+### Shipped this session (45.1 → 47.6%)
+1. `b40e0f67` **exposed-floor-on-flats** (floor_heat_loss=1) — §3.12; per-BP override of the
+   dwelling-level flat suppression.
+2. `8741fbdf` **floor_heat_loss=3 → above partially heated space, U=0.7** (§3.12/§5.14)
+   re-pinned golden 7536 (its "irreducible residual" was THIS bug).
+3. `5e7ef5c7` **boiler interlock for TRVs+bypass controls 2107/2111** (§9.4.11) — biggest single
+   win (+1.6pts). The no-interlock set was keyed off the wrong signal (the "+0.6 °C" annotation);
+   2107/2111 lack a room thermostat → −5pp + Table 4f ×1.3 pump.
+4. `faf29942` **description-lodged secondary heating** (§A.2.2/Table 11) — gas/oil boilers with an
+   API-description-only secondary ("Portable electric heaters (assumed)", code field None)
+   dropped the secondary (sec_kWh=0); now `_has_lodged_secondary_description` fires Table 11.
+   Also added cat-8 (electric underfloor) Table-11 fraction 0.10.
+- `560c912c`/`d0f57a0e` docs: **roof_construction=8 lead CLOSED as data-fidelity** (not a bug — see
+  the roof-8 section below; user worksheet sim-case-29 proved we ≡ Elmhurst).
+
+### The two methods that worked (reuse these)
+- **Description-vs-code audit:** join each int code (`floor_heat_loss`, `roof_construction`,
+  `wall_construction`, secondary type) to its authoritative `…[].description`, **on single-element
+  certs only** (multi-element `[]` arrays are LOSSY). Mis-maps fall out (floor-3, secondary).
+- **Outlier-robust categorical sweep** (`/tmp/cat_audit2.py`): rank field-values by **net
+  directional skew** (#under−0.5 minus #over+0.5) + **MEDIAN** error. The mean-based directionality
+  metric (`/tmp/cat_audit.py`) gets FOOLED by multi-cause outliers (e.g. "Solid brick no insulation"
+  looked systematic at mean −1.07 but median is −0.22 = scatter; 2100 −61/RR drove it).
+
+### Open robust leads (verify with `/tmp/cat_audit2.py` — they shift; check MEDIAN not mean)
+- `whc=903` electric-immersion HW: **median +0.87, n=84** — likely off-peak immersion handling
+  (the handover noted WHC 903 raises NotImplementedError on the Table-12a off-peak-immersion row).
+- `main_heat_cat=7` electric storage: median +1.05, n=41 — over-rate (tariff/cost; partly artifact).
+- `immersion_type=2` dual: +1.50, n=43 — we OVER-credit (so §12 dual→off-peak would worsen it).
+- `dwelling_type=Top-floor flat`: median −1.24, n=99 — under-rate, mostly fabric scatter/artifacts.
+- **Low-dir = SCATTER, do NOT single-fix:** non-PCDB main / data_source=2 (n=242, 28% within-0.5),
+  mains_gas=N electric (n=145), most flats. These are per-cert/data-fidelity, not one bug.
+
+### Resolved/closed this session (don't re-chase)
 - **Lead #1 `floor_codes=3` RESOLVED — the code IS authoritative.** The diagnostic that cracked it:
   join each **single-BP** cert's `floor_heat_loss` code to its independent `floors[].description`
   (the multi-BP tally was contaminated because a cert's `floors[]` summary