docs: session-4 handover — interlock + secondary fixes, robust-audit method, open leads

Updates the headline (45.1 → 47.6%), records the four shipped fixes + the roof-8 false-lead closure, documents the two methods that worked (description-vs-code audit + outlier-robust categorical sweep by net skew + median), and lists the open robust leads (whc=903 immersion HW, cat-7 storage, dual immersion) with the scatter buckets to avoid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
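
The description-vs-code audit named in the message can be sketched roughly as below. This is an illustration only, not the script used in the session: the record layout (a cert dict with a top-level int code and a list of elements, each carrying a `description`), the function names `tally_descriptions` and `find_mismaps`, the `assumed` keyword table, and the sample description strings are all hypothetical.

```python
from collections import Counter, defaultdict


def tally_descriptions(certs, array_key, code_key):
    """Count the descriptions seen for each int code, single-element certs only."""
    tally = defaultdict(Counter)
    for cert in certs:
        elements = cert.get(array_key) or []
        code = cert.get(code_key)
        # Multi-element arrays cannot be matched to one code, so skip them.
        if len(elements) != 1 or code is None:
            continue
        tally[code][elements[0]["description"]] += 1
    return tally


def find_mismaps(tally, assumed):
    """Return codes whose most common description lacks the assumed keyword."""
    flagged = {}
    for code, counts in tally.items():
        description, hits = counts.most_common(1)[0]
        keyword = assumed.get(code)
        if keyword is not None and keyword.lower() not in description.lower():
            flagged[code] = (description, hits, sum(counts.values()))
    return flagged


# Invented sample data: the strings and the assumed meanings are placeholders.
certs = [
    {"floor_heat_loss": 1, "floors": [{"description": "Exposed floor, insulated"}]},
    {"floor_heat_loss": 3, "floors": [{"description": "Above partially heated space"}]},
    {"floor_heat_loss": 3, "floors": [{"description": "Above partially heated space"}]},
    # Two elements: skipped, because the code cannot be tied to one description.
    {"floor_heat_loss": 3, "floors": [{"description": "Solid, no insulation"},
                                      {"description": "Above partially heated space"}]},
]
assumed = {1: "exposed", 3: "unheated"}  # what the model is assumed to map each code to
tally = tally_descriptions(certs, "floors", "floor_heat_loss")
for code, (description, hits, total) in find_mismaps(tally, assumed).items():
    print(f"code {code}: {hits}/{total} certs say {description!r}")
```

Restricting the tally to single-element certs is the point of the method: with one element there is exactly one description the code can refer to, so a disagreement is a mapping problem rather than an aggregation artifact.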
2026-06-30 13:10:47 +00:00 · 2026-06-08 14:47:31 +00:00 · 2026-06-08 14:47:31 +00:00 · d83c431c7d
commit d83c431c7d
parent faf29942ba
1 changed file with 49 additions and 11 deletions
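
For orientation, the outlier-robust categorical sweep that this commit documents could look something like the sketch below. It is not the contents of `/tmp/cat_audit2.py`, which is not shown here. The row layout, the function name `robust_sweep`, the `min_n` cutoff, and the sample numbers are assumptions; the signed error is assumed to be computed minus lodged rating, so a positive value means over-rating.

```python
from collections import defaultdict
from statistics import mean, median


def robust_sweep(rows, field, threshold=0.5, min_n=5):
    """Rank the values of `field` by net directional skew, reporting the median."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row["err"])

    report = []
    for value, errs in groups.items():
        if len(errs) < min_n:  # too few certs to say anything
            continue
        under = sum(e < -threshold for e in errs)  # certs under-rated by > threshold
        over = sum(e > threshold for e in errs)    # certs over-rated by > threshold
        report.append({
            "value": value,
            "n": len(errs),
            "net_skew": under - over,  # large magnitude = one-directional error
            "median": median(errs),    # barely moved by a single extreme cert
            "mean": mean(errs),        # kept only to show how outliers mislead
        })
    # Strongest one-directional buckets first, regardless of sign.
    report.sort(key=lambda r: abs(r["net_skew"]), reverse=True)
    return report


# Invented sample: "solid" is scatter plus one extreme cert; "immersion" is a real skew.
sample = (
    [{"bucket": "solid", "err": e} for e in (-0.2, -0.3, -0.1, 0.2, -0.25, 0.3, -8.0)]
    + [{"bucket": "immersion", "err": e} for e in (0.9, 0.8, 1.1, 0.7, 0.2, 1.0, -0.1)]
)
for entry in robust_sweep(sample, "bucket"):
    print(entry["value"], entry["n"], entry["net_skew"],
          round(entry["median"], 2), round(entry["mean"], 2))
```

In the sample, the "solid" bucket has a mean below −1 because of one extreme cert, yet its median and net skew are near zero, so it ranks last. That is the failure of the mean-based metric that the handover describes.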
--- a/docs/HANDOVER_API_PROFILING.md
+++ b/docs/HANDOVER_API_PROFILING.md
@@ -13,17 +13,20 @@ deproven approaches + the meter/shower data-fidelity findings), and the earlier
 `energy_rating_current`. Headline gauge:
 `PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.

-| metric | now (`a8e5563a`) |
-|--------|------------------|
-| **% \|err\| < 0.5** | **45.1%** |
-| % \|err\| < 1.0 | 59.4% |
-| mean \|err\| | 1.702 |
-| mean signed | −0.006 (balanced) |
-| computed / raises | **909 / 0** |
-| unsupported_schema | 100 (deferred — see below) |
+| metric | session-3 (`a8e5563a`) | **session-4 (`faf29942`)** |
+|--------|------------------|------------------|
+| **% \|err\| < 0.5** | 45.1% | **47.6%** |
+| % \|err\| < 1.0 | 59.4% | **62.6%** |
+| % \|err\| < 2.0 | 77.7% | **79.6%** |
+| mean \|err\| | 1.702 | **1.586** |
+| computed / raises | 909 / 0 | **909 / 0** |
+| unsupported_schema | 100 (deferred) | 100 (deferred) |

-45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
-profile-surfaced buckets below.
+**SESSION-4 shipped (45.1 → 47.6%):** four spec-grounded fixes + closed one false lead.
+See the `## SESSION-4 …` blocks below and the auto-memory for full detail. The systematic bias
+is gone; the winning method this session was the **description-vs-code audit** + an
+**outlier-robust categorical sweep** (rank by net directional skew + MEDIAN, not mean — the
+mean-based metric is fooled by multi-cause outliers). 47.6% is still the target's halfway point.

 ## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
 1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).
@@ -39,7 +42,42 @@ profile-surfaced buckets below.
   (4)(5)(6) cleared **all 4 raises** — eval now has zero raises.
 7. `(profiler)` **`scripts/profile_api_error.py`** — the new diagnostic (below).

-## SESSION-4 UPDATE (HEAD `8741fbdf`) — read before re-working the leads below
+## SESSION-4 UPDATE (HEAD `faf29942`) — read before re-working the leads below
+
+### Shipped this session (45.1 → 47.6%)
+1. `b40e0f67` **exposed-floor-on-flats** (floor_heat_loss=1) — §3.12; per-BP override of the
+   dwelling-level flat suppression.
+2. `8741fbdf` **floor_heat_loss=3 → above partially heated space, U=0.7** (§3.12/§5.14) +
+   re-pinned golden 7536 (its "irreducible residual" was THIS bug).
+3. `5e7ef5c7` **boiler interlock for TRVs+bypass controls 2107/2111** (§9.4.11) — biggest single
+   win (+1.6pts). The no-interlock set was keyed off the wrong signal (the "+0.6 °C" annotation);
+   2107/2111 lack a room thermostat → −5pp + Table 4f ×1.3 pump.
+4. `faf29942` **description-lodged secondary heating** (§A.2.2/Table 11) — gas/oil boilers with an
+   API-description-only secondary ("Portable electric heaters (assumed)", code field None)
+   dropped the secondary (sec_kWh=0); now `_has_lodged_secondary_description` fires Table 11.
+   Also added cat-8 (electric underfloor) Table-11 fraction 0.10.
+- `560c912c`/`d0f57a0e` docs: **roof_construction=8 lead CLOSED as data-fidelity** (not a bug — see
+  the roof-8 section below; user worksheet sim-case-29 proved we ≡ Elmhurst).
+
+### The two methods that worked (reuse these)
+- **Description-vs-code audit:** join each int code (`floor_heat_loss`, `roof_construction`,
+  `wall_construction`, secondary type) to its authoritative `…[].description`, **on single-element
+  certs only** (multi-element `[]` arrays are LOSSY). Mis-maps fall out (floor-3, secondary).
+- **Outlier-robust categorical sweep** (`/tmp/cat_audit2.py`): rank field-values by **net
+  directional skew** (#under−0.5 minus #over+0.5) + **MEDIAN** error. The mean-based directionality
+  metric (`/tmp/cat_audit.py`) gets FOOLED by multi-cause outliers (e.g. "Solid brick no insulation"
+  looked systematic at mean −1.07 but median is −0.22 = scatter; 2100 −61/RR drove it).
+
+### Open robust leads (verify with `/tmp/cat_audit2.py` — they shift; check MEDIAN not mean)
+- `whc=903` electric-immersion HW: **median +0.87, n=84** — likely off-peak immersion handling
+  (the handover noted WHC 903 raises NotImplementedError on the Table-12a off-peak-immersion row).
+- `main_heat_cat=7` electric storage: median +1.05, n=41 — over-rate (tariff/cost; partly artifact).
+- `immersion_type=2` dual: +1.50, n=43 — we OVER-credit (so §12 dual→off-peak would worsen it).
+- `dwelling_type=Top-floor flat`: median −1.24, n=99 — under-rate, mostly fabric scatter/artifacts.
+- **Low-dir = SCATTER, do NOT single-fix:** non-PCDB main / data_source=2 (n=242, 28% within-0.5),
+  mains_gas=N electric (n=145), most flats. These are per-cert/data-fidelity, not one bug.
+
+### Resolved/closed this session (don't re-chase)
 - **Lead #1 `floor_codes=3` RESOLVED — the code IS authoritative.** The diagnostic that cracked
  it: join each **single-BP** cert's `floor_heat_loss` code to its independent
  `floors[].description` (the multi-BP tally was contaminated because a cert's `floors[]` summary