mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-30 13:10:47 +00:00
docs: session-4 handover — interlock + secondary fixes, robust-audit method, open leads
Updates the headline (45.1 → 47.6%), records the four shipped fixes + the roof-8 false-lead closure, documents the two methods that worked (description-vs-code audit + outlier-robust categorical sweep by net skew + median), and lists the open robust leads (whc=903 immersion HW, cat-7 storage, dual immersion) with the scatter buckets to avoid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent
faf29942ba
commit
d83c431c7d
1 changed files with 49 additions and 11 deletions
@@ -13,17 +13,20 @@ deproven approaches + the meter/shower data-fidelity findings), and the earlier
`energy_rating_current`. Headline gauge:
`PYTHONPATH=/workspaces/model python scripts/eval_api_sap_accuracy.py`.

| metric | now (`a8e5563a`) |
|--------|------------------|
| **% \|err\| < 0.5** | **45.1%** |
| % \|err\| < 1.0 | 59.4% |
| mean \|err\| | 1.702 |
| mean signed | −0.006 (balanced) |
| computed / raises | **909 / 0** |
| unsupported_schema | 100 (deferred — see below) |

| metric | session-3 (`a8e5563a`) | **session-4 (`faf29942`)** |
|--------|------------------|------------------|
| **% \|err\| < 0.5** | 45.1% | **47.6%** |
| % \|err\| < 1.0 | 59.4% | **62.6%** |
| % \|err\| < 2.0 | 77.7% | **79.6%** |
| mean \|err\| | 1.702 | **1.586** |
| computed / raises | 909 / 0 | **909 / 0** |
| unsupported_schema | 100 (deferred) | 100 (deferred) |

45% is still poor. The systematic bias is gone; remaining error is per-cert scatter + the
profile-surfaced buckets below.

**SESSION-4 shipped (45.1 → 47.6%):** four spec-grounded fixes + closed one false lead.
See the `## SESSION-4 …` blocks below and the auto-memory for full detail. The systematic bias
is gone; the winning method this session was the **description-vs-code audit** + an
**outlier-robust categorical sweep** (rank by net directional skew + MEDIAN, not mean — the
mean-based metric is fooled by multi-cause outliers). 47.6% is still the target's halfway point.
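
For readers re-deriving the rows of the headline table, here is a minimal Python sketch of how such figures can be computed from a list of per-cert signed errors. It is not `scripts/eval_api_sap_accuracy.py`: the function name `headline_metrics`, the sample errors, and the sign convention (computed rating minus lodged rating) are assumptions made for illustration only.

```python
from statistics import mean


def headline_metrics(errors):
    """Summarise per-cert signed errors.

    Assumption (not confirmed by the handover): each error is
    computed rating minus lodged rating, in SAP points.
    """
    n = len(errors)
    if n == 0:
        raise ValueError("no errors to summarise")
    abs_err = [abs(e) for e in errors]

    def pct_within(threshold):
        # Share of certs whose absolute error is strictly below the threshold.
        return 100.0 * sum(a < threshold for a in abs_err) / n

    return {
        "pct_within_0.5": pct_within(0.5),
        "pct_within_1.0": pct_within(1.0),
        "pct_within_2.0": pct_within(2.0),
        "mean_abs": mean(abs_err),
        "mean_signed": mean(errors),
    }


# Hypothetical sample, not real eval output.
sample_errors = [0.2, -0.4, 1.5, -3.0]
metrics = headline_metrics(sample_errors)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A near-zero `mean_signed` alongside a large `mean_abs` is the "balanced but scattered" pattern the paragraph above describes: over- and under-rated certs cancel in the signed mean but not in the absolute one.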

## WHAT SHIPPED THIS SESSION (7 slices, all green, pyright net-zero)
1. `e41a0bc0` **PCDB heat pump w/o SAP code → Table 12a ASHP_APP_N SH split** (0.80 high-rate).

@@ -39,7 +42,42 @@ profile-surfaced buckets below.
   (4)(5)(6) cleared **all 4 raises** — eval now has zero raises.
7. `(profiler)` **`scripts/profile_api_error.py`** — the new diagnostic (below).

## SESSION-4 UPDATE (HEAD `8741fbdf`) — read before re-working the leads below
## SESSION-4 UPDATE (HEAD `faf29942`) — read before re-working the leads below

### Shipped this session (45.1 → 47.6%)
1. `b40e0f67` **exposed-floor-on-flats** (floor_heat_loss=1) — §3.12; per-BP override of the
   dwelling-level flat suppression.
2. `8741fbdf` **floor_heat_loss=3 → above partially heated space, U=0.7** (§3.12/§5.14) +
   re-pinned golden 7536 (its "irreducible residual" was THIS bug).
3. `5e7ef5c7` **boiler interlock for TRVs+bypass controls 2107/2111** (§9.4.11) — biggest single
   win (+1.6pts). The no-interlock set was keyed off the wrong signal (the "+0.6 °C" annotation);
   2107/2111 lack a room thermostat → −5pp + Table 4f ×1.3 pump.
4. `faf29942` **description-lodged secondary heating** (§A.2.2/Table 11) — gas/oil boilers with an
   API-description-only secondary ("Portable electric heaters (assumed)", code field None)
   dropped the secondary (sec_kWh=0); now `_has_lodged_secondary_description` fires Table 11.
   Also added cat-8 (electric underfloor) Table-11 fraction 0.10.
- `560c912c`/`d0f57a0e` docs: **roof_construction=8 lead CLOSED as data-fidelity** (not a bug — see
  the roof-8 section below; user worksheet sim-case-29 proved we ≡ Elmhurst).

### The two methods that worked (reuse these)
- **Description-vs-code audit:** join each int code (`floor_heat_loss`, `roof_construction`,
  `wall_construction`, secondary type) to its authoritative `…[].description`, **on single-element
  certs only** (multi-element `[]` arrays are LOSSY). Mis-maps fall out (floor-3, secondary).
- **Outlier-robust categorical sweep** (`/tmp/cat_audit2.py`): rank field-values by **net
  directional skew** (#under−0.5 minus #over+0.5) + **MEDIAN** error. The mean-based directionality
  metric (`/tmp/cat_audit.py`) gets FOOLED by multi-cause outliers (e.g. "Solid brick no insulation"
  looked systematic at mean −1.07 but median is −0.22 = scatter; 2100 −61/RR drove it).
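
The description-vs-code audit in the first bullet can be sketched roughly as follows. This is an illustration, not the script used in the session: the record layout (a list of int codes under `code_key`, a list of `{"description": ...}` dicts under `array_key`), the function name `audit_code_vs_description`, and the sample descriptions are all hypothetical.

```python
from collections import Counter, defaultdict


def audit_code_vs_description(certs, code_key, array_key):
    """Tally the descriptions seen for each int code, single-element certs only.

    Hypothetical layout: cert[code_key] is a list of int codes (one per
    building part) and cert[array_key] is a list of {"description": str}.
    """
    tally = defaultdict(Counter)
    for cert in certs:
        codes = cert.get(code_key) or []
        elements = cert.get(array_key) or []
        # Skip multi-element certs: their summary arrays are lossy, so a
        # code cannot be paired with exactly one description.
        if len(codes) != 1 or len(elements) != 1:
            continue
        tally[codes[0]][elements[0]["description"]] += 1
    return tally


# Invented sample records, not real certificate data.
sample_certs = [
    {"floor_heat_loss": [3], "floors": [{"description": "above partially heated space (sample)"}]},
    {"floor_heat_loss": [3], "floors": [{"description": "above partially heated space (sample)"}]},
    {"floor_heat_loss": [1], "floors": [{"description": "exposed floor (sample)"}]},
    # Two building parts but one summary entry: excluded from the tally.
    {"floor_heat_loss": [1, 3], "floors": [{"description": "mixed summary (sample)"}]},
]

tally = audit_code_vs_description(sample_certs, "floor_heat_loss", "floors")
for code, descriptions in sorted(tally.items()):
    print(code, descriptions.most_common())
```

A code whose dominant description disagrees with how the model currently interprets that code is a candidate mis-map; restricting to single-element certs keeps the pairing unambiguous.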

### Open robust leads (verify with `/tmp/cat_audit2.py` — they shift; check MEDIAN not mean)
- `whc=903` electric-immersion HW: **median +0.87, n=84** — likely off-peak immersion handling
  (the handover noted WHC 903 raises NotImplementedError on the Table-12a off-peak-immersion row).
- `main_heat_cat=7` electric storage: median +1.05, n=41 — over-rate (tariff/cost; partly artifact).
- `immersion_type=2` dual: +1.50, n=43 — we OVER-credit (so §12 dual→off-peak would worsen it).
- `dwelling_type=Top-floor flat`: median −1.24, n=99 — under-rate, mostly fabric scatter/artifacts.
- **Low-dir = SCATTER, do NOT single-fix:** non-PCDB main / data_source=2 (n=242, 28% within-0.5),
  mains_gas=N electric (n=145), most flats. These are per-cert/data-fidelity, not one bug.
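
The ranking behind leads like these can be sketched as below. This is a hedged approximation of the idea, not the contents of `/tmp/cat_audit2.py`: the row layout (`{"err": float, <field>: value}`), the function name `robust_sweep`, the `wall` field, and the sample numbers are assumptions. Net skew follows the formula quoted earlier, count of errors below −0.5 minus count above +0.5, with positive error assumed to mean over-rating.

```python
from collections import defaultdict
from statistics import mean, median


def robust_sweep(rows, field, band=0.5):
    """Rank the values of `field` by |net directional skew|, reporting the median.

    Hypothetical row layout: {"err": signed error, field: categorical value}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row["err"])

    report = []
    for value, errs in groups.items():
        under = sum(e < -band for e in errs)  # certs under-rated by more than the band
        over = sum(e > band for e in errs)    # certs over-rated by more than the band
        report.append({
            "value": value,
            "n": len(errs),
            "net_skew": under - over,
            "median": median(errs),
            "mean": mean(errs),  # kept only to contrast with the median
        })
    # Largest directional imbalance first, whichever direction it points.
    report.sort(key=lambda r: abs(r["net_skew"]), reverse=True)
    return report


# Invented data: group "A" is scatter plus one large outlier,
# group "B" is consistently over-rated.
rows = (
    [{"wall": "A", "err": e} for e in [0.1, -0.2, 0.3, -0.1, -6.0]]
    + [{"wall": "B", "err": e} for e in [0.9, 1.1, 0.8, 1.0]]
)
report = robust_sweep(rows, "wall")
for r in report:
    print(r["value"], r["n"], r["net_skew"], round(r["median"], 2), round(r["mean"], 2))
```

In this toy data the single outlier drags the mean of group "A" well below zero while its median and net skew stay near zero, so only group "B" surfaces as a directional lead.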

### Resolved/closed this session (don't re-chase)
- **Lead #1 `floor_codes=3` RESOLVED — the code IS authoritative.** The diagnostic that cracked
  it: join each **single-BP** cert's `floor_heat_loss` code to its independent
  `floors[].description` (the multi-BP tally was contaminated because a cert's `floors[]` summary