mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Adds services/ml_training_data/src/ml_training_data/sap_parity_probe.py, which samples N certs from the v18a corpus, streams them via BulkZipReader, runs Sap10Calculator, and prints MAE/RMSE/bias + worst-N residuals. Baseline across 100 certs: MAE 8.41, RMSE 13.98, bias -2.65, 0 errors.
docs/sap-spec/PARITY_FINDINGS.md captures the dominant failure pattern (flats + bungalows under-predicted, 10 of the worst-15 are flats whose floor/roof are party with neighbouring dwellings) and the priority-ordered Session B iteration backlog (S-B-flat-surfaces first).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3.9 KiB
Sap10Calculator parity probe — findings as of 2026-05-18
100-cert random sample from data/ml_training/runs/2025_2026_n250000_v18a/data.parquet, filtered to cert sap-score 20-95 (typical band). 0 errors — calculator runs end-to-end on every cert.
Headline
| Metric | Value |
|---|---|
| MAE | 8.41 SAP-points |
| RMSE | 13.98 |
| Bias | -2.65 (slight under-prediction) |
| Within ±1 | 18.0% |
| Within ±3 | 36.0% |
| Within ±5 | 57.0% |
| Within ±10 | 84.0% |
| Worst residual | -56 SAP-points |
Session B success criterion is MAE ≤ 1.0 on the typical subset; we're 8× that on the first pass, which roughly matches ADR-0009's expectation that the first run shakes out spec-interpretation gaps.
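As a minimal sketch of how the headline numbers can be derived from per-cert scores, the function below computes MAE, RMSE, bias, and the within-±k shares from residuals defined as predicted − actual. The name `parity_metrics`, the inclusive `<= k` band test, and the toy inputs are assumptions for illustration; this is not the probe's actual code.

```python
import math


def parity_metrics(actual, predicted, bands=(1, 3, 5, 10)):
    """Summarise residuals (predicted - actual) in the shape of the headline table."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one cert")
    summary = {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "bias": sum(residuals) / n,  # negative means under-prediction
        "worst": max(residuals, key=abs),  # signed residual with largest magnitude
    }
    for k in bands:
        # Assumption: "within ±k" is inclusive of the boundary.
        summary[f"within_{k}"] = sum(abs(r) <= k for r in residuals) / n
    return summary


# Toy inputs for illustration only; these are not certs from the sample.
metrics = parity_metrics(actual=[70, 60, 80, 50], predicted=[69, 64, 70, 50])
print(metrics)
```

RMSE sitting well above MAE, as in the table (13.98 vs 8.41), is the usual sign that a few large residuals dominate the error.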
Dominant failure shape: flats and bungalows under-predicted
10 of the 15 worst residuals are flats or maisonettes (8 flats, 2 maisonettes), and 4 more are bungalows. Pattern: calculator charges floor + roof heat loss to dwellings that don't have exposed floor / roof surfaces (mid-floor flats, top-floor flats with a party floor, ground-floor flats with a party ceiling).
Worst 15 (residual = predicted − actual):
| Cert | actual | predicted | residual | TFA (m²) | dwelling |
|---|---|---|---|---|---|
| 0320-2756-7670-2196-2035 | 78 | 22 | -56 | 57 | Semi-detached bungalow |
| 0036-1125-8600-0165-2206 | 63 | 18 | -45 | 42 | Mid-floor flat |
| 0340-2394-5510-2925-4421 | 75 | 35 | -40 | 73 | Mid-floor flat |
| 9360-2179-9590-2495-2615 | 78 | 39 | -39 | 54 | Ground-floor flat |
| 0036-0529-1500-0700-8276 | 75 | 36 | -39 | 47 | Top-floor flat |
| 0350-2182-9590-2526-7841 | 43 | 4 | -39 | 119 | Top-floor flat |
| 2148-3061-6204-0016-7204 | 81 | 44 | -37 | 67 | Mid-floor flat |
| 0800-1364-0922-4522-3963 | 71 | 37 | -34 | 70 | Detached bungalow |
| 2110-6453-5050-8205-9605 | 63 | 31 | -32 | 43 | Ground-floor maisonette |
| 2903-8339-6962-6004-0725 | 75 | 47 | -28 | 11 | Top-floor flat |
| 0320-2850-3380-2125-1661 | 70 | 48 | -22 | 45 | Semi-detached bungalow |
| 8035-9023-1500-0237-3226 | 43 | 63 | +20 | 64 | Detached bungalow |
| 9590-7751-0022-0599-3953 | 51 | 69 | +18 | 74 | Detached house |
| 2118-1198-2619-1711-7960 | 62 | 46 | -16 | 42 | Mid-floor flat |
| 3336-3822-5500-0437-9202 | 70 | 59 | -11 | 73 | Mid-floor maisonette |
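The table can be tallied by dwelling family with a few lines of Python. The sketch below copies the dwelling and residual columns from the table and groups on the last word of the label; the `family` helper is a hypothetical convenience, not part of the probe.

```python
from collections import Counter

# (dwelling, residual) pairs copied from the worst-15 table above.
WORST_15 = [
    ("Semi-detached bungalow", -56),
    ("Mid-floor flat", -45),
    ("Mid-floor flat", -40),
    ("Ground-floor flat", -39),
    ("Top-floor flat", -39),
    ("Top-floor flat", -39),
    ("Mid-floor flat", -37),
    ("Detached bungalow", -34),
    ("Ground-floor maisonette", -32),
    ("Top-floor flat", -28),
    ("Semi-detached bungalow", -22),
    ("Detached bungalow", 20),
    ("Detached house", 18),
    ("Mid-floor flat", -16),
    ("Mid-floor maisonette", -11),
]


def family(dwelling):
    """Collapse a label to its last word: flat, maisonette, bungalow, or house."""
    return dwelling.split()[-1].lower()


counts = Counter(family(d) for d, _ in WORST_15)
under = Counter(family(d) for d, r in WORST_15 if r < 0)
print(counts)  # rows per dwelling family
print(under)   # under-predicted rows per dwelling family
```

On these rows, flats and maisonettes together account for 10 of the 15 entries, all under-predicted; bungalows account for 4, one of which is over-predicted.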
Session B iteration backlog (priority order)
- S-B-flat-surfaces: Map `dwelling_type` to exposed floor/roof flags. Mid/top flats drop the `u_floor × ground_floor_area` term; mid/ground flats drop the `u_roof × top_floor_area` term. Expected impact: closes most of the −20 to −56 residuals.
- S-B-heating-eff-fallback: When `sap_main_heating_code` is None, fall back through `main_heating_category` + age band to a modern-condensing-boiler efficiency, not the legacy 0.80. ~28% of our 100-cert sample had a null code with category=2.
- S-B-electric-storage-tariff: Electric storage heaters (codes 401-409) should price space-heating fuel at the Economy-7 low rate (Table 32 code 31, ~5.5 p/kWh), not the standard rate (code 30). This is roughly a 2× cost reduction on those certs.
- S-B-wall-uvalue-cascade-review: Worst non-flat residuals suggest the wall U-value cascade is too conservative for recently-built / well-insulated stock. Review `domain.ml.rdsap_uvalues.u_wall` against RdSAP 10 Table 5.
- S-B-bungalow-investigation: Bungalow residuals don't fit the flat-surfaces pattern (bungalows have full floor+roof). Hypothesis: thermal-bridging y-factor + storey-count interaction over-counts envelope. Probe specifically before deciding.
- S-B-pump-fan-default: We default to 130 kWh/yr; SAP 10.3 Table 4f gives a higher value for systems with mechanical ventilation. Marginal but consistent.
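The first backlog item, S-B-flat-surfaces, could look roughly like the sketch below: derive exposure flags from the dwelling label, then zero the floor or roof term when that surface is party. The `Exposure` class, both function names, the string-prefix matching, and the choice to treat maisonettes like flats are assumptions made for illustration; the real mapper and calculator fields may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    floor: bool  # True when the lowest floor loses heat to ground/outside
    roof: bool   # True when the top ceiling loses heat to a roof/outside


def exposure_for(dwelling_type: str) -> Exposure:
    """Guess which horizontal surfaces are exposed from the dwelling label."""
    label = dwelling_type.lower()
    if "flat" not in label and "maisonette" not in label:
        return Exposure(floor=True, roof=True)  # houses and bungalows
    if label.startswith("mid-floor"):
        return Exposure(floor=False, roof=False)
    if label.startswith("top-floor"):
        return Exposure(floor=False, roof=True)
    if label.startswith("ground-floor"):
        return Exposure(floor=True, roof=False)
    # Unknown storey position: keep both terms rather than silently dropping loss.
    return Exposure(floor=True, roof=True)


def floor_roof_heat_loss(dwelling_type, u_floor, ground_floor_area,
                         u_roof, top_floor_area):
    """Floor + roof fabric heat loss (W/K), skipping party surfaces."""
    exp = exposure_for(dwelling_type)
    floor_term = u_floor * ground_floor_area if exp.floor else 0.0
    roof_term = u_roof * top_floor_area if exp.roof else 0.0
    return floor_term + roof_term


# Illustrative U-values and areas, not taken from any cert.
for kind in ("Mid-floor flat", "Top-floor flat", "Ground-floor flat", "Detached bungalow"):
    print(kind, floor_roof_heat_loss(kind, 0.5, 60.0, 0.25, 60.0))
```

Falling back to fully exposed for unrecognised labels keeps the current behaviour for those certs, so the change can only reduce heat loss where the storey position is known.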
How to reproduce
python adhoc/sap_calculator/probe_n.py # 100 certs, seed=7
python adhoc/sap_calculator/probe_n.py 500 13 # bigger sample
python adhoc/sap_calculator/probe_worst.py # detailed cert-by-cert dump
probe_n.py runs in ~80s. Errors: 0/100. Mapper handles every real cert shape encountered.