Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	ac1aa56ab1	P2.1: extract predict_sap_for_cert; swap probe to SAP 10.2 spec prices ADR-0010 P2: the cert-calibration layer is deleted; the probe uses SAP_10_2_SPEC_PRICES (already defined in cert_to_inputs.py). Extracts a pure predict_sap_for_cert(cert_document, *, prices) -> int helper out of main()'s inline pipeline so the spec-prices path is unit-testable in isolation; the helper is also reusable for P3's cohort-filtered probe variant. The pinned regression value (SAP=67 for cert 6035-7729 under spec prices, vs the cert's lodged SAP of 73 under cert-cal prices) lives in services/ml_training_data/tests/unit/test_sap_parity_probe.py. It will drift as P4 (PCDB) and the section sweep land their fixes; that's expected. cert_calibration_prices is still imported by test_golden_fixtures.py and the table_12_cert_calibration module is intact. P2.2/P2.3 retire those. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 09:51:42 +00:00
Khalim Conn-Kowlessar	136f149d46	tooling: widen parity probe sap_score range to (5, 99) The previous bound (20, 95) excluded full-SAP new-builds (sap_score 90+, which carry the dramatic wall U-value gap) and deepest-tail heritage certs (sap_score ≤ 20). Widening so the sample reflects the populations where the calculator's biggest spec gaps live. New baseline at 300 certs, seed=7: SAP MAE 5.34 → 4.59 (-0.75); PE MAE 48.99 → 46.78 (-2.21); PE bias 42.07 → 41.78 (-0.29). Note: the v18a parquet contains only ~0.7% of certs with age_band=None, while the raw bulk zip has 15% full-SAP "Average thermal transmittance" certs. The parquet is filtering them out somewhere upstream, which will be chased in separate work. Until then, parity-probe MAE will under-show the true corpus impact of slices that target full-SAP certs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 20:38:22 +00:00
Khalim Conn-Kowlessar	1c0cb9ac07	tooling: per-end-use PEUI decomposition in parity probe Adds primary-energy breakdown (space heating, hot water, lighting, pumps, PV) per cert plus stratified bias reports by main_heating_category, construction_age_band, and dwelling_type. Used to localise the +51 kWh/m² PEUI bias to envelope-side over-prediction on pre-1996 fabric, which the bare SAP-residual ranking didn't surface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 20:14:39 +00:00
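The stratified bias reports described in this commit boil down to grouping signed residuals by a categorical column. A minimal sketch, assuming each probe row is a dict carrying hypothetical `predicted_peui` and `lodged_peui` keys alongside the stratifying column; it is not the probe's actual code.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def stratified_bias(rows: list[dict[str, Any]], key: str) -> dict[Any, tuple[int, float]]:
    """Return {stratum: (count, mean signed residual)}, most over-predicted first.

    Residual is predicted minus lodged, so a positive mean means over-prediction.
    The 'predicted_peui' / 'lodged_peui' keys are hypothetical.
    """
    groups: dict[Any, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["predicted_peui"] - row["lodged_peui"])
    report = {stratum: (len(res), mean(res)) for stratum, res in groups.items()}
    # Sort strata by mean residual, descending, so the worst over-prediction leads.
    return dict(sorted(report.items(), key=lambda kv: kv[1][1], reverse=True))
```

Running it once per column (e.g. `construction_age_band`, then `dwelling_type`) gives one table per stratification; an envelope-side skew on older fabric would surface as the older age bands sorting to the top.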
Khalim Conn-Kowlessar	2a9999bdf6	slice S-B22: primary energy in SapResult + Table 12 PEF column Wires SAP 10.2 Table 12 "Primary energy factor" column into Table 12 helpers and onto CalculatorInputs as three per-end-use factors (space heating, hot water, other). calculate_sap_from_inputs now emits primary_energy_kwh_per_yr and primary_energy_kwh_per_m2 on SapResult, matching the cert's `energy_consumption_current` field (PEUI). Triggered by a decomposition that revealed I'd been comparing our delivered energy to the cert's primary energy: apples to oranges. With a proper primary-energy comparison, the actual 300-cert finding (cert calibration prices) is: energy MAE 57.3 kWh/m²; energy bias +51.6 (we over-predict by ~50%); energy P50 +49.5. This is a much bigger systemic bug than the SAP MAE 5.34 suggested. Closing it requires investigating either (a) demand model over-prediction, (b) HW losses, (c) PEF values per fuel, or (d) cert reporting convention differences. Targeted for the next context. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 19:02:30 +00:00
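Primary energy is delivered energy weighted by a per-end-use primary energy factor, and PEUI divides that total by floor area. The sketch below assumes the three end uses named in the commit (space heating, hot water, other); the class and function names are hypothetical, and the factor values in the test are placeholders, not SAP 10.2 Table 12 figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveredEnergy:
    """Annual delivered (metered) energy per end use, kWh/yr."""
    space_heating_kwh: float
    hot_water_kwh: float
    other_kwh: float  # lighting, pumps, fans, etc.


@dataclass(frozen=True)
class PrimaryEnergyFactors:
    """Dimensionless primary energy factor per end use (depends on the fuel used)."""
    space_heating: float
    hot_water: float
    other: float


def primary_energy(
    delivered: DeliveredEnergy, pef: PrimaryEnergyFactors, floor_area_m2: float
) -> tuple[float, float]:
    """Return (primary kWh/yr, primary kWh/m²/yr), the latter comparable to PEUI."""
    total = (
        delivered.space_heating_kwh * pef.space_heating
        + delivered.hot_water_kwh * pef.hot_water
        + delivered.other_kwh * pef.other
    )
    return total, total / floor_area_m2
```

Whenever a factor differs from 1, delivered kWh/m² and primary kWh/m² diverge, which is why comparing one against the other produces a spurious bias.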
Khalim Conn-Kowlessar	92727568a3	slice S-B10: price-table seam for cert-calibration parity validation Separates the SAP-spec source of truth from the empirical cert-calibration prices. cert_to_inputs() now accepts a `prices: PriceTable` parameter defaulting to SAP_10_2_SPEC_PRICES (3.64 gas, 16.49 elec, 9.40 7h-low; verbatim from SAP 10.2 §12.2 / Table 12). Parity probe passes the empirical cert_calibration_prices() factory from domain.sap.tables.table_12_cert_calibration which carries the lower prices that match the cert assessor software's actual output (3.48, 13.19, 5.50). This split is documented in both table modules: cert calibration is explicitly NOT spec-correct, it just matches observed cert behaviour for parity testing. 100-cert parity probe with cert-calibration prices: MAE 6.66 → 4.99 (recovered from spec-price regression; also -0.41 from absolute baseline thanks to other S-B fixes); RMSE 10.29 → 7.13; bias -4.66 → -1.03; within ±1: 20% → 23%; within ±3: 38% → 47%; within ±5: 63% → 67%; within ±10: 82% → 93%. Session-B progress overall (S-B2 baseline → here): MAE 8.41 → 4.99, within ±1 doubled (10% → 23%). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 15:20:46 +00:00
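The seam can be pictured as a small immutable price table passed as a keyword argument with a spec default. A sketch, assuming prices are in pence per kWh; the field names and `annual_fuel_cost_pounds` are hypothetical, while the two sets of figures are the ones quoted in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTable:
    """Unit fuel prices, assumed to be pence per kWh."""
    gas: float
    electricity: float
    electricity_7h_low: float


# Figures quoted in the commit as verbatim from SAP 10.2 §12.2 / Table 12.
SAP_10_2_SPEC_PRICES = PriceTable(gas=3.64, electricity=16.49, electricity_7h_low=9.40)


def cert_calibration_prices() -> PriceTable:
    """Empirical prices matching observed lodged certs; deliberately NOT spec-correct."""
    return PriceTable(gas=3.48, electricity=13.19, electricity_7h_low=5.50)


def annual_fuel_cost_pounds(
    gas_kwh: float, electricity_kwh: float, *, prices: PriceTable = SAP_10_2_SPEC_PRICES
) -> float:
    """Hypothetical consumer of the seam: cost in pounds for standard-tariff use."""
    pence = gas_kwh * prices.gas + electricity_kwh * prices.electricity
    return pence / 100
```

Defaulting to the spec table keeps every production call spec-correct, while the parity probe opts in to calibration explicitly with `prices=cert_calibration_prices()` at the call site.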
Khalim Conn-Kowlessar	dde8ae30fa	S-B2: parity probe + first-pass findings (100-cert baseline) Adds services/ml_training_data/src/ml_training_data/sap_parity_probe.py — samples N certs from the v18a corpus, streams them via BulkZipReader, runs Sap10Calculator, prints MAE/RMSE/bias + worst-N residuals. Baseline across 100 certs: MAE 8.41, RMSE 13.98, bias -2.65, 0 errors. docs/sap-spec/PARITY_FINDINGS.md captures the dominant failure pattern (flats + bungalows under-predicted, 10 of the worst-15 are flats whose floor/roof are party with neighbouring dwellings) and the priority-ordered Session B iteration backlog (S-B-flat-surfaces first). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 13:59:23 +00:00
Khalim Conn-Kowlessar	6072d8795a	slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight. train_baseline now returns mae + rmse alongside mape/smape/r2. MAE is the user-facing metric ("predicted SAP within N points"); RMSE is its quadratic counterpart. Both come straight from sklearn. New sample_weight_fn parameter: callable(y_train) -> per-row weights, threaded into LGBMRegressor.fit's sample_weight argument. The default None preserves existing behaviour. Default tail strategy exposed as low_sap_tail_weight(y, threshold=58, weight=3): 3x weight where SAP < 58. Threshold picked from slice 16h's per-decile residuals: decile 0 (SAP 1-58) carries 17% MAPE vs <5% across the body. Three TDD tracers, all AAA.	2026-05-17 14:48:00 +00:00
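A minimal sketch of the weight function, matching the signature quoted in the commit, `low_sap_tail_weight(y, threshold=58, weight=3)`; the NumPy body and the `resolve_sample_weight` helper are assumptions, not the project's code.

```python
import numpy as np


def low_sap_tail_weight(y, threshold=58, weight=3):
    """Per-row weights: `weight` where the true SAP is below `threshold`, else 1."""
    y = np.asarray(y, dtype=float)
    return np.where(y < threshold, float(weight), 1.0)


def resolve_sample_weight(y_train, sample_weight_fn=None):
    """Hypothetical helper: None keeps the unweighted fit, otherwise call the fn."""
    return None if sample_weight_fn is None else sample_weight_fn(y_train)
```

The resulting array (or `None`) would be passed as `LGBMRegressor.fit(X_train, y_train, sample_weight=weights)`, so the unweighted path is literally the default call.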
Khalim Conn-Kowlessar	ece1279475	revert slice 16g: drop mape objective per 16h ablation. The 250k retrain showed objective='mape' loses ~0.6 percentage points of global sap_score MAPE (3.92% with regression vs 4.50% with mape) and ~0.7 pts on peui_ucl. The mape objective over-weights the low-SAP tail (weight ~1/y) and drags the body MAPE up by more than it gains in the tail. Body MAPE on v16 features is already strong (2.38% on deciles 1-8); the remaining tail bias at decile 0 (SAP<58, +3.1 bias) needs a different fix (sample weights or stratified loss), queued as slice 16i.	2026-05-17 14:34:04 +00:00
Khalim Conn-Kowlessar	700ff4640c	slice 16g: LightGBM objective=mape for sap_score + peui_ucl Per ADR-0008: the v15 baseline reports MAPE but optimises MSE, which under-weights tail rows. Switching to objective='mape' applies gradient proportional to 1/|y| and lets the model focus where MAPE penalises. Targets co2_emissions, space_heating_kwh, hot_water_kwh, and peui_raw retain the default 'regression' objective (some rows have ~zero CO2 from heavy PV; MAPE objective destabilises near zero). Sample weights deferred to slice 16i if slice 16h's per-decile residuals still show tail bias after the objective switch.	2026-05-17 12:06:13 +00:00
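The 1/|y| claim follows from differentiating the per-row MAPE term; squared error, by contrast, has a gradient that does not depend on the scale of y. This is the textbook derivation, not a description of how LightGBM implements `objective='mape'` internally.

```latex
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{\lvert y_i \rvert},
\qquad
\frac{\partial}{\partial \hat{y}_i}\,\frac{\lvert y_i - \hat{y}_i \rvert}{\lvert y_i \rvert}
  = \frac{\operatorname{sign}(\hat{y}_i - y_i)}{\lvert y_i \rvert},
\qquad
\frac{\partial}{\partial \hat{y}_i}\,(\hat{y}_i - y_i)^2 = 2\,(\hat{y}_i - y_i).
```

For two rows with the same residual, one at SAP 20 and one at SAP 80, the MAPE gradient on the first is 80/20 = 4 times larger, which is the tail over-weighting the 16h ablation measured against the body.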
Khalim Conn-Kowlessar	fd8d71eb05	slice 15e: per-decile residuals reporting in train_baseline Adds `_per_decile_residuals` and writes `residuals_<target>.json` next to metrics.json. Buckets test-set rows by deciles of the true target value; each bucket carries count + MAPE + MAE + mean residual + true_min/max. Lets us tell whether errors concentrate in the tails of the true distribution (e.g. SAP<40 / SAP>85) vs the mid-band — which the global MAPE alone hides. Baseline for slice 16's MAPE-improvement ablations.	2026-05-17 11:18:40 +00:00
Khalim Conn-Kowlessar	a1f89b6033	slice 15c: stream build_features so 500k+ cert runs fit memory Previously kept the full list of EpcPropertyData in memory before calling EpcMlTransform.to_rows. For the 25k slice that's ~30 MB; for the 580k full-2026 corpus it OOM-killed the process silently. Now: parse cert -> to_row -> append dict -> drop EpcPropertyData reference, so memory is O(row-dict * n) instead of O(EpcPropertyData * n). Same end-of-frame post-processing (categorical casts, column-order pin). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 00:36:53 +00:00
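The memory fix amounts to converting each parsed certificate into a flat row dict before parsing the next, so only the row dicts accumulate. A sketch with hypothetical `parse` and `to_row` callables standing in for the mapper and `EpcMlTransform`.

```python
from collections.abc import Callable, Iterable
from typing import Any


def build_rows(
    raw_certs: Iterable[Any],
    parse: Callable[[Any], Any],
    to_row: Callable[[Any], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Parse -> flatten -> keep only the flat row for each certificate."""
    rows: list[dict[str, Any]] = []
    for raw in raw_certs:            # raw_certs can be a lazy generator over the ZIP
        parsed = parse(raw)          # heavyweight domain object
        rows.append(to_row(parsed))  # small flat dict of features and targets
        del parsed                   # make explicit that no parsed object is retained
    return rows
```

Peak memory then scales with the row dicts rather than with the parsed objects; the resulting list can go straight into `pandas.DataFrame(rows)` for the unchanged end-of-frame casts.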
Khalim Conn-Kowlessar	c496f345f8	slice 14l: bigger-run fixes (UCL guard, PV Measurement coercion, sMAPE) Three changes surfaced by the 25k 2026 run: - transform._peui_ucl returns None for non-positive raw PEUI (net-exporters). apply_ucl_correction would otherwise raise ValueError on negative input. - PhotovoltaicArray scalars (peak_power, pitch, orientation, overshading) now accept Measurement | int | float in the schema; mapper coerces via _measurement_value. - train_baseline reports sMAPE alongside MAPE, which handles zero-actual rows (e.g. co2_emissions for net-zero certs) where MAPE explodes. Results at N=25,000 RdSAP 2026 certs (~32s end-to-end): sap_score MAPE=0.064 sMAPE=0.054 R^2=0.762; co2_emissions sMAPE=0.140 R^2=0.890; peui_raw MAPE=0.126 sMAPE=0.120 R^2=0.714; peui_ucl MAPE=0.114 sMAPE=0.108 R^2=0.736; space_heating_kwh MAPE=0.167 sMAPE=0.157 R^2=0.915; hot_water_kwh MAPE=0.089 sMAPE=0.086 R^2=0.737. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 21:15:37 +00:00
Khalim Conn-Kowlessar	8fddd25b9a	slice 14k: E2E pipeline runs on real 2026 RdSAP certs Two production fixes surfaced by the live run: - mapper.from_rdsap_schema_21_0_1 now sets the three ML target scalars (energy_rating_current, co2_emissions_current, energy_consumption_current). They were silently None for every cert before, leaving the only labels as the kWh fields from renewable_heat_incentive. - train_baseline coerces object-dtype columns to numeric (None -> NaN) and drops rows with null target per fit, so LightGBM accepts the frame. E2E on 500 real certs (~1s): sap_score R^2=0.604 MAPE=0.084; co2_emissions R^2=0.813 MAPE=0.130; peui_raw R^2=0.979 MAPE=0.026; space_heating_kwh R^2=0.823 MAPE=0.213; hot_water_kwh R^2=0.519 MAPE=0.115. peui_ucl excluded: UCL correction still needs wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:47:41 +00:00
Khalim Conn-Kowlessar	611c07de94	slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper carries certificate_number, assessment_type, and a stringified document with the actual EPC schema payload. Filter to RdSAP, unwrap document, then map. remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:45:52 +00:00
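Unwrapping the bulk entries can be sketched as a generator over NDJSON lines. The wrapper keys (`certificate_number`, `assessment_type`, `document`) come from the commit message; the exact `assessment_type` string used for RdSAP is an assumption.

```python
import json
from collections.abc import Iterable, Iterator


def iter_rdsap_documents(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (certificate_number, parsed EPC payload) for RdSAP wrapper records."""
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        wrapper = json.loads(line)
        if wrapper.get("assessment_type") != "RdSAP":  # assumed literal value
            continue
        # The schema payload is a JSON string nested inside the wrapper record.
        yield wrapper["certificate_number"], json.loads(wrapper["document"])
```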
Khalim Conn-Kowlessar	9eb70cede1	slice 14g: remote_bulk_fetcher extracts ZIP entries via HTTP Range (no full download) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:16:52 +00:00
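Reading one entry without downloading the whole archive means fetching the end-of-central-directory record from the tail, then the central directory, then only that entry's local header and compressed bytes. A sketch against a hypothetical `fetch(start, end)` callable that behaves like an HTTP `Range: bytes=start-end` request (inclusive end); it ignores ZIP64, which a very large bulk archive may need, and decodes entry names as UTF-8.

```python
import struct
import zlib
from collections.abc import Callable

Fetch = Callable[[int, int], bytes]  # inclusive byte range, like HTTP Range
EOCD_SIG = b"PK\x05\x06"
CDFH_SIG = 0x02014B50
LFH_SIG = 0x04034B50


def read_central_directory(fetch: Fetch, size: int) -> dict[str, tuple[int, int, int]]:
    """Map entry name -> (method, compressed_size, local_header_offset). No ZIP64."""
    tail_len = min(size, 22 + 0xFFFF)  # EOCD record plus maximum comment length
    tail = fetch(size - tail_len, size - 1)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from("<IHHHHIIH", tail, pos)
    cd = fetch(cd_offset, cd_offset + cd_size - 1)
    entries: dict[str, tuple[int, int, int]] = {}
    p = 0
    for _ in range(count):
        (sig, _, _, _, method, _, _, _, comp_size, _, name_len, extra_len,
         comment_len, _, _, _, local_offset) = struct.unpack_from("<IHHHHHHIIIHHHHHII", cd, p)
        if sig != CDFH_SIG:
            raise ValueError("bad central directory header")
        name = cd[p + 46:p + 46 + name_len].decode("utf-8")
        entries[name] = (method, comp_size, local_offset)
        p += 46 + name_len + extra_len + comment_len
    return entries


def read_entry(fetch: Fetch, entry: tuple[int, int, int]) -> bytes:
    """Fetch one entry's local header, then exactly its compressed bytes."""
    method, comp_size, offset = entry
    header = fetch(offset, offset + 29)  # fixed 30-byte local file header
    sig, _, _, _, _, _, _, _, _, name_len, extra_len = struct.unpack("<IHHHHHIIIHH", header)
    if sig != LFH_SIG:
        raise ValueError("bad local file header")
    start = offset + 30 + name_len + extra_len  # local name/extra may differ from central
    data = fetch(start, start + comp_size - 1)
    if method == 0:  # stored
        return data
    if method == 8:  # deflate: raw stream, no zlib header
        return zlib.decompress(data, wbits=-15)
    raise NotImplementedError(f"compression method {method}")
```

In production, `fetch` would issue a GET with a `Range` header against the presigned URL; large entries would be better streamed through `zlib.decompressobj(-15)` in chunks than decompressed in one call.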
Khalim Conn-Kowlessar	b676e05d49	slice 14f: train_baseline fits LightGBM per target, emits MAPE/R^2 + importance Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:47:49 +00:00
Khalim Conn-Kowlessar	23ba2ef271	slice 14e: write_training_dataset emits parquet + schema.json + manifest.json Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:43:31 +00:00
Khalim Conn-Kowlessar	20fd55d5a1	slice 14d: build_features wires bulk reader -> mapper -> EpcMlTransform. ijson use_float fixes Decimal/float coercion when streaming JSON; pyright extraPaths lets the new pkg type-check against domna-domain. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:38:41 +00:00
Khalim Conn-Kowlessar	0ff9d546b8	slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:27:24 +00:00
Khalim Conn-Kowlessar	7a6c8b4f24	slice 14b: Storage protocol + LocalStorage impl Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:52:54 +00:00
Khalim Conn-Kowlessar	eb42cb88a1	slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:39:43 +00:00
Khalim Conn-Kowlessar	dfe9e3ddbe	added potential file scaffolding:	2026-05-15 10:56:53 +00:00

22 commits