Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	6072d8795a	slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight. train_baseline now returns mae + rmse alongside mape/smape/r2. MAE is the user-facing metric ("predicted SAP within N points"); RMSE the quadratic counterpart. Both come straight from sklearn. New sample_weight_fn parameter: callable(y_train) -> per-row weights. Threads into LGBMRegressor.fit's sample_weight argument. Default None preserves existing behaviour. Default tail strategy exposed as low_sap_tail_weight(y, threshold=58, weight=3): 3x weight where SAP < 58. Threshold picked from slice 16h's per-decile residuals: decile 0 (SAP 1-58) carries 17% MAPE vs <5% body. Three TDD tracers, all AAA.	2026-05-17 14:48:00 +00:00
Khalim Conn-Kowlessar	ece1279475	revert slice 16g: drop mape objective per 16h ablation. 250k retrain showed objective='mape' loses ~0.6 percentage points of global sap_score MAPE (3.92% with regression vs 4.50% with mape) and ~0.7 pts on peui_ucl. The mape objective over-weights the low-SAP tail (weight ~1/y) and drags the body MAPE up by more than it gains in the tail. Body MAPE on v16 features is already strong (2.38% on deciles 1-8); the remaining tail bias at decile 0 (SAP<58, +3.1 bias) needs a different fix -- sample weights or stratified loss -- queued as slice 16i.	2026-05-17 14:34:04 +00:00
Khalim Conn-Kowlessar	700ff4640c	slice 16g: LightGBM objective=mape for sap_score + peui_ucl. Per ADR-0008: the v15 baseline reports MAPE but optimises MSE, which under-weights tail rows. Switching to objective='mape' applies gradient proportional to 1/|y| and lets the model focus where MAPE penalises. Targets co2_emissions, space_heating_kwh, hot_water_kwh, and peui_raw retain the default 'regression' objective (some rows have ~zero CO2 from heavy PV; MAPE objective destabilises near zero). Sample weights deferred to slice 16i if slice 16h's per-decile residuals still show tail bias after the objective switch.	2026-05-17 12:06:13 +00:00
Khalim Conn-Kowlessar	fd8d71eb05	slice 15e: per-decile residuals reporting in train_baseline. Adds `_per_decile_residuals` and writes `residuals_<target>.json` next to metrics.json. Buckets test-set rows by deciles of the true target value; each bucket carries count + MAPE + MAE + mean residual + true_min/max. Lets us tell whether errors concentrate in the tails of the true distribution (e.g. SAP<40 / SAP>85) vs the mid-band, which the global MAPE alone hides. Baseline for slice 16's MAPE-improvement ablations.	2026-05-17 11:18:40 +00:00
Khalim Conn-Kowlessar	611c07de94	slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload). Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper carries certificate_number, assessment_type, and a stringified document with the actual EPC schema payload. Filter to RdSAP, unwrap document, then map. remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:45:52 +00:00
Khalim Conn-Kowlessar	b676e05d49	slice 14f: train_baseline fits LightGBM per target, emits MAPE/R^2 + importance Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:47:49 +00:00
Khalim Conn-Kowlessar	23ba2ef271	slice 14e: write_training_dataset emits parquet + schema.json + manifest.json Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:43:31 +00:00
Khalim Conn-Kowlessar	20fd55d5a1	slice 14d: build_features wires bulk reader -> mapper -> EpcMlTransform. ijson use_float fixes Decimal/float coercion when streaming JSON. pyright extraPaths so the new pkg type-checks against domna-domain. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:38:41 +00:00
Khalim Conn-Kowlessar	0ff9d546b8	slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 18:27:24 +00:00
Khalim Conn-Kowlessar	7a6c8b4f24	slice 14b: Storage protocol + LocalStorage impl Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:52:54 +00:00
Khalim Conn-Kowlessar	eb42cb88a1	slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:39:43 +00:00
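
The tail-weighting strategy described in commit 6072d8795a (slice 16i) can be sketched roughly as follows. This is a minimal illustration based only on the commit message, not the repository's actual code: the function name and defaults come from the message, while the body, the `mae_rmse` helper, and the use of NumPy (the commit says the real metrics come from sklearn) are assumptions.

```python
import numpy as np


def low_sap_tail_weight(y, threshold=58, weight=3):
    """Per-row weights: `weight` where y < threshold, 1.0 elsewhere.

    Signature taken from the commit message; the implementation is assumed.
    """
    y = np.asarray(y, dtype=float)
    return np.where(y < threshold, float(weight), 1.0)


def mae_rmse(y_true, y_pred):
    """Hypothetical helper: MAE and RMSE computed with NumPy for illustration."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return mae, rmse


# Example: rows strictly below the threshold get 3x weight.
y_train = np.array([35.0, 57.0, 58.0, 72.0])
weights = low_sap_tail_weight(y_train)
```

An array like `weights` is what a `sample_weight_fn` would hand to the `sample_weight` argument of `LGBMRegressor.fit`, which accepts one weight per training row. Note that a SAP of exactly 58 keeps weight 1.0 under the strict `<` comparison stated in the commit.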

11 commits
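
The per-decile residual report described in commit fd8d71eb05 (slice 15e) might look something like the sketch below. It is inferred from the commit message alone: the bucket fields mirror those listed there, but the function name, the rank-based equal-count split, the residual sign convention (prediction minus truth), and the assumption that true values are non-zero are all illustrative choices, not the repository's confirmed behaviour.

```python
import numpy as np


def per_decile_residuals(y_true, y_pred, n_buckets=10):
    """Bucket rows by rank of the true value and summarise errors per bucket.

    Assumes y_true contains no zeros (MAPE divides by |y_true|).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sort row indices by true value, then cut into near-equal-sized groups.
    order = np.argsort(y_true, kind="stable")
    buckets = []
    for decile, idx in enumerate(np.array_split(order, n_buckets)):
        if idx.size == 0:
            continue  # fewer rows than buckets
        t, p = y_true[idx], y_pred[idx]
        resid = p - t  # positive means over-prediction (assumed convention)
        buckets.append({
            "decile": decile,
            "count": int(idx.size),
            "mape": float(np.mean(np.abs(resid) / np.abs(t))),
            "mae": float(np.mean(np.abs(resid))),
            "mean_residual": float(np.mean(resid)),
            "true_min": float(t.min()),
            "true_max": float(t.max()),
        })
    return buckets


# Example: a constant +1 error hurts MAPE most where the true value is small.
y_true = np.arange(1, 21, dtype=float)
report = per_decile_residuals(y_true, y_true + 1.0)
```

With a constant absolute error, MAE is identical in every bucket while MAPE is largest in decile 0, which is the kind of tail concentration the commit says a single global MAPE hides.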