Model/services/ml_training_data/tests/unit
Khalim Conn-Kowlessar 700ff4640c slice 16g: LightGBM objective=mape for sap_score + peui_ucl
Per ADR-0008: the v15 baseline reports MAPE but optimises MSE, which
under-weights tail rows. Switching to objective='mape' applies a gradient
proportional to 1/|y|, so training focuses on the rows MAPE penalises most.
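
To illustrate the 1/|y| scaling, here is a minimal sketch of the two per-row gradients in plain Python. It uses the textbook definitions of squared error and absolute percentage error; the function names are hypothetical, and LightGBM's own implementation may differ in detail (for example in how it treats |y| close to zero).

```python
def l2_gradient(y_true, y_pred):
    # Gradient of 0.5 * (y_pred - y_true)**2 with respect to y_pred.
    return y_pred - y_true


def mape_gradient(y_true, y_pred):
    # Gradient of |y_pred - y_true| / |y_true| with respect to y_pred:
    # sign(residual) / |y_true|. Undefined when y_true == 0.
    residual = y_pred - y_true
    sign = (residual > 0) - (residual < 0)
    return sign / abs(y_true)


# The same 10% relative error on a small and a large target value.
small_true, small_pred = 10.0, 11.0
large_true, large_pred = 100.0, 110.0

l2_small = l2_gradient(small_true, small_pred)
l2_large = l2_gradient(large_true, large_pred)
mape_small = mape_gradient(small_true, small_pred)
mape_large = mape_gradient(large_true, large_pred)
```

Under squared error the large-valued row contributes ten times the gradient of the small-valued row for the same relative miss. Under the percentage objective the magnitude depends only on 1/|y|, so low-valued rows are no longer drowned out.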

Targets co2_emissions, space_heating_kwh, hot_water_kwh, and peui_raw
retain the default 'regression' objective (some rows have ~zero CO2 from
heavy PV; MAPE objective destabilises near zero).
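
The per-target split described above could be expressed as a small lookup. This is a sketch only: the target names and the two objective strings come from this commit message, while `MAPE_TARGETS`, `L2_TARGETS`, `objective_for`, and `lgbm_params` are hypothetical names, not the ones used in the training module.

```python
# Targets switched to the MAPE objective in this slice.
MAPE_TARGETS = frozenset({"sap_score", "peui_ucl"})

# Targets that keep the default L2 objective because values can sit near zero.
L2_TARGETS = frozenset(
    {"co2_emissions", "space_heating_kwh", "hot_water_kwh", "peui_raw"}
)


def objective_for(target: str) -> str:
    """Return the LightGBM objective string for a target column."""
    if target in MAPE_TARGETS:
        return "mape"
    if target in L2_TARGETS:
        return "regression"
    # Fail loudly so a new target cannot silently pick up a default.
    raise KeyError(f"no objective configured for target {target!r}")


def lgbm_params(target: str) -> dict:
    """Build a parameter dict; only the objective varies in this sketch."""
    return {"objective": objective_for(target)}
```

A dict like this can be unpacked into an estimator, for example `lightgbm.LGBMRegressor(**lgbm_params("sap_score"))`, assuming the remaining hyperparameters are merged in elsewhere.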

Sample weights are deferred to slice 16i, and will be added only if slice
16h's per-decile residuals still show tail bias after the objective switch.
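
A per-decile residual check of the kind mentioned above might look like the following. This is an assumed shape, not slice 16h's actual code: rows are sorted by true value, split into ten equal-count bins, and the mean signed residual (prediction minus truth) is reported per bin. The function name is hypothetical.

```python
def per_decile_mean_residual(y_true, y_pred, n_bins=10):
    """Mean signed residual (pred - true) per equal-count bin of y_true."""
    pairs = sorted(zip(y_true, y_pred))  # order rows by true value
    n = len(pairs)
    means = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            # Fewer rows than bins: no estimate for this bin.
            means.append(None)
            continue
        means.append(sum(p - t for t, p in chunk) / len(chunk))
    return means


# Toy data: over-prediction by 1 everywhere except the top decile,
# which is under-predicted by 2 (a simple stand-in for tail bias).
y_true = [float(v) for v in range(1, 21)]
y_pred = [t + 1.0 if t <= 18 else t - 2.0 for t in y_true]
residuals_by_decile = per_decile_mean_residual(y_true, y_pred)
```

A mean residual that changes sign or grows in the first or last bins is the pattern that would suggest tail bias remains.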
2026-05-17 12:06:13 +00:00
File | Last commit | Date
__init__.py | slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) | 2026-05-16 17:39:43 +00:00
test_build_features.py | slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) | 2026-05-16 19:45:52 +00:00
test_bulk_zip_reader.py | slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) | 2026-05-16 19:45:52 +00:00
test_sample.py | slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) | 2026-05-16 17:39:43 +00:00
test_storage.py | slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP | 2026-05-16 18:27:24 +00:00
test_train_baseline.py | slice 16g: LightGBM objective=mape for sap_score + peui_ucl | 2026-05-17 12:06:13 +00:00
test_write_parquet.py | slice 14e: write_training_dataset emits parquet + schema.json + manifest.json | 2026-05-16 18:43:31 +00:00