Model/services/ml_training_data/tests/unit
Khalim Conn-Kowlessar ece1279475 revert slice 16g: drop mape objective per 16h ablation
The 250k retrain showed that objective='mape' worsens global sap_score
MAPE by ~0.6 percentage points (4.50% with mape vs 3.92% with
regression) and costs ~0.7 pts on peui_ucl. The mape objective
over-weights the low-SAP tail (weight ~1/y) and drags the body MAPE up
by more than it gains in the tail.
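
The ~1/y weighting can be seen from the per-sample gradients. This is a minimal sketch, not code from this repo: it assumes the 'regression' objective is plain squared error (the commit does not name the training library), and the function names and SAP values are illustrative only.

```python
def l2_gradient(y_true, y_pred):
    # Gradient of 0.5 * (y_pred - y_true)**2 with respect to y_pred.
    return y_pred - y_true


def mape_gradient(y_true, y_pred):
    # Gradient of |y_pred - y_true| / |y_true| with respect to y_pred:
    # the sign of the error scaled by 1/|y_true|.
    diff = y_pred - y_true
    sign = (diff > 0) - (diff < 0)
    return sign / abs(y_true)


# The same +2 point over-prediction at a low and a high SAP score
# (illustrative values, not taken from the training data).
l2_low, l2_high = l2_gradient(40.0, 42.0), l2_gradient(80.0, 82.0)
mape_low, mape_high = mape_gradient(40.0, 42.0), mape_gradient(80.0, 82.0)

# Squared error treats both rows alike; mape pulls twice as hard on the
# row whose target is half as large.
ratio = mape_low / mape_high
```

Under squared error the two rows contribute equal gradients, while under mape the SAP=40 row counts twice as much as the SAP=80 row, which is the tail over-weighting described above.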

Body MAPE on v16 features is already strong (2.38% on deciles 1-8); the
remaining tail bias at decile 0 (SAP<58, +3.1 bias) needs a different
fix -- sample weights or stratified loss -- queued as slice 16i.
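
One possible shape for the sample-weight option, as a sketch only: slice 16i is not implemented in this commit. The helper name `tail_sample_weights` and the `tail_weight` value are hypothetical; only the SAP<58 threshold comes from the message above.

```python
def tail_sample_weights(y_true, threshold=58.0, tail_weight=3.0):
    # Hypothetical scheme: rows with a target below the threshold get
    # tail_weight, all other rows get 1.0. The tail_weight default is an
    # arbitrary placeholder, not a tuned value.
    return [tail_weight if y < threshold else 1.0 for y in y_true]


# Illustrative SAP scores, not taken from the training data.
scores = [45.0, 57.0, 58.0, 72.0, 90.0]
weights = tail_sample_weights(scores)
```

Weights like these would typically be passed to the trainer as per-row sample weights while keeping the regression objective, so the tail gets extra emphasis by a controlled constant factor rather than by 1/y.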
2026-05-17 14:34:04 +00:00
__init__.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_build_features.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_bulk_zip_reader.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_sample.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_storage.py slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP 2026-05-16 18:27:24 +00:00
test_train_baseline.py revert slice 16g: drop mape objective per 16h ablation 2026-05-17 14:34:04 +00:00
test_write_parquet.py slice 14e: write_training_dataset emits parquet + schema.json + manifest.json 2026-05-16 18:43:31 +00:00