Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-07-27 23:35:01 +00:00

Khalim Conn-Kowlessar 6072d8795a	2026-05-17 14:48:00 +00:00
slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight

train_baseline now returns mae + rmse alongside mape/smape/r2. MAE is the user-facing metric ("predicted SAP within N points"); RMSE the quadratic counterpart. Both come straight from sklearn.

New sample_weight_fn parameter: callable(y_train) -> per-row weights. Threads into LGBMRegressor.fit's sample_weight argument. Default None preserves existing behaviour.

Default tail strategy exposed as low_sap_tail_weight(y, threshold=58, weight=3): 3x weight where SAP < 58. Threshold picked from slice 16h's per-decile residuals: decile 0 (SAP 1-58) carries 17% MAPE vs <5% body.

Three TDD tracers, all AAA.
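
The weighting and metrics described in the slice 16i message can be sketched roughly as below. This is not the repository's code: only the names low_sap_tail_weight and sample_weight_fn, and the defaults threshold=58 and weight=3, come from the commit message. The helpers mae_rmse and resolve_sample_weight are hypothetical, and the metrics are computed with NumPy here to keep the sketch dependency-light, whereas the commit says the real ones come from sklearn.

```python
import numpy as np


def low_sap_tail_weight(y, threshold=58, weight=3):
    """Per-row weights: `weight` where SAP < threshold, 1.0 elsewhere."""
    y = np.asarray(y, dtype=float)
    return np.where(y < threshold, float(weight), 1.0)


def mae_rmse(y_true, y_pred):
    """Hypothetical helper: MAE and RMSE via NumPy (stand-in for sklearn)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


def resolve_sample_weight(y_train, sample_weight_fn=None):
    """Hypothetical helper: None means unweighted, matching the stated default."""
    if sample_weight_fn is None:
        return None
    weights = np.asarray(sample_weight_fn(y_train), dtype=float)
    if weights.shape != np.asarray(y_train).shape:
        raise ValueError("sample_weight_fn must return one weight per row")
    return weights


# Illustrative values only, not data from the repository.
y_train = np.array([35.0, 57.0, 58.0, 72.0])
weights = resolve_sample_weight(y_train, low_sap_tail_weight)
metrics = mae_rmse([60.0, 70.0], [57.0, 74.0])
```

Under these assumptions, the resolved array (or None) is what would be handed to LGBMRegressor.fit as its sample_weight argument. Rows exactly at the threshold keep weight 1.0 because the comparison is strict.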
__init__.py	slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample)	2026-05-16 17:39:43 +00:00
test_build_features.py	slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload)	2026-05-16 19:45:52 +00:00
test_bulk_zip_reader.py	slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload)	2026-05-16 19:45:52 +00:00
test_sample.py	slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample)	2026-05-16 17:39:43 +00:00
test_storage.py	slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP	2026-05-16 18:27:24 +00:00
test_train_baseline.py	slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight	2026-05-17 14:48:00 +00:00
test_write_parquet.py	slice 14e: write_training_dataset emits parquet + schema.json + manifest.json	2026-05-16 18:43:31 +00:00