Model/services/ml_training_data/tests/unit
Khalim Conn-Kowlessar 611c07de94 slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload)
Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper
carries certificate_number, assessment_type, and a stringified document with
the actual EPC schema payload. Filter to RdSAP, unwrap document, then map.
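
The filter-and-unwrap step described above might look roughly like the sketch below. The field names `certificate_number`, `assessment_type`, and `document` come from the commit message; the function name `iter_rdsap_documents` is hypothetical, and comparing `assessment_type` against the literal string "RdSAP" is an assumption. The final mapping to features is left out because its shape is not described here.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_rdsap_documents(
    lines: Iterable[str],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (certificate_number, document) for each RdSAP wrapper record.

    `lines` is an NDJSON stream: one JSON wrapper object per line,
    not a single JSON array.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records

        wrapper = json.loads(raw)

        # Assumption: RdSAP records carry exactly this assessment_type value.
        if wrapper.get("assessment_type") != "RdSAP":
            continue

        document = wrapper["document"]
        # The payload is stringified JSON, so it needs a second decode.
        if isinstance(document, str):
            document = json.loads(document)

        yield wrapper["certificate_number"], document
```

Because the function is a generator over lines, it can be fed a file object opened on a ZIP member without loading the whole entry into memory.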

remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL).
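
A minimal sketch of per-entry refresh, assuming the URLs expire too quickly to be signed in one batch up front. The names `fetch_entries`, `sign`, and `download` are hypothetical and are not taken from `remote_bulk_fetcher`; the signing and download calls are injected so the sketch makes no claim about how the real module obtains its URLs.

```python
from typing import Callable, Iterable, Iterator, Tuple


def fetch_entries(
    entry_keys: Iterable[str],
    sign: Callable[[str], str],
    download: Callable[[str], bytes],
) -> Iterator[Tuple[str, bytes]]:
    """Download each entry using a URL signed immediately before use.

    Signing inside the loop keeps every URL well within a short TTL,
    however long the earlier downloads took.
    """
    for key in entry_keys:
        url = sign(key)  # fresh presigned URL for this entry only
        yield key, download(url)
```

Signing lazily costs one extra signing call per entry, but it avoids the failure mode where a URL signed at the start has expired by the time its entry is reached.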

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:45:52 +00:00
__init__.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_build_features.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_bulk_zip_reader.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_sample.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_storage.py slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP 2026-05-16 18:27:24 +00:00
test_train_baseline.py slice 14f: train_baseline fits LightGBM per target, emits MAPE/R^2 + importance 2026-05-16 18:47:49 +00:00
test_write_parquet.py slice 14e: write_training_dataset emits parquet + schema.json + manifest.json 2026-05-16 18:43:31 +00:00