Model/services/ml_training_data
Khalim Conn-Kowlessar 611c07de94 slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload)
Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper
carries certificate_number, assessment_type, and a stringified document with
the actual EPC schema payload. Filter to RdSAP, unwrap document, then map.

remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:45:52 +00:00
..
src/ml_training_data slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
tests slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
pyproject.toml slice 14g: remote_bulk_fetcher extracts ZIP entries via HTTP Range (no full download) 2026-05-16 19:16:52 +00:00