Model/services/ml_training_data/tests/unit
Khalim Conn-Kowlessar ac1aa56ab1 P2.1: extract predict_sap_for_cert; swap probe to SAP 10.2 spec prices
ADR-0010 P2: the cert-calibration layer is deleted, so the probe now uses
SAP_10_2_SPEC_PRICES (already defined in cert_to_inputs.py). This commit extracts
a pure predict_sap_for_cert(cert_document, *, prices) -> int helper
out of main()'s inline pipeline so the spec-prices path is unit-
testable in isolation; the helper is also reusable for P3's cohort-
filtered probe variant.
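A minimal sketch of the shape this refactor gives the helper: a pure function with a keyword-only `prices` argument, called from a thin `main()`. It assumes a dict-shaped cert document and a fuel-to-price mapping. `EXAMPLE_PRICES`, `_annual_cost`, and the cost-to-rating formula are hypothetical placeholders, not the SAP 10.2 calculation or the real SAP_10_2_SPEC_PRICES table in cert_to_inputs.py.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical stand-in for SAP_10_2_SPEC_PRICES (fuel -> price in p/kWh).
# Illustrative numbers only; the real table lives in cert_to_inputs.py.
EXAMPLE_PRICES: Mapping[str, float] = {"mains_gas": 3.48, "electricity": 13.19}


def _annual_cost(cert_document: Mapping[str, Any], prices: Mapping[str, float]) -> float:
    # Toy cost model (hypothetical): annual kWh per fuel times that fuel's price, in pounds.
    demand = cert_document["annual_kwh_by_fuel"]
    return sum(kwh * prices[fuel] / 100 for fuel, kwh in demand.items())


def predict_sap_for_cert(cert_document: Mapping[str, Any], *, prices: Mapping[str, float]) -> int:
    """Pure helper: no I/O, so the same inputs always give the same SAP.

    `prices` is keyword-only, so every call site has to name the price
    table it is using.
    """
    cost = _annual_cost(cert_document, prices)
    # Hypothetical cost-to-rating mapping, clamped to 1..100 for the sketch.
    score = 100 - cost / 50
    return max(1, min(100, round(score)))


def main() -> None:
    # main() keeps only loading/printing; the pipeline lives in the helper.
    cert = {"annual_kwh_by_fuel": {"mains_gas": 12000.0, "electricity": 2500.0}}
    print(predict_sap_for_cert(cert, prices=EXAMPLE_PRICES))


if __name__ == "__main__":
    main()
```

Because the helper takes its price table as an argument instead of reading it from module state, a unit test can call it with spec prices directly, and a cohort-filtered variant could reuse it unchanged. Passing the prices positionally, as in `predict_sap_for_cert(cert, EXAMPLE_PRICES)`, raises `TypeError`.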

The pinned regression value (SAP=67 for cert 6035-7729 under spec
prices, vs the cert's lodged SAP of 73 under cert-cal prices) lives
in services/ml_training_data/tests/unit/test_sap_parity_probe.py.
It will drift as P4 (PCDB) and the section sweep land their fixes;
that's expected.
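One way a pinned regression check like this might be written so that expected drift is easy to diagnose. This is a sketch, not the contents of test_sap_parity_probe.py: `assert_pinned_sap` is a hypothetical helper, and the predictor and price table are injected so it runs without the repo. Only the values 67 and 73 and the cert number come from this commit.

```python
from collections.abc import Callable, Mapping
from typing import Any

# Values quoted in this commit for cert 6035-7729.
PINNED_SAP_UNDER_SPEC_PRICES = 67  # current output under SAP 10.2 spec prices
LODGED_SAP = 73  # lodged under cert-cal prices; informational, not asserted


def assert_pinned_sap(
    predict: Callable[..., int],
    cert_document: Mapping[str, Any],
    prices: Mapping[str, float],
) -> None:
    """Fail with context when the pinned value moves.

    Drift is expected as P4 (PCDB) and the section sweep land, so the
    message says what changed rather than only that it changed.
    """
    actual = predict(cert_document, prices=prices)
    assert actual == PINNED_SAP_UNDER_SPEC_PRICES, (
        f"cert 6035-7729: expected SAP {PINNED_SAP_UNDER_SPEC_PRICES} under spec prices, "
        f"got {actual} (lodged SAP is {LODGED_SAP}); update the pin if the change is intended"
    )
```

In the real test, `predict` would presumably be `predict_sap_for_cert` and `prices` would be `SAP_10_2_SPEC_PRICES`; the lodged value is kept only as context in the failure message, since the two numbers are computed under different price tables.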

cert_calibration_prices is still imported by test_golden_fixtures.py,
and the table_12_cert_calibration module is still intact; P2.2/P2.3
retire both.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 09:51:42 +00:00
__init__.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_build_features.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_bulk_zip_reader.py slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) 2026-05-16 19:45:52 +00:00
test_sample.py slice 14a: ml_training_data pkg + sample.py (CSV filter + random sample) 2026-05-16 17:39:43 +00:00
test_sap_parity_probe.py P2.1: extract predict_sap_for_cert; swap probe to SAP 10.2 spec prices 2026-05-19 09:51:42 +00:00
test_storage.py slice 14c: BulkZipReader streams certs from gov bulk JSON ZIP 2026-05-16 18:27:24 +00:00
test_train_baseline.py slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight 2026-05-17 14:48:00 +00:00
test_write_parquet.py slice 14e: write_training_dataset emits parquet + schema.json + manifest.json 2026-05-16 18:43:31 +00:00