ADR-0010 P2: cert-calibration layer is deleted, the probe uses
SAP_10_2_SPEC_PRICES (already defined in cert_to_inputs.py). Extracts
a pure predict_sap_for_cert(cert_document, *, prices) -> int helper
out of main()'s inline pipeline so the spec-prices path is unit-
testable in isolation; the helper is also reusable for P3's cohort-
filtered probe variant.
The pinned regression value (SAP=67 for cert 6035-7729 under spec
prices, vs the cert's lodged SAP of 73 under cert-cal prices) lives
in services/ml_training_data/tests/unit/test_sap_parity_probe.py.
It will drift as P4 (PCDB) and the section sweep land their fixes;
that's expected.
cert_calibration_prices is still imported by test_golden_fixtures.py
and the table_12_cert_calibration module is intact. P2.2/P2.3 retire
those.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
train_baseline now returns mae + rmse alongside mape/smape/r2. MAE is the
user-facing metric ("predicted SAP within N points"); RMSE the quadratic
counterpart. Both come straight from sklearn.
New sample_weight_fn parameter: callable(y_train) -> per-row weights.
Threads into LGBMRegressor.fit's sample_weight argument. Default None
preserves existing behaviour.
Default tail strategy exposed as low_sap_tail_weight(y, threshold=58,
weight=3): 3x weight where SAP < 58. Threshold picked from slice 16h's
per-decile residuals — decile 0 (SAP 1-58) carries 17% MAPE vs <5% body.
Three TDD tracers, all AAA.
250k retrain showed objective='mape' loses ~0.6 percentage points of
global sap_score MAPE (3.92% with regression vs 4.50% with mape) and
~0.7 pts on peui_ucl. The mape objective over-weights the low-SAP tail
(weight ~1/y) and drags the body MAPE up by more than it gains in the
tail.
Body MAPE on v16 features is already strong (2.38% on deciles 1-8); the
remaining tail bias at decile 0 (SAP<58, +3.1 bias) needs a different
fix -- sample weights or stratified loss -- queued as slice 16i.
Per ADR-0008: the v15 baseline reports MAPE but optimises MSE, which
under-weights tail rows. Switching to objective='mape' applies gradient
proportional to 1/|y| and lets the model focus where MAPE penalises.
Targets co2_emissions, space_heating_kwh, hot_water_kwh, and peui_raw
retain the default 'regression' objective (some rows have ~zero CO2 from
heavy PV; MAPE objective destabilises near zero).
Sample weights deferred to slice 16i if slice 16h's per-decile residuals
still show tail bias after the objective switch.
Adds `_per_decile_residuals` and writes `residuals_<target>.json` next to
metrics.json. Buckets test-set rows by deciles of the true target value;
each bucket carries count + MAPE + MAE + mean residual + true_min/max.
Lets us tell whether errors concentrate in the tails of the true distribution
(e.g. SAP<40 / SAP>85) vs the mid-band — which the global MAPE alone hides.
Baseline for slice 16's MAPE-improvement ablations.
Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper
carries certificate_number, assessment_type, and a stringified document with
the actual EPC schema payload. Filter to RdSAP, unwrap document, then map.
remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ijson use_float fixes Decimal/float coercion when streaming JSON.
pyright extraPaths so the new pkg type-checks against domna-domain.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>