diff --git a/docs/HANDOVER_EPC_PREDICTION.md b/docs/HANDOVER_EPC_PREDICTION.md
new file mode 100644
index 00000000..74b7c5f4
--- /dev/null
+++ b/docs/HANDOVER_EPC_PREDICTION.md
@@ -0,0 +1,144 @@
+# EPC Prediction — handover
+
+Branch `feature/epc-prediction` @ `d8f015fb` (37 ahead of `origin/main`; local-only,
+not pushed). Tree clean. All ranked backlog (#1222–1228) closed.
+
+## What this is
+Deterministic **neighbour synthesis** that predicts a structured `EpcPropertyData`
+for an EPC-less UK home from its postcode-cohort of neighbours, so it flows through
+the modelling pipeline. NOT ML. Validation methodology + harness are built; the work
+is a measurable accuracy backlog.
+
+## READ FIRST (hold the full state)
+- Memory `project_epc_prediction` — the spine: design, every commit, metrics, the
+  open fronts, gotchas. Read it first.
+- `docs/adr/0029-…` (design, 6 forks) and `docs/adr/0030-…component-first.md`
+  (validation methodology — internalise: predict components, SAP/carbon/PE are a
+  calculator-floored *secondary* guard).
+- Memory `feedback_per_component_best_method` — THE load-bearing principle this
+  session established (see below).
+- Convention memories: `feedback_aaa_test_convention`,
+  `feedback_abs_diff_over_pytest_approx`, `feedback_commit_per_slice`,
+  `feedback_bigger_slices_for_uniform_work`.
+
+## The methodology (ADR-0030)
+- **Component Accuracy is the PRIMARY signal** — predicted vs API-actual components,
+  calculator-free. SAP/CO₂/PE vs lodged is SECONDARY and calculator-floored.
+- Source cohort keeps ALL cert vintages; only held-out validation TARGETS are
+  SAP 10.2 (`sap_version == 10.2`).
+- The committed **Tier-1 gate** (`tests/domain/epc_prediction/test_component_accuracy_gate.py`)
+  runs the calculator-free scorer over the frozen anonymised fixture
+  (`tests/fixtures/epc_prediction/`, 36 SAP-10.2 targets) and asserts per-component
+  ratchet floors. Deterministic → exact. **Tighten-only**: when you improve a
+  component, bump its floor in the same commit. A *mapper or fixture change*
+  re-baselines floors (not a regression) — document it.
+
+## THE PRINCIPLE that drove this session
+**Give each component its own best-fit synthesis method; never force one global
+mechanism on all of them.** Validated head-to-head on the harness:
+- Permanent fabric categoricals (wall, age) → **physical-similarity-weighted mode**
+  (size×age toward cohort centre).
+- Time-varying components (roof insulation, glazing) → **recency-weighted mode**.
+- Coherence-coupled cluster (heating) → **coherent whole-cluster donor**, NEVER
+  field-moded.
+- Point-estimate scalar (floor area) → **cohort median** (MAD-minimising).
+- Geo-varying components (age, wall, floor, glazing) → additionally **geo-proximity
+  weighted**; roof showed no geo signal → excluded.
+All live in `domain/epc_prediction/epc_prediction.py` as composable weight vectors
+(`_similarity_weights` × `_recency_weights` × `_geo_weights`, combined via `_combine`,
+fed to `_weighted_mode`).
+
+## Closed this session (#1222 was done before; #1223–1228 this session)
+- **#1226** per-prediction confidence (`PredictionConfidence`, compute-only;
+  agreement strongly predicts correctness, r=0.582).
+- **#1224** physical-similarity-weighted categorical mode (wall_insul/roof/floor +1–3pp).
+- **#1223** per-component, NOT a global recency template: floor-area→cohort median +
+  glazing→recency mode. (A global recency template was rejected — it disturbed the
+  coherence-coupled heating cluster.)
+- **#1225** coherent heating donor (modal signature = fuel+category+cylinder, recency
+  tie-break). Biggest SAP lever: control 66→74%, SAP MAE 7.08→6.00 pre-merge.
+- **#1228** PEI investigation — DISPROVED the unit-bug hypothesis (calc/lodged ratio
+  1.06); reframed as calc floor + prediction-sensitivity. Report now surfaces CO₂/PEI
+  calc floors. (Open calc-branch remnant; largely closed by the main merge — see below.)
+- **#1227** geo-proximity weighting — grilled, signal-checked (STRONG GO, esp. age),
+  built per-component. Batch `GeospatialRepository.coordinates_for_uprns`, coords
+  threaded onto `Comparable`/`PredictionTarget`, haversine kernel (`_GEO_SCALE_KM=0.1`,
+  gate-safe optimum). Intra-postcode lift modest (cohort = 1 postcode); the bigger
+  prize is cross-postcode expansion (deferred, needs dense corpus).
+- **Corpus grown 40→150 postcodes** (`6e9f8312`); roof-insulation ±1 reporting.
+- **Merged `origin/main`** (96 commits of calculator/mapper gap fixes, `0b2827e9`).
+
+## Current metrics (post-merge, 150-pc corpus, 514 SAP-10.2 targets)
+Component Accuracy (calculator-free): wall 91.2, wall_insul 79.0, age 57.2 (±1 84.7),
+roof_construction 78.2, floor_construction 79.6, heating_fuel 96.9, heating_category
+95.7, heating_control 73.9, water_fuel 96.3, water_code 95.3, has_cylinder 89.7,
+cylinder_insul 52.4, secondary 42.0, roof_insul 49.3 (±1 53.7), floor_insul 94.7,
+room_in_roof 96.5, glazing 67.3, pv 98.8, solar 99.8.
+
+Floor area: **MAE 10.48 m² / MAPE 13.2% / typical (median actual) 61 m²** (cohort
+median, unweighted).
+
+End-to-end vs lodged (SECONDARY, calculator-floored):
+SAP pred MAE 6.25 / **calc floor 0.95** (was 1.57 pre-merge, orig 3.25 — the calc
+fixes nearly validated the calculator, so the gap is now almost all prediction);
+CO₂ 0.61 / floor 0.18; PEI 39.6 / floor 13.7.
+
+## Key files
+- `domain/epc_prediction/epc_prediction.py` — `EpcPrediction.predict`: median floor
+  area + per-component weighted modes + glazing + heating donor + overrides.
+- `domain/epc_prediction/comparable_properties.py` — `select_comparables` ladder;
+  `Comparable`/`PredictionTarget` (carry `coordinates`).
+- `domain/epc_prediction/prediction_comparison.py` — `compare_prediction` (25 signals).
+- `domain/epc_prediction/validation.py` — `iter_predictions` + `evaluate_component_accuracy`
+  (one scorer, calculator-free).
+- `harness/epc_prediction_corpus.py` — `load_corpus` (+ `_coordinates.json` sidecar),
+  `load_coordinates`, `anonymise_payload`.
+- `repositories/geospatial/` — `GeospatialRepository.coordinates_for_uprns` (batch).
+- `scripts/validate_epc_prediction.py` (full report), `build_epc_prediction_fixture.py`,
+  `fetch_epc_prediction_corpus.py`, `fetch_corpus_coordinates.py`.
+
+## Open fronts (ranked)
+1. **Geo-weighted floor-area median** — measured quick win: MAE 10.48→**9.77**,
+   MAPE 13.2→12.2%. Swap `_median_floor_area` for a geo-weighted median (reuse
+   `_geo_weights`); gate-check + ratchet the floor_area ceiling. Smallest next slice.
+2. **Cross-postcode geo expansion** — the real geo payoff (distance-weighted cohort
+   beyond the single postcode). Needs a *densely-sampled* corpus (current 150 are
+   scattered, so a target's true geo-neighbours aren't in-corpus). Design grilled;
+   build a dense corpus first.
+3. **Slice-5 production wiring** — `ComparableProperties` repo + the
+   `ModellingOrchestrator` owning the EPC *estimation* + distance calcs (a deliberate
+   shift from ADR-0029, which put the fallback in Ingestion). WRITE AN ADR when this
+   lands (it reverses where the fallback lives). Add a provenance marker
+   (`EpcPropertyData` has no predicted/source field yet).
+4. Weak components with headroom only via NEW signals: age 57% / roof_insul 49%
+   (method-exhausted — confirmed recency/similarity/plain all tie-or-worse);
+   cylinder_insul / secondary are tiny-n.
+
+## How to run
+- Token + S3 creds: `set -a; . backend/.env; set +a` (AWS creds mounted at `~/.aws`).
+- Tests: `PYTHONPATH=. python -m pytest tests/domain/epc_prediction tests/harness/test_epc_prediction_corpus.py tests/repositories/geospatial -o addopts="" -p no:cacheprovider -q`
+- Full report: `PYTHONPATH=. python scripts/validate_epc_prediction.py` (corpus
+  `/tmp/epc_prediction_corpus`).
+- Gate is just a pytest test (deterministic, calculator-free).
+- pyright strict, zero new errors, on every touched file.
+
+## In-flight / gotchas
+- **Corpus lives in `/tmp/epc_prediction_corpus`** (gitignored; 150 pc / 3719 certs +
+  `_coordinates.json`). Backed up to `/workspaces/home/epc_prediction_corpus_backup`
+  (persistent host mount — survives container rebuild; `/tmp` does NOT). Coords backup
+  at `/workspaces/home/epc_prediction_corpus_coords_backup.json`. If `/tmp` is wiped,
+  restore from the backup before running the full report.
+- **Coordinates**: OS Open-UPRN parquet is `DATA_BUCKET/spatial/` (boto3 — s3fs NOT
+  installed; read via `get_object`→BytesIO; `boto3.client` needs
+  `# pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]`). The cert
+  payload carries `uprn` (the join key). The committed fixture ships `_coordinates.json`
+  (OGL OS OpenData) so the gate exercises geo without S3.
+- **NEVER commit** the API token, `/tmp` corpus, or the coords cache. The
+  `tests/fixtures/epc_prediction` one is anonymised + intentional.
+- Conventions: AAA test headers; `abs(x-y) <= tol` not `pytest.approx`; commit per
+  slice (stage by name, watch untracked); ADR-cite in commit messages; class is
+  `EpcPrediction` (no "Service").
+- Per-item workflow: implement TDD red→green on this branch → run the harness →
+  record before/after → ratchet gate floors → `gh issue comment` impact → close.
+- The merge is **local, not pushed** — push only if asked.
+- Update memory `project_epc_prediction` as state changes.