docs(epc-prediction): handover for the accuracy backlog + geo work

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-15 15:12:00 +00:00 · 2026-06-15 15:12:00 +00:00 · da3fc92d53
commit da3fc92d53
parent d8f015fb0e
1 changed files with 144 additions and 0 deletions
--- a/docs/HANDOVER_EPC_PREDICTION.md
+++ b/docs/HANDOVER_EPC_PREDICTION.md
@@ -0,0 +1,144 @@
+# EPC Prediction — handover
+
+Branch `feature/epc-prediction` @ `d8f015fb` (37 ahead of `origin/main`; local-only,
+not pushed). Tree clean. All ranked backlog items (#1222–1228) closed.
+
+## What this is
+Deterministic **neighbour synthesis** that predicts a structured `EpcPropertyData`
+for an EPC-less UK home from its postcode-cohort of neighbours, so it flows through
+the modelling pipeline. NOT ML. Validation methodology + harness are built; the work
+is a measurable accuracy backlog.
+
+## READ FIRST (hold the full state)
+- Memory `project_epc_prediction` — the spine: design, every commit, metrics, the
+  open fronts, gotchas. Read it first.
+- `docs/adr/0029-…` (design, 6 forks) and `docs/adr/0030-…component-first.md`
+  (validation methodology — internalise: predict components, SAP/carbon/PE are a
+  calculator-floored *secondary* guard).
+- Memory `feedback_per_component_best_method` — THE load-bearing principle this
+  session established (see below).
+- Convention memories: `feedback_aaa_test_convention`,
+  `feedback_abs_diff_over_pytest_approx`, `feedback_commit_per_slice`,
+  `feedback_bigger_slices_for_uniform_work`.
+
+## The methodology (ADR-0030)
+- **Component Accuracy is the PRIMARY signal** — predicted vs API-actual components,
+  calculator-free. SAP/CO₂/PE vs lodged is SECONDARY and calculator-floored.
+- Source cohort keeps ALL cert vintages; only held-out validation TARGETS are
+  SAP 10.2 (`sap_version == 10.2`).
+- The committed **Tier-1 gate** (`tests/domain/epc_prediction/test_component_accuracy_gate.py`)
+  runs the calculator-free scorer over the frozen anonymised fixture
+  (`tests/fixtures/epc_prediction/`, 36 SAP-10.2 targets) and asserts per-component
+  ratchet floors. The scorer is deterministic, so the floor assertions are exact.
+  **Tighten-only**: when you improve a component, bump its floor in the same commit.
+  A *mapper or fixture change* legitimately re-baselines floors (it is not a
+  regression); document it.
+
+## THE PRINCIPLE that drove this session
+**Give each component its own best-fit synthesis method; never force one global
+mechanism on all of them.** Validated head-to-head on the harness:
+- Permanent fabric categoricals (wall, age) → **physical-similarity-weighted mode**
+  (size×age toward cohort centre).
+- Time-varying components (roof insulation, glazing) → **recency-weighted mode**.
+- Coherence-coupled cluster (heating) → **coherent whole-cluster donor**, NEVER
+  field-moded.
+- Point-estimate scalar (floor area) → **cohort median** (the median minimises mean
+  absolute deviation).
+- Geo-varying components (age, wall, floor, glazing) → additionally **geo-proximity
+  weighted**; roof showed no geo signal → excluded.
+All live in `domain/epc_prediction/epc_prediction.py` as composable weight vectors
+(`_similarity_weights` × `_recency_weights` × `_geo_weights`, combined via `_combine`,
+fed to `_weighted_mode`).
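+
+As a reading aid, the composition above can be written out as follows. This is a
+sketch under two assumptions not confirmed here: that `_combine` multiplies the
+weight vectors element-wise (the `×` above suggests it), and that `_weighted_mode`
+returns the category with the largest summed weight. The symbols $w_i$, $c_i$ and
+the three-neighbour numbers are illustrative only.
+
+```latex
+w_i = w_i^{\mathrm{sim}} \, w_i^{\mathrm{rec}} \, w_i^{\mathrm{geo}},
+\qquad
+\hat{c} = \arg\max_{c} \sum_{i=1}^{n} w_i \, \mathbf{1}[c_i = c]
+
+% Illustrative cohort: c = (cavity, cavity, solid), w = (0.2, 0.3, 0.6)
+\sum_{i:\,c_i=\text{cavity}} w_i = 0.2 + 0.3 = 0.5
+\;<\;
+\sum_{i:\,c_i=\text{solid}} w_i = 0.6
+\;\Longrightarrow\; \hat{c} = \text{solid}
+```
+
+Here the unweighted mode would be cavity (two votes to one); the weights flip it. A
+component that skips one weighting (roof has no geo term) can be read as taking that
+factor as 1 for every neighbour.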
+
+## Closed this session (#1222 was done before; #1223–1228 this session)
+- **#1226** per-prediction confidence (`PredictionConfidence`, compute-only;
+  agreement strongly predicts correctness, r=0.582).
+- **#1224** physical-similarity-weighted categorical mode (wall_insul/roof/floor +1–3pp).
+- **#1223** per-component, NOT a global recency template: floor-area→cohort median +
+  glazing→recency mode. (A global recency template was rejected — it disturbed the
+  coherence-coupled heating cluster.)
+- **#1225** coherent heating donor (modal signature = fuel+category+cylinder, recency
+  tie-break). Biggest SAP lever: heating_control accuracy 66→74%, SAP MAE 7.08→6.00
+  (both pre-merge).
+- **#1228** PEI investigation — DISPROVED the unit-bug hypothesis (calc/lodged ratio
+  1.06); reframed as calc floor + prediction-sensitivity. Report now surfaces CO₂/PEI
+  calc floors. (Open calc-branch remnant; largely closed by the main merge — see below.)
+- **#1227** geo-proximity weighting — grilled, signal-checked (STRONG GO, esp. age),
+  built per-component. Batch `GeospatialRepository.coordinates_for_uprns`, coords
+  threaded onto `Comparable`/`PredictionTarget`, haversine kernel (`_GEO_SCALE_KM=0.1`,
+  gate-safe optimum). Intra-postcode lift modest (cohort = 1 postcode); the bigger
+  prize is cross-postcode expansion (deferred, needs dense corpus).
+- **Corpus grown 40→150 postcodes** (`6e9f8312`); roof-insulation ±1 reporting.
+- **Merged `origin/main`** (96 commits of calculator/mapper gap fixes, `0b2827e9`).
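+
+For the #1227 kernel, the distance itself is the standard haversine great-circle
+formula. The decay that turns distance into a weight is not spelled out in this
+handover, so the exponential form below is an assumption chosen for illustration;
+only the scale `_GEO_SCALE_KM = 0.1` comes from the notes above. $R \approx 6371$ km
+is the mean Earth radius; $\varphi$, $\lambda$ are latitude and longitude in radians,
+with subscript $t$ for the target and $i$ for a neighbour.
+
+```latex
+d_i = 2R \arcsin \sqrt{
+  \sin^2\!\frac{\varphi_i - \varphi_t}{2}
+  + \cos\varphi_t \cos\varphi_i \, \sin^2\!\frac{\lambda_i - \lambda_t}{2}
+}
+
+% Assumed kernel shape (not confirmed by the source); s = _GEO_SCALE_KM = 0.1 km
+w_i^{\mathrm{geo}} = \exp\!\left(-\frac{d_i}{s}\right)
+
+% Under that assumption:
+d_i = 0.1\ \text{km} \Rightarrow w_i^{\mathrm{geo}} = e^{-1} \approx 0.37,
+\qquad
+d_i = 0.3\ \text{km} \Rightarrow w_i^{\mathrm{geo}} = e^{-3} \approx 0.05
+```
+
+With a scale this short, neighbours more than a few hundred metres away contribute
+almost nothing under the assumed kernel.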
+
+## Current metrics (post-merge, 150-pc corpus, 514 SAP-10.2 targets)
+Component Accuracy (calculator-free, %): wall 91.2, wall_insul 79.0, age 57.2 (±1 84.7),
+roof_construction 78.2, floor_construction 79.6, heating_fuel 96.9, heating_category
+95.7, heating_control 73.9, water_fuel 96.3, water_code 95.3, has_cylinder 89.7,
+cylinder_insul 52.4, secondary 42.0, roof_insul 49.3 (±1 53.7), floor_insul 94.7,
+room_in_roof 96.5, glazing 67.3, pv 98.8, solar 99.8.
+
+Floor area: **MAE 10.48 m² / MAPE 13.2% / typical (median actual) 61 m²** (cohort
+median, unweighted).
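+
+For reference, a sketch of the two floor-area error metrics, assuming the harness
+uses the standard definitions ($\hat{a}_i$ predicted and $a_i$ actual floor area over
+the $n$ held-out targets; the symbols are illustrative):
+
+```latex
+\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{a}_i - a_i \rvert,
+\qquad
+\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\lvert \hat{a}_i - a_i \rvert}{a_i}
+
+% MAPE averages per-target ratios, so it need not equal MAE / median:
+\frac{10.48}{61} \approx 17.2\% \;\neq\; 13.2\%
+```
+
+A MAPE below MAE/median would be consistent with the larger absolute errors sitting
+on larger homes, though that breakdown is not measured here.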
+
+End-to-end vs lodged (SECONDARY, calculator-floored):
+SAP pred MAE 6.25 / **calc floor 0.95** (was 1.57 pre-merge, 3.25 originally; the
+calc fixes bring the calculator close to lodged values, so the remaining gap is now
+almost all prediction error); CO₂ 0.61 / floor 0.18; PEI 39.6 / floor 13.7.
+
+## Key files
+- `domain/epc_prediction/epc_prediction.py` — `EpcPrediction.predict`: median floor
+  area + per-component weighted modes + glazing + heating donor + overrides.
+- `domain/epc_prediction/comparable_properties.py` — `select_comparables` ladder;
+  `Comparable`/`PredictionTarget` (carry `coordinates`).
+- `domain/epc_prediction/prediction_comparison.py` — `compare_prediction` (25 signals).
+- `domain/epc_prediction/validation.py` — `iter_predictions` + `evaluate_component_accuracy`
+  (one scorer, calculator-free).
+- `harness/epc_prediction_corpus.py` — `load_corpus` (+ `_coordinates.json` sidecar),
+  `load_coordinates`, `anonymise_payload`.
+- `repositories/geospatial/` — `GeospatialRepository.coordinates_for_uprns` (batch).
+- `scripts/validate_epc_prediction.py` (full report), `build_epc_prediction_fixture.py`,
+  `fetch_epc_prediction_corpus.py`, `fetch_corpus_coordinates.py`.
+
+## Open fronts (ranked)
+1. **Geo-weighted floor-area median** — measured quick win: MAE 10.48→**9.77**,
+   MAPE 13.2→12.2%. Swap `_median_floor_area` for a geo-weighted median (reuse
+   `_geo_weights`); gate-check + ratchet the floor_area ceiling. Smallest next slice.
+2. **Cross-postcode geo expansion** — the real geo payoff (distance-weighted cohort
+   beyond the single postcode). Needs a *densely-sampled* corpus (current 150 are
+   scattered, so a target's true geo-neighbours aren't in-corpus). Design grilled;
+   build a dense corpus first.
+3. **Slice-5 production wiring** — `ComparableProperties` repo + the
+   `ModellingOrchestrator` owning the EPC *estimation* + distance calcs (a deliberate
+   shift from ADR-0029, which put the fallback in Ingestion). WRITE AN ADR when this
+   lands (it reverses where the fallback lives). Add a provenance marker
+   (`EpcPropertyData` has no predicted/source field yet).
+4. Weak components with headroom only via NEW signals: age 57% / roof_insul 49%
+   (method-exhausted — confirmed recency/similarity/plain all tie-or-worse);
+   cylinder_insul / secondary are tiny-n.
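+
+For front 1, one common definition of a weighted median is sketched below. It assumes
+the lower-weighted-median convention and non-negative weights from `_geo_weights`;
+the actual replacement for `_median_floor_area` may choose differently. The numbers
+are made up for illustration.
+
+```latex
+\tilde{a} = \min \Bigl\{ a_k : \sum_{i:\, a_i \le a_k} w_i \;\ge\; \tfrac{1}{2} \sum_{i=1}^{n} w_i \Bigr\}
+
+% Illustrative cohort: a = (50, 60, 90) m^2, w = (0.5, 0.2, 0.1), total weight 0.8
+\sum_{i:\,a_i \le 50} w_i = 0.5 \;\ge\; 0.4
+\;\Longrightarrow\; \tilde{a} = 50\ \text{m}^2
+\quad (\text{unweighted median: } 60\ \text{m}^2)
+```
+
+This choice minimises $\sum_i w_i \lvert a_i - m \rvert$ over $m$, the weighted
+analogue of the plain median's absolute-error property.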
+
+## How to run
+- Token + S3 creds: `set -a; . backend/.env; set +a` (AWS creds mounted at `~/.aws`).
+- Tests: `PYTHONPATH=. python -m pytest tests/domain/epc_prediction tests/harness/test_epc_prediction_corpus.py tests/repositories/geospatial -o addopts="" -p no:cacheprovider -q`
+- Full report: `PYTHONPATH=. python scripts/validate_epc_prediction.py` (corpus
+  `/tmp/epc_prediction_corpus`).
+- Gate is just a pytest test (deterministic, calculator-free).
+- pyright strict, zero new errors, on every touched file.
+
+## In-flight / gotchas
+- **Corpus lives in `/tmp/epc_prediction_corpus`** (gitignored; 150 pc / 3719 certs +
+  `_coordinates.json`). Backed up to `/workspaces/home/epc_prediction_corpus_backup`
+  (persistent host mount — survives container rebuild; `/tmp` does NOT). Coords backup
+  at `/workspaces/home/epc_prediction_corpus_coords_backup.json`. If `/tmp` is wiped,
+  restore from the backup before running the full report.
+- **Coordinates**: the OS Open-UPRN parquet lives under `DATA_BUCKET/spatial/`. Read it
+  with boto3 via `get_object`→BytesIO (s3fs is NOT installed); `boto3.client` needs
+  `# pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]`. The cert
+  payload carries `uprn` (the join key). The committed fixture ships `_coordinates.json`
+  (OGL OS OpenData) so the gate exercises geo without S3.
+- **NEVER commit** the API token, the `/tmp` corpus, or the coords cache. The fixture
+  under `tests/fixtures/epc_prediction` is the exception: it is anonymised and
+  committed intentionally.
+- Conventions: AAA test headers; `abs(x-y) <= tol` not `pytest.approx`; commit per
+  slice (stage by name, watch untracked); ADR-cite in commit messages; class is
+  `EpcPrediction` (no "Service").
+- Per-item workflow: implement TDD red→green on this branch → run the harness →
+  record before/after → ratchet gate floors → `gh issue comment` impact → close.
+- The merge is **local, not pushed** — push only if asked.
+- Update memory `project_epc_prediction` as state changes.