docs(epc-prediction): handover for the accuracy backlog + geo work

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-15 15:12:00 +00:00
parent d8f015fb0e
commit da3fc92d53

@ -0,0 +1,144 @@
# EPC Prediction — handover
Branch `feature/epc-prediction` @ `d8f015fb` (37 ahead of `origin/main`; local-only,
not pushed). Tree clean. All ranked backlog (#1222 to #1228) closed.
## What this is
Deterministic **neighbour synthesis** that predicts a structured `EpcPropertyData`
for an EPC-less UK home from its postcode-cohort of neighbours, so it flows through
the modelling pipeline. NOT ML. Validation methodology + harness are built; the work
is a measurable accuracy backlog.
## READ FIRST (hold the full state)
- Memory `project_epc_prediction` — the spine: design, every commit, metrics, the
open fronts, gotchas. Read it first.
- `docs/adr/0029-…` (design, 6 forks) and `docs/adr/0030-…component-first.md`
(validation methodology — internalise: predict components, SAP/carbon/PE are a
calculator-floored *secondary* guard).
- Memory `feedback_per_component_best_method` — THE load-bearing principle this
session established (see below).
- Convention memories: `feedback_aaa_test_convention`,
`feedback_abs_diff_over_pytest_approx`, `feedback_commit_per_slice`,
`feedback_bigger_slices_for_uniform_work`.
## The methodology (ADR-0030)
- **Component Accuracy is the PRIMARY signal** — predicted vs API-actual components,
calculator-free. SAP/CO₂/PE vs lodged is SECONDARY and calculator-floored.
- Source cohort keeps ALL cert vintages; only held-out validation TARGETS are
SAP 10.2 (`sap_version == 10.2`).
- The committed **Tier-1 gate** (`tests/domain/epc_prediction/test_component_accuracy_gate.py`)
runs the calculator-free scorer over the frozen anonymised fixture
(`tests/fixtures/epc_prediction/`, 36 SAP-10.2 targets) and asserts per-component
ratchet floors. Deterministic → exact. **Tighten-only**: when you improve a
component, bump its floor in the same commit. A *mapper or fixture change*
re-baselines floors (not a regression) — document it.
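
The tighten-only ratchet described above can be sketched roughly as follows. This is a minimal illustration, not the committed gate: `FLOORS`, `check_floors`, and the sample numbers are hypothetical, and it assumes the calculator-free scorer yields a dict of per-component accuracy percentages.

```python
# Hypothetical ratchet floors: per-component minimum accuracy (percent).
# These values are illustrative only, not the committed floors.
FLOORS: dict[str, float] = {"wall": 90.0, "glazing": 65.0}


def check_floors(scores: dict[str, float], floors: dict[str, float]) -> list[str]:
    """Return one message per component that is missing or below its floor."""
    failures: list[str] = []
    for component, floor in sorted(floors.items()):
        score = scores.get(component)
        if score is None:
            failures.append(f"{component}: no score reported")
        elif score < floor:
            failures.append(f"{component}: {score:.1f} < floor {floor:.1f}")
    return failures


def test_component_floors_hold() -> None:
    # Arrange: stand-in for the scorer's output over the frozen fixture.
    scores = {"wall": 91.2, "glazing": 67.3}
    # Act
    failures = check_floors(scores, FLOORS)
    # Assert
    assert failures == []
```

Under this shape, improving a component means raising its entry in `FLOORS` in the same commit, so a later regression fails the test.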
## THE PRINCIPLE that drove this session
**Give each component its own best-fit synthesis method; never force one global
mechanism on all of them.** Validated head-to-head on the harness:
- Permanent fabric categoricals (wall, age) → **physical-similarity-weighted mode**
(size×age toward cohort centre).
- Time-varying components (roof insulation, glazing) → **recency-weighted mode**.
- Coherence-coupled cluster (heating) → **coherent whole-cluster donor**, NEVER
field-moded.
- Point-estimate scalar (floor area) → **cohort median** (MAD-minimising).
- Geo-varying components (age, wall, floor, glazing) → additionally **geo-proximity
weighted**; roof showed no geo signal → excluded.
All live in `domain/epc_prediction/epc_prediction.py` as composable weight vectors
(`_similarity_weights` × `_recency_weights` × `_geo_weights`, combined via `_combine`,
fed to `_weighted_mode`).
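
As a rough sketch of how composable weight vectors could feed a weighted mode: the functions below are stand-ins written for illustration, assuming `_combine` is an element-wise product (as the "×" above suggests) and that ties go to the first value seen. The real `_combine` and `_weighted_mode` may differ in signature and tie-breaking.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def combine(*weight_vectors: Sequence[float]) -> list[float]:
    """Element-wise product of equal-length weight vectors (assumed behaviour)."""
    if not weight_vectors:
        raise ValueError("need at least one weight vector")
    length = len(weight_vectors[0])
    if any(len(vector) != length for vector in weight_vectors):
        raise ValueError("weight vectors must have equal length")
    combined = [1.0] * length
    for vector in weight_vectors:
        combined = [c * w for c, w in zip(combined, vector)]
    return combined


def weighted_mode(values: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the value with the largest summed weight; first seen wins ties."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    totals: dict[Hashable, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return max(totals, key=totals.__getitem__)


# Illustrative cohort: an unweighted mode would tie 2 vs 2.
walls = ["cavity", "solid", "cavity", "solid"]
similarity = [0.2, 0.9, 0.2, 0.8]  # hypothetical physical-similarity weights
recency = [1.0, 1.0, 1.0, 1.0]     # neutral: recency not used for this component
predicted_wall = weighted_mode(walls, combine(similarity, recency))
```

Passing a neutral vector of ones is one way to switch a weighting off per component without changing the call shape.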
## Closed this session (#1222 was done before; #1223 to #1228 this session)
- **#1226** per-prediction confidence (`PredictionConfidence`, compute-only;
agreement strongly predicts correctness, r=0.582).
- **#1224** physical-similarity-weighted categorical mode (wall_insul/roof/floor +13pp).
- **#1223** per-component, NOT a global recency template: floor-area→cohort median +
glazing→recency mode. (A global recency template was rejected — it disturbed the
coherence-coupled heating cluster.)
- **#1225** coherent heating donor (modal signature = fuel+category+cylinder, recency
tie-break). Biggest SAP lever: control 66→74%, SAP MAE 7.08→6.00 pre-merge.
- **#1228** PEI investigation — DISPROVED the unit-bug hypothesis (calc/lodged ratio
1.06); reframed as calc floor + prediction-sensitivity. Report now surfaces CO₂/PEI
calc floors. (Open calc-branch remnant; largely closed by the main merge — see below.)
- **#1227** geo-proximity weighting — grilled, signal-checked (STRONG GO, esp. age),
built per-component. Batch `GeospatialRepository.coordinates_for_uprns`, coords
threaded onto `Comparable`/`PredictionTarget`, haversine kernel (`_GEO_SCALE_KM=0.1`,
gate-safe optimum). Intra-postcode lift modest (cohort = 1 postcode); the bigger
prize is cross-postcode expansion (deferred, needs dense corpus).
- **Corpus grown 40→150 postcodes** (`6e9f8312`); roof-insulation ±1 reporting.
- **Merged `origin/main`** (96 commits of calculator/mapper gap fixes, `0b2827e9`).
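
The haversine kernel noted under #1227 might look roughly like the sketch below. The exponential decay shape, the `(latitude, longitude)` degree tuples, and the function names are assumptions for illustration; only the `0.1` km scale comes from this handover.

```python
import math

GEO_SCALE_KM = 0.1           # value quoted for `_GEO_SCALE_KM` in this handover
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

Coordinate = tuple[float, float]  # assumed (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against rounding pushing sqrt(h) fractionally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def geo_weights(
    target: Coordinate,
    neighbours: list[Coordinate],
    scale_km: float = GEO_SCALE_KM,
) -> list[float]:
    """Exponential-decay weight per neighbour (assumed kernel shape)."""
    return [math.exp(-haversine_km(target, n) / scale_km) for n in neighbours]
```

With a 0.1 km scale, a neighbour about 100 m away gets roughly 1/e of the weight of a co-located one, which is consistent with the lift being modest inside a single postcode.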
## Current metrics (post-merge, 150-pc corpus, 514 SAP-10.2 targets)
Component Accuracy (calculator-free): wall 91.2, wall_insul 79.0, age 57.2 (±1 84.7),
roof_construction 78.2, floor_construction 79.6, heating_fuel 96.9, heating_category
95.7, heating_control 73.9, water_fuel 96.3, water_code 95.3, has_cylinder 89.7,
cylinder_insul 52.4, secondary 42.0, roof_insul 49.3 (±1 53.7), floor_insul 94.7,
room_in_roof 96.5, glazing 67.3, pv 98.8, solar 99.8.
Floor area: **MAE 10.48 m² / MAPE 13.2% / typical (median actual) 61 m²** (cohort
median, unweighted).
End-to-end vs lodged (SECONDARY, calculator-floored):
SAP pred MAE 6.25 / **calc floor 0.95** (was 1.57 pre-merge, orig 3.25 — the calc
fixes nearly validated the calculator, so the gap is now almost all prediction);
CO₂ 0.61 / floor 0.18; PEI 39.6 / floor 13.7.
## Key files
- `domain/epc_prediction/epc_prediction.py`: `EpcPrediction.predict` (median floor
area + per-component weighted modes + glazing + heating donor + overrides).
- `domain/epc_prediction/comparable_properties.py`: `select_comparables` ladder;
`Comparable`/`PredictionTarget` (carry `coordinates`).
- `domain/epc_prediction/prediction_comparison.py`: `compare_prediction` (25 signals).
- `domain/epc_prediction/validation.py`: `iter_predictions` + `evaluate_component_accuracy`
(one scorer, calculator-free).
- `harness/epc_prediction_corpus.py`: `load_corpus` (+ `_coordinates.json` sidecar),
`load_coordinates`, `anonymise_payload`.
- `repositories/geospatial/`: `GeospatialRepository.coordinates_for_uprns` (batch).
- `scripts/validate_epc_prediction.py` (full report), `build_epc_prediction_fixture.py`,
`fetch_epc_prediction_corpus.py`, `fetch_corpus_coordinates.py`.
## Open fronts (ranked)
1. **Geo-weighted floor-area median** — measured quick win: MAE 10.48→**9.77**,
MAPE 13.2→12.2%. Swap `_median_floor_area` for a geo-weighted median (reuse
`_geo_weights`); gate-check + ratchet the floor_area ceiling. Smallest next slice.
2. **Cross-postcode geo expansion** — the real geo payoff (distance-weighted cohort
beyond the single postcode). Needs a *densely-sampled* corpus (current 150 are
scattered, so a target's true geo-neighbours aren't in-corpus). Design grilled;
build a dense corpus first.
3. **Slice-5 production wiring**: `ComparableProperties` repo + the
`ModellingOrchestrator` owning the EPC *estimation* + distance calcs (a deliberate
shift from ADR-0029, which put the fallback in Ingestion). WRITE AN ADR when this
lands (it reverses where the fallback lives). Add a provenance marker
(`EpcPropertyData` has no predicted/source field yet).
4. Weak components with headroom only via NEW signals: age 57% / roof_insul 49%
(method-exhausted — confirmed recency/similarity/plain all tie-or-worse);
cylinder_insul / secondary are tiny-n.
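
For open front 1, a geo-weighted median could be sketched as below. This is an illustrative lower weighted median, not the existing `_median_floor_area`; the function name and the sample areas and weights are hypothetical.

```python
from collections.abc import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: the smallest value whose cumulative weight
    reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    if any(weight < 0 for weight in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return max(values)  # only reachable through float rounding


# Hypothetical cohort floor areas (m²) and geo weights favouring the last two.
areas = [55.0, 61.0, 70.0, 120.0]
unweighted = weighted_median(areas, [1.0, 1.0, 1.0, 1.0])
geo_weighted = weighted_median(areas, [0.1, 0.1, 1.0, 1.0])
```

Note that with equal weights and an even count this returns the lower middle value rather than the average of the two middle values, so the unweighted result can differ slightly from `statistics.median`; that difference would need checking against the gate before any swap.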
## How to run
- Token + S3 creds: `set -a; . backend/.env; set +a` (AWS creds mounted at `~/.aws`).
- Tests: `PYTHONPATH=. python -m pytest tests/domain/epc_prediction tests/harness/test_epc_prediction_corpus.py tests/repositories/geospatial -o addopts="" -p no:cacheprovider -q`
- Full report: `PYTHONPATH=. python scripts/validate_epc_prediction.py` (corpus
`/tmp/epc_prediction_corpus`).
- Gate is just a pytest test (deterministic, calculator-free).
- pyright strict, zero new errors, on every touched file.
## In-flight / gotchas
- **Corpus lives in `/tmp/epc_prediction_corpus`** (gitignored; 150 pc / 3719 certs +
`_coordinates.json`). Backed up to `/workspaces/home/epc_prediction_corpus_backup`
(persistent host mount — survives container rebuild; `/tmp` does NOT). Coords backup
at `/workspaces/home/epc_prediction_corpus_coords_backup.json`. If `/tmp` is wiped,
restore from the backup before running the full report.
- **Coordinates**: OS Open-UPRN parquet is under `DATA_BUCKET/spatial/` (boto3; s3fs NOT
installed; read via `get_object`→BytesIO; `boto3.client` needs
`# pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]`). The cert
payload carries `uprn` (the join key). The committed fixture ships `_coordinates.json`
(OGL OS OpenData) so the gate exercises geo without S3.
- **NEVER commit** the API token, `/tmp` corpus, or the coords cache. The
`tests/fixtures/epc_prediction` one is anonymised + intentional.
- Conventions: AAA test headers; `abs(x-y) <= tol` not `pytest.approx`; commit per
slice (stage by name, watch untracked); ADR-cite in commit messages; class is
`EpcPrediction` (no "Service").
- Per-item workflow: implement TDD red→green on this branch → run the harness →
record before/after → ratchet gate floors → `gh issue comment` impact → close.
- The merge is **local, not pushed** — push only if asked.
- Update memory `project_epc_prediction` as state changes.
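
The `get_object`→BytesIO read mentioned in the Coordinates gotcha can be sketched as follows. The helper name is hypothetical, and the client is passed in rather than constructed so the snippet stays runnable without AWS access; the bucket and key are left to the caller because the object name is not recorded here.

```python
import io
from typing import Any


def fetch_object_buffer(client: Any, bucket: str, key: str) -> io.BytesIO:
    """Download one S3 object fully into memory and return a seekable buffer.

    `client` is expected to behave like `boto3.client("s3")`: `get_object`
    returns a mapping whose "Body" entry exposes `.read()`.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    return io.BytesIO(response["Body"].read())
```

The returned buffer can then be handed to any parquet reader that accepts a file-like object (for example `pandas.read_parquet`, if pandas with a parquet engine is available in the environment, which is an assumption here).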