Merge pull request #1238 from Hestia-Homes/feature/epc-prediction

Feature/epc prediction
2026-06-30 13:10:47 +00:00 · 2026-06-16 21:58:40 +08:00 · 2026-06-16 21:58:40 +08:00 · 90bed458f4
commit 90bed458f4
parent e1e570fdc7 a43c03ed94
332 changed files with 5804 additions and 42 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -62,9 +62,13 @@ A structured dataclass (`domain.addresses.user_address.UserAddress`) capturing a
 _Avoid_: user input, raw address, user_inputed_address

 **Comparable Properties**:
-The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection.
+The reference cohort matched to a target Property, used by **EPC Prediction** for gap-filling and anomaly detection. Selected by a **filter-then-relax ladder**: hard-filter on identity (property type, built form) and on any known **Landlord Override** (e.g. a known solid-brick wall) while at least *k* comparables survive; when the cohort is sparse, widen the geographic scope (postcode → postcode-prefix) or demote a known field from a hard filter to a weight. Survivors are weighted for prediction by **geographic proximity × recency × physical similarity**: closer, newer (newer EPCs are higher quality) and more-similar comparables count more. **All cert vintages are kept** as source evidence. A building's physical components (wall / roof / floor / heating fuel / age) are agnostic of the SAP methodology that rated them, so a pre-SAP10 neighbour is valid evidence; recency is a graduated **weight**, never a hard drop, and matters most for the one component that genuinely goes stale (the heating system, when a boiler is replaced). Only the **validation target** is restricted to SAP 10.2 (see [[validation-cohort]] / [[sap-spec-version]]), because performance can only be checked against a same-spec lodged figure.
 _Avoid_: neighbours, similar properties, peer set
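The filter-then-relax ladder above can be sketched as follows. This is a simplified illustration, not `select_comparables` itself: the `Candidate` / `Target` shapes, the outward-code prefix rule, and the relaxation order (widen geography before demoting the wall override) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical minimal comparable: only the fields the ladder filters on."""
    postcode: str
    property_type: str
    built_form: str
    wall_construction: Optional[str] = None


@dataclass(frozen=True)
class Target:
    postcode: str
    property_type: str
    built_form: str
    wall_construction: Optional[str] = None  # known Landlord Override, if any


def postcode_prefix(postcode: str) -> str:
    # Assumed prefix rule: the outward code, e.g. "AB1 2CD" -> "AB1".
    return postcode.split()[0]


def select_cohort(
    target: Target, candidates: Sequence[Candidate], k: int = 5
) -> Tuple[List[Candidate], bool]:
    """Return (cohort, wall_demoted). Identity filters are never relaxed."""

    def same_identity(c: Candidate) -> bool:
        return c.property_type == target.property_type and c.built_form == target.built_form

    scopes: List[Callable[[Candidate], bool]] = [
        lambda c: c.postcode == target.postcode,
        lambda c: postcode_prefix(c.postcode) == postcode_prefix(target.postcode),
    ]
    wall_known = target.wall_construction is not None
    # Each step is (geographic scope, whether the wall override is a hard filter).
    steps = [(scope, wall_known) for scope in scopes]
    if wall_known:
        steps.append((scopes[-1], False))  # last resort: demote the override

    cohort: List[Candidate] = []
    wall_demoted = False
    for in_scope, wall_hard in steps:
        cohort = [
            c
            for c in candidates
            if same_identity(c)
            and in_scope(c)
            and (not wall_hard or c.wall_construction == target.wall_construction)
        ]
        wall_demoted = wall_known and not wall_hard
        if len(cohort) >= k:
            break
    return cohort, wall_demoted
```

A demoted override would re-enter as a similarity weight rather than being discarded; that weighting step is not shown here.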

+**EPC Prediction**:
+Producing a Property's `EpcPropertyData` picture from its **Comparable Properties** when it has no EPC (~30% of UK homes, typically long-tenure). **Deterministic** neighbour synthesis (k-NN-style — *not* ML; no trained model): take the cohort **mode** for the homogeneous categoricals (wall / roof / floor construction + insulation, construction age band), copy a single representative comparable's **structure** wholesale (building parts, per-window dimensions + orientations, floor dimensions) so the picture stays internally consistent for the calculator, then apply **Landlord Overrides** and the known inputs on top. The result is scored through **SAP10 Calculation** like any other **Effective EPC**, so a predicted Property flows through Rebaselining, Bill Derivation, and Modelling unchanged — held in a **distinct predicted-EPC slot** that coexists with any lodged EPC (so provenance is structural and the UI can flag it; see ADR-0031). A **known property type is required** — the hard cohort filter (a flat is never sized from houses) — supplied by a **Landlord Override** (or, later, an Ordnance Survey lookup); a Property whose property type is genuinely unknown is **gated out**, never predicted from a mixed-type cohort and never given a national default. The same cohort machinery also produces **EPC Anomaly Flags** for Properties that *do* have an EPC. A future learned-weighting refinement is possible but separate, as with the calculator's ML residual head.
+_Avoid_: EpcPredictionService (no "service" suffix — name the operation), ML prediction (it is deterministic), EPC estimation
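A minimal sketch of the synthesis steps just described, using plain dicts in place of `EpcPropertyData`. The field names, the gate, and the rule for picking the representative comparable (the one agreeing with the most moded categoricals) are illustrative assumptions; the production predictor weights its modes per component rather than taking a plain mode.

```python
from collections import Counter
from typing import Any, Dict, Mapping, Optional, Sequence

# Hypothetical field names standing in for EpcPropertyData attributes.
CATEGORICALS = ("wall_construction", "roof_construction", "floor_construction", "age_band")
STRUCTURE = ("building_parts", "windows", "floor_dimensions")

Cert = Mapping[str, Any]


def predict_picture(
    property_type: Optional[str],
    cohort: Sequence[Cert],
    overrides: Mapping[str, Any],
) -> Optional[Dict[str, Any]]:
    # Gate: an unknown property type is never predicted from a mixed-type cohort.
    if property_type is None:
        return None
    same_type = [c for c in cohort if c.get("property_type") == property_type]
    if not same_type:
        return None

    picture: Dict[str, Any] = {"property_type": property_type}
    # 1. Cohort mode for each homogeneous categorical.
    for field in CATEGORICALS:
        values = [c[field] for c in same_type if c.get(field) is not None]
        if values:
            picture[field] = Counter(values).most_common(1)[0][0]

    # 2. Copy one representative's structure wholesale so geometry stays consistent.
    def agreement(c: Cert) -> int:
        return sum(1 for f in CATEGORICALS if f in picture and c.get(f) == picture[f])

    representative = max(same_type, key=agreement)
    for field in STRUCTURE:
        picture[field] = representative.get(field)

    # 3. Landlord Overrides and known inputs win over anything synthesised.
    picture.update(overrides)
    return picture
```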
+
 ### Survey documents

 **Ventilation Audit**:
@@ -88,15 +92,15 @@ _Avoid_: patches (deprecated), corrections, manual EPC, edits
 ### Modelling

 **Effective EPC**:
-The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current via **Reduced-Field Synthesis** (deterministic, from the cert plus calibrated coefficients — no neighbour data); or — when there is **no EPC** — components **estimated from surrounding properties** (a separate neighbour-prediction ML mechanism, not yet implemented). Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
+The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current via **Reduced-Field Synthesis** (deterministic, from the cert plus calibrated coefficients — no neighbour data); or — when there is **no EPC** — components **estimated from surrounding properties** via **EPC Prediction** (deterministic neighbour synthesis). Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
 _Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC

 **Rebaselining**:
-Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; re-map an old-schema EPC to current via **Reduced-Field Synthesis** (deterministic, cert-only); estimate components from surrounding properties when there is no EPC (neighbour-prediction gap-fill — a separate ML mechanism, not yet implemented). The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
+Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; re-map an old-schema EPC to current via **Reduced-Field Synthesis** (deterministic, cert-only); estimate components from surrounding properties when there is no EPC (**EPC Prediction** — deterministic neighbour gap-fill). The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
 _Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)
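The three triggers reduce to a single predicate. A sketch, assuming the inputs are already available as flags (the real Rebaseliner derives them from the assembled picture) and that `sap_version` is held as a number:

```python
from typing import Optional

CALCULATOR_SAP_VERSION = 10.2


def needs_rebaselining(
    sap_version: Optional[float],
    physical_state_changed: bool,
    estimated_or_remapped: bool,
) -> bool:
    # (a) lodged under a methodology the calculator supersedes.
    #     A missing version (no lodged EPC) is left to trigger (c).
    superseded = sap_version is not None and sap_version < CALCULATOR_SAP_VERSION
    # (b) Overrides / Site Notes changed walls, heating, windows, etc.
    # (c) the picture is predicted or remapped rather than a real current EPC.
    return superseded or physical_state_changed or estimated_or_remapped
```

Whichever trigger fires, only Effective Performance is recomputed; Lodged Performance stays as lodged.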

 **Reduced-Field Synthesis**:
-Deterministically translating an **old / reduced-data EPC schema** into the current `EpcPropertyData`, synthesising the *measured* fields the target expects from the source's *reduced or categorical* fields, using only the cert itself plus fixed calibrated coefficients — never neighbour data. Used when re-mapping a **pre-SAP10** cert (e.g. `RdSAP-Schema-20.0.0`) as part of assembling the **Effective EPC**: e.g. a glazing-area *band* + floor area → window m²; bath/shower *room counts* → bath and shower counts. A *best attempt* with no ground truth to validate against (per the **Validation Cohort** rule, a pre-SAP10 cert has no same-spec lodged figure to check), so each synthesis assumption is recorded explicitly in code and tests to keep it debuggable. Distinct from **neighbour-prediction gap-fill** (ML estimation of genuinely-absent fields from surrounding properties — the no-EPC path, a separate mechanism not yet implemented) and from the calculator's own RdSAP Table-5 defaulting in `cert_to_inputs` (which expands `EpcPropertyData` into the full SAP input set downstream).
+Deterministically translating an **old / reduced-data EPC schema** into the current `EpcPropertyData`, synthesising the *measured* fields the target expects from the source's *reduced or categorical* fields, using only the cert itself plus fixed calibrated coefficients — never neighbour data. Used when re-mapping a **pre-SAP10** cert (e.g. `RdSAP-Schema-20.0.0`) as part of assembling the **Effective EPC**: e.g. a glazing-area *band* + floor area → window m²; bath/shower *room counts* → bath and shower counts. A *best attempt* with no ground truth to validate against (per the **Validation Cohort** rule, a pre-SAP10 cert has no same-spec lodged figure to check), so each synthesis assumption is recorded explicitly in code and tests to keep it debuggable. Distinct from **EPC Prediction** (deterministic neighbour estimation of genuinely-absent fields from surrounding properties — the no-EPC path) and from the calculator's own RdSAP Table-5 defaulting in `cert_to_inputs` (which expands `EpcPropertyData` into the full SAP input set downstream).
 _Avoid_: gap-fill (means the neighbour-ML path), reduced-data expansion (overloaded with the calculator's Table-5 step), remapping (the schema-translation part only)

 **Baseline Performance**:
@@ -125,6 +129,10 @@ _Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which

 **Validation Cohort**:
 The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010.
 _Avoid_: parity cohort, validation set, corpus sample
+
+**Component Accuracy**:
+The primary, **calculator-independent** measure of **EPC Prediction** quality: how closely the predicted `EpcPropertyData` *components* (heating fuel + category + controls, hot water, wall / roof / floor construction + insulation, age band, glazing, doors, floor area + geometry) match the actual ones, scored leave-one-out: each held-out target is predicted from its cohort with itself removed. Categoricals score as a classification hit-rate, numerics as a residual. Load-bearing principle: **predict the components well and correct SAP / carbon / PE fall out once calculator gaps close**. Component Accuracy is therefore what prediction is tuned against, while `calc(predicted)` vs API-lodged SAP / carbon / PE is a secondary, **calculator-floored** end-to-end check. The held-out target must be a **SAP 10.2** cert (`sap_version == 10.2`), the only vintage with full-fidelity lodged components, but the source **Comparable Properties** cohort keeps all vintages. Never validate by comparing `calc(predicted)` with `calc(actual)`: the calculator's error sits on both sides and cancels, so a circular ground truth hides it.
+_Avoid_: prediction accuracy (vague), SAP accuracy (that is the calculator-floored end-to-end check, not the primary signal)
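A minimal leave-one-out scorer in the spirit of the Component Accuracy definition. It is a sketch, not `evaluate_component_accuracy`: the dict-shaped certs, the identity fields handed to the predictor, and the mean absolute residual for numerics are assumptions.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence, Tuple

Cert = Mapping[str, Any]
Predictor = Callable[[Mapping[str, Any], Sequence[Cert]], Mapping[str, Any]]

IDENTITY_FIELDS = ("postcode", "property_type")  # all the predictor may see of the target


def component_accuracy(
    certs: Sequence[Cert],
    predict: Predictor,
    categoricals: Sequence[str],
    numerics: Sequence[str],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    hits = {f: 0 for f in categoricals}
    seen = {f: 0 for f in categoricals}
    residuals: Dict[str, List[float]] = {f: [] for f in numerics}

    for i, target in enumerate(certs):
        if target.get("sap_version") != 10.2:
            continue  # only SAP 10.2 certs are held-out targets
        cohort = [c for j, c in enumerate(certs) if j != i]  # source keeps all vintages
        identity = {f: target.get(f) for f in IDENTITY_FIELDS}
        predicted = predict(identity, cohort)
        for f in categoricals:
            if target.get(f) is not None:
                seen[f] += 1
                hits[f] += int(predicted.get(f) == target[f])
        for f in numerics:
            if target.get(f) is not None and predicted.get(f) is not None:
                residuals[f].append(abs(predicted[f] - target[f]))

    hit_rate = {f: hits[f] / seen[f] for f in categoricals if seen[f]}
    mae = {f: sum(r) / len(r) for f, r in residuals.items() if r}
    return hit_rate, mae
```

Nothing here calls the calculator, which is what keeps the measure calculator-independent.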

 **Measure Application**:
@@ -361,7 +369,7 @@ _Avoid_: API key, auth token, secret
 - A Property's **Baseline Performance** holds two halves: **Lodged Performance** (the gov register's SAP / band / carbon / heat) and **Effective Performance** (what the modelling pipeline scored against). The two are equal unless **Rebaselining** fires.
 - **Rebaselining** produces **Effective Performance** by ML re-prediction across SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh, when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten.
 - **Bill Derivation** derives **fuel split** and **bills** from kWh values (sourced from the EPC's `renewable_heat_incentive` fields for baseline SAP10 properties, or from ML when Rebaselining fires), reading current **Fuel Rates** and **Carbon Factors** from their respective repos.
- The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**.
+- **EPC Prediction** uses **Comparable Properties** for both gap-filling (the no-EPC path) and producing **EPC Anomaly Flags** (the has-EPC path).
 - Triggering the model against N **Scenarios** produces N **Plans** per Property. Each **Plan** holds one **Optimised Package** — its selected **Plan Measures** — plus the Property's post-retrofit figures.
 - A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
 - A **Recommendation** references one **Measure Type** and carries property-specific cost and impact.
--- a/applications/ara_first_run/handler.py
+++ b/applications/ara_first_run/handler.py
@@ -10,6 +10,7 @@ from sqlmodel import Session
 from applications.ara_first_run.ara_first_run_trigger_body import (
    AraFirstRunTriggerBody,
 )
+from domain.epc_prediction.epc_prediction import EpcPrediction
 from domain.property_baseline.calculator_rebaseliner import CalculatorRebaseliner
 from domain.sap10_calculator.calculator import Sap10Calculator
 from infrastructure.postgres.config import PostgresConfig
@@ -17,8 +18,10 @@ from infrastructure.postgres.engine import make_engine
 from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator
 from orchestration.ara_first_run_pipeline import AraFirstRunPipeline
 from orchestration.ingestion_orchestrator import (
+    ComparablesRepo,
    EpcFetcher,
    IngestionOrchestrator,
+    PredictionAttributesReader,
    SolarFetcher,
 )
 from orchestration.modelling_orchestrator import ModellingOrchestrator
@@ -65,12 +68,23 @@ def build_first_run_pipeline(
    epc_fetcher: EpcFetcher,
    geospatial_repo: GeospatialRepository,
    solar_fetcher: SolarFetcher,
+    comparables_repo: Optional[ComparablesRepo] = None,
+    prediction_attributes_reader: Optional[PredictionAttributesReader] = None,
 ) -> AraFirstRunPipeline:
    """Compose the real three-stage pipeline on a Unit-of-Work factory.

    Each stage opens its own unit(s) and commits per batch (ADR-0012); the
    handler no longer holds a session. The source clients are passed in because
    their config is not settled — see ``_source_clients_from_env``.
+
+    EPC Prediction gap-fill (ADR-0031) is the predictor itself (pure) plus two
+    injected collaborators: the postcode-cohort source and the Landlord-Override
+    attributes reader. Both default to None, so the feature is **off** until they
+    are supplied; once both are, an EPC-less Property is predicted into its
+    predicted slot. The cohort repo is injected (not built here) because its EPC
+    client is the same source client whose wiring is still pending; the attributes
+    reader is the ``property_overrides`` read adapter, built separately. Until both
+    are passed, ingestion behaves exactly as before.
    """
    return AraFirstRunPipeline(
        ingestion=IngestionOrchestrator(
@@ -78,6 +92,9 @@ def build_first_run_pipeline(
            epc_fetcher=epc_fetcher,
            geospatial_repo=geospatial_repo,
            solar_fetcher=solar_fetcher,
+            comparables_repo=comparables_repo,
+            prediction_attributes_reader=prediction_attributes_reader,
+            epc_prediction=EpcPrediction(),
        ),
        baseline=PropertyBaselineOrchestrator(
            unit_of_work=unit_of_work,
--- a/docs/HANDOVER_EPC_PREDICTION.md
+++ b/docs/HANDOVER_EPC_PREDICTION.md
@@ -0,0 +1,144 @@
+# EPC Prediction — handover
+
+Branch `feature/epc-prediction` @ `d8f015fb` (37 commits ahead of `origin/main`;
+local-only, not pushed). Tree clean. All ranked backlog items (#1222–1228) closed.
+
+## What this is
+Deterministic **neighbour synthesis** that predicts a structured `EpcPropertyData`
+for an EPC-less UK home from its postcode-cohort of neighbours, so it flows through
+the modelling pipeline. NOT ML. The validation methodology and harness are built;
+what remains is a measurable accuracy backlog.
+
+## READ FIRST (hold the full state)
+- Memory `project_epc_prediction` — the spine: design, every commit, metrics, the
+  open fronts, gotchas. Read it first.
+- `docs/adr/0029-…` (design, 6 forks) and `docs/adr/0030-…component-first.md`
+  (validation methodology — internalise: predict components, SAP/carbon/PE are a
+  calculator-floored *secondary* guard).
+- Memory `feedback_per_component_best_method` — THE load-bearing principle this
+  session established (see below).
+- Convention memories: `feedback_aaa_test_convention`,
+  `feedback_abs_diff_over_pytest_approx`, `feedback_commit_per_slice`,
+  `feedback_bigger_slices_for_uniform_work`.
+
+## The methodology (ADR-0030)
+- **Component Accuracy is the PRIMARY signal** — predicted vs API-actual components,
+  calculator-free. SAP/CO₂/PE vs lodged is SECONDARY and calculator-floored.
+- Source cohort keeps ALL cert vintages; only held-out validation TARGETS are
+  SAP 10.2 (`sap_version == 10.2`).
+- The committed **Tier-1 gate** (`tests/domain/epc_prediction/test_component_accuracy_gate.py`)
+  runs the calculator-free scorer over the frozen anonymised fixture
+  (`tests/fixtures/epc_prediction/`, 36 SAP-10.2 targets) and asserts per-component
+  ratchet floors. Deterministic → exact. **Tighten-only**: when you improve a
+  component, bump its floor in the same commit. A *mapper or fixture change*
+  re-baselines floors (not a regression) — document it.
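The tighten-only ratchet can be expressed as two small helpers plus an AAA-style test. This is a sketch under assumptions: the component names and floors below are illustrative, and the committed gate test may be structured differently.

```python
from typing import Dict, List, Mapping


def gate_failures(measured: Mapping[str, float], floors: Mapping[str, float]) -> List[str]:
    """Components scoring below their ratchet floor (or missing). Exact comparison:
    the scorer is deterministic, so no tolerance is needed."""
    failures: List[str] = []
    for component, floor in sorted(floors.items()):
        score = measured.get(component)
        if score is None or score < floor:
            failures.append(f"{component}: {score} < floor {floor}")
    return failures


def tightened(floors: Mapping[str, float], measured: Mapping[str, float]) -> Dict[str, float]:
    """Tighten-only: a floor may rise to a new measured score but never falls."""
    return {c: max(floor, measured.get(c, floor)) for c, floor in floors.items()}


def test_component_accuracy_gate() -> None:
    # Arrange: illustrative floors and scores, not the committed values
    floors = {"wall": 0.90, "age_band": 0.55}
    measured = {"wall": 0.912, "age_band": 0.572}
    # Act
    failures = gate_failures(measured, floors)
    # Assert
    assert failures == [], failures
```

A legitimate drop after a mapper or fixture change surfaces here as a failure; the floor is then re-baselined deliberately, never silently.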
+
+## THE PRINCIPLE that drove this session
+**Give each component its own best-fit synthesis method; never force one global
+mechanism on all of them.** Validated head-to-head on the harness:
+- Permanent fabric categoricals (wall, age) → **physical-similarity-weighted mode**
+  (size×age toward cohort centre).
+- Time-varying components (roof insulation, glazing) → **recency-weighted mode**.
+- Coherence-coupled cluster (heating) → **coherent whole-cluster donor**, NEVER
+  field-moded.
+- Point-estimate scalar (floor area) → **cohort median** (MAD-minimising).
+- Geo-varying components (age, wall, floor, glazing) → additionally **geo-proximity
+  weighted**; roof showed no geo signal → excluded.
+All live in `domain/epc_prediction/epc_prediction.py` as composable weight vectors
+(`_similarity_weights` × `_recency_weights` × `_geo_weights`, combined via `_combine`,
+fed to `_weighted_mode`).
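The composition of weight vectors can be sketched as below. The names echo `_combine` and `_weighted_mode`, but the signatures and the agreement value (the winning category's share of total weight, one plausible input to something like `PredictionConfidence`) are assumptions, not the module's code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def combine(*vectors: Sequence[float]) -> List[float]:
    """Element-wise product of per-comparable weight vectors."""
    if not vectors:
        raise ValueError("need at least one weight vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("weight vectors must align with the cohort")
    out = [1.0] * n
    for v in vectors:
        out = [a * b for a, b in zip(out, v)]
    return out


def weighted_mode(
    values: Sequence[Optional[str]], weights: Sequence[float]
) -> Tuple[Optional[str], float]:
    """Return (mode, agreement), where agreement is the mode's share of total weight."""
    totals: Dict[str, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        if value is not None:
            totals[value] += weight
    total = sum(totals.values())
    if total <= 0:
        return None, 0.0
    best = max(totals, key=lambda v: totals[v])
    return best, totals[best] / total
```

For example, with walls `["cavity", "solid", "solid", "cavity", "cavity"]` and similarity weights `[1, 1, 1, 0.2, 0.2]` (recency and geo all 1), the plain mode is `cavity` (3 vs 2) but the weighted mode is `solid` (2.0 vs 1.4), with agreement 2.0 / 3.4.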
+
+## Closed this session (#1222 was done before; #1223–1228 this session)
+- **#1226** per-prediction confidence (`PredictionConfidence`, compute-only;
+  agreement strongly predicts correctness, r=0.582).
+- **#1224** physical-similarity-weighted categorical mode (wall_insul/roof/floor +1–3pp).
+- **#1223** per-component, NOT a global recency template: floor-area→cohort median +
+  glazing→recency mode. (A global recency template was rejected — it disturbed the
+  coherence-coupled heating cluster.)
+- **#1225** coherent heating donor (modal signature = fuel+category+cylinder, recency
+  tie-break). Biggest SAP lever: control 66→74%, SAP MAE 7.08→6.00 pre-merge.
+- **#1228** PEI investigation — DISPROVED the unit-bug hypothesis (calc/lodged ratio
+  1.06); reframed as calc floor + prediction-sensitivity. Report now surfaces CO₂/PEI
+  calc floors. (Open calc-branch remnant; largely closed by the main merge — see below.)
+- **#1227** geo-proximity weighting — grilled, signal-checked (STRONG GO, esp. age),
+  built per-component. Batch `GeospatialRepository.coordinates_for_uprns`, coords
+  threaded onto `Comparable`/`PredictionTarget`, haversine kernel (`_GEO_SCALE_KM=0.1`,
+  gate-safe optimum). Intra-postcode lift modest (cohort = 1 postcode); the bigger
+  prize is cross-postcode expansion (deferred, needs dense corpus).
+- **Corpus grown 40→150 postcodes** (`6e9f8312`); roof-insulation ±1 reporting.
+- **Merged `origin/main`** (96 commits of calculator/mapper gap fixes, `0b2827e9`).
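The coherent heating donor (#1225) can be sketched as: find the modal signature (fuel + category + cylinder), then copy the whole heating cluster from the newest cert carrying it. The dataclass shapes and the exact tie-break rule are assumptions; the point is that fields such as controls come from one donor instead of being moded independently.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class HeatingCluster:
    fuel: str
    category: str
    has_cylinder: bool
    control: str  # rides along with the donor; never moded on its own


@dataclass(frozen=True)
class CohortCert:
    inspection_date: date
    heating: HeatingCluster


def signature(h: HeatingCluster) -> Tuple[str, str, bool]:
    return (h.fuel, h.category, h.has_cylinder)


def heating_donor(cohort: Sequence[CohortCert]) -> Optional[HeatingCluster]:
    if not cohort:
        return None
    counts = Counter(signature(c.heating) for c in cohort)
    top = max(counts.values())
    modal = {s for s, n in counts.items() if n == top}
    # Recency tie-break: the newest cert with a modal signature donates the whole cluster.
    donor = max(
        (c for c in cohort if signature(c.heating) in modal),
        key=lambda c: c.inspection_date,
    )
    return donor.heating
```

Field-moding the same cohort could pair the modal fuel with a control type that never co-occurred with it on a real cert; donating the cluster avoids that.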
+
+## Current metrics (post-merge, 150-pc corpus, 514 SAP-10.2 targets)
+Component Accuracy (calculator-free): wall 91.2, wall_insul 79.0, age 57.2 (±1 84.7),
+roof_construction 78.2, floor_construction 79.6, heating_fuel 96.9, heating_category
+95.7, heating_control 73.9, water_fuel 96.3, water_code 95.3, has_cylinder 89.7,
+cylinder_insul 52.4, secondary 42.0, roof_insul 49.3 (±1 53.7), floor_insul 94.7,
+room_in_roof 96.5, glazing 67.3, pv 98.8, solar 99.8.
+
+Floor area: **MAE 10.48 m² / MAPE 13.2% / typical (median actual) 61 m²** (cohort
+median, unweighted).
+
+End-to-end vs lodged (SECONDARY, calculator-floored):
+SAP pred MAE 6.25 / **calc floor 0.95** (was 1.57 pre-merge, orig 3.25 — the calc
+fixes nearly validated the calculator, so the gap is now almost all prediction);
+CO₂ 0.61 / floor 0.18; PEI 39.6 / floor 13.7.
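The split between prediction error and calculator floor is plain arithmetic. A sketch, assuming each target yields a lodged figure, the calculator run on the actual cert, and the calculator run on the prediction (the row shape is hypothetical):

```python
from typing import NamedTuple, Sequence, Tuple


class EndToEndRow(NamedTuple):
    lodged: float          # API-lodged SAP (or CO2 / PEI)
    calc_actual: float     # calc(actual cert)
    calc_predicted: float  # calc(predicted picture)


def end_to_end_mae(rows: Sequence[EndToEndRow]) -> Tuple[float, float]:
    """Return (prediction MAE, calculator floor), both measured against the lodged figure."""
    if not rows:
        raise ValueError("no rows")
    n = len(rows)
    pred_mae = sum(abs(r.calc_predicted - r.lodged) for r in rows) / n
    calc_floor = sum(abs(r.calc_actual - r.lodged) for r in rows) / n
    return pred_mae, calc_floor
```

The floor is the error the calculator makes even given the true components, so with it at 0.95 SAP points most of the 6.25 prediction MAE is attributable to prediction rather than the calculator.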
+
+## Key files
+- `domain/epc_prediction/epc_prediction.py` — `EpcPrediction.predict`: median floor
+  area + per-component weighted modes + glazing + heating donor + overrides.
+- `domain/epc_prediction/comparable_properties.py` — `select_comparables` ladder;
+  `Comparable`/`PredictionTarget` (carry `coordinates`).
+- `domain/epc_prediction/prediction_comparison.py` — `compare_prediction` (25 signals).
+- `domain/epc_prediction/validation.py` — `iter_predictions` + `evaluate_component_accuracy`
+  (one scorer, calculator-free).
+- `harness/epc_prediction_corpus.py` — `load_corpus` (+ `_coordinates.json` sidecar),
+  `load_coordinates`, `anonymise_payload`.
+- `repositories/geospatial/` — `GeospatialRepository.coordinates_for_uprns` (batch).
+- `scripts/validate_epc_prediction.py` (full report), `build_epc_prediction_fixture.py`,
+  `fetch_epc_prediction_corpus.py`, `fetch_corpus_coordinates.py`.
+
+## Open fronts (ranked)
+1. **Geo-weighted floor-area median** — measured quick win: MAE 10.48→**9.77**,
+   MAPE 13.2→12.2%. Swap `_median_floor_area` for a geo-weighted median (reuse
+   `_geo_weights`); gate-check + ratchet the floor_area ceiling. Smallest next slice.
+2. **Cross-postcode geo expansion** — the real geo payoff (distance-weighted cohort
+   beyond the single postcode). Needs a *densely-sampled* corpus (current 150 are
+   scattered, so a target's true geo-neighbours aren't in-corpus). Design grilled;
+   build a dense corpus first.
+3. **Slice-5 production wiring** — `ComparableProperties` repo + the
+   `ModellingOrchestrator` owning the EPC *estimation* + distance calcs (a deliberate
+   shift from ADR-0029, which put the fallback in Ingestion). WRITE AN ADR when this
+   lands (it reverses where the fallback lives). Add a provenance marker
+   (`EpcPropertyData` has no predicted/source field yet).
+4. Weak components that can only improve via NEW signals: age 57% / roof_insul 49%
+   (methods exhausted: recency, similarity and plain mode all tie or do worse);
+   cylinder_insul / secondary have too few samples to tune.
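Front #1 swaps the plain median for a geo-weighted one. A sketch, assuming an exponential decay `exp(-d / scale)` over haversine distance (the handover names a haversine kernel and `_GEO_SCALE_KM=0.1` but not the kernel's exact form) and a lower weighted median:

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
GEO_SCALE_KM = 0.1           # mirrors _GEO_SCALE_KM


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_weights(
    target: Tuple[float, float],
    coords: Sequence[Tuple[float, float]],
    scale_km: float = GEO_SCALE_KM,
) -> List[float]:
    # Assumed kernel: weight decays exponentially with distance.
    return [math.exp(-haversine_km(target, c) / scale_km) for c in coords]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value at which cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return pairs[-1][0]  # unreachable with positive weights; keeps type checkers happy
```

Under this kernel a comparable 100 m away carries about 37% of the weight of one at the target's own coordinates (e^-1 ≈ 0.37), whereas the plain median counts every comparable in the postcode equally.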
+
+## How to run
+- Token + S3 creds: `set -a; . backend/.env; set +a` (AWS creds mounted at `~/.aws`).
+- Tests: `PYTHONPATH=. python -m pytest tests/domain/epc_prediction tests/harness/test_epc_prediction_corpus.py tests/repositories/geospatial -o addopts="" -p no:cacheprovider -q`
+- Full report: `PYTHONPATH=. python scripts/validate_epc_prediction.py` (corpus
+  `/tmp/epc_prediction_corpus`).
+- Gate is just a pytest test (deterministic, calculator-free).
+- pyright strict, zero new errors, on every touched file.
+
+## In-flight / gotchas
+- **Corpus lives in `/tmp/epc_prediction_corpus`** (gitignored; 150 pc / 3719 certs +
+  `_coordinates.json`). Backed up to `/workspaces/home/epc_prediction_corpus_backup`
+  (persistent host mount — survives container rebuild; `/tmp` does NOT). Coords backup
+  at `/workspaces/home/epc_prediction_corpus_coords_backup.json`. If `/tmp` is wiped,
+  restore from the backup before running the full report.
+- **Coordinates**: the OS Open-UPRN parquet lives under `DATA_BUCKET/spatial/` (boto3; s3fs is NOT
+  installed, so read via `get_object`→BytesIO; `boto3.client` needs
+  `# pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]`). The cert
+  payload carries `uprn` (the join key). The committed fixture ships `_coordinates.json`
+  (OGL OS OpenData) so the gate exercises geo without S3.
+- **NEVER commit** the API token, `/tmp` corpus, or the coords cache. The
+  `tests/fixtures/epc_prediction` one is anonymised + intentional.
+- Conventions: AAA test headers; `abs(x-y) <= tol` not `pytest.approx`; commit per
+  slice (stage by name, watch untracked); ADR-cite in commit messages; class is
+  `EpcPrediction` (no "Service").
+- Per-item workflow: implement TDD red→green on this branch → run the harness →
+  record before/after → ratchet gate floors → `gh issue comment` impact → close.
+- The merge is **local, not pushed** — push only if asked.
+- Update memory `project_epc_prediction` as state changes.
--- a/docs/HANDOVER_EPC_PREDICTION_WIRING.md
+++ b/docs/HANDOVER_EPC_PREDICTION_WIRING.md
@@ -0,0 +1,99 @@
+# EPC Prediction — production wiring handover (for Jun-te)
+
+The EPC Prediction **gap-fill** is wired end-to-end behind seams, with one real
+dependency stubbed: reading an EPC-less Property's resolved Landlord Overrides.
+This note is what's needed to finish it once your `property_overrides` read path
+lands. Design is **ADR-0031**; terms in **CONTEXT.md** (EPC Prediction, Effective
+EPC, EPC Anomaly Flag).
+
+## What's already built (slices 5a–5e, all on `feature/epc-prediction`)
+
+- **5a** `Property.predicted_epc` slot + a `"predicted"` `source_path` /
+  `effective_epc` branch — used only when there's no lodged EPC and no Site Notes
+  (a real source always wins).
+- **5b** `ComparablePropertiesRepository.candidates_for(postcode)` +
+  `EpcComparablePropertiesRepository` adapter (postcode search → per-cert fetch →
+  batched UPRN→coords). Composes with `EpcClientService` + `GeospatialS3Repository`.
+- **5c** EPC store `source` discriminator (`lodged` | `predicted`) so the two
+  coexist per property; `get_predicted_for_property` / `_for_properties`;
+  `PropertyPostgresRepository` hydrates `predicted_epc`. **Needs a DB migration —
+  see `docs/MIGRATION_NOTE_predicted_epc_source.md`.**
+- **5d** `build_prediction_target(identity, coords, attributes)` + the eligibility
+  **gate** (unknown `property_type` → not predicted). Override attributes come
+  through the `PredictionTargetAttributesReader` port (the stub).
+- **5e** `IngestionOrchestrator` wiring: when the three prediction collaborators
+  are injected, an EPC-less Property is predicted from its cohort and persisted to
+  the predicted slot. The collaborators are **optional** — unwired, ingestion is
+  unchanged.
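The on/off behaviour of slices 5a, 5d and 5e can be summarised in one function. This is a sketch with hypothetical names (`maybe_predict`, the protocol and attribute shapes); the real logic lives in `IngestionOrchestrator` and `build_prediction_target`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol, Sequence


@dataclass(frozen=True)
class TargetAttributes:
    property_type: Optional[str]
    built_form: Optional[str] = None


class CohortSource(Protocol):
    def candidates_for(self, postcode: str) -> Sequence[Any]: ...


class AttributesSource(Protocol):
    def attributes_for(self, property_id: str) -> TargetAttributes: ...


def maybe_predict(
    property_id: str,
    postcode: str,
    lodged_epc: Optional[Any],
    comparables_repo: Optional[CohortSource],
    attributes_reader: Optional[AttributesSource],
    predict: Callable[[TargetAttributes, Sequence[Any]], Any],
) -> Optional[Any]:
    if lodged_epc is not None:
        return None  # gap-fill only: a real source always wins (5a)
    if comparables_repo is None or attributes_reader is None:
        return None  # collaborators not injected: feature off (5e)
    attributes = attributes_reader.attributes_for(property_id)
    if attributes.property_type is None:
        return None  # eligibility gate (5d)
    cohort = comparables_repo.candidates_for(postcode)
    return predict(attributes, cohort)  # caller persists this to the predicted slot
```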
+
+## Your part — three things
+
+### 1. Implement `PredictionTargetAttributesReader` (the stub)
+
+`repositories/property/prediction_target_attributes_reader.py` defines the port:
+`attributes_for(property_id) -> PredictionTargetAttributes` (property_type,
+built_form, wall_construction). Build the adapter as a read over the
+`property_overrides` fact layer (the finaliser writes it via
+`PropertyOverrideRepository.upsert_all`; you're adding the read side).
+
+**Code-space gotcha.** `select_comparables` filters
+`comparable.epc.property_type == target.property_type`, and the cohort EPCs carry
+gov **API codes** (e.g. `"0"`/`"2"`). Landlord Overrides resolve to enum *value*
+strings (e.g. `"House"`). Your adapter must map override value → the API-code
+space, or `property_type` will never match and every cohort comes back empty.
+Same for `built_form`. (`domain/epc/property_type.py`, `built_form_type.py` are
+the enums; `datatypes/epc/domain/epc_codes.csv` has the code table.)
+`property_type` unresolved → return `PredictionTargetAttributes(property_type=None)`
+so the gate skips the Property.
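A sketch of the mapping step the adapter needs. The code tables are passed in rather than hard-coded, since the real ones come from `datatypes/epc/domain/epc_codes.csv`; the placeholder codes in the test are invented, and `PredictionTargetAttributes` is re-declared here only so the sketch runs on its own (its real field types are assumed).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PredictionTargetAttributes:  # local stand-in for the port's return type
    property_type: Optional[str]
    built_form: Optional[str] = None
    wall_construction: Optional[str] = None


def to_api_code_space(
    overrides: Mapping[str, Optional[str]],
    property_type_codes: Mapping[str, str],
    built_form_codes: Mapping[str, str],
) -> PredictionTargetAttributes:
    """Map resolved override enum values (e.g. "House") onto gov API codes."""
    property_type = overrides.get("property_type")
    built_form = overrides.get("built_form")
    return PredictionTargetAttributes(
        # Unresolved or unmappable -> None, so the eligibility gate skips the Property.
        property_type=property_type_codes.get(property_type) if property_type else None,
        built_form=built_form_codes.get(built_form) if built_form else None,
        # Passed through unchanged: the note only calls out property_type and built_form.
        wall_construction=overrides.get("wall_construction"),
    )
```

Turning an unmappable value into `None` is deliberate: a value that silently fails to match would otherwise empty every cohort.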
+
+### 2. Run the Drizzle migration
+
+`epc_property.source` column — full spec in
+`docs/MIGRATION_NOTE_predicted_epc_source.md` (column + default `'lodged'` +
+relax any `property_id` uniqueness to `(property_id, source)`).
+
+### 3. Pass the two collaborators at the composition root
+
+This is now wired: `build_first_run_pipeline` (in `applications/ara_first_run/handler.py`)
+already constructs `epc_prediction=EpcPrediction()` and accepts the other two as
+optional params that it threads into the `IngestionOrchestrator`. So the on-switch
+is just supplying them once they exist:
+
+```python
+build_first_run_pipeline(
+    ...,
+    comparables_repo=EpcComparablePropertiesRepository(epc_client, geospatial_repo),
+    prediction_attributes_reader=attributes_reader,  # your property_overrides adapter (task #1)
+)
+```
+
+`epc_client` is the same EPC source client behind `epc_fetcher` (the concrete
+`EpcClientService` exposes `search_by_postcode` + `get_by_certificate_number`),
+so build it alongside the other source clients in `_source_clients_from_env`
+(pending #1136). Until **both** are passed, ingestion ignores prediction — no
+orchestrator or handler edits needed, just the two arguments.
+
+## One open item — Validation Cohort exclusion
+
+A predicted-source Property has **no real lodged record**, so it must not be
+scored as if it did (CONTEXT: Validation Cohort; ADR-0031 dec-3). There is **no
+Validation-Cohort code path today** to exclude it from — when one is built (or in
+any QA that compares `calc(effective_epc)` vs lodged), exclude on the structural
+signal:
+
+```python
+for prop in properties:
+    if prop.source_path == "predicted":
+        continue  # predicted EPC: no ground truth to validate against
+    ...  # validate calc(prop.effective_epc) against the lodged figure
+```
+
+Note too: `PropertyBaselinePerformance.lodged` is derived from `effective_epc`
+regardless of source (`property_baseline_orchestrator` → `lodged_performance`), so
+for a predicted Property that "lodged" is synthesised, not real. Decide whether
+baseline should null/flag it for predicted properties when this lands.
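When a Validation Cohort path is built, the exclusion combines with the date rule from CONTEXT.md (`inspection_date ≥ 2025-07-01`). A sketch with a hypothetical row shape; `"lodged"` as a non-predicted `source_path` value is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence

VALIDATION_CUTOFF = date(2025, 7, 1)  # buffer past the 14-03-2025 spec date


@dataclass(frozen=True)
class CohortRow:
    property_id: str
    source_path: str
    inspection_date: Optional[date]


def validation_cohort(rows: Sequence[CohortRow]) -> List[CohortRow]:
    return [
        r
        for r in rows
        if r.source_path != "predicted"  # no real lodged figure to check against
        and r.inspection_date is not None
        and r.inspection_date >= VALIDATION_CUTOFF
    ]
```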
+
+## Anomaly dual-use (later, not now)
+
+Slice-5 is gap-fill only (`epc is None`). The slot model already supports
+predicting for *every* Property to compare predicted vs lodged (**EPC Anomaly
+Flags**) — see ADR-0031 dec-4. Reuses the same `ComparableProperties` repo + the
+predicted slot.
--- a/docs/MIGRATION_NOTE_predicted_epc_source.md
+++ b/docs/MIGRATION_NOTE_predicted_epc_source.md
@@ -0,0 +1,51 @@
+# Migration note — `epc_property.source` (predicted-EPC slot)
+
+**For the team to action before merging the EPC Prediction production-wiring
+branch.** The model-side code is done and tested against the SQLModel-built test
+DB; the **production Drizzle schema needs a matching column** that this repo does
+not own.
+
+## What changed in code (this branch)
+
+Per **ADR-0031**, a Property can now hold a **lodged** EPC and a **predicted** EPC
+(EPC Prediction gap-fill) at the same time. The two are distinguished by a new
+`source` discriminator on the `epc_property` row:
+
+- `infrastructure/postgres/epc_property_table.py` — `EpcPropertyModel` gains
+  `source: str = Field(default="lodged")`.
+- `repositories/epc/epc_postgres_repository.py` — `save(..., source="lodged")`
+  writes it; `_delete_for_property` is now per-source (idempotency no longer wipes
+  the other slot); `get_for_property` / `get_for_properties` filter `source =
+  'lodged'`; new `get_predicted_for_property` / `get_predicted_for_properties`
+  read `source = 'predicted'`.
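To make the per-source behaviour concrete, here is a minimal in-memory sketch, assuming a dict keyed by `(property_id, source)` in place of the Postgres table. `InMemoryEpcSlots` is hypothetical: only the `save` / `get_for_property` / `get_predicted_for_property` names mirror the list above, and the bodies are illustrations, not the repository code.

```python
from typing import Literal, Optional

# Hypothetical stand-in for the real `EpcSource` in repositories/epc/epc_repository.py.
EpcSource = Literal["lodged", "predicted"]
_ALLOWED_SOURCES = ("lodged", "predicted")


class InMemoryEpcSlots:
    """Hypothetical in-memory model of `epc_property`: at most one row per
    (property_id, source), the composite key the migration asks for."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], object] = {}

    def save(self, property_id: str, epc: object, source: EpcSource = "lodged") -> None:
        if source not in _ALLOWED_SOURCES:
            raise ValueError(f"unknown EPC source: {source!r}")
        # Per-source idempotency: replace only this (property, source) row, so
        # re-saving a prediction never wipes the lodged slot, and vice versa.
        self._rows[(property_id, source)] = epc

    def get_for_property(self, property_id: str) -> Optional[object]:
        return self._rows.get((property_id, "lodged"))  # source = 'lodged'

    def get_predicted_for_property(self, property_id: str) -> Optional[object]:
        return self._rows.get((property_id, "predicted"))  # source = 'predicted'


slots = InMemoryEpcSlots()
slots.save("p1", "lodged EPC")
slots.save("p1", "predicted EPC v1", source="predicted")
slots.save("p1", "predicted EPC v2", source="predicted")  # replaces the predicted row only
```

The lodged row survives the second predicted save, which is exactly what a single-row-per-property uniqueness constraint in production would forbid.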
+
+The test database is built from the SQLModel mirrors via `create_all`, so tests
+already exercise the column. **Production is not** — hence this note.
+
+## Required Drizzle migration
+
+On the `epc_property` table:
+
+1. **Add column** `source` — `text` (or your enum), **NOT NULL**, **default
+   `'lodged'`**. The default backfills every existing row as a real EPC, which is
+   correct (all current rows are lodged).
+2. **Relax any single-row-per-property uniqueness.** If a unique constraint /
+   index exists on `epc_property(property_id)`, it must become
+   **`(property_id, source)`** — a property may now have one `lodged` row and one
+   `predicted` row. (Verify whether such a constraint exists; the SQLModel mirror
+   has none, but the production schema may.)
+3. **Recommended index** `(property_id, source)` — every predicted/lodged read
+   filters on both columns.
+
+## Allowed values
+
+`source ∈ {'lodged', 'predicted'}` (see `EpcSource` in
+`repositories/epc/epc_repository.py`). No other values are written.
+
+## Why
+
+ADR-0031: predicted EPCs are stored in their own slot rather than overwriting the
+lodged `epc`, so (a) provenance is structural — the Validation Cohort excludes
+predicted-sourced Properties and the UI flags them — and (b) lodged + predicted
+coexist, which the planned **EPC Anomaly Flags** feature needs (compare a
+Property's lodged EPC against its predicted one).
--- a/docs/adr/0029-epc-prediction-from-comparable-properties.md
+++ b/docs/adr/0029-epc-prediction-from-comparable-properties.md
@ -0,0 +1,50 @@
+# EPC Prediction from Comparable Properties
+
+~30% of UK homes (typically long-tenure) have no EPC. **EPC Prediction** produces a Property's `EpcPropertyData` picture from its **Comparable Properties** so an EPC-less Property flows through the rest of the pipeline (Rebaselining, Bill Derivation, Modelling) unchanged. This records the load-bearing design decisions taken in a grill-with-docs session.
+
+## Status
+
+Accepted (design). Implementation pending.
+
+## Decisions
+
+### 1. Predict the physical picture, score it with our calculator — never a SAP scalar
+
+Prediction outputs a structured `EpcPropertyData` (building parts, windows, floor dimensions, construction + insulation, age band); SAP / CO2 / PEI / per-end-use kWh come from running `Sap10Calculator` on it. This is the same "assemble a picture, score once" mechanic as every other **Effective EPC** path (Landlord Overrides, Reduced-Field Synthesis), so a predicted Property is fully usable downstream (bills, measures, optimiser) — a directly-aggregated SAP scalar (legacy `SearchEpc`) would be a dead-end number. It also makes the component-classification accuracy metrics meaningful and keeps errors traceable to a wrong component rather than an opaque regression.
+
+### 2. Deterministic neighbour synthesis, not ML
+
+No trained model, no learned weights, no fit pipeline: filter a cohort, take categorical modes, copy a representative template, apply overrides. CONTEXT's prior "ML mechanism" framing is corrected — calling it ML invited the wrong architecture (training data, model artifacts, retraining). A future *learned-weighting* refinement is possible but separate, mirroring the calculator's flagged-future ML residual head. The domain class is `EpcPrediction` (no "Service" suffix, per the `BillDerivation` convention).
+
+### 3. Fetch-phase fallback, behind a domain service + a cohort repository port
+
+A pure **`EpcPrediction`** domain service (cohort of comparable `EpcPropertyData` in → predicted `EpcPropertyData` out) sits behind a **`ComparableProperties`** repository port that owns the cohort IO (postcode search → per-cert fetch, cached). It wires into `IngestionOrchestrator._fetch`: when `epc_fetcher.get_by_uprn` returns `None`, fetch the cohort and predict, persisting the picture **marked as predicted** (so the UI flags it and the Validation Cohort excludes it). Baseline and Modelling are untouched. Chosen over a fetcher-decorator (hides a heavy cohort fetch behind `get_by_uprn`) and a dedicated stage (a whole stage for "fill the gap when absent", duplicating IO ingestion already does). The heavy cohort IO stays visible in the no-unit IO phase.
+
+### 4. Hybrid synthesis: cohort-mode categoricals + a coherent structural template
+
+You cannot average a list of windows (counts differ; a mean orientation is meaningless) or building parts. So:
+- **Homogeneous categoricals** (wall / roof / floor construction + insulation, age band) → cohort **mode** (robust to one oddball; drives the classification-rate metrics).
+- **Structure & geometry** (building parts, per-window dimensions + orientations, floor dimensions) → copied wholesale from a **single representative comparable** chosen to be consistent with those modes and closest on geo + size (internally consistent for the calculator; drives the window-area / building-parts / floor-area residual metrics).
+- **Landlord Overrides** and the known inputs are applied **on top**.
+
+Rejected: field-by-field aggregation (legacy — incoherent, may not score sensibly) and single-nearest-neighbour copied wholesale (one atypical neighbour sets the categoricals → weaker classification).
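The hybrid rule can be sketched with plain dicts standing in for `EpcPropertyData`. This is a minimal illustration assuming flat string categoricals; `synthesise`, `CATEGORICALS` and the field names are hypothetical. It omits the geo × recency × similarity vote weights and breaks template ties on distance only, whereas the decision above also considers size.

```python
from collections import Counter
from typing import Optional

# Hypothetical flat field names standing in for the homogeneous categoricals.
CATEGORICALS = ("wall", "roof", "floor", "age_band")


def _mode(values: list[str]) -> Optional[str]:
    """Most common value; on a tie, the one encountered first wins."""
    return Counter(values).most_common(1)[0][0] if values else None


def synthesise(cohort: list[dict], overrides: dict) -> dict:
    """Cohort-mode categoricals + one coherent template + overrides on top."""
    # 1. Homogeneous categoricals -> cohort mode (neighbours lodging None are skipped).
    modes = {f: _mode([c[f] for c in cohort if c.get(f) is not None]) for f in CATEGORICALS}

    # 2. Structure & geometry -> copy one representative comparable wholesale:
    #    the one agreeing with the most modes, nearest first on ties.
    def rank(c: dict) -> tuple[int, float]:
        agreement = sum(c.get(f) == modes[f] for f in CATEGORICALS)
        return agreement, -c["distance_m"]

    template = max(cohort, key=rank)
    picture = dict(template)  # copy, so the template's own dict is not mutated
    picture.update({f: v for f, v in modes.items() if v is not None})

    # 3. Landlord Overrides / known inputs are applied last, so they always win.
    picture.update(overrides)
    return picture
```

Because the template is copied whole, its windows, parts and dimensions stay mutually consistent; only the categoricals are voted on.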
+
+### 5. Cohort selection: filter-then-relax ladder, weighted by geo × recency × similarity
+
+Selection hard-filters on identity (property type, built form) and any **known Landlord Override** (e.g. solid brick, the mixed-street border case) **while ≥ k comparables remain**, widening the geographic scope (postcode → postcode-prefix) or demoting a known field to a strong weight when sparse. Survivors are weighted by **geographic proximity** (true coordinates via `GeospatialRepository`, not the legacy house-number proxy) **× recency** (newer EPCs are higher quality) **× physical similarity**; ~~pre-SAP10 / very old certs are dropped~~ (amended by [ADR-0030](0030-epc-prediction-validation-is-sap-version-aware-and-component-first.md): all vintages are kept, since components are methodology-agnostic, with recency as a graduated weight; only the *validation target* must be SAP 10.2). So a known field acts twice: upstream on cohort selection, and again as an override on the final picture.
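The multiplicative weighting can be sketched as below, assuming exponential decay in both distance and cert age. `comparable_weight` and both half-life constants are hypothetical illustration values, not taken from the implementation; similarity is assumed to arrive as a precomputed score in [0, 1]. The shape shows the amended rule: an old cert loses weight but never reaches zero.

```python
from datetime import date

# Hypothetical half-lives, for illustration only.
GEO_HALF_LIFE_M = 250.0
RECENCY_HALF_LIFE_YEARS = 5.0


def comparable_weight(
    distance_m: float, registered: date, today: date, similarity: float
) -> float:
    """geo * recency * similarity. Geo and recency lie in (0, 1]; similarity is
    assumed to be a score in [0, 1]. Old certs are down-weighted, never dropped."""
    geo = 0.5 ** (distance_m / GEO_HALF_LIFE_M)
    age_years = max((today - registered).days, 0) / 365.25
    recency = 0.5 ** (age_years / RECENCY_HALF_LIFE_YEARS)
    return geo * recency * similarity
```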
+
+### 6. Dual use: gap-fill (no EPC) and anomaly flags (has EPC)
+
+The same cohort + comparison machinery produces **EPC Anomaly Flags** for Properties that *do* have an EPC (e.g. "all neighbours are 1930s; this lodges 1950 — correct?") — advisory, surfaced for user review. The no-EPC gap-fill lands first; the always-on anomaly-flag wiring is a follow-on increment.
+
+## Validation
+
+> **Superseded by [ADR-0030](0030-epc-prediction-validation-is-sap-version-aware-and-component-first.md).** The SAP-version mixing in the cohort makes the lodged-SAP comparisons below (esp. the neighbour-mean baseline) invalid; validation is now component-first over SAP-10.2-only targets. The frozen-corpus leave-one-out shape stands.
+
+A **frozen postcode-clustered corpus** (a one-off fetch caches N postcodes × all their certs as `EpcPropertyData`) backs an offline, deterministic, repeatable **leave-one-out** harness over thousands of properties: drop a property with an EPC from its own cohort, predict it, compare predicted vs actual. Metrics: **classification rate** on wall / roof / floor construction + insulation and construction age band; **residuals** on SAP, total window area + window count, building-parts count, total floor area. SAP is reported three ways to attribute error — predicted-then-calculated vs (a) lodged SAP (end-to-end), (b) calculator-on-actual-components (isolates prediction error), (c) a direct neighbour-mean SAP baseline (proves predict-then-calculate beats naïve averaging).
+
+## Open (implementation-level)
+
+- **Provenance marker** on the picture (predicted vs real) — exact representation TBD; needed for the UI flag and Validation Cohort exclusion.
+- **No-cohort fallback** when zero comparables survive even after relaxing (low-confidence national property-type + age defaults, or skip-with-flag).
+- **Confidence signal** (cohort size + agreement) carried for the UI and anomaly thresholds.
--- a/docs/adr/0030-epc-prediction-validation-is-sap-version-aware-and-component-first.md
+++ b/docs/adr/0030-epc-prediction-validation-is-sap-version-aware-and-component-first.md
@ -0,0 +1,53 @@
+# EPC Prediction validation is SAP-version-aware and component-first
+
+**Status: Accepted.** Supersedes the **Validation** section of [ADR-0029](0029-epc-prediction-from-comparable-properties.md) and amends its decision 5 (cohort selection). All other ADR-0029 decisions stand (predict a picture and score it; deterministic neighbour synthesis; fetch-phase fallback; hybrid mode + template synthesis; dual gap-fill / anomaly use).
+
+## Why this ADR exists
+
+ADR-0029's validation rested on a three-way SAP comparison, including a **neighbour-mean-of-lodged-SAP baseline** that predict-then-calculate was meant to beat. A second-order problem was invisible when that was written: the gov EPC register spans **multiple SAP spec versions**, and a property's neighbours are mostly *older* certs. In our development corpus only **~16% of certs are SAP 10.2** (`sap_version == 10.2`, schema 21.0.0 / 21.0.1); the rest were lodged under SAP 2012 (RdSAP 9.x). The same dwelling scores a *different* SAP under different spec versions, so:
+
+- **Averaging neighbours' lodged SAP is invalid** — it blends 2012 and 10.2 numbers to estimate a 10.2 target. The ADR-0029 "baseline" was never a fair comparator; on the real corpus it appeared to *beat* prediction purely as an artifact of this mixing. It is removed.
+- **Comparing our calculator's output to a neighbour's lodged figure is only meaningful within a same-spec cohort** — the existing **SAP Spec Version** / **Validation Cohort** rule (ADR-0010) already said this for calculator validation; it applies equally here.
+
+Separately, measuring `calc(predicted)` against the held-out cert's **lodged** SAP conflates two unrelated errors: the *prediction* error and the calculator's own **API-path residual** (~3 SAP on random gov-API certs today — a known, *separate* workstream, since the calculator pins at 1e-4 only on the Elmhurst worksheets). A perfect prediction still scores ~3 off lodged. So lodged-SAP error is the wrong thing to tune prediction against.
+
+## Decisions
+
+### 1. The source cohort keeps all cert vintages; only the validation **target** is SAP 10.2
+
+A building's physical **components** (wall / roof / floor / heating fuel / age band) are agnostic of the SAP methodology that rated them — a pre-SAP10 neighbour is valid *evidence* about the street. Dropping pre-SAP10 certs from the cohort (ADR-0029 decision 5) would discard ~84% of neighbours and gut prediction. So: **all vintages stay in the Comparable Properties cohort**, with recency as a graduated *weight* (never a hard drop), mattering most for the one component that genuinely goes stale — the heating system, when a boiler is replaced. Only the held-out **validation target** is restricted to `sap_version == 10.2`, the only vintage with full-fidelity lodged components to check against. (Target selection uses the API `sap_version` field directly, not the `inspection_date ≥ 2025-07-01` proxy.)
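The leave-one-out shape under this rule (only SAP 10.2 certs are targets, while each target's cohort keeps every vintage) can be sketched over plain dicts. `leave_one_out`, the dict keys and the `predict` callable are hypothetical stand-ins; the real harness runs over `load_corpus`. Holding out every lodgement at the target's address is a choice made in this sketch to avoid leakage.

```python
from typing import Callable, Iterator

TARGET_SAP_VERSION = 10.2


def leave_one_out(
    corpus: list[dict], predict: Callable[[dict, list[dict]], dict]
) -> Iterator[tuple[dict, dict]]:
    """Yield (actual, predicted) for every SAP 10.2 cert in the corpus."""
    for target in corpus:
        if target["sap_version"] != TARGET_SAP_VERSION:
            continue  # not a validation target, but still evidence for others
        cohort = [
            c
            for c in corpus
            if c["postcode"] == target["postcode"]
            # Hold out the whole dwelling (every lodgement at this address) so
            # an older cert for the same home cannot leak the answer.
            and c["address"] != target["address"]
        ]  # any vintage: pre-SAP10 neighbours stay in the cohort
        yield target, predict(target, cohort)
```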
+
+### 2. **Component Accuracy** is the primary, calculator-independent signal
+
+Prediction is tuned against how closely the predicted `EpcPropertyData` *components* match the actual ones, **not** against any SAP score. Scored by leave-one-out over a 10.2 target: categoricals as a classification hit-rate (with `None` = not-applicable excluded from the denominator), numerics as a residual. Coverage spans the SAP-load-bearing components, led by **heating** (the proven dominant lever: substituting the actual heating components into the prediction cuts the SAP error from ~7 to ~4.5):
+
+- **Heating** — main fuel, main category, main control, water-heating fuel/code, has-cylinder, cylinder insulation, secondary heating
+- **Fabric** — wall / roof / floor construction + insulation, age band (plus a **±1-band** rate, since adjacent bands ≈ same U-value), room-in-roof
+- **Glazing** — modal glazing type; window count + total-area residuals
+- **Counts / geometry** — door count, building-parts count, floor area
+- **Renewables** — PV presence, solar water heating
+
+Load-bearing principle: **predict the components well and correct SAP / carbon / PE fall out once calculator gaps close.** Component Accuracy makes progress even while the calculator moves underneath us.
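The categorical scoring rule can be sketched as follows: pairs whose actual value is `None` (not applicable) drop out of the denominator, and age band additionally gets a ±1-band match. `hit_rate`, `within_one_band` and `AGE_BANDS` are hypothetical; the band labels are placeholders, not the real enum values.

```python
from typing import Callable, Optional, Sequence

# Placeholder ordered age bands (oldest first); not the real enum values.
AGE_BANDS = ("A", "B", "C", "D", "E")

Pair = tuple[Optional[str], Optional[str]]  # (actual, predicted)


def _exact(actual: str, predicted: Optional[str]) -> bool:
    return actual == predicted


def within_one_band(actual: str, predicted: Optional[str]) -> bool:
    """Adjacent age bands have near-identical U-values, so count them as hits."""
    if predicted is None:
        return False
    return abs(AGE_BANDS.index(actual) - AGE_BANDS.index(predicted)) <= 1


def hit_rate(
    pairs: Sequence[Pair], match: Callable[[str, Optional[str]], bool] = _exact
) -> Optional[float]:
    """Share of pairs that match, with actual=None (not applicable) excluded from
    the denominator. None when no target lodges the component at all."""
    scored = [(a, p) for a, p in pairs if a is not None]
    if not scored:
        return None
    return sum(match(a, p) for a, p in scored) / len(scored)
```

Returning `None` rather than 0 or 1 for an all-not-applicable component keeps "never lodged" distinct from "always wrong" in the per-component table.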
+
+### 3. `calc(predicted)` vs **API-lodged** SAP / carbon / PE is a secondary, calculator-floored check — and two comparisons are rejected
+
+The end-to-end number — does the predicted picture score like the official 10.2 EPC — is reported but **not** the thing we drive to zero: it is floored by the calculator's API-path residual and improves as *both* prediction and the calculator workstream land. Carbon and PE are *more* version-sensitive than SAP (grid factors shifted sharply between SAP 2012 and 10.2), so they too are compared only on 10.2 targets.
+
+Rejected:
+- **`calc(predicted)` vs `calc(actual)`** — cancels (and so *hides*) calculator error against a *circular* ground truth (our own calculator); a systematically wrong prediction in the calculator's blind spot would score perfectly. Not a validation signal; at most an internal attribution diagnostic.
+- **neighbour-mean-of-lodged-SAP baseline** — mixes SAP versions (see above).
+
+No synthetic SAP-weighted Component Accuracy index: weighting components by SAP impact reintroduces the calculator. The per-component table stays flat; the end-to-end MAE *is* the holistic rollup.
+
+### 4. Two validation tiers, one shared scorer
+
+- **Tier 1 — committed CI gate.** A small, **anonymised**, frozen fixture under `tests/fixtures/` (addresses hashed, since the predictor uses address only as a dedup key; `post_town` dropped; postcode + component fields retained; gov data is OGL). A **ratcheting regression gate**: each per-component floor / residual ceiling is the currently-measured value and only ever *tightens* (honouring the repo's no-tolerance-widening ethos); a regression fails the build. End-to-end SAP / carbon / PE thresholds are loose and explicitly **calculator-floored**: gross-regression guards, not targets. Gates when the fixture is present; skips with a message otherwise.
+- **Tier 1.5 — S3-hosted scale run (near-term).** A few-thousand-cert anonymised corpus stored in **S3** rather than committed to git (too large to commit, but far more statistically stable than the 36-target gate fixture). The integration test pulls it to a temp dir and runs the *same* `load_corpus` + `evaluate_component_accuracy`, then reports / asserts. This is the immediate realization of "validate at scale" — it reuses the committed-fixture machinery wholesale (only the data *source* differs) and needs no bulk-export streaming. Its numbers re-baseline the Tier-1 floors.
+- **Tier 2 — offline national battle-test (deferred).** Built on `harness/epc_bulk.py` (streams the gov **bulk export** via HTTP range requests, filtered by `sap_version`) and `harness/cohort.py` (offline sweep that **captures per-cert raises** instead of aborting). Streams the register and **buckets by postcode** — because bulk is the *whole* register, every postcode is dense, giving national breadth *and* dense cohorts at once. Over tens of thousands of 10.2 targets it emits the Component Accuracy table, the end-to-end MAE, **and a failure taxonomy** (unsupported-schema, mapper raise, calculator raise, no-cohort, no-10.2-target) — the battle-test half. Not committed, not CI-gated; its numbers periodically **re-baseline the Tier-1 floors**.
+
+All tiers run the *same* `evaluate_component_accuracy` / `compare_prediction` logic over `load_corpus` — one scorer, three data sources (committed fixture, S3 corpus, bulk stream).
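A sketch of the ratchet for a single metric, assuming the gate compares a fresh measurement with a stored threshold and reports the value that would be committed next. `check_floor` and `check_ceiling` are hypothetical names; the real gate lives in `tests/domain/epc_prediction/test_component_accuracy_gate.py` and may store and update its thresholds differently.

```python
def check_floor(floor: float, measured: float) -> tuple[bool, float]:
    """Higher-is-better metric (e.g. a hit rate). Returns (passed, next_floor)."""
    if measured < floor:
        return False, floor  # regression: fail the build, floor unchanged
    return True, measured  # pass: the floor tightens to the new measurement


def check_ceiling(ceiling: float, measured: float) -> tuple[bool, float]:
    """Lower-is-better metric (e.g. a residual). Returns (passed, next_ceiling)."""
    if measured > ceiling:
        return False, ceiling  # regression: fail the build, ceiling unchanged
    return True, measured  # pass: the ceiling tightens to the new measurement
```

Neither function can ever return a looser threshold than it was given, which is the no-tolerance-widening property in code form.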
+
+## Consequences
+
+- ADR-0029's "Validation" section and its decision-5 clause "pre-SAP10 / very old certs are dropped" are superseded by the above. The CONTEXT terms **Comparable Properties** (all-vintage source) and **Component Accuracy** (new) are updated to match.
+- The Tier-1 fixture is the first committed gov-EPC fixture sized for *statistical* stability rather than worksheet-exact pinning — a deliberate departure from the repo's 1e-4 pin convention, justified by prediction's irreducible error.
--- a/docs/adr/0031-epc-prediction-production-wiring.md
+++ b/docs/adr/0031-epc-prediction-production-wiring.md
@ -0,0 +1,87 @@
+# EPC Prediction production wiring
+
+ADR-0029 settled *how* EPC Prediction synthesises a Property's `EpcPropertyData`
+from its **Comparable Properties**; this records *how it wires into the running
+pipeline* — where estimation runs, how a predicted EPC is stored and told apart
+from a lodged one, and which Properties are eligible. Resolved in a
+grill-with-docs session after the algorithm + validation harness were built and
+the accuracy backlog (#1222–1228) closed.
+
+## Status
+
+Accepted (design). Refines ADR-0029 decision 3. Implementation pending (slice 5).
+
+## Decisions
+
+### 1. Estimation runs in Ingestion — the #1227 "shift to Modelling" is dropped
+
+The cohort fetch + predict happens in `IngestionOrchestrator`, when
+`epc_fetcher.get_by_uprn` returns `None` — upholding ADR-0029 decision 3. A
+design note from issue #1227 had proposed moving estimation (and its distance
+calcs) into `ModellingOrchestrator`; that is reversed here. Ingestion is already
+the EPC-acquisition phase and *already resolves the Property's coordinates*
+(`spatial`), so it can run the geo-weighted prediction with no new IO surface;
+the First Run stages communicate **only through persisted state** (the pipeline
+threads just `property_ids`, each stage reloads the `Property`), so a prediction
+produced in Modelling would either have to be persisted there anyway or
+recomputed every run. No rationale for the Modelling shift survived review.
+Baseline and Modelling stay untouched — they read a populated `effective_epc`.
+
+### 2. The predicted EPC is persisted in a distinct slot, never overwriting the lodged one
+
+Because stages communicate via persisted state, the prediction **must be saved**
+for Modelling to see it — in-memory-only would never reach stage 3. It is stored
+as a **distinct predicted-EPC slot** on the Property (the EPC table reused with a
+`source` discriminator — `lodged` / `predicted`), so a lodged EPC and a predicted
+EPC can **coexist** on one Property. Coexistence is load-bearing: it is what lets
+the same cohort machinery produce **EPC Anomaly Flags** for Properties that *do*
+have an EPC (the dual-use named in ADR-0029), and it means a later real-EPC fetch
+fills its own slot without the predicted one muddying provenance. Rejected: a
+single EPC slot with an `is_predicted` flag — it cannot hold both, so it forecloses
+anomaly detection and makes "lodged later arrives" ambiguous.
+
+### 3. Provenance is structural — on the Property, not on `EpcPropertyData`
+
+`EpcPropertyData` gains no `predicted` / `source` field. Which slot the picture
+came from *is* its provenance. `Property.effective_epc` / `source_path` gain a
+`"predicted"` branch, used only when there is no lodged EPC **and** no Site Notes
+(the existing precedence is unchanged; a real source always wins). The
+**Validation Cohort** then excludes any Property whose `effective_epc` resolves
+via the predicted slot — it has no same-spec lodged ground truth — and the UI
+flags it as predicted. Keeping `EpcPropertyData` clean means every downstream
+consumer (calculator, generators, bill derivation) is unchanged and oblivious to
+how the picture was sourced, exactly as for Landlord-Override and Reduced-Field
+pictures.
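The precedence can be sketched as a small resolver on a stand-in aggregate. `PropertySketch` and the `"lodged"` / `"site_notes"` labels are hypothetical, and the order between the two real sources is illustrative only (the decision leaves the existing precedence unchanged); the `"predicted"` branch and its last-resort position come from this decision. The real `Property.effective_epc` also layers Landlord Overrides, which the sketch omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertySketch:
    """Hypothetical stand-in for `Property`, reduced to three EPC sources."""

    lodged_epc: Optional[dict] = None
    site_notes_epc: Optional[dict] = None
    predicted_epc: Optional[dict] = None

    def _resolve(self) -> tuple[Optional[str], Optional[dict]]:
        # Existing precedence over real sources (order here is illustrative).
        for path, epc in (("lodged", self.lodged_epc), ("site_notes", self.site_notes_epc)):
            if epc is not None:
                return path, epc
        # New last-resort branch: only reached when no real source exists.
        if self.predicted_epc is not None:
            return "predicted", self.predicted_epc
        return None, None

    @property
    def effective_epc(self) -> Optional[dict]:
        return self._resolve()[1]

    @property
    def source_path(self) -> Optional[str]:
        return self._resolve()[0]
```

The Validation Cohort exclusion then reduces to filtering on `source_path != "predicted"`, with nothing read from the picture itself.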
+
+### 4. Slice 5 is gap-fill only; always-predict (anomaly) is a follow-on
+
+Prediction runs only when `epc is None`. Predicting for *every* Property to
+compare against its lodged EPC (EPC Anomaly Flags) is real and the slot model
+supports it, but it triples the ingestion cohort IO and needs its own
+comparison + divergence-threshold + UI surface — so it does not ride in on the
+wiring slice. The predicted-EPC slot and the `ComparableProperties` repository
+this slice introduces are exactly what the anomaly capability reuses.
+
+### 5. A known property type is required — eligibility is gated, never defaulted
+
+A `PredictionTarget` needs `postcode` (from `PropertyIdentity`), `coordinates`
+(geospatial), and `property_type` + `built_form` + `wall_construction` from
+**Landlord Overrides**. `property_type` is the **hard** cohort filter (a flat
+must not be sized from houses), so it is a **required input**: a Property whose
+property type is genuinely unknown is **gated out** before prediction — flagged
+un-predictable, never predicted from a mixed-type cohort and never given a
+national default. (An Ordnance Survey `postcode_search` source can supply
+property type more broadly than landlord input does; wiring it is a later
+enhancement that widens the eligible population — out of scope here.) The
+`ComparableProperties` repository port deferred by ADR-0029 is built in this
+slice: it owns the cohort IO (postcode search → per-cert fetch → UPRN→coordinate
+resolution) and returns candidate `Comparable`s for the domain
+`select_comparables` to filter.
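The gate amounts to refusing to build a target without a property type. A sketch, assuming a frozen dataclass shaped like the field list above; `PredictionTargetSketch`, `build_target`, the `overrides` dict and the plain-tuple coordinates are hypothetical stand-ins for `PredictionTarget`, the Landlord Override read and `Coordinates`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredictionTargetSketch:
    postcode: str
    coordinates: Optional[tuple[float, float]]  # (lat, lon); hypothetical shape
    property_type: str  # required: the hard cohort filter
    built_form: Optional[str] = None  # relaxable conditioning filter
    wall_construction: Optional[str] = None  # relaxable conditioning filter


def build_target(
    postcode: str, coordinates: Optional[tuple[float, float]], overrides: dict
) -> Optional[PredictionTargetSketch]:
    """Return None (gated out, flagged un-predictable) when the property type is
    unknown; never substitute a national default or use a mixed-type cohort."""
    property_type = overrides.get("property_type")
    if not property_type:
        return None
    return PredictionTargetSketch(
        postcode=postcode,
        coordinates=coordinates,
        property_type=property_type,
        built_form=overrides.get("built_form"),
        wall_construction=overrides.get("wall_construction"),
    )
```

Making `property_type` a non-optional field means an ungated target cannot even be constructed, so the check cannot be skipped further downstream.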
+
+## Consequences
+
+- A schema change: the EPC store gains a `source` discriminator (or equivalent),
+  and the Property repository a `get_predicted_for_property` read.
+- Slice 5's reach is bounded to EPC-less Properties with a landlord-supplied
+  property type. The `postcode_search` integration is the lever to broaden it.
+- `EpcPropertyData` stays unchanged, so no downstream consumer is touched.
--- a/domain/epc_prediction/README.md
+++ b/domain/epc_prediction/README.md
@ -0,0 +1,69 @@
+# EPC Prediction
+
+Predict a structured `EpcPropertyData` for an **EPC-less** UK home from its
+postcode neighbours, so it flows through the rest of the pipeline (Baseline, Bill
+Derivation, Modelling) exactly like a home that has an EPC. It is **deterministic
+neighbour synthesis** — cohort modes + a coherent template + per-component
+weighting — **not ML**. ~30% of UK homes (typically long-tenure) have no EPC.
+
+- **Design**: [ADR-0029](../../docs/adr/0029-epc-prediction-from-comparable-properties.md) (algorithm),
+  [ADR-0030](../../docs/adr/0030-epc-prediction-validation-is-sap-version-aware-and-component-first.md) (validation),
+  [ADR-0031](../../docs/adr/0031-epc-prediction-production-wiring.md) (production wiring).
+- **Glossary**: see *EPC Prediction*, *Comparable Properties*, *Component
+  Accuracy*, *EPC Anomaly Flag* in [CONTEXT.md](../../CONTEXT.md).
+
+## The flow (gap-fill)
+
+```
+Ingestion: a Property has no lodged EPC (epc_fetcher.get_by_uprn → None)
+   │
+   ├─ resolve its attributes (property_type/built_form/wall) from Landlord Overrides
+   │     └─ property_type unknown? → GATED OUT, not predicted (no national defaults)
+   ├─ build a PredictionTarget (postcode + coordinates + attributes)
+   ├─ ComparableProperties repo: fetch the postcode cohort (search → per-cert → coords)
+   ├─ select_comparables(): filter to the reference cohort (type-hard, built-form/wall-soft)
+   ├─ EpcPrediction.predict(): synthesise the picture (modes + template + donor + weights)
+   └─ persist to the Property's PREDICTED slot  (source = "predicted")
+            │
+Modelling/Baseline: Property.effective_epc returns the predicted picture
+            (source_path == "predicted"), scored like any other Effective EPC.
+```
+
+A lodged EPC always wins — prediction is last-resort gap-fill.
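The same flow as a sketch of the fetch-phase fallback, with every collaborator injected as a plain callable so it runs standalone. `fetch_or_predict` and all its parameter names are hypothetical; the real wiring lives in `orchestration/ingestion_orchestrator.py`, and persisting a lodged EPC (unchanged by this feature) is left out.

```python
from typing import Callable, Optional


def fetch_or_predict(
    uprn: str,
    get_by_uprn: Callable[[str], Optional[dict]],
    build_target: Callable[[str], Optional[object]],
    fetch_cohort: Callable[[object], list],
    select: Callable[[object, list], list],
    predict: Callable[[object, list], dict],
    save_predicted: Callable[[str, dict], None],
) -> str:
    """Return which path was taken: 'lodged', 'gated' or 'predicted'."""
    if get_by_uprn(uprn) is not None:
        return "lodged"  # a lodged EPC always wins; prediction never runs
    target = build_target(uprn)
    if target is None:
        return "gated"  # property type unknown: flagged, not predicted
    cohort = select(target, fetch_cohort(target))  # cohort IO, then domain filter
    save_predicted(uprn, predict(target, cohort))  # written with source = "predicted"
    return "predicted"
```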
+
+## Where the pieces live
+
+| Concern | File |
+|---|---|
+| Synthesis (modes, template, heating donor, geo/recency/similarity weights) | `epc_prediction.py` |
+| Cohort selection (filter-then-relax ladder) | `comparable_properties.py` |
+| Target assembly + eligibility gate | `prediction_target.py` |
+| Cohort IO port + EPC-API/geospatial adapter | `repositories/comparable_properties/` |
+| Predicted-EPC persistence (`source` discriminator) | `repositories/epc/` |
+| `predicted` source path on the aggregate | `domain/property/property.py` |
+| Ingestion wiring (gate → predict → persist) | `orchestration/ingestion_orchestrator.py` |
+| Validation (leave-one-out, component-first) + ratcheting gate | `validation.py`, `tests/domain/epc_prediction/test_component_accuracy_gate.py` |
+
+## See it run
+
+`tests/e2e/test_epc_prediction_e2e.py` — the whole flow against the real DB +
+repos, only the external HTTP clients faked. Start there.
+
+## Status
+
+Algorithm + validation: **built**. Production gap-fill wiring: **built behind
+seams** (slices 5a–5e). Two things finish it — a DB migration and the
+`property_overrides` read adapter — see
+[the wiring handover](../../docs/HANDOVER_EPC_PREDICTION_WIRING.md) and
+[the migration note](../../docs/MIGRATION_NOTE_predicted_epc_source.md).
+**EPC Anomaly Flags** (predict for *every* home, compare to lodged) is the
+designed next step the storage already supports.
+
+## Run the tests
+
+```bash
+PYTHONPATH=. python -m pytest tests/e2e/test_epc_prediction_e2e.py \
+  tests/domain/epc_prediction tests/orchestration/test_ingestion_prediction.py \
+  tests/repositories/comparable_properties tests/repositories/epc/test_epc_predicted_slot.py \
+  -o addopts="" -q
+```
--- a/domain/epc_prediction/__init__.py
+++ b/domain/epc_prediction/__init__.py
--- a/domain/epc_prediction/comparable_properties.py
+++ b/domain/epc_prediction/comparable_properties.py
@ -0,0 +1,126 @@
+"""Comparable Properties selection for EPC Prediction (ADR-0029).
+
+Given a `PredictionTarget` (the known inputs for an EPC-less Property) and the
+raw postcode cohort of candidate `ComparableProperty` objects, `select_comparables`
+chooses the reference cohort EPC Prediction synthesises from. Pure domain logic —
+the cohort IO (postcode search → per-cert fetch) lives behind a repository port.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import date
+from typing import Callable, Optional
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.epc_prediction.prediction_target import PredictionTarget
+from domain.geospatial.coordinates import Coordinates
+
+# Default floor on the cohort: a conditioning filter (built form, a known
+# override) is applied only while at least this many comparables survive it,
+# else it is relaxed (ADR-0029 filter-then-relax ladder).
+_DEFAULT_MINIMUM_COHORT = 5
+
+
+@dataclass(frozen=True)
+class ComparableProperty:
+    """One candidate neighbour: its structured `EpcPropertyData` picture plus the
+    register metadata not carried on the cert (identity for leave-one-out
+    exclusion; recency + address for weighting + re-lodgement dedup)."""
+
+    epc: EpcPropertyData
+    certificate_number: str
+    address: Optional[str] = None
+    registration_date: Optional[date] = None
+    # Resolved from the neighbour's UPRN at the boundary (the validation harness,
+    # or the ComparableProperties adapter during ingestion), so the pure predictor
+    # can weight by physical distance to the target without an IO dependency.
+    # None when no UPRN/coordinate is available.
+    coordinates: Optional[Coordinates] = None
+
+
+@dataclass(frozen=True)
+class ComparableProperties:
+    """The selected reference cohort for a `PredictionTarget`."""
+
+    members: tuple[ComparableProperty, ...]
+
+
+def _maybe_filter(
+    cohort: list[ComparableProperty],
+    predicate: Callable[[ComparableProperty], bool],
+    *,
+    active: bool,
+    minimum_cohort: int,
+) -> list[ComparableProperty]:
+    """Apply a conditioning filter only while it leaves at least
+    `minimum_cohort` comparables; otherwise relax it (keep the pre-filter
+    cohort) — the filter-then-relax ladder (ADR-0029)."""
+    if not active:
+        return cohort
+    filtered = [c for c in cohort if predicate(c)]
+    return filtered if len(filtered) >= minimum_cohort else cohort
+
+
+def select_comparables(
+    target: PredictionTarget,
+    candidates: list[ComparableProperty],
+    *,
+    minimum_cohort: int = _DEFAULT_MINIMUM_COHORT,
+) -> ComparableProperties:
+    """Select the Comparable Properties for `target` from the raw postcode
+    cohort. The register lists every historical lodgement, so first dedupe each
+    address to its latest cert (one comparable per real neighbour); then property
+    type is an always-hard filter (a flat is never a comparable for a house),
+    while built form and any known wall construction (a Landlord Override) are
+    conditioning filters on the relax ladder."""
+    cohort = _dedupe_to_latest_per_address(candidates)
+    cohort = [
+        c for c in cohort if c.epc.property_type == target.property_type
+    ]
+    cohort = _maybe_filter(
+        cohort,
+        lambda c: c.epc.built_form == target.built_form,
+        active=target.built_form is not None,
+        minimum_cohort=minimum_cohort,
+    )
+    cohort = _maybe_filter(
+        cohort,
+        lambda c: _main_wall_construction(c) == target.wall_construction,
+        active=target.wall_construction is not None,
+        minimum_cohort=minimum_cohort,
+    )
+    return ComparableProperties(members=tuple(cohort))
+
+
+def _dedupe_to_latest_per_address(
+    candidates: list[ComparableProperty],
+) -> list[ComparableProperty]:
+    """Collapse the register's re-lodgements: keep one comparable per address,
+    the latest by registration date (ties broken by certificate number, for
+    determinism), so a re-lodged neighbour does not count more than once.
+    Candidates with no address are passed through untouched (each is its own
+    neighbour) and appended after the addressed ones; each addressed comparable
+    keeps the position at which its address first appeared."""
+    latest: dict[str, ComparableProperty] = {}
+    passthrough: list[ComparableProperty] = []
+    for c in candidates:
+        if c.address is None:
+            passthrough.append(c)
+            continue
+        incumbent = latest.get(c.address)
+        if incumbent is None or _recency_key(c) > _recency_key(incumbent):
+            latest[c.address] = c
+    return list(latest.values()) + passthrough
+
+
+def _recency_key(comparable: ComparableProperty) -> tuple[date, str]:
+    """Sort key making the most recent (then highest cert number) win. A missing
+    registration date sorts oldest."""
+    return (
+        comparable.registration_date or date.min,
+        comparable.certificate_number,
+    )
+
+
+def _main_wall_construction(comparable: ComparableProperty) -> object:
+    """The main building part's wall construction, or None when no part lodged."""
+    parts = comparable.epc.sap_building_parts
+    return parts[0].wall_construction if parts else None
--- a/domain/epc_prediction/epc_prediction.py
+++ b/domain/epc_prediction/epc_prediction.py
@ -0,0 +1,546 @@
+"""EPC Prediction synthesis (ADR-0029).
+
+`EpcPrediction.predict` turns the selected `ComparableProperties` into a
+predicted `EpcPropertyData`: copy a coherent representative template's structure
+(building parts, windows, geometry), set the homogeneous categoricals to the
+recency-weighted cohort mode, then apply Landlord Overrides on top. Pure domain
+logic — deterministic neighbour synthesis, not ML.
+"""
+
+from __future__ import annotations
+
+import copy
+import math
+import statistics
+from collections import Counter, defaultdict
+from dataclasses import dataclass
+from datetime import date
+from typing import Callable, Iterable, Optional, Union
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    MainHeatingDetail,
+    SapBuildingPart,
+)
+from domain.epc_prediction.comparable_properties import (
+    ComparableProperty,
+    ComparableProperties,
+)
+from domain.epc_prediction.prediction_target import PredictionTarget
+from domain.geospatial.coordinates import Coordinates
+
+
+@dataclass(frozen=True)
+class PredictionConfidence:
+    """A compute-only confidence signal for a prediction (ADR-0029 open item).
+
+    `cohort_size` is the number of Comparable Properties the prediction drew on;
+    `component_agreement` maps a homogeneous component to the cohort's *agreement*
+    — the modal value's share (0..1) of the neighbours that lodge one. A small or
+    split cohort flags a component that downstream consumers may want to treat
+    cautiously (e.g. the per-dwelling fields with a low accuracy ceiling).
+    Surfacing / persisting this is a separate HITL follow-up; here it is computed
+    only.
+    """
+
+    cohort_size: int
+    component_agreement: dict[str, float]
+
+    def agreement(self, component: str) -> Optional[float]:
+        """The cohort's modal-value share for a component, or None when no
+        neighbour lodges one (it was not applicable)."""
+        return self.component_agreement.get(component)
+
+
+class EpcPrediction:
+    """Synthesises a predicted `EpcPropertyData` from Comparable Properties."""
+
+    def predict(
+        self, target: PredictionTarget, comparables: ComparableProperties
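+    ) -> EpcPropertyData:
+        """Predict the target's EPC picture: copy a representative template's
+        structure (coherent for the calculator), set the predicted floor area to
+        the geo-weighted cohort median (the best point estimate of the target's
+        size, decoupled from the one template's own area; the plain median when
+        the target has no coordinates), set the homogeneous categoricals and the
+        glazing type to the weighted cohort mode, swap in a coherent heating
+        donor's whole cluster, then apply the Landlord Overrides on top."""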
+        template: ComparableProperty = self._template(comparables)
+        predicted: EpcPropertyData = copy.deepcopy(template.epc)
+        predicted.total_floor_area_m2 = _geo_weighted_floor_area(
+            comparables.members, target.coordinates
+        )
+        self._apply_categorical_modes(predicted, comparables, target.coordinates)
+        self._apply_glazing_mode(predicted, comparables, target.coordinates)
+        self._apply_heating_donor(predicted, comparables)
+        self._apply_overrides(predicted, target)
+        return predicted
+
+    @staticmethod
+    def _apply_heating_donor(
+        predicted: EpcPropertyData, comparables: ComparableProperties
+    ) -> None:
+        """Replace the structural template's heating with a coherent donor's whole
+        `SapHeating` cluster (ADR-0029; issue #1225). Heating sub-fields can't be
+        field-moded without breaking system coherence (e.g. a fuel that doesn't
+        match the emitter), so the cluster is copied as a unit from a single
+        neighbour: the one matching the cohort's modal heating *signature* (main
+        fuel + category + cylinder presence), the most recent among those matches
+        (a recent cert reflects the current system). This makes the predicted
+        heating both representative and internally coherent, rather than whatever
+        the size-representative template happened to carry. No donor (no neighbour
+        lodges a main heating system) leaves the template's heating in place."""
+        donor = _heating_donor(comparables.members)
+        if donor is None:
+            return
+        predicted.sap_heating = copy.deepcopy(donor.epc.sap_heating)
+        predicted.has_hot_water_cylinder = donor.epc.has_hot_water_cylinder
+        predicted.solar_water_heating = donor.epc.solar_water_heating
+
+    @staticmethod
+    def _apply_glazing_mode(
+        predicted: EpcPropertyData,
+        comparables: ComparableProperties,
+        target_coordinates: Optional[Coordinates],
+    ) -> None:
+        """Set every window's glazing type to the recency- and geo-weighted cohort
+        mode. Glazing is retrofitted over a dwelling's life (single → double), so
+        a recent neighbour reflects the current state (recency, like roof
+        insulation); it also varies geographically (retrofit waves by street), so
+        a nearer neighbour counts for more. NOT the plain mode (which regressed)
+        or the template copy. The window geometry (size, count) is left on the
+        template; only the glazing categorical moves."""
+        members = comparables.members
+        weights = _combine(
+            _recency_weights(members), _geo_weights(target_coordinates, members)
+        )
+        glazing = _weighted_mode(
+            (_comparable_modal_glazing(c) for c in members), weights
+        )
+        if glazing is None:
+            return
+        for window in predicted.sap_windows:
+            window.glazing_type = glazing
+
+    def confidence(
+        self, comparables: ComparableProperties
+    ) -> PredictionConfidence:
+        """Compute the per-prediction confidence from the cohort: its size plus,
+        for each homogeneous categorical, the modal value's share among the
+        neighbours that lodge one (ADR-0029). Compute-only — it never alters the
+        prediction, only annotates how much the cohort agreed."""
+        members: tuple[ComparableProperty, ...] = comparables.members
+        agreement: dict[str, float] = {}
+        for attr in _MAIN_PART_CATEGORICALS:
+            share: Optional[float] = _modal_share(
+                _main_part_attr(c, attr) for c in members
+            )
+            if share is not None:
+                agreement[attr] = share
+        for attr in _FLOOR_DIM_CATEGORICALS:
+            floor_share: Optional[float] = _modal_share(
+                _main_floor_attr(c, attr) for c in members
+            )
+            if floor_share is not None:
+                agreement[attr] = floor_share
+        return PredictionConfidence(
+            cohort_size=len(members), component_agreement=agreement
+        )
+
+    @staticmethod
+    def _template(comparables: ComparableProperties) -> ComparableProperty:
+        """The representative comparable whose structure seeds the prediction:
+        the member whose floor area is closest to the cohort median. A single
+        neighbour's geometry is copied wholesale, so a size-representative
+        template keeps the prediction off the cohort's size outliers (ADR-0029
+        decision 4: closest on size)."""
+        members: tuple[ComparableProperty, ...] = comparables.members
+        median_area: float = statistics.median(
+            c.epc.total_floor_area_m2 for c in members
+        )
+        return min(
+            members,
+            key=lambda c: abs(c.epc.total_floor_area_m2 - median_area),
+        )
+
+    @staticmethod
+    def _apply_categorical_modes(
+        predicted: EpcPropertyData,
+        comparables: ComparableProperties,
+        target_coordinates: Optional[Coordinates],
+    ) -> None:
+        """Override the predicted picture's homogeneous categoricals — wall /
+        roof / floor construction + insulation, age band — with the cohort mode
+        (robust to an atypical template, per ADR-0029 decision 4). The mode is
+        physical-similarity-weighted (decision 5): each neighbour's vote decays
+        with its distance from the cohort's physical centre, so the mode leans on
+        the most representative neighbours. The components that vary
+        *geographically* — age band, wall construction, floor construction (homes
+        built together cluster) — additionally take a geo-proximity weight, so a
+        nearer neighbour counts for more; the rest (e.g. roof construction, which
+        showed no geo signal) do not. The time-varying roof insulation thickness
+        is the exception: it takes a recency-weighted mode instead of either
+        weighting. The template still supplies the geometry; only the categorical
+        codes move to the mode."""
+        if not predicted.sap_building_parts:
+            return
+        main: SapBuildingPart = predicted.sap_building_parts[0]
+        members = comparables.members
+        similarity: list[float] = _similarity_weights(members)
+        geo: list[float] = _geo_weights(target_coordinates, members)
+        similarity_geo: list[float] = _combine(similarity, geo)
+        for attr in _MAIN_PART_CATEGORICALS:
+            if attr in _RECENCY_WEIGHTED_CATEGORICALS:
+                mode = _recency_weighted_mode(members, attr)
+            else:
+                weights = (
+                    similarity_geo
+                    if attr in _GEO_WEIGHTED_CATEGORICALS
+                    else similarity
+                )
+                mode = _weighted_mode(
+                    (_main_part_attr(c, attr) for c in members), weights
+                )
+            if mode is not None:
+                setattr(main, attr, mode)
+        floor_dims = main.sap_floor_dimensions
+        if floor_dims:
+            for attr in _FLOOR_DIM_CATEGORICALS:
+                floor_weights = (
+                    similarity_geo
+                    if attr in _GEO_WEIGHTED_CATEGORICALS
+                    else similarity
+                )
+                floor_mode = _weighted_int_mode(
+                    (_main_floor_attr(c, attr) for c in members), floor_weights
+                )
+                if floor_mode is not None:
+                    setattr(floor_dims[0], attr, floor_mode)
+
+    @staticmethod
+    def _apply_overrides(
+        predicted: EpcPropertyData, target: PredictionTarget
+    ) -> None:
+        """Apply the known Landlord Overrides on top of the estimate — a known
+        value always wins over the cohort mode (ADR-0029)."""
+        if not predicted.sap_building_parts:
+            return
+        if target.wall_construction is not None:
+            predicted.sap_building_parts[0].wall_construction = (
+                target.wall_construction
+            )
+
+
+# The homogeneous categoricals carried directly on the main building part. Floor
+# categoricals live on the main floor dimension and glazing on the windows; both
+# are handled separately.
+_MAIN_PART_CATEGORICALS: tuple[str, ...] = (
+    "wall_construction",
+    "wall_insulation_type",
+    "construction_age_band",
+    "roof_construction",
+    "roof_insulation_thickness",
+)
+
+# Integer-coded categoricals on the main building part's ground-floor dimension.
+_FLOOR_DIM_CATEGORICALS: tuple[str, ...] = (
+    "floor_construction",
+    "floor_insulation",
+)
+
+# Categoricals whose physical value CHANGES over time (e.g. loft top-ups), so a
+# recent neighbour reflects the current state better than an old one — these take
+# a recency-WEIGHTED mode. Permanent categoricals (wall / age) take no recency
+# factor (only the similarity / geo weights below): recency-weighting them was
+# net-negative on the validation corpus (it discards data that is still valid).
+# `_RECENCY_TAU_YEARS` is the exponential decay constant (half-life
+# tau * ln 2 ≈ 2.8 years), chosen on the corpus (roof insulation +4pp / +12pp
+# on the fixture).
+_RECENCY_WEIGHTED_CATEGORICALS: frozenset[str] = frozenset(
+    {"roof_insulation_thickness"}
+)
+_RECENCY_TAU_YEARS: float = 4.0
+_DAYS_PER_YEAR: float = 365.0
+
+# Physical-similarity weighting of the categorical mode (ADR-0029 decision 5): a
+# comparable's vote decays exponentially with how far it sits from the cohort's
+# physical centre — floor area from the median, construction age from the
+# median band — so an outlier-sized or outlier-era neighbour can't sway the
+# mode. Scales chosen on the validation corpus (wall-insulation +2.8pp / roof
+# +1.1pp / floor-construction +2.4pp / floor-insulation +1.2pp; gate-safe, no
+# regression).
+_SIMILARITY_SIZE_SCALE_M2: float = 20.0
+_SIMILARITY_AGE_WEIGHT: float = 0.5
+_AGE_BAND_ORDER: str = "ABCDEFGHIJKL"
+
+# Geo-proximity weighting (#1227): a neighbour's vote decays with its haversine
+# distance to the target, so a closer neighbour counts for more. Applied only to
+# the components that showed a clear distance signal in the corpus — age band,
+# wall + floor construction, glazing (homes built / retrofitted together cluster);
+# roof construction showed no decay, so it is excluded. `_GEO_SCALE_KM` is the
+# kernel length-scale (chosen on the corpus). Off when the target has no
+# coordinates; neutral for a neighbour with none (never penalised for missing
+# data). floor_construction lives on the floor dimension but shares this set.
+_GEO_SCALE_KM: float = 0.1
+_GEO_WEIGHTED_CATEGORICALS: frozenset[str] = frozenset(
+    {"construction_age_band", "wall_construction", "floor_construction"}
+)
+
+
+def _main_part_attr(
+    comparable: ComparableProperty, attr: str
+) -> Optional[Union[int, str]]:
+    parts: list[SapBuildingPart] = comparable.epc.sap_building_parts
+    return getattr(parts[0], attr) if parts else None
+
+
+def _main_floor_attr(comparable: ComparableProperty, attr: str) -> Optional[int]:
+    parts: list[SapBuildingPart] = comparable.epc.sap_building_parts
+    if not parts:
+        return None
+    dims = parts[0].sap_floor_dimensions
+    value: Optional[int] = getattr(dims[0], attr) if dims else None
+    return value
+
+
+def _geo_weighted_floor_area(
+    members: tuple[ComparableProperty, ...],
+    target_coordinates: Optional[Coordinates],
+) -> float:
+    """The cohort's geo-proximity-weighted median floor area — the point estimate
+    of the target's size. The median minimises mean absolute deviation, so it is
+    the best single guess for an unknown neighbour's area; geo-weighting it leans
+    the estimate toward the nearer neighbours, because homes built together share
+    a footprint (the same street signal that already weights age / wall, #1227).
+    Reduces exactly to the plain median when geo weighting is off (no target
+    coordinates ⇒ uniform weights), preserving the MAD-minimising guarantee. Set
+    independently of the structural template (the calculator derives heat loss
+    from the building-part geometry, not this scalar, so the two need not agree)."""
+    weights: list[float] = _geo_weights(target_coordinates, members)
+    return _weighted_median(
+        [
+            (comparable.epc.total_floor_area_m2, weight)
+            for comparable, weight in zip(members, weights)
+        ]
+    )
+
+
+def _weighted_median(values_weights: list[tuple[float, float]]) -> float:
+    """The weighted median of (value, weight) pairs: the smallest value at which
+    the cumulative weight reaches half the total. When a value's weight splits the
+    total exactly in half, the two straddling values are averaged — so with
+    uniform weights this reduces exactly to `statistics.median` (including the
+    even-count midpoint average). Assumes a non-empty input."""
+    ordered: list[tuple[float, float]] = sorted(values_weights)
+    half: float = sum(weight for _, weight in ordered) / 2
+    cumulative: float = 0.0
+    for index, (value, weight) in enumerate(ordered):
+        cumulative += weight
+        if cumulative > half:
+            return value
+        if cumulative == half and index + 1 < len(ordered):
+            return (value + ordered[index + 1][0]) / 2
+    return ordered[-1][0]
+
+
+def _age_band_index(comparable: ComparableProperty) -> Optional[int]:
+    """The main building part's construction-age-band position (A=0 … L=11), or
+    None when no recognisable band is lodged."""
+    band = _main_part_attr(comparable, "construction_age_band")
+    if isinstance(band, str) and band in _AGE_BAND_ORDER:
+        return _AGE_BAND_ORDER.index(band)
+    return None
+
+
+def _similarity_weights(members: tuple[ComparableProperty, ...]) -> list[float]:
+    """A physical-similarity weight per comparable (ADR-0029 decision 5): the
+    product of an exponential decay in its floor-area distance from the cohort
+    median and in its age-band distance from the cohort's median band. A neighbour
+    with no recognisable age band contributes a neutral weight on that axis, so it
+    is never penalised for absent data; the size axis assumes every member lodges
+    a floor area. Aligned with `members` index-for-index."""
+    if not members:
+        return []
+    median_area: float = statistics.median(
+        c.epc.total_floor_area_m2 for c in members
+    )
+    age_indices: list[Optional[int]] = [_age_band_index(c) for c in members]
+    present_ages: list[int] = [i for i in age_indices if i is not None]
+    median_age: Optional[float] = (
+        statistics.median(present_ages) if present_ages else None
+    )
+    weights: list[float] = []
+    for comparable, age_index in zip(members, age_indices):
+        size_term: float = math.exp(
+            -abs(comparable.epc.total_floor_area_m2 - median_area)
+            / _SIMILARITY_SIZE_SCALE_M2
+        )
+        age_term: float = (
+            math.exp(-_SIMILARITY_AGE_WEIGHT * abs(age_index - median_age))
+            if median_age is not None and age_index is not None
+            else 1.0
+        )
+        weights.append(size_term * age_term)
+    return weights
+
+
+def _weighted_mode(
+    values: Iterable[Optional[Union[int, str]]], weights: list[float]
+) -> Optional[Union[int, str]]:
+    """The value with the greatest total weight (ties broken by first
+    appearance), or None when no non-None value is present."""
+    totals: dict[Union[int, str], float] = defaultdict(float)
+    for value, weight in zip(values, weights):
+        if value is not None:
+            totals[value] += weight
+    if not totals:
+        return None
+    return max(totals, key=lambda value: totals[value])
+
+
+def _weighted_int_mode(
+    values: Iterable[Optional[int]], weights: list[float]
+) -> Optional[int]:
+    """`_weighted_mode` narrowed to int-coded fields (keeps pyright strict happy
+    when the target attribute is typed `Optional[int]`)."""
+    totals: dict[int, float] = defaultdict(float)
+    for value, weight in zip(values, weights):
+        if value is not None:
+            totals[value] += weight
+    if not totals:
+        return None
+    return max(totals, key=lambda value: totals[value])
+
+
+def _modal_share(
+    values: Iterable[Optional[Union[int, str]]],
+) -> Optional[float]:
+    """The most common value's share of the present (non-None) values — a 0..1
+    measure of how much the cohort agrees — or None when none are present."""
+    present = [v for v in values if v is not None]
+    if not present:
+        return None
+    modal_count: int = Counter(present).most_common(1)[0][1]
+    return modal_count / len(present)
+
+
+def _combine(left: list[float], right: list[float]) -> list[float]:
+    """Element-wise product of two aligned weight vectors (compose weighting
+    factors, e.g. similarity × geo-proximity)."""
+    return [a * b for a, b in zip(left, right)]
+
+
+def _haversine_km(origin: Coordinates, point: Coordinates) -> float:
+    """Great-circle distance in km between two WGS84 points."""
+    radius_km = 6371.0
+    lat1, lat2 = math.radians(origin.latitude), math.radians(point.latitude)
+    delta_lat = lat2 - lat1
+    delta_lon = math.radians(point.longitude - origin.longitude)
+    h = (
+        math.sin(delta_lat / 2) ** 2
+        + math.cos(lat1) * math.cos(lat2) * math.sin(delta_lon / 2) ** 2
+    )
+    return 2 * radius_km * math.asin(min(1.0, math.sqrt(h)))
+
+
+def _geo_weights(
+    target: Optional[Coordinates], members: tuple[ComparableProperty, ...]
+) -> list[float]:
+    """A geo-proximity weight per comparable — an exponential decay in haversine
+    distance to the target. All-neutral (1.0) when the target has no coordinates
+    (geo weighting off), and neutral (1.0) for any single neighbour that has none
+    (never penalised for absent data); aligned with `members` index-for-index."""
+    if target is None:
+        return [1.0] * len(members)
+    weights: list[float] = []
+    for comparable in members:
+        coordinates = comparable.coordinates
+        if coordinates is None:
+            weights.append(1.0)
+        else:
+            weights.append(
+                math.exp(-_haversine_km(target, coordinates) / _GEO_SCALE_KM)
+            )
+    return weights
+
+
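+def _recency_weights(members: tuple[ComparableProperty, ...]) -> list[float]:
+    """A recency weight per comparable — exponential decay in the cert's age
+    relative to the newest in the cohort, so newer neighbours dominate. All-equal
+    (1.0) when no registration dates are lodged; otherwise a member with no date
+    is aged from `date.min`, so its weight is effectively zero (unlike the geo
+    weights, missing data is penalised here). Aligned with `members`."""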
+    newest: date = max(
+        (c.registration_date or date.min for c in members), default=date.min
+    )
+    return [
+        math.exp(
+            -((newest - (c.registration_date or date.min)).days / _DAYS_PER_YEAR)
+            / _RECENCY_TAU_YEARS
+        )
+        for c in members
+    ]
+
+
+def _recency_weighted_choice(
+    members: tuple[ComparableProperty, ...],
+    value_of: Callable[[ComparableProperty], Optional[Union[int, str]]],
+) -> Optional[Union[int, str]]:
+    """The recency-weighted cohort mode of a per-comparable value: each
+    neighbour's vote decays exponentially with the cert's age relative to the
+    newest in the cohort, so newer neighbours dominate and a stale majority can't
+    outvote the current state. Falls back to a plain mode when no registration
+    dates are lodged (all ages 0 ⇒ equal weight). Returns None when no comparable
+    supplies a value. Used for the time-varying components — those upgraded over a
+    dwelling's life (loft top-ups)."""
+    return _weighted_mode(
+        (value_of(comparable) for comparable in members),
+        _recency_weights(members),
+    )
+
+
+def _recency_weighted_mode(
+    members: tuple[ComparableProperty, ...], attr: str
+) -> Optional[Union[int, str]]:
+    """`_recency_weighted_choice` over a main building-part attribute."""
+    return _recency_weighted_choice(
+        members, lambda comparable: _main_part_attr(comparable, attr)
+    )
+
+
+def _comparable_modal_glazing(
+    comparable: ComparableProperty,
+) -> Optional[Union[int, str]]:
+    """A comparable's modal glazing type — the most common across its windows, or
+    None when it lodges none. One glazing signal per neighbour, robust to a single
+    odd window, matching how the harness scores `modal_glazing_type`."""
+    types = [window.glazing_type for window in comparable.epc.sap_windows]
+    return Counter(types).most_common(1)[0][0] if types else None
+
+
+def _main_heating_detail(comparable: ComparableProperty) -> Optional[MainHeatingDetail]:
+    """The primary heating system's detail row, or None when none is lodged."""
+    details = comparable.epc.sap_heating.main_heating_details
+    return details[0] if details else None
+
+
+def _heating_signature(
+    comparable: ComparableProperty,
+) -> Optional[tuple[Union[int, str], Optional[int], bool]]:
+    """The donor-matching signature — main fuel + heating category + cylinder
+    presence: the coarse identity of the heating system. None when no main heating
+    system is lodged, so the comparable is not a donor candidate."""
+    detail = _main_heating_detail(comparable)
+    if detail is None:
+        return None
+    return (
+        detail.main_fuel_type,
+        detail.main_heating_category,
+        comparable.epc.has_hot_water_cylinder,
+    )
+
+
+def _heating_donor(
+    members: tuple[ComparableProperty, ...],
+) -> Optional[ComparableProperty]:
+    """The coherent heating donor: among the comparables whose heating signature
+    is the cohort mode (a count tie between signatures goes to the first seen),
+    the most recent cert, then the highest certificate number for determinism.
+    None when no neighbour lodges a heating system."""
+    signed = [(c, _heating_signature(c)) for c in members]
+    signatures = [sig for _, sig in signed if sig is not None]
+    if not signatures:
+        return None
+    modal = Counter(signatures).most_common(1)[0][0]
+    matches = [c for c, sig in signed if sig == modal]
+    return max(
+        matches,
+        key=lambda c: (c.registration_date or date.min, c.certificate_number),
+    )
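Two aggregation primitives in the module above, `_weighted_median` and `_weighted_mode`, are easiest to check on toy numbers. The sketch below is a minimal standalone restatement that assumes plain floats and strings in place of the `EpcPropertyData` fields. The floor areas and wall labels are invented and are not RdSAP codes. It shows that uniform weights reduce the weighted median to `statistics.median`, as the docstring promises, and how extra weight on a subset shifts both aggregates.

```python
import statistics
from collections import defaultdict
from collections.abc import Hashable, Iterable
from typing import Optional


def weighted_median(pairs: list[tuple[float, float]]) -> float:
    """Smallest value whose cumulative weight passes half the total; a value that
    splits the weight exactly in half is averaged with the next one up."""
    ordered = sorted(pairs)
    half = sum(weight for _, weight in ordered) / 2
    cumulative = 0.0
    for index, (value, weight) in enumerate(ordered):
        cumulative += weight
        if cumulative > half:
            return value
        if cumulative == half and index + 1 < len(ordered):
            return (value + ordered[index + 1][0]) / 2
    return ordered[-1][0]


def weighted_mode(
    values: Iterable[Optional[Hashable]], weights: list[float]
) -> Optional[Hashable]:
    """Value with the largest summed weight, skipping None. Ties go to the value
    seen first: dicts keep insertion order and max() returns the first maximum."""
    totals: dict[Hashable, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        if value is not None:
            totals[value] += weight
    return max(totals, key=totals.__getitem__) if totals else None


areas = [62.0, 70.0, 85.0, 110.0]
# Uniform weights reproduce statistics.median, including the even-count average.
print(weighted_median([(a, 1.0) for a in areas]), statistics.median(areas))  # → 77.5 77.5
# Up-weighting the two smaller (say, nearer) homes pulls the estimate down.
print(weighted_median(list(zip(areas, [3.0, 3.0, 1.0, 1.0]))))  # → 70.0

walls = ["cavity", "solid brick", "solid brick", "cavity", None]
print(weighted_mode(walls, [1.0, 1.0, 1.0, 1.0, 9.0]))  # → cavity (2.0 vs 2.0 tie)
print(weighted_mode(walls, [0.2, 1.0, 1.0, 0.2, 9.0]))  # → solid brick
```

The tie rule means that input order decides the winner when weights are uniform. The heavy weight on the trailing `None` never counts, because absent values cast no vote.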
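All three vote weights in the module are exponential kernels, and `_combine` composes them by multiplication. The sketch below restates them as standalone functions of plain numbers. It reuses the diff's tuned constants (`_GEO_SCALE_KM`, `_RECENCY_TAU_YEARS`, `_SIMILARITY_SIZE_SCALE_M2`, `_SIMILARITY_AGE_WEIGHT`) but not its types, and the coordinates are invented. It is meant for checking the arithmetic, for example that a 4-year decay constant does give the "≈2.8-year half-life" quoted in the comments, and that a 100 m length-scale makes the geo kernel very local.

```python
import math
from datetime import date

# Constants copied from the diff; everything else here is illustrative.
GEO_SCALE_KM = 0.1
RECENCY_TAU_YEARS = 4.0
DAYS_PER_YEAR = 365.0
SIMILARITY_SIZE_SCALE_M2 = 20.0
SIMILARITY_AGE_WEIGHT = 0.5
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, as in `_haversine_km`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against h creeping past 1.0 through rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def geo_weight(distance_km: float) -> float:
    # e^-1 ≈ 0.37 at 100 m and e^-3 ≈ 0.05 at 300 m: a very local kernel.
    return math.exp(-distance_km / GEO_SCALE_KM)


def recency_weight(newest: date, registered: date) -> float:
    years = (newest - registered).days / DAYS_PER_YEAR
    return math.exp(-years / RECENCY_TAU_YEARS)


def similarity_weight(
    area_m2: float, median_area_m2: float, age_gap_bands: float
) -> float:
    # Product of a size decay and an age-band decay, as in `_similarity_weights`.
    size_term = math.exp(-abs(area_m2 - median_area_m2) / SIMILARITY_SIZE_SCALE_M2)
    age_term = math.exp(-SIMILARITY_AGE_WEIGHT * abs(age_gap_bands))
    return size_term * age_term


# Half-life of the recency kernel: tau * ln 2.
print(round(RECENCY_TAU_YEARS * math.log(2), 2))  # → 2.77
# A neighbour about 100 m north (0.0009 degrees of latitude) keeps ~37% of its vote.
print(round(geo_weight(haversine_km(53.4800, -2.2400, 53.4809, -2.2400)), 2))  # → 0.37
```

A neighbour 20 m² off the median area and two age bands off the median band gets e^-1 × e^-1 = e^-2 ≈ 0.14 of a perfectly typical neighbour's similarity vote.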
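`_heating_donor` copies a whole heating cluster from one neighbour instead of moding each field. The sketch below shows that two-step choice in isolation: the modal signature first, then the newest cert among the neighbours that hold it. It rests on several assumptions. `Neighbour` is a hypothetical flattened record in place of `ComparableProperty`, the fuel labels and category numbers are invented rather than register codes, and a `None` fuel stands in for "no main heating system lodged", which simplifies the diff's `main_heating_details` check.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

Signature = tuple[str, Optional[int], bool]


@dataclass(frozen=True)
class Neighbour:
    """Hypothetical flattened comparable: only what the donor choice reads."""

    certificate_number: str
    registration_date: Optional[date]
    main_fuel: Optional[str]  # None stands in for "no main heating system"
    heating_category: Optional[int]
    has_cylinder: bool


def signature(n: Neighbour) -> Optional[Signature]:
    if n.main_fuel is None:
        return None
    return (n.main_fuel, n.heating_category, n.has_cylinder)


def heating_donor(members: list[Neighbour]) -> Optional[Neighbour]:
    signed = [(n, signature(n)) for n in members]
    signatures = [sig for _, sig in signed if sig is not None]
    if not signatures:
        return None
    # Equal counts: Counter.most_common keeps first-encountered order.
    modal = Counter(signatures).most_common(1)[0][0]
    matches = [n for n, sig in signed if sig == modal]
    # The newest cert among the modal holders donates; cert number breaks date ties.
    return max(
        matches,
        key=lambda n: (n.registration_date or date.min, n.certificate_number),
    )


cohort = [
    Neighbour("A1", date(2015, 1, 1), "mains gas", 2, True),
    Neighbour("A2", date(2023, 6, 1), "mains gas", 2, True),
    Neighbour("A3", date(2024, 2, 1), "electricity", 7, False),
    Neighbour("A4", None, None, None, False),
]
print(heating_donor(cohort).certificate_number)  # → A2
```

A3 is the newest cert overall, but it loses because its electric system is not the cohort's modal signature. Recency only decides among the neighbours that already match the mode, which is what keeps the donated cluster both representative and internally coherent.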
--- a/domain/epc_prediction/prediction_comparison.py
+++ b/domain/epc_prediction/prediction_comparison.py
@ -0,0 +1,287 @@
+"""Per-Property prediction comparison for the EPC Prediction validation harness
+(ADR-0029).
+
+`compare_prediction` scores a predicted `EpcPropertyData` against the actual one
+on the accuracy signals the leave-one-out harness aggregates: classification
+matches on the key categoricals (wall / roof / floor construction + insulation,
+construction age band, heating, glazing, renewables) and residuals on the
+geometry (floor area, building-parts count, window area + count, door count).
+Pure — the SAP residual is computed in the runner, which has the calculator and
+the lodged SAP.
+"""
+
+from __future__ import annotations
+
+from collections import Counter
+from dataclasses import dataclass
+from typing import Optional
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    MainHeatingDetail,
+    SapBuildingPart,
+)
+
+
+@dataclass(frozen=True)
+class PredictionComparison:
+    """One Property's prediction accuracy: per-component classification hits +
+    geometry residuals (predicted − actual). `categorical_hits` maps a component
+    name to its hit: True / False, or `None` ("not applicable") when the actual
+    lodges no value there, so the harness can keep it out of the
+    classification-rate denominator rather than score a free win. Keyed by name
+    (not flat fields) so the component set can grow without reshaping the
+    runner — see ADR-0030 Component Accuracy."""
+
+    categorical_hits: dict[str, Optional[bool]]
+    floor_area_residual: float
+    building_parts_residual: int
+    window_count_residual: int
+    total_window_area_residual: float
+    door_count_residual: int
+
+
+def _main(epc: EpcPropertyData) -> SapBuildingPart:
+    return epc.sap_building_parts[0]
+
+
+def _main_floor_construction(epc: EpcPropertyData) -> Optional[int]:
+    """The main building part's ground-floor construction code, or None when no
+    floor dimension is lodged."""
+    dims = _main(epc).sap_floor_dimensions
+    return dims[0].floor_construction if dims else None
+
+
+def _classify(predicted: object, actual: object) -> Optional[bool]:
+    """A categorical hit: None ("not applicable") when the actual is absent,
+    else whether the predicted value matches it."""
+    if actual is None:
+        return None
+    return predicted == actual
+
+
+# RdSAP construction age bands, oldest → newest. Adjacent bands carry near-
+# identical U-values, so an off-by-one is treated as a (SAP-neutral) ±1 hit.
+_AGE_BAND_ORDER: str = "ABCDEFGHIJKL"
+
+
+def _age_band_within_one(predicted: object, actual: object) -> Optional[bool]:
+    """A ±1-band age hit: None when the actual is absent, True on an exact or
+    adjacent-band match, else False (issue #1222 — exact match overstates the
+    SAP impact of age-band misses)."""
+    if actual is None:
+        return None
+    if predicted == actual:
+        return True
+    if (
+        isinstance(predicted, str)
+        and isinstance(actual, str)
+        and predicted in _AGE_BAND_ORDER
+        and actual in _AGE_BAND_ORDER
+    ):
+        return (
+            abs(_AGE_BAND_ORDER.index(predicted) - _AGE_BAND_ORDER.index(actual))
+            <= 1
+        )
+    return False
+
+
+# RdSAP roof-insulation thickness buckets, thinnest → thickest. Uninsulated is
+# lodged as either 0 or "NI" (not insulated), so both map to the bottom rung;
+# "ND" (no data) is off the scale entirely. Adjacent buckets carry near-identical
+# roof U-values, so an off-by-one bucket is treated as a (SAP-neutral) ±1 hit —
+# the same measurement honesty as the construction age band (issue #1222).
+_ROOF_THICKNESS_ORDINAL: dict[object, int] = {
+    0: 0,
+    "NI": 0,
+    "12mm": 1,
+    "25mm": 2,
+    "50mm": 3,
+    "75mm": 4,
+    "100mm": 5,
+    "125mm": 6,
+    "150mm": 7,
+    "175mm": 8,
+    "200mm": 9,
+    "225mm": 10,
+    "250mm": 11,
+    "270mm": 12,
+    "300mm": 13,
+    "350mm": 14,
+    "400mm+": 15,
+}
+
+
+def _roof_insulation_within_one(
+    predicted: object, actual: object
+) -> Optional[bool]:
+    """A ±1-bucket roof-insulation hit: None when the actual is absent, True on an
+    exact or adjacent-bucket match, else False. Off the ordered scale (e.g. the
+    "ND" no-data category) only an exact match counts."""
+    if actual is None:
+        return None
+    if predicted == actual:
+        return True
+    pred_rung = _ROOF_THICKNESS_ORDINAL.get(predicted)
+    actual_rung = _ROOF_THICKNESS_ORDINAL.get(actual)
+    if pred_rung is None or actual_rung is None:
+        return False
+    return abs(pred_rung - actual_rung) <= 1
+
+
+def _main_heating_detail(epc: EpcPropertyData) -> Optional[MainHeatingDetail]:
+    """The primary heating system's detail row, or None when none is lodged."""
+    details = epc.sap_heating.main_heating_details
+    return details[0] if details else None
+
+
+def _heating_hits(
+    predicted: EpcPropertyData, actual: EpcPropertyData
+) -> dict[str, Optional[bool]]:
+    """Classification hits for the heating components — the dominant SAP lever
+    (ADR-0030). Main-system fields come off the primary `MainHeatingDetail`;
+    hot-water + secondary fields off `SapHeating`."""
+    pred_main = _main_heating_detail(predicted)
+    actual_main = _main_heating_detail(actual)
+    pred_h = predicted.sap_heating
+    actual_h = actual.sap_heating
+    return {
+        "heating_main_fuel": _classify(
+            getattr(pred_main, "main_fuel_type", None),
+            getattr(actual_main, "main_fuel_type", None),
+        ),
+        "heating_main_category": _classify(
+            getattr(pred_main, "main_heating_category", None),
+            getattr(actual_main, "main_heating_category", None),
+        ),
+        "heating_main_control": _classify(
+            getattr(pred_main, "main_heating_control", None),
+            getattr(actual_main, "main_heating_control", None),
+        ),
+        "water_heating_fuel": _classify(
+            pred_h.water_heating_fuel, actual_h.water_heating_fuel
+        ),
+        "water_heating_code": _classify(
+            pred_h.water_heating_code, actual_h.water_heating_code
+        ),
+        "has_hot_water_cylinder": _classify(
+            predicted.has_hot_water_cylinder, actual.has_hot_water_cylinder
+        ),
+        "cylinder_insulation_type": _classify(
+            pred_h.cylinder_insulation_type, actual_h.cylinder_insulation_type
+        ),
+        "secondary_heating_type": _classify(
+            pred_h.secondary_heating_type, actual_h.secondary_heating_type
+        ),
+    }
+
+
+def _modal_glazing_type(epc: EpcPropertyData) -> Optional[object]:
+    """The most common glazing type across the dwelling's windows, or None when
+    none are lodged. A single dwelling-level glazing signal, robust to one
+    odd window."""
+    types = [w.glazing_type for w in epc.sap_windows]
+    return Counter(types).most_common(1)[0][0] if types else None
+
+
+def _has_pv(epc: EpcPropertyData) -> bool:
+    """True iff the dwelling lodges any photovoltaic supply (either path)."""
+    source = epc.sap_energy_source
+    return source.photovoltaic_supply is not None or bool(
+        source.photovoltaic_arrays
+    )
+
+
+def _renewables_and_fabric_hits(
+    predicted: EpcPropertyData, actual: EpcPropertyData
+) -> dict[str, Optional[bool]]:
+    """Hits for the remaining fabric-insulation, glazing and renewables
+    components (ADR-0030). Presence flags (room-in-roof, PV, solar) are always
+    applicable — predicting absence when present is a real miss."""
+    return {
+        "roof_insulation_thickness": _classify(
+            _main(predicted).roof_insulation_thickness,
+            _main(actual).roof_insulation_thickness,
+        ),
+        "roof_insulation_thickness_pm1": _roof_insulation_within_one(
+            _main(predicted).roof_insulation_thickness,
+            _main(actual).roof_insulation_thickness,
+        ),
+        "floor_insulation": _classify(
+            _main_floor_insulation(predicted), _main_floor_insulation(actual)
+        ),
+        "has_room_in_roof": _classify(
+            _main(predicted).sap_room_in_roof is not None,
+            _main(actual).sap_room_in_roof is not None,
+        ),
+        "modal_glazing_type": _classify(
+            _modal_glazing_type(predicted), _modal_glazing_type(actual)
+        ),
+        "has_pv": _classify(_has_pv(predicted), _has_pv(actual)),
+        "solar_water_heating": _classify(
+            predicted.solar_water_heating, actual.solar_water_heating
+        ),
+    }
+
+
+def _main_floor_insulation(epc: EpcPropertyData) -> Optional[int]:
+    """The main building part's ground-floor insulation code, or None when no
+    floor dimension is lodged."""
+    dims = _main(epc).sap_floor_dimensions
+    return dims[0].floor_insulation if dims else None
+
+
+def _total_window_area(epc: EpcPropertyData) -> float:
+    """Total glazed area: the sum of width × height over the lodged windows."""
+    return sum(w.window_width * w.window_height for w in epc.sap_windows)
+
+
+def compare_prediction(
+    predicted: EpcPropertyData, actual: EpcPropertyData
+) -> PredictionComparison:
+    """Compare a predicted picture against the actual one, field by field. All
+    residuals are signed, predicted − actual."""
+    fabric_hits: dict[str, Optional[bool]] = {
+        "wall_construction": _classify(
+            _main(predicted).wall_construction,
+            _main(actual).wall_construction,
+        ),
+        "wall_insulation_type": _classify(
+            _main(predicted).wall_insulation_type,
+            _main(actual).wall_insulation_type,
+        ),
+        "construction_age_band": _classify(
+            _main(predicted).construction_age_band,
+            _main(actual).construction_age_band,
+        ),
+        "construction_age_band_pm1": _age_band_within_one(
+            _main(predicted).construction_age_band,
+            _main(actual).construction_age_band,
+        ),
+        "roof_construction": _classify(
+            _main(predicted).roof_construction,
+            _main(actual).roof_construction,
+        ),
+        "floor_construction": _classify(
+            _main_floor_construction(predicted),
+            _main_floor_construction(actual),
+        ),
+    }
+    return PredictionComparison(
+        categorical_hits={
+            **fabric_hits,
+            **_heating_hits(predicted, actual),
+            **_renewables_and_fabric_hits(predicted, actual),
+        },
+        floor_area_residual=(
+            predicted.total_floor_area_m2 - actual.total_floor_area_m2
+        ),
+        building_parts_residual=(
+            len(predicted.sap_building_parts) - len(actual.sap_building_parts)
+        ),
+        window_count_residual=(
+            len(predicted.sap_windows) - len(actual.sap_windows)
+        ),
+        total_window_area_residual=(
+            _total_window_area(predicted) - _total_window_area(actual)
+        ),
+        door_count_residual=predicted.door_count - actual.door_count,
+    )
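The comparison reduces every component to a three-state hit: `True`, `False`, or `None` for "not applicable", which `ComponentAccuracy` later excludes from the total. `_classify`, `_age_band_within_one` and `_roof_insulation_within_one` are defined outside this excerpt, so the sketch below uses hypothetical stand-ins (`classify`, `within_one_band`, `modal`). It assumes a hit is `None` exactly when the actual value is not lodged. Presence flags are plain booleans, so they are never `None` and always count, which matches the `_renewables_and_fabric_hits` docstring.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def classify(predicted: object, actual: object) -> Optional[bool]:
    """Hypothetical three-state hit: None when the actual value is not lodged
    (not applicable, so excluded from the total), else an exact-match check."""
    if actual is None:
        return None
    return predicted == actual


def within_one_band(
    predicted: Optional[str], actual: Optional[str], bands: Sequence[str]
) -> Optional[bool]:
    """Hypothetical +/-1 tolerance over an ordered list of band codes."""
    if actual is None or actual not in bands:
        return None
    if predicted is None or predicted not in bands:
        return False
    return abs(bands.index(predicted) - bands.index(actual)) <= 1


def modal(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Most common value, or None when empty; a tie goes to the value seen first."""
    return Counter(values).most_common(1)[0][0] if values else None


# Usage: hits for one held-out target (band codes are made up for the example).
AGE_BANDS = ["A", "B", "C", "D"]
hits = {
    "wall_construction": classify("solid", "cavity"),  # miss
    "roof_construction": classify("pitched", None),  # not applicable
    "construction_age_band_pm1": within_one_band("B", "C", AGE_BANDS),  # hit
    "has_pv": classify(False, True),  # presence flag: always applicable
}
```

A dwelling-level mode keeps one odd window from flipping the glazing hit, at the cost of ignoring mixed glazing entirely.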
--- a/domain/epc_prediction/prediction_target.py
+++ b/domain/epc_prediction/prediction_target.py
@ -0,0 +1,68 @@
+"""Assemble an EPC-less Property's PredictionTarget, with the eligibility gate
+(ADR-0031 slice-5d).
+
+A `PredictionTarget` needs the target's own known inputs: its postcode (to find
+the cohort), coordinates (to distance-weight it), and the Landlord-Override
+attributes that condition selection — `property_type` (the HARD cohort filter),
+plus optional `built_form` / `wall_construction`. `property_type` is required: a
+Property whose type is unknown is gated out (never sized from a mixed-type
+cohort), so the builder returns None and the caller skips prediction.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Optional, Union
+
+from domain.geospatial.coordinates import Coordinates
+from domain.property.property import PropertyIdentity
+
+
+@dataclass(frozen=True)
+class PredictionTarget:
+    """The known inputs for the Property whose EPC we are predicting — the fields
+    guaranteed at ingestion (plus any Landlord Overrides, added as they're used).
+    `built_form` is often but not always known.
+    """
+
+    postcode: str
+    property_type: str
+    built_form: Optional[str] = None
+    # A known Landlord Override (e.g. solid brick) conditions cohort selection —
+    # matching comparables are emphasised while enough remain (ADR-0029).
+    wall_construction: Optional[Union[int, str]] = None
+    # The target Property's own coordinates (resolved from its UPRN), against
+    # which neighbours are distance-weighted. None disables geo-weighting.
+    coordinates: Optional[Coordinates] = None
+
+
+@dataclass(frozen=True)
+class PredictionTargetAttributes:
+    """The target Property's own attributes resolved from Landlord Overrides,
+    needed to find and condition its cohort. `property_type` is the code-space
+    value the cohort EPCs carry (e.g. "2"); None means it could not be resolved,
+    which gates the Property out of prediction."""
+
+    property_type: Optional[str]
+    built_form: Optional[str] = None
+    wall_construction: Optional[Union[int, str]] = None
+
+
+def build_prediction_target(
+    identity: PropertyIdentity,
+    coordinates: Optional[Coordinates],
+    attributes: PredictionTargetAttributes,
+) -> Optional[PredictionTarget]:
+    """The PredictionTarget for an EPC-less Property, or None when ineligible —
+    `property_type` is the hard cohort filter, so a Property whose type is unknown
+    is gated out of prediction (ADR-0031) rather than sized from a mixed-type
+    cohort."""
+    if attributes.property_type is None:
+        return None
+    return PredictionTarget(
+        postcode=identity.postcode,
+        property_type=attributes.property_type,
+        built_form=attributes.built_form,
+        wall_construction=attributes.wall_construction,
+        coordinates=coordinates,
+    )
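`build_prediction_target` gates on `property_type` because it is the hard cohort filter: without it, a target would be sized from a mixed-type cohort. A minimal sketch of that gate and the downstream filter, assuming a trimmed hypothetical `Target` and cohort rows as plain dicts. The real `select_comparables` also applies the relax ladder and weighting, which this does not attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical, trimmed stand-in for PredictionTarget."""

    postcode: str
    property_type: str
    built_form: Optional[str] = None


def build_target(
    postcode: str, property_type: Optional[str], built_form: Optional[str] = None
) -> Optional[Target]:
    # Mirrors the gate: an unknown property_type means no target at all.
    if property_type is None:
        return None
    return Target(postcode, property_type, built_form)


def hard_filter(target: Target, cohort: list[dict[str, str]]) -> list[dict[str, str]]:
    # property_type is a HARD filter: other types never enter the cohort.
    return [c for c in cohort if c.get("property_type") == target.property_type]


# Usage: codes are code-space values like the "2" in the docstring above.
cohort = [
    {"cert": "c1", "property_type": "0"},
    {"cert": "c2", "property_type": "2"},
    {"cert": "c3", "property_type": "2"},
]
```

Returning `None` rather than raising keeps the gate cheap for the caller: ingestion simply skips prediction for that Property.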
--- a/domain/epc_prediction/validation.py
+++ b/domain/epc_prediction/validation.py
@ -0,0 +1,160 @@
+"""Component Accuracy aggregation for EPC Prediction (ADR-0030).
+
+The leave-one-out scorer, calculator-FREE on purpose: it holds out each SAP 10.2
+target, predicts it from its (all-vintage) Comparable Properties, and aggregates
+the per-component classification hits + geometry residuals from
+`compare_prediction`. This is the *primary*, calculator-independent signal — the
+end-to-end SAP / carbon / PE check (which needs the calculator) is layered on top
+by the runner. The same function backs both the committed ratcheting gate and the
+offline national battle-test (one scorer, two harnesses).
+
+Pure given the loaded cohorts: corpus IO (reading + mapping cert payloads) is the
+caller's job, so this is directly unit-testable.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from datetime import date
+from typing import Iterable, Iterator, Optional, Sequence
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.epc_prediction.comparable_properties import (
+    ComparableProperty,
+    select_comparables,
+)
+from domain.epc_prediction.epc_prediction import EpcPrediction
+from domain.epc_prediction.prediction_comparison import compare_prediction
+from domain.epc_prediction.prediction_target import PredictionTarget
+
+# Only SAP 10.2 certs are valid held-out targets (ADR-0030) — the only vintage
+# with full-fidelity lodged components. The source cohort keeps all vintages.
+_SAP_10_2: float = 10.2
+
+
+def _empty_classification() -> dict[str, list[int]]:
+    return {}
+
+
+def _empty_residuals() -> dict[str, list[float]]:
+    return {}
+
+
+@dataclass
+class ComponentAccuracy:
+    """Aggregated leave-one-out Component Accuracy over a corpus.
+
+    `classification` maps a component name to [hits, applicable-total] (a
+    not-applicable `None` hit is excluded from the total); `residuals` maps a
+    numeric component to its signed (predicted − actual) values. `targets` counts
+    the held-out SAP 10.2 properties scored.
+    """
+
+    classification: dict[str, list[int]] = field(
+        default_factory=_empty_classification
+    )
+    residuals: dict[str, list[float]] = field(default_factory=_empty_residuals)
+    targets: int = 0
+
+    def rate(self, component: str) -> Optional[float]:
+        """The classification hit-rate for a component, or None when nothing was
+        applicable."""
+        hits, total = self.classification.get(component, [0, 0])
+        return hits / total if total else None
+
+    def mean_abs_residual(self, component: str) -> Optional[float]:
+        """Mean absolute residual for a numeric component, or None when empty."""
+        values = self.residuals.get(component, [])
+        return sum(abs(v) for v in values) / len(values) if values else None
+
+
+def _recency_key(comparable: ComparableProperty) -> tuple[date, str]:
+    return (
+        comparable.registration_date or date.min,
+        comparable.certificate_number,
+    )
+
+
+def _latest_per_address(cohort: Sequence[ComparableProperty]) -> list[ComparableProperty]:
+    """One held-out property per address — the latest cert, the best ground
+    truth. Comparables with no address each stand alone."""
+    latest: dict[str, ComparableProperty] = {}
+    standalone: list[ComparableProperty] = []
+    for c in cohort:
+        if c.address is None:
+            standalone.append(c)
+        elif c.address not in latest or _recency_key(c) > _recency_key(
+            latest[c.address]
+        ):
+            latest[c.address] = c
+    return list(latest.values()) + standalone
+
+
+def iter_predictions(
+    cohorts: Iterable[Sequence[ComparableProperty]],
+    *,
+    target_sap_version: float = _SAP_10_2,
+) -> Iterator[tuple[EpcPropertyData, EpcPropertyData]]:
+    """Yield `(predicted, actual)` for every SAP-`target_sap_version` held-out
+    target across the cohorts — the single leave-one-out orchestration the
+    Component Accuracy scorer and the runner's calculator end-to-end both consume
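+    (ADR-0030: one scorer, two harnesses). A target is held out by whole address,
+    so a re-lodgement can't leak; an address-less target excludes only itself.
+    Each is predicted from its all-vintage cohort."""
+    predictor = EpcPrediction()
+    for cohort in cohorts:
+        for held_out in _latest_per_address(cohort):
+            if held_out.epc.sap_version != target_sap_version:
+                continue
+            # Drop the held-out cert itself (even when it has no address) and,
+            # when it has one, every other cert lodged at that address.
+            others = [
+                c
+                for c in cohort
+                if c is not held_out
+                and (held_out.address is None or c.address != held_out.address)
+            ]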
+            actual = held_out.epc
+            target = PredictionTarget(
+                postcode=actual.postcode,
+                property_type=actual.property_type or "",
+                built_form=actual.built_form,
+                coordinates=held_out.coordinates,
+            )
+            comparables = select_comparables(target, others)
+            if not comparables.members:
+                continue
+            yield predictor.predict(target, comparables), actual
+
+
+def evaluate_component_accuracy(
+    cohorts: Iterable[Sequence[ComparableProperty]],
+    *,
+    target_sap_version: float = _SAP_10_2,
+) -> ComponentAccuracy:
+    """Score Component Accuracy by leave-one-out over each postcode cohort —
+    aggregating the `compare_prediction` hits + residuals across every held-out
+    SAP-`target_sap_version` target. Calculator-free (the primary signal)."""
+    accuracy = ComponentAccuracy()
+    for predicted, actual in iter_predictions(
+        cohorts, target_sap_version=target_sap_version
+    ):
+        comparison = compare_prediction(predicted, actual)
+        accuracy.targets += 1
+        for name, hit in comparison.categorical_hits.items():
+            counter = accuracy.classification.setdefault(name, [0, 0])
+            if hit is not None:
+                counter[1] += 1
+                counter[0] += int(hit)
+        accuracy.residuals.setdefault("floor_area", []).append(
+            comparison.floor_area_residual
+        )
+        accuracy.residuals.setdefault("window_count", []).append(
+            float(comparison.window_count_residual)
+        )
+        accuracy.residuals.setdefault("total_window_area", []).append(
+            comparison.total_window_area_residual
+        )
+        accuracy.residuals.setdefault("building_parts", []).append(
+            float(comparison.building_parts_residual)
+        )
+        accuracy.residuals.setdefault("door_count", []).append(
+            float(comparison.door_count_residual)
+        )
+    return accuracy
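The leave-one-out orchestration rests on three small moves: keep one held-out cert per address (the latest), exclude the target and its re-lodgements from the cohort it is predicted from, and tally `[hits, applicable-total]` per component. A self-contained sketch with a hypothetical `Cert` stand-in for `ComparableProperty`. An address-less target has to be excluded by identity, because matching on a `None` address alone would keep it in its own cohort.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, minimal stand-in for ComparableProperty."""

    certificate_number: str
    address: Optional[str]
    registration_date: date


def latest_per_address(cohort: list[Cert]) -> list[Cert]:
    latest: dict[str, Cert] = {}
    standalone: list[Cert] = []
    for c in cohort:
        if c.address is None:
            standalone.append(c)  # no address: each cert stands alone
        elif c.address not in latest or (
            (c.registration_date, c.certificate_number)
            > (latest[c.address].registration_date, latest[c.address].certificate_number)
        ):
            latest[c.address] = c
    return list(latest.values()) + standalone


def hold_out(cohort: list[Cert], held_out: Cert) -> list[Cert]:
    # Drop the target itself and, when it has an address, every re-lodgement of it.
    return [
        c
        for c in cohort
        if c is not held_out
        and (held_out.address is None or c.address != held_out.address)
    ]


def tally(per_target_hits: list[dict[str, Optional[bool]]]) -> dict[str, list[int]]:
    # [hits, applicable-total] per component; a None hit is not applicable.
    counts: dict[str, list[int]] = {}
    for hits in per_target_hits:
        for name, hit in hits.items():
            counter = counts.setdefault(name, [0, 0])
            if hit is not None:
                counter[1] += 1
                counter[0] += int(hit)
    return counts


# Usage: one re-lodged address plus two address-less certs.
c1 = Cert("c1", "1 HIGH ST", date(2015, 1, 1))
c2 = Cert("c2", "1 HIGH ST", date(2023, 1, 1))  # re-lodgement: the latest wins
c3 = Cert("c3", None, date(2021, 1, 1))
c4 = Cert("c4", None, date(2022, 1, 1))
cohort = [c1, c2, c3, c4]
```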
--- a/domain/property/property.py
+++ b/domain/property/property.py
@ -7,7 +7,7 @@ from datatypes.epc.domain.epc_property_data import EpcPropertyData
 from domain.geospatial.planning_restrictions import PlanningRestrictions
 from domain.property.site_notes import SiteNotes

-SourcePath = Literal["site_notes", "epc_with_overlay"]
+SourcePath = Literal["site_notes", "epc_with_overlay", "predicted"]


@dataclass(frozen=True)
@ -38,6 +38,11 @@ class Property:
    identity: PropertyIdentity
    epc: Optional[EpcPropertyData] = None
    site_notes: Optional[SiteNotes] = None
+    # A neighbour-synthesised EpcPropertyData (EPC Prediction gap-fill, ADR-0031),
+    # held in its own slot so it coexists with any lodged `epc` (provenance is
+    # structural). Used as the Effective EPC only as a last resort — when there is
+    # neither a lodged EPC nor Site Notes; a real source always wins.
+    predicted_epc: Optional[EpcPropertyData] = None
    # The current open-market value (a Property Valuation) — externally sourced
    # and mostly absent; feeds the Plan's Valuation Uplift £ forms (ADR-0018).
    current_market_value: Optional[float] = None
@ -62,8 +67,11 @@ class Property:
            return "site_notes"
        if self.epc is not None:
            return "epc_with_overlay"
+        if self.predicted_epc is not None:
+            return "predicted"
        raise ValueError(
-            "Property has neither Site Notes nor an EPC; no source path to model from"
+            "Property has neither Site Notes, an EPC, nor a predicted EPC; "
+            "no source path to model from"
        )

    @property
@ -71,10 +79,15 @@ class Property:
        """The EpcPropertyData the modelling pipeline scores against.

        Path 1: the Site Notes' surveyed data. Path 2: the public EPC (Landlord
-        Overrides overlay is a later slice — returned as-is for now).
+        Overrides overlay is a later slice — returned as-is for now). Path 3: a
+        neighbour-synthesised EPC (EPC Prediction gap-fill, ADR-0031), used only
+        when neither real source is present.
        """
        if self.source_path == "site_notes":
            assert self.site_notes is not None
            return self.site_notes.to_epc_property_data()
+        if self.source_path == "predicted":
+            assert self.predicted_epc is not None
+            return self.predicted_epc
        assert self.epc is not None
        return self.epc
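With `predicted_epc` in its own slot, a Property can hold a lodged EPC and a prediction at once, and `source_path` decides which one is modelled. A minimal sketch of that precedence using a hypothetical `Sources` stand-in, with string payloads in place of `SiteNotes` / `EpcPropertyData`:

```python
from dataclasses import dataclass
from typing import Literal, Optional

SourcePath = Literal["site_notes", "epc_with_overlay", "predicted"]


@dataclass(frozen=True)
class Sources:
    """Hypothetical stand-in for Property's three source slots."""

    site_notes: Optional[str] = None
    epc: Optional[str] = None
    predicted_epc: Optional[str] = None

    @property
    def source_path(self) -> SourcePath:
        # Precedence: surveyed Site Notes, then a lodged EPC, then a prediction.
        if self.site_notes is not None:
            return "site_notes"
        if self.epc is not None:
            return "epc_with_overlay"
        if self.predicted_epc is not None:
            return "predicted"
        raise ValueError("no source path to model from")

    @property
    def effective(self) -> str:
        # The payload the pipeline would score against, picked by source_path.
        slot = {
            "site_notes": self.site_notes,
            "epc_with_overlay": self.epc,
            "predicted": self.predicted_epc,
        }[self.source_path]
        assert slot is not None
        return slot
```

Because precedence is resolved at read time, a lodged EPC that arrives later would displace the prediction without anyone having to delete it.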
--- a/harness/epc_prediction_corpus.py
+++ b/harness/epc_prediction_corpus.py
@ -0,0 +1,129 @@
+"""Load a postcode-clustered EPC corpus into ComparableProperty cohorts (ADR-0030).
+
+The IO half of the EPC Prediction validation: read each postcode's cached cert
+payloads, map them through `EpcPropertyDataMapper.from_api_response`, and build
+`ComparableProperty`s carrying the register metadata (address + registration date) the
+leave-one-out scorer needs to dedupe re-lodgements and hold out a whole address.
+A cert the mapper rejects (unsupported schema, malformed) is skipped, never fatal.
+
+Shared by the committed-fixture gate, the local validation script, and the
+offline national battle-test — the corpus directory differs, the loading does
+not. Layout: `<dir>/<POSTCODE>/<cert>.json` + `<dir>/_index.json`.
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+from datetime import date
+from pathlib import Path
+from typing import Any, Optional
+
+from datatypes.epc.domain.mapper import EpcPropertyDataMapper
+from domain.epc_prediction.comparable_properties import ComparableProperty
+from domain.geospatial.coordinates import Coordinates
+
+# Identifying free-text fields blanked when freezing a payload into the committed
+# fixture (postcode is kept — it is coarse open data and the cohort key).
+_PII_BLANK_FIELDS = ("address_line_2", "address_line_3", "post_town")
+
+
+def load_corpus(corpus_dir: Path) -> list[list[ComparableProperty]]:
+    """Load every postcode cohort under `corpus_dir`. Returns one list of
+    Comparables per postcode (the unit the leave-one-out scorer iterates)."""
+    index_path = corpus_dir / "_index.json"
+    if not index_path.exists():
+        raise FileNotFoundError(
+            f"no corpus index at {index_path} — run a corpus fetch first"
+        )
+    index: dict[str, list[str]] = json.loads(index_path.read_text())
+    coordinates = load_coordinates(corpus_dir)
+    return [
+        _load_cohort(corpus_dir, postcode, certs, coordinates)
+        for postcode, certs in index.items()
+    ]
+
+
+def _load_cohort(
+    corpus_dir: Path,
+    postcode: str,
+    certs: list[str],
+    coordinates: dict[int, Coordinates],
+) -> list[ComparableProperty]:
+    cohort: list[ComparableProperty] = []
+    for cert in certs:
+        path = corpus_dir / postcode / f"{cert}.json"
+        if not path.exists():
+            continue
+        # A bad cert (unreadable JSON, unsupported schema, bad metadata) must not
+        # abort the sweep: skip it and carry on.
+        try:
+            raw: dict[str, Any] = json.loads(path.read_text())
+            epc = EpcPropertyDataMapper.from_api_response(raw)
+            uprn = _uprn(raw)
+            registration_date = _registration_date(raw)
+        except Exception:  # noqa: BLE001
+            continue
+        cohort.append(
+            ComparableProperty(
+                epc=epc,
+                certificate_number=cert,
+                address=_address(raw),
+                registration_date=registration_date,
+                coordinates=coordinates.get(uprn) if uprn is not None else None,
+            )
+        )
+    return cohort
+
+
+def load_coordinates(corpus_dir: Path) -> dict[int, Coordinates]:
+    """The optional `_coordinates.json` sidecar (`{uprn: [lon, lat]}`), resolved
+    from the OS Open-UPRN data by `fetch_corpus_coordinates.py`. Absent for a
+    corpus without geo data — geo-weighting then simply stays off."""
+    path = corpus_dir / "_coordinates.json"
+    if not path.exists():
+        return {}
+    raw: dict[str, list[float]] = json.loads(path.read_text())
+    return {
+        int(uprn): Coordinates(longitude=lon_lat[0], latitude=lon_lat[1])
+        for uprn, lon_lat in raw.items()
+    }
+
+
+def _uprn(raw: dict[str, Any]) -> Optional[int]:
+    value = raw.get("uprn")
+    return int(value) if value is not None else None
+
+
+def stable_hash(prefix: str, value: str) -> str:
+    """A short, deterministic token for a free-text identifier. Stable across
+    re-lodgements of the same address (normalised first), so dedup still
+    collapses them, and the plaintext address never lands in the repo. The hash
+    is unsalted, so anyone holding a candidate address list for the (retained)
+    postcode can recompute the tokens: this is de-identification, not secrecy."""
+    digest = hashlib.sha1(value.strip().upper().encode()).hexdigest()[:12]
+    return f"{prefix}-{digest}"
+
+
+def anonymise_payload(raw: dict[str, Any]) -> dict[str, Any]:
+    """De-identify a cert payload for the committed fixture: hash the street
+    address (`address_line_1`) and certificate number into stable tokens, blank
+    the other free-text address lines, and keep everything else — postcode,
+    registration date, SAP version, lodged figures, and all component fields —
+    untouched (gov data is OGL; only the direct identifiers are removed)."""
+    out = dict(raw)
+    address = raw.get("address_line_1")
+    if address:
+        out["address_line_1"] = stable_hash("addr", str(address))
+    cert = raw.get("certificate_number")
+    if cert:
+        out["certificate_number"] = stable_hash("cert", str(cert))
+    for blank_field in _PII_BLANK_FIELDS:
+        if blank_field in out:
+            out[blank_field] = ""
+    return out
+
+
+def _address(raw: dict[str, Any]) -> Optional[str]:
+    value = raw.get("address_line_1")
+    return str(value).strip().upper() if value else None
+
+
+def _registration_date(raw: dict[str, Any]) -> Optional[date]:
+    value = raw.get("registration_date")
+    return date.fromisoformat(str(value)) if value else None
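`stable_hash` normalises before hashing, so re-lodgements of one address still collapse to the same token after anonymisation. The sketch below reproduces the helper and checks that property. It also shows the limit of an unsalted hash: with a list of candidate addresses for the retained postcode, the token table can be rebuilt. `reverse_lookup` is hypothetical, for illustration only.

```python
import hashlib


def stable_hash(prefix: str, value: str) -> str:
    # Same normalisation and truncation as the harness helper above.
    digest = hashlib.sha1(value.strip().upper().encode()).hexdigest()[:12]
    return f"{prefix}-{digest}"


def reverse_lookup(tokens: list[str], known_addresses: list[str]) -> dict[str, str]:
    """Hypothetical: map tokens back to plaintext from candidate addresses,
    which works because the hash is unsalted and deterministic."""
    table = {stable_hash("addr", a): a for a in known_addresses}
    return {t: table[t] for t in tokens if t in table}
```

Since `_address` upper-cases whatever sits in `address_line_1`, an anonymised token still dedups consistently when the fixture is loaded back.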
--- a/infrastructure/postgres/epc_property_table.py
+++ b/infrastructure/postgres/epc_property_table.py
@ -25,6 +25,12 @@ class EpcPropertyModel(SQLModel, table=True):
    property_id: Optional[int] = Field(default=None)
    portfolio_id: Optional[int] = Field(default=None)
    uploaded_file_id: Optional[int] = Field(default=None)
+    # Provenance of this EPC picture: "lodged" (a real public/landlord EPC) or
+    # "predicted" (EPC Prediction gap-fill, ADR-0031). A property may hold one of
+    # each, so reads filter on it. Defaults to "lodged" — every existing row is a
+    # real EPC. (Requires a matching `source` column in the Drizzle schema — see
+    # docs/handover; the SQLModel mirror is what the test DB builds from.)
+    source: str = Field(default="lodged")

    # Identity / admin
    uprn: Optional[int] = Field(default=None)
@ -190,6 +196,7 @@ class EpcPropertyModel(SQLModel, table=True):
        data: EpcPropertyData,
        property_id: Optional[int] = None,
        portfolio_id: Optional[int] = None,
+        source: str = "lodged",
    ) -> EpcPropertyModel:
        es = data.sap_energy_source
        h = data.sap_heating
@ -202,6 +209,7 @@ class EpcPropertyModel(SQLModel, table=True):
        return cls(
            property_id=property_id,
            portfolio_id=portfolio_id,
+            source=source,
            uprn=data.uprn,
            uprn_source=data.uprn_source,
            report_reference=data.report_reference,
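The new `source` column lets a property carry one lodged and one predicted row, and its comment says reads filter on it; that read path is not part of this excerpt. A hedged sketch of one way to load both slots, using the stdlib `sqlite3` in place of Postgres/SQLModel. The table shape and the "latest row wins within a source" policy are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the epc_property table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE epc_property ("
        " id INTEGER PRIMARY KEY,"
        " property_id INTEGER NOT NULL,"
        " source TEXT NOT NULL DEFAULT 'lodged',"
        " payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO epc_property (property_id, source, payload) VALUES (?, ?, ?)",
        [(1, "lodged", "epc-1"), (1, "predicted", "pred-1"), (2, "predicted", "pred-2")],
    )
    # A row written without a source picks up the 'lodged' default, like existing rows.
    conn.execute("INSERT INTO epc_property (property_id, payload) VALUES (3, 'epc-3')")
    return conn


def load_slots(
    conn: sqlite3.Connection, property_id: int
) -> tuple[Optional[str], Optional[str]]:
    """(lodged, predicted) payloads for a property; within a source the latest row wins."""
    rows = conn.execute(
        "SELECT source, payload FROM epc_property WHERE property_id = ? ORDER BY id",
        (property_id,),
    ).fetchall()
    slots: dict[str, str] = {}
    for source, payload in rows:
        slots[source] = payload
    return slots.get("lodged"), slots.get("predicted")
```

Loading both slots, rather than filtering to one, leaves the lodged-versus-predicted decision to `Property.source_path`.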
--- a/orchestration/ingestion_orchestrator.py
+++ b/orchestration/ingestion_orchestrator.py
@ -5,7 +5,18 @@ from dataclasses import dataclass
 from typing import Any, Optional, Protocol

 from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.epc_prediction.comparable_properties import (
+    ComparableProperty,
+    select_comparables,
+)
+from domain.epc_prediction.epc_prediction import EpcPrediction
+from domain.epc_prediction.prediction_target import (
+    PredictionTargetAttributes,
+    build_prediction_target,
+)
+from domain.geospatial.coordinates import Coordinates
 from domain.geospatial.spatial_reference import SpatialReference
+from domain.property.property import PropertyIdentity
 from repositories.geospatial.geospatial_repository import GeospatialRepository
 from repositories.unit_of_work import UnitOfWork

@ -16,6 +27,19 @@ class EpcFetcher(Protocol):
    def get_by_uprn(self, uprn: int) -> Optional[EpcPropertyData]: ...


+class ComparablesRepo(Protocol):
+    """The cohort source for EPC Prediction (e.g. EpcComparablePropertiesRepository)."""
+
+    def candidates_for(self, postcode: str) -> list[ComparableProperty]: ...
+
+
+class PredictionAttributesReader(Protocol):
+    """Resolves an EPC-less Property's prediction attributes from Landlord
+    Overrides (e.g. the property_overrides read adapter)."""
+
+    def attributes_for(self, property_id: int) -> PredictionTargetAttributes: ...
+
+
 class SolarFetcher(Protocol):
    """The slice of the Google Solar client Ingestion needs (e.g. GoogleSolarApiClient)."""

@ -24,6 +48,17 @@ class SolarFetcher(Protocol):
    ) -> dict[str, Any]: ...


+@dataclass
+class _Prep:
+    """A property's transactional inputs, read in the unit phase before any
+    external IO: its identity (postcode + uprn) and, when the attributes reader is
+    wired, its resolved prediction attributes (so the no-unit fetch phase can
+    predict)."""
+
+    property_id: int
+    identity: PropertyIdentity
+    attributes: Optional[PredictionTargetAttributes]
+
+
@dataclass
 class _Fetched:
    """One property's externally-fetched source data, awaiting the write phase."""
@ -31,6 +66,7 @@ class _Fetched:
    property_id: int
    uprn: int
    epc: Optional[EpcPropertyData]
+    predicted_epc: Optional[EpcPropertyData]
    solar_insights: Optional[dict[str, Any]]
    spatial: Optional[SpatialReference]

@ -59,46 +95,104 @@ class IngestionOrchestrator:
        epc_fetcher: EpcFetcher,
        geospatial_repo: GeospatialRepository,
        solar_fetcher: SolarFetcher,
+        comparables_repo: Optional[ComparablesRepo] = None,
+        prediction_attributes_reader: Optional[PredictionAttributesReader] = None,
+        epc_prediction: Optional[EpcPrediction] = None,
    ) -> None:
        self._unit_of_work = unit_of_work
        self._epc_fetcher = epc_fetcher
        self._geospatial_repo = geospatial_repo
        self._solar_fetcher = solar_fetcher
+        # EPC Prediction gap-fill (ADR-0031): when all three are wired, an EPC-less
+        # Property is predicted from its postcode cohort and persisted to the
+        # predicted slot. When any is absent, prediction is simply off and
+        # ingestion behaves exactly as before.
+        self._comparables_repo = comparables_repo
+        self._prediction_attributes_reader = prediction_attributes_reader
+        self._epc_prediction = epc_prediction

    def run(self, property_ids: list[int]) -> None:
-        uprns = self._uprns_for(property_ids)
-        fetched = [self._fetch(property_id, uprn) for property_id, uprn in uprns]
+        preps = self._prepare(property_ids)
+        fetched = [self._fetch(prep) for prep in preps]
        self._persist(fetched)

-    def _uprns_for(self, property_ids: list[int]) -> list[tuple[int, int]]:
+    def _prepare(self, property_ids: list[int]) -> list[_Prep]:
        # A short read unit; properties with no UPRN (e.g. landlord_property_id
-        # only) are skipped — a later Site-Notes path covers them.
+        # only) are skipped — a later Site-Notes path covers them. Prediction
+        # attributes (Landlord Overrides) are resolved here, in-unit, so the
+        # no-unit fetch phase holds everything it needs to predict.
        with self._unit_of_work() as uow:
            properties = uow.property.get_many(property_ids)
-            return [
-                (property_id, prop.identity.uprn)
-                for property_id, prop in zip(property_ids, properties, strict=True)
-                if prop.identity.uprn is not None
-            ]
+            preps: list[_Prep] = []
+            for property_id, prop in zip(property_ids, properties, strict=True):
+                if prop.identity.uprn is None:
+                    continue
+                attributes = (
+                    self._prediction_attributes_reader.attributes_for(property_id)
+                    if self._prediction_attributes_reader is not None
+                    else None
+                )
+                preps.append(_Prep(property_id, prop.identity, attributes))
+            return preps

-    def _fetch(self, property_id: int, uprn: int) -> _Fetched:
+    def _fetch(self, prep: _Prep) -> _Fetched:
        # No unit open here — this is the external-IO phase. One spatial
        # reference lookup yields the coordinates (which drive the Solar fetch)
        # and the planning protections (cached for Modelling, ADR-0020).
+        uprn = prep.identity.uprn
+        assert uprn is not None  # _prepare drops UPRN-less properties
        epc = self._epc_fetcher.get_by_uprn(uprn)
        solar_insights: Optional[dict[str, Any]] = None
        spatial: Optional[SpatialReference] = self._geospatial_repo.spatial_for(uprn)
-        if spatial is not None and spatial.coordinates is not None:
+        coordinates = spatial.coordinates if spatial is not None else None
+        if coordinates is not None:
            solar_insights = self._solar_fetcher.get_building_insights(
-                spatial.coordinates.longitude, spatial.coordinates.latitude
+                coordinates.longitude, coordinates.latitude
            )
-        return _Fetched(property_id, uprn, epc, solar_insights, spatial)
+        predicted_epc = (
+            self._predict(prep.identity, coordinates, prep.attributes)
+            if epc is None
+            else None
+        )
+        return _Fetched(
+            prep.property_id, uprn, epc, predicted_epc, solar_insights, spatial
+        )
+
+    def _predict(
+        self,
+        identity: PropertyIdentity,
+        coordinates: Optional[Coordinates],
+        attributes: Optional[PredictionTargetAttributes],
+    ) -> Optional[EpcPropertyData]:
+        """Synthesise the EPC-less Property's picture from its postcode cohort, or
+        None when the predictor is unwired, the Property is gated out (unknown
+        property type), or no comparables survive selection (ADR-0031)."""
+        if (
+            self._comparables_repo is None
+            or self._epc_prediction is None
+            or attributes is None
+        ):
+            return None
+        target = build_prediction_target(identity, coordinates, attributes)
+        if target is None:
+            return None
+        candidates = self._comparables_repo.candidates_for(identity.postcode)
+        comparables = select_comparables(target, candidates)
+        if not comparables.members:
+            return None
+        return self._epc_prediction.predict(target, comparables)

    def _persist(self, fetched: list[_Fetched]) -> None:
        with self._unit_of_work() as uow:
            for item in fetched:
                if item.epc is not None:
                    uow.epc.save(item.epc, property_id=item.property_id)
+                elif item.predicted_epc is not None:
+                    uow.epc.save(
+                        item.predicted_epc,
+                        property_id=item.property_id,
+                        source="predicted",
+                    )
                # The live `solar` table is keyed by UPRN and needs the fetch's
                # coordinates; insights are only set when those coordinates were
                # resolved, so spatial.coordinates is non-None alongside them.
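The ingestion change comes down to one decision per property: persist the lodged EPC if there is one, otherwise try to predict, and give up quietly when the Property is gated out or the cohort is empty. A hedged sketch of that decision with a hypothetical in-memory fake that satisfies a `ComparablesRepo`-shaped `Protocol` structurally. The modal-wall "predictor" is a toy, not `EpcPrediction`.

```python
from collections import Counter
from typing import Optional, Protocol

Row = dict[str, str]


class ComparablesRepo(Protocol):
    def candidates_for(self, postcode: str) -> list[Row]: ...


class FakeComparablesRepo:
    """Hypothetical in-memory fake; matches the Protocol structurally."""

    def __init__(self, by_postcode: dict[str, list[Row]]) -> None:
        self._by_postcode = by_postcode

    def candidates_for(self, postcode: str) -> list[Row]:
        return list(self._by_postcode.get(postcode, []))


def gap_fill(
    lodged: Optional[Row],
    postcode: str,
    property_type: Optional[str],
    repo: ComparablesRepo,
) -> tuple[Optional[Row], Optional[str]]:
    """(picture, source) to persist; (None, None) means nothing is saved."""
    if lodged is not None:
        return lodged, "lodged"  # a real EPC always wins; never predict over it
    if property_type is None:
        return None, None  # gated out: never sized from a mixed-type cohort
    members = [
        c for c in repo.candidates_for(postcode) if c["property_type"] == property_type
    ]
    if not members:
        return None, None  # no comparables survive selection
    # Toy predictor: the modal wall construction (NOT EpcPrediction's weighting).
    wall = Counter(c["wall"] for c in members).most_common(1)[0][0]
    return {"property_type": property_type, "wall": wall}, "predicted"
```

Keeping each collaborator optional, as the orchestrator does, means a missing fake simply turns prediction off instead of failing the run.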
--- a/repositories/comparable_properties/__init__.py
+++ b/repositories/comparable_properties/__init__.py
--- a/repositories/comparable_properties/comparable_properties_repository.py
+++ b/repositories/comparable_properties/comparable_properties_repository.py
@ -0,0 +1,24 @@
+"""The ComparableProperties repository port (ADR-0029 decision 3; ADR-0031).
+
+Owns the cohort IO for EPC Prediction — given a target's postcode, return the
+candidate `ComparableProperty`s (the postcode's other lodged certs, mapped to
+`EpcPropertyData` with their register metadata + resolved coordinates). The pure
+domain `select_comparables` then filters these into the reference cohort, and
+`EpcPrediction.predict` synthesises the picture. Kept a port so the orchestrator
+depends on the cohort source abstractly and tests substitute a fake.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+
+from domain.epc_prediction.comparable_properties import ComparableProperty
+
+
+class ComparablePropertiesRepository(ABC):
+    @abstractmethod
+    def candidates_for(self, postcode: str) -> list[ComparableProperty]:
+        """Every candidate neighbour in `postcode` — one `ComparableProperty` per lodged
+        cert, carrying its `EpcPropertyData`, certificate number, address,
+        registration date, and resolved coordinates (None when unresolvable)."""
+        ...
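Per the port docstring above and the adapter that follows, a cohort is assembled from the postcode's search rows, with every row's UPRN resolved to coordinates in one batched read rather than one call per cert. A minimal sketch of that join, assuming hypothetical `SearchRow` / `Candidate` stand-ins and a plain callable in place of `coordinates_for_uprns`. The per-cert EPC fetch and mapping are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

Coords = tuple[float, float]  # (longitude, latitude)


@dataclass(frozen=True)
class SearchRow:
    """Hypothetical stand-in for an EpcSearchResult row."""

    certificate_number: str
    uprn: Optional[int]
    address: str
    registration_date: date


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for ComparableProperty (EPC payload omitted)."""

    certificate_number: str
    address: str
    registration_date: date
    coordinates: Optional[Coords]


def assemble_cohort(
    rows: list[SearchRow],
    coordinates_for_uprns: Callable[[list[int]], dict[int, Coords]],
) -> list[Candidate]:
    uprns = [r.uprn for r in rows if r.uprn is not None]
    # One batched lookup for the whole postcode, not one call per cert.
    coords = coordinates_for_uprns(uprns) if uprns else {}
    return [
        Candidate(
            certificate_number=r.certificate_number,
            address=r.address,  # register metadata threaded off the search row
            registration_date=r.registration_date,
            coordinates=coords.get(r.uprn) if r.uprn is not None else None,
        )
        for r in rows
    ]
```

A cert with no UPRN, or a UPRN the lookup cannot resolve, still joins the cohort with `coordinates=None`; it just contributes nothing to geo-weighting.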
--- a/repositories/comparable_properties/epc_comparable_properties_repository.py
+++ b/repositories/comparable_properties/epc_comparable_properties_repository.py
@ -0,0 +1,82 @@
+"""EPC-API + geospatial adapter for the ComparableProperties port (ADR-0031).
+
+Assembles a postcode's candidate cohort: the EPC search lists the postcode's
+lodged certs, each is fetched + mapped to `EpcPropertyData`, and the certs' UPRNs
+are resolved to coordinates in one batched geospatial read (closely-numbered
+UPRNs share a partition). Register metadata the cert itself doesn't carry
+(address, registration date) is threaded off the search row.
+"""
+
+from __future__ import annotations
+
+from datetime import date
+from typing import Optional, Protocol
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from datatypes.epc.search.epc_search_result import EpcSearchResult
+from domain.epc_prediction.comparable_properties import ComparableProperty
+from domain.geospatial.coordinates import Coordinates
+from repositories.comparable_properties.comparable_properties_repository import (
+    ComparablePropertiesRepository,
+)
+
+
+class CohortEpcClient(Protocol):
+    """The slice of the EPC-API client the cohort fetch needs (e.g.
+    `EpcClientService`)."""
+
+    def search_by_postcode(self, postcode: str) -> list[EpcSearchResult]: ...
+
+    def get_by_certificate_number(self, cert_num: str) -> EpcPropertyData: ...
+
+
+class CohortGeospatial(Protocol):
+    """The geospatial slice the cohort fetch needs — batch UPRN→coordinate."""
+
+    def coordinates_for_uprns(
+        self, uprns: list[int]
+    ) -> dict[int, Coordinates]: ...
+
+
+class EpcComparablePropertiesRepository(ComparablePropertiesRepository):
+    def __init__(
+        self, epc_client: CohortEpcClient, geospatial: CohortGeospatial
+    ) -> None:
+        self._epc_client = epc_client
+        self._geospatial = geospatial
+
+    def candidates_for(self, postcode: str) -> list[ComparableProperty]:
+        results: list[EpcSearchResult] = self._epc_client.search_by_postcode(
+            postcode
+        )
+        uprns: list[int] = [r.uprn for r in results if r.uprn is not None]
+        coordinates: dict[int, Coordinates] = self._geospatial.coordinates_for_uprns(
+            uprns
+        )
+        return [self._comparable(result, coordinates) for result in results]
+
+    def _comparable(
+        self, result: EpcSearchResult, coordinates: dict[int, Coordinates]
+    ) -> ComparableProperty:
+        epc: EpcPropertyData = self._epc_client.get_by_certificate_number(
+            result.certificate_number
+        )
+        resolved: Optional[Coordinates] = (
+            coordinates.get(result.uprn) if result.uprn is not None else None
+        )
+        return ComparableProperty(
+            epc=epc,
+            certificate_number=result.certificate_number,
+            address=result.address_line_1,
+            registration_date=_parse_date(result.registration_date),
+            coordinates=resolved,
+        )
+
+
+def _parse_date(value: Optional[str]) -> Optional[date]:
+    """The register's ISO registration date, or None when missing or
+    unparseable (the predictor falls back to an unweighted recency)."""
+    if not value:
+        return None
+    try:
+        return date.fromisoformat(value[:10])
+    except ValueError:
+        return None
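The adapter depends on `CohortEpcClient` and `CohortGeospatial` only structurally, so any object with matching methods satisfies them without inheriting. A minimal sketch of the batched-coordinates shape, assuming simplified stand-ins (`SearchRow`, `CountingGeospatial`, `resolve` and `parse_date` are hypothetical names, and coordinates are plain `(lon, lat)` tuples): one geospatial call for the whole cohort, and a date parser that tolerates a missing or malformed registration date.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional, Protocol

LonLat = tuple[float, float]


@dataclass(frozen=True)
class SearchRow:
    # Hypothetical stand-in for EpcSearchResult (only the fields used here).
    certificate_number: str
    uprn: Optional[int]
    registration_date: Optional[str]


class CohortGeospatial(Protocol):
    def coordinates_for_uprns(self, uprns: list[int]) -> dict[int, LonLat]: ...


class CountingGeospatial:
    """Satisfies CohortGeospatial structurally (no inheritance) and counts calls."""

    def __init__(self, known: dict[int, LonLat]) -> None:
        self.known = known
        self.batch_calls = 0

    def coordinates_for_uprns(self, uprns: list[int]) -> dict[int, LonLat]:
        self.batch_calls += 1
        return {u: self.known[u] for u in uprns if u in self.known}


def parse_date(value: Optional[str]) -> Optional[date]:
    """ISO date prefix of `value`, or None when missing or unparseable."""
    if not value:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None


def resolve(
    rows: list[SearchRow], geo: CohortGeospatial
) -> list[tuple[str, Optional[LonLat], Optional[date]]]:
    # One batched geospatial read for the whole cohort, then a dict lookup per row.
    coords = geo.coordinates_for_uprns([r.uprn for r in rows if r.uprn is not None])
    return [
        (
            r.certificate_number,
            coords.get(r.uprn) if r.uprn is not None else None,
            parse_date(r.registration_date),
        )
        for r in rows
    ]
```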
--- a/repositories/epc/epc_postgres_repository.py
+++ b/repositories/epc/epc_postgres_repository.py
@ -45,7 +45,7 @@ from infrastructure.postgres.epc_property_table import (
    EpcRenewableHeatIncentiveModel,
    EpcWindowModel,
 )
-from repositories.epc.epc_repository import EpcRepository
+from repositories.epc.epc_repository import EpcRepository, EpcSource
 from utilities.private import private

 _T = TypeVar("_T")
@ -88,14 +88,16 @@ class EpcPostgresRepository(EpcRepository):
        data: EpcPropertyData,
        property_id: Optional[int] = None,
        portfolio_id: Optional[int] = None,
+        source: EpcSource = "lodged",
    ) -> int:
-        # Idempotent on property_id: a re-run replaces the property's EPC graph
-        # rather than duplicating it (ADR-0012). Anonymous saves (no property_id)
-        # always insert.
+        # Idempotent on (property_id, source): a re-run replaces the property's
+        # EPC graph for THAT source rather than duplicating it (ADR-0012), and a
+        # predicted save leaves the lodged one intact, and vice versa (ADR-0031).
+        # Anonymous saves (no property_id) always insert.
        if property_id is not None:
-            self._delete_for_property(property_id)
+            self._delete_for_property(property_id, source)
        parent = EpcPropertyModel.from_epc_property_data(
-            data, property_id=property_id, portfolio_id=portfolio_id
+            data, property_id=property_id, portfolio_id=portfolio_id, source=source
        )
        self._session.add(parent)
        self._session.flush()
@ -154,15 +156,16 @@ class EpcPostgresRepository(EpcRepository):
            )
        return epc_property_id

-    def _delete_for_property(self, property_id: int) -> None:
-        """Remove the property's existing EPC graph (parent + child tables) so a
-        re-save replaces rather than duplicates (ADR-0012)."""
+    def _delete_for_property(self, property_id: int, source: EpcSource) -> None:
+        """Remove the property's existing EPC graph for `source` (parent + child
+        tables) so a re-save replaces rather than duplicates (ADR-0012), without
+        disturbing the other source's slot (ADR-0031)."""
        epc_ids = [
            i
            for i in self._session.exec(
-                select(EpcPropertyModel.id).where(
-                    EpcPropertyModel.property_id == property_id
-                )
+                select(EpcPropertyModel.id)
+                .where(EpcPropertyModel.property_id == property_id)
+                .where(EpcPropertyModel.source == source)
            ).all()
            if i is not None
        ]
@ -200,9 +203,20 @@ class EpcPostgresRepository(EpcRepository):
        )

    def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]:
+        return self._get_for_property(property_id, source="lodged")
+
+    def get_predicted_for_property(
+        self, property_id: int
+    ) -> Optional[EpcPropertyData]:
+        return self._get_for_property(property_id, source="predicted")
+
+    def _get_for_property(
+        self, property_id: int, source: EpcSource
+    ) -> Optional[EpcPropertyData]:
        row = self._session.exec(
            select(EpcPropertyModel)
            .where(EpcPropertyModel.property_id == property_id)
+            .where(EpcPropertyModel.source == source)
            .order_by(EpcPropertyModel.id)  # type: ignore[arg-type]
        ).first()
        if row is None or row.id is None:
@ -212,13 +226,26 @@ class EpcPostgresRepository(EpcRepository):
    def get_for_properties(
        self, property_ids: list[int]
    ) -> dict[int, EpcPropertyData]:
-        """Bulk-hydrate a batch's EPCs in a handful of per-table IN queries
-        (ADR-0012), not N x per-property. Load-whole per ADR-0002."""
+        """Bulk-hydrate a batch's LODGED EPCs, keyed by property_id."""
+        return self._for_properties(property_ids, source="lodged")
+
+    def get_predicted_for_properties(
+        self, property_ids: list[int]
+    ) -> dict[int, EpcPropertyData]:
+        """Bulk-hydrate a batch's PREDICTED EPCs (ADR-0031), keyed by property_id."""
+        return self._for_properties(property_ids, source="predicted")
+
+    def _for_properties(
+        self, property_ids: list[int], source: EpcSource
+    ) -> dict[int, EpcPropertyData]:
+        """Bulk-hydrate a batch's EPCs of one `source` in a handful of per-table IN
+        queries (ADR-0012), not N x per-property. Load-whole per ADR-0002."""
        if not property_ids:
            return {}
        parents = self._session.exec(
            select(EpcPropertyModel)
            .where(col(EpcPropertyModel.property_id).in_(property_ids))
+            .where(EpcPropertyModel.source == source)
            .order_by(EpcPropertyModel.id)  # type: ignore[arg-type]
        ).all()
        parent_by_property: dict[int, EpcPropertyModel] = {}
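The `(property_id, source)` idempotency rule can be modelled without a database. A minimal sketch, assuming an in-memory dict in place of the `epc_property` tables and plain dicts in place of `EpcPropertyData` (`InMemoryEpcStore` is a hypothetical name): re-saving one source replaces only that slot, the other source survives, and anonymous saves always insert.

```python
from __future__ import annotations

from typing import Literal, Optional

EpcSource = Literal["lodged", "predicted"]
Payload = dict[str, object]  # stand-in for EpcPropertyData


class InMemoryEpcStore:
    """Hypothetical in-memory model of the (property_id, source) slot rule."""

    def __init__(self) -> None:
        self._next_id = 1
        self._rows: dict[int, tuple[Optional[int], EpcSource, Payload]] = {}

    def save(
        self,
        data: Payload,
        property_id: Optional[int] = None,
        source: EpcSource = "lodged",
    ) -> int:
        if property_id is not None:
            # Delete only this property's rows for THIS source.
            stale = [
                i
                for i, (pid, src, _) in self._rows.items()
                if pid == property_id and src == source
            ]
            for i in stale:
                del self._rows[i]
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = (property_id, source, data)
        return row_id

    def get_for_property(
        self, property_id: int, source: EpcSource = "lodged"
    ) -> Optional[Payload]:
        # Lowest id first, mirroring ORDER BY id ... .first().
        for i in sorted(self._rows):
            pid, src, data = self._rows[i]
            if pid == property_id and src == source:
                return data
        return None
```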
--- a/repositories/epc/epc_repository.py
+++ b/repositories/epc/epc_repository.py
@ -1,10 +1,14 @@
 from __future__ import annotations

 from abc import ABC, abstractmethod
-from typing import Optional
+from typing import Literal, Optional

 from datatypes.epc.domain.epc_property_data import EpcPropertyData

+# Provenance of a persisted EPC picture (ADR-0031): a real "lodged" EPC, or a
+# "predicted" one synthesised by EPC Prediction. A property can hold one of each.
+EpcSource = Literal["lodged", "predicted"]
+

 class EpcRepository(ABC):
    """Persists and loads the structured EPC Property Data slice.
@ -12,7 +16,8 @@ class EpcRepository(ABC):
    `save` writes the `EpcPropertyData` to the `epc_property` parent row and its
    child tables; `get` reconstructs the persisted projection back into an
    `EpcPropertyData`. Round-trip fidelity over that projection is pinned by the
-    Slice-1 round-trip test (Hestia-Homes/Model#1129).
+    Slice-1 round-trip test (Hestia-Homes/Model#1129). Each EPC carries a
+    `source` so a lodged and a predicted picture coexist per property (ADR-0031).
    """

    @abstractmethod
@ -21,18 +26,36 @@ class EpcRepository(ABC):
        data: EpcPropertyData,
        property_id: int | None = None,
        portfolio_id: int | None = None,
+        source: EpcSource = "lodged",
    ) -> int: ...

    @abstractmethod
    def get(self, epc_property_id: int) -> EpcPropertyData: ...

    @abstractmethod
-    def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: ...
+    def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]:
+        """The property's LODGED EPC (the predicted slot is read separately)."""
+        ...
+
+    @abstractmethod
+    def get_predicted_for_property(
+        self, property_id: int
+    ) -> Optional[EpcPropertyData]:
+        """The property's PREDICTED EPC (EPC Prediction gap-fill), or None."""
+        ...

    @abstractmethod
    def get_for_properties(
        self, property_ids: list[int]
    ) -> dict[int, EpcPropertyData]:
-        """Bulk-hydrate a batch's EPCs, keyed by property_id (only those with an
-        EPC are present). A handful of per-table queries, not N per property."""
+        """Bulk-hydrate a batch's LODGED EPCs, keyed by property_id (only those
+        with one are present). A handful of per-table queries, not N per property."""
+        ...
+
+    @abstractmethod
+    def get_predicted_for_properties(
+        self, property_ids: list[int]
+    ) -> dict[int, EpcPropertyData]:
+        """Bulk-hydrate a batch's PREDICTED EPCs (ADR-0031), keyed by property_id
+        (only those with one are present)."""
        ...
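`Literal` is enforced only by the type checker; a `source` value read back from a database column arrives as a plain `str`. A minimal sketch of narrowing it at that boundary, assuming a hypothetical `parse_source` helper (not part of this PR) that derives the allowed values from the alias with `typing.get_args`, so the check and the type cannot drift apart:

```python
from __future__ import annotations

from typing import Literal, cast, get_args

EpcSource = Literal["lodged", "predicted"]

# get_args on a Literal alias returns its allowed values.
VALID_SOURCES: frozenset[str] = frozenset(get_args(EpcSource))


def parse_source(raw: str) -> EpcSource:
    """Narrow a raw string (e.g. a DB column value) to EpcSource, or raise."""
    if raw not in VALID_SOURCES:
        raise ValueError(
            f"unknown EPC source {raw!r}; expected one of {sorted(VALID_SOURCES)}"
        )
    return cast(EpcSource, raw)
```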
--- a/repositories/geospatial/geospatial_repository.py
+++ b/repositories/geospatial/geospatial_repository.py
@ -18,6 +18,21 @@ class GeospatialRepository(ABC):
    @abstractmethod
    def coordinates_for(self, uprn: int) -> Optional[Coordinates]: ...

+    def coordinates_for_uprns(
+        self, uprns: list[int]
+    ) -> dict[int, Coordinates]:
+        """Resolve many UPRNs at once, returning only those covered. The default
+        is a per-UPRN loop; adapters whose storage is partitioned (e.g. the S3
+        Open-UPRN parquet) override this to read each partition once for all the
+        UPRNs it covers — far fewer reads when the UPRNs are co-located, as
+        closely-numbered UPRNs share a partition."""
+        resolved: dict[int, Coordinates] = {}
+        for uprn in uprns:
+            coordinates = self.coordinates_for(uprn)
+            if coordinates is not None:
+                resolved[uprn] = coordinates
+        return resolved
+
    def spatial_for(self, uprn: int) -> Optional[SpatialReference]:
        """The Property's coordinates and planning protections together, in one
        reference lookup (ADR-0020) — Ingestion uses the coordinates to drive
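`coordinates_for_uprns` is a concrete method on the ABC whose default builds on the abstract single-UPRN lookup, so every adapter gets batching for free and overrides it only when its storage allows something faster. A minimal sketch of that shape, assuming `(lon, lat)` tuples in place of `Coordinates` and a hypothetical dict-backed adapter `DictGeospatial`:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Optional

Coordinates = tuple[float, float]  # hypothetical stand-in for the domain type


class GeospatialRepository(ABC):
    @abstractmethod
    def coordinates_for(self, uprn: int) -> Optional[Coordinates]: ...

    def coordinates_for_uprns(self, uprns: list[int]) -> dict[int, Coordinates]:
        # Default: one lookup per UPRN, keeping only the resolved ones.
        resolved: dict[int, Coordinates] = {}
        for uprn in uprns:
            coordinates = self.coordinates_for(uprn)
            if coordinates is not None:
                resolved[uprn] = coordinates
        return resolved


class DictGeospatial(GeospatialRepository):
    """Implements only the single lookup and inherits the default batch loop."""

    def __init__(self, table: dict[int, Coordinates]) -> None:
        self._table = table
        self.lookups = 0

    def coordinates_for(self, uprn: int) -> Optional[Coordinates]:
        self.lookups += 1
        return self._table.get(uprn)
```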
--- a/repositories/geospatial/geospatial_s3_repository.py
+++ b/repositories/geospatial/geospatial_s3_repository.py
@ -1,5 +1,6 @@
 from __future__ import annotations

+from collections import defaultdict
 from collections.abc import Callable
 from typing import Any, Optional

@ -62,6 +63,30 @@ class GeospatialS3Repository(GeospatialRepository):
        reference: Optional[SpatialReference] = self.spatial_for(uprn)
        return reference.coordinates if reference is not None else None

+    def coordinates_for_uprns(
+        self, uprns: list[int]
+    ) -> dict[int, Coordinates]:
+        """Batch lookup that reads the meta once, groups the UPRNs by their
+        covering partition, and reads each partition once for all the UPRNs it
+        covers (co-located UPRNs share a partition, so a cohort is typically one
+        or two reads). Uncovered / absent UPRNs are omitted from the result."""
+        if not uprns:
+            return {}
+        meta = self._read_parquet(_META_KEY)
+        by_partition: dict[str, list[int]] = defaultdict(list)
+        for uprn in uprns:
+            covering = meta[(meta["lower"] <= uprn) & (meta["upper"] >= uprn)]
+            if not covering.empty:
+                by_partition[str(covering["filenames"].iloc[0])].append(uprn)
+        resolved: dict[int, Coordinates] = {}
+        for filename, partition_uprns in by_partition.items():
+            partition = self._read_parquet(f"spatial/{filename}")
+            rows = partition[partition["UPRN"].isin(partition_uprns)]
+            for _, row in rows.iterrows():
+                resolved[int(row["UPRN"])] = Coordinates(
+                    longitude=float(row["LONGITUDE"]),
+                    latitude=float(row["LATITUDE"]),
+                )
+        return resolved
+
    def planning_restrictions_for(self, uprn: int) -> Optional[PlanningRestrictions]:
        reference: Optional[SpatialReference] = self.spatial_for(uprn)
        return reference.restrictions if reference is not None else None
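The S3 override filters the meta table with a boolean mask per UPRN, which scans every partition range each time. If the ranges are non-overlapping (an assumption, not something the diff states), sorting once and bisecting finds each covering partition in logarithmic time. A pure-Python sketch of that alternative technique, with a hypothetical `group_by_partition` taking `(lower, upper, filename)` triples in place of the meta DataFrame:

```python
from __future__ import annotations

import bisect
from collections import defaultdict


def group_by_partition(
    ranges: list[tuple[int, int, str]], uprns: list[int]
) -> dict[str, list[int]]:
    """Group UPRNs by the partition whose [lower, upper] range covers them.

    Assumes non-overlapping ranges. UPRNs covered by no range are dropped,
    as in the adapter.
    """
    ordered = sorted(ranges)
    lowers = [lower for lower, _, _ in ordered]
    groups: dict[str, list[int]] = defaultdict(list)
    for uprn in uprns:
        # Index of the rightmost partition whose lower bound is <= uprn.
        i = bisect.bisect_right(lowers, uprn) - 1
        if i >= 0:
            _, upper, filename = ordered[i]
            if uprn <= upper:
                groups[filename].append(uprn)
    return dict(groups)
```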
--- a/repositories/property/prediction_target_attributes_reader.py
+++ b/repositories/property/prediction_target_attributes_reader.py
@ -0,0 +1,23 @@
+"""Read port for an EPC-less Property's prediction attributes (ADR-0031 slice-5d).
+
+Returns the `property_type` / `built_form` / `wall_construction` resolved from
+Landlord Overrides that `build_prediction_target` needs. Kept a port because the
+real adapter — a read over the `property_overrides` fact layer — is being built
+separately (see docs/HANDOVER_EPC_PREDICTION_WIRING.md); the ingestion wiring
+depends on this abstraction and tests substitute a fake.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+
+from domain.epc_prediction.prediction_target import PredictionTargetAttributes
+
+
+class PredictionTargetAttributesReader(ABC):
+    @abstractmethod
+    def attributes_for(self, property_id: int) -> PredictionTargetAttributes:
+        """The Property's resolved prediction attributes. `property_type` is None
+        when it could not be resolved — which gates the Property out of
+        prediction (`build_prediction_target` returns None)."""
+        ...
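Because the real `property_overrides` adapter is still being built, tests substitute a fake for this port. A minimal sketch, assuming a stand-in `PredictionTargetAttributes` with the three fields the docstring names and a hypothetical `is_predictable` gate that mirrors the documented rule (no resolved `property_type`, no prediction):

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredictionTargetAttributes:
    # Hypothetical stand-in mirroring the fields named in the port's docstring.
    property_type: Optional[str]
    built_form: Optional[str] = None
    wall_construction: Optional[str] = None


class PredictionTargetAttributesReader(ABC):
    @abstractmethod
    def attributes_for(self, property_id: int) -> PredictionTargetAttributes: ...


class FakeAttributesReader(PredictionTargetAttributesReader):
    def __init__(self, by_id: dict[int, PredictionTargetAttributes]) -> None:
        self._by_id = by_id

    def attributes_for(self, property_id: int) -> PredictionTargetAttributes:
        # Unknown properties resolve to "nothing known", which gates them out.
        return self._by_id.get(property_id, PredictionTargetAttributes(None))


def is_predictable(reader: PredictionTargetAttributesReader, property_id: int) -> bool:
    """Hypothetical gate: prediction needs at least a resolved property_type."""
    return reader.attributes_for(property_id).property_type is not None
```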
--- a/repositories/property/property_postgres_repository.py
+++ b/repositories/property/property_postgres_repository.py
@ -72,6 +72,7 @@ class PropertyPostgresRepository(PropertyRepository):
        return Property(
            identity=identity,
            epc=self._epc().get_for_property(property_id),
+            predicted_epc=self._epc().get_predicted_for_property(property_id),
            planning_restrictions=_restrictions_of(row.uprn, restrictions),
        )

@ -83,6 +84,7 @@ class PropertyPostgresRepository(PropertyRepository):
        ).all()
        row_by_id = {row.id: row for row in rows}
        epcs = self._epc().get_for_properties(property_ids)
+        predicted_epcs = self._epc().get_predicted_for_properties(property_ids)
        restrictions: dict[int, PlanningRestrictions] = self._restrictions_for(
            [row.uprn for row in rows if row.uprn is not None]
        )
@ -101,6 +103,7 @@ class PropertyPostgresRepository(PropertyRepository):
                        landlord_property_id=row.landlord_property_id,
                    ),
                    epc=epcs.get(property_id),
+                    predicted_epc=predicted_epcs.get(property_id),
                    planning_restrictions=_restrictions_of(row.uprn, restrictions),
                )
            )
--- a/scripts/build_epc_prediction_fixture.py
+++ b/scripts/build_epc_prediction_fixture.py
@ -0,0 +1,117 @@
+"""Freeze a small, anonymised EPC Prediction fixture for the Tier-1 gate (ADR-0030).
+
+Curates a deterministic subset of the local scratch corpus
+(`/tmp/epc_prediction_corpus`, gitignored) into a committed fixture under
+`tests/fixtures/epc_prediction/`. Selection keeps postcodes that can actually be
+scored — at least one SAP 10.2 target plus a second distinct address to predict
+it from. Every payload is run through `anonymise_payload` first, so the street
+address + certificate number become opaque tokens and no plaintext address lands
+in the repo (postcode + component data are open gov data and kept).
+
+The committed fixture is the deterministic basis for the ratcheting gate; the
+large scratch corpus stays local for iteration + the offline battle-test.
+
+USAGE
+-----
+    PYTHONPATH=. python scripts/build_epc_prediction_fixture.py
+
+Source: $EPC_PREDICTION_CORPUS (default /tmp/epc_prediction_corpus).
+"""
+
+from __future__ import annotations
+
+import json
+import os
+from pathlib import Path
+from typing import Any
+
+from harness.epc_prediction_corpus import anonymise_payload, stable_hash
+
+SOURCE = Path(os.environ.get("EPC_PREDICTION_CORPUS", "/tmp/epc_prediction_corpus"))
+FIXTURE = Path("tests/fixtures/epc_prediction")
+
+_SAP_10_2 = "10.2"
+_MAX_POSTCODES = 15       # keep the committed fixture small
+_MAX_COHORT = 25          # cap certs per postcode to bound repo size
+
+
+def _load_payloads(
+    postcode: str, certs: list[str]
+) -> list[tuple[str, dict[str, Any]]]:
+    """The `(source cert number, payload)` pairs for a postcode — the cert
+    number lives in the index/filename, not the cached payload."""
+    payloads: list[tuple[str, dict[str, Any]]] = []
+    for cert in certs:
+        path = SOURCE / postcode / f"{cert}.json"
+        if path.exists():
+            payloads.append((cert, json.loads(path.read_text())))
+    return payloads
+
+
+def _qualifies(payloads: list[tuple[str, dict[str, Any]]]) -> bool:
+    """A postcode is usable iff it has ≥1 SAP 10.2 cert (a valid target) and ≥2
+    distinct addresses (so the target has at least one neighbour to predict it)."""
+    has_target = any(
+        str(p.get("sap_version")) == _SAP_10_2 for _, p in payloads
+    )
+    addresses = {
+        str(p.get("address_line_1", "")).strip().upper() for _, p in payloads
+    }
+    return has_target and len(addresses) >= 2
+
+
+def main() -> None:
+    index: dict[str, list[str]] = json.loads(
+        (SOURCE / "_index.json").read_text()
+    )
+    fixture_index: dict[str, list[str]] = {}
+    kept_uprns: set[str] = set()
+    total_certs = 0
+    for postcode, certs in index.items():
+        if len(fixture_index) >= _MAX_POSTCODES:
+            break
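+        payloads = _load_payloads(postcode, certs)[:_MAX_COHORT]
+        # Judge the capped cohort that is actually written: a target or second
+        # address beyond the cap would otherwise leave an unscoreable postcode.
+        if not _qualifies(payloads):
+            continue
+        kept: list[str] = []
+        for cert, raw in payloads: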
+            cert_token = stable_hash("cert", cert)
+            anon = anonymise_payload(raw)
+            out = FIXTURE / postcode / f"{cert_token}.json"
+            out.parent.mkdir(parents=True, exist_ok=True)
+            out.write_text(json.dumps(anon))
+            kept.append(cert_token)
+            uprn = raw.get("uprn")
+            if uprn is not None:
+                kept_uprns.add(str(int(uprn)))
+        fixture_index[postcode] = kept
+        total_certs += len(kept)
+    (FIXTURE / "_index.json").parent.mkdir(parents=True, exist_ok=True)
+    (FIXTURE / "_index.json").write_text(json.dumps(fixture_index, indent=2))
+    _write_coordinates(kept_uprns)
+    print(
+        f"wrote {len(fixture_index)} postcodes / {total_certs} anonymised certs "
+        f"to {FIXTURE}"
+    )
+
+
+def _write_coordinates(kept_uprns: set[str]) -> None:
+    """Carry the geo-proximity coordinates for the kept UPRNs into the committed
+    fixture (subset of the corpus `_coordinates.json`), so the gate exercises
+    geo-weighting without S3. Skipped when the corpus has no coordinates sidecar.
+    Coordinates are OS OpenData (OGL) and add no identifiability beyond the UPRN
+    already kept in the fixture."""
+    source = SOURCE / "_coordinates.json"
+    if not source.exists():
+        return
+    corpus_coords: dict[str, list[float]] = json.loads(source.read_text())
+    fixture_coords = {
+        uprn: corpus_coords[uprn]
+        for uprn in kept_uprns
+        if uprn in corpus_coords
+    }
+    (FIXTURE / "_coordinates.json").write_text(json.dumps(fixture_coords))
+
+
+if __name__ == "__main__":
+    main()
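The fixture builder relies on two properties: tokens must be deterministic, so rebuilding the fixture yields the same filenames, and the written cohort must itself be scoreable. A minimal sketch, assuming a SHA-256-based stand-in for `harness.epc_prediction_corpus.stable_hash` (its real implementation is not shown in this diff) and a hypothetical `capped_cohort` helper that applies the cap before the qualification check:

```python
from __future__ import annotations

import hashlib
from typing import Any

_SAP_10_2 = "10.2"


def stable_token(namespace: str, value: str, length: int = 16) -> str:
    """Hypothetical stand-in for `stable_hash`: deterministic and opaque, with a
    namespace prefix so a cert number and an address never map to one token."""
    digest = hashlib.sha256(f"{namespace}:{value}".encode()).hexdigest()
    return digest[:length]


def qualifies(payloads: list[dict[str, Any]]) -> bool:
    """At least one SAP 10.2 target and at least two distinct addresses."""
    has_target = any(str(p.get("sap_version")) == _SAP_10_2 for p in payloads)
    addresses = {str(p.get("address_line_1", "")).strip().upper() for p in payloads}
    return has_target and len(addresses) >= 2


def capped_cohort(
    payloads: list[dict[str, Any]], cap: int
) -> list[dict[str, Any]] | None:
    """Cap first, then check, so the cohort that is written is the one judged."""
    cohort = payloads[:cap]
    return cohort if qualifies(cohort) else None
```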
--- a/scripts/fetch_corpus_coordinates.py
+++ b/scripts/fetch_corpus_coordinates.py
@ -0,0 +1,101 @@
+"""One-time: resolve coordinates for every EPC Prediction corpus UPRN (#1227).
+
+Reads the OS Open-UPRN parquet from S3 (DATA_BUCKET / spatial/) via boto3 and
+resolves each corpus cert's `uprn` to WGS84 lon/lat. UPRNs are grouped by their
+covering partition (the same UPRN-range bucketing `GeospatialS3Repository` uses),
+so each ~1.7 MB partition is read at most once, mirroring the batch lookup in
+`GeospatialS3Repository.coordinates_for_uprns`. Caches `{uprn: [lon, lat]}` locally
+(gitignored) so the validation harness can score intra-postcode distances
+without S3.
+
+USAGE
+-----
+    set -a; . backend/.env; set +a
+    PYTHONPATH=. python scripts/fetch_corpus_coordinates.py
+
+Source corpus: $EPC_PREDICTION_CORPUS (default /tmp/epc_prediction_corpus).
+Output: <corpus>/_coordinates.json (a sidecar that `load_corpus` picks up)
+"""
+
+from __future__ import annotations
+
+import io
+import json
+import os
+from collections import defaultdict
+from pathlib import Path
+from typing import Any
+
+import boto3
+import pandas as pd
+
+CORPUS = Path(os.environ.get("EPC_PREDICTION_CORPUS", "/tmp/epc_prediction_corpus"))
+# Sidecar inside the corpus dir, so `load_corpus` picks it up automatically.
+OUT = CORPUS / "_coordinates.json"
+_BUCKET = os.environ["DATA_BUCKET"]
+_META_KEY = "spatial/filename_meta.parquet"
+
+
+def _reader() -> Any:
+    # boto3.client is overloaded per-service in the installed stubs; bind to Any.
+    boto3_client: Any = boto3.client  # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
+    s3: Any = boto3_client("s3")
+
+    def read_parquet(key: str) -> pd.DataFrame:
+        response: dict[str, Any] = s3.get_object(Bucket=_BUCKET, Key=key)
+        body: bytes = response["Body"].read()
+        return pd.read_parquet(io.BytesIO(body))
+
+    return read_parquet
+
+
+def _corpus_uprns() -> set[int]:
+    index: dict[str, list[str]] = json.loads((CORPUS / "_index.json").read_text())
+    uprns: set[int] = set()
+    for postcode, certs in index.items():
+        for cert in certs:
+            path = CORPUS / postcode / f"{cert}.json"
+            if not path.exists():
+                continue
+            raw: dict[str, Any] = json.loads(path.read_text())
+            uprn = raw.get("uprn")
+            if uprn is not None:
+                uprns.add(int(uprn))
+    return uprns
+
+
+def main() -> None:
+    read_parquet = _reader()
+    uprns = _corpus_uprns()
+    print(f"corpus UPRNs: {len(uprns)}")
+
+    meta = read_parquet(_META_KEY)
+    # Group each UPRN by its covering partition (lower <= uprn <= upper), so each
+    # partition file is read once for all the UPRNs it covers.
+    by_partition: dict[str, list[int]] = defaultdict(list)
+    uncovered = 0
+    for uprn in uprns:
+        covering = meta[(meta["lower"] <= uprn) & (meta["upper"] >= uprn)]
+        if covering.empty:
+            uncovered += 1
+            continue
+        by_partition[str(covering["filenames"].iloc[0])].append(uprn)
+    print(f"distinct partitions to read: {len(by_partition)}; uncovered: {uncovered}")
+
+    coords: dict[str, list[float]] = {}
+    for i, (filename, part_uprns) in enumerate(sorted(by_partition.items()), 1):
+        partition = read_parquet(f"spatial/{filename}")
+        rows = partition[partition["UPRN"].isin(part_uprns)]
+        for _, row in rows.iterrows():
+            coords[str(int(row["UPRN"]))] = [
+                float(row["LONGITUDE"]),
+                float(row["LATITUDE"]),
+            ]
+        print(f"  [{i}/{len(by_partition)}] {filename}: +{len(rows)}")
+
+    OUT.write_text(json.dumps(coords))
+    print(f"resolved {len(coords)}/{len(uprns)} UPRNs -> {OUT}")
+
+
+if __name__ == "__main__":
+    main()
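The sidecar maps each UPRN to `[lon, lat]` in WGS84 degrees, which is enough to score intra-postcode distances offline. A minimal sketch, assuming a spherical Earth (haversine with the mean radius, accurate to within about 0.5%, ample for ranking neighbours) and hypothetical helper names `haversine_m` and `load_sidecar`:

```python
from __future__ import annotations

import json
import math
from pathlib import Path

_EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(a: list[float], b: list[float]) -> float:
    """Great-circle distance in metres between two [lon, lat] pairs (degrees)."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * _EARTH_RADIUS_M * math.asin(math.sqrt(h))


def load_sidecar(path: Path) -> dict[int, list[float]]:
    # JSON object keys are strings; convert them back to int UPRNs.
    raw: dict[str, list[float]] = json.loads(path.read_text())
    return {int(k): v for k, v in raw.items()}
```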
--- a/scripts/fetch_epc_prediction_corpus.py
+++ b/scripts/fetch_epc_prediction_corpus.py
@ -0,0 +1,162 @@
+"""Build the frozen postcode-clustered corpus for EPC Prediction validation
+(ADR-0029).
+
+WHAT THIS IS FOR
+----------------
+EPC Prediction estimates an EPC-less Property's `EpcPropertyData` from its
+**Comparable Properties**: the other certs in its postcode. Validating that
+needs *geographic clusters* (many certs per postcode), not random certs, so the
+leave-one-out harness can drop one cert and predict it from its neighbours.
+
+This script builds that corpus once, offline-reusable: it samples postcodes
+from the register (an unbiased spread over dates/regions), then for each
+postcode downloads **every** domestic cert's full schema payload — the exact
+shape `EpcPropertyDataMapper.from_api_response` consumes — grouped on disk by
+postcode. The validation harness then runs entirely against this cache: fast,
+deterministic, no rate limits.
+
+Pair it with `validate_epc_prediction.py` (the leave-one-out accuracy harness).
+
+HOW THE SAMPLE IS DRAWN
+-----------------------
+Postcodes are seeded by sampling random PAGES of `/api/domestic/search` across
+a past date window (the register orders by registration date, so random pages
+give an unbiased postcode spread). Each seed cert contributes its postcode; we
+take the first N distinct postcodes and pull each one's *entire* cohort via
+`search_by_postcode` -> per-cert `/api/certificate`.
+
+USAGE
+-----
+    PYTHONPATH=. python scripts/fetch_epc_prediction_corpus.py
+
+Resumable — re-running skips certs already cached, so it is safe to interrupt.
+Token is read from `backend/.env` (`OPEN_EPC_API_TOKEN`). The register rejects
+a `date_end` that includes today, so keep the window in the past.
+
+Cache dir defaults to `/tmp/epc_prediction_corpus`, overridable via the
+`EPC_PREDICTION_CORPUS` env var. Layout:
+    <cache>/<POSTCODE_NOSPACE>/<cert_number>.json   # raw API `data` payload
+    <cache>/_index.json                             # {postcode: [cert, ...]}
+"""
+
+import json
+import os
+import random
+import time
+from pathlib import Path
+
+import httpx
+from dotenv import load_dotenv
+
+load_dotenv("backend/.env")
+TOKEN = os.environ["OPEN_EPC_API_TOKEN"]
+BASE = "https://api.get-energy-performance-data.communities.gov.uk"
+H = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
+CACHE = Path(os.environ.get("EPC_PREDICTION_CORPUS", "/tmp/epc_prediction_corpus"))
+CACHE.mkdir(parents=True, exist_ok=True)
+
+# Seed-postcode sampling. `date_end` must be strictly before today. TOTAL_PAGES
+# is the `totalPages` the search returns for this window at page_size=100 —
+# re-probe if you change the window (it only needs to be an upper bound for the
+# random page draw; an out-of-range page just returns no rows).
+WINDOW = {"date_start": "2026-01-01", "date_end": "2026-05-31"}
+TOTAL_PAGES = 7402
+SEED_PAGES = 20        # random search pages → postcode seeds
+N_POSTCODES = 150      # distinct postcodes to pull full cohorts for
+random.seed(2026)      # reproducible draw
+
+
+def _get(url: str, params: dict[str, object], timeout: float = 20.0, tries: int = 5):
+    """GET with retry/backoff on 429 + 5xx (honours Retry-After)."""
+    r = None
+    for i in range(tries):
+        try:
+            r = httpx.get(url, params=params, headers=H, timeout=timeout)
+        except httpx.HTTPError:
+            time.sleep(1.5 * (i + 1))
+            continue
+        if r.status_code == 429 or r.status_code >= 500:
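+            ra = r.headers.get("Retry-After")
+            try:
+                delay = float(ra) if ra else 1.5 * (i + 1)
+            except ValueError:  # HTTP-date form; use the linear backoff instead
+                delay = 1.5 * (i + 1)
+            time.sleep(delay)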
+            continue
+        return r
+    return r
+
+
+def _normalise_postcode(postcode: str) -> str:
+    return postcode.replace(" ", "").upper()
+
+
+def sample_postcodes() -> list[str]:
+    """Draw distinct postcodes from random search pages across the window."""
+    pages = sorted(random.sample(range(1, TOTAL_PAGES + 1), SEED_PAGES))
+    seen: dict[str, None] = {}
+    for p in pages:
+        r = _get(
+            f"{BASE}/api/domestic/search",
+            {**WINDOW, "current_page": p, "page_size": 100},
+        )
+        if r is None or not r.is_success:
+            print(f"  seed page {p} -> {getattr(r, 'status_code', 'ERR')}")
+            continue
+        for row in r.json().get("data", []):
+            pc = row.get("postcode")
+            if pc:
+                seen[_normalise_postcode(pc)] = None
+        print(f"  page {p}: cumulative {len(seen)} distinct postcodes")
+        if len(seen) >= N_POSTCODES:
+            break
+    return list(seen)[:N_POSTCODES]
+
+
+def cohort_cert_numbers(postcode: str) -> list[str]:
+    r = _get(f"{BASE}/api/domestic/search", {"postcode": postcode})
+    if r is None or not r.is_success:
+        return []
+    return [
+        row["certificateNumber"]
+        for row in r.json().get("data", [])
+        if row.get("certificateNumber")
+    ]
+
+
+def fetch_cert(postcode: str, cert: str) -> bool:
+    """Fetch + cache one cert's raw `data` payload. Returns True on success
+    (or already-cached)."""
+    out = CACHE / postcode / f"{cert}.json"
+    if out.exists():
+        return True
+    r = _get(f"{BASE}/api/certificate", {"certificate_number": cert})
+    if r is None or not r.is_success:
+        return False
+    try:
+        payload = r.json()["data"]
+    except (KeyError, ValueError):
+        return False
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(payload))
+    return True
+
+
+def main() -> None:
+    print("sampling seed postcodes ...")
+    postcodes = sample_postcodes()
+    print(f"pulling full cohorts for {len(postcodes)} postcodes into {CACHE} ...")
+    index: dict[str, list[str]] = {}
+    t0 = time.time()
+    total_certs = 0
+    for i, pc in enumerate(postcodes, 1):
+        certs = cohort_cert_numbers(pc)
+        fetched = [c for c in certs if fetch_cert(pc, c)]
+        index[pc] = fetched
+        total_certs += len(fetched)
+        print(f"  [{i}/{len(postcodes)}] {pc}: {len(fetched)}/{len(certs)} certs")
+    (CACHE / "_index.json").write_text(json.dumps(index, indent=2))
+    print(
+        f"DONE in {time.time() - t0:.0f}s: {len(postcodes)} postcodes, "
+        f"{total_certs} certs cached under {CACHE}"
+    )
+
+
+if __name__ == "__main__":
+    main()
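`Retry-After` may carry either delta-seconds or an HTTP-date (RFC 9110), so `float()` on the header covers only the first form. A minimal sketch of a helper that honours both, using the standard library's `email.utils.parsedate_to_datetime`; `retry_delay` is a hypothetical name, and the fallback reproduces the script's `1.5 * (attempt + 1)` linear backoff:

```python
from __future__ import annotations

import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(
    retry_after: Optional[str], attempt: int, now: Optional[datetime] = None
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    fallback = 1.5 * (attempt + 1)  # the script's linear backoff
    if not retry_after:
        return fallback
    value = retry_after.strip()
    # Form 1: delta-seconds, e.g. "120".
    try:
        seconds = float(value)
    except ValueError:
        pass
    else:
        return max(0.0, seconds) if math.isfinite(seconds) else fallback
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return fallback  # neither form: ignore the header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assume naive dates are UTC
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```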
--- a/scripts/fetch_epc_prediction_dense_corpus.py
+++ b/scripts/fetch_epc_prediction_dense_corpus.py
@ -0,0 +1,197 @@
+"""Build a *geographically dense* postcode-clustered corpus for EPC Prediction
+(cross-postcode geo expansion — follow-up to ADR-0029 / issue #1227, #1237).
+
+WHY A SECOND CORPUS
+-------------------
+`fetch_epc_prediction_corpus.py` samples *scattered* national postcodes — fine
+for intra-postcode validation, but a held-out target's true geo-neighbours (the
+adjacent postcodes on its street) are NOT in that corpus, so the cross-postcode
+geo lever (distance-weighting a cohort that spans postcode boundaries) and
+built-form-aware sizing (#1237) cannot be measured on it.
+
+This builds dense clusters instead: each of `N_SEEDS` reproducible seed postcodes
+is expanded, via postcodes.io's nearest-postcode endpoint, into the unit
+postcodes within `RADIUS_M` (capped at `MAX_PER_SEED`), and each of those gets
+its full EPC cohort pulled.
+The result is a handful of dense neighbourhoods (a target's real neighbours ARE
+in-corpus) spread across the country (the seeds are nationally sampled, so the
+validation set stays diverse).
+
+postcodes.io is a CORPUS-BUILD dependency only (a free, public, OGL postcode
+service); the predictor itself stays pure. The gov EPC API has no area/prefix
+search (a partial postcode is rejected with HTTP 400; only a full unit postcode
+is accepted), which is why the neighbour enumeration is external.
+
+USAGE
+-----
+    PYTHONPATH=. python scripts/fetch_epc_prediction_dense_corpus.py          # full
+    PYTHONPATH=. python scripts/fetch_epc_prediction_dense_corpus.py --pilot  # 2 seeds
+
+Resumable — re-running skips cached certs. Token from `backend/.env`. Cache dir
+defaults to `/tmp/epc_prediction_dense_corpus` (separate from the scattered one),
+overridable via `EPC_PREDICTION_DENSE_CORPUS`. Layout matches the other corpus
+(`<POSTCODE_NOSPACE>/<cert>.json` + `_index.json`), so `load_corpus` and the
+coordinate resolver consume it unchanged.
+"""
+
+import json
+import os
+import random
+import sys
+import time
+from pathlib import Path
+from typing import Any, Optional
+
+import httpx
+from dotenv import load_dotenv
+
+load_dotenv("backend/.env")
+TOKEN = os.environ["OPEN_EPC_API_TOKEN"]
+BASE = "https://api.get-energy-performance-data.communities.gov.uk"
+H = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
+POSTCODES_IO = "https://api.postcodes.io"
+CACHE = Path(
+    os.environ.get("EPC_PREDICTION_DENSE_CORPUS", "/tmp/epc_prediction_dense_corpus")
+)
+CACHE.mkdir(parents=True, exist_ok=True)
+
+# Seed sampling mirrors the scattered fetch (random search pages → an unbiased
+# national postcode spread), then each seed is densified. `date_end` must be
+# strictly before today.
+WINDOW = {"date_start": "2026-01-01", "date_end": "2026-05-31"}
+TOTAL_PAGES = 7402
+SEED_PAGES = 8         # random search pages → seed postcodes
+N_SEEDS = 25           # dense neighbourhood clusters to build
+RADIUS_M = 300         # postcodes.io nearest-postcode radius around each seed
+MAX_PER_SEED = 60      # cap unit postcodes per seed (dense urban seeds can be huge)
+random.seed(2026)      # reproducible draw
+
+
+def _get(url: str, params: dict[str, Any], headers: Optional[dict[str, str]] = None,
+         timeout: float = 20.0, tries: int = 5):
+    """GET with retry/backoff on 429 + 5xx (honours Retry-After)."""
+    r = None
+    for i in range(tries):
+        try:
+            r = httpx.get(url, params=params, headers=headers or {}, timeout=timeout)
+        except httpx.HTTPError:
+            time.sleep(1.5 * (i + 1))
+            continue
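+        if r.status_code == 429 or r.status_code >= 500:
+            ra = r.headers.get("Retry-After")
+            try:
+                delay = float(ra) if ra else 1.5 * (i + 1)
+            except ValueError:  # Retry-After may be an HTTP-date, not seconds
+                delay = 1.5 * (i + 1)
+            time.sleep(delay)
+            continue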
+        return r
+    return r
+
+
+def _normalise_postcode(postcode: str) -> str:
+    return postcode.replace(" ", "").upper()
+
+
+def sample_seed_postcodes(n_seeds: int) -> list[str]:
+    """Draw distinct seed postcodes from random search pages across the window."""
+    pages = sorted(random.sample(range(1, TOTAL_PAGES + 1), SEED_PAGES))
+    seen: dict[str, None] = {}
+    for p in pages:
+        r = _get(
+            f"{BASE}/api/domestic/search",
+            {**WINDOW, "current_page": p, "page_size": 100},
+            headers=H,
+        )
+        if r is None or not r.is_success:
+            print(f"  seed page {p} -> {getattr(r, 'status_code', 'ERR')}")
+            continue
+        for row in r.json().get("data", []):
+            pc = row.get("postcode")
+            if pc:
+                seen[pc] = None
+        if len(seen) >= n_seeds:
+            break
+    return list(seen)[:n_seeds]
+
+
+def nearby_postcodes(seed: str) -> list[str]:
+    """Unit postcodes within `RADIUS_M` of `seed`, via postcodes.io's
+    nearest-postcode endpoint (queried at the seed's own coordinates, at most 100
+    results). Returns the seed first, then its neighbours (deduped, capped at
+    `MAX_PER_SEED`); falls back to just the seed if either lookup fails."""
+    s = _get(f"{POSTCODES_IO}/postcodes/{seed.replace(' ', '%20')}", {})
+    if s is None or not s.is_success:
+        return [seed]
+    res: dict[str, Any] = s.json().get("result") or {}
+    lat: Any = res.get("latitude")
+    lon: Any = res.get("longitude")
+    if lat is None or lon is None:
+        return [seed]
+    r = _get(
+        f"{POSTCODES_IO}/postcodes",
+        {"lon": lon, "lat": lat, "radius": RADIUS_M, "limit": 100},
+    )
+    if r is None or not r.is_success:
+        return [seed]
+    items: list[dict[str, Any]] = r.json().get("result") or []
+    found: list[str] = [str(x["postcode"]) for x in items if x.get("postcode")]
+    seed_key = _normalise_postcode(seed)
+    ordered = [seed] + [p for p in found if _normalise_postcode(p) != seed_key]
+    return ordered[:MAX_PER_SEED]
+
+
+def cohort_cert_numbers(postcode: str) -> list[str]:
+    r = _get(f"{BASE}/api/domestic/search", {"postcode": postcode}, headers=H)
+    if r is None or not r.is_success:
+        return []
+    return [
+        row["certificateNumber"]
+        for row in r.json().get("data", [])
+        if row.get("certificateNumber")
+    ]
+
+
+def fetch_cert(postcode_nospace: str, cert: str) -> bool:
+    """Fetch + cache one cert's raw `data` payload (True on success / cached)."""
+    out = CACHE / postcode_nospace / f"{cert}.json"
+    if out.exists():
+        return True
+    r = _get(f"{BASE}/api/certificate", {"certificate_number": cert}, headers=H)
+    if r is None or not r.is_success:
+        return False
+    try:
+        payload = r.json()["data"]
+    except (KeyError, ValueError):
+        return False
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(payload))
+    return True
+
+
+def main() -> None:
+    pilot = "--pilot" in sys.argv
+    n_seeds = 2 if pilot else N_SEEDS
+    print(f"sampling {n_seeds} seed postcodes ...")
+    seeds = sample_seed_postcodes(n_seeds)
+    print(f"seeds: {seeds}")
+
+    index: dict[str, list[str]] = {}
+    visited: set[str] = set()  # every postcode queried, including empty ones
+    t0 = time.time()
+    total_certs = 0
+    for si, seed in enumerate(seeds, 1):
+        neighbourhood = nearby_postcodes(seed)
+        print(f"\n[seed {si}/{len(seeds)}] {seed}: {len(neighbourhood)} postcodes "
+              f"within {RADIUS_M}m")
+        for pc in neighbourhood:
+            nospace = _normalise_postcode(pc)
+            if nospace in visited:
+                continue  # neighbourhoods can overlap; query each postcode once
+            visited.add(nospace)
+            certs = cohort_cert_numbers(pc)
+            fetched = [c for c in certs if fetch_cert(nospace, c)]
+            if fetched:
+                index[nospace] = fetched
+                total_certs += len(fetched)
+        print(f"  cumulative: {len(index)} postcodes, {total_certs} certs")
+    (CACHE / "_index.json").write_text(json.dumps(index, indent=2))
+    print(
+        f"\nDONE in {time.time() - t0:.0f}s: {len(seeds)} seeds, "
+        f"{len(index)} postcodes, {total_certs} certs under {CACHE}"
+    )
+
+
+if __name__ == "__main__":
+    main()
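Setting the network calls aside, the densify-and-dedupe step in `main` reduces to planning which unit postcodes to query. The sketch below assumes a `neighbours_of` callable standing in for the postcodes.io lookup; `plan_queries` and `normalise` are hypothetical names. It tracks every postcode visited, not only those that yielded certs, so a postcode with an empty cohort is not re-queried when a later neighbourhood overlaps it.

```python
from typing import Callable


def normalise(postcode: str) -> str:
    """Corpus directory key: no spaces, upper case."""
    return postcode.replace(" ", "").upper()


def plan_queries(
    seeds: list[str],
    neighbours_of: Callable[[str], list[str]],
    cap: int,
) -> list[str]:
    """Normalised unit postcodes to query, each once, in seed order.

    Each neighbourhood lists its seed first and is capped at `cap`. Overlaps are
    de-duplicated on the normalised form, so "ls6 1ab" and "LS6 1AB" are one.
    """
    visited: set[str] = set()
    plan: list[str] = []
    for seed in seeds:
        seed_key = normalise(seed)
        found = neighbours_of(seed)
        ordered = [seed] + [p for p in found if normalise(p) != seed_key]
        for pc in ordered[:cap]:
            key = normalise(pc)
            if key in visited:
                continue  # already planned by an overlapping neighbourhood
            visited.add(key)
            plan.append(key)
    return plan


if __name__ == "__main__":
    nbrs = {
        "LS6 1AA": ["LS6 1AA", "LS6 1AB", "LS6 1AD"],
        "LS6 1AB": ["ls6 1ab", "LS6 1AD", "LS6 1AE"],
    }
    print(plan_queries(list(nbrs), lambda s: nbrs[s], cap=3))
    # ['LS61AA', 'LS61AB', 'LS61AD', 'LS61AE']
```

Keying on the normalised form matches the `<POSTCODE_NOSPACE>` directory layout, so the plan and the cache agree on what counts as the same postcode.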
--- a/scripts/validate_epc_prediction.py
+++ b/scripts/validate_epc_prediction.py
@ -0,0 +1,177 @@
+"""Leave-one-out accuracy harness for EPC Prediction (ADR-0029).
+
+Runs entirely against the frozen postcode-clustered corpus
+(`fetch_epc_prediction_corpus.py`). For every SAP 10.2 cert that has
+neighbours, it drops that cert from its postcode cohort, predicts it from the
+rest using only its *guaranteed* inputs (property type + built form), and
+compares the predicted `EpcPropertyData` to the actual one.
+
+Reports the ADR-0029 metrics:
+  - classification rate per categorical component (e.g. main wall
+    construction, age band, main heating fuel);
+  - geometry residuals: floor area (also as MAE / MAPE), window count, total
+    window area, building parts and door count (mean signed + mean absolute);
+  - end-to-end SAP, CO2 and primary energy (PEI): predicted-then-calculated vs
+    the actual lodged figure, each alongside its calculator floor (the
+    calculator run on the actual components vs lodged), which the end-to-end
+    error cannot beat.
+
+USAGE
+-----
+    PYTHONPATH=. python scripts/validate_epc_prediction.py
+
+Corpus dir: $EPC_PREDICTION_CORPUS (default /tmp/epc_prediction_corpus).
+"""
+
+from __future__ import annotations
+
+import os
+import statistics
+from pathlib import Path
+from typing import Optional
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.epc_prediction.comparable_properties import ComparableProperty
+from domain.epc_prediction.validation import (
+    evaluate_component_accuracy,
+    iter_predictions,
+)
+from domain.sap10_calculator.calculator import Sap10Calculator, SapResult
+from harness.epc_prediction_corpus import load_corpus
+
+_KG_PER_TONNE: float = 1000.0
+
+CORPUS = Path(os.environ.get("EPC_PREDICTION_CORPUS", "/tmp/epc_prediction_corpus"))
+
+
+def _result(
+    calculator: Sap10Calculator, epc: EpcPropertyData
+) -> Optional[SapResult]:
+    try:
+        return calculator.calculate(epc)
+    except Exception:  # noqa: BLE001 — some pictures don't score; count as misses
+        return None
+
+
+def _co2_tonnes(result: SapResult) -> float:
+    """Calculated annual CO2 in tonnes, matching the lodged `co2_emissions_current`
+    scale (see domain/property_baseline/performance.py)."""
+    return result.co2_kg_per_yr / _KG_PER_TONNE
+
+
+def main() -> None:
+    cohorts = load_corpus(CORPUS)
+    calculator = Sap10Calculator()
+
+    # PRIMARY signal — Component Accuracy, calculator-free (the shared scorer).
+    accuracy = evaluate_component_accuracy(cohorts)
+
+    print(f"corpus: {CORPUS}")
+    print(f"predicted {accuracy.targets} SAP-10.2 held-out targets\n")
+    print("--- Component Accuracy (PRIMARY, calculator-independent) ---")
+    for name, (hits, total) in accuracy.classification.items():
+        if total:
+            print(f"CLASSIFICATION  {name}: {hits}/{total} = {hits / total:.1%}")
+    print()
+    _floor_area_error(cohorts)
+    _residual("floor_area (m2)", accuracy.residuals.get("floor_area", []))
+    _residual("window_count", accuracy.residuals.get("window_count", []))
+    _residual(
+        "total_window_area (m2)", accuracy.residuals.get("total_window_area", [])
+    )
+    _residual("building_parts", accuracy.residuals.get("building_parts", []))
+    _residual("door_count", accuracy.residuals.get("door_count", []))
+
+    # SECONDARY guard — end-to-end vs API-lodged, calculator-FLOORED. Re-walks the
+    # same held-out targets (one orchestration via iter_predictions).
+    sap_vs_lodged: list[float] = []
+    co2_vs_lodged: list[float] = []
+    pei_vs_lodged: list[float] = []
+    # Calculator floors (calc(actual) vs lodged), per metric. Each is the error
+    # the end-to-end cannot beat (the API-path mapper/calculator residual, a
+    # separate workstream), so it attributes how much of a metric's pred-vs-lodged
+    # gap is the calculator vs the prediction. PEI carries a far larger floor than
+    # SAP (~16 kWh/m2 vs ~1.6 SAP points), so the headline PEI MAE must not be
+    # read as pure prediction error (issue #1228).
+    sap_floor: list[float] = []
+    co2_floor: list[float] = []
+    pei_floor: list[float] = []
+    for predicted, actual in iter_predictions(cohorts):
+        pred_result = _result(calculator, predicted)
+        actual_result = _result(calculator, actual)
+        lodged_sap = actual.energy_rating_current
+        lodged_co2 = actual.co2_emissions_current
+        lodged_pei = actual.energy_consumption_current
+        if pred_result is not None:
+            if lodged_sap is not None:
+                sap_vs_lodged.append(
+                    abs(pred_result.sap_score_continuous - lodged_sap)
+                )
+            if lodged_co2 is not None:
+                co2_vs_lodged.append(abs(_co2_tonnes(pred_result) - lodged_co2))
+            if lodged_pei is not None:
+                pei_vs_lodged.append(
+                    abs(pred_result.primary_energy_kwh_per_m2 - lodged_pei)
+                )
+        if actual_result is not None:
+            if lodged_sap is not None:
+                sap_floor.append(
+                    abs(actual_result.sap_score_continuous - lodged_sap)
+                )
+            if lodged_co2 is not None:
+                co2_floor.append(abs(_co2_tonnes(actual_result) - lodged_co2))
+            if lodged_pei is not None:
+                pei_floor.append(
+                    abs(actual_result.primary_energy_kwh_per_m2 - lodged_pei)
+                )
+
+    print()
+    print("--- End-to-end vs API-lodged (SECONDARY, calculator-FLOORED) ---")
+    _sap_line("SAP |pred − lodged|", sap_vs_lodged)
+    _sap_line("CO2 (t) |pred − lodged|", co2_vs_lodged)
+    _sap_line("PEI (kWh/m2) |pred − lodged|", pei_vs_lodged)
+    _sap_line("  floor: SAP |calc(actual) − lodged|", sap_floor)
+    _sap_line("  floor: CO2 |calc(actual) − lodged|", co2_floor)
+    _sap_line("  floor: PEI |calc(actual) − lodged|", pei_floor)
+
+
+def _floor_area_error(cohorts: list[list[ComparableProperty]]) -> None:
+    """Floor-area accuracy as MAE (m²) and MAPE (% of the actual), plus the
+    typical (median actual) size — so the absolute error can be read relative to
+    how big dwellings are. The predicted area is a geo-proximity-weighted cohort
+    median, computed separately from the weighted modes that drive the
+    categoricals."""
+    pairs = [
+        (predicted.total_floor_area_m2, actual.total_floor_area_m2)
+        for predicted, actual in iter_predictions(cohorts)
+    ]
+    valid = [(p, a) for p, a in pairs if a]
+    if not valid:
+        print("RESIDUAL  floor_area: (none)")
+        return
+    mae = statistics.mean(abs(p - a) for p, a in valid)
+    mape = statistics.mean(abs(p - a) / a for p, a in valid)
+    typical = statistics.median(a for _, a in valid)
+    print(
+        f"RESIDUAL  floor_area: MAE {mae:.2f} m2 | MAPE {mape:.1%} | "
+        f"typical (median actual) {typical:.0f} m2 (n={len(valid)})"
+    )
+
+
+def _residual(label: str, values: list[float]) -> None:
+    if not values:
+        print(f"RESIDUAL  {label}: (none)")
+        return
+    mean_signed = statistics.mean(values)
+    mean_abs = statistics.mean(abs(v) for v in values)
+    print(f"RESIDUAL  {label}: mean {mean_signed:+.2f} | mean|·| {mean_abs:.2f} "
+          f"(n={len(values)})")
+
+
+def _sap_line(label: str, values: list[float]) -> None:
+    if not values:
+        print(f"{label}: (none)")
+        return
+    print(f"{label}: MAE {statistics.mean(values):.2f} | "
+          f"median {statistics.median(values):.2f} (n={len(values)})")
+
+
+if __name__ == "__main__":
+    main()
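The validation script reads floor area as MAE, MAPE and median actual, and the other residuals as mean signed plus mean absolute. The two are complementary: a signed mean near zero alongside a large absolute mean means the errors cancel rather than being small. A minimal sketch of those summaries on made-up numbers; `floor_area_summary` and `residual_summary` are hypothetical names, not functions in the script.

```python
import statistics
from typing import Optional


def floor_area_summary(
    pairs: list[tuple[float, Optional[float]]],
) -> Optional[tuple[float, float, float]]:
    """(MAE m2, MAPE as a fraction, median actual m2) over (predicted, actual).

    Pairs with a missing or zero actual are skipped, since MAPE divides by it.
    """
    valid = [(p, a) for p, a in pairs if a]
    if not valid:
        return None
    mae = statistics.mean(abs(p - a) for p, a in valid)
    mape = statistics.mean(abs(p - a) / a for p, a in valid)
    typical = statistics.median(a for _, a in valid)
    return mae, mape, typical


def residual_summary(values: list[float]) -> tuple[float, float]:
    """(mean signed, mean absolute): bias vs typical error size."""
    return statistics.mean(values), statistics.mean(abs(v) for v in values)


if __name__ == "__main__":
    # One 10 m2 under-prediction and one 10 m2 over-prediction.
    print(floor_area_summary([(90.0, 100.0), (110.0, 100.0), (50.0, None)]))
    print(residual_summary([-10.0, 10.0]))  # (0.0, 10.0): cancels when signed
```

Reporting the median actual next to MAE lets an absolute error such as 10 m2 be read against a typical dwelling size.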
--- a/tests/domain/epc_prediction/__init__.py
+++ b/tests/domain/epc_prediction/__init__.py
--- a/tests/domain/epc_prediction/test_comparable_properties.py
+++ b/tests/domain/epc_prediction/test_comparable_properties.py
@ -0,0 +1,173 @@
+"""Behaviour of Comparable Properties selection (ADR-0029): given a prediction
+target's known inputs and the raw postcode cohort, choose + weight the
+comparables EPC Prediction will synthesise from. Filter-then-relax ladder:
+hard filters on identity (property type, built form) + known overrides while
+enough remain, weighted by recency × similarity. Pure domain logic.
+"""
+
+from datetime import date
+from typing import Optional, Union
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData, SapBuildingPart
+from domain.epc_prediction.comparable_properties import (
+    ComparableProperty,
+    ComparableProperties,
+    select_comparables,
+)
+from domain.epc_prediction.prediction_target import PredictionTarget
+
+
+def _comparable(
+    *,
+    property_type: str,
+    certificate_number: str,
+    built_form: str = "1",
+    wall_construction: Optional[Union[int, str]] = None,
+    address: Optional[str] = None,
+    registration_date: Optional[date] = None,
+) -> ComparableProperty:
+    """A ComparableProperty carrying only the fields under test (opaque EpcPropertyData
+    with property_type / built_form / main wall set — the partial-instance idiom)."""
+    epc: EpcPropertyData = object.__new__(EpcPropertyData)
+    epc.property_type = property_type
+    epc.built_form = built_form
+    main: SapBuildingPart = object.__new__(SapBuildingPart)
+    if wall_construction is not None:
+        main.wall_construction = wall_construction
+    epc.sap_building_parts = [main]
+    return ComparableProperty(
+        epc=epc,
+        certificate_number=certificate_number,
+        address=address,
+        registration_date=registration_date,
+    )
+
+
+def test_selects_only_candidates_of_the_same_property_type() -> None:
+    # Arrange — a target house (property_type "2"); cohort of 2 houses + 1 flat.
+    target = PredictionTarget(postcode="LS6 1AA", property_type="2")
+    candidates = [
+        _comparable(property_type="2", certificate_number="A"),
+        _comparable(property_type="2", certificate_number="B"),
+        _comparable(property_type="1", certificate_number="C"),
+    ]
+
+    # Act
+    result: ComparableProperties = select_comparables(target, candidates)
+
+    # Assert — the flat is excluded; the two houses remain.
+    assert {c.certificate_number for c in result.members} == {"A", "B"}
+
+
+def test_dedupes_re_lodgements_to_the_latest_cert_per_address() -> None:
+    # Arrange — a register cohort with one address (FLAT 3) lodged three times.
+    # Comparables are one-per-real-neighbour, so a re-lodged address must not
+    # count three times towards the mode; the latest cert is its current state.
+    target = PredictionTarget(postcode="LS6 1AA", property_type="2")
+    candidates = [
+        _comparable(
+            property_type="2",
+            certificate_number="OLD",
+            address="FLAT 3",
+            registration_date=date(2020, 4, 6),
+        ),
+        _comparable(
+            property_type="2",
+            certificate_number="MID",
+            address="FLAT 3",
+            registration_date=date(2021, 2, 1),
+        ),
+        _comparable(
+            property_type="2",
+            certificate_number="NEW",
+            address="FLAT 3",
+            registration_date=date(2025, 1, 20),
+        ),
+        _comparable(
+            property_type="2",
+            certificate_number="OTHER",
+            address="FLAT 5",
+            registration_date=date(2024, 9, 27),
+        ),
+    ]
+
+    # Act
+    result: ComparableProperties = select_comparables(target, candidates)
+
+    # Assert — FLAT 3 collapses to its latest cert; FLAT 5 is untouched.
+    assert {c.certificate_number for c in result.members} == {"NEW", "OTHER"}
+
+
+def test_filters_to_the_known_built_form_when_enough_remain() -> None:
+    # Arrange — a mid-terrace target (built_form "4"); cohort of 5 mid-terraces
+    # + 2 detached, all houses. The built form is known and leaves ≥ k, so it is
+    # applied as a hard filter.
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", built_form="4"
+    )
+    candidates = [
+        _comparable(property_type="2", built_form="4", certificate_number=f"T{i}")
+        for i in range(5)
+    ] + [
+        _comparable(property_type="2", built_form="1", certificate_number=f"D{i}")
+        for i in range(2)
+    ]
+
+    # Act
+    result: ComparableProperties = select_comparables(
+        target, candidates, minimum_cohort=5
+    )
+
+    # Assert — only the five mid-terraces survive.
+    assert {c.certificate_number for c in result.members} == {
+        "T0", "T1", "T2", "T3", "T4"
+    }
+
+
+def test_known_wall_override_emphasises_matching_comparables() -> None:
+    # Arrange — a mixed street: 5 solid-brick (code 2) + 3 cavity (code 1) houses.
+    # We KNOW the target is solid brick (a Landlord Override), and the filter
+    # leaves ≥ k, so cavity neighbours are dropped (the border-property case).
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", wall_construction=2
+    )
+    candidates = [
+        _comparable(property_type="2", wall_construction=2, certificate_number=f"S{i}")
+        for i in range(5)
+    ] + [
+        _comparable(property_type="2", wall_construction=1, certificate_number=f"C{i}")
+        for i in range(3)
+    ]
+
+    # Act
+    result: ComparableProperties = select_comparables(
+        target, candidates, minimum_cohort=5
+    )
+
+    # Assert — only the solid-brick comparables remain.
+    assert {c.certificate_number for c in result.members} == {
+        "S0", "S1", "S2", "S3", "S4"
+    }
+
+
+def test_known_wall_override_relaxes_when_too_few_match() -> None:
+    # Arrange — only 2 solid-brick but 6 cavity houses; the override would leave
+    # 2 (< k=5), so it relaxes to keep the full type cohort (graceful degradation).
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", wall_construction=2
+    )
+    candidates = [
+        _comparable(property_type="2", wall_construction=2, certificate_number=f"S{i}")
+        for i in range(2)
+    ] + [
+        _comparable(property_type="2", wall_construction=1, certificate_number=f"C{i}")
+        for i in range(6)
+    ]
+
+    # Act
+    result: ComparableProperties = select_comparables(
+        target, candidates, minimum_cohort=5
+    )
+
+    # Assert — relaxed: all eight houses retained.
+    assert len(result.members) == 8
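The tests above pin down the filter-then-relax ladder and the re-lodgement dedupe. What follows is a simplified, hypothetical sketch of that behaviour, not the real `select_comparables`. It uses a flattened `Candidate` record and treats property type as an unconditional filter, with each other known field relaxable (an assumption consistent with these tests). It omits the geographic widening and the proximity × recency × similarity weighting described in CONTEXT.md.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical flattened stand-in for ComparableProperty."""

    cert: str
    property_type: str
    built_form: str = "1"
    wall: Optional[int] = None
    address: Optional[str] = None
    registered: Optional[date] = None


def latest_per_address(cands: list[Candidate]) -> list[Candidate]:
    """Collapse re-lodgements to the most recently registered cert per address.

    Candidates without an address cannot be matched to one another; all are kept.
    """
    best: dict[str, Candidate] = {}
    unaddressed: list[Candidate] = []
    for c in cands:
        if c.address is None:
            unaddressed.append(c)
            continue
        prev = best.get(c.address)
        if prev is None or (c.registered or date.min) > (prev.registered or date.min):
            best[c.address] = c
    return unaddressed + list(best.values())


def select(
    cands: list[Candidate],
    property_type: str,
    built_form: Optional[str] = None,
    wall: Optional[int] = None,
    k: int = 5,
) -> list[Candidate]:
    """Hard-filter on property type, then apply each known field only if at
    least k candidates survive it (otherwise relax that rung)."""
    pool = latest_per_address([c for c in cands if c.property_type == property_type])
    rungs: list[tuple[object, Callable[[Candidate], bool]]] = [
        (built_form, lambda c: c.built_form == built_form),
        (wall, lambda c: c.wall == wall),
    ]
    for known, matches in rungs:
        if known is None:
            continue  # field unknown for the target: nothing to filter on
        narrowed = [c for c in pool if matches(c)]
        if len(narrowed) >= k:
            pool = narrowed  # enough remain: keep the hard filter
        # otherwise relax: leave the pool unfiltered on this field
    return pool
```

Applying the dedupe before the ladder means the `>= k` count is over distinct addresses, so a thrice-lodged flat cannot prop up a filter on its own.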
--- a/tests/domain/epc_prediction/test_component_accuracy_gate.py
+++ b/tests/domain/epc_prediction/test_component_accuracy_gate.py
@ -0,0 +1,118 @@
+"""Tier-1 ratcheting Component Accuracy gate (ADR-0030).
+
+Runs the calculator-free leave-one-out scorer over the committed, anonymised
+fixture and asserts each per-component classification rate / geometry residual is
+no worse than a committed baseline. Because the prediction is deterministic and
+the fixture is frozen, every run reproduces the same numbers exactly — so a
+failure means a real *regression* in prediction quality, never sample noise.
+
+The floors / ceilings are the currently-measured values and only ever **tighten**
+(the repo's no-tolerance-widening ethos applied to an aggregate): when prediction
+improves, ratchet the relevant floor up in the same change. The end-to-end
+SAP / carbon / PE guards are deliberately *not* here — they need the calculator,
+whose API-path residual is a separate workstream; the component floors are the
+real gate (ADR-0030).
+"""
+
+from pathlib import Path
+
+import pytest
+
+from domain.epc_prediction.validation import (
+    ComponentAccuracy,
+    evaluate_component_accuracy,
+)
+from harness.epc_prediction_corpus import load_corpus
+
+_FIXTURE = Path(__file__).parents[3] / "tests" / "fixtures" / "epc_prediction"
+
+# Minimum classification hit-rate per component (ratchet floors). Tighten — never
+# loosen — as prediction improves. Values are the measured rates over the frozen
+# 36-target fixture; a 1e-3 tolerance absorbs float rounding only.
+_RATE_FLOORS: dict[str, float] = {
+    "wall_construction": 0.8889,
+    "wall_insulation_type": 0.8333,
+    "construction_age_band": 0.6389,
+    "construction_age_band_pm1": 0.8333,
+    "roof_construction": 0.7222,
+    "floor_construction": 0.8125,
+    "heating_main_fuel": 0.9722,
+    "heating_main_category": 0.9444,
+    "heating_main_control": 0.8056,
+    "water_heating_fuel": 0.9722,
+    "water_heating_code": 0.9444,
+    "has_hot_water_cylinder": 0.8889,
+    "cylinder_insulation_type": 0.5000,
+    "secondary_heating_type": 0.0000,
+    "roof_insulation_thickness": 0.4118,
+    "roof_insulation_thickness_pm1": 0.4118,
+    "floor_insulation": 0.9375,
+    "has_room_in_roof": 0.8333,
+    "modal_glazing_type": 0.5556,
+    "has_pv": 1.0000,
+    "solar_water_heating": 1.0000,
+}
+
+# Maximum mean absolute residual per numeric component (ratchet ceilings).
+# window_count is deliberately excluded — it is cosmetic for SAP (issue #1222):
+# the predicted picture clusters at a mapper-default 4 windows while actuals
+# spread 1-21, yet total_window_area (the SAP-relevant signal) stays tight.
+#
+# floor_area was re-baselined 11.8983 -> 12.0378 when floor-area sizing moved from
+# the plain cohort median to the geo-proximity-weighted median (a *method* change,
+# not a loosening). The change is a clear win on the full 514-target corpus
+# (MAE 10.48 -> 9.73 / MAPE 13.2% -> 12.2%); the n=36 frozen fixture moved +0.14
+# the other way as small-sample noise (one target's shift moves an n=36 MAE more
+# than that). The ceiling still pins the new deterministic value exactly, so the
+# tighten-only ratchet resumes from here.
+_RESIDUAL_CEILINGS: dict[str, float] = {
+    "floor_area": 12.0378,
+    "total_window_area": 4.4067,
+    "building_parts": 0.3333,
+    "door_count": 0.6389,
+}
+
+_TOLERANCE = 1e-3
+
+
+@pytest.fixture(scope="module")
+def accuracy() -> ComponentAccuracy:
+    if not (_FIXTURE / "_index.json").exists():
+        pytest.skip(f"no EPC Prediction fixture at {_FIXTURE}")
+    return evaluate_component_accuracy(load_corpus(_FIXTURE))
+
+
+def test_fixture_yields_the_expected_target_count(
+    accuracy: ComponentAccuracy,
+) -> None:
+    # The frozen fixture must still produce its full set of SAP-10.2 targets — a
+    # drop means the fixture or the target filter changed.
+    assert accuracy.targets >= 36
+
+
+@pytest.mark.parametrize("component,floor", sorted(_RATE_FLOORS.items()))
+def test_classification_rate_does_not_regress(
+    accuracy: ComponentAccuracy, component: str, floor: float
+) -> None:
+    # Arrange / Act
+    rate = accuracy.rate(component)
+
+    # Assert — the component is still applicable and at or above its floor.
+    assert rate is not None, f"{component} had no applicable targets"
+    assert rate >= floor - _TOLERANCE, (
+        f"{component} classification regressed: {rate:.4f} < floor {floor:.4f}"
+    )
+
+
+@pytest.mark.parametrize("component,ceiling", sorted(_RESIDUAL_CEILINGS.items()))
+def test_residual_does_not_regress(
+    accuracy: ComponentAccuracy, component: str, ceiling: float
+) -> None:
+    # Arrange / Act
+    mean_abs = accuracy.mean_abs_residual(component)
+
+    # Assert — the mean absolute residual is at or below its ceiling.
+    assert mean_abs is not None, f"{component} had no residuals"
+    assert mean_abs <= ceiling + _TOLERANCE, (
+        f"{component} residual regressed: {mean_abs:.4f} > ceiling {ceiling:.4f}"
+    )
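The gate's rule is simple enough to state as code. A rate may fall at most `_TOLERANCE` below its floor, and a residual may rise at most `_TOLERANCE` above its ceiling. A missing measurement fails. Below is a minimal sketch of that check plus a tighten-only ratchet helper. Both functions are hypothetical; the test file itself asserts per component and updates its baselines by hand.

```python
from typing import Optional

TOLERANCE = 1e-3  # absorbs float rounding only, never a real regression


def regressions(
    rates: dict[str, Optional[float]],
    floors: dict[str, float],
    residuals: dict[str, Optional[float]],
    ceilings: dict[str, float],
    tol: float = TOLERANCE,
) -> list[str]:
    """Components below a rate floor or above a residual ceiling.

    A missing measurement (absent or None) counts as a failure, as in the gate.
    """
    bad: list[str] = []
    for name, floor in floors.items():
        rate = rates.get(name)
        if rate is None or rate < floor - tol:
            bad.append(name)
    for name, ceiling in ceilings.items():
        value = residuals.get(name)
        if value is None or value > ceiling + tol:
            bad.append(name)
    return bad


def ratchet_floors(
    floors: dict[str, float], rates: dict[str, float]
) -> dict[str, float]:
    """Tighten-only: a floor rises to a better measured rate, never falls."""
    return {name: max(floor, rates.get(name, floor)) for name, floor in floors.items()}
```

`ratchet_floors` can only raise floors. A deliberate re-baseline, such as the documented floor_area ceiling change after a method change, stays a manual, reviewed edit.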
--- a/tests/domain/epc_prediction/test_epc_prediction.py
+++ b/tests/domain/epc_prediction/test_epc_prediction.py
@ -0,0 +1,545 @@
+"""Behaviour of EPC Prediction synthesis (ADR-0029): turn the selected
+Comparable Properties into a predicted EpcPropertyData. Hybrid: copy a coherent
+representative template's structure (building parts, windows, geometry), set the
+homogeneous categoricals to the recency-weighted cohort mode, apply Landlord
+Overrides on top. Pure domain logic.
+"""
+
+from datetime import date
+from typing import Optional, Union
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    MainHeatingDetail,
+    SapBuildingPart,
+    SapFloorDimension,
+    SapHeating,
+    SapWindow,
+)
+from domain.geospatial.coordinates import Coordinates
+from domain.epc_prediction.comparable_properties import (
+    ComparableProperty,
+    ComparableProperties,
+)
+from domain.epc_prediction.epc_prediction import (
+    EpcPrediction,
+    PredictionConfidence,
+)
+from domain.epc_prediction.prediction_target import PredictionTarget
+
+
+def _epc(
+    *,
+    building_parts: int = 1,
+    floor_area: float = 80.0,
+    wall_construction: Union[int, str] = 1,
+    wall_insulation_type: Union[int, str] = 1,
+    construction_age_band: str = "K",
+    roof_construction: Optional[int] = 1,
+    roof_insulation_thickness: Optional[Union[str, int]] = 100,
+    floor_construction: Optional[int] = 1,
+    floor_insulation: Optional[int] = 1,
+    glazing_type: Union[int, str] = 3,
+    main_fuel_type: Union[int, str] = 1,
+    main_heating_category: Optional[int] = 1,
+    main_heating_control: Union[int, str] = 1,
+    water_heating_fuel: Optional[int] = 1,
+    water_heating_code: Optional[int] = 1,
+    has_hot_water_cylinder: bool = True,
+    solar_water_heating: bool = False,
+) -> EpcPropertyData:
+    epc: EpcPropertyData = object.__new__(EpcPropertyData)
+    epc.property_type = "2"
+    epc.built_form = "4"
+    epc.total_floor_area_m2 = floor_area
+    parts: list[SapBuildingPart] = []
+    for _ in range(building_parts):
+        part: SapBuildingPart = object.__new__(SapBuildingPart)
+        part.wall_construction = wall_construction
+        part.wall_insulation_type = wall_insulation_type
+        part.construction_age_band = construction_age_band
+        part.roof_construction = roof_construction
+        part.roof_insulation_thickness = roof_insulation_thickness
+        floor_dim: SapFloorDimension = object.__new__(SapFloorDimension)
+        floor_dim.floor_construction = floor_construction
+        floor_dim.floor_insulation = floor_insulation
+        part.sap_floor_dimensions = [floor_dim]
+        parts.append(part)
+    epc.sap_building_parts = parts
+    window: SapWindow = object.__new__(SapWindow)
+    window.window_width = 1.0
+    window.window_height = 1.0
+    window.glazing_type = glazing_type
+    epc.sap_windows = [window]
+    heating: SapHeating = object.__new__(SapHeating)
+    detail: MainHeatingDetail = object.__new__(MainHeatingDetail)
+    detail.main_fuel_type = main_fuel_type
+    detail.main_heating_category = main_heating_category
+    detail.main_heating_control = main_heating_control
+    heating.main_heating_details = [detail]
+    heating.water_heating_fuel = water_heating_fuel
+    heating.water_heating_code = water_heating_code
+    heating.cylinder_insulation_type = 1
+    heating.secondary_heating_type = None
+    epc.sap_heating = heating
+    epc.has_hot_water_cylinder = has_hot_water_cylinder
+    epc.solar_water_heating = solar_water_heating
+    return epc
+
+
+def _cohort(*epcs: EpcPropertyData) -> ComparableProperties:
+    return ComparableProperties(
+        members=tuple(
+            ComparableProperty(epc=e, certificate_number=str(i)) for i, e in enumerate(epcs)
+        )
+    )
+
+
+def _dated_cohort(
+    *dated: tuple[EpcPropertyData, date],
+) -> ComparableProperties:
+    return ComparableProperties(
+        members=tuple(
+            ComparableProperty(epc=e, certificate_number=str(i), registration_date=d)
+            for i, (e, d) in enumerate(dated)
+        )
+    )
+
+
+def test_predicts_a_picture_by_copying_a_representative_template() -> None:
+    # Arrange — a single comparable with a distinctive structure (2 building
+    # parts, 92 m²); with nothing else to go on it is the template.
+    template = _epc(building_parts=2, floor_area=92.0)
+    target = PredictionTarget(postcode="LS6 1AA", property_type="2")
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, _cohort(template))
+
+    # Assert — the structure is copied wholesale (and it is a copy, not the same
+    # object — the baseline must never be mutated).
+    assert len(predicted.sap_building_parts) == 2
+    assert predicted.total_floor_area_m2 == 92.0
+    assert predicted is not template
+
+
+def test_template_is_the_member_closest_to_the_cohort_median_size() -> None:
+    # Arrange — the cohort spans a wide range of sizes; members[0] is an atypical
+    # tiny 20 m² outlier. A single neighbour's geometry is copied wholesale, so
+    # the template must be the size-representative member (closest to the median),
+    # not whoever happens to come first (ADR-0029 decision 4: closest on size).
+    cohort = _cohort(
+        _epc(floor_area=20.0),
+        _epc(floor_area=80.0),
+        _epc(floor_area=200.0),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — the 80 m² member (the median) seeds the structure, not the 20 m²
+    # outlier sitting at members[0].
+    assert predicted.total_floor_area_m2 == 80.0
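The template choice exercised above can be sketched in isolation. This is a minimal sketch, not the project's `EpcPrediction` code: `pick_template_index` is a hypothetical helper that works on bare floor areas. The earliest-member tie-break is an assumption, chosen because it matches the even-sized cohort test that expects the 60 m² member over the equally close 80 m² one.

```python
from statistics import median
from typing import Sequence


def pick_template_index(floor_areas: Sequence[float]) -> int:
    """Return the index of the member whose floor area is closest to the median.

    min() keeps the first of several equal keys, so ties go to the earliest member.
    """
    centre = median(floor_areas)
    return min(range(len(floor_areas)), key=lambda i: abs(floor_areas[i] - centre))


# The atypical 20 m² member comes first, but the 80 m² member (the median) is chosen.
print(pick_template_index([20.0, 80.0, 200.0]))  # → 1
# Even-sized cohort: median 70; 60 and 80 tie, and the earlier 60 wins.
print(pick_template_index([40.0, 60.0, 80.0, 100.0]))  # → 1
```

Picking by distance to the median, rather than taking `members[0]`, keeps a single odd neighbour from seeding the copied geometry.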
+
+
+def test_sets_main_wall_construction_to_the_cohort_mode() -> None:
+    # Arrange — the template (members[0]) is solid brick (2), but the cohort
+    # majority is cavity (1). The homogeneous categorical should follow the mode,
+    # not the one template, so the prediction is robust to an atypical template.
+    cohort = _cohort(
+        _epc(wall_construction=2),
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — cavity (the mode) wins over the solid-brick template.
+    assert predicted.sap_building_parts[0].wall_construction == 1
+
+
+def test_sets_the_other_homogeneous_categoricals_to_the_cohort_mode() -> None:
+    # Arrange — the median-size template (members[0], 80 m²) is an atypical
+    # outlier on every categorical; the cohort majority disagrees. Age band,
+    # wall insulation, roof construction and floor construction are all
+    # homogeneous categoricals, so each should follow its mode, not the one
+    # template (ADR-0029 decision 4).
+    cohort = _cohort(
+        _epc(
+            floor_area=80.0,
+            construction_age_band="A",
+            wall_insulation_type=9,
+            roof_construction=7,
+            floor_construction=7,
+        ),
+        _epc(
+            construction_age_band="K",
+            wall_insulation_type=1,
+            roof_construction=2,
+            floor_construction=3,
+        ),
+        _epc(
+            construction_age_band="K",
+            wall_insulation_type=1,
+            roof_construction=2,
+            floor_construction=3,
+        ),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — every categorical follows the cohort mode over the outlier
+    # template.
+    main = predicted.sap_building_parts[0]
+    assert main.construction_age_band == "K"
+    assert main.wall_insulation_type == 1
+    assert main.roof_construction == 2
+    assert main.sap_floor_dimensions[0].floor_construction == 3
+
+
+def test_modes_roof_and_floor_insulation() -> None:
+    # Arrange — the median-size template (members[0]) is an outlier on roof
+    # insulation thickness and floor insulation; the cohort majority disagrees.
+    # These are independent fabric categoricals, so each should follow its
+    # cohort mode like the construction categoricals do.
+    cohort = _cohort(
+        _epc(floor_area=80.0, roof_insulation_thickness=25, floor_insulation=9),
+        _epc(roof_insulation_thickness=300, floor_insulation=2),
+        _epc(roof_insulation_thickness=300, floor_insulation=2),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — each follows the cohort mode over the outlier template.
+    main = predicted.sap_building_parts[0]
+    assert main.roof_insulation_thickness == 300
+    assert main.sap_floor_dimensions[0].floor_insulation == 2
+
+
+def test_recency_weights_roof_insulation_mode() -> None:
+    # Arrange — an old majority (three 2015 certs at 100 mm) and a recent
+    # minority (two 2025 certs at 300 mm). Roof insulation is topped up over
+    # time, so the recent neighbours reflect the current state: the recency-
+    # weighted mode must pick 300 over the plain-majority 100.
+    cohort = _dated_cohort(
+        (_epc(roof_insulation_thickness=100), date(2015, 1, 1)),
+        (_epc(roof_insulation_thickness=100), date(2015, 1, 1)),
+        (_epc(roof_insulation_thickness=100), date(2015, 1, 1)),
+        (_epc(roof_insulation_thickness=300), date(2025, 1, 1)),
+        (_epc(roof_insulation_thickness=300), date(2025, 1, 1)),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — recency overrides the stale majority.
+    assert predicted.sap_building_parts[0].roof_insulation_thickness == 300
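A recency-weighted mode like the one this test expects can be sketched as a vote in which each certificate's weight decays with age. This sketch assumes exponential decay with a hypothetical 10-year half-life (`HALF_LIFE_YEARS`). The source only says recency is a graduated weight, so neither the decay shape nor the constant comes from the project.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable
from datetime import date

# Hypothetical tuning constant: a certificate this many years old counts half as much.
HALF_LIFE_YEARS = 10.0


def recency_weight(lodged: date, as_of: date) -> float:
    """Exponential decay of a certificate's vote with its age in years."""
    age_years = max((as_of - lodged).days, 0) / 365.25
    return 0.5 ** (age_years / HALF_LIFE_YEARS)


def recency_weighted_mode(values: Iterable[tuple[Hashable, date]], as_of: date) -> Hashable:
    """Return the value with the largest summed recency weight."""
    totals = defaultdict(float)
    for value, lodged in values:
        totals[value] += recency_weight(lodged, as_of)
    return max(totals, key=totals.__getitem__)


# Three 2015 certs at 100 mm versus two 2025 certs at 300 mm.
roof = [(100, date(2015, 1, 1))] * 3 + [(300, date(2025, 1, 1))] * 2
print(recency_weighted_mode(roof, as_of=date(2025, 1, 1)))  # → 300
```

With a 10-year half-life, the three 2015 votes sum to about 1.5 against 2.0 for the two recent ones, so the recent value wins. When every certificate has the same date, this reduces to a plain majority vote.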
+
+
+def test_floor_area_is_the_cohort_median_not_the_templates_own_area() -> None:
+    # Arrange — an even-sized cohort whose median (70) falls between members, so
+    # the size-representative template (the first member closest to the median,
+    # 60 m²) does not itself sit on the median. The predicted floor area is a
+    # point estimate of the target's size, best served by the cohort median (the
+    # MAD-minimising estimator), decoupled from whichever template seeds the
+    # structure.
+    cohort = _cohort(
+        _epc(floor_area=40.0),
+        _epc(floor_area=60.0),
+        _epc(floor_area=80.0),
+        _epc(floor_area=100.0),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — the floor area is the cohort median (70), not the template's 60.
+    assert predicted.total_floor_area_m2 == 70.0
+
+
+def test_floor_area_leans_toward_the_nearest_neighbours_size() -> None:
+    # Arrange — three FAR neighbours are 60 m²; one neighbour AT the target is
+    # 120 m². The plain median would be 60, but homes built together share a
+    # footprint, so the geo-proximity-weighted median leans toward the near
+    # neighbour's size.
+    here = Coordinates(longitude=0.0, latitude=0.0)
+    far = Coordinates(longitude=1.0, latitude=1.0)  # ~150 km away
+    cohort = ComparableProperties(
+        members=(
+            ComparableProperty(_epc(floor_area=60.0), "1", coordinates=far),
+            ComparableProperty(_epc(floor_area=60.0), "2", coordinates=far),
+            ComparableProperty(_epc(floor_area=60.0), "3", coordinates=far),
+            ComparableProperty(_epc(floor_area=120.0), "4", coordinates=here),
+        )
+    )
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", coordinates=here
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, cohort)
+
+    # Assert — the near neighbour's size dominates the far majority.
+    assert predicted.total_floor_area_m2 == 120.0
+
+
+def test_floor_area_median_is_unweighted_without_target_coordinates() -> None:
+    # Arrange — identical cohort, but the target has no coordinates, so geo
+    # weighting is off and the floor area reduces to the plain cohort median (60).
+    here = Coordinates(longitude=0.0, latitude=0.0)
+    far = Coordinates(longitude=1.0, latitude=1.0)
+    cohort = ComparableProperties(
+        members=(
+            ComparableProperty(_epc(floor_area=60.0), "1", coordinates=far),
+            ComparableProperty(_epc(floor_area=60.0), "2", coordinates=far),
+            ComparableProperty(_epc(floor_area=60.0), "3", coordinates=far),
+            ComparableProperty(_epc(floor_area=120.0), "4", coordinates=here),
+        )
+    )
+    target = PredictionTarget(postcode="LS6 1AA", property_type="2")
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, cohort)
+
+    # Assert — without target coordinates, the plain median (60) wins.
+    assert predicted.total_floor_area_m2 == 60.0
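The three floor-area tests above agree with a geo-proximity-weighted median that averages the two middle values when the weight splits exactly in half. That averaging is what yields 70 for the even 40/60/80/100 cohort. This is a sketch under stated assumptions: the `exp(-distance / SCALE_KM)` kernel and `SCALE_KM` are hypothetical, and points are plain `(longitude, latitude)` tuples rather than the project's `Coordinates`.

```python
import math
from typing import Optional, Sequence

# Hypothetical decay scale: a neighbour SCALE_KM away weighs 1/e of one next door.
SCALE_KM = 1.0

LonLat = tuple[float, float]


def haversine_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance between two (longitude, latitude) points, in km."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def proximity_weights(target: Optional[LonLat], members: Sequence[LonLat]) -> list[float]:
    """Without target coordinates every member weighs the same (geo weighting off)."""
    if target is None:
        return [1.0] * len(members)
    return [math.exp(-haversine_km(target, m) / SCALE_KM) for m in members]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted median; averages the two middle values when the weight splits evenly."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for i, (value, weight) in enumerate(pairs):
        cumulative += weight
        if math.isclose(cumulative, half) and i + 1 < len(pairs):
            return (value + pairs[i + 1][0]) / 2
        if cumulative > half:
            return value
    return pairs[-1][0]


here, far = (0.0, 0.0), (1.0, 1.0)
areas = [60.0, 60.0, 60.0, 120.0]
coords = [far, far, far, here]
print(weighted_median(areas, proximity_weights(here, coords)))  # → 120.0
print(weighted_median(areas, proximity_weights(None, coords)))  # → 60.0
print(weighted_median([40.0, 60.0, 80.0, 100.0], [1.0] * 4))  # → 70.0
```

With equal weights this is the ordinary median, so switching geo weighting off needs no separate code path.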
+
+
+def test_categorical_mode_leans_on_size_similar_neighbours() -> None:
+    # Arrange — a count majority (three) carries wall-insulation 9, but two of
+    # them are 400 m² size outliers; most members at the cohort centre (median
+    # 100 m²) hold wall-insulation 1. Physical-similarity weighting down-weights
+    # the outliers, so the size-representative value 1 wins over the plain-count
+    # majority 9.
+    cohort = _cohort(
+        _epc(floor_area=100.0, wall_insulation_type=1),
+        _epc(floor_area=100.0, wall_insulation_type=1),
+        _epc(floor_area=100.0, wall_insulation_type=9),
+        _epc(floor_area=400.0, wall_insulation_type=9),
+        _epc(floor_area=400.0, wall_insulation_type=9),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — the size-similar value wins over the outlier-driven majority.
+    assert predicted.sap_building_parts[0].wall_insulation_type == 1
+
+
+def test_categorical_mode_leans_on_age_similar_neighbours() -> None:
+    # Arrange — same size throughout (so size weighting is neutral). A count
+    # majority (three) carries wall-insulation 9, but two of them are age-band A
+    # outliers while the cohort's modal band is K. Age-similarity weighting
+    # down-weights the outliers, so the band-representative value 1 wins.
+    cohort = _cohort(
+        _epc(construction_age_band="K", wall_insulation_type=1),
+        _epc(construction_age_band="K", wall_insulation_type=1),
+        _epc(construction_age_band="K", wall_insulation_type=9),
+        _epc(construction_age_band="A", wall_insulation_type=9),
+        _epc(construction_age_band="A", wall_insulation_type=9),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — the age-similar value wins over the outlier-driven majority.
+    assert predicted.sap_building_parts[0].wall_insulation_type == 1
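The physical-similarity weighting behind the size and age tests above can be sketched as a weighted vote. Each member's weight shrinks with its relative size difference from the cohort median and with its age-band distance from the modal band. The kernel in `similarity_weight` and the `AGE_BANDS` ordering are assumptions for illustration only; the source does not give the project's actual weighting formula.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence
from statistics import median

# Assumed ordering of the RdSAP age-band letters, oldest first.
AGE_BANDS = "ABCDEFGHIJKLM"


def similarity_weight(area: float, band: str, centre_area: float, modal_band: str) -> float:
    """Hypothetical kernel: 1.0 at the cohort centre, shrinking with relative size
    difference and with age-band distance."""
    size_term = 1.0 / (1.0 + abs(area - centre_area) / centre_area)
    age_term = 1.0 / (1.0 + abs(AGE_BANDS.index(band) - AGE_BANDS.index(modal_band)))
    return size_term * age_term


def similarity_weighted_mode(members: Sequence[tuple[float, str, Hashable]]) -> Hashable:
    """members are (floor_area, age_band, component_value) triples."""
    centre_area = median(area for area, _, _ in members)
    modal_band = Counter(band for _, band, _ in members).most_common(1)[0][0]
    totals = defaultdict(float)
    for area, band, value in members:
        totals[value] += similarity_weight(area, band, centre_area, modal_band)
    return max(totals, key=totals.__getitem__)


# Size outliers: two 400 m² members carry 9; the 100 m² centre mostly carries 1.
by_size = [
    (100.0, "K", 1), (100.0, "K", 1), (100.0, "K", 9),
    (400.0, "K", 9), (400.0, "K", 9),
]
print(similarity_weighted_mode(by_size))  # → 1
# Age outliers: two band-A members carry 9; the band-K members mostly carry 1.
by_age = [
    (80.0, "K", 1), (80.0, "K", 1), (80.0, "K", 9),
    (80.0, "A", 9), (80.0, "A", 9),
]
print(similarity_weighted_mode(by_age))  # → 1
```

In the size case the 400 m² members each weigh 0.25, so value 9 totals 1.5 against 2.0 for value 1. A cohort that is uniform in size and age falls back to the plain majority.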
+
+
+def test_confidence_reports_cohort_size_and_unanimous_agreement() -> None:
+    # Arrange — a unanimous cohort: three neighbours, all cavity-walled (1).
+    cohort = _cohort(
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+    )
+
+    # Act
+    confidence: PredictionConfidence = EpcPrediction().confidence(cohort)
+
+    # Assert — three neighbours, total agreement on the wall construction.
+    assert confidence.cohort_size == 3
+    assert confidence.agreement("wall_construction") == 1.0
+
+
+def test_confidence_agreement_is_the_modal_share_of_the_cohort() -> None:
+    # Arrange — three of four neighbours are cavity (1), one is solid brick (2),
+    # so the cohort is split on the wall construction.
+    cohort = _cohort(
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+        _epc(wall_construction=2),
+    )
+
+    # Act
+    confidence: PredictionConfidence = EpcPrediction().confidence(cohort)
+
+    # Assert — agreement is the modal value's share of the cohort: 3 of 4.
+    share: Optional[float] = confidence.agreement("wall_construction")
+    assert share is not None
+    assert abs(share - 0.75) <= 1e-9
+
+
+def test_confidence_excludes_absent_component_values_from_the_denominator() -> None:
+    # Arrange — two neighbours lodge a roof construction (both code 2); one lodges
+    # none. The missing value must not dilute the agreement to 2/3.
+    cohort = _cohort(
+        _epc(roof_construction=2),
+        _epc(roof_construction=2),
+        _epc(roof_construction=None),
+    )
+
+    # Act
+    confidence: PredictionConfidence = EpcPrediction().confidence(cohort)
+
+    # Assert — agreement counts only the two present, unanimous values (1.0),
+    # while the cohort size still reflects all three neighbours.
+    share: Optional[float] = confidence.agreement("roof_construction")
+    assert share is not None
+    assert abs(share - 1.0) <= 1e-9
+    assert confidence.cohort_size == 3
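The agreement measure these confidence tests describe is the modal value's share of the members that actually lodged the component. A minimal sketch follows; `agreement` here is a free function, not `PredictionConfidence.agreement`. Returning `None` when no member lodges a value is an assumption suggested by the `Optional[float]` type in the tests.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def agreement(values: Iterable[Optional[Hashable]]) -> Optional[float]:
    """Modal share among present values; absent (None) values leave the denominator."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nobody lodged the component, so there is nothing to agree on
    _, modal_count = Counter(present).most_common(1)[0]
    return modal_count / len(present)


print(agreement([1, 1, 1, 2]))  # → 0.75
print(agreement([2, 2, None]))  # → 1.0
print(agreement([None, None]))  # → None
```

Cohort size is reported separately, so a component lodged by few neighbours can show full agreement while the small denominator is still visible.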
+
+
+def test_heating_is_a_coherent_donor_not_the_structural_template() -> None:
+    # Arrange — the size-representative template (median 80 m²) runs an atypical
+    # system (fuel 99, no cylinder), but the cohort's modal heating signature is a
+    # gas system (fuel 1) with a cylinder, including a recent 2024 cert. Heating
+    # sub-fields can't be moded field by field (that could pair one neighbour's
+    # fuel with another's controls), so the whole SapHeating cluster must be
+    # copied from the coherent modal donor (the most recent among the matches),
+    # not inherited from the structural template.
+    cohort = _dated_cohort(
+        (
+            _epc(
+                floor_area=80.0,
+                main_fuel_type=99,
+                main_heating_control=99,
+                has_hot_water_cylinder=False,
+            ),
+            date(2016, 1, 1),
+        ),
+        (_epc(main_fuel_type=1, main_heating_control=5), date(2018, 1, 1)),
+        (_epc(main_fuel_type=1, main_heating_control=5), date(2019, 1, 1)),
+        (_epc(main_fuel_type=1, main_heating_control=7), date(2024, 1, 1)),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — heating comes coherently from the modal-signature donor (gas +
+    # cylinder), the most recent match (control 7 from 2024), not the template's
+    # fuel 99.
+    detail = predicted.sap_heating.main_heating_details[0]
+    assert detail.main_fuel_type == 1
+    assert detail.main_heating_control == 7
+    assert predicted.has_hot_water_cylinder is True
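The coherent-donor rule in this test can be sketched as: find the modal heating signature, then copy the whole record from the most recent member that has it. The `HeatingRecord` dataclass is a hypothetical stand-in for a neighbour's `SapHeating` cluster. Defining the signature as `(main_fuel_type, has_hot_water_cylinder)` is an assumption taken from the test's comment; the project's signature may include more fields.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HeatingRecord:  # hypothetical stand-in for a neighbour's SapHeating cluster
    main_fuel_type: int
    main_heating_control: int
    has_hot_water_cylinder: bool
    lodged: date


def signature(record: HeatingRecord) -> tuple[int, bool]:
    """Assumed heating signature: fuel plus cylinder presence."""
    return (record.main_fuel_type, record.has_hot_water_cylinder)


def pick_heating_donor(records: list[HeatingRecord]) -> HeatingRecord:
    """Copy the whole cluster from the most recent member with the modal signature."""
    modal_signature, _ = Counter(signature(r) for r in records).most_common(1)[0]
    matches = [r for r in records if signature(r) == modal_signature]
    return max(matches, key=lambda r: r.lodged)


records = [
    HeatingRecord(99, 99, False, date(2016, 1, 1)),
    HeatingRecord(1, 5, True, date(2018, 1, 1)),
    HeatingRecord(1, 5, True, date(2019, 1, 1)),
    HeatingRecord(1, 7, True, date(2024, 1, 1)),
]
donor = pick_heating_donor(records)
print(donor.main_fuel_type, donor.main_heating_control)  # → 1 7
```

The donor's control (7) is taken even though 5 is the more common control, because the cluster is copied as a unit rather than moded field by field.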
+
+
+def test_glazing_follows_the_recency_weighted_cohort_mode() -> None:
+    # Arrange — an old majority single-glazed (type 1, 2015) and a recent
+    # minority double-glazed (type 3, 2025). Glazing is retrofitted over time
+    # (single → double), so the recent neighbours reflect the current state: the
+    # recency-weighted mode must pick double over the stale single-glazed
+    # majority, like roof insulation thickness.
+    cohort = _dated_cohort(
+        (_epc(glazing_type=1), date(2015, 1, 1)),
+        (_epc(glazing_type=1), date(2015, 1, 1)),
+        (_epc(glazing_type=1), date(2015, 1, 1)),
+        (_epc(glazing_type=3), date(2025, 1, 1)),
+        (_epc(glazing_type=3), date(2025, 1, 1)),
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(
+        PredictionTarget(postcode="LS6 1AA", property_type="2"), cohort
+    )
+
+    # Assert — every predicted window takes the recent glazing over the majority.
+    assert all(window.glazing_type == 3 for window in predicted.sap_windows)
+
+
+def test_geo_proximity_weights_the_nearest_neighbour() -> None:
+    # Arrange — same size + age (so similarity weighting is uniform). Three FAR
+    # neighbours are cavity (1); one neighbour AT the target is solid brick (2).
+    # Wall construction is a geo-weighted component, so the near neighbour
+    # outweighs the far majority.
+    here = Coordinates(longitude=0.0, latitude=0.0)
+    far = Coordinates(longitude=1.0, latitude=1.0)  # ~150 km away
+    cohort = ComparableProperties(
+        members=(
+            ComparableProperty(_epc(wall_construction=1), "1", coordinates=far),
+            ComparableProperty(_epc(wall_construction=1), "2", coordinates=far),
+            ComparableProperty(_epc(wall_construction=1), "3", coordinates=far),
+            ComparableProperty(_epc(wall_construction=2), "4", coordinates=here),
+        )
+    )
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", coordinates=here
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, cohort)
+
+    # Assert — the near neighbour's wall wins over the far majority.
+    assert predicted.sap_building_parts[0].wall_construction == 2
+
+
+def test_geo_proximity_is_off_without_target_coordinates() -> None:
+    # Arrange — identical cohort, but the target has no coordinates, so geo
+    # weighting is disabled and the plain cohort majority (cavity, 1) wins.
+    here = Coordinates(longitude=0.0, latitude=0.0)
+    far = Coordinates(longitude=1.0, latitude=1.0)
+    cohort = ComparableProperties(
+        members=(
+            ComparableProperty(_epc(wall_construction=1), "1", coordinates=far),
+            ComparableProperty(_epc(wall_construction=1), "2", coordinates=far),
+            ComparableProperty(_epc(wall_construction=1), "3", coordinates=far),
+            ComparableProperty(_epc(wall_construction=2), "4", coordinates=here),
+        )
+    )
+    target = PredictionTarget(postcode="LS6 1AA", property_type="2")
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, cohort)
+
+    # Assert — without target coordinates, the majority wins (geo off).
+    assert predicted.sap_building_parts[0].wall_construction == 1
+
+
+def test_applies_a_known_wall_override_over_the_mode() -> None:
+    # Arrange — the cohort mode is cavity (1), but we KNOW the target is solid
+    # brick (2), a Landlord Override. The known value must win over the estimate.
+    cohort = _cohort(
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+        _epc(wall_construction=1),
+    )
+    target = PredictionTarget(
+        postcode="LS6 1AA", property_type="2", wall_construction=2
+    )
+
+    # Act
+    predicted: EpcPropertyData = EpcPrediction().predict(target, cohort)
+
+    # Assert — the known value wins over the cohort mode.
+    assert predicted.sap_building_parts[0].wall_construction == 2
--- a/tests/domain/epc_prediction/test_prediction_comparison.py
+++ b/tests/domain/epc_prediction/test_prediction_comparison.py
@ -0,0 +1,365 @@
+"""Behaviour of the per-Property prediction comparison (ADR-0029): given a
+predicted EpcPropertyData and the actual one, report the accuracy signals the
+validation harness aggregates — classification matches on the key categoricals
+and residuals on the geometry. Pure: the SAP residual is computed in the runner,
+because it needs the calculator and the lodged SAP.
+"""
+
+from typing import Optional, Union
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    MainHeatingDetail,
+    PhotovoltaicSupply,
+    SapBuildingPart,
+    SapEnergySource,
+    SapFloorDimension,
+    SapHeating,
+    SapRoomInRoof,
+    SapWindow,
+)
+from domain.epc_prediction.prediction_comparison import compare_prediction
+
+
+def _epc(
+    *,
+    wall_construction: int = 1,
+    wall_insulation_type: Union[int, str] = 1,
+    construction_age_band: str = "K",
+    roof_construction: Optional[int] = 1,
+    roof_insulation_thickness: Optional[Union[str, int]] = 100,
+    floor_construction: Optional[int] = 1,
+    floor_insulation: Optional[int] = 1,
+    has_room_in_roof: bool = False,
+    floor_area: float = 80.0,
+    building_parts: int = 1,
+    windows: Optional[list[tuple[float, float]]] = None,
+    glazing_type: Union[int, str] = 3,
+    door_count: int = 2,
+    has_pv: bool = False,
+    solar_water_heating: bool = False,
+    main_fuel_type: Union[int, str] = 20,
+    main_heating_category: Optional[int] = 2,
+    main_heating_control: Union[int, str] = 2100,
+    water_heating_fuel: Optional[int] = 20,
+    water_heating_code: Optional[int] = 901,
+    has_hot_water_cylinder: bool = True,
+    cylinder_insulation_type: Optional[Union[int, str]] = 1,
+    secondary_heating_type: Optional[Union[int, str]] = None,
+) -> EpcPropertyData:
+    epc: EpcPropertyData = object.__new__(EpcPropertyData)
+    epc.total_floor_area_m2 = floor_area
+    epc.door_count = door_count
+    epc.solar_water_heating = solar_water_heating
+    parts: list[SapBuildingPart] = []
+    for _ in range(building_parts):
+        part: SapBuildingPart = object.__new__(SapBuildingPart)
+        part.wall_construction = wall_construction
+        part.wall_insulation_type = wall_insulation_type
+        part.construction_age_band = construction_age_band
+        part.roof_construction = roof_construction
+        part.roof_insulation_thickness = roof_insulation_thickness
+        part.sap_room_in_roof = (
+            object.__new__(SapRoomInRoof) if has_room_in_roof else None
+        )
+        floor_dim: SapFloorDimension = object.__new__(SapFloorDimension)
+        floor_dim.floor_construction = floor_construction
+        floor_dim.floor_insulation = floor_insulation
+        part.sap_floor_dimensions = [floor_dim]
+        parts.append(part)
+    epc.sap_building_parts = parts
+    detail: MainHeatingDetail = object.__new__(MainHeatingDetail)
+    detail.main_fuel_type = main_fuel_type
+    detail.main_heating_category = main_heating_category
+    detail.main_heating_control = main_heating_control
+    heating: SapHeating = object.__new__(SapHeating)
+    heating.main_heating_details = [detail]
+    heating.water_heating_fuel = water_heating_fuel
+    heating.water_heating_code = water_heating_code
+    heating.cylinder_insulation_type = cylinder_insulation_type
+    heating.secondary_heating_type = secondary_heating_type
+    epc.sap_heating = heating
+    epc.has_hot_water_cylinder = has_hot_water_cylinder
+    sap_windows: list[SapWindow] = []
+    for width, height in windows or []:
+        w: SapWindow = object.__new__(SapWindow)
+        w.window_width = width
+        w.window_height = height
+        w.glazing_type = glazing_type
+        sap_windows.append(w)
+    epc.sap_windows = sap_windows
+    energy: SapEnergySource = object.__new__(SapEnergySource)
+    energy.photovoltaic_supply = (
+        object.__new__(PhotovoltaicSupply) if has_pv else None
+    )
+    energy.photovoltaic_arrays = None
+    epc.sap_energy_source = energy
+    return epc
+
+
+def test_scores_age_band_within_one_band() -> None:
+    # Arrange — predicted age band K, actual J (adjacent). Adjacent RdSAP age
+    # bands carry near-identical U-values, so an off-by-one is ~SAP-neutral: it
+    # misses the exact hit but counts as a ±1-band hit (issue #1222).
+    predicted = _epc(construction_age_band="K")
+    actual = _epc(construction_age_band="J")
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["construction_age_band"] is False
+    assert hits["construction_age_band_pm1"] is True
+
+
+def test_age_band_three_apart_misses_both() -> None:
+    # Arrange — predicted K, actual H (three bands apart): a real miss on both.
+    predicted = _epc(construction_age_band="K")
+    actual = _epc(construction_age_band="H")
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["construction_age_band"] is False
+    assert hits["construction_age_band_pm1"] is False
+
+
+def test_scores_roof_insulation_within_one_bucket() -> None:
+    # Arrange — predicted 250mm, actual 270mm (adjacent RdSAP buckets). Adjacent
+    # thicknesses carry near-identical roof U-values, so it misses the exact hit
+    # but counts as a ±1-bucket hit, like the age band (issue #1222).
+    predicted = _epc(roof_insulation_thickness="250mm")
+    actual = _epc(roof_insulation_thickness="270mm")
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["roof_insulation_thickness"] is False
+    assert hits["roof_insulation_thickness_pm1"] is True
+
+
+def test_roof_insulation_two_buckets_apart_misses_both() -> None:
+    # Arrange — predicted 100mm, actual 200mm (two buckets apart: 100/150/200):
+    # a real miss on both exact and ±1.
+    predicted = _epc(roof_insulation_thickness="100mm")
+    actual = _epc(roof_insulation_thickness="200mm")
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["roof_insulation_thickness"] is False
+    assert hits["roof_insulation_thickness_pm1"] is False
+
+
+def test_roof_insulation_off_scale_no_data_only_exact_counts() -> None:
+    # Arrange — actual is the off-scale "ND" (no-data) category; a non-equal
+    # prediction can't be an adjacent-bucket hit.
+    predicted = _epc(roof_insulation_thickness="200mm")
+    actual = _epc(roof_insulation_thickness="ND")
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["roof_insulation_thickness"] is False
+    assert hits["roof_insulation_thickness_pm1"] is False
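The ±1 scoring exercised by these tests can be sketched on an ordered scale. An exact match always counts. Otherwise both values must lie on the scale and be at most one position apart, so an off-scale code such as "ND" only scores on an exact match. Both scales below are assumptions: the age-band letters are assumed to run A to M. `ROOF_BUCKETS` is hypothetical, built only to respect the adjacencies the tests imply (250mm next to 270mm; 100mm, 150mm and 200mm consecutive). It is not the RdSAP code list.

```python
from collections.abc import Hashable, Sequence

# Assumed ordering of RdSAP age-band letters, oldest first.
AGE_BANDS = tuple("ABCDEFGHIJKLM")
# Hypothetical bucket ladder; only its relative order matters here.
ROOF_BUCKETS = (
    "0mm", "12mm", "25mm", "50mm", "75mm", "100mm", "150mm",
    "200mm", "250mm", "270mm", "300mm", "350mm", "400mm+",
)


def within_one_step(predicted: Hashable, actual: Hashable, scale: Sequence[Hashable]) -> bool:
    """Exact hit, or both values on the scale and at most one position apart."""
    if predicted == actual:
        return True
    if predicted not in scale or actual not in scale:
        return False  # off-scale codes (e.g. "ND") only count on an exact match
    return abs(scale.index(predicted) - scale.index(actual)) <= 1


print(within_one_step("K", "J", AGE_BANDS))  # → True
print(within_one_step("K", "H", AGE_BANDS))  # → False
print(within_one_step("250mm", "270mm", ROOF_BUCKETS))  # → True
print(within_one_step("200mm", "ND", ROOF_BUCKETS))  # → False
```

The exact-match flag is scored separately as plain equality, so an adjacent prediction shows up as an exact miss but a ±1 hit, as the tests require.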
+
+
+def test_flags_a_correct_main_wall_construction_classification() -> None:
+    # Arrange — predicted and actual agree on cavity (1).
+    predicted = _epc(wall_construction=1)
+    actual = _epc(wall_construction=1)
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert
+    assert comparison.categorical_hits["wall_construction"] is True
+
+
+def test_flags_an_incorrect_main_wall_construction_classification() -> None:
+    # Arrange — predicted cavity (1), actual solid brick (2).
+    predicted = _epc(wall_construction=1)
+    actual = _epc(wall_construction=2)
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert
+    assert comparison.categorical_hits["wall_construction"] is False
+
+
+def test_classifies_the_extra_homogeneous_categoricals() -> None:
+    # Arrange — predicted agrees with the actual on age band, roof construction
+    # and floor construction; only wall insulation differs.
+    predicted = _epc(
+        construction_age_band="K",
+        wall_insulation_type=2,
+        roof_construction=3,
+        floor_construction=1,
+    )
+    actual = _epc(
+        construction_age_band="K",
+        wall_insulation_type=1,
+        roof_construction=3,
+        floor_construction=1,
+    )
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert
+    assert comparison.categorical_hits["construction_age_band"] is True
+    assert comparison.categorical_hits["wall_insulation_type"] is False
+    assert comparison.categorical_hits["roof_construction"] is True
+    assert comparison.categorical_hits["floor_construction"] is True
+
+
+def test_classifies_the_heating_components() -> None:
+    # Arrange — predicted and actual agree on everything heating except the main
+    # fuel (predicted oil 28, actual gas 20) and secondary heating (predicted
+    # none, actual a wood stove 693). Heating is the dominant SAP lever, so each
+    # heating component is scored (ADR-0030 Component Accuracy).
+    predicted = _epc(
+        main_fuel_type=28,
+        main_heating_category=2,
+        main_heating_control=2100,
+        water_heating_fuel=20,
+        water_heating_code=901,
+        has_hot_water_cylinder=True,
+        cylinder_insulation_type=1,
+        secondary_heating_type=None,
+    )
+    actual = _epc(
+        main_fuel_type=20,
+        main_heating_category=2,
+        main_heating_control=2100,
+        water_heating_fuel=20,
+        water_heating_code=901,
+        has_hot_water_cylinder=True,
+        cylinder_insulation_type=1,
+        secondary_heating_type=693,
+    )
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["heating_main_fuel"] is False
+    assert hits["heating_main_category"] is True
+    assert hits["heating_main_control"] is True
+    assert hits["water_heating_fuel"] is True
+    assert hits["water_heating_code"] is True
+    assert hits["has_hot_water_cylinder"] is True
+    assert hits["cylinder_insulation_type"] is True
+    # Secondary heating is absent in the prediction but present in the actual —
+    # a real miss (predicted None ≠ actual 693), not "not applicable".
+    assert hits["secondary_heating_type"] is False
+
+
+def test_classifies_fabric_insulation_and_room_in_roof() -> None:
+    # Arrange — predicted and actual disagree on roof insulation thickness and on
+    # whether there's a room-in-roof, but agree on floor insulation.
+    predicted = _epc(
+        roof_insulation_thickness=100,
+        floor_insulation=1,
+        has_room_in_roof=False,
+    )
+    actual = _epc(
+        roof_insulation_thickness=270,
+        floor_insulation=1,
+        has_room_in_roof=True,
+    )
+
+    # Act
+    hits = compare_prediction(predicted, actual).categorical_hits
+
+    # Assert
+    assert hits["roof_insulation_thickness"] is False
+    assert hits["floor_insulation"] is True
+    # Room-in-roof presence is always applicable — predicting "no RR" when there
+    # is one is a real miss, not "not applicable".
+    assert hits["has_room_in_roof"] is False
+
+
+def test_classifies_glazing_renewables_and_door_count() -> None:
+    # Arrange — predicted glazing type, PV and solar disagree with the actual;
+    # door count is over-predicted by one.
+    predicted = _epc(
+        windows=[(1.0, 1.0), (1.0, 1.0)],
+        glazing_type=3,
+        has_pv=False,
+        solar_water_heating=False,
+        door_count=3,
+    )
+    actual = _epc(
+        windows=[(1.0, 1.0), (1.0, 1.0)],
+        glazing_type=4,
+        has_pv=True,
+        solar_water_heating=True,
+        door_count=2,
+    )
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+    hits = comparison.categorical_hits
+
+    # Assert
+    assert hits["modal_glazing_type"] is False
+    assert hits["has_pv"] is False
+    assert hits["solar_water_heating"] is False
+    assert comparison.door_count_residual == 1
+
+
+def test_categorical_hit_is_not_applicable_when_actual_is_absent() -> None:
+    # Arrange — the actual lodges no roof construction (a flat under another
+    # dwelling). The component is then not applicable rather than a free win, so
+    # it must not count towards the roof classification rate.
+    predicted = _epc(roof_construction=3)
+    actual = _epc(roof_construction=None)
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert
+    assert comparison.categorical_hits["roof_construction"] is None
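The three-valued hit these tests describe (hit, miss, or not applicable) and the rate it feeds can be sketched as below. `categorical_hit` and `hit_rate` are hypothetical free functions, not the project's `compare_prediction`. The rule "not applicable whenever the actual is absent" is an assumption. It matches the roof test and the secondary-heating miss (predicted `None` against actual 693), but the project may treat other components differently.

```python
from collections.abc import Hashable, Iterable
from typing import Optional


def categorical_hit(predicted: Optional[Hashable], actual: Optional[Hashable]) -> Optional[bool]:
    """None when the actual lodges nothing; otherwise plain equality."""
    if actual is None:
        return None
    return predicted == actual


def hit_rate(hits: Iterable[Optional[bool]]) -> Optional[float]:
    """Share of applicable comparisons that hit; None entries are excluded."""
    applicable = [h for h in hits if h is not None]
    if not applicable:
        return None
    return sum(applicable) / len(applicable)


print(categorical_hit(3, None))  # → None
print(categorical_hit(None, 693))  # → False
print(hit_rate([True, None, False, True]))  # → 0.6666666666666666
```

Excluding `None` from the denominator keeps flats without a roof from inflating the roof classification rate.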
+
+
+def test_reports_the_floor_area_residual_as_predicted_minus_actual() -> None:
+    # Arrange — predicted 90 m², actual 100 m² (a 10 m² under-prediction).
+    predicted = _epc(floor_area=90.0)
+    actual = _epc(floor_area=100.0)
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert — signed residual, predicted − actual.
+    assert abs(comparison.floor_area_residual - (-10.0)) <= 1e-9
+
+
+def test_reports_the_building_parts_count_residual() -> None:
+    # Arrange — predicted a single part; the actual has a main + an extension.
+    predicted = _epc(building_parts=1)
+    actual = _epc(building_parts=2)
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert — predicted − actual.
+    assert comparison.building_parts_residual == -1
+
+
+def test_reports_window_count_and_total_area_residuals() -> None:
+    # Arrange — predicted 2 windows (3 m² total); actual 1 window (1 m²).
+    predicted = _epc(windows=[(1.0, 1.0), (2.0, 1.0)])
+    actual = _epc(windows=[(1.0, 1.0)])
+
+    # Act
+    comparison = compare_prediction(predicted, actual)
+
+    # Assert
+    assert comparison.window_count_residual == 1
+    assert abs(comparison.total_window_area_residual - 2.0) <= 1e-9
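The geometry residuals above all share the signed predicted-minus-actual convention. A minimal sketch, assuming a hypothetical `Window` dataclass with `width` and `height` in place of `SapWindow`:

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:  # hypothetical stand-in for SapWindow's width and height
    width: float
    height: float


def signed_residual(predicted: float, actual: float) -> float:
    """Predicted minus actual: negative means an under-prediction."""
    return predicted - actual


def window_residuals(predicted: Sequence[Window], actual: Sequence[Window]) -> tuple[int, float]:
    """Signed residuals on window count and on total window area (m²)."""
    count = len(predicted) - len(actual)
    area = sum(w.width * w.height for w in predicted) - sum(w.width * w.height for w in actual)
    return count, area


print(signed_residual(90.0, 100.0))  # → -10.0
print(window_residuals([Window(1.0, 1.0), Window(2.0, 1.0)], [Window(1.0, 1.0)]))  # → (1, 2.0)
```

Keeping the sign, instead of taking an absolute error, lets the harness show whether the predictor is biased high or low as well as how far off it is.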
--- a/tests/domain/epc_prediction/test_prediction_target.py
+++ b/tests/domain/epc_prediction/test_prediction_target.py
@ -0,0 +1,56 @@
+"""Assembling an EPC-less Property's PredictionTarget, and the eligibility gate:
+a Property whose property type is unknown is not predicted (ADR-0031 slice-5d)."""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from domain.epc_prediction.prediction_target import (
+    PredictionTarget,
+    PredictionTargetAttributes,
+    build_prediction_target,
+)
+from domain.geospatial.coordinates import Coordinates
+from domain.property.property import PropertyIdentity
+
+
+def _identity(postcode: str = "LS6 1AA") -> PropertyIdentity:
+    return PropertyIdentity(
+        portfolio_id=1, postcode=postcode, address="1 Some Street", uprn=12345
+    )
+
+
+def test_target_is_assembled_from_identity_coords_and_overrides() -> None:
+    # Arrange — a known property type + built form + wall (Landlord Overrides),
+    # and resolved coordinates.
+    here = Coordinates(longitude=-1.55, latitude=53.81)
+    attributes = PredictionTargetAttributes(
+        property_type="2", built_form="3", wall_construction=1
+    )
+
+    # Act
+    target: Optional[PredictionTarget] = build_prediction_target(
+        _identity(), here, attributes
+    )
+
+    # Assert — every known input is threaded onto the target.
+    assert target is not None
+    assert target.postcode == "LS6 1AA"
+    assert target.property_type == "2"
+    assert target.built_form == "3"
+    assert target.wall_construction == 1
+    assert target.coordinates is here
+
+
+def test_an_unknown_property_type_gates_the_property_out() -> None:
+    # Arrange — property type is the hard cohort filter; without it the Property
+    # must not be predicted from a mixed-type cohort (ADR-0031).
+    attributes = PredictionTargetAttributes(property_type=None)
+
+    # Act
+    target: Optional[PredictionTarget] = build_prediction_target(
+        _identity(), None, attributes
+    )
+
+    # Assert — gated out: no target to predict from.
+    assert target is None
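The eligibility gate tested above comes down to a single rule. Property type is the hard cohort filter, so when it is missing the function returns no target at all instead of predicting from a mixed-type cohort. Every other known input passes through unchanged. Below is a minimal sketch of that rule, assuming a simplified target shape. `TargetAttributes`, `Target` and `build_target` are hypothetical and only mirror the fields the tests assert on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetAttributes:
    """Hypothetical landlord-known attributes (Landlord Overrides)."""

    property_type: Optional[str] = None
    built_form: Optional[str] = None
    wall_construction: Optional[int] = None


@dataclass(frozen=True)
class Target:
    postcode: str
    property_type: str
    built_form: Optional[str]
    wall_construction: Optional[int]
    # Assumed (longitude, latitude); may be unresolved (None).
    coordinates: Optional[tuple[float, float]]


def build_target(
    postcode: str,
    coordinates: Optional[tuple[float, float]],
    attributes: TargetAttributes,
) -> Optional[Target]:
    # Property type is the hard cohort filter: without it, gate the Property out.
    if attributes.property_type is None:
        return None
    # Every other known input is threaded through as-is; unknowns stay None.
    return Target(
        postcode=postcode,
        property_type=attributes.property_type,
        built_form=attributes.built_form,
        wall_construction=attributes.wall_construction,
        coordinates=coordinates,
    )
```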
--- a/tests/domain/epc_prediction/test_validation.py
+++ b/tests/domain/epc_prediction/test_validation.py
@ -0,0 +1,123 @@
+"""Behaviour of the Component Accuracy leave-one-out scorer (ADR-0030): given
+loaded postcode cohorts, hold out each SAP 10.2 target, predict it from its
+all-vintage neighbours, and aggregate the per-component hits + residuals. Pure
+(no IO, no calculator) — corpus loading is the caller's job.
+"""
+
+from datetime import date
+from typing import Optional, Union
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    MainHeatingDetail,
+    SapBuildingPart,
+    SapEnergySource,
+    SapFloorDimension,
+    SapHeating,
+)
+from domain.epc_prediction.comparable_properties import ComparableProperty
+from domain.epc_prediction.validation import evaluate_component_accuracy
+
+
+def _comparable(
+    *,
+    certificate_number: str,
+    address: str,
+    sap_version: float,
+    wall_construction: Union[int, str] = 1,
+    registration_date: Optional[date] = None,
+) -> ComparableProperty:
+    """A ComparableProperty whose opaque EpcPropertyData is built without calling
+    __init__ and populated with every field the predictor and comparison read
+    (the partial-instance idiom)."""
+    epc: EpcPropertyData = object.__new__(EpcPropertyData)
+    epc.sap_version = sap_version
+    epc.postcode = "LS6 1AA"
+    epc.property_type = "2"
+    epc.built_form = "4"
+    epc.total_floor_area_m2 = 80.0
+    epc.door_count = 2
+    epc.solar_water_heating = False
+    epc.has_hot_water_cylinder = True
+    part: SapBuildingPart = object.__new__(SapBuildingPart)
+    part.wall_construction = wall_construction
+    part.wall_insulation_type = 1
+    part.construction_age_band = "K"
+    part.roof_construction = 1
+    part.roof_insulation_thickness = 100
+    part.sap_room_in_roof = None
+    floor_dim: SapFloorDimension = object.__new__(SapFloorDimension)
+    floor_dim.floor_construction = 1
+    floor_dim.floor_insulation = 1
+    part.sap_floor_dimensions = [floor_dim]
+    epc.sap_building_parts = [part]
+    epc.sap_windows = []
+    detail: MainHeatingDetail = object.__new__(MainHeatingDetail)
+    detail.main_fuel_type = 20
+    detail.main_heating_category = 2
+    detail.main_heating_control = 2100
+    heating: SapHeating = object.__new__(SapHeating)
+    heating.main_heating_details = [detail]
+    heating.water_heating_fuel = 20
+    heating.water_heating_code = 901
+    heating.cylinder_insulation_type = 1
+    heating.secondary_heating_type = None
+    epc.sap_heating = heating
+    energy: SapEnergySource = object.__new__(SapEnergySource)
+    energy.photovoltaic_supply = None
+    energy.photovoltaic_arrays = None
+    epc.sap_energy_source = energy
+    return ComparableProperty(
+        epc=epc,
+        certificate_number=certificate_number,
+        address=address,
+        registration_date=registration_date,
+    )
+
+
+def test_scores_only_sap_10_2_targets() -> None:
+    # Arrange — a cohort of two distinct addresses: one SAP 10.2, one older
+    # (SAP 9.94). Only the 10.2 cert is a valid held-out target; the older one
+    # is kept as source evidence (its components are still valid).
+    cohort = [
+        _comparable(
+            certificate_number="A", address="1 THE ROW", sap_version=10.2
+        ),
+        _comparable(
+            certificate_number="B", address="2 THE ROW", sap_version=9.94
+        ),
+    ]
+
+    # Act
+    accuracy = evaluate_component_accuracy([cohort])
+
+    # Assert — exactly one target scored (the 10.2 cert), predicted from the
+    # older neighbour; the older cert was never held out.
+    assert accuracy.targets == 1
+    assert accuracy.rate("wall_construction") == 1.0
+
+
+def test_aggregates_a_wall_classification_miss() -> None:
+    # Arrange — two SAP 10.2 certs with opposite walls: solid brick (2) and
+    # cavity (1). Each is the other's only neighbour, so the predicted mode
+    # misses the wall.
+    cohort = [
+        _comparable(
+            certificate_number="A",
+            address="1 THE ROW",
+            sap_version=10.2,
+            wall_construction=2,
+        ),
+        _comparable(
+            certificate_number="B",
+            address="2 THE ROW",
+            sap_version=10.2,
+            wall_construction=1,
+        ),
+    ]
+
+    # Act
+    accuracy = evaluate_component_accuracy([cohort])
+
+    # Assert — both are 10.2 targets, and each is predicted from the other (the
+    # opposite wall), so wall_construction is missed both times.
+    assert accuracy.targets == 2
+    assert accuracy.rate("wall_construction") == 0.0
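The two validation tests above describe a leave-one-out loop. Only SAP 10.2 certificates are held out as targets, and every other certificate in the cohort, whatever its vintage, counts as source evidence. The sketch below is minimal and scores a single component: wall construction, predicted as the unweighted mode of the neighbours. It makes two assumptions. First, neighbours at the target's own address are excluded, so an older certificate for the same dwelling cannot leak the answer. Second, a target with no neighbours is skipped rather than scored. The sketch also leaves out the proximity × recency × similarity weighting described for Comparable Properties. `Cert`, `ComponentAccuracy` and `wall_accuracy` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical minimal certificate: just what this scorer reads."""

    certificate_number: str
    address: str
    sap_version: float
    wall_construction: int


@dataclass(frozen=True)
class ComponentAccuracy:
    targets: int
    hits: int

    def rate(self) -> float:
        return self.hits / self.targets if self.targets else 0.0


def wall_accuracy(cohorts: list[list[Cert]]) -> ComponentAccuracy:
    targets = hits = 0
    for cohort in cohorts:
        for target in cohort:
            # Only same-spec certs can be validated against a lodged figure.
            if target.sap_version != 10.2:
                continue
            # Source evidence: all vintages, but never the target's own address.
            sources = [c for c in cohort if c.address != target.address]
            if not sources:
                continue  # assumption: nothing to predict from, so not scored
            # Unweighted mode; ties resolve to the first value encountered.
            votes = Counter(c.wall_construction for c in sources)
            predicted, _ = votes.most_common(1)[0]
            targets += 1
            hits += int(predicted == target.wall_construction)
    return ComponentAccuracy(targets=targets, hits=hits)
```

This reproduces both test scenarios. With one SAP 10.2 certificate and one SAP 9.94 neighbour, exactly one target is scored. With two SAP 10.2 certificates whose walls differ, each is predicted from the other and both miss.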
--- a/tests/domain/property/test_property.py
+++ b/tests/domain/property/test_property.py
@ -98,6 +98,44 @@ def test_effective_epc_follows_the_selected_source_path() -> None:
    assert epc_property.effective_epc is public_epc


+def test_source_path_is_predicted_when_only_a_predicted_epc_is_present() -> None:
+    # Arrange — no lodged EPC, no Site Notes; just a neighbour-synthesised picture
+    # (EPC Prediction gap-fill, ADR-0031).
+    predicted = _epc()
+    prop = Property(identity=_identity(), predicted_epc=predicted)
+
+    # Act / Assert — predicted is the last-resort source, not a raise
+    assert prop.source_path == "predicted"
+    assert prop.effective_epc is predicted
+
+
+def test_a_lodged_epc_wins_over_a_predicted_epc() -> None:
+    # Arrange — both a real lodged EPC and a neighbour-synthesised one are present;
+    # the real source must win (prediction is last-resort only, ADR-0031).
+    lodged = _epc()
+    predicted = _epc()
+    prop = Property(identity=_identity(), epc=lodged, predicted_epc=predicted)
+
+    # Act / Assert
+    assert prop.source_path == "epc_with_overlay"
+    assert prop.effective_epc is lodged
+
+
+def test_site_notes_win_over_a_predicted_epc() -> None:
+    # Arrange — Site Notes and a predicted EPC are present; the survey wins.
+    survey_epc = _epc()
+    predicted = _epc()
+    prop = Property(
+        identity=_identity(),
+        site_notes=SiteNotes(surveyed_at=date(2024, 6, 1), epc=survey_epc),
+        predicted_epc=predicted,
+    )
+
+    # Act / Assert
+    assert prop.source_path == "site_notes"
+    assert prop.effective_epc is survey_epc
+
+
 def test_property_with_no_source_raises() -> None:
    # Arrange
    prop = Property(identity=_identity())
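Taken together, the source-path tests make prediction the last-resort source. Any real source, whether a lodged EPC or Site Notes, wins over a predicted EPC. A predicted EPC is used only when nothing else is present, and a Property with no source at all still raises. Below is a minimal sketch of that resolution order using a hypothetical `resolve_source` function, not the real `Property.source_path`. It assumes Site Notes are checked before a lodged EPC, because this hunk does not exercise their relative order.

```python
from typing import Optional


class NoEpcSourceError(Exception):
    """Hypothetical: raised when a Property has no EPC source of any kind."""


def resolve_source(
    site_notes_epc: Optional[object],
    lodged_epc: Optional[object],
    predicted_epc: Optional[object],
) -> tuple[str, object]:
    """Return (source_path, effective_epc); prediction is last-resort only."""
    # Assumption: Site Notes outrank a lodged EPC. The tests only show that
    # each of them outranks a predicted EPC.
    if site_notes_epc is not None:
        return "site_notes", site_notes_epc
    if lodged_epc is not None:
        return "epc_with_overlay", lodged_epc
    if predicted_epc is not None:
        return "predicted", predicted_epc
    raise NoEpcSourceError("Property has no lodged, surveyed or predicted EPC")
```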
--- a/tests/e2e/__init__.py
+++ b/tests/e2e/__init__.py
--- a/tests/e2e/test_epc_prediction_e2e.py
+++ b/tests/e2e/test_epc_prediction_e2e.py
@ -0,0 +1,177 @@
+"""END-TO-END showcase: an EPC-less Property flows through Ingestion, gets a
+predicted EPC synthesised from its postcode cohort, is persisted to the predicted
+slot, and comes back out of the Property repository resolving as the Effective
+EPC (ADR-0031).
+
+This is the full production path with ONLY the external HTTP clients faked (the
+EPC API, the geospatial S3 reader, the Solar API) — everything else is the real
+thing: the real Postgres Unit of Work, the real EPC + Property repositories
+against the test database, the real `EpcComparablePropertiesRepository`, and the
+real `EpcPrediction`. It is the canonical "see the whole flow" reference; the
+narrower unit tests live in:
+  - tests/orchestration/test_ingestion_prediction.py   (orchestrator: gate / persist)
+  - tests/repositories/epc/test_epc_predicted_slot.py  (the lodged|predicted slot)
+  - tests/domain/property/test_property.py             (the "predicted" source path)
+  - tests/domain/epc_prediction/test_prediction_target.py (the eligibility gate)
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any, Optional
+
+from sqlalchemy import Engine
+from sqlmodel import Session
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from datatypes.epc.domain.mapper import EpcPropertyDataMapper
+from datatypes.epc.search.epc_search_result import EpcSearchResult
+from domain.epc_prediction.epc_prediction import EpcPrediction
+from domain.epc_prediction.prediction_target import PredictionTargetAttributes
+from domain.geospatial.coordinates import Coordinates
+from domain.geospatial.planning_restrictions import PlanningRestrictions
+from domain.geospatial.spatial_reference import SpatialReference
+from domain.property.property import Property
+from infrastructure.postgres.property_table import PropertyRow
+from orchestration.ingestion_orchestrator import IngestionOrchestrator
+from repositories.comparable_properties.epc_comparable_properties_repository import (
+    EpcComparablePropertiesRepository,
+)
+from repositories.epc.epc_postgres_repository import EpcPostgresRepository
+from repositories.geospatial.geospatial_repository import GeospatialRepository
+from repositories.postgres_unit_of_work import PostgresUnitOfWork
+from repositories.property.property_postgres_repository import (
+    PropertyPostgresRepository,
+)
+from repositories.spatial.spatial_postgres_repository import SpatialPostgresRepository
+
+_JSON_SAMPLES = Path(__file__).resolve().parents[2] / "backend/epc_api/json_samples"
+_POSTCODE = "LS6 1AA"
+
+
+def _epc() -> EpcPropertyData:
+    raw: dict[str, Any] = json.loads(
+        (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text()
+    )
+    return EpcPropertyDataMapper.from_api_response(raw)
+
+
+# --- fakes for the THREE external HTTP boundaries (everything else is real) ----
+
+
+class _FakeCohortEpcClient:
+    """Stands in for the live EPC API: the postcode's lodged certs + their data."""
+
+    def __init__(self, results: list[EpcSearchResult]) -> None:
+        self._results = results
+
+    def search_by_postcode(self, postcode: str) -> list[EpcSearchResult]:
+        return self._results
+
+    def get_by_certificate_number(self, cert_num: str) -> EpcPropertyData:
+        return _epc()
+
+
+class _FakeGeospatialRepo(GeospatialRepository):
+    """Stands in for the S3 Open-UPRN reader: UPRN → coordinates."""
+
+    def __init__(self, coords: dict[int, Coordinates]) -> None:
+        self._coords = coords
+
+    def coordinates_for(self, uprn: int) -> Optional[Coordinates]:
+        return self._coords.get(uprn)
+
+    def spatial_for(self, uprn: int) -> Optional[SpatialReference]:
+        coordinates = self._coords.get(uprn)
+        if coordinates is None:
+            return None
+        return SpatialReference(
+            coordinates=coordinates, restrictions=PlanningRestrictions()
+        )
+
+
+class _NoEpcFetcher:
+    """The target Property is EPC-less — the EPC API finds nothing for its UPRN."""
+
+    def get_by_uprn(self, uprn: int) -> Optional[EpcPropertyData]:
+        return None
+
+
+class _NoSolarFetcher:
+    def get_building_insights(
+        self, longitude: float, latitude: float
+    ) -> dict[str, Any]:
+        return {}
+
+
+class _FakeAttributesReader:
+    """Stands in for Jun-te's property_overrides read adapter: the landlord-known
+    property type (here a House, code "0", matching the cohort)."""
+
+    def attributes_for(self, property_id: int) -> PredictionTargetAttributes:
+        return PredictionTargetAttributes(property_type="0", built_form="2")
+
+
+def _cohort_results() -> list[EpcSearchResult]:
+    return [
+        EpcSearchResult(
+            certificate_number=f"CERT-{i}",
+            address_line_1=f"{i} Neighbour Road",
+            address_line_2=None,
+            address_line_3=None,
+            address_line_4=None,
+            postcode=_POSTCODE,
+            post_town="LEEDS",
+            uprn=20000 + i,
+            current_energy_efficiency_band="D",
+            registration_date=f"2023-{i + 1:02d}-01",
+        )
+        for i in range(3)
+    ]
+
+
+def test_epc_less_property_is_predicted_persisted_and_resolved_end_to_end(
+    db_engine: Engine,
+) -> None:
+    # Arrange — an EPC-less Property exists in the database (postcode + UPRN known,
+    # no EPC lodged), plus its postcode cohort behind the faked EPC API.
+    with Session(db_engine) as session:
+        row = PropertyRow(
+            portfolio_id=1, postcode=_POSTCODE, address="1 Target Street", uprn=10000
+        )
+        session.add(row)
+        session.commit()
+        property_id = row.id
+        assert property_id is not None
+
+    cohort_coords = {
+        20000 + i: Coordinates(longitude=-1.55, latitude=53.81) for i in range(3)
+    }
+    comparables_repo = EpcComparablePropertiesRepository(
+        _FakeCohortEpcClient(_cohort_results()), _FakeGeospatialRepo(cohort_coords)
+    )
+    orchestrator = IngestionOrchestrator(
+        unit_of_work=lambda: PostgresUnitOfWork(lambda: Session(db_engine)),
+        epc_fetcher=_NoEpcFetcher(),
+        geospatial_repo=_FakeGeospatialRepo(
+            {10000: Coordinates(longitude=-1.55, latitude=53.81)}
+        ),
+        solar_fetcher=_NoSolarFetcher(),
+        comparables_repo=comparables_repo,
+        prediction_attributes_reader=_FakeAttributesReader(),
+        epc_prediction=EpcPrediction(),
+    )
+
+    # Act — run Ingestion: no lodged EPC found → predict from the cohort → persist.
+    orchestrator.run([property_id])
+
+    # Assert — reloading the Property through the real repository, its Effective
+    # EPC is the predicted picture, flagged by the "predicted" source path.
+    with Session(db_engine) as session:
+        epc_repo = EpcPostgresRepository(session)
+        prop: Property = PropertyPostgresRepository(
+            session, epc_repo, SpatialPostgresRepository(session)
+        ).get(property_id)
+
+    assert prop.epc is None  # no lodged EPC
+    assert prop.predicted_epc is not None  # a predicted one was persisted
+    assert prop.source_path == "predicted"
+    assert prop.effective_epc is prop.predicted_epc
+    assert prop.effective_epc.property_type == "0"
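The end-to-end test above exercises one branch inside Ingestion. When no lodged EPC is found for a Property's UPRN, Ingestion builds a Prediction Target, predicts from the postcode cohort, and persists the result to the predicted slot. Below is a minimal sketch of that branch with the collaborators reduced to plain callables. The `ingest_one` function, its parameters, its string outcomes and the dictionary-backed `Store` are all hypothetical. None of them reflects `IngestionOrchestrator`'s real constructor or its Unit of Work. The sketch also assumes `predict` returns `None` when the cohort is too thin to predict from.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Store:
    """Hypothetical stand-in for the lodged | predicted EPC slots."""

    lodged: dict[int, object] = field(default_factory=dict)
    predicted: dict[int, object] = field(default_factory=dict)


def ingest_one(
    property_id: int,
    uprn: int,
    fetch_lodged: Callable[[int], Optional[object]],
    build_target: Callable[[int], Optional[object]],
    predict: Callable[[object], Optional[object]],
    store: Store,
) -> str:
    """Return which outcome applied to this Property (hypothetical labels)."""
    lodged = fetch_lodged(uprn)
    if lodged is not None:
        # A real lodged EPC always wins; no prediction is attempted.
        store.lodged[property_id] = lodged
        return "lodged"
    # EPC-less: fall back to prediction, but only for an eligible target.
    target = build_target(property_id)
    if target is None:
        return "gated_out"
    predicted = predict(target)
    if predicted is None:
        return "no_cohort"  # assumption: predictor declines a thin cohort
    store.predicted[property_id] = predicted
    return "predicted"
```

Keeping the predicted picture in its own slot, separate from the lodged one, is what lets a later lodged EPC supersede it without any data loss. That is the property the `epc_with_overlay` precedence test relies on.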
--- a/tests/fixtures/epc_prediction/BD24JG/cert-01f1488000e8.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-01f1488000e8.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-15c0ce8ea563.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-15c0ce8ea563.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-1ef220911b4b.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-1ef220911b4b.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-208c5dfbaee2.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-208c5dfbaee2.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-496b8c226d26.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-496b8c226d26.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-4cce4c3fb33b.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-4cce4c3fb33b.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-5e427854bd6d.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-5e427854bd6d.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-5f4d37dacdf8.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-5f4d37dacdf8.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-6439eb9f1504.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-6439eb9f1504.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-75aa70bdd22a.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-75aa70bdd22a.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-7e61d706db57.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-7e61d706db57.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-8d1b1e15063c.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-8d1b1e15063c.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-cc1c722822ba.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-cc1c722822ba.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-d1396ff56fec.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-d1396ff56fec.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-d7e3196e1a0c.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-d7e3196e1a0c.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-dcb2c6ff3317.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-dcb2c6ff3317.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-e648d6164f10.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-e648d6164f10.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-e8c34b2323e0.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-e8c34b2323e0.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-eb71d39605ae.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-eb71d39605ae.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-eed1ed76757a.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-eed1ed76757a.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-f089b44ae169.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-f089b44ae169.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-f326c2524ab3.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-f326c2524ab3.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-f3aae3d2c3c9.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-f3aae3d2c3c9.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-f481cd1abc1f.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-f481cd1abc1f.json
--- a/tests/fixtures/epc_prediction/BD24JG/cert-f52356b57b37.json
+++ b/tests/fixtures/epc_prediction/BD24JG/cert-f52356b57b37.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-0dd25677d889.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-0dd25677d889.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-3077aedcbe8b.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-3077aedcbe8b.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-5aad1cfe207c.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-5aad1cfe207c.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-5b3816460805.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-5b3816460805.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-6030fde8e888.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-6030fde8e888.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-6e6d1776f8b7.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-6e6d1776f8b7.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-7791c2c9073d.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-7791c2c9073d.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-8a1b88d2a80a.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-8a1b88d2a80a.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-96d09ac53f57.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-96d09ac53f57.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-ca16b6a09f55.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-ca16b6a09f55.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-e54dae311758.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-e54dae311758.json
--- a/tests/fixtures/epc_prediction/CF481ND/cert-f5de74d7fffc.json
+++ b/tests/fixtures/epc_prediction/CF481ND/cert-f5de74d7fffc.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-1b4b3d26f79c.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-1b4b3d26f79c.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-346dc8ab15a0.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-346dc8ab15a0.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-526df35482d7.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-526df35482d7.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-73d50930d0ac.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-73d50930d0ac.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-8105f351163f.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-8105f351163f.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-840682d5f191.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-840682d5f191.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-f9dee3ea91ac.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-f9dee3ea91ac.json
--- a/tests/fixtures/epc_prediction/CV15QJ/cert-fc5fe3d2a055.json
+++ b/tests/fixtures/epc_prediction/CV15QJ/cert-fc5fe3d2a055.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-0246fdfa9718.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-0246fdfa9718.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-2e6f5943059a.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-2e6f5943059a.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-7d9beea6555e.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-7d9beea6555e.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-89894e90fc9c.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-89894e90fc9c.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-91dd248e55ee.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-91dd248e55ee.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-924d78d64f06.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-924d78d64f06.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-94454d5d782e.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-94454d5d782e.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-9b4787ad7813.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-9b4787ad7813.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-9c0b5437b98b.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-9c0b5437b98b.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-aef738e4b1c0.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-aef738e4b1c0.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-ba51394914cf.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-ba51394914cf.json
--- a/tests/fixtures/epc_prediction/CV78UG/cert-cd99c8b93a27.json
+++ b/tests/fixtures/epc_prediction/CV78UG/cert-cd99c8b93a27.json