docs(epc-prediction): slice-5f production-wiring handover for Jun-te

The gap-fill is wired end-to-end (slices 5a-5e) behind seams; this note is what's
left to switch it on in production: (1) implement the PredictionTargetAttributesReader
stub over property_overrides — with the override-value → API-code mapping
select_comparables needs; (2) run the epc_property.source Drizzle migration; (3)
pass the three optional collaborators at the IngestionOrchestrator composition
root. Plus the open Validation-Cohort exclusion (no code path exists yet — exclude
on source_path == "predicted" when one is built) and the anomaly dual-use pointer.

No code change: the validation exclusion has no consumer to attach to today, and
the structural signal (source_path == "predicted") already exists from slice-5a.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Khalim Conn-Kowlessar 2026-06-16 04:05:00 +00:00
parent 5727ac53c1
commit b677448fa0

View file

@ -0,0 +1,94 @@
# EPC Prediction — production wiring handover (for Jun-te)
The EPC Prediction **gap-fill** is wired end-to-end behind seams, with one real
dependency stubbed: reading an EPC-less Property's resolved Landlord Overrides.
This note is what's needed to finish it once your `property_overrides` read path
lands. Design is **ADR-0031**; terms in **CONTEXT.md** (EPC Prediction, Effective
EPC, EPC Anomaly Flag).
## What's already built (slices 5a5e, all on `feature/epc-prediction`)
- **5a** `Property.predicted_epc` slot + a `"predicted"` `source_path` /
`effective_epc` branch — used only when there's no lodged EPC and no Site Notes
(a real source always wins).
- **5b** `ComparablePropertiesRepository.candidates_for(postcode)` +
`EpcComparablePropertiesRepository` adapter (postcode search → per-cert fetch →
batched UPRN→coords). Composes with `EpcClientService` + `GeospatialS3Repository`.
- **5c** EPC store `source` discriminator (`lodged` | `predicted`) so the two
coexist per property; `get_predicted_for_property` / `_for_properties`;
`PropertyPostgresRepository` hydrates `predicted_epc`. **Needs a DB migration —
see `docs/MIGRATION_NOTE_predicted_epc_source.md`.**
- **5d** `build_prediction_target(identity, coords, attributes)` + the eligibility
**gate** (unknown `property_type` → not predicted). Override attributes come
through the `PredictionTargetAttributesReader` port (the stub).
- **5e** `IngestionOrchestrator` wiring: when the three prediction collaborators
are injected, an EPC-less Property is predicted from its cohort and persisted to
the predicted slot. The collaborators are **optional** — unwired, ingestion is
unchanged.
## Your part — three things
### 1. Implement `PredictionTargetAttributesReader` (the stub)
`repositories/property/prediction_target_attributes_reader.py` defines the port:
`attributes_for(property_id) -> PredictionTargetAttributes` (property_type,
built_form, wall_construction). Build the adapter as a read over the
`property_overrides` fact layer (the finaliser writes it via
`PropertyOverrideRepository.upsert_all`; you're adding the read side).
**Code-space gotcha.** `select_comparables` filters
`comparable.epc.property_type == target.property_type`, and the cohort EPCs carry
gov **API codes** (e.g. `"0"`/`"2"`). Landlord Overrides resolve to enum *value*
strings (e.g. `"House"`). Your adapter must map override value → the API-code
space, or `property_type` will never match and every cohort comes back empty.
Same for `built_form`. (`domain/epc/property_type.py`, `built_form_type.py` are
the enums; `datatypes/epc/domain/epc_codes.csv` has the code table.)
`property_type` unresolved → return `PredictionTargetAttributes(property_type=None)`
so the gate skips the Property.
### 2. Run the Drizzle migration
`epc_property.source` column — full spec in
`docs/MIGRATION_NOTE_predicted_epc_source.md` (column + default `'lodged'` +
relax any `property_id` uniqueness to `(property_id, source)`).
### 3. Wire the collaborators at the composition root
Wherever `IngestionOrchestrator(...)` is constructed for the real run, pass the
three optional kwargs:
```python
IngestionOrchestrator(
...,
comparables_repo=EpcComparablePropertiesRepository(epc_client, geospatial_repo),
prediction_attributes_reader=<your property_overrides adapter>,
epc_prediction=EpcPrediction(),
)
```
That's the on-switch. Until all three are passed, ingestion ignores prediction.
## One open item — Validation Cohort exclusion
A predicted-source Property has **no real lodged record**, so it must not be
scored as if it did (CONTEXT: Validation Cohort; ADR-0031 dec-3). There is **no
Validation-Cohort code path today** to exclude it from — when one is built (or in
any QA that compares `calc(effective_epc)` vs lodged), exclude on the structural
signal:
```python
if prop.source_path == "predicted":
continue # predicted EPC — no ground truth to validate against
```
Note too: `PropertyBaselinePerformance.lodged` is derived from `effective_epc`
regardless of source (`property_baseline_orchestrator``lodged_performance`), so
for a predicted Property that "lodged" is synthesised, not real. Decide whether
baseline should null/flag it for predicted properties when this lands.
## Anomaly dual-use (later, not now)
Slice-5 is gap-fill only (`epc is None`). The slot model already supports
predicting for *every* Property to compare predicted vs lodged (**EPC Anomaly
Flags**) — see ADR-0031 dec-4. Reuses the same `ComparableProperties` repo + the
predicted slot.