Two review points from @dancafc:
1) Rename the `Comparable` dataclass → `ComparableProperty` (it models one
comparable *property*; the collection stays `ComparableProperties`). Applied
across domain, repositories, orchestration, harness, scripts, and tests with a
word-boundary rename so `ComparableProperties` is untouched.
2) Move `PredictionTarget` out of comparable_properties.py into prediction_target.py
(where `PredictionTargetAttributes` + `build_prediction_target` already live).
comparable_properties.py now imports it; no import cycle (prediction_target no
longer depends on comparable_properties). Importers updated.
92 tests pass across the touched suites; pyright strict clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire EPC Prediction gap-fill into IngestionOrchestrator (ADR-0031). When the
predictor collaborators are injected (ComparablesRepo + PredictionAttributesReader
+ EpcPrediction), an EPC-less Property is predicted from its postcode cohort and
persisted to the predicted slot; the eligibility gate (unknown property_type) and
"a lodged EPC is never predicted over" both hold. The two-phase contract is kept:
prediction attributes (Landlord Overrides) resolve in the unit prep phase, the
cohort fetch + select + predict run in the no-unit IO phase, persistence in the
write phase. All three collaborators are OPTIONAL — unwired, ingestion behaves
exactly as before (existing tests unchanged).
3 tests (predict+persist, gate, lodged-wins); 228 pass across orchestration +
epc_prediction + repositories; pyright strict clean. Production composition-root
wiring (real ComparableProperties + override-attributes adapters) is part of the
Jun-te handover.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Slice 3c.4. Ingestion now resolves the whole spatial reference in one lookup
(`spatial_for`) — the coordinates drive the Solar fetch as before, and the
reference (coordinates + planning protections) is persisted per-UPRN via
`uow.spatial` in the same write batch, so Modelling can read the protections
back off the Property (ADR-0020). `_Fetched` carries the UPRN and the reference
into the write phase.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Final slice of ADR-0012: collapse the per-property read round-trips a batch
made (Baseline hydrated ~8 queries x 30 properties one at a time) into a
handful of per-table IN queries.
- EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the
windows + floor-dim fetches are now passed in, not fetched inline), so both
`get` and the new `get_for_properties(property_ids)` build EpcPropertyData
from pre-fetched rows. `get_for_properties` fetches each child table once
(`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole
per ADR-0002.
- PropertyRepository.get_many(property_ids) -> Properties: one query for the
property rows + one bulk EPC hydration, composed in input order.
- BaselineOrchestrator / IngestionOrchestrator read the batch via get_many
instead of N x get.
- Ports + fakes gain the bulk methods.
The #1129 round-trip fidelity test stays green (the compose extraction is
behaviour-preserving). New tests: bulk hydration correctness + round-trips are
constant w.r.t. batch size (one-per-table, proven by query count). 123 pass;
pyright strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the handler's whole-pipeline Session (one transaction across all
three stages, connection pinned during Ingestion's external IO) with a
Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in
one unit and commits once; any property raising aborts the batch and the
subtask fails noisily.
- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
commit once. Raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC /
coords / solar) with NO unit open; phase 2 writes in one unit and commits.
The connection is never held during external IO. Geospatial S3 repo stays
injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
idempotent re-run, fetch/write split, module-scoped engine). Modelling stub
left untouched (no-op, no DB) per the ADR.
Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stage 1 of the pipeline: per property, read its UPRN from the property row,
fetch its EPC, resolve coordinates from the Geospatial reference repo, thread
those into the Solar fetcher, and persist EPC + solar via repos. Fetchers never
call each other — the orchestrator threads the coordinate (ADR-0011). Coordinates
are reference data (deterministic from UPRN), resolved transiently to drive the
solar fetch rather than persisted per-property.
Depends on thin EpcFetcher/SolarFetcher Protocols (EpcClientService and
GoogleSolarApiClient satisfy them structurally). Unit-tested against fakes — no
DB, gov API, or network: persists EPC, threads coords into solar, skips
UPRN-less properties and skips solar when coordinates are absent. pyright clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>