Commit graph

7 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
7ca1f815f6 refactor(epc-prediction): PR review — rename ComparableProperty, relocate PredictionTarget
Two review points from @dancafc:

1) Rename the `Comparable` dataclass → `ComparableProperty` (it models one
   comparable *property*; the collection stays `ComparableProperties`). Applied
   across domain, repositories, orchestration, harness, scripts, and tests with a
   word-boundary rename so `ComparableProperties` is untouched.

2) Move `PredictionTarget` out of comparable_properties.py into prediction_target.py
   (where `PredictionTargetAttributes` + `build_prediction_target` already live).
   comparable_properties.py now imports it; no import cycle (prediction_target no
   longer depends on comparable_properties). Importers updated.

92 tests pass across the touched suites; pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 13:34:44 +00:00
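Note on 7ca1f815f6: the word-boundary rename can be illustrated with a small regex sketch. This is not the script used for the commit (the message does not say which tool did the rename); `rename_comparable` and the sample line are hypothetical. It only shows why a `\b`-anchored pattern leaves `ComparableProperties` alone.

```python
import re

# Hypothetical helper, for illustration only: match the bare identifier
# `Comparable`, not longer identifiers that merely start with it.
_PATTERN = re.compile(r"\bComparable\b")


def rename_comparable(source: str) -> str:
    """Replace whole-word `Comparable` with `ComparableProperty`."""
    return _PATTERN.sub("ComparableProperty", source)


# Made-up source line containing both the dataclass and the collection name.
before = "def pick(c: Comparable, pool: ComparableProperties) -> Comparable: ..."
after = rename_comparable(before)
print(after)
```

Because `\b` needs a non-word character (or the string edge) after `Comparable`, the `P` in `ComparableProperties` blocks the match. The same property makes the rename safe to run twice.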
Khalim Conn-Kowlessar
5727ac53c1 feat(epc-prediction): slice-5e ingestion wiring (gate → predict → persist)
Wire EPC Prediction gap-fill into IngestionOrchestrator (ADR-0031). When the
predictor collaborators are injected (ComparablesRepo + PredictionAttributesReader
+ EpcPrediction), an EPC-less Property is predicted from its postcode cohort and
persisted to the predicted slot; the eligibility gate (unknown property_type) and
"a lodged EPC is never predicted over" both hold. The two-phase contract is kept:
prediction attributes (Landlord Overrides) resolve in the unit prep phase, the
cohort fetch + select + predict run in the no-unit IO phase, persistence in the
write phase. All three collaborators are OPTIONAL — unwired, ingestion behaves
exactly as before (existing tests unchanged).

3 tests (predict+persist, gate, lodged-wins); 228 pass across orchestration +
epc_prediction + repositories; pyright strict clean. Production composition-root
wiring (real ComparableProperties + override-attributes adapters) is part of the
Jun-te handover.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 04:03:02 +00:00
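Note on 5727ac53c1: the three rules in the message (optional wiring, the eligibility gate, and lodged-wins) can be sketched as one decision function. This is a simplified sketch under assumptions, not the orchestrator's code: the three collaborators are collapsed into a single hypothetical `Predictor` callable, and `Property`, `gap_fill`, and the field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Property:
    """Hypothetical, pared-down property row."""
    property_id: int
    property_type: Optional[str]  # None stands in for "unknown"
    lodged_epc: Optional[str]     # a lodged EPC, if one exists


# Assumption: one callable stands in for the three predictor collaborators.
Predictor = Callable[[Property], str]


def gap_fill(prop: Property, predictor: Optional[Predictor]) -> Optional[str]:
    """Return a prediction to persist, or None when nothing should be written."""
    if predictor is None:
        return None  # unwired: ingestion behaves exactly as before
    if prop.lodged_epc is not None:
        return None  # a lodged EPC is never predicted over
    if prop.property_type is None:
        return None  # eligibility gate: unknown property_type
    return predictor(prop)


def predict(prop: Property) -> str:
    """Stub predictor for the usage below."""
    return "predicted-D"


eligible = gap_fill(Property(1, "flat", None), predict)
gated = gap_fill(Property(2, None, None), predict)
lodged = gap_fill(Property(3, "flat", "lodged-C"), predict)
```

Keeping the predictor optional means callers that never inject it take the first branch, which matches the "existing tests unchanged" claim in the message.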
Jun-te Kim
32de7f6c3f 17.1 and 18 done by claude
2026-06-12 12:52:36 +00:00
Khalim Conn-Kowlessar
af5dfdf8e2 feat(ingestion): cache the spatial reference per UPRN
Slice 3c.4. Ingestion now resolves the whole spatial reference in one lookup
(`spatial_for`) — the coordinates drive the Solar fetch as before, and the
reference (coordinates + planning protections) is persisted per-UPRN via
`uow.spatial` in the same write batch, so Modelling can read the protections
back off the Property (ADR-0020). `_Fetched` carries the UPRN and the reference
into the write phase.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:24:03 +00:00
Khalim Conn-Kowlessar
8685f8ba3a perf(repos): bulk get_many / get_for_properties — batch reads, not N round-trips (#1138)
Final slice of ADR-0012: collapse the per-property read round-trips a batch
made (Baseline hydrated ~8 queries x 30 properties one at a time) into a
handful of per-table IN queries.

- EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the
  windows + floor-dim fetches are now passed in, not fetched inline), so both
  `get` and the new `get_for_properties(property_ids)` build EpcPropertyData
  from pre-fetched rows. `get_for_properties` fetches each child table once
  (`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole
  per ADR-0002.
- PropertyRepository.get_many(property_ids) -> Properties: one query for the
  property rows + one bulk EPC hydration, composed in input order.
- BaselineOrchestrator / IngestionOrchestrator read the batch via get_many
  instead of N x get.
- Ports + fakes gain the bulk methods.

The #1129 round-trip fidelity test stays green (the compose extraction is
behaviour-preserving). New tests: bulk hydration correctness + round-trips are
constant w.r.t. batch size (one-per-table, proven by query count). 123 pass;
pyright strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:33:24 +00:00
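Note on 8685f8ba3a: the "one IN query per table, group in memory, compose in input order" shape can be sketched as follows. This is a minimal sketch under assumptions: it uses an in-memory SQLite database rather than Postgres, and the table layout, `CountingDb`, and the dict-shaped result are hypothetical stand-ins for the real repositories and `EpcPropertyData`.

```python
import sqlite3
from collections import defaultdict
from typing import Any, Sequence

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE epc_property (id INTEGER PRIMARY KEY, rating TEXT);
    CREATE TABLE epc_window (epc_property_id INTEGER, glazing TEXT);
    INSERT INTO epc_property VALUES (1, 'C'), (2, 'D'), (3, 'E');
    INSERT INTO epc_window VALUES (1, 'double'), (1, 'single'), (3, 'triple');
""")


class CountingDb:
    """Thin wrapper that counts round-trips (hypothetical test helper)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.query_count = 0

    def fetch(self, sql: str, params: Sequence[Any]) -> list[tuple]:
        self.query_count += 1
        return self.connection.execute(sql, list(params)).fetchall()


def get_for_properties(db: CountingDb, property_ids: Sequence[int]) -> dict[int, dict]:
    """Hydrate a batch with one query per table, composed in input order."""
    if not property_ids:
        return {}
    marks = ",".join("?" * len(property_ids))
    parents = db.fetch(
        f"SELECT id, rating FROM epc_property WHERE id IN ({marks})", property_ids
    )
    windows = db.fetch(
        "SELECT epc_property_id, glazing FROM epc_window "
        f"WHERE epc_property_id IN ({marks}) ORDER BY rowid",
        property_ids,
    )
    # Group child rows in memory instead of querying per property.
    grouped: dict[int, list[str]] = defaultdict(list)
    for pid, glazing in windows:
        grouped[pid].append(glazing)
    by_id = {
        pid: {"rating": rating, "windows": grouped.get(pid, [])}
        for pid, rating in parents
    }
    return {pid: by_id[pid] for pid in property_ids if pid in by_id}


db = CountingDb(conn)
result = get_for_properties(db, [3, 1])
```

The query count depends on the number of tables, not on the batch size, which is the property the commit's new tests assert.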
Khalim Conn-Kowlessar
48a488d1e9 refactor(orchestration): wire stages onto the UnitOfWork; per-stage commit (#1138)
Replaces the handler's whole-pipeline Session (one transaction across all
three stages, connection pinned during Ingestion's external IO) with a
Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in
one unit and commits once; any property raising aborts the batch and the
subtask fails noisily.

- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
  commit once. A raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
  solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC /
  coords / solar) with NO unit open; phase 2 writes in one unit and commits.
  The connection is never held during external IO. Geospatial S3 repo stays
  injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
  factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
  composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
  idempotent re-run, fetch/write split, module-scoped engine). Modelling stub
  left untouched (no-op, no DB) per the ADR.

Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 09:54:47 +00:00
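Note on 48a488d1e9: the fetch/write split and the exactly-once commit can be sketched with an in-memory unit of work. This is an illustrative sketch, not the repository's `FakeUnitOfWork` or `IngestionOrchestrator`: the `save` method, the `ingest` function, and the string-valued EPC are assumptions made to keep the example short.

```python
from typing import Callable, Sequence


class FakeUnitOfWork:
    """In-memory stand-in: staged writes become visible only on commit."""

    def __init__(self) -> None:
        self.staged: dict[int, str] = {}
        self.persisted: dict[int, str] = {}
        self.commits = 0

    def __enter__(self) -> "FakeUnitOfWork":
        self.staged = {}
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.staged = {}  # anything not committed is discarded
        return False      # never swallow the exception

    def save(self, property_id: int, epc: str) -> None:
        self.staged[property_id] = epc

    def commit(self) -> None:
        self.persisted.update(self.staged)
        self.commits += 1


def ingest(
    property_ids: Sequence[int],
    fetch_epc: Callable[[int], str],
    uow: FakeUnitOfWork,
) -> None:
    # Phase 1: external IO for the whole batch, with NO unit open.
    fetched = [(pid, fetch_epc(pid)) for pid in property_ids]
    # Phase 2: write the batch in one unit and commit once.
    with uow:
        for pid, epc in fetched:
            uow.save(pid, epc)
        uow.commit()


uow = FakeUnitOfWork()
ingest([1, 2], lambda pid: f"epc-{pid}", uow)
ingest([1, 2], lambda pid: f"epc-{pid}", uow)  # re-run replaces, not duplicates
```

If any fetch raises, phase 2 never starts, so no unit is opened and nothing is committed. Keying the store by property id is what makes the re-run replace rather than duplicate in this sketch.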
Khalim Conn-Kowlessar
1696cccba6 feat(ingestion): IngestionOrchestrator end-to-end (#1134)
Stage 1 of the pipeline: per property, read its UPRN from the property row,
fetch its EPC, resolve coordinates from the Geospatial reference repo, thread
those into the Solar fetcher, and persist EPC + solar via repos. Fetchers never
call each other — the orchestrator threads the coordinate (ADR-0011). Coordinates
are reference data (deterministic from UPRN), resolved transiently to drive the
solar fetch rather than persisted per-property.

Depends on thin EpcFetcher/SolarFetcher Protocols (EpcClientService and
GoogleSolarApiClient satisfy them structurally). Unit-tested against fakes — no
DB, gov API, or network: persists EPC, threads coords into solar, skips
UPRN-less properties and skips solar when coordinates are absent. pyright clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:58:21 +00:00
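Note on 1696cccba6: the "thin Protocols, orchestrator threads the coordinate" design can be sketched as below. The Protocol names come from the commit message, but their method signatures, the fakes, `ingest_one`, and the dict-shaped result are assumptions for illustration. The real clients may expose different methods.

```python
from typing import Optional, Protocol

Coordinates = tuple[float, float]


class EpcFetcher(Protocol):
    def fetch(self, uprn: str) -> str: ...  # assumed signature


class SolarFetcher(Protocol):
    def fetch(self, coordinates: Coordinates) -> str: ...  # assumed signature


class FakeEpc:
    """Satisfies EpcFetcher structurally; no inheritance needed."""

    def fetch(self, uprn: str) -> str:
        return f"epc:{uprn}"


class FakeSolar:
    """Satisfies SolarFetcher structurally and records what it was given."""

    def __init__(self) -> None:
        self.calls: list[Coordinates] = []

    def fetch(self, coordinates: Coordinates) -> str:
        self.calls.append(coordinates)
        return f"solar:{coordinates[0]},{coordinates[1]}"


def ingest_one(
    uprn: Optional[str],
    epc_fetcher: EpcFetcher,
    solar_fetcher: SolarFetcher,
    coords_by_uprn: dict[str, Coordinates],
) -> Optional[dict[str, str]]:
    """Fetch EPC, then thread the coordinates into the solar fetch."""
    if uprn is None:
        return None  # UPRN-less properties are skipped
    result = {"epc": epc_fetcher.fetch(uprn)}
    coordinates = coords_by_uprn.get(uprn)
    if coordinates is not None:  # solar is skipped without coordinates
        result["solar"] = solar_fetcher.fetch(coordinates)
    return result


solar = FakeSolar()
reference = {"100": (51.5, -0.1)}
with_coords = ingest_one("100", FakeEpc(), solar, reference)
without_coords = ingest_one("200", FakeEpc(), solar, reference)
no_uprn = ingest_one(None, FakeEpc(), solar, reference)
```

Neither fetcher knows about the other here. Only `ingest_one` passes the coordinates along, which is what lets the tests run against fakes with no DB or network.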