feat(epc-prediction): thread prediction injection points through the composition root

build_first_run_pipeline now constructs epc_prediction=EpcPrediction() and accepts
comparables_repo + prediction_attributes_reader as optional params, threading them
into IngestionOrchestrator (ADR-0031). The on-switch is now just supplying those
two arguments — no orchestrator/handler edits — once they exist: the cohort repo
(its EPC client is the source client pending #1136) and the property_overrides
attributes reader (built separately). Both default None, so the feature stays OFF
and ingestion is unchanged until they're passed.

The epc_property.source migration is live, so the predicted-EPC persistence slot
(slice-5c) now works against the real DB. Handover updated to reflect the simpler
composition-root step.

pyright strict clean; handler + pipeline + ingestion-prediction tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-16 13:53:54 +00:00
parent 7ca1f815f6
commit a43c03ed94
2 changed files with 29 additions and 7 deletions


@@ -10,6 +10,7 @@ from sqlmodel import Session
from applications.ara_first_run.ara_first_run_trigger_body import (
AraFirstRunTriggerBody,
)
+from domain.epc_prediction.epc_prediction import EpcPrediction
from domain.property_baseline.calculator_rebaseliner import CalculatorRebaseliner
from domain.sap10_calculator.calculator import Sap10Calculator
from infrastructure.postgres.config import PostgresConfig
@@ -17,8 +18,10 @@ from infrastructure.postgres.engine import make_engine
from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator
from orchestration.ara_first_run_pipeline import AraFirstRunPipeline
from orchestration.ingestion_orchestrator import (
+ComparablesRepo,
EpcFetcher,
IngestionOrchestrator,
+PredictionAttributesReader,
SolarFetcher,
)
from orchestration.modelling_orchestrator import ModellingOrchestrator
@@ -65,12 +68,23 @@ def build_first_run_pipeline(
epc_fetcher: EpcFetcher,
geospatial_repo: GeospatialRepository,
solar_fetcher: SolarFetcher,
+comparables_repo: Optional[ComparablesRepo] = None,
+prediction_attributes_reader: Optional[PredictionAttributesReader] = None,
) -> AraFirstRunPipeline:
"""Compose the real three-stage pipeline on a Unit-of-Work factory.
Each stage opens its own unit(s) and commits per batch (ADR-0012); the
handler no longer holds a session. The source clients are passed in because
their config is not settled; see ``_source_clients_from_env``.
+EPC Prediction gap-fill (ADR-0031) is the predictor itself (pure) plus two
+injected collaborators: the postcode-cohort source and the Landlord-Override
+attributes reader. Both default to None, so the feature is **off** until they
+are supplied; an EPC-less Property is then predicted into its predicted slot.
+The cohort repo is injected (not built here) because its EPC client is the
+same source client whose wiring is still pending; the attributes reader is the
+`property_overrides` read adapter built separately. Until both are passed,
+ingestion behaves exactly as before.
"""
return AraFirstRunPipeline(
ingestion=IngestionOrchestrator(
@@ -78,6 +92,9 @@ def build_first_run_pipeline(
epc_fetcher=epc_fetcher,
geospatial_repo=geospatial_repo,
solar_fetcher=solar_fetcher,
+comparables_repo=comparables_repo,
+prediction_attributes_reader=prediction_attributes_reader,
+epc_prediction=EpcPrediction(),
),
baseline=PropertyBaselineOrchestrator(
unit_of_work=unit_of_work,


@@ -52,21 +52,26 @@ so the gate skips the Property.
`docs/MIGRATION_NOTE_predicted_epc_source.md` (column + default `'lodged'` +
relax any `property_id` uniqueness to `(property_id, source)`).
-### 3. Wire the collaborators at the composition root
+### 3. Pass the two collaborators at the composition root
-Wherever `IngestionOrchestrator(...)` is constructed for the real run, pass the
-three optional kwargs:
+This is now wired: `build_first_run_pipeline` (in `applications/ara_first_run/handler.py`)
+already constructs `epc_prediction=EpcPrediction()` and accepts the other two as
+optional params that it threads into the `IngestionOrchestrator`. So the on-switch
+is just supplying them once they exist:
```python
-IngestionOrchestrator(
+build_first_run_pipeline(
...,
comparables_repo=EpcComparablePropertiesRepository(epc_client, geospatial_repo),
-prediction_attributes_reader=<your property_overrides adapter>,
-epc_prediction=EpcPrediction(),
+prediction_attributes_reader=<your property_overrides adapter>, # task #1
)
```
-That's the on-switch. Until all three are passed, ingestion ignores prediction.
+`epc_client` is the same EPC source client behind `epc_fetcher` (the concrete
+`EpcClientService` exposes `search_by_postcode` + `get_by_certificate_number`),
+so build it alongside the other source clients in `_source_clients_from_env`
+(pending #1136). Until **both** are passed, ingestion ignores prediction; no
+orchestrator or handler edits needed, just the two arguments.
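
The both-or-nothing gate described above can be sketched in isolation. This is a minimal illustration, not the real orchestrator: `IngestionSketch`, `build_pipeline_sketch`, `FakeCohort`, `FakeReader`, and the method names `cohort_for` and `attributes_for` are all hypothetical, and the actual `ComparablesRepo` and `PredictionAttributesReader` protocols may look different.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class ComparablesRepo(Protocol):
    """Hypothetical shape of the postcode-cohort source."""

    def cohort_for(self, postcode: str) -> list[str]: ...


class PredictionAttributesReader(Protocol):
    """Hypothetical shape of the property_overrides read adapter."""

    def attributes_for(self, property_id: str) -> dict[str, str]: ...


@dataclass(frozen=True)
class IngestionSketch:
    """Stand-in for the orchestrator; holds only the two optional collaborators."""

    comparables_repo: Optional[ComparablesRepo] = None
    prediction_attributes_reader: Optional[PredictionAttributesReader] = None

    @property
    def prediction_enabled(self) -> bool:
        # Assumption: the feature is on only when BOTH collaborators are supplied.
        return (
            self.comparables_repo is not None
            and self.prediction_attributes_reader is not None
        )


def build_pipeline_sketch(
    comparables_repo: Optional[ComparablesRepo] = None,
    prediction_attributes_reader: Optional[PredictionAttributesReader] = None,
) -> IngestionSketch:
    # The composition root only threads the arguments through; it decides nothing.
    return IngestionSketch(comparables_repo, prediction_attributes_reader)


class FakeCohort:
    """Illustrative stub that satisfies ComparablesRepo structurally."""

    def cohort_for(self, postcode: str) -> list[str]:
        return []


class FakeReader:
    """Illustrative stub that satisfies PredictionAttributesReader structurally."""

    def attributes_for(self, property_id: str) -> dict[str, str]:
        return {}
```

Under these assumptions, calling `build_pipeline_sketch()` with no arguments, or with only one collaborator, leaves `prediction_enabled` false; passing both turns it on without any change to the builder or the orchestrator stand-in.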
## One open item — Validation Cohort exclusion