Model/orchestration/baseline_orchestrator.py
Khalim Conn-Kowlessar 48a488d1e9 refactor(orchestration): wire stages onto the UnitOfWork; per-stage commit (#1138)
Replaces the handler's whole-pipeline Session (one transaction across all
three stages, connection pinned during Ingestion's external IO) with a
Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in
one unit and commits once; any property raising aborts the batch and the
subtask fails noisily.

- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
  commit once. A raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
  solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC /
  coords / solar) with NO unit open; phase 2 writes in one unit and commits.
  The connection is never held during external IO. Geospatial S3 repo stays
  injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
  factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
  composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
  idempotent re-run, fetch/write split, module-scoped engine). Modelling stub
  left untouched (no-op, no DB) per the ADR.
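
A minimal sketch of the fetch/write split described in the list above. It does
not reproduce the real IngestionOrchestrator: `FakeUnit`, `run_ingestion` and
`fetch_epc` are hypothetical stand-ins, and the EPC payload is a plain string.
It only illustrates the ordering: all external IO first with no unit open, then
one unit that writes the batch and commits once.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class FakeUnit:
    """Hypothetical stand-in for a Unit of Work; records what happened."""

    log: list[str]
    saved: dict[int, str] = field(default_factory=dict)

    def __enter__(self) -> FakeUnit:
        self.log.append("unit opened")
        return self

    def __exit__(self, *exc: object) -> None:
        self.log.append("unit closed")

    def save(self, property_id: int, epc: str) -> None:
        self.saved[property_id] = epc

    def commit(self) -> None:
        self.log.append("commit")


def run_ingestion(
    property_ids: list[int],
    fetch_epc: Callable[[int], str],
    unit_of_work: Callable[[], FakeUnit],
) -> None:
    # Phase 1: external IO for the whole batch, with no unit (connection) open.
    fetched = {pid: fetch_epc(pid) for pid in property_ids}
    # Phase 2: write everything in one unit and commit once.
    with unit_of_work() as uow:
        for pid, epc in fetched.items():
            uow.save(pid, epc)
        uow.commit()


log: list[str] = []
unit = FakeUnit(log)


def fetch_epc(pid: int) -> str:
    # Assumed fetcher: records the call instead of hitting a real EPC source.
    log.append(f"fetch {pid}")
    return f"epc-{pid}"


run_ingestion([1, 2], fetch_epc, lambda: unit)
print(log)
```

In this sketch every fetch is logged before the unit is opened, which is the
property the split is meant to guarantee.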

Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.
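
The two commit assertions named above (exactly-once commit, no-commit-on-raise)
could be sketched as follows. This is an illustration under stated assumptions,
not the project's actual FakeUnitOfWork: the class body, the `run_batch` helper
and the "negative id raises" rule are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable


class FakeUnitOfWork:
    """Hypothetical test double: counts commits, stages saves until commit."""

    def __init__(self) -> None:
        self.commits = 0
        self.staged: list[int] = []
        self.persisted: list[int] = []

    def __enter__(self) -> FakeUnitOfWork:
        return self

    def __exit__(self, *exc: object) -> None:
        # Anything not committed is discarded, mimicking a rollback.
        self.staged.clear()

    def save(self, property_id: int) -> None:
        self.staged.append(property_id)

    def commit(self) -> None:
        self.commits += 1
        self.persisted.extend(self.staged)
        self.staged.clear()


def run_batch(unit_of_work: Callable[[], FakeUnitOfWork], ids: list[int]) -> None:
    """Simplified stage: a negative id stands in for a property that raises."""
    with unit_of_work() as uow:
        for property_id in ids:
            if property_id < 0:
                raise ValueError(f"cannot process property {property_id}")
            uow.save(property_id)
        uow.commit()


# Happy path: the whole batch persists behind a single commit.
ok = FakeUnitOfWork()
run_batch(lambda: ok, [1, 2, 3])

# Failure path: one property raises, so commit is never reached.
failed = FakeUnitOfWork()
try:
    run_batch(lambda: failed, [1, -2, 3])
except ValueError:
    pass
```

Because the commit sits after the loop, a raise anywhere in the batch skips it,
so the failed unit ends with zero commits and nothing persisted.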

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 09:54:47 +00:00


from __future__ import annotations

from collections.abc import Callable

from datatypes.epc.domain.epc_property_data import (
    EpcPropertyData,
    RenewableHeatIncentive,
)
from domain.baseline.baseline_performance import BaselinePerformance
from domain.baseline.performance import lodged_performance
from domain.baseline.rebaseliner import Rebaseliner
from repositories.unit_of_work import UnitOfWork


class BaselineOrchestrator:
    """Stage 2: establish each Property's Baseline Performance and persist it.

    Runs the whole batch in **one** Unit of Work and commits once (ADR-0012):
    for each property it hydrates the Property via the unit's PropertyRepo,
    resolves the Effective EPC, reads Lodged Performance off it, runs the
    Rebaseliner to produce Effective Performance, and persists the pair plus the
    deterministic kWh. Any property raising aborts the batch: the unit is left
    uncommitted, so nothing persists and the subtask fails noisily.

    Reads only from repos, never a Fetcher or HTTP (ADR-0003), so it is
    byte-identical whether Ingestion ran milliseconds ago (First Run) or last
    week. The injected Rebaseliner is the re-score-on-override seam (ADR-0011).
    """

    def __init__(
        self,
        *,
        unit_of_work: Callable[[], UnitOfWork],
        rebaseliner: Rebaseliner,
    ) -> None:
        self._unit_of_work = unit_of_work
        self._rebaseliner = rebaseliner

    def run(self, property_ids: list[int]) -> None:
        with self._unit_of_work() as uow:
            for property_id in property_ids:
                effective_epc = uow.property.get(property_id).effective_epc
                lodged = lodged_performance(effective_epc)
                effective, reason = self._rebaseliner.rebaseline(
                    effective_epc, lodged
                )
                rhi = _require_rhi(effective_epc)
                baseline = BaselinePerformance(
                    lodged=lodged,
                    effective=effective,
                    rebaseline_reason=reason,
                    space_heating_kwh=rhi.space_heating_kwh,
                    water_heating_kwh=rhi.water_heating_kwh,
                )
                uow.baseline.save(baseline, property_id)
            uow.commit()


def _require_rhi(epc: EpcPropertyData) -> RenewableHeatIncentive:
    rhi = epc.renewable_heat_incentive
    if rhi is None:
        raise ValueError(
            "Effective EPC is missing renewable_heat_incentive; cannot read "
            "baseline space-heating / hot-water kWh"
        )
    return rhi