Model/ara_backend_design.md
Khalim Conn-Kowlessar ce12b114c7 docs(ara): next-agent handover for Property Baseline (SAP calc) + Modelling
Orientation for the next chat picking up the two open fronts after the
ara_first_run rebuild shipped:
- where things stand (merged to main via per-cert; branch/worktree layout;
  PRs into per-cert), authoritative ADRs/CONTEXT to read,
- current architecture + key files (post baseline→property_baseline /
  FirstRun→AraFirstRun rename),
- conventions + gotchas (TDD, ephemeral PG, FakeUnitOfWork, pyright noise to
  ignore, gh-credential push workaround),
- Task 1: wire Sap10Calculator into PropertyBaselineOrchestrator (Calculated
  SAP10 Performance as a third value-set; failure-posture decision),
- Task 2: Modelling (stubs to build out; MaterialsRepository naming open;
  needs a UoW when writing Plans),
- the raising/no-op seams not to mistake for done,
- known doc drift flagged (CONTEXT term vs PropertyBaselinePerformance class;
  stale domain/sap/ path → domain/sap10_calculator).

Also banners ara_backend_design.md as superseded (architecture) by ADR-0011/0012.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:20:06 +00:00


ARA Backend Redesign — Design PRD

⚠️ SUPERSEDED (architecture sections). This is an early draft PRD. The actual architecture as built differs — see the ADRs in docs/adr/ (especially 0011 composable stage orchestrators, 0012 Unit-of-Work per-stage batch) and docs/HANDOVER_ARA_NEXT.md for current state. Treat this doc as historical context, not the source of truth for layout/contracts.

Status: Draft for team review
Author: Khalim Conn-Kowlessar (with Claude grill session)
Branch: ara-backend-design-prd
Scope: Service architecture + domain model + contracts for the new modelling backend. Linked sub-PRDs cover ML training pipeline, DB schema migration, and historical EPC re-mapping.


1. Context

1.1 The forcing function

The current modelling backend (backend/engine/engine.py → model_engine, 1331 LOC) was built as an MVP. It is:

  • Tightly coupled to a specific gov EPC API that is being decommissioned on 30 May 2026 (~17 days from today).
  • A monolith — one async function reaches into DB modules, HTTP clients, ML lambdas, S3, and queue infrastructure directly.
  • Bottlenecked on a single person — Khalim is the only contributor able to safely modify the engine because no one else can predict the blast radius of a change.
  • Already returning erroneous data from the old API (clients are aware). The replacement API is partially built (backend/epc_client/epc_client_service.py) on the current feature branch.

1.2 What needs to change

Beyond just swapping API clients, this is the moment to rebuild the backend into a production-grade, contribute-able codebase, with:

  • A clear domain model rooted in the new EPC schema (EpcPropertyData).
  • Service boundaries that other team members can read, fix, and extend without needing the entire mental model.
  • Repository-mediated persistence so business logic can be tested without spinning up a database.
  • A separation between data fetching (slow, IO-heavy, external) and modelling (deterministic, fast, internal).
  • Baseline kWh and bills derived deterministically from the Effective EPC (SAP physics + UCL correction + per-fuel rates from a refreshable repo) rather than from the EPC's recorded cost fields (which use fuel rates pinned to the inspection date) or from an ML kWh prediction.

1.3 Out of scope for this PRD

These ship as linked sub-PRDs:

  • Sub-PRD (ii) — ML training pipeline (autogluon repo + parquet generation in this repo + scoring model retraining for the new EPC schema)
  • Sub-PRD (iii) — DB schema migration (new tables: site_notes, landlord_overrides, EPC cache, parallel write strategy)
  • Sub-PRD (iv) — Historical EPC re-mapping (one-off + ongoing batch job: legacy stored EPCs → new EpcPropertyData shape)

The contracts this PRD defines are the inputs each sub-PRD consumes.


2. Goals and non-goals

2.1 Goals

  1. Survive the 30 May API shutdown — even if it means a brief degraded window, modelling continues to function against the new gov EPC API.
  2. Decouple data fetching from modelling — modelling never makes external HTTP calls; it reads everything from repositories.
  3. Make every service unit-testable against fakes — no test needs a real DB, a real gov API, or a real ML lambda to verify business logic.
  4. Establish a single Property aggregate root as the domain centrepiece; all 9 modelling concerns are slices of one aggregate.
  5. Versioned ML data contract — the EPC-to-features transform is the single shared artifact between this repo and the autogluon repo.
  6. Per-property UI surfaces: fetched data can be shown to users for review and override before modelling runs; modelling is triggered separately. This enables a landlord-facing version of the product in which we fetch the open data, present it back to the user for review, and then perform the modelling.

2.2 Non-goals

  • Multi-region deploy, GDPR-class data minimisation work, or compliance reporting — separate workstreams.
  • Replacement of the front-end. The new APIs preserve enough of the existing response shape that the FE migrates incrementally.
  • Removing pandas. The ML transform output is a parquet-friendly DataFrame-like shape; that stays.
  • A workflow engine (Prefect / Temporal / Airflow). Coordinator-class orchestration plus the existing SQS-fanout pattern is sufficient at the scale we serve.

3. Cutover plan

Forced cut-over, driven by the 30 May deadline. There is no strangler period because the Old EPC API death takes model_engine with it.

3.1 Phase 0 — Status quo (now → 30 May)

  • model_engine keeps running against the Old EPC API for as long as it works.
  • Build of the 9 new services starts this week, in parallel to the old engine continuing to serve traffic.
  • The new ara/ package lives alongside backend/ but is not yet wired into any production endpoint.
  • Goal: keep the lights on until the API dies; start the build immediately so the dark period is short.

3.2 Phase 1 — Forced cut-over (30 May onwards)

  • On 30 May the Old EPC API dies; model_engine ceases to function for any new modelling run.
  • Some downtime is expected and accepted. Clients are aware.
  • Modelling resumes when the new pipeline is ready end-to-end. Whether to add a per-portfolio flag, purely so the front end can reference the old tables where necessary, is still to be decided. No parallel pipelines, no traffic split: the new pipeline is the only pipeline.
  • Calico and Hyde are the first live clients onto the new pipeline in June.
  • model_engine, SearchEpc, the legacy Property, and surrounding modules in backend/ are deleted once the new pipeline is serving all traffic.

3.3 What is not done

  • No strangler — there is nothing to strangle once the Old EPC API dies on 30 May.
  • No parallel-shadow run — would double compute and require diff tooling we don't have, while the old engine is already known to return bad data so diffs would be noise.
  • Per-portfolio feature flag: TBC. Without one, the cut-over is all-or-nothing and all old portfolios are broken.

4. Architecture overview

┌─────────────────────────────────────────────────────────────────────┐
│  Trigger endpoint(s)                                                │
│  (one or two — see §4.5; deferred decision)                         │
└───────────┬──────────────────────────────────────────┬──────────────┘
            │                                          │
            ▼                                          ▼
   ┌─────────────────┐                       ┌─────────────────┐
   │ IngestionPipe   │   SQS, batches of N   │ ModellingPipe   │
   │ -----------     │ ◄─────────────────────│ -----------     │
   │ Fetchers run    │                       │ Reads via Repos │
   │ Persist via     │                       │ Calls Services  │
   │ Repos           │                       │ ML predictions  │
   └────────┬────────┘                       └────────┬────────┘
            │                                         │
            └───────────────► Repos ◄─────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │  Postgres tables │
                        │  (property,      │
                        │   epc_cache,     │
                        │   site_notes,    │
                        │   landlord_      │
                        │   overrides,     │
                        │   plans, etc.)   │
                        └──────────────────┘

  ┌──────────────────────────┐
  │  RefreshOrchestrator     │  triggers Ingestion → diff → conditionally Modelling
  └──────────────────────────┘

4.1 Class taxonomy

Every class falls into exactly one of four roles:

| Role | Job | Examples |
|---|---|---|
| Fetchers | Call external APIs. Return raw response data. No DB. | EpcClientService, GeospatialFetcher, SolarFetcher, SiteNotesIngester |
| Repos | Persist and load domain aggregates. SQL hidden inside. No external IO. | PropertyRepo, EpcCacheRepo, SiteNotesRepo, LandlordOverridesRepo, RecommendationsRepo, GenericDataRepo, SubtaskRepo |
| Services | Business logic over domain objects. No external IO except via injected Fetchers / Repos. | EpcRemappingService, EpcPredictionService, EpcEnergyDerivationService, KwhImpactService, ImpactPredictionService, RecommendationService, OptimiserService, FeatureBuilder, ResultsPersister |
| Orchestrators | Compose Fetchers + Services + Repos to produce an end-to-end result. The only place where step order is encoded. | IngestionPipeline, ModellingPipeline, RefreshOrchestrator |

This taxonomy is strict. A class that fetches and persists belongs in the Service layer and depends on a Fetcher + a Repo. No back-channels.

4.2 Two pipelines, one direction

Data flows one way only: Ingestion → Repos → Modelling.

  • Ingestion writes; never calls Modelling.
  • Modelling reads; never calls Fetchers.

If Modelling needs fresh data, it returns "stale" and the caller decides whether to ingest first. This makes Modelling a pure function of repository state, which is the property that makes it reproducible, debuggable, and testable.

4.3 RefreshOrchestrator

Sits above both pipelines. Job:

  1. Trigger IngestionPipeline for a portfolio.
  2. After ingestion completes, ask repos: "did anything change vs the last modelled snapshot?"
  3. If yes, trigger ModellingPipeline. If no, return early.

This avoids re-modelling 100k properties when only 200 had refreshed EPC data.
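The three steps above can be sketched as follows. This is a minimal illustration, not the built design: the `ingest`, `changed_since_last_model`, and `model` callables and the `RefreshResult` type are hypothetical stand-ins for the real pipelines and the repo diff query.

```python
from dataclasses import dataclass
from typing import Callable
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RefreshResult:
    ingested: int  # properties touched by ingestion
    modelled: int  # properties handed on to modelling


class RefreshOrchestrator:
    """Illustrative only: ingest, diff against the last modelled snapshot, then model."""

    def __init__(
        self,
        ingest: Callable[[UUID], list[UUID]],
        changed_since_last_model: Callable[[list[UUID]], list[UUID]],
        model: Callable[[list[UUID]], None],
    ) -> None:
        # All three callables are hypothetical stand-ins for the real pipelines/repos.
        self._ingest = ingest
        self._changed = changed_since_last_model
        self._model = model

    def refresh(self, portfolio_id: UUID) -> RefreshResult:
        ingested = self._ingest(portfolio_id)  # step 1: run ingestion
        changed = self._changed(ingested)      # step 2: ask repos what changed
        if changed:                            # step 3: model only when something moved
            self._model(changed)
        return RefreshResult(ingested=len(ingested), modelled=len(changed))


# Usage with in-memory fakes: five properties ingested, two of them changed.
property_ids = [uuid4() for _ in range(5)]
modelled_batches: list[list[UUID]] = []
orchestrator = RefreshOrchestrator(
    ingest=lambda portfolio_id: property_ids,
    changed_since_last_model=lambda ingested: ingested[:2],
    model=modelled_batches.append,
)
result = orchestrator.refresh(uuid4())
```

Because the orchestrator only sees callables, the early-return branch can be unit-tested without a queue, a database, or either pipeline.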

4.4 SQS fanout (preserved from current architecture)

The existing trigger_plan_entrypoint SQS-chunking pattern is kept. Both pipelines fan out per batch of ~30–100 properties (tuneable). Each consumer runs one batch end-to-end through the relevant pipeline.

UPRN partitioning: the trigger endpoint groups UPRNs by locality (postcode prefix / UPRN range) before chunking, so each batch maximises shared upstream fetches (one geospatial-range pull serves all 30 properties in the batch).
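A rough sketch of locality-aware chunking, assuming the postcode outward code is the locality key and that batches never mix localities. Both choices, and the function names, are assumptions made for illustration; the real grouping may use UPRN ranges instead.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def outward_code(postcode: str) -> str:
    """Locality key: the postcode minus its three-character inward code."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def batches_by_locality(
    properties: Iterable[tuple[int, str]],  # (uprn, postcode) pairs
    batch_size: int = 30,
) -> Iterator[list[int]]:
    """Group UPRNs by locality, then chunk each group into batches."""
    groups: dict[str, list[int]] = defaultdict(list)
    for uprn, postcode in properties:
        groups[outward_code(postcode)].append(uprn)
    for key in sorted(groups):
        uprns = sorted(groups[key])  # adjacent UPRNs stay together
        for start in range(0, len(uprns), batch_size):
            yield uprns[start:start + batch_size]


example = [(3, "AB1 2CD"), (1, "AB1 9ZZ"), (2, "XY9 1AA"), (4, "ab1 3ef")]
batches = list(batches_by_locality(example, batch_size=2))
```

The trade-off in this variant is that small localities produce under-filled batches; packing several small localities into one batch would raise utilisation at the cost of fewer shared upstream fetches.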

4.5 One endpoint for v1

For Phase 1 we ship one trigger endpoint that internally chains Ingestion → Modelling via RefreshOrchestrator. This matches the current FastAPI-fronted Lambda pattern (the FastAPI app in services/<svc>/ is a thin entrypoint that invokes the modelling Lambda).

We can split into two endpoints later (refresh-only vs model-only) once a real workflow demands it — e.g. a Landlord-Override edit that should re-model without re-fetching open data. The class taxonomy and RefreshOrchestrator boundary allow this split without re-architecting.

4.6 Trigger contract

The trigger payload is reduced compared to today's PlanTriggerRequest (backend/app/plan/schemas.py:98) — most of what's currently in the request body moves into the persisted Scenario aggregate.

class ModelTriggerRequest(BaseModel):
    portfolio_id: UUID
    property_ids: list[UUID] | S3Ref           # inline up to ~10k, S3 ref above
    scenario_ids: list[UUID]                   # 1+; resolved + pinned to ScenarioSnapshot at fan-out
    task_id: UUID
    subtask_id: UUID                           # SQS state machine, preserved from today

Everything that used to ride at the top level dies or moves:

  • goal, budget, goal_value, inclusions, exclusions, required_measures, enforce_fabric_first, scenario_name, housing_type → into Scenario / ScenarioPhase.
  • patches_file_path, already_installed_file_path, non_invasive_recommendations_file_path → gone; Landlord Overrides covers all three.
  • valuation_file_path → gone; ValuationService derives it.
  • ashp_cop, default_u_values → HeatingSystemAssumptionsRepo / global config; not per-trigger.
  • multi_plan → gone; scenario_ids: list[...] handles N runs natively (one Plan per scenario per property).
  • event_type, epc_certificate_number, lmk_key, file_format, sheet_name, index_start/index_end, file_type → ingestion-side concerns; if needed, ride on a separate ingestion-trigger payload.

Scenario snapshotting: at fan-out time RefreshOrchestrator reads each requested Scenario, writes a ScenarioSnapshot keyed by (task_id, scenario_id), and per-batch SQS messages reference the snapshot. Mid-run edits to the live Scenario do not affect an in-flight modelling job. Snapshots are read-only and can be garbage-collected after the task completes.
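To illustrate the pinning behaviour, here is a minimal sketch with a heavily reduced `Scenario` and an in-memory dict standing in for the snapshot table. The field set, `pin_snapshot`, and `SnapshotStore` are hypothetical; only the `(task_id, scenario_id)` key comes from the text above.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Scenario:  # hypothetical, heavily reduced live aggregate
    scenario_id: UUID
    goal: str
    budget: float
    measure_types_allowed: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class ScenarioSnapshot:  # read-only copy pinned at fan-out
    task_id: UUID
    scenario_id: UUID
    goal: str
    budget: float
    measure_types_allowed: tuple[str, ...]


SnapshotStore = dict[tuple[UUID, UUID], ScenarioSnapshot]


def pin_snapshot(store: SnapshotStore, task_id: UUID, scenario: Scenario) -> ScenarioSnapshot:
    snapshot = ScenarioSnapshot(
        task_id=task_id,
        scenario_id=scenario.scenario_id,
        goal=scenario.goal,
        budget=scenario.budget,
        # Copy into a tuple so later edits to the live list cannot leak in.
        measure_types_allowed=tuple(scenario.measure_types_allowed),
    )
    store[(task_id, scenario.scenario_id)] = snapshot
    return snapshot


store: SnapshotStore = {}
task_id = uuid4()
live = Scenario(uuid4(), goal="EPC C", budget=10_000.0, measure_types_allowed=["fabric"])
pin_snapshot(store, task_id, live)

# A mid-run edit to the live Scenario...
live.budget = 25_000.0
live.measure_types_allowed.append("heating")

# ...is invisible to the in-flight job, which only reads the snapshot.
pinned = store[(task_id, live.scenario_id)]
```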


5. Domain model

5.1 Aggregate root: Property

Property is the centrepiece. Every service operates on one or more Property instances. Every repo writes one slice of Property. The aggregate carries all state for a single property's modelling run.

@dataclass
class PropertyIdentity:
    portfolio_id: UUID
    uprn: Optional[int]
    landlord_property_id: Optional[str]
    address: AddressLines
    postcode: str

@dataclass
class Property:
    identity: PropertyIdentity

    # --- Source data — modelling path is determined by which of these are set ---
    epc: Optional[EpcPropertyData]              # from gov API (or remapped historical)
    site_notes: Optional[SiteNotes]             # our own survey; supersedes EPC when present
    landlord_overrides: Optional[LandlordOverrides]   # sparse, only meaningful when epc set

    # --- Enrichments ---
    geospatial: Optional[GeoSpatial]
    solar: Optional[SolarPotential]
    epc_anomaly_flags: Optional[EpcAnomalyFlags]  # from EpcPredictionService vs neighbours

    # --- Modelling outputs ---
    baseline_performance: Optional[BaselinePerformance]   # carries lodged + effective pair; see §5.4
    recommendations: list[Recommendation]
    impact_predictions: Optional[ImpactPredictions]
    plans: list[Plan]                                     # one per Scenario the property was modelled against

    # --- Derived ---
    @property
    def source_path(self) -> Literal["site_notes", "epc_with_overlay"]: ...

    @property
    def effective_epc(self) -> EpcPropertyData:
        """The EPC the modelling pipeline actually scores against."""
        ...

5.2 Properties collection

A first-class iterable, so batch operations are obvious:

@dataclass
class Properties:
    items: list[Property]

    def __iter__(self) -> Iterator[Property]: ...
    def __len__(self) -> int: ...
    def filter(self, pred: Callable[[Property], bool]) -> "Properties": ...
    def map(self, fn: Callable[[Property], Property]) -> "Properties": ...
    def with_landlord_overrides(self) -> "Properties": ...

Services typically take and return Properties, not lists.

5.3 Other aggregates

| Aggregate | Owns | Repo |
|---|---|---|
| Property | property identity, epc, site_notes, landlord_overrides, enrichments, modelling results | PropertyRepo |
| Plan | per-property modelling output for one Scenario: ordered phases: list[PlanPhase], each carrying its OptimisedPackage, ending state snapshot, and rolled-over options | RecommendationsRepo |
| Scenario | portfolio-wide scenario metadata (goal, budget, exclusions, housing type) plus ordered phases: list[ScenarioPhase]; each phase carries measure_types_allowed, phase budget, phase target | RecommendationsRepo |
| ScenarioSnapshot | frozen copy of a Scenario pinned at trigger time, keyed by (task_id, scenario_id), so mid-run scenario edits don't affect an in-flight modelling job | RecommendationsRepo |
| Subtask / Task | SQS fanout state | SubtaskRepo |
| EpcCache | gov-API responses keyed by UPRN, with freshness/TTL | EpcCacheRepo |
| GenericData | UPRN-range geospatial, postcode lookups, shared static data | GenericDataRepo |
| FuelRates | time-versioned, region-aware per-fuel rates (pence/kWh), standing charges, SEG export rate, calorific values | FuelRatesRepo |
| CarbonFactors | time-versioned per-fuel CO2 emission factors (kgCO2e/kWh); Defra publishes annually | CarbonFactorsRepo |
| HeatingSystemAssumptions | boiler efficiency tables, ASHP/GSHP COPs, solar-thermal coverage proportion; per-property physical assumptions, not fuel-market data | HeatingSystemAssumptionsRepo |

Aggregates are loaded whole — never half a Property. If a slice is too large to load eagerly (e.g. recommendation history), it lives in a separate aggregate.

A single-phase Scenario is phases: [<one ScenarioPhase>] with all measure types allowed and the full budget on it — no special-case path through the pipeline.

5.4 BaselinePerformance carries lodged + effective

@dataclass
class BaselinePerformance:
    # As-lodged: unmodified EPC fields (or Site Notes' recorded values where Site Notes are the source).
    lodged_sap: int
    lodged_band: Epc
    lodged_carbon: float
    lodged_heat_demand: float

    # Effective: what the modelling pipeline actually scored against.
    # Equals lodged when neither rebaselining trigger fires; equals ML output when rebaselined.
    effective_sap: int
    effective_band: Epc
    effective_carbon: float
    effective_heat_demand: float

    # kWh / fuel split / bills — always derived deterministically from the Effective EPC by
    # EpcEnergyDerivationService (SAP physics + UCL correction + FuelRates lookup).
    # Lodged kWh / bills are not stored separately — the EPC's recorded cost fields are pinned to
    # inspection-date fuel rates, so we always re-derive bills from current FuelRates regardless.
    annual_kwh: float
    fuel_split: dict[Fuel, float]
    annual_bills: dict[Fuel, float]

    rebaselined: bool
    rebaseline_reason: Optional[Literal["pre_sap10", "physical_state_changed", "both"]]

The pair lets the FE show "lodged rating vs SAP10-equivalent rebaselined rating" side by side without a separate query. Both the lodged_* and effective_* field sets are always populated; when no rebaselining trigger fires, effective_* equals lodged_*.
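The always-populated invariant can be shown with a reduced stand-in. `Rating`, `BaselinePair`, and `build_pair` are hypothetical names that group the four rating fields for brevity; the real BaselinePerformance keeps them flat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rating:  # hypothetical grouping of sap / band / carbon / heat demand
    sap: int
    band: str
    carbon: float
    heat_demand: float


@dataclass(frozen=True)
class BaselinePair:  # reduced stand-in for BaselinePerformance
    lodged: Rating
    effective: Rating
    rebaselined: bool


def build_pair(lodged: Rating, ml_prediction: Optional[Rating] = None) -> BaselinePair:
    """Effective mirrors lodged unless an ML rebaseline produced a prediction."""
    if ml_prediction is None:
        return BaselinePair(lodged=lodged, effective=lodged, rebaselined=False)
    return BaselinePair(lodged=lodged, effective=ml_prediction, rebaselined=True)


lodged = Rating(sap=58, band="D", carbon=3.9, heat_demand=210.0)
untouched = build_pair(lodged)
rebased = build_pair(lodged, Rating(sap=54, band="E", carbon=4.2, heat_demand=225.0))
```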


6. Source-of-truth and overlay precedence

There are exactly two modelling paths. The Property.source_path property selects.

6.1 Path 1 — Site notes

If a Property has site_notes and they are newer than any available EPC (or no EPC exists), site notes are the complete source of truth:

  • effective_epc = site_notes.to_epc_property_data().
  • EPC fields not covered by site notes — none expected. Site notes are committed to being a full-coverage survey. Treat any gap as a survey-quality bug, not a fallback signal.
  • LandlordOverrides are not applicable in Path 1 (the survey supersedes).

6.2 Path 2 — EPC with landlord overlay

If a Property has no site notes (or the EPC is newer):

  • effective_epc = epc with landlord_overrides applied as a sparse field-level overlay (landlord > epc).
  • LandlordOverrides are sparse: each row represents one corrected field. Schema TBD at implementation time; assume flat input via Excel/CSV for v1, with a flag to revisit shape after first customer onboarding.
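A minimal sketch of the sparse overlay, assuming overrides arrive as a flat field-name to value mapping and that EpcPropertyData behaves like a dataclass. `EpcFields` is a hypothetical three-field stand-in, and rejecting unknown field names is an illustrative choice, not a decided rule.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


@dataclass(frozen=True)
class EpcFields:  # hypothetical three-field stand-in for EpcPropertyData
    wall_type: str
    main_heating: str
    glazing: str


def apply_overrides(epc: EpcFields, overrides: dict[str, Any]) -> EpcFields:
    """Return a copy of the EPC with landlord-corrected fields applied (landlord > epc)."""
    known = {f.name for f in fields(epc)}
    unknown = set(overrides) - known
    if unknown:
        # Surface typos in the Excel/CSV input instead of silently dropping them.
        raise ValueError(f"unknown override fields: {sorted(unknown)}")
    return replace(epc, **overrides)


lodged_epc = EpcFields(wall_type="solid brick", main_heating="gas boiler", glazing="single")
effective_epc = apply_overrides(lodged_epc, {"main_heating": "air source heat pump"})
```

Returning a new object rather than mutating the lodged EPC keeps the lodged record available for the physical-state comparison in §6.4.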

6.3 Recency tie-break

When a property has both site notes and a public EPC, the newer of the two wins. Rationale: a recent EPC may reflect retrofit work done after our survey; conversely a recent survey reflects on-site observations the EPC cannot capture.

This tie-break is implemented in Property.source_path and may be tuned later (e.g. always prefer surveys regardless of date, or per-portfolio policy).
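One possible shape for the tie-break, written as a free function for clarity. The two date parameters are hypothetical (the PRD does not name the fields), and sending equal dates down the EPC path is an assumption based on §6.1 requiring site notes to be strictly newer.

```python
from datetime import date
from typing import Literal, Optional

SourcePath = Literal["site_notes", "epc_with_overlay"]


def select_source_path(
    site_notes_date: Optional[date],  # hypothetical: survey date, None if no site notes
    epc_date: Optional[date],         # hypothetical: EPC lodgement date, None if no EPC
) -> SourcePath:
    if site_notes_date is None:
        return "epc_with_overlay"     # Path 2 also covers the missing-EPC gap-fill case
    if epc_date is None:
        return "site_notes"           # survey is the only source
    # Both present: the newer wins; ties fall to the EPC path (assumption).
    return "site_notes" if site_notes_date > epc_date else "epc_with_overlay"
```

A per-portfolio policy such as "always prefer surveys" would replace only the final comparison.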

6.4 Rebaselining trigger

ML re-predicts SAP / carbon / heat when either of these holds:

  1. Pre-SAP10 schema: effective_epc.sap_version < 10.0. The EPC was rated under SAP 2012 (or earlier) and we want a SAP10-equivalent baseline so all properties are scored against the same model version. Canonical signal is the sap_version: float field; fall back to schema_type string, then to lodgement_date if both are absent. Site Notes are assumed SAP10 by construction (PasHub / ECMK produce them now), so Path 1 typically doesn't trigger this leg.
  2. Physical state changed: effective_epc differs from the lodged EPC's physical fields (walls / heating / windows / etc.). Triggered by Landlord Overrides changing physical state, or by Site Notes that contradict the lodged EPC.

When triggered, a single ML call re-predicts SAP/carbon/heat with the current Effective EPC state as input. Both reasons can fire together; the prediction is still one call.
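The two conditions map directly onto BaselinePerformance.rebaseline_reason. A small sketch, assuming sap_version is present (the schema_type and lodgement_date fallbacks are omitted) and that the physical-state comparison has already been reduced to a boolean:

```python
from typing import Literal, Optional

RebaselineReason = Literal["pre_sap10", "physical_state_changed", "both"]


def rebaseline_reason(sap_version: float, physical_state_changed: bool) -> Optional[RebaselineReason]:
    """Return why a rebaseline is needed, or None when the lodged values stand."""
    pre_sap10 = sap_version < 10.0
    if pre_sap10 and physical_state_changed:
        return "both"
    if pre_sap10:
        return "pre_sap10"
    if physical_state_changed:
        return "physical_state_changed"
    return None
```

Whatever the reason, the caller makes at most one ML call: any non-None result triggers the same single prediction.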

kWh is always re-derived via EpcEnergyDerivationService — even when no ML rebaseline runs — because the EPC's recorded cost fields use fuel rates pinned to the inspection date, and current rates from FuelRatesRepo are what we want to surface to users.

The diff mechanism for "physical state changed" (content hash, dirty flag, etc.) is an implementation detail; start with a content hash of the physical-state subset of EpcPropertyData stored alongside the previous run.
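A content hash along these lines could be computed as below. The `PHYSICAL_FIELDS` tuple is a hypothetical subset chosen for illustration, and the EPC is represented as a plain mapping; the real subset of EpcPropertyData is not defined in this PRD.

```python
import hashlib
import json
from typing import Any, Mapping

# Hypothetical physical-state subset; the real field list is an implementation decision.
PHYSICAL_FIELDS = ("walls", "heating", "windows")


def physical_state_hash(epc: Mapping[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the physical-state fields only."""
    subset = {name: epc.get(name) for name in PHYSICAL_FIELDS}
    # sort_keys + fixed separators give a stable byte string regardless of dict order.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


lodged = {"walls": "solid brick", "heating": "gas boiler", "windows": "single", "lodgement_date": "2019-05-01"}
relodged = {**lodged, "lodgement_date": "2024-05-01"}  # non-physical change only
overridden = {**lodged, "heating": "air source heat pump"}  # physical change

physical_state_changed = physical_state_hash(overridden) != physical_state_hash(lodged)
```

Storing the hash alongside the previous run turns the "physical state changed" check into a single string comparison.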

6.5 Deprecated concepts

  • Patches (patch_epc) — removed. Functionality subsumed by LandlordOverrides.
  • Already-installed measures — likely subsumed by LandlordOverrides ("we have a heat pump now" → override heating fields). Confirmed at implementation time.
  • Non-invasive recommendations — TBD whether this concept survives; not blocking.

7. Persistence: repositories and unit of work

7.1 What a repository is

A repository owns the SQL for one aggregate. Nothing else writes SQL for that aggregate. Callers see only domain objects.

class PropertyRepo(Protocol):
    def get(self, identity: PropertyIdentity) -> Optional[Property]: ...
    def bulk_save(self, uow: UnitOfWork, properties: Properties) -> None: ...
    def find_by_portfolio(self, portfolio_id: UUID) -> Properties: ...
    def find_stale(self, portfolio_id: UUID, threshold: timedelta) -> Properties: ...

Implementation references current db_funcs.* modules during phase 0 to avoid a big-bang SQL rewrite, but the interface is fixed.

7.2 Unit of Work

Multi-table writes inside a single aggregate, or across aggregates that share a transaction (e.g. property + plan + recommendations) go through a UnitOfWork:

with self.uow_factory() as uow:
    self.property_repo.bulk_save(uow, properties)
    self.recommendations_repo.bulk_save(uow, plans)
    uow.commit()

UoW owns the SQLAlchemy session lifecycle. Repos use the session passed in via the UoW. Outside a UoW, repos use a short-lived read session.
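A minimal sketch of the lifecycle the UoW owns. It is written against a small `SessionLike` protocol (commit / rollback / close, which a SQLAlchemy session also provides) so it runs without a database; `RecordingSession` is a test double, and rolling back when the block exits without an explicit commit is an assumed policy.

```python
from typing import Callable, Protocol


class SessionLike(Protocol):
    def commit(self) -> None: ...
    def rollback(self) -> None: ...
    def close(self) -> None: ...


class UnitOfWork:
    """Opens a session on enter; rolls back unless commit() was called; always closes."""

    def __init__(self, session_factory: Callable[[], SessionLike]) -> None:
        self._session_factory = session_factory
        self._committed = False

    def __enter__(self) -> "UnitOfWork":
        self.session = self._session_factory()
        self._committed = False
        return self

    def commit(self) -> None:
        self.session.commit()
        self._committed = True

    def __exit__(self, exc_type, exc, tb) -> None:
        try:
            if exc_type is not None or not self._committed:
                self.session.rollback()
        finally:
            self.session.close()
        # Returning None lets any exception propagate to the caller.


class RecordingSession:
    """Test double that records which lifecycle calls were made."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")

    def close(self) -> None:
        self.calls.append("close")


happy = RecordingSession()
with UnitOfWork(lambda: happy) as uow:
    uow.commit()
```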

7.3 Repository inventory

| Repo | Tables it owns |
|---|---|
| PropertyRepo | properties, property_details_epc, property_spatial |
| EpcCacheRepo | new table: epc_api_cache (TTL, raw API response, mapped EpcPropertyData) |
| SiteNotesRepo | new table: site_notes (replaces current energy_assessments) |
| LandlordOverridesRepo | new table: landlord_overrides (sparse, per-field rows for audit) |
| RecommendationsRepo | plans, plan_phases, recommendations, recommendation_parts, scenarios, scenario_phases, scenario_snapshots |
| GenericDataRepo | new table or S3-backed: UPRN-range geospatial + postcode-keyed shared static data |
| FuelRatesRepo | new table: fuel_rates(fuel_type, rate_pence_per_kwh, standing_charge_pence_per_day, calorific_value_kwh_per_unit, unit, effective_from, effective_to, region_code Optional, source). SEG export rate is a row with fuel_type = 'electricity_export'. |
| CarbonFactorsRepo | new table: carbon_factors(fuel_type, kgco2e_per_kwh, effective_from, effective_to, source). Defra publishes annually. |
| HeatingSystemAssumptionsRepo | new table(s): boiler efficiency, ASHP/GSHP COP, solar-thermal coverage proportion. Static-ish, manual refresh. |
| SubtaskRepo | tasks, subtasks (existing) |

DDL migrations are scoped to sub-PRD (iii).

7.4 Fakes

For tests, each repo has a FakeXRepo companion backed by a dict. Service unit tests inject fakes. No DB required.
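As an illustration, a dict-backed fake for PropertyRepo might look like this. `PropertyIdentity` and `Property` are reduced to the fields the example needs, and the fake returns plain lists rather than the Properties collection; both simplifications are assumptions for brevity.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)  # frozen, so identities are hashable dict keys
class PropertyIdentity:  # reduced stand-in
    portfolio_id: UUID
    uprn: int


@dataclass
class Property:  # reduced stand-in
    identity: PropertyIdentity
    lodged_sap: Optional[int] = None


class FakePropertyRepo:
    """In-memory PropertyRepo for service unit tests; no SQL, no session."""

    def __init__(self) -> None:
        self._rows: dict[PropertyIdentity, Property] = {}

    def get(self, identity: PropertyIdentity) -> Optional[Property]:
        return self._rows.get(identity)

    def bulk_save(self, uow: Any, properties: Iterable[Property]) -> None:
        # The uow argument is accepted for interface parity and otherwise ignored.
        for prop in properties:
            self._rows[prop.identity] = prop

    def find_by_portfolio(self, portfolio_id: UUID) -> list[Property]:
        return [p for p in self._rows.values() if p.identity.portfolio_id == portfolio_id]


portfolio = uuid4()
repo = FakePropertyRepo()
repo.bulk_save(None, [
    Property(PropertyIdentity(portfolio, 1001), lodged_sap=62),
    Property(PropertyIdentity(portfolio, 1002)),
    Property(PropertyIdentity(uuid4(), 2001)),
])
```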


8. ML contract

8.1 Where ML lives

| Concern | Owner |
|---|---|
| Defining the EPC → features transform | This repo (ara.domain.sap10_ml.EpcMlTransform) |
| Loading data, applying transform, writing training parquet to S3 | This repo (sub-PRD (ii) batch job) |
| Training, hyperparameter search, deployment | Autogluon repo |
| Scoring at modelling time | This repo (FeatureBuilder calls EpcMlTransform, sends DataFrame to deployed lambda) |

The autogluon repo is intentionally dumb: it consumes parquet, knows which column is the target, knows which columns to ignore. It has no EPC semantics.

8.2 EpcMlTransform

A separate class (not a method on EpcPropertyData), because:

  • The data class stays clean of training-infrastructure concerns.
  • Versioned transforms (EpcMlTransformV1, EpcMlTransformV2) swap easily.
  • Future need: injection of normalisation stats from the training set is straightforward on a class, awkward on a dataclass.

class EpcMlTransform:
    VERSION: str = "1.0.0"  # semver

    def to_row(self, epc: EpcPropertyData) -> dict[str, Any]: ...
    def to_rows(self, properties: Properties) -> pd.DataFrame: ...
    def schema(self) -> dict[str, type]: ...  # for parquet emission + validation

The interesting work — flattening List[SapWindow], List[SapBuildingPart] into fixed-width columns — lives inside this class. Domain decisions (top-N windows, aggregate roofs, etc.) are encoded here and reviewed by Khalim. Sub-PRD (ii) goes into detail.
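To make the flattening concrete, here is one way a variable-length window list could become fixed-width columns. The top-N-by-area rule, the column names, and the reduced `SapWindow` fields are all illustrative assumptions; the actual domain decisions belong to sub-PRD (ii).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SapWindow:  # hypothetical reduced shape
    area_m2: float
    u_value: float


def flatten_windows(windows: list[SapWindow], top_n: int = 3) -> dict[str, Any]:
    """Emit the same columns for every property: aggregates plus the top-N windows by area."""
    largest = sorted(windows, key=lambda w: w.area_m2, reverse=True)[:top_n]
    row: dict[str, Any] = {
        "window_count": len(windows),
        "window_total_area_m2": sum(w.area_m2 for w in windows),
    }
    for i in range(top_n):
        window = largest[i] if i < len(largest) else None
        # Missing slots are padded with None so the schema width never changes.
        row[f"window_{i + 1}_area_m2"] = window.area_m2 if window is not None else None
        row[f"window_{i + 1}_u_value"] = window.u_value if window is not None else None
    return row


row = flatten_windows([SapWindow(1.5, 2.8), SapWindow(4.0, 1.4)], top_n=3)
```

Keeping the aggregate columns alongside the top-N slots means windows beyond N still contribute to the feature vector.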

8.3 Versioning

  • Transform class is semver-tagged (VERSION = "1.0.0").
  • S3 path for training parquet includes the version: s3://.../training/v1.0.0/....
  • Deployed scoring lambda is tagged with the transform version it was trained against.
  • Modelling pipeline asserts at startup that its EpcMlTransform.VERSION matches the deployed lambda's tag; mismatch = hard fail at deploy time.

Bump major when removing or renaming columns. Bump minor when adding optional columns (older models still scoreable; new models can be trained against new fields).
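A sketch of the two checks implied above. `assert_versions_match` is the strict startup assertion; `model_still_scoreable` is one reading of the minor-bump rule (same major, model trained at an equal or older minor) and is an interpretation, not something this PRD fixes. All function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', tolerating a leading 'v' as used in the S3 path."""
    major, minor, patch = (int(part) for part in version.removeprefix("v").split("."))
    return major, minor, patch


class TransformVersionMismatch(RuntimeError):
    """Raised when the transform and the deployed lambda disagree."""


def assert_versions_match(transform_version: str, lambda_tag: str) -> None:
    """Strict startup check: any difference is a hard failure."""
    if parse_semver(transform_version) != parse_semver(lambda_tag):
        raise TransformVersionMismatch(
            f"transform {transform_version} != deployed lambda tag {lambda_tag}"
        )


def model_still_scoreable(transform_version: str, model_version: str) -> bool:
    """Looser rule (assumption): same major, and the model predates or equals the transform's minor."""
    t_major, t_minor, _ = parse_semver(transform_version)
    m_major, m_minor, _ = parse_semver(model_version)
    return t_major == m_major and m_minor <= t_minor
```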

8.4 ML model families

Both ML calls (rebaselining + per-measure impact) use the same EpcMlTransform:

| Service | Lambda | Target |
|---|---|---|
| RebaseliningService (S4b) | baseline-models-* | SAP / carbon / heat demand under the current Effective EPC state (SAP10-equivalent) |
| ImpactPredictionService (S6) | impact-models-* | SAP / carbon / heat demand impact per measure (and per battery option, using new EPC battery fields) |

Annual kWh and bills are never an ML target — derived deterministically by EpcEnergyDerivationService (S4a). Recommendation kWh delta is derived from the SAP delta predicted by S6 plus heating-system fuel + COP, not via a separate ML call.

The two families are trained against the same input feature schema; only target columns differ. Sub-PRD (ii) handles training-time details.


9. Service catalogue

The classes below implement the pipeline end-to-end. Detailed signatures are deliberately left for implementers — this PRD documents purpose, dependencies, and rough shape; per-service grill sessions produce the contracts.

Out of the legacy engine (deleted, not migrated): PredictionMatrix (debug-only, moves to test fixtures), extract_portfolio_aggregation_data (dead code, FE aggregates dynamically per §10), inspections plumbing (inspections_map is initialised but never populated in the current engine), patches / already_installed / non_invasive_recommendations (subsumed by Landlord Overrides), ECO4 / WHLG funding integration (get_funding_data and optimise_with_scenarios' funding paths), the pre-recommendation kWh ML lambda (KWH_MODEL_PREFIXES), and floor-count / heat-loss-perimeter estimation from geospatial (now on EpcPropertyData). Address matching (address2UPRN) lives as a separate service, not inside EpcClientService.

9.1 Fetchers (called by IngestionPipeline)

| # | Class | Purpose | Dependencies |
|---|---|---|---|
| F1 | EpcClientService | Fetches EPCs from new gov API. Already exists at backend/epc_client/. Scope narrows compared to current SearchEpc: address matching (address2uprn) and OS API estimation are not its concern. | httpx |
| F2 | GeospatialFetcher | Fetches UPRN-range geospatial data. Replaces OpenUprnClient. Floor count and heat-loss perimeter estimation are no longer needed: both are now on EpcPropertyData directly (number_of_storeys, SapFloorDimension.heat_loss_perimeter_m). Scope reduces to building geometry and postcode-area context. | S3 / Ordnance Survey API |
| F3 | SolarFetcher | Wraps Google Solar API; building-level + unit-level scenes. | Google Solar API |
| F4 | SiteNotesIngester | Loads site notes from Excel uploads / structured input. Persists via SiteNotesRepo. | S3, repo |
| F5 | FuelRatesFetcher | Scheduled ETL: scrapes Ofgem regional caps and per-fuel rates, writes timeseries rows to FuelRatesRepo. Manual CSV upload fallback for off-cycle corrections. | Ofgem feed, repo |
| F6 | CarbonFactorsFetcher | Same shape as F5 against Defra's annual CO2 factor publication. | Defra feed, repo |

9.2 Domain services (called by ModellingPipeline)

# Class Original-list # Purpose Reads Writes
S1 EpcRemappingService 4 Re-map legacy / historical EPCs into new EpcPropertyData shape. EpcCacheRepo EpcCacheRepo (mapped column)
S2 EpcPredictionService 3 For every property: produce predicted EPC + per-field anomaly flags vs neighbours. Used both for gap-fill (Path 2 if EPC missing) and UI surfacing. EpcCacheRepo, GenericDataRepo
S3 FeatureBuilder (new) Wraps EpcMlTransform. Converts Properties → scoring DataFrame.
S4a EpcEnergyDerivationService (new) Derives annual kWh + fuel split + bills from the Effective EPC. Deterministic, no ML. Pipeline: (1) source regulated PEUI — either from energy_consumption_current × floor_area when EPC field present and no physical override, or from SAP physics (heat demand × area + SAP hot-water + SAP lighting) for Site Notes / overridden cases; (2) add appliance + cooking via SAP Appendix L formulas (port of AnnualBillSavings.estimate_appliances_energy_use); (3) apply UCL per-band correction (Few et al. 2023, Table 3), keyed on the post-state Effective EPC's band — not the lodged band; (4) decompose total PEUI into end-use shares via SAP-physics proportions; (5) primary→delivered per fuel using SAP primary factors; (6) bills = delivered kWh per fuel × current rate from FuelRatesRepo + standing charges + SEG credits. CO2 emissions from CarbonFactorsRepo. FuelRatesRepo, CarbonFactorsRepo, HeatingSystemAssumptionsRepo
S4b RebaseliningService (new, partial overlap with old "rebaselining" logic) Triggered by §6.4 conditions (pre-SAP10 schema or physical state changed). Calls SAP/carbon/heat ML lambdas to produce SAP10-equivalent baseline against the current Effective EPC state. Both BaselinePerformance.lodged_* and effective_* are populated downstream — pair is always stored, equal when not rebaselined. kWh is re-derived via S4a, not ML. FeatureBuilder
S5 RecommendationService 6 Generates per-property recommendations against the current rolling Effective EPC. Invoked once per (scenario × phase) — filters candidates to the phase's measure_types_allowed, returns candidates eligible against the post-prior-phase state. Replaces current Recommendations (1383 LOC). MaterialsRepo
S6 ImpactPredictionService 7 Calls SAP / carbon / heat impact ML lambda for every candidate recommendation (FE displays all options to user). Invoked per (scenario × phase) with the rolling state's feature vector. Recommendation kWh delta is derived deterministically from SAP delta + heating-system fuel/COP, not from a separate ML call. Battery impact uses the new EPC battery fields (energy_pv_battery_count, energy_pv_battery_capacity) as ML inputs — the deterministic BatterySAPScorer from the legacy engine is replaced by ML prediction. FeatureBuilder
S7 OptimiserService 8 Per-phase optimisation against rolling state. Reads PlanPhase.state_at_end[n-1] to honour cross-phase constraints (fabric-first, heat-pump-needs-insulation, ventilation). Wraps current CostOptimiser / GainOptimiser / optimise_with_scenarios minus the dead ECO-funding paths. Unselected candidates roll into phase n+1's candidate pool (auto vs user-marked TBD, §15).
S8 ValuationService Estimates per-property valuation (current + post-retrofit) from academic-paper-based regression on EPC change, property type, region. Improvement on the existing PropertyValuation.estimate code — exact shape deferred to per-service grill.
S9 ResultsPersister 9 Final step: writes Plan (with phases[]) + Recommendations + Property updates via repos under one UoW, per scenario. All write repos
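
Step (6) of the S4a pipeline is plain arithmetic once delivered kWh per fuel is known. The following is a minimal sketch, not the planned implementation. `FuelRate` and its field names are hypothetical, and it assumes SEG credits reduce the bill (the S4a row lists them with a plus sign, so the sign convention here is an assumption to confirm in the per-service grill).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuelRate:
    """Hypothetical shape of one FuelRatesRepo row; field names are assumptions."""

    unit_rate_gbp_per_kwh: float
    standing_charge_gbp_per_year: float


def annual_bill_gbp(
    delivered_kwh_by_fuel: dict[str, float],
    rates: dict[str, FuelRate],
    seg_credit_gbp: float = 0.0,
) -> float:
    """Sum (kWh x unit rate + standing charge) over fuels, then subtract SEG credit."""
    total = 0.0
    for fuel, kwh in delivered_kwh_by_fuel.items():
        rate = rates[fuel]  # KeyError if a fuel has no rate: fail loudly, do not guess
        total += kwh * rate.unit_rate_gbp_per_kwh + rate.standing_charge_gbp_per_year
    return total - seg_credit_gbp
```

Raising on a missing fuel rate, instead of defaulting to zero, keeps a gap in FuelRatesRepo from silently producing a low bill.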

9.3 Orchestrators

| # | Class | Purpose |
| --- | --- | --- |
| O1 | IngestionPipeline | Per-batch SQS consumer. Calls F1–F4, persists via repos. |
| O2 | ModellingPipeline | Per-batch SQS consumer. Reads from repos, runs S1→S8 in order, ends with persistence. |
| O3 | RefreshOrchestrator | Top-level: triggers Ingestion → diff → optionally Modelling. |
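
The "Ingestion → diff → optionally Modelling" flow of O3 can be sketched as a small function. This is an illustration only: the `ingest` and `model` callables stand in for IngestionPipeline and ModellingPipeline, and the idea that ingestion returns the set of changed UPRNs is an assumption, not a decided contract.

```python
from typing import Callable, Sequence


def refresh(
    uprns: Sequence[str],
    ingest: Callable[[Sequence[str]], set[str]],
    model: Callable[[Sequence[str]], None],
) -> list[str]:
    """Run ingestion, then model only the UPRNs whose stored state changed.

    `ingest` is assumed to return the UPRNs whose data differs after the
    fetch (the "diff"); `model` stands in for ModellingPipeline.
    """
    changed = ingest(uprns)
    to_model = [u for u in uprns if u in changed]  # keep the caller's order
    if to_model:  # skip modelling entirely when nothing changed
        model(to_model)
    return to_model
```

This covers the ingestion-triggered path only; a refresh triggered by an internal write (such as a landlord override) would skip `ingest` and pass the affected UPRNs straight to `model`.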

9.4 ModellingPipeline step order

For each Property in the batch, against each pinned ScenarioSnapshot from the trigger payload:

Per-property setup (runs once regardless of scenario count):
  1.  PropertyRepo.get()  →  Property (epc, site_notes, overrides, geospatial, solar)
  2.  EpcRemappingService — if epc is in legacy schema, upgrade to current
  3.  EpcPredictionService — predicted EPC + per-field anomaly flags (always runs)
  4.  Compute Property.effective_epc (path-1 or path-2)
  5.  RebaseliningService — IF §6.4 conditions hold (pre-SAP10 OR physical state changed),
                            re-predict SAP/carbon/heat via ML against the Effective EPC state.
                            Populate BaselinePerformance.lodged_* + effective_*.
  6.  EpcEnergyDerivationService — SAP-physics + UCL (post-state band) + FuelRates → kWh, fuel split, bills.

Per-scenario loop:
  Per-phase loop (in scenario phase order):
    7.  RecommendationService — generate candidate measures, restricted to phase's measure_types_allowed,
                                 against the rolling Effective EPC state (baseline for phase 1; updated for phase 2+).
    8.  ImpactPredictionService — predict SAP/carbon/heat impact for those candidates, ML scored against
                                   the rolling state's feature vector. All candidates scored (FE shows options).
    9.  OptimiserService — select package within phase budget + phase goal. Reads earlier-phase state to honour
                            cross-phase constraints (fabric-first, heat-pump-needs-insulation, ventilation).
    10. Apply package → roll state forward (simulate post-package SAP / kWh / bills via S4a + impact predictions
                        from step 8). Record `PlanPhase.state_at_end`. Unselected options become
                        `PlanPhase.rolled_over_options` and are eligible candidates next phase.
  11. ResultsPersister — write Plan (phases[]) + Recommendations under one UoW for this scenario.

Steps 1–6 run once per property regardless of scenario count. Steps 7–10 run once per (scenario × phase) per property. Step 11 runs once per scenario per property.
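
The run counts above follow directly from the loop nesting. A small sketch that only counts invocations (it models no service behaviour; the function name and labels are made up for illustration):

```python
from collections import Counter


def count_step_runs(phases_per_scenario: list[int], n_properties: int = 1) -> Counter:
    """Count how often each group of steps runs for a batch.

    `phases_per_scenario[i]` is the number of phases in scenario i.
    """
    runs: Counter = Counter()
    for _ in range(n_properties):
        runs["setup (steps 1-6)"] += 1  # once per property
        for n_phases in phases_per_scenario:
            for _ in range(n_phases):
                runs["per-phase (steps 7-10)"] += 1  # once per (scenario x phase)
            runs["persist (step 11)"] += 1  # once per scenario
    return runs
```

For example, two properties modelled against one single-phase and one three-phase scenario give 2 setup runs, 8 per-phase runs, and 4 persist runs.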

Batching: steps 5 and 8 send every property in the batch to the ML lambda in a single call where possible. Step 8's cost scales with N_phases × N_scenarios × N_candidate_measures; multi-phase scenarios pay their own ML bill, while single-phase scenarios cost the same as today.
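
One way to batch step 8 is to flatten all candidate feature vectors, call the model once, and scatter the results back per property. A hedged sketch: `predict` is a stand-in for the ML lambda client, and plain lists of floats stand in for whatever FeatureBuilder actually emits.

```python
from typing import Callable, Sequence


def score_in_one_call(
    candidates_by_property: dict[str, list[list[float]]],
    predict: Callable[[Sequence[list[float]]], Sequence[float]],
) -> dict[str, list[float]]:
    """Flatten every property's candidate vectors, call the model once,
    then hand each property back its own slice of the predictions."""
    flat: list[list[float]] = []
    spans: list[tuple[str, int, int]] = []
    for prop_id, vectors in candidates_by_property.items():
        start = len(flat)
        flat.extend(vectors)
        spans.append((prop_id, start, len(flat)))

    predictions = list(predict(flat)) if flat else []
    if len(predictions) != len(flat):
        raise ValueError("model returned a different number of rows than it was sent")
    return {prop_id: predictions[start:end] for prop_id, start, end in spans}
```

The row-count check guards against a model response that would otherwise misalign predictions with candidates.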

Note vs the current model_engine: the pre-recommendation kWh ML call has been removed. Baseline kWh now comes from EpcEnergyDerivationService (SAP physics + UCL + FuelRates). ML is reserved for SAP/carbon/heat (rebaselining + impact prediction). Recommendation-level kWh delta is derived deterministically from the impact-predicted SAP delta plus heating-system fuel + COP from HeatingSystemAssumptionsRepo; no separate kWh ML lambda.

Open future change (flagged §15): SAP-impact-of-a-measure is not strictly additive — installing measure A changes the SAP impact of measure B. The current per-measure ML scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages, accepting the combinatorial cost in return for accuracy. Defer until implementation reveals where the approximation hurts.

9.5 Per-service contracts — deferred

Method signatures, return types, error semantics, and edge-case behaviour are explicitly out of scope for this PRD. The implementer of each service runs a /grill-me session against this document and produces a detailed sub-design before coding.


10. Cross-batch concerns

| Concern | Status | Approach |
| --- | --- | --- |
| Building-level solar adjustment | Deferred: future TODO, not implemented today. | The current building_ids block in model_engine is dead-ish; it operates on the in-process batch only. New design preserves that limitation. Future feature: a post-modelling consolidation pass that groups results by building_id across batches and re-optimises. |
| Portfolio aggregation | Dropped. | Front-end computes aggregations dynamically from per-property plans. extract_portfolio_aggregation_data in current engine is dead code (defined, never called) and will be deleted. |
| Shared upstream data | Handled by orchestrator partitioning + GenericDataRepo. | Trigger endpoint groups UPRNs by postcode / UPRN-range before SQS chunking so each batch maximises intra-batch sharing. GenericDataRepo caches across batches so first batch pays, subsequent batches hit cache. |
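
The "group by postcode before SQS chunking" idea can be sketched as below. This is an assumption-laden illustration: the function name, the `(uprn, postcode)` input shape, and the choice to sort postcodes alphabetically are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def chunk_by_postcode(
    uprn_postcodes: Iterable[tuple[str, str]], max_batch: int
) -> list[list[str]]:
    """Group UPRNs by postcode, then cut batches so same-postcode UPRNs stay adjacent."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    by_postcode: dict[str, list[str]] = defaultdict(list)
    for uprn, postcode in uprn_postcodes:
        by_postcode[postcode].append(uprn)

    # Concatenate groups in postcode order, then slice into fixed-size batches.
    ordered = [u for pc in sorted(by_postcode) for u in by_postcode[pc]]
    return [ordered[i : i + max_batch] for i in range(0, len(ordered), max_batch)]
```

A postcode group larger than `max_batch` still spills across batches; the GenericDataRepo cache is what covers that case.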

11. Repository layout — monorepo via uv workspaces

The repo is restructured as a Python monorepo using uv workspaces. Shared types and shared infra live as workspace packages under packages/; each deployable Lambda or microservice lives as its own package under services/. Each services/<svc>/ has its own pyproject.toml, Dockerfile, and Lambda image — the bundle contains only that service's deps + its workspace deps, keeping cold-start size and package weight contained.

/
├── pyproject.toml                      # workspace root
├── uv.lock
│
├── packages/                           # shared workspace packages — imported by services/
│   ├── domain/                         # "domna-domain"
│   │   ├── pyproject.toml
│   │   └── src/domain/
│   │       ├── property.py             # Property, Properties, PropertyIdentity
│   │       ├── site_notes.py
│   │       ├── landlord_overrides.py
│   │       ├── baseline_performance.py # lodged + effective pair
│   │       ├── plan.py                 # Plan, PlanPhase, OptimisedPackage
│   │       ├── scenario.py             # Scenario, ScenarioPhase, ScenarioSnapshot
│   │       ├── recommendation.py
│   │       ├── geospatial.py
│   │       ├── solar.py
│   │       ├── anomaly_flags.py
│   │       └── ml/
│   │           ├── transform.py        # EpcMlTransform (versioned)
│   │           └── schema.py
│   │
│   ├── repos/                          # "domna-repos" — persistence, no business logic
│   │   ├── pyproject.toml
│   │   └── src/repos/
│   │       ├── unit_of_work.py
│   │       ├── property_repo.py
│   │       ├── epc_cache_repo.py
│   │       ├── site_notes_repo.py
│   │       ├── landlord_overrides_repo.py
│   │       ├── recommendations_repo.py
│   │       ├── generic_data_repo.py
│   │       ├── fuel_rates_repo.py
│   │       ├── carbon_factors_repo.py
│   │       ├── heating_system_assumptions_repo.py
│   │       └── subtask_repo.py
│   │
│   ├── fetchers/                       # "domna-fetchers" — external API clients
│   │   ├── pyproject.toml
│   │   └── src/fetchers/
│   │       ├── epc_client.py           # wraps backend/epc_client/
│   │       ├── geospatial.py
│   │       ├── solar.py
│   │       ├── fuel_rates_fetcher.py
│   │       └── carbon_factors_fetcher.py
│   │
│   └── utils/                          # "domna-utils" — logging, AWS, S3, cloudwatch, subtasks
│       ├── pyproject.toml
│       └── src/utils/
│
├── services/                           # deployable units, one Lambda image each
│   ├── ara/                            # the modelling backend
│   │   ├── pyproject.toml              # deps: domna-domain, domna-repos, domna-fetchers, domna-utils, ML libs
│   │   ├── Dockerfile
│   │   ├── src/ara/
│   │   │   ├── services/               # EpcRemappingService, EpcPredictionService,
│   │   │   │                           # EpcEnergyDerivationService, RebaseliningService,
│   │   │   │                           # FeatureBuilder, RecommendationService,
│   │   │   │                           # ImpactPredictionService, OptimiserService,
│   │   │   │                           # ValuationService, ResultsPersister
│   │   │   ├── orchestrators/          # IngestionPipeline, ModellingPipeline, RefreshOrchestrator
│   │   │   └── lambdas/                # handler.py per Lambda + event-shape contracts
│   │   └── tests/
│   │       ├── fakes/                  # FakePropertyRepo, FakeEpcClient, etc.
│   │       ├── unit/                   # service tests using fakes only
│   │       └── integration/            # real DB + real SQS via localstack
│   │
│   ├── address2uprn/                   # messy-address → UPRN matching, pre-modelling step
│   │   ├── pyproject.toml
│   │   ├── Dockerfile
│   │   └── src/address2uprn/
│   ├── hubspot/                        # existing Hubspot ETL
│   ├── pashub/                         # PasHub survey ingestion
│   ├── ecmk/                           # ECMK assessment ingestion
│   └── magicplan/                      # MagicPlan integration
│
├── backend/                            # legacy FastAPI app + microservices, kept until cut-over
│   ├── app/                            # FastAPI; thin entrypoints that invoke service Lambdas
│   └── ...                             # legacy engine, SearchEpc, etc.; deleted after cut-over
│
├── datatypes/                          # existing — EPC schemas; eventually folds into packages/domain/
└── docs/
    └── adr/                            # architectural decision records

Boundary properties (enforced by package structure, not convention):

  • A services/<svc>/ package can import domain.*, repos.*, fetchers.*, and utils.*. It cannot import another service's modules, because services are separate distributions with no cross-import path.
  • ADR-0003 (Ingestion / Modelling separation) is preserved: modelling services in services/ara/src/ara/services/ depend only on repos.* + domain.*, never on fetchers. Orchestrators are the only place fetchers and services meet.
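
The ADR-0003 rule is narrower than the package boundary: the ara package lists domna-fetchers as a dependency, so nothing in the packaging stops a module under services/ara/src/ara/services/ from importing fetchers. A check along these lines could run in CI over those files. It is a sketch using the standard-library `ast` module; the function name is hypothetical.

```python
import ast


def forbidden_imports(source: str, forbidden_roots: set[str]) -> list[str]:
    """Return imported module names in `source` whose top-level package is forbidden."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute `from x.y import z` only
        else:
            continue
        hits.extend(n for n in names if n.split(".")[0] in forbidden_roots)
    return hits
```

A CI test would read each service module's source and assert that `forbidden_imports(source, {"fetchers"})` is empty.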

Migration (incremental, not big-bang):

  1. Carve out packages/domain/ first — fold datatypes/epc/domain/ + the new aggregate types into it.
  2. Carve out packages/utils/ from current utils/ + backend/utils/.
  3. Carve out packages/repos/ and packages/fetchers/ once services/ara/ is being built and needs them.
  4. services/ara/ is greenfield — no legacy code lives in it.
  5. services/address2uprn/, services/pashub/, etc. are split out as their owners pick them up.
  6. backend/ shrinks to the FastAPI entrypoint layer once everything else has moved.

Reused intact (no rewrite needed at carve-out time):

  • backend/epc_client/ → folds into packages/fetchers/src/fetchers/epc_client.py.
  • datatypes/epc/domain/ → folds into packages/domain/src/domain/epc/.
  • recommendations/optimiser/ → wrapped by services/ara/src/ara/services/optimiser.py.
  • backend/app/db/ → repos delegate into db_funcs.* until SQL is rewritten under sub-PRD (iii).

12. Testing strategy

12.1 Unit tests (the bulk)

Every service test injects fake fetchers and fake repos. No DB, no network, no ML lambda. A service test verifies one slice of logic in 5–30 lines.

Example:

def test_epc_prediction_flags_anomalous_wall_type():
    neighbours = [_make_epc(wall_construction="solid") for _ in range(5)]
    target = _make_property(epc=_make_epc(wall_construction="cavity"))
    repo = FakeGenericDataRepo(neighbours_by_postcode={target.identity.postcode: neighbours})

    svc = EpcPredictionService(generic_repo=repo)
    result = svc.run(Properties([target]))

    assert result[0].epc_anomaly_flags.wall_construction == "differs_from_neighbours"
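
The fake used in the example above could be as small as the following sketch. The `neighbours_for` method name is an assumption; the real GenericDataRepo contract is deferred to the per-service design.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeGenericDataRepo:
    """In-memory stand-in for GenericDataRepo: no DB, no network.

    `neighbours_for` is a hypothetical method name used for illustration.
    """

    neighbours_by_postcode: dict[str, list[Any]] = field(default_factory=dict)
    calls: list[str] = field(default_factory=list)

    def neighbours_for(self, postcode: str) -> list[Any]:
        self.calls.append(postcode)  # lets a test assert what was looked up
        # Return a copy so the service under test cannot mutate the fixture.
        return list(self.neighbours_by_postcode.get(postcode, []))
```

Recording calls on the fake gives tests a way to assert lookups without a mocking library.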

12.2 Integration tests

One per pipeline (Ingestion, Modelling, Refresh). Real Postgres (testcontainers or localstack), fake fetchers (hitting recorded fixtures), fake ML lambdas (returning canned predictions). Catches schema / SQL / transaction issues.

12.3 Contract tests

The transform (EpcMlTransform) has its own test suite:

  • Golden file: given a fixed Property, output matches an expected DataFrame row exactly.
  • Schema test: the output columns exactly match a checked-in CSV header (so the autogluon team sees breakage on the PR).
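
The schema test reduces to comparing a column list against the first row of a checked-in CSV. A minimal standard-library sketch; the helper name is hypothetical, and in the real test the column list would come from the EpcMlTransform output and the header from a file in the repo.

```python
import csv
import io


def schema_diff(
    actual_columns: list[str], header_csv: str
) -> tuple[list[str], list[str], bool]:
    """Compare output columns against a checked-in CSV header row.

    Returns (missing, unexpected, exact_match); exact_match also requires
    the same column order.
    """
    expected = next(csv.reader(io.StringIO(header_csv)))
    missing = [c for c in expected if c not in actual_columns]
    unexpected = [c for c in actual_columns if c not in expected]
    return missing, unexpected, expected == list(actual_columns)
```

Reporting missing and unexpected columns separately makes the PR failure message actionable for whoever owns the other side of the contract.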

12.4 What is NOT tested

  • The autogluon repo's training code — owned there.
  • The gov EPC API behaviour — assumed via the official spec.
  • Front-end aggregation logic — owned there.

13. Observability

Each pipeline step emits a structured log line at start and end with:

{step, property_id, uprn, portfolio_id, subtask_id, duration_ms, outcome, error?}

Errors propagate with the Property.identity attached, so a portfolio of 100k can be triaged by grep.
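
A context manager is one way to guarantee the start and end lines are always paired. The sketch below is illustrative: the `logged_step` name, the `"started"` outcome value, and the `emit` hook are assumptions, while the field names follow the log shape above.

```python
import json
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def logged_step(
    step: str, emit: Callable[[str], None] = print, **identity: object
) -> Iterator[None]:
    """Emit one JSON log line when a step starts and one when it ends."""
    base = {"step": step, **identity}
    emit(json.dumps({**base, "outcome": "started"}))
    started = time.perf_counter()
    try:
        yield
    except Exception as exc:
        elapsed_ms = round((time.perf_counter() - started) * 1000)
        emit(json.dumps({**base, "duration_ms": elapsed_ms,
                         "outcome": "failed", "error": repr(exc)}))
        raise  # re-raise so the caller can mark the subtask failed
    elapsed_ms = round((time.perf_counter() - started) * 1000)
    emit(json.dumps({**base, "duration_ms": elapsed_ms, "outcome": "complete"}))
```

Usage would look like `with logged_step("optimise", uprn="12345", portfolio_id="p1"): ...`; because every line carries the identity fields, a failure can be found by grepping for the UPRN.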

The existing task/subtask state machine is preserved: IngestionPipeline and ModellingPipeline update subtask status at start (in progress) and at end (complete / failed), with the CloudWatch log URL attached as today.

CloudWatch alarms exist on subtask failure rate; thresholds remain unchanged.


14. Data flow: a worked example

A landlord uploads a corrected heating system for UPRN 12345 via the UI.

  1. UI: POST /properties/12345/overrides → writes to the landlord_overrides table via LandlordOverridesRepo.
  2. RefreshOrchestrator invoked (either automatically on override-write, or by a "re-model" button). Note: ingestion is not triggered because no external state changed.
  3. ModellingPipeline invoked on a batch of [12345]:
    • Reads Property(uprn=12345) from PropertyRepo.
    • Property.effective_epc = epc + landlord_overrides → heating system fields differ from baseline.
    • RebaseliningService triggered: ML re-predicts SAP / carbon / heat against the new effective EPC.
    • EpcEnergyDerivationService re-runs over the new effective EPC to derive baseline kWh + fuel split + bills (no ML).
    • RecommendationService regenerates recommendations against the new baseline.
    • OptimiserService re-picks optimal package.
    • ResultsPersister writes new plan under one UoW (old plan is superseded; whether to soft-archive is a sub-PRD (iii) decision).

Total external calls: zero. The override write is the only thing that hit a network boundary, and that was the inbound HTTP from the UI.
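
The overlay step in this example ("effective_epc = epc + landlord_overrides") can be pictured as a dictionary merge that also reports which fields changed. This is a simplified sketch: the EPC is shown as a flat dict with made-up field names, and it ignores Site Notes, the path-1/path-2 split, and the recency tie-break.

```python
def apply_overrides(
    epc: dict[str, object], overrides: dict[str, object]
) -> tuple[dict[str, object], set[str]]:
    """Overlay landlord overrides on the lodged EPC.

    Returns the effective EPC and the set of fields whose value changed.
    The input dicts are not mutated.
    """
    effective = {**epc, **overrides}  # override wins on any shared key
    changed = {k for k, v in overrides.items() if epc.get(k) != v}
    return effective, changed
```

A non-empty `changed` set could serve as the "physical state changed" signal that triggers RebaseliningService, assuming every overridable field counts as physical state.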


15. Open questions for team review

  1. One endpoint vs two (§4.5) — resolved: single endpoint for Phase 1; split later when a real workflow demands it.
  2. LandlordOverrides shape (§6.2) — flat-Excel-shape for v1, with a flag to revisit after first customer.
  3. already_installed and non_invasive_recommendations (§6.5) — both likely subsumed by overlay, but final call deferred.
  4. Recency tie-break policy (§6.3) — default "newer wins"; team to consider per-portfolio override.
  5. GenericDataRepo storage backend — Postgres table, S3, or DynamoDB. Postgres is the path of least infra change; recommend defaulting to that.
  6. Soft-archive vs hard-overwrite for superseded plans (§14) — affects audit / undo behaviour. Defer to sub-PRD (iii).
  7. Building-level optimisation as a Phase 2 service (§10) — agreed deferred; flag for roadmap discussion.
  8. Transform versioning policy (§8.3) — semver chosen; team to confirm bump conventions.
  9. UCL EPC-correction model (§9.2 S4a) — resolved: Few et al. 2023 (Energy & Buildings 288, 113024). Implementation pattern already in AnnualBillSavings.adjust_energy_to_metered — port the per-band gradients/intercepts (Table 3) into EpcEnergyDerivationService, keyed on the post-state Effective EPC band.
  10. Fuel-price source for bill calculation (§9.2 S4a) — resolved: FuelRatesRepo is a time-versioned, region-aware table; ETL by FuelRatesFetcher (Ofgem feed + manual upload fallback). Per-portfolio override deferred to v2 — confirm whether Calico / Hyde have bulk-buy contracts before first onboarding.
  11. kWh handling under Rebaselining (§9.4) — resolved: ML re-predicts SAP/carbon/heat only; EpcEnergyDerivationService re-derives kWh from the rebaselined Effective EPC. Heating-fuel-type change is handled naturally because S4a re-reads heating fields from the Effective EPC.
  12. Phase rollover semantics (§9.2 S7) — when a candidate measure isn't selected in phase n, does it auto-roll into phase n+1's candidate pool, or does the user mark which measure types can roll? Auto is simpler; user-marked is more flexible. Decide at scenario-builder UX time.
  13. Package-level vs per-measure ML scoring (§9.4) — SAP impact of a measure is not strictly additive; the current per-measure scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages. Defer until per-service grill on OptimiserService.
  14. UCL extrapolation scope (§9.2 S4a) — the Few et al. paper is gas-heated, no PV, England + Wales only. Current legacy code applies the correction to all properties regardless. Keep silent extrapolation for v1, or stratify (no correction for non-gas / PV) and surface uncertainty to FE? Defer to per-service grill.
  15. ValuationService rebuild (§9.2 S8) — existing PropertyValuation.estimate cites several papers; the rebuild should improve the regression. Shape deferred to per-service grill.
  16. Battery-via-ML cutover (§9.2 S6) — confirm the new ML model is trained against energy_pv_battery_count + energy_pv_battery_capacity and the legacy BatterySAPScorer can be retired without regression for battery-equipped properties.
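
Item 10 describes FuelRatesRepo as a time-versioned, region-aware table. The lookup that implies ("the rate in force on a given date") can be sketched in memory with `bisect`. The class and method names are hypothetical and say nothing about the eventual table schema.

```python
from bisect import bisect_right
from datetime import date


class InMemoryFuelRates:
    """Hypothetical time-versioned, region-aware rate table (illustration only)."""

    def __init__(self) -> None:
        # (region, fuel) -> parallel lists kept sorted by effective date
        self._dates: dict[tuple[str, str], list[date]] = {}
        self._rates: dict[tuple[str, str], list[float]] = {}

    def add(self, region: str, fuel: str, effective_from: date, rate: float) -> None:
        key = (region, fuel)
        dates = self._dates.setdefault(key, [])
        rates = self._rates.setdefault(key, [])
        i = bisect_right(dates, effective_from)  # insertion point keeps order
        dates.insert(i, effective_from)
        rates.insert(i, rate)

    def rate_on(self, region: str, fuel: str, on: date) -> float:
        """Return the latest rate whose effective date is on or before `on`."""
        key = (region, fuel)
        i = bisect_right(self._dates.get(key, []), on)
        if i == 0:
            raise LookupError(f"no {fuel} rate for {region} on or before {on}")
        return self._rates[key][i - 1]
```

Raising when no rate is in force, instead of falling back to the earliest one, keeps historical re-runs from silently using the wrong price.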

16. Linked sub-PRDs (placeholders)

  • Sub-PRD (ii): ML training pipeline. Path: docs/sub-prds/ml-training-pipeline.md (TBC)
  • Sub-PRD (iii): DB schema migration. Path: docs/sub-prds/db-schema-migration.md (TBC)
  • Sub-PRD (iv): Historical EPC re-mapping. Path: docs/sub-prds/historical-epc-remap.md (TBC)

Each sub-PRD owner: TBC. Each is independently reviewable but consumes the contracts defined in §5 (Property aggregate), §7 (repos), §8 (ML transform).


17. Next steps

  1. Team review of this PRD (target: ~1 week).
  2. Open follow-up grill sessions per service (/grill-me on each of S1–S8 + F1–F4) before that service is implemented.
  3. Break into issues via /to-issues against the project tracker.
  4. Stand up the empty ara/ package skeleton + fakes + first integration-test scaffold as PR-1.
  5. Land services in dependency order: domain → repos → fetchers → services → orchestrators → API.

Phase 1 milestone gate: first portfolio (Calico or Hyde) routed through the new pipeline end-to-end in June, with a manual spot-check on 5 representative properties to confirm outputs are reasonable. No parity-against-old-engine check — the old engine is dead by then.