mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Khalim Conn-Kowlessar fcbaf58a40 initial prd

2026-05-13 19:29:02 +00:00


ARA Backend Redesign — Design PRD

Status: Draft for team review
Author: Khalim Conn-Kowlessar (with Claude grill session)
Branch: ara-backend-design-prd
Scope: Service architecture + domain model + contracts for the new modelling backend. Linked sub-PRDs cover ML training pipeline, DB schema migration, and historical EPC re-mapping.

1. Context

1.1 The forcing function

The current modelling backend (backend/engine/engine.py — model_engine, 1331 LOC) was built as an MVP. It is:

Tightly coupled to a specific gov EPC API that is being decommissioned on 30 May 2026 (~17 days from today).
A monolith — one async function reaches into DB modules, HTTP clients, ML lambdas, S3, and queue infrastructure directly.
Bottlenecked on a single person — Khalim is the only contributor able to safely modify the engine because no one else can predict the blast radius of a change.
Already returning erroneous data from the old API (clients are aware). The replacement API is partially built (backend/epc_client/epc_client_service.py) on the current feature branch.

1.2 What needs to change

Beyond just swapping API clients, this is the moment to rebuild the backend into a production-grade, contribute-able codebase, with:

A clear domain model rooted in the new EPC schema (EpcPropertyData).
Service boundaries that other team members can read, fix, and extend without needing the entire mental model.
Repository-mediated persistence so business logic can be tested without spinning up a database.
A separation between data fetching (slow, IO-heavy, external) and modelling (deterministic, fast, internal).

1.3 Out of scope for this PRD

These ship as linked sub-PRDs:

Sub-PRD (ii) — ML training pipeline (autogluon repo + parquet generation in this repo + scoring model retraining for the new EPC schema)
Sub-PRD (iii) — DB schema migration (new tables: site_notes, landlord_overrides, EPC cache, parallel write strategy)
Sub-PRD (iv) — Historical EPC re-mapping (one-off + ongoing batch job: legacy stored EPCs → new EpcPropertyData shape)

The contracts this PRD defines are the inputs each sub-PRD consumes.

2. Goals and non-goals

2.1 Goals

Survive the 30 May API shutdown — even if it means a brief degraded window, modelling continues to function against the new gov EPC API.
Decouple data fetching from modelling — modelling never makes external HTTP calls; it reads everything from repositories.
Make every service unit-testable against fakes — no test needs a real DB, a real gov API, or a real ML lambda to verify business logic.
Establish a single Property aggregate root as the domain centrepiece; all 9 modelling concerns are slices of one aggregate.
Versioned ML data contract — the EPC-to-features transform is the single shared artifact between this repo and the autogluon repo.
Per-property UI surfaces — fetched data is shown to users for review and override before modelling runs; modelling is triggered separately.

2.2 Non-goals

Multi-region deploy, GDPR-class data minimisation work, or compliance reporting — separate workstreams.
Replacement of the front-end. The new APIs preserve enough of the existing response shape that the FE migrates incrementally.
Removing pandas. The ML transform output is a parquet-friendly DataFrame-like shape; that stays.
A workflow engine (Prefect / Temporal / Airflow). Coordinator-class orchestration plus the existing SQS-fanout pattern is sufficient at the scale we serve.

3. Cutover strategy

Two-phase cutover, driven by the 30 May deadline.

3.1 Phase 0 — Stopgap (now → end of May)

The current model_engine keeps running. SearchEpc is rewired to delegate to EpcClientService (the new gov API client already built on this branch).
Old-schema EPCs persisted in the DB are read as-is; the EPC re-mapping service is not yet wired in.
Goal: no modelling outage at the API death date. Some degraded behaviour acceptable; clients are aware.

3.2 Phase 1 — Strangler (June → ~Q4 2026)

New ara/ package built alongside the old code. New endpoints expose the new pipeline. The old model_engine keeps running.
Per-portfolio feature flag: when set, the trigger endpoint routes the portfolio through the new pipeline. Default is the old pipeline.
Each of the 9 services is built, tested, and ships independently. Adding a service to the new pipeline does not require deleting the old one.
When confidence is high (last portfolio migrated, no regressions seen for N weeks), the old engine is deleted.
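
The per-portfolio flag described above could be as small as the following sketch. `PortfolioFlags` and `choose_pipeline` are hypothetical names introduced for illustration, and the in-memory set is an assumption; a real flag would more likely live in the database.

```python
from dataclasses import dataclass, field
from typing import Literal
from uuid import UUID, uuid4

PipelineName = Literal["legacy", "ara"]


@dataclass
class PortfolioFlags:
    """Hypothetical in-memory flag store; a real one would likely sit in the DB."""
    new_pipeline_enabled: set[UUID] = field(default_factory=set)

    def enable(self, portfolio_id: UUID) -> None:
        self.new_pipeline_enabled.add(portfolio_id)


def choose_pipeline(flags: PortfolioFlags, portfolio_id: UUID) -> PipelineName:
    # Default is the old pipeline; only flagged portfolios take the new route.
    return "ara" if portfolio_id in flags.new_pipeline_enabled else "legacy"
```

Keeping the decision in one function means the trigger endpoint is the only place that knows two pipelines exist.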

3.3 What is not done

No parallel-shadow run (run both, diff outputs). Reason: doubles compute per plan, requires diff tooling we don't have, and the old engine is already known to return bad data — diffs would be noise.
No big-bang switch. Reason: 9 services is too much change to land in one PR.

4. Architecture overview

┌─────────────────────────────────────────────────────────────────────┐
│  Trigger endpoint(s)                                                │
│  (one or two — see §4.5; deferred decision)                         │
└───────────┬──────────────────────────────────────────┬──────────────┘
            │                                          │
            ▼                                          ▼
   ┌─────────────────┐                       ┌─────────────────┐
   │ IngestionPipe   │   SQS, batches of N   │ ModellingPipe   │
   │ -----------     │ ◄─────────────────────│ -----------     │
   │ Fetchers run    │                       │ Reads via Repos │
   │ Persist via     │                       │ Calls Services  │
   │ Repos           │                       │ ML predictions  │
   └────────┬────────┘                       └────────┬────────┘
            │                                         │
            └───────────────► Repos ◄─────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │  Postgres tables │
                        │  (property,      │
                        │   epc_cache,     │
                        │   site_notes,    │
                        │   landlord_      │
                        │   overrides,     │
                        │   plans, etc.)   │
                        └──────────────────┘

  ┌──────────────────────────┐
  │  RefreshOrchestrator     │  triggers Ingestion → diff → conditionally Modelling
  └──────────────────────────┘

4.1 Class taxonomy

Every class falls into exactly one of four roles:

Role	Job	Examples
Fetchers	Call external APIs. Return raw response data. No DB.	`EpcClientService`, `GeospatialFetcher`, `SolarFetcher`, `SiteNotesIngester`
Repos	Persist and load domain aggregates. SQL hidden inside. No external IO.	`PropertyRepo`, `EpcCacheRepo`, `SiteNotesRepo`, `LandlordOverridesRepo`, `RecommendationsRepo`, `GenericDataRepo`, `SubtaskRepo`
Services	Business logic over domain objects. No external IO except via injected Fetchers / Repos.	`EpcRemappingService`, `EpcPredictionService`, `KwhPredictionService`, `ImpactPredictionService`, `RecommendationService`, `OptimiserService`, `FeatureBuilder`, `ResultsPersister`
Orchestrators	Compose Fetchers + Services + Repos to produce an end-to-end result. The only place where step order is encoded.	`IngestionPipeline`, `ModellingPipeline`, `RefreshOrchestrator`

This taxonomy is strict. A class that needs to both fetch and persist is a Service: it depends on an injected Fetcher and an injected Repo rather than doing either job itself. No back-channels.

4.2 Two pipelines, one direction

Data flows one way only: Ingestion → Repos → Modelling.

Ingestion writes; never calls Modelling.
Modelling reads; never calls Fetchers.

If Modelling needs fresh data, it returns "stale" and the caller decides whether to ingest first. This makes Modelling a pure function of repository state, which is the property that makes it reproducible, debuggable, and testable.

4.3 RefreshOrchestrator

Sits above both pipelines. Job:

Trigger IngestionPipeline for a portfolio.
After ingestion completes, ask repos: "did anything change vs the last modelled snapshot?"
If yes, trigger ModellingPipeline. If no, return early.

This avoids re-modelling 100k properties when only 200 had refreshed EPC data.
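
The three steps above can be sketched as follows. The three callables are hypothetical stand-ins for `IngestionPipeline`, a repo-backed change query, and `ModellingPipeline`; passing only the changed UPRNs to modelling is one reading of the text, not a confirmed design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RefreshOrchestrator:
    # All three collaborators are hypothetical callables standing in for the
    # real pipelines and the repo-backed change query.
    ingest: Callable[[str], None]
    changed_since_last_model: Callable[[str], list[int]]
    model: Callable[[str, list[int]], None]

    def refresh(self, portfolio_id: str) -> int:
        """Ingest, diff against the last modelled snapshot, model only the changes."""
        self.ingest(portfolio_id)
        changed = self.changed_since_last_model(portfolio_id)
        if not changed:
            return 0  # nothing changed, so skip modelling entirely
        self.model(portfolio_id, changed)
        return len(changed)
```

Because the collaborators are injected, the early-return branch can be unit-tested with plain lambdas.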

4.4 SQS fanout (preserved from current architecture)

The existing trigger_plan_entrypoint SQS-chunking pattern is kept. Both pipelines fan out per batch of ~30–100 properties (tuneable). Each consumer runs one batch end-to-end through the relevant pipeline.

UPRN partitioning: the trigger endpoint groups UPRNs by locality (postcode prefix / UPRN range) before chunking, so each batch maximises shared upstream fetches (one geospatial-range pull serves every property in the batch).
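
A minimal sketch of locality-first chunking, assuming the postcode outward code is the locality key. `outward_code` and `partition_into_batches` are hypothetical helpers, and the default batch size of 50 is just a value inside the tuneable range.

```python
from collections import defaultdict
from typing import Iterable


def outward_code(postcode: str) -> str:
    """Postcode prefix used as the locality key, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    # UK inward codes are always the last three characters.
    return compact[:-3] if len(compact) > 3 else compact


def partition_into_batches(
    properties: Iterable[tuple[int, str]], batch_size: int = 50
) -> list[list[int]]:
    """Group (uprn, postcode) pairs by locality, then chunk each group."""
    by_locality: dict[str, list[int]] = defaultdict(list)
    for uprn, postcode in properties:
        by_locality[outward_code(postcode)].append(uprn)

    batches: list[list[int]] = []
    for locality in sorted(by_locality):
        uprns = sorted(by_locality[locality])
        for start in range(0, len(uprns), batch_size):
            batches.append(uprns[start:start + batch_size])
    return batches
```

This version never mixes localities in a batch, so small localities produce small batches. Packing neighbouring localities together would be a reasonable refinement.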

4.5 One API or two? (deferred)

The team will decide at implementation time whether Ingestion and Modelling sit behind:

(a) One unified API with a single trigger endpoint that runs both phases.
(b) Two APIs, each with its own trigger, RefreshOrchestrator chains them.

Either is workable if the class taxonomy is preserved. Deferred to implementation review.

5. Domain model

5.1 Aggregate root: `Property`

Property is the centrepiece. Every service operates on one or more Property instances. Every repo writes one slice of Property. The aggregate carries all state for a single property's modelling run.

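from dataclasses import dataclass
from typing import Literal, Optional
from uuid import UUID

# Domain types referenced below (AddressLines, EpcPropertyData, SiteNotes, and so on) are defined elsewhere.

@dataclass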
class PropertyIdentity:
    portfolio_id: UUID
    uprn: Optional[int]
    landlord_property_id: Optional[str]
    address: AddressLines
    postcode: str

@dataclass
class Property:
    identity: PropertyIdentity

    # --- Source data — modelling path is determined by which of these are set ---
    epc: Optional[EpcPropertyData]              # from gov API (or remapped historical)
    site_notes: Optional[SiteNotes]             # our own survey; supersedes EPC when present
    landlord_overrides: Optional[LandlordOverrides]   # sparse, only meaningful when epc set

    # --- Enrichments ---
    geospatial: Optional[GeoSpatial]
    solar: Optional[SolarPotential]
    epc_anomaly_flags: Optional[EpcAnomalyFlags]  # from EpcPredictionService vs neighbours

    # --- Modelling outputs ---
    baseline_predictions: Optional[BaselinePredictions]   # SAP/carbon/heat after rebaselining
    recommendations: list[Recommendation]
    impact_predictions: Optional[ImpactPredictions]
    optimised_package: Optional[OptimisedPackage]

    # --- Derived ---
    @property
    def source_path(self) -> Literal["site_notes", "epc_with_overlay"]: ...

    @property
    def effective_epc(self) -> EpcPropertyData:
        """The EPC the modelling pipeline actually scores against."""
        ...

5.2 `Properties` collection

A first-class iterable, so batch operations are obvious:

from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class Properties:
    items: list[Property]

    def __iter__(self) -> Iterator[Property]: ...
    def __len__(self) -> int: ...
    def filter(self, pred: Callable[[Property], bool]) -> "Properties": ...
    def map(self, fn: Callable[[Property], Property]) -> "Properties": ...
    def with_landlord_overrides(self) -> "Properties": ...

Services typically take and return Properties, not lists.
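
For illustration, the collection methods could be filled in as below. The `Property` here is a cut-down stand-in with hypothetical fields, and `with_landlord_overrides` is left out because its semantics are not specified in this PRD.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class Property:
    """Cut-down stand-in for the aggregate in §5.1 (hypothetical fields)."""
    uprn: int
    landlord_overrides: Optional[dict] = None


@dataclass
class Properties:
    items: list[Property] = field(default_factory=list)

    def __iter__(self) -> Iterator[Property]:
        return iter(self.items)

    def __len__(self) -> int:
        return len(self.items)

    def filter(self, pred: Callable[[Property], bool]) -> "Properties":
        # Returns a new collection; the original is left untouched.
        return Properties([p for p in self.items if pred(p)])

    def map(self, fn: Callable[[Property], Property]) -> "Properties":
        return Properties([fn(p) for p in self.items])
```

Returning a new `Properties` from `filter` and `map` keeps service steps chainable without mutating the batch a caller still holds.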

5.3 Other aggregates

Aggregate	Owns	Repo
`Property`	property identity, epc, site_notes, landlord_overrides, enrichments, modelling results	`PropertyRepo`
`Plan`	per-property modelling output, scenario membership, plan + recommendations + parts	`RecommendationsRepo`
`Scenario`	portfolio-wide scenario metadata	`RecommendationsRepo`
`Subtask` / `Task`	SQS fanout state	`SubtaskRepo`
`EpcCache`	gov-API responses keyed by UPRN, with freshness/TTL	`EpcCacheRepo`
`GenericData`	UPRN-range geospatial, postcode lookups, shared static data	`GenericDataRepo`

Aggregates are loaded whole — never half a Property. If a slice is too large to load eagerly (e.g. recommendation history), it lives in a separate aggregate.

6. Source-of-truth and overlay precedence

There are exactly two modelling paths. The Property.source_path property selects.

6.1 Path 1 — Site notes

If a Property has site_notes and they are newer than any available EPC (or no EPC exists), site notes are the complete source of truth:

effective_epc = site_notes.to_epc_property_data().
No EPC fields are expected to fall outside site-notes coverage: site notes are committed to being a full-coverage survey. Treat any gap as a survey-quality bug, not a fallback signal.
LandlordOverrides are not applicable in Path 1 (the survey supersedes).

6.2 Path 2 — EPC with landlord overlay

If a Property has no site notes (or the EPC is newer):

effective_epc = epc with landlord_overrides applied as a sparse field-level overlay (landlord > epc).
LandlordOverrides are sparse: each row represents one corrected field. Schema TBD at implementation time; assume flat input via Excel/CSV for v1, with a flag to revisit shape after first customer onboarding.

6.3 Recency tie-break

When a property has both site notes and a public EPC, the newer of the two wins. Rationale: a recent EPC may reflect retrofit work done after our survey; conversely a recent survey reflects on-site observations the EPC cannot capture.

This tie-break is implemented in Property.source_path and may be tuned later (e.g. always prefer surveys regardless of date, or per-portfolio policy).
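
The two paths and the recency tie-break can be sketched as follows. `Epc` is a stand-in for `EpcPropertyData`; the `fields` dict and the `lodged_on` / `surveyed_on` dates are hypothetical, and awarding equal dates to the survey is an assumption this PRD does not settle.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Literal, Optional


@dataclass
class Epc:
    """Stand-in for EpcPropertyData; `fields` and `lodged_on` are hypothetical."""
    lodged_on: date
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class SiteNotes:
    surveyed_on: date
    fields: dict[str, str] = field(default_factory=dict)

    def to_epc_property_data(self) -> Epc:
        return Epc(lodged_on=self.surveyed_on, fields=dict(self.fields))


@dataclass
class Property:
    epc: Optional[Epc] = None
    site_notes: Optional[SiteNotes] = None
    landlord_overrides: dict[str, str] = field(default_factory=dict)

    @property
    def source_path(self) -> Literal["site_notes", "epc_with_overlay"]:
        if self.site_notes is None:
            return "epc_with_overlay"
        if self.epc is None:
            return "site_notes"
        # Newer wins. Equal dates go to the survey here, which is an assumption.
        if self.site_notes.surveyed_on >= self.epc.lodged_on:
            return "site_notes"
        return "epc_with_overlay"

    @property
    def effective_epc(self) -> Epc:
        if self.source_path == "site_notes":
            return self.site_notes.to_epc_property_data()
        if self.epc is None:
            raise ValueError("no EPC and no site notes to model against")
        # Sparse field-level overlay: landlord values replace EPC values.
        return Epc(self.epc.lodged_on, {**self.epc.fields, **self.landlord_overrides})
```

Note that overrides are ignored on the site-notes path, matching §6.1.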

6.4 Rebaselining trigger

The modelling pipeline re-predicts SAP / carbon / heat / kwh whenever:

effective_epc differs from the canonical baseline (i.e. raw EPC with no overrides), or
The previous modelling snapshot is missing or stale.

The exact diff mechanism (hash of effective EPC, dirty-flag on overrides, timestamp comparison) is an implementation detail; recommendation is to start with a content hash stored alongside the previous run.
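
A content hash along the lines recommended above might look like this sketch. It assumes the effective EPC can be serialised to a plain dict; `effective_epc_hash` and `needs_rebaseline` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Optional


def effective_epc_hash(effective_epc: dict[str, Any]) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the effective EPC."""
    canonical = json.dumps(effective_epc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rebaseline(effective_epc: dict[str, Any], previous_hash: Optional[str]) -> bool:
    # A missing previous hash means there is no snapshot to compare against.
    return previous_hash is None or effective_epc_hash(effective_epc) != previous_hash
```

Sorting keys before hashing makes the result independent of field order, so a re-fetched but unchanged EPC does not trigger a re-model.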

6.5 Deprecated concepts

Patches (patch_epc) — removed. Functionality subsumed by LandlordOverrides.
Already-installed measures — likely subsumed by LandlordOverrides ("we have a heat pump now" → override heating fields). Confirmed at implementation time.
Non-invasive recommendations — TBD whether this concept survives; not blocking.

7. Persistence: repositories and unit of work

7.1 What a repository is

A repository owns the SQL for one aggregate. Nothing else writes SQL for that aggregate. Callers see only domain objects.

from datetime import timedelta
from typing import Optional, Protocol
from uuid import UUID

class PropertyRepo(Protocol):
    def get(self, identity: PropertyIdentity) -> Optional[Property]: ...
    def bulk_save(self, uow: UnitOfWork, properties: Properties) -> None: ...
    def find_by_portfolio(self, portfolio_id: UUID) -> Properties: ...
    def find_stale(self, portfolio_id: UUID, threshold: timedelta) -> Properties: ...

Implementations initially delegate to the current db_funcs.* modules to avoid a big-bang SQL rewrite (see §11), but the interface is fixed.

7.2 Unit of Work

Multi-table writes inside a single aggregate, or across aggregates that share a transaction (e.g. property + plan + recommendations) go through a UnitOfWork:

with self.uow_factory() as uow:
    self.property_repo.bulk_save(uow, properties)
    self.recommendations_repo.bulk_save(uow, plans)
    uow.commit()

UoW owns the SQLAlchemy session lifecycle. Repos use the session passed in via the UoW. Outside a UoW, repos use a short-lived read session.
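
A minimal sketch of the commit-or-rollback behaviour a UnitOfWork provides. The stdlib `sqlite3` connection stands in for the SQLAlchemy session purely so the example runs without extra dependencies; the class shape is an assumption, not the agreed interface.

```python
import sqlite3


class UnitOfWork:
    """Owns one connection for the duration of a `with` block.

    sqlite3 stands in for the SQLAlchemy session purely so the sketch runs
    without extra dependencies.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._committed = False

    def __enter__(self) -> "UnitOfWork":
        return self

    def commit(self) -> None:
        self.conn.commit()
        self._committed = True

    def __exit__(self, exc_type, exc, tb) -> None:
        # Anything not explicitly committed is rolled back, including on error.
        if exc_type is not None or not self._committed:
            self.conn.rollback()
```

Requiring an explicit `commit()` means a forgotten commit loses the write instead of persisting half of it.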

7.3 Repository inventory

Repo	Tables it owns
`PropertyRepo`	`properties`, `property_details_epc`, `property_spatial`
`EpcCacheRepo`	new table: `epc_api_cache` (TTL, raw API response, mapped `EpcPropertyData`)
`SiteNotesRepo`	new table: `site_notes` (replaces current `energy_assessments`)
`LandlordOverridesRepo`	new table: `landlord_overrides` (sparse, per-field rows for audit)
`RecommendationsRepo`	`plans`, `recommendations`, `recommendation_parts`, `scenarios`
`GenericDataRepo`	new table or S3-backed: UPRN-range geospatial + postcode-keyed shared static data
`SubtaskRepo`	`tasks`, `subtasks` (existing)

DDL migrations are scoped to sub-PRD (iii).

7.4 Fakes

For tests, each repo has a FakeXRepo companion backed by a dict. Service unit tests inject fakes. No DB required.
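
As an illustration, a dict-backed fake for part of the `PropertyRepo` protocol in §7.1 could look like this. The `PropertyIdentity` and `Property` classes are cut-down stand-ins, and plain lists replace `Properties` to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyIdentity:
    """Cut-down identity; the real one in §5.1 carries more fields."""
    portfolio_id: str
    uprn: int


@dataclass
class Property:
    identity: PropertyIdentity
    postcode: str = ""


class FakePropertyRepo:
    """Dict-backed stand-in that mirrors part of the PropertyRepo protocol."""

    def __init__(self) -> None:
        self._rows: dict[PropertyIdentity, Property] = {}

    def get(self, identity: PropertyIdentity) -> Optional[Property]:
        return self._rows.get(identity)

    def bulk_save(self, uow: object, properties: list[Property]) -> None:
        # The fake ignores the unit of work; there is no transaction to join.
        for prop in properties:
            self._rows[prop.identity] = prop

    def find_by_portfolio(self, portfolio_id: str) -> list[Property]:
        return [p for p in self._rows.values() if p.identity.portfolio_id == portfolio_id]
```

The identity is frozen so it can be used as a dict key, which is what makes the fake a few lines long.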

8. ML contract

8.1 Where ML lives

Concern	Owner
Defining the EPC → features transform	This repo (`ara.domain.ml.EpcMlTransform`)
Loading data, applying transform, writing training parquet to S3	This repo (sub-PRD (ii) batch job)
Training, hyperparameter search, deployment	Autogluon repo
Scoring at modelling time	This repo (`FeatureBuilder` calls `EpcMlTransform`, sends DataFrame to deployed lambda)

The autogluon repo is intentionally dumb: it consumes parquet, knows which column is the target, knows which columns to ignore. It has no EPC semantics.

8.2 `EpcMlTransform`

A separate class (not a method on EpcPropertyData), because:

The data class stays clean of training-infrastructure concerns.
Versioned transforms (EpcMlTransformV1, EpcMlTransformV2) swap easily.
Future need: injection of normalisation stats from the training set is straightforward on a class, awkward on a dataclass.

from typing import Any

import pandas as pd

class EpcMlTransform:
    VERSION: str = "1.0.0"  # semver

    def to_row(self, epc: EpcPropertyData) -> dict[str, Any]: ...
    def to_rows(self, properties: Properties) -> pd.DataFrame: ...
    def schema(self) -> dict[str, type]: ...  # for parquet emission + validation

The interesting work — flattening List[SapWindow], List[SapBuildingPart] into fixed-width columns — lives inside this class. Domain decisions (top-N windows, aggregate roofs, etc.) are encoded here and reviewed by Khalim. Sub-PRD (ii) goes into detail.
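
To make the fixed-width idea concrete, here is one possible top-N flattening for windows. The `SapWindow` fields (`area_m2`, `u_value`), the column names, and ranking by area are all hypothetical; the real choices belong to sub-PRD (ii).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SapWindow:
    """Hypothetical fields; the real SapWindow lives in datatypes/epc/domain/."""
    area_m2: float
    u_value: float


def flatten_windows(windows: list[SapWindow], top_n: int = 3) -> dict[str, Any]:
    """Emit the same columns for every property, however many windows it has."""
    largest = sorted(windows, key=lambda w: w.area_m2, reverse=True)[:top_n]
    row: dict[str, Any] = {
        "window_count": len(windows),
        "window_total_area_m2": sum(w.area_m2 for w in windows),
    }
    for i in range(top_n):
        window = largest[i] if i < len(largest) else None
        # Missing slots are None so the column set never changes.
        row[f"window_{i + 1}_area_m2"] = window.area_m2 if window else None
        row[f"window_{i + 1}_u_value"] = window.u_value if window else None
    return row
```

A property with no windows yields the same keys as one with ten, which is what lets the parquet schema stay stable across rows.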

8.3 Versioning

Transform class is semver-tagged (VERSION = "1.0.0").
S3 path for training parquet includes the version: s3://.../training/v1.0.0/....
Deployed scoring lambda is tagged with the transform version it was trained against.
Modelling pipeline asserts at startup that its EpcMlTransform.VERSION matches the deployed lambda's tag; a mismatch fails hard at startup, so it surfaces at deploy time rather than mid-batch.

Bump major when removing or renaming columns. Bump minor when adding optional columns (older models still scoreable; new models can be trained against new fields).
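
One way to read "matches" consistently with the bump rules is sketched below: majors must be equal, and the transform may be ahead of the model on minor but not behind. This policy is an assumption for the team to confirm (see §15), and both function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def assert_transform_compatible(transform_version: str, model_trained_against: str) -> None:
    """Raise if the deployed model cannot be scored with this transform.

    Assumed policy: majors must match, and the transform may be ahead on minor
    (it only adds optional columns) but never behind.
    """
    t_major, t_minor, _ = parse_semver(transform_version)
    m_major, m_minor, _ = parse_semver(model_trained_against)
    if t_major != m_major or t_minor < m_minor:
        raise RuntimeError(
            f"EpcMlTransform {transform_version} cannot score a model "
            f"trained against {model_trained_against}"
        )
```

If the team prefers strict equality, the check reduces to comparing the two strings.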

8.4 Two model families, one transform

Both ML services use the same transform:

Service	Lambda	Target
`KwhPredictionService` (service #5)	`kwh-models-*`	annual kWh + bills
`ImpactPredictionService` (service #7)	`impact-models-*`	SAP, carbon, heat demand, post-retrofit kWh

The two families are trained against the same input feature schema; only target columns differ. Sub-PRD (ii) handles training-time details.

9. Service catalogue

Twelve classes (four fetchers and eight domain services) implement the pipeline end-to-end, composed by the three orchestrators in §9.3. Detailed signatures are deliberately left for implementers; this PRD documents purpose, dependencies, and rough shape.

9.1 Fetchers (called by `IngestionPipeline`)

#	Class	Purpose	Dependencies
F1	`EpcClientService`	Fetches EPCs from new gov API. Already exists at `backend/epc_client/`.	httpx
F2	`GeospatialFetcher`	Fetches UPRN-range geospatial data (replaces `OpenUprnClient` use in current engine).	S3 / Ordnance Survey API
F3	`SolarFetcher`	Wraps Google Solar API; building-level + unit-level scenes.	Google Solar API
F4	`SiteNotesIngester`	Loads site notes from Excel uploads / structured input. Persists via `SiteNotesRepo`.	S3, repo

9.2 Domain services (called by `ModellingPipeline`)

#	Class	Original-list #	Purpose	Reads	Writes
S1	`EpcRemappingService`	4	Re-map legacy / historical EPCs into new `EpcPropertyData` shape.	`EpcCacheRepo`	`EpcCacheRepo` (mapped column)
S2	`EpcPredictionService`	3	For every property: produce predicted EPC + per-field anomaly flags vs neighbours. Used both for gap-fill (Path 2 if EPC missing) and UI surfacing.	`EpcCacheRepo`, `GenericDataRepo`	—
S3	`FeatureBuilder`	(new)	Wraps `EpcMlTransform`. Converts `Properties` → scoring DataFrame.	—	—
S4	`KwhPredictionService`	5	Calls kWh + bills ML lambda; attaches results to `Property.baseline_predictions` / per-measure.	`FeatureBuilder`	—
S5	`RecommendationService`	6	Generates per-property recommendations using `effective_epc`, materials, exclusions, etc. Replaces current `Recommendations` (1383 LOC).	`MaterialsRepo`	—
S6	`ImpactPredictionService`	7	Calls SAP / carbon / heat / bills impact lambda for each recommendation.	`FeatureBuilder`	—
S7	`OptimiserService`	8	Produces optimised retrofit packages. Wraps current `CostOptimiser` / `GainOptimiser` / `optimise_with_scenarios`.	—	—
S8	`ResultsPersister`	9	Final step: writes plans, recommendations, property updates via repos under one UoW.	—	All write repos

9.3 Orchestrators

#	Class	Purpose
O1	`IngestionPipeline`	Per-batch SQS consumer. Calls F1–F4, persists via repos.
O2	`ModellingPipeline`	Per-batch SQS consumer. Reads from repos, runs S1→S8 in order, ends with persistence.
O3	`RefreshOrchestrator`	Top-level: triggers Ingestion → diff → optionally Modelling.

9.4 `ModellingPipeline` step order

For each Property in the batch:

1. PropertyRepo.get()  →  Property (epc, site_notes, overrides, geospatial, solar)
2. EpcRemappingService — if epc is in legacy schema, upgrade to current
3. EpcPredictionService — produce predicted EPC + anomaly flags (always runs)
4. Compute Property.effective_epc (path-1 or path-2)
5. KwhPredictionService — baseline kwh + bills
6. RecommendationService — generate candidate measures
7. ImpactPredictionService — predict per-measure impact
8. OptimiserService — select optimal package
9. KwhPredictionService — re-score on optimised package (tenant savings)
10. ResultsPersister — write Plan + Recommendations under one UoW

Steps 1–4 run per property. Steps 5–9 operate on the whole batch, with each ML step sending a single DataFrame to its lambda where possible (today's code already batches).

9.5 Per-service contracts — deferred

Method signatures, return types, error semantics, and edge-case behaviour are explicitly out of scope for this PRD. The implementer of each service runs a /grill-me session against this document and produces a detailed sub-design before coding.

10. Cross-batch concerns

Concern	Status	Approach
Building-level solar adjustment	Deferred — future TODO, not implemented today.	The current `building_ids` block in `model_engine` is dead-ish; it operates on the in-process batch only. New design preserves that limitation. Future feature: a post-modelling consolidation pass that groups results by `building_id` across batches and re-optimises.
Portfolio aggregation	Dropped.	Front-end computes aggregations dynamically from per-property plans. `extract_portfolio_aggregation_data` in current engine is dead code (defined, never called) — deleting.
Shared upstream data	Handled by orchestrator partitioning + `GenericDataRepo`.	Trigger endpoint groups UPRNs by postcode / UPRN-range before SQS chunking so each batch maximises intra-batch sharing. `GenericDataRepo` caches across batches so first batch pays, subsequent batches hit cache.

11. Directory layout

Proposal — team to tweak.

ara/                                  # new top-level package, sibling of backend/
├── domain/
│   ├── __init__.py
│   ├── property.py                   # Property aggregate
│   ├── properties.py                 # Properties collection
│   ├── identity.py                   # PropertyIdentity, AddressLines
│   ├── site_notes.py                 # SiteNotes (replaces energy_assessment)
│   ├── landlord_overrides.py
│   ├── geospatial.py
│   ├── solar.py
│   ├── recommendations.py            # Recommendation, OptimisedPackage
│   ├── predictions.py                # BaselinePredictions, ImpactPredictions
│   ├── anomaly_flags.py              # EpcAnomalyFlags
│   └── ml/
│       ├── __init__.py
│       ├── transform.py              # EpcMlTransform (versioned)
│       └── schema.py                 # scoring DataFrame schema
│
├── fetchers/
│   ├── __init__.py
│   ├── epc_client.py                 # alias / re-export of backend/epc_client/
│   ├── geospatial.py
│   ├── solar.py
│   └── site_notes_ingester.py
│
├── repos/
│   ├── __init__.py
│   ├── unit_of_work.py
│   ├── property_repo.py
│   ├── epc_cache_repo.py
│   ├── site_notes_repo.py
│   ├── landlord_overrides_repo.py
│   ├── recommendations_repo.py
│   ├── generic_data_repo.py
│   └── subtask_repo.py
│
├── services/
│   ├── __init__.py
│   ├── epc_remapping.py
│   ├── epc_prediction.py             # nearby-similar + anomaly flags
│   ├── feature_builder.py            # uses domain.ml.EpcMlTransform
│   ├── kwh_prediction.py
│   ├── impact_prediction.py
│   ├── recommendation.py
│   ├── optimiser.py                  # wraps recommendations/optimiser/
│   └── results_persister.py
│
├── orchestrators/
│   ├── __init__.py
│   ├── ingestion_pipeline.py
│   ├── modelling_pipeline.py
│   └── refresh_orchestrator.py
│
├── api/
│   ├── __init__.py
│   ├── routers/
│   │   ├── ingestion.py              # if two APIs
│   │   └── modelling.py
│   └── schemas/                      # request/response Pydantic models
│
└── tests/
    ├── fakes/                        # FakePropertyRepo, FakeEpcClient, etc.
    ├── unit/                         # service tests using fakes only
    └── integration/                  # real DB + real SQS via localstack

backend/ continues to host the legacy code during phase 1. Once the last portfolio is migrated, backend/engine/, backend/SearchEpc.py, backend/Property.py are deleted.

Reused intact (no rewrite needed):

backend/epc_client/ — the new gov API client. Wrapped by ara/fetchers/epc_client.py.
datatypes/epc/domain/ — the new EPC schema. Property.epc: EpcPropertyData references it directly.
recommendations/optimiser/ — wrapped by ara/services/optimiser.py.
backend/app/db/ — repos delegate into db_funcs.* until the SQL is rewritten under sub-PRD (iii).

12. Testing strategy

12.1 Unit tests (the bulk)

Every service test injects fake fetchers and fake repos. No DB, no network, no ML lambda. A service test verifies one slice of logic in 5–30 lines.

Example:

def test_epc_prediction_flags_anomalous_wall_type():
    neighbours = [_make_epc(wall_construction="solid") for _ in range(5)]
    target = _make_property(epc=_make_epc(wall_construction="cavity"))
    repo = FakeGenericDataRepo(neighbours_by_postcode={target.identity.postcode: neighbours})

    svc = EpcPredictionService(generic_repo=repo)
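    result = svc.run(Properties([target]))

    # Properties defines __iter__ and __len__ but not __getitem__ (see §5.2), so unpack instead of indexing.
    (flagged,) = result
    assert flagged.epc_anomaly_flags.wall_construction == "differs_from_neighbours"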

12.2 Integration tests

One per pipeline (Ingestion, Modelling, Refresh). Real Postgres (for example via testcontainers) and SQS via localstack, fake fetchers (replaying recorded fixtures), fake ML lambdas (returning canned predictions). Catches schema / SQL / transaction issues.

12.3 Contract tests

The transform (EpcMlTransform) has its own test suite:

Golden file: given a fixed Property, output matches an expected DataFrame row exactly.
Schema test: the output columns exactly match a checked-in CSV header (so autogluon team sees breakage on PR).

12.4 What is NOT tested

The autogluon repo's training code — owned there.
The gov EPC API behaviour — assumed via the official spec.
Front-end aggregation logic — owned there.

13. Observability

Each pipeline step emits a structured log line at start and end with:

{step, property_id, uprn, portfolio_id, subtask_id, duration_ms, outcome, error?}

Errors propagate with the Property.identity attached, so a portfolio of 100k can be triaged by grep.
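
The start/end log lines could be produced by a small context manager such as this sketch. The logger name `ara.pipeline`, the `event` field, and the `logged_step` helper are assumptions; only the field names in the template above come from this PRD.

```python
import json
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("ara.pipeline")


@contextmanager
def logged_step(step: str, **context: object) -> Iterator[None]:
    """Emit one JSON log line at step start and one at step end."""
    logger.info(json.dumps({"step": step, "event": "start", **context}, default=str))
    started = time.monotonic()
    outcome, error = "ok", None
    try:
        yield
    except Exception as exc:
        outcome, error = "failed", repr(exc)
        raise
    finally:
        record = {
            "step": step,
            "event": "end",
            **context,
            "duration_ms": int((time.monotonic() - started) * 1000),
            "outcome": outcome,
        }
        if error is not None:
            record["error"] = error
        logger.info(json.dumps(record, default=str))
```

A step would then be wrapped as `with logged_step("kwh_prediction", uprn=12345, portfolio_id=...):`, and the exception is re-raised after logging so failure handling stays with the pipeline.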

The existing task/subtask state machine is preserved — IngestionPipeline and ModellingPipeline update subtask status at start (in progress), end (complete / failed), with the CloudWatch log URL attached as today.

CloudWatch alarms exist on subtask failure rate; thresholds remain unchanged.

14. Data flow: a worked example

A landlord uploads a corrected heating system for UPRN 12345 via the UI.

UI → POST /properties/12345/overrides → writes to landlord_overrides table via LandlordOverridesRepo.
RefreshOrchestrator invoked (either automatically on override-write, or by a "re-model" button). Note: ingestion is not triggered because no external state changed.
ModellingPipeline invoked on a batch of [12345]:
- Reads Property(uprn=12345) from PropertyRepo.
- Property.effective_epc = epc + landlord_overrides → heating system fields differ from baseline.
- Rebaselining triggered: KwhPredictionService re-predicts baseline SAP / carbon / heat / kwh.
- RecommendationService regenerates recommendations against the new baseline.
- OptimiserService re-picks optimal package.
- ResultsPersister writes new plan under one UoW (old plan is superseded; whether to soft-archive is a sub-PRD (iii) decision).

Total external calls: zero. The override write is the only thing that hit a network boundary, and that was the inbound HTTP from the UI.

15. Open questions for team review

One API vs two (§4.5) — clean interfaces allow either; pick at implementation.
LandlordOverrides shape (§6.2) — flat-Excel-shape for v1, with a flag to revisit after first customer.
already_installed and non_invasive_recommendations (§6.5) — both likely subsumed by overlay, but final call deferred.
Recency tie-break policy (§6.3) — default "newer wins"; team to consider per-portfolio override.
GenericDataRepo storage backend — Postgres table, S3, or DynamoDB. Postgres is the path of least infra change; recommend defaulting to that.
Soft-archive vs hard-overwrite for superseded plans (§14) — affects audit / undo behaviour. Defer to sub-PRD (iii).
Building-level optimisation as a Phase 2 service (§10) — agreed deferred; flag for roadmap discussion.
Transform versioning policy (§8.3) — semver chosen; team to confirm bump conventions.

16. Linked sub-PRDs (placeholders)

Sub-PRD (ii) — ML training pipeline — docs/sub-prds/ml-training-pipeline.md (TBC)
Sub-PRD (iii) — DB schema migration — docs/sub-prds/db-schema-migration.md (TBC)
Sub-PRD (iv) — Historical EPC re-mapping — docs/sub-prds/historical-epc-remap.md (TBC)

Each sub-PRD owner: TBC. Each is independently reviewable but consumes the contracts defined in §5 (Property aggregate), §7 (repos), §8 (ML transform).

17. Next steps

Team review of this PRD (target: ~1 week).
Open follow-up grill sessions per service (/grill-me on each of S1–S8 + F1–F4) before that service is implemented.
Break into issues via /to-issues against the project tracker.
Stand up the empty ara/ package skeleton + fakes + first integration-test scaffold as PR-1.
Land services in dependency order: domain → repos → fetchers → services → orchestrators → API.

Phase 1 milestone gate: first portfolio routed through new pipeline with parity against old engine (manual spot-check on 5 representative properties).
