diff --git a/ara_backend_design.md b/ara_backend_design.md new file mode 100644 index 00000000..f901936b --- /dev/null +++ b/ara_backend_design.md @@ -0,0 +1,648 @@ +# ARA Backend Redesign — Design PRD + +**Status**: Draft for team review +**Author**: Khalim Conn-Kowlessar (with Claude grill session) +**Branch**: `ara-backend-design-prd` +**Scope**: Service architecture + domain model + contracts for the new modelling backend. Linked sub-PRDs cover ML training pipeline, DB schema migration, and historical EPC re-mapping. + +--- + +## 1. Context + +### 1.1 The forcing function + +The current modelling backend (`backend/engine/engine.py` — `model_engine`, 1331 LOC) was built as an MVP. It is: + +- **Tightly coupled** to a specific gov EPC API that is being **decommissioned on 30 May 2026** (~17 days from today). +- **A monolith** — one async function reaches into DB modules, HTTP clients, ML lambdas, S3, and queue infrastructure directly. +- **Bottlenecked on a single person** — Khalim is the only contributor able to safely modify the engine because no one else can predict the blast radius of a change. +- **Already returning erroneous data** from the old API (clients are aware). The replacement API is partially built (`backend/epc_client/epc_client_service.py`) on the current feature branch. + +### 1.2 What needs to change + +Beyond just swapping API clients, this is the moment to **rebuild the backend into a production-grade, contribute-able codebase**, with: + +- A clear domain model rooted in the new EPC schema (`EpcPropertyData`). +- Service boundaries that other team members can read, fix, and extend without needing the entire mental model. +- Repository-mediated persistence so business logic can be tested without spinning up a database. +- A separation between **data fetching** (slow, IO-heavy, external) and **modelling** (deterministic, fast, internal). + +### 1.3 Out of scope for this PRD + +These ship as **linked sub-PRDs**: + +- **Sub-PRD (ii) — ML training pipeline** (autogluon repo + parquet generation in this repo + scoring model retraining for the new EPC schema) +- **Sub-PRD (iii) — DB schema migration** (new tables: `site_notes`, `landlord_overrides`, EPC cache, parallel write strategy) +- **Sub-PRD (iv) — Historical EPC re-mapping** (one-off + ongoing batch job: legacy stored EPCs → new `EpcPropertyData` shape) + +The contracts this PRD defines are the inputs each sub-PRD consumes. + +--- + +## 2. Goals and non-goals + +### 2.1 Goals + +1. **Survive the 30 May API shutdown** — even if it means a brief degraded window, modelling continues to function against the new gov EPC API. +2. **Decouple data fetching from modelling** — modelling never makes external HTTP calls; it reads everything from repositories. +3. **Make every service unit-testable against fakes** — no test needs a real DB, a real gov API, or a real ML lambda to verify business logic. +4. **Establish a single `Property` aggregate root** as the domain centrepiece; all 9 modelling concerns are slices of one aggregate. +5. **Versioned ML data contract** — the EPC-to-features transform is the single shared artifact between this repo and the autogluon repo. +6. **Per-property UI surfaces** — fetched data is shown to users for review and override **before** modelling runs; modelling is triggered separately. + +### 2.2 Non-goals + +- Multi-region deploy, GDPR-class data minimisation work, or compliance reporting — separate workstreams. +- Replacement of the front-end. The new APIs preserve enough of the existing response shape that the FE migrates incrementally. +- Removing pandas. The ML transform output is a parquet-friendly DataFrame-like shape; that stays. +- A workflow engine (Prefect / Temporal / Airflow). Coordinator-class orchestration plus the existing SQS-fanout pattern is sufficient at the scale we serve. + +--- + +## 3. Cutover strategy + +Two-phase cutover, driven by the 30 May deadline. + +### 3.1 Phase 0 — Stopgap (now → end of May) + +- The current `model_engine` keeps running. `SearchEpc` is rewired to delegate to `EpcClientService` (the new gov API client already built on this branch). +- Old-schema EPCs persisted in the DB are read as-is; the EPC re-mapping service is not yet wired in. +- Goal: no modelling outage at the API death date. Some degraded behaviour acceptable; clients are aware. + +### 3.2 Phase 1 — Strangler (June → ~Q4 2026) + +- New `ara/` package built alongside the old code. New endpoints expose the new pipeline. The old `model_engine` keeps running. +- Per-portfolio feature flag: when set, the trigger endpoint routes the portfolio through the new pipeline. Default is the old pipeline. +- Each of the 9 services is built, tested, and ships independently. Adding a service to the new pipeline does not require deleting the old one. +- When confidence is high (last portfolio migrated, no regressions seen for N weeks), the old engine is deleted. + +### 3.3 What is **not** done + +- No parallel-shadow run (run both, diff outputs). Reason: doubles compute per plan, requires diff tooling we don't have, and the old engine is already known to return bad data — diffs would be noise. +- No big-bang switch. Reason: 9 services is too much change to land in one PR. + +--- + +## 4. Architecture overview + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ Trigger endpoint(s) │ +│ (one or two — see §4.5; deferred decision) │ +└───────────┬──────────────────────────────────────────┬──────────────┘ + │ │ + ▼ ▼ + ┌─────────────────┐ ┌─────────────────┐ + │ IngestionPipe │ SQS, batches of N │ ModellingPipe │ + │ ----------- │ ◄─────────────────────│ ----------- │ + │ Fetchers run │ │ Reads via Repos │ + │ Persist via │ │ Calls Services │ + │ Repos │ │ ML predictions │ + └────────┬────────┘ └────────┬────────┘ + │ │ + └───────────────► Repos ◄─────────────────┘ + │ + ▼ + ┌──────────────────┐ + │ Postgres tables │ + │ (property, │ + │ epc_cache, │ + │ site_notes, │ + │ landlord_ │ + │ overrides, │ + │ plans, etc.) │ + └──────────────────┘ + + ┌──────────────────────────┐ + │ RefreshOrchestrator │ triggers Ingestion → diff → conditionally Modelling + └──────────────────────────┘ +``` + +### 4.1 Class taxonomy + +Every class falls into exactly one of four roles: + +| Role | Job | Examples | +|------|-----|----------| +| **Fetchers** | Call external APIs. Return raw response data. No DB. | `EpcClientService`, `GeospatialFetcher`, `SolarFetcher`, `SiteNotesIngester` | +| **Repos** | Persist and load domain aggregates. SQL hidden inside. No external IO. | `PropertyRepo`, `EpcCacheRepo`, `SiteNotesRepo`, `LandlordOverridesRepo`, `RecommendationsRepo`, `GenericDataRepo`, `SubtaskRepo` | +| **Services** | Business logic over domain objects. No external IO except via injected Fetchers / Repos. | `EpcRemappingService`, `EpcPredictionService`, `KwhPredictionService`, `ImpactPredictionService`, `RecommendationService`, `OptimiserService`, `FeatureBuilder`, `ResultsPersister` | +| **Orchestrators** | Compose Fetchers + Services + Repos to produce an end-to-end result. The only place where step order is encoded. | `IngestionPipeline`, `ModellingPipeline`, `RefreshOrchestrator` | + +This taxonomy is **strict**. A class that fetches *and* persists belongs in the Service layer and depends on a Fetcher + a Repo. No back-channels. + +### 4.2 Two pipelines, one direction + +Data flows one way only: **Ingestion → Repos → Modelling**. + +- **Ingestion** writes; never calls Modelling. +- **Modelling** reads; never calls Fetchers. + +If Modelling needs fresh data, it returns "stale" and the caller decides whether to ingest first. This makes Modelling a pure function of repository state, which is the property that makes it reproducible, debuggable, and testable. + +### 4.3 RefreshOrchestrator + +Sits above both pipelines. Job: + +1. Trigger `IngestionPipeline` for a portfolio. +2. After ingestion completes, ask repos: "did anything change vs the last modelled snapshot?" +3. If yes, trigger `ModellingPipeline`. If no, return early. + +This avoids re-modelling 100k properties when only 200 had refreshed EPC data. + +### 4.4 SQS fanout (preserved from current architecture) + +The existing `trigger_plan_entrypoint` SQS-chunking pattern is kept. Both pipelines fan out per batch of ~30–100 properties (tuneable). Each consumer runs one batch end-to-end through the relevant pipeline. + +UPRN partitioning: the trigger endpoint groups UPRNs by **locality** (postcode prefix / UPRN range) before chunking, so each batch maximises shared upstream fetches (one geospatial-range pull serves all 30 properties in the batch). + +### 4.5 One API or two? (deferred) + +The team will decide at implementation time whether Ingestion and Modelling sit behind: + +- **(a) One unified API** with a single trigger endpoint that runs both phases. +- **(b) Two APIs**, each with its own trigger, RefreshOrchestrator chains them. + +Either is workable if the class taxonomy is preserved. Deferred to implementation review. + +--- + +## 5. Domain model + +### 5.1 Aggregate root: `Property` + +`Property` is the centrepiece. Every service operates on one or more `Property` instances. Every repo writes one slice of `Property`. The aggregate carries all state for a single property's modelling run. + +```python +@dataclass +class PropertyIdentity: + portfolio_id: UUID + uprn: Optional[int] + landlord_property_id: Optional[str] + address: AddressLines + postcode: str + +@dataclass +class Property: + identity: PropertyIdentity + + # --- Source data — modelling path is determined by which of these are set --- + epc: Optional[EpcPropertyData] # from gov API (or remapped historical) + site_notes: Optional[SiteNotes] # our own survey; supersedes EPC when present + landlord_overrides: Optional[LandlordOverrides] # sparse, only meaningful when epc set + + # --- Enrichments --- + geospatial: Optional[GeoSpatial] + solar: Optional[SolarPotential] + epc_anomaly_flags: Optional[EpcAnomalyFlags] # from EpcPredictionService vs neighbours + + # --- Modelling outputs --- + baseline_predictions: Optional[BaselinePredictions] # SAP/carbon/heat after rebaselining + recommendations: list[Recommendation] + impact_predictions: Optional[ImpactPredictions] + optimised_package: Optional[OptimisedPackage] + + # --- Derived --- + @property + def source_path(self) -> Literal["site_notes", "epc_with_overlay"]: ... + + @property + def effective_epc(self) -> EpcPropertyData: + """The EPC the modelling pipeline actually scores against.""" + ... +``` + +### 5.2 `Properties` collection + +A first-class iterable, so batch operations are obvious: + +```python +@dataclass +class Properties: + items: list[Property] + + def __iter__(self) -> Iterator[Property]: ... + def __len__(self) -> int: ... + def filter(self, pred: Callable[[Property], bool]) -> "Properties": ... + def map(self, fn: Callable[[Property], Property]) -> "Properties": ... + def with_landlord_overrides(self) -> "Properties": ... +``` + +Services typically take and return `Properties`, not lists. + +### 5.3 Other aggregates + +| Aggregate | Owns | Repo | +|---|---|---| +| `Property` | property identity, epc, site_notes, landlord_overrides, enrichments, modelling results | `PropertyRepo` | +| `Plan` | per-property modelling output, scenario membership, plan + recommendations + parts | `RecommendationsRepo` | +| `Scenario` | portfolio-wide scenario metadata | `RecommendationsRepo` | +| `Subtask` / `Task` | SQS fanout state | `SubtaskRepo` | +| `EpcCache` | gov-API responses keyed by UPRN, with freshness/TTL | `EpcCacheRepo` | +| `GenericData` | UPRN-range geospatial, postcode lookups, shared static data | `GenericDataRepo` | + +Aggregates are loaded **whole** — never half a `Property`. If a slice is too large to load eagerly (e.g. recommendation history), it lives in a separate aggregate. + +--- + +## 6. Source-of-truth and overlay precedence + +There are exactly **two modelling paths**. The `Property.source_path` property selects. + +### 6.1 Path 1 — Site notes + +If a `Property` has `site_notes` and they are newer than any available EPC (or no EPC exists), site notes are the **complete** source of truth: + +- `effective_epc` = `site_notes.to_epc_property_data()`. +- EPC fields not covered by site notes — **none expected**. Site notes are committed to being a full-coverage survey. Treat any gap as a survey-quality bug, not a fallback signal. +- `LandlordOverrides` are not applicable in Path 1 (the survey supersedes). + +### 6.2 Path 2 — EPC with landlord overlay + +If a `Property` has no site notes (or the EPC is newer): + +- `effective_epc` = `epc` with `landlord_overrides` applied as a sparse field-level overlay (`landlord > epc`). +- `LandlordOverrides` are sparse: each row represents one corrected field. Schema TBD at implementation time; assume flat input via Excel/CSV for v1, with a flag to revisit shape after first customer onboarding. + +### 6.3 Recency tie-break + +When a property has **both** site notes and a public EPC, the newer of the two wins. Rationale: a recent EPC may reflect retrofit work done after our survey; conversely a recent survey reflects on-site observations the EPC cannot capture. + +This tie-break is implemented in `Property.source_path` and may be tuned later (e.g. always prefer surveys regardless of date, or per-portfolio policy). + +### 6.4 Rebaselining trigger + +The modelling pipeline re-predicts SAP / carbon / heat / kwh whenever: + +- `effective_epc` differs from the canonical baseline (i.e. raw EPC with no overrides), **or** +- The previous modelling snapshot is missing or stale. + +The exact diff mechanism (hash of effective EPC, dirty-flag on overrides, timestamp comparison) is an implementation detail; recommendation is to start with a content hash stored alongside the previous run. + +### 6.5 Deprecated concepts + +- **Patches** (`patch_epc`) — removed. Functionality subsumed by `LandlordOverrides`. +- **Already-installed measures** — likely subsumed by `LandlordOverrides` ("we have a heat pump now" → override heating fields). Confirmed at implementation time. +- **Non-invasive recommendations** — TBD whether this concept survives; not blocking. + +--- + +## 7. Persistence: repositories and unit of work + +### 7.1 What a repository is + +A repository owns the SQL for one aggregate. Nothing else writes SQL for that aggregate. Callers see only domain objects. + +```python +class PropertyRepo(Protocol): + def get(self, identity: PropertyIdentity) -> Optional[Property]: ... + def bulk_save(self, uow: UnitOfWork, properties: Properties) -> None: ... + def find_by_portfolio(self, portfolio_id: UUID) -> Properties: ... + def find_stale(self, portfolio_id: UUID, threshold: timedelta) -> Properties: ... +``` + +Implementation references current `db_funcs.*` modules during phase 0 to avoid a big-bang SQL rewrite, but the interface is fixed. + +### 7.2 Unit of Work + +Multi-table writes inside a single aggregate, or across aggregates that share a transaction (e.g. property + plan + recommendations) go through a `UnitOfWork`: + +```python +with self.uow_factory() as uow: + self.property_repo.bulk_save(uow, properties) + self.recommendations_repo.bulk_save(uow, plans) + uow.commit() +``` + +UoW owns the SQLAlchemy session lifecycle. Repos use the session passed in via the UoW. Outside a UoW, repos use a short-lived read session. + +### 7.3 Repository inventory + +| Repo | Tables it owns | +|------|----------------| +| `PropertyRepo` | `properties`, `property_details_epc`, `property_spatial` | +| `EpcCacheRepo` | new table: `epc_api_cache` (TTL, raw API response, mapped `EpcPropertyData`) | +| `SiteNotesRepo` | new table: `site_notes` (replaces current `energy_assessments`) | +| `LandlordOverridesRepo` | new table: `landlord_overrides` (sparse, per-field rows for audit) | +| `RecommendationsRepo` | `plans`, `recommendations`, `recommendation_parts`, `scenarios` | +| `GenericDataRepo` | new table or S3-backed: UPRN-range geospatial + postcode-keyed shared static data | +| `SubtaskRepo` | `tasks`, `subtasks` (existing) | + +DDL migrations are scoped to sub-PRD (iii). + +### 7.4 Fakes + +For tests, each repo has a `FakeXRepo` companion backed by a dict. Service unit tests inject fakes. No DB required. + +--- + +## 8. ML contract + +### 8.1 Where ML lives + +| Concern | Owner | +|---|---| +| Defining the EPC → features transform | **This repo** (`ara.domain.ml.EpcMlTransform`) | +| Loading data, applying transform, writing training parquet to S3 | **This repo** (sub-PRD (ii) batch job) | +| Training, hyperparameter search, deployment | **Autogluon repo** | +| Scoring at modelling time | **This repo** (`FeatureBuilder` calls `EpcMlTransform`, sends DataFrame to deployed lambda) | + +The autogluon repo is intentionally **dumb**: it consumes parquet, knows which column is the target, knows which columns to ignore. It has no EPC semantics. + +### 8.2 `EpcMlTransform` + +A separate class (not a method on `EpcPropertyData`), because: + +- The data class stays clean of training-infrastructure concerns. +- Versioned transforms (`EpcMlTransformV1`, `EpcMlTransformV2`) swap easily. +- Future need: injection of normalisation stats from the training set is straightforward on a class, awkward on a dataclass. + +```python +class EpcMlTransform: + VERSION: str = "1.0.0" # semver + + def to_row(self, epc: EpcPropertyData) -> dict[str, Any]: ... + def to_rows(self, properties: Properties) -> pd.DataFrame: ... + def schema(self) -> dict[str, type]: ... # for parquet emission + validation +``` + +The interesting work — flattening `List[SapWindow]`, `List[SapBuildingPart]` into fixed-width columns — lives inside this class. Domain decisions (top-N windows, aggregate roofs, etc.) are encoded here and reviewed by Khalim. Sub-PRD (ii) goes into detail. + +### 8.3 Versioning + +- Transform class is **semver-tagged** (`VERSION = "1.0.0"`). +- S3 path for training parquet includes the version: `s3://.../training/v1.0.0/...`. +- Deployed scoring lambda is tagged with the transform version it was trained against. +- Modelling pipeline asserts at startup that its `EpcMlTransform.VERSION` matches the deployed lambda's tag; mismatch = hard fail at deploy time. + +Bump major when removing or renaming columns. Bump minor when adding optional columns (older models still scoreable; new models can be trained against new fields). + +### 8.4 Two model families, one transform + +Both ML services use the same transform: + +| Service | Lambda | Target | +|---|---|---| +| `KwhPredictionService` (service #5) | `kwh-models-*` | annual kWh + bills | +| `ImpactPredictionService` (service #7) | `impact-models-*` | SAP, carbon, heat demand, post-retrofit kWh | + +The two families are trained against the same input feature schema; only target columns differ. Sub-PRD (ii) handles training-time details. + +--- + +## 9. Service catalogue + +Twelve classes implement the modelling pipeline end-to-end. Detailed signatures are deliberately left for implementers — this PRD documents purpose, dependencies, and rough shape. + +### 9.1 Fetchers (called by `IngestionPipeline`) + +| # | Class | Purpose | Dependencies | +|---|---|---|---| +| F1 | `EpcClientService` | Fetches EPCs from new gov API. Already exists at `backend/epc_client/`. | httpx | +| F2 | `GeospatialFetcher` | Fetches UPRN-range geospatial data (replaces `OpenUprnClient` use in current engine). | S3 / Ordnance Survey API | +| F3 | `SolarFetcher` | Wraps Google Solar API; building-level + unit-level scenes. | Google Solar API | +| F4 | `SiteNotesIngester` | Loads site notes from Excel uploads / structured input. Persists via `SiteNotesRepo`. | S3, repo | + +### 9.2 Domain services (called by `ModellingPipeline`) + +| # | Class | Original-list # | Purpose | Reads | Writes | +|---|---|---|---|---|---| +| S1 | `EpcRemappingService` | 4 | Re-map legacy / historical EPCs into new `EpcPropertyData` shape. | `EpcCacheRepo` | `EpcCacheRepo` (mapped column) | +| S2 | `EpcPredictionService` | 3 | For every property: produce predicted EPC + per-field anomaly flags vs neighbours. Used both for gap-fill (Path 2 if EPC missing) and UI surfacing. | `EpcCacheRepo`, `GenericDataRepo` | — | +| S3 | `FeatureBuilder` | (new) | Wraps `EpcMlTransform`. Converts `Properties` → scoring DataFrame. | — | — | +| S4 | `KwhPredictionService` | 5 | Calls kWh + bills ML lambda; attaches results to `Property.baseline_predictions` / per-measure. | `FeatureBuilder` | — | +| S5 | `RecommendationService` | 6 | Generates per-property recommendations using `effective_epc`, materials, exclusions, etc. Replaces current `Recommendations` (1383 LOC). | `MaterialsRepo` | — | +| S6 | `ImpactPredictionService` | 7 | Calls SAP / carbon / heat / bills impact lambda for each recommendation. | `FeatureBuilder` | — | +| S7 | `OptimiserService` | 8 | Produces optimised retrofit packages. Wraps current `CostOptimiser` / `GainOptimiser` / `optimise_with_scenarios`. | — | — | +| S8 | `ResultsPersister` | 9 | Final step: writes plans, recommendations, property updates via repos under one UoW. | — | All write repos | + +### 9.3 Orchestrators + +| # | Class | Purpose | +|---|---|---| +| O1 | `IngestionPipeline` | Per-batch SQS consumer. Calls F1–F4, persists via repos. | +| O2 | `ModellingPipeline` | Per-batch SQS consumer. Reads from repos, runs S1→S8 in order, ends with persistence. | +| O3 | `RefreshOrchestrator` | Top-level: triggers Ingestion → diff → optionally Modelling. | + +### 9.4 `ModellingPipeline` step order + +For each `Property` in the batch: + +``` +1. PropertyRepo.get() → Property (epc, site_notes, overrides, geospatial, solar) +2. EpcRemappingService — if epc is in legacy schema, upgrade to current +3. EpcPredictionService — produce predicted EPC + anomaly flags (always runs) +4. Compute Property.effective_epc (path-1 or path-2) +5. KwhPredictionService — baseline kwh + bills +6. RecommendationService — generate candidate measures +7. ImpactPredictionService — predict per-measure impact +8. OptimiserService — select optimal package +9. KwhPredictionService — re-score on optimised package (tenant savings) +10. ResultsPersister — write Plan + Recommendations under one UoW +``` + +Steps 1–4 are per-property. Steps 5–9 batch the whole batch into one ML call where possible (the lambdas accept a DataFrame; today's code already batches). + +### 9.5 Per-service contracts — deferred + +Method signatures, return types, error semantics, and edge-case behaviour are **explicitly out of scope** for this PRD. The implementer of each service runs a `/grill-me` session against this document and produces a detailed sub-design before coding. + +--- + +## 10. Cross-batch concerns + +| Concern | Status | Approach | +|---|---|---| +| Building-level solar adjustment | Deferred — future TODO, not implemented today. | The current `building_ids` block in `model_engine` is dead-ish; it operates on the in-process batch only. New design preserves that limitation. Future feature: a post-modelling consolidation pass that groups results by `building_id` across batches and re-optimises. | +| Portfolio aggregation | Dropped. | Front-end computes aggregations dynamically from per-property plans. `extract_portfolio_aggregation_data` in current engine is dead code (defined, never called) — deleting. | +| Shared upstream data | Handled by orchestrator partitioning + `GenericDataRepo`. | Trigger endpoint groups UPRNs by postcode / UPRN-range before SQS chunking so each batch maximises intra-batch sharing. `GenericDataRepo` caches across batches so first batch pays, subsequent batches hit cache. | + +--- + +## 11. Directory layout + +Proposal — team to tweak. + +``` +ara/ # new top-level package, sibling of backend/ +├── domain/ +│ ├── __init__.py +│ ├── property.py # Property aggregate +│ ├── properties.py # Properties collection +│ ├── identity.py # PropertyIdentity, AddressLines +│ ├── site_notes.py # SiteNotes (replaces energy_assessment) +│ ├── landlord_overrides.py +│ ├── geospatial.py +│ ├── solar.py +│ ├── recommendations.py # Recommendation, OptimisedPackage +│ ├── predictions.py # BaselinePredictions, ImpactPredictions +│ ├── anomaly_flags.py # EpcAnomalyFlags +│ └── ml/ +│ ├── __init__.py +│ ├── transform.py # EpcMlTransform (versioned) +│ └── schema.py # scoring DataFrame schema +│ +├── fetchers/ +│ ├── __init__.py +│ ├── epc_client.py # alias / re-export of backend/epc_client/ +│ ├── geospatial.py +│ ├── solar.py +│ └── site_notes_ingester.py +│ +├── repos/ +│ ├── __init__.py +│ ├── unit_of_work.py +│ ├── property_repo.py +│ ├── epc_cache_repo.py +│ ├── site_notes_repo.py +│ ├── landlord_overrides_repo.py +│ ├── recommendations_repo.py +│ ├── generic_data_repo.py +│ └── subtask_repo.py +│ +├── services/ +│ ├── __init__.py +│ ├── epc_remapping.py +│ ├── epc_prediction.py # nearby-similar + anomaly flags +│ ├── feature_builder.py # uses domain.ml.EpcMlTransform +│ ├── kwh_prediction.py +│ ├── impact_prediction.py +│ ├── recommendation.py +│ ├── optimiser.py # wraps recommendations/optimiser/ +│ └── results_persister.py +│ +├── orchestrators/ +│ ├── __init__.py +│ ├── ingestion_pipeline.py +│ ├── modelling_pipeline.py +│ └── refresh_orchestrator.py +│ +├── api/ +│ ├── __init__.py +│ ├── routers/ +│ │ ├── ingestion.py # if two APIs +│ │ └── modelling.py +│ └── schemas/ # request/response Pydantic models +│ +└── tests/ + ├── fakes/ # FakePropertyRepo, FakeEpcClient, etc. + ├── unit/ # service tests using fakes only + └── integration/ # real DB + real SQS via localstack +``` + +`backend/` continues to host the legacy code during phase 1. Once the last portfolio is migrated, `backend/engine/`, `backend/SearchEpc.py`, `backend/Property.py` are deleted. + +Reused intact (no rewrite needed): + +- `backend/epc_client/` — the new gov API client. Wrapped by `ara/fetchers/epc_client.py`. +- `datatypes/epc/domain/` — the new EPC schema. `Property.epc: EpcPropertyData` references it directly. +- `recommendations/optimiser/` — wrapped by `ara/services/optimiser.py`. +- `backend/app/db/` — repos delegate into `db_funcs.*` until the SQL is rewritten under sub-PRD (iii). + +--- + +## 12. Testing strategy + +### 12.1 Unit tests (the bulk) + +Every service test injects fake fetchers and fake repos. No DB, no network, no ML lambda. A service test verifies one slice of logic in 5–30 lines. + +Example: + +```python +def test_epc_prediction_flags_anomalous_wall_type(): + neighbours = [_make_epc(wall_construction="solid") for _ in range(5)] + target = _make_property(epc=_make_epc(wall_construction="cavity")) + repo = FakeGenericDataRepo(neighbours_by_postcode={target.identity.postcode: neighbours}) + + svc = EpcPredictionService(generic_repo=repo) + result = svc.run(Properties([target])) + + assert result[0].epc_anomaly_flags.wall_construction == "differs_from_neighbours" +``` + +### 12.2 Integration tests + +One per pipeline (Ingestion, Modelling, Refresh). Real Postgres (testcontainers or localstack), fake fetchers (hitting recorded fixtures), fake ML lambdas (returning canned predictions). Catches schema / SQL / transaction issues. + +### 12.3 Contract tests + +The transform (`EpcMlTransform`) has its own test suite: + +- Golden file: given a fixed `Property`, output matches an expected DataFrame row exactly. +- Schema test: the output columns exactly match a checked-in CSV header (so autogluon team sees breakage on PR). + +### 12.4 What is NOT tested + +- The autogluon repo's training code — owned there. +- The gov EPC API behaviour — assumed via the official spec. +- Front-end aggregation logic — owned there. + +--- + +## 13. Observability + +Each pipeline step emits a **structured log line** at start and end with: + +``` +{step, property_id, uprn, portfolio_id, subtask_id, duration_ms, outcome, error?} +``` + +Errors propagate with the `Property.identity` attached, so a portfolio of 100k can be triaged by grep. + +The existing task/subtask state machine is preserved — `IngestionPipeline` and `ModellingPipeline` update subtask status at start (`in progress`), end (`complete` / `failed`), with the CloudWatch log URL attached as today. + +CloudWatch alarms exist on subtask failure rate; thresholds remain unchanged. + +--- + +## 14. Data flow: a worked example + +A landlord uploads a corrected heating system for UPRN 12345 via the UI. + +1. **UI** → `POST /properties/12345/overrides` → writes to `landlord_overrides` table via `LandlordOverridesRepo`. +2. **RefreshOrchestrator** invoked (either automatically on override-write, or by a "re-model" button). Notes: ingestion is *not* triggered because no external state changed. +3. **ModellingPipeline** invoked on a batch of `[12345]`: + - Reads `Property(uprn=12345)` from `PropertyRepo`. + - `Property.effective_epc` = epc + landlord_overrides → heating system fields differ from baseline. + - Rebaselining triggered: `KwhPredictionService` re-predicts baseline SAP / carbon / heat / kwh. + - `RecommendationService` regenerates recommendations against the new baseline. + - `OptimiserService` re-picks optimal package. + - `ResultsPersister` writes new plan under one UoW (old plan is superseded; whether to soft-archive is a sub-PRD (iii) decision). + +Total external calls: zero. The override write is the only thing that hit a network boundary, and that was the inbound HTTP from the UI. + +--- + +## 15. Open questions for team review + +1. **One API vs two** (§4.5) — clean interfaces allow either; pick at implementation. +2. **`LandlordOverrides` shape** (§6.2) — flat-Excel-shape for v1, with a flag to revisit after first customer. +3. **`already_installed` and `non_invasive_recommendations`** (§6.5) — both likely subsumed by overlay, but final call deferred. +4. **Recency tie-break policy** (§6.3) — default "newer wins"; team to consider per-portfolio override. +5. **`GenericDataRepo` storage backend** — Postgres table, S3, or DynamoDB. Postgres is the path of least infra change; recommend defaulting to that. +6. **Soft-archive vs hard-overwrite** for superseded plans (§14) — affects audit / undo behaviour. Defer to sub-PRD (iii). +7. **Building-level optimisation as a Phase 2 service** (§10) — agreed deferred; flag for roadmap discussion. +8. **Transform versioning policy** (§8.3) — semver chosen; team to confirm bump conventions. + +--- + +## 16. Linked sub-PRDs (placeholders) + +- **Sub-PRD (ii) — ML training pipeline** — `docs/sub-prds/ml-training-pipeline.md` (TBC) +- **Sub-PRD (iii) — DB schema migration** — `docs/sub-prds/db-schema-migration.md` (TBC) +- **Sub-PRD (iv) — Historical EPC re-mapping** — `docs/sub-prds/historical-epc-remap.md` (TBC) + +Each sub-PRD owner: TBC. Each is independently reviewable but consumes the contracts defined in §5 (`Property` aggregate), §7 (repos), §8 (ML transform). + +--- + +## 17. Next steps + +1. Team review of this PRD (target: ~1 week). +2. Open follow-up grill sessions per service (`/grill-me` on each of S1–S8 + F1–F4) before that service is implemented. +3. Break into issues via `/to-issues` against the project tracker. +4. Stand up the empty `ara/` package skeleton + fakes + first integration-test scaffold as PR-1. +5. Land services in dependency order: domain → repos → fetchers → services → orchestrators → API. + +Phase 1 milestone gate: first portfolio routed through new pipeline with parity against old engine (manual spot-check on 5 representative properties).