From 8d6c770da8d4215f88c0b5b037d86b3181fdc237 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar <kconnkowlessar@gmail.com>
Date: Thu, 14 May 2026 16:36:22 +0000
Subject: [PATCH] grilling session updates to prd

---
 ara_backend_design.md | 350 ++++++++++++++++++++++++++++--------------
 1 file changed, 236 insertions(+), 114 deletions(-)
diff --git a/ara_backend_design.md b/ara_backend_design.md
index 109dd1fa..b6aa8f22 100644
--- a/ara_backend_design.md
+++ b/ara_backend_design.md
@@ -26,6 +26,7 @@ Beyond just swapping API clients, this is the moment to **rebuild the backend in
 - Service boundaries that other team members can read, fix, and extend without needing the entire mental model.
 - Repository-mediated persistence so business logic can be tested without spinning up a database.
 - A separation between **data fetching** (slow, IO-heavy, external) and **modelling** (deterministic, fast, internal).
+- Baseline kWh and bills derived deterministically from the Effective EPC (SAP physics + UCL correction + per-fuel rates from a refreshable repo) rather than from the EPC's stale cost fields or from an ML kWh prediction.
 
 ### 1.3 Out of scope for this PRD
 
@@ -159,14 +160,35 @@ The existing `trigger_plan_entrypoint` SQS-chunking pattern is kept. Both pipeli
 
 UPRN partitioning: the trigger endpoint groups UPRNs by **locality** (postcode prefix / UPRN range) before chunking, so each batch maximises shared upstream fetches (one geospatial-range pull serves all 30 properties in the batch).
 
-### 4.5 One API or two? (deferred)
+### 4.5 One endpoint for v1
 
-The team will decide at implementation time whether Ingestion and Modelling sit behind:
+For Phase 1 we ship **one trigger endpoint** that internally chains Ingestion → Modelling via `RefreshOrchestrator`. This matches the current FastAPI-fronted Lambda pattern (the FastAPI app in `services/<svc>/` is a thin entrypoint that invokes the modelling Lambda).
 
-- **(a) One unified API** with a single trigger endpoint that runs both phases. Most closely mimics what's live today.
-- **(b) Two APIs**, each with its own trigger, RefreshOrchestrator chains them. Separate API call for fetching and modelling.
+We can split into two endpoints later (refresh-only vs model-only) once a real workflow demands it — e.g. a Landlord-Override edit that should re-model without re-fetching open data. The class taxonomy and `RefreshOrchestrator` boundary allow this split without re-architecting.
 
-Either is workable if the class taxonomy is preserved. Deferred to implementation review.
+### 4.6 Trigger contract
+
+The trigger payload is reduced compared to today's `PlanTriggerRequest` ([backend/app/plan/schemas.py:98](../../backend/app/plan/schemas.py#L98)) — most of what's currently in the request body moves into the persisted `Scenario` aggregate.
+
+```python
+class ModelTriggerRequest(BaseModel):
+    portfolio_id: UUID
+    property_ids: list[UUID] | S3Ref           # inline up to ~10k, S3 ref above
+    scenario_ids: list[UUID]                   # 1+; resolved + pinned to ScenarioSnapshot at fan-out
+    task_id: UUID
+    subtask_id: UUID                           # SQS state machine, preserved from today
+```
+
+Every field that used to sit at the top level of the request body is either removed or relocated:
+
+- `goal`, `budget`, `goal_value`, `inclusions`, `exclusions`, `required_measures`, `enforce_fabric_first`, `scenario_name`, `housing_type` → into `Scenario` / `ScenarioPhase`.
+- `patches_file_path`, `already_installed_file_path`, `non_invasive_recommendations_file_path` → gone; Landlord Overrides covers all three.
+- `valuation_file_path` → gone; `ValuationService` derives it.
+- `ashp_cop`, `default_u_values` → `HeatingSystemAssumptionsRepo` / global config; not per-trigger.
+- `multi_plan` → gone; `scenario_ids: list[...]` handles N runs natively (one Plan per scenario per property).
+- `event_type`, `epc_certificate_number`, `lmk_key`, `file_format`, `sheet_name`, `index_start`/`index_end`, `file_type` → ingestion-side concerns; if needed, ride on a separate ingestion-trigger payload.
+
+**Scenario snapshotting**: at fan-out time `RefreshOrchestrator` reads each requested `Scenario`, writes a `ScenarioSnapshot` keyed by `(task_id, scenario_id)`, and per-batch SQS messages reference the snapshot. Mid-run edits to the live `Scenario` do not affect an in-flight modelling job. Snapshots are read-only and can be garbage-collected after the task completes.
 
 ---
 
@@ -200,10 +222,10 @@ class Property:
     epc_anomaly_flags: Optional[EpcAnomalyFlags]  # from EpcPredictionService vs neighbours
 
     # --- Modelling outputs ---
-    baseline_performance: Optional[BaselinePerformance]   # SAP/carbon/heat (from EPC or rebaselined ML) + kWh + fuel split (always EPC + UCL + fuel deduction)
+    baseline_performance: Optional[BaselinePerformance]   # carries lodged + effective pair; see §5.4
     recommendations: list[Recommendation]
     impact_predictions: Optional[ImpactPredictions]
-    optimised_package: Optional[OptimisedPackage]
+    plans: list[Plan]                                     # one per Scenario the property was modelled against
 
     # --- Derived ---
     @property
@@ -238,14 +260,51 @@ Services typically take and return `Properties`, not lists.
 | Aggregate | Owns | Repo |
 |---|---|---|
 | `Property` | property identity, epc, site_notes, landlord_overrides, enrichments, modelling results | `PropertyRepo` |
-| `Plan` | per-property modelling output, scenario membership, plan + recommendations + parts | `RecommendationsRepo` |
-| `Scenario` | portfolio-wide scenario metadata | `RecommendationsRepo` |
+| `Plan` | per-property modelling output for one Scenario: ordered `phases: list[PlanPhase]`, each carrying its `OptimisedPackage`, ending state snapshot, and rolled-over options | `RecommendationsRepo` |
+| `Scenario` | portfolio-wide scenario metadata (goal, budget, exclusions, housing type) plus ordered `phases: list[ScenarioPhase]`; each phase carries `measure_types_allowed`, phase budget, phase target | `RecommendationsRepo` |
+| `ScenarioSnapshot` | frozen copy of a `Scenario` pinned at trigger time, keyed by `(task_id, scenario_id)`, so mid-run scenario edits don't affect an in-flight modelling job | `RecommendationsRepo` |
 | `Subtask` / `Task` | SQS fanout state | `SubtaskRepo` |
 | `EpcCache` | gov-API responses keyed by UPRN, with freshness/TTL | `EpcCacheRepo` |
 | `GenericData` | UPRN-range geospatial, postcode lookups, shared static data | `GenericDataRepo` |
+| `FuelRates` | time-versioned, region-aware per-fuel rates (pence/kWh), standing charges, SEG export rate, calorific values | `FuelRatesRepo` |
+| `CarbonFactors` | time-versioned per-fuel CO2 emission factors (kgCO2e/kWh); Defra publishes annually | `CarbonFactorsRepo` |
+| `HeatingSystemAssumptions` | boiler efficiency tables, ASHP/GSHP COPs, solar-thermal coverage proportion; per-property physical assumptions, not fuel-market data | `HeatingSystemAssumptionsRepo` |
 
 Aggregates are loaded **whole** — never half a `Property`. If a slice is too large to load eagerly (e.g. recommendation history), it lives in a separate aggregate.
 
+A single-phase Scenario is `phases: [<one ScenarioPhase>]` with all measure types allowed and the full budget on it — no special-case path through the pipeline.
+
+### 5.4 `BaselinePerformance` carries lodged + effective
+
+```python
+@dataclass
+class BaselinePerformance:
+    # As-lodged: unmodified EPC fields (or Site Notes' recorded values where Site Notes are the source).
+    lodged_sap: int
+    lodged_band: Epc
+    lodged_carbon: float
+    lodged_heat_demand: float
+
+    # Effective: what the modelling pipeline actually scored against.
+    # Equals lodged when neither rebaselining trigger fires; equals ML output when rebaselined.
+    effective_sap: int
+    effective_band: Epc
+    effective_carbon: float
+    effective_heat_demand: float
+
+    # kWh / fuel split / bills — always derived deterministically from the Effective EPC by
+    # EpcEnergyDerivationService (SAP physics + UCL correction + FuelRates lookup).
+    # Lodged kWh / bills are not stored separately — the EPC's cost fields are stale by design.
+    annual_kwh: float
+    fuel_split: dict[Fuel, float]
+    annual_bills: dict[Fuel, float]
+
+    rebaselined: bool
+    rebaseline_reason: Optional[Literal["pre_sap10", "physical_state_changed", "both"]]
+```
+
+The pair lets the FE show "lodged rating vs SAP10-equivalent rebaselined rating" side by side without a separate query. Both the `lodged_*` and `effective_*` fields are always populated; when no rebaselining trigger fires, `effective_*` equals `lodged_*`.
+
 ---
 
 ## 6. Source-of-truth and overlay precedence
@@ -275,12 +334,16 @@ This tie-break is implemented in `Property.source_path` and may be tuned later (
 
 ### 6.4 Rebaselining trigger
 
-The modelling pipeline re-predicts SAP / carbon / heat / kwh whenever:
+ML re-predicts SAP / carbon / heat when **either** of these holds:
 
-- `effective_epc` differs from the canonical baseline (i.e. raw EPC with no overrides), **or**
-- The previous modelling snapshot is missing or stale.
+1. **Pre-SAP10 schema** — `effective_epc.sap_version < 10.0`. The EPC was rated under SAP 2012 (or earlier) and we want a SAP10-equivalent baseline so all properties are scored against the same model version. Canonical signal is the `sap_version: float` field; fall back to `schema_type` string, then to `lodgement_date` if both are absent. Site Notes are assumed SAP10 by construction (PasHub / ECMK produce them now) — Path 1 typically doesn't trigger this leg.
+2. **Physical state changed** — `effective_epc` differs from the lodged EPC's physical fields (walls / heating / windows / etc.). Triggered by Landlord Overrides changing physical state, or by Site Notes that contradict the lodged EPC.
 
-The exact diff mechanism (hash of effective EPC, dirty-flag on overrides, timestamp comparison) is an implementation detail; recommendation is to start with a content hash stored alongside the previous run.
+When triggered, a single ML call re-predicts SAP/carbon/heat with the current Effective EPC state as input. Both reasons can fire together; the prediction is still one call.
+
+kWh is **always** re-derived via `EpcEnergyDerivationService` — even when no ML rebaseline runs, because fuel rates change over time and the EPC's cost fields are stale by design.
+
+The diff mechanism for "physical state changed" (content hash, dirty flag, etc.) is an implementation detail; start with a content hash of the physical-state subset of `EpcPropertyData` stored alongside the previous run.
 
 ### 6.5 Deprecated concepts
 
@@ -327,8 +390,11 @@ UoW owns the SQLAlchemy session lifecycle. Repos use the session passed in via t
 | `EpcCacheRepo` | new table: `epc_api_cache` (TTL, raw API response, mapped `EpcPropertyData`) |
 | `SiteNotesRepo` | new table: `site_notes` (replaces current `energy_assessments`) |
 | `LandlordOverridesRepo` | new table: `landlord_overrides` (sparse, per-field rows for audit) |
-| `RecommendationsRepo` | `plans`, `recommendations`, `recommendation_parts`, `scenarios` |
+| `RecommendationsRepo` | `plans`, `plan_phases`, `recommendations`, `recommendation_parts`, `scenarios`, `scenario_phases`, `scenario_snapshots` |
 | `GenericDataRepo` | new table or S3-backed: UPRN-range geospatial + postcode-keyed shared static data |
+| `FuelRatesRepo` | new table: `fuel_rates` — `(fuel_type, rate_pence_per_kwh, standing_charge_pence_per_day, calorific_value_kwh_per_unit, unit, effective_from, effective_to, region_code Optional, source)`. SEG export rate is a row with `fuel_type = 'electricity_export'`. |
+| `CarbonFactorsRepo` | new table: `carbon_factors` — `(fuel_type, kgco2e_per_kwh, effective_from, effective_to, source)`. Defra publishes annually. |
+| `HeatingSystemAssumptionsRepo` | new table(s): boiler efficiency, ASHP/GSHP COP, solar-thermal coverage proportion. Static-ish, manual refresh. |
 | `SubtaskRepo` | `tasks`, `subtasks` (existing) |
 
 DDL migrations are scoped to sub-PRD (iii).
@@ -380,14 +446,16 @@ The interesting work — flattening `List[SapWindow]`, `List[SapBuildingPart]` i
 
 Bump major when removing or renaming columns. Bump minor when adding optional columns (older models still scoreable; new models can be trained against new fields).
 
-### 8.4 Two model families, one transform
+### 8.4 ML model families
 
-Both ML services use the same transform:
+Both ML calls (rebaselining + per-measure impact) use the same `EpcMlTransform`:
 
 | Service | Lambda | Target |
 |---|---|---|
-| `KwhImpactService` (service #5) | `kwh-models-*` | per-measure annual kWh + bills delta (post-optimisation re-score only) |
-| `ImpactPredictionService` (service #7) | `impact-models-*` | SAP, carbon, heat demand per-measure impact |
+| `RebaseliningService` (S4b) | `baseline-models-*` | SAP / carbon / heat demand under the current Effective EPC state (SAP10-equivalent) |
+| `ImpactPredictionService` (S6) | `impact-models-*` | SAP / carbon / heat demand impact per measure (and per battery option, using new EPC battery fields) |
+
+Annual kWh and bills are never an ML target; they are derived deterministically by `EpcEnergyDerivationService` (S4a). The per-recommendation kWh delta is derived from the SAP delta predicted by S6 plus heating-system fuel + COP, not via a separate ML call.
 
 The two families are trained against the same input feature schema; only target columns differ. Sub-PRD (ii) handles training-time details.
 
@@ -395,16 +463,20 @@ The two families are trained against the same input feature schema; only target
 
 ## 9. Service catalogue
 
-Twelve classes implement the modelling pipeline end-to-end. Detailed signatures are deliberately left for implementers — this PRD documents purpose, dependencies, and rough shape.
+The classes below implement the pipeline end-to-end. Detailed signatures are deliberately left for implementers — this PRD documents purpose, dependencies, and rough shape; per-service grill sessions produce the contracts.
+
+**Out of the legacy engine** (deleted, not migrated): `PredictionMatrix` (debug-only, moves to test fixtures), `extract_portfolio_aggregation_data` (dead code, FE aggregates dynamically per §10), inspections plumbing (`inspections_map` is initialised but never populated in the current engine), patches / `already_installed` / `non_invasive_recommendations` (subsumed by Landlord Overrides), ECO4 / WHLG funding integration (`get_funding_data` and `optimise_with_scenarios`' funding paths), the pre-recommendation kWh ML lambda (`KWH_MODEL_PREFIXES`), and floor-count / heat-loss-perimeter estimation from geospatial (now on `EpcPropertyData`). Address matching (`address2uprn`) lives as a separate service, not inside `EpcClientService`.
 
 ### 9.1 Fetchers (called by `IngestionPipeline`)
 
 | # | Class | Purpose | Dependencies |
 |---|---|---|---|
-| F1 | `EpcClientService` | Fetches EPCs from new gov API. Already exists at `backend/epc_client/`. | httpx |
-| F2 | `GeospatialFetcher` | Fetches UPRN-range geospatial data (replaces `OpenUprnClient` use in current engine). | S3 / Ordnance Survey API |
+| F1 | `EpcClientService` | Fetches EPCs from new gov API. Already exists at `backend/epc_client/`. Scope narrows compared to current `SearchEpc` — address matching (`address2uprn`) and OS API estimation are not its concern. | httpx |
+| F2 | `GeospatialFetcher` | Fetches UPRN-range geospatial data. Replaces `OpenUprnClient`. **Floor count and heat-loss perimeter estimation are no longer needed** — both are now on `EpcPropertyData` directly (`number_of_storeys`, `SapFloorDimension.heat_loss_perimeter_m`). Scope reduces to building geometry and postcode-area context. | S3 / Ordnance Survey API |
 | F3 | `SolarFetcher` | Wraps Google Solar API; building-level + unit-level scenes. | Google Solar API |
 | F4 | `SiteNotesIngester` | Loads site notes from Excel uploads / structured input. Persists via `SiteNotesRepo`. | S3, repo |
+| F5 | `FuelRatesFetcher` | Scheduled ETL — scrapes Ofgem regional caps and per-fuel rates, writes timeseries rows to `FuelRatesRepo`. Manual CSV upload fallback for off-cycle corrections. | Ofgem feed, repo |
+| F6 | `CarbonFactorsFetcher` | Same shape as F5 against Defra's annual CO2 factor publication. | Defra feed, repo |
 
 ### 9.2 Domain services (called by `ModellingPipeline`)
 
@@ -413,13 +485,13 @@ Twelve classes implement the modelling pipeline end-to-end. Detailed signatures
 | S1 | `EpcRemappingService` | 4 | Re-map legacy / historical EPCs into new `EpcPropertyData` shape. | `EpcCacheRepo` | `EpcCacheRepo` (mapped column) |
 | S2 | `EpcPredictionService` | 3 | For every property: produce predicted EPC + per-field anomaly flags vs neighbours. Used both for gap-fill (Path 2 if EPC missing) and UI surfacing. | `EpcCacheRepo`, `GenericDataRepo` | — |
 | S3 | `FeatureBuilder` | (new) | Wraps `EpcMlTransform`. Converts `Properties` → scoring DataFrame. | — | — |
-| S4a | `EpcEnergyDerivationService` | (new) | Derives baseline kWh + fuel split + bills from the Effective EPC's energy fields (`energy_consumption_current`, `heating_cost_current`, `hot_water_cost_current`). Applies UCL-style correction for known EPC over/under-prediction, then deduces fuel type (gas/electric/other) for heating + hot water to split consumption. Deterministic, no ML. | — | — |
-| S4b | `RebaseliningService` | (new, partial overlap with old "rebaselining" logic) | When the Effective EPC's physical state differs from the originally lodged EPC (Site Notes or Landlord Overrides applied), calls SAP/carbon/heat ML lambdas to produce new baseline values. kWh under the new state is re-derived via `EpcEnergyDerivationService`, not ML. | `FeatureBuilder` | — |
-| S5 | `RecommendationService` | 6 | Generates per-property recommendations using `effective_epc`, materials, exclusions, etc. Replaces current `Recommendations` (1383 LOC). | `MaterialsRepo` | — |
-| S6 | `ImpactPredictionService` | 7 | Calls SAP / carbon / heat impact lambda for each recommendation. | `FeatureBuilder` | — |
-| S6b | `KwhImpactService` | 5 (partial) | Calls kWh ML lambda to predict the kWh delta per recommendation; used to compute bill savings on the optimised package. | `FeatureBuilder` | — |
-| S7 | `OptimiserService` | 8 | Produces optimised retrofit packages. Wraps current `CostOptimiser` / `GainOptimiser` / `optimise_with_scenarios`. | — | — |
-| S8 | `ResultsPersister` | 9 | Final step: writes plans, recommendations, property updates via repos under one UoW. | — | All write repos |
+| S4a | `EpcEnergyDerivationService` | (new) | Derives annual kWh + fuel split + bills from the Effective EPC. Deterministic, no ML. Pipeline: (1) source regulated PEUI — either from `energy_consumption_current × floor_area` when EPC field present and no physical override, or from SAP physics (heat demand × area + SAP hot-water + SAP lighting) for Site Notes / overridden cases; (2) add appliance + cooking via SAP Appendix L formulas (port of [`AnnualBillSavings.estimate_appliances_energy_use`](../../backend/ml_models/AnnualBillSavings.py)); (3) apply UCL per-band correction (Few et al. 2023, Table 3), keyed on the **post-state Effective EPC's band** — not the lodged band; (4) decompose total PEUI into end-use shares via SAP-physics proportions; (5) primary→delivered per fuel using SAP primary factors; (6) bills = delivered kWh per fuel × current rate from `FuelRatesRepo` + standing charges + SEG credits. CO2 emissions from `CarbonFactorsRepo`. | `FuelRatesRepo`, `CarbonFactorsRepo`, `HeatingSystemAssumptionsRepo` | — |
+| S4b | `RebaseliningService` | (new, partial overlap with old "rebaselining" logic) | Triggered by §6.4 conditions (pre-SAP10 schema **or** physical state changed). Calls SAP/carbon/heat ML lambdas to produce SAP10-equivalent baseline against the current Effective EPC state. Both `BaselinePerformance.lodged_*` and `effective_*` are populated downstream — pair is always stored, equal when not rebaselined. kWh is re-derived via S4a, not ML. | `FeatureBuilder` | — |
+| S5 | `RecommendationService` | 6 | Generates per-property recommendations against the current rolling Effective EPC. Invoked **once per (scenario × phase)** — filters candidates to the phase's `measure_types_allowed`, returns candidates eligible against the post-prior-phase state. Replaces current `Recommendations` (1383 LOC). | `MaterialsRepo` | — |
+| S6 | `ImpactPredictionService` | 7 | Calls SAP / carbon / heat impact ML lambda for **every** candidate recommendation (FE displays all options to user). Invoked per (scenario × phase) with the rolling state's feature vector. Recommendation kWh delta is derived deterministically from SAP delta + heating-system fuel/COP, not from a separate ML call. Battery impact uses the new EPC battery fields (`energy_pv_battery_count`, `energy_pv_battery_capacity`) as ML inputs — the deterministic `BatterySAPScorer` from the legacy engine is replaced by ML prediction. | `FeatureBuilder` | — |
+| S7 | `OptimiserService` | 8 | Per-phase optimisation against rolling state. Reads `PlanPhase.state_at_end[n-1]` to honour cross-phase constraints (fabric-first, heat-pump-needs-insulation, ventilation). Wraps current `CostOptimiser` / `GainOptimiser` / `optimise_with_scenarios` minus the dead ECO-funding paths. Unselected candidates roll into phase n+1's candidate pool (auto vs user-marked TBD, §15). | — | — |
+| S8 | `ValuationService` | — | Estimates per-property valuation (current + post-retrofit) from academic-paper-based regression on EPC change, property type, region. Improvement on the existing `PropertyValuation.estimate` code — exact shape deferred to per-service grill. | — | — |
+| S9 | `ResultsPersister` | 9 | Final step: writes Plan (with `phases[]`) + Recommendations + Property updates via repos under one UoW, per scenario. | — | All write repos |
 
 ### 9.3 Orchestrators
 
@@ -431,25 +503,42 @@ Twelve classes implement the modelling pipeline end-to-end. Detailed signatures
 
 ### 9.4 `ModellingPipeline` step order
 
-For each `Property` in the batch:
+For each `Property` in the batch, against each pinned `ScenarioSnapshot` from the trigger payload:
 
 ```
-1.  PropertyRepo.get()  →  Property (epc, site_notes, overrides, geospatial, solar)
-2.  EpcRemappingService — if epc is in legacy schema, upgrade to current
-3.  EpcPredictionService — produce predicted EPC + anomaly flags (always runs)
-4.  Compute Property.effective_epc (path-1 or path-2)
-5.  RebaseliningService — IF effective_epc differs from lodged EPC, re-predict SAP/carbon/heat via ML
-6.  EpcEnergyDerivationService — derive baseline kWh + fuel split + bills from the (possibly rebaselined) Effective EPC. No ML.
-7.  RecommendationService — generate candidate measures
-8.  ImpactPredictionService — predict per-measure SAP/carbon/heat impact (ML)
-9.  OptimiserService — select optimal package
-10. KwhImpactService — predict kWh + bill delta for the optimised package (ML)
-11. ResultsPersister — write Plan + Recommendations under one UoW
+Per-property setup (runs once regardless of scenario count):
+  1.  PropertyRepo.get()  →  Property (epc, site_notes, overrides, geospatial, solar)
+  2.  EpcRemappingService — if epc is in legacy schema, upgrade to current
+  3.  EpcPredictionService — predicted EPC + per-field anomaly flags (always runs)
+  4.  Compute Property.effective_epc (path-1 or path-2)
+  5.  RebaseliningService — IF §6.4 conditions hold (pre-SAP10 OR physical state changed),
+                            re-predict SAP/carbon/heat via ML against the Effective EPC state.
+                            Populate BaselinePerformance.lodged_* + effective_*.
+  6.  EpcEnergyDerivationService — SAP-physics + UCL (post-state band) + FuelRates → kWh, fuel split, bills.
+
+Per-scenario loop:
+  Per-phase loop (in scenario phase order):
+    7.  RecommendationService — generate candidate measures, restricted to phase's measure_types_allowed,
+                                 against the rolling Effective EPC state (baseline for phase 1; updated for phase 2+).
+    8.  ImpactPredictionService — predict SAP/carbon/heat impact for those candidates, ML scored against
+                                   the rolling state's feature vector. All candidates scored (FE shows options).
+    9.  OptimiserService — select package within phase budget + phase goal. Reads earlier-phase state to honour
+                            cross-phase constraints (fabric-first, heat-pump-needs-insulation, ventilation).
+    10. Apply package → roll state forward (simulate post-package SAP / kWh / bills via S4a + impact predictions
+                        from step 8). Record `PlanPhase.state_at_end`. Unselected options become
+                        `PlanPhase.rolled_over_options` and are eligible candidates next phase.
+  11. ResultsPersister — write Plan (phases[]) + Recommendations under one UoW for this scenario.
 ```
 
-Steps 1–4 are per-property. Steps 5, 8, 10 batch the whole batch into one ML call where possible (the lambdas accept a DataFrame; today's code already batches). Steps 6 and 7 are deterministic per-property.
+Steps 1–6 run **once per property** regardless of scenario count.
+Steps 7–10 run **once per (scenario × phase)** per property.
+Step 11 runs once per scenario per property.
 
-Note vs the current `model_engine`: the **pre-recommendation** kWh ML call has been removed. Baseline kWh now comes from the Effective EPC directly (the new gov EPC API exposes `energy_consumption_current` and per-end-use cost fields). ML is reserved for **post-recommendation impact prediction** only.
+Batching: steps 5 and 8 each send the whole batch to the ML lambda in one call where possible. Step 8's cost scales with `N_phases × N_scenarios × N_candidate_measures`, so multi-phase scenarios pay proportionally more in ML calls, while single-phase scenarios cost the same as today.
+
+Note vs the current `model_engine`: the **pre-recommendation** kWh ML call has been removed. Baseline kWh now comes from `EpcEnergyDerivationService` (SAP physics + UCL + FuelRates). ML is reserved for SAP/carbon/heat (rebaselining + impact prediction). Recommendation-level kWh delta is derived deterministically from the impact-predicted SAP delta plus heating-system fuel + COP from `HeatingSystemAssumptionsRepo`; no separate kWh ML lambda.
+
+**Open future change** (flagged §15): SAP-impact-of-a-measure is not strictly additive — installing measure A changes the SAP impact of measure B. The current per-measure ML scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages, accepting the combinatorial cost in return for accuracy. Defer until implementation reveals where the approximation hurts.
 
 ### 9.5 Per-service contracts — deferred
 
@@ -467,85 +556,113 @@ Method signatures, return types, error semantics, and edge-case behaviour are **
 
 ---
 
-## 11. Directory layout
+## 11. Repository layout — monorepo via uv workspaces
 
-Proposal — team to tweak.
+The repo is restructured as a Python monorepo using **uv workspaces**. Shared types and shared infra live as workspace packages under `packages/`; each deployable Lambda or microservice lives as its own package under `services/`. Each `services/<svc>/` has its own `pyproject.toml`, `Dockerfile`, and Lambda image — the bundle contains only that service's deps + its workspace deps, keeping cold-start size and package weight contained.
 
 ```
-ara/                                  # new top-level package, sibling of backend/
-├── domain/
-│   ├── __init__.py
-│   ├── property.py                   # Property aggregate
-│   ├── properties.py                 # Properties collection
-│   ├── identity.py                   # PropertyIdentity, AddressLines
-│   ├── site_notes.py                 # SiteNotes (replaces energy_assessment)
-│   ├── landlord_overrides.py
-│   ├── geospatial.py
-│   ├── solar.py
-│   ├── recommendations.py            # Recommendation, OptimisedPackage
-│   ├── predictions.py                # BaselinePredictions, ImpactPredictions
-│   ├── anomaly_flags.py              # EpcAnomalyFlags
-│   └── ml/
-│       ├── __init__.py
-│       ├── transform.py              # EpcMlTransform (versioned)
-│       └── schema.py                 # scoring DataFrame schema
+/
+├── pyproject.toml                      # workspace root
+├── uv.lock
 │
-├── fetchers/
-│   ├── __init__.py
-│   ├── epc_client.py                 # alias / re-export of backend/epc_client/
-│   ├── geospatial.py
-│   ├── solar.py
-│   └── site_notes_ingester.py
+├── packages/                           # shared workspace packages — imported by services/
+│   ├── domain/                         # "domna-domain"
+│   │   ├── pyproject.toml
+│   │   └── src/domain/
+│   │       ├── property.py             # Property, Properties, PropertyIdentity
+│   │       ├── site_notes.py
+│   │       ├── landlord_overrides.py
+│   │       ├── baseline_performance.py # lodged + effective pair
+│   │       ├── plan.py                 # Plan, PlanPhase, OptimisedPackage
+│   │       ├── scenario.py             # Scenario, ScenarioPhase, ScenarioSnapshot
+│   │       ├── recommendation.py
+│   │       ├── geospatial.py
+│   │       ├── solar.py
+│   │       ├── anomaly_flags.py
+│   │       └── ml/
+│   │           ├── transform.py        # EpcMlTransform (versioned)
+│   │           └── schema.py
+│   │
+│   ├── repos/                          # "domna-repos" — persistence, no business logic
+│   │   ├── pyproject.toml
+│   │   └── src/repos/
+│   │       ├── unit_of_work.py
+│   │       ├── property_repo.py
+│   │       ├── epc_cache_repo.py
+│   │       ├── site_notes_repo.py
+│   │       ├── landlord_overrides_repo.py
+│   │       ├── recommendations_repo.py
+│   │       ├── generic_data_repo.py
+│   │       ├── fuel_rates_repo.py
+│   │       ├── carbon_factors_repo.py
+│   │       ├── heating_system_assumptions_repo.py
+│   │       └── subtask_repo.py
+│   │
+│   ├── fetchers/                       # "domna-fetchers" — external API clients
+│   │   ├── pyproject.toml
+│   │   └── src/fetchers/
+│   │       ├── epc_client.py           # wraps backend/epc_client/
+│   │       ├── geospatial.py
+│   │       ├── solar.py
+│   │       ├── fuel_rates_fetcher.py
+│   │       └── carbon_factors_fetcher.py
+│   │
+│   └── utils/                          # "domna-utils" — logging, AWS, S3, cloudwatch, subtasks
+│       ├── pyproject.toml
+│       └── src/utils/
 │
-├── repos/
-│   ├── __init__.py
-│   ├── unit_of_work.py
-│   ├── property_repo.py
-│   ├── epc_cache_repo.py
-│   ├── site_notes_repo.py
-│   ├── landlord_overrides_repo.py
-│   ├── recommendations_repo.py
-│   ├── generic_data_repo.py
-│   └── subtask_repo.py
+├── services/                           # deployable units, one Lambda image each
+│   ├── ara/                            # the modelling backend
+│   │   ├── pyproject.toml              # deps: domna-domain, domna-repos, domna-fetchers, domna-utils, ML libs
+│   │   ├── Dockerfile
+│   │   ├── src/ara/
+│   │   │   ├── services/               # EpcRemappingService, EpcPredictionService,
+│   │   │   │                           # EpcEnergyDerivationService, RebaseliningService,
+│   │   │   │                           # FeatureBuilder, RecommendationService,
+│   │   │   │                           # ImpactPredictionService, OptimiserService,
+│   │   │   │                           # ValuationService, ResultsPersister
+│   │   │   ├── orchestrators/          # IngestionPipeline, ModellingPipeline, RefreshOrchestrator
+│   │   │   └── lambdas/                # handler.py per Lambda + event-shape contracts
+│   │   └── tests/
+│   │       ├── fakes/                  # FakePropertyRepo, FakeEpcClient, etc.
+│   │       ├── unit/                   # service tests using fakes only
+│   │       └── integration/            # real DB + real SQS via localstack
+│   │
+│   ├── address2uprn/                   # messy-address → UPRN matching, pre-modelling step
+│   │   ├── pyproject.toml
+│   │   ├── Dockerfile
+│   │   └── src/address2uprn/
+│   ├── hubspot/                        # existing Hubspot ETL
+│   ├── pashub/                         # PasHub survey ingestion
+│   ├── ecmk/                           # ECMK assessment ingestion
+│   └── magicplan/                      # MagicPlan integration
 │
-├── services/
-│   ├── __init__.py
-│   ├── epc_remapping.py
-│   ├── epc_prediction.py             # nearby-similar + anomaly flags
-│   ├── feature_builder.py            # uses domain.ml.EpcMlTransform
-│   ├── kwh_prediction.py
-│   ├── impact_prediction.py
-│   ├── recommendation.py
-│   ├── optimiser.py                  # wraps recommendations/optimiser/
-│   └── results_persister.py
+├── backend/                            # legacy FastAPI app + microservices, kept until cut-over
+│   ├── app/                            # FastAPI; thin entrypoints that invoke service Lambdas
+│   └── ...                             # legacy engine, SearchEpc, etc.; deleted after cut-over
 │
-├── orchestrators/
-│   ├── __init__.py
-│   ├── ingestion_pipeline.py
-│   ├── modelling_pipeline.py
-│   └── refresh_orchestrator.py
-│
-├── api/
-│   ├── __init__.py
-│   ├── routers/
-│   │   ├── ingestion.py              # if two APIs
-│   │   └── modelling.py
-│   └── schemas/                      # request/response Pydantic models
-│
-└── tests/
-    ├── fakes/                        # FakePropertyRepo, FakeEpcClient, etc.
-    ├── unit/                         # service tests using fakes only
-    └── integration/                  # real DB + real SQS via localstack
+├── datatypes/                          # existing — EPC schemas; eventually folds into packages/domain/
+└── docs/
+    └── adr/                            # architectural decision records
 ```
 
-`backend/` continues to host the legacy code until the new pipeline is live. Once `model_engine` is no longer serving any traffic, `backend/engine/`, `backend/SearchEpc.py`, and the legacy `backend/Property.py` are deleted.
+**Boundary properties** (enforced by package structure, not convention):
+- A `services/<svc>/` package can import from `domain.*`, `repos.*`, `fetchers.*`, and `utils.*`. It **cannot** import another service's modules — services are separate distributions that do not depend on one another, so there is no cross-import path.
+- ADR-0003 (Ingestion / Modelling separation) is preserved: modelling services in `services/ara/src/ara/services/` depend only on `repos.*` + `domain.*`, never on fetchers. Orchestrators are the only place fetchers and services meet.
 
-Reused intact (no rewrite needed):
+**Migration** (incremental, not big-bang):
+1. Carve out `packages/domain/` first — fold `datatypes/epc/domain/` + the new aggregate types into it.
+2. Carve out `packages/utils/` from current `utils/` + `backend/utils/`.
+3. Carve out `packages/repos/` and `packages/fetchers/` once `services/ara/` is being built and needs them.
+4. `services/ara/` is greenfield — no legacy code lives in it.
+5. `services/address2uprn/`, `services/pashub/`, etc. are split out as their owners pick them up.
+6. `backend/` shrinks to the FastAPI entrypoint layer once everything else has moved.
 
-- `backend/epc_client/` — the new gov API client. Wrapped by `ara/fetchers/epc_client.py`.
-- `datatypes/epc/domain/` — the new EPC schema. `Property.epc: EpcPropertyData` references it directly.
-- `recommendations/optimiser/` — wrapped by `ara/services/optimiser.py`.
-- `backend/app/db/` — repos delegate into `db_funcs.*` until the SQL is rewritten under sub-PRD (iii).
+**Reused intact** (no rewrite needed at carve-out time):
+- `backend/epc_client/` → folds into `packages/fetchers/src/fetchers/epc_client.py`.
+- `datatypes/epc/domain/` → folds into `packages/domain/src/domain/epc/`.
+- `recommendations/optimiser/` → wrapped by `services/ara/src/ara/services/optimiser.py`.
+- `backend/app/db/` → repos delegate into `db_funcs.*` until SQL is rewritten under sub-PRD (iii).
 
 ---
 
@@ -625,7 +742,7 @@ Total external calls: zero. The override write is the only thing that hit a netw
 
 ## 15. Open questions for team review
 
-1. **One API vs two** (§4.5) — clean interfaces allow either; pick at implementation.
+1. **One endpoint vs two** (§4.5) — **resolved**: single endpoint for Phase 1; split later when a real workflow demands it.
 2. **`LandlordOverrides` shape** (§6.2) — flat-Excel-shape for v1, with a flag to revisit after first customer.
 3. **`already_installed` and `non_invasive_recommendations`** (§6.5) — both likely subsumed by overlay, but final call deferred.
 4. **Recency tie-break policy** (§6.3) — default "newer wins"; team to consider per-portfolio override.
@@ -633,9 +750,14 @@ Total external calls: zero. The override write is the only thing that hit a netw
 6. **Soft-archive vs hard-overwrite** for superseded plans (§14) — affects audit / undo behaviour. Defer to sub-PRD (iii).
 7. **Building-level optimisation as a Phase 2 service** (§10) — agreed deferred; flag for roadmap discussion.
 8. **Transform versioning policy** (§8.3) — semver chosen; team to confirm bump conventions.
-9. **UCL EPC-correction model** (§9.2 S4a) — need the reference paper, the implementation we've used before, and a decision on whether to port directly or re-implement against the new EPC schema.
-10. **Fuel-price source for bill calculation** (§9.2 S4a) — Ofgem caps? Time-varying? Per-portfolio override? Decide alongside `EpcEnergyDerivationService` design.
-11. **kWh handling under Rebaselining** (§9.4 step 5) — confirmed: ML re-predicts SAP/carbon/heat only; `EpcEnergyDerivationService` re-runs for kWh. Validate that this is sufficient when overrides change heating fuel type (which would shift the fuel deduction).
+9. **UCL EPC-correction model** (§9.2 S4a) — **resolved**: Few et al. 2023 (Energy and Buildings 288, 113024). The implementation pattern already exists in [`AnnualBillSavings.adjust_energy_to_metered`](../../backend/ml_models/AnnualBillSavings.py): port the per-band gradients/intercepts (Table 3) into `EpcEnergyDerivationService`, keyed on the post-state Effective EPC band.
+10. **Fuel-price source for bill calculation** (§9.2 S4a) — **resolved**: `FuelRatesRepo` is a time-versioned, region-aware table, populated by `FuelRatesFetcher` (Ofgem feed, with manual upload as fallback). Per-portfolio override is deferred to v2; confirm whether Calico / Hyde have bulk-buy contracts before first onboarding.
+11. **kWh handling under Rebaselining** (§9.4) — **resolved**: ML re-predicts SAP/carbon/heat only; `EpcEnergyDerivationService` re-derives kWh from the rebaselined Effective EPC. Heating-fuel-type change is handled naturally because S4a re-reads heating fields from the Effective EPC.
+12. **Phase rollover semantics** (§9.2 S7) — when a candidate measure isn't selected in phase n, does it auto-roll into phase n+1's candidate pool, or does the user mark which measure types can roll? Auto is simpler; user-marked is more flexible. Decide at scenario-builder UX time.
+13. **Package-level vs per-measure ML scoring** (§9.4) — SAP impact of a measure is not strictly additive; the current per-measure scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages. Defer until per-service grill on `OptimiserService`.
+14. **UCL extrapolation scope** (§9.2 S4a) — the Few et al. sample covers only gas-heated properties without PV in England + Wales. The current legacy code applies the correction to all properties regardless. Keep silent extrapolation for v1, or stratify (no correction for non-gas / PV properties) and surface uncertainty to FE? Defer to per-service grill.
+15. **`ValuationService` rebuild** (§9.2 S8) — existing `PropertyValuation.estimate` cites several papers; the rebuild should improve the regression. Shape deferred to per-service grill.
+16. **Battery-via-ML cutover** (§9.2 S6) — confirm that the new ML model is trained against `energy_pv_battery_count` + `energy_pv_battery_capacity` and that the legacy `BatterySAPScorer` can be retired without regression for battery-equipped properties.
 
 ---