docs(modelling): #1157 Plan-persistence design review

Outcome of the /grill-with-docs session scoping #1157.

- CONTEXT.md: add **Plan Measure** (the persisted selected Option + role-3 attribution + cost); Recommendation stays the candidate. Remove Scenario Phase / Plan Phase / Rolled-over Options — multi-phase is deferred. Reshape Scenario + Plan to single-phase; fix relationships, dialogue, and the "phase" ambiguity note.
- ADR-0005: rewritten to Deferred (multi-phase was speculative prospective-client work; single-phase now; future plan_phase back-fill path preserved). Stray phase refs cleaned in ADR-0016 / ADR-0009.
- ADR-0017 (new): Plan persistence — reuse the live plan/recommendation tables via SQLModel mirrors + a PlanRepository on the UoW; add recommendation.plan_id, retire the plan_recommendations m2m; flat post-retrofit on plan; idempotent replace; CO2 in tonnes. Unselected alternatives + bills noted as deferred directions.
- docs/migrations/recommendation-plan-id.md: the FE-owned Drizzle change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-27 23:35:01 +00:00 · 2026-06-03 11:12:54 +00:00 · 2026-06-03 11:12:54 +00:00 · 772cdd4f5a
commit 772cdd4f5a
parent cc0bb8f9bb
6 changed files with 101 additions and 36 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -114,7 +114,7 @@ The subset of corpus certs used to validate **SAP10 Calculation** against **Lodg
 _Avoid_: parity cohort, validation set, corpus sample

 **Measure Application**:
-The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
+The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that the **Plan** persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
 _Avoid_: measure overrides (rejected during ADR-0009 grill — phantom mid-layer), package applier, retrofit simulator
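A minimal sketch of that chain, assuming a two-field stand-in for `EpcPropertyData`. `Epc`, `RULES`, `apply_package` and `toy_calculate` are hypothetical names, and the scorer is a toy rather than SAP10; what it shows is the shape: the applicator returns a new frozen cert and never mutates its input, and scoring happens only in the caller's next step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Epc:
    """Hypothetical two-field stand-in for EpcPropertyData."""
    roof_insulation_thickness_mm: int
    main_heating: str


# Hypothetical translation rules: Measure Type -> cert-field changes.
RULES = {
    "loft_insulation": {"roof_insulation_thickness_mm": 270},
    "ashp": {"main_heating": "air_source_heat_pump"},
}


def apply_package(epc: Epc, package: list[str]) -> Epc:
    """Pure: return a new cert with the package's field changes applied."""
    changes: dict = {}
    for measure_type in package:
        changes.update(RULES[measure_type])
    return replace(epc, **changes)  # builds a copy; `epc` itself is never mutated


def toy_calculate(epc: Epc) -> float:
    """Toy scorer standing in for Sap10Calculator.calculate. Not SAP10."""
    sap = 40.0 + min(epc.roof_insulation_thickness_mm, 270) / 27  # up to +10
    if epc.main_heating == "air_source_heat_pump":
        sap += 15.0
    return sap


baseline = Epc(roof_insulation_thickness_mm=100, main_heating="gas_boiler")
post_epc = apply_package(baseline, ["loft_insulation", "ashp"])
post_sap = toy_calculate(post_epc)  # the caller chains apply -> calculate
```

Because `Epc` is frozen and `dataclasses.replace` returns a copy, the baseline cert stays intact for scoring the baseline alongside the post-retrofit state.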

 **Bill Derivation**:
@@ -142,7 +142,7 @@ The second stage. Reads the persisted source data from repos, hydrates the **Pro
 _Avoid_: rebaseline (that is a specific ML trigger — see Rebaselining), enrichment

 **Modelling** (stage):
-The third stage. Takes the baselined Property plus a set of **Scenarios** and produces **Recommendations** → an **Optimised Package** per **Scenario Phase** → **Plans**, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play".
+The third stage. Takes the baselined Property plus a set of **Scenarios** and produces **Recommendations** → an **Optimised Package** → **Plans**, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play".
 _Avoid_: scoring (overloaded), recommendation engine

 **First Run**:
@@ -184,32 +184,24 @@ _Avoid_: emission factors (ambiguous), CO2 rates
 ### Outputs

 **Scenario**:
-A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and an ordered list of Scenario Phases. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property.
+A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and the set of measure types it permits. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property.
 _Avoid_: project, batch, run-set

-**Scenario Phase**:
-One ordered step inside a Scenario, carrying a measure-type allowlist (e.g. "loft insulation and walls in phase 1; ASHP in phase 2"), an optional phase budget, and an optional phase target. A single-phase Scenario is one Scenario Phase with all measure types allowed and the full budget on it — there is no special-case path.
-_Avoid_: scenario stage, scenario step, tranche
-
 **Scenario Snapshot**:
 A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes.
 _Avoid_: scenario version, frozen scenario, pinned scenario

 **Plan**:
-The per-Property output of one Scenario's modelling run; carries an ordered list of Plan Phases matching the Scenario's Phase shape. A Property modelled against N Scenarios in one trigger ends up with N Plans.
+The per-Property output of one Scenario's modelling run; carries the **Optimised Package** selected for the Property (its **Plan Measures**) and the Property's post-retrofit figures (SAP / kWh / CO₂ / bills). A Property modelled against N Scenarios in one trigger ends up with N Plans.
 _Avoid_: recommendation set, output, result

-**Plan Phase**:
-The per-Property output of one Scenario Phase: the Optimised Package selected for that phase, the ending state snapshot (the Property's SAP / kWh / bills after the package is applied), and any Rolled-over Options that flow as candidates into the next Plan Phase.
-_Avoid_: plan stage, plan step
-
-**Rolled-over Options**:
-Recommendations generated but not selected by the Optimiser in a given Plan Phase, that remain eligible as candidates in subsequent Plan Phases. Exact roll-over rule (automatic vs user-marked) is under design.
-_Avoid_: deferred measures, leftover recommendations
+**Plan Measure**:
+One selected **Measure Option** as persisted inside a **Plan** — the single Option the Optimiser kept for a given **Recommendation**, recorded with its installed **Cost** and its **final-package (role-3) attributed impact** (the SAP points and CO₂ / energy savings that telescope exactly to the Plan's package total, per ADR-0016). It is the *output* counterpart to a Recommendation's *candidate* Option: a Recommendation proposes mutually-exclusive Options carrying no stored impact, whereas a Plan Measure is the one that was chosen with its truthful attributed impact frozen in. The persisted set of a Plan's Plan Measures **is** its Optimised Package.
+_Avoid_: recommendation (that is the candidate — never persist an output as a Recommendation), installed measure, selected measure (that names the package, not the line), plan item, plan recommendation

 **Recommendation**:
 The finding that a Property needs work on a given **target surface** — a building part (the MAIN wall, an extension roof…) or a system (heating + hot water + controls, treated as one). Carries one or more mutually-exclusive **Measure Options**; the Optimiser selects at most one. The target itself is encoded in each Option's **Simulation Overlay** (which addresses a building part, a specific window, or a system) — never as a typed key on the Recommendation, so the type stays stable as new surfaces land. Recommendations **partition** the modifiable surface of EpcPropertyData: no two Recommendations write the same field of the same target, so selected Options never collide. Exclusivity between competing treatments (cavity-fill vs EWI; a boiler bundle vs an ASHP) is captured *within* one Recommendation, never across them.
-_Avoid_: suggestion, recommendation engine, keying by measure type (a Recommendation can span measure types — e.g. a heating + hot-water bundle)
+_Avoid_: suggestion, recommendation engine, keying by measure type (a Recommendation can span measure types — e.g. a heating + hot-water bundle), the persisted selected-measure output line (that is a **Plan Measure**, which carries impact; a Recommendation never does)

 **Measure Option**:
 One mutually-exclusive way to satisfy a **Recommendation** — possibly a **bundle** of sub-measures (e.g. "new condensing boiler + cylinder insulation"), possibly a single intervention at a chosen size/product (a 4 kWp PV array of product X). Carries its total cost and a **Simulation Overlay** for its combined effect on the target surface. Cost is intrinsic to the Option; SAP / kWh / carbon impact is **not** — impact is cascade-conditional (depends on what is already installed) and is produced by scoring, never stored on the Option. Two Options under one Recommendation may share an identical Simulation Overlay (differing only on cost/product) or differ (e.g. PV kWp), so scoring runs per distinct Overlay.
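Scoring per distinct Overlay amounts to memoising the scorer on the Overlay. A sketch, assuming Overlays can be represented as hashable tuples of `(target, field, value)` triples; `Option`, `score_options` and the lambda scorer are hypothetical, and the scorer is a toy.

```python
from dataclasses import dataclass
from typing import Callable

Overlay = tuple  # hypothetical: a hashable tuple of (target, field, value) triples


@dataclass(frozen=True)
class Option:
    name: str
    cost: float
    overlay: Overlay


def score_options(
    options: list[Option], score_overlay: Callable[[Overlay], float]
) -> tuple[dict[str, float], int]:
    """Score each distinct Overlay once; Options sharing an Overlay reuse the result.

    Returns the per-Option scores and the number of scorer calls made."""
    cache: dict[Overlay, float] = {}
    scores: dict[str, float] = {}
    for option in options:
        if option.overlay not in cache:
            cache[option.overlay] = score_overlay(option.overlay)
        scores[option.name] = cache[option.overlay]
    return scores, len(cache)  # one scorer call per distinct Overlay


# Two 4 kWp products share an Overlay (they differ only on cost); 6 kWp differs.
options = [
    Option("pv_4kwp_product_x", 5000.0, (("roof", "pv_kwp", 4),)),
    Option("pv_4kwp_product_y", 5600.0, (("roof", "pv_kwp", 4),)),
    Option("pv_6kwp_product_x", 7000.0, (("roof", "pv_kwp", 6),)),
]
scores, calls = score_options(options, lambda overlay: 2.0 * overlay[0][2])
```

Three Options cost two scorer calls here; the saving grows with the number of product variants that share one physical change.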
@@ -302,8 +294,7 @@ _Avoid_: API key, auth token, secret
 - **Rebaselining** produces **Effective Performance** by ML re-prediction across SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh, when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten.
 - **Bill Derivation** derives **fuel split** and **bills** from kWh values (sourced from the EPC's `renewable_heat_incentive` fields for baseline SAP10 properties, or from ML when Rebaselining fires), reading current **Fuel Rates** and **Carbon Factors** from their respective repos.
 - The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**.
-- A **Scenario** carries one or more ordered **Scenario Phases**. Triggering the model against N Scenarios produces N **Plans** per Property; each Plan carries an ordered list of **Plan Phases** matching the Scenario's shape.
-- Each **Plan Phase** holds its **Optimised Package**, the ending state snapshot, and any **Rolled-over Options** that flow as candidates into the next Plan Phase. A single-phase Scenario is one Scenario Phase with all measure types allowed; the same machinery handles it.
+- Triggering the model against N **Scenarios** produces N **Plans** per Property. Each **Plan** holds one **Optimised Package** — its selected **Plan Measures** — plus the Property's post-retrofit figures.
 - A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
 - A **Recommendation** references one **Measure Type** and carries property-specific cost and impact.
 - **Address Matching** uses a **User Address** and **Postcode** to find a **UPRN** by scoring **UPRN Candidates** from an EPC search. A **Lexirank** of 1 with no **Ambiguous Match** and a **Lexiscore** ≥ the **Score Threshold** produces a **Best Match**.
@@ -330,10 +321,6 @@ _Avoid_: API key, auth token, secret
 >
 > **Domain expert:** "Those are **Lodged Performance** and **Effective Performance**. **Lodged** is what the gov register says — the EPC was rated under SAP 2012. **Effective** is what we scored against — we ran **Rebaselining** to predict the SAP10-equivalent rating because the methodology changed. Both stay on the **Baseline Performance** so users can see what's on record and what we're modelling against."

-> **Dev:** "A landlord wants a 3-year retrofit plan — fabric work this year, heat pump next, solar after. How do we model that?"
->
-> **Domain expert:** "Three **Scenario Phases** in one **Scenario**. Phase 1 allows fabric measures with this year's budget, phase 2 allows the heat pump with next year's budget, phase 3 allows solar. When we model, the **Optimiser Service** runs per phase against the rolling state — the heat pump is scored against the post-insulation property, not the original one. Each **Plan Phase** captures the **Optimised Package** plus the ending SAP / bills, and any **Rolled-over Options** that didn't make this phase's budget become candidates next phase."
-
 ## Flagged ambiguities

 - **"property"** was historically warned against in favour of "dwelling"; that has been inverted. **Property** is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias.
@@ -345,5 +332,5 @@ _Avoid_: API key, auth token, secret
 - **"user_inputed_address"** in `backend/address2UPRN/main.py` is a misspelling and a synonym for **User Address** — the canonical term. New code should use `user_address`.
 - **"EPC"** is overloaded as both the document and the rating band letter. Use **EPC** for the document, **EPC Band** for the letter.
 - **"re-scoring"** has two meanings in the codebase — **Rebaselining** (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer **Rebaselining** for the former; for the latter, the **Optimiser Service** step does its own scoring without a special name.
-- **"phase"** appears in two unrelated contexts: as cut-over timeline language in the PRD ("Phase 0 — Status quo", "Phase 1 — Forced cut-over") and as a domain concept in **Scenario Phase** / **Plan Phase**. Only the latter is a glossary term; cut-over phases are project-management vocabulary that does not enter code.
+- **"phase"** (sequencing measures into ordered steps within a Scenario/Plan) was a speculative, prospective-client feature and is **deferred — out of scope** (see ADR-0005). It is *not* a current domain term: a **Scenario** carries one set of measures, a **Plan** one **Optimised Package**. The only live use of "phase" is cut-over timeline language in the PRD ("Phase 0 — Status quo"), which is project-management vocabulary and does not enter code.
 - **"stale"** appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch") — a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale — they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.
--- a/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md
+++ b/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md
@@ -1,14 +1,31 @@
-# Multi-phase scenarios with per-phase recompute against rolling state
+# Multi-phase scenarios — deferred (speculative)

-The Scenario aggregate becomes ordered phases: each phase has a measure-type allowlist, an optional budget, and an optional goal. The `ModellingPipeline` walks the phases in order; for each phase it (1) generates candidate recommendations restricted to the phase's measure types, (2) re-runs `ImpactPredictionService` against the **rolling** Effective EPC state (baseline for phase 1; post-phase-1 for phase 2; etc.), (3) optimises within the phase's budget/goal, (4) applies the selected package and rolls the state forward. We considered scoring all measures once against the baseline and slicing the scored list by phase, and rejected that.
+**Status: Deferred / Out of scope.** Superseded by the single-phase decision taken in a `/grill-with-docs` session (2026-06-03) while scoping the #1157 Plan persistence schema. This ADR previously proposed an *Accepted* multi-phase Scenario aggregate with per-phase recompute against rolling state; that design is **not** being built now. The original proposal is preserved below under "Deferred design" for the day the requirement returns.

-Per-phase recompute makes phase ordering load-bearing in the optimisation, not decorative. Installing fabric measures before a heat pump materially changes the heat pump's SAP impact; a single-pass-against-baseline pipeline forces that fact into the optimiser as a hard rule rather than a derived effect, and any cross-measure interaction we don't know to encode becomes silent error. The cost is ML calls scaling with `N_phases × N_scenarios × N_candidate_measures` per property — multi-phase scenarios pay their own ML bill, single-phase scenarios cost the same as today (the loop body runs once).
+## Why deferred

-A single-phase Scenario is `phases: [<one ScenarioPhase>]` with all measure types allowed and the full budget on it. There is no special-case path for single-phase — the pipeline always loops. This avoids two code paths and lets the FE evolve from single-phase to multi-phase without rewiring the backend.
+Multi-phase sequencing — letting a user split a Scenario into ordered phases ("fabric this year, heat pump next, solar after"), each with its own measure allowlist / budget / target, and producing Plans shaped to match — came from a **prospective (not current) client**. It is entirely speculative: we may never build it. Baking it into the core domain as an accepted decision made the model "too strong" — it forced a first-class **Scenario Phase** / **Plan Phase** / **Rolled-over Options** vocabulary and a `plan_phase` table into a live product that has no consumer for any of it.

-## Consequences
+The current goal is to **replicate and improve the existing pipeline**, which is single-phase. So:

-- `Plan` carries `phases: list[PlanPhase]` rather than a flat `OptimisedPackage`. Every consumer of plan output (FE, exports, downstream reports) reads phases.
-- The optimiser must accept rolling-state input rather than only baseline state — a generalisation of today's single-shot pass.
-- ML cost can be controlled at the scenario layer: keeping a scenario single-phase is the lever for "score once, optimise once" if cost becomes a problem.
-- ~~Open future change: SAP impact of a measure is not strictly additive even within a phase. The current per-measure scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages, accepting combinatorial cost for accuracy. Track in PRD §15.~~ **Resolved by [ADR-0016](0016-package-rescore-over-warm-start-optimisation.md):** the answer is not package enumeration but warm-start MILP on independent-vs-baseline scores → deterministic-calculator package re-score → greedy repair, which sidesteps the cross-product.
+- A **Scenario** carries one set of permitted measure types (no ordered phases).
+- A **Plan** holds one **Optimised Package** of **Plan Measures** plus the Property's flat post-retrofit figures (the legacy `plan` columns). There is no `plan_phase` table and no `phase` column.
+- The terms **Scenario Phase**, **Plan Phase**, and **Rolled-over Options** are removed from `CONTEXT.md`.
+
+This is cheap to reverse: re-introducing phases is additive, and the [ADR-0016](0016-package-rescore-over-warm-start-optimisation.md) scoring split (per-Option signal → whole-package re-score → marginal-cascade attribution) already works against a single package and generalises to per-phase rolling state unchanged.
+
+## Future migration path (when/if the requirement returns)
+
+Scope it properly as a feature in its own right — do **not** retrofit it implicitly. The migration shape we expect:
+
+1. Add a `plan_phase` table; give each existing live **Plan** exactly one Plan Phase and back-fill its current Optimised Package + post-retrofit figures into that single phase.
+2. Add ordered phases to the **Scenario** aggregate (allowlist / budget / target per phase).
+3. Generalise the Optimiser to run per phase against the **rolling** Effective EPC (phase 1 = baseline; phase 2 = post-phase-1 state; …), so phase ordering becomes load-bearing in the optimisation rather than decorative.
+
+This back-fill keeps every live single-phase Plan valid as a degenerate one-phase case.
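Step 1 of that path could look roughly like the following, shown against SQLite so it runs standalone. It is a sketch only: the real migration would target the live Postgres tables, and the column names used here (`plan.post_sap`, `plan_phase.phase_index`, `recommendation.plan_phase_id`) are hypothetical.

```python
import sqlite3


def backfill_single_phase(conn: sqlite3.Connection) -> None:
    """Hypothetical step 1: every existing plan becomes a degenerate one-phase plan."""
    with conn:  # commit on success
        conn.execute(
            """CREATE TABLE plan_phase (
                   id INTEGER PRIMARY KEY,
                   plan_id INTEGER NOT NULL REFERENCES plan(id) ON DELETE CASCADE,
                   phase_index INTEGER NOT NULL,
                   post_sap REAL,
                   UNIQUE (plan_id, phase_index)
               )"""
        )
        # Copy each plan's flat post-retrofit figures into its single phase (index 0).
        conn.execute(
            "INSERT INTO plan_phase (plan_id, phase_index, post_sap) "
            "SELECT id, 0, post_sap FROM plan"
        )
        # Re-point each Plan Measure at its plan's only phase.
        conn.execute("ALTER TABLE recommendation ADD COLUMN plan_phase_id INTEGER")
        conn.execute(
            "UPDATE recommendation SET plan_phase_id = "
            "(SELECT pp.id FROM plan_phase AS pp WHERE pp.plan_id = recommendation.plan_id)"
        )
```

The `UNIQUE (plan_id, phase_index)` constraint is one way to make "exactly one phase per back-filled plan" checkable by the database rather than by convention.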
+
+## Deferred design (original proposal, for reference)
+
+The Scenario aggregate becomes ordered phases: each phase has a measure-type allowlist, an optional budget, and an optional goal. The pipeline walks the phases in order; for each phase it (1) generates candidate recommendations restricted to the phase's measure types, (2) re-runs scoring against the **rolling** Effective EPC state (baseline for phase 1; post-phase-1 for phase 2; etc.), (3) optimises within the phase's budget/goal, (4) applies the selected package and rolls the state forward.
+
+The rationale was that per-phase recompute makes phase ordering load-bearing in the optimisation, not decorative: installing fabric measures before a heat pump materially changes the heat pump's SAP impact. The cost is ML/calculator calls scaling with `N_phases × N_scenarios × N_candidate_measures` per property. A single-phase Scenario was modelled as `phases: [<one ScenarioPhase>]` with all measure types allowed — i.e. exactly the single-phase product we are now building directly, without the phase machinery.
--- a/docs/adr/0009-deterministic-sap-calculator.md
+++ b/docs/adr/0009-deterministic-sap-calculator.md
@@ -15,7 +15,7 @@ Seven open questions resolved through a `/grill-with-docs` session before Sessio
 | 4 | Living-area fraction default | **RdSAP 10 Table 27** — direct lookup from `habitable_rooms_count`. Unambiguous, one-line table. |
 | 5 | Secondary-heating allocation | **SAP 10.2/10.3 Table 11** keyed on main heating type. RdSAP doesn't redefine the fraction — it identifies the type only. Forcing rule: when main is micro-CHP and Table N9 says non-zero secondary heat with no secondary specified, assume portable electric heaters. |
 | 6 | Validation cohort | **Stratified random of 1000 certs**; report MAE per stratum. Session A success criterion = MAE ≤ 1.0 SAP-point on the **typical subset** (excluding sap_score ≤ 5, sap_score ≥ 100, multi-heating, conservatory, RIR). Global MAE reported alongside for honesty. |
-| 7 | `MeasureOverrides` shape | **Rejected as phantom mid-layer.** `Sap10Calculator.calculate(epc) -> SapResult` takes a single immutable cert. A separate **MeasureApplicator** service translates Optimised Package → cert-field changes, returning the "ending state snapshot" EpcPropertyData that Plan Phase already persists. Three pure functions in chain: applicator → calculator → result. |
+| 7 | `MeasureOverrides` shape | **Rejected as phantom mid-layer.** `Sap10Calculator.calculate(epc) -> SapResult` takes a single immutable cert. A separate **MeasureApplicator** service translates Optimised Package → cert-field changes, returning the "ending state snapshot" EpcPropertyData the **Plan** persists. Three pure functions in chain: applicator → calculator → result. |

 ## Additional findings from the grill that change Session A scope

--- a/docs/adr/0016-package-rescore-over-warm-start-optimisation.md
+++ b/docs/adr/0016-package-rescore-over-warm-start-optimisation.md
@@ -1,6 +1,6 @@
 # Package re-scoring over warm-start optimisation, not marginal cascade or full enumeration

-Modelling scores each **Measure Option** once, **independently against the baseline** Effective EPC (deduplicated per distinct **Simulation Overlay**, so identical overlays are scored once). It runs a grouped-knapsack MILP over those per-Option scores to get a *candidate* package, injects any forced **Measure Dependencies** (e.g. ventilation) into that package, composes the selected + injected overlays into one throwaway `EpcPropertyData`, and **re-scores the whole package on the deterministic SAP10 calculator** for the truthful figure. If the true package SAP undershoots the phase goal, it **greedy-adds** the unselected Option with the best residual SAP-per-£ and re-scores, repeating until the target is met or the budget is exhausted.
+Modelling scores each **Measure Option** once, **independently against the baseline** Effective EPC (deduplicated per distinct **Simulation Overlay**, so identical overlays are scored once). It runs a grouped-knapsack MILP over those per-Option scores to get a *candidate* package, injects any forced **Measure Dependencies** (e.g. ventilation) into that package, composes the selected + injected overlays into one throwaway `EpcPropertyData`, and **re-scores the whole package on the deterministic SAP10 calculator** for the truthful figure. If the true package SAP undershoots the Scenario goal, it **greedy-adds** the unselected Option with the best residual SAP-per-£ and re-scores, repeating until the target is met or the budget is exhausted.

 The reason for the split is that SAP impact is **sub-additive** — summed independent per-Option scores overestimate the combined effect, so the MILP optimum is a *signal*, not the truth. Because the calculator is deterministic and fast (ADR-0009), accuracy is bought by re-scoring the chosen package, not by making the optimiser's per-measure inputs accurate. The optimiser only has to rank measures well enough to seed a near-right package; the calculator supplies the real number.
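The signal → candidate → re-score → repair flow can be sketched on toy data. This is an illustration under stated assumptions, not the pipeline: the MILP is replaced by brute-force enumeration (only viable for tiny inputs), forced Measure Dependencies are omitted, "residual SAP-per-£" is read as the per-Option signal over cost among Options from Recommendations not yet in the package, and `toy_score_package` is a made-up sub-additive scorer, not SAP10. All names and figures are hypothetical.

```python
from itertools import product

BASELINE_SAP = 50.0
# Hypothetical toy inputs: each inner list is one Recommendation's mutually exclusive Options.
GROUPS = [["cavity_fill", "ewi"], ["loft_270"], ["ashp"]]
SIGNAL = {"cavity_fill": 6.0, "ewi": 9.0, "loft_270": 4.0, "ashp": 12.0}  # role 1
COST = {"cavity_fill": 1000, "ewi": 9000, "loft_270": 500, "ashp": 8000}


def toy_score_package(package):
    """Sub-additive stand-in for the deterministic calculator (not SAP10)."""
    if not package:
        return BASELINE_SAP
    return BASELINE_SAP + sum(SIGNAL[o] for o in package) * 0.9 ** (len(package) - 1)


def candidate_package(groups, signal, cost, budget, needed):
    """Grouped knapsack by brute force (stands in for the MILP on tiny inputs):
    <=1 Option per Recommendation, within budget; prefer the cheapest package
    whose summed signal reaches `needed`, else the highest summed signal."""
    best, best_key = (), None
    for choice in product(*[[None, *g] for g in groups]):
        picked = tuple(o for o in choice if o is not None)
        spend = sum(cost[o] for o in picked)
        if spend > budget:
            continue
        gain = sum(signal[o] for o in picked)
        key = (0, spend) if gain >= needed else (1, -gain)  # lower is better
        if best_key is None or key < best_key:
            best, best_key = picked, key
    return list(best)


def greedy_repair(package, groups, signal, cost, budget, target, score_package):
    """Re-score the whole package; while it undershoots, add the affordable Option
    from an untouched Recommendation with the best signal-per-£, then re-score."""
    package = list(package)
    true_sap = score_package(package)
    while true_sap < target:
        spend = sum(cost[o] for o in package)
        open_groups = [g for g in groups if not any(o in package for o in g)]
        affordable = [o for g in open_groups for o in g if spend + cost[o] <= budget]
        if not affordable:
            break  # budget exhausted: keep the best package found
        package.append(max(affordable, key=lambda o: signal[o] / cost[o]))
        true_sap = score_package(package)  # one calculator call per repair step
    return package, true_sap


TARGET = 65.0
pkg = candidate_package(GROUPS, SIGNAL, COST, 10_000, TARGET - BASELINE_SAP)
final, true_sap = greedy_repair(pkg, GROUPS, SIGNAL, COST, 10_000, TARGET, toy_score_package)
```

In this toy run the summed signal of `loft_270 + ashp` (16 points) clears the 15-point gap, but the package re-score gives 64.4, so repair adds `cavity_fill` and lands at about 67.82. The summed signal said "done"; only the re-score revealed the undershoot.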

@@ -13,10 +13,10 @@ This resolves the open question deferred in **ADR-0005 §14**.

 ## Consequences

-- Calculator calls per Property per **Scenario Phase** ≈ `(# distinct Simulation Overlays)` for the per-Option pass `+` `(a few package re-scores)` in the repair loop — **bounded, never the cross-product**. The Option-dedup-by-Overlay invariant is what keeps the per-Option pass cheap.
+- Calculator calls per Property per **Scenario** ≈ `(# distinct Simulation Overlays)` for the per-Option pass `+` `(a few package re-scores)` in the repair loop — **bounded, never the cross-product**. The Option-dedup-by-Overlay invariant is what keeps the per-Option pass cheap.
 - A forced **Measure Dependency** must be injected into the package **before** the re-score, so its real SAP contribution — *negative* for ventilation — lands in the truthful figure and in the undershoot/repair decision. (The legacy bug was adding ventilation as a cost-only line *after* scoring, which silently overstated the package and undershot the real target.)
 - The optimiser is a clean grouped knapsack: pick ≤1 Option per Recommendation, groups disjoint, **no cross-group mutual-exclusion constraints** — the Recommendation partition (no two Recommendations write the same `(building part, field)`) makes selected overlays collision-free by construction.
 - Greedy repair can overspend relative to a global re-optimise. Accepted for bounded calculator calls and simplicity; re-solving the MILP with the corrected package score fed back as a constraint is the fallback if greedy proves too loose in practice.
 - Per-Option scores are *approximate by design* (independent-vs-baseline) and must never be persisted or surfaced as a measure's "true" impact — only the package re-score is truthful. Measure-level impact shown to users is derived from the final scored package, not from step A.
-- **Three distinct scoring roles, each with one job:** (1) per-Option independent-vs-baseline → optimiser *input* (approximate signal, never surfaced); (2) whole-package re-score → truthful *package total*; (3) **final-package marginal cascade** → per-measure *attribution* for display. Role 3 runs only on the *selected* set, applied in **best-practice prescribed order** (walls → roof → ventilation → … per the legacy `Recommendations` class), so `attribution(mᵢ) = score(m₁..mᵢ) − score(m₁..mᵢ₋₁)`; the marginals **telescope exactly to the package total** (role 2) with no residual. The "drop a middle measure" inaccuracy cannot occur because the actual final set is scored, not a hypothetical. Phase is the cascade unit; intra-phase ordering follows the same best-practice sequence.
+- **Three distinct scoring roles, each with one job:** (1) per-Option independent-vs-baseline → optimiser *input* (approximate signal, never surfaced); (2) whole-package re-score → truthful *package total*; (3) **final-package marginal cascade** → per-measure *attribution* for display. Role 3 runs only on the *selected* set, applied in **best-practice prescribed order** (walls → roof → ventilation → … per the legacy `Recommendations` class), so `attribution(mᵢ) = score(m₁..mᵢ) − score(m₁..mᵢ₋₁)`; the marginals **telescope exactly to the package total** (role 2) with no residual. The "drop a middle measure" inaccuracy cannot occur because the actual final set is scored, not a hypothetical. The selected package is the cascade unit; ordering within it follows the best-practice sequence.
 - **The package-scoring primitive is reusable.** "Compose selected overlays → throwaway `EpcPropertyData` → calculator" serves both the optimiser's package re-score (role 2) and a future endpoint that re-scores a *user-assembled* plan live (the FE toggling Rolled-over Options on/off). Because the calculator is fast, live re-score is the **accurate** path the moment a user deviates from the optimiser's selection. Note the trap this avoids: summing stored per-measure figures across a user-edited selection re-introduces the sub-additivity overestimate — a user-edited plan must be re-scored as a package, never summed from stored attributions.
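The role-3 cascade can be sketched in a few lines. The measure names, the `CATEGORY` mapping and the four-slot `PRESCRIBED_ORDER` are hypothetical (the legacy sequence continues past "ventilation"), and `toy_score_package` is a made-up sub-additive scorer standing in for the calculator. Ventilation is given a negative gain so its marginal comes out negative, as the consequences above describe.

```python
# Hypothetical prefix of the best-practice order; the legacy sequence continues past these.
PRESCRIBED_ORDER = ["walls", "roof", "ventilation", "heating"]
CATEGORY = {  # hypothetical mapping of selected measures to their slot in that order
    "cavity_fill": "walls",
    "loft_270": "roof",
    "mech_ventilation": "ventilation",
    "ashp": "heating",
}
GAIN = {"cavity_fill": 6.0, "loft_270": 4.0, "mech_ventilation": -1.0, "ashp": 12.0}


def toy_score_package(package):
    """Sub-additive stand-in for the SAP10 calculator (not SAP10)."""
    if not package:
        return 50.0
    return 50.0 + sum(GAIN[m] for m in package) * 0.9 ** (len(package) - 1)


def prescribed_order(selected):
    return sorted(selected, key=lambda m: PRESCRIBED_ORDER.index(CATEGORY[m]))


def attribute(ordered_package, score_package):
    """Role 3: attribution(m_i) = score(m_1..m_i) - score(m_1..m_{i-1})."""
    attributions = {}
    previous = score_package([])  # baseline, from the same scorer
    for i, measure in enumerate(ordered_package, start=1):
        current = score_package(ordered_package[:i])
        attributions[measure] = current - previous
        previous = current
    return attributions


ordered = prescribed_order(["ashp", "mech_ventilation", "loft_270", "cavity_fill"])
attribution = attribute(ordered, toy_score_package)
package_gain = toy_score_package(ordered) - toy_score_package([])
# The marginals telescope: their sum equals package_gain (up to float rounding).
```

Each marginal is computed against the actual prefix of the final set, so summing them cannot overshoot the package total the way summed role-1 signals do.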
--- a/docs/adr/0017-plan-persistence-evolve-live-tables.md
+++ b/docs/adr/0017-plan-persistence-evolve-live-tables.md
@@ -0,0 +1,33 @@
+# Plan persistence — evolve the live tables, no Plan Phase
+
+**Status: Accepted.** Decided in a `/grill-with-docs` session (2026-06-03) scoping the #1157 Plan-persistence schema. Builds on [ADR-0011](0011-composable-stage-orchestrators.md) / [ADR-0012](0012-unit-of-work-per-stage-batch-transaction.md) (stage orchestrators, one Unit of Work per batch), [ADR-0016](0016-package-rescore-over-warm-start-optimisation.md) (the three scoring roles), and [ADR-0005](0005-multi-phase-scenarios-per-phase-recompute.md) (multi-phase deferred).
+
+## Context
+
+The Modelling stage must persist a **Plan** per Property per **Scenario**. Unlike the rest of the rebuild, the output tables already exist in the **live product**: `plan`, `recommendation`, `plan_recommendations` (an m2m join), and `scenario` — SQLAlchemy `Base` models in `backend/app/db/models/recommendations.py`, which the live FE reads. This is **schema evolution on a running product**, not greenfield. Wholesale table changes are expensive and risky.
+
+The rebuild's persistence convention is SQLModel `table=True` rows in `infrastructure/postgres/`, written through repos bound to a `UnitOfWork`, with the ephemeral-Postgres tests building the schema via `SQLModel.metadata.create_all`. The established way it already touches live tables is a **SQLModel mirror pointing at the same physical table** (`task_table.py` → `tasks`, `product_table.py` → `material`, `property_table.py` → `property`); the legacy `Base` model stays for the live app and the physical table is the shared contract.
+
+## Decision
+
+- **Reuse the live `plan` and `recommendation` tables** via SQLModel mirrors in `infrastructure/postgres/`, written through a new `PlanRepository` on the Unit of Work. No new parallel tables. The legacy SQLAlchemy models remain for the live app's reads.
+- **Add `recommendation.plan_id`** (FK → `plan.id`, `ON DELETE CASCADE`). New writes link each measure to its Plan directly; the **`plan_recommendations` m2m is retired for new writes** (its many-to-many made deletes pathologically slow). The m2m table is left in place until the last legacy reader is cut over.
+- **A persisted `recommendation` row is a Plan Measure** — the one selected **Measure Option** with its **role-3 (final-package cascade) attributed impact** and its **Cost**. A **Recommendation** (the candidate, multi-Option, no stored impact) is never persisted as output. (See `CONTEXT.md`: Plan Measure vs Recommendation.)
+- **Post-retrofit figures stay flat on `plan`** (the legacy columns). **No `plan_phase` table and no `phase` column** — multi-phase is deferred (ADR-0005).
+- **Idempotent replace per `(property_id, scenario_id)`** (ADR-0012): a re-run deletes the matching `plan` rows — cascading to their `recommendation` rows via `plan_id` — then inserts fresh. One batch commit, never per-property.
+- **`plan.is_default` derives from `scenario.is_default`**, so exactly one default plan exists per Property even across many Scenarios (given exactly one default Scenario). **`recommendation.default = True`** for every persisted Plan Measure (only selected measures are persisted today).
+- **Units match the live column contract:** the calculator emits CO₂ in **kg**; the live `co2_equivalent_savings` / `post_co2_emissions` columns are **tonnes**, so divide by 1000 on the way in. The CO₂ baseline for the saving comes from the **same calculator** (`PackageScorer.score(epc, [])`), keeping baseline and post self-consistent.
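The following is a minimal sketch of the idempotent replace. It uses stdlib `sqlite3` as a stand-in for Postgres, and the schema is trimmed to the columns the sketch needs. The `measure_type` column, the `connect`/`replace_plans` helpers and the batch tuple shape are hypothetical; this is not the `PlanRepository` API. The sketch shows the two properties the decision relies on:

- Deleting a `plan` row removes its `recommendation` rows through `ON DELETE CASCADE`.
- The whole batch commits once.

It also sets `plan.is_default` from the Scenario and `default = 1` on every measure.

```python
import sqlite3

# Trimmed stand-ins for the live tables; only the columns this sketch needs.
SCHEMA = """
CREATE TABLE plan (
    id          INTEGER PRIMARY KEY,
    property_id INTEGER NOT NULL,
    scenario_id INTEGER NOT NULL,
    is_default  BOOLEAN NOT NULL
);
CREATE TABLE recommendation (
    id           INTEGER PRIMARY KEY,
    plan_id      INTEGER REFERENCES plan(id) ON DELETE CASCADE,
    measure_type TEXT NOT NULL,      -- hypothetical column for illustration
    "default"    BOOLEAN NOT NULL    -- quoted: DEFAULT is an SQL keyword
);
CREATE INDEX ix_recommendation_plan_id ON recommendation (plan_id);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def replace_plans(conn: sqlite3.Connection, batch) -> None:
    """Idempotently replace Plans per (property_id, scenario_id).

    `batch` is a list of (property_id, scenario_id, scenario_is_default, measure_types).
    """
    with conn:  # one transaction for the whole batch: commit on success, roll back on error
        for property_id, scenario_id, scenario_is_default, measure_types in batch:
            # Deleting the Plan cascades to its recommendation rows via plan_id.
            conn.execute(
                "DELETE FROM plan WHERE property_id = ? AND scenario_id = ?",
                (property_id, scenario_id),
            )
            cur = conn.execute(
                "INSERT INTO plan (property_id, scenario_id, is_default) VALUES (?, ?, ?)",
                (property_id, scenario_id, scenario_is_default),  # mirrors the Scenario
            )
            # Every persisted Plan Measure is a selected one, so default = 1.
            conn.executemany(
                'INSERT INTO recommendation (plan_id, measure_type, "default") VALUES (?, ?, 1)',
                [(cur.lastrowid, m) for m in measure_types],
            )
```

Running the same batch twice leaves the row counts unchanged, which is the idempotence ADR-0012 asks for. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, whereas Postgres enforces them by default.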
+
+## Considered and rejected
+
+- **Greenfield clean tables for Plans** — rejected: the live FE already reads `plan`/`recommendation`, and there is live data. A parallel table would fork the read model.
+- **Keep the `plan_recommendations` m2m** — rejected: the join's cascade delete is the known performance killer this change exists to remove.
+- **JSONB blob for the package** — rejected: the FE queries per-measure columns; flat typed columns are the existing contract.
+
+## Consequences
+
+- **Two ORM definitions of `plan`/`recommendation`** coexist (legacy SQLAlchemy + new SQLModel mirror). This is a drift hazard, mitigated because the mirror pattern is already established and the physical table is the single contract. Retiring the legacy models is later, separate work.
+- The **FE owns the Drizzle migration** adding `recommendation.plan_id` (+ index) and, eventually, dropping `plan_recommendations`. Documented in `docs/migrations/recommendation-plan-id.md`.
+- **Unselected alternatives** (the "swap-in" UX) will later be persisted as `recommendation` rows with `default = False` linked via `plan_id` — this schema is forward-compatible. The open question is *what impact figure* such a row carries: it cannot hold a role-3 attribution (it is not in the package), and ADR-0016 forbids surfacing the role-1 independent signal as truth. **Deferred** as an ADR-0016 question.
+- **Energy / bill columns** (`plan.post_energy_consumption`, `plan.energy_consumption_savings`, `plan.post_energy_bill`, `plan.energy_bill_savings`, `recommendation.kwh_savings`, `recommendation.energy_cost_savings`) are **delivered/billed kWh**, not the calculator's primary energy. They are populated by a later **Bill Derivation slice that re-runs bills on the post-package EPC**; NULL until then.
+- The **#1157 tracer persists only** SAP (`post_sap_points`, `recommendation.sap_points`), CO₂ in tonnes (`post_co2_emissions`, `co2_savings`, `recommendation.co2_equivalent_savings`), cost (`estimated_cost`, `cost_of_works`, `contingency_cost`), and the derived `post_epc_rating`. Valuation, `plan_type`, U-values, heat demand, labour, and the energy/bill cluster are left NULL for later slices.
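The sketch below shows the values the tracer writes. It makes several assumptions:

- The helper names and the `MeasureResult` shape are hypothetical.
- `post_epc_rating` uses the standard EPC band thresholds on the SAP rating, rounded half up.
- The CO₂ baseline stands for whatever `PackageScorer.score(epc, [])` returns; here it is passed in as a plain number.

The sketch converts kg to tonnes on the way in and sets `default = True` on each measure. It sums per-measure contingency onto the plan, matching legacy, and leaves the delivered-energy/bill cluster as `None` (NULL).

```python
import math
from dataclasses import dataclass

KG_PER_TONNE = 1000.0

# Standard EPC bands as (minimum SAP rating, band), checked from the top down;
# anything below 21 falls in band G.
EPC_BANDS = [(92, "A"), (81, "B"), (69, "C"), (55, "D"), (39, "E"), (21, "F")]

# Delivered-energy / bill columns stay NULL until the Bill Derivation slice.
PLAN_ENERGY_BILL_COLUMNS = (
    "post_energy_consumption",
    "energy_consumption_savings",
    "post_energy_bill",
    "energy_bill_savings",
)


@dataclass
class MeasureResult:
    """Hypothetical shape of one selected Plan Measure from the calculator."""
    measure_type: str
    sap_points: float      # role-3 attributed SAP impact
    co2_saving_kg: float   # role-3 attributed CO2 saving, in kg
    contingency: float


def epc_rating(sap_points: float) -> str:
    rating = math.floor(sap_points + 0.5)  # assumption: round half up before banding
    for minimum, band in EPC_BANDS:
        if rating >= minimum:
            return band
    return "G"


def plan_values(baseline_co2_kg: float, post_co2_kg: float,
                post_sap_points: float, measures: list[MeasureResult]) -> dict:
    row = {
        "post_sap_points": post_sap_points,
        "post_epc_rating": epc_rating(post_sap_points),
        # The calculator emits kg; the live columns are tonnes.
        "post_co2_emissions": post_co2_kg / KG_PER_TONNE,
        "co2_savings": (baseline_co2_kg - post_co2_kg) / KG_PER_TONNE,
        # Per-measure contingency is summed onto the plan, matching legacy.
        "contingency_cost": sum(m.contingency for m in measures),
    }
    row.update({column: None for column in PLAN_ENERGY_BILL_COLUMNS})
    return row


def recommendation_values(measures: list[MeasureResult]) -> list[dict]:
    return [
        {
            "sap_points": m.sap_points,
            "co2_equivalent_savings": m.co2_saving_kg / KG_PER_TONNE,
            "default": True,               # only selected measures are persisted today
            "kwh_savings": None,           # NULL until Bill Derivation
            "energy_cost_savings": None,   # NULL until Bill Derivation
        }
        for m in measures
    ]
```

Baseline and post figures come from the same calculator, so `co2_savings` is a difference of self-consistent numbers. Only the unit changes at the boundary.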
--- a/docs/migrations/recommendation-plan-id.md
+++ b/docs/migrations/recommendation-plan-id.md
@ -0,0 +1,28 @@
+# `recommendation.plan_id` — FE-owned migration
+
+**Context:** #1157 of the Modelling-stage rebuild. The `ModellingOrchestrator` persists a **Plan** and its selected **Plan Measures** (rows of the live `recommendation` table). To link each measure to its Plan, this change adds **`recommendation.plan_id`**, replacing the `plan_recommendations` many-to-many join for new writes (the m2m's cascade delete is pathologically slow; see [ADR-0017](../adr/0017-plan-persistence-evolve-live-tables.md)).
+
+The SQLModel mirror is defined in `infrastructure/postgres/` so the ephemeral-Postgres tests build it via `SQLModel.metadata.create_all`. The **production migration is FE-owned (Drizzle ORM)**.
+
+## Change
+
+Add one column to the existing `recommendation` table:
+
+| Column | Type | Notes |
+|---|---|---|
+| `plan_id` | bigint, FK → `plan.id`, **`ON DELETE CASCADE`**, indexed | The Plan this measure belongs to. Nullable during transition (legacy rows predate it); new writes always set it. |
+
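+- **Index `plan_id`**: the orchestrator's idempotent replace deletes a Plan and relies on the cascade to remove its measures, and reads fetch a Plan's measures by `plan_id`. Postgres does not index a foreign-key column automatically. Without this index, every cascaded delete and every per-Plan read would fall back to scanning `recommendation`.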
+- **`ON DELETE CASCADE`** is what makes "delete the Plan → its measures go too" a single statement, replacing the m2m cleanup.
+
+## Transition / sequencing
+
+1. **Add `plan_id` (nullable)** — this migration. New `ModellingOrchestrator` writes populate it; legacy writers and existing rows are unaffected.
+2. **Cut legacy readers** off `plan_recommendations` onto `plan_id` (separate work, not in #1157).
+3. **Drop `plan_recommendations`** once no reader remains (separate migration).
+
+Existing live `recommendation` rows keep `plan_id = NULL` until/unless re-modelled; they remain reachable via the legacy `plan_recommendations` join during the transition.
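During the transition, a reader may need both link paths. Below is a minimal sketch in stdlib `sqlite3`. It assumes the legacy join's columns are named `plan_id` and `recommendation_id`; the real names are not given here. `measures_for_plan` is a hypothetical helper. It unions measures linked directly by `plan_id` with legacy rows reachable only through `plan_recommendations`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE plan (id INTEGER PRIMARY KEY);
CREATE TABLE recommendation (
    id      INTEGER PRIMARY KEY,
    plan_id INTEGER REFERENCES plan(id) ON DELETE CASCADE  -- NULL on legacy rows
);
-- Legacy m2m join; column names assumed for illustration.
CREATE TABLE plan_recommendations (
    plan_id           INTEGER REFERENCES plan(id),
    recommendation_id INTEGER REFERENCES recommendation(id)
);
"""


def measures_for_plan(conn: sqlite3.Connection, plan_id: int) -> list[int]:
    """Recommendation ids for a Plan via plan_id, plus any reachable only via the legacy join."""
    rows = conn.execute(
        """
        SELECT id FROM recommendation WHERE plan_id = ?
        UNION  -- UNION also de-duplicates a row linked both ways
        SELECT recommendation_id FROM plan_recommendations WHERE plan_id = ?
        ORDER BY 1
        """,
        (plan_id, plan_id),
    ).fetchall()
    return [row[0] for row in rows]
```

Once legacy readers are cut over (step 2 above), the second `SELECT` can go, leaving only the indexed `plan_id` lookup.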
+
+## Not changed here
+
+No new columns for contingency (per-measure contingency stays summed into `plan.contingency_cost`, matching legacy), no `phase` column (multi-phase deferred, ADR-0005), and no change to the energy/bill columns, which a later Bill Derivation slice populates (ADR-0017).