From 0ba45a09cc98041c1662b0b5ba386c542b28844f Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar <kconnkowlessar@gmail.com>
Date: Tue, 2 Jun 2026 22:13:51 +0000
Subject: [PATCH] =?UTF-8?q?docs(modelling):=20record=20stage=20design=20?=
 =?UTF-8?q?=E2=80=94=20CONTEXT=20terms=20+=20ADR-0016?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reframe Recommendation as a target surface (partitions the EpcPropertyData
surface, so selected overlays never collide); add Measure Option,
Simulation Overlay (EpcSimulation), Product, Cost, Contingency, and
Measure Dependency. ADR-0016 fixes the scoring/optimisation approach
(warm-start grouped-knapsack MILP -> deterministic package re-score ->
greedy repair, with a final-package marginal cascade for display
attribution), resolving the open question in ADR-0005 §14.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
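Note (not part of the commit): a minimal Python sketch of two claims made
in ADR-0016 below: summed independent-vs-baseline scores overestimate the
package, and the final-package marginal cascade telescopes to the package
total. Everything in it is an assumption for illustration: `fold`,
`toy_score`, the field names and the numbers are hypothetical, and the
concave `toy_score` only mimics sub-additivity; it is not the SAP10
calculator.

```python
import math

def fold(baseline, overlays):
    """Fold an ordered list of overlays onto a throwaway copy of the baseline."""
    epd = dict(baseline)
    for overlay in overlays:
        epd.update(overlay)
    return epd

def toy_score(epd):
    """Hypothetical calculator stand-in: concave in total improvement, so gains are sub-additive."""
    return 40 + 50 * (1 - math.exp(-sum(epd.values()) / 10))

baseline = {"wall": 0.0, "roof": 0.0, "vent": 0.0}
options = {
    "cavity_fill_a": {"wall": 6.0},
    "cavity_fill_b": {"wall": 6.0},    # same overlay, differs only on product/cost
    "loft_insulation": {"roof": 4.0},
    "ventilation": {"vent": -1.0},     # negative contribution
}

def independent_gains(baseline, options):
    """Score each Option alone against the baseline, once per distinct overlay."""
    base = toy_score(baseline)
    by_overlay, gains = {}, {}
    for name, overlay in options.items():
        key = tuple(sorted(overlay.items()))
        if key not in by_overlay:
            by_overlay[key] = toy_score(fold(baseline, [overlay])) - base
        gains[name] = by_overlay[key]
    return gains, len(by_overlay)

def attribute(baseline, ordered):
    """Marginal cascade over the selected set: score(m1..mi) - score(m1..mi-1)."""
    previous, applied, marginals = toy_score(baseline), [], {}
    for name, overlay in ordered:
        applied.append(overlay)
        current = toy_score(fold(baseline, applied))
        marginals[name] = current - previous
        previous = current
    return marginals

gains, scoring_calls = independent_gains(baseline, options)
selected = [(n, options[n]) for n in ("cavity_fill_a", "loft_insulation", "ventilation")]
package_gain = toy_score(fold(baseline, [o for _, o in selected])) - toy_score(baseline)
attribution = attribute(baseline, selected)
```

In this toy, four Options need three scoring calls, the summed independent
gains exceed `package_gain`, and `attribution` sums back to `package_gain`
up to floating-point rounding.
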
 CONTEXT.md                                    | 27 +++++++++++++++++--
 ...lti-phase-scenarios-per-phase-recompute.md |  2 +-
 ...ge-rescore-over-warm-start-optimisation.md | 22 +++++++++++++++
 3 files changed, 48 insertions(+), 3 deletions(-)
 create mode 100644 docs/adr/0016-package-rescore-over-warm-start-optimisation.md
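
Note (not part of the commit): a rough sketch of the seed -> inject ->
re-score -> greedy-repair loop described in ADR-0016. It rests on several
assumptions the ADR does not state: the seed is a brute-force search
standing in for the grouped-knapsack MILP, its objective is "cheapest
selection whose summed independent gains reach the target", repair only
draws from Recommendations with nothing selected yet, and candidates are
ranked by independent gain per pound. All names, costs and the
`package_score` function are hypothetical.

```python
import math
from itertools import product

def package_score(baseline, overlays):
    """Hypothetical calculator stand-in: fold overlays, score concavely (sub-additive)."""
    epd = dict(baseline)
    for overlay in overlays:
        epd.update(overlay)
    return 40 + 50 * (1 - math.exp(-sum(epd.values()) / 10))

# Recommendation -> {Option name: (cost, overlay)}; groups are disjoint by construction.
RECOMMENDATIONS = {
    "main_wall": {"cavity_fill": (1500, {"wall": 6.0}), "ewi": (9000, {"wall": 9.0})},
    "main_roof": {"loft_insulation": (800, {"roof": 4.0})},
    "pv": {"pv_4kwp": (5000, {"pv": 5.0})},
}
VENTILATION_COST, VENTILATION_OVERLAY = 600, {"vent": -1.0}
NEEDS_VENTILATION = {"main_wall"}  # trigger set held as data, not control flow

def cost_of(picked):
    cost = sum(RECOMMENDATIONS[r][o][0] for r, o in picked)
    forced = any(r in NEEDS_VENTILATION for r, _ in picked)
    return cost + (VENTILATION_COST if forced else 0)

def rescore(baseline, picked):
    """Inject the forced dependency before scoring, then score the whole package."""
    overlays = [RECOMMENDATIONS[r][o][1] for r, o in picked]
    if any(r in NEEDS_VENTILATION for r, _ in picked):
        overlays.append(VENTILATION_OVERLAY)
    return package_score(baseline, overlays)

def seed_package(baseline, target, budget):
    """Brute-force stand-in for the MILP: at most one Option per Recommendation."""
    base = package_score(baseline, [])
    gain = {(r, o): package_score(baseline, [ov]) - base
            for r, opts in RECOMMENDATIONS.items() for o, (_, ov) in opts.items()}
    groups = [[None] + [(r, o) for o in opts] for r, opts in RECOMMENDATIONS.items()]
    best = None
    for combo in product(*groups):
        picked = [c for c in combo if c is not None]
        cost = sum(RECOMMENDATIONS[r][o][0] for r, o in picked)
        reaches = base + sum(gain[c] for c in picked) >= target
        if reaches and cost <= budget and (best is None or cost < best[0]):
            best = (cost, picked)
    return (best[1] if best else []), gain

def optimise(baseline, target, budget):
    picked, gain = seed_package(baseline, target, budget)
    score = rescore(baseline, picked)
    while score < target:  # true package score undershoots: greedy repair
        used = {r for r, _ in picked}
        candidates = [(r, o) for r, opts in RECOMMENDATIONS.items() if r not in used
                      for o in opts if cost_of(picked + [(r, o)]) <= budget]
        if not candidates:
            break  # budget exhausted or nothing left to add
        best = max(candidates, key=lambda c: gain[c] / RECOMMENDATIONS[c[0]][c[1]][0])
        picked = picked + [best]
        score = rescore(baseline, picked)
    return picked, score, cost_of(picked)

baseline = {"wall": 0.0, "roof": 0.0, "pv": 0.0, "vent": 0.0}
picked, score, cost = optimise(baseline, target=72, budget=10000)
```

With these toy numbers the seed picks cavity fill plus loft insulation,
the re-score with ventilation injected falls short of 72, and one repair
step adds the PV Option.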

diff --git a/CONTEXT.md b/CONTEXT.md
index f3ffd4fa..74759091 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -208,8 +208,31 @@ Recommendations generated but not selected by the Optimiser in a given Plan Phas
 _Avoid_: deferred measures, leftover recommendations
 
 **Recommendation**:
-A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list.
-_Avoid_: suggestion, option
+The finding that a Property needs work on a given **target surface** — a building part (the MAIN wall, an extension roof…) or a system (heating + hot water + controls, treated as one). Carries one or more mutually-exclusive **Measure Options**; the Optimiser selects at most one. Recommendations **partition** the modifiable surface of EpcPropertyData: no two Recommendations write the same `(building part, field)`, so selected Options never collide. Exclusivity between competing treatments (cavity-fill vs EWI; a boiler bundle vs an ASHP) is captured *within* one Recommendation, never across them.
+_Avoid_: suggestion, recommendation engine, keying by measure type (a Recommendation can span measure types — e.g. a heating + hot-water bundle)
+
+**Measure Option**:
+One mutually-exclusive way to satisfy a **Recommendation** — possibly a **bundle** of sub-measures (e.g. "new condensing boiler + cylinder insulation"), possibly a single intervention at a chosen size/product (a 4 kWp PV array of product X). Carries its total cost and a **Simulation Overlay** for its combined effect on the target surface. Cost is intrinsic to the Option; SAP / kWh / carbon impact is **not** — impact is cascade-conditional (depends on what is already installed) and is produced by scoring, never stored on the Option. Two Options under one Recommendation may share an identical Simulation Overlay (differing only on cost/product) or differ (e.g. PV kWp), so scoring runs per distinct Overlay.
+_Avoid_: option (too generic), variant, SKU
+
+**Simulation Overlay** (type `EpcSimulation`):
+The change a single **Measure Option** makes to a Property's EpcPropertyData, expressed as an all-optional partial mirror of EpcPropertyData and its nested types — covering only the retrofit-relevant surface (walls/roofs/floors, windows, heating + controls, hot water, ventilation, lighting, PV, draughtproofing), never identity/location fields. Targets a specific building part by `BuildingPartIdentifier` (MAIN, EXTENSION_1..4) so "insulate the cavity wall" addresses the exact `SapBuildingPart`. Carries no scores. It is **not** an EpcPropertyData (composition, not inheritance — an all-`None` overlay is not a valid EPC). A domain operation folds a baseline EpcPropertyData + an ordered set of Overlays into a throwaway EpcPropertyData handed to the calculator; only the score is kept, the EPD is discarded.
+_Avoid_: simulation config (the legacy EPC-API flag object), patch, delta, diff
+
+**Product**:
+A catalogue entry a **Measure Option** installs — insulation, glazing units, heat pumps, boilers, cylinders, PV panels, inverters, batteries — carrying the data to price an Option and shape its **Simulation Overlay**. Named *Product*, not *material*: the catalogue is dominated by equipment and appliances, and a heat pump is not a building material. Read via `ProductRepository`, which for now combines two inputs — the Products in the database plus a committed costs file holding what the ETL does not yet supply. Single-source unification (ETL-supplied costs) is separate, queued work; legacy `Costs.py` is retained but queued for deletion.
+_Avoid_: material, building material (inaccurate for appliances), part (the per-Option installed line item), SKU
+
+**Cost** (of a Measure Option):
+A single **fully-loaded total** — products + labour + preliminaries + VAT + margin rolled into one figure — **plus a separately-carried Contingency**. Only contingency is broken out; the rest is not decomposed, as that breakdown proved unhelpful.
+
+**Contingency**:
+A per-**Measure-Type** percentage uplift on an Option's cost covering job-specific risk (e.g. cavity-wall 10%, internal/external wall 26%, ASHP 25% — cf. legacy `Costs.CONTINGENCIES`). The one cost component carried separately from the fully-loaded total, because the rate is measure-type-specific and meaningful to surface.
+_Avoid_: preliminaries (a different, rolled-in 10%), margin
+
+**Measure Dependency**:
+A "selecting A requires B" edge between **Recommendations**, for couplings that are real but that the Optimiser would not choose on its own — e.g. wall (and possibly roof) insulation requires adequate ventilation. The required Option is excluded from the optimiser's candidate pool (it is mandatory-when-triggered, not a free choice) but is **injected into the Optimised Package before the package re-score**, so its real SAP contribution — which for ventilation is *negative* — is captured in the true package score and in the undershoot/repair loop. Trigger set is held as **data** (cf. legacy `assumptions.measures_needing_ventilation`), not control flow, so extending the triggers (e.g. to roof insulation) is a data edit. Distinct from the legacy post-optimisation best-practice add, which tacked cost on *after* scoring and so undershot.
+_Avoid_: best-practice measure (legacy term), forced measure
 
 **Optimised Package**:
 The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget.
diff --git a/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md b/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md
index 6fc5b4cf..0d811847 100644
--- a/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md
+++ b/docs/adr/0005-multi-phase-scenarios-per-phase-recompute.md
@@ -11,4 +11,4 @@ A single-phase Scenario is `phases: [<one ScenarioPhase>]` with all measure type
 - `Plan` carries `phases: list[PlanPhase]` rather than a flat `OptimisedPackage`. Every consumer of plan output (FE, exports, downstream reports) reads phases.
 - The optimiser must accept rolling-state input rather than only baseline state — a generalisation of today's single-shot pass.
 - ML cost can be controlled at the scenario layer: keeping a scenario single-phase is the lever for "score once, optimise once" if cost becomes a problem.
-- Open future change: SAP impact of a measure is not strictly additive even within a phase. The current per-measure scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages, accepting combinatorial cost for accuracy. Track in PRD §15.
+- ~~Open future change: SAP impact of a measure is not strictly additive even within a phase. The current per-measure scoring + linear optimisation approximates this. A future iteration may pre-define candidate packages and ML-score whole packages, accepting combinatorial cost for accuracy. Track in PRD §15.~~ **Resolved by [ADR-0016](0016-package-rescore-over-warm-start-optimisation.md):** the answer is not package enumeration but warm-start MILP on independent-vs-baseline scores → deterministic-calculator package re-score → greedy repair, which sidesteps the cross-product.
diff --git a/docs/adr/0016-package-rescore-over-warm-start-optimisation.md b/docs/adr/0016-package-rescore-over-warm-start-optimisation.md
new file mode 100644
index 00000000..50095a03
--- /dev/null
+++ b/docs/adr/0016-package-rescore-over-warm-start-optimisation.md
@@ -0,0 +1,22 @@
+# Package re-scoring over warm-start optimisation, not marginal cascade or full enumeration
+
+Modelling scores each **Measure Option** once, **independently against the baseline** Effective EPC (deduplicated per distinct **Simulation Overlay**, so identical overlays are scored once). It runs a grouped-knapsack MILP over those per-Option scores to get a *candidate* package, injects any forced **Measure Dependencies** (e.g. ventilation) into that package, composes the selected + injected overlays into one throwaway `EpcPropertyData`, and **re-scores the whole package on the deterministic SAP10 calculator** for the truthful figure. If the true package SAP undershoots the phase goal, it **greedy-adds** the unselected Option with the best residual SAP-per-£ and re-scores, repeating until the target is met or the budget is exhausted.
+
+The reason for the split is that SAP impact is **sub-additive** — summed independent per-Option scores overestimate the combined effect, so the MILP optimum is a *signal*, not the truth. Because the calculator is deterministic and fast (ADR-0009), accuracy is bought by re-scoring the chosen package, not by making the optimiser's per-measure inputs accurate. The optimiser only has to rank measures well enough to seed a near-right package; the calculator supplies the real number.
+
+We rejected two alternatives:
+
+- **Marginal cascade scores** (the legacy approach): score measure *N* assuming measures `1..N-1` are present. These telescope to the true total *only if every measure is selected*; the optimiser dropping a middle measure invalidates every downstream marginal. It adds the cascade's complexity for an accuracy the package re-score already provides.
+- **Full package enumeration / ML-scoring the cross-product** (the path ADR-0005 §14 anticipated): combinatorial, because the number of candidate packages is the *product* of the per-Recommendation Option counts (roughly `#Options ^ #Recommendations`), not `#Recommendations × #Options`. With realistic option counts (wall × roof × floor × heating-bundle × PV × …) the cross-product is intractable. The warm-start + re-score + repair loop reaches a truthful, near-optimal package without ever materialising it.
+
+This resolves the open question deferred in **ADR-0005 §14**.
+
+## Consequences
+
+- Calculator calls per Property per **Scenario Phase** ≈ `(# distinct Simulation Overlays)` for the per-Option pass `+` `(a few package re-scores)` in the repair loop — **bounded, never the cross-product**. The Option-dedup-by-Overlay invariant is what keeps the per-Option pass cheap.
+- A forced **Measure Dependency** must be injected into the package **before** the re-score, so its real SAP contribution — *negative* for ventilation — lands in the truthful figure and in the undershoot/repair decision. (The legacy bug was adding ventilation as a cost-only line *after* scoring, which silently overstated the package and undershot the real target.)
+- The optimiser is a clean grouped knapsack: pick ≤1 Option per Recommendation, groups disjoint, **no cross-group mutual-exclusion constraints** — the Recommendation partition (no two Recommendations write the same `(building part, field)`) makes selected overlays collision-free by construction.
+- Greedy repair can overspend relative to a global re-optimise. Accepted for bounded calculator calls and simplicity; re-solving the MILP with the corrected package score fed back as a constraint is the fallback if greedy proves too loose in practice.
+- Per-Option scores are *approximate by design* (independent-vs-baseline) and must never be persisted or surfaced as a measure's "true" impact; only the package re-score is truthful. Measure-level impact shown to users is derived from the final scored package, not from the per-Option pass.
+- **Three distinct scoring roles, each with one job:** (1) per-Option independent-vs-baseline → optimiser *input* (approximate signal, never surfaced); (2) whole-package re-score → truthful *package total*; (3) **final-package marginal cascade** → per-measure *attribution* for display. Role 3 runs only on the *selected* set, applied in **best-practice prescribed order** (walls → roof → ventilation → … per the legacy `Recommendations` class), so `attribution(mᵢ) = score(m₁..mᵢ) − score(m₁..mᵢ₋₁)`; the marginals **telescope exactly to the package total** (role 2) with no residual. The "drop a middle measure" inaccuracy cannot occur because the actual final set is scored, not a hypothetical. Phase is the cascade unit; intra-phase ordering follows the same best-practice sequence.
+- **The package-scoring primitive is reusable.** "Compose selected overlays → throwaway `EpcPropertyData` → calculator" serves both the optimiser's package re-score (role 2) and a future endpoint that re-scores a *user-assembled* plan live (the FE toggling Rolled-over Options on/off). Because the calculator is fast, live re-score is the **accurate** path the moment a user deviates from the optimiser's selection. Note the trap this avoids: summing stored per-measure figures across a user-edited selection re-introduces the sub-additivity overestimate — a user-edited plan must be re-scored as a package, never summed from stored attributions.