diff --git a/docs/HANDOVER_MODELLING.md b/docs/HANDOVER_MODELLING.md
index 776de693..83fa2aa8 100644
--- a/docs/HANDOVER_MODELLING.md
+++ b/docs/HANDOVER_MODELLING.md
@@ -127,6 +127,17 @@ Designed in `/grill-with-docs` + `/grill-me`. The live `plan`/`recommendation` t
 
 **Gotchas for the next agent:** the modelling SQLModel classes are `…Model` and live in `infrastructure/postgres/modelling/` (NOT the old flat `plan_table.py`/`scenario_table.py` — deleted); `backend/app/db/models/recommendations.py` is now a pure shim. Out-of-cluster columns are plain ints (no FK) per the mirror convention. **`PortfolioGoal` lives in `domain/modelling/`** now. The `etl/`+`sfr/` reporting scripts still reference the m2m and are **deferred** (out of scope). The live DB changes (add `plan_id`, backfill, drop `plan_recommendations`) are the **FE-owned Drizzle** migrations in the migration doc — this branch is the backend end-state.
 
+## NEXT PHASE — depth + scale e2e (handover for a grilling session)
+
+The owner's goal: run a large dump of **SAP 10.2 EPCs (1,000–10,000)** through Modelling and inspect the recommendations — a large-scale integration test — plus **manual testing via a Python console**. Measure *coverage* (heating/solar/glazing/…) is explicitly **deferred** ("we'll flesh this out"). This phase is **depth + scale on the existing 4 fabric measures** (cavity wall / loft / floor / ventilation):
+
+1. **Close the persisted-field gaps** so a persisted Plan matches the engine's richness for the measures we *do* model: `recommendation_materials` (BOM — depth/quantity/unit/cost; rebuild `Cost` is a single total today, no per-material breakdown), per-measure U-values (`starting_u_value`/`new_u_value`), `total_work_hours`/`labour_days`. Source of truth: the rebuild `ProductRepository` (`repositories/product/`) + legacy `materials_functions.py` / `recommendations_functions.upload_recommendations` (writes `rec["parts"]`).
+2. **Financial uplift modelling** — valuation columns (`plan.valuation_*`, `recommendation.property_valuation_increase`/`rental_yield_increase`) are **greenfield in the rebuild** (no domain concept yet). Legacy logic: `backend/Property.py`, `backend/Funding.py`, `backend/app/db/functions/funding_functions.py`, `portfolio_functions.py`. Needs a domain design (likely a `/grill-with-docs` pass).
+3. **Large-scale e2e harness**: the template is `tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan` (seeds an EPC via `EpcPostgresRepository` + `MaterialRow`s + a `ScenarioModel`, then runs `ModellingOrchestrator` directly, because the Baseline stage can't run on calculator fixtures). For the dump: parse each EPC via `EpcPropertyDataMapper.from_api_response` (API JSON) or `from_rdsap_schema_21_0_0` / `from_rdsap_schema_21_0_1` (see `datatypes/epc/domain/mapper.py`), then seed, run, inspect. EPC samples live under `backend/epc_api/json_samples/`.
+4. **Python-console manual run**: instantiate `ModellingOrchestrator` against a real DB and inspect Plans/Recommendations. Mind the **worktree import trap**: run via `pytest` / `python -c` from the worktree root, not as a script under `/tmp` (which imports `/workspaces/model` instead).
+
+A self-contained handover prompt for the next agent is in **`docs/HANDOVER_NEXT_PHASE_PROMPT.md`**.
+
 ## What's left
 
 **Deferred fronts** (open, post-#1161): exclusion-filtering of the candidate pool (deferred from #1160); persist **unselected alternatives** (`default=False` rows linked via `plan_id`) for the swap-in UX — open ADR-0016 question: what impact figure they carry; promote `ProductRepository` to the DB+file composite; non-EPC goal objectives (Energy Savings, Reducing CO2) in the optimiser. Possible extension of the ventilation trigger set to roof insulation (now a one-line data edit in `MEASURES_NEEDING_VENTILATION`); and making the dependency builder lazy (thunk) so the Product is only fetched when a trigger is actually selected.
diff --git a/docs/HANDOVER_NEXT_PHASE_PROMPT.md b/docs/HANDOVER_NEXT_PHASE_PROMPT.md
new file mode 100644
index 00000000..e8b7512f
--- /dev/null
+++ b/docs/HANDOVER_NEXT_PHASE_PROMPT.md
@@ -0,0 +1,53 @@
+# Handover prompt — Modelling: depth + scale e2e (next phase)
+
+> Paste this to a fresh agent. The owner will then run a **grilling session** to lock the design before any code.
+
+You are continuing the **Modelling stage rebuild** (3rd pipeline stage) on branch `feature/bill-derivation`, worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`, HEAD at the tip of that branch.
+
+## FIRST: read these, in order
+1. `docs/HANDOVER_MODELLING.md` — full state, locked decisions, gotchas (read in full; the "NEXT PHASE" section frames this work).
+2. Auto-memory `project_modelling_stage_state` — running state.
+3. ADRs **0011/0012** (orchestrators + UoW), **0014** (billing), **0016** (three scoring roles), **0017 + its amendment** (Plan persistence; the `…Model` SQLModel cluster in `infrastructure/postgres/modelling/`; `plan_recommendations` retired).
+4. `CONTEXT.md` — domain glossary (Plan, Plan Measure, Recommendation, Measure Option, Scenario, …).
+
+## Where things stand (what works)
+- The `ModellingOrchestrator` **runs end-to-end and persists to a real Postgres**: generate fabric candidates → role-1 score → optimise (least-cost-to-target) → role-3 attribute → bill → persist a **Plan** + its **Plan Measures** (`recommendation` rows linked by `recommendation.plan_id`; the m2m is gone). Persists SAP, CO₂ (tonnes), cost + contingency, post-band, **plan + per-measure energy/bill/kWh savings**.
+- Proven by `tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan` (drives the orchestrator directly off a repo-seeded EPC — **the e2e template**).
+- All green: rebuild suite + legacy export/functions; pyright strict clean.
+- **4 fabric generators only**: cavity wall, loft, floor, ventilation (`domain/modelling/generators/`).
+
+## The owner's goal (this phase)
+> "I have a big dump of SAP 10.2 EPCs. I want to run a bunch (1,000–10,000) through this and inspect the recommendations — a reasonably large-scale integration test. I also want to run the code via a Python console for manual testing. Once these measures work e2e, we flesh out the others."
+
+**Measure coverage is explicitly deferred.** This phase is **depth + scale on the existing 4 fabric measures**:
+
+1. **Close the persisted-field gaps** (make a persisted Plan as rich as the engine for the measures we model):
+   - `recommendation_materials` (BOM: depth / quantity / quantity_unit / estimated_cost). Today the rebuild's `Cost` (`domain/modelling/recommendation.py`) is a single fully-loaded `total` + `contingency_rate` — **no per-material breakdown**. Source: rebuild `ProductRepository` (`repositories/product/`), legacy `backend/app/db/functions/materials_functions.py` + `recommendations_functions.upload_recommendations` (writes `rec["parts"]`).
+   - Per-measure U-values (`starting_u_value` / `new_u_value`), `total_work_hours`, `labour_days`. These columns already exist on `RecommendationModel` (NULL today).
+2. **Financial uplift modelling** (valuations) — **greenfield in the rebuild** (no domain concept exists; only `plan.valuation_*` / `recommendation.property_valuation_increase` columns sit NULL). Legacy logic: `backend/Property.py`, `backend/Funding.py`, `backend/app/db/functions/funding_functions.py`, `portfolio_functions.py`. This wants its own design.
+3. **Large-scale e2e harness** — run the EPC dump through Modelling and inspect recommendations:
+   - Parse each EPC via `EpcPropertyDataMapper` (`datatypes/epc/domain/mapper.py`): `from_api_response` (API JSON) / `from_rdsap_schema_21_0_0` / `from_rdsap_schema_21_0_1`. Samples: `backend/epc_api/json_samples/`.
+   - Seed via `EpcPostgresRepository(session).save(epc, property_id, portfolio_id)` + a `ScenarioModel` + the `MaterialRow`s every firing generator prices against, then `ModellingOrchestrator(...).run([...], [scenario_id], portfolio_id)`. (Baseline can't run on calculator fixtures — drive Modelling directly, as the template does.)
+4. **Python-console manual run** — instantiate the orchestrator against a DB and inspect Plans/Recommendations interactively.
+
+## Critical gotchas (carry these)
+- **`mip`/CBC is broken on this aarch64 container** — never build on `mip`.
+- **`moto` is not installed**: pass `--ignore` for `tests/orchestration/test_postcode_splitter_orchestrator.py` and `tests/repositories/unstandardised_address/` when sweeping the test suite.
+- Run tests with `python -m pytest <path> -q` (NOT `-p no:cov`). The rebuild `db_engine` fixture builds **only `SQLModel.metadata`**.
+- **Worktree import trap** — run via `pytest` / `python -c` **from the worktree root**, not `python /tmp/foo.py` (that imports `/workspaces/model`).
+- Don't edit the SAP calculator's `heat_transmission.py` (another agent owns it).
+- The modelling SQLModel classes are **`…Model`** in **`infrastructure/postgres/modelling/`** (the old flat `plan_table.py`/`scenario_table.py` are deleted); `backend/app/db/models/recommendations.py` is a pure re-export shim. `PortfolioGoal` lives in `domain/modelling/`. Out-of-cluster columns are plain ints (no FK — mirror convention). `ScenarioModel.goal` is the `PortfolioGoal` **enum**; the repo's `to_domain` maps it to its `.value`.
+- `etl/` + `sfr/` and the live Drizzle migrations (add `plan_id` / backfill / drop `plan_recommendations`, per `docs/migrations/recommendation-plan-id.md`) are the **owner's**, not yours.
+- ADR-0014 limitation still applies: **appliances + cooking stubbed at 0 kWh** in bills.
+
+## Conventions
+Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit messages ending with the trailer `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`; AAA test headers; assert with `abs(x - y) <= tol` (not `pytest.approx`); pyright strict with zero errors; annotate call-return locals.
+
+## How to start
+**Do NOT write code yet.** The owner wants a **grilling session** first. Open by mapping the decision tree and surfacing the design questions, e.g.:
+- **BOM / Cost shape:** does `Cost` grow into a per-material breakdown (parts with depth/quantity/unit), or do materials become a separate concept the generators emit alongside the Option? How does the rebuild `ProductRepository` supply material parts + U-values today vs. what the BOM needs?
+- **Financial uplift:** what's the valuation model (legacy `Property.py`/`Funding.py` — back-solve or formula)? Which columns are in-scope (valuation lower/upper/avg, post-retrofit, rental yield)? Domain home for it?
+- **Scale harness:** is the EPC dump API-JSON or RdSAP-schema? Where does it live, and how is it provided? Is it a committed test (subset) plus a separate runnable script for the full 1k–10k? What does "inspect the recommendations" mean: assertions, a CSV/report, or console exploration? How do we seed materials for *all* measure types at scale (catalogue completeness)?
+- **Console UX:** a small documented entrypoint/helper to build a `ModellingOrchestrator` + UoW against a chosen DB and run one property?
+
+Tell the owner what you'll tackle first and whether you want a `/grill-with-docs` design pass (the financial-uplift and BOM-shape decisions are load-bearing and want ADRs).