Model/docs/HANDOVER_NEXT_PHASE_PROMPT.md

# Handover prompt — Modelling: depth + scale e2e (next phase)

> Paste this to a fresh agent. The owner will then run a **grilling session** to lock the design before any code.

You are continuing the **Modelling stage rebuild** (3rd pipeline stage) on branch `feature/bill-derivation`, worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`, HEAD at the tip of that branch.

## FIRST: read these, in order
1. `docs/HANDOVER_MODELLING.md` — full state, locked decisions, gotchas (read in full; the "NEXT PHASE" section frames this work).
2. Auto-memory `project_modelling_stage_state` — running state.
3. ADRs **0011/0012** (orchestrators + UoW), **0014** (billing), **0016** (three scoring roles), **0017 + its amendment** (Plan persistence; the `…Model` SQLModel cluster in `infrastructure/postgres/modelling/`; `plan_recommendations` retired).
4. `CONTEXT.md` — domain glossary (Plan, Plan Measure, Recommendation, Measure Option, Scenario, …).

## Where things stand (what works)
- The `ModellingOrchestrator` **runs end-to-end and persists to a real Postgres**: generate fabric candidates → role-1 score → optimise (least-cost-to-target) → role-3 attribute → bill → persist a **Plan** + its **Plan Measures** (`recommendation` rows linked by `recommendation.plan_id`; the m2m is gone). Persists SAP, CO₂ (tonnes), cost + contingency, post-band, **plan + per-measure energy/bill/kWh savings**.
- Proven by `tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan` (drives the orchestrator directly off a repo-seeded EPC — **the e2e template**).
- All green: rebuild suite + legacy export/functions; pyright strict clean.
- **4 fabric generators only**: cavity wall, loft, floor, ventilation (`domain/modelling/generators/`).

## The owner's goal (this phase)
> "I have a big dump of SAP 10.2 EPCs. I want to run a bunch (1,000–10,000) through this and inspect the recommendations — a reasonably large-scale integration test. I also want to run the code via a Python console for manual testing. Once these measures work e2e, we flesh out the others."

**Measure coverage is explicitly deferred.** This phase is **depth + scale on the existing 4 fabric measures**:

1. **Close the persisted-field gaps** (make a persisted Plan as rich as the engine for the measures we model):
   - `recommendation_materials` (BOM: depth / quantity / quantity_unit / estimated_cost). Today the rebuild's `Cost` (`domain/modelling/recommendation.py`) is a single fully-loaded `total` + `contingency_rate` — **no per-material breakdown**. Source: rebuild `ProductRepository` (`repositories/product/`), legacy `backend/app/db/functions/materials_functions.py` + `recommendations_functions.upload_recommendations` (writes `rec["parts"]`).
   - Per-measure U-values (`starting_u_value` / `new_u_value`), `total_work_hours`, `labour_days`. These columns already exist on `RecommendationModel` (NULL today).
2. **Financial uplift modelling** (valuations) — **greenfield in the rebuild** (no domain concept exists; only `plan.valuation_*` / `recommendation.property_valuation_increase` columns sit NULL). Legacy logic: `backend/Property.py`, `backend/Funding.py`, `backend/app/db/functions/funding_functions.py`, `portfolio_functions.py`. This wants its own design.
3. **Large-scale e2e harness** — run the EPC dump through Modelling and inspect recommendations:
   - Parse each EPC via `EpcPropertyDataMapper` (`datatypes/epc/domain/mapper.py`): `from_api_response` (API JSON) / `from_rdsap_schema_21_0_0` / `from_rdsap_schema_21_0_1`. Samples: `backend/epc_api/json_samples/`.
   - Seed via `EpcPostgresRepository(session).save(epc, property_id, portfolio_id)` + a `ScenarioModel` + the `MaterialRow`s every firing generator prices against, then `ModellingOrchestrator(...).run([...], [scenario_id], portfolio_id)`. (Baseline can't run on calculator fixtures — drive Modelling directly, as the template does.)
4. **Python-console manual run** — instantiate the orchestrator against a DB and inspect Plans/Recommendations interactively.

## Critical gotchas (carry these)
- **`mip`/CBC is broken on this aarch64 container** — never build on `mip`.
- **`moto` not installed** — `--ignore` `tests/orchestration/test_postcode_splitter_orchestrator.py` + `tests/repositories/unstandardised_address/` when sweeping.
- Run tests with `python -m pytest <path> -q` (NOT `-p no:cov`). The rebuild `db_engine` fixture builds **only `SQLModel.metadata`**.
- **Worktree import trap** — run via `pytest` / `python -c` **from the worktree root**, not `python /tmp/foo.py` (that imports `/workspaces/model`).
- Don't edit the SAP calculator's `heat_transmission.py` (another agent owns it).
- The modelling SQLModel classes are **`…Model`** in **`infrastructure/postgres/modelling/`** (the old flat `plan_table.py`/`scenario_table.py` are deleted); `backend/app/db/models/recommendations.py` is a pure re-export shim. `PortfolioGoal` lives in `domain/modelling/`. Out-of-cluster columns are plain ints (no FK — mirror convention). `ScenarioModel.goal` is the `PortfolioGoal` **enum**; the repo's `to_domain` maps it to its `.value`.
- `etl/` + `sfr/` and the live Drizzle migrations (add `plan_id` / backfill / drop `plan_recommendations`, per `docs/migrations/recommendation-plan-id.md`) are the **owner's**, not yours.
- ADR-0014 limitation still applies: **appliances + cooking stubbed at 0 kWh** in bills.

## Conventions
Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit ending `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`; AAA test headers; assert with `abs(x - y) <= tol` (not `pytest.approx`); pyright strict zero errors; annotate call-return locals.

## How to start
**Do NOT write code yet.** The owner wants a **grilling session** first. Open by mapping the decision tree and surfacing the design questions, e.g.:
- **BOM / Cost shape:** does `Cost` grow into a per-material breakdown (parts with depth/quantity/unit), or do materials become a separate concept the generators emit alongside the Option? How does the rebuild `ProductRepository` supply material parts + U-values today vs. what the BOM needs?
- **Financial uplift:** what's the valuation model (legacy `Property.py`/`Funding.py` — back-solve or formula)? Which columns are in-scope (valuation lower/upper/avg, post-retrofit, rental yield)? Domain home for it?
- **Scale harness:** is the EPC dump API-JSON or RdSAP-schema? Where does it live / how is it provided? Is it a committed test (subset) + a separate runnable script for the full 1k–10k? What's "inspect the recommendations" — assertions, a CSV/report, or console exploration? How to seed materials for *all* measure types at scale (catalogue completeness).
- **Console UX:** a small documented entrypoint/helper to build a `ModellingOrchestrator` + UoW against a chosen DB and run one property?

Tell the owner what you'll tackle first and whether you want a `/grill-with-docs` design pass (the financial-uplift and BOM-shape decisions are load-bearing and want ADRs).