Model/docs/HANDOVER_NEXT_PHASE_PROMPT.md
Khalim Conn-Kowlessar b8b7e02034 docs(modelling): next-phase handover — depth + scale e2e + grilling prompt
Capture the next phase (close persisted-field gaps + financial uplift, plus a
large-scale e2e run of a SAP 10.2 EPC dump and console manual testing; measure
coverage deferred) and a self-contained handover prompt for a fresh agent to
pick up via a grilling session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 23:09:08 +00:00

# Handover prompt — Modelling: depth + scale e2e (next phase)
> Paste this to a fresh agent. The owner will then run a **grilling session** to lock the design before any code.

You are continuing the **Modelling stage rebuild** (3rd pipeline stage) on branch `feature/bill-derivation`, worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`, HEAD at the tip of that branch.
## FIRST: read these, in order
1. `docs/HANDOVER_MODELLING.md` — full state, locked decisions, gotchas (read in full; the "NEXT PHASE" section frames this work).
2. Auto-memory `project_modelling_stage_state` — running state.
3. ADRs **0011/0012** (orchestrators + UoW), **0014** (billing), **0016** (three scoring roles), **0017 + its amendment** (Plan persistence; the `…Model` SQLModel cluster in `infrastructure/postgres/modelling/`; `plan_recommendations` retired).
4. `CONTEXT.md` — domain glossary (Plan, Plan Measure, Recommendation, Measure Option, Scenario, …).
## Where things stand (what works)
- The `ModellingOrchestrator` **runs end-to-end and persists to a real Postgres**: generate fabric candidates → role-1 score → optimise (least-cost-to-target) → role-3 attribute → bill → persist a **Plan** + its **Plan Measures** (`recommendation` rows linked by `recommendation.plan_id`; the m2m is gone). Persists SAP, CO₂ (tonnes), cost + contingency, post-band, **plan + per-measure energy/bill/kWh savings**.
- Proven by `tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan` (drives the orchestrator directly off a repo-seeded EPC — **the e2e template**).
- All green: rebuild suite + legacy export/functions; pyright strict clean.
- **4 fabric generators only**: cavity wall, loft, floor, ventilation (`domain/modelling/generators/`).
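
To make "optimise (least-cost-to-target)" concrete, here is a toy sketch only. It is not the rebuild's optimiser: the `Candidate` class, the additive `sap_gain`, and the brute-force search are all assumptions made for illustration (real SAP gains interact, which is why role-1 scoring and role-3 attribution exist). It uses plain `itertools` rather than `mip`.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one scored fabric candidate."""

    name: str
    cost: float
    sap_gain: float  # ASSUMPTION: treated as additive; real SAP gains are not


def least_cost_to_target(
    candidates: list[Candidate], baseline_sap: float, target_sap: float
) -> tuple[Candidate, ...] | None:
    """Return the cheapest subset that reaches the target, or None if none does."""
    best: tuple[Candidate, ...] | None = None
    best_cost: float = float("inf")
    # Brute force is fine for 4 generators (16 subsets); it will not scale.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            reached = baseline_sap + sum(c.sap_gain for c in subset)
            cost = sum(c.cost for c in subset)
            if reached >= target_sap and cost < best_cost:
                best, best_cost = subset, cost
    return best
```

With made-up numbers, a property at SAP 60 targeting 69 would pick the cheapest pair that covers the 9-point gap, and an unreachable target returns `None` rather than a partial plan.
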
## The owner's goal (this phase)
> "I have a big dump of SAP 10.2 EPCs. I want to run a bunch (1,000-10,000) through this and inspect the recommendations: a reasonably large-scale integration test. I also want to run the code via a Python console for manual testing. Once these measures work e2e, we flesh out the others."

**Measure coverage is explicitly deferred.** This phase is **depth + scale on the existing 4 fabric measures**:
1. **Close the persisted-field gaps** (make a persisted Plan as rich as the engine for the measures we model):
   - `recommendation_materials` (BOM: depth / quantity / quantity_unit / estimated_cost). Today the rebuild's `Cost` (`domain/modelling/recommendation.py`) is a single fully-loaded `total` + `contingency_rate`, with **no per-material breakdown**. Sources: rebuild `ProductRepository` (`repositories/product/`), legacy `backend/app/db/functions/materials_functions.py` + `recommendations_functions.upload_recommendations` (writes `rec["parts"]`).
   - Per-measure U-values (`starting_u_value` / `new_u_value`), `total_work_hours`, `labour_days`. These columns already exist on `RecommendationModel` (NULL today).
2. **Financial uplift modelling** (valuations) — **greenfield in the rebuild** (no domain concept exists; only `plan.valuation_*` / `recommendation.property_valuation_increase` columns sit NULL). Legacy logic: `backend/Property.py`, `backend/Funding.py`, `backend/app/db/functions/funding_functions.py`, `portfolio_functions.py`. This wants its own design.
3. **Large-scale e2e harness** — run the EPC dump through Modelling and inspect recommendations:
   - Parse each EPC via `EpcPropertyDataMapper` (`datatypes/epc/domain/mapper.py`): `from_api_response` (API JSON) / `from_rdsap_schema_21_0_0` / `from_rdsap_schema_21_0_1`. Samples: `backend/epc_api/json_samples/`.
   - Seed via `EpcPostgresRepository(session).save(epc, property_id, portfolio_id)` + a `ScenarioModel` + the `MaterialRow`s every firing generator prices against, then `ModellingOrchestrator(...).run([...], [scenario_id], portfolio_id)`. (Baseline can't run on calculator fixtures; drive Modelling directly, as the template does.)
4. **Python-console manual run** — instantiate the orchestrator against a DB and inspect Plans/Recommendations interactively.
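
One possible shape for the batch loop behind items 3 and 4, as a minimal sketch. `RunOutcome`, `run_batch`, `to_csv`, and the injected `run_one` callable are all hypothetical names; in a real harness `run_one` would wrap the parse, seed, and `ModellingOrchestrator(...).run(...)` steps listed above. The point it illustrates is that one bad EPC is recorded and skipped rather than aborting a 10k run.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RunOutcome:
    """One row of the inspection report (hypothetical shape)."""

    epc_ref: str
    ok: bool
    measures: int  # number of Plan Measures produced for this EPC
    error: str  # empty when ok


def run_batch(
    epc_refs: Iterable[str], run_one: Callable[[str], int]
) -> list[RunOutcome]:
    """Run every EPC through `run_one`, recording failures instead of stopping."""
    outcomes: list[RunOutcome] = []
    for ref in epc_refs:
        try:
            measures: int = run_one(ref)
            outcomes.append(RunOutcome(ref, True, measures, ""))
        except Exception as exc:  # a bad EPC must not abort the whole batch
            outcomes.append(RunOutcome(ref, False, 0, f"{type(exc).__name__}: {exc}"))
    return outcomes


def to_csv(outcomes: list[RunOutcome]) -> str:
    """Render the outcomes as CSV text for inspection outside the console."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["epc_ref", "ok", "measures", "error"])
    for outcome in outcomes:
        writer.writerow([outcome.epc_ref, outcome.ok, outcome.measures, outcome.error])
    return buffer.getvalue()
```

The same `run_batch` can be called from a Python console with a one-element list, which keeps the manual run and the scale run on one code path. Whether the report should be a CSV at all is one of the open questions for the grilling session.
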
## Critical gotchas (carry these)
- **`mip`/CBC is broken on this aarch64 container** — never build on `mip`.
- **`moto` not installed** — `--ignore` `tests/orchestration/test_postcode_splitter_orchestrator.py` + `tests/repositories/unstandardised_address/` when sweeping.
- Run tests with `python -m pytest <path> -q` (NOT `-p no:cov`). The rebuild `db_engine` fixture builds **only `SQLModel.metadata`**.
- **Worktree import trap** — run via `pytest` / `python -c` **from the worktree root**, not `python /tmp/foo.py` (that imports `/workspaces/model`).
- Don't edit the SAP calculator's `heat_transmission.py` (another agent owns it).
- The modelling SQLModel classes are **`…Model`** in **`infrastructure/postgres/modelling/`** (the old flat `plan_table.py`/`scenario_table.py` are deleted); `backend/app/db/models/recommendations.py` is a pure re-export shim. `PortfolioGoal` lives in `domain/modelling/`. Out-of-cluster columns are plain ints (no FK — mirror convention). `ScenarioModel.goal` is the `PortfolioGoal` **enum**; the repo's `to_domain` maps it to its `.value`.
- `etl/` + `sfr/` and the live Drizzle migrations (add `plan_id` / backfill / drop `plan_recommendations`, per `docs/migrations/recommendation-plan-id.md`) are the **owner's**, not yours.
- ADR-0014 limitation still applies: **appliances + cooking stubbed at 0 kWh** in bills.
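
The `moto` ignores above are easy to forget when sweeping. A small sketch of building the argument list once; `MOTO_DEPENDENT` and `sweep_args` are hypothetical names, while the two paths and the `-q` flag come from the gotchas above.

```python
from __future__ import annotations

# Paths the gotchas say to skip because `moto` is not installed.
MOTO_DEPENDENT: tuple[str, ...] = (
    "tests/orchestration/test_postcode_splitter_orchestrator.py",
    "tests/repositories/unstandardised_address/",
)


def sweep_args(path: str) -> list[str]:
    """Build the arguments for `python -m pytest <path> -q` plus the ignores."""
    args: list[str] = [path, "-q"]
    for skipped in MOTO_DEPENDENT:
        args.append(f"--ignore={skipped}")
    return args
```

The result can be appended to `python -m pytest` on the command line or passed to `pytest.main(...)`, in both cases from the worktree root.
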
## Conventions
Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit ending `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`; AAA test headers; assert with `abs(x - y) <= tol` (not `pytest.approx`); pyright strict zero errors; annotate call-return locals.
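
A minimal sketch of a test in these conventions, assuming "AAA test headers" means `# Arrange` / `# Act` / `# Assert` comments. `co2_saving` is a hypothetical helper, not a function in the rebuild; the values are chosen because `0.3 - 0.1` is not exactly `0.2` in floating point, which is what the explicit tolerance guards against.

```python
def co2_saving(before_tonnes: float, after_tonnes: float) -> float:
    """Hypothetical helper: CO2 saved, in tonnes."""
    return before_tonnes - after_tonnes


def test_co2_saving_is_before_minus_after() -> None:
    # Arrange
    before_tonnes: float = 0.3
    after_tonnes: float = 0.1
    tol: float = 1e-9

    # Act
    saving: float = co2_saving(before_tonnes, after_tonnes)

    # Assert
    assert abs(saving - 0.2) <= tol
```
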
## How to start
**Do NOT write code yet.** The owner wants a **grilling session** first. Open by mapping the decision tree and surfacing the design questions, e.g.:
- **BOM / Cost shape:** does `Cost` grow into a per-material breakdown (parts with depth/quantity/unit), or do materials become a separate concept the generators emit alongside the Option? How does the rebuild `ProductRepository` supply material parts + U-values today vs. what the BOM needs?
- **Financial uplift:** what's the valuation model (legacy `Property.py`/`Funding.py` — back-solve or formula)? Which columns are in-scope (valuation lower/upper/avg, post-retrofit, rental yield)? Domain home for it?
- **Scale harness:** is the EPC dump API-JSON or RdSAP-schema? Where does it live / how is it provided? Is it a committed test (subset) + a separate runnable script for the full 1k-10k? What does "inspect the recommendations" mean: assertions, a CSV/report, or console exploration? How do we seed materials for *all* measure types at scale (catalogue completeness)?
- **Console UX:** a small documented entrypoint/helper to build a `ModellingOrchestrator` + UoW against a chosen DB and run one property?
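
To anchor the BOM / Cost question, here is one of the two shapes as a sketch for discussion, not a decision. `MaterialPart`, its `name` field, and `parts_total` are hypothetical; the other field names mirror the `recommendation_materials` columns and the existing `total` + `contingency_rate`. How the parts sum relates to the fully-loaded `total` is left open on purpose.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialPart:
    """Hypothetical BOM line mirroring the `recommendation_materials` fields."""

    name: str
    depth: float | None  # not every part has a depth
    quantity: float
    quantity_unit: str
    estimated_cost: float


@dataclass(frozen=True)
class Cost:
    """ASSUMPTION: `Cost` keeps its current fields and gains an optional parts list."""

    total: float
    contingency_rate: float
    parts: tuple[MaterialPart, ...] = ()

    def parts_total(self) -> float:
        """Sum of the per-material estimates; may differ from the loaded total."""
        return sum((part.estimated_cost for part in self.parts), 0.0)
```

The alternative in the question, materials as a separate concept emitted alongside the Option, would leave `Cost` untouched and move `MaterialPart` onto whatever the generators return.
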
Tell the owner what you'll tackle first and whether you want a `/grill-with-docs` design pass (the financial-uplift and BOM-shape decisions are load-bearing and want ADRs).