Model/docs/HANDOVER_NEXT_PHASE_PROMPT.md
Khalim Conn-Kowlessar b8b7e02034 docs(modelling): next-phase handover — depth + scale e2e + grilling prompt
Capture the next phase (close persisted-field gaps + financial uplift, plus a
large-scale e2e run of a SAP 10.2 EPC dump and console manual testing; measure
coverage deferred) and a self-contained handover prompt for a fresh agent to
pick up via a grilling session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 23:09:08 +00:00


Handover prompt — Modelling: depth + scale e2e (next phase)

Paste this to a fresh agent. The owner will then run a grilling session to lock the design before any code.

You are continuing the Modelling stage rebuild (3rd pipeline stage) on branch feature/bill-derivation, worktree /workspaces/home/hestia-worktrees/model-assemble-new-backend, HEAD at the tip of that branch.

FIRST: read these, in order

  1. docs/HANDOVER_MODELLING.md — full state, locked decisions, gotchas (read in full; the "NEXT PHASE" section frames this work).
  2. Auto-memory project_modelling_stage_state — running state.
  3. ADRs 0011/0012 (orchestrators + UoW), 0014 (billing), 0016 (three scoring roles), 0017 + its amendment (Plan persistence; the …Model SQLModel cluster in infrastructure/postgres/modelling/; plan_recommendations retired).
  4. CONTEXT.md — domain glossary (Plan, Plan Measure, Recommendation, Measure Option, Scenario, …).

Where things stand (what works)

  • The ModellingOrchestrator runs end-to-end and persists to a real Postgres: generate fabric candidates → role-1 score → optimise (least-cost-to-target) → role-3 attribute → bill → persist a Plan + its Plan Measures (recommendation rows linked by recommendation.plan_id; the m2m is gone). Persists SAP, CO₂ (tonnes), cost + contingency, post-band, plan + per-measure energy/bill/kWh savings.
  • Proven by tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan (drives the orchestrator directly off a repo-seeded EPC — the e2e template).
  • All green: rebuild suite + legacy export/functions; pyright strict clean.
  • 4 fabric generators only: cavity wall, loft, floor, ventilation (domain/modelling/generators/).

The owner's goal (this phase)

"I have a big dump of SAP 10.2 EPCs. I want to run a bunch (1,000–10,000) through this and inspect the recommendations: a reasonably large-scale integration test. I also want to run the code via a Python console for manual testing. Once these measures work e2e, we flesh out the others."

Measure coverage is explicitly deferred. This phase is depth + scale on the existing 4 fabric measures:

  1. Close the persisted-field gaps (make a persisted Plan as rich as the engine for the measures we model):
    • recommendation_materials (BOM: depth / quantity / quantity_unit / estimated_cost). Today the rebuild's Cost (domain/modelling/recommendation.py) is a single fully-loaded total + contingency_rate, with no per-material breakdown. Source: rebuild ProductRepository (repositories/product/), legacy backend/app/db/functions/materials_functions.py + recommendations_functions.upload_recommendations (writes rec["parts"]).
    • Per-measure U-values (starting_u_value / new_u_value), total_work_hours, labour_days. These columns already exist on RecommendationModel (NULL today).
  2. Financial uplift modelling (valuations) — greenfield in the rebuild (no domain concept exists; only plan.valuation_* / recommendation.property_valuation_increase columns sit NULL). Legacy logic: backend/Property.py, backend/Funding.py, backend/app/db/functions/funding_functions.py, portfolio_functions.py. This wants its own design.
  3. Large-scale e2e harness — run the EPC dump through Modelling and inspect recommendations:
    • Parse each EPC via EpcPropertyDataMapper (datatypes/epc/domain/mapper.py): from_api_response (API JSON) / from_rdsap_schema_21_0_0 / from_rdsap_schema_21_0_1. Samples: backend/epc_api/json_samples/.
    • Seed via EpcPostgresRepository(session).save(epc, property_id, portfolio_id) + a ScenarioModel + the MaterialRows every firing generator prices against, then ModellingOrchestrator(...).run([...], [scenario_id], portfolio_id). (Baseline can't run on calculator fixtures — drive Modelling directly, as the template does.)
  4. Python-console manual run — instantiate the orchestrator against a DB and inspect Plans/Recommendations interactively.
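
A minimal sketch of the batch loop behind item 3, assuming the dump is a directory of JSON files. The names run_batch, BatchResult, parse_epc and run_modelling are hypothetical: the two callables stand in for the real parse step (e.g. EpcPropertyDataMapper.from_api_response) and the seed-then-ModellingOrchestrator.run step, whose exact signatures are not restated here. The only idea it illustrates is per-property failure isolation, so one bad EPC does not abort a 10,000-property run.

```python
from __future__ import annotations

import json
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class BatchResult:
    """Outcome of one harness run, keyed by EPC file name."""

    plans: dict[str, Any] = field(default_factory=dict)
    failures: dict[str, str] = field(default_factory=dict)


def run_batch(
    paths: Iterable[Path],
    parse_epc: Callable[[Any], Any],      # hypothetical stand-in for the mapper call
    run_modelling: Callable[[Any], Any],  # hypothetical stand-in for seed + orchestrator run
) -> BatchResult:
    """Run every EPC file through parse -> model, recording failures per file."""
    result = BatchResult()
    for path in paths:
        try:
            raw: Any = json.loads(path.read_text(encoding="utf-8"))
            epc: Any = parse_epc(raw)
            result.plans[path.name] = run_modelling(epc)
        except Exception as exc:  # deliberately broad: the harness must keep going
            result.failures[path.name] = f"{type(exc).__name__}: {exc}"
    return result
```

From a console, the same function could be called with a single path to debug one property, then with the whole directory for the scale run; the failures mapping gives a first cut of "which EPCs does the mapper or a generator choke on".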

Critical gotchas (carry these)

  • mip/CBC is broken on this aarch64 container — never build on mip.
  • moto is not installed: pass --ignore tests/orchestration/test_postcode_splitter_orchestrator.py and --ignore tests/repositories/unstandardised_address/ when sweeping.
  • Run tests with python -m pytest <path> -q (NOT -p no:cov). The rebuild db_engine fixture builds only SQLModel.metadata.
  • Worktree import trap — run via pytest / python -c from the worktree root, not python /tmp/foo.py (that imports /workspaces/model).
  • Don't edit the SAP calculator's heat_transmission.py (another agent owns it).
  • The modelling SQLModel classes are …Model in infrastructure/postgres/modelling/ (the old flat plan_table.py/scenario_table.py are deleted); backend/app/db/models/recommendations.py is a pure re-export shim. PortfolioGoal lives in domain/modelling/. Out-of-cluster columns are plain ints (no FK — mirror convention). ScenarioModel.goal is the PortfolioGoal enum; the repo's to_domain maps it to its .value.
  • etl/ + sfr/ and the live Drizzle migrations (add plan_id / backfill / drop plan_recommendations, per docs/migrations/recommendation-plan-id.md) are the owner's, not yours.
  • ADR-0014 limitation still applies: appliances + cooking stubbed at 0 kWh in bills.
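
The test-sweep gotchas above (python -m pytest, -q, no -p no:cov, two ignored paths) can be captured in a small helper. This is only a sketch: sweep_command and IGNORED are hypothetical names, the default "tests" path is an assumption, and it must be invoked from the worktree root so imports resolve against the worktree.

```python
import subprocess
import sys

# Paths that need moto, which is not installed in this container.
IGNORED = [
    "tests/orchestration/test_postcode_splitter_orchestrator.py",
    "tests/repositories/unstandardised_address/",
]


def sweep_command(path: str = "tests") -> list[str]:
    """Build the pytest argv for a sweep: quiet mode, moto-dependent paths skipped."""
    cmd = [sys.executable, "-m", "pytest", path, "-q"]
    for target in IGNORED:
        cmd.append(f"--ignore={target}")
    return cmd


if __name__ == "__main__":
    # Run from the worktree root; the exit code is pytest's own.
    raise SystemExit(subprocess.call(sweep_command()))
```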

Conventions

Stay on feature/bill-derivation; one TDD slice = one commit; conventional-commit messages ending with the trailer Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>; AAA test headers; assert with abs(x - y) <= tol (not pytest.approx); pyright strict zero errors; annotate call-return locals.
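
As an illustration of the test conventions, here is a sketch of one test with Arrange / Act / Assert headers, an explicit tolerance assertion, and annotated call-return locals. It assumes "AAA test headers" means comment headers of this form. contingency_total is a hypothetical stand-in written only for this example; it is not the rebuild's Cost logic.

```python
def contingency_total(base_cost: float, contingency_rate: float) -> float:
    """Hypothetical helper: base cost uplifted by a contingency rate."""
    return base_cost * (1.0 + contingency_rate)


def test_contingency_is_added_to_base_cost() -> None:
    # Arrange
    base_cost: float = 1200.0
    contingency_rate: float = 0.1
    tol: float = 1e-6

    # Act
    total: float = contingency_total(base_cost, contingency_rate)

    # Assert
    assert abs(total - 1320.0) <= tol
```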

How to start

Do NOT write code yet. The owner wants a grilling session first. Open by mapping the decision tree and surfacing the design questions, e.g.:

  • BOM / Cost shape: does Cost grow into a per-material breakdown (parts with depth/quantity/unit), or do materials become a separate concept the generators emit alongside the Option? How does the rebuild ProductRepository supply material parts + U-values today vs. what the BOM needs?
  • Financial uplift: what's the valuation model (legacy Property.py/Funding.py — back-solve or formula)? Which columns are in-scope (valuation lower/upper/avg, post-retrofit, rental yield)? Domain home for it?
  • Scale harness: is the EPC dump API-JSON or RdSAP-schema? Where does it live / how is it provided? Is it a committed test (subset) + a separate runnable script for the full 1k–10k? What's "inspect the recommendations": assertions, a CSV/report, or console exploration? How to seed materials for all measure types at scale (catalogue completeness)?
  • Console UX: a small documented entrypoint/helper to build a ModellingOrchestrator + UoW against a chosen DB and run one property?
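
To make the first question concrete for the grilling session, here is a sketch of one of the two shapes it names: Cost growing a per-material breakdown. The class names MaterialPart and CostWithParts are hypothetical, as is the name field; depth, quantity, quantity_unit and estimated_cost mirror the recommendation_materials columns, and the fully-loaded total plus contingency_rate are kept as today. It is a discussion aid under those assumptions, not a proposed final design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialPart:
    """One BOM line (hypothetical shape mirroring recommendation_materials)."""

    name: str              # assumed identifier; the real key may be a product id
    depth: float | None    # None where depth does not apply to the material
    quantity: float
    quantity_unit: str
    estimated_cost: float


@dataclass(frozen=True)
class CostWithParts:
    """Today's fully-loaded total + contingency_rate, extended with optional parts."""

    total: float
    contingency_rate: float
    parts: tuple[MaterialPart, ...] = ()

    def materials_subtotal(self) -> float:
        """Sum of per-material estimated costs; excludes labour and other loading."""
        return sum((part.estimated_cost for part in self.parts), 0.0)
```

Keeping total separate from the parts avoids assuming that materials alone add up to the fully-loaded figure. The alternative shape, where generators emit materials alongside the Option, would leave Cost untouched.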

Tell the owner what you'll tackle first and whether you want a /grill-with-docs design pass (the financial-uplift and BOM-shape decisions are load-bearing and want ADRs).