Capture the next phase (close persisted-field gaps + financial uplift, plus a large-scale e2e run of a SAP 10.2 EPC dump and console manual testing; measure coverage deferred) and a self-contained handover prompt for a fresh agent to pick up via a grilling session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Handover prompt — Modelling: depth + scale e2e (next phase)
Paste this to a fresh agent. The owner will then run a grilling session to lock the design before any code.
You are continuing the Modelling stage rebuild (3rd pipeline stage) on branch `feature/bill-derivation`, worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`, HEAD at the tip of that branch.
FIRST: read these, in order
- `docs/HANDOVER_MODELLING.md`: full state, locked decisions, gotchas (read in full; the "NEXT PHASE" section frames this work).
- Auto-memory `project_modelling_stage_state`: running state.
- ADRs 0011/0012 (orchestrators + UoW), 0014 (billing), 0016 (three scoring roles), 0017 + its amendment (Plan persistence; the `…Model` SQLModel cluster in `infrastructure/postgres/modelling/`; `plan_recommendations` retired).
- `CONTEXT.md`: domain glossary (Plan, Plan Measure, Recommendation, Measure Option, Scenario, …).
Where things stand (what works)
- The `ModellingOrchestrator` runs end-to-end and persists to a real Postgres: generate fabric candidates → role-1 score → optimise (least-cost-to-target) → role-3 attribute → bill → persist a Plan + its Plan Measures (`recommendation` rows linked by `recommendation.plan_id`; the m2m is gone). It persists SAP, CO₂ (tonnes), cost + contingency, post-band, and plan + per-measure energy/bill/kWh savings.
- Proven by `tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan` (drives the orchestrator directly off a repo-seeded EPC; this is the e2e template).
- All green: rebuild suite + legacy export/functions; pyright strict clean.
- 4 fabric generators only: cavity wall, loft, floor, ventilation (`domain/modelling/generators/`).
The owner's goal (this phase)
"I have a big dump of SAP 10.2 EPCs. I want to run a bunch (1,000–10,000) through this and inspect the recommendations — a reasonably large-scale integration test. I also want to run the code via a Python console for manual testing. Once these measures work e2e, we flesh out the others."
Measure coverage is explicitly deferred. This phase is depth + scale on the existing 4 fabric measures:
- Close the persisted-field gaps (make a persisted Plan as rich as the engine for the measures we model):
  - `recommendation_materials` (BOM: depth / quantity / quantity_unit / estimated_cost). Today the rebuild's `Cost` (`domain/modelling/recommendation.py`) is a single fully loaded `total` + `contingency_rate`, with no per-material breakdown. Sources: rebuild `ProductRepository` (`repositories/product/`); legacy `backend/app/db/functions/materials_functions.py` + `recommendations_functions.upload_recommendations` (writes `rec["parts"]`).
  - Per-measure U-values (`starting_u_value` / `new_u_value`), `total_work_hours`, `labour_days`. These columns already exist on `RecommendationModel` (NULL today).
- Financial uplift modelling (valuations): greenfield in the rebuild. No domain concept exists; only the `plan.valuation_*` / `recommendation.property_valuation_increase` columns exist, and they sit NULL. Legacy logic: `backend/Property.py`, `backend/Funding.py`, `backend/app/db/functions/funding_functions.py`, `portfolio_functions.py`. This wants its own design.
- Large-scale e2e harness: run the EPC dump through Modelling and inspect the recommendations.
  - Parse each EPC via `EpcPropertyDataMapper` (`datatypes/epc/domain/mapper.py`): `from_api_response` (API JSON) / `from_rdsap_schema_21_0_0` / `from_rdsap_schema_21_0_1`. Samples: `backend/epc_api/json_samples/`.
  - Seed via `EpcPostgresRepository(session).save(epc, property_id, portfolio_id)` + a `ScenarioModel` + the `MaterialRow`s every firing generator prices against, then call `ModellingOrchestrator(...).run([...], [scenario_id], portfolio_id)`. (Baseline can't run on calculator fixtures, so drive Modelling directly, as the template does.)
- Python-console manual run: instantiate the orchestrator against a DB and inspect Plans/Recommendations interactively.
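For the scale run, one possible shape is a script that does four things. It walks the dump. It takes a deterministic sample, so a committed test can use a small fixed subset while the same script runs the full 1k–10k. It runs each EPC through a caller-supplied function. It records failures rather than aborting. This is a sketch under stated assumptions, not the rebuild's API. It assumes the dump is a directory of JSON files (API-JSON vs RdSAP-schema is still an open question). `run_one` is a hypothetical callback that would wrap the real parse → seed → `ModellingOrchestrator(...).run(...)` steps and return the number of Plan Measures.

```python
import csv
import json
import random
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    epc_file: str
    ok: bool
    n_measures: int
    error: str


def sample_dump(dump_dir: Path, n: int, seed: int = 0) -> list[Path]:
    """Deterministic sample of up to n JSON files (same seed -> same subset)."""
    files = sorted(dump_dir.glob("*.json"))
    if n >= len(files):
        return files
    return sorted(random.Random(seed).sample(files, n))


def run_harness(
    files: list[Path],
    run_one: Callable[[dict[str, object]], int],  # hypothetical: parse + seed + run, returns measure count
    report: Path,
) -> list[RunResult]:
    """Run every file, keep going past failures, and write a CSV report."""
    results: list[RunResult] = []
    for path in files:
        try:
            payload: dict[str, object] = json.loads(path.read_text())
            n_measures: int = run_one(payload)
            results.append(RunResult(path.name, True, n_measures, ""))
        except Exception as exc:  # one bad EPC must not sink a 10k run
            results.append(RunResult(path.name, False, 0, f"{type(exc).__name__}: {exc}"))
    with report.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["epc_file", "ok", "n_measures", "error"])
        for r in results:
            writer.writerow([r.epc_file, r.ok, r.n_measures, r.error])
    return results
```

The CSV is only one answer to "what does inspect mean". The same `RunResult` list could instead feed assertions in a committed subset test, or be explored from the console.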
Critical gotchas (carry these)
- `mip`/CBC is broken on this aarch64 container: never build on `mip`.
- `moto` is not installed: `--ignore` `tests/orchestration/test_postcode_splitter_orchestrator.py` + `tests/repositories/unstandardised_address/` when sweeping.
- Run tests with `python -m pytest <path> -q` (NOT `-p no:cov`). The rebuild `db_engine` fixture builds only `SQLModel.metadata`.
- Worktree import trap: run via `pytest` / `python -c` from the worktree root, not `python /tmp/foo.py` (that imports `/workspaces/model`).
- Don't edit the SAP calculator's `heat_transmission.py` (another agent owns it).
- The modelling SQLModel classes are `…Model` in `infrastructure/postgres/modelling/` (the old flat `plan_table.py` / `scenario_table.py` are deleted); `backend/app/db/models/recommendations.py` is a pure re-export shim. `PortfolioGoal` lives in `domain/modelling/`. Out-of-cluster columns are plain ints (no FK; mirror convention). `ScenarioModel.goal` is the `PortfolioGoal` enum; the repo's `to_domain` maps it to its `.value`.
- `etl/` + `sfr/` and the live Drizzle migrations (add `plan_id` / backfill / drop `plan_recommendations`, per `docs/migrations/recommendation-plan-id.md`) are the owner's, not yours.
- ADR-0014 limitation still applies: appliances + cooking are stubbed at 0 kWh in bills.
Conventions
Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit messages ending with `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`; AAA (Arrange/Act/Assert) test headers; assert with `abs(x - y) <= tol` (not `pytest.approx`); pyright strict with zero errors; annotate call-return locals.
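A test in this house style might look like the following. It is a sketch only: `contingency_gbp` is a hypothetical function standing in for whatever unit is under test. The point is the conventions it shows: Arrange/Act/Assert headers, an annotated call-return local, and an explicit `abs(x - y) <= tol` assertion instead of `pytest.approx`.

```python
def contingency_gbp(total_gbp: float, contingency_rate: float) -> float:
    """Hypothetical unit under test: contingency as a fraction of the total."""
    return total_gbp * contingency_rate


def test_contingency_is_rate_times_total() -> None:
    # Arrange
    total_gbp = 1000.0
    contingency_rate = 0.15
    tol = 1e-9

    # Act
    contingency: float = contingency_gbp(total_gbp, contingency_rate)

    # Assert: explicit tolerance, since float products rarely compare exactly
    assert abs(contingency - 150.0) <= tol
```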
How to start
Do NOT write code yet. The owner wants a grilling session first. Open by mapping the decision tree and surfacing the design questions, e.g.:
- BOM / Cost shape: does `Cost` grow into a per-material breakdown (parts with depth/quantity/unit), or do materials become a separate concept the generators emit alongside the Option? How does the rebuild `ProductRepository` supply material parts + U-values today, compared with what the BOM needs?
- Financial uplift: what's the valuation model (legacy `Property.py` / `Funding.py`: back-solve or formula)? Which columns are in scope (valuation lower/upper/avg, post-retrofit, rental yield)? Where is its domain home?
- Scale harness: is the EPC dump API JSON or RdSAP-schema? Where does it live, and how is it provided? Is it a committed test (subset) plus a separate runnable script for the full 1k–10k? What does "inspect the recommendations" mean: assertions, a CSV/report, or console exploration? How do we seed materials for all measure types at scale (catalogue completeness)?
- Console UX: a small documented entrypoint/helper to build a `ModellingOrchestrator` + UoW against a chosen DB and run one property?
Tell the owner what you'll tackle first and whether you want a /grill-with-docs design pass (the financial-uplift and BOM-shape decisions are load-bearing and want ADRs).