Capture the next phase (close persisted-field gaps + financial uplift, plus a large-scale e2e run of a SAP 10.2 EPC dump and console manual testing; measure coverage deferred) and a self-contained handover prompt for a fresh agent to pick up via a grilling session. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
HANDOVER — Modelling stage rebuild
Branch: feature/bill-derivation (worktree /workspaces/home/hestia-worktrees/model-assemble-new-backend). HEAD: 6f0dcc04.
PRD: GitHub Hestia-Homes/Model#1152, sliced into #1153–#1161. All slices #1153–#1161 closed.
Issue status
| Issue | What | State |
|---|---|---|
| #1153 | Overlay Applicator + EpcSimulation | ✅ closed |
| #1154 | Package Scorer | ✅ closed; Elmhurst cascade pin (4c0a907a) |
| #1155 | Wall Recommendation Generator | ✅ closed; cascade-pinned |
| #1156 | Score Options + attribution | ✅ closed |
| #1157 | Persist a Plan via ModellingOrchestrator | ✅ closed this session (772cdd4f→c7e2aa37) |
| #1158 | Roof (loft) generator | ✅ closed; 300 mm + cascade pin |
| #1159 | Floor generator | ✅ closed; overlay insulation-type field + pins |
| #1160 | Optimiser (knapsack + greedy repair) | ✅ closed this session (77983cae→34d4748a) |
| #1161 | Measure Dependency (ventilation) | ✅ closed this session (7c59e919→0fec0699) |
What this session did
- Cascade pins for #1154/#1158/#1159: tests/domain/modelling/test_elmhurst_cascade_pins.py. Parse the Elmhurst before/after recommendation Summaries via the extractor chain (NOT parse_site_notes_pdf), apply the generator's overlay, score, and assert delta 0 vs the after-cert. Found and fixed: loft 270→300 mm; the suspended floor needs the overlay to also set floor_insulation_type_str='Retro-fitted'.
- ProductJsonRepository (cc0bb8f9): a file-backed catalogue behind the ProductRepository port.
- #1157, persist a Plan: design review (/grill-with-docs) + 5 TDD slices. See "Design decisions" below.
- #1160, the Optimiser: 4 TDD slices. See "Design decisions".
Design decisions locked this session (READ THESE)
- Multi-phase is DEFERRED (speculative prospective-client ask). ADR-0005 rewritten to "Deferred". No plan_phase table, no phase column. CONTEXT.md no longer has Scenario Phase / Plan Phase / Rolled-over Options. Everything is single-phase. Future: a migration adds plan_phase + back-fills live plans as 1-phase.
- Plan Measure is the new term (in CONTEXT.md): the persisted selected Option + its role-3 attributed impact + cost. Recommendation stays the candidate (never persisted; no stored impact).
- Reuse the LIVE tables (plan, recommendation): they exist in the live product (backend/app/db/models/recommendations.py, SQLAlchemy Base) and the FE reads them. The rebuild writes the same physical tables via SQLModel mirrors (infrastructure/postgres/plan_table.py), following the established pattern (task_table.py→tasks, product_table.py→material). ADR-0017 records this. (Later consolidated into single SQLModel definitions under infrastructure/postgres/modelling/; see "Retire plan_recommendations" below.)
- Added recommendation.plan_id (FK→plan, ON DELETE CASCADE); retire the plan_recommendations m2m for new writes. FE-owned Drizzle migration: docs/migrations/recommendation-plan-id.md. (The m2m has since been dropped entirely; see below.)
- Tracer persists SAP + CO₂ (tonnes = calc kg ÷ 1000) + cost + derived post_epc_rating. Energy/bill columns deferred (since filled by the Bill-Derivation slices below). Idempotent replace per (property_id, scenario_id).
- Optimiser = exact pure-Python multiple-choice knapsack, NOT mip. It recycles GainOptimiser/CostOptimiser's formulation (≤1/group, maximise gain s.t. budget) but not the dependency: mip's CBC backend does not load on this aarch64 container (NameError: cbclib), so the legacy solver can't be run or tested here. ADR-0016's MILP is only a warm-start signal, so exact small-scale enumeration is ample. Re-score + greedy repair toward the goal's SAP target gives the truth. (The max-gain objective was later realigned to least-cost-to-target; see "Optimiser objective realigned" below.)
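The sketch below shows the exact multiple-choice knapsack described above. It enforces at most one Option per measure group and maximises summed gain subject to total cost ≤ budget, by enumerating every combination. Option, its fields and optimise_max_gain are hypothetical stand-ins and do not match the rebuild's optimise(groups, budget) signature. The sketch ignores dependencies and the later min-cost objective.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    # Hypothetical stand-in for a scored Measure Option.
    name: str
    cost: float
    gain: float  # role-1 SAP signal


def optimise_max_gain(
    groups: Sequence[Sequence[Option]], budget: float
) -> tuple[Option, ...]:
    """Exact multiple-choice knapsack: at most one Option per group,
    maximise total gain subject to total cost <= budget."""
    # None in a slot means "take nothing from this group" (the <=1/group rule).
    choices: list[list[Optional[Option]]] = [[None, *group] for group in groups]
    best: tuple[Option, ...] = ()
    best_gain, best_cost = 0.0, 0.0
    for combo in product(*choices):
        picked = tuple(o for o in combo if o is not None)
        cost = sum(o.cost for o in picked)
        if cost > budget:
            continue
        gain = sum(o.gain for o in picked)
        # Higher gain wins; on equal gain, the cheaper package wins.
        if gain > best_gain or (gain == best_gain and cost < best_cost):
            best, best_gain, best_cost = picked, gain, cost
    return best
```

Enumeration visits Π(|group| + 1) combinations. With only a handful of fabric-measure groups that number stays tiny, which is the basis of the "exact small-scale enumeration is ample" argument.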
domain/modelling/ layout (grouped 84ec6da0)
Behaviour lives in subpackages; shared value-object vocabulary stays flat at the top (imported everywhere): recommendation.py (Recommendation / MeasureOption / Cost), plan.py, scenario.py, product.py, contingencies.py, simulation.py (EpcSimulation overlay).
- generators/: wall_recommendation / roof_recommendation / floor_recommendation (plus ventilation_recommendation, added by the post-#1161 refactor).
- scoring/: overlay_applicator (apply_simulations), package_scorer (role 2), scoring (role-1 independent_option_impacts + role-3 marginal_impacts). Note the path is domain.modelling.scoring.scoring for the role-1/3 module.
- optimisation/: optimiser, measure_dependency.
What's built (all in domain/modelling/, infrastructure/postgres/, repositories/, orchestration/)
- Generators (generators/): recommend_cavity_wall / recommend_loft_insulation (300 mm) / recommend_floor_insulation (sets floor_insulation_type_str).
- simulation.py overlay + scoring/overlay_applicator.apply_simulations (generic field-fold) + scoring/package_scorer.PackageScorer.score (role 2) + scoring/scoring.py (marginal_impacts role 3, independent_option_impacts role 1).
- scenario.py Scenario(id, goal, goal_value, budget, is_default); plan.py Plan + PlanMeasure (derives cost_of_works / contingency_cost / co2_savings / post_epc_rating).
- optimisation/optimiser.py: optimise(groups, budget) (exact knapsack) + optimise_package(...) (re-score + greedy repair, ScorerProtocol, OptimisedPackage).
- infrastructure/postgres/: scenario_table.ScenarioRow, plan_table.{PlanRow, RecommendationRow} (mirrors of live tables; from_domain). These flat modules were later replaced by the consolidated …Model classes in infrastructure/postgres/modelling/.
- repositories/: scenario/, plan/, product/ (Postgres + Json), all on the UnitOfWork (uow.scenario / uow.product / uow.plan).
- ModellingOrchestrator.run(property_ids, scenario_ids, portfolio_id): one UoW, commit once; generate (wall/roof/floor) → role-1 score → optimise_package → role-3 attribute → persist. Wired into AraFirstRunPipeline + handler.py.
- datatypes/epc/domain/epc.py::Epc.sap_lower_bound() (band → min SAP; the target for INCREASING_EPC).
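The block below is a hypothetical standalone version of the band → minimum-SAP lookup that Epc.sap_lower_bound() performs. The floors are the standard EPC rating-band thresholds (A 92+, B 81–91, C 69–80, D 55–68, E 39–54, F 21–38, G 1–20). The function and constant names are illustrative and are not the method's actual code. meets_target mirrors the conservative predicate sap_continuous ≥ band_floor recorded in the ADR-0016 amendment.

```python
SAP_BAND_FLOORS: dict[str, int] = {
    "A": 92, "B": 81, "C": 69, "D": 55, "E": 39, "F": 21, "G": 1,
}


def sap_lower_bound(band: str) -> int:
    # Hypothetical standalone version of Epc.sap_lower_bound(): band -> min SAP.
    try:
        return SAP_BAND_FLOORS[band.upper()]
    except KeyError:
        raise ValueError(f"unknown EPC band: {band!r}") from None


def meets_target(sap_continuous: float, target_band: str) -> bool:
    # Conservative predicate: no slack below the band floor.
    return sap_continuous >= sap_lower_bound(target_band)
```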
Gotchas (will bite a fresh agent)
- mip / CBC is broken on aarch64 here: never build runnable code on mip. The legacy recommendations/optimiser/ tests only "pass" because they avoid constructing a mip.Model.
- moto is not installed: tests/orchestration/test_postcode_splitter_orchestrator.py and tests/repositories/unstandardised_address/ fail at collection. Pre-existing and unrelated; --ignore them when sweeping.
- Run tests: python -m pytest <path> -q (do NOT pass -p no:cov). Ephemeral Postgres via the db_engine fixture builds only SQLModel.metadata; legacy Base tables are absent in tests, which is why mirrors work.
- Worktree import trap: python /tmp/foo.py imports /workspaces/model, not this worktree. Use pytest (rootdir handles it) or a python -c from the worktree root.
- Driving Modelling in an integration test: the calculator fixtures (_elmhurst_worksheet_000490.build_epc()) lack lodged recorded-performance fields, so the Baseline stage can't run on them. Drive ModellingOrchestrator directly off a repo-seeded EPC (EpcPostgresRepository(session).save(epc, property_id, portfolio_id)); see test_modelling_optimises_and_persists_a_multi_measure_plan. The sample API EPC (_lodged_epc()) does go through the full pipeline.
- PortfolioGoal.INCREASING_EPC's value is "Increasing EPC" (with a space); the orchestrator compares scenario.goal == "Increasing EPC".
- A generator calls products.get(...) during candidate generation, so the integration test must seed a material row for every measure type that fires (e.g. the sample EPC's uninsulated solid floor needs solid_floor_insulation).
- Don't edit the SAP calculator's heat_transmission.py (another agent owns it).
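Here is the PortfolioGoal gotcha in miniature. An Enum member's name and value are different strings, so comparing a stored goal against the wrong one fails silently. This is a hypothetical one-member mirror, not the real domain/modelling/portfolio_goal.py, which may differ (for example by subclassing str).

```python
from enum import Enum


class PortfolioGoal(Enum):
    # Hypothetical one-member mirror of the real enum.
    INCREASING_EPC = "Increasing EPC"


def is_increasing_epc(scenario_goal: str) -> bool:
    # The orchestrator compares against the value (with a space), not the name.
    return scenario_goal == PortfolioGoal.INCREASING_EPC.value
```

PortfolioGoal("Increasing EPC") looks a member up by value and PortfolioGoal["INCREASING_EPC"] by name. A to_domain mapping should hand the orchestrator the value.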
Conventions
Commit per TDD slice; conventional-commit message ending Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>; stay on feature/bill-derivation. Tests use literal # Arrange / # Act / # Assert; assert with abs(x - y) <= tol (not pytest.approx); pyright strict, zero errors; annotate call-return locals. Cascade pins target the worksheet at delta 0.
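A test written to these conventions might look like the following. It pins the Tracer's CO₂ unit conversion (calc kg ÷ 1000 = tonnes). co2_tonnes_from_kg and TOL are hypothetical, not existing helpers.

```python
TOL: float = 1e-9


def co2_tonnes_from_kg(co2_kg: float) -> float:
    # Hypothetical helper: the calculator reports kg; the Plan persists tonnes.
    return co2_kg / 1000.0


def test_co2_is_persisted_in_tonnes() -> None:
    # Arrange
    calculator_kg: float = 2450.0

    # Act
    tonnes: float = co2_tonnes_from_kg(calculator_kg)

    # Assert
    assert abs(tonnes - 2.45) <= TOL
```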
#1161 — Measure Dependency (ventilation), as built (4 TDD slices, all green)
Forks resolved with the user (AskUserQuestion): guard now (skip when already MEV/MVHR); persist as a Plan Measure (cost + real negative marginal); forced, but its cost counts toward spend (mandatory-when-triggered, never budget-gated; repair sees less headroom). The never-budget-gated part was later reversed when the ADR-0016 amendment made the budget a hard envelope.
- 7c59e919: the Simulation Overlay grows a dwelling-level segment: VentilationOverlay (an all-optional partial of SapVentilation, field mechanical_ventilation_kind) + EpcSimulation.ventilation; apply_simulations folds it onto sap_ventilation (creating one if the baseline lodged none). Until now the overlay was building-part only, but ventilation is whole-dwelling.
- 6b11c902: generic injection in the optimiser. MeasureDependency(triggers: frozenset[str], required: ScoredOption) lives in optimisation/optimiser.py (its input contract). optimise_package(..., dependencies=()) injects any dependency whose triggers intersect the selected measure types, before every re-score (initial and each repair). _inject dedups by required measure type. Forced (injected even over budget), but its cost is in _package_cost, so repair headroom shrinks. _best_repair_candidate folds in any dependency a candidate newly triggers, so its marginal SAP and incremental cost are truthful; affordability gates on whole-package cost vs budget. The returned selected includes the injected deps. The optimiser stays domain-agnostic (no ventilation import).
- 1bf5b410: domain/modelling/optimisation/measure_dependency.py: MEASURES_NEEDING_VENTILATION (cavity/internal/external wall, cf. legacy assumptions.measures_needing_ventilation) + ventilation_dependency(epc, products) → MEV Option (mechanical_ventilation_kind="EXTRACT_OR_PIV_OUTSIDE", decentralised MEV = legacy "mechanical, extract only"), priced at 2 fully-loaded units. Returns None when sap_ventilation.mechanical_ventilation_kind is already set (= legacy has_ventilation, confirmed against backend/Property.py:1236). Note: the builder fetches the Product up-front, so the catalogue needs a mechanical_ventilation row for every not-yet-ventilated dwelling, even if no wall is ultimately selected.
- 0fec0699: orchestrator wiring. _measure_dependencies builds the (≤1) dependency; _BEST_PRACTICE_ORDER gains "mechanical_ventilation" between loft and floors (role-3 cascade walls→roof→vent→floor); ventilation persists as a Plan Measure with its real negative marginal + cost. Added a mechanical_ventilation: 0.26 contingency (legacy Costs.CONTINGENCIES). On 000490 the real calculator scores MEV at −1.275 SAP.
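The sketch below shows the injection rule from 6b11c902. Each dependency whose triggers intersect the selected measure types is added once, de-duplicated by the required measure type, and its cost counts toward the package. The dataclasses carry only the fields the sketch needs. inject and package_cost are hypothetical stand-ins for _inject and _package_cost, whose real signatures may differ. The £900 figure is illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScoredOption:
    # Hypothetical stand-in: only the fields this sketch needs.
    measure_type: str
    cost: float


@dataclass(frozen=True)
class MeasureDependency:
    triggers: frozenset[str]
    required: ScoredOption


def inject(
    selected: Sequence[ScoredOption],
    dependencies: Sequence[MeasureDependency],
) -> list[ScoredOption]:
    """Append each triggered dependency once, de-duplicated by measure type."""
    selected_types: set[str] = {o.measure_type for o in selected}
    package: list[ScoredOption] = list(selected)
    present: set[str] = set(selected_types)
    for dep in dependencies:
        # Triggers are matched against the selected measures only.
        if dep.triggers & selected_types and dep.required.measure_type not in present:
            package.append(dep.required)
            present.add(dep.required.measure_type)
    return package


def package_cost(package: Sequence[ScoredOption]) -> float:
    # Injected deps still count toward spend, so repair headroom shrinks.
    return sum(o.cost for o in package)
```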
Post-#1161 refactor (631df921→02afc04c): separated production (detecting and pricing the ventilation Option) from selection semantics (when that Option is forced). Detection + pricing moved into a proper generator, generators/ventilation_recommendation.py::recommend_ventilation(epc, products) -> Optional[Recommendation] (same shape as wall/roof/floor; the guard returns None when the dwelling is already mechanically ventilated). optimisation/measure_dependency.py now owns only the trigger set + the forced-edge wrapping: ventilation_dependency delegates to the generator and wraps the Recommendation (cheapest Option) into the MeasureDependency. The orchestrator's _measure_dependencies call is unchanged. Key asymmetry: recommend_ventilation lives in generators/ but is not in _candidate_recommendations' generator tuple; it's consumed only by the dependency path, never the free pool. This is the natural home for the multi-option future (MEV-c / MVHR) and the FE swap-in front.
Gotchas for the next agent: the ventilation Product/contingency must exist for any not-yet-ventilated dwelling (the generator fetches the Product at build time, not inject-time); the stub scorer in test_optimiser.py indexes building_parts[MAIN], so vent-only overlays need the separate _VentStubScorer.
Optimiser objective realigned to least-cost-to-target (5620f49f→641c1bd7)
A /grill-with-docs pass found the rebuild had the wrong optimiser objective: it maximised SAP gain within budget (treating the target as a repair floor), whereas the legacy StrategicOptimiser.solve() Case 1 (the intended behaviour) minimises cost subject to gain ≥ target and cost ≤ budget, falling back to max-gain only if the target is unreachable. ADR-0016 was amended (it had specified the wrong objective). 4 slices, all green:
- 05a4f5f8: optimise_min_cost(groups, budget, target_gain, dependencies=()), an exact-enumeration sibling to optimise. It returns the cheapest package reaching target_gain within budget (ties → higher gain), or None if unreachable.
- 2bf42d04: optimise_package rewired. Target present → min-cost warm-start → inject → re-score → repair toward target; if the warm-start is infeasible or the repaired package is still short on the true score → _max_gain_package fallback. No target → max-gain (unchanged). It stops at the target: no overshoot into a higher band, surplus budget unspent.
- af501fce: ventilation-aware selection. _with_role1_signals scores each dependency's true (negative) role-1 impact (was a 0.0 placeholder); _augmented_cost_gain folds the triggered dependency into every candidate's cost + gain in both selectors. This stops min-cost picking a wall whose mandatory ventilation (−1 to −5 SAP) it can't justify, or whose £900 a wall-free package would avoid.
- 641c1bd7: the orchestrator needed no change (it already threads budget/target/deps); added an end-to-end pin (band-D property + goal D = already met → Plan with no measures).
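The sketch below illustrates the min-cost sibling, using a hypothetical Option stand-in. Among packages with at most one Option per group, it keeps those within budget whose gain reaches target_gain. It returns the cheapest, with ties going to the higher gain, or None if nothing qualifies. It omits dependencies, which the real optimise_min_cost also takes via dependencies=().

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    # Hypothetical stand-in for a scored Measure Option.
    name: str
    cost: float
    gain: float


def optimise_min_cost(
    groups: Sequence[Sequence[Option]], budget: float, target_gain: float
) -> Optional[tuple[Option, ...]]:
    """Cheapest package (<=1 Option per group) with gain >= target_gain and
    cost <= budget; ties on cost go to the higher gain. None if unreachable."""
    choices: list[list[Optional[Option]]] = [[None, *g] for g in groups]
    best: Optional[tuple[Option, ...]] = None
    best_key: tuple[float, float] = (float("inf"), 0.0)
    for combo in product(*choices):
        picked = tuple(o for o in combo if o is not None)
        cost = sum(o.cost for o in picked)
        gain = sum(o.gain for o in picked)
        if cost > budget or gain < target_gain:
            continue
        key = (cost, -gain)  # minimise cost first, then maximise gain
        if key < best_key:
            best, best_key = picked, key
    return best
```

A target_gain of 0 makes the empty package feasible at zero cost, matching the "goal already met → Plan with no measures" pin. A None return is the case where optimise_package falls back to _max_gain_package.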
Decisions locked (in the ADR amendment):
- Target predicate: sap_continuous ≥ band_floor (e.g. ≥ 69 for C). Conservative, with no legacy allow_slack.
- Budget is a hard envelope: a wall whose ventilation would bust the budget is dropped, not forced over. This reverses the earlier "forced regardless of budget" call; presence is still guaranteed for any selected wall.
- Warm-start-on-signal + re-score + repair is kept (not exhaustive re-score) for scalability.
- "Recommend slightly more than land short" is satisfied by the conservative floor + repair, not by spending budget for headroom.
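Written out, the amended objective is roughly the following. The notation is assumed:
- x_o ∈ {0, 1} selects Option o.
- c_o and g_o are its cost and role-1 SAP gain, with any triggered dependency already folded in (per af501fce).
- G_k are the measure groups and B is the budget.
- T is the SAP gain needed to reach band_floor.

The linear gain sum is only the warm-start signal. Re-score + repair then check the true sap_continuous ≥ band_floor predicate. If no x satisfies the constraints, the fallback maximises Σ g_o x_o under the budget and group constraints alone.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{o} c_o x_o \\
\text{s.t.}\quad & \sum_{o} g_o x_o \ge T, \qquad T = \mathrm{band\_floor} - \mathrm{SAP}_{\mathrm{baseline}}, \\
& \sum_{o} c_o x_o \le B, \\
& \sum_{o \in G_k} x_o \le 1 \quad \forall k, \qquad x_o \in \{0, 1\}.
\end{aligned}
```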
Bill-Derivation: plan-level post-retrofit bills (75ba5dd7→198122d1)
A /grill-with-docs pass designed the Modelling Bill-Derivation slice (ADR-0014 amended). Plan-level columns done across 4 slices; per-measure is the next slice.
- ced6287b: relocated Bill / EnergyBreakdown / BillDerivation / sap_fuel (+ tests) from domain/property_baseline/ to a neutral domain/billing/ (a cross-stage concern; both Baseline and Modelling consume it). Pure move, ~10 files.
- 2bbc401f: Score gains sap_result: Optional[SapResult], populated by PackageScorer. This lets Modelling bill the scored end-state by reusing a SapResult the optimiser/orchestrator already computed, with no second calculate. The optimiser ignores it (stays Score-only; stubs unaffected).
- 26de28aa: Plan carries optional baseline_bill / post_bill and derives post_energy_bill / energy_bill_savings / post_energy_consumption / energy_consumption_savings (None until billed → NULL).
- 198122d1: ModellingOrchestrator gains a constructor-injected FuelRatesRepository (mirroring Baseline: get_current() once, one BillDerivation per batch). _plan_for bills the baseline (scorer.score(epc, [])) and post-package (package.score) SapResults at the same snapshot; savings = baseline − post. The PlanRow mirror + from_domain persist the four columns (they already exist on the live plan table, so no FE migration). Pipeline/handler wired.
Key properties: fuel-switch is handled for free — we bill the fully-overlaid post-package SapResult, so a future oil→ASHP measure prices at the new fuel via sap_code_to_fuel (no per-measure fuel bookkeeping). Baseline and post are priced at one FuelRates snapshot, so the delta is rate-consistent. Carries ADR-0014's appliances+cooking-stubbed-at-0 limitation (shared with Baseline, so savings stay consistent).
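This sketch shows why the fuel switch comes for free. When the post-package result is billed as a whole, each fuel's delivered kWh is priced at the same rates snapshot. A measure that moves demand from oil to electricity therefore needs no per-measure fuel bookkeeping. FuelRates, bill_pounds, the fuel keys and the rates are hypothetical and illustrative; the real path maps SAP fuel codes via sap_code_to_fuel.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FuelRates:
    # Hypothetical snapshot: pence per kWh by fuel (numbers illustrative only).
    pence_per_kwh: Mapping[str, float]


def bill_pounds(delivered_kwh_by_fuel: Mapping[str, float], rates: FuelRates) -> float:
    # Price every fuel's delivered kWh at the same snapshot, then convert pence to pounds.
    pence = sum(kwh * rates.pence_per_kwh[fuel] for fuel, kwh in delivered_kwh_by_fuel.items())
    return pence / 100.0
```

The baseline (oil boiler plus electricity) and the post-ASHP end-state (electricity only) are each billed whole at one FuelRates instance. Their difference is the rate-consistent saving.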
Bill-Derivation: per-measure bill savings (e79ffabf→b976c3ab) — DONE
Filled recommendation.kwh_savings + energy_cost_savings via the telescoping bill cascade over the role-3 best-practice order. 3 slices, all green + pyright-strict-clean:
- e79ffabf: enabling refactor. Pulled the cumulative-prefix scoring out of marginal_impacts into a reusable scoring.cascade_scores(scorer, baseline, overlays) -> list[Score] (index 0 = baseline, one calculate per prefix) + a pure marginals_from_scores. Each Score carries its SapResult, so the bill cascade re-bills the same prefixes the role-3 attribution scores, with no extra calculate. marginal_impacts now delegates (behaviour unchanged).
- 7e79c30a: PlanMeasure grows optional kwh_savings (delivered energy) + energy_cost_savings (£), signed so positive = saving, None until billed. RecommendationRow declares the live recommendation.kwh_savings / energy_cost_savings columns + maps them (None→NULL). The vestigial recommendation.energy_savings stays undeclared (legacy = 0). No FE migration (columns already live).
- b976c3ab: _plan_for scores the baseline + every prefix once via cascade_scores, bills each at one Fuel Rates snapshot, and takes consecutive Bill deltas as each measure's marginal delivered-kWh + £ saving. The Plan's baseline_bill / post_bill are now the same cascade endpoints (bills[0] / bills[-1]), so per-measure savings telescope exactly to the headline savings. This is pinned on the real calculator (Σ per-measure == plan totals, abs ≤ 1e-6). Ventilation's saving is negative and still telescopes. Added Bill.total_consumption_kwh (shared by Plan + orchestrator); dropped the redundant standalone baseline calculate.
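The telescoping property can be sketched with bills as plain floats:
- Bill the empty prefix and every cumulative prefix of the best-practice order.
- Take consecutive differences as each measure's saving.

The differences sum to bills[0] − bills[-1] by construction. That is why per-measure savings add up exactly to the Plan's headline saving, even when measures interact or one of them (ventilation) raises the bill. cascade_bills and marginal_savings are hypothetical analogues of cascade_scores / marginals_from_scores, not their code.

```python
from typing import Callable, Sequence


def cascade_bills(
    bill_for: Callable[[Sequence[str]], float],
    order: Sequence[str],
) -> list[float]:
    """Index 0 bills the baseline (empty prefix); index i bills the first i measures."""
    return [bill_for(order[:i]) for i in range(len(order) + 1)]


def marginal_savings(bills: Sequence[float]) -> list[float]:
    """Measure i's saving = bill before adding it minus bill after (positive = saving)."""
    return [before - after for before, after in zip(bills, bills[1:])]
```

Each prefix is billed once, so n measures cost n + 1 bill evaluations. This matches "one calculate per prefix".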
Key property: MeasureImpact.energy_savings_kwh_per_yr is primary energy and does not feed kwh_savings — kwh_savings is delivered energy from the Bill section kWh. Carries ADR-0014's appliances+cooking-stubbed-at-0 limitation.
Retire plan_recommendations + consolidate models (b76d0f81→6f0dcc04) — DONE
Designed in /grill-with-docs + /grill-me. The live plan/recommendation tables are read directly by the Drizzle FE, so this was a two-repo expand/contract. FE-visibility goal met: Plans and their measures now link solely by recommendation.plan_id; the m2m is gone. 9 slices, all green + pyright-strict-clean, and the rebuild + legacy suites are now co-runnable (the consolidation fixed a pre-existing dual-definition collision).
- b76d0f81: migration spec (docs/migrations/recommendation-plan-id.md: add plan_id → backfill → dual-write → cut reads → drop; backfill-before-reads + dual-write are the load-bearing rules since the FE can't deploy atomically) + ADR-0017 amendment.
- c1c7b06f: consolidate plan / recommendation / recommendation_materials into infrastructure/postgres/modelling/ as single SQLModel defs (absorbing the partial PlanRow / RecommendationRow mirrors, with full column parity + plan_id). backend/app/db/models/recommendations.py → re-export shim. Export conftest: create SQLModel-first / skip the redundant drop_all (the epc enum type is now shared across both metadatas).
- 27fcc5b1: legacy writers set recommendation.plan_id (dual-write).
- af5dbe32: cut all three readers (portfolio_functions, Outputs, export/property_scenarios) onto plan_id.
- b97d0688: drop the m2m (writes, delete_property_batch cleanup, the PlanRecommendationRow model, the test_export fixtures).
- 01c2c391: rename the cluster …Row → …Model (matches the epc_property precedent + the legacy names backend/ already imports, so the shim's plan re-export is literal). The non-cluster …Row tables stay until their live legacy …Model counterparts retire (renaming now would re-create dual-definition collisions).
- 2fbd7147: move PortfolioGoal to domain/modelling/portfolio_goal.py (domain vocab; infra→domain is the normal direction); portfolio.py keeps a re-export.
- c18968ba: consolidate scenario + installed_measure (full-parity ScenarioModel / InstalledMeasureModel + MeasureType). ScenarioModel.goal is the PortfolioGoal enum (legacy planning branches on it); the repo's to_domain maps it to its value, so Scenario.goal is now the value "Increasing EPC", consistent with the orchestrator. This fixes the latent name-vs-value bug the old str column masked.
- 6f0dcc04: characterization test for the FE aggregation aggregate_portfolio_recommendations (previously untested), pinning the plan_id join.
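The expand and backfill steps, and the final contract, can be illustrated on an in-memory SQLite database. This is not the FE-owned Drizzle migration, which targets Postgres. The plan_recommendations column names (plan_id, recommendation_id) are assumed. The scalar subquery also assumes each recommendation belongs to at most one plan.

```python
import sqlite3


def expand_and_backfill(conn: sqlite3.Connection) -> None:
    # Expand: add the nullable FK column (NULL default, so existing rows stay valid).
    conn.execute(
        "ALTER TABLE recommendation ADD COLUMN plan_id INTEGER "
        "REFERENCES plan(id) ON DELETE CASCADE"
    )
    # Backfill from the m2m before any reader switches to plan_id.
    conn.execute(
        """
        UPDATE recommendation
        SET plan_id = (
            SELECT pr.plan_id
            FROM plan_recommendations AS pr
            WHERE pr.recommendation_id = recommendation.id
        )
        """
    )
    conn.commit()


def contract(conn: sqlite3.Connection) -> None:
    # Contract: only once every writer dual-writes and every reader uses plan_id.
    conn.execute("DROP TABLE plan_recommendations")
    conn.commit()
```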
Gotchas for the next agent: the modelling SQLModel classes are …Model and live in infrastructure/postgres/modelling/ (NOT the old flat plan_table.py / scenario_table.py, which are deleted); backend/app/db/models/recommendations.py is now a pure shim. Out-of-cluster columns are plain ints (no FK) per the mirror convention. PortfolioGoal lives in domain/modelling/ now. The etl/ and sfr/ reporting scripts still reference the m2m and are deferred (out of scope). The live DB changes (add plan_id, backfill, drop plan_recommendations) are the FE-owned Drizzle migrations in the migration doc; this branch is the backend end-state.
NEXT PHASE — depth + scale e2e (handover for a grilling session)
The owner's goal: run a large dump of SAP 10.2 EPCs (1,000–10,000) through Modelling and inspect the recommendations — a large-scale integration test — plus manual testing via a Python console. Measure coverage (heating/solar/glazing/…) is explicitly deferred ("we'll flesh this out"). This phase is depth + scale on the existing 4 fabric measures (cavity wall / loft / floor / ventilation):
- Close the persisted-field gaps so a persisted Plan matches the engine's richness for the measures we do model: recommendation_materials (BOM: depth/quantity/unit/cost; the rebuild's Cost is a single total today, with no per-material breakdown), per-measure U-values (starting_u_value / new_u_value), total_work_hours / labour_days. Source of truth: the rebuild ProductRepository (repositories/product/) + legacy materials_functions.py / recommendations_functions.upload_recommendations (writes rec["parts"]).
- Financial uplift modelling: valuation columns (plan.valuation_*, recommendation.property_valuation_increase / rental_yield_increase) are greenfield in the rebuild (no domain concept yet). Legacy logic: backend/Property.py, backend/Funding.py, backend/app/db/functions/funding_functions.py, portfolio_functions.py. Needs a domain design (likely a /grill-with-docs pass).
- Large-scale e2e harness: the template is tests/orchestration/test_ara_first_run_pipeline_integration.py::test_modelling_optimises_and_persists_a_multi_measure_plan. It seeds an EPC via EpcPostgresRepository + MaterialRow records + a ScenarioModel and runs ModellingOrchestrator directly, because the Baseline stage can't run on calculator fixtures. For the dump: parse each EPC via EpcPropertyDataMapper.from_api_response / from_rdsap_schema_21_0_x (see datatypes/epc/domain/mapper.py), seed, run, inspect. EPC samples live under backend/epc_api/json_samples/.
- Python-console manual run: instantiate ModellingOrchestrator against a real DB and inspect Plans/Recommendations. Mind the worktree import trap (run from the worktree root, not /tmp).
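The block below is a minimal harness shape for the 1,000–10,000-EPC run. It assumes one JSON file per EPC, as under backend/epc_api/json_samples/. Failures are isolated per sample so one bad EPC doesn't abort the batch, and a report is collected for inspection afterwards. model_one is a hypothetical callable you would supply: it would parse (e.g. EpcPropertyDataMapper.from_api_response), seed (EpcPostgresRepository(session).save(...)) and run ModellingOrchestrator. That wiring, DumpReport and the helpers here are assumptions, not existing code.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator


@dataclass
class DumpReport:
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # sample name -> error


def load_samples(directory: Path) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (file name, parsed JSON) for every *.json file, in a stable order."""
    for path in sorted(directory.glob("*.json")):
        yield path.name, json.loads(path.read_text())


def run_dump(
    samples: Iterable[tuple[str, dict[str, Any]]],
    model_one: Callable[[dict[str, Any]], None],
) -> DumpReport:
    """Run every sample through model_one, recording failures instead of stopping."""
    report = DumpReport()
    for name, payload in samples:
        try:
            model_one(payload)
        except Exception as exc:  # keep going; record what broke and why
            report.failed[name] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(name)
    return report
```

Grouping report.failed by exception type is a quick first pass over a large run.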
A self-contained handover prompt for the next agent is in docs/HANDOVER_NEXT_PHASE_PROMPT.md.
What's left
Deferred fronts (open, post-#1161):
- Exclusion-filtering of the candidate pool (deferred from #1160).
- Persist unselected alternatives (default=False rows linked via plan_id) for the swap-in UX. Open ADR-0016 question: what impact figure they carry.
- Promote ProductRepository to the DB+file composite.
- Non-EPC goal objectives (Energy Savings, Reducing CO2) in the optimiser.
- Possible extension of the ventilation trigger set to roof insulation (now a one-line data edit in MEASURES_NEEDING_VENTILATION).
- Making the dependency builder lazy (a thunk) so the Product is only fetched when a trigger is actually selected.
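One possible shape for the lazy builder is sketched below:
- Hold a zero-argument thunk instead of a built Option.
- Call it only when a trigger is among the selected measure types.
- Memoise it so the initial re-score and every repair share one Product fetch.

LazyDependency, lazy_dependency and the str-typed Option are hypothetical; the real MeasureDependency holds a ScoredOption.

```python
from dataclasses import dataclass
from functools import cache
from typing import Callable, Optional


@dataclass(frozen=True)
class LazyDependency:
    triggers: frozenset[str]
    build: Callable[[], str]  # thunk returning the required Option (a str here)

    def resolve(self, selected_types: frozenset[str]) -> Optional[str]:
        # Only evaluate the thunk (and so fetch the Product) when triggered.
        if self.triggers & selected_types:
            return self.build()
        return None


def lazy_dependency(triggers: frozenset[str], fetch: Callable[[], str]) -> LazyDependency:
    # cache() memoises the zero-arg thunk: repeated resolves fetch at most once.
    return LazyDependency(triggers, cache(fetch))
```

With this shape, a dwelling whose package never selects a wall would no longer need a mechanical_ventilation catalogue row.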
Key references
- ADRs: 0005 (multi-phase deferred), 0011/0012 (orchestrators + UoW), 0014 (Bill-Derivation), 0016 (three scoring roles + warm-start/re-score/repair), 0017 (Plan persistence: evolve live tables).
- CONTEXT.md: Plan, Plan Measure, Recommendation, Measure Option, Optimised Package, Scenario, Measure Dependency.
- Auto-memory project_modelling_stage_state has the running state.