Handover — Ara backend: Property Baseline (SAP calculator) + Modelling
You are picking up a clean, merged baseline. The ara_first_run backend rebuild is
done and shipped; the next two fronts are (1) wiring the SAP calculator into
Property Baseline, and (2) starting Modelling. This doc is the orientation — the ADRs
and CONTEXT.md are authoritative for decisions; don't re-derive them.
Where things stand
- The ara_first_run rebuild is complete and merged to main (via feature/per-cert-mapper-validation): the full pipeline spine Ingestion → Baseline → Modelling (stub) on a flat-hexagonal layout with a per-stage Unit-of-Work. Issues #1129–#1138 (parent PRD #1128) are all done.
- Branch + worktree: you are on feature/property-baseline-sap10, cut from the up-to-date feature/per-cert-mapper-validation (which contains main + the merged ara work + the ongoing per-cert SAP-calculator validation slices). Worktree: /workspaces/home/hestia-worktrees/model-assemble-new-backend. The /workspaces/model worktree holds feature/per-cert-mapper-validation itself.
- PRs go into feature/per-cert-mapper-validation, NOT main directly — one PR per slice, the rhythm used for #1129–#1138.
Read first (authoritative — don't re-derive)
- ADRs in docs/adr/: 0002 (Property aggregate root), 0003 (strict Ingestion→Modelling separation, amended), 0004 (BaselinePerformance = Lodged+Effective pair, amended for the standalone table), 0005 (multi-phase Scenarios, per-phase recompute — governs Modelling), 0006/0007 (deterministic kWh / kWh-as-ML-target), 0009+0010 (deterministic SAP calculator + its spec target & validation cohort), 0011 (composable stage orchestrators, one lambda per use case, stages talk through repos), 0012 (Unit-of-Work per-stage batch transaction).
- CONTEXT.md — the glossary; use this vocabulary in code + commits.
- ara_backend_design.md is a stale draft PRD — its architecture sections are superseded by ADR-0011/0012 (a banner now says so). Trust the ADRs, not it.
Architecture (current — flat hexagonal at repo root)
applications/<lambda>/ thin handler + trigger body + Dockerfile + local_handler
orchestration/ stage orchestrators + AraFirstRunPipeline (deps injected)
domain/ pure aggregates + services
repositories/<agg>/ port (ABC) + adapter (*_postgres_repository / *_s3_repository)
infrastructure/ clients + SQLModel rows (*_table.py) + engine/config
Stages communicate only through repos, threading just property_ids — never an
in-memory hand-off (ADR-0011/0003). Each stage runs its batch in one Unit of Work and
commits once (ADR-0012); all-or-nothing per batch, fail noisily → subtask FAILED →
debug & re-run; re-runs are idempotent (replace-by-property_id). Ingestion is
fetch-then-write so a DB connection is never held during external IO.
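The batch rule above (one unit per stage, one commit, replace-by-property_id) can be sketched in memory. This is a minimal illustration, not the repo's code: InMemoryBaselineRepo, InMemoryUnitOfWork, stage() and run_stage() are hypothetical names, and the real ports in repositories/unit_of_work.py may look different.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBaselineRepo:
    """Hypothetical stand-in for a repository adapter, keyed by property_id."""

    rows: dict[int, str] = field(default_factory=dict)

    def replace(self, property_id: int, value: str) -> None:
        # Replace-by-property_id: writing the same id twice leaves one row.
        self.rows[property_id] = value


class InMemoryUnitOfWork:
    """Hypothetical Unit of Work: buffers writes and applies them only on commit."""

    def __init__(self, repo: InMemoryBaselineRepo) -> None:
        self._repo = repo
        self._pending: dict[int, str] = {}
        self.commits = 0

    def __enter__(self) -> "InMemoryUnitOfWork":
        self._pending = {}
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Anything still uncommitted is discarded; the exception propagates.
        self._pending = {}
        return False

    def stage(self, property_id: int, value: str) -> None:
        self._pending[property_id] = value

    def commit(self) -> None:
        for property_id, value in self._pending.items():
            self._repo.replace(property_id, value)
        self._pending = {}
        self.commits += 1


def run_stage(property_ids: list[int], uow: InMemoryUnitOfWork) -> None:
    """Process the whole batch in one unit and commit once at the end."""
    with uow:
        for property_id in property_ids:
            if property_id < 0:
                raise ValueError(f"bad property_id {property_id}")
            uow.stage(property_id, f"baseline-for-{property_id}")
        uow.commit()
```

Running run_stage twice over the same ids leaves one row per property, and a failure part-way through a batch writes nothing, which is the behaviour a re-run relies on.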
Key files (note the recent rename: baseline → property_baseline; FirstRun → AraFirstRun)
- orchestration/ara_first_run_pipeline.py — AraFirstRunPipeline, AraFirstRunCommand, the IngestionStage / PropertyBaselineStage / ModellingStage Protocols.
- orchestration/property_baseline_orchestrator.py — PropertyBaselineOrchestrator (this is where the SAP calculator gets wired).
- orchestration/ingestion_orchestrator.py, orchestration/modelling_orchestrator.py (stub).
- domain/property_baseline/ — PropertyBaselinePerformance, Performance, lodged_performance(), Rebaseliner / StubRebaseliner.
- repositories/property_baseline/ (port + postgres adapter), repositories/unit_of_work.py + repositories/postgres_unit_of_work.py.
- repositories/scenario/, repositories/materials/ — empty seam ports for Modelling.
- infrastructure/postgres/property_baseline_performance_table.py — flat-column row.
- applications/ara_first_run/handler.py — build_first_run_pipeline wiring + _source_clients_from_env (a seam that raises — see Stubs below).
- SAP calculator (for task 1): domain/sap10_calculator/calculator.py, class Sap10Calculator, returns a SapResult (5 quantities + monthly + worksheet audit). It is mature and heavily validated by the per-cert work on this branch.
Conventions + gotchas
- TDD, one test → one impl; # Arrange / # Act / # Assert headers; commit per slice with a spec/ADR citation and the Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> trailer.
- Tests: real ephemeral PostgreSQL via the db_engine fixture (JSONB needs real PG). Orchestrator/repo unit tests use fakes — tests/orchestration/fakes.py (FakeUnitOfWork exposing property / epc / solar / property_baseline repos + commit count). Run with -p no:cacheprovider; ignore coverage spam.
- pyright strict, zero errors. Known noise to ignore: a venvPath warning; the moto-not-installed import errors in test_postcode_splitter_orchestrator.py + test_user_address_csv_s3_repository.py (those modules don't collect — --ignore them); and 4 pre-existing failures outside tests/ (summary_pdf_mapper_chain ×3 + from_rdsap_schema total_floor_area).
- Pushing from this worktree: the VS Code git credential helpers are broken (missing node binaries), so use a one-shot gh override:
  git -c credential.helper= -c credential.helper='!gh auth git-credential' push
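For the test layout named in the TDD bullet, a unit test with the three section headers might look like the sketch below. The FakeUnitOfWork here is a simplified local stand-in (the real one lives in tests/orchestration/fakes.py and exposes more repos), and DemoOrchestrator is a hypothetical subject under test, not a class from the repo.

```python
class FakeUnitOfWork:
    """Simplified, hypothetical fake: one dict-backed repo plus a commit counter."""

    def __init__(self) -> None:
        self.property_baseline: dict[int, str] = {}
        self.commit_count = 0

    def __enter__(self) -> "FakeUnitOfWork":
        return self

    def __exit__(self, *exc_info: object) -> bool:
        return False

    def commit(self) -> None:
        self.commit_count += 1


class DemoOrchestrator:
    """Hypothetical orchestrator: writes one row per property, commits once."""

    def __init__(self, unit_of_work: FakeUnitOfWork) -> None:
        self._unit_of_work = unit_of_work

    def run(self, property_ids: list[int]) -> None:
        with self._unit_of_work as uow:
            for property_id in property_ids:
                uow.property_baseline[property_id] = "baseline"
            uow.commit()


def test_run_commits_once_per_batch() -> None:
    # Arrange
    uow = FakeUnitOfWork()
    orchestrator = DemoOrchestrator(uow)

    # Act
    orchestrator.run([1, 2, 3])

    # Assert
    assert uow.commit_count == 1
    assert set(uow.property_baseline) == {1, 2, 3}
```

Asserting on the commit count is what makes the fake useful: it pins the one-commit-per-batch rule without needing a database.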
Next task 1 — SAP calculator on Property Baseline (the user expects this to be simple)
Wire Sap10Calculator into PropertyBaselineOrchestrator to produce Calculated SAP10
Performance per property. Per CONTEXT (≈line 100), this is a quantity distinct from
Lodged/Effective Performance — surfaced alongside them during the validation phase; it
may supersede Effective Performance in a later ADR once parity is confirmed (ADR-0009/0010).
Grill these two before coding (/grill-with-docs):
- Where it sits. Recommended: a third value-set on PropertyBaselinePerformance (calculated: Performance + its space/water kWh), persisted as calculated_* columns on property_baseline_performance — not an overwrite of effective. Pin the aggregate shape + table migration in one pass (the table migration is FE-owned/Drizzle — see docs/migrations/property-baseline-performance-table.md).
- Failure posture. The calculator strict-raises (UnmappedSapCode, etc.) on certs it can't yet handle. Running it over a real cohort surfaces those gaps — which is the validation work feature/per-cert-mapper-validation exists for. Decide: let the raise abort the batch (ADR-0012 all-or-nothing), or collect/skip-and-report. This is the main judgment call; "simple to wire" but it lights up the validation surface.
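To make the failure-posture choice concrete, the two options can be sketched side by side. Everything here is illustrative: UnmappedSapCode is redefined locally as a stand-in for the calculator's error, and strict_calculate, the dict-shaped certs and the wall_code field are hypothetical, chosen only so the example runs.

```python
class UnmappedSapCode(Exception):
    """Local stand-in for the calculator's strict error."""


def strict_calculate(cert: dict) -> float:
    """Hypothetical calculator call: raises on a cert it cannot handle."""
    if cert.get("wall_code") is None:
        raise UnmappedSapCode(f"cert {cert['id']}: unmapped wall_code")
    return 60.0 + cert["wall_code"]


def run_abort_on_first(certs: list[dict]) -> dict[int, float]:
    """Option A: the first raise aborts the whole batch (all-or-nothing)."""
    return {cert["id"]: strict_calculate(cert) for cert in certs}


def run_collect_and_report(
    certs: list[dict],
) -> tuple[dict[int, float], dict[int, str]]:
    """Option B: skip failing certs and return them alongside the results."""
    results: dict[int, float] = {}
    failures: dict[int, str] = {}
    for cert in certs:
        try:
            results[cert["id"]] = strict_calculate(cert)
        except UnmappedSapCode as err:
            failures[cert["id"]] = str(err)
    return results, failures
```

Option A keeps the ADR-0012 contract untouched but means one unmapped cert blocks every property in the batch. Option B keeps the batch moving, at the cost of deciding where the failure report is stored and who reads it.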
Then TDD: inject the calculator into PropertyBaselineOrchestrator, call it on the
Effective EPC, persist the calculated set in the same unit.
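A rough shape for the wiring, assuming the recommended third value-set is adopted. The field names on Performance, the BaselineRecord class, and the calculate() method are all assumptions made for illustration. The real PropertyBaselinePerformance, Performance and Sap10Calculator signatures should be read from the code before the first test is written.

```python
from dataclasses import dataclass, replace
from typing import Optional, Protocol


@dataclass(frozen=True)
class Performance:
    """Hypothetical fields; the real class lives in domain/property_baseline/."""

    sap_score: float
    space_heating_kwh: float
    water_heating_kwh: float


@dataclass(frozen=True)
class BaselineRecord:
    """Hypothetical stand-in for the aggregate, with a third value-set."""

    property_id: int
    lodged: Performance
    effective: Performance
    calculated: Optional[Performance] = None


class Calculator(Protocol):
    """Assumed port shape; the method name is not taken from the repo."""

    def calculate(self, epc: dict) -> Performance: ...


class DemoBaselineOrchestrator:
    """Hypothetical orchestrator with the calculator injected."""

    def __init__(self, calculator: Calculator) -> None:
        self._calculator = calculator

    def with_calculated(
        self, record: BaselineRecord, effective_epc: dict
    ) -> BaselineRecord:
        # The calculated set sits next to lodged/effective; neither is overwritten.
        calculated = self._calculator.calculate(effective_epc)
        return replace(record, calculated=calculated)


class FixedCalculator:
    """Test double returning a constant result."""

    def calculate(self, epc: dict) -> Performance:
        return Performance(71.0, 8000.0, 2000.0)
```

Injecting the calculator behind a small protocol keeps the orchestrator unit tests on fakes, in line with the existing test conventions.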
Next task 2 — Modelling (Recommendations / Optimiser / Plans)
ModellingOrchestrator.run(property_ids, scenario_ids) is a no-op stub;
ScenarioRepository and MaterialsRepository are empty seam ports. Building this out
is the third stage. ADR-0005 (multi-phase Scenarios, per-phase recompute) governs it.
Relevant CONTEXT terms: Modelling (stage), Scenario, Scenario Phase, Scenario Snapshot,
Optimised Package, Plans, Recommendations, Optimiser Service.
Before coding, grill the port shapes + the Scenario/Materials domain aggregates. Two known open points:
- MaterialsRepository naming. A PR reviewer suggested BuildingMaterialsRepository; this was deliberately deferred to this grill because "building materials" may under-describe retrofit measures (a heat pump / ASHP is a measure/product, not a building material). Settle the term (Materials / Measures / Products / BuildingMaterials) here.
- Modelling will need a Unit of Work when it writes Plans — the stub currently takes no unit_of_work; it gains one (ADR-0012) when its body is built.
Stubs / seams that raise or no-op (do NOT mistake for "done")
- applications/ara_first_run/handler.py::_source_clients_from_env — raises NotImplementedError. EPC-API / Google-Solar / geospatial-S3 client config + env-var names + pandas/s3fs deps + Terraform wiring are a separate deploy piece (out of scope so far). The lambda is not end-to-end runnable until this is filled in.
- ModellingOrchestrator.run — no-op.
- ScenarioRepository / MaterialsRepository — empty ABC ports.
- StubRebaseliner — raises RebaselineNotImplemented on pre-SAP10 certs (sap_version < 10); ML Rebaselining is not implemented.
- EPC Energy Derivation (fuel split + bills + the Ofgem-cap Fuel Rates ETL) is deferred — kWh is carried on PropertyBaselinePerformance, the rest is not.
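As an illustration of why these seams must not be read as "done", here is roughly how a raising stub like the rebaseliner behaves. The rebaseline() method name, its parameters, and the pass-through for SAP10 certs are assumptions. Only the raise on sap_version < 10 comes from the list above.

```python
class RebaselineNotImplemented(Exception):
    """Local stand-in for the error the stub raises on pre-SAP10 certs."""


class DemoStubRebaseliner:
    """Hypothetical sketch of a stub rebaseliner."""

    def rebaseline(self, sap_version: float, lodged_score: float) -> float:
        if sap_version < 10:
            # Pre-SAP10 certs would need ML Rebaselining, which does not exist yet.
            raise RebaselineNotImplemented(
                f"sap_version {sap_version} needs ML rebaselining"
            )
        # Assumption: a SAP10 cert needs no rebaselining, so the lodged
        # value passes through unchanged.
        return lodged_score
```

A cohort containing any pre-SAP10 cert will therefore fail at this seam, and under all-or-nothing batches that failure takes the rest of the batch with it.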
Known doc drift to be aware of (flagged, intentionally not auto-fixed)
- CONTEXT.md term vs code class. The glossary term is "Baseline Performance"; the code class is PropertyBaselinePerformance (renamed on PR review). The glossary was deliberately left un-renamed — treat "Baseline Performance" as the spoken concept and PropertyBaselinePerformance as its class. If you want them aligned, rename the term to "Property Baseline Performance" across CONTEXT + ADR prose (a quick, mechanical change).
Issues / process
Parent PRD: gh issue view 1128 --repo Hestia-Homes/Model. #1129–#1138 done (each with a
"Done." comment). New work → new issues (use /to-issues or /triage), ready-for-agent
labelled, parented to #1128.