From dafc50f6ed9bae62b10b4f9beeb9d9dff4fc51bb Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Mon, 1 Jun 2026 16:20:06 +0000
Subject: [PATCH] docs(ara): next-agent handover for Property Baseline (SAP calc) + Modelling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Orientation for the next chat picking up the two open fronts after the
ara_first_run rebuild shipped:
- where things stand (merged to main via per-cert; branch/worktree layout;
  PRs into per-cert), authoritative ADRs/CONTEXT to read,
- current architecture + key files (post baseline→property_baseline /
  FirstRun→AraFirstRun rename),
- conventions + gotchas (TDD, ephemeral PG, FakeUnitOfWork, pyright noise to
  ignore, gh-credential push workaround),
- Task 1: wire Sap10Calculator into PropertyBaselineOrchestrator (Calculated
  SAP10 Performance as a third value-set; failure-posture decision),
- Task 2: Modelling (stubs to build out; MaterialsRepository naming open;
  needs a UoW when writing Plans),
- the raising/no-op seams not to mistake for done,
- known doc drift flagged (CONTEXT term vs PropertyBaselinePerformance class;
  stale domain/sap/ path → domain/sap10_calculator).

Also banners ara_backend_design.md as superseded (architecture) by ADR-0011/0012.

Co-Authored-By: Claude Opus 4.8
---
 ara_backend_design.md     |   6 ++
 docs/HANDOVER_ARA_NEXT.md | 155 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 161 insertions(+)
 create mode 100644 docs/HANDOVER_ARA_NEXT.md

diff --git a/ara_backend_design.md b/ara_backend_design.md
index f3d11696..c2f3f542 100644
--- a/ara_backend_design.md
+++ b/ara_backend_design.md
@@ -1,5 +1,11 @@
 # ARA Backend Redesign — Design PRD
 
+> ⚠️ **SUPERSEDED (architecture sections).** This is an early draft PRD. The actual
+> architecture as built differs — see the ADRs in `docs/adr/` (especially 0011
+> composable stage orchestrators, 0012 Unit-of-Work per-stage batch) and
+> `docs/HANDOVER_ARA_NEXT.md` for current state. Treat this doc as historical context,
+> not the source of truth for layout/contracts.
+
 **Status**: Draft for team review
 **Author**: Khalim Conn-Kowlessar (with Claude grill session)
 **Branch**: `ara-backend-design-prd`
diff --git a/docs/HANDOVER_ARA_NEXT.md b/docs/HANDOVER_ARA_NEXT.md
new file mode 100644
index 00000000..61eac61a
--- /dev/null
+++ b/docs/HANDOVER_ARA_NEXT.md
@@ -0,0 +1,155 @@
+# Handover — Ara backend: Property Baseline (SAP calculator) + Modelling
+
+You are picking up a clean, merged baseline. The `ara_first_run` backend rebuild is
+**done and shipped**; the next two fronts are (1) wiring the SAP calculator into
+Property Baseline, and (2) starting Modelling. This doc is the orientation — the ADRs
+and CONTEXT.md are authoritative for decisions; don't re-derive them.
+
+## Where things stand
+
+- The **`ara_first_run` rebuild is complete and merged to `main`** (via
+  `feature/per-cert-mapper-validation`): the full pipeline spine
+  **Ingestion → Baseline → Modelling(stub)** on a flat-hexagonal layout with a
+  per-stage Unit-of-Work. Issues #1129–#1138 (parent PRD #1128) are all done.
+- **Branch + worktree:** you are on `feature/property-baseline-sap10`, cut from the
+  up-to-date `feature/per-cert-mapper-validation` (which contains `main` + the merged
+  ara work + the ongoing per-cert SAP-calculator validation slices). Worktree:
+  `/workspaces/home/hestia-worktrees/model-assemble-new-backend`. The
+  `/workspaces/model` worktree holds `feature/per-cert-mapper-validation` itself.
+- **PRs go into `feature/per-cert-mapper-validation`, NOT `main` directly** — one PR
+  per slice, the rhythm used for #1129–#1138.
+
+## Read first (authoritative — don't re-derive)
+
+- **ADRs** `docs/adr/`: 0002 (Property aggregate root), 0003 (strict Ingestion→Modelling
+  separation, amended), 0004 (BaselinePerformance = Lodged+Effective pair, amended for
+  the standalone table), 0005 (multi-phase Scenarios, per-phase recompute — **governs
+  Modelling**), 0006/0007 (deterministic kWh / kWh-as-ML-target), 0009+0010
+  (deterministic SAP calculator + its spec target & validation cohort), 0011 (composable
+  stage orchestrators, one lambda per use case, stages talk through repos), 0012
+  (Unit-of-Work per-stage batch transaction).
+- **CONTEXT.md** — the glossary; use this vocabulary in code + commits.
+- **`ara_backend_design.md`** is a **stale draft PRD** — its architecture sections are
+  superseded by ADR-0011/0012 (a banner now says so). Trust the ADRs, not it.
+
+## Architecture (current — flat hexagonal at repo root)
+
+```
+applications//   thin handler + trigger body + Dockerfile + local_handler
+orchestration/   stage orchestrators + AraFirstRunPipeline (deps injected)
+domain/          pure aggregates + services
+repositories//   port (ABC) + adapter (*_postgres_repository / *_s3_repository)
+infrastructure/  clients + SQLModel rows (*_table.py) + engine/config
+```
+
+Stages communicate **only through repos**, threading just `property_ids` — never an
+in-memory hand-off (ADR-0011/0003). Each stage runs its batch in **one Unit of Work and
+commits once** (ADR-0012); all-or-nothing per batch, fail noisily → subtask FAILED →
+debug & re-run; re-runs are idempotent (replace-by-`property_id`). Ingestion is
+fetch-then-write so a DB connection is never held during external IO.
+
+## Key files (note the recent rename: baseline → property_baseline; FirstRun → AraFirstRun)
+
+- `orchestration/ara_first_run_pipeline.py` — `AraFirstRunPipeline`, `AraFirstRunCommand`,
+  the `IngestionStage`/`PropertyBaselineStage`/`ModellingStage` Protocols.
+- `orchestration/property_baseline_orchestrator.py` — `PropertyBaselineOrchestrator`
+  (**this is where the SAP calculator gets wired**).
+- `orchestration/ingestion_orchestrator.py`, `orchestration/modelling_orchestrator.py` (stub).
+- `domain/property_baseline/` — `PropertyBaselinePerformance`, `Performance`,
+  `lodged_performance()`, `Rebaseliner`/`StubRebaseliner`.
+- `repositories/property_baseline/` (port + postgres adapter),
+  `repositories/unit_of_work.py` + `repositories/postgres_unit_of_work.py`.
+- `repositories/scenario/`, `repositories/materials/` — **empty seam ports** for Modelling.
+- `infrastructure/postgres/property_baseline_performance_table.py` — flat-column row.
+- `applications/ara_first_run/handler.py` — `build_first_run_pipeline` wiring +
+  `_source_clients_from_env` (a seam that **raises** — see Stubs below).
+- **SAP calculator (for task 1):** `domain/sap10_calculator/calculator.py`, class
+  `Sap10Calculator`, returns a `SapResult` (5 quantities + monthly + worksheet audit).
+  It is mature and heavily validated by the per-cert work on this branch.
+
+## Conventions + gotchas
+
+- **TDD**, one test → one impl; `# Arrange / # Act / # Assert` headers; **commit per
+  slice** with a spec/ADR citation and the
+  `Co-Authored-By: Claude Opus 4.8 ` trailer.
+- Tests: real ephemeral PostgreSQL via the `db_engine` fixture (JSONB needs real PG).
+  **Orchestrator/repo unit tests use fakes** — `tests/orchestration/fakes.py`
+  (`FakeUnitOfWork` exposing `property`/`epc`/`solar`/`property_baseline` repos + commit
+  count). Run with `-p no:cacheprovider`; ignore coverage spam.
+- **pyright strict, zero errors.** Known noise to ignore: a `venvPath` warning; the
+  `moto`-not-installed import errors in `test_postcode_splitter_orchestrator.py` +
+  `test_user_address_csv_s3_repository.py` (those modules don't collect — `--ignore`
+  them); and 4 pre-existing failures outside `tests/` (summary_pdf_mapper_chain ×3 +
+  from_rdsap_schema total_floor_area).
+- **Pushing from this worktree:** the VS Code git credential helpers are broken
+  (missing node binaries), so use a one-shot gh override:
+  `git -c credential.helper= -c credential.helper='!gh auth git-credential' push`.
+
+## Next task 1 — SAP calculator on Property Baseline (the user expects this to be simple)
+
+Wire `Sap10Calculator` into `PropertyBaselineOrchestrator` to produce **Calculated SAP10
+Performance** per property. Per CONTEXT (≈line 100), this is a quantity **distinct from**
+Lodged/Effective Performance — surfaced *alongside* them during the validation phase; it
+may supersede Effective Performance in a later ADR once parity is confirmed (ADR-0009/0010).
+
+**Grill these two before coding (`/grill-with-docs`):**
+1. **Where it sits.** Recommended: a *third* value-set on `PropertyBaselinePerformance`
+   (`calculated: Performance` + its space/water kWh), persisted as `calculated_*` columns
+   on `property_baseline_performance` — **not** an overwrite of `effective`. Pin the
+   aggregate shape + table migration in one pass (the table migration is FE-owned/Drizzle —
+   see `docs/migrations/property-baseline-performance-table.md`).
+2. **Failure posture.** The calculator strict-raises (`UnmappedSapCode`, etc.) on certs it
+   can't yet handle. Running it over a real cohort *surfaces those gaps* — which is the
+   validation work `feature/per-cert-mapper-validation` exists for. Decide: let the raise
+   abort the batch (ADR-0012 all-or-nothing), or collect/skip-and-report. This is the main
+   judgment call; "simple to wire" but it lights up the validation surface.
+
+Then TDD: inject the calculator into `PropertyBaselineOrchestrator`, call it on the
+Effective EPC, persist the calculated set in the same unit.
+
+## Next task 2 — Modelling (Recommendations / Optimiser / Plans)
+
+`ModellingOrchestrator.run(property_ids, scenario_ids)` is a **no-op stub**;
+`ScenarioRepository` and `MaterialsRepository` are **empty seam ports**. Building this out
+is the third stage. ADR-0005 (multi-phase Scenarios, per-phase recompute) governs it.
+Relevant CONTEXT terms: Modelling (stage), Scenario, Scenario Phase, Scenario Snapshot,
+Optimised Package, Plans, Recommendations, Optimiser Service.
+
+Before coding, grill the port shapes + the Scenario/Materials domain aggregates. Two
+known open points:
+- **`MaterialsRepository` naming.** A PR reviewer suggested `BuildingMaterialsRepository`;
+  this was **deliberately deferred to this grill** because "building materials" may
+  under-describe retrofit measures (a heat pump / ASHP is a *measure/product*, not a
+  building material). Settle the term (Materials / Measures / Products / BuildingMaterials)
+  here.
+- **Modelling will need a Unit of Work** when it writes Plans — the stub currently takes
+  no `unit_of_work`; it gains one (ADR-0012) when its body is built.
+
+## Stubs / seams that raise or no-op (do NOT mistake for "done")
+
+- `applications/ara_first_run/handler.py::_source_clients_from_env` — **raises**
+  `NotImplementedError`. EPC-API / Google-Solar / geospatial-S3 client config + env-var
+  names + pandas/s3fs deps + Terraform wiring are a separate deploy piece (out of scope so
+  far). The lambda is not end-to-end runnable until this is filled in.
+- `ModellingOrchestrator.run` — no-op.
+- `ScenarioRepository` / `MaterialsRepository` — empty ABC ports.
+- `StubRebaseliner` — raises `RebaselineNotImplemented` on pre-SAP10 certs (`sap_version
+  < 10`); ML Rebaselining is not implemented.
+- **EPC Energy Derivation** (fuel split + bills + the Ofgem-cap Fuel Rates ETL) is
+  deferred — kWh is carried on `PropertyBaselinePerformance`, the rest is not.
+
+## Known doc drift to be aware of (flagged, intentionally not auto-fixed)
+
+- **CONTEXT.md term vs code class.** The glossary term is **"Baseline Performance"**; the
+  code class is **`PropertyBaselinePerformance`** (renamed on PR review). The glossary was
+  *deliberately* left un-renamed — treat "Baseline Performance" as the spoken concept and
+  `PropertyBaselinePerformance` as its class. If you want them aligned, rename the term to
+  "Property Baseline Performance" across CONTEXT + ADR prose (a quick, mechanical change).
+- **CONTEXT.md ≈line 105** says the calculator lives in `domain/sap/` — that's **stale**;
+  it's `domain/sap10_calculator/calculator.py`. Safe to correct.
+
+## Issues / process
+
+Parent PRD: `gh issue view 1128 --repo Hestia-Homes/Model`. #1129–#1138 done (each with a
+"Done." comment). New work → new issues (use `/to-issues` or `/triage`), `ready-for-agent`
+labelled, parented to #1128.
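
The failure-posture question raised under task 1 (let a calculator raise abort the batch, or collect/skip-and-report) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the repository's code: `UnmappedSapCode` is redefined locally as a stand-in for the calculator error the handover names, and `BatchOutcome`, `run_strict`, `run_collecting`, and `fake_calculate` are hypothetical names invented here. The real `Sap10Calculator` and `PropertyBaselineOrchestrator` signatures are not shown in the handover and are not assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping


class UnmappedSapCode(Exception):
    """Local stand-in for the calculator's strict-raise error (hypothetical definition)."""


@dataclass
class BatchOutcome:
    """Hypothetical result of a skip-and-report run over one batch."""
    calculated: dict[int, float] = field(default_factory=dict)  # property_id -> score
    skipped: dict[int, str] = field(default_factory=dict)       # property_id -> reason


def run_strict(calculate: Callable[[dict], float],
               epcs: Mapping[int, dict]) -> dict[int, float]:
    """Posture A: the first raise propagates, so nothing from the batch is returned."""
    return {property_id: calculate(epc) for property_id, epc in epcs.items()}


def run_collecting(calculate: Callable[[dict], float],
                   epcs: Mapping[int, dict]) -> BatchOutcome:
    """Posture B: record certs the calculator cannot handle and keep going."""
    outcome = BatchOutcome()
    for property_id, epc in epcs.items():
        try:
            outcome.calculated[property_id] = calculate(epc)
        except UnmappedSapCode as exc:
            outcome.skipped[property_id] = str(exc)
    return outcome


def fake_calculate(epc: dict) -> float:
    """Toy calculator (assumption): raises on an unknown wall code, else a made-up score."""
    if epc["wall_code"] == "??":
        raise UnmappedSapCode(f"unmapped wall code {epc['wall_code']!r}")
    return 60.0 + epc["floor_area"] / 10


if __name__ == "__main__":
    batch = {
        1: {"wall_code": "A", "floor_area": 100},
        2: {"wall_code": "??", "floor_area": 80},
    }
    print(run_collecting(fake_calculate, batch))
```

Posture A lines up with the all-or-nothing batch described for ADR-0012, since the exception leaves before anything would be committed. Posture B keeps the batch alive but leaves an open question the grill would need to settle: whether properties in `skipped` still get their Lodged/Effective values committed in the same Unit of Work.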