mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Factual staleness fix flagged in the handover; the calculator lives in domain/sap10_calculator/calculator.py. Glossary term 'Baseline Performance' deliberately left unchanged (concept vs PropertyBaselinePerformance class). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
153 lines
9.5 KiB
Markdown
# Handover — Ara backend: Property Baseline (SAP calculator) + Modelling

You are picking up a clean, merged baseline. The `ara_first_run` backend rebuild is
**done and shipped**; the next two fronts are (1) wiring the SAP calculator into
Property Baseline, and (2) starting Modelling. This doc is the orientation; the ADRs
and CONTEXT.md are authoritative for decisions, so don't re-derive them.

## Where things stand

- The **`ara_first_run` rebuild is complete and merged to `main`** (via
  `feature/per-cert-mapper-validation`): the full pipeline spine
  **Ingestion → Baseline → Modelling (stub)** on a flat-hexagonal layout with a
  per-stage Unit-of-Work. Issues #1129–#1138 (parent PRD #1128) are all done.
- **Branch + worktree:** you are on `feature/property-baseline-sap10`, cut from the
  up-to-date `feature/per-cert-mapper-validation` (which contains `main` + the merged
  ara work + the ongoing per-cert SAP-calculator validation slices). Worktree:
  `/workspaces/home/hestia-worktrees/model-assemble-new-backend`. The
  `/workspaces/model` worktree holds `feature/per-cert-mapper-validation` itself.
- **PRs go into `feature/per-cert-mapper-validation`, NOT `main` directly**: one PR
  per slice, the rhythm used for #1129–#1138.

## Read first (authoritative — don't re-derive)

- **ADRs** `docs/adr/`: 0002 (Property aggregate root), 0003 (strict Ingestion→Modelling
  separation, amended), 0004 (BaselinePerformance = Lodged+Effective pair, amended for
  the standalone table), 0005 (multi-phase Scenarios, per-phase recompute; **governs
  Modelling**), 0006/0007 (deterministic kWh / kWh-as-ML-target), 0009+0010
  (deterministic SAP calculator + its spec target & validation cohort), 0011 (composable
  stage orchestrators, one lambda per use case, stages talk through repos), 0012
  (Unit-of-Work per-stage batch transaction).
- **CONTEXT.md**: the glossary; use this vocabulary in code + commits.
- **`ara_backend_design.md`** is a **stale draft PRD**: its architecture sections are
  superseded by ADR-0011/0012 (a banner now says so). Trust the ADRs, not it.

## Architecture (current — flat hexagonal at repo root)

```
applications/<lambda>/   thin handler + trigger body + Dockerfile + local_handler
orchestration/           stage orchestrators + AraFirstRunPipeline (deps injected)
domain/                  pure aggregates + services
repositories/<agg>/      port (ABC) + adapter (*_postgres_repository / *_s3_repository)
infrastructure/          clients + SQLModel rows (*_table.py) + engine/config
```

Stages communicate **only through repos**, threading just `property_ids`, never an
in-memory hand-off (ADR-0011/0003). Each stage runs its batch in **one Unit of Work and
commits once** (ADR-0012): all-or-nothing per batch, fail noisily → subtask FAILED →
debug & re-run; re-runs are idempotent (replace-by-`property_id`). Ingestion is
fetch-then-write so a DB connection is never held during external IO.

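
The batch contract described above can be sketched roughly as follows. This is an illustration only, not the repository's code: `InMemoryUnitOfWork`, `run_stage`, the `replace` method and the string rows are hypothetical stand-ins chosen to show one commit per batch, all-or-nothing failure, and idempotent replace-by-`property_id`.

```python
from typing import Callable, Iterable


class InMemoryUnitOfWork:
    """Hypothetical stand-in for a per-stage Unit of Work (ADR-0012)."""

    def __init__(self) -> None:
        self.committed: dict[int, str] = {}
        self.commits = 0
        self._pending: dict[int, str] = {}

    def __enter__(self) -> "InMemoryUnitOfWork":
        self._pending = {}
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Uncommitted writes are discarded; returning False lets the
        # exception propagate so the batch fails noisily.
        self._pending = {}
        return False

    def replace(self, property_id: int, row: str) -> None:
        # Replace-by-property_id: writing the same id twice keeps one row,
        # which is what makes a re-run idempotent.
        self._pending[property_id] = row

    def commit(self) -> None:
        self.committed.update(self._pending)
        self.commits += 1


def run_stage(
    uow: InMemoryUnitOfWork,
    property_ids: Iterable[int],
    compute: Callable[[int], str],
) -> None:
    """Run one batch: every write lands in a single commit, or none do."""
    with uow:
        for property_id in property_ids:
            uow.replace(property_id, compute(property_id))
        uow.commit()  # the only commit for this batch
```

Under these assumptions, running the same batch twice leaves the same rows behind, and a batch that raises part-way leaves the previously committed state untouched.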
## Key files (note the recent rename: baseline → property_baseline; FirstRun → AraFirstRun)

- `orchestration/ara_first_run_pipeline.py`: `AraFirstRunPipeline`, `AraFirstRunCommand`,
  the `IngestionStage`/`PropertyBaselineStage`/`ModellingStage` Protocols.
- `orchestration/property_baseline_orchestrator.py`: `PropertyBaselineOrchestrator`
  (**this is where the SAP calculator gets wired**).
- `orchestration/ingestion_orchestrator.py`, `orchestration/modelling_orchestrator.py` (stub).
- `domain/property_baseline/`: `PropertyBaselinePerformance`, `Performance`,
  `lodged_performance()`, `Rebaseliner`/`StubRebaseliner`.
- `repositories/property_baseline/` (port + postgres adapter),
  `repositories/unit_of_work.py` + `repositories/postgres_unit_of_work.py`.
- `repositories/scenario/`, `repositories/materials/`: **empty seam ports** for Modelling.
- `infrastructure/postgres/property_baseline_performance_table.py`: flat-column row.
- `applications/ara_first_run/handler.py`: `build_first_run_pipeline` wiring +
  `_source_clients_from_env` (a seam that **raises**; see Stubs below).
- **SAP calculator (for task 1):** `domain/sap10_calculator/calculator.py`, class
  `Sap10Calculator`, returns a `SapResult` (5 quantities + monthly + worksheet audit).
  It is mature and heavily validated by the per-cert work on this branch.

## Conventions + gotchas

- **TDD**, one test → one impl; `# Arrange / # Act / # Assert` headers; **commit per
  slice** with a spec/ADR citation and the
  `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>` trailer.
- Tests: real ephemeral PostgreSQL via the `db_engine` fixture (JSONB needs real PG).
  **Orchestrator/repo unit tests use fakes**: `tests/orchestration/fakes.py`
  (`FakeUnitOfWork` exposing `property`/`epc`/`solar`/`property_baseline` repos + commit
  count). Run with `-p no:cacheprovider`; ignore coverage spam.
- **pyright strict, zero errors.** Known noise to ignore: a `venvPath` warning; the
  `moto`-not-installed import errors in `test_postcode_splitter_orchestrator.py` +
  `test_user_address_csv_s3_repository.py` (those modules don't collect, so `--ignore`
  them); and 4 pre-existing failures outside `tests/` (summary_pdf_mapper_chain ×3 +
  from_rdsap_schema total_floor_area).
- **Pushing from this worktree:** the VS Code git credential helpers are broken
  (missing node binaries), so use a one-shot gh override:
  `git -c credential.helper= -c credential.helper='!gh auth git-credential' push`.

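
As a rough illustration of the fakes-based test style listed above, here is a minimal sketch. The real `FakeUnitOfWork` in `tests/orchestration/fakes.py` may look different: `SketchFakeUnitOfWork`, `FakeRepo`, `SketchOrchestrator`, the `replace` method and the `commit_count` attribute name are all assumptions made for this example.

```python
class FakeRepo:
    """Hypothetical in-memory repo keyed by property_id."""

    def __init__(self) -> None:
        self.rows: dict[int, dict] = {}

    def replace(self, property_id: int, row: dict) -> None:
        self.rows[property_id] = row


class SketchFakeUnitOfWork:
    """Simplified stand-in: exposes the four repos and counts commits."""

    def __init__(self) -> None:
        self.property = FakeRepo()
        self.epc = FakeRepo()
        self.solar = FakeRepo()
        self.property_baseline = FakeRepo()
        self.commit_count = 0  # attribute name is an assumption

    def commit(self) -> None:
        self.commit_count += 1


class SketchOrchestrator:
    """Hypothetical orchestrator: writes one row per property, commits once."""

    def __init__(self, unit_of_work: SketchFakeUnitOfWork) -> None:
        self._unit_of_work = unit_of_work

    def run(self, property_ids: list[int]) -> None:
        for property_id in property_ids:
            self._unit_of_work.property_baseline.replace(
                property_id, {"property_id": property_id}
            )
        self._unit_of_work.commit()


def test_run_persists_every_property_and_commits_once() -> None:
    # Arrange
    unit_of_work = SketchFakeUnitOfWork()
    orchestrator = SketchOrchestrator(unit_of_work)

    # Act
    orchestrator.run([10, 11])

    # Assert
    assert set(unit_of_work.property_baseline.rows) == {10, 11}
    assert unit_of_work.commit_count == 1
```

Asserting on the commit count is what lets a unit test check the commit-once rule without a database.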
## Next task 1 — SAP calculator on Property Baseline (the user expects this to be simple)

Wire `Sap10Calculator` into `PropertyBaselineOrchestrator` to produce **Calculated SAP10
Performance** per property. Per CONTEXT (≈line 100), this is a quantity **distinct from**
Lodged/Effective Performance, surfaced *alongside* them during the validation phase; it
may supersede Effective Performance in a later ADR once parity is confirmed (ADR-0009/0010).

**Grill these two before coding (`/grill-with-docs`):**
1. **Where it sits.** Recommended: a *third* value-set on `PropertyBaselinePerformance`
   (`calculated: Performance` + its space/water kWh), persisted as `calculated_*` columns
   on `property_baseline_performance`, **not** an overwrite of `effective`. Pin the
   aggregate shape + table migration in one pass (the table migration is FE-owned/Drizzle;
   see `docs/migrations/property-baseline-performance-table.md`).
2. **Failure posture.** The calculator strict-raises (`UnmappedSapCode`, etc.) on certs it
   can't yet handle. Running it over a real cohort *surfaces those gaps*, which is the
   validation work `feature/per-cert-mapper-validation` exists for. Decide: let the raise
   abort the batch (ADR-0012 all-or-nothing), or collect/skip-and-report. This is the main
   judgment call: the change is "simple to wire", but it lights up the validation surface.

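
To make the recommendation in point 1 concrete, here is one possible shape, offered as a starting point for the grill and not as the decided design. `BaselineSketch`, the three `Performance` fields and `to_flat_columns` are hypothetical; the real `Performance` and `PropertyBaselinePerformance` classes will differ. The sketch only shows that a third `calculated` value-set can sit beside `lodged` and `effective` and flatten to `calculated_*` columns without touching the `effective_*` ones.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Performance:
    # Hypothetical fields; the real Performance value object may differ.
    sap_score: float
    space_heating_kwh: float
    water_heating_kwh: float


@dataclass(frozen=True)
class BaselineSketch:
    """Stand-in aggregate with three value-sets."""

    property_id: int
    lodged: Performance
    effective: Performance
    calculated: Performance | None = None  # absent until the calculator has run

    def with_calculated(self, calculated: Performance) -> BaselineSketch:
        # Returns a new aggregate; lodged and effective are left untouched.
        return replace(self, calculated=calculated)


FIELDS = ("sap_score", "space_heating_kwh", "water_heating_kwh")


def to_flat_columns(baseline: BaselineSketch) -> dict[str, float | int | None]:
    """Flatten to one row with lodged_*, effective_* and calculated_* keys."""
    row: dict[str, float | int | None] = {"property_id": baseline.property_id}
    for prefix in ("lodged", "effective", "calculated"):
        value = getattr(baseline, prefix)
        for name in FIELDS:
            row[f"{prefix}_{name}"] = None if value is None else getattr(value, name)
    return row
```

In this sketch the `calculated_*` columns are nullable, which is one way to let rows exist before the calculator has been run over them; whether that is acceptable is part of the shape decision.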
Then TDD: inject the calculator into `PropertyBaselineOrchestrator`, call it on the
Effective EPC, persist the calculated set in the same Unit of Work.

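
The two failure postures from point 2 can be compared with a small sketch. Everything here is hypothetical: `UnmappedSapCode` is redefined locally as a stand-in for the calculator's error, `calculate_batch` and `BatchReport` are invented names, and the injected `calculate` callable stands in for `Sap10Calculator`, whose real call signature is not shown in this document.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class UnmappedSapCode(Exception):
    """Local stand-in for the calculator's strict-raise error."""


@dataclass
class BatchReport:
    calculated: dict[int, float] = field(default_factory=dict)
    skipped: dict[int, str] = field(default_factory=dict)


def calculate_batch(
    epcs: dict[int, dict],
    calculate: Callable[[dict], float],
    *,
    abort_on_error: bool,
) -> BatchReport:
    """Run the injected calculator over a batch of Effective EPCs.

    abort_on_error=True  -> first failure propagates (all-or-nothing batch).
    abort_on_error=False -> failures are collected and reported per property.
    """
    report = BatchReport()
    for property_id, epc in epcs.items():
        try:
            report.calculated[property_id] = calculate(epc)
        except UnmappedSapCode as error:
            if abort_on_error:
                raise  # nothing from this batch should be persisted
            report.skipped[property_id] = str(error)
    return report
```

In either posture the caller would persist `report.calculated` inside the stage's single Unit of Work; the open question is only whether an unmapped cert fails the whole batch or ends up in `report.skipped`.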
## Next task 2 — Modelling (Recommendations / Optimiser / Plans)

`ModellingOrchestrator.run(property_ids, scenario_ids)` is a **no-op stub**;
`ScenarioRepository` and `MaterialsRepository` are **empty seam ports**. Building this out
is the third stage. ADR-0005 (multi-phase Scenarios, per-phase recompute) governs it.
Relevant CONTEXT terms: Modelling (stage), Scenario, Scenario Phase, Scenario Snapshot,
Optimised Package, Plans, Recommendations, Optimiser Service.

Before coding, grill the port shapes + the Scenario/Materials domain aggregates. Two
known open points:
- **`MaterialsRepository` naming.** A PR reviewer suggested `BuildingMaterialsRepository`;
  this was **deliberately deferred to this grill** because "building materials" may
  under-describe retrofit measures (a heat pump / ASHP is a *measure/product*, not a
  building material). Settle the term (Materials / Measures / Products / BuildingMaterials)
  here.
- **Modelling will need a Unit of Work** when it writes Plans: the stub currently takes
  no `unit_of_work`; it gains one (ADR-0012) when its body is built.

## Stubs / seams that raise or no-op (do NOT mistake for "done")

- `applications/ara_first_run/handler.py::_source_clients_from_env`: **raises**
  `NotImplementedError`. EPC-API / Google-Solar / geospatial-S3 client config + env-var
  names + pandas/s3fs deps + Terraform wiring are a separate deploy piece (out of scope so
  far). The lambda is not end-to-end runnable until this is filled in.
- `ModellingOrchestrator.run`: no-op.
- `ScenarioRepository` / `MaterialsRepository`: empty ABC ports.
- `StubRebaseliner`: raises `RebaselineNotImplemented` on pre-SAP10 certs
  (`sap_version < 10`); ML Rebaselining is not implemented.
- **EPC Energy Derivation** (fuel split + bills + the Ofgem-cap Fuel Rates ETL) is
  deferred; kWh is carried on `PropertyBaselinePerformance`, the rest is not.

## Known doc drift to be aware of (flagged, intentionally not auto-fixed)

- **CONTEXT.md term vs code class.** The glossary term is **"Baseline Performance"**; the
  code class is **`PropertyBaselinePerformance`** (renamed on PR review). The glossary was
  *deliberately* left un-renamed: treat "Baseline Performance" as the spoken concept and
  `PropertyBaselinePerformance` as its class. If you want them aligned, rename the term to
  "Property Baseline Performance" across CONTEXT + ADR prose (a quick, mechanical change).

## Issues / process

Parent PRD: `gh issue view 1128 --repo Hestia-Homes/Model`. #1129–#1138 done (each with a
"Done." comment). New work → new issues (use `/to-issues` or `/triage`), `ready-for-agent`
labelled, parented to #1128.