This branch's objective is the SAL ingestion handler
(applications/SAL/handler.py) and its dependency tree. Drop work
that crept in but is unreferenced by it:
- EPC feature: domain/epc, infrastructure/epc (gov_uk + historical
clients), tests/infrastructure/epc
- datatypes/epc edits (instantaneous_wwhrs Optional) reverted to main
- asset_list/app.py local data-file/column tweak reverted to main
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns the composition with its entry point (the `ara_first_run` lambda +
`AraFirstRunTriggerBody`): clearer what the file does.
- orchestration/first_run_pipeline.py → ara_first_run_pipeline.py
- FirstRunPipeline → AraFirstRunPipeline; FirstRunCommand → AraFirstRunCommand
- test files renamed to match
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`property` is an FE-owned table the backend only ever reads — every row read
carries an id — so the autoincrement-PK `Optional[int]` idiom doesn't apply
here. Make it `int` and drop the now-redundant None guard in get_many.
(Contrast: solar_table keeps Optional id — the backend DOES insert those, so
id is genuinely None pre-flush.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the stored units explicit on the property_baseline_performance columns:
- `*_co2_emissions` → `*_co2_emissions_t_per_yr` (tonnes CO₂/yr, whole dwelling)
- `*_primary_energy_intensity` → `*_primary_energy_intensity_kwh_per_m2_yr`
Column names only; the domain `Performance` VO stays unit-suffix-free (units are
a storage concern, mapped in from_domain/to_domain). Migration doc updated.
Round-trip stays green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Final slice of ADR-0012: collapse the per-property read round-trips a batch
made (Baseline hydrated ~8 queries x 30 properties one at a time) into a
handful of per-table IN queries.
- EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the
windows + floor-dim fetches are now passed in, not fetched inline), so both
`get` and the new `get_for_properties(property_ids)` build EpcPropertyData
from pre-fetched rows. `get_for_properties` fetches each child table once
(`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole
per ADR-0002.
- PropertyRepository.get_many(property_ids) -> Properties: one query for the
property rows + one bulk EPC hydration, composed in input order.
- BaselineOrchestrator / IngestionOrchestrator read the batch via get_many
instead of N x get.
- Ports + fakes gain the bulk methods.
The #1129 round-trip fidelity test stays green (the compose extraction is
behaviour-preserving). New tests: bulk hydration correctness + round-trips are
constant w.r.t. batch size (one-per-table, proven by query count). 123 pass;
pyright strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the handler's whole-pipeline Session (one transaction across all
three stages, connection pinned during Ingestion's external IO) with a
Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in
one unit and commits once; any property raising aborts the batch and the
subtask fails noisily.
- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
commit once. Raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC /
coords / solar) with NO unit open; phase 2 writes in one unit and commits.
The connection is never held during external IO. Geospatial S3 repo stays
injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
idempotent re-run, fetch/write split, module-scoped engine). Modelling stub
left untouched (no-op, no DB) per the ADR.
Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-runs of a First Run batch re-save a property's data; that must replace,
not duplicate (ADR-0012 idempotent batch writes).
- `EpcPostgresRepository.save` deletes the property's existing EPC graph
(parent + all child tables, floor-dims via their building parts) before
inserting, when a `property_id` is given. Anonymous saves still insert.
- `BaselinePostgresRepository.save` deletes the existing row for the
`property_id` before inserting — no more unique-constraint violation on
re-save; also what the re-score-on-override path needs.
- Solar already upserts, so it's unchanged.
The #1129 round-trip fidelity test stays green (delete-first is a no-op on
a first save). 2 new tests (re-save replaces, not duplicates). pyright
strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First slice of the per-stage batch-transaction refactor (ADR-0012). A
UnitOfWork is the single transaction a stage runs its batch in: a context
manager exposing the DB repos bound to one session, committing once on
`commit()` and rolling back on exception or exit-without-commit
(all-or-nothing per batch, fail noisily).
- `UnitOfWork` (port): `property` / `epc` / `solar` / `baseline` repos +
`commit()` / `rollback()`; `__exit__` rolls back uncommitted work.
- `PostgresUnitOfWork(session_factory)`: opens a Session from an injected
factory (a module-scoped engine + sessionmaker in prod, so the pool is
reused across warm invocations), binds the Postgres repos to it, closes
on exit.
Not yet wired into any orchestrator — that lands in the Baseline /
Ingestion refactor slices. 3 tests against ephemeral PG (commit durable
across units; exception rolls back; no-commit persists nothing). pyright
strict clean; AAA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Completes the First Run spine. Replaces the #1130 stub FirstRunPipeline
with the real three-stage composition and wires it into the handler.
- `FirstRunPipeline.run(command)` sequences Ingestion → Baseline →
Modelling, threading **only** `property_ids` between stages (and
`scenario_ids` into Modelling, off the command — never a prior stage's
output). Stages are injected behind thin `IngestionStage` /
`BaselineStage` / `ModellingStage` Protocols (the EpcFetcher/SolarFetcher
idiom), so the handler owns wiring and tests substitute fakes (ADR-0011).
- `ModellingOrchestrator` stub + `ScenarioRepository` / `MaterialsRepository`
seam ports — `run(property_ids, scenario_ids)` reads through repos, does
no scoring yet. Method shapes deferred to the Modelling per-service grills
(Scenario / Scenario Phase / Snapshot / Optimised Package / Plans are rich
— not pre-empted here).
- Handler delegates to the real pipeline via `build_first_run_pipeline`
(Postgres-backed repos off the session). The Ingestion source clients
(EPC API / Google Solar / geospatial S3) are isolated behind one
`_source_clients_from_env` seam that raises until the deploy/Terraform
config settles — out of scope for this slice. Subtask complete/failed +
CloudWatch URL still come from `@subtask_handler`.
Integration test (the criterion's centrepiece): wires REAL Ingestion +
REAL Baseline + stub Modelling through a shared fake EPC repo, with a
repo-backed PropertyRepo composing the Property from that slice. Proves
Baseline reads the very EPC Ingestion persisted — the through-repos
hand-off, no in-memory coupling. Plus a composition test pinning stage
order + only-property_ids threading.
TDD, one test → one impl. pyright strict clean; AAA layout. 116 pass in
the tests/ tree, no regressions.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>