Commit graph

5822 commits

Author SHA1 Message Date
Daniel Roth
5ed3bf73e8 evidence categories plus typehinting 2026-06-04 15:40:25 +00:00
Daniel Roth
020a24d345 run() returns core and other file paths 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
d8ec12065f run() returns core and other file paths 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
0aa6a4fc30 Other files persisted to DB with file_type OTHER 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
1650762ae2 Other files persisted to DB with file_type OTHER 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
c86dbeb4a1 Upload other files to S3 when get_other_files is True 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
098f60ecfd Upload other files to S3 when get_other_files is True 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
9c38f45c98 tidying for readability 2026-06-04 15:40:25 +00:00
Daniel Roth
c9a2ce4921 Service deletes other-file temp paths after run 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
f8d2bb8049 Service deletes other-file temp paths after run 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
49e7b7fea6 Wire service to get_evidence_files_by_job_id; retire get_core_evidence_files_by_job_id 🟪 2026-06-04 15:40:25 +00:00
Daniel Roth
662f6de0ab get_evidence_files_by_job_id downloads other files when include_other=True 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
c4ffaaa069 get_evidence_files_by_job_id downloads other files when include_other=True 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
f95b6bdd7d get_evidence_files_by_job_id returns DownloadedFiles with empty other when include_other=False 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
665dc69ad5 get_evidence_files_by_job_id returns DownloadedFiles with empty other when include_other=False 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
e7c679e0db Group evidence into core and other via _group_into_core_and_other_files 🟪 2026-06-04 15:40:25 +00:00
Daniel Roth
99229844b5 _select_other_files returns non-core evidence files 🟩 2026-06-04 15:40:25 +00:00
Daniel Roth
db796747d9 _select_other_files returns non-core evidence files 🟥 2026-06-04 15:40:25 +00:00
Daniel Roth
6cb6c8c756 allow for missing deal stage column when triggering sqs from file 2026-06-04 15:40:25 +00:00
Daniel Roth
790e430aff rename local handler trigger script 2026-06-04 15:40:25 +00:00
Jun-te Kim
a4670f8bc0 deploy lambda 2026-06-04 15:32:51 +00:00
Jun-te Kim
261fae2e79 reformatted to be DDD structure 2026-06-04 14:50:04 +00:00
Jun-te Kim
dfd05ba28b tests files 2026-06-04 11:47:42 +00:00
Jun-te Kim
c614ff6388 save local changes 2026-06-03 12:41:56 +00:00
Jun-te Kim
bf166e7f46 test suite to pick it up 2026-06-02 15:33:46 +00:00
Jun-te Kim
6c8fe86cf9 ddd tests 2026-06-02 15:31:42 +00:00
Jun-te Kim
25ba1427b1 separate ddd tests 2026-06-02 15:30:58 +00:00
Jun-te Kim
4e02eb7c77 more tests to ensure we don't deploy something that is broken 2026-06-02 15:03:20 +00:00
Jun-te Kim
a3c80b6691
Merge pull request #1147 from Hestia-Homes/feature/landlord_data
address2uprn was missing a dependency
2026-06-02 11:50:42 +01:00
Jun-te Kim
144233a5f3 backend was missing a dependency 2026-06-02 10:46:29 +00:00
Jun-te Kim
feb3bc08f0
Merge pull request #1144 from Hestia-Homes/feature/landlord_data
if you change the description it destroys and makes a new one instead of…
2026-06-02 10:39:17 +01:00
Jun-te Kim
f3ad339cf5 if you change the description it destroys and makes a new one instead of editing 2026-06-02 09:36:31 +00:00
Jun-te Kim
8accb51383
Merge pull request #1142 from Hestia-Homes/feature/landlord_data
iam permissions for my lambda to import location
2026-06-02 09:43:32 +01:00
Jun-te Kim
04dc1b20fe iam permissions 2026-06-01 21:08:19 +00:00
Jun-te Kim
0a123c6723
Merge pull request #1122 from Hestia-Homes/feature/landlord_data
Feature/landlord data
2026-06-01 20:25:20 +01:00
Jun-te Kim
616744a606 Merge remote-tracking branch 'origin/main' into feature/landlord_data
# Conflicts:
#	datatypes/epc/schema/rdsap_schema_21_0_0.py
#	datatypes/epc/schema/rdsap_schema_21_0_1.py
2026-06-01 17:02:20 +00:00
Jun-te Kim
bf3b689f15 Remove EPC and asset_list changes unrelated to SAL handler
This branch's objective is the SAL ingestion handler
(applications/SAL/handler.py) and its dependency tree. Drop work
that crept in but is unreferenced by it:

- EPC feature: domain/epc, infrastructure/epc (gov_uk + historical
  clients), tests/infrastructure/epc
- datatypes/epc edits (instantaneous_wwhrs Optional) reverted to main
- asset_list/app.py local data-file/column tweak reverted to main

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 16:39:09 +00:00
Jun-te Kim
bdf703ea00 updated rdsap option; separated s3 location in infrastructure; added open ai api 2026-06-01 16:33:14 +00:00
Jun-te Kim
754e6609fd standardise Address 2026-06-01 16:32:48 +00:00
Khalim Conn-Kowlessar
1ea71a3acb refactor(ara): rename FirstRunPipeline → AraFirstRunPipeline (PR #1139 review)
Aligns the composition with its entry point (the `ara_first_run` lambda +
`AraFirstRunTriggerBody`): clearer what the file does.

- orchestration/first_run_pipeline.py → ara_first_run_pipeline.py
- FirstRunPipeline → AraFirstRunPipeline; FirstRunCommand → AraFirstRunCommand
- test files renamed to match

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
Khalim Conn-Kowlessar
d89983d44f refactor(property): PropertyRow.id non-Optional (PR #1139 review)
`property` is an FE-owned table the backend only ever reads — every row read
carries an id — so the autoincrement-PK `Optional[int]` idiom doesn't apply
here. Make it `int` and drop the now-redundant None guard in get_many.

(Contrast: solar_table keeps Optional id — the backend DOES insert those, so
id is genuinely None pre-flush.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
Khalim Conn-Kowlessar
50914e8aae refactor(property-baseline): units on co2 / PEUI columns (PR #1139 review)
Make the stored units explicit on the property_baseline_performance columns:
- `*_co2_emissions` → `*_co2_emissions_t_per_yr` (tonnes CO₂/yr, whole dwelling)
- `*_primary_energy_intensity` → `*_primary_energy_intensity_kwh_per_m2_yr`

Column names only; the domain `Performance` VO stays unit-suffix-free (units are
a storage concern, mapped in from_domain/to_domain). Migration doc updated.
Round-trip stays green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
Khalim Conn-Kowlessar
457d959b1f refactor(property-baseline): rename baseline → property_baseline aggregate (PR #1139 review)
Wholesale rename of the Baseline aggregate to PropertyBaseline for clarity /
to disambiguate from baselines that appear elsewhere in Modelling. Scoped to
this aggregate only — the distinct Rebaselining term (rebaseline_reason,
StubRebaseliner, RebaselineNotImplemented) is deliberately untouched.

- domain/baseline → domain/property_baseline; BaselinePerformance →
  PropertyBaselinePerformance.
- repositories/baseline → repositories/property_baseline; BaselineRepository
  / BaselinePostgresRepository → PropertyBaseline*.
- orchestration/baseline_orchestrator.py → property_baseline_orchestrator.py;
  BaselineOrchestrator → PropertyBaselineOrchestrator. BaselineStage →
  PropertyBaselineStage.
- infrastructure/postgres: baseline_performance_table.py →
  property_baseline_performance_table.py; table `baseline_performance` →
  `property_baseline_performance`; Model renamed.
- UnitOfWork attribute `.baseline` → `.property_baseline`.
- Docs: ADR-0004 references + migration doc (renamed to
  property-baseline-performance-table.md) updated.

CONTEXT.md glossary term ("Baseline Performance") left as-is pending a
ubiquitous-language call (raised on the PR). 123 tests pass; pyright strict
clean (only the unrelated pre-existing moto import errors remain).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
Khalim Conn-Kowlessar
d2d008f5c5 perf(repos): bulk get_many / get_for_properties — batch reads, not N round-trips (#1138)
Final slice of ADR-0012: collapse the per-property read round-trips a batch
made (Baseline hydrated ~8 queries x 30 properties one at a time) into a
handful of per-table IN queries.

- EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the
  windows + floor-dim fetches are now passed in, not fetched inline), so both
  `get` and the new `get_for_properties(property_ids)` build EpcPropertyData
  from pre-fetched rows. `get_for_properties` fetches each child table once
  (`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole
  per ADR-0002.
- PropertyRepository.get_many(property_ids) -> Properties: one query for the
  property rows + one bulk EPC hydration, composed in input order.
- BaselineOrchestrator / IngestionOrchestrator read the batch via get_many
  instead of N x get.
- Ports + fakes gain the bulk methods.

The #1129 round-trip fidelity test stays green (the compose extraction is
behaviour-preserving). New tests: bulk hydration correctness + round-trips are
constant w.r.t. batch size (one-per-table, proven by query count). 123 pass;
pyright strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
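The batched read described in d2d008f5c5 (one `IN` query per child table, grouped in memory, composed in input order) can be sketched roughly as below. This is an illustration only, not the repository's code: it uses stdlib `sqlite3` in place of Postgres, and the `epc_window` table, its `description` column, and the function name `get_windows_for_properties` are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Sequence


def get_windows_for_properties(
    conn: sqlite3.Connection, property_ids: Sequence[int]
) -> Dict[int, List[str]]:
    """Fetch child rows for a whole batch in one query (hypothetical schema)."""
    if not property_ids:
        return {}
    # One placeholder per id, so the whole batch is a single round-trip.
    placeholders = ", ".join("?" for _ in property_ids)
    rows = conn.execute(
        "SELECT epc_property_id, description FROM epc_window "
        f"WHERE epc_property_id IN ({placeholders}) ORDER BY id",
        list(property_ids),
    ).fetchall()
    # Group in memory by parent id.
    grouped = defaultdict(list)
    for property_id, description in rows:
        grouped[property_id].append(description)
    # Compose in input order; properties with no child rows get an empty list.
    return {pid: grouped.get(pid, []) for pid in property_ids}
```

The number of queries stays at one per table regardless of batch size, which is the property the commit's query-count test is described as proving.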
Khalim Conn-Kowlessar
7275850c9e refactor(orchestration): wire stages onto the UnitOfWork; per-stage commit (#1138)
Replaces the handler's whole-pipeline Session (one transaction across all
three stages, connection pinned during Ingestion's external IO) with a
Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in
one unit and commits once; any property raising aborts the batch and the
subtask fails noisily.

- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
  commit once. Raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
  solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC /
  coords / solar) with NO unit open; phase 2 writes in one unit and commits.
  The connection is never held during external IO. Geospatial S3 repo stays
  injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
  factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
  composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
  idempotent re-run, fetch/write split, module-scoped engine). Modelling stub
  left untouched (no-op, no DB) per the ADR.

Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
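The fetch/write split from 7275850c9e (phase 1 fetches the whole batch with no unit open, phase 2 writes in one unit and commits once) might look something like the following. Every name here (`RecordingUnitOfWork`, `save_epc`, `ingest_batch`) is a hypothetical stand-in chosen for the sketch, not taken from the codebase.

```python
from typing import Callable, Dict, Sequence


class RecordingUnitOfWork:
    """In-memory fake (hypothetical): tracks open state, commits and saved rows."""

    def __init__(self) -> None:
        self.is_open = False
        self.opened = 0
        self.commits = 0
        self.saved: Dict[int, str] = {}
        self._pending: Dict[int, str] = {}

    def __enter__(self) -> "RecordingUnitOfWork":
        self.is_open = True
        self.opened += 1
        self._pending = {}
        return self

    def save_epc(self, property_id: int, epc: str) -> None:
        self._pending[property_id] = epc

    def commit(self) -> None:
        self.saved.update(self._pending)
        self._pending = {}
        self.commits += 1

    def __exit__(self, exc_type, exc, tb) -> None:
        # Uncommitted work is discarded; exceptions propagate.
        self._pending = {}
        self.is_open = False


def ingest_batch(
    property_ids: Sequence[int],
    fetch_epc: Callable[[int], str],
    unit_of_work: RecordingUnitOfWork,
) -> None:
    # Phase 1: all external IO happens before any unit is opened.
    fetched = {pid: fetch_epc(pid) for pid in property_ids}
    # Phase 2: write the whole batch in one unit and commit exactly once.
    with unit_of_work as uow:
        for pid, epc in fetched.items():
            uow.save_epc(pid, epc)
        uow.commit()
```

If any fetch raises, the unit is never opened, so nothing is written and no connection is held while waiting on external services.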
Khalim Conn-Kowlessar
fa5224b6ed feat(repos): idempotent EPC + Baseline writes (replace by property_id) (#1138)
Re-runs of a First Run batch re-save a property's data; that must replace,
not duplicate (ADR-0012 idempotent batch writes).

- `EpcPostgresRepository.save` deletes the property's existing EPC graph
  (parent + all child tables, floor-dims via their building parts) before
  inserting, when a `property_id` is given. Anonymous saves still insert.
- `BaselinePostgresRepository.save` deletes the existing row for the
  `property_id` before inserting — no more unique-constraint violation on
  re-save; also what the re-score-on-override path needs.
- Solar already upserts, so it's unchanged.

The #1129 round-trip fidelity test stays green (delete-first is a no-op on
a first save). 2 new tests (re-save replaces, not duplicates). pyright
strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
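A minimal sketch of the delete-then-insert idempotency described in fa5224b6ed, assuming a single flat table with a unique `property_id`. It uses stdlib `sqlite3` rather than Postgres, and the `sap_score` column and the `save_baseline` helper are hypothetical.

```python
import sqlite3


def save_baseline(conn: sqlite3.Connection, property_id: int, sap_score: int) -> None:
    """Replace by property_id: delete any existing row, then insert."""
    # On a first save the DELETE matches nothing, so it is a no-op.
    conn.execute(
        "DELETE FROM baseline_performance WHERE property_id = ?",
        (property_id,),
    )
    conn.execute(
        "INSERT INTO baseline_performance (property_id, sap_score) VALUES (?, ?)",
        (property_id, sap_score),
    )
```

Calling `save_baseline` twice for the same `property_id` leaves one row holding the latest values, instead of raising a unique-constraint violation on the second call.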
Khalim Conn-Kowlessar
5524385984 feat(uow): UnitOfWork port + PostgresUnitOfWork adapter (#1138)
First slice of the per-stage batch-transaction refactor (ADR-0012). A
UnitOfWork is the single transaction a stage runs its batch in: a context
manager exposing the DB repos bound to one session, committing once on
`commit()` and rolling back on exception or exit-without-commit
(all-or-nothing per batch, fail noisily).

- `UnitOfWork` (port): `property` / `epc` / `solar` / `baseline` repos +
  `commit()` / `rollback()`; `__exit__` rolls back uncommitted work.
- `PostgresUnitOfWork(session_factory)`: opens a Session from an injected
  factory (a module-scoped engine + sessionmaker in prod, so the pool is
  reused across warm invocations), binds the Postgres repos to it, closes
  on exit.

Not yet wired into any orchestrator — that lands in the Baseline /
Ingestion refactor slices. 3 tests against ephemeral PG (commit durable
across units; exception rolls back; no-commit persists nothing). pyright
strict clean; AAA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
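The commit-once, roll-back-otherwise contract in 5524385984 can be illustrated with a small context manager. This is a sketch under assumptions: `SqliteUnitOfWork` is a hypothetical stand-in built on stdlib `sqlite3`, it exposes the raw connection rather than bound repositories, and it leaves closing the connection to the caller.

```python
import sqlite3
from typing import Callable


class SqliteUnitOfWork:
    """Hypothetical stand-in for a Postgres-backed unit of work."""

    def __init__(self, connection_factory: Callable[[], sqlite3.Connection]) -> None:
        self._connection_factory = connection_factory

    def __enter__(self) -> "SqliteUnitOfWork":
        # One connection, and so one transaction, for the whole batch.
        self.connection = self._connection_factory()
        return self

    def commit(self) -> None:
        self.connection.commit()

    def rollback(self) -> None:
        self.connection.rollback()

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always roll back on exit. After commit() this is a no-op; after an
        # exception or an exit-without-commit it discards the batch.
        self.rollback()
        # Returning None lets any exception propagate (fail noisily).
```

Rolling back unconditionally in `__exit__` keeps the rule simple: only an explicit `commit()` makes work durable.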
Khalim Conn-Kowlessar
61846665b1 feat(first-run): FirstRunPipeline E2E — Ingestion → Baseline → Modelling (#1136)
Completes the First Run spine. Replaces the #1130 stub FirstRunPipeline
with the real three-stage composition and wires it into the handler.

- `FirstRunPipeline.run(command)` sequences Ingestion → Baseline →
  Modelling, threading **only** `property_ids` between stages (and
  `scenario_ids` into Modelling, off the command — never a prior stage's
  output). Stages are injected behind thin `IngestionStage` /
  `BaselineStage` / `ModellingStage` Protocols (the EpcFetcher/SolarFetcher
  idiom), so the handler owns wiring and tests substitute fakes (ADR-0011).
- `ModellingOrchestrator` stub + `ScenarioRepository` / `MaterialsRepository`
  seam ports — `run(property_ids, scenario_ids)` reads through repos, does
  no scoring yet. Method shapes deferred to the Modelling per-service grills
  (Scenario / Scenario Phase / Snapshot / Optimised Package / Plans are rich
  — not pre-empted here).
- Handler delegates to the real pipeline via `build_first_run_pipeline`
  (Postgres-backed repos off the session). The Ingestion source clients
  (EPC API / Google Solar / geospatial S3) are isolated behind one
  `_source_clients_from_env` seam that raises until the deploy/Terraform
  config settles — out of scope for this slice. Subtask complete/failed +
  CloudWatch URL still come from `@subtask_handler`.

Integration test (the criterion's centrepiece): wires REAL Ingestion +
REAL Baseline + stub Modelling through a shared fake EPC repo, with a
repo-backed PropertyRepo composing the Property from that slice. Proves
Baseline reads the very EPC Ingestion persisted — the through-repos
hand-off, no in-memory coupling. Plus a composition test pinning stage
order + only-property_ids threading.

TDD, one test → one impl. pyright strict clean; AAA layout. 116 pass in
the tests/ tree, no regressions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
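As a rough illustration of the composition in 61846665b1, the sketch below sequences three injected stages and threads only `property_ids` (plus `scenario_ids` into Modelling) between them. The Protocol method signatures, the `Command` dataclass and the `PipelineSketch` class are assumptions made for the example; the commit does not show the real shapes.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class IngestionStage(Protocol):
    def run(self, property_ids: Sequence[int]) -> None: ...


class BaselineStage(Protocol):
    def run(self, property_ids: Sequence[int]) -> None: ...


class ModellingStage(Protocol):
    def run(self, property_ids: Sequence[int], scenario_ids: Sequence[int]) -> None: ...


@dataclass(frozen=True)
class Command:
    """Hypothetical command carrying only business ids."""

    property_ids: Sequence[int]
    scenario_ids: Sequence[int]


class PipelineSketch:
    def __init__(
        self,
        ingestion: IngestionStage,
        baseline: BaselineStage,
        modelling: ModellingStage,
    ) -> None:
        self._ingestion = ingestion
        self._baseline = baseline
        self._modelling = modelling

    def run(self, command: Command) -> None:
        # Each stage receives ids from the command, never a prior stage's
        # output; data is handed between stages through the repositories.
        self._ingestion.run(command.property_ids)
        self._baseline.run(command.property_ids)
        self._modelling.run(command.property_ids, command.scenario_ids)
```

Because the stages are structural Protocols, a test can pass in any object with a matching `run` method without subclassing.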
Khalim Conn-Kowlessar
9f22b0aae8 feat(baseline): BaselineOrchestrator + BaselinePerformance aggregate (#1135)
Stage 2 of First Run. Establishes each Property's Baseline Performance
from persisted source data and writes it back — reads only from repos,
never a Fetcher or HTTP (ADR-0003), so it is byte-identical whether
Ingestion ran milliseconds ago or last week.

Domain (`domain/baseline/`):
- `Performance` VO — the four rated quantities: SAP / EPC Band / CO2 /
  Primary Energy Intensity. `lodged_performance(epc)` reads them off the
  EPC's recorded fields (PEUI = `energy_consumption_current`).
- `BaselinePerformance` (ADR-0004) — the paired `lodged` + `effective`
  Performance + `rebaseline_reason`, plus the no-derivation part of the
  energy block (`space_heating_kwh` / `water_heating_kwh`, off the RHI,
  deterministic per ADR-0006). Both halves always populated.
- `Rebaseliner` port + `StubRebaseliner`: the re-score-on-override seam
  (ADR-0011). SAP10 certs pass through (effective == lodged, reason
  "none"); a pre-SAP10 cert raises `RebaselineNotImplemented` rather
  than fabricating a plausible-but-wrong "none" — ML rebaselining is not
  wired yet. Mirrors the repo's strict-raise culture.

Persistence: new `BaselineRepository` port + `BaselinePostgresRepository`
+ flat-column `baseline_performance` SQLModel (one row per Property). Per
ADR-0004's amendment this is a standalone table, NOT columns on the
retiring `property_details_epc`. Production migration is FE-owned
(Drizzle) — docs/migrations/baseline-performance-table.md.

Docs (grill-with-docs): corrected CONTEXT.md Lodged/Effective Performance
to Primary Energy Intensity (the term collided with its own _Avoid_ entry
under "heat demand") + fixed stale RHI field names; amended ADR-0004
Consequences for the standalone-table decision.

Fuel split + bills (rest of EPC Energy Derivation) deferred to a
follow-up — they need a Fuel Rates source (Ofgem-cap ETL) that does not
exist yet.

TDD, one test -> one impl: 7 tests (lodged read, rebaseliner pass-through
+ raise, orchestrator establish-and-persist + pre-SAP10 raise, Postgres
round-trip + absent). pyright strict clean; AAA layout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
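The strict-raise behaviour of the stub rebaseliner in 9f22b0aae8 (SAP10 certificates pass through with reason "none", pre-SAP10 certificates raise) could be sketched as follows. The field names on `Performance`, the `is_sap10` flag and the `rebaseline` method signature are assumptions for illustration; only the class names and the pass-through/raise behaviour come from the commit message.

```python
from dataclasses import dataclass


class RebaselineNotImplemented(NotImplementedError):
    """Raised instead of fabricating a plausible-but-wrong result."""


@dataclass(frozen=True)
class Performance:
    # Hypothetical field names for the four rated quantities.
    sap_score: int
    epc_band: str
    co2_emissions: float
    primary_energy_intensity: float


@dataclass(frozen=True)
class BaselinePerformance:
    lodged: Performance
    effective: Performance
    rebaseline_reason: str


class StubRebaseliner:
    def rebaseline(self, lodged: Performance, is_sap10: bool) -> BaselinePerformance:
        if not is_sap10:
            # No rebaselining model is wired yet, so fail loudly.
            raise RebaselineNotImplemented("pre-SAP10 certificate needs rebaselining")
        # SAP10 certificates pass through: effective equals lodged.
        return BaselinePerformance(
            lodged=lodged, effective=lodged, rebaseline_reason="none"
        )
```

Both halves of the result are always populated, so callers never need to handle a missing `effective` value.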
Khalim Conn-Kowlessar
a910ce9855 feat(ara): AraFirstRunTriggerBody + ara_first_run lambda skeleton (#1130)
Stage-2 entry point for the First Run use case. Adds the
`ara_first_run` Lambda package mirroring the `postcode_splitter`
template, its typed trigger contract, and a stub `FirstRunPipeline`.

- `AraFirstRunTriggerBody`: thin command of five fields — `task_id`,
  `sub_task_id` (UUID, lifecycle), `portfolio_id`, `property_ids`,
  `scenario_ids` (int business IDs). No `model_config` override, so
  Pydantic's default `extra="ignore"` lets the FastAPI backend add
  fields without breaking deployed lambdas. UPRNs / Scenario defs are
  deliberately off the event — read from source-of-truth tables.
- Thin `handler.py`: validate-and-delegate only, via a named
  `dispatch_first_run` seam (testable without the Lambda runtime).
  Subtask status (in-progress/complete/failed) + CloudWatch log URL
  come for free from the existing `@subtask_handler()` decorator.
- `FirstRunPipeline` (orchestration/) stub: `run(command)` receives the
  validated command. Declares a structural `FirstRunCommand` Protocol
  (the three business fields) that `AraFirstRunTriggerBody` satisfies,
  so orchestration needs no application-layer import — rhymes with the
  `EpcFetcher`/`SolarFetcher` Protocols on IngestionOrchestrator
  (ADR-0011). Full Ingestion→Baseline→Modelling composition lands in
  #1136.
- Dockerfile / requirements.txt / local_handler/ mirror postcode_splitter.

TDD: 7 new tests (trigger-body validation incl. forward-compat +
id-types, pipeline seam, handler delegation). pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00