Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	305bffd284	refactor(ara): rename FirstRunPipeline → AraFirstRunPipeline (PR #1139 review) Aligns the composition with its entry point (the `ara_first_run` lambda + `AraFirstRunTriggerBody`): clearer what the file does. - orchestration/first_run_pipeline.py → ara_first_run_pipeline.py - FirstRunPipeline → AraFirstRunPipeline; FirstRunCommand → AraFirstRunCommand - test files renamed to match Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 15:00:33 +00:00
Khalim Conn-Kowlessar	c3691d9af2	refactor(property-baseline): rename baseline → property_baseline aggregate (PR #1139 review) Wholesale rename of the Baseline aggregate to PropertyBaseline for clarity / to disambiguate from baselines that appear elsewhere in Modelling. Scoped to this aggregate only — the distinct Rebaselining term (rebaseline_reason, StubRebaseliner, RebaselineNotImplemented) is deliberately untouched. - domain/baseline → domain/property_baseline; BaselinePerformance → PropertyBaselinePerformance. - repositories/baseline → repositories/property_baseline; BaselineRepository / BaselinePostgresRepository → PropertyBaseline*. - orchestration/baseline_orchestrator.py → property_baseline_orchestrator.py; BaselineOrchestrator → PropertyBaselineOrchestrator. BaselineStage → PropertyBaselineStage. - infrastructure/postgres: baseline_performance_table.py → property_baseline_performance_table.py; table `baseline_performance` → `property_baseline_performance`; Model renamed. - UnitOfWork attribute `.baseline` → `.property_baseline`. - Docs: ADR-0004 references + migration doc (renamed to property-baseline-performance-table.md) updated. CONTEXT.md glossary term ("Baseline Performance") left as-is pending a ubiquitous-language call (raised on the PR). 123 tests pass; pyright strict clean (only the unrelated pre-existing moto import errors remain). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 14:54:59 +00:00
Khalim Conn-Kowlessar	8685f8ba3a	perf(repos): bulk get_many / get_for_properties — batch reads, not N round-trips (#1138 ) Final slice of ADR-0012: collapse the per-property read round-trips a batch made (Baseline hydrated ~8 queries x 30 properties one at a time) into a handful of per-table IN queries. - EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the windows + floor-dim fetches are now passed in, not fetched inline), so both `get` and the new `get_for_properties(property_ids)` build EpcPropertyData from pre-fetched rows. `get_for_properties` fetches each child table once (`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole per ADR-0002. - PropertyRepository.get_many(property_ids) -> Properties: one query for the property rows + one bulk EPC hydration, composed in input order. - BaselineOrchestrator / IngestionOrchestrator read the batch via get_many instead of N x get. - Ports + fakes gain the bulk methods. The #1129 round-trip fidelity test stays green (the compose extraction is behaviour-preserving). New tests: bulk hydration correctness + round-trips are constant w.r.t. batch size (one-per-table, proven by query count). 123 pass; pyright strict clean; AAA. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 10:33:24 +00:00
Khalim Conn-Kowlessar	48a488d1e9	refactor(orchestration): wire stages onto the UnitOfWork; per-stage commit (#1138 ) Replaces the handler's whole-pipeline Session (one transaction across all three stages, connection pinned during Ingestion's external IO) with a Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in one unit and commits once; any property raising aborts the batch and the subtask fails noisily. - BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch, commit once. Raise on a pre-SAP10 property leaves the unit uncommitted. - IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo, solar_fetcher): fetch/write split — phase 1 fetches the whole batch (EPC / coords / solar) with NO unit open; phase 2 writes in one unit and commits. The connection is never held during external IO. Geospatial S3 repo stays injected (reference data, not transactional). - Handler: module-scoped engine (pool reused across warm invocations) + a UoW factory; whole-pipeline `with Session` gone. `build_first_run_pipeline` composes on the factory. Source clients still behind the raising seam. - ADR-0012 records the decision (per-stage boundary, all-or-nothing batch, idempotent re-run, fetch/write split, module-scoped engine). Modelling stub left untouched (no-op, no DB) per the ADR. Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch + exactly-once commit + no-commit-on-raise). New real-DB E2E integration test: real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row after two runs). 121 pass in tests/; pyright strict clean; AAA. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 09:54:47 +00:00
Khalim Conn-Kowlessar	b77fe26892	feat(first-run): FirstRunPipeline E2E — Ingestion → Baseline → Modelling (#1136 ) Completes the First Run spine. Replaces the #1130 stub FirstRunPipeline with the real three-stage composition and wires it into the handler. - `FirstRunPipeline.run(command)` sequences Ingestion → Baseline → Modelling, threading only `property_ids` between stages (and `scenario_ids` into Modelling, off the command — never a prior stage's output). Stages are injected behind thin `IngestionStage` / `BaselineStage` / `ModellingStage` Protocols (the EpcFetcher/SolarFetcher idiom), so the handler owns wiring and tests substitute fakes (ADR-0011). - `ModellingOrchestrator` stub + `ScenarioRepository` / `MaterialsRepository` seam ports — `run(property_ids, scenario_ids)` reads through repos, does no scoring yet. Method shapes deferred to the Modelling per-service grills (Scenario / Scenario Phase / Snapshot / Optimised Package / Plans are rich — not pre-empted here). - Handler delegates to the real pipeline via `build_first_run_pipeline` (Postgres-backed repos off the session). The Ingestion source clients (EPC API / Google Solar / geospatial S3) are isolated behind one `_source_clients_from_env` seam that raises until the deploy/Terraform config settles — out of scope for this slice. Subtask complete/failed + CloudWatch URL still come from `@subtask_handler`. Integration test (the criterion's centrepiece): wires REAL Ingestion + REAL Baseline + stub Modelling through a shared fake EPC repo, with a repo-backed PropertyRepo composing the Property from that slice. Proves Baseline reads the very EPC Ingestion persisted — the through-repos hand-off, no in-memory coupling. Plus a composition test pinning stage order + only-property_ids threading. TDD, one test → one impl. pyright strict clean; AAA layout. 116 pass in the tests/ tree, no regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 22:32:58 +00:00
Khalim Conn-Kowlessar	76717dfc3a	feat(baseline): BaselineOrchestrator + BaselinePerformance aggregate (#1135 ) Stage 2 of First Run. Establishes each Property's Baseline Performance from persisted source data and writes it back — reads only from repos, never a Fetcher or HTTP (ADR-0003), so it is byte-identical whether Ingestion ran milliseconds ago or last week. Domain (`domain/baseline/`): - `Performance` VO — the four rated quantities: SAP / EPC Band / CO2 / Primary Energy Intensity. `lodged_performance(epc)` reads them off the EPC's recorded fields (PEUI = `energy_consumption_current`). - `BaselinePerformance` (ADR-0004) — the paired `lodged` + `effective` Performance + `rebaseline_reason`, plus the no-derivation part of the energy block (`space_heating_kwh` / `water_heating_kwh`, off the RHI, deterministic per ADR-0006). Both halves always populated. - `Rebaseliner` port + `StubRebaseliner`: the re-score-on-override seam (ADR-0011). SAP10 certs pass through (effective == lodged, reason "none"); a pre-SAP10 cert raises `RebaselineNotImplemented` rather than fabricating a plausible-but-wrong "none" — ML rebaselining is not wired yet. Mirrors the repo's strict-raise culture. Persistence: new `BaselineRepository` port + `BaselinePostgresRepository` + flat-column `baseline_performance` SQLModel (one row per Property). Per ADR-0004's amendment this is a standalone table, NOT columns on the retiring `property_details_epc`. Production migration is FE-owned (Drizzle) — docs/migrations/baseline-performance-table.md. Docs (grill-with-docs): corrected CONTEXT.md Lodged/Effective Performance to Primary Energy Intensity (the term collided with its own _Avoid_ entry under "heat demand") + fixed stale RHI field names; amended ADR-0004 Consequences for the standalone-table decision. Fuel split + bills (rest of EPC Energy Derivation) deferred to a follow-up — they need a Fuel Rates source (Ofgem-cap ETL) that does not exist yet. TDD, one test -> one impl: 7 tests (lodged read, rebaseliner pass-through + raise, orchestrator establish-and-persist + pre-SAP10 raise, Postgres round-trip + absent). pyright strict clean; AAA layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 21:21:34 +00:00
Khalim Conn-Kowlessar	75fbba60fc	feat(ara): AraFirstRunTriggerBody + ara_first_run lambda skeleton (#1130 ) Stage-2 entry point for the First Run use case. Adds the `ara_first_run` Lambda package mirroring the `postcode_splitter` template, its typed trigger contract, and a stub `FirstRunPipeline`. - `AraFirstRunTriggerBody`: thin command of five fields — `task_id`, `sub_task_id` (UUID, lifecycle), `portfolio_id`, `property_ids`, `scenario_ids` (int business IDs). No `model_config` override, so Pydantic's default `extra="ignore"` lets the FastAPI backend add fields without breaking deployed lambdas. UPRNs / Scenario defs are deliberately off the event — read from source-of-truth tables. - Thin `handler.py`: validate-and-delegate only, via a named `dispatch_first_run` seam (testable without the Lambda runtime). Subtask status (in-progress/complete/failed) + CloudWatch log URL come for free from the existing `@subtask_handler()` decorator. - `FirstRunPipeline` (orchestration/) stub: `run(command)` receives the validated command. Declares a structural `FirstRunCommand` Protocol (the three business fields) that `AraFirstRunTriggerBody` satisfies, so orchestration needs no application-layer import — rhymes with the `EpcFetcher`/`SolarFetcher` Protocols on IngestionOrchestrator (ADR-0011). Full Ingestion→Baseline→Modelling composition lands in #1136. - Dockerfile / requirements.txt / local_handler/ mirror postcode_splitter. TDD: 7 new tests (trigger-body validation incl. forward-compat + id-types, pipeline seam, handler delegation). pyright strict clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 20:38:15 +00:00
Khalim Conn-Kowlessar	1696cccba6	feat(ingestion): IngestionOrchestrator end-to-end (#1134 ) Stage 1 of the pipeline: per property, read its UPRN from the property row, fetch its EPC, resolve coordinates from the Geospatial reference repo, thread those into the Solar fetcher, and persist EPC + solar via repos. Fetchers never call each other — the orchestrator threads the coordinate (ADR-0011). Coordinates are reference data (deterministic from UPRN), resolved transiently to drive the solar fetch rather than persisted per-property. Depends on thin EpcFetcher/SolarFetcher Protocols (EpcClientService and GoogleSolarApiClient satisfy them structurally). Unit-tested against fakes — no DB, gov API, or network: persists EPC, threads coords into solar, skips UPRN-less properties and skips solar when coordinates are absent. pyright clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 19:58:21 +00:00
Jun-te Kim	dc159e0b45	tests framework completed	2026-05-20 14:00:19 +00:00
Jun-te Kim	d0cf3d14ad	get rid of comments	2026-05-20 13:21:11 +00:00
Jun-te Kim	914a8ed51e	postcode splliter working e2e	2026-05-20 11:07:40 +00:00
Jun-te Kim	0a04448217	applications/postcode_splitter: PostcodeSplitterOrchestrator + Lambda entrypoint slice Wires slice 1-5 primitives into a deployable splitter: - orchestration/postcode_splitter_orchestrator.py: PostcodeSplitterOrchestrator loads addresses via UserAddressRepository, groups by postcode via iter_postcode_grouped_batches, persists each batch under ara_postcode_splitter_batches/{task_id}/{subtask_id}/, creates a WAITING child SubTask, and publishes an address2UPRN SQS message per batch. - applications/postcode_splitter/: Lambda entrypoint. handler.py is decorated with @subtask_handler() so the parent SubTask lifecycle is decorator-owned; PostcodeSplitterTriggerBody validates the body. Dockerfile is the python:3.11 Lambda base with the DDD-shaped source layers and no pandas. - tests/orchestration/test_postcode_splitter_orchestrator.py: integration test using moto S3 + moto SQS + in-memory SQLite that exercises the full wiring against a fixture CSV spanning three postcode groups (one oversize) and asserts child count, persisted inputs, queue bodies, and dispatch order. backend/postcode_splitter/ and .github/workflows/deploy_terraform.yml are intentionally unchanged: the dockerfile_path flip is deferred until the companion backend/address2UPRN/ migration is also ready.	2026-05-19 17:46:12 +00:00
Jun-te Kim	d7f14033ba	orchestration: add TaskOrchestrator.create_child_subtask primitive Adds a primitive for creating a new WAITING SubTask under an existing parent Task, routing all SubTask creation through the orchestrator (replacing the legacy SubTaskInterface path used by the splitter). Skips _cascade because a new WAITING child against an IN_PROGRESS parent is a no-op under Task.recalculate_from_subtasks.	2026-05-19 17:19:41 +00:00
Jun-te Kim	54a674b5c8	added postcode splitter rewrite to ddd	2026-05-19 16:35:09 +00:00

14 commits