93 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
76717dfc3a feat(baseline): BaselineOrchestrator + BaselinePerformance aggregate (#1135)
Stage 2 of First Run. Establishes each Property's Baseline Performance
from persisted source data and writes it back — reads only from repos,
never a Fetcher or HTTP (ADR-0003), so it is byte-identical whether
Ingestion ran milliseconds ago or last week.

Domain (`domain/baseline/`):
- `Performance` VO — the four rated quantities: SAP / EPC Band / CO2 /
  Primary Energy Intensity. `lodged_performance(epc)` reads them off the
  EPC's recorded fields (PEUI = `energy_consumption_current`).
- `BaselinePerformance` (ADR-0004) — the paired `lodged` + `effective`
  Performance + `rebaseline_reason`, plus the no-derivation part of the
  energy block (`space_heating_kwh` / `water_heating_kwh`, off the RHI,
  deterministic per ADR-0006). Both halves always populated.
- `Rebaseliner` port + `StubRebaseliner`: the re-score-on-override seam
  (ADR-0011). SAP10 certs pass through (effective == lodged, reason
  "none"); a pre-SAP10 cert raises `RebaselineNotImplemented` rather
  than fabricating a plausible-but-wrong "none" — ML rebaselining is not
  wired yet. Mirrors the repo's strict-raise culture.
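
A minimal sketch of the pass-through-or-raise behaviour described for
`StubRebaseliner`. The field names, the `rebaseline` method name and the
`is_sap10` flag are assumptions made for illustration, and the energy-block
fields are left out; the real port may look different.

```python
from dataclasses import dataclass


class RebaselineNotImplemented(Exception):
    """Raised when a cert would need rebaselining that is not wired yet."""


@dataclass(frozen=True)
class Performance:
    # Hypothetical field names for the four rated quantities.
    sap: int
    epc_band: str
    co2: float
    primary_energy_intensity: float


@dataclass(frozen=True)
class BaselinePerformance:
    lodged: Performance
    effective: Performance
    rebaseline_reason: str


class StubRebaseliner:
    def rebaseline(self, lodged: Performance, is_sap10: bool) -> BaselinePerformance:
        # Pre-SAP10: refuse loudly instead of returning a misleading "none".
        if not is_sap10:
            raise RebaselineNotImplemented("pre-SAP10 cert: rebaselining not wired yet")
        # SAP10: effective is the lodged Performance, unchanged.
        return BaselinePerformance(lodged=lodged, effective=lodged, rebaseline_reason="none")
```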

Persistence: new `BaselineRepository` port + `BaselinePostgresRepository`
+ flat-column `baseline_performance` SQLModel (one row per Property). Per
ADR-0004's amendment this is a standalone table, NOT columns on the
retiring `property_details_epc`. Production migration is FE-owned
(Drizzle) — docs/migrations/baseline-performance-table.md.

Docs (grill-with-docs): corrected CONTEXT.md Lodged/Effective Performance
to Primary Energy Intensity (the term collided with its own _Avoid_ entry
under "heat demand") + fixed stale RHI field names; amended ADR-0004
Consequences for the standalone-table decision.

Fuel split + bills (rest of EPC Energy Derivation) deferred to a
follow-up — they need a Fuel Rates source (Ofgem-cap ETL) that does not
exist yet.

TDD, one test -> one impl: 7 tests (lodged read, rebaseliner pass-through
+ raise, orchestrator establish-and-persist + pre-SAP10 raise, Postgres
round-trip + absent). pyright strict clean; AAA layout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 21:21:34 +00:00
Khalim Conn-Kowlessar
75fbba60fc feat(ara): AraFirstRunTriggerBody + ara_first_run lambda skeleton (#1130)
Stage-2 entry point for the First Run use case. Adds the
`ara_first_run` Lambda package mirroring the `postcode_splitter`
template, its typed trigger contract, and a stub `FirstRunPipeline`.

- `AraFirstRunTriggerBody`: thin command of five fields — `task_id`,
  `sub_task_id` (UUID, lifecycle), `portfolio_id`, `property_ids`,
  `scenario_ids` (int business IDs). No `model_config` override, so
  Pydantic's default `extra="ignore"` lets the FastAPI backend add
  fields without breaking deployed lambdas. UPRNs / Scenario defs are
  deliberately off the event — read from source-of-truth tables.
- Thin `handler.py`: validate-and-delegate only, via a named
  `dispatch_first_run` seam (testable without the Lambda runtime).
  Subtask status (in-progress/complete/failed) + CloudWatch log URL
  come for free from the existing `@subtask_handler()` decorator.
- `FirstRunPipeline` (orchestration/) stub: `run(command)` receives the
  validated command. Declares a structural `FirstRunCommand` Protocol
  (the three business fields) that `AraFirstRunTriggerBody` satisfies,
  so orchestration needs no application-layer import — rhymes with the
  `EpcFetcher`/`SolarFetcher` Protocols on IngestionOrchestrator
  (ADR-0011). Full Ingestion→Baseline→Modelling composition lands in
  #1136.
- Dockerfile / requirements.txt / local_handler/ mirror postcode_splitter.
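
The structural-Protocol seam above can be sketched with the standard library
alone. `TriggerBody` below is a plain-dataclass stand-in for the Pydantic
`AraFirstRunTriggerBody`, and the summary dict returned by `run` is purely
illustrative; the three business fields are taken from the bullet list.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence
from uuid import UUID, uuid4


class FirstRunCommand(Protocol):
    # Only the three business fields; lifecycle ids are not part of the contract.
    @property
    def portfolio_id(self) -> int: ...
    @property
    def property_ids(self) -> Sequence[int]: ...
    @property
    def scenario_ids(self) -> Sequence[int]: ...


@dataclass(frozen=True)
class TriggerBody:
    # Stand-in for AraFirstRunTriggerBody; it never names FirstRunCommand.
    task_id: UUID
    sub_task_id: UUID
    portfolio_id: int
    property_ids: list[int]
    scenario_ids: list[int]


class FirstRunPipeline:
    def run(self, command: FirstRunCommand) -> dict[str, int]:
        # Stub: just prove the validated command arrived intact.
        return {
            "portfolio_id": command.portfolio_id,
            "properties": len(command.property_ids),
            "scenarios": len(command.scenario_ids),
        }


body = TriggerBody(uuid4(), uuid4(), portfolio_id=7, property_ids=[1, 2, 3], scenario_ids=[10])
summary = FirstRunPipeline().run(body)
```

Because the match is structural, the orchestration module only needs the
Protocol; it never imports the application-layer trigger body.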

TDD: 7 new tests (trigger-body validation incl. forward-compat +
id-types, pipeline seam, handler delegation). pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 20:38:15 +00:00
Khalim Conn-Kowlessar
1696cccba6 feat(ingestion): IngestionOrchestrator end-to-end (#1134)
Stage 1 of the pipeline: per property, read its UPRN from the property row,
fetch its EPC, resolve coordinates from the Geospatial reference repo, thread
those into the Solar fetcher, and persist EPC + solar via repos. Fetchers never
call each other — the orchestrator threads the coordinate (ADR-0011). Coordinates
are reference data (deterministic from UPRN), resolved transiently to drive the
solar fetch rather than persisted per-property.

Depends on thin EpcFetcher/SolarFetcher Protocols (EpcClientService and
GoogleSolarApiClient satisfy them structurally). Unit-tested against fakes — no
DB, gov API, or network: persists EPC, threads coords into solar, skips
UPRN-less properties and skips solar when coordinates are absent. pyright clean.
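
A rough sketch of the threading rule, written against fakes in the same spirit
as the unit tests. All names here (`ingest`, `fetch_epc`, `fetch_solar`,
`resolve_coords`, the dict-backed stores) are hypothetical stand-ins, not the
real orchestrator or repository API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Coordinates:
    lon: float
    lat: float


class EpcFetcher(Protocol):
    def fetch_epc(self, uprn: str) -> dict[str, object]: ...


class SolarFetcher(Protocol):
    def fetch_solar(self, coords: Coordinates) -> dict[str, object]: ...


def ingest(
    uprn_by_property: dict[int, Optional[str]],
    epc_fetcher: EpcFetcher,
    solar_fetcher: SolarFetcher,
    resolve_coords: Callable[[str], Optional[Coordinates]],
    epc_store: dict[int, dict[str, object]],
    solar_store: dict[int, dict[str, object]],
) -> None:
    for property_id, uprn in uprn_by_property.items():
        if uprn is None:
            continue  # UPRN-less property: nothing to fetch
        epc_store[property_id] = epc_fetcher.fetch_epc(uprn)
        coords = resolve_coords(uprn)  # transient, never persisted
        if coords is None:
            continue  # no coordinates: skip solar only
        solar_store[property_id] = solar_fetcher.fetch_solar(coords)


class FakeEpc:
    def fetch_epc(self, uprn: str) -> dict[str, object]:
        return {"uprn": uprn}


class FakeSolar:
    def fetch_solar(self, coords: Coordinates) -> dict[str, object]:
        return {"lon": coords.lon, "lat": coords.lat}


known = {"100": Coordinates(lon=-0.1, lat=51.5)}
epcs: dict[int, dict[str, object]] = {}
solar: dict[int, dict[str, object]] = {}
ingest({1: "100", 2: None, 3: "200"}, FakeEpc(), FakeSolar(), known.get, epcs, solar)
```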

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:58:21 +00:00
Khalim Conn-Kowlessar
3998ef586c feat(geospatial): GeospatialRepo — OS Open-UPRN coordinate lookup (#1131)
Add Coordinates value object + GeospatialRepository port + GeospatialS3Repository
adapter. Resolves a Property's lon/lat from the partitioned Ordnance Survey
Open-UPRN parquet (filename_meta -> partition -> UPRN row). A Repo, not a
Fetcher (ADR-0011): no live OS API call. The parquet reader is injected, so it's
unit-tested against fixture parquets with no S3/network; returns None when the
UPRN is uncovered or absent. pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:55:46 +00:00
Khalim Conn-Kowlessar
caee4de2f4 feat(ingestion): relocate EpcClientService to infrastructure + SolarRepo (#1133)
Move the EpcClientService package (client + _retry + exceptions + tests) from
the dying backend/ tree to infrastructure/epc_client/ as the New-EPC-API Fetcher;
update the two callers (address2UPRN, a script). All 14 client tests pass.

Add SolarRepository port + SolarPostgresRepository persisting Google Solar
building insights as JSONB (solar_building_insights table), one row per Property.
The EPC repo half of this slice already landed in #1129. pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:45:26 +00:00
Khalim Conn-Kowlessar
92de07efba feat(property): Property aggregate + PropertyRepository (#1132)
Add the Ara modelling aggregate root (ADR-0002): domain/property/ with
PropertyIdentity, SiteNotes, Property, Properties. Property.source_path
implements the two disjoint source paths + Recency Tie-Break (ADR-0001;
survey wins on an equal date); effective_epc resolves to the surveyed data
(Site Notes path) or the public EPC (epc_with_overlay path — Landlord
Overrides overlay is a later slice). Pure dataclasses, no infrastructure imports.
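
One way to read the Recency Tie-Break, as a sketch only. It assumes the more
recent of the two dates selects the path and that a missing date means that
source is absent; the function signature and the path labels are illustrative,
and only "survey wins on an equal date" is taken from the message above.

```python
from datetime import date
from typing import Optional


def source_path(survey_date: Optional[date], epc_date: Optional[date]) -> str:
    if survey_date is None and epc_date is None:
        raise ValueError("property has neither Site Notes nor a public EPC")
    if survey_date is None:
        return "epc_with_overlay"
    if epc_date is None:
        return "site_notes"
    # >= rather than >: the survey wins on an equal date.
    return "site_notes" if survey_date >= epc_date else "epc_with_overlay"
```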

PropertyRepository port + PropertyPostgresRepository hydrate the aggregate
whole from a defensive view of the FE-owned 'property' table (identity columns)
plus the EPC slice via EpcRepository.get_for_property. Reads only from repos
(ADR-0003). 8 domain + 1 hydration test; pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:39:54 +00:00
Khalim Conn-Kowlessar
311d1e751a feat(epc): persist renewable_heat_incentive — full round-trip equality (#1137)
Add epc_renewable_heat_incentive table (space_heating_kwh, water_heating_kwh +
the three insulation-impact kWh fields), wired into EpcPostgresRepository
save/get. This is the P0 gap: RenewableHeatIncentive carries the baseline
space-heating/hot-water kWh that EPC Energy Derivation consumes.

The round-trip test now asserts full deep-equality (dropped the
renewable_heat_incentive exclusion) and passes for RdSAP 21.0.0 + 21.0.1.
DB migration for the new table documented in
docs/migrations/epc-property-round-trip-fidelity.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:30:18 +00:00
Khalim Conn-Kowlessar
5f0a3b8f65 feat(epc): EPC persistence round-trip fidelity + JSONB code columns (Slice 1 #1129)
Relocate EpcPropertyModel + child tables from the dying backend/ tree to
infrastructure/postgres/epc_property_table.py (re-export shim keeps
documents_parser working). Add EpcRepository port + EpcPostgresRepository with
a full reverse mapper (epc_property tables -> EpcPropertyData).

Round-trip test surfaced two fidelity gaps:
 1. Union[int,str] SAP code fields were str()-coerced on save, losing the int
    (API) vs str (Site Notes) distinction. Now stored as JSONB (type-preserving).
 2. The schema was a partial projection. Closed the cheap gaps on the model
    (heating shower/bath counts, roof_construction_type, curtain_wall_age,
    addendum, mechanical_vent_duct_insulation_level, SAP 10.2 §2 ventilation
    fields + a ventilation_present flag). Structural gaps tracked as follow-ups;
    renewable_heat_incentive (P0, #1137) excluded from the assertion until landed.
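
The first gap can be reproduced with the standard library. This is an
illustration of the idea, not the repository's mapper: `json.dumps` /
`json.loads` stand in for what a JSONB column preserves, and the helper names
are made up.

```python
import json
from typing import Union

SapCode = Union[int, str]


def save_as_text(code: SapCode) -> str:
    # The old behaviour: str() coercion collapses 4 and "4" into the same value.
    return str(code)


def save_as_json(code: SapCode) -> str:
    # JSON keeps the number/string distinction in the encoded form.
    return json.dumps(code)


def load_json(stored: str) -> SapCode:
    return json.loads(stored)


api_code: SapCode = 4           # int, as the API delivers it
site_notes_code: SapCode = "4"  # str, as Site Notes deliver it

lossy_equal = save_as_text(api_code) == save_as_text(site_notes_code)
round_tripped = (load_json(save_as_json(api_code)), load_json(save_as_json(site_notes_code)))
```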

Round-trip passes for RdSAP-Schema-21.0.0 and 21.0.1; pyright strict clean.
Migration inventory for the DB: docs/migrations/epc-property-round-trip-fidelity.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:26:18 +00:00
Jun-te Kim
3e30b4af40 tests wrong environment 2026-05-29 16:17:06 +00:00
Jun-te Kim
36f4c32904 added roofs 2026-05-26 16:18:26 +00:00
Jun-te Kim
8422041215 landlord override orchestration 2026-05-26 15:27:45 +00:00
Jun-te Kim
96aeed4f2e Remove EPC and asset_list changes unrelated to SAL handler
This branch's objective is the SAL ingestion handler
(applications/SAL/handler.py) and its dependency tree. Drop work
that crept in but is unreferenced by it:

- EPC feature: domain/epc, infrastructure/epc (gov_uk + historical
  clients), tests/infrastructure/epc
- datatypes/epc edits (instantaneous_wwhrs Optional) reverted to main
- asset_list/app.py local data-file/column tweak reverted to main

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:36:46 +00:00
Jun-te Kim
a747534f37 refactored to allow multiple column types 2026-05-22 15:28:26 +00:00
Jun-te Kim
11a498ba4e Map an unrecognised classification reply to UNKNOWN 🟥 2026-05-22 14:55:01 +00:00
Jun-te Kim
d0e5aa9e3f Classify a landlord description into a SAL property type 🟩 2026-05-22 14:53:31 +00:00
Jun-te Kim
675aa089c9 updated RdSAP option; separated S3 location in infrastructure; added OpenAI API 2026-05-22 14:00:33 +00:00
Jun-te Kim
61efcad27b standardise Address 2026-05-22 10:13:32 +00:00
Jun-te Kim
0dee917094 unsanitised address list instead of raw address list 2026-05-22 08:27:59 +00:00
Jun-te Kim
91bb4b6571 address list 2026-05-22 08:22:13 +00:00
Jun-te Kim
84098e28ff raw address list repo 2026-05-22 08:17:37 +00:00
Jun-te Kim
cf14a4e3aa rename to SAL and AssetList and RawAddresses 2026-05-22 08:14:46 +00:00
Jun-te Kim
acb306f7b9 asset list from landlord 2026-05-22 07:34:50 +00:00
Jun-te Kim
94cbf5f516 changed useraddress landlordasset list 2026-05-21 16:59:57 +00:00
Jun-te Kim
8baa4c82aa save correct progress 2026-05-21 16:57:14 +00:00
Jun-te Kim
b14f98788e added landlord orchestration 2026-05-21 16:32:50 +00:00
Jun-te Kim
4830f82b58 test: add failing tests for get_col_to_description_mappings
Drive the contract for LandlordDescriptionOverridesOrchestrator.
get_col_to_description_mappings: given a list of UserAddress sharing
the same landlord_additional_info keys, return each key mapped to the
list of values found across all addresses.

Tests are red — the method still raises NotImplementedError.
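
For orientation, one possible shape that would satisfy the contract as
described. This is a sketch, not the eventual implementation: it is a free
function rather than an orchestrator method, the `UserAddress` fields other
than `landlord_additional_info` are assumed, and keeping duplicates in
encounter order is an assumption too.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UserAddress:
    user_address: str
    postcode: str
    landlord_additional_info: dict[str, str] = field(default_factory=dict)


def get_col_to_description_mappings(addresses: list[UserAddress]) -> dict[str, list[str]]:
    mappings: dict[str, list[str]] = {}
    for address in addresses:
        for key, value in address.landlord_additional_info.items():
            # One list per key, values appended in the order addresses arrive.
            mappings.setdefault(key, []).append(value)
    return mappings
```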

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:32:15 +00:00
Daniel Roth
0d4462d131 GoogleSolarApi translates BuildingInsightsNotFoundError to sentinel dict 🟩 2026-05-21 16:06:54 +00:00
Daniel Roth
521294ad91 GoogleSolarApi delegates get_building_insights to GoogleSolarApiClient 🟥 2026-05-21 15:59:22 +00:00
Daniel Roth
b1933c07c3 GoogleSolarApiClient propagates exception after retry exhaustion 🟩 2026-05-21 15:53:39 +00:00
Daniel Roth
573656be64 GoogleSolarApiClient raises BuildingInsightsNotFoundError on 404 entity-not-found 🟥 2026-05-21 15:52:21 +00:00
Daniel Roth
44217bf361 GoogleSolarApiClient retries on transient HTTP errors 🟥 2026-05-21 15:49:57 +00:00
Daniel Roth
629fc34a0f GoogleSolarApiClient fetches building insights from the Solar API 🟥 2026-05-21 15:46:47 +00:00
Jun-te Kim
dc159e0b45 tests framework completed 2026-05-20 14:00:19 +00:00
Jun-te Kim
d0cf3d14ad get rid of comments 2026-05-20 13:21:11 +00:00
Jun-te Kim
8bb90a5aa5 sanitisation of postcode 2026-05-20 12:57:03 +00:00
Jun-te Kim
914a8ed51e postcode splitter working e2e 2026-05-20 11:07:40 +00:00
Jun-te Kim
0a04448217 applications/postcode_splitter: PostcodeSplitterOrchestrator + Lambda entrypoint slice
Wires slice 1-5 primitives into a deployable splitter:

- orchestration/postcode_splitter_orchestrator.py: PostcodeSplitterOrchestrator
  loads addresses via UserAddressRepository, groups by postcode via
  iter_postcode_grouped_batches, persists each batch under
  ara_postcode_splitter_batches/{task_id}/{subtask_id}/, creates a WAITING
  child SubTask, and publishes an address2UPRN SQS message per batch.

- applications/postcode_splitter/: Lambda entrypoint. handler.py is decorated
  with @subtask_handler() so the parent SubTask lifecycle is decorator-owned;
  PostcodeSplitterTriggerBody validates the body. Dockerfile is the
  python:3.11 Lambda base with the DDD-shaped source layers and no pandas.

- tests/orchestration/test_postcode_splitter_orchestrator.py: integration
  test using moto S3 + moto SQS + in-memory SQLite that exercises the full
  wiring against a fixture CSV spanning three postcode groups (one
  oversize) and asserts child count, persisted inputs, queue bodies, and
  dispatch order.

backend/postcode_splitter/ and .github/workflows/deploy_terraform.yml are
intentionally unchanged: the dockerfile_path flip is deferred until the
companion backend/address2UPRN/ migration is also ready.
2026-05-19 17:46:12 +00:00
Jun-te Kim
708f1b5d18 repositories: UserAddressRepository + UserAddressCsvS3Repository (CSV-on-S3 adapter)
Adds the persistence layer for UserAddress batches:

- Abstract UserAddressRepository with load_batch / save_batch.
- Concrete UserAddressCsvS3Repository over CsvS3Client:
  - load_batch reads canonical upload columns (Address 1/2/3, Postcode,
    Internal Reference), comma-joins non-empty address parts, and
    passes Internal Reference through (None when missing/empty).
  - save_batch writes a 3-column CSV (user_address,postcode,
    internal_reference) to {path_prefix}/{ISO datetime}_{uuid8}.csv
    and returns the s3://bucket/key URI.
- Postcode sanitisation flows through UserAddress.__post_init__; the
  repo never calls sanitise_postcode directly.

Tests (moto-backed) cover: three-line address load, Address-1-only
load, missing Internal Reference, save->reload round trip, and
unique-filename-per-save. pyright --strict clean.
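
A sketch of the two mapping rules above, without S3. The helper names are
hypothetical, and so are the details the message leaves open: the ", "
separator, UTC timestamps, and taking the first 8 hex characters of a uuid4
for the `uuid8` part.

```python
from datetime import datetime, timezone
from typing import Optional
from uuid import uuid4

ADDRESS_COLUMNS = ("Address 1", "Address 2", "Address 3")


def row_to_fields(row: dict[str, str]) -> tuple[str, str, Optional[str]]:
    # Join only the non-empty address parts.
    parts = [(row.get(col) or "").strip() for col in ADDRESS_COLUMNS]
    user_address = ", ".join(part for part in parts if part)
    # Missing or empty Internal Reference becomes None.
    internal_reference = (row.get("Internal Reference") or "").strip() or None
    return user_address, row["Postcode"], internal_reference


def batch_key(path_prefix: str) -> str:
    # {path_prefix}/{ISO datetime}_{uuid8}.csv
    stamp = datetime.now(timezone.utc).isoformat()
    return f"{path_prefix}/{stamp}_{uuid4().hex[:8]}.csv"
```

The random suffix is what keeps two saves within the same instant from
colliding on one key.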

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:37:02 +00:00
Jun-te Kim
d70e8a9e53 utilities/aws_lambda: @subtask_handler injects TaskOrchestrator as third positional arg
The wrapped function now receives the decorator-owned TaskOrchestrator as
a third positional argument so handlers can compose their own use-case
orchestrator that shares the session, instead of opening a second Postgres
connection per invocation.

Both existing callers (backend/ordnanceSurvey/main.py and
backend/bulk_address2uprn_combiner/main.py) have their signatures extended
to accept the new positional argument (typed Optional[TaskOrchestrator] so
the legacy backend.utils.subtasks.subtask_handler — which only passes two
args — keeps working until the migration to the new decorator lands).

@task_handler is intentionally unchanged in this slice; symmetry is
deferred per issue #1103.
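
A stripped-down sketch of the injection pattern. The real decorator also owns
the SubTask lifecycle and a Postgres session; here `TaskOrchestrator` is a
placeholder class and the `session` attribute is a made-up stand-in.

```python
import functools
from typing import Any, Callable, Optional


class TaskOrchestrator:
    def __init__(self, session: object) -> None:
        self.session = session


def subtask_handler() -> Callable[[Callable[..., Any]], Callable[[Any, Any], Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[[Any, Any], Any]:
        @functools.wraps(fn)
        def wrapper(event: Any, context: Any) -> Any:
            # The decorator owns the orchestrator and passes it as the third positional arg.
            orchestrator = TaskOrchestrator(session=object())
            return fn(event, context, orchestrator)
        return wrapper
    return decorator


@subtask_handler()
def handler(event: Any, context: Any, orchestrator: Optional[TaskOrchestrator] = None) -> Any:
    # Optional with a default, so a legacy two-argument caller still works.
    return orchestrator


injected = handler({"body": "{}"}, None)
```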
2026-05-19 17:31:27 +00:00
Jun-te Kim
d7f14033ba orchestration: add TaskOrchestrator.create_child_subtask primitive
Adds a primitive for creating a new WAITING SubTask under an existing
parent Task, routing all SubTask creation through the orchestrator
(replacing the legacy SubTaskInterface path used by the splitter).
Skips _cascade because a new WAITING child against an IN_PROGRESS
parent is a no-op under Task.recalculate_from_subtasks.
2026-05-19 17:19:41 +00:00
Jun-te Kim
7b00a33cd2 infrastructure: typed S3/SQS clients (S3Client, CsvS3Client, SqsClient, Address2UprnQueueClient)
Slice 3/6 of the postcode_splitter refactor (Hestia-Homes/Model#1101).
Introduces a thin typed infrastructure layer wrapping boto3 for the AWS
side of the splitter. S3Client/SqsClient are bucket-/queue-bound byte
adapters; CsvS3Client subclasses S3Client to round-trip CSV row dicts
via the existing parse_s3_uri helper in utils/s3.py; Address2UprnQueueClient
subclasses SqsClient to publish the typed {task_id, sub_task_id, s3_uri}
fan-out body the downstream consumer expects. moto[s3,sqs] is pulled into
test.requirements.txt and the new tests/infrastructure/ suite exercises
each client against the moto backend (S3 round-trip, CSV round-trip,
SQS send + body inspection, typed publish + body inspection). pyright
--strict is clean on the new modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:12:21 +00:00
Jun-te Kim
6198d7a46d postcode_splitter: pure domain (UserAddress, sanitise_postcode, postcode_batching)
Slice 1/6 of the postcode_splitter refactor (Hestia-Homes/Model#1100).
Introduces the pure-domain foundation under domain/, with no AWS, Postgres,
or pandas. UserAddress is a frozen dataclass that sanitises its postcode in
__post_init__ via the canonical sanitise_postcode helper, and
iter_postcode_grouped_batches preserves the legacy splitter's batching
invariants (group-by-postcode in insertion order, never split a group,
oversize single-postcode groups dispatched whole, final flush). Updates
UBIQUITOUS_LANGUAGE.md so the User Address term covers both the dataclass
sense (preferred in domain code) and the raw upstream-string sense.
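
The batching invariants can be sketched as below. This is an illustration of
the four rules as listed, not the shipped function: the `(address, postcode)`
tuple input and the `max_batch_size` parameter are assumptions.

```python
from typing import Iterable, Iterator

Row = tuple[str, str]  # (address, postcode); a stand-in for UserAddress


def iter_postcode_grouped_batches(rows: Iterable[Row], max_batch_size: int) -> Iterator[list[Row]]:
    # Group by postcode; dicts keep first-seen (insertion) order.
    groups: dict[str, list[Row]] = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)

    batch: list[Row] = []
    for group in groups.values():
        # Never split a group: flush first if this group would overflow the batch.
        if batch and len(batch) + len(group) > max_batch_size:
            yield batch
            batch = []
        # An oversize group lands in an empty batch and is dispatched whole.
        batch.extend(group)
    if batch:
        yield batch  # final flush


rows = [("a", "P1"), ("b", "P2"), ("c", "P1"), ("d", "P3"), ("e", "P3"), ("f", "P3"), ("g", "P4")]
batches = list(iter_postcode_grouped_batches(rows, max_batch_size=2))
```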

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 16:45:47 +00:00
Jun-te Kim
54a674b5c8 added postcode splitter rewrite to ddd 2026-05-19 16:35:09 +00:00