From 5aebd90ef7f81a6d0daf650bce9f756cb96bb449 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sat, 30 May 2026 19:26:17 +0000
Subject: [PATCH 01/18] docs(ara): composable stage-orchestrator design (ADR-0011 + ADR-0003 amend + CONTEXT)

Records the grill-with-docs outcomes for the ara_first_run rebuild: three
composable stage orchestrators (Ingestion/Baseline/Modelling), one lambda
per use case chaining them through repos (not in-memory), and the
Fetcher-vs-Repo data-source taxonomy. Amends ADR-0003's chaining rule to
generalise beyond RefreshOrchestrator. Adds the pipeline-composition +
First Run vocabulary to CONTEXT.md.

Co-Authored-By: Claude Opus 4.8
---
 CONTEXT.md                                    |  20 +
 ...3-strict-ingestion-modelling-separation.md |   3 +
 .../0011-composable-stage-orchestrators.md    |  41 +
 infrastructure/postgres/epc_property_table.py | 716 ++++++++++++++++++
 4 files changed, 780 insertions(+)
 create mode 100644 docs/adr/0011-composable-stage-orchestrators.md
 create mode 100644 infrastructure/postgres/epc_property_table.py

diff --git a/CONTEXT.md b/CONTEXT.md
index 54e66032..b99a1ac6 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -129,6 +129,26 @@ _Avoid_: UCL adjustment, energy correction, metered correction
 A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling.
 _Avoid_: outlier, mismatch, divergence flag
 
+### Pipeline composition
+
+The modelling backend is composed from three independently-invocable **stage orchestrators**, chained differently per use case. This composability — not a single end-to-end function — is the point: it is what lets the interactive single-property flow pause between stages where the batch flows do not. (Supersedes the monolithic `model_engine`.)
+
+**Ingestion**:
+The first stage. Acquires a Property's external source data — the EPC certificate (New EPC API) and Google Solar insights — and resolves its coordinates, then writes everything to repos. Writes only; runs no modelling business logic. Per ADR-0003 nothing downstream reads across this seam by calling back to a source — downstream stages read the persisted data from repos.
+_Avoid_: fetching (a fetch is one source call; Ingestion is the whole write stage), data load
+
+**Baseline** (stage):
+The second stage. Reads the persisted source data from repos, hydrates the **Property** aggregate, resolves its **Effective EPC**, and establishes its **Baseline Performance**. Re-scoring after a user override lives here. Distinct from **Baseline Performance** (the aggregate it produces).
+_Avoid_: rebaseline (that is a specific ML trigger — see Rebaselining), enrichment
+
+**Modelling** (stage):
+The third stage. Takes the baselined Property plus a set of **Scenarios** and produces **Recommendations** → an **Optimised Package** per **Scenario Phase** → **Plans**, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play".
+_Avoid_: scoring (overloaded), recommendation engine
+
+**First Run**:
+The use case where a Property has only a row in the property table (post address→UPRN matching) and no existing **Plan**: the pipeline runs Ingestion → Baseline → Modelling end-to-end over a batch. The first sibling lambda being built (`ara_first_run`).
+_Avoid_: initial run, cold run
+
 ### ML training
 
 **EPC ML Transform**:
diff --git a/docs/adr/0003-strict-ingestion-modelling-separation.md b/docs/adr/0003-strict-ingestion-modelling-separation.md
index 68361ba9..318f2970 100644
--- a/docs/adr/0003-strict-ingestion-modelling-separation.md
+++ b/docs/adr/0003-strict-ingestion-modelling-separation.md
@@ -1,5 +1,8 @@
 # Strict separation between Ingestion and Modelling
 
+**Status: Accepted, refined by [ADR-0011](0011-composable-stage-orchestrators.md).** The one-way flow below stands. ADR-0011 generalises the chaining rule: it is no longer "only a `RefreshOrchestrator` may chain" — it is *"only a top-level use-case pipeline orchestrator (e.g. `FirstRunPipeline`) may chain across the Ingestion→Modelling seam; the stage orchestrators communicate through repos and never call across it."*
+
+
 Data flows one way only: **Ingestion → Repos → Modelling**. Modelling services never make external HTTP calls; Ingestion services never run business logic. If Modelling needs fresh data, it sees a stale record in a repo and returns; the caller (a refresh orchestrator or the FE) decides whether to ingest first.
 
 We considered allowing modelling services to call fetchers directly on cache miss — convenient — and rejected it. The trade-off is that modelling cannot "self-heal" by going to the gov EPC API when it finds stale data. The benefit is that modelling becomes a deterministic function of repository state: same Property in the repos, same modelling output. That is the property that makes modelling unit-testable against fakes (no DB, no network, no ML lambda), reproducible, and debuggable. It also enables a per-property UI flow where fetched data is shown to the user for review and possible override **before** modelling runs.
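Reviewer note on the ADR-0003 rule above ("it sees a stale record in a repo and returns; the caller decides whether to ingest first"). The following is a minimal sketch of that behaviour, not code from this patch. Every name in it (`StoredEpc`, `FakeEpcRepo`, `ModellingResult`, `model_property`) and the 365-day staleness threshold are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class StoredEpc:
    """Hypothetical persisted EPC row, as a modelling service would read it."""
    property_id: int
    fetched_on: date
    energy_rating: int


class FakeEpcRepo:
    """In-memory fake repo: no DB and no network, as ADR-0003 asks of tests."""

    def __init__(self, rows: list[StoredEpc]) -> None:
        self._rows = {row.property_id: row for row in rows}

    def get(self, property_id: int) -> Optional[StoredEpc]:
        return self._rows.get(property_id)


@dataclass(frozen=True)
class ModellingResult:
    status: str  # "modelled", "stale" or "missing"
    rating: Optional[int] = None


def model_property(
    repo: FakeEpcRepo,
    property_id: int,
    today: date,
    max_age: timedelta = timedelta(days=365),  # assumed threshold
) -> ModellingResult:
    """Deterministic in (repo state, today); never calls a fetcher."""
    record = repo.get(property_id)
    if record is None:
        return ModellingResult("missing")
    if today - record.fetched_on > max_age:
        # Do not self-heal: report staleness and let the caller ingest.
        return ModellingResult("stale")
    return ModellingResult("modelled", record.energy_rating)
```

Because `today` is passed in rather than read from the clock, the same repo contents always give the same result, which is the property the ADR relies on for unit tests against fakes.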
diff --git a/docs/adr/0011-composable-stage-orchestrators.md b/docs/adr/0011-composable-stage-orchestrators.md
new file mode 100644
index 00000000..44caae74
--- /dev/null
+++ b/docs/adr/0011-composable-stage-orchestrators.md
@@ -0,0 +1,41 @@
+# Composable stage orchestrators; one lambda per use case; stages communicate through repos
+
+**Status: Accepted.** Refines [ADR-0003](0003-strict-ingestion-modelling-separation.md) (Ingestion→Repos→Modelling one-way flow) for the concrete shape of the rebuilt backend. Decided in a `/grill-with-docs` session (2026-05-30) before the first `ara_first_run` slice. Replaces the stale §4 / §9 / §11 architecture of `ara_backend_design.md`, which predates this thinking.
+
+## Context
+
+The pipeline must serve three use cases from the *same building blocks*:
+
+- **First Run** (batch) — a property has only a row in the property table; run everything end-to-end.
+- **Refresh** (batch) — re-check for new data and re-model if it changed.
+- **Single-property interactive** (a new front end) — fetch, **pause** for the user to validate/override, re-score, **pause** again, then model on demand.
+
+The single-property flow is the forcing function: it must be able to stop *between* establishing baseline data and producing recommendations. The legacy `model_engine` (one 1331-line function) cannot be re-entered partway, which is why it cannot serve this flow.
+
+## Decision
+
+**Three independently-invocable stage orchestrators**, in `orchestration/`:
+
+| Stage | Reads | Writes | Role |
+|---|---|---|---|
+| `IngestionOrchestrator` | Fetchers (EPC, Solar) + reference Repos (Geospatial) | source Repos | acquire + persist external source data |
+| `BaselineOrchestrator` | source Repos | `Property` + Baseline Performance | hydrate the aggregate; resolve Effective EPC; re-score on override |
+| `ModellingOrchestrator` | baselined Repos + Scenario/Materials Repos | Plans / Recommendations Repos | scenarios → recommendations → optimise → plans |
+
+**One lambda per use case** composes these via a thin pipeline object. `applications/ara_first_run/` is the first: a `handler.py` that only wires dependencies and delegates to a `FirstRunPipeline` (`Ingestion → Baseline → Modelling`). `refresh` and the single-property app are later siblings composing the *same three* stages differently.
+
+**Stages communicate through the repos, not in-memory.** The pipeline threads only identifiers (`property_ids`) between stages; each stage reads what it needs from repos and writes its outputs back. Baseline is therefore byte-identical whether ingestion ran 50 ms ago (First Run) or last week (single-property review) — there is no second entry mode.
+
+**Data-source taxonomy: "external" does not mean "Fetcher."** A **Fetcher** hits a *live, per-entity* API and returns raw data (infra client, no DB): the New EPC API, Google Solar. A **Repo** reads *stored data by key* — ours *or* a hosted reference dataset — and returns domain objects (no HTTP): Ordnance Survey Open-UPRN coordinates (`GeospatialRepo`), cost data (`MaterialsRepo`). When a fetch needs reference data (Solar needs lat/long), the **orchestrator** reads the repo and threads the value into the fetcher; fetchers never call each other.
+
+## Considered options
+
+- **One lambda per stage, coordinated by AWS Step Functions** — rejected. Step Functions buys cross-lambda completion signalling we don't need when the three stages are cheap to keep warm in one process and a batch is bite-size (≤~100 properties). Promoting a stage to its own lambda later is cheap *because* it is already a separate class.
+- **In-memory hand-off between stages in First Run** — rejected as the default. It gives `BaselineOrchestrator` two entry modes (fresh object vs repo read) and hides EPC persistence loss until a later Refresh reads the data back. Going through repos surfaces that loss inside First Run on day one. May be added later as an opt-in fast path where a profiler justifies it.
+
+## Consequences
+
+- A few redundant reads of rows just written, within one process — negligible at batch scale, and the price of each stage being a pure function of repo state.
+- Each stage is unit-testable against fake repos with no upstream stage present.
+- No HTTP library may appear in the `BaselineOrchestrator` / `ModellingOrchestrator` import graph (ADR-0003 holds per-stage).
+- Because stages round-trip `EpcPropertyData` through persistence in First Run, a **persistence round-trip fidelity test** (fetch EPCs across schema versions → map → save → load → map back → assert deep-equality) is a prerequisite deliverable: it is what proves `epc_property` + child tables actually cover the domain object, and surfaces any required FE-owned migration early.
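Reviewer note on ADR-0011's Decision section. The sketch below illustrates the three points it makes: the pipeline threads only `property_ids`, each stage reads from and writes to repos, and the orchestrator (not the fetcher) reads reference coordinates and passes them into the Solar fetch. The class names `IngestionOrchestrator`, `BaselineOrchestrator`, `ModellingOrchestrator` and `FirstRunPipeline` come from the ADR; `InMemoryRepos`, `FakeSolarFetcher`, the `run` method signatures and the dict shapes are assumptions for illustration and are not defined by this patch.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRepos:
    """Hypothetical stand-in for the real repos: dicts keyed by property id."""
    coordinates: dict[int, tuple[float, float]]  # reference data (GeospatialRepo role)
    source: dict[int, dict] = field(default_factory=dict)
    baseline: dict[int, dict] = field(default_factory=dict)
    plans: dict[int, list[str]] = field(default_factory=dict)


class FakeSolarFetcher:
    """Hypothetical fetcher: takes lat/long as arguments and never reads a repo."""

    def fetch(self, lat: float, lon: float) -> dict:
        return {"lat": lat, "lon": lon, "max_panels": 10}


class IngestionOrchestrator:
    def __init__(self, repos: InMemoryRepos, solar: FakeSolarFetcher) -> None:
        self.repos, self.solar = repos, solar

    def run(self, property_ids: list[int]) -> None:
        for pid in property_ids:
            # The orchestrator reads the reference repo and threads the value in.
            lat, lon = self.repos.coordinates[pid]
            self.repos.source[pid] = self.solar.fetch(lat, lon)


class BaselineOrchestrator:
    def __init__(self, repos: InMemoryRepos) -> None:
        self.repos = repos

    def run(self, property_ids: list[int]) -> None:
        for pid in property_ids:
            src = self.repos.source[pid]  # single entry mode: always a repo read
            self.repos.baseline[pid] = {"max_panels": src["max_panels"]}


class ModellingOrchestrator:
    def __init__(self, repos: InMemoryRepos) -> None:
        self.repos = repos

    def run(self, property_ids: list[int]) -> None:
        for pid in property_ids:
            panels = self.repos.baseline[pid]["max_panels"]
            self.repos.plans[pid] = [f"install {panels} panels"]


class FirstRunPipeline:
    """Thin composition: passes identifiers only, no objects, between stages."""

    def __init__(self, ingestion, baseline, modelling) -> None:
        self.stages = [ingestion, baseline, modelling]

    def run(self, property_ids: list[int]) -> None:
        for stage in self.stages:
            stage.run(property_ids)
```

In this sketch a single-property flow would call `BaselineOrchestrator.run` on its own against data ingested earlier, then call `ModellingOrchestrator.run` later; nothing in either stage depends on the previous stage having run in the same process.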
diff --git a/infrastructure/postgres/epc_property_table.py b/infrastructure/postgres/epc_property_table.py
new file mode 100644
index 00000000..deee192c
--- /dev/null
+++ b/infrastructure/postgres/epc_property_table.py
@@ -0,0 +1,716 @@
+from __future__ import annotations
+
+from typing import ClassVar, Optional, Union
+from sqlalchemy import Column
+from sqlalchemy.dialects.postgresql import JSONB
+from sqlmodel import SQLModel, Field
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    EnergyElement,
+    MainHeatingDetail,
+    SapBuildingPart,
+    SapFloorDimension,
+    SapFlatDetails,
+    SapWindow,
+)
+
+
+class EpcPropertyModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_property"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    property_id: Optional[int] = Field(default=None)
+    portfolio_id: Optional[int] = Field(default=None)
+    uploaded_file_id: Optional[int] = Field(default=None)
+
+    # Identity / admin
+    uprn: Optional[int] = Field(default=None)
+    uprn_source: Optional[str] = Field(default=None)
+    report_reference: Optional[str] = Field(default=None)
+    report_type: Optional[str] = Field(default=None)
+    assessment_type: Optional[str] = Field(default=None)
+    sap_version: Optional[float] = Field(default=None)
+    schema_type: Optional[str] = Field(default=None)
+    schema_versions_original: Optional[str] = Field(default=None)
+    status: Optional[str] = Field(default=None)
+    calculation_software_version: Optional[str] = Field(default=None)
+
+    # Address
+    address_line_1: Optional[str] = Field(default=None)
+    address_line_2: Optional[str] = Field(default=None)
+    post_town: Optional[str] = Field(default=None)
+    postcode: Optional[str] = Field(default=None)
+    region_code: Optional[str] = Field(default=None)
+    country_code: Optional[str] = Field(default=None)
+    language_code: Optional[str] = Field(default=None)
+
+    # Property description
+    dwelling_type: str
+    property_type: Optional[str] = Field(default=None)
+    built_form: Optional[str] = Field(default=None)
+    tenure: str
+    transaction_type: str
+    inspection_date: str  # store as ISO string; cast on read if needed
+    completion_date: Optional[str] = Field(default=None)
+    registration_date: Optional[str] = Field(default=None)
+    total_floor_area_m2: float
+    measurement_type: Optional[int] = Field(default=None)
+
+    # Flags
+    solar_water_heating: bool
+    has_hot_water_cylinder: bool
+    has_fixed_air_conditioning: bool
+    has_conservatory: Optional[bool] = Field(default=None)
+    has_heated_separate_conservatory: Optional[bool] = Field(default=None)
+    conservatory_type: Optional[int] = Field(default=None)
+
+    # Counts
+    door_count: int
+    wet_rooms_count: int
+    extensions_count: int
+    heated_rooms_count: int
+    open_chimneys_count: int
+    habitable_rooms_count: int
+    insulated_door_count: int
+    cfl_fixed_lighting_bulbs_count: int
+    led_fixed_lighting_bulbs_count: int
+    incandescent_fixed_lighting_bulbs_count: int
+    blocked_chimneys_count: Optional[int] = Field(default=None)
+    draughtproofed_door_count: Optional[int] = Field(default=None)
+    energy_rating_average: Optional[int] = Field(default=None)
+    low_energy_fixed_lighting_bulbs_count: Optional[int] = Field(default=None)
+    fixed_lighting_outlets_count: Optional[int] = Field(default=None)
+    low_energy_fixed_lighting_outlets_count: Optional[int] = Field(default=None)
+    number_of_storeys: Optional[int] = Field(default=None)
+    any_unheated_rooms: Optional[bool] = Field(default=None)
+    mechanical_vent_duct_insulation_level: Optional[int] = Field(default=None)
+
+    # Addendum (cert-level construction flags)
+    addendum_stone_walls: Optional[bool] = Field(default=None)
+    addendum_system_build: Optional[bool] = Field(default=None)
+    addendum_numbers: Optional[list[int]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+
+    # Misc
+    hydro: Optional[bool] = Field(default=None)
+    photovoltaic_array: Optional[bool] = Field(default=None)
+    waste_water_heat_recovery: Optional[str] = Field(default=None)
+    pressure_test: Optional[int] = Field(default=None)
+    pressure_test_certificate_number: Optional[int] = Field(default=None)
+    percent_draughtproofed: Optional[int] = Field(default=None)
+    insulated_door_u_value: Optional[float] = Field(default=None)
+    multiple_glazed_proportion: Optional[int] = Field(default=None)
+    windows_transmission_u_value: Optional[float] = Field(default=None)
+    windows_transmission_data_source: Optional[int] = Field(default=None)
+    windows_transmission_solar_transmittance: Optional[float] = Field(default=None)
+
+    # Energy source
+    energy_mains_gas: bool
+    energy_meter_type: str
+    energy_pv_battery_count: int
+    energy_wind_turbines_count: int
+    energy_gas_smart_meter_present: bool
+    energy_is_dwelling_export_capable: bool
+    energy_wind_turbines_terrain_type: str
+    energy_electricity_smart_meter_present: bool
+    energy_pv_connection: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    energy_pv_percent_roof_area: Optional[int] = Field(default=None)
+    energy_pv_battery_capacity: Optional[float] = Field(default=None)
+    energy_wind_turbine_hub_height: Optional[float] = Field(default=None)
+    energy_wind_turbine_rotor_diameter: Optional[float] = Field(default=None)
+
+    # Heating config
+    # Union[int, str] code fields stored as JSONB to preserve the int (API) vs
+    # str (Site Notes) distinction on round-trip (see docs/migrations/epc-property-round-trip-fidelity.md §1).
+    heating_cylinder_size: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    heating_water_heating_code: Optional[int] = Field(default=None)
+    heating_water_heating_fuel: Optional[int] = Field(default=None)
+    heating_immersion_heating_type: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    heating_cylinder_insulation_type: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    heating_cylinder_thermostat: Optional[str] = Field(default=None)
+    heating_secondary_fuel_type: Optional[int] = Field(default=None)
+    heating_secondary_heating_type: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    heating_cylinder_insulation_thickness_mm: Optional[int] = Field(default=None)
+    heating_wwhrs_index_number_1: Optional[int] = Field(default=None)
+    heating_wwhrs_index_number_2: Optional[int] = Field(default=None)
+    heating_shower_outlet_type: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    heating_shower_wwhrs: Optional[int] = Field(default=None)
+    heating_number_baths: Optional[int] = Field(default=None)
+    heating_number_baths_wwhrs: Optional[int] = Field(default=None)
+    heating_electric_shower_count: Optional[int] = Field(default=None)
+    heating_mixer_shower_count: Optional[int] = Field(default=None)
+
+    # Ventilation
+    ventilation_type: Optional[str] = Field(default=None)
+    ventilation_draught_lobby: Optional[bool] = Field(default=None)
+    ventilation_pressure_test: Optional[str] = Field(default=None)
+    ventilation_open_flues_count: Optional[int] = Field(default=None)
+    ventilation_closed_flues_count: Optional[int] = Field(default=None)
+    ventilation_boiler_flues_count: Optional[int] = Field(default=None)
+    ventilation_other_flues_count: Optional[int] = Field(default=None)
+    ventilation_extract_fans_count: Optional[int] = Field(default=None)
+    ventilation_passive_vents_count: Optional[int] = Field(default=None)
+    ventilation_flueless_gas_fires_count: Optional[int] = Field(default=None)
+    ventilation_in_pcdf_database: Optional[bool] = Field(default=None)
+    # SAP 10.2 §2 lodgements + a presence flag so an all-None SapVentilation
+    # round-trips as present (not collapsed to None).
+    ventilation_present: bool = Field(default=False)
+    ventilation_sheltered_sides: Optional[int] = Field(default=None)
+    ventilation_has_suspended_timber_floor: Optional[bool] = Field(default=None)
+    ventilation_suspended_timber_floor_sealed: Optional[bool] = Field(default=None)
+    ventilation_has_draught_lobby: Optional[bool] = Field(default=None)
+    ventilation_air_permeability_ap4_m3_h_m2: Optional[float] = Field(default=None)
+    ventilation_mechanical_ventilation_kind: Optional[str] = Field(default=None)
+    mechanical_ventilation: Optional[int] = Field(default=None)
+    mechanical_vent_duct_type: Optional[int] = Field(default=None)
+    mechanical_vent_duct_placement: Optional[int] = Field(default=None)
+    mechanical_vent_duct_insulation: Optional[int] = Field(default=None)
+    mechanical_ventilation_index_number: Optional[int] = Field(default=None)
+    mechanical_vent_measured_installation: Optional[str] = Field(default=None)
+
+    @classmethod
+    def from_epc_property_data(
+        cls,
+        data: EpcPropertyData,
+        property_id: Optional[int] = None,
+        portfolio_id: Optional[int] = None,
+    ) -> EpcPropertyModel:
+        es = data.sap_energy_source
+        h = data.sap_heating
+        v = data.sap_ventilation
+        shower = h.shower_outlets.shower_outlet if h.shower_outlets else None
+        pv = es.photovoltaic_supply
+        wt = es.wind_turbine_details
+        pvb = es.pv_batteries
+
+        return cls(
+            property_id=property_id,
+            portfolio_id=portfolio_id,
+            uprn=data.uprn,
+            uprn_source=data.uprn_source,
+            report_reference=data.report_reference,
+            report_type=data.report_type,
+            assessment_type=data.assessment_type,
+            sap_version=data.sap_version,
+            schema_type=data.schema_type,
+            schema_versions_original=data.schema_versions_original,
+            status=data.status,
+            calculation_software_version=data.calculation_software_version,
+            address_line_1=data.address_line_1,
+            address_line_2=data.address_line_2,
+            post_town=data.post_town,
+            postcode=data.postcode,
+            region_code=data.region_code,
+            country_code=data.country_code,
+            language_code=data.language_code,
+            dwelling_type=data.dwelling_type,
+            property_type=data.property_type,
+            built_form=data.built_form,
+            tenure=data.tenure,
+            transaction_type=data.transaction_type,
+            inspection_date=data.inspection_date.isoformat(),
+            completion_date=(
+                data.completion_date.isoformat() if data.completion_date else None
+            ),
+            registration_date=(
+                data.registration_date.isoformat() if data.registration_date else None
+            ),
+            total_floor_area_m2=data.total_floor_area_m2,
+            measurement_type=data.measurement_type,
+            solar_water_heating=data.solar_water_heating,
+            has_hot_water_cylinder=data.has_hot_water_cylinder,
+            has_fixed_air_conditioning=data.has_fixed_air_conditioning,
+            has_conservatory=data.has_conservatory,
+            has_heated_separate_conservatory=data.has_heated_separate_conservatory,
+            conservatory_type=data.conservatory_type,
+            door_count=data.door_count,
+            wet_rooms_count=data.wet_rooms_count,
+            extensions_count=data.extensions_count,
+            heated_rooms_count=data.heated_rooms_count,
+            open_chimneys_count=data.open_chimneys_count,
+            habitable_rooms_count=data.habitable_rooms_count,
+            insulated_door_count=data.insulated_door_count,
+            cfl_fixed_lighting_bulbs_count=data.cfl_fixed_lighting_bulbs_count,
+            led_fixed_lighting_bulbs_count=data.led_fixed_lighting_bulbs_count,
+            incandescent_fixed_lighting_bulbs_count=data.incandescent_fixed_lighting_bulbs_count,
+            blocked_chimneys_count=data.blocked_chimneys_count,
+            draughtproofed_door_count=data.draughtproofed_door_count,
+            energy_rating_average=data.energy_rating_average,
+            low_energy_fixed_lighting_bulbs_count=data.low_energy_fixed_lighting_bulbs_count,
+            fixed_lighting_outlets_count=data.fixed_lighting_outlets_count,
+            low_energy_fixed_lighting_outlets_count=data.low_energy_fixed_lighting_outlets_count,
+            number_of_storeys=data.number_of_storeys,
+            any_unheated_rooms=data.any_unheated_rooms,
+            mechanical_vent_duct_insulation_level=data.mechanical_vent_duct_insulation_level,
+            addendum_stone_walls=data.addendum.stone_walls if data.addendum else None,
+            addendum_system_build=(
+                data.addendum.system_build if data.addendum else None
+            ),
+            addendum_numbers=data.addendum.addendum_numbers if data.addendum else None,
+            hydro=data.hydro,
+            photovoltaic_array=data.photovoltaic_array,
+            waste_water_heat_recovery=data.waste_water_heat_recovery,
+            pressure_test=data.pressure_test,
+            pressure_test_certificate_number=data.pressure_test_certificate_number,
+            percent_draughtproofed=data.percent_draughtproofed,
+            insulated_door_u_value=data.insulated_door_u_value,
+            multiple_glazed_proportion=data.multiple_glazed_proportion,
+            windows_transmission_u_value=(
+                data.windows_transmission_details.u_value
+                if data.windows_transmission_details
+                else None
+            ),
+            windows_transmission_data_source=(
+                data.windows_transmission_details.data_source
+                if data.windows_transmission_details
+                else None
+            ),
+            windows_transmission_solar_transmittance=(
+                data.windows_transmission_details.solar_transmittance
+                if data.windows_transmission_details
+                else None
+            ),
+            energy_mains_gas=es.mains_gas,
+            energy_meter_type=str(es.meter_type),
+            energy_pv_battery_count=es.pv_battery_count,
+            energy_wind_turbines_count=es.wind_turbines_count,
+            energy_gas_smart_meter_present=es.gas_smart_meter_present,
+            energy_is_dwelling_export_capable=es.is_dwelling_export_capable,
+            energy_wind_turbines_terrain_type=str(es.wind_turbines_terrain_type),
+            energy_electricity_smart_meter_present=es.electricity_smart_meter_present,
+            energy_pv_connection=es.pv_connection,
+            energy_pv_percent_roof_area=(
+                pv.none_or_no_details.percent_roof_area if pv else None
+            ),
+            energy_pv_battery_capacity=pvb.pv_battery.battery_capacity if pvb else None,
+            energy_wind_turbine_hub_height=wt.hub_height if wt else None,
+            energy_wind_turbine_rotor_diameter=wt.rotor_diameter if wt else None,
+            heating_cylinder_size=h.cylinder_size,
+            heating_water_heating_code=h.water_heating_code,
+            heating_water_heating_fuel=h.water_heating_fuel,
+            heating_immersion_heating_type=h.immersion_heating_type,
+            heating_cylinder_insulation_type=h.cylinder_insulation_type,
+            heating_cylinder_thermostat=h.cylinder_thermostat,
+            heating_secondary_fuel_type=h.secondary_fuel_type,
+            heating_secondary_heating_type=h.secondary_heating_type,
+            heating_cylinder_insulation_thickness_mm=h.cylinder_insulation_thickness_mm,
+            heating_wwhrs_index_number_1=h.instantaneous_wwhrs.wwhrs_index_number1,
+            heating_wwhrs_index_number_2=h.instantaneous_wwhrs.wwhrs_index_number2,
+            heating_shower_outlet_type=shower.shower_outlet_type if shower else None,
+            heating_shower_wwhrs=shower.shower_wwhrs if shower else None,
+            heating_number_baths=h.number_baths,
+            heating_number_baths_wwhrs=h.number_baths_wwhrs,
+            heating_electric_shower_count=h.electric_shower_count,
+            heating_mixer_shower_count=h.mixer_shower_count,
+            ventilation_type=v.ventilation_type if v else None,
+            ventilation_draught_lobby=v.draught_lobby if v else None,
+            ventilation_pressure_test=v.pressure_test if v else None,
+            ventilation_open_flues_count=v.open_flues_count if v else None,
+            ventilation_closed_flues_count=v.closed_flues_count if v else None,
+            ventilation_boiler_flues_count=v.boiler_flues_count if v else None,
+            ventilation_other_flues_count=v.other_flues_count if v else None,
+            ventilation_extract_fans_count=v.extract_fans_count if v else None,
+            ventilation_passive_vents_count=v.passive_vents_count if v else None,
+            ventilation_flueless_gas_fires_count=(
+                v.flueless_gas_fires_count if v else None
+            ),
+            ventilation_in_pcdf_database=v.ventilation_in_pcdf_database if v else None,
+            ventilation_present=v is not None,
+            ventilation_sheltered_sides=v.sheltered_sides if v else None,
+            ventilation_has_suspended_timber_floor=(
+                v.has_suspended_timber_floor if v else None
+            ),
+            ventilation_suspended_timber_floor_sealed=(
+                v.suspended_timber_floor_sealed if v else None
+            ),
+            ventilation_has_draught_lobby=v.has_draught_lobby if v else None,
+            ventilation_air_permeability_ap4_m3_h_m2=(
+                v.air_permeability_ap4_m3_h_m2 if v else None
+            ),
+            ventilation_mechanical_ventilation_kind=(
+                v.mechanical_ventilation_kind if v else None
+            ),
+            mechanical_ventilation=data.mechanical_ventilation,
+            mechanical_vent_duct_type=data.mechanical_vent_duct_type,
+            mechanical_vent_duct_placement=data.mechanical_vent_duct_placement,
+            mechanical_vent_duct_insulation=data.mechanical_vent_duct_insulation,
+            mechanical_ventilation_index_number=data.mechanical_ventilation_index_number,
+            mechanical_vent_measured_installation=data.mechanical_vent_measured_installation,
+        )
+
+
+class EpcPropertyEnergyPerformanceModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_property_energy_performance"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    epc_property_id: int = Field(
+        foreign_key="epc_property.id", nullable=False, unique=True
+    )
+
+    energy_rating_current: Optional[int] = Field(default=None)
+    energy_consumption_current: Optional[int] = Field(default=None)
+    environmental_impact_current: Optional[int] = Field(default=None)
+    heating_cost_current: Optional[float] = Field(default=None)
+    lighting_cost_current: Optional[float] = Field(default=None)
+    hot_water_cost_current: Optional[float] = Field(default=None)
+    co2_emissions_current: Optional[float] = Field(default=None)
+    co2_emissions_current_per_floor_area: Optional[int] = Field(default=None)
+    current_energy_efficiency_band: Optional[str] = Field(default=None)
+    energy_rating_potential: Optional[float] = Field(default=None)
+    energy_consumption_potential: Optional[int] = Field(default=None)
+    environmental_impact_potential: Optional[int] = Field(default=None)
+    heating_cost_potential: Optional[float] = Field(default=None)
+    lighting_cost_potential: Optional[float] = Field(default=None)
+    hot_water_cost_potential: Optional[float] = Field(default=None)
+    co2_emissions_potential: Optional[float] = Field(default=None)
+    potential_energy_efficiency_band: Optional[str] = Field(default=None)
+
+    @classmethod
+    def from_epc_property_data(
+        cls, data: EpcPropertyData, epc_property_id: int
+    ) -> EpcPropertyEnergyPerformanceModel:
+        return cls(
+            epc_property_id=epc_property_id,
+            energy_rating_current=data.energy_rating_current,
+            energy_consumption_current=data.energy_consumption_current,
+            environmental_impact_current=data.environmental_impact_current,
+            heating_cost_current=data.heating_cost_current,
+            lighting_cost_current=data.lighting_cost_current,
+            hot_water_cost_current=data.hot_water_cost_current,
+            co2_emissions_current=data.co2_emissions_current,
+            co2_emissions_current_per_floor_area=data.co2_emissions_current_per_floor_area,
+            current_energy_efficiency_band=(
+                data.current_energy_efficiency_band.value
+                if data.current_energy_efficiency_band
+                else None
+            ),
+            energy_rating_potential=data.energy_rating_potential,
+            energy_consumption_potential=data.energy_consumption_potential,
+            environmental_impact_potential=data.environmental_impact_potential,
+            heating_cost_potential=data.heating_cost_potential,
+            lighting_cost_potential=data.lighting_cost_potential,
+            hot_water_cost_potential=data.hot_water_cost_potential,
+            co2_emissions_potential=data.co2_emissions_potential,
+            potential_energy_efficiency_band=(
+                data.potential_energy_efficiency_band.value
+                if data.potential_energy_efficiency_band
+                else None
+            ),
+        )
+
+
+class EpcFlatDetailsModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_flat_details"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    epc_property_id: int = Field(
+        foreign_key="epc_property.id", nullable=False, unique=True
+    )
+
+    level: int
+    top_storey: str
+    flat_location: int
+    heat_loss_corridor: int
+    storey_count: Optional[int] = Field(default=None)
+    unheated_corridor_length_m: Optional[int] = Field(default=None)
+
+    @classmethod
+    def from_domain(
+        cls, flat: SapFlatDetails, epc_property_id: int
+    ) -> EpcFlatDetailsModel:
+        return cls(
+            epc_property_id=epc_property_id,
+            level=flat.level,
+            top_storey=flat.top_storey,
+            flat_location=flat.flat_location,
+            heat_loss_corridor=flat.heat_loss_corridor,
+            storey_count=flat.storey_count,
+            unheated_corridor_length_m=flat.unheated_corridor_length_m,
+        )
+
+
+class EpcMainHeatingDetailModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_main_heating_detail"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False)
+
+    has_fghrs: bool
+    # Union[int, str] code fields — JSONB to preserve int/str on round-trip.
+    main_fuel_type: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False))
+    heat_emitter_type: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False))
+    emitter_temperature: Union[int, str] = Field(
+        sa_column=Column(JSONB, nullable=False)
+    )
+    main_heating_control: Union[int, str] = Field(
+        sa_column=Column(JSONB, nullable=False)
+    )
+    fan_flue_present: Optional[bool] = Field(default=None)
+    boiler_flue_type: Optional[int] = Field(default=None)
+    boiler_ignition_type: Optional[int] = Field(default=None)
+    central_heating_pump_age: Optional[int] = Field(default=None)
+    central_heating_pump_age_str: Optional[str] = Field(default=None)
+    main_heating_index_number: Optional[int] = Field(default=None)
+    sap_main_heating_code: Optional[int] = Field(default=None)
+    main_heating_number: Optional[int] = Field(default=None)
+    main_heating_category: Optional[int] = Field(default=None)
+    main_heating_fraction: Optional[int] = Field(default=None)
+    main_heating_data_source: Optional[int] = Field(default=None)
+    condensing: Optional[bool] = Field(default=None)
+    weather_compensator: Optional[bool] = Field(default=None)
+
+    @classmethod
+    def from_domain(
+        cls, detail: MainHeatingDetail, epc_property_id: int
+    ) -> EpcMainHeatingDetailModel:
+        return cls(
+            epc_property_id=epc_property_id,
+            has_fghrs=detail.has_fghrs,
+            main_fuel_type=detail.main_fuel_type,
+            heat_emitter_type=detail.heat_emitter_type,
+            emitter_temperature=detail.emitter_temperature,
+            main_heating_control=detail.main_heating_control,
+            fan_flue_present=detail.fan_flue_present,
+            boiler_flue_type=detail.boiler_flue_type,
+            boiler_ignition_type=detail.boiler_ignition_type,
+            central_heating_pump_age=detail.central_heating_pump_age,
+            central_heating_pump_age_str=detail.central_heating_pump_age_str,
+            main_heating_index_number=detail.main_heating_index_number,
+            sap_main_heating_code=detail.sap_main_heating_code,
+            main_heating_number=detail.main_heating_number,
+            main_heating_category=detail.main_heating_category,
+            main_heating_fraction=detail.main_heating_fraction,
+            main_heating_data_source=detail.main_heating_data_source,
+            condensing=detail.condensing,
+            weather_compensator=detail.weather_compensator,
+        )
+
+
+class EpcBuildingPartModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_building_part"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False)
+
+    identifier: str
+    construction_age_band: str
+    # Union[int, str] code fields — JSONB to preserve int/str on round-trip.
+    wall_construction: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False))
+    wall_insulation_type: Union[int, str] = Field(
+        sa_column=Column(JSONB, nullable=False)
+    )
+    wall_thickness_measured: bool
+    party_wall_construction: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    building_part_number: Optional[int] = Field(default=None)
+    wall_dry_lined: Optional[bool] = Field(default=None)
+    wall_thickness_mm: Optional[int] = Field(default=None)
+    wall_insulation_thickness: Optional[str] = Field(default=None)
+    floor_heat_loss: Optional[int] = Field(default=None)
+    floor_insulation_thickness: Optional[str] = Field(default=None)
+    flat_roof_insulation_thickness: Optional[Union[str, int]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    floor_type: Optional[str] = Field(default=None)
+    floor_construction_type: Optional[str] = Field(default=None)
+    floor_insulation_type_str: Optional[str] = Field(default=None)
+    floor_u_value_known: Optional[bool] = Field(default=None)
+    roof_construction: Optional[int] = Field(default=None)
+    roof_construction_type: Optional[str] = Field(default=None)
+    curtain_wall_age: Optional[str] = Field(default=None)
+    roof_insulation_location: Optional[Union[int, str]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    roof_insulation_thickness: Optional[Union[str, int]] = Field(
+        default=None, sa_column=Column(JSONB, nullable=True)
+    )
+    room_in_roof_floor_area: Optional[float] = Field(default=None)
+    room_in_roof_construction_age_band: Optional[str] = Field(default=None)
+    alt_wall_1_area: Optional[float] = Field(default=None)
+    alt_wall_1_dry_lined: Optional[str] = Field(default=None)
+    alt_wall_1_construction: Optional[int] = Field(default=None)
+    alt_wall_1_insulation_type: Optional[int] = Field(default=None)
+    alt_wall_1_thickness_measured: Optional[str] = Field(default=None)
+    alt_wall_1_insulation_thickness: Optional[str] = Field(default=None)
+    alt_wall_2_area: Optional[float] = Field(default=None)
+    alt_wall_2_dry_lined: Optional[str] = Field(default=None)
+    alt_wall_2_construction: Optional[int] = Field(default=None)
+    alt_wall_2_insulation_type: Optional[int] = Field(default=None)
+    alt_wall_2_thickness_measured: Optional[str] = Field(default=None)
+    alt_wall_2_insulation_thickness: Optional[str] = Field(default=None)
+
+    @classmethod
+    def from_domain(
+        cls, part: SapBuildingPart, epc_property_id: int
+    ) -> EpcBuildingPartModel:
+        rir = part.sap_room_in_roof
+        aw1 = part.sap_alternative_wall_1
+        aw2 = part.sap_alternative_wall_2
+        return cls(
+            epc_property_id=epc_property_id,
+            identifier=part.identifier.value,
+            construction_age_band=part.construction_age_band,
+            wall_construction=part.wall_construction,
+            wall_insulation_type=part.wall_insulation_type,
+            wall_thickness_measured=part.wall_thickness_measured,
+            party_wall_construction=part.party_wall_construction,
+            building_part_number=part.building_part_number,
+            wall_dry_lined=part.wall_dry_lined,
+            wall_thickness_mm=part.wall_thickness_mm,
+            wall_insulation_thickness=part.wall_insulation_thickness,
+            floor_heat_loss=part.floor_heat_loss,
+            floor_insulation_thickness=part.floor_insulation_thickness,
+
flat_roof_insulation_thickness=part.flat_roof_insulation_thickness, + floor_type=part.floor_type, + floor_construction_type=part.floor_construction_type, + floor_insulation_type_str=part.floor_insulation_type_str, + floor_u_value_known=part.floor_u_value_known, + roof_construction=part.roof_construction, + roof_construction_type=part.roof_construction_type, + curtain_wall_age=part.curtain_wall_age, + roof_insulation_location=part.roof_insulation_location, + roof_insulation_thickness=part.roof_insulation_thickness, + room_in_roof_floor_area=float(rir.floor_area) if rir else None, + room_in_roof_construction_age_band=( + rir.construction_age_band if rir else None + ), + alt_wall_1_area=aw1.wall_area if aw1 else None, + alt_wall_1_dry_lined=aw1.wall_dry_lined if aw1 else None, + alt_wall_1_construction=aw1.wall_construction if aw1 else None, + alt_wall_1_insulation_type=aw1.wall_insulation_type if aw1 else None, + alt_wall_1_thickness_measured=aw1.wall_thickness_measured if aw1 else None, + alt_wall_1_insulation_thickness=( + aw1.wall_insulation_thickness if aw1 else None + ), + alt_wall_2_area=aw2.wall_area if aw2 else None, + alt_wall_2_dry_lined=aw2.wall_dry_lined if aw2 else None, + alt_wall_2_construction=aw2.wall_construction if aw2 else None, + alt_wall_2_insulation_type=aw2.wall_insulation_type if aw2 else None, + alt_wall_2_thickness_measured=aw2.wall_thickness_measured if aw2 else None, + alt_wall_2_insulation_thickness=( + aw2.wall_insulation_thickness if aw2 else None + ), + ) + + +class EpcFloorDimensionModel(SQLModel, table=True): + __tablename__: ClassVar[str] = "epc_floor_dimension" # pyright: ignore[reportIncompatibleVariableOverride] + + id: Optional[int] = Field(default=None, primary_key=True) + epc_building_part_id: int = Field( + foreign_key="epc_building_part.id", nullable=False + ) + + floor: Optional[int] = Field(default=None) + room_height_m: float + total_floor_area_m2: float + party_wall_length_m: float + heat_loss_perimeter_m: float + 
floor_insulation: Optional[int] = Field(default=None) + floor_construction: Optional[int] = Field(default=None) + + @classmethod + def from_domain( + cls, dim: SapFloorDimension, epc_building_part_id: int + ) -> EpcFloorDimensionModel: + return cls( + epc_building_part_id=epc_building_part_id, + floor=dim.floor, + room_height_m=dim.room_height_m, + total_floor_area_m2=dim.total_floor_area_m2, + party_wall_length_m=dim.party_wall_length_m, + heat_loss_perimeter_m=dim.heat_loss_perimeter_m, + floor_insulation=dim.floor_insulation, + floor_construction=dim.floor_construction, + ) + + +class EpcWindowModel(SQLModel, table=True): + __tablename__: ClassVar[str] = "epc_window" # pyright: ignore[reportIncompatibleVariableOverride] + + id: Optional[int] = Field(default=None, primary_key=True) + epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False) + + frame_material: Optional[str] = Field(default=None) + # Union[int, str] / Union[bool, str] code fields — JSONB to preserve type on round-trip. 
+ glazing_gap: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + orientation: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + window_type: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + glazing_type: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + window_width: float + window_height: float + draught_proofed: Union[bool, str] = Field(sa_column=Column(JSONB, nullable=False)) + window_location: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + window_wall_type: Union[int, str] = Field(sa_column=Column(JSONB, nullable=False)) + permanent_shutters_present: Union[bool, str] = Field( + sa_column=Column(JSONB, nullable=False) + ) + frame_factor: Optional[float] = Field(default=None) + permanent_shutters_insulated: Optional[str] = Field(default=None) + transmission_u_value: Optional[float] = Field(default=None) + transmission_data_source: Optional[Union[int, str]] = Field( + default=None, sa_column=Column(JSONB, nullable=True) + ) + transmission_solar_transmittance: Optional[float] = Field(default=None) + + @classmethod + def from_domain(cls, window: SapWindow, epc_property_id: int) -> EpcWindowModel: + td = window.window_transmission_details + return cls( + epc_property_id=epc_property_id, + frame_material=window.frame_material, + glazing_gap=window.glazing_gap, + orientation=window.orientation, + window_type=window.window_type, + glazing_type=window.glazing_type, + window_width=window.window_width, + window_height=window.window_height, + draught_proofed=window.draught_proofed, + window_location=window.window_location, + window_wall_type=window.window_wall_type, + permanent_shutters_present=window.permanent_shutters_present, + frame_factor=window.frame_factor, + permanent_shutters_insulated=window.permanent_shutters_insulated, + transmission_u_value=td.u_value if td else None, + transmission_data_source=td.data_source if td else None, + 
 transmission_solar_transmittance=td.solar_transmittance if td else None,
+        )
+
+
+class EpcEnergyElementModel(SQLModel, table=True):
+    __tablename__: ClassVar[str] = "epc_energy_element"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False)
+
+    element_type: str  # roof | wall | floor | main_heating | window | lighting | hot_water | secondary_heating | main_heating_controls
+    description: str
+    energy_efficiency_rating: int
+    environmental_efficiency_rating: int
+
+    @classmethod
+    def from_domain(
+        cls, element: EnergyElement, element_type: str, epc_property_id: int
+    ) -> EpcEnergyElementModel:
+        return cls(
+            epc_property_id=epc_property_id,
+            element_type=element_type,
+            description=element.description,
+            energy_efficiency_rating=element.energy_efficiency_rating,
+            environmental_efficiency_rating=element.environmental_efficiency_rating,
+        )

From 5f0a3b8f65e1541a489d493901b531792889d2bd Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sat, 30 May 2026 19:26:18 +0000
Subject: [PATCH 02/18] feat(epc): EPC persistence round-trip fidelity + JSONB code columns (Slice 1 #1129)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Relocate EpcPropertyModel + child tables from the dying backend/ tree to
infrastructure/postgres/epc_property_table.py (re-export shim keeps
documents_parser working). Add EpcRepository port + EpcPostgresRepository
with a full reverse mapper (epc_property tables -> EpcPropertyData).

Round-trip test surfaced two fidelity gaps:
1. Union[int,str] SAP code fields were str()-coerced on save, losing the
   int (API) vs str (Site Notes) distinction. Now stored as JSONB
   (type-preserving).
2. The schema was a partial projection. Closed the cheap gaps on the model
   (heating shower/bath counts, roof_construction_type, curtain_wall_age,
   addendum, mechanical_vent_duct_insulation_level, SAP 10.2 §2 ventilation
   fields + a ventilation_present flag). Structural gaps tracked as
   follow-ups; renewable_heat_incentive (P0, #1137) excluded from the
   assertion until landed.

Round-trip passes for RdSAP-Schema-21.0.0 and 21.0.1; pyright strict clean.

Migration inventory for the DB:
docs/migrations/epc-property-round-trip-fidelity.md

Co-Authored-By: Claude Opus 4.8
---
 backend/app/db/models/epc_property.py         | 680 +----------------
 .../epc-property-round-trip-fidelity.md       | 167 +++++
 repositories/epc/__init__.py                  |   0
 repositories/epc/epc_postgres_repository.py   | 608 ++++++++++++++++
 repositories/epc/epc_repository.py            |  26 +
 tests/repositories/epc/__init__.py            |   0
 tests/repositories/epc/test_epc_round_trip.py |  56 ++
 7 files changed, 882 insertions(+), 655 deletions(-)
 create mode 100644 docs/migrations/epc-property-round-trip-fidelity.md
 create mode 100644 repositories/epc/__init__.py
 create mode 100644 repositories/epc/epc_postgres_repository.py
 create mode 100644 repositories/epc/epc_repository.py
 create mode 100644 tests/repositories/epc/__init__.py
 create mode 100644 tests/repositories/epc/test_epc_round_trip.py

diff --git a/backend/app/db/models/epc_property.py b/backend/app/db/models/epc_property.py
index 93882d5d..9cd8cd94 100644
--- a/backend/app/db/models/epc_property.py
+++ b/backend/app/db/models/epc_property.py
@@ -1,659 +1,29 @@
-from __future__ import annotations
+"""Re-export shim.
 
-from typing import Optional
-from sqlmodel import SQLModel, Field
+The EPC persistence models moved to ``infrastructure/postgres/epc_property_table.py``
+as part of the Ara backend rebuild (PRD Hestia-Homes/Model#1128, Slice 1 #1129).
+This shim keeps the dying ``backend/`` callers working until cut-over. New code must
+import from ``infrastructure.postgres.epc_property_table`` directly.
+""" -from datatypes.epc.domain.epc_property_data import ( - EpcPropertyData, - EnergyElement, - MainHeatingDetail, - SapBuildingPart, - SapFloorDimension, - SapFlatDetails, - SapWindow, +from infrastructure.postgres.epc_property_table import ( + EpcBuildingPartModel, + EpcEnergyElementModel, + EpcFlatDetailsModel, + EpcFloorDimensionModel, + EpcMainHeatingDetailModel, + EpcPropertyEnergyPerformanceModel, + EpcPropertyModel, + EpcWindowModel, ) - -class EpcPropertyModel(SQLModel, table=True): - __tablename__ = "epc_property" - - id: Optional[int] = Field(default=None, primary_key=True) - property_id: Optional[int] = Field(default=None) - portfolio_id: Optional[int] = Field(default=None) - uploaded_file_id: Optional[int] = Field(default=None) - - # Identity / admin - uprn: Optional[int] = Field(default=None) - uprn_source: Optional[str] = Field(default=None) - report_reference: Optional[str] = Field(default=None) - report_type: Optional[str] = Field(default=None) - assessment_type: Optional[str] = Field(default=None) - sap_version: Optional[float] = Field(default=None) - schema_type: Optional[str] = Field(default=None) - schema_versions_original: Optional[str] = Field(default=None) - status: Optional[str] = Field(default=None) - calculation_software_version: Optional[str] = Field(default=None) - - # Address - address_line_1: Optional[str] = Field(default=None) - address_line_2: Optional[str] = Field(default=None) - post_town: Optional[str] = Field(default=None) - postcode: Optional[str] = Field(default=None) - region_code: Optional[str] = Field(default=None) - country_code: Optional[str] = Field(default=None) - language_code: Optional[str] = Field(default=None) - - # Property description - dwelling_type: str - property_type: Optional[str] = Field(default=None) - built_form: Optional[str] = Field(default=None) - tenure: str - transaction_type: str - inspection_date: str # store as ISO string; cast on read if needed - completion_date: Optional[str] = 
Field(default=None) - registration_date: Optional[str] = Field(default=None) - total_floor_area_m2: float - measurement_type: Optional[int] = Field(default=None) - - # Flags - solar_water_heating: bool - has_hot_water_cylinder: bool - has_fixed_air_conditioning: bool - has_conservatory: Optional[bool] = Field(default=None) - has_heated_separate_conservatory: Optional[bool] = Field(default=None) - conservatory_type: Optional[int] = Field(default=None) - - # Counts - door_count: int - wet_rooms_count: int - extensions_count: int - heated_rooms_count: int - open_chimneys_count: int - habitable_rooms_count: int - insulated_door_count: int - cfl_fixed_lighting_bulbs_count: int - led_fixed_lighting_bulbs_count: int - incandescent_fixed_lighting_bulbs_count: int - blocked_chimneys_count: Optional[int] = Field(default=None) - draughtproofed_door_count: Optional[int] = Field(default=None) - energy_rating_average: Optional[int] = Field(default=None) - low_energy_fixed_lighting_bulbs_count: Optional[int] = Field(default=None) - fixed_lighting_outlets_count: Optional[int] = Field(default=None) - low_energy_fixed_lighting_outlets_count: Optional[int] = Field(default=None) - number_of_storeys: Optional[int] = Field(default=None) - any_unheated_rooms: Optional[bool] = Field(default=None) - - # Misc - hydro: Optional[bool] = Field(default=None) - photovoltaic_array: Optional[bool] = Field(default=None) - waste_water_heat_recovery: Optional[str] = Field(default=None) - pressure_test: Optional[int] = Field(default=None) - pressure_test_certificate_number: Optional[int] = Field(default=None) - percent_draughtproofed: Optional[int] = Field(default=None) - insulated_door_u_value: Optional[float] = Field(default=None) - multiple_glazed_proportion: Optional[int] = Field(default=None) - windows_transmission_u_value: Optional[float] = Field(default=None) - windows_transmission_data_source: Optional[int] = Field(default=None) - windows_transmission_solar_transmittance: Optional[float] = 
Field(default=None) - - # Energy source - energy_mains_gas: bool - energy_meter_type: str - energy_pv_battery_count: int - energy_wind_turbines_count: int - energy_gas_smart_meter_present: bool - energy_is_dwelling_export_capable: bool - energy_wind_turbines_terrain_type: str - energy_electricity_smart_meter_present: bool - energy_pv_connection: Optional[str] = Field(default=None) - energy_pv_percent_roof_area: Optional[int] = Field(default=None) - energy_pv_battery_capacity: Optional[float] = Field(default=None) - energy_wind_turbine_hub_height: Optional[float] = Field(default=None) - energy_wind_turbine_rotor_diameter: Optional[float] = Field(default=None) - - # Heating config - heating_cylinder_size: Optional[str] = Field(default=None) - heating_water_heating_code: Optional[int] = Field(default=None) - heating_water_heating_fuel: Optional[int] = Field(default=None) - heating_immersion_heating_type: Optional[str] = Field(default=None) - heating_cylinder_insulation_type: Optional[str] = Field(default=None) - heating_cylinder_thermostat: Optional[str] = Field(default=None) - heating_secondary_fuel_type: Optional[int] = Field(default=None) - heating_secondary_heating_type: Optional[str] = Field(default=None) - heating_cylinder_insulation_thickness_mm: Optional[int] = Field(default=None) - heating_wwhrs_index_number_1: Optional[int] = Field(default=None) - heating_wwhrs_index_number_2: Optional[int] = Field(default=None) - heating_shower_outlet_type: Optional[str] = Field(default=None) - heating_shower_wwhrs: Optional[int] = Field(default=None) - - # Ventilation - ventilation_type: Optional[str] = Field(default=None) - ventilation_draught_lobby: Optional[bool] = Field(default=None) - ventilation_pressure_test: Optional[str] = Field(default=None) - ventilation_open_flues_count: Optional[int] = Field(default=None) - ventilation_closed_flues_count: Optional[int] = Field(default=None) - ventilation_boiler_flues_count: Optional[int] = Field(default=None) - 
ventilation_other_flues_count: Optional[int] = Field(default=None) - ventilation_extract_fans_count: Optional[int] = Field(default=None) - ventilation_passive_vents_count: Optional[int] = Field(default=None) - ventilation_flueless_gas_fires_count: Optional[int] = Field(default=None) - ventilation_in_pcdf_database: Optional[bool] = Field(default=None) - mechanical_ventilation: Optional[int] = Field(default=None) - mechanical_vent_duct_type: Optional[int] = Field(default=None) - mechanical_vent_duct_placement: Optional[int] = Field(default=None) - mechanical_vent_duct_insulation: Optional[int] = Field(default=None) - mechanical_ventilation_index_number: Optional[int] = Field(default=None) - mechanical_vent_measured_installation: Optional[str] = Field(default=None) - - @classmethod - def from_epc_property_data( - cls, - data: EpcPropertyData, - property_id: Optional[int] = None, - portfolio_id: Optional[int] = None, - ) -> EpcPropertyModel: - es = data.sap_energy_source - h = data.sap_heating - v = data.sap_ventilation - shower = h.shower_outlets.shower_outlet if h.shower_outlets else None - pv = es.photovoltaic_supply - wt = es.wind_turbine_details - pvb = es.pv_batteries - - return cls( - property_id=property_id, - portfolio_id=portfolio_id, - uprn=data.uprn, - uprn_source=data.uprn_source, - report_reference=data.report_reference, - report_type=data.report_type, - assessment_type=data.assessment_type, - sap_version=data.sap_version, - schema_type=data.schema_type, - schema_versions_original=data.schema_versions_original, - status=data.status, - calculation_software_version=data.calculation_software_version, - address_line_1=data.address_line_1, - address_line_2=data.address_line_2, - post_town=data.post_town, - postcode=data.postcode, - region_code=data.region_code, - country_code=data.country_code, - language_code=data.language_code, - dwelling_type=data.dwelling_type, - property_type=data.property_type, - built_form=data.built_form, - tenure=data.tenure, - 
transaction_type=data.transaction_type, - inspection_date=data.inspection_date.isoformat(), - completion_date=( - data.completion_date.isoformat() if data.completion_date else None - ), - registration_date=( - data.registration_date.isoformat() if data.registration_date else None - ), - total_floor_area_m2=data.total_floor_area_m2, - measurement_type=data.measurement_type, - solar_water_heating=data.solar_water_heating, - has_hot_water_cylinder=data.has_hot_water_cylinder, - has_fixed_air_conditioning=data.has_fixed_air_conditioning, - has_conservatory=data.has_conservatory, - has_heated_separate_conservatory=data.has_heated_separate_conservatory, - conservatory_type=data.conservatory_type, - door_count=data.door_count, - wet_rooms_count=data.wet_rooms_count, - extensions_count=data.extensions_count, - heated_rooms_count=data.heated_rooms_count, - open_chimneys_count=data.open_chimneys_count, - habitable_rooms_count=data.habitable_rooms_count, - insulated_door_count=data.insulated_door_count, - cfl_fixed_lighting_bulbs_count=data.cfl_fixed_lighting_bulbs_count, - led_fixed_lighting_bulbs_count=data.led_fixed_lighting_bulbs_count, - incandescent_fixed_lighting_bulbs_count=data.incandescent_fixed_lighting_bulbs_count, - blocked_chimneys_count=data.blocked_chimneys_count, - draughtproofed_door_count=data.draughtproofed_door_count, - energy_rating_average=data.energy_rating_average, - low_energy_fixed_lighting_bulbs_count=data.low_energy_fixed_lighting_bulbs_count, - fixed_lighting_outlets_count=data.fixed_lighting_outlets_count, - low_energy_fixed_lighting_outlets_count=data.low_energy_fixed_lighting_outlets_count, - number_of_storeys=data.number_of_storeys, - any_unheated_rooms=data.any_unheated_rooms, - hydro=data.hydro, - photovoltaic_array=data.photovoltaic_array, - waste_water_heat_recovery=data.waste_water_heat_recovery, - pressure_test=data.pressure_test, - pressure_test_certificate_number=data.pressure_test_certificate_number, - 
percent_draughtproofed=data.percent_draughtproofed, - insulated_door_u_value=data.insulated_door_u_value, - multiple_glazed_proportion=data.multiple_glazed_proportion, - windows_transmission_u_value=( - data.windows_transmission_details.u_value - if data.windows_transmission_details - else None - ), - windows_transmission_data_source=( - data.windows_transmission_details.data_source - if data.windows_transmission_details - else None - ), - windows_transmission_solar_transmittance=( - data.windows_transmission_details.solar_transmittance - if data.windows_transmission_details - else None - ), - energy_mains_gas=es.mains_gas, - energy_meter_type=str(es.meter_type), - energy_pv_battery_count=es.pv_battery_count, - energy_wind_turbines_count=es.wind_turbines_count, - energy_gas_smart_meter_present=es.gas_smart_meter_present, - energy_is_dwelling_export_capable=es.is_dwelling_export_capable, - energy_wind_turbines_terrain_type=str(es.wind_turbines_terrain_type), - energy_electricity_smart_meter_present=es.electricity_smart_meter_present, - energy_pv_connection=( - str(es.pv_connection) if es.pv_connection is not None else None - ), - energy_pv_percent_roof_area=( - pv.none_or_no_details.percent_roof_area if pv else None - ), - energy_pv_battery_capacity=pvb.pv_battery.battery_capacity if pvb else None, - energy_wind_turbine_hub_height=wt.hub_height if wt else None, - energy_wind_turbine_rotor_diameter=wt.rotor_diameter if wt else None, - heating_cylinder_size=( - str(h.cylinder_size) if h.cylinder_size is not None else None - ), - heating_water_heating_code=h.water_heating_code, - heating_water_heating_fuel=h.water_heating_fuel, - heating_immersion_heating_type=( - str(h.immersion_heating_type) - if h.immersion_heating_type is not None - else None - ), - heating_cylinder_insulation_type=( - str(h.cylinder_insulation_type) - if h.cylinder_insulation_type is not None - else None - ), - heating_cylinder_thermostat=h.cylinder_thermostat, - 
heating_secondary_fuel_type=h.secondary_fuel_type, - heating_secondary_heating_type=( - str(h.secondary_heating_type) - if h.secondary_heating_type is not None - else None - ), - heating_cylinder_insulation_thickness_mm=h.cylinder_insulation_thickness_mm, - heating_wwhrs_index_number_1=h.instantaneous_wwhrs.wwhrs_index_number1, - heating_wwhrs_index_number_2=h.instantaneous_wwhrs.wwhrs_index_number2, - heating_shower_outlet_type=( - str(shower.shower_outlet_type) if shower else None - ), - heating_shower_wwhrs=shower.shower_wwhrs if shower else None, - ventilation_type=v.ventilation_type if v else None, - ventilation_draught_lobby=v.draught_lobby if v else None, - ventilation_pressure_test=v.pressure_test if v else None, - ventilation_open_flues_count=v.open_flues_count if v else None, - ventilation_closed_flues_count=v.closed_flues_count if v else None, - ventilation_boiler_flues_count=v.boiler_flues_count if v else None, - ventilation_other_flues_count=v.other_flues_count if v else None, - ventilation_extract_fans_count=v.extract_fans_count if v else None, - ventilation_passive_vents_count=v.passive_vents_count if v else None, - ventilation_flueless_gas_fires_count=( - v.flueless_gas_fires_count if v else None - ), - ventilation_in_pcdf_database=v.ventilation_in_pcdf_database if v else None, - mechanical_ventilation=data.mechanical_ventilation, - mechanical_vent_duct_type=data.mechanical_vent_duct_type, - mechanical_vent_duct_placement=data.mechanical_vent_duct_placement, - mechanical_vent_duct_insulation=data.mechanical_vent_duct_insulation, - mechanical_ventilation_index_number=data.mechanical_ventilation_index_number, - mechanical_vent_measured_installation=data.mechanical_vent_measured_installation, - ) - - -class EpcPropertyEnergyPerformanceModel(SQLModel, table=True): - __tablename__ = "epc_property_energy_performance" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field( - foreign_key="epc_property.id", 
nullable=False, unique=True - ) - - energy_rating_current: Optional[int] = Field(default=None) - energy_consumption_current: Optional[int] = Field(default=None) - environmental_impact_current: Optional[int] = Field(default=None) - heating_cost_current: Optional[float] = Field(default=None) - lighting_cost_current: Optional[float] = Field(default=None) - hot_water_cost_current: Optional[float] = Field(default=None) - co2_emissions_current: Optional[float] = Field(default=None) - co2_emissions_current_per_floor_area: Optional[int] = Field(default=None) - current_energy_efficiency_band: Optional[str] = Field(default=None) - energy_rating_potential: Optional[float] = Field(default=None) - energy_consumption_potential: Optional[int] = Field(default=None) - environmental_impact_potential: Optional[int] = Field(default=None) - heating_cost_potential: Optional[float] = Field(default=None) - lighting_cost_potential: Optional[float] = Field(default=None) - hot_water_cost_potential: Optional[float] = Field(default=None) - co2_emissions_potential: Optional[float] = Field(default=None) - potential_energy_efficiency_band: Optional[str] = Field(default=None) - - @classmethod - def from_epc_property_data( - cls, data: EpcPropertyData, epc_property_id: int - ) -> EpcPropertyEnergyPerformanceModel: - return cls( - epc_property_id=epc_property_id, - energy_rating_current=data.energy_rating_current, - energy_consumption_current=data.energy_consumption_current, - environmental_impact_current=data.environmental_impact_current, - heating_cost_current=data.heating_cost_current, - lighting_cost_current=data.lighting_cost_current, - hot_water_cost_current=data.hot_water_cost_current, - co2_emissions_current=data.co2_emissions_current, - co2_emissions_current_per_floor_area=data.co2_emissions_current_per_floor_area, - current_energy_efficiency_band=( - data.current_energy_efficiency_band.value - if data.current_energy_efficiency_band - else None - ), - 
energy_rating_potential=data.energy_rating_potential, - energy_consumption_potential=data.energy_consumption_potential, - environmental_impact_potential=data.environmental_impact_potential, - heating_cost_potential=data.heating_cost_potential, - lighting_cost_potential=data.lighting_cost_potential, - hot_water_cost_potential=data.hot_water_cost_potential, - co2_emissions_potential=data.co2_emissions_potential, - potential_energy_efficiency_band=( - data.potential_energy_efficiency_band.value - if data.potential_energy_efficiency_band - else None - ), - ) - - -class EpcFlatDetailsModel(SQLModel, table=True): - __tablename__ = "epc_flat_details" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field( - foreign_key="epc_property.id", nullable=False, unique=True - ) - - level: int - top_storey: str - flat_location: int - heat_loss_corridor: int - storey_count: Optional[int] = Field(default=None) - unheated_corridor_length_m: Optional[int] = Field(default=None) - - @classmethod - def from_domain( - cls, flat: SapFlatDetails, epc_property_id: int - ) -> EpcFlatDetailsModel: - return cls( - epc_property_id=epc_property_id, - level=flat.level, - top_storey=flat.top_storey, - flat_location=flat.flat_location, - heat_loss_corridor=flat.heat_loss_corridor, - storey_count=flat.storey_count, - unheated_corridor_length_m=flat.unheated_corridor_length_m, - ) - - -class EpcMainHeatingDetailModel(SQLModel, table=True): - __tablename__ = "epc_main_heating_detail" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False) - - has_fghrs: bool - main_fuel_type: str - heat_emitter_type: str - emitter_temperature: str - main_heating_control: str - fan_flue_present: Optional[bool] = Field(default=None) - boiler_flue_type: Optional[int] = Field(default=None) - boiler_ignition_type: Optional[int] = Field(default=None) - central_heating_pump_age: Optional[int] = 
Field(default=None) - central_heating_pump_age_str: Optional[str] = Field(default=None) - main_heating_index_number: Optional[int] = Field(default=None) - sap_main_heating_code: Optional[int] = Field(default=None) - main_heating_number: Optional[int] = Field(default=None) - main_heating_category: Optional[int] = Field(default=None) - main_heating_fraction: Optional[int] = Field(default=None) - main_heating_data_source: Optional[int] = Field(default=None) - condensing: Optional[bool] = Field(default=None) - weather_compensator: Optional[bool] = Field(default=None) - - @classmethod - def from_domain( - cls, detail: MainHeatingDetail, epc_property_id: int - ) -> EpcMainHeatingDetailModel: - return cls( - epc_property_id=epc_property_id, - has_fghrs=detail.has_fghrs, - main_fuel_type=str(detail.main_fuel_type), - heat_emitter_type=str(detail.heat_emitter_type), - emitter_temperature=str(detail.emitter_temperature), - main_heating_control=str(detail.main_heating_control), - fan_flue_present=detail.fan_flue_present, - boiler_flue_type=detail.boiler_flue_type, - boiler_ignition_type=detail.boiler_ignition_type, - central_heating_pump_age=detail.central_heating_pump_age, - central_heating_pump_age_str=detail.central_heating_pump_age_str, - main_heating_index_number=detail.main_heating_index_number, - sap_main_heating_code=detail.sap_main_heating_code, - main_heating_number=detail.main_heating_number, - main_heating_category=detail.main_heating_category, - main_heating_fraction=detail.main_heating_fraction, - main_heating_data_source=detail.main_heating_data_source, - condensing=detail.condensing, - weather_compensator=detail.weather_compensator, - ) - - -class EpcBuildingPartModel(SQLModel, table=True): - __tablename__ = "epc_building_part" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False) - - identifier: str - construction_age_band: str - wall_construction: str - 
wall_insulation_type: str - wall_thickness_measured: bool - party_wall_construction: str - building_part_number: Optional[int] = Field(default=None) - wall_dry_lined: Optional[bool] = Field(default=None) - wall_thickness_mm: Optional[int] = Field(default=None) - wall_insulation_thickness: Optional[str] = Field(default=None) - floor_heat_loss: Optional[int] = Field(default=None) - floor_insulation_thickness: Optional[str] = Field(default=None) - flat_roof_insulation_thickness: Optional[str] = Field(default=None) - floor_type: Optional[str] = Field(default=None) - floor_construction_type: Optional[str] = Field(default=None) - floor_insulation_type_str: Optional[str] = Field(default=None) - floor_u_value_known: Optional[bool] = Field(default=None) - roof_construction: Optional[int] = Field(default=None) - roof_insulation_location: Optional[str] = Field(default=None) - roof_insulation_thickness: Optional[str] = Field(default=None) - room_in_roof_floor_area: Optional[float] = Field(default=None) - room_in_roof_construction_age_band: Optional[str] = Field(default=None) - alt_wall_1_area: Optional[float] = Field(default=None) - alt_wall_1_dry_lined: Optional[str] = Field(default=None) - alt_wall_1_construction: Optional[int] = Field(default=None) - alt_wall_1_insulation_type: Optional[int] = Field(default=None) - alt_wall_1_thickness_measured: Optional[str] = Field(default=None) - alt_wall_1_insulation_thickness: Optional[str] = Field(default=None) - alt_wall_2_area: Optional[float] = Field(default=None) - alt_wall_2_dry_lined: Optional[str] = Field(default=None) - alt_wall_2_construction: Optional[int] = Field(default=None) - alt_wall_2_insulation_type: Optional[int] = Field(default=None) - alt_wall_2_thickness_measured: Optional[str] = Field(default=None) - alt_wall_2_insulation_thickness: Optional[str] = Field(default=None) - - @classmethod - def from_domain( - cls, part: SapBuildingPart, epc_property_id: int - ) -> EpcBuildingPartModel: - rir = part.sap_room_in_roof - 
aw1 = part.sap_alternative_wall_1 - aw2 = part.sap_alternative_wall_2 - return cls( - epc_property_id=epc_property_id, - identifier=part.identifier.value, - construction_age_band=part.construction_age_band, - wall_construction=str(part.wall_construction), - wall_insulation_type=str(part.wall_insulation_type), - wall_thickness_measured=part.wall_thickness_measured, - party_wall_construction=str(part.party_wall_construction), - building_part_number=part.building_part_number, - wall_dry_lined=part.wall_dry_lined, - wall_thickness_mm=part.wall_thickness_mm, - wall_insulation_thickness=part.wall_insulation_thickness, - floor_heat_loss=part.floor_heat_loss, - floor_insulation_thickness=part.floor_insulation_thickness, - flat_roof_insulation_thickness=( - str(part.flat_roof_insulation_thickness) - if part.flat_roof_insulation_thickness is not None - else None - ), - floor_type=part.floor_type, - floor_construction_type=part.floor_construction_type, - floor_insulation_type_str=part.floor_insulation_type_str, - floor_u_value_known=part.floor_u_value_known, - roof_construction=part.roof_construction, - roof_insulation_location=( - str(part.roof_insulation_location) - if part.roof_insulation_location is not None - else None - ), - roof_insulation_thickness=( - str(part.roof_insulation_thickness) - if part.roof_insulation_thickness is not None - else None - ), - room_in_roof_floor_area=float(rir.floor_area) if rir else None, - room_in_roof_construction_age_band=( - rir.construction_age_band if rir else None - ), - alt_wall_1_area=aw1.wall_area if aw1 else None, - alt_wall_1_dry_lined=aw1.wall_dry_lined if aw1 else None, - alt_wall_1_construction=aw1.wall_construction if aw1 else None, - alt_wall_1_insulation_type=aw1.wall_insulation_type if aw1 else None, - alt_wall_1_thickness_measured=aw1.wall_thickness_measured if aw1 else None, - alt_wall_1_insulation_thickness=( - aw1.wall_insulation_thickness if aw1 else None - ), - alt_wall_2_area=aw2.wall_area if aw2 else None, - 
alt_wall_2_dry_lined=aw2.wall_dry_lined if aw2 else None, - alt_wall_2_construction=aw2.wall_construction if aw2 else None, - alt_wall_2_insulation_type=aw2.wall_insulation_type if aw2 else None, - alt_wall_2_thickness_measured=aw2.wall_thickness_measured if aw2 else None, - alt_wall_2_insulation_thickness=( - aw2.wall_insulation_thickness if aw2 else None - ), - ) - - -class EpcFloorDimensionModel(SQLModel, table=True): - __tablename__ = "epc_floor_dimension" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_building_part_id: int = Field( - foreign_key="epc_building_part.id", nullable=False - ) - - floor: Optional[int] = Field(default=None) - room_height_m: float - total_floor_area_m2: float - party_wall_length_m: float - heat_loss_perimeter_m: float - floor_insulation: Optional[int] = Field(default=None) - floor_construction: Optional[int] = Field(default=None) - - @classmethod - def from_domain( - cls, dim: SapFloorDimension, epc_building_part_id: int - ) -> EpcFloorDimensionModel: - return cls( - epc_building_part_id=epc_building_part_id, - floor=dim.floor, - room_height_m=dim.room_height_m, - total_floor_area_m2=dim.total_floor_area_m2, - party_wall_length_m=dim.party_wall_length_m, - heat_loss_perimeter_m=dim.heat_loss_perimeter_m, - floor_insulation=dim.floor_insulation, - floor_construction=dim.floor_construction, - ) - - -class EpcWindowModel(SQLModel, table=True): - __tablename__ = "epc_window" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False) - - frame_material: Optional[str] = Field(default=None) - glazing_gap: str - orientation: str - window_type: str - glazing_type: str - window_width: float - window_height: float - draught_proofed: bool - window_location: str - window_wall_type: str - permanent_shutters_present: bool - frame_factor: Optional[float] = Field(default=None) - permanent_shutters_insulated: Optional[str] = Field(default=None) - 
transmission_u_value: Optional[float] = Field(default=None) - transmission_data_source: Optional[str] = Field(default=None) - transmission_solar_transmittance: Optional[float] = Field(default=None) - - @classmethod - def from_domain(cls, window: SapWindow, epc_property_id: int) -> EpcWindowModel: - td = window.window_transmission_details - return cls( - epc_property_id=epc_property_id, - frame_material=window.frame_material, - glazing_gap=str(window.glazing_gap), - orientation=str(window.orientation), - window_type=str(window.window_type), - glazing_type=str(window.glazing_type), - window_width=window.window_width, - window_height=window.window_height, - draught_proofed=bool(window.draught_proofed), - window_location=str(window.window_location), - window_wall_type=str(window.window_wall_type), - permanent_shutters_present=bool(window.permanent_shutters_present), - frame_factor=window.frame_factor, - permanent_shutters_insulated=window.permanent_shutters_insulated, - transmission_u_value=td.u_value if td else None, - transmission_data_source=td.data_source if td else None, - transmission_solar_transmittance=td.solar_transmittance if td else None, - ) - - -class EpcEnergyElementModel(SQLModel, table=True): - __tablename__ = "epc_energy_element" - - id: Optional[int] = Field(default=None, primary_key=True) - epc_property_id: int = Field(foreign_key="epc_property.id", nullable=False) - - element_type: str # roof | wall | floor | main_heating | window | lighting | hot_water | secondary_heating | main_heating_controls - description: str - energy_efficiency_rating: int - environmental_efficiency_rating: int - - @classmethod - def from_domain( - cls, element: EnergyElement, element_type: str, epc_property_id: int - ) -> EpcEnergyElementModel: - return cls( - epc_property_id=epc_property_id, - element_type=element_type, - description=element.description, - energy_efficiency_rating=element.energy_efficiency_rating, - 
environmental_efficiency_rating=element.environmental_efficiency_rating, - ) +__all__ = [ + "EpcBuildingPartModel", + "EpcEnergyElementModel", + "EpcFlatDetailsModel", + "EpcFloorDimensionModel", + "EpcMainHeatingDetailModel", + "EpcPropertyEnergyPerformanceModel", + "EpcPropertyModel", + "EpcWindowModel", +] diff --git a/docs/migrations/epc-property-round-trip-fidelity.md b/docs/migrations/epc-property-round-trip-fidelity.md new file mode 100644 index 00000000..e7e23c02 --- /dev/null +++ b/docs/migrations/epc-property-round-trip-fidelity.md @@ -0,0 +1,167 @@ +# EPC persistence schema gaps — migrations for round-trip fidelity + +**Context:** Slice 1 (Hestia-Homes/Model#1129) of the `ara_first_run` rebuild. The round-trip +fidelity test (`EpcPropertyData → epc_property tables → reload → EpcPropertyData`, deep-equality) +surfaced that the current `epc_property` schema stores only a **partial, partly type-lossy +projection** of the `EpcPropertyData` domain object. This document lists every gap and the +migration needed to close it, so the schema (FE-owned for some tables) can be updated. + +We can make the column/table changes on the **SQLModel definitions** in +`infrastructure/postgres/epc_property_table.py` directly — tests build their schema from those +models via `SQLModel.metadata.create_all`, so they don't need the live DB. The live migrations +listed here are what must be applied wherever the physical tables are owned. + +**`epc_cache` relationship:** the raw gov-API JSON response is retained in the `epc_cache` table, +so the *source* is always recoverable even where the structured `epc_property` projection is +lossy. That makes these gaps "the structured store is incomplete" rather than "data is lost +forever" — but the modelling pipeline reads the structured `epc_property`, not the raw cache, so +the gaps below still block faithful modelling and must be closed. + +Priority key: **P0** modelling needs it now · **P1** needed soon · **P2** completeness. 
+ +--- + +## Status after Slice 1 (#1129) + +The round-trip test passes over the persisted projection for RdSAP-Schema-21.0.0 and 21.0.1. +The following were **applied on the SQLModel** (`infrastructure/postgres/epc_property_table.py`) +and **still require the matching DB migration** wherever the physical tables live: + +- **§1 JSONB** — all `Union` code columns converted (`epc_property`: `heating_cylinder_size`, + `heating_immersion_heating_type`, `heating_cylinder_insulation_type`, + `heating_secondary_heating_type`, `heating_shower_outlet_type`, `energy_pv_connection`; + `epc_main_heating_detail`: `main_fuel_type`, `heat_emitter_type`, `emitter_temperature`, + `main_heating_control`; `epc_building_part`: `wall_construction`, `wall_insulation_type`, + `party_wall_construction`, `flat_roof_insulation_thickness`, `roof_insulation_location`, + `roof_insulation_thickness`; `epc_window`: `glazing_gap`, `orientation`, `window_type`, + `glazing_type`, `window_location`, `window_wall_type`, `draught_proofed`, + `permanent_shutters_present`, `transmission_data_source`). +- **New scalar columns** — `epc_property`: `heating_number_baths`, `heating_number_baths_wwhrs`, + `heating_electric_shower_count`, `heating_mixer_shower_count`, + `mechanical_vent_duct_insulation_level`, `addendum_stone_walls`, `addendum_system_build`, + `addendum_numbers` (JSONB), `ventilation_present`, `ventilation_sheltered_sides`, + `ventilation_has_suspended_timber_floor`, `ventilation_suspended_timber_floor_sealed`, + `ventilation_has_draught_lobby`, `ventilation_air_permeability_ap4_m3_h_m2`, + `ventilation_mechanical_ventilation_kind`; `epc_building_part`: `roof_construction_type`, + `curtain_wall_age`. 
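The motivation for the §1 JSONB conversion can be sketched with the standard library alone. This is an illustration, not the repository's mapper: `store_as_text` and `store_as_json` are hypothetical helpers standing in for a `TEXT` column written through `str(...)` and a JSONB column respectively, and it assumes the database driver serialises JSONB values the way `json.dumps`/`json.loads` do.

```python
import json


def store_as_text(code):
    """Hypothetical stand-in for a TEXT column written via str(...)."""
    return str(code)


def store_as_json(code):
    """Hypothetical stand-in for a JSONB column: dump to a JSON scalar, load back."""
    return json.loads(json.dumps(code))


# An API int and a Site Notes str that look identical once coerced to text.
api_code = 26
site_notes_code = "26"

# TEXT storage collapses both to the same string, so the int cannot be recovered.
text_collision = store_as_text(api_code) == store_as_text(site_notes_code)

# JSON scalars keep int, str, bool and null distinct on the way back.
json_results = [store_as_json(v) for v in (api_code, site_notes_code, True, None)]
json_types = [type(v) for v in json_results]
```

Under those assumptions the text path makes `26` and `"26"` indistinguishable, while the JSON path returns each value with its original Python type, which is what an int-keyed dispatch needs.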
+ +**Still open (follow-up issues):** §2.1 `epc_renewable_heat_incentive` (P0, #1137 — excluded from the +slice-1 assertion via `dataclasses.replace(..., renewable_heat_incentive=None)` until landed), and +the remaining §2 structural tables (room-in-roof detail, PV arrays, roof windows) + §3 nested-wall +fields (`SapAlternativeWall.u_value`/`wall_thickness_mm`) + `SapFloorDimension` exposed-floor flags. + +--- + +## 1. Type fidelity — convert `Union[int, str]` code columns to JSONB + +These columns hold SAP/RdSAP categorical codes that are **`int` from the gov API** and **`str` +from Site Notes** (`Union[int, str]` in the domain). The forward mapper currently coerces them +with `str(...)` (and `bool(...)` for two window flags), so an API `int` of `26` is stored as +`"26"` and cannot be recovered. Convert each to **JSONB** and drop the `str()`/`bool()` coercion +in the forward mapper so the Python type round-trips exactly (JSON scalars preserve `int` vs +`str` vs `bool` vs `null`). **P0** — these feed the SAP10 calculator's int-keyed dispatch. + +| Table | Columns | +|---|---| +| `epc_property` | `heating_cylinder_size`, `heating_immersion_heating_type`, `heating_cylinder_insulation_type`, `heating_secondary_heating_type`, `heating_shower_outlet_type`, `energy_pv_connection` | +| `epc_main_heating_detail` | `main_fuel_type`, `heat_emitter_type`, `emitter_temperature`, `main_heating_control` | +| `epc_building_part` | `wall_construction`, `wall_insulation_type`, `party_wall_construction`, `flat_roof_insulation_thickness`, `roof_insulation_location`, `roof_insulation_thickness` | +| `epc_window` | `glazing_gap`, `orientation`, `window_type`, `glazing_type`, `window_location`, `window_wall_type`, `draught_proofed`, `permanent_shutters_present` | + +(`energy_meter_type` and `energy_wind_turbines_terrain_type` are `str` in the domain — leave as +`TEXT`.) + +--- + +## 2. 
Not stored at all — new tables + +### 2.1 `epc_renewable_heat_incentive` — **P0** +Maps `EpcPropertyData.renewable_heat_incentive` (`RenewableHeatIncentive`). Carries the **baseline +space-heating and hot-water kWh** that EPC Energy Derivation consumes — the single most important +gap. One row per `epc_property`. + +| Column | Type | Source | +|---|---|---| +| `epc_property_id` | FK → `epc_property.id`, unique | | +| `space_heating_kwh` | float | `space_heating_kwh` | +| `water_heating_kwh` | float | `water_heating_kwh` | +| `impact_of_loft_insulation_kwh` | float, null | `impact_of_loft_insulation_kwh` | +| `impact_of_cavity_insulation_kwh` | float, null | `impact_of_cavity_insulation_kwh` | +| `impact_of_solid_wall_insulation_kwh` | float, null | `impact_of_solid_wall_insulation_kwh` | + +### 2.2 `epc_room_in_roof` (+ `epc_room_in_roof_surface`) — **P1** +`SapBuildingPart.sap_room_in_roof` (`SapRoomInRoof`) is currently flattened to just +`room_in_roof_floor_area` + `room_in_roof_construction_age_band` on `epc_building_part`, dropping +the Type-2 geometry and the Detailed-measurement surfaces. Replace with a child table of +`epc_building_part`: + +`epc_room_in_roof`: `epc_building_part_id` (FK, unique), `floor_area`, `construction_age_band`, +`common_wall_length_m`, `common_wall_height_m`, `gable_1_length_m`, `gable_1_height_m`, +`gable_2_length_m`, `gable_2_height_m`. + +`epc_room_in_roof_surface` (0..n per RIR, from `detailed_surfaces: List[SapRoomInRoofSurface]`): +`epc_room_in_roof_id` (FK), `kind`, `area_m2`, `insulation_thickness_mm` (null), +`insulation_type` (null), `u_value` (null). + +### 2.3 `epc_photovoltaic_array` — **P1** +`SapEnergySource.photovoltaic_arrays: List[PhotovoltaicArray]` (measured PV) is not stored at all +— only the `percent_roof_area` fallback is. One row per array: `epc_property_id` (FK), +`peak_power`, `pitch`, `orientation`, `overshading`. 
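The "one row per array" child-table shape described for `epc_photovoltaic_array` can be sketched with an in-memory SQLite database. This is a hedged illustration only: the document does not give column types, so the four value columns are deliberately left untyped, the sample values are made up, and the real table would be a SQLModel class on Postgres rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Minimal parent table; the real epc_property has many more columns.
conn.execute("CREATE TABLE epc_property (id INTEGER PRIMARY KEY)")

# Child table: column names follow section 2.3, types are intentionally omitted
# because the source does not state them.
conn.execute(
    """
    CREATE TABLE epc_photovoltaic_array (
        id INTEGER PRIMARY KEY,
        epc_property_id INTEGER NOT NULL REFERENCES epc_property (id),
        peak_power,
        pitch,
        orientation,
        overshading
    )
    """
)

conn.execute("INSERT INTO epc_property (id) VALUES (1)")

# Two made-up arrays belonging to the same property.
arrays = [(2.5, 30, 5, 1), (1.2, 45, 3, 2)]
conn.executemany(
    "INSERT INTO epc_photovoltaic_array "
    "(epc_property_id, peak_power, pitch, orientation, overshading) "
    "VALUES (1, ?, ?, ?, ?)",
    arrays,
)

# Reload in insertion order, mirroring the repository's order_by(id) pattern.
reloaded = conn.execute(
    "SELECT peak_power, pitch, orientation, overshading "
    "FROM epc_photovoltaic_array WHERE epc_property_id = 1 ORDER BY id"
).fetchall()
```

Ordering by the surrogate `id` on reload is what keeps a list-valued field such as `photovoltaic_arrays` in its saved order, which a deep-equality round-trip test would depend on.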
+ +### 2.4 `epc_roof_window` — **P2** +`EpcPropertyData.sap_roof_windows: List[SapRoofWindow]` not stored. One row per roof window: +`epc_property_id` (FK), `area_m2`, `u_value_raw`, `orientation`, `pitch_deg`, `g_perpendicular`, +`frame_factor`. + +--- + +## 3. Not stored at all — new columns + +### 3.1 `epc_property` additions +| Column | Type | Source | Pri | +|---|---|---|---| +| `addendum_stone_walls` | bool, null | `addendum.stone_walls` | P2 | +| `addendum_system_build` | bool, null | `addendum.system_build` | P2 | +| `addendum_numbers` | JSONB, null | `addendum.addendum_numbers` (`List[int]`) | P2 | +| `lzc_energy_sources` | JSONB, null | `lzc_energy_sources` (`List[int]`) | P2 | +| `solar_hw_collector_orientation` | text, null | `solar_hw_collector_orientation` | P1 | +| `solar_hw_collector_pitch_deg` | int, null | `solar_hw_collector_pitch_deg` | P1 | +| `solar_hw_overshading` | text, null | `solar_hw_overshading` | P1 | +| `extract_fans_count` | int, null | top-level `extract_fans_count` (distinct from the `ventilation_*` one) | P2 | +| `mechanical_vent_duct_insulation_level` | int, null | `mechanical_vent_duct_insulation_level` | P2 | + +### 3.2 `epc_building_part` additions +| Column | Type | Source | Pri | +|---|---|---|---| +| `roof_construction_type` | text, null | `roof_construction_type` (Site-Notes str) | P1 | +| `curtain_wall_age` | text, null | `curtain_wall_age` (RdSAP §5.18) | P1 | +| `alt_wall_1_u_value` | float, null | `sap_alternative_wall_1.u_value` | P1 | +| `alt_wall_1_thickness_mm` | int, null | `sap_alternative_wall_1.wall_thickness_mm` | P1 | +| `alt_wall_2_u_value` | float, null | `sap_alternative_wall_2.u_value` | P1 | +| `alt_wall_2_thickness_mm` | int, null | `sap_alternative_wall_2.wall_thickness_mm` | P1 | + +### 3.3 `epc_floor_dimension` additions +| Column | Type | Source | Pri | +|---|---|---|---| +| `is_exposed_floor` | bool, default false | `SapFloorDimension.is_exposed_floor` | P1 | +| `is_above_partially_heated_space` | 
bool, default false | `SapFloorDimension.is_above_partially_heated_space` | P1 | + +--- + +## 4. Mapper-only gaps (no schema change required) + +The table can already hold these; the **save mapper** simply doesn't write them. Fix in the +forward mapper, not the DB: + +- **`air_tightness`** (`EnergyElement`) — `epc_energy_element.element_type` is a free string, so add + an `"air_tightness"` element type to the save loop. **P1.** + +--- + +## 5. Scope note + +Slice 1 (#1129) asserts faithful round-trip over the **projection the schema is meant to store**, +after applying §1 (JSONB) and the straightforward §3/§4 additions on the SQLModel. The structural +new tables in §2 (RHI, room-in-roof, PV arrays, roof windows) are tracked as their own follow-up +issues — `epc_renewable_heat_incentive` (§2.1) first, as it unblocks EPC Energy Derivation. Each +gap above should become a checkbox on the relevant issue so nothing is silently dropped. diff --git a/repositories/epc/__init__.py b/repositories/epc/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/epc/epc_postgres_repository.py b/repositories/epc/epc_postgres_repository.py new file mode 100644 index 00000000..02dc49b9 --- /dev/null +++ b/repositories/epc/epc_postgres_repository.py @@ -0,0 +1,608 @@ +from __future__ import annotations + +from datetime import date +from typing import Optional, TypeVar + +from sqlmodel import Session, select + +from datatypes.epc.domain.epc import Epc +from datatypes.epc.domain.epc_property_data import ( + Addendum, + BuildingPartIdentifier, + EnergyElement, + EpcPropertyData, + InstantaneousWwhrs, + MainHeatingDetail, + PhotovoltaicSupply, + PhotovoltaicSupplyNoneOrNoDetails, + PvBatteries, + PvBattery, + SapAlternativeWall, + SapBuildingPart, + SapEnergySource, + SapFlatDetails, + SapFloorDimension, + SapHeating, + SapRoomInRoof, + SapVentilation, + SapWindow, + ShowerOutlet, + ShowerOutlets, + WindowsTransmissionDetails, + WindowTransmissionDetails, + 
WindTurbineDetails, +) +from infrastructure.postgres.epc_property_table import ( + EpcBuildingPartModel, + EpcEnergyElementModel, + EpcFlatDetailsModel, + EpcFloorDimensionModel, + EpcMainHeatingDetailModel, + EpcPropertyEnergyPerformanceModel, + EpcPropertyModel, + EpcWindowModel, +) +from repositories.epc.epc_repository import EpcRepository +from utilities.private import private + +_T = TypeVar("_T") + + +def _require(value: Optional[_T], field: str) -> _T: + if value is None: + raise ValueError(f"epc_property row is missing required field {field!r}") + return value + + +class EpcPostgresRepository(EpcRepository): + """Maps EpcPropertyData to/from the epc_property parent row + child tables. + + Round-trip fidelity over the persisted projection is pinned by the Slice-1 + round-trip test (Hestia-Homes/Model#1129). Fields the schema does not yet + store (see docs/migrations/epc-property-round-trip-fidelity.md §2) reconstruct + as their dataclass defaults — tracked as follow-up migrations. 
+ """ + + def __init__(self, session: Session) -> None: + self._session = session + + def save( + self, + data: EpcPropertyData, + property_id: Optional[int] = None, + portfolio_id: Optional[int] = None, + ) -> int: + parent = EpcPropertyModel.from_epc_property_data( + data, property_id=property_id, portfolio_id=portfolio_id + ) + self._session.add(parent) + self._session.flush() + epc_property_id = _require(parent.id, "id") + + self._session.add( + EpcPropertyEnergyPerformanceModel.from_epc_property_data( + data, epc_property_id=epc_property_id + ) + ) + for detail in data.sap_heating.main_heating_details: + self._session.add( + EpcMainHeatingDetailModel.from_domain(detail, epc_property_id) + ) + for part in data.sap_building_parts: + bp = EpcBuildingPartModel.from_domain(part, epc_property_id) + self._session.add(bp) + self._session.flush() + bp_id = _require(bp.id, "epc_building_part.id") + for dim in part.sap_floor_dimensions: + self._session.add(EpcFloorDimensionModel.from_domain(dim, bp_id)) + for window in data.sap_windows: + self._session.add(EpcWindowModel.from_domain(window, epc_property_id)) + + for element_type, elements in ( + ("roof", data.roofs), + ("wall", data.walls), + ("floor", data.floors), + ("main_heating", data.main_heating), + ): + for el in elements: + self._session.add( + EpcEnergyElementModel.from_domain(el, element_type, epc_property_id) + ) + for el, element_type in ( + (data.window, "window"), + (data.lighting, "lighting"), + (data.hot_water, "hot_water"), + (data.secondary_heating, "secondary_heating"), + (data.main_heating_controls, "main_heating_controls"), + ): + if el is not None: + self._session.add( + EpcEnergyElementModel.from_domain(el, element_type, epc_property_id) + ) + + if data.sap_flat_details is not None: + self._session.add( + EpcFlatDetailsModel.from_domain(data.sap_flat_details, epc_property_id) + ) + return epc_property_id + + def get(self, epc_property_id: int) -> EpcPropertyData: + p = 
self._session.get(EpcPropertyModel, epc_property_id) + if p is None: + raise ValueError(f"epc_property {epc_property_id} not found") + perf = self._session.exec( + select(EpcPropertyEnergyPerformanceModel).where( + EpcPropertyEnergyPerformanceModel.epc_property_id == epc_property_id + ) + ).first() + elements = list( + self._session.exec( + select(EpcEnergyElementModel) + .where(EpcEnergyElementModel.epc_property_id == epc_property_id) + .order_by(EpcEnergyElementModel.id) # type: ignore[arg-type] + ).all() + ) + heating_rows = list( + self._session.exec( + select(EpcMainHeatingDetailModel) + .where(EpcMainHeatingDetailModel.epc_property_id == epc_property_id) + .order_by(EpcMainHeatingDetailModel.id) # type: ignore[arg-type] + ).all() + ) + part_rows = list( + self._session.exec( + select(EpcBuildingPartModel) + .where(EpcBuildingPartModel.epc_property_id == epc_property_id) + .order_by(EpcBuildingPartModel.id) # type: ignore[arg-type] + ).all() + ) + flat_row = self._session.exec( + select(EpcFlatDetailsModel).where( + EpcFlatDetailsModel.epc_property_id == epc_property_id + ) + ).first() + + def _elements(element_type: str) -> list[EnergyElement]: + return [self._to_energy_element(e) for e in elements if e.element_type == element_type] + + def _single(element_type: str) -> Optional[EnergyElement]: + found = _elements(element_type) + return found[0] if found else None + + return EpcPropertyData( + dwelling_type=p.dwelling_type, + inspection_date=date.fromisoformat(p.inspection_date), + tenure=p.tenure, + transaction_type=p.transaction_type, + address_line_1=_require(p.address_line_1, "address_line_1"), + postcode=_require(p.postcode, "postcode"), + post_town=_require(p.post_town, "post_town"), + roofs=_elements("roof"), + walls=_elements("wall"), + floors=_elements("floor"), + main_heating=_elements("main_heating"), + door_count=p.door_count, + sap_heating=self._to_sap_heating(p, heating_rows), + sap_windows=[self._to_window(w) for w in 
self._windows(epc_property_id)], + sap_energy_source=self._to_energy_source(p), + sap_building_parts=[self._to_building_part(bp) for bp in part_rows], + solar_water_heating=p.solar_water_heating, + has_hot_water_cylinder=p.has_hot_water_cylinder, + has_fixed_air_conditioning=p.has_fixed_air_conditioning, + wet_rooms_count=p.wet_rooms_count, + extensions_count=p.extensions_count, + heated_rooms_count=p.heated_rooms_count, + open_chimneys_count=p.open_chimneys_count, + habitable_rooms_count=p.habitable_rooms_count, + insulated_door_count=p.insulated_door_count, + cfl_fixed_lighting_bulbs_count=p.cfl_fixed_lighting_bulbs_count, + led_fixed_lighting_bulbs_count=p.led_fixed_lighting_bulbs_count, + incandescent_fixed_lighting_bulbs_count=p.incandescent_fixed_lighting_bulbs_count, + total_floor_area_m2=p.total_floor_area_m2, + assessment_type=p.assessment_type, + sap_version=p.sap_version, + uprn=p.uprn, + status=p.status, + window=_single("window"), + lighting=_single("lighting"), + hot_water=_single("hot_water"), + secondary_heating=_single("secondary_heating"), + main_heating_controls=_single("main_heating_controls"), + schema_type=p.schema_type, + schema_versions_original=p.schema_versions_original, + report_type=p.report_type, + report_reference=p.report_reference, + uprn_source=p.uprn_source, + address_line_2=p.address_line_2, + region_code=p.region_code, + country_code=p.country_code, + built_form=p.built_form, + property_type=p.property_type, + pressure_test=p.pressure_test, + language_code=p.language_code, + completion_date=( + date.fromisoformat(p.completion_date) if p.completion_date else None + ), + registration_date=( + date.fromisoformat(p.registration_date) + if p.registration_date + else None + ), + measurement_type=p.measurement_type, + conservatory_type=p.conservatory_type, + has_conservatory=p.has_conservatory, + has_heated_separate_conservatory=p.has_heated_separate_conservatory, + blocked_chimneys_count=p.blocked_chimneys_count, + 
energy_rating_average=p.energy_rating_average, + current_energy_efficiency_band=( + Epc(perf.current_energy_efficiency_band) + if perf and perf.current_energy_efficiency_band + else None + ), + environmental_impact_current=( + perf.environmental_impact_current if perf else None + ), + heating_cost_current=perf.heating_cost_current if perf else None, + co2_emissions_current=perf.co2_emissions_current if perf else None, + energy_consumption_current=( + perf.energy_consumption_current if perf else None + ), + energy_rating_current=perf.energy_rating_current if perf else None, + lighting_cost_current=perf.lighting_cost_current if perf else None, + hot_water_cost_current=perf.hot_water_cost_current if perf else None, + insulated_door_u_value=p.insulated_door_u_value, + mechanical_ventilation=p.mechanical_ventilation, + percent_draughtproofed=p.percent_draughtproofed, + heating_cost_potential=perf.heating_cost_potential if perf else None, + co2_emissions_potential=perf.co2_emissions_potential if perf else None, + energy_consumption_potential=( + perf.energy_consumption_potential if perf else None + ), + energy_rating_potential=perf.energy_rating_potential if perf else None, + lighting_cost_potential=perf.lighting_cost_potential if perf else None, + hot_water_cost_potential=perf.hot_water_cost_potential if perf else None, + environmental_impact_potential=( + perf.environmental_impact_potential if perf else None + ), + potential_energy_efficiency_band=( + Epc(perf.potential_energy_efficiency_band) + if perf and perf.potential_energy_efficiency_band + else None + ), + draughtproofed_door_count=p.draughtproofed_door_count, + mechanical_vent_duct_type=p.mechanical_vent_duct_type, + windows_transmission_details=( + WindowsTransmissionDetails( + u_value=p.windows_transmission_u_value, + data_source=_require( + p.windows_transmission_data_source, + "windows_transmission_data_source", + ), + solar_transmittance=_require( + p.windows_transmission_solar_transmittance, + 
"windows_transmission_solar_transmittance", + ), + ) + if p.windows_transmission_u_value is not None + else None + ), + multiple_glazed_proportion=p.multiple_glazed_proportion, + calculation_software_version=p.calculation_software_version, + mechanical_vent_duct_placement=p.mechanical_vent_duct_placement, + mechanical_vent_duct_insulation=p.mechanical_vent_duct_insulation, + pressure_test_certificate_number=p.pressure_test_certificate_number, + mechanical_ventilation_index_number=p.mechanical_ventilation_index_number, + mechanical_vent_measured_installation=p.mechanical_vent_measured_installation, + co2_emissions_current_per_floor_area=( + perf.co2_emissions_current_per_floor_area if perf else None + ), + low_energy_fixed_lighting_bulbs_count=p.low_energy_fixed_lighting_bulbs_count, + sap_flat_details=( + self._to_flat_details(flat_row) if flat_row is not None else None + ), + fixed_lighting_outlets_count=p.fixed_lighting_outlets_count, + low_energy_fixed_lighting_outlets_count=p.low_energy_fixed_lighting_outlets_count, + sap_ventilation=self._to_ventilation(p), + number_of_storeys=p.number_of_storeys, + any_unheated_rooms=p.any_unheated_rooms, + waste_water_heat_recovery=p.waste_water_heat_recovery, + hydro=p.hydro, + photovoltaic_array=p.photovoltaic_array, + mechanical_vent_duct_insulation_level=p.mechanical_vent_duct_insulation_level, + addendum=( + Addendum( + stone_walls=p.addendum_stone_walls, + system_build=p.addendum_system_build, + addendum_numbers=p.addendum_numbers, + ) + if ( + p.addendum_stone_walls is not None + or p.addendum_system_build is not None + or p.addendum_numbers is not None + ) + else None + ), + ) + + @private + def _windows(self, epc_property_id: int) -> list[EpcWindowModel]: + return list( + self._session.exec( + select(EpcWindowModel) + .where(EpcWindowModel.epc_property_id == epc_property_id) + .order_by(EpcWindowModel.id) # type: ignore[arg-type] + ).all() + ) + + @private + def _to_energy_element(self, e: EpcEnergyElementModel) -> 
EnergyElement: + return EnergyElement( + description=e.description, + energy_efficiency_rating=e.energy_efficiency_rating, + environmental_efficiency_rating=e.environmental_efficiency_rating, + ) + + @private + def _to_sap_heating( + self, p: EpcPropertyModel, heating_rows: list[EpcMainHeatingDetailModel] + ) -> SapHeating: + shower_outlets = ( + ShowerOutlets( + shower_outlet=ShowerOutlet( + shower_outlet_type=p.heating_shower_outlet_type, + shower_wwhrs=p.heating_shower_wwhrs, + ) + ) + if p.heating_shower_outlet_type is not None + else None + ) + return SapHeating( + instantaneous_wwhrs=InstantaneousWwhrs( + wwhrs_index_number1=p.heating_wwhrs_index_number_1, + wwhrs_index_number2=p.heating_wwhrs_index_number_2, + ), + main_heating_details=[self._to_main_heating(m) for m in heating_rows], + has_fixed_air_conditioning=p.has_fixed_air_conditioning, + cylinder_size=p.heating_cylinder_size, + water_heating_code=p.heating_water_heating_code, + water_heating_fuel=p.heating_water_heating_fuel, + immersion_heating_type=p.heating_immersion_heating_type, + shower_outlets=shower_outlets, + cylinder_insulation_type=p.heating_cylinder_insulation_type, + cylinder_thermostat=p.heating_cylinder_thermostat, + secondary_fuel_type=p.heating_secondary_fuel_type, + secondary_heating_type=p.heating_secondary_heating_type, + cylinder_insulation_thickness_mm=p.heating_cylinder_insulation_thickness_mm, + number_baths=p.heating_number_baths, + number_baths_wwhrs=p.heating_number_baths_wwhrs, + electric_shower_count=p.heating_electric_shower_count, + mixer_shower_count=p.heating_mixer_shower_count, + ) + + @private + def _to_main_heating(self, m: EpcMainHeatingDetailModel) -> MainHeatingDetail: + return MainHeatingDetail( + has_fghrs=m.has_fghrs, + main_fuel_type=m.main_fuel_type, + heat_emitter_type=m.heat_emitter_type, + emitter_temperature=m.emitter_temperature, + main_heating_control=m.main_heating_control, + fan_flue_present=m.fan_flue_present, + boiler_flue_type=m.boiler_flue_type, 
+ boiler_ignition_type=m.boiler_ignition_type, + central_heating_pump_age=m.central_heating_pump_age, + central_heating_pump_age_str=m.central_heating_pump_age_str, + main_heating_index_number=m.main_heating_index_number, + sap_main_heating_code=m.sap_main_heating_code, + main_heating_number=m.main_heating_number, + main_heating_category=m.main_heating_category, + main_heating_fraction=m.main_heating_fraction, + main_heating_data_source=m.main_heating_data_source, + condensing=m.condensing, + weather_compensator=m.weather_compensator, + ) + + @private + def _to_window(self, w: EpcWindowModel) -> SapWindow: + return SapWindow( + frame_material=w.frame_material, + glazing_gap=w.glazing_gap, + orientation=w.orientation, + window_type=w.window_type, + glazing_type=w.glazing_type, + window_width=w.window_width, + window_height=w.window_height, + draught_proofed=w.draught_proofed, + window_location=w.window_location, + window_wall_type=w.window_wall_type, + permanent_shutters_present=w.permanent_shutters_present, + frame_factor=w.frame_factor, + window_transmission_details=( + WindowTransmissionDetails( + u_value=w.transmission_u_value, + data_source=_require( + w.transmission_data_source, "window.transmission_data_source" + ), + solar_transmittance=_require( + w.transmission_solar_transmittance, + "window.transmission_solar_transmittance", + ), + ) + if w.transmission_u_value is not None + else None + ), + permanent_shutters_insulated=w.permanent_shutters_insulated, + ) + + @private + def _to_building_part(self, bp: EpcBuildingPartModel) -> SapBuildingPart: + floor_rows = list( + self._session.exec( + select(EpcFloorDimensionModel) + .where(EpcFloorDimensionModel.epc_building_part_id == bp.id) + .order_by(EpcFloorDimensionModel.id) # type: ignore[arg-type] + ).all() + ) + return SapBuildingPart( + identifier=BuildingPartIdentifier(bp.identifier), + construction_age_band=bp.construction_age_band, + wall_construction=bp.wall_construction, + 
wall_insulation_type=bp.wall_insulation_type, + wall_thickness_measured=bp.wall_thickness_measured, + party_wall_construction=bp.party_wall_construction, + sap_floor_dimensions=[self._to_floor_dimension(f) for f in floor_rows], + building_part_number=bp.building_part_number, + wall_dry_lined=bp.wall_dry_lined, + wall_thickness_mm=bp.wall_thickness_mm, + wall_insulation_thickness=bp.wall_insulation_thickness, + sap_alternative_wall_1=self._to_alt_wall(bp, 1), + sap_alternative_wall_2=self._to_alt_wall(bp, 2), + floor_heat_loss=bp.floor_heat_loss, + floor_insulation_thickness=bp.floor_insulation_thickness, + flat_roof_insulation_thickness=bp.flat_roof_insulation_thickness, + floor_type=bp.floor_type, + floor_construction_type=bp.floor_construction_type, + floor_insulation_type_str=bp.floor_insulation_type_str, + floor_u_value_known=bp.floor_u_value_known, + roof_construction=bp.roof_construction, + roof_construction_type=bp.roof_construction_type, + curtain_wall_age=bp.curtain_wall_age, + roof_insulation_location=bp.roof_insulation_location, + roof_insulation_thickness=bp.roof_insulation_thickness, + sap_room_in_roof=( + SapRoomInRoof( + floor_area=bp.room_in_roof_floor_area, + construction_age_band=_require( + bp.room_in_roof_construction_age_band, + "room_in_roof_construction_age_band", + ), + ) + if bp.room_in_roof_floor_area is not None + else None + ), + ) + + @private + def _to_alt_wall( + self, bp: EpcBuildingPartModel, n: int + ) -> Optional[SapAlternativeWall]: + area = bp.alt_wall_1_area if n == 1 else bp.alt_wall_2_area + if area is None: + return None + dry_lined = bp.alt_wall_1_dry_lined if n == 1 else bp.alt_wall_2_dry_lined + construction = ( + bp.alt_wall_1_construction if n == 1 else bp.alt_wall_2_construction + ) + insulation_type = ( + bp.alt_wall_1_insulation_type if n == 1 else bp.alt_wall_2_insulation_type + ) + thickness_measured = ( + bp.alt_wall_1_thickness_measured + if n == 1 + else bp.alt_wall_2_thickness_measured + ) + 
insulation_thickness = ( + bp.alt_wall_1_insulation_thickness + if n == 1 + else bp.alt_wall_2_insulation_thickness + ) + return SapAlternativeWall( + wall_area=area, + wall_dry_lined=_require(dry_lined, f"alt_wall_{n}_dry_lined"), + wall_construction=_require(construction, f"alt_wall_{n}_construction"), + wall_insulation_type=_require( + insulation_type, f"alt_wall_{n}_insulation_type" + ), + wall_thickness_measured=_require( + thickness_measured, f"alt_wall_{n}_thickness_measured" + ), + wall_insulation_thickness=insulation_thickness, + ) + + @private + def _to_floor_dimension(self, f: EpcFloorDimensionModel) -> SapFloorDimension: + return SapFloorDimension( + room_height_m=f.room_height_m, + total_floor_area_m2=f.total_floor_area_m2, + party_wall_length_m=f.party_wall_length_m, + heat_loss_perimeter_m=f.heat_loss_perimeter_m, + floor=f.floor, + floor_insulation=f.floor_insulation, + floor_construction=f.floor_construction, + ) + + @private + def _to_energy_source(self, p: EpcPropertyModel) -> SapEnergySource: + return SapEnergySource( + mains_gas=p.energy_mains_gas, + meter_type=p.energy_meter_type, + pv_battery_count=p.energy_pv_battery_count, + wind_turbines_count=p.energy_wind_turbines_count, + gas_smart_meter_present=p.energy_gas_smart_meter_present, + is_dwelling_export_capable=p.energy_is_dwelling_export_capable, + wind_turbines_terrain_type=p.energy_wind_turbines_terrain_type, + electricity_smart_meter_present=p.energy_electricity_smart_meter_present, + pv_connection=p.energy_pv_connection, + photovoltaic_supply=( + PhotovoltaicSupply( + none_or_no_details=PhotovoltaicSupplyNoneOrNoDetails( + percent_roof_area=p.energy_pv_percent_roof_area + ) + ) + if p.energy_pv_percent_roof_area is not None + else None + ), + wind_turbine_details=( + WindTurbineDetails( + hub_height=p.energy_wind_turbine_hub_height, + rotor_diameter=_require( + p.energy_wind_turbine_rotor_diameter, + "energy_wind_turbine_rotor_diameter", + ), + ) + if p.energy_wind_turbine_hub_height 
is not None + else None + ), + pv_batteries=( + PvBatteries( + pv_battery=PvBattery(battery_capacity=p.energy_pv_battery_capacity) + ) + if p.energy_pv_battery_capacity is not None + else None + ), + ) + + @private + def _to_ventilation(self, p: EpcPropertyModel) -> Optional[SapVentilation]: + if not p.ventilation_present: + return None + return SapVentilation( + ventilation_type=p.ventilation_type, + draught_lobby=p.ventilation_draught_lobby, + pressure_test=p.ventilation_pressure_test, + open_flues_count=p.ventilation_open_flues_count, + closed_flues_count=p.ventilation_closed_flues_count, + boiler_flues_count=p.ventilation_boiler_flues_count, + other_flues_count=p.ventilation_other_flues_count, + extract_fans_count=p.ventilation_extract_fans_count, + passive_vents_count=p.ventilation_passive_vents_count, + flueless_gas_fires_count=p.ventilation_flueless_gas_fires_count, + ventilation_in_pcdf_database=p.ventilation_in_pcdf_database, + sheltered_sides=p.ventilation_sheltered_sides, + has_suspended_timber_floor=p.ventilation_has_suspended_timber_floor, + suspended_timber_floor_sealed=p.ventilation_suspended_timber_floor_sealed, + has_draught_lobby=p.ventilation_has_draught_lobby, + air_permeability_ap4_m3_h_m2=p.ventilation_air_permeability_ap4_m3_h_m2, + mechanical_ventilation_kind=p.ventilation_mechanical_ventilation_kind, + ) + + @private + def _to_flat_details(self, f: EpcFlatDetailsModel) -> SapFlatDetails: + return SapFlatDetails( + level=f.level, + top_storey=f.top_storey, + flat_location=f.flat_location, + heat_loss_corridor=f.heat_loss_corridor, + storey_count=f.storey_count, + unheated_corridor_length_m=f.unheated_corridor_length_m, + ) diff --git a/repositories/epc/epc_repository.py b/repositories/epc/epc_repository.py new file mode 100644 index 00000000..db479c85 --- /dev/null +++ b/repositories/epc/epc_repository.py @@ -0,0 +1,26 @@ +from __future__ import annotations + +from abc import ABC, abstractmethod + +from datatypes.epc.domain.epc_property_data 
import EpcPropertyData + + +class EpcRepository(ABC): + """Persists and loads the structured EPC Property Data slice. + + `save` writes the `EpcPropertyData` to the `epc_property` parent row and its + child tables; `get` reconstructs the persisted projection back into an + `EpcPropertyData`. Round-trip fidelity over that projection is pinned by the + Slice-1 round-trip test (Hestia-Homes/Model#1129). + """ + + @abstractmethod + def save( + self, + data: EpcPropertyData, + property_id: int | None = None, + portfolio_id: int | None = None, + ) -> int: ... + + @abstractmethod + def get(self, epc_property_id: int) -> EpcPropertyData: ... diff --git a/tests/repositories/epc/__init__.py b/tests/repositories/epc/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/repositories/epc/test_epc_round_trip.py b/tests/repositories/epc/test_epc_round_trip.py new file mode 100644 index 00000000..064891bd --- /dev/null +++ b/tests/repositories/epc/test_epc_round_trip.py @@ -0,0 +1,56 @@ +"""Persistence round-trip fidelity for EPC Property Data (Slice 1, #1129). + +The load-bearing risk of the ara_first_run rebuild: an EpcPropertyData mapped to +the epc_property tables, saved, reloaded and mapped back must reconstruct the +original object exactly. A failure here is either a missing column (a migration +the FE repo must make) or a mapper gap — either way we want it to fail loudly, +inside First Run, rather than be deferred to a later Refresh. 
+""" + +from __future__ import annotations + +import dataclasses +import json +from pathlib import Path +from typing import Any + +import pytest +from sqlalchemy import Engine +from sqlmodel import Session + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from datatypes.epc.domain.mapper import EpcPropertyDataMapper +from repositories.epc.epc_postgres_repository import EpcPostgresRepository + +_JSON_SAMPLES = Path(__file__).resolve().parents[3] / "backend/epc_api/json_samples" + + +def _load_epc(schema_dir: str) -> EpcPropertyData: + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / schema_dir / "epc.json").read_text() + ) + return EpcPropertyDataMapper.from_api_response(raw) + + +@pytest.mark.parametrize( + "schema_dir", + ["RdSAP-Schema-21.0.0", "RdSAP-Schema-21.0.1"], +) +def test_epc_property_data_round_trips(schema_dir: str, db_engine: Engine) -> None: + # Arrange + original = _load_epc(schema_dir) + + # Act + with Session(db_engine) as session: + epc_property_id = EpcPostgresRepository(session).save(original) + session.commit() + with Session(db_engine) as session: + reloaded = EpcPostgresRepository(session).get(epc_property_id) + + # Assert + # Slice 1 pins round-trip fidelity over the persisted projection. The only + # field not yet stored is `renewable_heat_incentive` (the P0 structural gap + # tracked in #1137 — a new table); exclude it here and drop this `replace` + # once that table lands. 
+ projected = dataclasses.replace(original, renewable_heat_incentive=None) + assert reloaded == projected From 311d1e751aed1bdb8a3af8c6e93f04d4a5cd42e1 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sat, 30 May 2026 19:30:18 +0000 Subject: [PATCH 03/18] =?UTF-8?q?feat(epc):=20persist=20renewable=5Fheat?= =?UTF-8?q?=5Fincentive=20=E2=80=94=20full=20round-trip=20equality=20(#113?= =?UTF-8?q?7)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add epc_renewable_heat_incentive table (space_heating_kwh, water_heating_kwh + the three insulation-impact kWh fields), wired into EpcPostgresRepository save/get. This is the P0 gap: RenewableHeatIncentive carries the baseline space-heating/hot-water kWh that EPC Energy Derivation consumes. The round-trip test now asserts full deep-equality (dropped the renewable_heat_incentive exclusion) and passes for RdSAP 21.0.0 + 21.0.1. DB migration for the new table documented in docs/migrations/epc-property-round-trip-fidelity.md. Co-Authored-By: Claude Opus 4.8 --- .../epc-property-round-trip-fidelity.md | 11 ++++--- infrastructure/postgres/epc_property_table.py | 29 +++++++++++++++++++ repositories/epc/epc_postgres_repository.py | 24 +++++++++++++++ tests/repositories/epc/test_epc_round_trip.py | 8 +---- 4 files changed, 61 insertions(+), 11 deletions(-) diff --git a/docs/migrations/epc-property-round-trip-fidelity.md b/docs/migrations/epc-property-round-trip-fidelity.md index e7e23c02..d9ed6557 100644 --- a/docs/migrations/epc-property-round-trip-fidelity.md +++ b/docs/migrations/epc-property-round-trip-fidelity.md @@ -44,11 +44,14 @@ and **still require the matching DB migration** wherever the physical tables liv `ventilation_has_draught_lobby`, `ventilation_air_permeability_ap4_m3_h_m2`, `ventilation_mechanical_ventilation_kind`; `epc_building_part`: `roof_construction_type`, `curtain_wall_age`. 
+- **§2.1 `epc_renewable_heat_incentive` table** (#1137) — now created on the SQLModel and wired + into save/get; the round-trip test asserts **full deep-equality** (no exclusion). DB migration + still required. -**Still open (follow-up issues):** §2.1 `epc_renewable_heat_incentive` (P0, #1137 — excluded from the -slice-1 assertion via `dataclasses.replace(..., renewable_heat_incentive=None)` until landed), and -the remaining §2 structural tables (room-in-roof detail, PV arrays, roof windows) + §3 nested-wall -fields (`SapAlternativeWall.u_value`/`wall_thickness_mm`) + `SapFloorDimension` exposed-floor flags. +**Still open (follow-up issues):** the remaining §2 structural tables (room-in-roof detail, PV +arrays, roof windows) + §3 nested-wall fields (`SapAlternativeWall.u_value`/`wall_thickness_mm`) + +`SapFloorDimension` exposed-floor flags — none populated in the 21.0.0/21.0.1 fixtures, so latent +until a richer fixture exercises them. --- diff --git a/infrastructure/postgres/epc_property_table.py b/infrastructure/postgres/epc_property_table.py index deee192c..539628bd 100644 --- a/infrastructure/postgres/epc_property_table.py +++ b/infrastructure/postgres/epc_property_table.py @@ -9,6 +9,7 @@ from datatypes.epc.domain.epc_property_data import ( EpcPropertyData, EnergyElement, MainHeatingDetail, + RenewableHeatIncentive, SapBuildingPart, SapFloorDimension, SapFlatDetails, @@ -413,6 +414,34 @@ class EpcPropertyEnergyPerformanceModel(SQLModel, table=True): ) +class EpcRenewableHeatIncentiveModel(SQLModel, table=True): + __tablename__: ClassVar[str] = "epc_renewable_heat_incentive" # pyright: ignore[reportIncompatibleVariableOverride] + + id: Optional[int] = Field(default=None, primary_key=True) + epc_property_id: int = Field( + foreign_key="epc_property.id", nullable=False, unique=True + ) + + space_heating_kwh: float + water_heating_kwh: float + impact_of_loft_insulation_kwh: Optional[float] = Field(default=None) + impact_of_cavity_insulation_kwh: Optional[float] 
= Field(default=None) + impact_of_solid_wall_insulation_kwh: Optional[float] = Field(default=None) + + @classmethod + def from_domain( + cls, rhi: RenewableHeatIncentive, epc_property_id: int + ) -> EpcRenewableHeatIncentiveModel: + return cls( + epc_property_id=epc_property_id, + space_heating_kwh=rhi.space_heating_kwh, + water_heating_kwh=rhi.water_heating_kwh, + impact_of_loft_insulation_kwh=rhi.impact_of_loft_insulation_kwh, + impact_of_cavity_insulation_kwh=rhi.impact_of_cavity_insulation_kwh, + impact_of_solid_wall_insulation_kwh=rhi.impact_of_solid_wall_insulation_kwh, + ) + + class EpcFlatDetailsModel(SQLModel, table=True): __tablename__: ClassVar[str] = "epc_flat_details" # pyright: ignore[reportIncompatibleVariableOverride] diff --git a/repositories/epc/epc_postgres_repository.py b/repositories/epc/epc_postgres_repository.py index 02dc49b9..52873dce 100644 --- a/repositories/epc/epc_postgres_repository.py +++ b/repositories/epc/epc_postgres_repository.py @@ -17,6 +17,7 @@ from datatypes.epc.domain.epc_property_data import ( PhotovoltaicSupplyNoneOrNoDetails, PvBatteries, PvBattery, + RenewableHeatIncentive, SapAlternativeWall, SapBuildingPart, SapEnergySource, @@ -40,6 +41,7 @@ from infrastructure.postgres.epc_property_table import ( EpcMainHeatingDetailModel, EpcPropertyEnergyPerformanceModel, EpcPropertyModel, + EpcRenewableHeatIncentiveModel, EpcWindowModel, ) from repositories.epc.epc_repository import EpcRepository @@ -124,6 +126,12 @@ class EpcPostgresRepository(EpcRepository): self._session.add( EpcFlatDetailsModel.from_domain(data.sap_flat_details, epc_property_id) ) + if data.renewable_heat_incentive is not None: + self._session.add( + EpcRenewableHeatIncentiveModel.from_domain( + data.renewable_heat_incentive, epc_property_id + ) + ) return epc_property_id def get(self, epc_property_id: int) -> EpcPropertyData: @@ -161,6 +169,11 @@ class EpcPostgresRepository(EpcRepository): EpcFlatDetailsModel.epc_property_id == epc_property_id ) ).first() + 
rhi_row = self._session.exec( + select(EpcRenewableHeatIncentiveModel).where( + EpcRenewableHeatIncentiveModel.epc_property_id == epc_property_id + ) + ).first() def _elements(element_type: str) -> list[EnergyElement]: return [self._to_energy_element(e) for e in elements if e.element_type == element_type] @@ -308,6 +321,17 @@ class EpcPostgresRepository(EpcRepository): waste_water_heat_recovery=p.waste_water_heat_recovery, hydro=p.hydro, photovoltaic_array=p.photovoltaic_array, + renewable_heat_incentive=( + RenewableHeatIncentive( + space_heating_kwh=rhi_row.space_heating_kwh, + water_heating_kwh=rhi_row.water_heating_kwh, + impact_of_loft_insulation_kwh=rhi_row.impact_of_loft_insulation_kwh, + impact_of_cavity_insulation_kwh=rhi_row.impact_of_cavity_insulation_kwh, + impact_of_solid_wall_insulation_kwh=rhi_row.impact_of_solid_wall_insulation_kwh, + ) + if rhi_row is not None + else None + ), mechanical_vent_duct_insulation_level=p.mechanical_vent_duct_insulation_level, addendum=( Addendum( diff --git a/tests/repositories/epc/test_epc_round_trip.py b/tests/repositories/epc/test_epc_round_trip.py index 064891bd..192027f7 100644 --- a/tests/repositories/epc/test_epc_round_trip.py +++ b/tests/repositories/epc/test_epc_round_trip.py @@ -9,7 +9,6 @@ inside First Run, rather than be deferred to a later Refresh. from __future__ import annotations -import dataclasses import json from pathlib import Path from typing import Any @@ -48,9 +47,4 @@ def test_epc_property_data_round_trips(schema_dir: str, db_engine: Engine) -> No reloaded = EpcPostgresRepository(session).get(epc_property_id) # Assert - # Slice 1 pins round-trip fidelity over the persisted projection. The only - # field not yet stored is `renewable_heat_incentive` (the P0 structural gap - # tracked in #1137 — a new table); exclude it here and drop this `replace` - # once that table lands. 
- projected = dataclasses.replace(original, renewable_heat_incentive=None) - assert reloaded == projected + assert reloaded == original From 92de07efba5085da10002ead9300cb9c9e3f7bac Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sat, 30 May 2026 19:39:54 +0000 Subject: [PATCH 04/18] feat(property): Property aggregate + PropertyRepository (#1132) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the Ara modelling aggregate root (ADR-0002): domain/property/ with PropertyIdentity, SiteNotes, Property, Properties. Property.source_path implements the two disjoint source paths + Recency Tie-Break (ADR-0001; survey wins on an equal date); effective_epc resolves to the surveyed data (Site Notes path) or the public EPC (epc_with_overlay path — Landlord Overrides overlay is a later slice). Pure dataclasses, no infrastructure imports. PropertyRepository port + PropertyPostgresRepository hydrate the aggregate whole from a defensive view of the FE-owned 'property' table (identity columns) plus the EPC slice via EpcRepository.get_for_property. Reads only from repos (ADR-0003). 8 domain + 1 hydration test; pyright strict clean. 
Co-Authored-By: Claude Opus 4.8 --- domain/property/__init__.py | 0 domain/property/properties.py | 25 ++++ domain/property/property.py | 73 ++++++++++ domain/property/site_notes.py | 23 ++++ infrastructure/postgres/property_table.py | 23 ++++ repositories/epc/epc_postgres_repository.py | 10 ++ repositories/epc/epc_repository.py | 4 + repositories/property/__init__.py | 0 .../property/property_postgres_repository.py | 36 +++++ repositories/property/property_repository.py | 17 +++ tests/domain/property/__init__.py | 0 tests/domain/property/test_property.py | 127 ++++++++++++++++++ tests/repositories/property/__init__.py | 0 .../property/test_property_repository.py | 49 +++++++ 14 files changed, 387 insertions(+) create mode 100644 domain/property/__init__.py create mode 100644 domain/property/properties.py create mode 100644 domain/property/property.py create mode 100644 domain/property/site_notes.py create mode 100644 infrastructure/postgres/property_table.py create mode 100644 repositories/property/__init__.py create mode 100644 repositories/property/property_postgres_repository.py create mode 100644 repositories/property/property_repository.py create mode 100644 tests/domain/property/__init__.py create mode 100644 tests/domain/property/test_property.py create mode 100644 tests/repositories/property/__init__.py create mode 100644 tests/repositories/property/test_property_repository.py diff --git a/domain/property/__init__.py b/domain/property/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/domain/property/properties.py b/domain/property/properties.py new file mode 100644 index 00000000..b7a5aae5 --- /dev/null +++ b/domain/property/properties.py @@ -0,0 +1,25 @@ +from __future__ import annotations + +from collections.abc import Callable, Iterator +from dataclasses import dataclass + +from domain.property.property import Property + + +@dataclass +class Properties: + """A first-class collection of Property objects — the unit of bulk operation + 
in services (CONTEXT.md: Properties). Services take and return `Properties` + rather than bare lists so batch operations read clearly. + """ + + items: list[Property] + + def __iter__(self) -> Iterator[Property]: + return iter(self.items) + + def __len__(self) -> int: + return len(self.items) + + def filter(self, predicate: Callable[[Property], bool]) -> "Properties": + return Properties([p for p in self.items if predicate(p)]) diff --git a/domain/property/property.py b/domain/property/property.py new file mode 100644 index 00000000..856eb3e3 --- /dev/null +++ b/domain/property/property.py @@ -0,0 +1,73 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Literal, Optional + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from domain.property.site_notes import SiteNotes + +SourcePath = Literal["site_notes", "epc_with_overlay"] + + +@dataclass(frozen=True) +class PropertyIdentity: + """Identifies a single Property within a portfolio. + + Keyed by `(portfolio_id, uprn)` or `(portfolio_id, landlord_property_id)` — + a UPRN is permanent but each portfolio gets its own Property against it + (CONTEXT.md: UPRN). + """ + + portfolio_id: int + postcode: str + address: str + uprn: Optional[int] = None + landlord_property_id: Optional[str] = None + + +@dataclass +class Property: + """The Ara modelling aggregate root for a single dwelling (ADR-0002). + + Holds identity plus the source data the pipeline reasons about. Enrichments + (geospatial, solar) and modelling outputs (baseline performance, plans) are + added by later slices — this is the minimal-and-growing shape for First Run. + """ + + identity: PropertyIdentity + epc: Optional[EpcPropertyData] = None + site_notes: Optional[SiteNotes] = None + + @property + def source_path(self) -> SourcePath: + """Which of the two disjoint source paths models this Property (ADR-0001). 
+ + Site Notes alone, or the public EPC (with Landlord Overrides, once that + slice lands). When both exist the newer wins (Recency Tie-Break); on an + equal date the survey wins, as it reflects on-site observation. + """ + if self.site_notes is not None and self.epc is not None: + epc_date = self.epc.registration_date or self.epc.inspection_date + if self.site_notes.surveyed_at >= epc_date: + return "site_notes" + return "epc_with_overlay" + if self.site_notes is not None: + return "site_notes" + if self.epc is not None: + return "epc_with_overlay" + raise ValueError( + "Property has neither Site Notes nor an EPC; no source path to model from" + ) + + @property + def effective_epc(self) -> EpcPropertyData: + """The EpcPropertyData the modelling pipeline scores against. + + Path 1: the Site Notes' surveyed data. Path 2: the public EPC (Landlord + Overrides overlay is a later slice — returned as-is for now). + """ + if self.source_path == "site_notes": + assert self.site_notes is not None + return self.site_notes.to_epc_property_data() + assert self.epc is not None + return self.epc diff --git a/domain/property/site_notes.py b/domain/property/site_notes.py new file mode 100644 index 00000000..04267735 --- /dev/null +++ b/domain/property/site_notes.py @@ -0,0 +1,23 @@ +from __future__ import annotations + +from dataclasses import dataclass +from datetime import date + +from datatypes.epc.domain.epc_property_data import EpcPropertyData + + +@dataclass +class SiteNotes: + """A Domna survey of a single Property (CONTEXT.md: Site Notes). + + Committed by the domain to being full-coverage — it carries every EPC field + the modelling pipeline needs, expressed as an `EpcPropertyData`. When present + (and not older than the public EPC) it is the complete source of truth for + the Property; the public EPC is then irrelevant (ADR-0001). 
+ """ + + surveyed_at: date + epc: EpcPropertyData + + def to_epc_property_data(self) -> EpcPropertyData: + return self.epc diff --git a/infrastructure/postgres/property_table.py b/infrastructure/postgres/property_table.py new file mode 100644 index 00000000..0b91a2ad --- /dev/null +++ b/infrastructure/postgres/property_table.py @@ -0,0 +1,23 @@ +from __future__ import annotations + +from typing import ClassVar, Optional + +from sqlmodel import Field, SQLModel + + +class PropertyRow(SQLModel, table=True): + """Defensive view of the FE-owned ``property`` table. + + The schema and migrations for ``property`` are owned by the front-end + Next.js repo; this declares only the identity columns the modelling backend + reads/writes, so FE-owned migrations to other columns don't ripple into us. + """ + + __tablename__: ClassVar[str] = "property" # pyright: ignore[reportIncompatibleVariableOverride] + + id: Optional[int] = Field(default=None, primary_key=True) + portfolio_id: int + postcode: str + address: str + uprn: Optional[int] = Field(default=None) + landlord_property_id: Optional[str] = Field(default=None) diff --git a/repositories/epc/epc_postgres_repository.py b/repositories/epc/epc_postgres_repository.py index 52873dce..b0a8070c 100644 --- a/repositories/epc/epc_postgres_repository.py +++ b/repositories/epc/epc_postgres_repository.py @@ -134,6 +134,16 @@ class EpcPostgresRepository(EpcRepository): ) return epc_property_id + def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: + row = self._session.exec( + select(EpcPropertyModel) + .where(EpcPropertyModel.property_id == property_id) + .order_by(EpcPropertyModel.id) # type: ignore[arg-type] + ).first() + if row is None or row.id is None: + return None + return self.get(row.id) + def get(self, epc_property_id: int) -> EpcPropertyData: p = self._session.get(EpcPropertyModel, epc_property_id) if p is None: diff --git a/repositories/epc/epc_repository.py b/repositories/epc/epc_repository.py index 
db479c85..fb83bdbc 100644 --- a/repositories/epc/epc_repository.py +++ b/repositories/epc/epc_repository.py @@ -1,6 +1,7 @@ from __future__ import annotations from abc import ABC, abstractmethod +from typing import Optional from datatypes.epc.domain.epc_property_data import EpcPropertyData @@ -24,3 +25,6 @@ class EpcRepository(ABC): @abstractmethod def get(self, epc_property_id: int) -> EpcPropertyData: ... + + @abstractmethod + def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: ... diff --git a/repositories/property/__init__.py b/repositories/property/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/property/property_postgres_repository.py b/repositories/property/property_postgres_repository.py new file mode 100644 index 00000000..c1b631dd --- /dev/null +++ b/repositories/property/property_postgres_repository.py @@ -0,0 +1,36 @@ +from __future__ import annotations + +from sqlmodel import Session + +from domain.property.property import Property, PropertyIdentity +from infrastructure.postgres.property_table import PropertyRow +from repositories.epc.epc_repository import EpcRepository +from repositories.property.property_repository import PropertyRepository + + +class PropertyPostgresRepository(PropertyRepository): + """Hydrates the Property aggregate from the FE-owned ``property`` row plus the + EPC slice (via an injected `EpcRepository`). Reads only from repos — no + external IO — so a hydrated Property is a pure function of repository state + (ADR-0003). 
+ """ + + def __init__(self, session: Session, epc_repo: EpcRepository) -> None: + self._session = session + self._epc_repo = epc_repo + + def get(self, property_id: int) -> Property: + row = self._session.get(PropertyRow, property_id) + if row is None: + raise ValueError(f"property {property_id} not found") + identity = PropertyIdentity( + portfolio_id=row.portfolio_id, + postcode=row.postcode, + address=row.address, + uprn=row.uprn, + landlord_property_id=row.landlord_property_id, + ) + return Property( + identity=identity, + epc=self._epc_repo.get_for_property(property_id), + ) diff --git a/repositories/property/property_repository.py b/repositories/property/property_repository.py new file mode 100644 index 00000000..0a9045be --- /dev/null +++ b/repositories/property/property_repository.py @@ -0,0 +1,17 @@ +from __future__ import annotations + +from abc import ABC, abstractmethod + +from domain.property.property import Property + + +class PropertyRepository(ABC): + """Loads and saves the Property aggregate. + + Composes the aggregate whole from the FE-owned ``property`` identity row plus + its source-data slices (EPC today; Site Notes / enrichments as later slices + land). Aggregates load whole — never half a Property (ADR-0002). + """ + + @abstractmethod + def get(self, property_id: int) -> Property: ... diff --git a/tests/domain/property/__init__.py b/tests/domain/property/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/domain/property/test_property.py b/tests/domain/property/test_property.py new file mode 100644 index 00000000..01d7edfd --- /dev/null +++ b/tests/domain/property/test_property.py @@ -0,0 +1,127 @@ +"""Property aggregate — source-path precedence and Effective EPC resolution. + +The two disjoint source paths (ADR-0001): a Property is modelled either from its +Site Notes alone, or from the public EPC (with Landlord Overrides, once that slice +lands). When both exist, the newer wins (Recency Tie-Break). 
+""" + +from __future__ import annotations + +import json +from datetime import date +from pathlib import Path +from typing import Any + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from datatypes.epc.domain.mapper import EpcPropertyDataMapper +from domain.property.properties import Properties +from domain.property.property import Property, PropertyIdentity +from domain.property.site_notes import SiteNotes + +_JSON_SAMPLES = Path(__file__).resolve().parents[3] / "backend/epc_api/json_samples" + + +def _epc(inspection: str = "2023-12-01") -> EpcPropertyData: + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text() + ) + return EpcPropertyDataMapper.from_api_response(raw) + + +def _identity() -> PropertyIdentity: + return PropertyIdentity( + portfolio_id=1, postcode="A0 0AA", address="1 Some Street", uprn=12345 + ) + + +def test_source_path_is_epc_with_overlay_when_only_epc_present() -> None: + # Arrange + prop = Property(identity=_identity(), epc=_epc()) + + # Act + path = prop.source_path + + # Assert + assert path == "epc_with_overlay" + + +def test_source_path_is_site_notes_when_only_site_notes_present() -> None: + # Arrange + prop = Property( + identity=_identity(), + site_notes=SiteNotes(surveyed_at=date(2024, 6, 1), epc=_epc()), + ) + + # Act + path = prop.source_path + + # Assert + assert path == "site_notes" + + +def test_recency_tie_break_newer_site_notes_win_over_older_epc() -> None: + # Arrange — EPC inspected 2023-12-01; survey is newer + prop = Property( + identity=_identity(), + epc=_epc(), + site_notes=SiteNotes(surveyed_at=date(2025, 1, 1), epc=_epc()), + ) + + # Act / Assert + assert prop.source_path == "site_notes" + + +def test_recency_tie_break_older_site_notes_lose_to_newer_epc() -> None: + # Arrange — survey predates the EPC's inspection date + prop = Property( + identity=_identity(), + epc=_epc(), + site_notes=SiteNotes(surveyed_at=date(2020, 1, 1), epc=_epc()), + ) + + # 
Act / Assert + assert prop.source_path == "epc_with_overlay" + + +def test_effective_epc_follows_the_selected_source_path() -> None: + # Arrange + survey_epc = _epc() + public_epc = _epc() + site_notes_property = Property( + identity=_identity(), + site_notes=SiteNotes(surveyed_at=date(2025, 1, 1), epc=survey_epc), + ) + epc_property = Property(identity=_identity(), epc=public_epc) + + # Act / Assert + assert site_notes_property.effective_epc is survey_epc + assert epc_property.effective_epc is public_epc + + +def test_property_with_no_source_raises() -> None: + # Arrange + prop = Property(identity=_identity()) + + # Act / Assert + try: + _ = prop.source_path + except ValueError: + pass + else: # pragma: no cover + raise AssertionError("expected ValueError when no source is present") + + +def test_properties_collection_iterates_and_filters() -> None: + # Arrange + with_epc = Property(identity=_identity(), epc=_epc()) + without = Property(identity=_identity()) + properties = Properties([with_epc, without]) + + # Act + with_source = properties.filter(lambda p: p.epc is not None) + + # Assert + assert len(properties) == 2 + assert list(properties) == [with_epc, without] + assert len(with_source) == 1 + assert list(with_source) == [with_epc] diff --git a/tests/repositories/property/__init__.py b/tests/repositories/property/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/repositories/property/test_property_repository.py b/tests/repositories/property/test_property_repository.py new file mode 100644 index 00000000..2456a670 --- /dev/null +++ b/tests/repositories/property/test_property_repository.py @@ -0,0 +1,49 @@ +"""PropertyRepository hydrates the aggregate whole from the property row + EPC slice.""" + +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + +from sqlalchemy import Engine +from sqlmodel import Session + +from datatypes.epc.domain.mapper import EpcPropertyDataMapper +from 
infrastructure.postgres.property_table import PropertyRow +from repositories.epc.epc_postgres_repository import EpcPostgresRepository +from repositories.property.property_postgres_repository import ( + PropertyPostgresRepository, +) + +_JSON_SAMPLES = Path(__file__).resolve().parents[3] / "backend/epc_api/json_samples" + + +def test_get_hydrates_identity_and_epc_slice(db_engine: Engine) -> None: + # Arrange + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text() + ) + epc = EpcPropertyDataMapper.from_api_response(raw) + with Session(db_engine) as session: + row = PropertyRow( + portfolio_id=7, postcode="A0 0AA", address="1 Some Street", uprn=12345 + ) + session.add(row) + session.commit() + property_id = row.id + assert property_id is not None + EpcPostgresRepository(session).save(epc, property_id=property_id) + session.commit() + + # Act + with Session(db_engine) as session: + repo = PropertyPostgresRepository(session, EpcPostgresRepository(session)) + prop = repo.get(property_id) + + # Assert + assert prop.identity.portfolio_id == 7 + assert prop.identity.uprn == 12345 + assert prop.epc == epc + assert prop.source_path == "epc_with_overlay" + assert prop.effective_epc == epc From caee4de2f45433cbcfb3faaaf536ed8c4b99c139 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sat, 30 May 2026 19:44:29 +0000 Subject: [PATCH 05/18] feat(ingestion): relocate EpcClientService to infrastructure + SolarRepo (#1133) Move the EpcClientService package (client + _retry + exceptions + tests) from the dying backend/ tree to infrastructure/epc_client/ as the New-EPC-API Fetcher; update the two callers (address2UPRN, a script). All 14 client tests pass. Add SolarRepository port + SolarPostgresRepository persisting Google Solar building insights as JSONB (solar_building_insights table), one row per Property. The EPC repo half of this slice already landed in #1129. pyright strict clean. 
Co-Authored-By: Claude Opus 4.8 --- backend/address2UPRN/main.py | 2 +- backend/epc_client/__init__.py | 3 -- infrastructure/epc_client/__init__.py | 3 ++ .../epc_client/_retry.py | 2 +- .../epc_client/epc_client_service.py | 4 +- .../epc_client/exceptions.py | 0 .../epc_client/tests/__init__.py | 0 .../epc_client/tests/conftest.py | 2 +- .../epc_client/tests/test_client.py | 14 +++---- .../tests/test_mapper_dispatcher.py | 0 infrastructure/postgres/solar_table.py | 22 ++++++++++ repositories/solar/__init__.py | 0 .../solar/solar_postgres_repository.py | 35 ++++++++++++++++ repositories/solar/solar_repository.py | 19 +++++++++ scripts/fetch_cohort2_api_jsons.py | 6 +-- tests/repositories/solar/__init__.py | 0 .../solar/test_solar_repository.py | 41 +++++++++++++++++++ 17 files changed, 135 insertions(+), 18 deletions(-) delete mode 100644 backend/epc_client/__init__.py create mode 100644 infrastructure/epc_client/__init__.py rename {backend => infrastructure}/epc_client/_retry.py (91%) rename {backend => infrastructure}/epc_client/epc_client_service.py (97%) rename {backend => infrastructure}/epc_client/exceptions.py (100%) rename {backend => infrastructure}/epc_client/tests/__init__.py (100%) rename {backend => infrastructure}/epc_client/tests/conftest.py (93%) rename {backend => infrastructure}/epc_client/tests/test_client.py (94%) rename {backend => infrastructure}/epc_client/tests/test_mapper_dispatcher.py (100%) create mode 100644 infrastructure/postgres/solar_table.py create mode 100644 repositories/solar/__init__.py create mode 100644 repositories/solar/solar_postgres_repository.py create mode 100644 repositories/solar/solar_repository.py create mode 100644 tests/repositories/solar/__init__.py create mode 100644 tests/repositories/solar/test_solar_repository.py diff --git a/backend/address2UPRN/main.py b/backend/address2UPRN/main.py index 389816cc..02eb27dc 100644 --- a/backend/address2UPRN/main.py +++ b/backend/address2UPRN/main.py @@ -19,7 +19,7 @@ from 
backend.address2UPRN.scoring import all_uprns_match, rank_address_similarit from datatypes.epc.domain.historic_epc_matching import ( match_addresses_for_postcode, ) -from backend.epc_client.epc_client_service import EpcClientService +from infrastructure.epc_client.epc_client_service import EpcClientService from datatypes.epc.domain.historic_epc_matching import ScoredHistoricEpc logger = setup_logger() diff --git a/backend/epc_client/__init__.py b/backend/epc_client/__init__.py deleted file mode 100644 index 84062592..00000000 --- a/backend/epc_client/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -from backend.epc_client.epc_client_service import EpcClientService - -__all__ = ["EpcClientService"] diff --git a/infrastructure/epc_client/__init__.py b/infrastructure/epc_client/__init__.py new file mode 100644 index 00000000..f8718b77 --- /dev/null +++ b/infrastructure/epc_client/__init__.py @@ -0,0 +1,3 @@ +from infrastructure.epc_client.epc_client_service import EpcClientService + +__all__ = ["EpcClientService"] diff --git a/backend/epc_client/_retry.py b/infrastructure/epc_client/_retry.py similarity index 91% rename from backend/epc_client/_retry.py rename to infrastructure/epc_client/_retry.py index bbdd0cff..d37f5e9c 100644 --- a/backend/epc_client/_retry.py +++ b/infrastructure/epc_client/_retry.py @@ -1,7 +1,7 @@ import time from typing import Callable, TypeVar -from backend.epc_client.exceptions import EpcRateLimitError +from infrastructure.epc_client.exceptions import EpcRateLimitError T = TypeVar("T") diff --git a/backend/epc_client/epc_client_service.py b/infrastructure/epc_client/epc_client_service.py similarity index 97% rename from backend/epc_client/epc_client_service.py rename to infrastructure/epc_client/epc_client_service.py index 72dbf142..16cd4d2f 100644 --- a/backend/epc_client/epc_client_service.py +++ b/infrastructure/epc_client/epc_client_service.py @@ -5,12 +5,12 @@ from typing import Any, Optional import httpx -from backend.epc_client.exceptions 
import ( +from infrastructure.epc_client.exceptions import ( EpcApiError, EpcNotFoundError, EpcRateLimitError, ) -from backend.epc_client._retry import call_with_retry +from infrastructure.epc_client._retry import call_with_retry from datatypes.epc.domain.epc_property_data import EpcPropertyData from datatypes.epc.domain.mapper import EpcPropertyDataMapper from datatypes.epc.search import EpcSearchResult diff --git a/backend/epc_client/exceptions.py b/infrastructure/epc_client/exceptions.py similarity index 100% rename from backend/epc_client/exceptions.py rename to infrastructure/epc_client/exceptions.py diff --git a/backend/epc_client/tests/__init__.py b/infrastructure/epc_client/tests/__init__.py similarity index 100% rename from backend/epc_client/tests/__init__.py rename to infrastructure/epc_client/tests/__init__.py diff --git a/backend/epc_client/tests/conftest.py b/infrastructure/epc_client/tests/conftest.py similarity index 93% rename from backend/epc_client/tests/conftest.py rename to infrastructure/epc_client/tests/conftest.py index 2dab138e..dc491c2b 100644 --- a/backend/epc_client/tests/conftest.py +++ b/infrastructure/epc_client/tests/conftest.py @@ -2,7 +2,7 @@ import json import pathlib import pytest -from backend.epc_client.epc_client_service import EpcClientService +from infrastructure.epc_client.epc_client_service import EpcClientService SAMPLES_DIR = pathlib.Path("backend/epc_api/json_samples") diff --git a/backend/epc_client/tests/test_client.py b/infrastructure/epc_client/tests/test_client.py similarity index 94% rename from backend/epc_client/tests/test_client.py rename to infrastructure/epc_client/tests/test_client.py index 70425a92..2b6c4099 100644 --- a/backend/epc_client/tests/test_client.py +++ b/infrastructure/epc_client/tests/test_client.py @@ -1,11 +1,11 @@ from unittest.mock import MagicMock, patch, call import pytest -from backend.epc_client.epc_client_service import EpcClientService +from 
infrastructure.epc_client.epc_client_service import EpcClientService from datatypes.epc.search import EpcSearchResult -from backend.epc_client.exceptions import EpcNotFoundError, EpcRateLimitError +from infrastructure.epc_client.exceptions import EpcNotFoundError, EpcRateLimitError from datatypes.epc.domain.epc_property_data import EpcPropertyData -from backend.epc_client.tests.conftest import make_search_row +from infrastructure.epc_client.tests.conftest import make_search_row def _mock_response(status_code=200, json_data=None, headers=None): @@ -78,7 +78,7 @@ def test_429_retry_after_header_drives_sleep_duration( _mock_response(200, cert_response), ] with patch("httpx.get", side_effect=responses), patch( - "backend.epc_client._retry.time.sleep" + "infrastructure.epc_client._retry.time.sleep" ) as mock_sleep: epc_service.get_by_certificate_number("CERT-001") @@ -100,7 +100,7 @@ def test_429_without_retry_after_uses_exponential_backoff( _mock_response(200, cert_response), ] with patch("httpx.get", side_effect=responses), patch( - "backend.epc_client._retry.time.sleep" + "infrastructure.epc_client._retry.time.sleep" ) as mock_sleep: epc_service.get_by_certificate_number("CERT-001") @@ -121,7 +121,7 @@ def test_429_malformed_retry_after_falls_back_to_backoff( _mock_response(200, cert_response), ] with patch("httpx.get", side_effect=responses), patch( - "backend.epc_client._retry.time.sleep" + "infrastructure.epc_client._retry.time.sleep" ) as mock_sleep: epc_service.get_by_certificate_number("CERT-001") @@ -140,7 +140,7 @@ def test_429_retry_after_capped_by_max_backoff(epc_service, rdsap_21_0_1_cert): _mock_response(200, cert_response), ] with patch("httpx.get", side_effect=responses), patch( - "backend.epc_client._retry.time.sleep" + "infrastructure.epc_client._retry.time.sleep" ) as mock_sleep: epc_service.get_by_certificate_number("CERT-001") diff --git a/backend/epc_client/tests/test_mapper_dispatcher.py b/infrastructure/epc_client/tests/test_mapper_dispatcher.py 
similarity index 100% rename from backend/epc_client/tests/test_mapper_dispatcher.py rename to infrastructure/epc_client/tests/test_mapper_dispatcher.py diff --git a/infrastructure/postgres/solar_table.py b/infrastructure/postgres/solar_table.py new file mode 100644 index 00000000..1563ce15 --- /dev/null +++ b/infrastructure/postgres/solar_table.py @@ -0,0 +1,22 @@ +from __future__ import annotations + +from typing import Any, ClassVar, Optional + +from sqlalchemy import Column +from sqlalchemy.dialects.postgresql import JSONB +from sqlmodel import Field, SQLModel + + +class SolarBuildingInsightsRow(SQLModel, table=True): + """Persisted Google Solar `buildingInsights` response for one Property. + + Stored as JSONB — the raw fetched insights are retained whole so the + structured projection a future SolarPotential type needs can be derived + without re-fetching. One row per Property. + """ + + __tablename__: ClassVar[str] = "solar_building_insights" # pyright: ignore[reportIncompatibleVariableOverride] + + id: Optional[int] = Field(default=None, primary_key=True) + property_id: int = Field(index=True, unique=True) + insights: dict[str, Any] = Field(sa_column=Column(JSONB, nullable=False)) diff --git a/repositories/solar/__init__.py b/repositories/solar/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/solar/solar_postgres_repository.py b/repositories/solar/solar_postgres_repository.py new file mode 100644 index 00000000..9c8a70a7 --- /dev/null +++ b/repositories/solar/solar_postgres_repository.py @@ -0,0 +1,35 @@ +from __future__ import annotations + +from typing import Any, Optional + +from sqlmodel import Session, select + +from infrastructure.postgres.solar_table import SolarBuildingInsightsRow +from repositories.solar.solar_repository import SolarRepository + + +class SolarPostgresRepository(SolarRepository): + def __init__(self, session: Session) -> None: + self._session = session + + def save(self, property_id: int, insights: 
dict[str, Any]) -> None: + existing = self._session.exec( + select(SolarBuildingInsightsRow).where( + SolarBuildingInsightsRow.property_id == property_id + ) + ).first() + if existing is None: + self._session.add( + SolarBuildingInsightsRow(property_id=property_id, insights=insights) + ) + else: + existing.insights = insights + self._session.add(existing) + + def get(self, property_id: int) -> Optional[dict[str, Any]]: + row = self._session.exec( + select(SolarBuildingInsightsRow).where( + SolarBuildingInsightsRow.property_id == property_id + ) + ).first() + return row.insights if row is not None else None diff --git a/repositories/solar/solar_repository.py b/repositories/solar/solar_repository.py new file mode 100644 index 00000000..aa91022a --- /dev/null +++ b/repositories/solar/solar_repository.py @@ -0,0 +1,19 @@ +from __future__ import annotations + +from abc import ABC, abstractmethod +from typing import Any, Optional + + +class SolarRepository(ABC): + """Persists and loads a Property's Google Solar building insights. + + Thin save/get over the raw fetched insights (a future SolarPotential domain + type will derive its fields from these). Written by Ingestion, read by + Baseline/Modelling — never re-fetched downstream (ADR-0003). + """ + + @abstractmethod + def save(self, property_id: int, insights: dict[str, Any]) -> None: ... + + @abstractmethod + def get(self, property_id: int) -> Optional[dict[str, Any]]: ... 
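Reviewer note (illustrative only, not part of the patch): a minimal sketch of the contract the SolarRepository port above appears to define, assuming re-saving for the same Property replaces the earlier payload (as the Postgres adapter's upsert does) and that get returns None when nothing has been ingested. The name InMemorySolarRepository is hypothetical; the ABC is re-declared locally only so the snippet runs standalone.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Optional


class SolarRepository(ABC):
    """Local re-declaration of the port from this patch, so the sketch is standalone."""

    @abstractmethod
    def save(self, property_id: int, insights: dict[str, Any]) -> None: ...

    @abstractmethod
    def get(self, property_id: int) -> Optional[dict[str, Any]]: ...


class InMemorySolarRepository(SolarRepository):
    """Hypothetical adapter (not in the patch): one insights payload per Property."""

    def __init__(self) -> None:
        self._by_property: dict[int, dict[str, Any]] = {}

    def save(self, property_id: int, insights: dict[str, Any]) -> None:
        # Overwrite on re-save, mirroring the upsert in SolarPostgresRepository.
        self._by_property[property_id] = insights

    def get(self, property_id: int) -> Optional[dict[str, Any]]:
        # None signals "nothing stored for this Property", as in the Postgres adapter.
        return self._by_property.get(property_id)


repo = InMemorySolarRepository()
repo.save(5, {"solarPotential": {"maxArrayPanelsCount": 42}})
repo.save(5, {"solarPotential": {"maxArrayPanelsCount": 50}})  # second save replaces the first
latest = repo.get(5)
missing = repo.get(999)
```

A fake of this shape could stand in for the Postgres adapter in orchestrator tests, in the same spirit as the _FakeSolarRepo used later in this series.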
diff --git a/scripts/fetch_cohort2_api_jsons.py b/scripts/fetch_cohort2_api_jsons.py index f44a29ea..70211453 100644 --- a/scripts/fetch_cohort2_api_jsons.py +++ b/scripts/fetch_cohort2_api_jsons.py @@ -18,9 +18,9 @@ from dotenv import load_dotenv REPO_ROOT = Path(__file__).resolve().parents[1] sys.path.insert(0, str(REPO_ROOT)) -from backend.epc_client._retry import call_with_retry -from backend.epc_client.epc_client_service import EpcClientService -from backend.epc_client.exceptions import ( +from infrastructure.epc_client._retry import call_with_retry +from infrastructure.epc_client.epc_client_service import EpcClientService +from infrastructure.epc_client.exceptions import ( EpcApiError, EpcNotFoundError, EpcRateLimitError, diff --git a/tests/repositories/solar/__init__.py b/tests/repositories/solar/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/repositories/solar/test_solar_repository.py b/tests/repositories/solar/test_solar_repository.py new file mode 100644 index 00000000..3623ae6e --- /dev/null +++ b/tests/repositories/solar/test_solar_repository.py @@ -0,0 +1,41 @@ +"""SolarRepo round-trips Google Solar building insights for a Property.""" + +from __future__ import annotations + +from typing import Any + +from sqlalchemy import Engine +from sqlmodel import Session + +from repositories.solar.solar_postgres_repository import SolarPostgresRepository + + +def test_building_insights_round_trip(db_engine: Engine) -> None: + # Arrange + insights: dict[str, Any] = { + "name": "buildings/ChIJ", + "solarPotential": { + "maxArrayPanelsCount": 42, + "panelCapacityWatts": 250.0, + "roofSegmentStats": [{"pitchDegrees": 30.0, "azimuthDegrees": 180.0}], + }, + } + + # Act + with Session(db_engine) as session: + SolarPostgresRepository(session).save(property_id=5, insights=insights) + session.commit() + with Session(db_engine) as session: + reloaded = SolarPostgresRepository(session).get(5) + + # Assert + assert reloaded == insights + + +def 
test_get_returns_none_when_no_insights_stored(db_engine: Engine) -> None: + # Arrange / Act + with Session(db_engine) as session: + reloaded = SolarPostgresRepository(session).get(999) + + # Assert + assert reloaded is None From 3998ef586c38a5fdfaa57c13503038049231deaa Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sat, 30 May 2026 19:55:46 +0000 Subject: [PATCH 06/18] =?UTF-8?q?feat(geospatial):=20GeospatialRepo=20?= =?UTF-8?q?=E2=80=94=20OS=20Open-UPRN=20coordinate=20lookup=20(#1131)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add Coordinates value object + GeospatialRepository port + GeospatialS3Repository adapter. Resolves a Property's lon/lat from the partitioned Ordnance Survey Open-UPRN parquet (filename_meta -> partition -> UPRN row). A Repo, not a Fetcher (ADR-0011): no live OS API call. The parquet reader is injected, so it's unit-tested against fixture parquets with no S3/network; returns None when the UPRN is uncovered or absent. pyright strict clean. 
Co-Authored-By: Claude Opus 4.8 --- domain/geospatial/__init__.py | 0 domain/geospatial/coordinates.py | 15 ++++ repositories/geospatial/__init__.py | 0 .../geospatial/geospatial_repository.py | 17 +++++ .../geospatial/geospatial_s3_repository.py | 43 +++++++++++ tests/repositories/geospatial/__init__.py | 0 .../geospatial/test_geospatial_repository.py | 71 +++++++++++++++++++ 7 files changed, 146 insertions(+) create mode 100644 domain/geospatial/__init__.py create mode 100644 domain/geospatial/coordinates.py create mode 100644 repositories/geospatial/__init__.py create mode 100644 repositories/geospatial/geospatial_repository.py create mode 100644 repositories/geospatial/geospatial_s3_repository.py create mode 100644 tests/repositories/geospatial/__init__.py create mode 100644 tests/repositories/geospatial/test_geospatial_repository.py diff --git a/domain/geospatial/__init__.py b/domain/geospatial/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/domain/geospatial/coordinates.py b/domain/geospatial/coordinates.py new file mode 100644 index 00000000..a190c23d --- /dev/null +++ b/domain/geospatial/coordinates.py @@ -0,0 +1,15 @@ +from __future__ import annotations + +from dataclasses import dataclass + + +@dataclass(frozen=True) +class Coordinates: + """A WGS84 point for a Property — longitude/latitude in decimal degrees. + + Resolved from the Ordnance Survey Open-UPRN reference data and fed to the + Google Solar fetcher by the Ingestion orchestrator. 
+ """ + + longitude: float + latitude: float diff --git a/repositories/geospatial/__init__.py b/repositories/geospatial/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/geospatial/geospatial_repository.py b/repositories/geospatial/geospatial_repository.py new file mode 100644 index 00000000..558216bb --- /dev/null +++ b/repositories/geospatial/geospatial_repository.py @@ -0,0 +1,17 @@ +from __future__ import annotations + +from abc import ABC, abstractmethod +from typing import Optional + +from domain.geospatial.coordinates import Coordinates + + +class GeospatialRepository(ABC): + """Resolves a Property's coordinates from hosted reference data by UPRN. + + A Repo, not a Fetcher (ADR-0011): it reads stored Ordnance Survey Open-UPRN + data, with no live API call. Returns None when the UPRN is not covered. + """ + + @abstractmethod + def coordinates_for(self, uprn: int) -> Optional[Coordinates]: ... diff --git a/repositories/geospatial/geospatial_s3_repository.py b/repositories/geospatial/geospatial_s3_repository.py new file mode 100644 index 00000000..c91a57e1 --- /dev/null +++ b/repositories/geospatial/geospatial_s3_repository.py @@ -0,0 +1,43 @@ +from __future__ import annotations + +from collections.abc import Callable +from typing import Optional + +import pandas as pd + +from domain.geospatial.coordinates import Coordinates +from repositories.geospatial.geospatial_repository import GeospatialRepository + +ParquetReader = Callable[[str], pd.DataFrame] + +_META_KEY = "spatial/filename_meta.parquet" + + +class GeospatialS3Repository(GeospatialRepository): + """Reads the partitioned Ordnance Survey Open-UPRN parquet dataset. + + `spatial/filename_meta.parquet` maps a UPRN range (lower/upper) to a + partition file; that partition carries `UPRN`/`LATITUDE`/`LONGITUDE`. The + parquet reader is injected so the dataset can be sourced from S3 in + production or a fixture directory in tests — the Repo holds no S3/HTTP code. 
+ """ + + def __init__(self, read_parquet: ParquetReader) -> None: + self._read_parquet = read_parquet + + def coordinates_for(self, uprn: int) -> Optional[Coordinates]: + meta = self._read_parquet(_META_KEY) + covering = meta[(meta["lower"] <= uprn) & (meta["upper"] >= uprn)] + if covering.empty: + return None + filename = str(covering["filenames"].iloc[0]) + + partition = self._read_parquet(f"spatial/{filename}") + rows = partition[partition["UPRN"] == uprn] + if rows.empty: + return None + row = rows.iloc[0] + return Coordinates( + longitude=float(row["LONGITUDE"]), + latitude=float(row["LATITUDE"]), + ) diff --git a/tests/repositories/geospatial/__init__.py b/tests/repositories/geospatial/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/repositories/geospatial/test_geospatial_repository.py b/tests/repositories/geospatial/test_geospatial_repository.py new file mode 100644 index 00000000..4b0834c9 --- /dev/null +++ b/tests/repositories/geospatial/test_geospatial_repository.py @@ -0,0 +1,71 @@ +"""GeospatialRepo resolves a Property's coordinates from the OS Open-UPRN data. + +A reference-data lookup, not a Fetcher (ADR-0011): no live OS API call. The +adapter reads the partitioned Open-UPRN parquet via an injected reader, so the +test exercises the partition lookup + filter against real fixture parquets with +no network. 
+""" + +from __future__ import annotations + +from collections.abc import Callable +from pathlib import Path + +import pandas as pd + +from domain.geospatial.coordinates import Coordinates +from repositories.geospatial.geospatial_s3_repository import GeospatialS3Repository + + +def _reader(base: Path) -> Callable[[str], pd.DataFrame]: + def read(key: str) -> pd.DataFrame: + return pd.read_parquet(base / key) + + return read + + +def _write_open_uprn(base: Path) -> None: + spatial = base / "spatial" + spatial.mkdir(parents=True, exist_ok=True) + pd.DataFrame( + {"lower": [0], "upper": [100000], "filenames": ["0_100000.parquet"]} + ).to_parquet(spatial / "filename_meta.parquet") + pd.DataFrame( + { + "UPRN": [12345, 12346], + "LATITUDE": [51.5074, 51.6000], + "LONGITUDE": [-0.1278, -0.2000], + } + ).to_parquet(spatial / "0_100000.parquet") + + +def test_coordinates_for_returns_lon_lat(tmp_path: Path) -> None: + # Arrange + _write_open_uprn(tmp_path) + repo = GeospatialS3Repository(_reader(tmp_path)) + + # Act + coords = repo.coordinates_for(12345) + + # Assert + assert coords == Coordinates(longitude=-0.1278, latitude=51.5074) + + +def test_coordinates_for_returns_none_when_uprn_absent(tmp_path: Path) -> None: + # Arrange + _write_open_uprn(tmp_path) + repo = GeospatialS3Repository(_reader(tmp_path)) + + # Act / Assert — uprn inside the partition range but not present in the data + assert repo.coordinates_for(99999) is None + + +def test_coordinates_for_returns_none_when_no_partition_covers_uprn( + tmp_path: Path, +) -> None: + # Arrange + _write_open_uprn(tmp_path) + repo = GeospatialS3Repository(_reader(tmp_path)) + + # Act / Assert — uprn beyond every partition's range + assert repo.coordinates_for(500000) is None From 1696cccba60e63a33ffe8cb3d743781667c08879 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sat, 30 May 2026 19:58:21 +0000 Subject: [PATCH 07/18] feat(ingestion): IngestionOrchestrator end-to-end (#1134) MIME-Version: 1.0 Content-Type: 
text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Stage 1 of the pipeline: per property, read its UPRN from the property row, fetch its EPC, resolve coordinates from the Geospatial reference repo, thread those into the Solar fetcher, and persist EPC + solar via repos. Fetchers never call each other — the orchestrator threads the coordinate (ADR-0011). Coordinates are reference data (deterministic from UPRN), resolved transiently to drive the solar fetch rather than persisted per-property. Depends on thin EpcFetcher/SolarFetcher Protocols (EpcClientService and GoogleSolarApiClient satisfy them structurally). Unit-tested against fakes — no DB, gov API, or network: persists EPC, threads coords into solar, skips UPRN-less properties and skips solar when coordinates are absent. pyright clean. Co-Authored-By: Claude Opus 4.8 --- orchestration/ingestion_orchestrator.py | 72 +++++++ .../test_ingestion_orchestrator.py | 175 ++++++++++++++++++ 2 files changed, 247 insertions(+) create mode 100644 orchestration/ingestion_orchestrator.py create mode 100644 tests/orchestration/test_ingestion_orchestrator.py diff --git a/orchestration/ingestion_orchestrator.py b/orchestration/ingestion_orchestrator.py new file mode 100644 index 00000000..a3d60d8f --- /dev/null +++ b/orchestration/ingestion_orchestrator.py @@ -0,0 +1,72 @@ +from __future__ import annotations + +from typing import Any, Optional, Protocol + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from repositories.epc.epc_repository import EpcRepository +from repositories.geospatial.geospatial_repository import GeospatialRepository +from repositories.property.property_repository import PropertyRepository +from repositories.solar.solar_repository import SolarRepository + + +class EpcFetcher(Protocol): + """The slice of the New-EPC-API client Ingestion needs (e.g. EpcClientService).""" + + def get_by_uprn(self, uprn: int) -> Optional[EpcPropertyData]: ... 
+ + +class SolarFetcher(Protocol): + """The slice of the Google Solar client Ingestion needs (e.g. GoogleSolarApiClient).""" + + def get_building_insights( + self, longitude: float, latitude: float + ) -> dict[str, Any]: ... + + +class IngestionOrchestrator: + """Stage 1: acquire a Property's external source data and persist it. + + For each property: read its UPRN from the property row, fetch its EPC, resolve + its coordinates from the Geospatial reference Repo, thread those into the Solar + fetcher, and persist EPC + solar via repos. The orchestrator is the only place + a Fetcher and a Repo meet, and it threads the coordinate from the Repo into the + Solar Fetcher — Fetchers never call each other (ADR-0011). Coordinates are + reference data (deterministic from UPRN), so they are resolved transiently to + drive the Solar fetch rather than persisted per-property. + """ + + def __init__( + self, + *, + property_repo: PropertyRepository, + epc_fetcher: EpcFetcher, + geospatial_repo: GeospatialRepository, + solar_fetcher: SolarFetcher, + epc_repo: EpcRepository, + solar_repo: SolarRepository, + ) -> None: + self._property_repo = property_repo + self._epc_fetcher = epc_fetcher + self._geospatial_repo = geospatial_repo + self._solar_fetcher = solar_fetcher + self._epc_repo = epc_repo + self._solar_repo = solar_repo + + def run(self, property_ids: list[int]) -> None: + for property_id in property_ids: + uprn = self._property_repo.get(property_id).identity.uprn + if uprn is None: + # No UPRN to fetch against (e.g. landlord_property_id-only); a + # later Site-Notes path covers these. 
+ continue + + epc = self._epc_fetcher.get_by_uprn(uprn) + if epc is not None: + self._epc_repo.save(epc, property_id=property_id) + + coordinates = self._geospatial_repo.coordinates_for(uprn) + if coordinates is not None: + insights = self._solar_fetcher.get_building_insights( + coordinates.longitude, coordinates.latitude + ) + self._solar_repo.save(property_id, insights) diff --git a/tests/orchestration/test_ingestion_orchestrator.py b/tests/orchestration/test_ingestion_orchestrator.py new file mode 100644 index 00000000..1c6a0f89 --- /dev/null +++ b/tests/orchestration/test_ingestion_orchestrator.py @@ -0,0 +1,175 @@ +"""IngestionOrchestrator wires fetchers + repos with no real IO (ADR-0011). + +Tested entirely against fakes: it must fetch EPC + solar, thread the +Geospatial-resolved coordinates into the solar fetcher, and persist via repos. +""" + +from __future__ import annotations + +from typing import Any, Optional + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from domain.geospatial.coordinates import Coordinates +from domain.property.property import Property, PropertyIdentity +from orchestration.ingestion_orchestrator import IngestionOrchestrator +from repositories.epc.epc_repository import EpcRepository +from repositories.geospatial.geospatial_repository import GeospatialRepository +from repositories.property.property_repository import PropertyRepository +from repositories.solar.solar_repository import SolarRepository + + +class _FakePropertyRepo(PropertyRepository): + def __init__(self, by_id: dict[int, Property]) -> None: + self._by_id = by_id + + def get(self, property_id: int) -> Property: + return self._by_id[property_id] + + +class _FakeEpcFetcher: + def __init__(self, epc: Optional[EpcPropertyData]) -> None: + self.epc = epc + self.uprns: list[int] = [] + + def get_by_uprn(self, uprn: int) -> Optional[EpcPropertyData]: + self.uprns.append(uprn) + return self.epc + + +class _FakeGeospatialRepo(GeospatialRepository): + def 
__init__(self, coordinates: Optional[Coordinates]) -> None: + self._coordinates = coordinates + + def coordinates_for(self, uprn: int) -> Optional[Coordinates]: + return self._coordinates + + +class _FakeSolarFetcher: + def __init__(self, insights: dict[str, Any]) -> None: + self.insights = insights + self.calls: list[tuple[float, float]] = [] + + def get_building_insights( + self, longitude: float, latitude: float + ) -> dict[str, Any]: + self.calls.append((longitude, latitude)) + return self.insights + + +class _FakeEpcRepo(EpcRepository): + def __init__(self) -> None: + self.saved: list[tuple[EpcPropertyData, Optional[int]]] = [] + + def save( + self, + data: EpcPropertyData, + property_id: Optional[int] = None, + portfolio_id: Optional[int] = None, + ) -> int: + self.saved.append((data, property_id)) + return 1 + + def get(self, epc_property_id: int) -> EpcPropertyData: # pragma: no cover + raise NotImplementedError + + def get_for_property( + self, property_id: int + ) -> Optional[EpcPropertyData]: # pragma: no cover + raise NotImplementedError + + +class _FakeSolarRepo(SolarRepository): + def __init__(self) -> None: + self.saved: list[tuple[int, dict[str, Any]]] = [] + + def save(self, property_id: int, insights: dict[str, Any]) -> None: + self.saved.append((property_id, insights)) + + def get(self, property_id: int) -> Optional[dict[str, Any]]: # pragma: no cover + raise NotImplementedError + + +def _property(uprn: Optional[int]) -> Property: + return Property( + identity=PropertyIdentity( + portfolio_id=1, postcode="A0 0AA", address="1 Some Street", uprn=uprn + ) + ) + + +def _epc() -> EpcPropertyData: + # A bare placeholder is enough — the orchestrator treats the EPC opaquely. 
+    return object.__new__(EpcPropertyData)
+
+
+def test_ingestion_persists_epc_and_threads_coords_into_solar() -> None:
+    # Arrange
+    epc = _epc()
+    insights = {"name": "buildings/X"}
+    coords = Coordinates(longitude=-0.1278, latitude=51.5074)
+    epc_repo = _FakeEpcRepo()
+    solar_repo = _FakeSolarRepo()
+    solar_fetcher = _FakeSolarFetcher(insights)
+    orchestrator = IngestionOrchestrator(
+        property_repo=_FakePropertyRepo({10: _property(uprn=12345)}),
+        epc_fetcher=_FakeEpcFetcher(epc),
+        geospatial_repo=_FakeGeospatialRepo(coords),
+        solar_fetcher=solar_fetcher,
+        epc_repo=epc_repo,
+        solar_repo=solar_repo,
+    )
+
+    # Act
+    orchestrator.run([10])
+
+    # Assert
+    assert epc_repo.saved == [(epc, 10)]
+    assert solar_fetcher.calls == [(-0.1278, 51.5074)]  # coords threaded from repo
+    assert solar_repo.saved == [(10, insights)]
+
+
+def test_ingestion_skips_property_without_uprn() -> None:
+    # Arrange
+    epc_repo = _FakeEpcRepo()
+    solar_repo = _FakeSolarRepo()
+    solar_fetcher = _FakeSolarFetcher({})
+    orchestrator = IngestionOrchestrator(
+        property_repo=_FakePropertyRepo({10: _property(uprn=None)}),
+        epc_fetcher=_FakeEpcFetcher(_epc()),
+        geospatial_repo=_FakeGeospatialRepo(None),
+        solar_fetcher=solar_fetcher,
+        epc_repo=epc_repo,
+        solar_repo=solar_repo,
+    )
+
+    # Act
+    orchestrator.run([10])
+
+    # Assert — nothing fetched or persisted for a UPRN-less property
+    assert epc_repo.saved == []
+    assert solar_repo.saved == []
+    assert solar_fetcher.calls == []
+
+
+def test_ingestion_persists_epc_but_skips_solar_when_no_coordinates() -> None:
+    # Arrange
+    epc = _epc()
+    epc_repo = _FakeEpcRepo()
+    solar_repo = _FakeSolarRepo()
+    solar_fetcher = _FakeSolarFetcher({})
+    orchestrator = IngestionOrchestrator(
+        property_repo=_FakePropertyRepo({10: _property(uprn=12345)}),
+        epc_fetcher=_FakeEpcFetcher(epc),
+        geospatial_repo=_FakeGeospatialRepo(None),
+        solar_fetcher=solar_fetcher,
+        epc_repo=epc_repo,
+        solar_repo=solar_repo,
+    )
+
+    # Act
+    orchestrator.run([10])
+
+    # Assert
+    assert epc_repo.saved == [(epc, 10)]
+    assert solar_fetcher.calls == []
+    assert solar_repo.saved == []

From 75fbba60fc5af811b68a5bd499aba938ae99f542 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sat, 30 May 2026 20:38:15 +0000
Subject: [PATCH 08/18] feat(ara): AraFirstRunTriggerBody + ara_first_run lambda skeleton (#1130)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stage-2 entry point for the First Run use case. Adds the `ara_first_run`
Lambda package mirroring the `postcode_splitter` template, its typed
trigger contract, and a stub `FirstRunPipeline`.

- `AraFirstRunTriggerBody`: thin command of five fields — `task_id`,
  `sub_task_id` (UUID, lifecycle), `portfolio_id`, `property_ids`,
  `scenario_ids` (int business IDs). No `model_config` override, so
  Pydantic's default `extra="ignore"` lets the FastAPI backend add fields
  without breaking deployed lambdas. UPRNs / Scenario defs are
  deliberately off the event — read from source-of-truth tables.
- Thin `handler.py`: validate-and-delegate only, via a named
  `dispatch_first_run` seam (testable without the Lambda runtime).
  Subtask status (in-progress/complete/failed) + CloudWatch log URL come
  for free from the existing `@subtask_handler()` decorator.
- `FirstRunPipeline` (orchestration/) stub: `run(command)` receives the
  validated command. Declares a structural `FirstRunCommand` Protocol
  (the three business fields) that `AraFirstRunTriggerBody` satisfies, so
  orchestration needs no application-layer import — rhymes with the
  `EpcFetcher`/`SolarFetcher` Protocols on IngestionOrchestrator
  (ADR-0011). Full Ingestion→Baseline→Modelling composition lands in
  #1136.
- Dockerfile / requirements.txt / local_handler/ mirror
  postcode_splitter.

TDD: 7 new tests (trigger-body validation incl. forward-compat +
id-types, pipeline seam, handler delegation). pyright strict clean.

Co-Authored-By: Claude Opus 4.8
---
 applications/ara_first_run/Dockerfile        | 34 +++++++
 applications/ara_first_run/__init__.py       |  0
 .../ara_first_run_trigger_body.py            | 25 +++++
 applications/ara_first_run/handler.py        | 34 +++++++
 .../local_handler/.env.local.example         | 28 ++++++
 .../local_handler/docker-compose.yml         |  9 ++
 .../local_handler/invoke_local_lambda.py     | 30 ++++++
 .../ara_first_run/local_handler/run_local.sh | 12 +++
 applications/ara_first_run/requirements.txt  |  4 +
 orchestration/first_run_pipeline.py          | 36 +++++++
 tests/applications/__init__.py               |  0
 tests/applications/ara_first_run/__init__.py |  0
 .../test_ara_first_run_trigger_body.py       | 97 +++++++++++++++++++
 .../ara_first_run/test_handler.py            | 44 +++++++++
 .../orchestration/test_first_run_pipeline.py | 29 ++++++
 15 files changed, 382 insertions(+)
 create mode 100644 applications/ara_first_run/Dockerfile
 create mode 100644 applications/ara_first_run/__init__.py
 create mode 100644 applications/ara_first_run/ara_first_run_trigger_body.py
 create mode 100644 applications/ara_first_run/handler.py
 create mode 100644 applications/ara_first_run/local_handler/.env.local.example
 create mode 100644 applications/ara_first_run/local_handler/docker-compose.yml
 create mode 100755 applications/ara_first_run/local_handler/invoke_local_lambda.py
 create mode 100755 applications/ara_first_run/local_handler/run_local.sh
 create mode 100644 applications/ara_first_run/requirements.txt
 create mode 100644 orchestration/first_run_pipeline.py
 create mode 100644 tests/applications/__init__.py
 create mode 100644 tests/applications/ara_first_run/__init__.py
 create mode 100644 tests/applications/ara_first_run/test_ara_first_run_trigger_body.py
 create mode 100644 tests/applications/ara_first_run/test_handler.py
 create mode 100644 tests/orchestration/test_first_run_pipeline.py

diff --git a/applications/ara_first_run/Dockerfile b/applications/ara_first_run/Dockerfile
new file mode 100644
index 00000000..2d3f6515
--- /dev/null
+++ b/applications/ara_first_run/Dockerfile
@@ -0,0 +1,34 @@
+FROM public.ecr.aws/lambda/python:3.11
+
+# Postgres host/port/database are baked into the image at build time from
+# the deploy workflow's --build-arg values (GitHub Actions DEV_DB_* secrets),
+# mirroring applications/postcode_splitter/Dockerfile. They map onto the
+# POSTGRES_* names PostgresConfig.from_env reads. Username/password are NOT
+# baked in -- Terraform injects those as Lambda env vars from Secrets Manager.
+ARG DEV_DB_HOST
+ARG DEV_DB_PORT
+ARG DEV_DB_NAME
+
+ENV POSTGRES_HOST=${DEV_DB_HOST}
+ENV POSTGRES_PORT=${DEV_DB_PORT}
+ENV POSTGRES_DATABASE=${DEV_DB_NAME}
+
+WORKDIR /var/task
+
+COPY applications/ara_first_run/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy the layered source the handler imports from. DDD-shaped packages only —
+# no pandas, no legacy backend/.
+COPY domain/ domain/
+COPY infrastructure/ infrastructure/
+COPY orchestration/ orchestration/
+COPY repositories/ repositories/
+COPY utilities/ utilities/
+COPY applications/ applications/
+
+# Place the handler at the Lambda task root so the runtime can resolve
+# ``main.handler`` without an extra package prefix.
+COPY applications/ara_first_run/handler.py /var/task/main.py
+
+CMD ["main.handler"]
diff --git a/applications/ara_first_run/__init__.py b/applications/ara_first_run/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/applications/ara_first_run/ara_first_run_trigger_body.py b/applications/ara_first_run/ara_first_run_trigger_body.py
new file mode 100644
index 00000000..0f975389
--- /dev/null
+++ b/applications/ara_first_run/ara_first_run_trigger_body.py
@@ -0,0 +1,25 @@
+from __future__ import annotations
+
+from uuid import UUID
+
+from pydantic import BaseModel
+
+
+class AraFirstRunTriggerBody(BaseModel):
+    """The SQS event the ``ara_first_run`` Lambda is triggered with.
+
+    A thin command. ``task_id``/``sub_task_id`` drive the SubTask lifecycle (the
+    ``@subtask_handler`` decorator reads them); the three business fields are what
+    the pipeline threads downstream. UPRNs and Scenario definitions are
+    deliberately absent — they are read from their source-of-truth tables, not
+    carried on the event (issue #1130).
+
+    No ``model_config`` override: Pydantic's default ``extra="ignore"`` lets the
+    FastAPI backend add fields to the payload without breaking deployed lambdas.
+    """
+
+    task_id: UUID
+    sub_task_id: UUID
+    portfolio_id: int
+    property_ids: list[int]
+    scenario_ids: list[int]
diff --git a/applications/ara_first_run/handler.py b/applications/ara_first_run/handler.py
new file mode 100644
index 00000000..b944227b
--- /dev/null
+++ b/applications/ara_first_run/handler.py
@@ -0,0 +1,34 @@
+from __future__ import annotations
+
+from typing import Any, Protocol
+
+from applications.ara_first_run.ara_first_run_trigger_body import (
+    AraFirstRunTriggerBody,
+)
+from orchestration.first_run_pipeline import FirstRunPipeline
+from orchestration.task_orchestrator import TaskOrchestrator
+from utilities.aws_lambda.subtask_handler import subtask_handler
+
+
+class _RunsFirstRun(Protocol):
+    """The slice of FirstRunPipeline the handler delegates to."""
+
+    def run(self, command: AraFirstRunTriggerBody) -> None: ...
+
+
+def dispatch_first_run(body: dict[str, Any], *, pipeline: _RunsFirstRun) -> None:
+    """Validate the raw event body and hand the command to the pipeline.
+
+    The handler's entire job — kept as a named seam so it is exercised without
+    the Lambda runtime. No business logic lives here: validate, then delegate
+    (issue #1130).
+    """
+    trigger = AraFirstRunTriggerBody.model_validate(body)
+    pipeline.run(trigger)
+
+
+@subtask_handler()
+def handler(
+    body: dict[str, Any], context: Any, task_orchestrator: TaskOrchestrator
+) -> None:
+    dispatch_first_run(body, pipeline=FirstRunPipeline())
diff --git a/applications/ara_first_run/local_handler/.env.local.example b/applications/ara_first_run/local_handler/.env.local.example
new file mode 100644
index 00000000..30924816
--- /dev/null
+++ b/applications/ara_first_run/local_handler/.env.local.example
@@ -0,0 +1,28 @@
+# Local-test environment for the ara_first_run Lambda.
+#
+# cp .env.local.example .env.local then fill in the values below.
+#
+# .env.local is gitignored. The container hits a REAL Postgres (the SubTask
+# lifecycle store), so every value here points at infrastructure that exists.
+#
+# NOTE: the DDD code uses different env var names than the repo root .env. The
+# mapping (root .env name -> var here) is given per section. Keep comments on
+# their own lines — docker-compose's env_file parser folds a trailing "# ..."
+# into the value.
+
+# --- Postgres (utilities/aws_lambda/default_orchestrator -> PostgresConfig.from_env) ---
+# POSTGRES_HOST <- DB_HOST, PORT <- DB_PORT, USERNAME <- DB_USERNAME,
+# PASSWORD <- DB_PASSWORD, DATABASE <- DB_NAME.
+POSTGRES_HOST=
+POSTGRES_PORT=5432
+POSTGRES_USERNAME=
+POSTGRES_PASSWORD=
+POSTGRES_DATABASE=
+# POSTGRES_DRIVER=psycopg2 (optional; defaults to psycopg2)
+
+# --- AWS credentials for boto3 (used by later slices; the SubTask lifecycle
+# CloudWatch URL is read from the Lambda runtime's own AWS_* env in prod) ---
+AWS_ACCESS_KEY_ID=
+AWS_SECRET_ACCESS_KEY=
+AWS_DEFAULT_REGION=eu-west-2
+# AWS_SESSION_TOKEN= (only if using temporary/SSO credentials)
diff --git a/applications/ara_first_run/local_handler/docker-compose.yml b/applications/ara_first_run/local_handler/docker-compose.yml
new file mode 100644
index 00000000..09151bc6
--- /dev/null
+++ b/applications/ara_first_run/local_handler/docker-compose.yml
@@ -0,0 +1,9 @@
+services:
+  ara-first-run:
+    build:
+      context: ../../../
+      dockerfile: applications/ara_first_run/Dockerfile
+    ports:
+      - "9002:8080"
+    env_file:
+      - .env.local
diff --git a/applications/ara_first_run/local_handler/invoke_local_lambda.py b/applications/ara_first_run/local_handler/invoke_local_lambda.py
new file mode 100755
index 00000000..9998205d
--- /dev/null
+++ b/applications/ara_first_run/local_handler/invoke_local_lambda.py
@@ -0,0 +1,30 @@
+#!/usr/bin/env python3
+import json
+import requests
+
+HOST = "localhost"
+PORT = "9002"
+
+LAMBDA_URL = f"http://{HOST}:{PORT}/2015-03-31/functions/function/invocations"
+
+payload = {
+    "Records": [
+        {
+            "body": json.dumps(
+                {
+                    "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+                    "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+                    "portfolio_id": 42,
+                    "property_ids": [101, 102, 103],
+                    "scenario_ids": [7, 8],
+                }
+            )
+        }
+    ]
+}
+
+response = requests.post(LAMBDA_URL, json=payload)
+
+print("Status code:", response.status_code)
+print("Response:")
+print(response.text)
diff --git a/applications/ara_first_run/local_handler/run_local.sh b/applications/ara_first_run/local_handler/run_local.sh
new file mode 100755
index 00000000..345b60ee
--- /dev/null
+++ b/applications/ara_first_run/local_handler/run_local.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+set -euo pipefail
+cd "$(dirname "$0")"
+
+if [ ! -f .env.local ]; then
+  cp .env.local.example .env.local
+  echo "Created .env.local from the template — fill it in, then re-run." >&2
+  exit 1
+fi
+
+docker compose build --no-cache
+docker compose up --force-recreate
diff --git a/applications/ara_first_run/requirements.txt b/applications/ara_first_run/requirements.txt
new file mode 100644
index 00000000..6a85a255
--- /dev/null
+++ b/applications/ara_first_run/requirements.txt
@@ -0,0 +1,4 @@
+boto3
+pydantic
+sqlmodel
+psycopg2-binary
diff --git a/orchestration/first_run_pipeline.py b/orchestration/first_run_pipeline.py
new file mode 100644
index 00000000..1fd8839b
--- /dev/null
+++ b/orchestration/first_run_pipeline.py
@@ -0,0 +1,36 @@
+from __future__ import annotations
+
+from typing import Protocol
+
+
+class FirstRunCommand(Protocol):
+    """The slice of the trigger the pipeline threads downstream.
+
+    Only the business fields — UPRNs and Scenario definitions are read from
+    their source-of-truth tables, not carried here. ``task_id``/``sub_task_id``
+    are deliberately absent: the SubTask lifecycle is the decorator's concern,
+    not the pipeline's. ``AraFirstRunTriggerBody`` satisfies this structurally,
+    so ``orchestration`` need not import the application-layer event type.
+    """
+
+    @property
+    def portfolio_id(self) -> int: ...
+
+    @property
+    def property_ids(self) -> list[int]: ...
+
+    @property
+    def scenario_ids(self) -> list[int]: ...
+
+
+class FirstRunPipeline:
+    """Composes the First Run stages end-to-end (Ingestion -> Baseline ->
+    Modelling), threading only ``property_ids`` between them through repos
+    (ADR-0011).
+
+    Stub at this stage (#1130): ``run`` simply receives the validated command.
+    The real three-stage composition lands in #1136.
+    """
+
+    def run(self, command: FirstRunCommand) -> None:
+        return None
diff --git a/tests/applications/__init__.py b/tests/applications/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/applications/ara_first_run/__init__.py b/tests/applications/ara_first_run/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/applications/ara_first_run/test_ara_first_run_trigger_body.py b/tests/applications/ara_first_run/test_ara_first_run_trigger_body.py
new file mode 100644
index 00000000..5ee17396
--- /dev/null
+++ b/tests/applications/ara_first_run/test_ara_first_run_trigger_body.py
@@ -0,0 +1,97 @@
+from __future__ import annotations
+
+from uuid import UUID
+
+import pytest
+from pydantic import ValidationError
+
+from applications.ara_first_run.ara_first_run_trigger_body import (
+    AraFirstRunTriggerBody,
+)
+
+
+def test_validates_well_formed_body_into_typed_fields() -> None:
+    # Arrange
+    body = {
+        "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": 42,
+        "property_ids": [101, 102, 103],
+        "scenario_ids": [7, 8],
+    }
+
+    # Act
+    trigger = AraFirstRunTriggerBody.model_validate(body)
+
+    # Assert
+    assert trigger.task_id == UUID("e295d89b-a7c5-4a9a-8b4e-b405fab1f298")
+    assert trigger.sub_task_id == UUID("f4a9944f-41f0-4a33-8669-5016ec574068")
+    assert trigger.portfolio_id == 42
+    assert trigger.property_ids == [101, 102, 103]
+    assert trigger.scenario_ids == [7, 8]
+
+
+def test_ignores_unknown_extra_fields() -> None:
+    # Arrange — the FastAPI backend may add fields the deployed lambda predates.
+    body = {
+        "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": 42,
+        "property_ids": [101],
+        "scenario_ids": [7],
+        "a_field_added_later_by_the_backend": "ignore me",
+    }
+
+    # Act
+    trigger = AraFirstRunTriggerBody.model_validate(body)
+
+    # Assert — the unknown field is dropped, not retained or rejected.
+    assert not hasattr(trigger, "a_field_added_later_by_the_backend")
+    assert trigger.portfolio_id == 42
+
+
+def test_rejects_body_missing_a_required_field() -> None:
+    # Arrange — scenario_ids omitted.
+    body = {
+        "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": 42,
+        "property_ids": [101],
+    }
+
+    # Act / Assert
+    with pytest.raises(ValidationError) as exc_info:
+        AraFirstRunTriggerBody.model_validate(body)
+    assert "scenario_ids" in str(exc_info.value)
+
+
+def test_rejects_non_uuid_task_id() -> None:
+    # Arrange
+    body = {
+        "task_id": "not-a-uuid",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": 42,
+        "property_ids": [101],
+        "scenario_ids": [7],
+    }
+
+    # Act / Assert
+    with pytest.raises(ValidationError) as exc_info:
+        AraFirstRunTriggerBody.model_validate(body)
+    assert "task_id" in str(exc_info.value)
+
+
+def test_rejects_non_int_portfolio_id() -> None:
+    # Arrange — business IDs are integers, not strings.
+    body = {
+        "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": "not-an-int",
+        "property_ids": [101],
+        "scenario_ids": [7],
+    }
+
+    # Act / Assert
+    with pytest.raises(ValidationError) as exc_info:
+        AraFirstRunTriggerBody.model_validate(body)
+    assert "portfolio_id" in str(exc_info.value)
diff --git a/tests/applications/ara_first_run/test_handler.py b/tests/applications/ara_first_run/test_handler.py
new file mode 100644
index 00000000..21e96e3d
--- /dev/null
+++ b/tests/applications/ara_first_run/test_handler.py
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+from typing import Optional
+from uuid import UUID
+
+from applications.ara_first_run.ara_first_run_trigger_body import (
+    AraFirstRunTriggerBody,
+)
+from applications.ara_first_run.handler import dispatch_first_run
+from orchestration.first_run_pipeline import FirstRunCommand
+
+
+class _SpyPipeline:
+    """Records the command it is asked to run, instead of composing stages."""
+
+    def __init__(self) -> None:
+        self.received: Optional[FirstRunCommand] = None
+
+    def run(self, command: FirstRunCommand) -> None:
+        self.received = command
+
+
+def test_validates_the_event_body_and_delegates_the_command_to_the_pipeline() -> None:
+    # Arrange — a raw SQS body, as the decorator hands it to the handler.
+    body = {
+        "task_id": "e295d89b-a7c5-4a9a-8b4e-b405fab1f298",
+        "sub_task_id": "f4a9944f-41f0-4a33-8669-5016ec574068",
+        "portfolio_id": 42,
+        "property_ids": [101, 102],
+        "scenario_ids": [7],
+    }
+    pipeline = _SpyPipeline()
+
+    # Act
+    dispatch_first_run(body, pipeline=pipeline)
+
+    # Assert — the raw body was validated into the typed trigger and handed
+    # straight on, untouched.
+    received = pipeline.received
+    assert isinstance(received, AraFirstRunTriggerBody)
+    assert received.task_id == UUID("e295d89b-a7c5-4a9a-8b4e-b405fab1f298")
+    assert received.portfolio_id == 42
+    assert received.property_ids == [101, 102]
+    assert received.scenario_ids == [7]
diff --git a/tests/orchestration/test_first_run_pipeline.py b/tests/orchestration/test_first_run_pipeline.py
new file mode 100644
index 00000000..4b685bb2
--- /dev/null
+++ b/tests/orchestration/test_first_run_pipeline.py
@@ -0,0 +1,29 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from orchestration.first_run_pipeline import FirstRunCommand, FirstRunPipeline
+
+
+@dataclass
+class _FakeCommand:
+    """A stand-in for AraFirstRunTriggerBody — structurally a FirstRunCommand."""
+
+    portfolio_id: int
+    property_ids: list[int]
+    scenario_ids: list[int]
+
+
+def test_run_accepts_the_validated_command() -> None:
+    # Arrange
+    command: FirstRunCommand = _FakeCommand(
+        portfolio_id=42, property_ids=[101, 102], scenario_ids=[7]
+    )
+    pipeline = FirstRunPipeline()
+
+    # Act
+    result = pipeline.run(command)
+
+    # Assert — the stub simply receives the command; full Ingestion -> Baseline
+    # -> Modelling composition lands in #1136.
+    assert result is None

From 76717dfc3a98581f10b6766546a071a2fd1e01ba Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sat, 30 May 2026 21:21:34 +0000
Subject: [PATCH 09/18] feat(baseline): BaselineOrchestrator + BaselinePerformance aggregate (#1135)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stage 2 of First Run. Establishes each Property's Baseline Performance
from persisted source data and writes it back — reads only from repos,
never a Fetcher or HTTP (ADR-0003), so it is byte-identical whether
Ingestion ran milliseconds ago or last week.

Domain (`domain/baseline/`):
- `Performance` VO — the four rated quantities: SAP / EPC Band / CO2 /
  Primary Energy Intensity.
  `lodged_performance(epc)` reads them off the EPC's recorded fields
  (PEUI = `energy_consumption_current`).
- `BaselinePerformance` (ADR-0004) — the paired `lodged` + `effective`
  Performance + `rebaseline_reason`, plus the no-derivation part of the
  energy block (`space_heating_kwh` / `water_heating_kwh`, off the RHI,
  deterministic per ADR-0006). Both halves always populated.
- `Rebaseliner` port + `StubRebaseliner`: the re-score-on-override seam
  (ADR-0011). SAP10 certs pass through (effective == lodged, reason
  "none"); a pre-SAP10 cert raises `RebaselineNotImplemented` rather than
  fabricating a plausible-but-wrong "none" — ML rebaselining is not wired
  yet. Mirrors the repo's strict-raise culture.

Persistence: new `BaselineRepository` port + `BaselinePostgresRepository`
+ flat-column `baseline_performance` SQLModel (one row per Property). Per
ADR-0004's amendment this is a standalone table, NOT columns on the
retiring `property_details_epc`. Production migration is FE-owned
(Drizzle) — docs/migrations/baseline-performance-table.md.

Docs (grill-with-docs): corrected CONTEXT.md Lodged/Effective Performance
to Primary Energy Intensity (the term collided with its own _Avoid_ entry
under "heat demand") + fixed stale RHI field names; amended ADR-0004
Consequences for the standalone-table decision.

Fuel split + bills (rest of EPC Energy Derivation) deferred to a
follow-up — they need a Fuel Rates source (Ofgem-cap ETL) that does not
exist yet.

TDD, one test -> one impl: 7 tests (lodged read, rebaseliner pass-through
+ raise, orchestrator establish-and-persist + pre-SAP10 raise, Postgres
round-trip + absent). pyright strict clean; AAA layout.

Co-Authored-By: Claude Opus 4.8
---
 CONTEXT.md                                    |   6 +-
 ...eline-performance-lodged-effective-pair.md |  30 ++++-
 docs/migrations/baseline-performance-table.md |  43 +++++++
 domain/baseline/__init__.py                   |   0
 domain/baseline/baseline_performance.py       |  28 +++++
 domain/baseline/performance.py                |  53 +++++++++
 domain/baseline/rebaseliner.py                |  60 ++++++++++
 .../postgres/baseline_performance_table.py    |  77 ++++++++++++
 orchestration/baseline_orchestrator.py        |  63 ++++++++++
 repositories/baseline/__init__.py             |   0
 .../baseline/baseline_postgres_repository.py  |  36 ++++++
 repositories/baseline/baseline_repository.py  |  23 ++++
 tests/domain/baseline/__init__.py             |   0
 tests/domain/baseline/test_performance.py     |  34 ++++++
 tests/domain/baseline/test_rebaseliner.py     |  48 ++++++++
 .../test_baseline_orchestrator.py             | 110 ++++++++++++++++++
 tests/repositories/baseline/__init__.py       |   0
 .../test_baseline_postgres_repository.py      |  53 +++++++++
 18 files changed, 660 insertions(+), 4 deletions(-)
 create mode 100644 docs/migrations/baseline-performance-table.md
 create mode 100644 domain/baseline/__init__.py
 create mode 100644 domain/baseline/baseline_performance.py
 create mode 100644 domain/baseline/performance.py
 create mode 100644 domain/baseline/rebaseliner.py
 create mode 100644 infrastructure/postgres/baseline_performance_table.py
 create mode 100644 orchestration/baseline_orchestrator.py
 create mode 100644 repositories/baseline/__init__.py
 create mode 100644 repositories/baseline/baseline_postgres_repository.py
 create mode 100644 repositories/baseline/baseline_repository.py
 create mode 100644 tests/domain/baseline/__init__.py
 create mode 100644 tests/domain/baseline/test_performance.py
 create mode 100644 tests/domain/baseline/test_rebaseliner.py
 create mode 100644 tests/orchestration/test_baseline_orchestrator.py
 create mode 100644 tests/repositories/baseline/__init__.py
 create mode 100644 tests/repositories/baseline/test_baseline_postgres_repository.py

diff --git a/CONTEXT.md b/CONTEXT.md
index b99a1ac6..345e5ce1 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -90,11 +90,11 @@ A Property's current performance aggregate, holding both Lodged Performance and
 _Avoid_: baseline predictions, predicted baseline, rebaselined values
 
 **Lodged Performance**:
-The SAP / EPC Band / carbon emissions / heat demand recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property".
+The SAP / EPC Band / carbon emissions / Primary Energy Intensity recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property".
 _Avoid_: original performance, raw EPC values, recorded baseline
 
 **Effective Performance**:
-The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by ML output when triggered. The half of Baseline Performance that says "what we modelled".
+The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by ML output when triggered. The half of Baseline Performance that says "what we modelled".
 _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values
 
 **Calculated SAP10 Performance**:
@@ -118,7 +118,7 @@ The process that translates an Optimised Package into cert-field changes and pro
 _Avoid_: measure overrides (rejected during ADR-0009 grill — phantom mid-layer), package applier, retrofit simulator
 
 **EPC Energy Derivation**:
-The process that derives a Property's fuel split and annual bills from its space heating kWh and hot water kWh values plus the heating fuel deduced from SAP fields. kWh values themselves come from the EPC's recorded fields (`renewable_heat_incentive.space_heating_existing_dwelling` and `.water_heating`) for SAP10 baselines, or from ML prediction when Rebaselining fires or when scoring a post-measure state. Bills are computed deterministically from delivered kWh × current Fuel Rates + standing charges + SEG credits. The UCL Correction is no longer applied at runtime — it is folded into ML training labels (see [[epc-ml-transform]] and ADR-0007).
+The process that derives a Property's fuel split and annual bills from its space heating kWh and hot water kWh values plus the heating fuel deduced from SAP fields. kWh values themselves come from the EPC's recorded fields (`renewable_heat_incentive.space_heating_kwh` and `.water_heating_kwh`) for SAP10 baselines, or from ML prediction when Rebaselining fires or when scoring a post-measure state. Bills are computed deterministically from delivered kWh × current Fuel Rates + standing charges + SEG credits. The UCL Correction is no longer applied at runtime — it is folded into ML training labels (see [[epc-ml-transform]] and ADR-0007).
 _Avoid_: kWh prediction (kWh is now an ML target — see Rebaselining), baseline kWh, energy estimation
 
 **UCL Correction**:
diff --git a/docs/adr/0004-baseline-performance-lodged-effective-pair.md b/docs/adr/0004-baseline-performance-lodged-effective-pair.md
index 9cedcbc7..ba275473 100644
--- a/docs/adr/0004-baseline-performance-lodged-effective-pair.md
+++ b/docs/adr/0004-baseline-performance-lodged-effective-pair.md
@@ -8,6 +8,34 @@ The cost is a wider row + the discipline that **every** `BaselinePerformance` po
 
 ## Consequences
 
-- Schema migration: `property_details_epc` (or its successor) carries 8 fields instead of 4 for the SAP-equivalent block.
 - Reversing this means rewriting every consumer that has learned to read both values. Hard to roll back once the FE depends on the pair.
 - The rebaseline trigger has two reasons (`pre_sap10`, `physical_state_changed`, or `both`) — store the reason alongside so we know *why* a property was rebaselined when debugging.
+
+### Amendment (2026-05-30, #1135): standalone `baseline_performance` table
+
+The original consequence read *"`property_details_epc` (or its successor) carries 8 fields
+instead of 4 for the SAP-equivalent block"* — i.e. the pair as columns on the EPC-details table.
+That is superseded. `property_details_epc` is being **retired**: it is too tightly coupled to the
+schema of the legacy EPC API, which the Ara rebuild is moving off. So the pair has no home there.
+
+`BaselinePerformance` instead persists as its **own standalone `baseline_performance` table, one
+row per Property**, behind a dedicated `BaselineRepository` port (`save` / `get_for_property`),
+mirroring the EPC slice's repo shape. This is the cleaner model regardless of the retirement:
+`BaselinePerformance` is its own aggregate (a Property's current performance), not a detail of any
+single EPC.
+
+The row is **flat typed columns**, not a JSONB blob, because the FE both surfaces the block and
+queries the lodged-vs-effective pair: `lodged_{sap_score, epc_band, co2_emissions,
+primary_energy_intensity}`, the four `effective_*` mirrors, `rebaseline_reason`, and (for the part
+of the energy block that needs no derivation) `space_heating_kwh` / `water_heating_kwh`. The
+fourth paired quantity is **Primary Energy Intensity**, not "heat demand" — see CONTEXT.md
+(the prose above predates that term being sharpened).
+
+Fuel split and bills — the rest of the EPC Energy Derivation block — are **deferred to a
+follow-up**: bills require a current Fuel Rates source (Ofgem-cap ETL) that does not yet exist, and
+fuel split is produced by the same `EpcEnergyDerivationService`, so the two land together rather
+than churning the table twice.
+
+The SQLModel row is defined in `infrastructure/postgres/` so the ephemeral-Postgres tests build it
+via `create_all`; the production migration is FE-owned (Drizzle ORM) and tracked in
+`docs/migrations/`.
diff --git a/docs/migrations/baseline-performance-table.md b/docs/migrations/baseline-performance-table.md
new file mode 100644
index 00000000..24e06179
--- /dev/null
+++ b/docs/migrations/baseline-performance-table.md
@@ -0,0 +1,43 @@
+# `baseline_performance` table — FE-owned migration
+
+**Context:** Slice 6 (Hestia-Homes/Model#1135) of the `ara_first_run` rebuild. The
+`BaselineOrchestrator` establishes a Property's **Baseline Performance** (ADR-0004) and persists it
+via a new `BaselineRepository` port. This is a brand-new table — no predecessor.
+
+Per ADR-0004's amendment, the lodged/effective pair does **not** land on `property_details_epc`
+(which is being retired as too coupled to the legacy EPC-API schema). It lands here, as its own
+aggregate's table.
+
+The SQLModel row is defined in `infrastructure/postgres/` so the ephemeral-Postgres tests build it
+via `SQLModel.metadata.create_all`. The **production migration is FE-owned (Drizzle ORM)** — a
+straight lift-and-shift of the columns below.
+
+## `baseline_performance` — one row per Property
+
+| Column | Type | Notes |
+|---|---|---|
+| `id` | serial PK | |
+| `property_id` | int, FK → `property.id`, **unique** | one Baseline Performance per Property |
+| `lodged_sap_score` | int | Lodged Performance — gov register, off the Effective EPC |
+| `lodged_epc_band` | text | the `Epc` enum, stored as its string value (e.g. `"C"`) |
+| `lodged_co2_emissions` | float | |
+| `lodged_primary_energy_intensity` | int | PEUI (kWh/m²/yr); **not** "heat demand" — see CONTEXT.md |
+| `effective_sap_score` | int | Effective Performance — what modelling scored against |
+| `effective_epc_band` | text | |
+| `effective_co2_emissions` | float | |
+| `effective_primary_energy_intensity` | int | |
+| `rebaseline_reason` | text | `none` \| `pre_sap10` \| `physical_state_changed` \| `both` |
+| `space_heating_kwh` | float | off `renewable_heat_incentive`; deterministic (ADR-0006) |
+| `water_heating_kwh` | float | off `renewable_heat_incentive` |
+
+This slice has no ML rebaselining, so `effective_* == lodged_*` and `rebaseline_reason = 'none'`
+for every row written (a pre-SAP10 cert raises rather than persisting a wrong-but-plausible row —
+see #1135). The `effective_*` columns exist now so the table shape is stable when ML lands.
+
+## Deferred (follow-up — EPC Energy Derivation + Fuel Rates)
+
+`fuel_split` and `bills` are **not** in this table yet. They are produced by
+`EpcEnergyDerivationService`, which needs a current **Fuel Rates** source (Ofgem-cap ETL) that does
+not exist yet. They land together in the follow-up so this table is not migrated twice. Likely
+shape: a `bills`-style block (per-fuel kWh + standing charge + SEG) — to be specified in that
+slice's migration note.
+
diff --git a/domain/baseline/__init__.py b/domain/baseline/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/domain/baseline/baseline_performance.py b/domain/baseline/baseline_performance.py
new file mode 100644
index 00000000..8db6e05d
--- /dev/null
+++ b/domain/baseline/baseline_performance.py
@@ -0,0 +1,28 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from domain.baseline.performance import Performance
+from domain.baseline.rebaseliner import RebaselineReason
+
+
+@dataclass(frozen=True)
+class BaselinePerformance:
+    """A Property's current performance aggregate (CONTEXT.md, ADR-0004).
+
+    Holds both halves — ``lodged`` (what the gov register says) and
+    ``effective`` (what the modelling pipeline scored against) — plus the
+    ``rebaseline_reason`` recording *why* they differ (``"none"`` when equal).
+    Both halves are always populated, even when equal.
+
+    Carries the part of the energy block that needs no derivation: annual
+    ``space_heating_kwh`` / ``water_heating_kwh`` read off the EPC's RHI.
+    Fuel split and bills (the rest of EPC Energy Derivation) land in a
+    follow-up once a Fuel Rates source exists.
+    """
+
+    lodged: Performance
+    effective: Performance
+    rebaseline_reason: RebaselineReason
+    space_heating_kwh: float
+    water_heating_kwh: float
diff --git a/domain/baseline/performance.py b/domain/baseline/performance.py
new file mode 100644
index 00000000..1db38846
--- /dev/null
+++ b/domain/baseline/performance.py
@@ -0,0 +1,53 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Optional, TypeVar
+
+from datatypes.epc.domain.epc import Epc
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+
+_T = TypeVar("_T")
+
+
+@dataclass(frozen=True)
+class Performance:
+    """One half of a Baseline Performance — a single set of SAP10 figures.
+
+    The four quantities a Property is rated on (CONTEXT.md: Lodged / Effective
+    Performance): SAP score, EPC Band, carbon emissions, and Primary Energy
+    Intensity. Used for both the Lodged half (off the gov register) and the
+    Effective half (what the modelling pipeline scored against).
+    """
+
+    sap_score: int
+    epc_band: Epc
+    co2_emissions: float
+    primary_energy_intensity: int
+
+
+def _require(value: Optional[_T], field: str) -> _T:
+    if value is None:
+        raise ValueError(
+            f"EPC is missing recorded performance field {field!r}; "
+            "cannot establish Lodged Performance"
+        )
+    return value
+
+
+def lodged_performance(epc: EpcPropertyData) -> Performance:
+    """The Lodged Performance recorded on an EPC — what the gov register says.
+
+    Reads the four rated quantities straight off the EPC's recorded fields
+    (CONTEXT.md: Primary Energy Intensity is recorded as `energy_consumption_current`).
+    Unmodified by modelling.
+    """
+    return Performance(
+        sap_score=_require(epc.energy_rating_current, "energy_rating_current"),
+        epc_band=_require(
+            epc.current_energy_efficiency_band, "current_energy_efficiency_band"
+        ),
+        co2_emissions=_require(epc.co2_emissions_current, "co2_emissions_current"),
+        primary_energy_intensity=_require(
+            epc.energy_consumption_current, "energy_consumption_current"
+        ),
+    )
diff --git a/domain/baseline/rebaseliner.py b/domain/baseline/rebaseliner.py
new file mode 100644
index 00000000..40034a58
--- /dev/null
+++ b/domain/baseline/rebaseliner.py
@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Literal
+
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.baseline.performance import Performance
+
+RebaselineReason = Literal["none", "pre_sap10", "physical_state_changed", "both"]
+
+# The SAP spec version below which a cert's recorded scores reflect a superseded
+# methodology and must be ML-rebaselined (CONTEXT.md: Rebaselining).
+_SAP10_FLOOR = 10.0
+
+
+class RebaselineNotImplemented(Exception):
+    """A Property needs Rebaselining, but the ML adapter is not wired yet.
+
+    Raised rather than silently recording ``reason="none"`` for a property that
+    genuinely needs rebaselining — a plausible-but-wrong baseline is expensive to
+    discover downstream. Surfaces how much of a First Run cohort the pipeline can
+    handle today (#1135).
+    """
+
+
+class Rebaseliner(ABC):
+    """Produces a Property's Effective Performance from its Effective EPC.
+
+    Rebaselining (CONTEXT.md) re-predicts the rated quantities via ML when the
+    EPC was lodged pre-SAP10 or its physical state diverged from the lodged EPC;
+    otherwise Effective Performance equals Lodged. Injected into the
+    BaselineOrchestrator (ADR-0011) so the ML adapter can swap in without
+    touching the orchestrator, and so the single-property re-score-on-override
+    flow reuses the same port.
+    """
+
+    @abstractmethod
+    def rebaseline(
+        self, effective_epc: EpcPropertyData, lodged: Performance
+    ) -> tuple[Performance, RebaselineReason]: ...
+
+
+class StubRebaseliner(Rebaseliner):
+    """The no-ML stub for the validation phase.
+
+    SAP10 certs pass through untouched — Effective Performance equals Lodged,
+    reason ``"none"``. A pre-SAP10 cert genuinely needs ML rebaselining, which is
+    not implemented yet (#1135), so it raises rather than fabricating a "none".
+    """
+
+    def rebaseline(
+        self, effective_epc: EpcPropertyData, lodged: Performance
+    ) -> tuple[Performance, RebaselineReason]:
+        sap_version = effective_epc.sap_version
+        if sap_version is not None and sap_version < _SAP10_FLOOR:
+            raise RebaselineNotImplemented(
+                f"Property needs rebaselining (pre-SAP10 cert, sap_version="
+                f"{sap_version}); ML rebaselining is not implemented yet"
+            )
+        return lodged, "none"
diff --git a/infrastructure/postgres/baseline_performance_table.py b/infrastructure/postgres/baseline_performance_table.py
new file mode 100644
index 00000000..fad4be9d
--- /dev/null
+++ b/infrastructure/postgres/baseline_performance_table.py
@@ -0,0 +1,77 @@
+from __future__ import annotations
+
+from typing import ClassVar, Optional, cast
+
+from sqlmodel import Field, SQLModel
+
+from datatypes.epc.domain.epc import Epc
+from domain.baseline.baseline_performance import BaselinePerformance
+from domain.baseline.performance import Performance
+from domain.baseline.rebaseliner import RebaselineReason
+
+
+class BaselinePerformanceModel(SQLModel, table=True):
+    """The ``baseline_performance`` row — one per Property (ADR-0004).
+
+    Flat typed columns (not a JSONB blob) so the FE can both surface the block
+    and query the lodged-vs-effective pair. The production migration is FE-owned
+    (Drizzle); see docs/migrations/baseline-performance-table.md.
+    """
+
+    __tablename__: ClassVar[str] = "baseline_performance"  # pyright: ignore[reportIncompatibleVariableOverride]
+
+    id: Optional[int] = Field(default=None, primary_key=True)
+    property_id: int = Field(unique=True, index=True)
+
+    lodged_sap_score: int
+    lodged_epc_band: str
+    lodged_co2_emissions: float
+    lodged_primary_energy_intensity: int
+
+    effective_sap_score: int
+    effective_epc_band: str
+    effective_co2_emissions: float
+    effective_primary_energy_intensity: int
+
+    rebaseline_reason: str
+
+    space_heating_kwh: float
+    water_heating_kwh: float
+
+    @classmethod
+    def from_domain(
+        cls, baseline: BaselinePerformance, property_id: int
+    ) -> "BaselinePerformanceModel":
+        return cls(
+            property_id=property_id,
+            lodged_sap_score=baseline.lodged.sap_score,
+            lodged_epc_band=baseline.lodged.epc_band.value,
+            lodged_co2_emissions=baseline.lodged.co2_emissions,
+            lodged_primary_energy_intensity=baseline.lodged.primary_energy_intensity,
+            effective_sap_score=baseline.effective.sap_score,
+            effective_epc_band=baseline.effective.epc_band.value,
+            effective_co2_emissions=baseline.effective.co2_emissions,
+            effective_primary_energy_intensity=baseline.effective.primary_energy_intensity,
+            rebaseline_reason=baseline.rebaseline_reason,
+            space_heating_kwh=baseline.space_heating_kwh,
+            water_heating_kwh=baseline.water_heating_kwh,
+        )
+
+    def to_domain(self) -> BaselinePerformance:
+        return BaselinePerformance(
+            lodged=Performance(
+                sap_score=self.lodged_sap_score,
+                epc_band=Epc(self.lodged_epc_band),
+                co2_emissions=self.lodged_co2_emissions,
+                primary_energy_intensity=self.lodged_primary_energy_intensity,
+            ),
+            effective=Performance(
+                sap_score=self.effective_sap_score,
+                epc_band=Epc(self.effective_epc_band),
+                co2_emissions=self.effective_co2_emissions,
+                primary_energy_intensity=self.effective_primary_energy_intensity,
+            ),
+            rebaseline_reason=cast(RebaselineReason, self.rebaseline_reason),
+            space_heating_kwh=self.space_heating_kwh,
+            water_heating_kwh=self.water_heating_kwh,
+        )
diff --git a/orchestration/baseline_orchestrator.py b/orchestration/baseline_orchestrator.py
new file mode 100644
index 00000000..298e3683
--- /dev/null
+++ b/orchestration/baseline_orchestrator.py
@@ -0,0 +1,63 @@
+from __future__ import annotations
+
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    RenewableHeatIncentive,
+)
+from domain.baseline.baseline_performance import BaselinePerformance
+from domain.baseline.performance import lodged_performance
+from domain.baseline.rebaseliner import Rebaseliner
+from repositories.baseline.baseline_repository import BaselineRepository
+from repositories.property.property_repository import PropertyRepository
+
+
+class BaselineOrchestrator:
+    """Stage 2: establish each Property's Baseline Performance and persist it.
+
+    For each property: hydrate the Property aggregate via PropertyRepo, resolve
+    its Effective EPC, read Lodged Performance off it, run the Rebaseliner to
+    produce Effective Performance (equal to Lodged unless a trigger fires), and
+    persist the pair plus the deterministic kWh.
+
+    Reads only from repos — never a Fetcher or HTTP (ADR-0003). That is what
+    makes it byte-identical whether Ingestion ran milliseconds ago (First Run)
+    or last week (single-property review). The injected Rebaseliner is the
+    re-score-on-override seam: the future single-property flow re-runs the same
+    step after a Landlord Override changes the Effective EPC (ADR-0011).
+    """
+
+    def __init__(
+        self,
+        *,
+        property_repo: PropertyRepository,
+        rebaseliner: Rebaseliner,
+        baseline_repo: BaselineRepository,
+    ) -> None:
+        self._property_repo = property_repo
+        self._rebaseliner = rebaseliner
+        self._baseline_repo = baseline_repo
+
+    def run(self, property_ids: list[int]) -> None:
+        for property_id in property_ids:
+            effective_epc = self._property_repo.get(property_id).effective_epc
+            lodged = lodged_performance(effective_epc)
+            effective, reason = self._rebaseliner.rebaseline(effective_epc, lodged)
+            rhi = _require_rhi(effective_epc)
+            baseline = BaselinePerformance(
+                lodged=lodged,
+                effective=effective,
+                rebaseline_reason=reason,
+                space_heating_kwh=rhi.space_heating_kwh,
+                water_heating_kwh=rhi.water_heating_kwh,
+            )
+            self._baseline_repo.save(baseline, property_id)
+
+
+def _require_rhi(epc: EpcPropertyData) -> RenewableHeatIncentive:
+    rhi = epc.renewable_heat_incentive
+    if rhi is None:
+        raise ValueError(
+            "Effective EPC is missing renewable_heat_incentive; cannot read "
+            "baseline space-heating / hot-water kWh"
+        )
+    return rhi
diff --git a/repositories/baseline/__init__.py b/repositories/baseline/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/repositories/baseline/baseline_postgres_repository.py b/repositories/baseline/baseline_postgres_repository.py
new file mode 100644
index 00000000..5a2c7bb8
--- /dev/null
+++ b/repositories/baseline/baseline_postgres_repository.py
@@ -0,0 +1,36 @@
+from __future__ import annotations
+
+from typing import Optional
+
+from sqlmodel import Session, select
+
+from domain.baseline.baseline_performance import BaselinePerformance
+from infrastructure.postgres.baseline_performance_table import (
+    BaselinePerformanceModel,
+)
+from repositories.baseline.baseline_repository import BaselineRepository
+
+
+class BaselinePostgresRepository(BaselineRepository):
+    """Maps BaselinePerformance to/from the ``baseline_performance`` table."""
+
+    def __init__(self, session: Session) -> None:
+        self._session = session
+
+    def save(self, baseline: BaselinePerformance, property_id: int) -> int:
+        row = BaselinePerformanceModel.from_domain(baseline, property_id)
+        self._session.add(row)
+        self._session.flush()
+        if row.id is None:
+            raise ValueError("baseline_performance row did not receive an id")
+        return row.id
+
+    def get_for_property(
+        self, property_id: int
+    ) -> Optional[BaselinePerformance]:
+        row = self._session.exec(
+            select(BaselinePerformanceModel).where(
+                BaselinePerformanceModel.property_id == property_id
+            )
+        ).first()
+        return row.to_domain() if row is not None else None
diff --git a/repositories/baseline/baseline_repository.py b/repositories/baseline/baseline_repository.py
new file mode 100644
index 00000000..67e430f5
--- /dev/null
+++ b/repositories/baseline/baseline_repository.py
@@ -0,0 +1,23 @@
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Optional
+
+from domain.baseline.baseline_performance import BaselinePerformance
+
+
+class BaselineRepository(ABC):
+    """Persists and loads a Property's Baseline Performance.
+
+    One Baseline Performance per Property (ADR-0004: persisted as one row). The
+    Postgres adapter writes the standalone ``baseline_performance`` table — not
+    columns on the retiring ``property_details_epc``.
+    """
+
+    @abstractmethod
+    def save(self, baseline: BaselinePerformance, property_id: int) -> int: ...
+
+    @abstractmethod
+    def get_for_property(
+        self, property_id: int
+    ) -> Optional[BaselinePerformance]: ...
diff --git a/tests/domain/baseline/__init__.py b/tests/domain/baseline/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/domain/baseline/test_performance.py b/tests/domain/baseline/test_performance.py
new file mode 100644
index 00000000..6e8f080e
--- /dev/null
+++ b/tests/domain/baseline/test_performance.py
@@ -0,0 +1,34 @@
+from __future__ import annotations
+
+from datatypes.epc.domain.epc import Epc
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.baseline.performance import Performance, lodged_performance
+
+
+def _epc_with_recorded_performance(
+    *, sap: int, band: Epc, co2: float, peui: int
+) -> EpcPropertyData:
+    # A bare instance with only the recorded-performance fields the reader
+    # touches — mirrors the opaque-EPC idiom used in the ingestion tests.
+    epc = object.__new__(EpcPropertyData)
+    epc.energy_rating_current = sap
+    epc.current_energy_efficiency_band = band
+    epc.co2_emissions_current = co2
+    epc.energy_consumption_current = peui
+    return epc
+
+
+def test_lodged_performance_reads_the_four_recorded_quantities_off_the_epc() -> None:
+    # Arrange
+    epc = _epc_with_recorded_performance(sap=72, band=Epc.C, co2=1.8, peui=180)
+
+    # Act
+    performance = lodged_performance(epc)
+
+    # Assert
+    assert performance == Performance(
+        sap_score=72,
+        epc_band=Epc.C,
+        co2_emissions=1.8,
+        primary_energy_intensity=180,
+    )
diff --git a/tests/domain/baseline/test_rebaseliner.py b/tests/domain/baseline/test_rebaseliner.py
new file mode 100644
index 00000000..f4ceee70
--- /dev/null
+++ b/tests/domain/baseline/test_rebaseliner.py
@@ -0,0 +1,48 @@
+from __future__ import annotations
+
+from typing import Optional
+
+import pytest
+
+from datatypes.epc.domain.epc import Epc
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from domain.baseline.performance import Performance
+from domain.baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner
+
+
+def _epc(*, sap_version: Optional[float]) -> EpcPropertyData:
+    epc = object.__new__(EpcPropertyData)
+    epc.sap_version = sap_version
+    return epc
+
+
+def _lodged() -> Performance:
+    return Performance(
+        sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180
+    )
+
+
+def test_sap10_epc_is_not_rebaselined_so_effective_equals_lodged() -> None:
+    # Arrange — a SAP 10.2 cert: no rebaselining trigger fires.
+    epc = _epc(sap_version=10.2)
+    lodged = _lodged()
+    rebaseliner = StubRebaseliner()
+
+    # Act
+    effective, reason = rebaseliner.rebaseline(epc, lodged)
+
+    # Assert — Effective Performance equals Lodged, reason "none".
+    assert effective == lodged
+    assert reason == "none"
+
+
+def test_pre_sap10_epc_raises_because_rebaselining_is_not_implemented() -> None:
+    # Arrange — a cert lodged under a pre-SAP10 schema genuinely needs ML
+    # rebaselining, which does not exist yet; the stub must not fabricate a
+    # "none" answer for it.
+    epc = _epc(sap_version=9.94)
+    rebaseliner = StubRebaseliner()
+
+    # Act / Assert
+    with pytest.raises(RebaselineNotImplemented):
+        rebaseliner.rebaseline(epc, _lodged())
diff --git a/tests/orchestration/test_baseline_orchestrator.py b/tests/orchestration/test_baseline_orchestrator.py
new file mode 100644
index 00000000..3958b9b4
--- /dev/null
+++ b/tests/orchestration/test_baseline_orchestrator.py
@@ -0,0 +1,110 @@
+from __future__ import annotations
+
+from typing import Optional
+
+import pytest
+
+from datatypes.epc.domain.epc import Epc
+from datatypes.epc.domain.epc_property_data import (
+    EpcPropertyData,
+    RenewableHeatIncentive,
+)
+from domain.baseline.baseline_performance import BaselinePerformance
+from domain.baseline.performance import Performance
+from domain.baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner
+from domain.property.property import Property, PropertyIdentity
+from orchestration.baseline_orchestrator import BaselineOrchestrator
+from repositories.baseline.baseline_repository import BaselineRepository
+from repositories.property.property_repository import PropertyRepository
+
+
+class _FakePropertyRepo(PropertyRepository):
+    def __init__(self, by_id: dict[int, Property]) -> None:
+        self._by_id = by_id
+
+    def get(self, property_id: int) -> Property:
+        return self._by_id[property_id]
+
+
+class _FakeBaselineRepo(BaselineRepository):
+    def __init__(self) -> None:
+        self.saved: list[tuple[BaselinePerformance, int]] = []
+
+    def save(self, baseline: BaselinePerformance, property_id: int) -> int:
+        self.saved.append((baseline, property_id))
+        return len(self.saved)
+
+    def get_for_property(
+        self, property_id: int
+    ) -> Optional[BaselinePerformance]:  # pragma: no cover
+        raise NotImplementedError
+
+
+def _property(*, sap_version: float) -> Property:
+    epc = object.__new__(EpcPropertyData)
+    epc.energy_rating_current = 72
+    epc.current_energy_efficiency_band = Epc.C
+    epc.co2_emissions_current = 1.8
+    epc.energy_consumption_current = 180
+    epc.sap_version = sap_version
+    epc.renewable_heat_incentive = RenewableHeatIncentive(
+        space_heating_kwh=5000.0, water_heating_kwh=2000.0
+    )
+    return Property(
+        identity=PropertyIdentity(
+            portfolio_id=1, postcode="A0 0AA", address="1 Some Street", uprn=123
+        ),
+        epc=epc,
+    )
+
+
+def _sap10_property() -> Property:
+    return _property(sap_version=10.2)
+
+
+def test_run_establishes_and_persists_baseline_performance() -> None:
+    # Arrange
+    property_repo = _FakePropertyRepo({10: _sap10_property()})
+    baseline_repo = _FakeBaselineRepo()
+    orchestrator = BaselineOrchestrator(
+        property_repo=property_repo,
+        rebaseliner=StubRebaseliner(),
+        baseline_repo=baseline_repo,
+    )
+
+    # Act
+    orchestrator.run([10])
+
+    # Assert — one Baseline Performance persisted for property 10, both halves
+    # equal (no rebaselining), kWh read off the RHI.
+    lodged = Performance(
+        sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180
+    )
+    assert baseline_repo.saved == [
+        (
+            BaselinePerformance(
+                lodged=lodged,
+                effective=lodged,
+                rebaseline_reason="none",
+                space_heating_kwh=5000.0,
+                water_heating_kwh=2000.0,
+            ),
+            10,
+        )
+    ]
+
+
+def test_run_raises_on_a_pre_sap10_property_and_persists_nothing() -> None:
+    # Arrange — a pre-SAP10 cert needs ML rebaselining, which is not wired yet.
+    property_repo = _FakePropertyRepo({10: _property(sap_version=9.94)})
+    baseline_repo = _FakeBaselineRepo()
+    orchestrator = BaselineOrchestrator(
+        property_repo=property_repo,
+        rebaseliner=StubRebaseliner(),
+        baseline_repo=baseline_repo,
+    )
+
+    # Act / Assert — the raise propagates; no half-baked baseline is written.
+    with pytest.raises(RebaselineNotImplemented):
+        orchestrator.run([10])
+    assert baseline_repo.saved == []
diff --git a/tests/repositories/baseline/__init__.py b/tests/repositories/baseline/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/repositories/baseline/test_baseline_postgres_repository.py b/tests/repositories/baseline/test_baseline_postgres_repository.py
new file mode 100644
index 00000000..eaa20003
--- /dev/null
+++ b/tests/repositories/baseline/test_baseline_postgres_repository.py
@@ -0,0 +1,53 @@
+from __future__ import annotations
+
+from sqlalchemy import Engine
+from sqlmodel import Session
+
+from datatypes.epc.domain.epc import Epc
+from domain.baseline.baseline_performance import BaselinePerformance
+from domain.baseline.performance import Performance
+from repositories.baseline.baseline_postgres_repository import (
+    BaselinePostgresRepository,
+)
+
+
+def _baseline() -> BaselinePerformance:
+    lodged = Performance(
+        sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180
+    )
+    # A rebaselined property — distinct halves so the round-trip proves both are
+    # persisted independently (not collapsed to one set).
+    effective = Performance(
+        sap_score=64, epc_band=Epc.D, co2_emissions=2.4, primary_energy_intensity=210
+    )
+    return BaselinePerformance(
+        lodged=lodged,
+        effective=effective,
+        rebaseline_reason="pre_sap10",
+        space_heating_kwh=5000.0,
+        water_heating_kwh=2000.0,
+    )
+
+
+def test_baseline_performance_round_trips(db_engine: Engine) -> None:
+    # Arrange
+    baseline = _baseline()
+    with Session(db_engine) as session:
+        BaselinePostgresRepository(session).save(baseline, property_id=10)
+        session.commit()
+
+    # Act
+    with Session(db_engine) as session:
+        loaded = BaselinePostgresRepository(session).get_for_property(10)
+
+    # Assert — the full aggregate reconstructs, both halves intact.
+    assert loaded == baseline
+
+
+def test_get_for_property_returns_none_when_absent(db_engine: Engine) -> None:
+    # Arrange / Act
+    with Session(db_engine) as session:
+        loaded = BaselinePostgresRepository(session).get_for_property(999)
+
+    # Assert
+    assert loaded is None

From b77fe26892ea260ee1d0008a805c932504b2fcfa Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sat, 30 May 2026 22:32:58 +0000
Subject: [PATCH 10/18] =?UTF-8?q?feat(first-run):=20FirstRunPipeline=20E2E?=
 =?UTF-8?q?=20=E2=80=94=20Ingestion=20=E2=86=92=20Baseline=20=E2=86=92=20M?=
 =?UTF-8?q?odelling=20(#1136)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Completes the First Run spine. Replaces the #1130 stub FirstRunPipeline with the real three-stage composition and wires it into the handler.

- `FirstRunPipeline.run(command)` sequences Ingestion → Baseline → Modelling, threading **only** `property_ids` between stages (and `scenario_ids` into Modelling, off the command — never a prior stage's output). Stages are injected behind thin `IngestionStage` / `BaselineStage` / `ModellingStage` Protocols (the EpcFetcher/SolarFetcher idiom), so the handler owns wiring and tests substitute fakes (ADR-0011).
- `ModellingOrchestrator` stub + `ScenarioRepository` / `MaterialsRepository` seam ports — `run(property_ids, scenario_ids)` reads through repos, does no scoring yet. Method shapes deferred to the Modelling per-service grills (Scenario / Scenario Phase / Snapshot / Optimised Package / Plans are rich — not pre-empted here).
- Handler delegates to the real pipeline via `build_first_run_pipeline` (Postgres-backed repos off the session). The Ingestion source clients (EPC API / Google Solar / geospatial S3) are isolated behind one `_source_clients_from_env` seam that raises until the deploy/Terraform config settles — out of scope for this slice. Subtask complete/failed + CloudWatch URL still come from `@subtask_handler`.

Integration test (the criterion's centrepiece): wires REAL Ingestion + REAL Baseline + stub Modelling through a shared fake EPC repo, with a repo-backed PropertyRepo composing the Property from that slice. Proves Baseline reads the very EPC Ingestion persisted — the through-repos hand-off, no in-memory coupling. Plus a composition test pinning stage order + only-property_ids threading.

TDD, one test → one impl. pyright strict clean; AAA layout. 116 pass in the tests/ tree, no regressions.
Co-Authored-By: Claude Opus 4.8
---
 applications/ara_first_run/handler.py        |  95 ++++++++-
 orchestration/first_run_pipeline.py          |  48 ++++-
 orchestration/modelling_orchestrator.py      |  29 +++
 repositories/materials/__init__.py           |   0
 .../materials/materials_repository.py        |  13 ++
 repositories/scenario/__init__.py            |   0
 repositories/scenario/scenario_repository.py |  14 ++
 .../orchestration/test_first_run_pipeline.py |  49 ++++-
 .../test_first_run_pipeline_integration.py   | 183 ++++++++++++++++++
 9 files changed, 413 insertions(+), 18 deletions(-)
 create mode 100644 orchestration/modelling_orchestrator.py
 create mode 100644 repositories/materials/__init__.py
 create mode 100644 repositories/materials/materials_repository.py
 create mode 100644 repositories/scenario/__init__.py
 create mode 100644 repositories/scenario/scenario_repository.py
 create mode 100644 tests/orchestration/test_first_run_pipeline_integration.py

diff --git a/applications/ara_first_run/handler.py b/applications/ara_first_run/handler.py
index b944227b..c0df86a9 100644
--- a/applications/ara_first_run/handler.py
+++ b/applications/ara_first_run/handler.py
@@ -1,12 +1,36 @@
 from __future__ import annotations
 
+import os
 from typing import Any, Protocol
 
+from sqlmodel import Session
+
 from applications.ara_first_run.ara_first_run_trigger_body import (
     AraFirstRunTriggerBody,
 )
+from domain.baseline.rebaseliner import StubRebaseliner
+from infrastructure.postgres.config import PostgresConfig
+from infrastructure.postgres.engine import make_engine
+from orchestration.baseline_orchestrator import BaselineOrchestrator
 from orchestration.first_run_pipeline import FirstRunPipeline
+from orchestration.ingestion_orchestrator import (
+    EpcFetcher,
+    IngestionOrchestrator,
+    SolarFetcher,
+)
+from orchestration.modelling_orchestrator import ModellingOrchestrator
 from orchestration.task_orchestrator import TaskOrchestrator
+from repositories.baseline.baseline_postgres_repository import (
+    BaselinePostgresRepository,
+)
+from repositories.epc.epc_postgres_repository import EpcPostgresRepository
+from repositories.geospatial.geospatial_repository import GeospatialRepository
+from repositories.materials.materials_repository import MaterialsRepository
+from repositories.property.property_postgres_repository import (
+    PropertyPostgresRepository,
+)
+from repositories.scenario.scenario_repository import ScenarioRepository
+from repositories.solar.solar_postgres_repository import SolarPostgresRepository
 from utilities.aws_lambda.subtask_handler import subtask_handler
 
 
@@ -19,16 +43,79 @@ class _RunsFirstRun(Protocol):
 def dispatch_first_run(body: dict[str, Any], *, pipeline: _RunsFirstRun) -> None:
     """Validate the raw event body and hand the command to the pipeline.
 
-    The handler's entire job — kept as a named seam so it is exercised without
-    the Lambda runtime. No business logic lives here: validate, then delegate
-    (issue #1130).
+    The handler's entire decision logic — kept as a named seam so it is
+    exercised without the Lambda runtime. No business logic lives here: validate,
+    then delegate (issue #1130/#1136).
     """
     trigger = AraFirstRunTriggerBody.model_validate(body)
     pipeline.run(trigger)
 
 
+def build_first_run_pipeline(
+    *,
+    session: Session,
+    epc_fetcher: EpcFetcher,
+    geospatial_repo: GeospatialRepository,
+    solar_fetcher: SolarFetcher,
+) -> FirstRunPipeline:
+    """Compose the real three-stage pipeline over Postgres-backed repos.
+
+    The stages share the session's repos and hand off only ``property_ids``
+    through them (ADR-0011). The source clients are passed in rather than built
+    here because their config is not settled — see ``_source_clients_from_env``.
+    Modelling is stubbed (#1136); its Scenario / Materials ports are seams.
+    """
+    epc_repo = EpcPostgresRepository(session)
+    property_repo = PropertyPostgresRepository(session, epc_repo)
+    solar_repo = SolarPostgresRepository(session)
+    baseline_repo = BaselinePostgresRepository(session)
+    return FirstRunPipeline(
+        ingestion=IngestionOrchestrator(
+            property_repo=property_repo,
+            epc_fetcher=epc_fetcher,
+            geospatial_repo=geospatial_repo,
+            solar_fetcher=solar_fetcher,
+            epc_repo=epc_repo,
+            solar_repo=solar_repo,
+        ),
+        baseline=BaselineOrchestrator(
+            property_repo=property_repo,
+            rebaseliner=StubRebaseliner(),
+            baseline_repo=baseline_repo,
+        ),
+        modelling=ModellingOrchestrator(
+            scenario_repo=ScenarioRepository(),
+            materials_repo=MaterialsRepository(),
+        ),
+    )
+
+
+def _source_clients_from_env() -> tuple[EpcFetcher, GeospatialRepository, SolarFetcher]:
+    """The Ingestion source clients — EPC API, Google Solar, geospatial S3.
+
+    TODO(deploy): their config (EPC auth token, Google Solar API key, geospatial
+    S3 parquet reader), env-var names, and the pandas/s3fs runtime deps are not
+    settled — that wiring is a separate Terraform piece, out of scope for #1136.
+    Raises until then so the lambda fails loudly rather than half-running.
+    """
+    raise NotImplementedError(
+        "ara_first_run source-client wiring (EPC / Google Solar / geospatial) "
+        "is pending the deploy/Terraform piece; see #1136."
+    )
+
+
 @subtask_handler()
 def handler(
     body: dict[str, Any], context: Any, task_orchestrator: TaskOrchestrator
 ) -> None:
-    dispatch_first_run(body, pipeline=FirstRunPipeline())
+    engine = make_engine(PostgresConfig.from_env(dict(os.environ)))
+    epc_fetcher, geospatial_repo, solar_fetcher = _source_clients_from_env()
+    with Session(engine) as session:
+        pipeline = build_first_run_pipeline(
+            session=session,
+            epc_fetcher=epc_fetcher,
+            geospatial_repo=geospatial_repo,
+            solar_fetcher=solar_fetcher,
+        )
+        dispatch_first_run(body, pipeline=pipeline)
+        session.commit()
diff --git a/orchestration/first_run_pipeline.py b/orchestration/first_run_pipeline.py
index 1fd8839b..3d642d9e 100644
--- a/orchestration/first_run_pipeline.py
+++ b/orchestration/first_run_pipeline.py
@@ -23,14 +23,48 @@ class FirstRunCommand(Protocol):
     def scenario_ids(self) -> list[int]: ...
 
 
-class FirstRunPipeline:
-    """Composes the First Run stages end-to-end (Ingestion -> Baseline ->
-    Modelling), threading only ``property_ids`` between them through repos
-    (ADR-0011).
+class IngestionStage(Protocol):
+    """Stage 1 — acquires and persists each Property's external source data."""
 
-    Stub at this stage (#1130): ``run`` simply receives the validated command.
-    The real three-stage composition lands in #1136.
+    def run(self, property_ids: list[int]) -> None: ...
+
+
+class BaselineStage(Protocol):
+    """Stage 2 — establishes each Property's Baseline Performance."""
+
+    def run(self, property_ids: list[int]) -> None: ...
+
+
+class ModellingStage(Protocol):
+    """Stage 3 — scores each Property against its Scenarios into Plans."""
+
+    def run(self, property_ids: list[int], scenario_ids: list[int]) -> None: ...
+
+
+class FirstRunPipeline:
+    """Composes the First Run stages end-to-end: Ingestion -> Baseline ->
+    Modelling.
+
+    Threads **only** ``property_ids`` between stages (and ``scenario_ids`` into
+    Modelling, off the command — not a prior stage). The stages communicate
+    through repos, never via in-memory hand-off, which is what makes each stage
+    independently runnable for the single-property review flow (ADR-0011,
+    ADR-0003). Stage orchestrators are injected so the handler owns wiring and
+    tests substitute fakes.
     """
 
+    def __init__(
+        self,
+        *,
+        ingestion: IngestionStage,
+        baseline: BaselineStage,
+        modelling: ModellingStage,
+    ) -> None:
+        self._ingestion = ingestion
+        self._baseline = baseline
+        self._modelling = modelling
+
     def run(self, command: FirstRunCommand) -> None:
-        return None
+        self._ingestion.run(command.property_ids)
+        self._baseline.run(command.property_ids)
+        self._modelling.run(command.property_ids, command.scenario_ids)
diff --git a/orchestration/modelling_orchestrator.py b/orchestration/modelling_orchestrator.py
new file mode 100644
index 00000000..48f70b19
--- /dev/null
+++ b/orchestration/modelling_orchestrator.py
@@ -0,0 +1,29 @@
+from __future__ import annotations
+
+from repositories.materials.materials_repository import MaterialsRepository
+from repositories.scenario.scenario_repository import ScenarioRepository
+
+
+class ModellingOrchestrator:
+    """Stage 3 — scores each baselined Property against its Scenarios, producing
+    Recommendations -> an Optimised Package per Scenario Phase -> Plans
+    (CONTEXT.md: Modelling).
+
+    Stub at this stage (#1136): ``run`` reads its inputs through repos (it takes
+    only ``property_ids`` + ``scenario_ids``, never an in-memory hand-off from
+    Baseline) but does no scoring yet. Full Modelling lands via later TDD slices
+    + per-service grills. The Scenario / Materials repos are injected now so the
+    composition and wiring are real even while the body is empty.
+ """ + + def __init__( + self, + *, + scenario_repo: ScenarioRepository, + materials_repo: MaterialsRepository, + ) -> None: + self._scenario_repo = scenario_repo + self._materials_repo = materials_repo + + def run(self, property_ids: list[int], scenario_ids: list[int]) -> None: + return None diff --git a/repositories/materials/__init__.py b/repositories/materials/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/materials/materials_repository.py b/repositories/materials/materials_repository.py new file mode 100644 index 00000000..5d94f166 --- /dev/null +++ b/repositories/materials/materials_repository.py @@ -0,0 +1,13 @@ +from __future__ import annotations + +from abc import ABC + + +class MaterialsRepository(ABC): + """Loads the retrofit Materials catalogue the Modelling stage draws measures + and costs from. + + Seam only at this stage (#1136): the method shape is deferred to the + Modelling per-service grill. Declared now so the pipeline can be composed + end-to-end with Modelling stubbed. + """ diff --git a/repositories/scenario/__init__.py b/repositories/scenario/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/repositories/scenario/scenario_repository.py b/repositories/scenario/scenario_repository.py new file mode 100644 index 00000000..f560db14 --- /dev/null +++ b/repositories/scenario/scenario_repository.py @@ -0,0 +1,14 @@ +from __future__ import annotations + +from abc import ABC + + +class ScenarioRepository(ABC): + """Loads the Scenarios (and Scenario Snapshots) the Modelling stage scores + a Property against. + + Seam only at this stage (#1136): the method shape is deferred to the + Modelling per-service grill, where Scenario / Scenario Phase / Scenario + Snapshot are designed (CONTEXT.md). Declared now so the pipeline can be + composed end-to-end with Modelling stubbed. 
+ """ diff --git a/tests/orchestration/test_first_run_pipeline.py b/tests/orchestration/test_first_run_pipeline.py index 4b685bb2..705282ee 100644 --- a/tests/orchestration/test_first_run_pipeline.py +++ b/tests/orchestration/test_first_run_pipeline.py @@ -14,16 +14,51 @@ class _FakeCommand: scenario_ids: list[int] -def test_run_accepts_the_validated_command() -> None: +class _SpyIngestion: + def __init__(self, log: list[tuple[object, ...]]) -> None: + self._log = log + + def run(self, property_ids: list[int]) -> None: + self._log.append(("ingestion", property_ids)) + + +class _SpyBaseline: + def __init__(self, log: list[tuple[object, ...]]) -> None: + self._log = log + + def run(self, property_ids: list[int]) -> None: + self._log.append(("baseline", property_ids)) + + +class _SpyModelling: + def __init__(self, log: list[tuple[object, ...]]) -> None: + self._log = log + + def run(self, property_ids: list[int], scenario_ids: list[int]) -> None: + self._log.append(("modelling", property_ids, scenario_ids)) + + +def test_run_sequences_the_three_stages_threading_only_property_ids() -> None: # Arrange + log: list[tuple[object, ...]] = [] command: FirstRunCommand = _FakeCommand( - portfolio_id=42, property_ids=[101, 102], scenario_ids=[7] + portfolio_id=1, property_ids=[10, 11], scenario_ids=[7] + ) + pipeline = FirstRunPipeline( + ingestion=_SpyIngestion(log), + baseline=_SpyBaseline(log), + modelling=_SpyModelling(log), ) - pipeline = FirstRunPipeline() # Act - result = pipeline.run(command) + pipeline.run(command) - # Assert — the stub simply receives the command; full Ingestion -> Baseline - # -> Modelling composition lands in #1136. - assert result is None + # Assert — Ingestion -> Baseline -> Modelling, in order. Ingestion and + # Baseline receive only property_ids; Modelling additionally gets the + # scenario_ids (off the command, not a prior stage). Nothing else is + # threaded between stages — they communicate through repos (ADR-0011). 
+ assert log == [ + ("ingestion", [10, 11]), + ("baseline", [10, 11]), + ("modelling", [10, 11], [7]), + ] diff --git a/tests/orchestration/test_first_run_pipeline_integration.py b/tests/orchestration/test_first_run_pipeline_integration.py new file mode 100644 index 00000000..55ca34ed --- /dev/null +++ b/tests/orchestration/test_first_run_pipeline_integration.py @@ -0,0 +1,183 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any, Optional + +from datatypes.epc.domain.epc import Epc +from datatypes.epc.domain.epc_property_data import ( + EpcPropertyData, + RenewableHeatIncentive, +) +from domain.baseline.rebaseliner import StubRebaseliner +from domain.geospatial.coordinates import Coordinates +from domain.property.property import Property, PropertyIdentity +from orchestration.baseline_orchestrator import BaselineOrchestrator +from orchestration.first_run_pipeline import FirstRunPipeline +from orchestration.ingestion_orchestrator import IngestionOrchestrator +from orchestration.modelling_orchestrator import ModellingOrchestrator +from repositories.baseline.baseline_repository import BaselineRepository +from repositories.epc.epc_repository import EpcRepository +from repositories.geospatial.geospatial_repository import GeospatialRepository +from repositories.materials.materials_repository import MaterialsRepository +from repositories.property.property_repository import PropertyRepository +from repositories.scenario.scenario_repository import ScenarioRepository +from repositories.solar.solar_repository import SolarRepository +from domain.baseline.baseline_performance import BaselinePerformance + + +@dataclass +class _FakeCommand: + portfolio_id: int + property_ids: list[int] + scenario_ids: list[int] + + +class _SharedEpcRepo(EpcRepository): + """Stands in for the persisted EPC slice both stages talk through.""" + + def __init__(self) -> None: + self._by_property: dict[int, EpcPropertyData] = {} + + def save( + self, + 
data: EpcPropertyData, + property_id: Optional[int] = None, + portfolio_id: Optional[int] = None, + ) -> int: + assert property_id is not None + self._by_property[property_id] = data + return property_id + + def get(self, epc_property_id: int) -> EpcPropertyData: # pragma: no cover + raise NotImplementedError + + def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: + return self._by_property.get(property_id) + + +class _RepoBackedPropertyRepo(PropertyRepository): + """Composes the Property from its identity row + the EPC slice in the shared + EPC repo — mirroring PropertyPostgresRepository, so the stages genuinely + hand off through repo state, not in memory.""" + + def __init__( + self, identities: dict[int, PropertyIdentity], epc_repo: _SharedEpcRepo + ) -> None: + self._identities = identities + self._epc_repo = epc_repo + + def get(self, property_id: int) -> Property: + return Property( + identity=self._identities[property_id], + epc=self._epc_repo.get_for_property(property_id), + ) + + +class _FakeEpcFetcher: + def __init__(self, epc: EpcPropertyData) -> None: + self._epc = epc + + def get_by_uprn(self, uprn: int) -> Optional[EpcPropertyData]: + return self._epc + + +class _NoCoordinatesGeospatialRepo(GeospatialRepository): + def coordinates_for(self, uprn: int) -> Optional[Coordinates]: + return None # skip the solar leg — not under test here + + +class _FakeSolarFetcher: + def get_building_insights( + self, longitude: float, latitude: float + ) -> dict[str, Any]: # pragma: no cover + return {} + + +class _FakeSolarRepo(SolarRepository): + def save(self, property_id: int, insights: dict[str, Any]) -> None: # pragma: no cover + return None + + def get(self, property_id: int) -> Optional[dict[str, Any]]: # pragma: no cover + raise NotImplementedError + + +class _CollectingBaselineRepo(BaselineRepository): + def __init__(self) -> None: + self.saved: list[tuple[BaselinePerformance, int]] = [] + + def save(self, baseline: BaselinePerformance, 
property_id: int) -> int: + self.saved.append((baseline, property_id)) + return len(self.saved) + + def get_for_property( + self, property_id: int + ) -> Optional[BaselinePerformance]: # pragma: no cover + raise NotImplementedError + + +class _FakeScenarioRepo(ScenarioRepository): + pass + + +class _FakeMaterialsRepo(MaterialsRepository): + pass + + +def _ingestible_epc() -> EpcPropertyData: + epc = object.__new__(EpcPropertyData) + epc.energy_rating_current = 72 + epc.current_energy_efficiency_band = Epc.C + epc.co2_emissions_current = 1.8 + epc.energy_consumption_current = 180 + epc.sap_version = 10.2 + epc.renewable_heat_incentive = RenewableHeatIncentive( + space_heating_kwh=5000.0, water_heating_kwh=2000.0 + ) + return epc + + +def test_baseline_reads_the_epc_ingestion_persisted_through_repos() -> None: + # Arrange — one property; the EPC the fetcher returns is what Ingestion + # persists and Baseline must then read back through the shared repo. + epc = _ingestible_epc() + epc_repo = _SharedEpcRepo() + identities = { + 10: PropertyIdentity( + portfolio_id=1, postcode="A0 0AA", address="1 Some Street", uprn=123 + ) + } + property_repo = _RepoBackedPropertyRepo(identities, epc_repo) + baseline_repo = _CollectingBaselineRepo() + + pipeline = FirstRunPipeline( + ingestion=IngestionOrchestrator( + property_repo=property_repo, + epc_fetcher=_FakeEpcFetcher(epc), + geospatial_repo=_NoCoordinatesGeospatialRepo(), + solar_fetcher=_FakeSolarFetcher(), + epc_repo=epc_repo, + solar_repo=_FakeSolarRepo(), + ), + baseline=BaselineOrchestrator( + property_repo=property_repo, + rebaseliner=StubRebaseliner(), + baseline_repo=baseline_repo, + ), + modelling=ModellingOrchestrator( + scenario_repo=_FakeScenarioRepo(), + materials_repo=_FakeMaterialsRepo(), + ), + ) + + # Act + pipeline.run(_FakeCommand(portfolio_id=1, property_ids=[10], scenario_ids=[7])) + + # Assert — a Baseline Performance landed for property 10, its Lodged half + # read off the very EPC Ingestion persisted. 
Only property_ids crossed the + # stage boundary; the EPC itself travelled through the repo. + assert len(baseline_repo.saved) == 1 + baseline, property_id = baseline_repo.saved[0] + assert property_id == 10 + assert baseline.lodged.sap_score == 72 + assert baseline.lodged.epc_band == Epc.C + assert baseline.space_heating_kwh == 5000.0 From 4daba1f7c57520739dde6306c319a9bb44a92185 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sun, 31 May 2026 09:25:17 +0000 Subject: [PATCH 11/18] feat(uow): UnitOfWork port + PostgresUnitOfWork adapter (#1138) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First slice of the per-stage batch-transaction refactor (ADR-0012). A UnitOfWork is the single transaction a stage runs its batch in: a context manager exposing the DB repos bound to one session, committing once on `commit()` and rolling back on exception or exit-without-commit (all-or-nothing per batch, fail noisily). - `UnitOfWork` (port): `property` / `epc` / `solar` / `baseline` repos + `commit()` / `rollback()`; `__exit__` rolls back uncommitted work. - `PostgresUnitOfWork(session_factory)`: opens a Session from an injected factory (a module-scoped engine + sessionmaker in prod, so the pool is reused across warm invocations), binds the Postgres repos to it, closes on exit. Not yet wired into any orchestrator — that lands in the Baseline / Ingestion refactor slices. 3 tests against ephemeral PG (commit durable across units; exception rolls back; no-commit persists nothing). pyright strict clean; AAA. 
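
The commit/rollback contract described in this message can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter added by this patch: `InMemoryUnitOfWork`, its `save` method and the `store` dict are hypothetical stand-ins for `PostgresUnitOfWork`, its repos and the database.

```python
from __future__ import annotations

from types import TracebackType
from typing import Optional


class InMemoryUnitOfWork:
    """Hypothetical stand-in: stages writes and publishes them only on commit()."""

    def __init__(self, store: dict[int, str]) -> None:
        self._store = store  # the "durable" state shared by successive units
        self._staged: dict[int, str] = {}

    def __enter__(self) -> "InMemoryUnitOfWork":
        self._staged = {}
        return self

    def __exit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Discard whatever was not committed (a no-op after a commit).
        # Returning None lets any exception propagate to the caller.
        self.rollback()

    def save(self, property_id: int, value: str) -> None:
        self._staged[property_id] = value

    def commit(self) -> None:
        self._store.update(self._staged)
        self._staged = {}

    def rollback(self) -> None:
        self._staged = {}


store: dict[int, str] = {}

with InMemoryUnitOfWork(store) as uow:
    uow.save(10, "baseline")
    uow.commit()  # durable: visible to later units

with InMemoryUnitOfWork(store) as uow:
    uow.save(11, "never committed")  # discarded on exit

try:
    with InMemoryUnitOfWork(store) as uow:
        uow.save(12, "aborted")
        raise RuntimeError("boom")  # discarded, and the error still surfaces
except RuntimeError:
    pass
```

The three blocks mirror the three tests listed above: committed work survives, while work left uncommitted (with or without an exception) does not.
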
Co-Authored-By: Claude Opus 4.8 --- repositories/postgres_unit_of_work.py | 56 +++++++++++++++++++ repositories/unit_of_work.py | 47 ++++++++++++++++ tests/repositories/test_unit_of_work.py | 73 +++++++++++++++++++++++++ 3 files changed, 176 insertions(+) create mode 100644 repositories/postgres_unit_of_work.py create mode 100644 repositories/unit_of_work.py create mode 100644 tests/repositories/test_unit_of_work.py diff --git a/repositories/postgres_unit_of_work.py b/repositories/postgres_unit_of_work.py new file mode 100644 index 00000000..bd5957e9 --- /dev/null +++ b/repositories/postgres_unit_of_work.py @@ -0,0 +1,56 @@ +from __future__ import annotations + +from collections.abc import Callable +from types import TracebackType +from typing import Optional + +from sqlmodel import Session + +from repositories.baseline.baseline_postgres_repository import ( + BaselinePostgresRepository, +) +from repositories.epc.epc_postgres_repository import EpcPostgresRepository +from repositories.property.property_postgres_repository import ( + PropertyPostgresRepository, +) +from repositories.solar.solar_postgres_repository import SolarPostgresRepository +from repositories.unit_of_work import UnitOfWork + + +class PostgresUnitOfWork(UnitOfWork): + """Postgres-backed Unit of Work: one ``Session``, all repos bound to it. + + Built from a session factory (a module-scoped engine + sessionmaker in + production, ADR-0012) so the connection pool is reused across warm Lambda + invocations. The session is opened on ``__enter__`` and closed on + ``__exit__``; a fresh instance is one single-use unit. 
+ """ + + def __init__(self, session_factory: Callable[[], Session]) -> None: + self._session_factory = session_factory + + def __enter__(self) -> "PostgresUnitOfWork": + self._session = self._session_factory() + epc_repo = EpcPostgresRepository(self._session) + self.property = PropertyPostgresRepository(self._session, epc_repo) + self.epc = epc_repo + self.solar = SolarPostgresRepository(self._session) + self.baseline = BaselinePostgresRepository(self._session) + return self + + def __exit__( + self, + exc_type: Optional[type[BaseException]], + exc: Optional[BaseException], + tb: Optional[TracebackType], + ) -> None: + try: + self._session.rollback() + finally: + self._session.close() + + def commit(self) -> None: + self._session.commit() + + def rollback(self) -> None: + self._session.rollback() diff --git a/repositories/unit_of_work.py b/repositories/unit_of_work.py new file mode 100644 index 00000000..af5b77f2 --- /dev/null +++ b/repositories/unit_of_work.py @@ -0,0 +1,47 @@ +from __future__ import annotations + +from abc import ABC, abstractmethod +from types import TracebackType +from typing import Optional + +from repositories.baseline.baseline_repository import BaselineRepository +from repositories.epc.epc_repository import EpcRepository +from repositories.property.property_repository import PropertyRepository +from repositories.solar.solar_repository import SolarRepository + + +class UnitOfWork(ABC): + """A single batch transaction across the DB-backed repos (ADR-0012). + + A context manager that exposes the repos bound to one session. A stage runs + its whole batch inside one unit and calls ``commit()`` once; leaving the + block without committing — including via an exception — rolls back, so a + failed batch persists nothing and the subtask fails noisily. + + The non-DB dependencies (EPC/Solar fetchers, the geospatial S3 repo, the + Rebaseliner) are *not* part of the unit — only transactional DB work is. 
+ """ + + property: PropertyRepository + epc: EpcRepository + solar: SolarRepository + baseline: BaselineRepository + + @abstractmethod + def commit(self) -> None: ... + + @abstractmethod + def rollback(self) -> None: ... + + def __enter__(self) -> "UnitOfWork": + return self + + def __exit__( + self, + exc_type: Optional[type[BaseException]], + exc: Optional[BaseException], + tb: Optional[TracebackType], + ) -> None: + # Roll back whatever was not explicitly committed (a no-op after a + # successful commit). All-or-nothing per batch. + self.rollback() diff --git a/tests/repositories/test_unit_of_work.py b/tests/repositories/test_unit_of_work.py new file mode 100644 index 00000000..2851edaf --- /dev/null +++ b/tests/repositories/test_unit_of_work.py @@ -0,0 +1,73 @@ +from __future__ import annotations + +from collections.abc import Callable + +import pytest +from sqlalchemy import Engine +from sqlmodel import Session + +from datatypes.epc.domain.epc import Epc +from domain.baseline.baseline_performance import BaselinePerformance +from domain.baseline.performance import Performance +from repositories.postgres_unit_of_work import PostgresUnitOfWork + + +def _session_factory(db_engine: Engine) -> Callable[[], Session]: + return lambda: Session(db_engine) + + +def _baseline() -> BaselinePerformance: + perf = Performance( + sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180 + ) + return BaselinePerformance( + lodged=perf, + effective=perf, + rebaseline_reason="none", + space_heating_kwh=5000.0, + water_heating_kwh=2000.0, + ) + + +def test_committed_work_is_visible_to_a_later_unit(db_engine: Engine) -> None: + # Arrange + new_unit = lambda: PostgresUnitOfWork(_session_factory(db_engine)) + baseline = _baseline() + + # Act + with new_unit() as uow: + uow.baseline.save(baseline, property_id=10) + uow.commit() + + # Assert — a fresh unit reads back what the first one committed. 
+ with new_unit() as uow: + loaded = uow.baseline.get_for_property(10) + assert loaded == baseline + + +def test_an_exception_in_the_block_rolls_the_batch_back(db_engine: Engine) -> None: + # Arrange + new_unit = lambda: PostgresUnitOfWork(_session_factory(db_engine)) + + # Act — a property mid-batch raises after a write but before commit. + with pytest.raises(RuntimeError, match="boom"): + with new_unit() as uow: + uow.baseline.save(_baseline(), property_id=10) + raise RuntimeError("boom") + + # Assert — nothing from the aborted batch is persisted. + with new_unit() as uow: + assert uow.baseline.get_for_property(10) is None + + +def test_leaving_the_block_without_commit_persists_nothing(db_engine: Engine) -> None: + # Arrange + new_unit = lambda: PostgresUnitOfWork(_session_factory(db_engine)) + + # Act — write but never commit. + with new_unit() as uow: + uow.baseline.save(_baseline(), property_id=10) + + # Assert + with new_unit() as uow: + assert uow.baseline.get_for_property(10) is None From 559ae1b4ecb7e56e179d3c6bc1e5f322d3f13fdc Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sun, 31 May 2026 09:41:39 +0000 Subject: [PATCH 12/18] feat(repos): idempotent EPC + Baseline writes (replace by property_id) (#1138) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re-runs of a First Run batch re-save a property's data; that must replace, not duplicate (ADR-0012 idempotent batch writes). - `EpcPostgresRepository.save` deletes the property's existing EPC graph (parent + all child tables, floor-dims via their building parts) before inserting, when a `property_id` is given. Anonymous saves still insert. - `BaselinePostgresRepository.save` deletes the existing row for the `property_id` before inserting — no more unique-constraint violation on re-save; also what the re-score-on-override path needs. - Solar already upserts, so it's unchanged. 
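
The replace-by-`property_id` behaviour in the bullets above can be sketched with the standard-library `sqlite3` module. This is an illustrative assumption only: the table layout and the `save_baseline` helper are hypothetical, and the repositories in this patch issue the equivalent delete through SQLModel against Postgres.

```python
import sqlite3


def save_baseline(conn: sqlite3.Connection, property_id: int, sap_score: int) -> None:
    """Replace the property's row: delete any existing one, then insert."""
    conn.execute("DELETE FROM baseline WHERE property_id = ?", (property_id,))
    conn.execute(
        "INSERT INTO baseline (property_id, sap_score) VALUES (?, ?)",
        (property_id, sap_score),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baseline ("
    "property_id INTEGER NOT NULL UNIQUE, "
    "sap_score INTEGER NOT NULL)"
)

save_baseline(conn, 10, 72)  # first run: the delete matches nothing
save_baseline(conn, 10, 80)  # re-run: without the delete this would violate UNIQUE
conn.commit()

rows = conn.execute("SELECT property_id, sap_score FROM baseline").fetchall()
```

Because the delete matches nothing on a first save, the same code path serves both the initial run and any re-run.
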
The #1129 round-trip fidelity test stays green (delete-first is a no-op on a first save). 2 new tests (re-save replaces, not duplicates). pyright strict clean; AAA. Co-Authored-By: Claude Opus 4.8 --- .../baseline/baseline_postgres_repository.py | 9 +++- repositories/epc/epc_postgres_repository.py | 52 ++++++++++++++++++- .../test_baseline_postgres_repository.py | 38 ++++++++++++++ .../epc/test_epc_idempotent_save.py | 52 +++++++++++++++++++ 4 files changed, 149 insertions(+), 2 deletions(-) create mode 100644 tests/repositories/epc/test_epc_idempotent_save.py diff --git a/repositories/baseline/baseline_postgres_repository.py b/repositories/baseline/baseline_postgres_repository.py index 5a2c7bb8..7a5b5807 100644 --- a/repositories/baseline/baseline_postgres_repository.py +++ b/repositories/baseline/baseline_postgres_repository.py @@ -2,7 +2,7 @@ from __future__ import annotations from typing import Optional -from sqlmodel import Session, select +from sqlmodel import Session, col, delete, select from domain.baseline.baseline_performance import BaselinePerformance from infrastructure.postgres.baseline_performance_table import ( @@ -18,6 +18,13 @@ class BaselinePostgresRepository(BaselineRepository): self._session = session def save(self, baseline: BaselinePerformance, property_id: int) -> int: + # Idempotent on property_id: a re-run (or re-score) replaces the row + # rather than hitting the unique constraint (ADR-0012). 
+ self._session.exec( # type: ignore[call-overload] + delete(BaselinePerformanceModel).where( + col(BaselinePerformanceModel.property_id) == property_id + ) + ) row = BaselinePerformanceModel.from_domain(baseline, property_id) self._session.add(row) self._session.flush() diff --git a/repositories/epc/epc_postgres_repository.py b/repositories/epc/epc_postgres_repository.py index b0a8070c..b1368916 100644 --- a/repositories/epc/epc_postgres_repository.py +++ b/repositories/epc/epc_postgres_repository.py @@ -3,7 +3,7 @@ from __future__ import annotations from datetime import date from typing import Optional, TypeVar -from sqlmodel import Session, select +from sqlmodel import Session, col, delete, select from datatypes.epc.domain.epc import Epc from datatypes.epc.domain.epc_property_data import ( @@ -74,6 +74,11 @@ class EpcPostgresRepository(EpcRepository): property_id: Optional[int] = None, portfolio_id: Optional[int] = None, ) -> int: + # Idempotent on property_id: a re-run replaces the property's EPC graph + # rather than duplicating it (ADR-0012). Anonymous saves (no property_id) + # always insert. 
+ if property_id is not None: + self._delete_for_property(property_id) parent = EpcPropertyModel.from_epc_property_data( data, property_id=property_id, portfolio_id=portfolio_id ) @@ -134,6 +139,51 @@ class EpcPostgresRepository(EpcRepository): ) return epc_property_id + def _delete_for_property(self, property_id: int) -> None: + """Remove the property's existing EPC graph (parent + child tables) so a + re-save replaces rather than duplicates (ADR-0012).""" + epc_ids = [ + i + for i in self._session.exec( + select(EpcPropertyModel.id).where( + EpcPropertyModel.property_id == property_id + ) + ).all() + if i is not None + ] + if not epc_ids: + return + part_ids = [ + i + for i in self._session.exec( + select(EpcBuildingPartModel.id).where( + col(EpcBuildingPartModel.epc_property_id).in_(epc_ids) + ) + ).all() + if i is not None + ] + if part_ids: + self._session.exec( # type: ignore[call-overload] + delete(EpcFloorDimensionModel).where( + col(EpcFloorDimensionModel.epc_building_part_id).in_(part_ids) + ) + ) + for child in ( + EpcPropertyEnergyPerformanceModel, + EpcEnergyElementModel, + EpcMainHeatingDetailModel, + EpcBuildingPartModel, + EpcWindowModel, + EpcFlatDetailsModel, + EpcRenewableHeatIncentiveModel, + ): + self._session.exec( # type: ignore[call-overload] + delete(child).where(col(child.epc_property_id).in_(epc_ids)) + ) + self._session.exec( # type: ignore[call-overload] + delete(EpcPropertyModel).where(col(EpcPropertyModel.id).in_(epc_ids)) + ) + def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: row = self._session.exec( select(EpcPropertyModel) diff --git a/tests/repositories/baseline/test_baseline_postgres_repository.py b/tests/repositories/baseline/test_baseline_postgres_repository.py index eaa20003..df1da9e8 100644 --- a/tests/repositories/baseline/test_baseline_postgres_repository.py +++ b/tests/repositories/baseline/test_baseline_postgres_repository.py @@ -44,6 +44,44 @@ def 
test_baseline_performance_round_trips(db_engine: Engine) -> None: assert loaded == baseline +def test_resaving_baseline_for_a_property_replaces_rather_than_duplicating( + db_engine: Engine, +) -> None: + # Arrange — a re-run re-establishes the same property's baseline with a + # different rating. + first = _baseline() + rerun = BaselinePerformance( + lodged=Performance( + sap_score=80, + epc_band=Epc.B, + co2_emissions=1.2, + primary_energy_intensity=150, + ), + effective=Performance( + sap_score=80, + epc_band=Epc.B, + co2_emissions=1.2, + primary_energy_intensity=150, + ), + rebaseline_reason="none", + space_heating_kwh=4000.0, + water_heating_kwh=1800.0, + ) + + # Act — save twice for the same property_id (must not hit the unique + # constraint, must overwrite). + with Session(db_engine) as session: + repo = BaselinePostgresRepository(session) + repo.save(first, property_id=10) + repo.save(rerun, property_id=10) + session.commit() + + # Assert + with Session(db_engine) as session: + loaded = BaselinePostgresRepository(session).get_for_property(10) + assert loaded == rerun + + def test_get_for_property_returns_none_when_absent(db_engine: Engine) -> None: # Arrange / Act with Session(db_engine) as session: diff --git a/tests/repositories/epc/test_epc_idempotent_save.py b/tests/repositories/epc/test_epc_idempotent_save.py new file mode 100644 index 00000000..9d36ea48 --- /dev/null +++ b/tests/repositories/epc/test_epc_idempotent_save.py @@ -0,0 +1,52 @@ +"""A re-run of First Run re-saves a property's EPC; that must replace the prior +row, not duplicate it (ADR-0012 idempotent batch writes, #1138).""" + +from __future__ import annotations + +import dataclasses +import json +from pathlib import Path +from typing import Any + +from sqlalchemy import Engine +from sqlmodel import Session, select + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from datatypes.epc.domain.mapper import EpcPropertyDataMapper +from 
infrastructure.postgres.epc_property_table import EpcPropertyModel +from repositories.epc.epc_postgres_repository import EpcPostgresRepository + +_JSON_SAMPLES = Path(__file__).resolve().parents[3] / "backend/epc_api/json_samples" + + +def _load_epc() -> EpcPropertyData: + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text() + ) + return EpcPropertyDataMapper.from_api_response(raw) + + +def test_resaving_an_epc_for_a_property_replaces_rather_than_duplicates( + db_engine: Engine, +) -> None: + # Arrange — same property re-ingested with a changed field. + original = _load_epc() + updated = dataclasses.replace(original, status="re-run-sentinel") + + # Act — save twice for the same property_id (a re-run). + with Session(db_engine) as session: + repo = EpcPostgresRepository(session) + repo.save(original, property_id=10) + repo.save(updated, property_id=10) + session.commit() + + # Assert — exactly one EPC row for the property, holding the latest data. + with Session(db_engine) as session: + rows = session.exec( + select(EpcPropertyModel).where(EpcPropertyModel.property_id == 10) + ).all() + reloaded = EpcPostgresRepository(session).get_for_property(10) + + assert len(rows) == 1 + assert reloaded is not None + assert reloaded.status == "re-run-sentinel" From 48a488d1e97edc6c2704e2d3b95ce6bb9cdf40eb Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sun, 31 May 2026 09:54:47 +0000 Subject: [PATCH 13/18] refactor(orchestration): wire stages onto the UnitOfWork; per-stage commit (#1138) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the handler's whole-pipeline Session (one transaction across all three stages, connection pinned during Ingestion's external IO) with a Unit-of-Work per stage (ADR-0012, added here). Each stage runs its batch in one unit and commits once; any property raising aborts the batch and the subtask fails noisily. 
- BaselineOrchestrator(unit_of_work, rebaseliner): one unit for the batch,
  commit once. Raise on a pre-SAP10 property leaves the unit uncommitted.
- IngestionOrchestrator(unit_of_work, epc_fetcher, geospatial_repo,
  solar_fetcher): fetch/write split. Phase 1 fetches the whole batch (EPC /
  coords / solar) with NO unit open; phase 2 writes in one unit and commits.
  The connection is never held during external IO. Geospatial S3 repo stays
  injected (reference data, not transactional).
- Handler: module-scoped engine (pool reused across warm invocations) + a UoW
  factory; whole-pipeline `with Session` gone. `build_first_run_pipeline`
  composes on the factory. Source clients still behind the raising seam.
- ADR-0012 records the decision (per-stage boundary, all-or-nothing batch,
  idempotent re-run, fetch/write split, module-scoped engine).

Modelling stub left untouched (no-op, no DB) per the ADR.

Tests: orchestrators on a shared FakeUnitOfWork (assert persisted batch +
exactly-once commit + no-commit-on-raise). New real-DB E2E integration test:
real PostgresUnitOfWork, Ingestion writes the EPC → Baseline reads it back
through the repo → re-run replaces, not duplicates (1 EPC row, 1 baseline row
after two runs). 121 pass in tests/; pyright strict clean; AAA.

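
The fetch/write split described in this message can be sketched as below. It is a rough illustration under assumptions rather than the orchestrator from this patch: `RecordingUnitOfWork`, `save_epc`, `ingest` and `fake_fetch` are hypothetical names, and the real stage also handles coordinates and solar data.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


class RecordingUnitOfWork:
    """Hypothetical fake unit: counts open units and stages writes."""

    open_units = 0  # how many units are open right now (class-wide)

    def __init__(self, committed: dict[int, str]) -> None:
        self._committed = committed
        self._staged: dict[int, str] = {}

    def __enter__(self) -> "RecordingUnitOfWork":
        RecordingUnitOfWork.open_units += 1
        return self

    def __exit__(self, *exc_info: object) -> None:
        RecordingUnitOfWork.open_units -= 1
        self._staged = {}  # anything uncommitted is discarded

    def save_epc(self, property_id: int, epc: str) -> None:
        self._staged[property_id] = epc

    def commit(self) -> None:
        self._committed.update(self._staged)
        self._staged = {}


def ingest(
    property_ids: Iterable[int],
    fetch_epc: Callable[[int], str],
    unit_of_work: Callable[[], RecordingUnitOfWork],
) -> None:
    # Phase 1: all external IO, with no unit (and so no connection) open.
    fetched = {pid: fetch_epc(pid) for pid in property_ids}
    # Phase 2: one unit for the whole batch, one commit.
    with unit_of_work() as uow:
        for pid, epc in fetched.items():
            uow.save_epc(pid, epc)
        uow.commit()


committed: dict[int, str] = {}
units_open_during_fetch: list[int] = []


def fake_fetch(property_id: int) -> str:
    # Record how many units were open while the "external" call ran.
    units_open_during_fetch.append(RecordingUnitOfWork.open_units)
    return f"epc-{property_id}"


ingest([10, 11], fake_fetch, lambda: RecordingUnitOfWork(committed))
```

Keeping phase 1 outside the `with` block is what stops a slow source call from pinning a pooled connection. If a fetch raises, the unit is never opened, so nothing is written.
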
Co-Authored-By: Claude Opus 4.8 --- applications/ara_first_run/handler.py | 72 +++--- ...nit-of-work-per-stage-batch-transaction.md | 31 +++ orchestration/baseline_orchestrator.py | 59 ++--- orchestration/ingestion_orchestrator.py | 91 +++++--- tests/orchestration/fakes.py | 110 +++++++++ .../test_baseline_orchestrator.py | 70 ++---- .../test_first_run_pipeline_integration.py | 214 +++++++----------- .../test_ingestion_orchestrator.py | 122 ++++------ 8 files changed, 423 insertions(+), 346 deletions(-) create mode 100644 docs/adr/0012-unit-of-work-per-stage-batch-transaction.md create mode 100644 tests/orchestration/fakes.py diff --git a/applications/ara_first_run/handler.py b/applications/ara_first_run/handler.py index c0df86a9..f9cb6be7 100644 --- a/applications/ara_first_run/handler.py +++ b/applications/ara_first_run/handler.py @@ -1,8 +1,10 @@ from __future__ import annotations import os -from typing import Any, Protocol +from collections.abc import Callable +from typing import Any, Optional, Protocol +from sqlalchemy import Engine from sqlmodel import Session from applications.ara_first_run.ara_first_run_trigger_body import ( @@ -20,19 +22,24 @@ from orchestration.ingestion_orchestrator import ( ) from orchestration.modelling_orchestrator import ModellingOrchestrator from orchestration.task_orchestrator import TaskOrchestrator -from repositories.baseline.baseline_postgres_repository import ( - BaselinePostgresRepository, -) -from repositories.epc.epc_postgres_repository import EpcPostgresRepository from repositories.geospatial.geospatial_repository import GeospatialRepository from repositories.materials.materials_repository import MaterialsRepository -from repositories.property.property_postgres_repository import ( - PropertyPostgresRepository, -) +from repositories.postgres_unit_of_work import PostgresUnitOfWork from repositories.scenario.scenario_repository import ScenarioRepository -from repositories.solar.solar_postgres_repository import 
SolarPostgresRepository +from repositories.unit_of_work import UnitOfWork from utilities.aws_lambda.subtask_handler import subtask_handler +# Module-scoped so the connection pool is reused across warm Lambda invocations +# rather than rebuilt per invocation (ADR-0012). +_engine: Optional[Engine] = None + + +def _get_engine() -> Engine: + global _engine + if _engine is None: + _engine = make_engine(PostgresConfig.from_env(dict(os.environ))) + return _engine + class _RunsFirstRun(Protocol): """The slice of FirstRunPipeline the handler delegates to.""" @@ -44,8 +51,7 @@ def dispatch_first_run(body: dict[str, Any], *, pipeline: _RunsFirstRun) -> None """Validate the raw event body and hand the command to the pipeline. The handler's entire decision logic — kept as a named seam so it is - exercised without the Lambda runtime. No business logic lives here: validate, - then delegate (issue #1130/#1136). + exercised without the Lambda runtime. No business logic: validate, delegate. """ trigger = AraFirstRunTriggerBody.model_validate(body) pipeline.run(trigger) @@ -53,35 +59,28 @@ def dispatch_first_run(body: dict[str, Any], *, pipeline: _RunsFirstRun) -> None def build_first_run_pipeline( *, - session: Session, + unit_of_work: Callable[[], UnitOfWork], epc_fetcher: EpcFetcher, geospatial_repo: GeospatialRepository, solar_fetcher: SolarFetcher, ) -> FirstRunPipeline: - """Compose the real three-stage pipeline over Postgres-backed repos. + """Compose the real three-stage pipeline on a Unit-of-Work factory. - The stages share the session's repos and hand off only ``property_ids`` - through them (ADR-0011). The source clients are passed in rather than built - here because their config is not settled — see ``_source_clients_from_env``. - Modelling is stubbed (#1136); its Scenario / Materials ports are seams. + Each stage opens its own unit(s) and commits per batch (ADR-0012); the + handler no longer holds a session. 
The source clients are passed in because + their config is not settled — see ``_source_clients_from_env``. Modelling is + stubbed (#1136); its Scenario / Materials ports are seams. """ - epc_repo = EpcPostgresRepository(session) - property_repo = PropertyPostgresRepository(session, epc_repo) - solar_repo = SolarPostgresRepository(session) - baseline_repo = BaselinePostgresRepository(session) return FirstRunPipeline( ingestion=IngestionOrchestrator( - property_repo=property_repo, + unit_of_work=unit_of_work, epc_fetcher=epc_fetcher, geospatial_repo=geospatial_repo, solar_fetcher=solar_fetcher, - epc_repo=epc_repo, - solar_repo=solar_repo, ), baseline=BaselineOrchestrator( - property_repo=property_repo, + unit_of_work=unit_of_work, rebaseliner=StubRebaseliner(), - baseline_repo=baseline_repo, ), modelling=ModellingOrchestrator( scenario_repo=ScenarioRepository(), @@ -108,14 +107,15 @@ def _source_clients_from_env() -> tuple[EpcFetcher, GeospatialRepository, SolarF def handler( body: dict[str, Any], context: Any, task_orchestrator: TaskOrchestrator ) -> None: - engine = make_engine(PostgresConfig.from_env(dict(os.environ))) + engine = _get_engine() + unit_of_work: Callable[[], UnitOfWork] = lambda: PostgresUnitOfWork( + lambda: Session(engine) + ) epc_fetcher, geospatial_repo, solar_fetcher = _source_clients_from_env() - with Session(engine) as session: - pipeline = build_first_run_pipeline( - session=session, - epc_fetcher=epc_fetcher, - geospatial_repo=geospatial_repo, - solar_fetcher=solar_fetcher, - ) - dispatch_first_run(body, pipeline=pipeline) - session.commit() + pipeline = build_first_run_pipeline( + unit_of_work=unit_of_work, + epc_fetcher=epc_fetcher, + geospatial_repo=geospatial_repo, + solar_fetcher=solar_fetcher, + ) + dispatch_first_run(body, pipeline=pipeline) diff --git a/docs/adr/0012-unit-of-work-per-stage-batch-transaction.md b/docs/adr/0012-unit-of-work-per-stage-batch-transaction.md new file mode 100644 index 00000000..c31e6e7c --- /dev/null +++ 
b/docs/adr/0012-unit-of-work-per-stage-batch-transaction.md
@@ -0,0 +1,31 @@
+# Each stage commits its batch once, through a Unit of Work
+
+**Status: Accepted.** Refines [ADR-0011](0011-composable-stage-orchestrators.md) (composable stage orchestrators, stages communicate through repos) with the persistence/transaction mechanics for batch processing. Decided in a `/grill-with-docs` session (2026-05-31) after the First Run spine (#1136) landed, prompted by reviewing the handler's session lifecycle.
+
+## Context
+
+A First Run trigger carries a **batch** of ~30 `property_ids`. The pipeline runs that batch through Ingestion → Baseline → Modelling. The first cut (#1136) wrapped **all three stages in one `Session` and one final `commit()`** in the handler. That has three problems:
+
+1. **A connection is pinned for the whole long-running pipeline.** SQLAlchemy checks out a pooled connection on the first statement and holds it until commit. Ingestion is the only IO-heavy stage (per property: EPC HTTP, Google-Solar HTTP, geospatial S3), so the connection sits checked-out-but-idle across all that external IO — the RDS-Proxy/pgbouncer "transaction-pinned connection" anti-pattern.
+2. **One giant transaction** for the batch: long-held locks, identity-map growth, all-or-nothing across stages.
+3. **Cross-stage hand-off through an *uncommitted* transaction.** Baseline reads Ingestion's writes only because they share one open transaction — which contradicts ADR-0011/0003's "stages hand off through *persisted* state." If a stage ever moves to its own lambda, this breaks.
+
+A tempting fix — commit per property — is **rejected**: per-property commits are a commit storm that has overloaded the database before. The unit of commit must be the **batch**, not the property.
+
+## Decision
+
+- **Transaction boundary = one stage = one Unit of Work = one commit.** A batch yields ~3 commits (Ingestion, Baseline, Modelling), never N. No per-property commits.
+- **All-or-nothing per batch, fail noisily.** Any property failing aborts that stage's unit (rollback); the exception propagates so `@subtask_handler` marks the subtask FAILED on the task table. Operators debug and re-run the batch. There is no per-property partial success.
+- **Re-runs are idempotent.** Because stages commit independently, a re-run after a mid-pipeline failure re-executes already-committed earlier stages. So each stage's batch write **replaces** the rows for the batch's `property_ids` (delete-for-these-ids then bulk insert, or upsert) inside its unit. This is also what the future re-score-on-override path needs (re-baselining overwrites, never duplicates).
+- **Bulk reads, load-whole (ADR-0002).** Repos expose `get_many(property_ids) -> Properties` returning fully-hydrated aggregates, implemented as one IN-filtered query per table composed in memory — a handful of round-trips per batch, not 30 × tables. No lean stage-specific read path.
+- **Ingestion splits fetch from write.** Phase 1 fetches the whole batch (EPC / coordinates / solar) over HTTP/S3 with **no DB unit open**; phase 2 opens a unit and writes the batch, committing once. The connection is therefore held only for the short batch write, never across external IO. This sharpens the Fetcher-vs-Repo taxonomy of ADR-0011: Fetchers do IO outside any unit; Repos do DB inside the committed unit.
+- **Mechanism: a `UnitOfWork`.** A `UnitOfWork` port + a `PostgresUnitOfWork` adapter (built on a module-scoped engine + sessionmaker) owns the session and constructs the DB-backed repos on it (`uow.property`, `uow.epc`, `uow.solar`, `uow.baseline`). It commits on explicit `commit()` and rolls back on any exception. Orchestrators take a `unit_of_work` factory plus their **non-DB** dependencies, injected separately: the EPC/Solar fetchers, the geospatial **S3** repo (reference data — read outside the transaction), and the Rebaseliner. Baseline uses one unit for the batch; Ingestion uses two (read uprns → fetch outside any unit → write batch).
+
+## Consequences
+
+- The orchestrators' dependency shape changes from "individual session-bound repos" to "a `unit_of_work` factory + non-DB deps". The #1134 Ingestion and #1135 Baseline orchestrators are refactored accordingly; `FirstRunPipeline` is unchanged (it still composes the three stages and threads only `property_ids`).
+- Hard to reverse once every stage depends on the UoW — hence this ADR.
+- Atomicity is **stage-level**, not per-property; correctness of the re-run workflow depends on the idempotent batch writes above.
+- The engine + sessionmaker move to module scope so the pool is reused across warm Lambda invocations, rather than rebuilt per invocation (the existing `default_orchestrator` has the same per-invocation smell and should follow).
+- EPC writes span child tables, so the idempotent "replace for these `property_ids`" must delete child rows too (cascade) before re-insert.
+- The Modelling stub is left untouched this slice — its `run` is a no-op that touches no DB, so giving it a `unit_of_work` now would be an unused dependency. It takes a unit when its scoring body is built (the per-service Modelling grills).
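
A minimal sketch of the mechanics ADR-0012 describes, for illustration only. It is not the `UnitOfWork` or `PostgresUnitOfWork` from this patch series (their modules are not shown here): `InMemoryUnitOfWork`, `replace` and `ingest_batch` are hypothetical names, and a plain dict stands in for the database. It shows three of the decisions together: external fetches happen with no unit open, the batch is written in one unit with one explicit commit, and writes are keyed by `property_id` so a re-run replaces rather than duplicates.

```python
from __future__ import annotations

from collections.abc import Callable
from types import TracebackType
from typing import Optional


class InMemoryUnitOfWork:
    """Hypothetical stand-in: stages writes and publishes them only on commit()."""

    def __init__(self, store: dict[int, str]) -> None:
        self._store = store  # plays the role of committed database state
        self._pending: dict[int, str] = {}

    def __enter__(self) -> InMemoryUnitOfWork:
        return self

    def __exit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Anything not explicitly committed is discarded (the "rollback").
        # Returning None lets any exception propagate to the caller.
        self._pending.clear()

    def replace(self, property_id: int, payload: str) -> None:
        # Keyed by property_id, so a second write overwrites the first.
        self._pending[property_id] = payload

    def commit(self) -> None:
        self._store.update(self._pending)
        self._pending.clear()


def ingest_batch(
    property_ids: list[int],
    unit_of_work: Callable[[], InMemoryUnitOfWork],
    fetch: Callable[[int], str],
) -> None:
    # Phase 1: external IO for the whole batch, with no unit open.
    fetched = [(pid, fetch(pid)) for pid in property_ids]
    # Phase 2: one short unit, one commit for the whole batch.
    with unit_of_work() as uow:
        for pid, payload in fetched:
            uow.replace(pid, payload)
        uow.commit()


store: dict[int, str] = {}
factory = lambda: InMemoryUnitOfWork(store)
ingest_batch([10, 11], factory, lambda pid: f"epc-{pid}")
ingest_batch([10, 11], factory, lambda pid: f"epc-{pid}")  # re-run replaces
```

After both runs `store` holds one entry per property. A unit that exits on an exception before `commit()` leaves `store` unchanged, which is the all-or-nothing behaviour the ADR asks of each stage.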
diff --git a/orchestration/baseline_orchestrator.py b/orchestration/baseline_orchestrator.py index 298e3683..4ae3a480 100644 --- a/orchestration/baseline_orchestrator.py +++ b/orchestration/baseline_orchestrator.py @@ -1,5 +1,7 @@ from __future__ import annotations +from collections.abc import Callable + from datatypes.epc.domain.epc_property_data import ( EpcPropertyData, RenewableHeatIncentive, @@ -7,50 +9,51 @@ from datatypes.epc.domain.epc_property_data import ( from domain.baseline.baseline_performance import BaselinePerformance from domain.baseline.performance import lodged_performance from domain.baseline.rebaseliner import Rebaseliner -from repositories.baseline.baseline_repository import BaselineRepository -from repositories.property.property_repository import PropertyRepository +from repositories.unit_of_work import UnitOfWork class BaselineOrchestrator: """Stage 2: establish each Property's Baseline Performance and persist it. - For each property: hydrate the Property aggregate via PropertyRepo, resolve - its Effective EPC, read Lodged Performance off it, run the Rebaseliner to - produce Effective Performance (equal to Lodged unless a trigger fires), and - persist the pair plus the deterministic kWh. + Runs the whole batch in **one** Unit of Work and commits once (ADR-0012): + for each property it hydrates the Property via the unit's PropertyRepo, + resolves the Effective EPC, reads Lodged Performance off it, runs the + Rebaseliner to produce Effective Performance, and persists the pair plus the + deterministic kWh. Any property raising aborts the batch — the unit is left + uncommitted, so nothing persists and the subtask fails noisily. - Reads only from repos — never a Fetcher or HTTP (ADR-0003). That is what - makes it byte-identical whether Ingestion ran milliseconds ago (First Run) - or last week (single-property review). 
The injected Rebaseliner is the - re-score-on-override seam: the future single-property flow re-runs the same - step after a Landlord Override changes the Effective EPC (ADR-0011). + Reads only from repos — never a Fetcher or HTTP (ADR-0003) — so it is + byte-identical whether Ingestion ran milliseconds ago (First Run) or last + week. The injected Rebaseliner is the re-score-on-override seam (ADR-0011). """ def __init__( self, *, - property_repo: PropertyRepository, + unit_of_work: Callable[[], UnitOfWork], rebaseliner: Rebaseliner, - baseline_repo: BaselineRepository, ) -> None: - self._property_repo = property_repo + self._unit_of_work = unit_of_work self._rebaseliner = rebaseliner - self._baseline_repo = baseline_repo def run(self, property_ids: list[int]) -> None: - for property_id in property_ids: - effective_epc = self._property_repo.get(property_id).effective_epc - lodged = lodged_performance(effective_epc) - effective, reason = self._rebaseliner.rebaseline(effective_epc, lodged) - rhi = _require_rhi(effective_epc) - baseline = BaselinePerformance( - lodged=lodged, - effective=effective, - rebaseline_reason=reason, - space_heating_kwh=rhi.space_heating_kwh, - water_heating_kwh=rhi.water_heating_kwh, - ) - self._baseline_repo.save(baseline, property_id) + with self._unit_of_work() as uow: + for property_id in property_ids: + effective_epc = uow.property.get(property_id).effective_epc + lodged = lodged_performance(effective_epc) + effective, reason = self._rebaseliner.rebaseline( + effective_epc, lodged + ) + rhi = _require_rhi(effective_epc) + baseline = BaselinePerformance( + lodged=lodged, + effective=effective, + rebaseline_reason=reason, + space_heating_kwh=rhi.space_heating_kwh, + water_heating_kwh=rhi.water_heating_kwh, + ) + uow.baseline.save(baseline, property_id) + uow.commit() def _require_rhi(epc: EpcPropertyData) -> RenewableHeatIncentive: diff --git a/orchestration/ingestion_orchestrator.py b/orchestration/ingestion_orchestrator.py index 
a3d60d8f..f2bce52b 100644 --- a/orchestration/ingestion_orchestrator.py +++ b/orchestration/ingestion_orchestrator.py @@ -1,12 +1,12 @@ from __future__ import annotations +from collections.abc import Callable +from dataclasses import dataclass from typing import Any, Optional, Protocol from datatypes.epc.domain.epc_property_data import EpcPropertyData -from repositories.epc.epc_repository import EpcRepository from repositories.geospatial.geospatial_repository import GeospatialRepository -from repositories.property.property_repository import PropertyRepository -from repositories.solar.solar_repository import SolarRepository +from repositories.unit_of_work import UnitOfWork class EpcFetcher(Protocol): @@ -23,50 +23,75 @@ class SolarFetcher(Protocol): ) -> dict[str, Any]: ... -class IngestionOrchestrator: - """Stage 1: acquire a Property's external source data and persist it. +@dataclass +class _Fetched: + """One property's externally-fetched source data, awaiting the write phase.""" - For each property: read its UPRN from the property row, fetch its EPC, resolve - its coordinates from the Geospatial reference Repo, thread those into the Solar - fetcher, and persist EPC + solar via repos. The orchestrator is the only place - a Fetcher and a Repo meet, and it threads the coordinate from the Repo into the - Solar Fetcher — Fetchers never call each other (ADR-0011). Coordinates are - reference data (deterministic from UPRN), so they are resolved transiently to - drive the Solar fetch rather than persisted per-property. + property_id: int + epc: Optional[EpcPropertyData] + solar_insights: Optional[dict[str, Any]] + + +class IngestionOrchestrator: + """Stage 1: acquire a batch's external source data and persist it. 
+ + Runs in two phases so a DB connection is never held during external IO + (ADR-0012): **fetch** the whole batch — read each UPRN, fetch its EPC, resolve + coordinates from the Geospatial reference Repo, thread those into the Solar + fetcher — with *no unit open*; then **write** the batch in one Unit of Work + and commit once. Fetchers never call each other (ADR-0011); the orchestrator + threads the coordinate. Coordinates are reference data (deterministic from + UPRN), resolved transiently to drive the Solar fetch, never persisted. + + The geospatial repo reads S3 reference data, not the transactional store, so + it is injected separately rather than taken from the unit. """ def __init__( self, *, - property_repo: PropertyRepository, + unit_of_work: Callable[[], UnitOfWork], epc_fetcher: EpcFetcher, geospatial_repo: GeospatialRepository, solar_fetcher: SolarFetcher, - epc_repo: EpcRepository, - solar_repo: SolarRepository, ) -> None: - self._property_repo = property_repo + self._unit_of_work = unit_of_work self._epc_fetcher = epc_fetcher self._geospatial_repo = geospatial_repo self._solar_fetcher = solar_fetcher - self._epc_repo = epc_repo - self._solar_repo = solar_repo def run(self, property_ids: list[int]) -> None: - for property_id in property_ids: - uprn = self._property_repo.get(property_id).identity.uprn - if uprn is None: - # No UPRN to fetch against (e.g. landlord_property_id-only); a - # later Site-Notes path covers these. - continue + uprns = self._uprns_for(property_ids) + fetched = [self._fetch(property_id, uprn) for property_id, uprn in uprns] + self._persist(fetched) - epc = self._epc_fetcher.get_by_uprn(uprn) - if epc is not None: - self._epc_repo.save(epc, property_id=property_id) + def _uprns_for(self, property_ids: list[int]) -> list[tuple[int, int]]: + # A short read unit; properties with no UPRN (e.g. landlord_property_id + # only) are skipped — a later Site-Notes path covers them. 
+ with self._unit_of_work() as uow: + pairs: list[tuple[int, int]] = [] + for property_id in property_ids: + uprn = uow.property.get(property_id).identity.uprn + if uprn is not None: + pairs.append((property_id, uprn)) + return pairs - coordinates = self._geospatial_repo.coordinates_for(uprn) - if coordinates is not None: - insights = self._solar_fetcher.get_building_insights( - coordinates.longitude, coordinates.latitude - ) - self._solar_repo.save(property_id, insights) + def _fetch(self, property_id: int, uprn: int) -> _Fetched: + # No unit open here — this is the external-IO phase. + epc = self._epc_fetcher.get_by_uprn(uprn) + solar_insights: Optional[dict[str, Any]] = None + coordinates = self._geospatial_repo.coordinates_for(uprn) + if coordinates is not None: + solar_insights = self._solar_fetcher.get_building_insights( + coordinates.longitude, coordinates.latitude + ) + return _Fetched(property_id, epc, solar_insights) + + def _persist(self, fetched: list[_Fetched]) -> None: + with self._unit_of_work() as uow: + for item in fetched: + if item.epc is not None: + uow.epc.save(item.epc, property_id=item.property_id) + if item.solar_insights is not None: + uow.solar.save(item.property_id, item.solar_insights) + uow.commit() diff --git a/tests/orchestration/fakes.py b/tests/orchestration/fakes.py new file mode 100644 index 00000000..5891434a --- /dev/null +++ b/tests/orchestration/fakes.py @@ -0,0 +1,110 @@ +"""In-memory fakes for orchestrator unit tests (no DB, no network). 
+ +A `FakeUnitOfWork` exposes dict-backed fake repos and records commits, so a +test can drive an orchestrator and then assert what was persisted and that the +batch committed exactly once (ADR-0012).""" + +from __future__ import annotations + +from types import TracebackType +from typing import Any, Optional + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from domain.baseline.baseline_performance import BaselinePerformance +from domain.property.property import Property +from repositories.baseline.baseline_repository import BaselineRepository +from repositories.epc.epc_repository import EpcRepository +from repositories.property.property_repository import PropertyRepository +from repositories.solar.solar_repository import SolarRepository +from repositories.unit_of_work import UnitOfWork + + +class FakePropertyRepo(PropertyRepository): + def __init__(self, by_id: dict[int, Property]) -> None: + self._by_id = by_id + + def get(self, property_id: int) -> Property: + return self._by_id[property_id] + + +class FakeEpcRepo(EpcRepository): + def __init__(self, by_property: Optional[dict[int, EpcPropertyData]] = None) -> None: + self.saved: list[tuple[EpcPropertyData, Optional[int]]] = [] + self._by_property = by_property or {} + + def save( + self, + data: EpcPropertyData, + property_id: Optional[int] = None, + portfolio_id: Optional[int] = None, + ) -> int: + self.saved.append((data, property_id)) + if property_id is not None: + self._by_property[property_id] = data + return len(self.saved) + + def get(self, epc_property_id: int) -> EpcPropertyData: # pragma: no cover + raise NotImplementedError + + def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: + return self._by_property.get(property_id) + + +class FakeSolarRepo(SolarRepository): + def __init__(self) -> None: + self.saved: list[tuple[int, dict[str, Any]]] = [] + + def save(self, property_id: int, insights: dict[str, Any]) -> None: + self.saved.append((property_id, 
insights)) + + def get(self, property_id: int) -> Optional[dict[str, Any]]: # pragma: no cover + raise NotImplementedError + + +class FakeBaselineRepo(BaselineRepository): + def __init__(self) -> None: + self.saved: list[tuple[BaselinePerformance, int]] = [] + + def save(self, baseline: BaselinePerformance, property_id: int) -> int: + self.saved.append((baseline, property_id)) + return len(self.saved) + + def get_for_property( + self, property_id: int + ) -> Optional[BaselinePerformance]: # pragma: no cover + raise NotImplementedError + + +class FakeUnitOfWork(UnitOfWork): + """A unit that holds in-memory repos and counts commits.""" + + def __init__( + self, + *, + property: FakePropertyRepo, + epc: Optional[FakeEpcRepo] = None, + solar: Optional[FakeSolarRepo] = None, + baseline: Optional[FakeBaselineRepo] = None, + ) -> None: + self.property = property + self.epc = epc or FakeEpcRepo() + self.solar = solar or FakeSolarRepo() + self.baseline = baseline or FakeBaselineRepo() + self.commits = 0 + + def __enter__(self) -> "FakeUnitOfWork": + return self + + def __exit__( + self, + exc_type: Optional[type[BaseException]], + exc: Optional[BaseException], + tb: Optional[TracebackType], + ) -> None: + return None + + def commit(self) -> None: + self.commits += 1 + + def rollback(self) -> None: + return None diff --git a/tests/orchestration/test_baseline_orchestrator.py b/tests/orchestration/test_baseline_orchestrator.py index 3958b9b4..a18628ec 100644 --- a/tests/orchestration/test_baseline_orchestrator.py +++ b/tests/orchestration/test_baseline_orchestrator.py @@ -1,7 +1,5 @@ from __future__ import annotations -from typing import Optional - import pytest from datatypes.epc.domain.epc import Epc @@ -14,30 +12,11 @@ from domain.baseline.performance import Performance from domain.baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner from domain.property.property import Property, PropertyIdentity from orchestration.baseline_orchestrator import 
BaselineOrchestrator -from repositories.baseline.baseline_repository import BaselineRepository -from repositories.property.property_repository import PropertyRepository - - -class _FakePropertyRepo(PropertyRepository): - def __init__(self, by_id: dict[int, Property]) -> None: - self._by_id = by_id - - def get(self, property_id: int) -> Property: - return self._by_id[property_id] - - -class _FakeBaselineRepo(BaselineRepository): - def __init__(self) -> None: - self.saved: list[tuple[BaselinePerformance, int]] = [] - - def save(self, baseline: BaselinePerformance, property_id: int) -> int: - self.saved.append((baseline, property_id)) - return len(self.saved) - - def get_for_property( - self, property_id: int - ) -> Optional[BaselinePerformance]: # pragma: no cover - raise NotImplementedError +from tests.orchestration.fakes import ( + FakeBaselineRepo, + FakePropertyRepo, + FakeUnitOfWork, +) def _property(*, sap_version: float) -> Property: @@ -58,25 +37,22 @@ def _property(*, sap_version: float) -> Property: ) -def _sap10_property() -> Property: - return _property(sap_version=10.2) - - -def test_run_establishes_and_persists_baseline_performance() -> None: +def test_run_establishes_persists_and_commits_the_batch_once() -> None: # Arrange - property_repo = _FakePropertyRepo({10: _sap10_property()}) - baseline_repo = _FakeBaselineRepo() + baseline_repo = FakeBaselineRepo() + uow = FakeUnitOfWork( + property=FakePropertyRepo({10: _property(sap_version=10.2)}), + baseline=baseline_repo, + ) orchestrator = BaselineOrchestrator( - property_repo=property_repo, - rebaseliner=StubRebaseliner(), - baseline_repo=baseline_repo, + unit_of_work=lambda: uow, rebaseliner=StubRebaseliner() ) # Act orchestrator.run([10]) - # Assert — one Baseline Performance persisted for property 10, both halves - # equal (no rebaselining), kWh read off the RHI. + # Assert — one Baseline Performance persisted (both halves equal, kWh off the + # RHI), and the batch committed exactly once. 
lodged = Performance( sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180 ) @@ -92,19 +68,23 @@ def test_run_establishes_and_persists_baseline_performance() -> None: 10, ) ] + assert uow.commits == 1 -def test_run_raises_on_a_pre_sap10_property_and_persists_nothing() -> None: +def test_run_raises_on_a_pre_sap10_property_and_does_not_commit() -> None: # Arrange — a pre-SAP10 cert needs ML rebaselining, which is not wired yet. - property_repo = _FakePropertyRepo({10: _property(sap_version=9.94)}) - baseline_repo = _FakeBaselineRepo() + baseline_repo = FakeBaselineRepo() + uow = FakeUnitOfWork( + property=FakePropertyRepo({10: _property(sap_version=9.94)}), + baseline=baseline_repo, + ) orchestrator = BaselineOrchestrator( - property_repo=property_repo, - rebaseliner=StubRebaseliner(), - baseline_repo=baseline_repo, + unit_of_work=lambda: uow, rebaseliner=StubRebaseliner() ) - # Act / Assert — the raise propagates; no half-baked baseline is written. + # Act / Assert — the raise propagates; the batch is neither persisted nor + # committed (all-or-nothing). with pytest.raises(RebaselineNotImplemented): orchestrator.run([10]) assert baseline_repo.saved == [] + assert uow.commits == 0 diff --git a/tests/orchestration/test_first_run_pipeline_integration.py b/tests/orchestration/test_first_run_pipeline_integration.py index 55ca34ed..d96351c7 100644 --- a/tests/orchestration/test_first_run_pipeline_integration.py +++ b/tests/orchestration/test_first_run_pipeline_integration.py @@ -1,28 +1,43 @@ +"""End-to-end through-repos integration for First Run (ADR-0012, #1138). + +Real PostgresUnitOfWork over an ephemeral DB: Ingestion writes the EPC, Baseline +reads it back *through the repo* (not in memory), and a re-run replaces rather +than duplicates. Stub Modelling. 
The source clients are faked (no IO).""" + from __future__ import annotations +import dataclasses +import json from dataclasses import dataclass +from pathlib import Path from typing import Any, Optional +from sqlalchemy import Engine +from sqlmodel import Session, select + from datatypes.epc.domain.epc import Epc -from datatypes.epc.domain.epc_property_data import ( - EpcPropertyData, - RenewableHeatIncentive, -) +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from datatypes.epc.domain.mapper import EpcPropertyDataMapper from domain.baseline.rebaseliner import StubRebaseliner from domain.geospatial.coordinates import Coordinates -from domain.property.property import Property, PropertyIdentity +from infrastructure.postgres.baseline_performance_table import ( + BaselinePerformanceModel, +) +from infrastructure.postgres.epc_property_table import EpcPropertyModel +from infrastructure.postgres.property_table import PropertyRow from orchestration.baseline_orchestrator import BaselineOrchestrator from orchestration.first_run_pipeline import FirstRunPipeline from orchestration.ingestion_orchestrator import IngestionOrchestrator from orchestration.modelling_orchestrator import ModellingOrchestrator -from repositories.baseline.baseline_repository import BaselineRepository -from repositories.epc.epc_repository import EpcRepository +from repositories.baseline.baseline_postgres_repository import ( + BaselinePostgresRepository, +) from repositories.geospatial.geospatial_repository import GeospatialRepository from repositories.materials.materials_repository import MaterialsRepository -from repositories.property.property_repository import PropertyRepository +from repositories.postgres_unit_of_work import PostgresUnitOfWork from repositories.scenario.scenario_repository import ScenarioRepository -from repositories.solar.solar_repository import SolarRepository -from domain.baseline.baseline_performance import BaselinePerformance + +_JSON_SAMPLES = 
Path(__file__).resolve().parents[2] / "backend/epc_api/json_samples" @dataclass @@ -32,48 +47,7 @@ class _FakeCommand: scenario_ids: list[int] -class _SharedEpcRepo(EpcRepository): - """Stands in for the persisted EPC slice both stages talk through.""" - - def __init__(self) -> None: - self._by_property: dict[int, EpcPropertyData] = {} - - def save( - self, - data: EpcPropertyData, - property_id: Optional[int] = None, - portfolio_id: Optional[int] = None, - ) -> int: - assert property_id is not None - self._by_property[property_id] = data - return property_id - - def get(self, epc_property_id: int) -> EpcPropertyData: # pragma: no cover - raise NotImplementedError - - def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: - return self._by_property.get(property_id) - - -class _RepoBackedPropertyRepo(PropertyRepository): - """Composes the Property from its identity row + the EPC slice in the shared - EPC repo — mirroring PropertyPostgresRepository, so the stages genuinely - hand off through repo state, not in memory.""" - - def __init__( - self, identities: dict[int, PropertyIdentity], epc_repo: _SharedEpcRepo - ) -> None: - self._identities = identities - self._epc_repo = epc_repo - - def get(self, property_id: int) -> Property: - return Property( - identity=self._identities[property_id], - epc=self._epc_repo.get_for_property(property_id), - ) - - -class _FakeEpcFetcher: +class _FetcherReturning: def __init__(self, epc: EpcPropertyData) -> None: self._epc = epc @@ -81,103 +55,91 @@ class _FakeEpcFetcher: return self._epc -class _NoCoordinatesGeospatialRepo(GeospatialRepository): +class _NoCoordinates(GeospatialRepository): def coordinates_for(self, uprn: int) -> Optional[Coordinates]: return None # skip the solar leg — not under test here -class _FakeSolarFetcher: +class _UnusedSolarFetcher: def get_building_insights( self, longitude: float, latitude: float ) -> dict[str, Any]: # pragma: no cover return {} -class _FakeSolarRepo(SolarRepository): 
- def save(self, property_id: int, insights: dict[str, Any]) -> None: # pragma: no cover - return None - - def get(self, property_id: int) -> Optional[dict[str, Any]]: # pragma: no cover - raise NotImplementedError - - -class _CollectingBaselineRepo(BaselineRepository): - def __init__(self) -> None: - self.saved: list[tuple[BaselinePerformance, int]] = [] - - def save(self, baseline: BaselinePerformance, property_id: int) -> int: - self.saved.append((baseline, property_id)) - return len(self.saved) - - def get_for_property( - self, property_id: int - ) -> Optional[BaselinePerformance]: # pragma: no cover - raise NotImplementedError - - -class _FakeScenarioRepo(ScenarioRepository): - pass - - -class _FakeMaterialsRepo(MaterialsRepository): - pass - - -def _ingestible_epc() -> EpcPropertyData: - epc = object.__new__(EpcPropertyData) - epc.energy_rating_current = 72 - epc.current_energy_efficiency_band = Epc.C - epc.co2_emissions_current = 1.8 - epc.energy_consumption_current = 180 - epc.sap_version = 10.2 - epc.renewable_heat_incentive = RenewableHeatIncentive( - space_heating_kwh=5000.0, water_heating_kwh=2000.0 +def _lodged_epc() -> EpcPropertyData: + # A real, persistable EPC (so it round-trips through the EPC repo), with the + # recorded-performance fields the sample leaves blank filled in so Baseline + # can read its Lodged Performance. + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text() + ) + epc = EpcPropertyDataMapper.from_api_response(raw) + return dataclasses.replace( + epc, + energy_rating_current=72, + current_energy_efficiency_band=Epc.C, + co2_emissions_current=1.8, + energy_consumption_current=180, ) - return epc -def test_baseline_reads_the_epc_ingestion_persisted_through_repos() -> None: - # Arrange — one property; the EPC the fetcher returns is what Ingestion - # persists and Baseline must then read back through the shared repo. 
- epc = _ingestible_epc() - epc_repo = _SharedEpcRepo() - identities = { - 10: PropertyIdentity( - portfolio_id=1, postcode="A0 0AA", address="1 Some Street", uprn=123 +def test_first_run_baselines_through_repos_and_is_idempotent_on_rerun( + db_engine: Engine, +) -> None: + # Arrange — a property row to ingest against, and the EPC its fetcher returns. + with Session(db_engine) as session: + session.add( + PropertyRow( + id=10, + portfolio_id=1, + postcode="A0 0AA", + address="1 Some Street", + uprn=12345, + ) ) - } - property_repo = _RepoBackedPropertyRepo(identities, epc_repo) - baseline_repo = _CollectingBaselineRepo() + session.commit() + + def unit_of_work() -> PostgresUnitOfWork: + return PostgresUnitOfWork(lambda: Session(db_engine)) pipeline = FirstRunPipeline( ingestion=IngestionOrchestrator( - property_repo=property_repo, - epc_fetcher=_FakeEpcFetcher(epc), - geospatial_repo=_NoCoordinatesGeospatialRepo(), - solar_fetcher=_FakeSolarFetcher(), - epc_repo=epc_repo, - solar_repo=_FakeSolarRepo(), + unit_of_work=unit_of_work, + epc_fetcher=_FetcherReturning(_lodged_epc()), + geospatial_repo=_NoCoordinates(), + solar_fetcher=_UnusedSolarFetcher(), ), baseline=BaselineOrchestrator( - property_repo=property_repo, - rebaseliner=StubRebaseliner(), - baseline_repo=baseline_repo, + unit_of_work=unit_of_work, rebaseliner=StubRebaseliner() ), modelling=ModellingOrchestrator( - scenario_repo=_FakeScenarioRepo(), - materials_repo=_FakeMaterialsRepo(), + scenario_repo=ScenarioRepository(), + materials_repo=MaterialsRepository(), ), ) + command = _FakeCommand(portfolio_id=1, property_ids=[10], scenario_ids=[7]) - # Act - pipeline.run(_FakeCommand(portfolio_id=1, property_ids=[10], scenario_ids=[7])) + # Act — First Run, then a re-run over the same batch. + pipeline.run(command) + pipeline.run(command) - # Assert — a Baseline Performance landed for property 10, its Lodged half - # read off the very EPC Ingestion persisted. 
Only property_ids crossed the - # stage boundary; the EPC itself travelled through the repo. - assert len(baseline_repo.saved) == 1 - baseline, property_id = baseline_repo.saved[0] - assert property_id == 10 + # Assert — Baseline read the EPC Ingestion persisted (through the repo, only + # property_ids crossed the stage boundary), and the re-run replaced rather + # than duplicated either row. + with Session(db_engine) as session: + baseline = BaselinePostgresRepository(session).get_for_property(10) + epc_rows = session.exec( + select(EpcPropertyModel).where(EpcPropertyModel.property_id == 10) + ).all() + baseline_rows = session.exec( + select(BaselinePerformanceModel).where( + BaselinePerformanceModel.property_id == 10 + ) + ).all() + + assert baseline is not None assert baseline.lodged.sap_score == 72 - assert baseline.lodged.epc_band == Epc.C - assert baseline.space_heating_kwh == 5000.0 + assert baseline.space_heating_kwh == 13120.0 + assert len(epc_rows) == 1 + assert len(baseline_rows) == 1 diff --git a/tests/orchestration/test_ingestion_orchestrator.py b/tests/orchestration/test_ingestion_orchestrator.py index 1c6a0f89..be2d86b4 100644 --- a/tests/orchestration/test_ingestion_orchestrator.py +++ b/tests/orchestration/test_ingestion_orchestrator.py @@ -1,8 +1,5 @@ -"""IngestionOrchestrator wires fetchers + repos with no real IO (ADR-0011). - -Tested entirely against fakes: it must fetch EPC + solar, thread the -Geospatial-resolved coordinates into the solar fetcher, and persist via repos. -""" +"""IngestionOrchestrator fetches the batch (no DB unit open), then writes it in +one Unit of Work and commits once (ADR-0012). 
Tested against fakes — no IO.""" from __future__ import annotations @@ -12,18 +9,13 @@ from datatypes.epc.domain.epc_property_data import EpcPropertyData from domain.geospatial.coordinates import Coordinates from domain.property.property import Property, PropertyIdentity from orchestration.ingestion_orchestrator import IngestionOrchestrator -from repositories.epc.epc_repository import EpcRepository from repositories.geospatial.geospatial_repository import GeospatialRepository -from repositories.property.property_repository import PropertyRepository -from repositories.solar.solar_repository import SolarRepository - - -class _FakePropertyRepo(PropertyRepository): - def __init__(self, by_id: dict[int, Property]) -> None: - self._by_id = by_id - - def get(self, property_id: int) -> Property: - return self._by_id[property_id] +from tests.orchestration.fakes import ( + FakeEpcRepo, + FakePropertyRepo, + FakeSolarRepo, + FakeUnitOfWork, +) class _FakeEpcFetcher: @@ -56,39 +48,6 @@ class _FakeSolarFetcher: return self.insights -class _FakeEpcRepo(EpcRepository): - def __init__(self) -> None: - self.saved: list[tuple[EpcPropertyData, Optional[int]]] = [] - - def save( - self, - data: EpcPropertyData, - property_id: Optional[int] = None, - portfolio_id: Optional[int] = None, - ) -> int: - self.saved.append((data, property_id)) - return 1 - - def get(self, epc_property_id: int) -> EpcPropertyData: # pragma: no cover - raise NotImplementedError - - def get_for_property( - self, property_id: int - ) -> Optional[EpcPropertyData]: # pragma: no cover - raise NotImplementedError - - -class _FakeSolarRepo(SolarRepository): - def __init__(self) -> None: - self.saved: list[tuple[int, dict[str, Any]]] = [] - - def save(self, property_id: int, insights: dict[str, Any]) -> None: - self.saved.append((property_id, insights)) - - def get(self, property_id: int) -> Optional[dict[str, Any]]: # pragma: no cover - raise NotImplementedError - - def _property(uprn: Optional[int]) -> Property: 
return Property( identity=PropertyIdentity( @@ -97,55 +56,59 @@ def _property(uprn: Optional[int]) -> Property: ) -def _epc() -> EpcPropertyData: - # A bare placeholder is enough — the orchestrator treats the EPC opaquely. - return object.__new__(EpcPropertyData) - - def test_ingestion_persists_epc_and_threads_coords_into_solar() -> None: # Arrange - epc = _epc() + epc = object.__new__(EpcPropertyData) insights = {"name": "buildings/X"} - coords = Coordinates(longitude=-0.1278, latitude=51.5074) - epc_repo = _FakeEpcRepo() - solar_repo = _FakeSolarRepo() + epc_repo = FakeEpcRepo() + solar_repo = FakeSolarRepo() solar_fetcher = _FakeSolarFetcher(insights) + uow = FakeUnitOfWork( + property=FakePropertyRepo({10: _property(uprn=12345)}), + epc=epc_repo, + solar=solar_repo, + ) orchestrator = IngestionOrchestrator( - property_repo=_FakePropertyRepo({10: _property(uprn=12345)}), + unit_of_work=lambda: uow, epc_fetcher=_FakeEpcFetcher(epc), - geospatial_repo=_FakeGeospatialRepo(coords), + geospatial_repo=_FakeGeospatialRepo( + Coordinates(longitude=-0.1278, latitude=51.5074) + ), solar_fetcher=solar_fetcher, - epc_repo=epc_repo, - solar_repo=solar_repo, ) # Act orchestrator.run([10]) - # Assert + # Assert — EPC persisted, coords threaded from the repo into the solar + # fetcher, solar persisted, batch committed once. 
assert epc_repo.saved == [(epc, 10)] - assert solar_fetcher.calls == [(-0.1278, 51.5074)] # coords threaded from repo + assert solar_fetcher.calls == [(-0.1278, 51.5074)] assert solar_repo.saved == [(10, insights)] + assert uow.commits == 1 def test_ingestion_skips_property_without_uprn() -> None: # Arrange - epc_repo = _FakeEpcRepo() - solar_repo = _FakeSolarRepo() + epc_repo = FakeEpcRepo() + solar_repo = FakeSolarRepo() solar_fetcher = _FakeSolarFetcher({}) + uow = FakeUnitOfWork( + property=FakePropertyRepo({10: _property(uprn=None)}), + epc=epc_repo, + solar=solar_repo, + ) orchestrator = IngestionOrchestrator( - property_repo=_FakePropertyRepo({10: _property(uprn=None)}), - epc_fetcher=_FakeEpcFetcher(_epc()), + unit_of_work=lambda: uow, + epc_fetcher=_FakeEpcFetcher(object.__new__(EpcPropertyData)), geospatial_repo=_FakeGeospatialRepo(None), solar_fetcher=solar_fetcher, - epc_repo=epc_repo, - solar_repo=solar_repo, ) # Act orchestrator.run([10]) - # Assert — nothing fetched or persisted for a UPRN-less property + # Assert — nothing fetched or persisted for a UPRN-less property. 
assert epc_repo.saved == [] assert solar_repo.saved == [] assert solar_fetcher.calls == [] @@ -153,17 +116,20 @@ def test_ingestion_skips_property_without_uprn() -> None: def test_ingestion_persists_epc_but_skips_solar_when_no_coordinates() -> None: # Arrange - epc = _epc() - epc_repo = _FakeEpcRepo() - solar_repo = _FakeSolarRepo() + epc = object.__new__(EpcPropertyData) + epc_repo = FakeEpcRepo() + solar_repo = FakeSolarRepo() solar_fetcher = _FakeSolarFetcher({}) + uow = FakeUnitOfWork( + property=FakePropertyRepo({10: _property(uprn=12345)}), + epc=epc_repo, + solar=solar_repo, + ) orchestrator = IngestionOrchestrator( - property_repo=_FakePropertyRepo({10: _property(uprn=12345)}), + unit_of_work=lambda: uow, epc_fetcher=_FakeEpcFetcher(epc), geospatial_repo=_FakeGeospatialRepo(None), solar_fetcher=solar_fetcher, - epc_repo=epc_repo, - solar_repo=solar_repo, ) # Act @@ -171,5 +137,5 @@ def test_ingestion_persists_epc_but_skips_solar_when_no_coordinates() -> None: # Assert assert epc_repo.saved == [(epc, 10)] - assert solar_fetcher.calls == [] assert solar_repo.saved == [] + assert solar_fetcher.calls == [] From 8685f8ba3a8ca11006d8f882e4b35ff7ebc9eb04 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Sun, 31 May 2026 10:33:24 +0000 Subject: [PATCH 14/18] =?UTF-8?q?perf(repos):=20bulk=20get=5Fmany=20/=20ge?= =?UTF-8?q?t=5Ffor=5Fproperties=20=E2=80=94=20batch=20reads,=20not=20N=20r?= =?UTF-8?q?ound-trips=20(#1138)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Final slice of ADR-0012: collapse the per-property read round-trips a batch made (Baseline hydrated ~8 queries x 30 properties one at a time) into a handful of per-table IN queries. - EpcPostgresRepository: extracted a shared `_compose(rows)` from `get` (the windows + floor-dim fetches are now passed in, not fetched inline), so both `get` and the new `get_for_properties(property_ids)` build EpcPropertyData from pre-fetched rows. 
`get_for_properties` fetches each child table once (`WHERE epc_property_id IN ...`), groups in memory, and composes — load-whole per ADR-0002. - PropertyRepository.get_many(property_ids) -> Properties: one query for the property rows + one bulk EPC hydration, composed in input order. - BaselineOrchestrator / IngestionOrchestrator read the batch via get_many instead of N x get. - Ports + fakes gain the bulk methods. The #1129 round-trip fidelity test stays green (the compose extraction is behaviour-preserving). New tests: bulk hydration correctness + round-trips are constant w.r.t. batch size (one-per-table, proven by query count). 123 pass; pyright strict clean; AAA. Co-Authored-By: Claude Opus 4.8 --- orchestration/baseline_orchestrator.py | 5 +- orchestration/ingestion_orchestrator.py | 12 +- repositories/epc/epc_postgres_repository.py | 176 ++++++++++++++++-- repositories/epc/epc_repository.py | 8 + .../property/property_postgres_repository.py | 30 ++- repositories/property/property_repository.py | 8 + tests/orchestration/fakes.py | 13 ++ tests/repositories/epc/test_epc_bulk_read.py | 81 ++++++++ 8 files changed, 313 insertions(+), 20 deletions(-) create mode 100644 tests/repositories/epc/test_epc_bulk_read.py diff --git a/orchestration/baseline_orchestrator.py b/orchestration/baseline_orchestrator.py index 4ae3a480..9a1138c8 100644 --- a/orchestration/baseline_orchestrator.py +++ b/orchestration/baseline_orchestrator.py @@ -38,8 +38,9 @@ class BaselineOrchestrator: def run(self, property_ids: list[int]) -> None: with self._unit_of_work() as uow: - for property_id in property_ids: - effective_epc = uow.property.get(property_id).effective_epc + properties = uow.property.get_many(property_ids) + for property_id, prop in zip(property_ids, properties, strict=True): + effective_epc = prop.effective_epc lodged = lodged_performance(effective_epc) effective, reason = self._rebaseliner.rebaseline( effective_epc, lodged diff --git a/orchestration/ingestion_orchestrator.py 
b/orchestration/ingestion_orchestrator.py index f2bce52b..1662ecf9 100644 --- a/orchestration/ingestion_orchestrator.py +++ b/orchestration/ingestion_orchestrator.py @@ -69,12 +69,12 @@ class IngestionOrchestrator: # A short read unit; properties with no UPRN (e.g. landlord_property_id # only) are skipped — a later Site-Notes path covers them. with self._unit_of_work() as uow: - pairs: list[tuple[int, int]] = [] - for property_id in property_ids: - uprn = uow.property.get(property_id).identity.uprn - if uprn is not None: - pairs.append((property_id, uprn)) - return pairs + properties = uow.property.get_many(property_ids) + return [ + (property_id, prop.identity.uprn) + for property_id, prop in zip(property_ids, properties, strict=True) + if prop.identity.uprn is not None + ] def _fetch(self, property_id: int, uprn: int) -> _Fetched: # No unit open here — this is the external-IO phase. diff --git a/repositories/epc/epc_postgres_repository.py b/repositories/epc/epc_postgres_repository.py index b1368916..525476ea 100644 --- a/repositories/epc/epc_postgres_repository.py +++ b/repositories/epc/epc_postgres_repository.py @@ -1,7 +1,8 @@ from __future__ import annotations +from collections.abc import Sequence from datetime import date -from typing import Optional, TypeVar +from typing import Optional, Protocol, TypeVar from sqlmodel import Session, col, delete, select @@ -56,6 +57,20 @@ def _require(value: Optional[_T], field: str) -> _T: return value +class _HasEpcPropertyId(Protocol): + epc_property_id: int + + +_RowT = TypeVar("_RowT", bound=_HasEpcPropertyId) + + +def _group_by_epc(rows: Sequence[_RowT]) -> dict[int, list[_RowT]]: + grouped: dict[int, list[_RowT]] = {} + for row in rows: + grouped.setdefault(row.epc_property_id, []).append(row) + return grouped + + class EpcPostgresRepository(EpcRepository): """Maps EpcPropertyData to/from the epc_property parent row + child tables. 
@@ -194,6 +209,117 @@ class EpcPostgresRepository(EpcRepository): return None return self.get(row.id) + def get_for_properties( + self, property_ids: list[int] + ) -> dict[int, EpcPropertyData]: + """Bulk-hydrate a batch's EPCs in a handful of per-table IN queries + (ADR-0012), not N x per-property. Load-whole per ADR-0002.""" + if not property_ids: + return {} + parents = self._session.exec( + select(EpcPropertyModel) + .where(col(EpcPropertyModel.property_id).in_(property_ids)) + .order_by(EpcPropertyModel.id) # type: ignore[arg-type] + ).all() + parent_by_property: dict[int, EpcPropertyModel] = {} + for parent in parents: + if parent.property_id is not None and parent.id is not None: + parent_by_property.setdefault(parent.property_id, parent) + epc_ids = [p.id for p in parent_by_property.values() if p.id is not None] + if not epc_ids: + return {} + + perf_by = { + r.epc_property_id: r + for r in self._session.exec( + select(EpcPropertyEnergyPerformanceModel).where( + col(EpcPropertyEnergyPerformanceModel.epc_property_id).in_(epc_ids) + ) + ).all() + } + flat_by = { + r.epc_property_id: r + for r in self._session.exec( + select(EpcFlatDetailsModel).where( + col(EpcFlatDetailsModel.epc_property_id).in_(epc_ids) + ) + ).all() + } + rhi_by = { + r.epc_property_id: r + for r in self._session.exec( + select(EpcRenewableHeatIncentiveModel).where( + col(EpcRenewableHeatIncentiveModel.epc_property_id).in_(epc_ids) + ) + ).all() + } + elements_by = _group_by_epc( + self._session.exec( + select(EpcEnergyElementModel) + .where(col(EpcEnergyElementModel.epc_property_id).in_(epc_ids)) + .order_by(EpcEnergyElementModel.id) # type: ignore[arg-type] + ).all() + ) + heating_by = _group_by_epc( + self._session.exec( + select(EpcMainHeatingDetailModel) + .where(col(EpcMainHeatingDetailModel.epc_property_id).in_(epc_ids)) + .order_by(EpcMainHeatingDetailModel.id) # type: ignore[arg-type] + ).all() + ) + parts_by = _group_by_epc( + self._session.exec( + select(EpcBuildingPartModel) + 
.where(col(EpcBuildingPartModel.epc_property_id).in_(epc_ids)) + .order_by(EpcBuildingPartModel.id) # type: ignore[arg-type] + ).all() + ) + windows_by = _group_by_epc( + self._session.exec( + select(EpcWindowModel) + .where(col(EpcWindowModel.epc_property_id).in_(epc_ids)) + .order_by(EpcWindowModel.id) # type: ignore[arg-type] + ).all() + ) + part_ids = [ + bp.id + for parts in parts_by.values() + for bp in parts + if bp.id is not None + ] + floor_dims_by_part = self._floor_dims_by_part(part_ids) + + result: dict[int, EpcPropertyData] = {} + for property_id, parent in parent_by_property.items(): + epc_id = _require(parent.id, "id") + result[property_id] = self._compose( + p=parent, + perf=perf_by.get(epc_id), + elements=elements_by.get(epc_id, []), + heating_rows=heating_by.get(epc_id, []), + part_rows=parts_by.get(epc_id, []), + floor_dims_by_part=floor_dims_by_part, + window_rows=windows_by.get(epc_id, []), + flat_row=flat_by.get(epc_id), + rhi_row=rhi_by.get(epc_id), + ) + return result + + def _floor_dims_by_part( + self, part_ids: list[int] + ) -> dict[int, list[EpcFloorDimensionModel]]: + if not part_ids: + return {} + rows = self._session.exec( + select(EpcFloorDimensionModel) + .where(col(EpcFloorDimensionModel.epc_building_part_id).in_(part_ids)) + .order_by(EpcFloorDimensionModel.id) # type: ignore[arg-type] + ).all() + grouped: dict[int, list[EpcFloorDimensionModel]] = {} + for row in rows: + grouped.setdefault(row.epc_building_part_id, []).append(row) + return grouped + def get(self, epc_property_id: int) -> EpcPropertyData: p = self._session.get(EpcPropertyModel, epc_property_id) if p is None: @@ -234,7 +360,35 @@ class EpcPostgresRepository(EpcRepository): EpcRenewableHeatIncentiveModel.epc_property_id == epc_property_id ) ).first() + window_rows = self._windows(epc_property_id) + floor_dims_by_part = self._floor_dims_by_part( + [bp.id for bp in part_rows if bp.id is not None] + ) + return self._compose( + p=p, + perf=perf, + elements=elements, + 
heating_rows=heating_rows, + part_rows=part_rows, + floor_dims_by_part=floor_dims_by_part, + window_rows=window_rows, + flat_row=flat_row, + rhi_row=rhi_row, + ) + def _compose( + self, + *, + p: EpcPropertyModel, + perf: Optional[EpcPropertyEnergyPerformanceModel], + elements: list[EpcEnergyElementModel], + heating_rows: list[EpcMainHeatingDetailModel], + part_rows: list[EpcBuildingPartModel], + floor_dims_by_part: dict[int, list[EpcFloorDimensionModel]], + window_rows: list[EpcWindowModel], + flat_row: Optional[EpcFlatDetailsModel], + rhi_row: Optional[EpcRenewableHeatIncentiveModel], + ) -> EpcPropertyData: def _elements(element_type: str) -> list[EnergyElement]: return [self._to_energy_element(e) for e in elements if e.element_type == element_type] @@ -256,9 +410,14 @@ class EpcPostgresRepository(EpcRepository): main_heating=_elements("main_heating"), door_count=p.door_count, sap_heating=self._to_sap_heating(p, heating_rows), - sap_windows=[self._to_window(w) for w in self._windows(epc_property_id)], + sap_windows=[self._to_window(w) for w in window_rows], sap_energy_source=self._to_energy_source(p), - sap_building_parts=[self._to_building_part(bp) for bp in part_rows], + sap_building_parts=[ + self._to_building_part( + bp, floor_dims_by_part.get(bp.id, []) if bp.id is not None else [] + ) + for bp in part_rows + ], solar_water_heating=p.solar_water_heating, has_hot_water_cylinder=p.has_hot_water_cylinder, has_fixed_air_conditioning=p.has_fixed_air_conditioning, @@ -519,14 +678,9 @@ class EpcPostgresRepository(EpcRepository): ) @private - def _to_building_part(self, bp: EpcBuildingPartModel) -> SapBuildingPart: - floor_rows = list( - self._session.exec( - select(EpcFloorDimensionModel) - .where(EpcFloorDimensionModel.epc_building_part_id == bp.id) - .order_by(EpcFloorDimensionModel.id) # type: ignore[arg-type] - ).all() - ) + def _to_building_part( + self, bp: EpcBuildingPartModel, floor_rows: list[EpcFloorDimensionModel] + ) -> SapBuildingPart: return 
SapBuildingPart( identifier=BuildingPartIdentifier(bp.identifier), construction_age_band=bp.construction_age_band, diff --git a/repositories/epc/epc_repository.py b/repositories/epc/epc_repository.py index fb83bdbc..171d098e 100644 --- a/repositories/epc/epc_repository.py +++ b/repositories/epc/epc_repository.py @@ -28,3 +28,11 @@ class EpcRepository(ABC): @abstractmethod def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: ... + + @abstractmethod + def get_for_properties( + self, property_ids: list[int] + ) -> dict[int, EpcPropertyData]: + """Bulk-hydrate a batch's EPCs, keyed by property_id (only those with an + EPC are present). A handful of per-table queries, not N per property.""" + ... diff --git a/repositories/property/property_postgres_repository.py b/repositories/property/property_postgres_repository.py index c1b631dd..e0b4f9ff 100644 --- a/repositories/property/property_postgres_repository.py +++ b/repositories/property/property_postgres_repository.py @@ -1,7 +1,8 @@ from __future__ import annotations -from sqlmodel import Session +from sqlmodel import Session, col, select +from domain.property.properties import Properties from domain.property.property import Property, PropertyIdentity from infrastructure.postgres.property_table import PropertyRow from repositories.epc.epc_repository import EpcRepository @@ -34,3 +35,30 @@ class PropertyPostgresRepository(PropertyRepository): identity=identity, epc=self._epc_repo.get_for_property(property_id), ) + + def get_many(self, property_ids: list[int]) -> Properties: + if not property_ids: + return Properties([]) + rows = self._session.exec( + select(PropertyRow).where(col(PropertyRow.id).in_(property_ids)) + ).all() + row_by_id = {row.id: row for row in rows if row.id is not None} + epcs = self._epc_repo.get_for_properties(property_ids) + items: list[Property] = [] + for property_id in property_ids: + row = row_by_id.get(property_id) + if row is None: + raise ValueError(f"property 
{property_id} not found") + items.append( + Property( + identity=PropertyIdentity( + portfolio_id=row.portfolio_id, + postcode=row.postcode, + address=row.address, + uprn=row.uprn, + landlord_property_id=row.landlord_property_id, + ), + epc=epcs.get(property_id), + ) + ) + return Properties(items) diff --git a/repositories/property/property_repository.py b/repositories/property/property_repository.py index 0a9045be..1f3df1da 100644 --- a/repositories/property/property_repository.py +++ b/repositories/property/property_repository.py @@ -2,6 +2,7 @@ from __future__ import annotations from abc import ABC, abstractmethod +from domain.property.properties import Properties from domain.property.property import Property @@ -15,3 +16,10 @@ class PropertyRepository(ABC): @abstractmethod def get(self, property_id: int) -> Property: ... + + @abstractmethod + def get_many(self, property_ids: list[int]) -> Properties: + """Load a batch of Properties whole, in a handful of per-table queries + rather than one round-trip per property (ADR-0012). Order follows the + input ids.""" + ... 
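Reviewer note: the bulk-read idea in this patch (one `IN` query per child table, grouped in memory by parent id, so the query count does not grow with the batch) can be sketched in isolation. This is a minimal illustration only, not the repository code: it uses the standard-library `sqlite3` module instead of SQLModel/Postgres, and the `child` table, its columns, and the helpers `load_children_by_parent` and `count_selects` are hypothetical names chosen for the example.

```python
import sqlite3
from collections import defaultdict
from collections.abc import Callable


def load_children_by_parent(
    conn: sqlite3.Connection, parent_ids: list[int]
) -> dict[int, list[str]]:
    """Fetch the whole batch's child rows in one IN query, then group in memory."""
    if not parent_ids:
        return {}
    placeholders = ",".join("?" for _ in parent_ids)
    rows = conn.execute(
        "SELECT parent_id, label FROM child "
        f"WHERE parent_id IN ({placeholders}) ORDER BY id",
        parent_ids,
    ).fetchall()
    grouped: dict[int, list[str]] = defaultdict(list)
    for parent_id, label in rows:
        grouped[parent_id].append(label)
    return dict(grouped)


def count_selects(conn: sqlite3.Connection, work: Callable[[], object]) -> int:
    """Count SELECT statements issued while `work` runs (hypothetical helper)."""
    statements: list[str] = []
    conn.set_trace_callback(statements.append)
    try:
        work()
    finally:
        conn.set_trace_callback(None)
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


# Hypothetical fixture: two parents (10 and 11) with three child rows in total.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)"
)
conn.executemany(
    "INSERT INTO child (parent_id, label) VALUES (?, ?)",
    [(10, "wall"), (10, "roof"), (11, "floor")],
)
conn.commit()

# Same measurement shape as the patch's test: a 1-id batch vs a 2-id batch.
one = count_selects(conn, lambda: load_children_by_parent(conn, [10]))
two = count_selects(conn, lambda: load_children_by_parent(conn, [10, 11]))
```

Comparing `one` and `two` mirrors the assertion style of the round-trip test in this patch: the check is that the count stays the same as the batch grows, rather than pinning an absolute number that would change whenever a child table is added.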
diff --git a/tests/orchestration/fakes.py b/tests/orchestration/fakes.py index 5891434a..24138520 100644 --- a/tests/orchestration/fakes.py +++ b/tests/orchestration/fakes.py @@ -11,6 +11,7 @@ from typing import Any, Optional from datatypes.epc.domain.epc_property_data import EpcPropertyData from domain.baseline.baseline_performance import BaselinePerformance +from domain.property.properties import Properties from domain.property.property import Property from repositories.baseline.baseline_repository import BaselineRepository from repositories.epc.epc_repository import EpcRepository @@ -26,6 +27,9 @@ class FakePropertyRepo(PropertyRepository): def get(self, property_id: int) -> Property: return self._by_id[property_id] + def get_many(self, property_ids: list[int]) -> Properties: + return Properties([self._by_id[property_id] for property_id in property_ids]) + class FakeEpcRepo(EpcRepository): def __init__(self, by_property: Optional[dict[int, EpcPropertyData]] = None) -> None: @@ -49,6 +53,15 @@ class FakeEpcRepo(EpcRepository): def get_for_property(self, property_id: int) -> Optional[EpcPropertyData]: return self._by_property.get(property_id) + def get_for_properties( + self, property_ids: list[int] + ) -> dict[int, EpcPropertyData]: + return { + property_id: self._by_property[property_id] + for property_id in property_ids + if property_id in self._by_property + } + class FakeSolarRepo(SolarRepository): def __init__(self) -> None: diff --git a/tests/repositories/epc/test_epc_bulk_read.py b/tests/repositories/epc/test_epc_bulk_read.py new file mode 100644 index 00000000..8601bcf4 --- /dev/null +++ b/tests/repositories/epc/test_epc_bulk_read.py @@ -0,0 +1,81 @@ +"""Bulk EPC read: get_for_properties hydrates a batch in a handful of per-table +queries, not N x per-property (ADR-0012, #1138).""" + +from __future__ import annotations + +import json +from collections.abc import Callable +from pathlib import Path +from typing import Any + +from sqlalchemy import Engine, 
event +from sqlmodel import Session + +from datatypes.epc.domain.epc_property_data import EpcPropertyData +from datatypes.epc.domain.mapper import EpcPropertyDataMapper +from repositories.epc.epc_postgres_repository import EpcPostgresRepository + +_JSON_SAMPLES = Path(__file__).resolve().parents[3] / "backend/epc_api/json_samples" + + +def _load_epc() -> EpcPropertyData: + raw: dict[str, Any] = json.loads( + (_JSON_SAMPLES / "RdSAP-Schema-21.0.0" / "epc.json").read_text() + ) + return EpcPropertyDataMapper.from_api_response(raw) + + +def _count_queries(engine: Engine, work: Callable[[], None]) -> int: + count = 0 + + def _before(*_args: Any, **_kwargs: Any) -> None: + nonlocal count + count += 1 + + event.listen(engine, "before_cursor_execute", _before) + try: + work() + finally: + event.remove(engine, "before_cursor_execute", _before) + return count + + +def test_get_for_properties_hydrates_the_whole_batch(db_engine: Engine) -> None: + # Arrange — the same sample EPC persisted for two properties. + epc = _load_epc() + with Session(db_engine) as session: + repo = EpcPostgresRepository(session) + repo.save(epc, property_id=10) + repo.save(epc, property_id=11) + session.commit() + + # Act + with Session(db_engine) as session: + result = EpcPostgresRepository(session).get_for_properties([10, 11]) + + # Assert — both fully hydrated (load-whole, ADR-0002). + assert result == {10: epc, 11: epc} + + +def test_get_for_properties_round_trips_do_not_scale_with_batch_size( + db_engine: Engine, +) -> None: + # Arrange + epc = _load_epc() + with Session(db_engine) as session: + repo = EpcPostgresRepository(session) + repo.save(epc, property_id=10) + repo.save(epc, property_id=11) + session.commit() + + def _read(property_ids: list[int]) -> None: + with Session(db_engine) as session: + EpcPostgresRepository(session).get_for_properties(property_ids) + + # Act — count queries for a 1-property batch vs a 2-property batch. 
+ one = _count_queries(db_engine, lambda: _read([10])) + two = _count_queries(db_engine, lambda: _read([10, 11])) + + # Assert — same number of round-trips regardless of batch size (one query + # per table, not per property). + assert one == two From c3691d9af2c0ef97d3b268eae1fcea0d3811f753 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Mon, 1 Jun 2026 14:54:59 +0000 Subject: [PATCH 15/18] =?UTF-8?q?refactor(property-baseline):=20rename=20b?= =?UTF-8?q?aseline=20=E2=86=92=20property=5Fbaseline=20aggregate=20(PR=20#?= =?UTF-8?q?1139=20review)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wholesale rename of the Baseline aggregate to PropertyBaseline for clarity / to disambiguate from baselines that appear elsewhere in Modelling. Scoped to this aggregate only — the distinct Rebaselining term (rebaseline_reason, StubRebaseliner, RebaselineNotImplemented) is deliberately untouched. - domain/baseline → domain/property_baseline; BaselinePerformance → PropertyBaselinePerformance. - repositories/baseline → repositories/property_baseline; BaselineRepository / BaselinePostgresRepository → PropertyBaseline*. - orchestration/baseline_orchestrator.py → property_baseline_orchestrator.py; BaselineOrchestrator → PropertyBaselineOrchestrator. BaselineStage → PropertyBaselineStage. - infrastructure/postgres: baseline_performance_table.py → property_baseline_performance_table.py; table `baseline_performance` → `property_baseline_performance`; Model renamed. - UnitOfWork attribute `.baseline` → `.property_baseline`. - Docs: ADR-0004 references + migration doc (renamed to property-baseline-performance-table.md) updated. CONTEXT.md glossary term ("Baseline Performance") left as-is pending a ubiquitous-language call (raised on the PR). 123 tests pass; pyright strict clean (only the unrelated pre-existing moto import errors remain). 
Co-Authored-By: Claude Opus 4.8 --- applications/ara_first_run/handler.py | 6 +-- ...eline-performance-lodged-effective-pair.md | 14 +++--- ...=> property-baseline-performance-table.md} | 8 ++-- .../__init__.py | 0 .../performance.py | 0 .../property_baseline_performance.py} | 6 +-- .../rebaseliner.py | 4 +- ...=> property_baseline_performance_table.py} | 22 +++++----- orchestration/first_run_pipeline.py | 4 +- ...r.py => property_baseline_orchestrator.py} | 12 +++--- .../baseline/baseline_postgres_repository.py | 43 ------------------- repositories/postgres_unit_of_work.py | 6 +-- .../__init__.py | 0 .../property_baseline_postgres_repository.py | 43 +++++++++++++++++++ .../property_baseline_repository.py} | 10 ++--- repositories/unit_of_work.py | 4 +- .../__init__.py | 0 .../test_performance.py | 2 +- .../test_rebaseliner.py | 4 +- tests/orchestration/fakes.py | 16 +++---- .../test_first_run_pipeline_integration.py | 20 ++++----- ...=> test_property_baseline_orchestrator.py} | 28 ++++++------ .../__init__.py | 0 ..._property_baseline_postgres_repository.py} | 24 +++++------ tests/repositories/test_unit_of_work.py | 20 ++++----- 25 files changed, 148 insertions(+), 148 deletions(-) rename docs/migrations/{baseline-performance-table.md => property-baseline-performance-table.md} (87%) rename domain/{baseline => property_baseline}/__init__.py (100%) rename domain/{baseline => property_baseline}/performance.py (100%) rename domain/{baseline/baseline_performance.py => property_baseline/property_baseline_performance.py} (84%) rename domain/{baseline => property_baseline}/rebaseliner.py (94%) rename infrastructure/postgres/{baseline_performance_table.py => property_baseline_performance_table.py} (75%) rename orchestration/{baseline_orchestrator.py => property_baseline_orchestrator.py} (86%) delete mode 100644 repositories/baseline/baseline_postgres_repository.py rename repositories/{baseline => property_baseline}/__init__.py (100%) create mode 100644 
repositories/property_baseline/property_baseline_postgres_repository.py rename repositories/{baseline/baseline_repository.py => property_baseline/property_baseline_repository.py} (52%) rename tests/domain/{baseline => property_baseline}/__init__.py (100%) rename tests/domain/{baseline => property_baseline}/test_performance.py (92%) rename tests/domain/{baseline => property_baseline}/test_rebaseliner.py (90%) rename tests/orchestration/{test_baseline_orchestrator.py => test_property_baseline_orchestrator.py} (74%) rename tests/repositories/{baseline => property_baseline}/__init__.py (100%) rename tests/repositories/{baseline/test_baseline_postgres_repository.py => property_baseline/test_property_baseline_postgres_repository.py} (73%) diff --git a/applications/ara_first_run/handler.py b/applications/ara_first_run/handler.py index f9cb6be7..147bf066 100644 --- a/applications/ara_first_run/handler.py +++ b/applications/ara_first_run/handler.py @@ -10,10 +10,10 @@ from sqlmodel import Session from applications.ara_first_run.ara_first_run_trigger_body import ( AraFirstRunTriggerBody, ) -from domain.baseline.rebaseliner import StubRebaseliner +from domain.property_baseline.rebaseliner import StubRebaseliner from infrastructure.postgres.config import PostgresConfig from infrastructure.postgres.engine import make_engine -from orchestration.baseline_orchestrator import BaselineOrchestrator +from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator from orchestration.first_run_pipeline import FirstRunPipeline from orchestration.ingestion_orchestrator import ( EpcFetcher, @@ -78,7 +78,7 @@ def build_first_run_pipeline( geospatial_repo=geospatial_repo, solar_fetcher=solar_fetcher, ), - baseline=BaselineOrchestrator( + baseline=PropertyBaselineOrchestrator( unit_of_work=unit_of_work, rebaseliner=StubRebaseliner(), ), diff --git a/docs/adr/0004-baseline-performance-lodged-effective-pair.md b/docs/adr/0004-baseline-performance-lodged-effective-pair.md 
index ba275473..fc27be7d 100644 --- a/docs/adr/0004-baseline-performance-lodged-effective-pair.md +++ b/docs/adr/0004-baseline-performance-lodged-effective-pair.md @@ -1,27 +1,27 @@ -# `BaselinePerformance` stores both lodged and effective values +# `PropertyBaselinePerformance` stores both lodged and effective values -A Property's current performance has two states we care about: the rating that was lodged on the government register (the "lodged" SAP / band / carbon / heat) and the rating produced by the modelling pipeline against the current Effective EPC (the "effective" values, which may have been rebaselined by ML when the EPC was pre-SAP10 or when Landlord Overrides / Site Notes changed physical state). We considered storing a single set of values — the rebaselined-if-needed-otherwise-lodged figures — and rejected that. Both are stored as a pair on every `BaselinePerformance`, equal when no rebaselining trigger fires. +A Property's current performance has two states we care about: the rating that was lodged on the government register (the "lodged" SAP / band / carbon / heat) and the rating produced by the modelling pipeline against the current Effective EPC (the "effective" values, which may have been rebaselined by ML when the EPC was pre-SAP10 or when Landlord Overrides / Site Notes changed physical state). We considered storing a single set of values — the rebaselined-if-needed-otherwise-lodged figures — and rejected that. Both are stored as a pair on every `PropertyBaselinePerformance`, equal when no rebaselining trigger fires. The pair lets the FE show "this is what the gov register says vs this is the SAP10-equivalent we modelled against" side by side without a second query, and keeps the audit trail clean: a user looking at a property's plan can see exactly which figure drove the recommendation pipeline. 
Storing only one set forces a downstream consumer to recompute the missing one from raw EPC fields when it needs both, which is the kind of derivation creep we want to keep out of the FE. -The cost is a wider row + the discipline that **every** `BaselinePerformance` populates both halves, even when they're equal. Annual kWh, fuel split and bills are not paired — they are always derived deterministically by `EpcEnergyDerivationService` against the Effective state, because the EPC's recorded cost fields use fuel rates pinned to the inspection date and the UCL correction depends on the modelled band. +The cost is a wider row + the discipline that **every** `PropertyBaselinePerformance` populates both halves, even when they're equal. Annual kWh, fuel split and bills are not paired — they are always derived deterministically by `EpcEnergyDerivationService` against the Effective state, because the EPC's recorded cost fields use fuel rates pinned to the inspection date and the UCL correction depends on the modelled band. ## Consequences - Reversing this means rewriting every consumer that has learned to read both values. Hard to roll back once the FE depends on the pair. - The rebaseline trigger has two reasons (`pre_sap10`, `physical_state_changed`, or `both`) — store the reason alongside so we know *why* a property was rebaselined when debugging. -### Amendment (2026-05-30, #1135): standalone `baseline_performance` table +### Amendment (2026-05-30, #1135): standalone `property_baseline_performance` table The original consequence read *"`property_details_epc` (or its successor) carries 8 fields instead of 4 for the SAP-equivalent block"* — i.e. the pair as columns on the EPC-details table. That is superseded. `property_details_epc` is being **retired**: it is too tightly coupled to the schema of the legacy EPC API, which the Ara rebuild is moving off. So the pair has no home there. 
-`BaselinePerformance` instead persists as its **own standalone `baseline_performance` table, one -row per Property**, behind a dedicated `BaselineRepository` port (`save` / `get_for_property`), +`PropertyBaselinePerformance` instead persists as its **own standalone `property_baseline_performance` table, one +row per Property**, behind a dedicated `PropertyBaselineRepository` port (`save` / `get_for_property`), mirroring the EPC slice's repo shape. This is the cleaner model regardless of the retirement: -`BaselinePerformance` is its own aggregate (a Property's current performance), not a detail of any +`PropertyBaselinePerformance` is its own aggregate (a Property's current performance), not a detail of any single EPC. The row is **flat typed columns**, not a JSONB blob, because the FE both surfaces the block and diff --git a/docs/migrations/baseline-performance-table.md b/docs/migrations/property-baseline-performance-table.md similarity index 87% rename from docs/migrations/baseline-performance-table.md rename to docs/migrations/property-baseline-performance-table.md index 24e06179..66864eb9 100644 --- a/docs/migrations/baseline-performance-table.md +++ b/docs/migrations/property-baseline-performance-table.md @@ -1,8 +1,8 @@ -# `baseline_performance` table — FE-owned migration +# `property_baseline_performance` table — FE-owned migration **Context:** Slice 6 (Hestia-Homes/Model#1135) of the `ara_first_run` rebuild. The -`BaselineOrchestrator` establishes a Property's **Baseline Performance** (ADR-0004) and persists it -via a new `BaselineRepository` port. This is a brand-new table — no predecessor. +`PropertyBaselineOrchestrator` establishes a Property's **Baseline Performance** (ADR-0004) and persists it +via a new `PropertyBaselineRepository` port. This is a brand-new table — no predecessor. Per ADR-0004's amendment, the lodged/effective pair does **not** land on `property_details_epc` (which is being retired as too coupled to the legacy EPC-API schema). 
It lands here, as its own @@ -12,7 +12,7 @@ The SQLModel row is defined in `infrastructure/postgres/` so the ephemeral-Postg via `SQLModel.metadata.create_all`. The **production migration is FE-owned (Drizzle ORM)** — a straight lift-and-shift of the columns below. -## `baseline_performance` — one row per Property +## `property_baseline_performance` — one row per Property | Column | Type | Notes | |---|---|---| diff --git a/domain/baseline/__init__.py b/domain/property_baseline/__init__.py similarity index 100% rename from domain/baseline/__init__.py rename to domain/property_baseline/__init__.py diff --git a/domain/baseline/performance.py b/domain/property_baseline/performance.py similarity index 100% rename from domain/baseline/performance.py rename to domain/property_baseline/performance.py diff --git a/domain/baseline/baseline_performance.py b/domain/property_baseline/property_baseline_performance.py similarity index 84% rename from domain/baseline/baseline_performance.py rename to domain/property_baseline/property_baseline_performance.py index 8db6e05d..8da9bbf2 100644 --- a/domain/baseline/baseline_performance.py +++ b/domain/property_baseline/property_baseline_performance.py @@ -2,12 +2,12 @@ from __future__ import annotations from dataclasses import dataclass -from domain.baseline.performance import Performance -from domain.baseline.rebaseliner import RebaselineReason +from domain.property_baseline.performance import Performance +from domain.property_baseline.rebaseliner import RebaselineReason @dataclass(frozen=True) -class BaselinePerformance: +class PropertyBaselinePerformance: """A Property's current performance aggregate (CONTEXT.md, ADR-0004). 
Holds both halves — ``lodged`` (what the gov register says) and diff --git a/domain/baseline/rebaseliner.py b/domain/property_baseline/rebaseliner.py similarity index 94% rename from domain/baseline/rebaseliner.py rename to domain/property_baseline/rebaseliner.py index 40034a58..a80552ea 100644 --- a/domain/baseline/rebaseliner.py +++ b/domain/property_baseline/rebaseliner.py @@ -4,7 +4,7 @@ from abc import ABC, abstractmethod from typing import Literal from datatypes.epc.domain.epc_property_data import EpcPropertyData -from domain.baseline.performance import Performance +from domain.property_baseline.performance import Performance RebaselineReason = Literal["none", "pre_sap10", "physical_state_changed", "both"] @@ -29,7 +29,7 @@ class Rebaseliner(ABC): Rebaselining (CONTEXT.md) re-predicts the rated quantities via ML when the EPC was lodged pre-SAP10 or its physical state diverged from the lodged EPC; otherwise Effective Performance equals Lodged. Injected into the - BaselineOrchestrator (ADR-0011) so the ML adapter can swap in without + PropertyBaselineOrchestrator (ADR-0011) so the ML adapter can swap in without touching the orchestrator, and so the single-property re-score-on-override flow reuses the same port. 
""" diff --git a/infrastructure/postgres/baseline_performance_table.py b/infrastructure/postgres/property_baseline_performance_table.py similarity index 75% rename from infrastructure/postgres/baseline_performance_table.py rename to infrastructure/postgres/property_baseline_performance_table.py index fad4be9d..f43d9f3e 100644 --- a/infrastructure/postgres/baseline_performance_table.py +++ b/infrastructure/postgres/property_baseline_performance_table.py @@ -5,20 +5,20 @@ from typing import ClassVar, Optional, cast from sqlmodel import Field, SQLModel from datatypes.epc.domain.epc import Epc -from domain.baseline.baseline_performance import BaselinePerformance -from domain.baseline.performance import Performance -from domain.baseline.rebaseliner import RebaselineReason +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from domain.property_baseline.performance import Performance +from domain.property_baseline.rebaseliner import RebaselineReason -class BaselinePerformanceModel(SQLModel, table=True): - """The ``baseline_performance`` row — one per Property (ADR-0004). +class PropertyBaselinePerformanceModel(SQLModel, table=True): + """The ``property_baseline_performance`` row — one per Property (ADR-0004). Flat typed columns (not a JSONB blob) so the FE can both surface the block and query the lodged-vs-effective pair. The production migration is FE-owned - (Drizzle); see docs/migrations/baseline-performance-table.md. + (Drizzle); see docs/migrations/property-baseline-performance-table.md. 
""" - __tablename__: ClassVar[str] = "baseline_performance" # pyright: ignore[reportIncompatibleVariableOverride] + __tablename__: ClassVar[str] = "property_baseline_performance" # pyright: ignore[reportIncompatibleVariableOverride] id: Optional[int] = Field(default=None, primary_key=True) property_id: int = Field(unique=True, index=True) @@ -40,8 +40,8 @@ class BaselinePerformanceModel(SQLModel, table=True): @classmethod def from_domain( - cls, baseline: BaselinePerformance, property_id: int - ) -> "BaselinePerformanceModel": + cls, baseline: PropertyBaselinePerformance, property_id: int + ) -> "PropertyBaselinePerformanceModel": return cls( property_id=property_id, lodged_sap_score=baseline.lodged.sap_score, @@ -57,8 +57,8 @@ class BaselinePerformanceModel(SQLModel, table=True): water_heating_kwh=baseline.water_heating_kwh, ) - def to_domain(self) -> BaselinePerformance: - return BaselinePerformance( + def to_domain(self) -> PropertyBaselinePerformance: + return PropertyBaselinePerformance( lodged=Performance( sap_score=self.lodged_sap_score, epc_band=Epc(self.lodged_epc_band), diff --git a/orchestration/first_run_pipeline.py b/orchestration/first_run_pipeline.py index 3d642d9e..6d521a35 100644 --- a/orchestration/first_run_pipeline.py +++ b/orchestration/first_run_pipeline.py @@ -29,7 +29,7 @@ class IngestionStage(Protocol): def run(self, property_ids: list[int]) -> None: ... -class BaselineStage(Protocol): +class PropertyBaselineStage(Protocol): """Stage 2 — establishes each Property's Baseline Performance.""" def run(self, property_ids: list[int]) -> None: ... 
@@ -57,7 +57,7 @@ class FirstRunPipeline: self, *, ingestion: IngestionStage, - baseline: BaselineStage, + baseline: PropertyBaselineStage, modelling: ModellingStage, ) -> None: self._ingestion = ingestion diff --git a/orchestration/baseline_orchestrator.py b/orchestration/property_baseline_orchestrator.py similarity index 86% rename from orchestration/baseline_orchestrator.py rename to orchestration/property_baseline_orchestrator.py index 9a1138c8..df2bf579 100644 --- a/orchestration/baseline_orchestrator.py +++ b/orchestration/property_baseline_orchestrator.py @@ -6,13 +6,13 @@ from datatypes.epc.domain.epc_property_data import ( EpcPropertyData, RenewableHeatIncentive, ) -from domain.baseline.baseline_performance import BaselinePerformance -from domain.baseline.performance import lodged_performance -from domain.baseline.rebaseliner import Rebaseliner +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from domain.property_baseline.performance import lodged_performance +from domain.property_baseline.rebaseliner import Rebaseliner from repositories.unit_of_work import UnitOfWork -class BaselineOrchestrator: +class PropertyBaselineOrchestrator: """Stage 2: establish each Property's Baseline Performance and persist it. 
Runs the whole batch in **one** Unit of Work and commits once (ADR-0012): @@ -46,14 +46,14 @@ class BaselineOrchestrator: effective_epc, lodged ) rhi = _require_rhi(effective_epc) - baseline = BaselinePerformance( + baseline = PropertyBaselinePerformance( lodged=lodged, effective=effective, rebaseline_reason=reason, space_heating_kwh=rhi.space_heating_kwh, water_heating_kwh=rhi.water_heating_kwh, ) - uow.baseline.save(baseline, property_id) + uow.property_baseline.save(baseline, property_id) uow.commit() diff --git a/repositories/baseline/baseline_postgres_repository.py b/repositories/baseline/baseline_postgres_repository.py deleted file mode 100644 index 7a5b5807..00000000 --- a/repositories/baseline/baseline_postgres_repository.py +++ /dev/null @@ -1,43 +0,0 @@ -from __future__ import annotations - -from typing import Optional - -from sqlmodel import Session, col, delete, select - -from domain.baseline.baseline_performance import BaselinePerformance -from infrastructure.postgres.baseline_performance_table import ( - BaselinePerformanceModel, -) -from repositories.baseline.baseline_repository import BaselineRepository - - -class BaselinePostgresRepository(BaselineRepository): - """Maps BaselinePerformance to/from the ``baseline_performance`` table.""" - - def __init__(self, session: Session) -> None: - self._session = session - - def save(self, baseline: BaselinePerformance, property_id: int) -> int: - # Idempotent on property_id: a re-run (or re-score) replaces the row - # rather than hitting the unique constraint (ADR-0012). 
- self._session.exec( # type: ignore[call-overload] - delete(BaselinePerformanceModel).where( - col(BaselinePerformanceModel.property_id) == property_id - ) - ) - row = BaselinePerformanceModel.from_domain(baseline, property_id) - self._session.add(row) - self._session.flush() - if row.id is None: - raise ValueError("baseline_performance row did not receive an id") - return row.id - - def get_for_property( - self, property_id: int - ) -> Optional[BaselinePerformance]: - row = self._session.exec( - select(BaselinePerformanceModel).where( - BaselinePerformanceModel.property_id == property_id - ) - ).first() - return row.to_domain() if row is not None else None diff --git a/repositories/postgres_unit_of_work.py b/repositories/postgres_unit_of_work.py index bd5957e9..da91604b 100644 --- a/repositories/postgres_unit_of_work.py +++ b/repositories/postgres_unit_of_work.py @@ -6,8 +6,8 @@ from typing import Optional from sqlmodel import Session -from repositories.baseline.baseline_postgres_repository import ( - BaselinePostgresRepository, +from repositories.property_baseline.property_baseline_postgres_repository import ( + PropertyBaselinePostgresRepository, ) from repositories.epc.epc_postgres_repository import EpcPostgresRepository from repositories.property.property_postgres_repository import ( @@ -35,7 +35,7 @@ class PostgresUnitOfWork(UnitOfWork): self.property = PropertyPostgresRepository(self._session, epc_repo) self.epc = epc_repo self.solar = SolarPostgresRepository(self._session) - self.baseline = BaselinePostgresRepository(self._session) + self.property_baseline = PropertyBaselinePostgresRepository(self._session) return self def __exit__( diff --git a/repositories/baseline/__init__.py b/repositories/property_baseline/__init__.py similarity index 100% rename from repositories/baseline/__init__.py rename to repositories/property_baseline/__init__.py diff --git a/repositories/property_baseline/property_baseline_postgres_repository.py 
b/repositories/property_baseline/property_baseline_postgres_repository.py new file mode 100644 index 00000000..113614d9 --- /dev/null +++ b/repositories/property_baseline/property_baseline_postgres_repository.py @@ -0,0 +1,43 @@ +from __future__ import annotations + +from typing import Optional + +from sqlmodel import Session, col, delete, select + +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from infrastructure.postgres.property_baseline_performance_table import ( + PropertyBaselinePerformanceModel, +) +from repositories.property_baseline.property_baseline_repository import PropertyBaselineRepository + + +class PropertyBaselinePostgresRepository(PropertyBaselineRepository): + """Maps PropertyBaselinePerformance to/from the ``property_baseline_performance`` table.""" + + def __init__(self, session: Session) -> None: + self._session = session + + def save(self, baseline: PropertyBaselinePerformance, property_id: int) -> int: + # Idempotent on property_id: a re-run (or re-score) replaces the row + # rather than hitting the unique constraint (ADR-0012). 
+ self._session.exec( # type: ignore[call-overload] + delete(PropertyBaselinePerformanceModel).where( + col(PropertyBaselinePerformanceModel.property_id) == property_id + ) + ) + row = PropertyBaselinePerformanceModel.from_domain(baseline, property_id) + self._session.add(row) + self._session.flush() + if row.id is None: + raise ValueError("property_baseline_performance row did not receive an id") + return row.id + + def get_for_property( + self, property_id: int + ) -> Optional[PropertyBaselinePerformance]: + row = self._session.exec( + select(PropertyBaselinePerformanceModel).where( + PropertyBaselinePerformanceModel.property_id == property_id + ) + ).first() + return row.to_domain() if row is not None else None diff --git a/repositories/baseline/baseline_repository.py b/repositories/property_baseline/property_baseline_repository.py similarity index 52% rename from repositories/baseline/baseline_repository.py rename to repositories/property_baseline/property_baseline_repository.py index 67e430f5..c237f56a 100644 --- a/repositories/baseline/baseline_repository.py +++ b/repositories/property_baseline/property_baseline_repository.py @@ -3,21 +3,21 @@ from __future__ import annotations from abc import ABC, abstractmethod from typing import Optional -from domain.baseline.baseline_performance import BaselinePerformance +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance -class BaselineRepository(ABC): +class PropertyBaselineRepository(ABC): """Persists and loads a Property's Baseline Performance. One Baseline Performance per Property (ADR-0004: persisted as one row). The - Postgres adapter writes the standalone ``baseline_performance`` table — not + Postgres adapter writes the standalone ``property_baseline_performance`` table — not columns on the retiring ``property_details_epc``. """ @abstractmethod - def save(self, baseline: BaselinePerformance, property_id: int) -> int: ... 
+ def save(self, baseline: PropertyBaselinePerformance, property_id: int) -> int: ... @abstractmethod def get_for_property( self, property_id: int - ) -> Optional[BaselinePerformance]: ... + ) -> Optional[PropertyBaselinePerformance]: ... diff --git a/repositories/unit_of_work.py b/repositories/unit_of_work.py index af5b77f2..cb1cc1d8 100644 --- a/repositories/unit_of_work.py +++ b/repositories/unit_of_work.py @@ -4,7 +4,7 @@ from abc import ABC, abstractmethod from types import TracebackType from typing import Optional -from repositories.baseline.baseline_repository import BaselineRepository +from repositories.property_baseline.property_baseline_repository import PropertyBaselineRepository from repositories.epc.epc_repository import EpcRepository from repositories.property.property_repository import PropertyRepository from repositories.solar.solar_repository import SolarRepository @@ -25,7 +25,7 @@ class UnitOfWork(ABC): property: PropertyRepository epc: EpcRepository solar: SolarRepository - baseline: BaselineRepository + property_baseline: PropertyBaselineRepository @abstractmethod def commit(self) -> None: ... 
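Reviewer note on the renamed port: the sketch below is a minimal in-memory illustration of the contract this rename carries through, namely that `save` is idempotent on `property_id` and that nothing written inside a Unit of Work is visible until `commit()`. It is not the project's code. `SketchBaseline`, `InMemoryPropertyBaselineRepo` and `InMemoryUnitOfWork` are hypothetical names invented for the example, the aggregate is cut down to two fields, and `save` returns nothing here, whereas the real port returns the row id.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import TracebackType
from typing import Optional


@dataclass(frozen=True)
class SketchBaseline:
    """Hypothetical stand-in for PropertyBaselinePerformance (two fields only)."""

    lodged_sap_score: int
    effective_sap_score: int


class InMemoryPropertyBaselineRepo:
    """Hypothetical in-memory analogue of the save / get_for_property port."""

    def __init__(self, rows: dict[int, SketchBaseline]) -> None:
        self._rows = rows

    def save(self, baseline: SketchBaseline, property_id: int) -> None:
        # Keyed on property_id, so a re-save replaces instead of duplicating.
        self._rows[property_id] = baseline

    def get_for_property(self, property_id: int) -> Optional[SketchBaseline]:
        return self._rows.get(property_id)


class InMemoryUnitOfWork:
    """Stages writes on a copy; only commit() publishes them to the store."""

    def __init__(self, store: dict[int, SketchBaseline]) -> None:
        self._store = store
        self._staged: dict[int, SketchBaseline] = {}

    def __enter__(self) -> "InMemoryUnitOfWork":
        self._staged = dict(self._store)
        self.property_baseline = InMemoryPropertyBaselineRepo(self._staged)
        return self

    def __exit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Nothing to undo: staged writes that were never committed are dropped.
        self._staged = {}

    def commit(self) -> None:
        self._store.clear()
        self._store.update(self._staged)


store: dict[int, SketchBaseline] = {}

# Saving twice for the same property and committing once leaves one row.
with InMemoryUnitOfWork(store) as uow:
    uow.property_baseline.save(SketchBaseline(72, 72), property_id=10)
    uow.property_baseline.save(SketchBaseline(80, 80), property_id=10)
    uow.commit()

# Leaving the block without commit() persists nothing for property 11.
with InMemoryUnitOfWork(store) as uow:
    uow.property_baseline.save(SketchBaseline(1, 1), property_id=11)
```

The Postgres adapter in this patch gets the same replace-on-save behaviour by deleting any existing row for the `property_id` before inserting; the dictionary keyed on `property_id` is only a stand-in for that.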
diff --git a/tests/domain/baseline/__init__.py b/tests/domain/property_baseline/__init__.py similarity index 100% rename from tests/domain/baseline/__init__.py rename to tests/domain/property_baseline/__init__.py diff --git a/tests/domain/baseline/test_performance.py b/tests/domain/property_baseline/test_performance.py similarity index 92% rename from tests/domain/baseline/test_performance.py rename to tests/domain/property_baseline/test_performance.py index 6e8f080e..9d7011cb 100644 --- a/tests/domain/baseline/test_performance.py +++ b/tests/domain/property_baseline/test_performance.py @@ -2,7 +2,7 @@ from __future__ import annotations from datatypes.epc.domain.epc import Epc from datatypes.epc.domain.epc_property_data import EpcPropertyData -from domain.baseline.performance import Performance, lodged_performance +from domain.property_baseline.performance import Performance, lodged_performance def _epc_with_recorded_performance( diff --git a/tests/domain/baseline/test_rebaseliner.py b/tests/domain/property_baseline/test_rebaseliner.py similarity index 90% rename from tests/domain/baseline/test_rebaseliner.py rename to tests/domain/property_baseline/test_rebaseliner.py index f4ceee70..8f669aed 100644 --- a/tests/domain/baseline/test_rebaseliner.py +++ b/tests/domain/property_baseline/test_rebaseliner.py @@ -6,8 +6,8 @@ import pytest from datatypes.epc.domain.epc import Epc from datatypes.epc.domain.epc_property_data import EpcPropertyData -from domain.baseline.performance import Performance -from domain.baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner +from domain.property_baseline.performance import Performance +from domain.property_baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner def _epc(*, sap_version: Optional[float]) -> EpcPropertyData: diff --git a/tests/orchestration/fakes.py b/tests/orchestration/fakes.py index 24138520..3e2feef0 100644 --- a/tests/orchestration/fakes.py +++ b/tests/orchestration/fakes.py @@ -10,10 
+10,10 @@ from types import TracebackType from typing import Any, Optional from datatypes.epc.domain.epc_property_data import EpcPropertyData -from domain.baseline.baseline_performance import BaselinePerformance +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance from domain.property.properties import Properties from domain.property.property import Property -from repositories.baseline.baseline_repository import BaselineRepository +from repositories.property_baseline.property_baseline_repository import PropertyBaselineRepository from repositories.epc.epc_repository import EpcRepository from repositories.property.property_repository import PropertyRepository from repositories.solar.solar_repository import SolarRepository @@ -74,17 +74,17 @@ class FakeSolarRepo(SolarRepository): raise NotImplementedError -class FakeBaselineRepo(BaselineRepository): +class FakePropertyBaselineRepo(PropertyBaselineRepository): def __init__(self) -> None: - self.saved: list[tuple[BaselinePerformance, int]] = [] + self.saved: list[tuple[PropertyBaselinePerformance, int]] = [] - def save(self, baseline: BaselinePerformance, property_id: int) -> int: + def save(self, baseline: PropertyBaselinePerformance, property_id: int) -> int: self.saved.append((baseline, property_id)) return len(self.saved) def get_for_property( self, property_id: int - ) -> Optional[BaselinePerformance]: # pragma: no cover + ) -> Optional[PropertyBaselinePerformance]: # pragma: no cover raise NotImplementedError @@ -97,12 +97,12 @@ class FakeUnitOfWork(UnitOfWork): property: FakePropertyRepo, epc: Optional[FakeEpcRepo] = None, solar: Optional[FakeSolarRepo] = None, - baseline: Optional[FakeBaselineRepo] = None, + property_baseline: Optional[FakePropertyBaselineRepo] = None, ) -> None: self.property = property self.epc = epc or FakeEpcRepo() self.solar = solar or FakeSolarRepo() - self.baseline = baseline or FakeBaselineRepo() + self.property_baseline = property_baseline or 
FakePropertyBaselineRepo() self.commits = 0 def __enter__(self) -> "FakeUnitOfWork": diff --git a/tests/orchestration/test_first_run_pipeline_integration.py b/tests/orchestration/test_first_run_pipeline_integration.py index d96351c7..781dcf87 100644 --- a/tests/orchestration/test_first_run_pipeline_integration.py +++ b/tests/orchestration/test_first_run_pipeline_integration.py @@ -18,19 +18,19 @@ from sqlmodel import Session, select from datatypes.epc.domain.epc import Epc from datatypes.epc.domain.epc_property_data import EpcPropertyData from datatypes.epc.domain.mapper import EpcPropertyDataMapper -from domain.baseline.rebaseliner import StubRebaseliner +from domain.property_baseline.rebaseliner import StubRebaseliner from domain.geospatial.coordinates import Coordinates -from infrastructure.postgres.baseline_performance_table import ( - BaselinePerformanceModel, +from infrastructure.postgres.property_baseline_performance_table import ( + PropertyBaselinePerformanceModel, ) from infrastructure.postgres.epc_property_table import EpcPropertyModel from infrastructure.postgres.property_table import PropertyRow -from orchestration.baseline_orchestrator import BaselineOrchestrator +from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator from orchestration.first_run_pipeline import FirstRunPipeline from orchestration.ingestion_orchestrator import IngestionOrchestrator from orchestration.modelling_orchestrator import ModellingOrchestrator -from repositories.baseline.baseline_postgres_repository import ( - BaselinePostgresRepository, +from repositories.property_baseline.property_baseline_postgres_repository import ( + PropertyBaselinePostgresRepository, ) from repositories.geospatial.geospatial_repository import GeospatialRepository from repositories.materials.materials_repository import MaterialsRepository @@ -110,7 +110,7 @@ def test_first_run_baselines_through_repos_and_is_idempotent_on_rerun( geospatial_repo=_NoCoordinates(), 
solar_fetcher=_UnusedSolarFetcher(), ), - baseline=BaselineOrchestrator( + baseline=PropertyBaselineOrchestrator( unit_of_work=unit_of_work, rebaseliner=StubRebaseliner() ), modelling=ModellingOrchestrator( @@ -128,13 +128,13 @@ def test_first_run_baselines_through_repos_and_is_idempotent_on_rerun( # property_ids crossed the stage boundary), and the re-run replaced rather # than duplicated either row. with Session(db_engine) as session: - baseline = BaselinePostgresRepository(session).get_for_property(10) + baseline = PropertyBaselinePostgresRepository(session).get_for_property(10) epc_rows = session.exec( select(EpcPropertyModel).where(EpcPropertyModel.property_id == 10) ).all() baseline_rows = session.exec( - select(BaselinePerformanceModel).where( - BaselinePerformanceModel.property_id == 10 + select(PropertyBaselinePerformanceModel).where( + PropertyBaselinePerformanceModel.property_id == 10 ) ).all() diff --git a/tests/orchestration/test_baseline_orchestrator.py b/tests/orchestration/test_property_baseline_orchestrator.py similarity index 74% rename from tests/orchestration/test_baseline_orchestrator.py rename to tests/orchestration/test_property_baseline_orchestrator.py index a18628ec..cb67d176 100644 --- a/tests/orchestration/test_baseline_orchestrator.py +++ b/tests/orchestration/test_property_baseline_orchestrator.py @@ -7,13 +7,13 @@ from datatypes.epc.domain.epc_property_data import ( EpcPropertyData, RenewableHeatIncentive, ) -from domain.baseline.baseline_performance import BaselinePerformance -from domain.baseline.performance import Performance -from domain.baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from domain.property_baseline.performance import Performance +from domain.property_baseline.rebaseliner import RebaselineNotImplemented, StubRebaseliner from domain.property.property import Property, PropertyIdentity -from 
orchestration.baseline_orchestrator import BaselineOrchestrator +from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator from tests.orchestration.fakes import ( - FakeBaselineRepo, + FakePropertyBaselineRepo, FakePropertyRepo, FakeUnitOfWork, ) @@ -39,12 +39,12 @@ def _property(*, sap_version: float) -> Property: def test_run_establishes_persists_and_commits_the_batch_once() -> None: # Arrange - baseline_repo = FakeBaselineRepo() + property_baseline_repo = FakePropertyBaselineRepo() uow = FakeUnitOfWork( property=FakePropertyRepo({10: _property(sap_version=10.2)}), - baseline=baseline_repo, + property_baseline=property_baseline_repo, ) - orchestrator = BaselineOrchestrator( + orchestrator = PropertyBaselineOrchestrator( unit_of_work=lambda: uow, rebaseliner=StubRebaseliner() ) @@ -56,9 +56,9 @@ def test_run_establishes_persists_and_commits_the_batch_once() -> None: lodged = Performance( sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180 ) - assert baseline_repo.saved == [ + assert property_baseline_repo.saved == [ ( - BaselinePerformance( + PropertyBaselinePerformance( lodged=lodged, effective=lodged, rebaseline_reason="none", @@ -73,12 +73,12 @@ def test_run_establishes_persists_and_commits_the_batch_once() -> None: def test_run_raises_on_a_pre_sap10_property_and_does_not_commit() -> None: # Arrange — a pre-SAP10 cert needs ML rebaselining, which is not wired yet. - baseline_repo = FakeBaselineRepo() + property_baseline_repo = FakePropertyBaselineRepo() uow = FakeUnitOfWork( property=FakePropertyRepo({10: _property(sap_version=9.94)}), - baseline=baseline_repo, + property_baseline=property_baseline_repo, ) - orchestrator = BaselineOrchestrator( + orchestrator = PropertyBaselineOrchestrator( unit_of_work=lambda: uow, rebaseliner=StubRebaseliner() ) @@ -86,5 +86,5 @@ def test_run_raises_on_a_pre_sap10_property_and_does_not_commit() -> None: # committed (all-or-nothing). 
with pytest.raises(RebaselineNotImplemented): orchestrator.run([10]) - assert baseline_repo.saved == [] + assert property_baseline_repo.saved == [] assert uow.commits == 0 diff --git a/tests/repositories/baseline/__init__.py b/tests/repositories/property_baseline/__init__.py similarity index 100% rename from tests/repositories/baseline/__init__.py rename to tests/repositories/property_baseline/__init__.py diff --git a/tests/repositories/baseline/test_baseline_postgres_repository.py b/tests/repositories/property_baseline/test_property_baseline_postgres_repository.py similarity index 73% rename from tests/repositories/baseline/test_baseline_postgres_repository.py rename to tests/repositories/property_baseline/test_property_baseline_postgres_repository.py index df1da9e8..6395d0f9 100644 --- a/tests/repositories/baseline/test_baseline_postgres_repository.py +++ b/tests/repositories/property_baseline/test_property_baseline_postgres_repository.py @@ -4,14 +4,14 @@ from sqlalchemy import Engine from sqlmodel import Session from datatypes.epc.domain.epc import Epc -from domain.baseline.baseline_performance import BaselinePerformance -from domain.baseline.performance import Performance -from repositories.baseline.baseline_postgres_repository import ( - BaselinePostgresRepository, +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from domain.property_baseline.performance import Performance +from repositories.property_baseline.property_baseline_postgres_repository import ( + PropertyBaselinePostgresRepository, ) -def _baseline() -> BaselinePerformance: +def _baseline() -> PropertyBaselinePerformance: lodged = Performance( sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180 ) @@ -20,7 +20,7 @@ def _baseline() -> BaselinePerformance: effective = Performance( sap_score=64, epc_band=Epc.D, co2_emissions=2.4, primary_energy_intensity=210 ) - return BaselinePerformance( + return PropertyBaselinePerformance( 
lodged=lodged, effective=effective, rebaseline_reason="pre_sap10", @@ -33,12 +33,12 @@ def test_baseline_performance_round_trips(db_engine: Engine) -> None: # Arrange baseline = _baseline() with Session(db_engine) as session: - BaselinePostgresRepository(session).save(baseline, property_id=10) + PropertyBaselinePostgresRepository(session).save(baseline, property_id=10) session.commit() # Act with Session(db_engine) as session: - loaded = BaselinePostgresRepository(session).get_for_property(10) + loaded = PropertyBaselinePostgresRepository(session).get_for_property(10) # Assert — the full aggregate reconstructs, both halves intact. assert loaded == baseline @@ -50,7 +50,7 @@ def test_resaving_baseline_for_a_property_replaces_rather_than_duplicating( # Arrange — a re-run re-establishes the same property's baseline with a # different rating. first = _baseline() - rerun = BaselinePerformance( + rerun = PropertyBaselinePerformance( lodged=Performance( sap_score=80, epc_band=Epc.B, @@ -71,21 +71,21 @@ def test_resaving_baseline_for_a_property_replaces_rather_than_duplicating( # Act — save twice for the same property_id (must not hit the unique # constraint, must overwrite). 
with Session(db_engine) as session: - repo = BaselinePostgresRepository(session) + repo = PropertyBaselinePostgresRepository(session) repo.save(first, property_id=10) repo.save(rerun, property_id=10) session.commit() # Assert with Session(db_engine) as session: - loaded = BaselinePostgresRepository(session).get_for_property(10) + loaded = PropertyBaselinePostgresRepository(session).get_for_property(10) assert loaded == rerun def test_get_for_property_returns_none_when_absent(db_engine: Engine) -> None: # Arrange / Act with Session(db_engine) as session: - loaded = BaselinePostgresRepository(session).get_for_property(999) + loaded = PropertyBaselinePostgresRepository(session).get_for_property(999) # Assert assert loaded is None diff --git a/tests/repositories/test_unit_of_work.py b/tests/repositories/test_unit_of_work.py index 2851edaf..03018562 100644 --- a/tests/repositories/test_unit_of_work.py +++ b/tests/repositories/test_unit_of_work.py @@ -7,8 +7,8 @@ from sqlalchemy import Engine from sqlmodel import Session from datatypes.epc.domain.epc import Epc -from domain.baseline.baseline_performance import BaselinePerformance -from domain.baseline.performance import Performance +from domain.property_baseline.property_baseline_performance import PropertyBaselinePerformance +from domain.property_baseline.performance import Performance from repositories.postgres_unit_of_work import PostgresUnitOfWork @@ -16,11 +16,11 @@ def _session_factory(db_engine: Engine) -> Callable[[], Session]: return lambda: Session(db_engine) -def _baseline() -> BaselinePerformance: +def _baseline() -> PropertyBaselinePerformance: perf = Performance( sap_score=72, epc_band=Epc.C, co2_emissions=1.8, primary_energy_intensity=180 ) - return BaselinePerformance( + return PropertyBaselinePerformance( lodged=perf, effective=perf, rebaseline_reason="none", @@ -36,12 +36,12 @@ def test_committed_work_is_visible_to_a_later_unit(db_engine: Engine) -> None: # Act with new_unit() as uow: - 
uow.baseline.save(baseline, property_id=10) + uow.property_baseline.save(baseline, property_id=10) uow.commit() # Assert — a fresh unit reads back what the first one committed. with new_unit() as uow: - loaded = uow.baseline.get_for_property(10) + loaded = uow.property_baseline.get_for_property(10) assert loaded == baseline @@ -52,12 +52,12 @@ def test_an_exception_in_the_block_rolls_the_batch_back(db_engine: Engine) -> No # Act — a property mid-batch raises after a write but before commit. with pytest.raises(RuntimeError, match="boom"): with new_unit() as uow: - uow.baseline.save(_baseline(), property_id=10) + uow.property_baseline.save(_baseline(), property_id=10) raise RuntimeError("boom") # Assert — nothing from the aborted batch is persisted. with new_unit() as uow: - assert uow.baseline.get_for_property(10) is None + assert uow.property_baseline.get_for_property(10) is None def test_leaving_the_block_without_commit_persists_nothing(db_engine: Engine) -> None: @@ -66,8 +66,8 @@ def test_leaving_the_block_without_commit_persists_nothing(db_engine: Engine) -> # Act — write but never commit. 
with new_unit() as uow: - uow.baseline.save(_baseline(), property_id=10) + uow.property_baseline.save(_baseline(), property_id=10) # Assert with new_unit() as uow: - assert uow.baseline.get_for_property(10) is None + assert uow.property_baseline.get_for_property(10) is None From 3cad599fd14c226d3270d7f40d1dda400b408985 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Mon, 1 Jun 2026 14:57:00 +0000 Subject: [PATCH 16/18] refactor(property-baseline): units on co2 / PEUI columns (PR #1139 review) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make the stored units explicit on the property_baseline_performance columns: - `*_co2_emissions` → `*_co2_emissions_t_per_yr` (tonnes CO₂/yr, whole dwelling) - `*_primary_energy_intensity` → `*_primary_energy_intensity_kwh_per_m2_yr` Column names only; the domain `Performance` VO stays unit-suffix-free (units are a storage concern, mapped in from_domain/to_domain). Migration doc updated. Round-trip stays green. Co-Authored-By: Claude Opus 4.8 --- .../property-baseline-performance-table.md | 8 +++---- .../property_baseline_performance_table.py | 24 +++++++++---------- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/migrations/property-baseline-performance-table.md b/docs/migrations/property-baseline-performance-table.md index 66864eb9..33e2171a 100644 --- a/docs/migrations/property-baseline-performance-table.md +++ b/docs/migrations/property-baseline-performance-table.md @@ -20,12 +20,12 @@ straight lift-and-shift of the columns below. | `property_id` | int, FK → `property.id`, **unique** | one Baseline Performance per Property | | `lodged_sap_score` | int | Lodged Performance — gov register, off the Effective EPC | | `lodged_epc_band` | text | the `Epc` enum, stored as its string value (e.g. 
`"C"`) | -| `lodged_co2_emissions` | float | | -| `lodged_primary_energy_intensity` | int | PEUI (kWh/m²/yr); **not** "heat demand" — see CONTEXT.md | +| `lodged_co2_emissions_t_per_yr` | float | tonnes CO₂/yr (whole dwelling) | +| `lodged_primary_energy_intensity_kwh_per_m2_yr` | int | PEUI (kWh/m²/yr); **not** "heat demand" — see CONTEXT.md | | `effective_sap_score` | int | Effective Performance — what modelling scored against | | `effective_epc_band` | text | | -| `effective_co2_emissions` | float | | -| `effective_primary_energy_intensity` | int | | +| `effective_co2_emissions_t_per_yr` | float | tonnes CO₂/yr (whole dwelling) | +| `effective_primary_energy_intensity_kwh_per_m2_yr` | int | kWh/m²/yr | | `rebaseline_reason` | text | `none` \| `pre_sap10` \| `physical_state_changed` \| `both` | | `space_heating_kwh` | float | off `renewable_heat_incentive`; deterministic (ADR-0006) | | `water_heating_kwh` | float | off `renewable_heat_incentive` | diff --git a/infrastructure/postgres/property_baseline_performance_table.py b/infrastructure/postgres/property_baseline_performance_table.py index f43d9f3e..0e5e1792 100644 --- a/infrastructure/postgres/property_baseline_performance_table.py +++ b/infrastructure/postgres/property_baseline_performance_table.py @@ -25,13 +25,13 @@ class PropertyBaselinePerformanceModel(SQLModel, table=True): lodged_sap_score: int lodged_epc_band: str - lodged_co2_emissions: float - lodged_primary_energy_intensity: int + lodged_co2_emissions_t_per_yr: float + lodged_primary_energy_intensity_kwh_per_m2_yr: int effective_sap_score: int effective_epc_band: str - effective_co2_emissions: float - effective_primary_energy_intensity: int + effective_co2_emissions_t_per_yr: float + effective_primary_energy_intensity_kwh_per_m2_yr: int rebaseline_reason: str @@ -46,12 +46,12 @@ class PropertyBaselinePerformanceModel(SQLModel, table=True): property_id=property_id, lodged_sap_score=baseline.lodged.sap_score, 
lodged_epc_band=baseline.lodged.epc_band.value, - lodged_co2_emissions=baseline.lodged.co2_emissions, - lodged_primary_energy_intensity=baseline.lodged.primary_energy_intensity, + lodged_co2_emissions_t_per_yr=baseline.lodged.co2_emissions, + lodged_primary_energy_intensity_kwh_per_m2_yr=baseline.lodged.primary_energy_intensity, effective_sap_score=baseline.effective.sap_score, effective_epc_band=baseline.effective.epc_band.value, - effective_co2_emissions=baseline.effective.co2_emissions, - effective_primary_energy_intensity=baseline.effective.primary_energy_intensity, + effective_co2_emissions_t_per_yr=baseline.effective.co2_emissions, + effective_primary_energy_intensity_kwh_per_m2_yr=baseline.effective.primary_energy_intensity, rebaseline_reason=baseline.rebaseline_reason, space_heating_kwh=baseline.space_heating_kwh, water_heating_kwh=baseline.water_heating_kwh, @@ -62,14 +62,14 @@ class PropertyBaselinePerformanceModel(SQLModel, table=True): lodged=Performance( sap_score=self.lodged_sap_score, epc_band=Epc(self.lodged_epc_band), - co2_emissions=self.lodged_co2_emissions, - primary_energy_intensity=self.lodged_primary_energy_intensity, + co2_emissions=self.lodged_co2_emissions_t_per_yr, + primary_energy_intensity=self.lodged_primary_energy_intensity_kwh_per_m2_yr, ), effective=Performance( sap_score=self.effective_sap_score, epc_band=Epc(self.effective_epc_band), - co2_emissions=self.effective_co2_emissions, - primary_energy_intensity=self.effective_primary_energy_intensity, + co2_emissions=self.effective_co2_emissions_t_per_yr, + primary_energy_intensity=self.effective_primary_energy_intensity_kwh_per_m2_yr, ), rebaseline_reason=cast(RebaselineReason, self.rebaseline_reason), space_heating_kwh=self.space_heating_kwh, From 62e762e962bfc3d50bcaf605037f92f1e293db7e Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Mon, 1 Jun 2026 14:58:11 +0000 Subject: [PATCH 17/18] refactor(property): PropertyRow.id non-Optional (PR #1139 review) MIME-Version: 1.0 
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `property` is an FE-owned table the backend only ever reads — every row read carries an id — so the autoincrement-PK `Optional[int]` idiom doesn't apply here. Make it `int` and drop the now-redundant None guard in get_many. (Contrast: solar_table keeps Optional id — the backend DOES insert those, so id is genuinely None pre-flush.) Co-Authored-By: Claude Opus 4.8 --- infrastructure/postgres/property_table.py | 4 +++- repositories/property/property_postgres_repository.py | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/infrastructure/postgres/property_table.py b/infrastructure/postgres/property_table.py index 0b91a2ad..6bd2d644 100644 --- a/infrastructure/postgres/property_table.py +++ b/infrastructure/postgres/property_table.py @@ -15,7 +15,9 @@ class PropertyRow(SQLModel, table=True): __tablename__: ClassVar[str] = "property" # pyright: ignore[reportIncompatibleVariableOverride] - id: Optional[int] = Field(default=None, primary_key=True) + # Non-Optional: this is a read-only defensive view of the FE-owned ``property`` + # table — the backend never inserts rows, so every row read carries an id. 
+ id: int = Field(primary_key=True) portfolio_id: int postcode: str address: str diff --git a/repositories/property/property_postgres_repository.py b/repositories/property/property_postgres_repository.py index e0b4f9ff..55a32ed3 100644 --- a/repositories/property/property_postgres_repository.py +++ b/repositories/property/property_postgres_repository.py @@ -42,7 +42,7 @@ class PropertyPostgresRepository(PropertyRepository): rows = self._session.exec( select(PropertyRow).where(col(PropertyRow.id).in_(property_ids)) ).all() - row_by_id = {row.id: row for row in rows if row.id is not None} + row_by_id = {row.id: row for row in rows} epcs = self._epc_repo.get_for_properties(property_ids) items: list[Property] = [] for property_id in property_ids: From 305bffd2845c267d271a9c12b6f0b3d8c5f3f169 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Mon, 1 Jun 2026 15:00:33 +0000 Subject: [PATCH 18/18] =?UTF-8?q?refactor(ara):=20rename=20FirstRunPipelin?= =?UTF-8?q?e=20=E2=86=92=20AraFirstRunPipeline=20(PR=20#1139=20review)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Aligns the composition with its entry point (the `ara_first_run` lambda + `AraFirstRunTriggerBody`): clearer what the file does. 
- orchestration/first_run_pipeline.py → ara_first_run_pipeline.py - FirstRunPipeline → AraFirstRunPipeline; FirstRunCommand → AraFirstRunCommand - test files renamed to match Co-Authored-By: Claude Opus 4.8 --- applications/ara_first_run/handler.py | 8 ++++---- .../{first_run_pipeline.py => ara_first_run_pipeline.py} | 6 +++--- tests/applications/ara_first_run/test_handler.py | 6 +++--- ...rst_run_pipeline.py => test_ara_first_run_pipeline.py} | 8 ++++---- ...tion.py => test_ara_first_run_pipeline_integration.py} | 4 ++-- 5 files changed, 16 insertions(+), 16 deletions(-) rename orchestration/{first_run_pipeline.py => ara_first_run_pipeline.py} (94%) rename tests/orchestration/{test_first_run_pipeline.py => test_ara_first_run_pipeline.py} (89%) rename tests/orchestration/{test_first_run_pipeline_integration.py => test_ara_first_run_pipeline_integration.py} (98%) diff --git a/applications/ara_first_run/handler.py b/applications/ara_first_run/handler.py index 147bf066..761fd207 100644 --- a/applications/ara_first_run/handler.py +++ b/applications/ara_first_run/handler.py @@ -14,7 +14,7 @@ from domain.property_baseline.rebaseliner import StubRebaseliner from infrastructure.postgres.config import PostgresConfig from infrastructure.postgres.engine import make_engine from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator -from orchestration.first_run_pipeline import FirstRunPipeline +from orchestration.ara_first_run_pipeline import AraFirstRunPipeline from orchestration.ingestion_orchestrator import ( EpcFetcher, IngestionOrchestrator, @@ -42,7 +42,7 @@ def _get_engine() -> Engine: class _RunsFirstRun(Protocol): - """The slice of FirstRunPipeline the handler delegates to.""" + """The slice of AraFirstRunPipeline the handler delegates to.""" def run(self, command: AraFirstRunTriggerBody) -> None: ... 
@@ -63,7 +63,7 @@ def build_first_run_pipeline( epc_fetcher: EpcFetcher, geospatial_repo: GeospatialRepository, solar_fetcher: SolarFetcher, -) -> FirstRunPipeline: +) -> AraFirstRunPipeline: """Compose the real three-stage pipeline on a Unit-of-Work factory. Each stage opens its own unit(s) and commits per batch (ADR-0012); the @@ -71,7 +71,7 @@ def build_first_run_pipeline( their config is not settled — see ``_source_clients_from_env``. Modelling is stubbed (#1136); its Scenario / Materials ports are seams. """ - return FirstRunPipeline( + return AraFirstRunPipeline( ingestion=IngestionOrchestrator( unit_of_work=unit_of_work, epc_fetcher=epc_fetcher, diff --git a/orchestration/first_run_pipeline.py b/orchestration/ara_first_run_pipeline.py similarity index 94% rename from orchestration/first_run_pipeline.py rename to orchestration/ara_first_run_pipeline.py index 6d521a35..ed507d6e 100644 --- a/orchestration/first_run_pipeline.py +++ b/orchestration/ara_first_run_pipeline.py @@ -3,7 +3,7 @@ from __future__ import annotations from typing import Protocol -class FirstRunCommand(Protocol): +class AraFirstRunCommand(Protocol): """The slice of the trigger the pipeline threads downstream. Only the business fields — UPRNs and Scenario definitions are read from @@ -41,7 +41,7 @@ class ModellingStage(Protocol): def run(self, property_ids: list[int], scenario_ids: list[int]) -> None: ... -class FirstRunPipeline: +class AraFirstRunPipeline: """Composes the First Run stages end-to-end: Ingestion -> Baseline -> Modelling. 
@@ -64,7 +64,7 @@ class FirstRunPipeline: self._baseline = baseline self._modelling = modelling - def run(self, command: FirstRunCommand) -> None: + def run(self, command: AraFirstRunCommand) -> None: self._ingestion.run(command.property_ids) self._baseline.run(command.property_ids) self._modelling.run(command.property_ids, command.scenario_ids) diff --git a/tests/applications/ara_first_run/test_handler.py b/tests/applications/ara_first_run/test_handler.py index 21e96e3d..c02cc723 100644 --- a/tests/applications/ara_first_run/test_handler.py +++ b/tests/applications/ara_first_run/test_handler.py @@ -7,16 +7,16 @@ from applications.ara_first_run.ara_first_run_trigger_body import ( AraFirstRunTriggerBody, ) from applications.ara_first_run.handler import dispatch_first_run -from orchestration.first_run_pipeline import FirstRunCommand +from orchestration.ara_first_run_pipeline import AraFirstRunCommand class _SpyPipeline: """Records the command it is asked to run, instead of composing stages.""" def __init__(self) -> None: - self.received: Optional[FirstRunCommand] = None + self.received: Optional[AraFirstRunCommand] = None - def run(self, command: FirstRunCommand) -> None: + def run(self, command: AraFirstRunCommand) -> None: self.received = command diff --git a/tests/orchestration/test_first_run_pipeline.py b/tests/orchestration/test_ara_first_run_pipeline.py similarity index 89% rename from tests/orchestration/test_first_run_pipeline.py rename to tests/orchestration/test_ara_first_run_pipeline.py index 705282ee..8d78ff2c 100644 --- a/tests/orchestration/test_first_run_pipeline.py +++ b/tests/orchestration/test_ara_first_run_pipeline.py @@ -2,12 +2,12 @@ from __future__ import annotations from dataclasses import dataclass -from orchestration.first_run_pipeline import FirstRunCommand, FirstRunPipeline +from orchestration.ara_first_run_pipeline import AraFirstRunCommand, AraFirstRunPipeline @dataclass class _FakeCommand: - """A stand-in for AraFirstRunTriggerBody — 
structurally a FirstRunCommand.""" + """A stand-in for AraFirstRunTriggerBody — structurally an AraFirstRunCommand.""" portfolio_id: int property_ids: list[int] @@ -41,10 +41,10 @@ class _SpyModelling: def test_run_sequences_the_three_stages_threading_only_property_ids() -> None: # Arrange log: list[tuple[object, ...]] = [] - command: FirstRunCommand = _FakeCommand( + command: AraFirstRunCommand = _FakeCommand( portfolio_id=1, property_ids=[10, 11], scenario_ids=[7] ) - pipeline = FirstRunPipeline( + pipeline = AraFirstRunPipeline( ingestion=_SpyIngestion(log), baseline=_SpyBaseline(log), modelling=_SpyModelling(log), diff --git a/tests/orchestration/test_first_run_pipeline_integration.py b/tests/orchestration/test_ara_first_run_pipeline_integration.py similarity index 98% rename from tests/orchestration/test_first_run_pipeline_integration.py rename to tests/orchestration/test_ara_first_run_pipeline_integration.py index 781dcf87..381f3f21 100644 --- a/tests/orchestration/test_first_run_pipeline_integration.py +++ b/tests/orchestration/test_ara_first_run_pipeline_integration.py @@ -26,7 +26,7 @@ from infrastructure.postgres.property_baseline_performance_table import ( from infrastructure.postgres.epc_property_table import EpcPropertyModel from infrastructure.postgres.property_table import PropertyRow from orchestration.property_baseline_orchestrator import PropertyBaselineOrchestrator -from orchestration.first_run_pipeline import FirstRunPipeline +from orchestration.ara_first_run_pipeline import AraFirstRunPipeline from orchestration.ingestion_orchestrator import IngestionOrchestrator from orchestration.modelling_orchestrator import ModellingOrchestrator from repositories.property_baseline.property_baseline_postgres_repository import ( @@ -103,7 +103,7 @@ def test_first_run_baselines_through_repos_and_is_idempotent_on_rerun( def unit_of_work() -> PostgresUnitOfWork: return PostgresUnitOfWork(lambda: Session(db_engine)) - pipeline = FirstRunPipeline( + pipeline =
AraFirstRunPipeline( ingestion=IngestionOrchestrator( unit_of_work=unit_of_work, epc_fetcher=_FetcherReturning(_lodged_epc()),