# EPC persistence schema gaps — migrations for round-trip fidelity

**Context:** Slice 1 (Hestia-Homes/Model#1129) of the `ara_first_run` rebuild. The round-trip fidelity test (`EpcPropertyData → epc_property tables → reload → EpcPropertyData`, deep-equality) surfaced that the current `epc_property` schema stores only a **partial, partly type-lossy projection** of the `EpcPropertyData` domain object. This document lists every gap and the migration needed to close it, so the schema (FE-owned for some tables) can be updated.

We can make the column/table changes on the **SQLModel definitions** in `infrastructure/postgres/epc_property_table.py` directly — tests build their schema from those models via `SQLModel.metadata.create_all`, so they don't need the live DB. The live migrations listed here are what must be applied wherever the physical tables are owned.

**`epc_cache` relationship:** the raw gov-API JSON response is retained in the `epc_cache` table, so the *source* is always recoverable even where the structured `epc_property` projection is lossy. That makes these gaps "the structured store is incomplete" rather than "data is lost forever" — but the modelling pipeline reads the structured `epc_property`, not the raw cache, so the gaps below still block faithful modelling and must be closed.

Priority key: **P0** modelling needs it now · **P1** needed soon · **P2** completeness.

---

## Status after Slice 1 (#1129)

The round-trip test passes over the persisted projection for RdSAP-Schema-21.0.0 and 21.0.1.
The following were **applied on the SQLModel** (`infrastructure/postgres/epc_property_table.py`) and **still require the matching DB migration** wherever the physical tables live:

- **§1 JSONB** — all `Union` code columns converted:
  - `epc_property`: `heating_cylinder_size`, `heating_immersion_heating_type`, `heating_cylinder_insulation_type`, `heating_secondary_heating_type`, `heating_shower_outlet_type`, `energy_pv_connection`
  - `epc_main_heating_detail`: `main_fuel_type`, `heat_emitter_type`, `emitter_temperature`, `main_heating_control`
  - `epc_building_part`: `wall_construction`, `wall_insulation_type`, `party_wall_construction`, `flat_roof_insulation_thickness`, `roof_insulation_location`, `roof_insulation_thickness`
  - `epc_window`: `glazing_gap`, `orientation`, `window_type`, `glazing_type`, `window_location`, `window_wall_type`, `draught_proofed`, `permanent_shutters_present`, `transmission_data_source`
- **New scalar columns**:
  - `epc_property`: `heating_number_baths`, `heating_number_baths_wwhrs`, `heating_electric_shower_count`, `heating_mixer_shower_count`, `mechanical_vent_duct_insulation_level`, `addendum_stone_walls`, `addendum_system_build`, `addendum_numbers` (JSONB), `ventilation_present`, `ventilation_sheltered_sides`, `ventilation_has_suspended_timber_floor`, `ventilation_suspended_timber_floor_sealed`, `ventilation_has_draught_lobby`, `ventilation_air_permeability_ap4_m3_h_m2`, `ventilation_mechanical_ventilation_kind`
  - `epc_building_part`: `roof_construction_type`, `curtain_wall_age`
- **§2.1 `epc_renewable_heat_incentive` table** (#1137) — now created on the SQLModel and wired into save/get; the round-trip test asserts **full deep-equality** (no exclusion). DB migration still required.
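
To illustrate why the `Union` code columns listed above move to JSONB, here is a minimal sketch that uses the standard-library `json` module as a stand-in for a JSONB column. It shows that a JSON scalar keeps `int`, `str`, `bool`, and `null` distinct, whereas `str(...)` coercion collapses them. The function names and sample values are hypothetical and are not taken from the real mapper.

```python
import json
from typing import List, Union

# Hypothetical alias for a categorical code value as it appears in the domain.
Code = Union[int, str, bool, None]


def lossy_store(value: Code) -> str:
    """Text-column behaviour: everything is coerced with str()."""
    return str(value)


def jsonb_store(value: Code) -> str:
    """Serialise as a JSON scalar, which is what a JSONB column would hold."""
    return json.dumps(value)


def jsonb_load(stored: str) -> Code:
    """Read the JSON scalar back into the matching Python type."""
    return json.loads(stored)


# Illustrative values only: an API-style int, a Site-Notes-style str,
# a window flag, and a missing value.
samples: List[Code] = [26, "26", True, None]

round_tripped = [jsonb_load(jsonb_store(v)) for v in samples]

# Compare type as well as value, because True == 1 in Python.
same = all(type(a) is type(b) and a == b for a, b in zip(samples, round_tripped))

# With str() coercion the int 26 and the str "26" become indistinguishable.
collided = lossy_store(26) == lossy_store("26")

print("JSON round-trip preserves types:", same)
print("str() coercion collides 26 and '26':", collided)
```

The type check in `same` matters because plain `==` would treat `True` and `1` as equal and hide exactly the kind of loss the deep-equality test is meant to catch.
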

**Still open (follow-up issues):** the remaining §2 structural tables (room-in-roof detail, PV arrays, roof windows) + §3 nested-wall fields (`SapAlternativeWall.u_value`/`wall_thickness_mm`) + `SapFloorDimension` exposed-floor flags — none populated in the 21.0.0/21.0.1 fixtures, so latent until a richer fixture exercises them.

---

## 1. Type fidelity — convert `Union[int, str]` code columns to JSONB

These columns hold SAP/RdSAP categorical codes that are **`int` from the gov API** and **`str` from Site Notes** (`Union[int, str]` in the domain). The forward mapper currently coerces them with `str(...)` (and `bool(...)` for two window flags), so an API `int` of `26` is stored as `"26"` and cannot be recovered. Convert each to **JSONB** and drop the `str()`/`bool()` coercion in the forward mapper so the Python type round-trips exactly (JSON scalars preserve `int` vs `str` vs `bool` vs `null`).

**P0** — these feed the SAP10 calculator's int-keyed dispatch.

| Table | Columns |
|---|---|
| `epc_property` | `heating_cylinder_size`, `heating_immersion_heating_type`, `heating_cylinder_insulation_type`, `heating_secondary_heating_type`, `heating_shower_outlet_type`, `energy_pv_connection` |
| `epc_main_heating_detail` | `main_fuel_type`, `heat_emitter_type`, `emitter_temperature`, `main_heating_control` |
| `epc_building_part` | `wall_construction`, `wall_insulation_type`, `party_wall_construction`, `flat_roof_insulation_thickness`, `roof_insulation_location`, `roof_insulation_thickness` |
| `epc_window` | `glazing_gap`, `orientation`, `window_type`, `glazing_type`, `window_location`, `window_wall_type`, `draught_proofed`, `permanent_shutters_present` |

(`energy_meter_type` and `energy_wind_turbines_terrain_type` are `str` in the domain — leave as `TEXT`.)

---

## 2. Not stored at all — new tables

### 2.1 `epc_renewable_heat_incentive` — **P0**

Maps `EpcPropertyData.renewable_heat_incentive` (`RenewableHeatIncentive`).
Carries the **baseline space-heating and hot-water kWh** that EPC Energy Derivation consumes — the single most important gap. One row per `epc_property`.

| Column | Type | Source |
|---|---|---|
| `epc_property_id` | FK → `epc_property.id`, unique | |
| `space_heating_kwh` | float | `space_heating_kwh` |
| `water_heating_kwh` | float | `water_heating_kwh` |
| `impact_of_loft_insulation_kwh` | float, null | `impact_of_loft_insulation_kwh` |
| `impact_of_cavity_insulation_kwh` | float, null | `impact_of_cavity_insulation_kwh` |
| `impact_of_solid_wall_insulation_kwh` | float, null | `impact_of_solid_wall_insulation_kwh` |

### 2.2 `epc_room_in_roof` (+ `epc_room_in_roof_surface`) — **P1**

`SapBuildingPart.sap_room_in_roof` (`SapRoomInRoof`) is currently flattened to just `room_in_roof_floor_area` + `room_in_roof_construction_age_band` on `epc_building_part`, dropping the Type-2 geometry and the Detailed-measurement surfaces. Replace with a child table of `epc_building_part`:

`epc_room_in_roof`: `epc_building_part_id` (FK, unique), `floor_area`, `construction_age_band`, `common_wall_length_m`, `common_wall_height_m`, `gable_1_length_m`, `gable_1_height_m`, `gable_2_length_m`, `gable_2_height_m`.

`epc_room_in_roof_surface` (0..n per RIR, from `detailed_surfaces: List[SapRoomInRoofSurface]`): `epc_room_in_roof_id` (FK), `kind`, `area_m2`, `insulation_thickness_mm` (null), `insulation_type` (null), `u_value` (null).

### 2.3 `epc_photovoltaic_array` — **P1**

`SapEnergySource.photovoltaic_arrays: List[PhotovoltaicArray]` (measured PV) is not stored at all — only the `percent_roof_area` fallback is. One row per array: `epc_property_id` (FK), `peak_power`, `pitch`, `orientation`, `overshading`.

### 2.4 `epc_roof_window` — **P2**

`EpcPropertyData.sap_roof_windows: List[SapRoofWindow]` not stored. One row per roof window: `epc_property_id` (FK), `area_m2`, `u_value_raw`, `orientation`, `pitch_deg`, `g_perpendicular`, `frame_factor`.

---

## 3. Not stored at all — new columns

### 3.1 `epc_property` additions

| Column | Type | Source | Pri |
|---|---|---|---|
| `addendum_stone_walls` | bool, null | `addendum.stone_walls` | P2 |
| `addendum_system_build` | bool, null | `addendum.system_build` | P2 |
| `addendum_numbers` | JSONB, null | `addendum.addendum_numbers` (`List[int]`) | P2 |
| `lzc_energy_sources` | JSONB, null | `lzc_energy_sources` (`List[int]`) | P2 |
| `solar_hw_collector_orientation` | text, null | `solar_hw_collector_orientation` | P1 |
| `solar_hw_collector_pitch_deg` | int, null | `solar_hw_collector_pitch_deg` | P1 |
| `solar_hw_overshading` | text, null | `solar_hw_overshading` | P1 |
| `extract_fans_count` | int, null | top-level `extract_fans_count` (distinct from the `ventilation_*` one) | P2 |
| `mechanical_vent_duct_insulation_level` | int, null | `mechanical_vent_duct_insulation_level` | P2 |

### 3.2 `epc_building_part` additions

| Column | Type | Source | Pri |
|---|---|---|---|
| `roof_construction_type` | text, null | `roof_construction_type` (Site-Notes str) | P1 |
| `curtain_wall_age` | text, null | `curtain_wall_age` (RdSAP §5.18) | P1 |
| `alt_wall_1_u_value` | float, null | `sap_alternative_wall_1.u_value` | P1 |
| `alt_wall_1_thickness_mm` | int, null | `sap_alternative_wall_1.wall_thickness_mm` | P1 |
| `alt_wall_2_u_value` | float, null | `sap_alternative_wall_2.u_value` | P1 |
| `alt_wall_2_thickness_mm` | int, null | `sap_alternative_wall_2.wall_thickness_mm` | P1 |

### 3.3 `epc_floor_dimension` additions

| Column | Type | Source | Pri |
|---|---|---|---|
| `is_exposed_floor` | bool, default false | `SapFloorDimension.is_exposed_floor` | P1 |
| `is_above_partially_heated_space` | bool, default false | `SapFloorDimension.is_above_partially_heated_space` | P1 |

---

## 4. Mapper-only gaps (no schema change required)

The table can already hold these; the **save mapper** simply doesn't write them.
Fix in the forward mapper, not the DB:

- **`air_tightness`** (`EnergyElement`) — `epc_energy_element.element_type` is a free string, so add an `"air_tightness"` element type to the save loop. **P1.**

---

## 5. Scope note

Slice 1 (#1129) asserts faithful round-trip over the **projection the schema is meant to store**, after applying §1 (JSONB) and the straightforward §3/§4 additions on the SQLModel. The structural new tables in §2 (RHI, room-in-roof, PV arrays, roof windows) are tracked as their own follow-up issues — `epc_renewable_heat_incentive` (§2.1) first, as it unblocks EPC Energy Derivation. Each gap above should become a checkbox on the relevant issue so nothing is silently dropped.
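
One possible way to turn round-trip failures into the per-gap checklist suggested above is to diff the object before and after persistence and report the field paths that differ. The sketch below is illustrative only: `Property`, `Rhi`, `diff_paths`, and `lossy_round_trip` are hypothetical stand-ins rather than the real `EpcPropertyData` types or mapper, and all values are made up.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, Optional


@dataclass
class Rhi:  # hypothetical stand-in for RenewableHeatIncentive
    space_heating_kwh: float
    water_heating_kwh: float


@dataclass
class Property:  # hypothetical stand-in for EpcPropertyData
    cylinder_size: Any  # Union[int, str] in the domain
    rhi: Optional[Rhi] = None
    addendum_numbers: List[int] = field(default_factory=list)


def diff_paths(before: Any, after: Any, path: str = "") -> List[str]:
    """Return dotted paths where two objects differ in type or value."""
    if type(before) is not type(after):
        return [path or "<root>"]
    if is_dataclass(before):
        out: List[str] = []
        for f in fields(before):
            child = f"{path}.{f.name}" if path else f.name
            out += diff_paths(getattr(before, f.name), getattr(after, f.name), child)
        return out
    if isinstance(before, list):
        if len(before) != len(after):
            return [path or "<root>"]
        out = []
        for i, (b, a) in enumerate(zip(before, after)):
            out += diff_paths(b, a, f"{path}[{i}]")
        return out
    return [] if before == after else [path or "<root>"]


def lossy_round_trip(p: Property) -> Property:
    """Simulate a store that stringifies codes and has no RHI table."""
    return Property(
        cylinder_size=str(p.cylinder_size),
        rhi=None,
        addendum_numbers=list(p.addendum_numbers),
    )


original = Property(cylinder_size=2, rhi=Rhi(9500.0, 2100.0), addendum_numbers=[1, 4])
gaps = diff_paths(original, lossy_round_trip(original))

# Emit one Markdown checkbox per field that did not survive the round trip.
for gap in gaps:
    print(f"- [ ] {gap}")
```

Because the comparison checks the type before the value, a code stored as `"2"` instead of `2` is reported as a gap even though the two print identically.
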