Four ventilation features: mechanical_ventilation (categorical
SAP10 code, 0=natural through 6=positive-input-from-outside per
epc_codes.csv mechanical_ventilation enum), mechanical_vent_duct_type
(categorical), blocked_chimneys_count (int), and pressure_test
(int — air-tightness SAP10 code).
Pulled from top-level EpcPropertyData fields; ventilation on SAP10
API EPCs sits on the certificate directly, not on the
sap_ventilation block (which is site-notes-only).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Nine more energy-source features land: has_pv_battery,
pv_battery_count, pv_battery_capacity_kwh (count × per-unit
capacity from pv_batteries.pv_battery, nullable when count=0),
has_wind_turbine, wind_turbine_count, mains_gas (the dominant
fuel-deduction signal), and the three smart-meter / export
booleans (electricity_smart_meter_present, gas_smart_meter_present,
is_dwelling_export_capable).
Closes the PV/solar feature group started in slice 11a.
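
For reference, the battery capacity rule described above amounts to
something like the sketch below. The function name and flat arguments
are hypothetical, and returning None when the per-unit capacity is
missing is an assumption rather than confirmed behaviour:

```python
from typing import Optional


def pv_battery_capacity_kwh(
    count: int, per_unit_kwh: Optional[float]
) -> Optional[float]:
    """Total battery capacity: count x per-unit capacity, None when count is 0."""
    if count == 0:
        return None  # no batteries: capacity is null rather than 0.0
    if per_unit_kwh is None:
        return None  # assumption: an unknown per-unit capacity also yields null
    return count * per_unit_kwh
```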
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fifteen PV features land: has_pv (bool), pv_capacity_source (str
categorical: measured / estimated_from_roof_area / none),
pv_array_count, pv_total_peak_power_kw, eight peak-power-by-octant
columns (pv_peak_power_kw_{N..NW}), peak-power-weighted
pv_avg_pitch and pv_avg_overshading (nullable), and
pv_percent_roof_area (nullable — populated only on the estimated
branch).
Dispatches on the SAP10 EpcPropertyData.SapEnergySource shapes added
in slice 10.5: photovoltaic_arrays populated → measured;
photovoltaic_supply.none_or_no_details.percent_roof_area > 0 →
estimated; everything else → none. percent_roof_area == 0 is the
canonical no-PV payload and surfaces as 'none', not 'estimated'.
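
A minimal sketch of that three-way dispatch, for reviewers. The
function name and the flat arguments are hypothetical stand-ins for
the real EpcPropertyData.SapEnergySource access, and treating an empty
array list as unpopulated is an assumption:

```python
from typing import Optional, Sequence


def pv_capacity_source(
    photovoltaic_arrays: Optional[Sequence[object]],
    percent_roof_area: Optional[float],
) -> str:
    """Classify a PV payload as measured / estimated_from_roof_area / none."""
    if photovoltaic_arrays:
        # Measured shape: at least one array is present.
        return "measured"
    if percent_roof_area is not None and percent_roof_area > 0:
        # Legacy wrapper with a positive roof-area percentage.
        return "estimated_from_roof_area"
    # Covers percent_roof_area == 0 (the canonical no-PV payload) and
    # payloads with no PV information at all.
    return "none"
```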
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SAP10 EPCs with measured PV carry photovoltaic_supply as a nested
list of arrays (peak_power, pitch, orientation, overshading) rather
than the legacy unmeasured wrapper {none_or_no_details:
{percent_roof_area: N}}. The schema-21 dataclasses now accept both
shapes via Union[PhotovoltaicSupply, List[List[PhotovoltaicArray]]],
and from_dict._coerce now dispatches list values onto list type
variants of multi-type Unions.
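
Roughly, the Union dispatch picks a variant by whether the raw value
is a list. The sketch below is illustrative only: pick_union_variant
and the stub dataclasses are hypothetical and are not the actual
from_dict._coerce code.

```python
from dataclasses import dataclass
from typing import Any, List, Union, get_args, get_origin


@dataclass
class PhotovoltaicSupply:  # stub for the legacy wrapper shape
    none_or_no_details: dict


@dataclass
class PhotovoltaicArray:  # stub for one measured array
    peak_power: float


PvField = Union[PhotovoltaicSupply, List[List[PhotovoltaicArray]]]


def pick_union_variant(value: Any, union_type: Any) -> Any:
    """Return the first Union member whose list-ness matches the value."""
    value_is_list = isinstance(value, list)
    for variant in get_args(union_type):
        # get_origin(List[...]) is list; it is None for a plain class.
        variant_is_list = get_origin(variant) is list
        if value_is_list == variant_is_list:
            return variant
    raise TypeError(f"no Union variant accepts {type(value).__name__}")
```

A list payload resolves to the List[List[PhotovoltaicArray]] member,
while a dict payload resolves to PhotovoltaicSupply.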
EpcPropertyData.SapEnergySource gains
photovoltaic_arrays: Optional[List[PhotovoltaicArray]] — populated
when the measured shape is present, otherwise None. The legacy
photovoltaic_supply field is preserved for the fallback case.
Both schema-21.0.0 and 21.0.1 mappers dispatch via the new
_map_schema_21_pv helper.
Unblocks Slice 11 (PV feature aggregation in EpcMlTransform).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fifteen heating features land via hybrid Top-1 + flat fields: the
primary heating slot from main_heating_details[0] gives
main_fuel_type, heat_emitter_type, main_heating_control,
main_heating_category, has_fghrs, fan_flue_present, boiler_flue_type
and central_heating_pump_age (all int-categorical for the SAP10
codes); main_heating_count carries the aggregate. Water heating
adds water_heating_code, water_heating_fuel, cylinder_size, and
cylinder_insulation_thickness_mm. Secondary heating is summarised
by has_secondary_heating (derived) and secondary_fuel_type.
Fuel codes follow the gov api enums in epc_codes.csv (44 main_fuel
values shared with water_heating_fuel). Union[int, str] fields
coerce to int when the value is int, else None.
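
The coercion rule is small enough to sketch. The helper name
int_or_none is hypothetical, and excluding bool (a subclass of int in
Python) is an assumption, not something the transform is known to do:

```python
from typing import Optional, Union


def int_or_none(value: Union[int, str, None]) -> Optional[int]:
    """Keep genuine ints; map strings and anything else to None."""
    # Assumption: bool is rejected even though isinstance(True, int) is True.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```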
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thirteen building-parts features land: five cross-all-parts physical
aggregates (count, total_heat_loss_perimeter_m,
total_party_wall_length_m, total_floor_area_from_parts_m2,
avg_room_height_m) and eight Main-Dwelling-specific columns
(heat_loss_perimeter, party_wall_length, total_floor_area,
avg_room_height, has_room_in_roof, construction_age_band,
wall_construction, roof_construction). Main-Dwelling columns are
None when no part has identifier == 'Main Dwelling' — honest about
data quality rather than silently falling back to the first part.
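
A sketch of the lookup, using plain dicts as stand-ins for the real
building-part objects (find_main_dwelling is a hypothetical name):

```python
from typing import Any, Mapping, Optional, Sequence


def find_main_dwelling(
    parts: Sequence[Mapping[str, Any]],
) -> Optional[Mapping[str, Any]]:
    """Return the part labelled 'Main Dwelling', or None if there is none."""
    for part in parts:
        if part.get("identifier") == "Main Dwelling":
            return part
    # Deliberately no fallback to parts[0]: callers emit None columns.
    return None
```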
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds seventeen window-categorical-share features: one float per
SAP10 glazed_type code (1-15) plus a `_other` bucket for anything
outside the enum, and a single `window_pct_pvc_frame` for the
area-weighted PVC-frame share. All shares are area-weighted over
total window area; window_pct_pvc_frame is null for window-less
properties.
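
The glazed-type bucketing can be sketched as below. The function name,
the (glazed_type, area) tuple input, the 0-1 share scale, and
returning None for window-less properties are all assumptions made for
illustration; only the 1-15 codes and the "other" bucket come from the
description above.

```python
from typing import Dict, Iterable, Optional, Tuple, Union

Bucket = Union[int, str]


def glazed_type_shares(
    windows: Iterable[Tuple[int, float]],
) -> Optional[Dict[Bucket, float]]:
    """Area-weighted share per glazed_type code, plus an 'other' bucket."""
    windows = list(windows)
    total_area = sum(area for _, area in windows)
    if total_area <= 0:
        return None  # assumption: no window area means no shares
    shares: Dict[Bucket, float] = {code: 0.0 for code in range(1, 16)}
    shares["other"] = 0.0
    for glazed_type, area in windows:
        # Codes outside the 1-15 enum fall into the 'other' bucket.
        bucket = glazed_type if glazed_type in shares else "other"
        shares[bucket] += area / total_area
    return shares
```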
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thirteen window-aggregate features land on the transform: count,
total area, eight SAP-octant area columns (N/NE/E/SE/S/SW/W/NW),
area-weighted draught-proofing pct, and area-weighted u_value +
solar transmittance (nullable, populated only when windows carry
transmission_details). Windows whose orientation falls outside 1-8
(0, NR) count toward the window count and total area but toward no
octant column.
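
A sketch of the count / total / octant accumulation. The function
name and tuple input are hypothetical, and the mapping of codes 1-8 to
N..NW in listed order is assumed from the column order above:

```python
from typing import Dict, Iterable, Tuple, Union

OCTANTS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def aggregate_windows(
    windows: Iterable[Tuple[Union[int, str], float]],
) -> Tuple[int, float, Dict[str, float]]:
    """Return (count, total_area, area_by_octant) for (orientation, area) pairs."""
    count = 0
    total_area = 0.0
    by_octant = {name: 0.0 for name in OCTANTS}
    for orientation, area in windows:
        # Every window counts, whatever its orientation.
        count += 1
        total_area += area
        # Only codes 1-8 land in an octant; 0 and "NR" are skipped here.
        if isinstance(orientation, int) and 1 <= orientation <= 8:
            by_octant[OCTANTS[orientation - 1]] += area
    return count, total_area, by_octant
```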
Also: epc codes CSV (gov api /api/codes export, RdSAP-Schema-21.x +
older versions) moved next to EpcPropertyData as epc_codes.csv —
canonical SAP enum source for upcoming categorical-share slices.
.gitignore exception added so the reference CSV is tracked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds seven flat categorical features (dwelling_type, tenure,
transaction_type, property_type, built_form, region_code,
country_code) emitted as raw strings. New ColumnSpec.categorical
bool tells the parquet writer to cast these to pd.Categorical at the
I/O boundary, keeping pandas out of the domain/schema module.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds three non-nullable booleans (solar_water_heating,
has_hot_water_cylinder, has_fixed_air_conditioning) and three
optional integer indicators (percent_draughtproofed,
energy_rating_average, environmental_impact_current). All direct
EpcPropertyData field reads.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ten flat int counts added to the transform — door_count,
habitable/heated/wet/insulated_door counts, extensions, open
chimneys, and the three fixed-lighting bulb counts (CFL/LED/
incandescent). All non-nullable; direct EpcPropertyData field reads.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First feature column lands on the transform: schema() advertises
total_floor_area_m2 as a non-nullable float; to_row() emits the value
from EpcPropertyData.total_floor_area_m2 alongside the six targets.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous slice commits used `git commit -a`-style staging, which
picks up only tracked files, and so missed these new files; imports
in transform.py and test_transform.py would dangle on a fresh
checkout. With this commit in place, all four EpcMlTransform tests
pass under pytest.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>