Khalim Conn-Kowlessar
195336b7e1
slice 15d: +50 features (gap fill + secondary building part); drop 2 derived
...
Removes:
- environmental_impact_current (SAP-derived rating, leaks into co2 target)
- energy_rating_average (average of sap_score + potential, direct leak)
Adds:
Doors: draughtproofed_door_count, insulated_door_u_value
Hot water: cylinder_insulation_type, cylinder_thermostat,
secondary_heating_type
Ventilation: mechanical_vent_duct_placement, _duct_insulation,
_duct_insulation_level, _measured_installation
Lighting: low_energy_fixed_lighting_bulbs_count,
fixed_lighting_outlets_count,
low_energy_fixed_lighting_outlets_count
Windows: window_avg_glazing_gap_mm, window_avg_frame_factor,
window_pct_permanent_shutters_insulated
Main dwelling: room_in_roof_floor_area_m2, alternative_wall_count,
alternative_wall_area_m2, flat_roof_insulation_thickness_mm,
wall_thickness_measured
Element counts: wall_count, roof_count, floor_count,
main_heating_count_elements, main_heating_controls_present
Wind: wind_turbine_hub_height_m, wind_turbine_rotor_diameter_m
Flat: flat_unheated_corridor_length_m
Addendum: addendum_stone_walls, addendum_system_build,
addendum_numbers_count
LZC: lzc_energy_sources_count
Secondary part: secondary_dwelling_present + 11 fabric features
(wall/roof/floor construction + insulation + thickness
+ area + heat-loss perimeter) + other_building_parts_count
Wires through schema -> domain -> mapper: adds Addendum dataclass,
lzc_energy_sources, mechanical_vent_duct_insulation_level. Also fixes
_measurement_value to accept raw dicts (from_dict left some Measurement
fields as dict when they weren't typed as a dataclass).
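A minimal sketch of the coercion described above, for illustration only. The `Measurement` stand-in, its `value`/`unit` fields, the `"value"` dict key and the function name `measurement_value` are assumptions, not the repository's actual definitions:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Measurement:
    # Hypothetical stand-in for the schema's Measurement type.
    value: float
    unit: Optional[str] = None


def measurement_value(
    raw: Union[Measurement, dict, int, float, None],
) -> Optional[float]:
    """Return a plain float from any shape a measurement field may arrive in."""
    if raw is None:
        return None
    if isinstance(raw, Measurement):
        return float(raw.value)
    if isinstance(raw, dict):
        # Raw dict left behind by deserialisation; the "value" key is assumed.
        value = raw.get("value")
        return None if value is None else float(value)
    if isinstance(raw, bool):
        # bool is a subclass of int; treat it as not-a-measurement.
        return None
    return float(raw)
```

With this shape, `measurement_value({"value": 2, "unit": "m"})` and `measurement_value(Measurement(2.0))` both yield `2.0`, so downstream features need not care which form the deserialiser produced.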
Results at N=25,000 2026 RdSAP certs:
sap_score MAPE=0.043 sMAPE=0.036 R^2=0.891
co2_emissions sMAPE=0.106 R^2=0.929
peui_raw MAPE=0.087 sMAPE=0.084 R^2=0.860
peui_ucl MAPE=0.079 sMAPE=0.076 R^2=0.866
space_heating_kwh MAPE=0.112 sMAPE=0.108 R^2=0.947
hot_water_kwh MAPE=0.071 sMAPE=0.069 R^2=0.854 (+0.082 R^2 vs 15b)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:13:03 +00:00
Khalim Conn-Kowlessar
9f6f7608b9
slice 15b: +18 features — heating type code, hot water, windows, flat, supply
...
Heating: primary_sap_main_heating_code (the SAP10 heating-system enum was the
single biggest missing input), primary_emitter_temperature,
primary_main_heating_fraction.
Hot water: immersion_heating_type, shower_outlet_count.
Windows: window_pct_living, window_pct_external, window_pct_permanent_shutters
(area-weighted shares parallel to existing window aggregates).
Dwelling: conservatory_type, has_heated_separate_conservatory.
Flat-only block (sap_flat_details): flat_level, flat_top_storey,
flat_storey_count, flat_location, flat_heat_loss_corridor (int sentinels
like '20+' coerce to None for the categorical features).
Energy supply: meter_type, pv_connection, wind_turbines_terrain_type.
Also plumbs `air_tightness` EnergyElement, `sap_flat_details` and
`has_heated_separate_conservatory` through the 21.0.1 mapper path (they were
silently None before).
Results at N=25,000 2026 RdSAP certs:
sap_score MAPE=0.044 sMAPE=0.038 R^2=0.884 (+0.045 R^2 vs 15a)
co2_emissions sMAPE=0.108 R^2=0.925
peui_raw MAPE=0.092 sMAPE=0.088 R^2=0.849
peui_ucl MAPE=0.081 sMAPE=0.078 R^2=0.860
space_heating_kwh MAPE=0.111 sMAPE=0.108 R^2=0.945
hot_water_kwh MAPE=0.081 sMAPE=0.079 R^2=0.772
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 00:08:11 +00:00
Khalim Conn-Kowlessar
0ffda529ec
slice 15a: add wall/floor/roof + demand scalar features for retrofit simulation
...
15 new features wired through schema -> domain -> mapper -> transform:
Main Dwelling fabric (11):
- wall_insulation_type, wall_insulation_thickness_mm, wall_dry_lined,
wall_thickness_mm, party_wall_construction
- roof_insulation_location, roof_insulation_thickness_mm
- floor_construction, floor_insulation, floor_insulation_thickness_mm,
floor_heat_loss
Dwelling-level scalars (4):
- multiple_glazed_proportion, number_baths, number_baths_wwhrs,
extract_fans_count
Thickness strings like '50mm'/'NI'/'ND' parsed via _parse_thickness_mm; NI
(no insulation) lands as 0mm so the model sees the physical zero rather than
a missing value. Categorical sentinels ('NA'/'NI'/'ND') become None.
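The parsing rule above could be sketched roughly as follows. This is not the actual `_parse_thickness_mm`; the regex, the function names, and the choice to map 'ND' and unparseable strings to None for thickness are assumptions:

```python
import re
from typing import Optional

# Accepts "50", "50mm", "50 mm", "12.5mm" (case-insensitive).
_MM_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:mm)?\s*$", re.IGNORECASE)

_SENTINELS = {"NA", "NI", "ND"}


def parse_thickness_mm(raw: Optional[str]) -> Optional[float]:
    """Parse a thickness string into millimetres."""
    if raw is None:
        return None
    text = raw.strip().upper()
    if text == "NI":
        # No insulation is a physical zero, not a missing value.
        return 0.0
    if text in _SENTINELS or not text:
        # Assumed: remaining sentinels mean "unknown".
        return None
    match = _MM_PATTERN.match(text)
    return float(match.group(1)) if match else None


def clean_categorical(raw: Optional[str]) -> Optional[str]:
    """Map categorical sentinels ('NA'/'NI'/'ND') to None."""
    if raw is None or raw.strip().upper() in _SENTINELS:
        return None
    return raw
```

The asymmetry is deliberate: 'NI' carries information for a numeric thickness (zero millimetres) but none for a categorical field, where it is just another missing marker.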
Also fixed long-standing typo `multiple_glazed_propertion` -> `_proportion`
in domain dataclass + its lone DB-model usage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:08:27 +00:00
Khalim Conn-Kowlessar
c496f345f8
slice 14l: bigger-run fixes — UCL guard, PV Measurement coercion, sMAPE
...
Three changes surfaced by the 25k 2026 run:
- transform._peui_ucl returns None for non-positive raw PEUI (net-exporters).
apply_ucl_correction would otherwise raise ValueError on negative input.
- PhotovoltaicArray scalars (peak_power, pitch, orientation, overshading)
now accept Measurement | int | float in the schema; mapper coerces via
_measurement_value.
- train_baseline reports sMAPE alongside MAPE — handles zero-actual rows
(e.g. co2_emissions for net-zero certs) where MAPE explodes.
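To illustrate why sMAPE is reported next to MAPE, here is a small sketch using one common sMAPE definition, mean of 2|y - ŷ| / (|y| + |ŷ|). The exact variant used in train_baseline is not stated here, so treat the formulas and function names as assumptions:

```python
import sys

EPS = sys.float_info.epsilon


def mape(actual, predicted):
    """Mean absolute percentage error; a zero actual is clamped to EPS."""
    terms = [abs(a - p) / max(abs(a), EPS) for a, p in zip(actual, predicted)]
    return sum(terms) / len(terms) if terms else float("nan")


def smape(actual, predicted):
    """Symmetric MAPE; each term is bounded to the range [0, 2]."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Both values zero means a perfect prediction for that row.
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - p) / denom)
    return sum(terms) / len(terms) if terms else float("nan")


# One net-zero cert (actual 0.0) predicted as 0.1, one exact prediction.
actual = [0.0, 2.0]
predicted = [0.1, 2.0]
print(mape(actual, predicted))   # dominated by the zero-actual row
print(smape(actual, predicted))  # stays bounded
```

A single zero-actual row is enough to make the MAPE mean meaningless, while the symmetric form caps that row's contribution at 2.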
Results at N=25,000 RdSAP 2026 certs (~32s end-to-end):
sap_score MAPE=0.064 sMAPE=0.054 R^2=0.762
co2_emissions sMAPE=0.140 R^2=0.890
peui_raw MAPE=0.126 sMAPE=0.120 R^2=0.714
peui_ucl MAPE=0.114 sMAPE=0.108 R^2=0.736
space_heating_kwh MAPE=0.167 sMAPE=0.157 R^2=0.915
hot_water_kwh MAPE=0.089 sMAPE=0.086 R^2=0.737
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:15:37 +00:00
Khalim Conn-Kowlessar
8fddd25b9a
slice 14k: E2E pipeline runs on real 2026 RdSAP certs
...
Two production fixes surfaced by the live run:
- mapper.from_rdsap_schema_21_0_1 now sets the three ML target scalars
(energy_rating_current, co2_emissions_current, energy_consumption_current).
They were silently None for every cert before, leaving the only labels as
the kWh fields from renewable_heat_incentive.
- train_baseline coerces object-dtype columns to numeric (None -> NaN) and
drops rows with null target per fit, so LightGBM accepts the frame.
E2E on 500 real certs (~1s):
sap_score R^2=0.604 MAPE=0.084
co2_emissions R^2=0.813 MAPE=0.130
peui_raw R^2=0.979 MAPE=0.026
space_heating_kwh R^2=0.823 MAPE=0.213
hot_water_kwh R^2=0.519 MAPE=0.115
peui_ucl excluded: UCL correction still needs wiring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:47:41 +00:00
Khalim Conn-Kowlessar
6697a6c76e
slice 14j: Optional sweep across schema 21.0.1 + mapper guards
...
Across 500 real RdSAP-21.0.1 certs from 2026, mapper goes 0% -> 100% success.
Schema-loading + ml-transform + ml_training_data: 146 tests pass.
Mainly affected fields:
- SapHeating: instantaneous_wwhrs, shower_outlets (now Union with List shape)
- SapWindow: glazing_gap, frame_factor, pvc_frame, window_transmission_details
- SapEnergySource: pv_battery_count, wind_turbine_details, pv_batteries (List form)
- SapBuildingPart: all 13 sub-fields now Optional
- SapFloorDimension: Measurement | int | float fallback
- RdSapSchema21_0_1: 16 top-level fields (mechanical_vent_*, lighting counts, ...)
Mapper helpers added: _measurement_value, _first_pv_battery, _first_shower_outlet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:35:28 +00:00
Khalim Conn-Kowlessar
b050348927
slice 10.5: PhotovoltaicArray on SAP10 schema + EpcPropertyData
...
SAP10 EPCs with measured PV carry photovoltaic_supply as a nested
list of arrays (peak_power, pitch, orientation, overshading) rather
than the legacy unmeasured wrapper {none_or_no_details:
{percent_roof_area: N}}. The schema-21 dataclasses now accept both
shapes via Union[PhotovoltaicSupply, List[List[PhotovoltaicArray]]],
and from_dict._coerce now dispatches list values onto list type
variants of multi-type Unions.
EpcPropertyData.SapEnergySource gains
photovoltaic_arrays: Optional[List[PhotovoltaicArray]] — populated
when the measured shape is present, otherwise None. The legacy
photovoltaic_supply field is preserved for the fallback case.
Both schema-21.0.0 and 21.0.1 mappers dispatch via the new
_map_schema_21_pv helper.
Unblocks Slice 11 (PV feature aggregation in EpcMlTransform).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:00:25 +00:00
Khalim Conn-Kowlessar
dba254e316
slice 8a: window physics and orientation aggregates
...
Thirteen window-aggregate features land on the transform: count,
total area, eight SAP-octant area columns (N/NE/E/SE/S/SW/W/NW),
area-weighted draught-proofing pct, and area-weighted u_value +
solar transmittance (nullable, populated only when windows carry
transmission_details). Windows with orientation outside 1-8 (0,
NR) contribute to count and total area but no octant.
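A rough sketch of the octant bucketing described above. The feature names, the (orientation, area) tuple input and the code-to-octant mapping (1 = N through 8 = NW) are assumptions for illustration, not the transform's real interface:

```python
from typing import Dict, Iterable, Tuple, Union

# Assumed mapping: orientation codes 1..8 run clockwise from north.
OCTANTS = ("n", "ne", "e", "se", "s", "sw", "w", "nw")


def window_octant_areas(
    windows: Iterable[Tuple[Union[int, str, None], float]],
) -> Dict[str, float]:
    """Aggregate (orientation, area_m2) pairs into count, total and octant areas."""
    features: Dict[str, float] = {"window_count": 0, "window_total_area_m2": 0.0}
    features.update({f"window_area_{o}_m2": 0.0 for o in OCTANTS})
    for orientation, area in windows:
        # Every window counts towards the dwelling-level totals.
        features["window_count"] += 1
        features["window_total_area_m2"] += area
        # Only codes 1..8 land in an octant; 0 and 'NR' are skipped.
        if isinstance(orientation, int) and 1 <= orientation <= 8:
            features[f"window_area_{OCTANTS[orientation - 1]}_m2"] += area
    return features
```

One consequence of this rule is that the eight octant areas can sum to less than the total area, which lets a model see how much glazing has unknown orientation.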
Also: epc codes CSV (gov api /api/codes export, RdSAP-Schema-21.x +
older versions) moved next to EpcPropertyData as epc_codes.csv —
canonical SAP enum source for upcoming categorical-share slices.
.gitignore exception added so the reference CSV is tracked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 15:32:45 +00:00
Khalim Conn-Kowlessar
81f6163295
added ucl corrected peui
2026-05-16 14:39:24 +00:00
Khalim Conn-Kowlessar
a64e7e74c5
adding kWh fields to EpcPropertyData and testing to_row
2026-05-16 14:33:25 +00:00
Jun-te Kim
dfc100f78b
rank address similarity
2026-05-12 16:02:01 +00:00
Jun-te Kim
27f2ef5e83
get rid of duplicate function and use a more sensible variable name
2026-05-12 13:46:02 +00:00
Jun-te Kim
e06ead55d0
add more type hints
2026-05-12 09:48:21 +00:00
Jun-te Kim
6504785e7c
merged from main
2026-05-11 12:30:29 +00:00
Jun-te Kim
bf91722f30
renamed a function to be self commenting
2026-05-11 08:45:26 +00:00
Jun-te Kim
fb758b76bf
changed to utils
2026-05-11 08:37:44 +00:00
Jun-te Kim
c9c43f178c
demo generated for use in address2uprn
2026-05-08 14:48:15 +00:00
Jun-te Kim
7a49f5df20
save plan temporarily while I incorporate skills into Claude
2026-05-08 12:19:03 +00:00
Jun-te Kim
a39c3a0772
added historic epc data class with shape
2026-05-08 12:03:35 +00:00
Daniel Roth
78da2f88b6
Handle wall thickness "Unmeasurable" 🟩
2026-04-30 16:41:16 +00:00
Khalim Conn-Kowlessar
001e9ce882
remove inline import
2026-04-28 12:03:39 +00:00
Khalim Conn-Kowlessar
821a0a08f7
addressing feedback on from_api_response
2026-04-28 12:02:34 +00:00
Daniel Roth
51bd18e0d7
Rename window frame material column 🟩
2026-04-27 16:11:32 +00:00
Daniel Roth
268576e345
map elmhurst window transmission details to epc property data class 🟩
2026-04-27 14:16:59 +00:00
Daniel Roth
5940977bb2
tweak window transmission data source type
2026-04-27 14:11:33 +00:00
Daniel Roth
865ee3eada
map elmhurst energy fields to epc property data class 🟩
2026-04-27 12:16:26 +00:00
Daniel Roth
6cc73b6ebf
remove unused import
2026-04-27 11:06:58 +00:00
Khalim Conn-Kowlessar
3ed25030d4
added new api call for new epc api
2026-04-25 22:17:38 +00:00
Daniel Roth
b36c8b884c
map remaining Elmhurst fields to EpcPropertyData 🟩
2026-04-24 15:33:59 +00:00
Daniel Roth
1105491141
Map Elmhurst site notes to EpcPropertyData 🟩
2026-04-24 13:52:02 +00:00
Daniel Roth
5aa86a8610
address fields are not optional
2026-04-24 10:22:41 +00:00
Daniel Roth
691d6f04a8
extend EpcPropertyData domain model with site-notes-only fields 🟩
2026-04-23 14:50:41 +00:00
Daniel Roth
308abd359c
extend EpcPropertyData domain model with site-notes-only fields 🟥
2026-04-23 14:39:04 +00:00
Daniel Roth
b5b6e4d358
New properties fully mapped 🟥
2026-04-23 14:36:09 +00:00
Daniel Roth
542034280e
Map misc top-level properties 🟥
2026-04-23 14:34:20 +00:00
Daniel Roth
bf31a35e6d
Map heating boiler properties 🟥
2026-04-23 14:31:10 +00:00
Daniel Roth
d323fe3277
Map floor construction properties 🟥
2026-04-23 14:30:20 +00:00
Daniel Roth
163e87920c
Map ventilation properties 🟥
2026-04-23 14:29:35 +00:00
Daniel Roth
e272a39380
additional fields mapped from pdf 4 🟩
2026-04-23 10:43:57 +00:00
Daniel Roth
c5c3f3fc83
additional fields mapped from pdf 4 🟥
2026-04-23 10:36:15 +00:00
Daniel Roth
448b42196e
map electric storage heater fuel type 🟩
2026-04-21 15:36:32 +00:00
Daniel Roth
44b6a6a0db
map pv connection 🟩
2026-04-21 15:32:00 +00:00
Daniel Roth
5bd4e22886
map pv connection 🟥
2026-04-21 15:31:01 +00:00
Daniel Roth
6035c90bcb
map heating immersion type 🟩
2026-04-21 15:12:57 +00:00
Daniel Roth
a5c9ff09f7
map secondary heating system to secondary_heating_type 🟩
2026-04-21 11:51:45 +00:00
Daniel Roth
f43a4d20eb
map secondary heating system to secondary_heating_type 🟥
2026-04-21 11:51:14 +00:00
Daniel Roth
4410b6f3e9
Fix broken mapper test
2026-04-21 11:28:23 +00:00
Daniel Roth
c8ecc82a04
map water heating cylinder size 🟩
2026-04-21 11:14:05 +00:00
Daniel Roth
c558b79b68
map water heating cylinder size 🟥
2026-04-21 11:13:04 +00:00
Daniel Roth
b64c9d275f
Rename cylinder_insulation_thickness to cylinder_insulation_thickness_mm
2026-04-21 11:06:21 +00:00