Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-08-03 05:18:22 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	29c776bb23	slice S-B6: glazing g_perpendicular + frame_factor lookups (Tables 6b/6c) Replaces the two hardcoded glazing defaults (g⊥=0.63, FF=0.7) in the cert→inputs mapper with spec-driven lookups: - g_perpendicular by glazing_type (Table 6b): single → 0.85, double 2002+ → 0.72, low-E soft → 0.63, secondary → 0.76, triple → 0.68. Default 0.72 when missing. - frame_factor by frame_material (Table 6c): wood/PVC/composite → 0.70, aluminium/steel/metal → 0.83. Measured values from window_transmission_details / SapWindow.frame_factor still take precedence. Overshading factor stays at 0.77 ("average") since RdSAP 10 doesn't lodge a per-window overshading code. 100-cert parity probe: MAE 5.65 → 5.70 (flat) exact-match within ±1: 18% → 20% bias +1.13 → +1.50 Slight bias drift toward over-prediction is expected — bigger solar gains reduce predicted heating demand. Net: the engine is now more spec-correct (more exact matches), but composition of errors elsewhere needs the next slice to bring bias back toward 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 14:48:11 +00:00
Khalim Conn-Kowlessar	f3baa51a9b	slice S-B5: main_heating_control code → SAP control type Maps the Table 9 main_heating_control code to SAP control type 1/2/3: codes 2101-2104 = type 1, 2105-2109 = type 2, 2110+ = type 3. Default remains type 2 when code is missing or unrecognised. Two other fixes tried-and-reverted in this slice based on the 100-cert parity probe: - NI-thickness → None (the "wall insulated but thickness unknown, use 50mm row" path): over-corrected in aggregate because many "NI" certs are genuinely uninsulated. Reverted to legacy NI→0 with a note to revisit once wall_insulation_type is used as a stronger signal. - boiler-age efficiency rescue (cat 1/2, A-F → 0.74, K-M → 0.85): same issue. Stacked with the NI fix it over-shot; on its own it gave only a marginal MAE change without bias improvement. Dropped pending further investigation. 100-cert parity probe: MAE 5.72 → 5.65 (-0.07; control-type-only is a small net win) RMSE 7.58 → 7.48 (-0.10) bias +1.20 → +1.13 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 14:37:44 +00:00
Khalim Conn-Kowlessar	8e1d30c97d	slice S-B4: per-end-use fuel cost (Economy-7 for electric storage) Splits the single CalculatorInputs.fuel_unit_cost_gbp_per_kwh into three end-use lines (space_heating, hot_water, other) to match SAP 10.3 §12, which charges different tariffs per end-use on Economy-7 dwellings. cert→inputs rule: when sap_main_heating_code is in the electric-storage (401-409), high-heat-retention storage (421-425), or direct-electric (191-196) ranges, space heating bills at the 7h-low rate (5.5p/kWh) while hot water + lighting + pumps stay on standard electricity (13.19p/kWh). All other fuels use a single rate across all three end-uses. 100-cert parity probe impact: MAE 7.53 → 5.72 (-1.81, -24%) RMSE 11.60 → 7.58 (-4.02, -35%) worst residual -56 → -25 (Semi-detached bungalow) within ±10: 85% → 91% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 14:18:56 +00:00
Khalim Conn-Kowlessar	ccdaba5acd	slice S-B3: flat heat-loss surface awareness DwellingExposure flags on heat_transmission_from_cert suppress the floor and/or roof channels when those surfaces are party with a neighbouring dwelling. Cert mapper derives the flags from EpcPropertyData.dwelling_type prefix: - "Mid-floor " → floor=False, roof=False - "Top-floor " → floor=False, roof=True - "Ground-floor *" → floor=True, roof=False - everything else → both exposed 100-cert parity probe impact: MAE 8.41 → 7.53 (-0.88) RMSE 13.98 → 11.60 (-2.38) bias -2.65 → -0.61 (system bias on flats essentially eliminated) Bungalow outliers (-56 worst residual) untouched — different failure mode (full envelope, but cascade U-values too conservative or storey count over-counted). Next slice tackles that. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 14:10:45 +00:00
Khalim Conn-Kowlessar	dde8ae30fa	S-B2: parity probe + first-pass findings (100-cert baseline) Adds services/ml_training_data/src/ml_training_data/sap_parity_probe.py, which samples N certs from the v18a corpus, streams them via BulkZipReader, runs Sap10Calculator, and prints MAE/RMSE/bias + worst-N residuals. Baseline across 100 certs: MAE 8.41, RMSE 13.98, bias -2.65, 0 errors. docs/sap-spec/PARITY_FINDINGS.md captures the dominant failure pattern (flats + bungalows under-predicted, 10 of the worst-15 are flats whose floor/roof are party with neighbouring dwellings) and the priority-ordered Session B iteration backlog (S-B-flat-surfaces first). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 13:59:23 +00:00
Khalim Conn-Kowlessar	57f18a8773	slice S-B1: parity-validation report aggregator Pure-function ParityCase / ParityReport / build_parity_report for the Session B 1000-cert parity check (ADR-0009). Aggregates per-cert (predicted, actual) SAP pairs into global + typical-subset MAE, RMSE, bias, and the worst-N residuals for spec-iteration. Cert→case mapping (corpus load, calculator run, actual-SAP lookup) sits at a higher layer; keeping this module pure makes it trivial to test, so the harder integration code builds on an already-tested aggregator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 13:22:45 +00:00
Khalim Conn-Kowlessar	a243055de7	slice S-A7b: RdSAP cert→inputs mapper + Sap10Calculator.calculate(epc) Adds domain.sap.rdsap.cert_to_inputs.cert_to_inputs(epc) which produces a typed CalculatorInputs from an EpcPropertyData, and a thin Sap10Calculator.calculate(epc) entry point that wraps the mapper + the S-A7a orchestrator. Defaults follow RdSAP 10 (Table 27 for living-area fraction, Table 5 for ventilation, Table 12 for fuel cost + CO2 factor) and SAP 10.3 Tables 4a/4b for heating efficiency via the existing domain.ml.sap_efficiencies cascade. Deferred to Session B: conservatory modes, room-in-roof, secondary heating split (Table 11), multi-fuel weighted cost, thermal-mass parameter from construction type, control-temp adjustment from main_heating_control code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 09:34:41 +00:00
Khalim Conn-Kowlessar	684e2945ae	slice S-A7a: Sap10Calculator orchestrator (synthetic-input) Wires SAP 10.3 §§5-13 into a 12-month heat-balance loop driven by a typed CalculatorInputs aggregate, returning a typed SapResult with the score, ECF, costs/CO2 totals, and a 12-entry monthly breakdown. Physics assembly only — the cert→inputs mapper lands in S-A7b. η/T_internal solved with two-pass iteration per SAP 10.3 §7.3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 09:27:28 +00:00
Khalim Conn-Kowlessar	9106621aee	slice S-A6: SAP10.3 rating + EI rating formulas (§13 + §14) Tenth slice of the SAP10 Calculator Session A (ADR-0009). Ships four pure functions under domain.sap.worksheet.rating implementing the SAP 10.3 rating formulas: energy_cost_factor(total_cost_gbp, total_floor_area_m2) -> equation (7): ECF = 0.36 × cost / (TFA + 45) Deflator 0.36 sourced from Table 12 (page 191). sap_rating(ecf) -> equations (8)/(9), continuous (un-rounded) SAP value: ECF ≥ 3.5: 108.8 − 120.5 × log10(ECF) ECF < 3.5: 100 − 16.21 × ECF Naturally rises above 100 for net energy exporters (negative ECF). sap_rating_integer(ecf) -> integer SAP value as published on the EPC: round to nearest, clamp to minimum 1 per §13. environmental_impact_rating(co2_emissions_kg_per_yr, total_floor_area_m2) -> equations (10)-(12), continuous EI rating: CF = CO2 / (TFA + 45) CF ≥ 28.3: 200 − 95 × log10(CF) CF < 28.3: 100 − 1.34 × CF 8 AAA cycles cover: ECF formula hand-computed, SAP linear branch (typical home), SAP log branch (high cost), boundary continuity at ECF=3.5, net-exporter SAP > 100, integer rounding + min-1 clamp, EI linear branch, EI log branch. Orchestrator (S-A7) wires these into Sap10Calculator alongside the monthly heat balance loop from S-A5e.	2026-05-18 09:12:25 +00:00
Khalim Conn-Kowlessar	c0afe3592f	slice S-A5e: monthly space-heating requirement (SAP 10.3 Table 9c step 10) Ninth slice of the SAP10 Calculator Session A (ADR-0009). Ships monthly_heat_requirement_kwh implementing the Table 9c step-10 formula: L_m = H × (T_i,m − T_e,m) (W) Q_heat,m = 0.024 × (L_m − η_m × G_m) × n_m (kWh) with the table's clamp: Q_heat is set to 0 when negative or below 1 kWh per month (summer months and well-insulated dwellings in shoulder months). The orchestrator (S-A6) iterates utilisation factor + mean internal temperature until they converge before calling this function. 5 AAA cycles cover: typical-winter-month hand-computed worked example, summer month with gains exceeding losses clamping to 0, gains-scaling direction check, external-temperature direction check, and the sub-1-kWh clamp per the Table 9c note.	2026-05-18 09:00:05 +00:00
Khalim Conn-Kowlessar	8c21b399c6	slice S-A5d: mean internal temperature (SAP 10.3 Tables 9 + 9b + 9c) Eighth slice of the SAP10 Calculator Session A (ADR-0009). Implements SAP 10.3 mean internal temperature with three public helpers under domain.sap.worksheet.mean_internal_temperature: elsewhere_heating_temperature_c(hlp, control_type) -> Table 9 T_h2 formula: control type 1: T_h2 = 21 − 0.5 × HLP control type 2 or 3: T_h2 = 21 − HLP + HLP² / 12 HLP clamped to 6.0 per Table 9 note (e). off_period_temperature_reduction_c(t_off, T_h, T_e, R, G, H, η, τ) -> Table 9b u value (°C drop below T_h over an off-period): t_c = 4 + 0.25·τ T_sc = (1−R)(T_h−2) + R·(T_e + η·G/H) quadratic branch when t_off ≤ t_c, linear when t_off > t_c. mean_internal_temperature_c(...) -> Table 9c steps 1-8: living-area zone (off 7+8 h, T_h1=21°C) and elsewhere zone (off 7+8 h for control 1/2 or 9+8 h for control 3, T_h2 from above), blended by living_area_fraction, plus the Table 4e control-type temperature adjustment. Step 9 (re-compute utilisation factor with the new T_i) and step 10 (Q_heat = 0.024 × (L − η·G) × n_m) live in the next slice's monthly loop. 7 AAA cycles cover: T_h2 formulas for control types 1 vs 2, HLP > 6 clamp per note (e), off-period u quadratic branch (t_off ≤ t_c), off-period u linear branch (t_off > t_c), full mean_internal_temperature hand-computed worked example, and control-type-3 longer first off-period dropping mean temp slightly below control-type-2.	2026-05-18 08:52:11 +00:00
Khalim Conn-Kowlessar	e403e2302c	slice S-A5c: heating utilisation factor η (SAP 10.3 Table 9a) Seventh slice of the SAP10 Calculator Session A (ADR-0009). Ships utilisation_factor(*, total_gains_w, heat_loss_rate_w, time_constant_h) implementing SAP 10.3 Table 9a: a = 1 + τ / 15 γ = G / L if γ > 0 and γ ≠ 1: η = (1 − γ^a) / (1 − γ^(a+1)) if γ = 1: η = a / (a + 1) if heat_loss_rate ≤ 0: η = 1 (dwelling in net surplus) η caps the contribution of internal + solar gains when they outpace the heat-loss rate. The orchestrator computes time_constant_h = TMP / (3.6 × HLP) and passes it in here; that's a future slice. 5 AAA cycles cover: small γ → η ≈ 1, γ = 1 special-case formula, zero/negative heat loss returning η = 1, large γ dropping η well below 0.5, and higher τ (more thermal mass) raising η for the same γ.	2026-05-18 08:38:03 +00:00
Khalim Conn-Kowlessar	57bf7833a9	slice S-A5b: solar gains (SAP 10.3 §6 + Appendix U §U3.2) Sixth slice of the SAP10 Calculator Session A (ADR-0009). Two layers under domain.sap.worksheet.solar_gains: 1. surface_solar_flux_w_per_m2(orientation, pitch_deg, region, month) — implements Appendix U §U3.2 polynomial that converts the horizontal solar irradiance from Table U3 to per-orientation per-pitch surface flux: S(orient, p, m) = S_h,m × R_h-inc R_h-inc = A cos²(φ-δ) + B cos(φ-δ) + C where A, B, C are cubics in sin(p/2) with coefficients k1-k9 from Table U5. Reads latitude φ from Table U4 and solar declination δ from Table U3 footer (already in domain.sap.climate.appendix_u). 2. window_solar_gain_w(area_m2, surface_flux, g⊥, FF, Z) — implements §6.1 equation (5): G = 0.9 × A × S × g⊥ × FF × Z. Orientation enum maps the 8 SAP cardinal codes to the 5 Table U5 columns: N/S to their own column; NE/NW share; E/W share; SE/SW share. 7 AAA cycles cover: UK average South vertical July hand-computed flux, rooflight pitch=0 collapses to horizontal Table U3 directly, North-vertical summer > winter (diffuse signal), NE/NW share constants symmetry, equation (5) window gain, zero-area edge case, out-of-range region validation. Tables 6b (g⊥), 6c (frame factor), 6d (overshading Z) defaults deferred to the cert→inputs mapper slice — callers pass them explicitly here so the physics stays cert-shape-independent.	2026-05-17 22:59:25 +00:00
Khalim Conn-Kowlessar	c317a72b71	slice S-A5a: internal gains (SAP 10.3 §5 + Appendix L) Fifth slice of the SAP10 Calculator Session A (ADR-0009). Ships internal_gains_w(*, total_floor_area_m2, month, occupancy=None) returning an InternalGainsBreakdown over four named SAP 10.3 components: metabolic_w: 60 W × N (SAP convention; constant year-round); cooking_w: 35 + 7N per Appendix L equation (L18); appliances_w: Appendix L (L13) E_A = 207.8 × (TFA × N)^0.4714 with the (L14) monthly cosine variation, converted to watts via (L16a); lighting_w: Appendix L existing-dwelling fallback chain (L5b, L8c, L9c-d, L10, L12). Default efficacy 21.3 lm/W, no daylight bonus, 85% internal fraction. Occupancy defaults via Appendix J Table 1b when not supplied: N = 1 + 1.76 × (1 - exp(-0.000349 × (TFA - 13.9)²)) + 0.0013 × (TFA - 13.9) for TFA > 13.9 m², else N = 1. Daylight-factor and occupancy overrides remain the caller's responsibility for later slices (solar_gains will populate G_L; the cert-to-inputs mapper will choose between the RdSAP default and explicit assessor input). 8 AAA cycles cover: cooking constant, metabolic 60 W per occupant, Appendix J occupancy default for typical and tiny TFA, appliances monthly variation, lighting existing-dwelling fallback, total = sum, month-range validation.	2026-05-17 22:42:20 +00:00
Khalim Conn-Kowlessar	732eef6adb	slice S-A4: heat-transmission HLC breakdown (SAP 10.3 §3) Fourth slice of the SAP10 Calculator Session A (ADR-0009). Ports the per-element conduction HLC logic out of domain.ml.envelope into a typed HeatTransmission breakdown under domain.sap.worksheet. Aggregates Σ U×A across walls, roof, floor, party walls, windows, doors, plus thermal- bridging y × total exposed area, summed across every building part. The orchestrator can now read walls_w_per_k / roof_w_per_k / floor_w_per_k etc. directly off the result for audit + monthly-loop wiring, rather than seeing a single envelope_heat_loss scalar. U-value cascade still routes through domain.ml.rdsap_uvalues (migrates to domain.sap.rdsap.cascade_defaults in Session B per ADR-0009 module-layout plan). domain.ml.envelope stays in place to keep the ML transform's physics-feature pipeline running until Session B. 6 AAA cycles cover: per-element breakdown for a baseline age-G cavity mid-terrace, window net-wall subtraction, insulated-door U-value blending, cavity-party-wall contribution per Table 15, thermal-bridging scaling by age band per Table 21, and multi-part (main + extension) aggregation. 192 tests pass across domain.sap + domain.ml — no regressions.	2026-05-17 22:30:56 +00:00
Khalim Conn-Kowlessar	3fcec7ef22	slice S-A3: infiltration worksheet lines (6a)-(16) (SAP 10.3 §2) Third slice of the SAP10 Calculator Session A (ADR-0009). Ports the SAP 10.2 / RdSAP10 §4.1 air-change-rate worksheet for the no-pressure-test path. Returns an InfiltrationBreakdown carrying each named worksheet line so callers can audit per SAP convention: (8) openings_ach — Table 2.1 rate × count / volume (10) additional_ach — (storey_count − 1) × 0.1 (11) structural_ach — 0.25 steel/timber-frame, 0.35 masonry (12) floor_ach — 0.2 unsealed timber / 0.1 sealed / 0 (13) draught_lobby_ach — 0.05 absent, 0.0 present (15) window_ach — 0.25 − 0.2 × (pct_dp / 100) (16) total_ach — sum of all of the above Table 2.1 rates: open chimney 80, open flue 20, closed-fire chimney 10, solid-fuel-boiler chimney 20, other-heater chimney 35, blocked chimney 20, intermittent fan 10, passive vent 10, flueless gas fire 40 (all m³/hour per opening). 9 AAA cycles cover the baseline calculation, each Table 2.1 opening contribution, frame-vs-masonry structural baseline, suspended-timber floor sealed/unsealed split, draught-lobby presence, window draught- proofing scale, multi-opening aggregation, and volume_m3 ≤ 0 validation. Pressure-test override (worksheet lines 17-21) and mechanical-ventilation adjustments (Table 4g, n_eff formula §2.6.6) are out of scope for this slice — separate later slices per ADR-0009.	2026-05-17 22:00:10 +00:00
Khalim Conn-Kowlessar	fa5bdcc26f	slice S-A2: dimensions module (SAP 10.3 §1) Second slice of the SAP10 Calculator Session A (ADR-0009). Ships a frozen Dimensions dataclass + dimensions_from_cert(epc) pure function under domain/sap/worksheet/. Aggregates geometry across every sap_building_parts entry (main dwelling + each extension): total floor area, volume, storey count, area-weighted average storey height, ground/top floor area, ground-floor heat-loss perimeter, gross wall area, party wall area. Top-level epc.total_floor_area_m2 is the authoritative TFA; per-storey sums drive the wall-area calculations. Volume = TFA × avg_storey_height. 5 AAA cycles cover: single-storey single-part, two-storey scaling, main+extension aggregation, empty-cert fallback to default 2.5 m height, and a non-default-height terrace exercising party-wall scaling. Edge cases (porches, conservatories, integral garages, RIR storey treatment) deferred to later slices per ADR-0009 Session A scope.	2026-05-17 21:49:29 +00:00
Khalim Conn-Kowlessar	2661481625	slice S-A1: Appendix U climate tables (U1/U2/U3) First slice of the SAP10 Calculator Session A (ADR-0009). Ships the three SAP 10.3 Appendix U monthly tables across 22 climate regions (region 0 = UK average; 1-21 named per spec) as a pure-data module under the new domain/sap/ package: - Table U1: mean external temperature (°C) - Table U2: wind speed (m/s) - Table U3: mean global solar irradiance on horizontal plane (W/m²) - Table U3 footer: monthly solar declination (°, region-independent) Lookups validate region (0..21) and month (1..12) and raise ValueError on out-of-range inputs. 11 AAA tests cover happy-path lookups across multiple regions/months plus boundary and error cases.	2026-05-17 21:43:09 +00:00
Khalim Conn-Kowlessar	8dbe873daf	ADR-0009: pivot to deterministic SAP 10.3 calculator (Accepted) Promotes ADR-0009 from Proposed to Accepted after the grill-with-docs session resolved all seven open questions. Bundles the SAP 10.3 and RdSAP 10 specifications under docs/sap-spec/ plus a calculator design sketch (module layout, monthly-loop pseudo-code, status table). CONTEXT.md adds three new domain terms parallel to existing performance language: - Calculated SAP10 Performance (parallel to Effective / Lodged) - SAP10 Calculation (process; implemented by Sap10Calculator) - Measure Application (process; implemented by MeasureApplicator) ML pipeline is NOT retired — it stays as the residual head once the calculator reaches parity in Session B. ADR-0009 §"Grill outcomes" carries the seven binding scope decisions plus three Session-A-scope changes discovered during the grill (RdSAP §19 EER formula, SAP 10.2 Appendix A cross-reference, RdSAP Table 29 cascade defaults).	2026-05-17 21:27:21 +00:00
Khalim Conn-Kowlessar	244f4555ac	slice 20a.1: route ventilation through predicted_space_heating_kwh (v2.7.1) v20a added ventilation_heat_loss_w_per_k as a standalone feature but never connected it to the HLC inside predicted_space_heating_kwh, so the downstream physics aggregates (predicted_ecf, predicted_total_fuel_cost, predicted_log10_ecf — the top-10 model features) never saw the infiltration signal. Importance for ventilation_heat_loss_w_per_k was rank 58/196 (importance 30) vs envelope's rank 21 (86). Adds the ventilation column to the envelope-conduction HLC before applying HDH and efficiency, so chimney + draught-proofing signals flow through the physics aggregates the model actually uses. Default 0 keeps backwards compatibility.	2026-05-17 18:48:57 +00:00
Khalim Conn-Kowlessar	4d838bb03c	slice 20a: ventilation_heat_loss_w_per_k feature (v2.7.0) Adds SAP10.2 §C tracer-bullet infiltration model as a new physics-as-feature column alongside envelope_heat_loss_w_per_k. ACH = structural baseline (0.35 masonry / 0.25 timber-or-system-built) + open chimneys at 40 m³/h each, minus a draught-proofing reduction scaled by window_pct_draught_proofed; the resulting rate is then combined with the dwelling volume and converted to W/K. Targets the d0 catastrophic-low-SAP tail where chimney + leakage signals dominate but envelope conduction alone under-counts heat loss. Scope deferred to follow-ups: MVHR/MEV factors (mechanical_ventilation is 100% null in the corpus), pressure-test override (pressure_test also 100% null - slice 18e mapper fix), open flues / passive vents / flueless gas fires (sap_ventilation sparsely populated).	2026-05-17 18:30:02 +00:00
Khalim Conn-Kowlessar	831ebac2ae	slice 18d: seasonal_efficiency category fallback for null SAP code (v2.6.0) Many real certs carry main_heating_category=4 (heat pump) but null sap_main_heating_code, so seasonal_efficiency() was returning the 0.80 gas-boiler default — a 3x COP under-count that dragged the high-SAP heat-pump tail. Adds main_heating_category + main_fuel_type fallbacks: cat=4 -> 2.30, cat=7 -> 1.00, cat=10 routes by fuel (electric=1.00, gas=0.55, oil=0.65), cat=5 warm air -> 0.76. Explicit SAP codes still win.	2026-05-17 18:13:47 +00:00
Khalim Conn-Kowlessar	d11d4df3df	slice 18c: description-aware u_wall material fallback (v2.5.0) When wall_construction integer is missing or WALL_UNKNOWN, u_wall now parses the top-level walls[i].description for material keywords (sandstone/limestone/granite/whinstone/cob/system built/timber frame/ solid brick/cavity) before falling through to the cavity-by-age default. Explicit construction codes still win. Threaded through envelope_heat_loss_w_per_k via a joined wall description string off the top-level walls list.	2026-05-17 17:55:09 +00:00
Khalim Conn-Kowlessar	60eea0f52b	slice 18b: description-aware u_roof for catastrophic roofs (v2.4.0) Table 18 age-band roof defaults assume joist insulation >= 100mm, which mis-rates heritage roofs the surveyor explicitly described as uninsulated. u_roof now reads roofs[i].description and routes "no insulation" / "uninsulated" -> 2.30 W/m^2K and "limited insulation" -> 1.50 W/m^2K, threaded through envelope_heat_loss_w_per_k via a single joined description string off the top-level roofs list. Explicit insulation_thickness_mm still wins over description.	2026-05-17 17:32:57 +00:00
Khalim Conn-Kowlessar	696d43112e	fix: translate gov EPC API fuel codes to SAP10.2 Table 32 (v2.3.0) predicted_total_fuel_cost_gbp was silently mispricing every non-gas property because primary_main_fuel_type / water_heating_fuel store the gov EPC API enum (26=mains gas, 27=LPG, 28=oil, 29=electricity) and our _FUEL_UNIT_PRICE dict is keyed by Table 32 codes (1=gas, 4=oil, 30=elec). Codes 26-29 hit the dict's default 3.48 p/kWh -- silently treating electric immersion as gas. Concrete impact on OX1 5LR Sep 2025 cert (worst-predicted SAP=41, model 84): water_heating_fuel=29 (electric immersion). Real DHW cost 2941 kWh * 13.19p = £388/yr; we computed 2941 * 3.48 = £102 (4x under). Net predicted_total_fuel_cost £292 vs implied real £2513 -- predicted_ecf 0.49 (~SAP 93) vs real ECF 4.24 (SAP 41). Effect: every off-gas property's predicted_ecf was systematically too low, dragging the model's catastrophic-low-SAP predictions toward mid-band. Expected to substantially reduce decile-0 bias on retrain. New _API_TO_TABLE32 map covers codes 0-29. 4 new AAA tests; VERSION 2.2.0 -> 2.3.0 (MINOR; behavioural fix to existing column values).	2026-05-17 17:02:21 +00:00
Khalim Conn-Kowlessar	4df1ee78b7	slice 17b: SAP Appendix J port for predicted_hot_water_kwh (v2.2.0) The 17a-baseline residuals showed cylinder_insulation_thickness_mm, cylinder_size and cylinder_insulation_type at ranks 3/6/9 for hot_water_kwh because the crude 16d formula didn't use them -- the model had to learn storage physics from raw features. Now predicted_hot_water_kwh sums: useful_demand (existing, unchanged) + distribution_loss = useful * 0.15 + storage_loss = volume * insulation_factor * 365 * 0.6 (volume from cylinder_size, factor from cylinder_insulation_thickness_mm or age-default) + primary_circuit_loss = 245 (age A-J) / 60 (age K-M) - wwhrs_credit = useful * 0.12 if number_baths_wwhrs > 0 - solar_hw_credit = 250 if solar_water_heating; the whole sum is then divided by efficiency_water to give delivered kWh. Same inputs we already extract; just plumbed through. Expected: predicted_hot_water_kwh feature usage jumps from rank 10 to top tier, hot_water_kwh MAPE drops from 7.17%, and predicted_ecf gets tighter for gas-heat + electric-DHW mid-band homes -> SAP MAPE marginally better. 5 new AAA tests; VERSION 2.1.0 -> 2.2.0 (MINOR; column semantics enriched).	2026-05-17 15:54:42 +00:00
Khalim Conn-Kowlessar	06ce3205b1	slice 17a: PV-export credit in predicted_total_fuel_cost (v2.1.0) Closes the high-SAP under-prediction gap diagnosed in 16h. 40% of SAP-85+ properties have PV; predicted_ecf was 1.74 mean at that band -> SAP ~88 via the formula, vs label SAP 90+. Inverse: PV homes had HIGHER predicted_ecf than non-PV at the same band because cost reconstruction had zero export credit. New helper: predicted_pv_generation_kwh(kWp, region) -> kWh/yr from a SAP10.2 Table 6e regional yield factor (UK avg 850 kWh/kWp/yr; Highland 650; Thames 920). predicted_total_fuel_cost_gbp now subtracts pv_kwh * standard electricity price (Table 32 code 30, both self-consumption and export at 13.19 p/kWh). New feature column predicted_pv_generation_kwh exposed alongside the adjusted cost so the model sees both signals. VERSION 2.0.0 -> 2.1.0 (MINOR: column added; existing column semantics shifted but pre-deploy so no consumer break).	2026-05-17 15:28:09 +00:00
Khalim Conn-Kowlessar	6072d8795a	slice 16i: MAE + RMSE in metrics; sample_weight_fn + low_sap_tail_weight train_baseline now returns mae + rmse alongside mape/smape/r2. MAE is the user-facing metric ("predicted SAP within N points"); RMSE the quadratic counterpart. Both come straight from sklearn. New sample_weight_fn parameter: callable(y_train) -> per-row weights. Threads into LGBMRegressor.fit's sample_weight argument. Default None preserves existing behaviour. Default tail strategy exposed as low_sap_tail_weight(y, threshold=58, weight=3): 3x weight where SAP < 58. Threshold picked from slice 16h's per-decile residuals — decile 0 (SAP 1-58) carries 17% MAPE vs <5% body. Three TDD tracers, all AAA.	2026-05-17 14:48:00 +00:00
Khalim Conn-Kowlessar	ece1279475	revert slice 16g: drop mape objective per 16h ablation 250k retrain showed objective='mape' loses ~0.6 percentage points of global sap_score MAPE (3.92% with regression vs 4.50% with mape) and ~0.7 pts on peui_ucl. The mape objective over-weights the low-SAP tail (weight ~1/y) and drags the body MAPE up by more than it gains in the tail. Body MAPE on v16 features is already strong (2.38% on deciles 1-8); the remaining tail bias at decile 0 (SAP<58, +3.1 bias) needs a different fix -- sample weights or stratified loss -- queued as slice 16i.	2026-05-17 14:34:04 +00:00
Khalim Conn-Kowlessar	05ef54bb02	restore transaction_type; keep tenure dropped (v2.0.0 stands) User reverted the transaction_type drop after noting that it doesn't help detect full-SAP assessments (that's `assessment_type` on the bulk-register record, filtered out at build_features.py:37). tenure removal stays; v2.0.0 still MAJOR (a column was removed).	2026-05-17 12:41:14 +00:00
Khalim Conn-Kowlessar	6aa3ddfbf4	drop tenure + transaction_type from features (v2.0.0) Neither field physically affects SAP rating; they're dataset-side metadata (owner-occupied vs rented, sale vs marketed) and any correlation with sap_score is confounded with age/condition that the model already sees through built_form / property_type / construction_age_band. Dropping reduces feature count and removes a source of spurious split-gain. MAJOR per ADR-0007 versioning policy (column removal): 1.0.0 -> 2.0.0.	2026-05-17 12:37:52 +00:00
Khalim Conn-Kowlessar	e8b6f19a3a	fix(16d): predicted_lighting_kwh handles None bulb counts EPC bulb-count fields are Optional[int]; 1k-cert sanity-check from slice 16h hit None + None TypeError. Coerce to 0 before sum.	2026-05-17 12:25:59 +00:00
Khalim Conn-Kowlessar	700ff4640c	slice 16g: LightGBM objective=mape for sap_score + peui_ucl Per ADR-0008: the v15 baseline reports MAPE but optimises MSE, which under-weights tail rows. Switching to objective='mape' applies gradient proportional to 1/|y| and lets the model focus where MAPE penalises. Targets co2_emissions, space_heating_kwh, hot_water_kwh, and peui_raw retain the default 'regression' objective (some rows have ~zero CO2 from heavy PV; MAPE objective destabilises near zero). Sample weights deferred to slice 16i if slice 16h's per-decile residuals still show tail bias after the objective switch.	2026-05-17 12:06:13 +00:00
Khalim Conn-Kowlessar	5c20e323da	slice 16f: rename secondary_dwelling_* -> extension_1_* (v1.0.0 MAJOR bump) 12 columns renamed; extension_2_* not added (88% null on 250k corpus; envelope_heat_loss_w_per_k already sums extension_2+ via part-iterator). ADR-0008. VERSION 0.4.0 -> 1.0.0 (MAJOR per ADR-0007 versioning policy). Coordinated cutover with AutoGluon repo + scoring lambda required at deploy time. features_v16.txt is regenerated from transform.schema() at write-parquet time (data/ml_training is gitignored; not committed).	2026-05-17 12:05:01 +00:00
Khalim Conn-Kowlessar	cda469dd7d	slice 16e: predicted_total_fuel_cost / predicted_ecf / predicted_log10_ecf ECF reconstruction per SAP10 §20.1 (Mid physics, ADR-0008): total_cost_gbp = (space_kwh × p_space + dhw_kwh × p_dhw + light_kwh × p_elec) / 100; ECF = 0.42 × total_cost / (TFA + 45); log10_ecf = log10(ECF) [0 for non-positive]. p_* are Table 32 unit prices via fuel_unit_price_p_per_kwh. Standing charges deliberately omitted (constant fuel-mix offset; ADR-0008). predicted_sap_score is NOT emitted as a feature (ADR-0008 Mid not Deep): the model is left to learn the piecewise log/linear transform from log10_ecf -> SAP itself, keeping the data layer SAP-version-agnostic. VERSION 0.3.0 -> 0.4.0 (MINOR).	2026-05-17 12:00:06 +00:00
Khalim Conn-Kowlessar	eee5421112	slice 16d: predicted_space/hot_water/lighting_kwh + seasonal-efficiency features New module domain.ml.demand emits crude annual demand approximations (ADR-0008 "crude annual"): predicted_space_heating_kwh = HLC * HDH_region * 1e-3 / efficiency_main predicted_hot_water_kwh = SAP10.2 J simplified (Vd, dT, +10% losses) predicted_lighting_kwh = 9.3 * TFA reduced by LED/CFL share HDH lookup covers SAP10.2's 22 regions; fallback UK avg = 53,000 K*h/yr. Plus two seasonal-efficiency features straight off the Table 4a/4b lookup from slice 16b (seasonal_efficiency_main_heating / seasonal_efficiency_water_heating). Wired into to_row; VERSION 0.2.0 -> 0.3.0 (MINOR).	2026-05-17 11:57:29 +00:00
Khalim Conn-Kowlessar	fca8815991	slice 16c: envelope_heat_loss_w_per_k feature New module domain.ml.envelope sums Σ(U×A) + y × A_exposed across every sap_building_part on a cert. U-values come from rdsap_uvalues' cascade defaults, so the feature is never null. Per-part inputs: wall / roof / floor / party-wall / windows / doors. Windows + doors are apportioned to the main part (first in the list) per RdSAP10 convention. Wired into EpcMlTransform.to_row; transform VERSION 0.1.0 -> 0.2.0 (MINOR bump for an additive column per the ADR-0007 policy). 7 envelope unit tests + 2 transform-level tests, all AAA. Reference geometry: 100 m^2 age-G mid-terrace -> ~208 W/K; doubles for two storeys; drops with better insulation; sums across extensions.	2026-05-17 11:53:43 +00:00
Khalim Conn-Kowlessar	67a4f92d53	slice 16b: sap_efficiencies.py with Table 4a/4b/32 lookups Encodes SAP10.2 Table 4a (heating-system code -> space-eff %), Table 4b (gas/oil boiler winter eff %), and Table 32 (fuel-code -> p/kWh). Helpers: - seasonal_efficiency(code) -> decimal; unknown -> 0.80 (gas-boiler typical) - water_heating_efficiency(water_code, main_code) -> decimal; codes 901/914 inherit the main code's efficiency - fuel_unit_price_p_per_kwh(fuel_code) -> p/kWh; unknown -> 3.48 (mains gas). All helpers are total: unknown codes fall back to a default instead of raising. Provides the seasonal-efficiency input to slice 16d and the price multipliers for slice 16e's cost reconstruction.	2026-05-17 11:45:40 +00:00
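A sketch of the three total lookup helpers. The tables are injected as parameters because the real Table 4a/32 values are not reproduced here, and water heating reuses the same table as space heating, which is a simplification (Table 4b winter efficiencies are not modelled). Only the 0.80 and 3.48 defaults and the 901/914 inheritance rule come from the commit.

```python
from typing import Mapping, Optional

DEFAULT_SEASONAL_EFFICIENCY = 0.80  # gas-boiler typical, per the commit
DEFAULT_PRICE_P_PER_KWH = 3.48      # mains gas, per the commit
INHERITS_MAIN_EFFICIENCY = frozenset({901, 914})


def seasonal_efficiency(code: Optional[int], table_pct: Mapping[int, float]) -> float:
    """Percentage from the table -> decimal; unknown or missing codes get the default."""
    pct = table_pct.get(code) if code is not None else None
    return pct / 100.0 if pct is not None else DEFAULT_SEASONAL_EFFICIENCY


def water_heating_efficiency(
    water_code: Optional[int], main_code: Optional[int], table_pct: Mapping[int, float]
) -> float:
    # Per the commit, codes 901/914 inherit the main heating code's efficiency.
    code = main_code if water_code in INHERITS_MAIN_EFFICIENCY else water_code
    return seasonal_efficiency(code, table_pct)


def fuel_unit_price_p_per_kwh(fuel_code: Optional[int], prices: Mapping[int, float]) -> float:
    """Unit price in p/kWh; unknown fuels are priced as mains gas."""
    price = prices.get(fuel_code) if fuel_code is not None else None
    return price if price is not None else DEFAULT_PRICE_P_PER_KWH
```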
Khalim Conn-Kowlessar	8bd8f8a622	slice 16a: rdsap_uvalues.py with cascade-defaulting U-value helpers Encodes RdSAP10 Tables 6-9 (walls), 15 (party walls), 16+18 (roofs), 19+BS EN ISO 13370 (floors), 20 (upper floors), 21 (thermal bridging), 24 (windows), 26 (doors). Helpers (u_wall / u_roof / u_floor / u_window / u_door / u_party_wall / thermal_bridging_y) cascade through cert -> age-band default -> country default -> mid-range fallback so the envelope-heat-loss feature is never null. Mirrors the RdSAP "assume as-built if no evidence" rule. Country.from_code collapses EAW/GB/UK/unknown to ENG; SCT/NIR/WAL get explicit K-M overrides where Tables 7-9 diverge from Table 6 (England). 28 tests, all AAA, cover the reference values and the cascade fallbacks.	2026-05-17 11:36:39 +00:00
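The cascade can be sketched as a first-non-None chain. Everything below is hypothetical: the real rdsap_uvalues helpers are not shown in this log, keying the age-band defaults by (country, age band) is an assumption, and the mid-range fallback is supplied by the caller rather than taken from Tables 6-9. Only the EAW/GB/UK/unknown -> ENG collapse is from the commit.

```python
from enum import Enum
from typing import Mapping, Optional, Tuple


class Country(Enum):
    ENG = "ENG"
    SCT = "SCT"
    NIR = "NIR"
    WAL = "WAL"

    @classmethod
    def from_code(cls, code: Optional[str]) -> "Country":
        # EAW / GB / UK / unknown codes all collapse to England.
        try:
            return cls((code or "").strip().upper())
        except ValueError:
            return cls.ENG


def u_wall(
    cert_u: Optional[float],
    age_band: Optional[str],
    country: Country,
    age_band_defaults: Mapping[Tuple[Country, str], float],
    country_defaults: Mapping[Country, float],
    fallback: float,
) -> float:
    """Hypothetical cascade: cert value -> age-band default -> country default -> fallback."""
    candidates = (
        cert_u,
        age_band_defaults.get((country, age_band)) if age_band else None,
        country_defaults.get(country),
    )
    # The first available value wins, so the result is never None.
    return next((u for u in candidates if u is not None), fallback)
```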
Khalim Conn-Kowlessar	f61d74a327	docs: ADR-0008 physics-as-feature + v16.0.0 schema bump Captures the slice-16 plan decisions before code lands: - Mid-physics: predicted_ecf + predicted_log10_ecf, NOT predicted_sap_score - Cost scope: heating + DHW + lighting (no PV/pumps/secondary) - Crude annual heat-demand calc (HLC * HDH / efficiency) - Cascade-defaulting U-value imputation - envelope_heat_loss_w_per_k sums all parts; extension_1 only as discrete features (88% null drops extension_2) - v16.0.0 MAJOR bump (rename secondary_dwelling_* -> extension_1_*); coordinated cutover with AutoGluon repo + scoring lambda - LightGBM objective="mape" for sap_score+peui_ucl in 16g; sample weights deferred	2026-05-17 11:20:40 +00:00
Khalim Conn-Kowlessar	fd8d71eb05	slice 15e: per-decile residuals reporting in train_baseline Adds `_per_decile_residuals` and writes `residuals_<target>.json` next to metrics.json. Buckets test-set rows by deciles of the true target value; each bucket carries count + MAPE + MAE + mean residual + true_min/max. Lets us tell whether errors concentrate in the tails of the true distribution (e.g. SAP<40 / SAP>85) vs the mid-band — which the global MAPE alone hides. Baseline for slice 16's MAPE-improvement ablations.	2026-05-17 11:18:40 +00:00
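A stdlib-only sketch of per-decile residual reporting. It mirrors the bucket fields listed in the commit (count, MAPE, MAE, mean residual, true_min/max), but the function name, the predicted-minus-true sign convention and skipping zero targets in MAPE are assumptions, not `_per_decile_residuals` itself. `statistics.quantiles` needs at least two rows.

```python
import bisect
import statistics
from typing import Any, Dict, List, Optional, Sequence, Tuple


def per_decile_residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> List[Dict[str, Any]]:
    """Bucket rows by deciles of the true target and summarise errors per bucket."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cuts = statistics.quantiles(y_true, n=10)  # 9 interior cut points
    buckets: List[List[Tuple[float, float]]] = [[] for _ in range(10)]
    for t, p in zip(y_true, y_pred):
        buckets[bisect.bisect_right(cuts, t)].append((t, p))

    report = []
    for decile, rows in enumerate(buckets):
        if not rows:
            continue
        residuals = [p - t for t, p in rows]  # assumed sign: predicted minus true
        nonzero = [(t, p) for t, p in rows if t != 0]  # MAPE is undefined at t == 0
        mape: Optional[float] = (
            sum(abs(p - t) / abs(t) for t, p in nonzero) / len(nonzero) if nonzero else None
        )
        report.append({
            "decile": decile,
            "count": len(rows),
            "mape": mape,
            "mae": sum(abs(r) for r in residuals) / len(rows),
            "mean_residual": sum(residuals) / len(rows),
            "true_min": min(t for t, _ in rows),
            "true_max": max(t for t, _ in rows),
        })
    return report
```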
Khalim Conn-Kowlessar	195336b7e1	slice 15d: +50 features (gap fill + secondary building part); drop 2 derived Removes: - environmental_impact_current (SAP-derived rating, leaks into co2 target) - energy_rating_average (average of sap_score + potential, direct leak) Adds: Doors: draughtproofed_door_count, insulated_door_u_value; Hot water: cylinder_insulation_type, cylinder_thermostat, secondary_heating_type; Ventilation: mechanical_vent_duct_placement, _duct_insulation, _duct_insulation_level, _measured_installation; Lighting: low_energy_fixed_lighting_bulbs_count, fixed_lighting_outlets_count, low_energy_fixed_lighting_outlets_count; Windows: window_avg_glazing_gap_mm, window_avg_frame_factor, window_pct_permanent_shutters_insulated; Main dwelling: room_in_roof_floor_area_m2, alternative_wall_count, alternative_wall_area_m2, flat_roof_insulation_thickness_mm, wall_thickness_measured; Element counts: wall_count, roof_count, floor_count, main_heating_count_elements, main_heating_controls_present; Wind: wind_turbine_hub_height_m, wind_turbine_rotor_diameter_m; Flat: flat_unheated_corridor_length_m; Addendum: addendum_stone_walls, addendum_system_build, addendum_numbers_count; LZC: lzc_energy_sources_count; Secondary part: secondary_dwelling_present + 11 fabric features (wall/roof/floor construction + insulation + thickness + area + heat-loss perimeter) + other_building_parts_count. Wires through schema -> domain -> mapper: adds Addendum dataclass, lzc_energy_sources, mechanical_vent_duct_insulation_level. Also fixes _measurement_value to accept raw dicts (from_dict left some Measurement fields as dict when they weren't typed as a dataclass). 
Results at N=25,000 2026 RdSAP certs: sap_score MAPE=0.043 sMAPE=0.036 R^2=0.891 co2_emissions sMAPE=0.106 R^2=0.929 peui_raw MAPE=0.087 sMAPE=0.084 R^2=0.860 peui_ucl MAPE=0.079 sMAPE=0.076 R^2=0.866 space_heating_kwh MAPE=0.112 sMAPE=0.108 R^2=0.947 hot_water_kwh MAPE=0.071 sMAPE=0.069 R^2=0.854 (+0.082 R^2 vs 15b) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:13:03 +00:00
Khalim Conn-Kowlessar	a1f89b6033	slice 15c: stream build_features so 500k+ cert runs fit memory Previously kept the full list of EpcPropertyData in memory before calling EpcMlTransform.to_rows. For the 25k slice that's ~30 MB; for the 580k full-2026 corpus it OOM-killed the process silently. Now: parse cert -> to_row -> append dict -> drop EpcPropertyData reference, so memory is O(row-dict * n) instead of O(EpcPropertyData * n). Same end-of-frame post-processing (categorical casts, column-order pin). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 00:36:53 +00:00
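The streaming pattern can be sketched generically: keep only each flat row dict and release the parsed certificate before the next one is parsed. `build_rows`, `parse` and `to_row` are hypothetical stand-ins for the repo's parser and `EpcMlTransform.to_row`; the memory saving also assumes `raw_certs` is itself lazy (for example a generator over the bulk file).

```python
from typing import Any, Callable, Dict, Iterable, List


def build_rows(
    raw_certs: Iterable[Any],
    parse: Callable[[Any], Any],
    to_row: Callable[[Any], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    rows: List[Dict[str, Any]] = []
    for raw in raw_certs:
        cert = parse(raw)          # full parsed object, potentially large
        rows.append(to_row(cert))  # retain only the flat feature dict
        del cert                   # drop the reference before the next cert is parsed
    return rows
```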
Khalim Conn-Kowlessar	9f6f7608b9	slice 15b: +18 features — heating type code, hot water, windows, flat, supply Heating: primary_sap_main_heating_code (the SAP10 heating-system enum was the single biggest missing input), primary_emitter_temperature, primary_main_heating_fraction. Hot water: immersion_heating_type, shower_outlet_count. Windows: window_pct_living, window_pct_external, window_pct_permanent_shutters (area-weighted shares parallel to existing window aggregates). Dwelling: conservatory_type, has_heated_separate_conservatory. Flat-only block (sap_flat_details): flat_level, flat_top_storey, flat_storey_count, flat_location, flat_heat_loss_corridor (int sentinels like '20+' coerce to None for the categorical features). Energy supply: meter_type, pv_connection, wind_turbines_terrain_type. Also plumbs `air_tightness` EnergyElement, `sap_flat_details` and `has_heated_separate_conservatory` through the 21.0.1 mapper path (they were silently None before). Results at N=25,000 2026 RdSAP certs: sap_score MAPE=0.044 sMAPE=0.038 R^2=0.884 (+0.045 R^2 vs 15a) co2_emissions sMAPE=0.108 R^2=0.925 peui_raw MAPE=0.092 sMAPE=0.088 R^2=0.849 peui_ucl MAPE=0.081 sMAPE=0.078 R^2=0.860 space_heating_kwh MAPE=0.111 sMAPE=0.108 R^2=0.945 hot_water_kwh MAPE=0.081 sMAPE=0.079 R^2=0.772 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 00:08:11 +00:00
Khalim Conn-Kowlessar	0ffda529ec	slice 15a: add wall/floor/roof + demand scalar features for retrofit simulation 15 new features wired through schema -> domain -> mapper -> transform: Main Dwelling fabric (11): - wall_insulation_type, wall_insulation_thickness_mm, wall_dry_lined, wall_thickness_mm, party_wall_construction - roof_insulation_location, roof_insulation_thickness_mm - floor_construction, floor_insulation, floor_insulation_thickness_mm, floor_heat_loss Dwelling-level scalars (4): - multiple_glazed_proportion, number_baths, number_baths_wwhrs, extract_fans_count Thickness strings like '50mm'/'NI'/'ND' parsed via _parse_thickness_mm; NI (no insulation) lands as 0mm so the model sees the physical zero rather than a missing value. Categorical sentinels ('NA'/'NI'/'ND') become None. Also fixed long-standing typo `multiple_glazed_propertion` -> `_proportion` in domain dataclass + its lone DB-model usage. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 22:08:27 +00:00
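A sketch of the thickness parsing described above. `parse_thickness_mm` is a hypothetical re-implementation, not the repo's `_parse_thickness_mm`; the commit only confirms that '50mm'-style strings are parsed and that NI becomes 0 mm. Treating ND/NA/unparseable strings as None is an assumption.

```python
import re
from typing import Optional, Union

_THICKNESS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:MM)?")
_UNKNOWN = {"", "ND", "NA"}  # assumed "no data" sentinels


def parse_thickness_mm(raw: Union[str, int, float, None]) -> Optional[float]:
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    text = raw.strip().upper()
    if text == "NI":
        return 0.0  # "no insulation" is a physical zero, not a missing value
    if text in _UNKNOWN:
        return None
    match = _THICKNESS_RE.fullmatch(text)
    return float(match.group(1)) if match else None
```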
Khalim Conn-Kowlessar	c496f345f8	slice 14l: bigger-run fixes: UCL guard, PV Measurement coercion, sMAPE Three changes surfaced by the 25k 2026 run: - transform._peui_ucl returns None for non-positive raw PEUI (net-exporters). apply_ucl_correction would otherwise raise ValueError on negative input. - PhotovoltaicArray scalars (peak_power, pitch, orientation, overshading) now accept Measurement | int | float in the schema; mapper coerces via _measurement_value. - train_baseline reports sMAPE alongside MAPE, which handles zero-actual rows (e.g. co2_emissions for net-zero certs) where MAPE explodes. Results at N=25,000 RdSAP 2026 certs (~32s end-to-end): sap_score MAPE=0.064 sMAPE=0.054 R^2=0.762 co2_emissions sMAPE=0.140 R^2=0.890 peui_raw MAPE=0.126 sMAPE=0.120 R^2=0.714 peui_ucl MAPE=0.114 sMAPE=0.108 R^2=0.736 space_heating_kwh MAPE=0.167 sMAPE=0.157 R^2=0.915 hot_water_kwh MAPE=0.089 sMAPE=0.086 R^2=0.737 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 21:15:37 +00:00
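To see why sMAPE survives zero-actual rows, here is a sketch of one common definition, mean(2|p - t| / (|t| + |p|)), with 0/0 counted as a perfect prediction. This is an assumption: train_baseline's exact formula is not shown in this log.

```python
from typing import Sequence


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # Both zero counts as perfect; otherwise each term is bounded in [0, 2].
        total += 0.0 if denom == 0 else 2.0 * abs(p - t) / denom
    return total / len(y_true)
```

A net-zero cert (t = 0) predicted at 5.0 contributes the bounded value 2.0 here, whereas its MAPE term would divide by zero.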
Khalim Conn-Kowlessar	8fddd25b9a	slice 14k: E2E pipeline runs on real 2026 RdSAP certs Two production fixes surfaced by the live run: - mapper.from_rdsap_schema_21_0_1 now sets the three ML target scalars (energy_rating_current, co2_emissions_current, energy_consumption_current). They were silently None for every cert before, leaving the only labels as the kWh fields from renewable_heat_incentive. - train_baseline coerces object-dtype columns to numeric (None -> NaN) and drops rows with null target per fit, so LightGBM accepts the frame. E2E on 500 real certs (~1s): sap_score R^2=0.604 MAPE=0.084 co2_emissions R^2=0.813 MAPE=0.130 peui_raw R^2=0.979 MAPE=0.026 space_heating_kwh R^2=0.823 MAPE=0.213 hot_water_kwh R^2=0.519 MAPE=0.115 peui_ucl excluded: UCL correction still needs wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:47:41 +00:00
Khalim Conn-Kowlessar	6697a6c76e	slice 14j: Optional sweep across schema 21.0.1 + mapper guards Across 500 real RdSAP-21.0.1 certs from 2026, mapper goes 0% -> 100% success. Schema-loading + ml-transform + ml_training_data: 146 tests pass. Mainly affected fields: - SapHeating: instantaneous_wwhrs, shower_outlets (now Union with List shape) - SapWindow: glazing_gap, frame_factor, pvc_frame, window_transmission_details - SapEnergySource: pv_battery_count, wind_turbine_details, pv_batteries (List form) - SapBuildingPart: all 13 sub-fields now Optional - SapFloorDimension: Measurement | int | float fallback - RdSapSchema21_0_1: 16 top-level fields (mechanical_vent_*, lighting counts, ...) Mapper helpers added: _measurement_value, _first_pv_battery, _first_shower_outlet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:35:28 +00:00
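A sketch of a `Measurement | int | float` coercion helper in the spirit of `_measurement_value`, including the raw-dict case later fixed in slice 15d. The `Measurement` dataclass and its `value` field are hypothetical stand-ins; the real schema class may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Measurement:
    """Hypothetical stand-in for the schema's Measurement type."""
    value: float
    quantity: str = ""


def measurement_value(x: Any) -> Optional[float]:
    """Normalise Measurement | dict | int | float | None to a float (or None)."""
    if x is None:
        return None
    if isinstance(x, Measurement):
        return float(x.value)
    if isinstance(x, dict):
        # Raw dicts appear when from_dict leaves an untyped Measurement unconverted.
        raw = x.get("value")
        return float(raw) if raw is not None else None
    return float(x)
```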
Khalim Conn-Kowlessar	ccb654c230	slice 14i: pin real RdSAP cert as fixture + RED regression test Currently fails on SapWindow.glazing_gap (first of ~30 fields the dataclass incorrectly treats as required). Will go GREEN once 14j sweeps Optional. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:23:29 +00:00
Khalim Conn-Kowlessar	611c07de94	slice 14h: handle real bulk-JSON shape (NDJSON wrappers + document payload) Bulk entries are NDJSON of wrapper records, not a JSON array. Each wrapper carries certificate_number, assessment_type, and a stringified document with the actual EPC schema payload. Filter to RdSAP, unwrap document, then map. remote_bulk_fetcher: per-entry presigned-URL refresh (30s S3 TTL). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:45:52 +00:00
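The unwrap step can be sketched as a generator over NDJSON lines: decode each wrapper, keep RdSAP entries, then decode the stringified `document` a second time. The field names follow the commit; the exact `"RdSAP"` value of `assessment_type` and the function name are assumptions.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_rdsap_documents(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        wrapper = json.loads(line)
        if wrapper.get("assessment_type") != "RdSAP":  # assumed label for RdSAP certs
            continue
        # The EPC payload is a JSON string nested inside the wrapper, so decode it again.
        yield wrapper["certificate_number"], json.loads(wrapper["document"])
```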


4826 commits