Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-30 13:10:47 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	361abc1202	fix(mapper): handle 'ND' multiple_glazing_type on RdSAP-Schema-20.0.0 `_synthesise_20_0_0_sap_windows` passed `schema.multiple_glazing_type` straight into `_api_cascade_glazing_type`, which raised UnmappedApiCode on the "ND" (Not Defined) string that the 20.0.0 corpus lodges alongside the 1-8 integer codes — failing the mapper-coverage guard on every ND-glazed 20.0.0 cert. Mirror the existing 18.0/19.0/17.x seams: route integer codes through the cascade, fall the "ND" string back to the DG-modal default (cascade code 2 → daylight g_L 0.80). Also corrects the 20.0.0 schema field type `int` → `Union[int, str]` to match the data (as 18.0 already does), which keeps the isinstance guard pyright-clean. Pre-existing failure (present before this branch's recent commits), not in the handover regression gate. Fixes all 15 RdSAP-Schema-20.0.0 ND certs; test_mapper_corpus 6002/6002 pass. pyright net-zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 06:43:55 +00:00
Khalim Conn-Kowlessar	6e9f831296	chore(epc-prediction): grow validation corpus to 150 postcodes Bumps N_POSTCODES 40 -> 150 for the fetch script. Larger corpus (150 postcodes / 3719 certs) reduces leave-one-out variance and unblocks the recency-template work (#1223), which regressed the noisier 36-target gate fixture. Corpus itself stays out of git (gitignored /tmp + persistent backup at /workspaces/home/epc_prediction_corpus_backup). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 06:42:19 +00:00
Khalim Conn-Kowlessar	0fae84d2b6	fix(elmhurst): map secondary room-heater SAP codes to Table 4a fuel category Completes `_elmhurst_secondary_fuel_from_sap_code` per SAP 10.2 §12 (PDF p.34: "Secondary heating systems and applicable fuel types are taken from the room heaters section of Table 4a") + RdSAP 10 §10.4.1. Each Table 4a room-heater code now resolves to its fuel CATEGORY's modal fuel: - gas room heaters 601-613 → mains gas (26 → Table 32 1, 3.48 p/kWh) - liquid room heaters 621-625 → heating oil (28 → Table 32 4, 5.44 p/kWh) - solid room heaters 631-636 → house coal (11 → Table 32 11, 3.67 p/kWh) - electric room htrs 691-694/699/701 → None (cascade electricity default) Previously only the gas (601-613→26) and solid (631-634→11) blocks were mapped; liquid heaters (621-625) and 635-636 fell through to None → silently billed as electricity (13.19 p/kWh), a large mis-price for an oil/solid heater. The prior slice raised on those; this maps them to the correct category fuel instead, and keeps the raise ONLY for codes inside the room-heater range (601-701) that are not a recognised Table 4a row. The specific sub-fuel within a category (mains gas vs LPG vs biogas) is a SEPARATE lodgement per §10.4.1 and is NOT exported in the Summary, so the gas block stays the modal mains gas — worksheet "simulated case 37" lodged its 605 live-effect fire on biogas (7.60 p/kWh), unrecoverable from the Summary code alone (this is the entire +7 SAP case-37 gap: secondary energy £131 + a separate biogas standing charge £70; every other line matches the worksheet exactly, incl. (206) main efficiency 61%). 5 AAA tests, harness 47/47 (0 raised), pyright net-zero, regression clean, corpus gauge unchanged (Elmhurst-path only). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 06:27:37 +00:00
Khalim Conn-Kowlessar	718455e971	feat(epc-prediction): physical-similarity-weighted categorical mode (#1224) ADR-0029 decision 5: survivors were treated equally; now each neighbour's vote in the cohort mode decays with its distance from the cohort's physical centre (floor area from the median, age band from the modal band), so the mode leans on the most representative neighbours instead of being swayed by size/era outliers. Scales (size 20 m^2, age weight 0.5) chosen on the validation corpus; the tight size kernel is load-bearing (looser scales regress floor_insulation on the fixture). Corpus (181 SAP-10.2 targets): wall_insulation 83.4->86.2%, roof_construction 86.2->87.3%, floor_construction 78.8->81.2%, floor_insulation 92.9->94.1%; net +7.5pp gained vs -1.1pp (two 1-cert dips, both held on the fixture). Geometry/residuals untouched (template unchanged). Gate (36-target fixture): zero regression across all 24 floors/ceilings; ratcheted wall_insulation_type 0.7778->0.8333, floor_construction 0.7500->0.8125, floor_insulation 0.9062->0.9375. Dead _mode/_int_mode removed (superseded by the weighted variants). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 10:46:51 +00:00
Khalim Conn-Kowlessar	07051b9401	feat(epc-prediction): per-prediction confidence signal (#1226) Adds PredictionConfidence (cohort size + per-component agreement = the modal value's share among neighbours that lodge one) and EpcPrediction.confidence(), a compute-only signal so downstream can flag low-confidence components (ADR-0029 open item: 'confidence signal'). Sanity check on the 40-postcode corpus (1068 component predictions): agreement is strongly predictive of correctness — pooled hit-rate 21.9% (<0.5) / 46.7% (0.5-0.7) / 73.6% (0.7-0.9) / 95.5% (>=0.9); point-biserial corr(agreement, correct) = 0.582. Cohort size tracks too (<6 -> 68.4%, >=20 -> 96.0%). Surfacing / persistence is a separate HITL follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 10:35:59 +00:00
Khalim Conn-Kowlessar	9830ea2110	fix(elmhurst): raise on unmapped fuel-fired secondary room-heater code The Elmhurst Summary lodges only the secondary heating SAP code (Table 4a Category 10), never its fuel. `_elmhurst_secondary_fuel_from_sap_code` mapped the gas block (601-613 → mains gas) and solid block (631-634 → house coal) to their modal defaults, but returned None for any OTHER Category-10 code — and None makes the cascade SILENTLY bill the secondary as electricity (13.19 p/kWh). For a fuel-fired heater (e.g. 621-625 liquid-fuel oil/bioethanol) that is a large, invisible mis-price. Per the UnmappedElmhurstLabel strict-raise pattern (mirrors the wall_type / glazing label raises), a fuel-fired Category-10 code (601-699) outside the mapped gas/solid blocks now RAISES instead of guessing. Electric room heaters (691-699) keep returning None — electricity IS their fuel. The gas block 601-613 still resolves to the modal default mains gas: the Summary cannot distinguish mains gas from LPG/biogas, so an LPG or biogas live-effect fire (worksheet "simulated case 37" used biogas at 7.60 p/kWh vs our 3.48 p/kWh mains-gas default, a +7 SAP gap) is not recoverable from the Summary export — that is a data-availability limit, not a guess we can fix here. This commit closes the genuinely-silent-wrong path; the gas sub-fuel remains the documented modal default. Worksheet harness 47/47, 0 raised. 3 AAA tests, pyright net-zero, regression clean, corpus gauge unchanged (Elmhurst-path only; the API path lodges the secondary fuel explicitly). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 10:35:38 +00:00
Khalim Conn-Kowlessar	ffaedd8d14	feat(epc-prediction): ±1-band age scoring + window_count cosmetic (#1222) Measurement honesty so we optimise SAP-relevant accuracy, not SAP-neutral misses (ADR-0030 Component Accuracy): - Add construction_age_band_pm1: an exact-or-adjacent-band hit. Adjacent RdSAP age bands carry near-identical U-values, so an off-by-one is ~SAP-neutral. Full corpus: exact 78.5% but ±1-band 91.7% (fixture 63.9% -> 83.3%) — most age misses are adjacent. - Drop window_count from the gate's residual ceilings (cosmetic): the predicted picture clusters at a mapper-default 4 windows vs actuals 1-21, but total_window_area (the SAP-relevant signal) stays tight at ~3.4 m2. Gate: + construction_age_band_pm1 floor 0.8333; window_count no longer gated. Closes #1222 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 10:01:20 +00:00
Khalim Conn-Kowlessar	ac77624d67	test(pv-battery): pin SAP cost-neutrality on export-capable standard tariff End-to-end API-path regression pin for the battery behaviour validated by the user-simulated Elmhurst worksheet pair (cert 001431 "simulated case 35/36", 5 kWh, export-capable, mains-gas, standard tariff). The official SAP rating ("10a. Fuel costs - using Table 12 prices") values PV used-in- dwelling and PV exported identically at 13.19 p/kWh (export code 60 == import code 30, ADR-0010), so a battery only redistributes PV between two equally-priced lines: worksheet PV credit (252) = -455.6458 and SAP (258) = 88.0859 are IDENTICAL with/without the battery (ΔSAP = 0). Two tests over the committed RdSAP-21.0.1 corpus: - standard tariff (meter 2): toggling the battery holds continuous SAP EXACTLY constant, while at least one cert's primary energy DOES respond (proving the App-M1 §3c β-split is wired, not a dropped battery). - off-peak tariff (meter != 2): the battery STRICTLY raises SAP, because self-consumed PV displaces high-rate import (15.29) above the 13.19 export credit — confirming the standard-tariff neutrality is a price coincidence, not a no-op. Guards table_32 export price (code 60) and the battery β-split against silent regression. Complements the unit-level β tests in test_photovoltaic.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:51:34 +00:00
Khalim Conn-Kowlessar	a5b7310911	feat(epc-prediction): recency-weighted mode for roof insulation (ADR-0029/0030) Investigated recency-weighting (weight cohort votes by an exponential decay in cert age). Key finding: it must be SELECTIVE. On the validation corpus it HURTS permanent categoricals (wall 91.2->89.5, age 78.5->75.7 — discards still-valid data) but clearly HELPS time-varying ones, where a recent neighbour reflects the current physical state: roof_insulation_thickness 56.7 -> 60.7% corpus (+4pp) 29.4 -> 41.2% fixture (+12pp) So apply a recency-weighted mode only to roof_insulation_thickness (loft top-ups happen over time); keep the plain mode for permanent categoricals. tau = 4yr (~2.8yr half-life); falls back to plain mode when no registration dates are lodged. Gate floor ratcheted 0.2941 -> 0.4118. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:45:22 +00:00
Khalim Conn-Kowlessar	9dd23477ac	feat(epc-prediction): cohort-mode roof + floor insulation (ADR-0030) These independent fabric categoricals were template-copied; mode them like the construction categoricals. Verified mode beats template before applying. Big fixture win on roof insulation thickness (doubled), floor insulation neutral-to-positive: roof_insulation_thickness 14.7% -> 29.4% (gate floor ratcheted up) floor_insulation 90.6% (unchanged on the fixture) Glazing type was tried too (+1.6pp on the 40-postcode corpus) but REGRESSED the 36-target fixture (0.50 -> 0.44) — the gate caught it. Glazing moding is marginal/noisy, so it's left on the template; revisit with a larger corpus. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:37:45 +00:00
Khalim Conn-Kowlessar	a622f97d27	docs(adr): ADR-0030 — record S3-hosted Tier-1.5 scale run Tier-2 (full national bulk streaming) is deferred. The near-term scale validation is a Tier-1.5: a few-thousand-cert anonymised corpus stored in S3 (too large to commit, far more stable than the 36-target gate fixture), pulled to a temp dir and run through the same load_corpus + evaluate_component_accuracy. Reuses the committed-fixture machinery wholesale — only the data source differs. One scorer, three data sources (committed fixture / S3 corpus / bulk stream). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:25:58 +00:00
Khalim Conn-Kowlessar	e3a2720e5c	feat(epc-prediction): Tier-1 ratcheting Component Accuracy gate (ADR-0030) The committed CI gate: run the calculator-free leave-one-out scorer over the frozen anonymised fixture (36 SAP-10.2 targets) and assert each per-component classification rate / geometry residual is no worse than a committed baseline. Prediction is deterministic + the fixture frozen, so the numbers reproduce exactly — a failure is a real regression, never sample noise. - 19 rate floors + 5 residual ceilings, seeded at the currently-measured values; they only ever tighten (no-widening ethos on an aggregate). - Calculator-FREE — component floors are the real gate; the end-to-end SAP/carbon/PE guards stay out (their floor is the separate API-path calculator workstream). - Skips with a message when the fixture is absent. 25 parametrized assertions, all green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:19:39 +00:00
Khalim Conn-Kowlessar	008c1922c4	feat(epc-prediction): anonymised Tier-1 fixture + builder (ADR-0030) The committed gate needs frozen, reproducible data without dumping real UK addresses into the repo. Add: - harness anonymise_payload + stable_hash: hash street address + cert number into opaque, dedup-stable tokens; blank secondary address lines + post_town; keep postcode + all component/lodged fields (gov data is OGL). Unit-tested. - scripts/build_epc_prediction_fixture.py: curate qualifying postcodes (>=1 SAP 10.2 target + >=2 distinct addresses) from the local scratch corpus, anonymise, freeze under tests/fixtures/epc_prediction/. - The frozen fixture: 15 postcodes / 280 certs / 36 SAP-10.2 targets. Verified no plaintext address_line_1 and post_town all blank. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:17:27 +00:00
Khalim Conn-Kowlessar	e7177a8bd4	fix(electric-heaters): code-699 "electric heaters assumed" bills Table 12a direct-acting split A "No system present: electric heaters assumed" lodging carries SAP Table 4a code 699 (electric room heaters) but RdSAP main_heating_category 1, NOT 10. `_table_12a_system_for_main` keyed the direct-acting-electric routing on category==10 only, so the category-1 form fell through to None and `_space_heating_fuel_cost_gbp_per_kwh` billed space heating 100% at the off-peak LOW rate — as if direct-acting room heaters charged overnight like storage. Per RdSAP 10 §12 Rule 3 (PDF p.62) electric room heaters (691-694, 699) route to the 10-hour tariff, and SAP 10.2 Table 12a Grid 1 (PDF p.191) gives the "other direct-acting electric" row a 0.50 high-rate fraction at 10-hour (1.00 at 7-hour). Route those SAP codes — the same set §12 Rule 3 already uses — to OTHER_DIRECT_ACTING_ELECTRIC alongside the category-10 gate. Found via the PE/CO2-vs-cost split on the worst over-rater in the /tmp sample: cert 2958 PE +0% / CO2 -1% (energy correct) but SAP +32.2 — a pure cost-side bug. Space rate 7.50 -> 11.09 p/kWh; cert 2958 +32.2 -> +14.7. The committed corpus gauge is unchanged (its 3 non-category-10 code-699 certs are all on Single meters -> STANDARD tariff, so this split never applies to them); the win is on the unbiased /tmp population's single worst cert. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:16:22 +00:00
Khalim Conn-Kowlessar	027ee1fba3	refactor(epc-prediction): extract shared leave-one-out scorer + corpus loader (ADR-0030) "One scorer, two harnesses" (ADR-0030): the committed gate, the local script, and the future battle-test must run the same scoring. Extract it: - domain/epc_prediction/validation.py — `iter_predictions` (the single leave-one-out orchestration: latest-per-address hold-out, SAP-10.2 target filter, all-vintage source) + `evaluate_component_accuracy` (calculator-free ComponentAccuracy aggregation, the primary signal). Unit-tested. - harness/epc_prediction_corpus.py — `load_corpus(dir)` IO: corpus dir -> Comparable cohorts (maps payloads, carries address + registration_date). validate_epc_prediction.py now just loads + calls the scorer for the component section and iterates iter_predictions for the calculator-floored end-to-end. Identical numbers (181 targets, SAP MAE 6.34) — behaviour-preserving. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:12:08 +00:00
Khalim Conn-Kowlessar	65cb094abe	feat(epc-prediction): SAP-10.2 target filter + carbon/PE end-to-end (ADR-0030) Make the leave-one-out runner ADR-0030-compliant: - Hold out only SAP 10.2 targets (sap_version == 10.2) — the source cohort keeps every vintage (components are methodology-agnostic). - Label Component Accuracy as the PRIMARY, calculator-independent section. - End-to-end vs API-lodged (SECONDARY, calculator-FLOORED): add CO2 (tonnes) and PEI (kWh/m2) alongside SAP, using the canonical performance.py mapping (co2_kg/1000; primary_energy_kwh_per_m2). - Add the attribution readout calc(actual) vs lodged SAP — the calculator floor the end-to-end can reach. - Drop the neighbour-mean-of-lodged-SAP baseline (mixes SAP versions — rejected by ADR-0030). On the 181 SAP-10.2 targets: component rates are higher than the all-vintage view (age band 60.9 -> 78.5%, floor_area mean|.| 12.7 -> 8.4). End-to-end SAP MAE 6.34 vs the calc(actual) floor of 3.25 — ~half the gap is the known API-path calculator residual, not prediction error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:04:24 +00:00
Khalim Conn-Kowlessar	275a30a825	feat(epc-prediction): complete component coverage — fabric/glazing/renewables/doors (ADR-0030) Finish the ADR-0030 Component Accuracy set: roof insulation thickness, floor insulation, room-in-roof presence, modal glazing type, PV presence, solar water heating (categoricals) + door count (residual). Presence flags (room-in-roof, PV, solar) are always-applicable — predicting absence when present is a real miss. Template-copied baseline (40-postcode corpus), newly visible: floor_insulation 94.0% solar_water_heating 99.7% has_pv 98.6% has_room_in_roof 91.9% modal_glazing_type 59.0% <- weak roof_insulation_thickness 30.6% <- weak door_count mean|.| 0.40 compare_prediction now scores 19 categoricals + 5 residuals across every SAP-load-bearing component group. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:00:30 +00:00
Khalim Conn-Kowlessar	cd43c52cf9	feat(epc-prediction): score the heating components (ADR-0030 Component Accuracy) Heating is the dominant SAP lever (ablating it to actual cut the SAP error ~7 -> ~4.5) yet was entirely unscored. Add the heating group to compare_prediction's categorical_hits: main fuel / category / control (off the primary MainHeatingDetail), water-heating fuel / code, has-cylinder, cylinder insulation, secondary heating (off SapHeating). Template-copied baseline on the 40-postcode corpus (no predictor change yet — this just makes the signal visible): heating_main_fuel 93.4% heating_main_category 92.7% water_heating_fuel/code 91.7% / 92.4% heating_main_control 62.1% <- weak has_hot_water_cylinder 78.5% cylinder_insulation_type 35.8% (n=120) <- weak secondary_heating_type 16.8% (n=125) <- weak Fuel/category predict well from the template; controls, cylinder, and secondary heating are poor and now drive the next predictor slices. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:53:15 +00:00
Khalim Conn-Kowlessar	41b5ce5057	refactor(epc-prediction): name-keyed categorical_hits for Component Accuracy (ADR-0030) ADR-0030 commits Component Accuracy to ~19 categorical components (5 today + 8 heating + glazing/renewables). Flat *_correct dataclass fields don't scale — each needs manual runner wiring. Collapse them into a single `categorical_hits: dict[str, Optional[bool]]` keyed by component name, which also matches the runner's name-keyed aggregation (now generic: it tallies whatever components the comparison reports). No behaviour change; the classification rates are identical (wall n 578->575 is the 3 certs whose actual wall is None, now correctly counted as not-applicable via _classify). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:50:34 +00:00
Khalim Conn-Kowlessar	9ee3821138	fix(pv): zero exported PV when dwelling is not export-capable SAP 10.2 Appendix M1 (PDF p.94): "EPV,ex,m = 0 if the PV system is not connected to an export-capable meter." The cascade computed the β-split export stream regardless of `is_dwelling_export_capable`, so a non-export- capable dwelling was credited the full PV export — in the §10a COST it credits at the Table 32 import rate (13.19 p/kWh), which dominates the rating. On 7 Wybourn Terrace S2 5BJ the PE (144 vs lodged 151) and CO2 (27 vs 29) already matched, yet the phantom export cost credit pulled SAP from ~73 to 92.1 (+19). Zero `epv_exported_monthly_kwh` after the Appendix-G4 diverter adjustment when not export-capable; the onsite (EPV,dw) consumption and the diverter HW reduction are unchanged. Not-export-capable PV cohort (corpus, 4 certs): 7 Wybourn +19.1 -> +6.5, 4 Lime Ave +11.1 -> +0.4, 8 Hatherleigh +7.6 -> -0.2, Flat 5 ~-0.4. Gauge 66.1% -> 66.9%, MAE 1.124 -> 1.039. Floor 0.64 -> 0.65 / ceiling 1.18 -> 1.08. Worksheet harness 47/47 0 diverge (Summary certs carry export-capable meters). 1 AAA test, pyright net-zero. Found by auditing the worst over-rater without a worksheet: PE/CO2-match + cost-miss localised it to the PV export credit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:48:38 +00:00
Khalim Conn-Kowlessar	35a7c07812	docs(adr): ADR-0030 — SAP-version-aware, component-first EPC Prediction validation Records the grilling-session decisions amending ADR-0029's validation: - Source cohort keeps all cert vintages (components are agnostic of the SAP methodology that rated them); only the held-out validation TARGET is restricted to SAP 10.2. Amends ADR-0029 decision 5 ("pre-SAP10 dropped"). - Component Accuracy (predicted vs API actual components) is the primary, calculator-independent signal. calc(predicted) vs calc(actual) rejected (circular ground truth, hides calculator error); neighbour-mean-lodged-SAP baseline rejected (mixes SAP versions). calc(predicted) vs API-lodged SAP/carbon/PE kept as a secondary, calculator-floored guard. - Two tiers: committed anonymized fixture (ratcheting CI gate) + bulk-export national battle-test on harness/epc_bulk.py + harness/cohort.py, emitting accuracy + a failure taxonomy, re-baselining the gate floors. CONTEXT.md: Comparable Properties corrected to all-vintage source; new Component Accuracy term. ADR-0029 Validation section marked superseded. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:47:58 +00:00
Khalim Conn-Kowlessar	94275d07cc	fix(hot-water): default present-but-unsized cylinder to Table 28 Normal 110 L RdSAP 10 §10.5 (PDF p.55): "If the actual size is not determined, the size of a hot-water cylinder is taken as according to Table 28." When a cylinder is present (has_hot_water_cylinder) but no size descriptor resolves — the gov API lodges cylinder_size=0, or Exact with no measured volume — `_hot_water_ cylinder_volume_l` returned None, silently dropping BOTH the cylinder's storage loss and the Table 13 electric-DHW high-rate fraction, under-costing and over-rating the dwelling. Default such cylinders to the Table 28 baseline "Normal" 110 L (the value §10.7 also instantiates as the first-row default). The context-dependent Inaccessible 210/160 values are deliberately NOT applied here — they are tied to the explicit "Inaccessible" descriptor (code 5) the assessor lodges, not to an unpopulated size field. Scope: 7 of 301 cylinder certs in the corpus (2%). Correctness fix — closes a real spec gap; marginal on the headline (within-0.5 66.1% unchanged, MAE 1.128 -> 1.124) because these certs' residual is dominated by a separate HW- demand gap, not the cylinder. Worksheet harness 47/47 0 diverge (Summary certs lodge a real size, so the fallback never fires). 1 AAA test, pyright net-zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 08:20:34 +00:00
Khalim Conn-Kowlessar	bec62b9167	fix(storage-heaters): Table 12a code-408 integrated-storage high-rate fraction SAP 10.2 Table 12a Grid 1 (PDF p.191): electric storage heater SAP code 408 is an "Integrated (storage + direct-acting) system" with a 0.20 space-heating high-rate fraction on a 7-hour tariff — NOT the 0.00 of "other storage heaters". `_table_12a_system_for_main` returned None for all storage codes (an explicit TODO), so code 408 fell back to the 100%-low-rate path and billed space heating at the bare 7-hour low rate (5.50 p/kWh) — under-costing → over-rating. Mapped cat-7 storage: 408 -> INTEGRATED_STORAGE_DIRECT (0.20), others -> OTHER_STORAGE_HEATERS (0.00, unchanged behaviour). The enum + fraction rows already existed; this only wires the dispatch, so the split flows self-consistently to both the §10a cost and the Appendix-M1 D_PV high-rate fraction. Corpus: sap408 over-raters +14.6/+12.9/+12.7 -> +7.1/+5.1/+3.4 (two crossed into within-0.5). Gauge 65.9% -> 66.1%, MAE 1.160 -> 1.128. Floor 0.63 -> 0.64 / MAE ceiling 1.22 -> 1.18. Worksheet harness 47/47 0 diverge. The residual +3..+7 is the "all other uses" 0.90 high-rate fraction (lighting/pumps/HW still billed 100%-low on the off-peak legacy path) — the next slice. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 02:12:39 +00:00
Khalim Conn-Kowlessar	dfcd7af57c	fix(heat-network): apply Table 4c(3) flat-rate charging factor to demand SAP 10.2 Table 4c(3) (PDF p.169) "Factor for controls and charging method" multiplies a heat network's heat requirement by 1.05-1.10 for FLAT-RATE charging (note d: household pays a fixed amount regardless of heat used, so no incentive to economise), and by 1.0 for charging linked to use. The worksheet folds it into the heat-network requirement alongside the Table 12c distribution loss factor: (307) space = (98c) x (302) x (305) x (306) (310) DHW = (64) x (305a) x (306) Our cascade applied (306) DLF but never (305)/(305a), so every flat-rate community-heating cert under-counted demand -> over-rated SAP. Folded the factor into the 1/DLF efficiency override at the space-heating (206) and DHW (water-inherits-from-main) sites. Space column adds +0.05 for no thermostatic control (2301/2302); DHW column is 1.05 flat-rate / 1.0 linked-to-use. Corpus (RdSAP-21.0.1, 1000 certs): community cluster median +0.32 -> -0.19, within-0.5 38% -> 62% (control 2307 +0.83 -> -0.19; 2306 unchanged at factor 1.0 as spec requires). Overall gauge 65.0% -> 65.9%, MAE 1.174 -> 1.160. Ratcheted the corpus-test floor 0.62 -> 0.63 / MAE ceiling 1.25 -> 1.22. Also records (corpus-test comment + scripts/decompose_co2_pe_error.py) the disproof of the prior "CO2/PE +5% is a factor/scope bug" lead: factors are spec-exact, scope identical, and the bias is per-cert demand fidelity (corr(SAP-err, PE-diff) = -0.54), not a one-slice factor fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 01:54:51 +00:00
Khalim Conn-Kowlessar	c3d56b00dd	chore(epc-prediction): grow validation corpus to 40 postcodes (ADR-0029) Bump N_POSTCODES 3 -> 40 as the gradual-growth step from the 3-postcode smoke. 40 postcodes / 1113 certs / 578 leave-one-out predictions is enough for stable, trustworthy metrics (the smoke's 2 usable postcodes were dominated by oddball flats — floor_area mean|.| 52.6 there vs 12.7 here). Resumable + reproducible (random.seed(2026)); raise again to scale up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 01:52:44 +00:00
Khalim Conn-Kowlessar	fa11df56c2	fix(epc-prediction): dedupe re-lodgements + leak-free leave-one-out (ADR-0029) The register lists every historical lodgement, so a postcode cohort contains the same physical address many times (LS61AA: 15 certs / 11 addresses; NG71AA: 15 / 9 — "FLAT 3" appears 3x in each). Two consequences: - Production: a re-lodged neighbour was counting up to 3x towards the cohort mode. select_comparables now dedupes candidates to the latest cert per address (one comparable per real neighbour) — Comparable gains address + registration_date (the register metadata its docstring already anticipated, read straight off the cached payload). - Validation: leave-one-out leaked — predicting a flat from a near- identical re-lodgement of itself. The harness now holds out a whole address (excludes every sibling cert) and evaluates on the latest cert per address (the best ground truth). Removing the leak gives the honest numbers (19 distinct addresses): wall_construction 93.1% -> 89.5% construction_age_band 65.5% -> 52.6% roof_construction 79.3% -> 68.4% floor_area mean|.| 37.9 -> 52.6 m2 The earlier figures were inflated by self-leakage; these are the real accuracy to beat. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 00:40:23 +00:00
Khalim Conn-Kowlessar	54a57363f8	feat(epc-prediction): cohort-mode the roof/floor/insulation/age categoricals (ADR-0029) Only main wall_construction was set to the cohort mode; the other homogeneous categoricals (wall insulation, construction age band, roof construction, floor construction) were left as template-copied, so one median-size template's quirks set them. Apply the same cohort-mode mechanism to all of them per ADR-0029 decision 4 — the template still supplies geometry, only the categorical codes move to the mode. Verified mode beats (or ties) template-copy per categorical before applying. Smoke corpus (29 leave-one-out) classification rates: construction_age_band 55.2% -> 65.5% roof_construction 72.4% -> 79.3% floor_construction 46.2% -> 84.6% wall_insulation_type 93.1% (tie — already template-strong) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 00:31:16 +00:00
Khalim Conn-Kowlessar	ed96df9315	feat(epc-prediction): classify roof/floor/insulation/age categoricals (ADR-0029) The comparison only scored main wall_construction; everything else the predictor produces (by template-copy) went unmeasured. Extend compare_prediction to the rest of the ADR-0029 homogeneous categoricals — wall insulation type, construction age band, roof construction, floor construction — and aggregate per-categorical classification rates in the runner. A categorical hit is "not applicable" (None, excluded from the denominator) when the actual lodges no value, so absent-roof flats don't score free wins. Smoke corpus (29 leave-one-out, all but wall are template-copied today): wall_construction 93.1% wall_insulation_type 93.1% construction_age_band 55.2% <- loud; candidate for cohort-mode roof_construction 72.4% floor_construction 46.2% (n=13) These numbers drive the next slice (extend cohort-mode coverage). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 00:10:56 +00:00
Khalim Conn-Kowlessar	4fa20ae76b	fix(epc-prediction): size-representative template selection (ADR-0029) Template (the comparable whose structure/geometry is copied wholesale) was members[0] — an arbitrary draw from the API search order. With floor area varying widely within a property_type cohort (NG71AA houses span 51-340 m2), this made the copied geometry noisy and systematically large. Pick the member whose floor area is closest to the cohort median instead, implementing ADR-0029 decision 4's unimplemented "closest on size" criterion while keeping the structure coherent (it is still one real property, so floor dims / windows / parts stay internally consistent for the calculator). Smoke corpus (29 leave-one-out predictions): floor_area mean|.| 68.0 -> 37.9 m2 (bias +46.8 -> -3.9) window_area mean|.| 11.1 -> 7.3 m2 parts mean|.| 1.00 -> 0.38 SAP |pred-calc - calc(actual)| MAE 7.19 -> 4.86 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 00:05:40 +00:00
Khalim Conn-Kowlessar	f3ad6343a3	feat(epc-prediction): leave-one-out validation harness (ADR-0029) Pure compare_prediction (TDD): wall-construction classification hit + signed residuals on floor area, window count, total window area, building-parts count. Plus validate_epc_prediction.py (IO plumbing): drops each cert from its postcode cohort, predicts from the rest on guaranteed inputs only, aggregates the metrics, and reports SAP three ways (pred-calc vs lodged / vs calc-on-actual / vs the neighbour-mean baseline). Smoke run: wall 90.9%, floor-area mean|·| 42.6 m2 (a real signal — template-copied floor area is noisy), SAP pred-calc edges baseline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:55:05 +00:00
Khalim Conn-Kowlessar	5e6d2cff16	feat(epc-prediction): EpcPrediction hybrid synthesis (ADR-0029) predict() copies a representative template comparable's structure (coherent for the calculator), overrides the homogeneous categorical with the cohort mode (robust to an atypical template), then applies known Landlord Overrides on top (a known value wins over the estimate). Proven on wall construction; roof/floor/insulation/age extend on the same mode+override mechanism, driven next by the validation harness metrics. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:50:07 +00:00
Khalim Conn-Kowlessar	bf6b6fac17	feat(epc-prediction): Comparable Properties selection ladder (ADR-0029) Pure-domain select_comparables: property type is an always-hard filter; built form and known Landlord Overrides (e.g. solid brick) are conditioning filters on the filter-then-relax ladder — applied while >= minimum_cohort survive, relaxed otherwise (the mixed-street border case degrades gracefully). PredictionTarget (known inputs) + Comparable (epc + register metadata) + ComparableProperties (selected cohort). Weighting (recency x similarity) follows in the synthesis slice. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:44:57 +00:00
Khalim Conn-Kowlessar	fbe1cb54ad	test(epc): end-to-end SAP-accuracy gauge over the RdSAP-21.0.1 corpus Adds a committed integration test driving the full API path — raw gov-EPC response → from_api_response → cert_to_inputs → calculate_sap_from_inputs — across all 1000 certs in the in-repo RdSAP-21.0.1 corpus, and pins the aggregate accuracy of our continuous SAP (plus CO2 and primary energy) against each cert's lodged figures. Mirrors scripts/eval_api_sap_accuracy.py but runs in CI off the committed corpus (~2s, no /tmp sample needed). Scoped to RdSAP-21.0.1 — the SAP 10.2-era schema whose lodged rating uses the same methodology we compute (a fair target). Pre-SAP10 schemas (17.x-20.0.0) lodge SAP 2012 ratings and are out of scope (guarded for mapping only by test_mapper_corpus.py). Current: SAP within-0.5 = 65.0%, MAE = 1.174 (tight floor/ceiling — the optimised gauge). CO2 MAE = 0.27 t/yr (bias +0.17) and PE MAE = 14.6 kWh/m2/yr (bias +8.9) are reported + loosely guarded: cost is well-calibrated but CO2/PE both run ~+5-10% high (uniform across fuels — a systematic CO2/PE-factor or scope gap, not yet investigated). Thresholds ratchet as slices tighten each metric. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:40:05 +00:00
Khalim Conn-Kowlessar	80b525f0f4	feat(epc-prediction): postcode-clustered corpus fetch script (ADR-0029) Builds the frozen validation corpus: samples postcodes from the register, then caches each postcode's full cohort of raw cert payloads (the shape from_api_response consumes), grouped by postcode, resumably. Reads the token from backend/.env; cache dir /tmp/epc_prediction_corpus (EPC_PREDICTION_CORPUS override). IO plumbing, not test-driven. Pairs with the leave-one-out harness. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:36:19 +00:00
Khalim Conn-Kowlessar	008a1b2783	docs(adr): EPC Prediction from Comparable Properties (ADR-0029) Grill-with-docs outcome: deterministic neighbour synthesis (NOT ML) of an EPC-less Property's EpcPropertyData picture, scored via Sap10Calculator. Six decisions — predict-components-not-SAP; deterministic k-NN; fetch-phase fallback behind a pure EpcPrediction service + ComparableProperties port; hybrid synthesis (cohort-mode categoricals + coherent template structure + overrides); filter-then-relax cohort weighted geo x recency x similarity; dual-use gap-fill + anomaly flags. Frozen postcode-clustered corpus backs leave-one-out validation. CONTEXT.md: new EPC Prediction term, Comparable Properties refined, ML framing corrected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:36:19 +00:00
Khalim Conn-Kowlessar	5317175dd3	fix(water-heating): count electric showers in Noutlets for mixer demand (App J) The mixer-shower hot-water demand (worksheet 42a) divided N_shower by the count of MIXER outlets only. But SAP 10.2 Appendix J step 1a is explicit: "Establish how many shower outlets are present in the dwelling, Noutlets (including in the count any instantaneous electric showers)", and the electric-shower step (64a) uses that same Noutlets from step 1a. So a dwelling with both a mixer and an electric shower assigned the FULL N_shower to the mixer system AND billed the electric shower on top of it, double-counting shower demand → over-counted main HW → under-rated the dwelling. Fix: thread the electric-shower count into the mixer demand so the denominator is the total outlet count (mixer + electric), iterating the warm-water draw over the mixer outlets only (per step 1e). shower_types=1,2 cohort: -0.37 median -> +0.28 (crossed zero); API gauge 68.4% -> 69.0% within-0.5. Golden cert 0300-2747 (1 mixer + 1 electric) re-pinned: PE +0.93 -> -0.10, CO2 +0.25 -> +0.15 (both toward zero, confirming the double-count). Worksheet harness 47/47, 0 divergers (the Elmhurst fixtures have no electric showers). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:31:02 +00:00
Khalim Conn-Kowlessar	4fb9b853dc	fix(ventilation): apply Table 4g note 3 in-use factor to index-less MEV SFP The no-PCDB MEV fan-electricity path fed the SAP 10.2 Table 4g default SFP (0.8 W/(l/s)) directly as SFPav. But Table 4g note 3 (PDF p.176) is explicit: the default SFP values "are to be multiplied by the appropriate in-use factor for default data from the PCDB", that is, PCDB Table 329 system_type 10 ("default data, used when SFP is taken from Table 4g rather than the PCDB"), IUF 2.5 (duct-agnostic per note 2). Table 4h, which previously held these factors, is retired ("no longer used – data now stored in the PCDB"). Omitting the IUF under-billed the index-less MEV fan electricity by 2.5x (SFPav 0.8 instead of 0.8 x 2.5 = 2.0), so cost was too low and the cohort over-rated. This is distinct from the with-index path, which already applies the tested-product system_type-2 "no scheme" IUF (~1.45) per fan. Index-less gas-house MEV cohort: +1.37 median -> -0.18 (12% -> 92% within 0.5), no overshoot: the missing IUF was exactly the over-rate. API gauge 67.7% -> 68.4% within-0.5 (mean|err| 0.992 -> 0.986, signed +0.031 -> +0.006). Worksheet harness 47/47, 0 divergers (Summary-path MEV certs carry a PCDB index or are natural, so unaffected). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 23:15:32 +00:00
Khalim Conn-Kowlessar	5b2cf5edc7	Merge remote-tracking branch 'origin/main' into feature/per-cert-mapper-validation # Conflicts: # datatypes/epc/domain/epc_property_data.py # datatypes/epc/domain/mapper.py # datatypes/epc/domain/tests/test_from_rdsap_schema.py	2026-06-13 22:20:15 +00:00
Daniel Roth	4c707212e7	Merge branch 'main' into improve-sharepoint-renamer	2026-06-12 17:16:15 +01:00
Daniel Roth	92ba2b9299	remove local file from branch	2026-06-12 16:15:16 +00:00
Jun-te Kim	015ab9d17b	Merge pull request #1219 from Hestia-Homes/feature/junte+khalim rdSap 17, 18, 19, 20, now maps to EPCPropertyData	2026-06-12 17:14:52 +01:00
Daniel Roth	2bfbad5ced	add local claude settings to gitignore	2026-06-12 16:11:48 +00:00
Jun-te Kim	1f40c3aeef	fix engine dockerfile	2026-06-12 16:07:39 +00:00
Daniel Roth	a135d88721	Rename files in subfolders too	2026-06-12 16:04:19 +00:00
Jun-te Kim	0159176772	python upgraded due to enum	2026-06-12 15:47:28 +00:00
Jun-te Kim	0c211f401f	Merge pull request #1220 from Hestia-Homes/feature/make_test_more_readable added floats helper	2026-06-12 16:04:56 +01:00
Jun-te Kim	80ccec9b68	added floats helper	2026-06-12 14:28:41 +00:00
Jun-te Kim	a6123d762c	Merge branch 'main' of https://github.com/Hestia-Homes/Model into feature/junte+khalim	2026-06-12 13:45:30 +00:00
Jun-te Kim	ff4a2e4242	Merge pull request #1198 from Hestia-Homes/feature/bill-derivation Feature/bill derivation	2026-06-12 14:44:30 +01:00
Jun-te Kim	77c5f7da49	Merge branch 'feature/bill-derivation' of https://github.com/Hestia-Homes/Model into feature/junte+khalim	2026-06-12 12:52:40 +00:00
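
Commit 4fa20ae76b above describes replacing `members[0]` with the cohort member whose floor area is closest to the cohort median. The following is a minimal sketch of that idea only, not the repository's code: the `Member` class, its fields, and the `pick_template` name are hypothetical, and the tie-break (first member in input order) is an assumption.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for one comparable in a cohort."""
    cert_id: str
    floor_area_m2: float


def pick_template(members: list[Member]) -> Member:
    """Return the member whose floor area is closest to the cohort median.

    Ties resolve to the earliest member in input order (an assumption;
    the commit message does not say how ties are handled).
    """
    if not members:
        raise ValueError("cohort is empty")
    target = median(m.floor_area_m2 for m in members)
    return min(members, key=lambda m: abs(m.floor_area_m2 - target))


# Illustrative cohort spanning the 51-340 m2 range quoted in the commit.
cohort = [
    Member("a", 340.0),
    Member("b", 51.0),
    Member("c", 88.0),
    Member("d", 95.0),
    Member("e", 120.0),
]
template = pick_template(cohort)  # median is 95.0, so member "d" is chosen
```

Because the result is still a single real property rather than an average, its floor dimensions, windows and building parts stay mutually consistent, which is the coherence argument the commit makes.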


7203 commits
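
Commit bf6b6fac17 above describes a filter-then-relax ladder: property type is always a hard filter, while built form and known overrides are applied only while at least `minimum_cohort` candidates survive. Below is a rough sketch of one way to read that rule. The `Candidate` class, the `select_cohort` name and signature, and the choice to skip a relaxed filter and still try later ones are all assumptions, not the repository's `select_comparables`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical comparable with just the fields the sketch filters on."""
    property_type: str
    built_form: str
    wall: str


Filter = Callable[[Candidate], bool]


def select_cohort(
    candidates: Sequence[Candidate],
    hard: Filter,
    conditioning: Sequence[Filter],
    minimum_cohort: int,
) -> list[Candidate]:
    """Apply the hard filter, then each conditioning filter only if
    enough candidates would survive it; otherwise relax (skip) it."""
    cohort = [c for c in candidates if hard(c)]
    for keep in conditioning:
        narrowed = [c for c in cohort if keep(c)]
        if len(narrowed) >= minimum_cohort:
            cohort = narrowed
    return cohort


street = [
    Candidate("house", "semi", "solid brick"),
    Candidate("house", "semi", "cavity"),
    Candidate("house", "semi", "cavity"),
    Candidate("house", "detached", "solid brick"),
    Candidate("flat", "semi", "solid brick"),
]
result = select_cohort(
    street,
    hard=lambda c: c.property_type == "house",
    conditioning=[
        lambda c: c.built_form == "semi",   # 3 survive, so it is applied
        lambda c: c.wall == "solid brick",  # only 1 would survive, so relaxed
    ],
    minimum_cohort=3,
)
```

In this example the wall filter is dropped rather than shrinking the cohort to a single property, which mirrors the "degrades gracefully" behaviour the commit mentions for mixed streets.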