Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-30 13:10:47 +00:00

Author	SHA1	Message	Date
Khalim Conn-Kowlessar	4a13fc8b0f	docs(modelling): document scenario-driven exclusions + the run command Update run_modelling_e2e's docstring so another dev can run it: the Scenario's exclusions drive measure scoping (--measures/--exclude-measures are overlays), and flag the secondary_heating_removal catalogue gap that currently requires --exclude-measures. Replace the stale --measures examples with the real scenario-driven inspect/persist commands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 16:05:44 +00:00
Khalim Conn-Kowlessar	d4d2b222fc	feat(conservatory): §6.1 fabric cascade (27/27a/28a + TFA/volume) Wire the non-separated conservatory into the §3 heat-transmission + §1 dimensions cascade per RdSAP 10 §6.1 (PDF p.49) + Table 25 (p.51): "The floor area and volume of a non-separated conservatory are added to the total floor area and volume of the dwelling. Its roof area is taken as its floor area divided by cos(20°), and wall area is taken as the product of its exposed perimeter and its height. ... The conservatory walls and roof are taken as fully glazed ... Glazed walls are taken as windows, glazed roof as rooflight." New `worksheet/conservatory.py` derives the geometry: - height from the equivalent storey count (§6.1: 1 storey → ground-floor room height; 1½ → ground + 0.25 + 0.5×first; etc.); - glazed WALL → window (27) at Table 25 U (double 3.1 / single 4.8) with the §3.2 curtain resistance (R=0.04) → U_eff 2.758; - glazed ROOF → rooflight (27a) at Table 25 roof U (double 3.4 / single 5.3) + curtain → U_eff 2.993; - FLOOR → (28a) via BS EN ISO 13370 as an uninsulated SOLID ground floor with 300 mm walls (§5.12, spec p.43), exposed perimeter = glazed perimeter → U 0.89; - glazed wall + roof + floor areas join (31)/(36); the fully-glazed structure walls/roof add nothing (the glazing IS the window/rooflight). `dimensions_from_cert` adds the conservatory floor area to TFA (4) and floor area × height to volume (5) (feeds ventilation (8)), without making it a storey (avg storey height for §2 infiltration is unchanged). Pinned against the simulated case-44 P960 §3 at abs=1e-4 — every line ref EXACT: (4) 95.3800, (5) 257.1630, (27) 96.1169, (27a) 38.2201, (28a) 21.4164, (29a) 35.5852, (30) 7.4688, (31) 294.2900, (33) 207.3274, (36) 23.5432. The remaining whole-dwelling SAP/CO2 gap is the §6 solar gains, closed in the next slice. Worksheet harness stays 47/47 0-raised. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:59:26 +00:00
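The geometry and curtain arithmetic quoted in the commit above can be reproduced in a few lines. This is only a sketch under stated assumptions: the function names and the 2.4 m height are illustrative and are not taken from `worksheet/conservatory.py`; the 20° pitch, R=0.04 and the Table 25 U-values are the ones quoted in the message.

```python
import math

CURTAIN_RESISTANCE = 0.04  # m2K/W, the curtain allowance quoted in the commit message


def u_with_curtain(u_table: float, r_curtain: float = CURTAIN_RESISTANCE) -> float:
    """Add a series resistance to a tabulated U-value: U_eff = 1 / (1/U + R)."""
    return 1.0 / (1.0 / u_table + r_curtain)


def conservatory_areas(
    floor_area: float, glazed_perimeter: float, height: float
) -> tuple[float, float]:
    """Return (roof_area, wall_area) for a non-separated conservatory."""
    roof_area = floor_area / math.cos(math.radians(20.0))  # floor area / cos(20 degrees)
    wall_area = glazed_perimeter * height  # exposed perimeter x height
    return roof_area, wall_area


# floor_area and glazed_perimeter are the case-44 values quoted in the log;
# height=2.4 is a made-up room height used purely for illustration.
roof_area, wall_area = conservatory_areas(floor_area=12.0, glazed_perimeter=9.0, height=2.4)
print(round(u_with_curtain(3.1), 3))  # 2.758 (double-glazed wall)
print(round(u_with_curtain(3.4), 3))  # 2.993 (double-glazed roof)
```

The two printed values match the U_eff figures in the commit message; the areas are not compared against the pinned worksheet lines because the real height derivation is not shown in the log.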
Khalim Conn-Kowlessar	fa131cca0b	feat(conservatory): read §6.1 geometry through extractor + mapper RdSAP 10 §6.1 (PDF p.49) models a non-separated (heated) conservatory as part of the dwelling. Until now the Summary §5 block was reduced to an inert `has_conservatory` bool and the geometry (floor area, glazed perimeter, glazing, storey height) was dropped on both paths. Plumbing only — no cascade consumer yet (Slices B/C/D wire §3/§6): - ElmhurstSiteNotesExtractor reads the §5 Conservatory block into a new `Conservatory` site-notes record (scoped to §5 so the generic "Floor Area"/"Room Height" labels can't collide with §4 dimensions); - domain gains a frozen `SapConservatory` (floor area, glazed perimeter, double/single glazing, thermally-separated guard, equivalent storey count) on `EpcPropertyData.sap_conservatory`; - the Elmhurst mapper threads it through, dropping SEPARATED conservatories per §6.2 ("A separated conservatory ... is disregarded"). Verified against the simulated case-44 Summary (RefNo 001431): extracts floor_area=12.0, glazed_perimeter=9.0, double_glazed=True, 1 storey. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:37:05 +00:00
Khalim Conn-Kowlessar	3580d059ec	feat(modelling): drive measure scoping from the Scenario's exclusions The measures a run considers should come from the Scenario, not a CLI flag. The live scenario table persists exclusions only (no inclusions column), as a Postgres text-array of exact MeasureType values. - Scenario gains `exclusions: frozenset[MeasureType]` + `considered_measures()` (all measures minus the excluded ones, or None when none are excluded). - ScenarioModel.to_domain parses the `{a,b,c}` exclusions array into MeasureTypes, raising on a token that is not an exact MeasureType value (no high-level category expansion), per the strict-enum convention. - ModellingOrchestrator._plan_for derives the allowlist from the Scenario's exclusions, combined (intersection) with any explicit considered_measures override via the new `combine_considered_measures`. - run_modelling_e2e sources the allowlist from the Scenario; --measures / --exclude-measures become optional overlays (e.g. the technical secondary_heating_removal exclusion the catalogue cannot yet stock). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:26:25 +00:00
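A minimal sketch of the exclusion-driven scoping described in the commit above. The `MeasureType` members, `parse_exclusions`, and the simple unquoted `{a,b,c}` parsing are assumptions for illustration; only `exclusions`, `considered_measures()` and `combine_considered_measures` are names taken from the message, and their real signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MeasureType(str, Enum):
    # Hypothetical subset; the real enum lives in the repository.
    AIR_SOURCE_HEAT_PUMP = "air_source_heat_pump"
    LOFT_INSULATION = "loft_insulation"
    SECONDARY_HEATING_REMOVAL = "secondary_heating_removal"


def parse_exclusions(raw: str) -> frozenset[MeasureType]:
    """Parse an unquoted Postgres text-array literal such as '{a,b,c}'.

    Raises ValueError on any token that is not an exact MeasureType value.
    """
    inner = raw.strip().removeprefix("{").removesuffix("}")
    if not inner:
        return frozenset()
    return frozenset(MeasureType(token.strip()) for token in inner.split(","))


@dataclass(frozen=True)
class Scenario:
    exclusions: frozenset[MeasureType] = frozenset()

    def considered_measures(self) -> frozenset[MeasureType] | None:
        """All measures minus the excluded ones, or None when nothing is excluded."""
        if not self.exclusions:
            return None
        return frozenset(MeasureType) - self.exclusions


def combine_considered_measures(
    a: frozenset[MeasureType] | None, b: frozenset[MeasureType] | None
) -> frozenset[MeasureType] | None:
    """Intersect two allowlists, treating None as 'no restriction'."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b
```

Treating `None` as "no restriction" keeps the no-exclusions case distinct from an empty allowlist, which would admit nothing.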
Jun-te Kim	928fbbc33a	Merge remote-tracking branch 'origin/main' into feature/hyde_make_it_more_accurate_with_tests # Conflicts: # applications/sharepoint_renamer/handler.py # domain/sap10_calculator/worksheet/heat_transmission.py	2026-06-16 15:23:52 +00:00
Jun-te Kim	2f0eb49eee	Checkpoint: UPRN 10093116543 Elmhurst build + devcontainer VNC/Playwright + perms - Add SAP-accuracy sample for uprn_10093116543 (epc.json, elmhurst_inputs.md, summary/worksheet PDFs) - Persist hyde viewer stack (xvfb/fluxbox/x11vnc/novnc/websockify) and Playwright chromium in the backend devcontainer; forward noVNC 6080 - Broaden .claude/settings.local.json allowlist (display/python/grep/tail) - In-progress campaign mapper/cert_to_inputs work carried from prior cert Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 15:21:56 +00:00
Jun-te Kim	80b86d4790	Prove prediction resolves landlord overrides to a real cohort match end-to-end 🟩 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 15:20:49 +00:00
Jun-te Kim	864ba8dc1b	Resolve a Property's prediction attributes from landlord overrides in gov-code space 🟩 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 15:18:44 +00:00
Jun-te Kim	c5cffd9047	Read a Property's resolved landlord overrides as a faithful value-space snapshot 🟩 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 15:14:53 +00:00
Jun-te Kim	a1ce8ece50	Map landlord-override property type and built form to gov EPC codes 🟩 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 15:07:09 +00:00
Khalim Conn-Kowlessar	31ced27162	feat(modelling): surface the full candidate measure menu with per-measure cost The run only showed the measures the Optimiser selected, so a candidate it passed over (e.g. an ASHP it found too costly for the target band) and that measure's cost were invisible. Add `harness.console.candidate_recommendations` — every Generator Option with its per-Option cost, before optimisation — and have run_modelling_e2e print the full menu per property (flagging the selected Options), write a "cost per measure" section into the markdown, and emit a per-Option modelling_e2e_candidates.csv. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:03:26 +00:00
Khalim Conn-Kowlessar	688bb4d601	test(corpus): ratchet SAP ceiling 0.99->0.97 (§3.9.2 Type-2 RR) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:01:34 +00:00
Khalim Conn-Kowlessar	6385a0be85	fix(mapper): map dropped §3.9.2 Simplified Type-2 room-in-roof (API) The gov API lodges a §3.9.2 Simplified Type-2 RR (a room-in-roof bounded by continuous common walls) under `room_in_roof_type_2` — gable + common-wall lengths AND heights. The block was undeclared → `from_dict` dropped it → neither the Type-1 nor Detailed path fired → the cascade's Simplified branch billed the WHOLE A_RR shell (12.5√(floor/1.5)) at the Table-18-col-4 default with no gable/common-wall deduction (over-count → under-rate; 7 corpus certs at signed −5.02). Fix: declare `RoomInRoofType2` on rdsap_schema_21_0_0/_21_0_1 + SapRoomInRoof, and build `detailed_surfaces` by MIRRORING the worksheet-validated Summary path (`_map_elmhurst_rir_surface`, is_simplified) rather than back-solving: common wall → L × (0.25 + H) (billed at the main-wall U) gable → L × (0.25 + H) − Σ (H − H_cw)²/2 (RdSAP 10 §3.9.2 + Table 4) The gable correction sums all common walls (exposed/party/sheltered, incl. the H=0 absent-gable negative-area case that deducts from the A_RR residual); a Connected gable sums only the common walls it overtops. The `gable_wall_type_*` code routes the kind (0/1/2/3 = Party/Exposed/Sheltered/Connected). A raw-L×H prototype scattered; the §3.9.2 quadratic is the missing piece. Validation is cross-mapper parity, NOT a corpus back-solve: `_api_type_2_surfaces` produces surfaces IDENTICAL to the Summary path on cohort cert 000565 (connected_wall 3.68, gable_wall_external 16.08/27.68, common walls, and the −0.17 absent-gable quadratic), and 000565 is pinned to 1e-4 in the harness — so the API RR fabric is now correct by construction. The remaining type-2 cohort SAP scatter is unrelated per-cert causes (stone walls, secondary fuel), not the RR. Gauges: corpus within-0.5 67.6% → 67.9% (MAE 0.979 → 0.959); /tmp 71.7% → 71.8% (MAE 0.838 → 0.822). Harness 47/47 (000565 unchanged); regression = the 3 pre-existing fails; pyright net-zero (65=65). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 15:01:11 +00:00
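The two Type-2 area formulas quoted in the commit above can be written out directly. This is a literal transcription under assumptions: the function names are hypothetical, and the dimensions in the examples are made up rather than taken from cert 000565.

```python
def common_wall_area(length: float, height: float) -> float:
    """Common wall: L x (0.25 + H)."""
    return length * (0.25 + height)


def gable_area(length: float, height: float, common_wall_heights: list[float]) -> float:
    """Gable: L x (0.25 + H) minus the quadratic correction summed over common walls.

    correction = sum((H - H_cw)^2 / 2) over the common walls passed in.
    """
    correction = sum((height - h_cw) ** 2 / 2.0 for h_cw in common_wall_heights)
    return length * (0.25 + height) - correction


print(common_wall_area(3.0, 1.0))          # 3.75
print(gable_area(4.0, 2.0, [1.0, 1.0]))    # 8.0
print(gable_area(1.0, 0.0, [1.0]))         # -0.25, an absent (H=0) gable goes negative
```

The last call shows why the H=0 absent-gable case deducts area: the base term is small while the quadratic correction remains.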
Khalim Conn-Kowlessar	5c19737fc5	feat(modelling): gate generation by the considered-measures allowlist `restrict_to_considered_measures` filtered candidates only after every generator had run, so an excluded measure still queried the catalogue. That crashed properties with a lodged secondary heater: the live `material.type` enum has no `secondary_heating_removal` value, so the query raised a psycopg2 `InvalidTextRepresentation` before the allowlist could drop it. `_candidate_recommendations` now pairs each generator with the measure types it can emit and runs it only when the allowlist admits one of them (None = all), so an excluded measure never reaches the catalogue. `restrict_to_considered_measures` still trims disallowed Options off the multi-Option survivors. Add `--exclude-measures` to run_modelling_e2e (allowlist minus the excluded set) for excluding one measure without enumerating the rest. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 14:56:09 +00:00
Khalim Conn-Kowlessar	53d9f21f73	fix(modelling): offer ASHP when the catalogue has no ASHP row The ASHP bundle is priced from the rate sheet (ADR-0025); the catalogue row is read only for its material id, which is nullable end-to-end. The live `material` catalogue has no `air_source_heat_pump` row, so `products.get` raised `ValueError: no active product` and aborted every ASHP-eligible property. Add `ProductNotFound(ValueError)` + a concrete `ProductRepository.get_optional`, raise the typed error from both repos, and have `_ashp_option` look the row up optionally — a missing row now yields an ASHP Option with `material_id=None` rather than crashing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 14:55:41 +00:00
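The typed-error plus optional-lookup pattern from the commit above might look roughly like this. `ProductNotFound` and `get_optional` are names from the message; the `Product` dataclass, the in-memory repository and `ashp_material_id` are hypothetical stand-ins, not the repository's real classes.

```python
from __future__ import annotations

from dataclasses import dataclass


class ProductNotFound(ValueError):
    """Raised when the catalogue has no active row for a material type."""


@dataclass(frozen=True)
class Product:
    material_id: int
    material_type: str


class InMemoryProductRepository:
    """Hypothetical stand-in for a catalogue-backed repository."""

    def __init__(self, rows: dict[str, Product]) -> None:
        self._rows = rows

    def get(self, material_type: str) -> Product:
        try:
            return self._rows[material_type]
        except KeyError:
            raise ProductNotFound(f"no active product for {material_type!r}") from None

    def get_optional(self, material_type: str) -> Product | None:
        """Like get(), but a missing row yields None instead of raising."""
        try:
            return self.get(material_type)
        except ProductNotFound:
            return None


def ashp_material_id(products: InMemoryProductRepository) -> int | None:
    """Resolve the ASHP material id, tolerating a catalogue with no ASHP row."""
    row = products.get_optional("air_source_heat_pump")
    return row.material_id if row is not None else None
```

Subclassing `ValueError` means existing `except ValueError` callers keep working while new callers can catch the narrower type.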
KhalimCK	90bed458f4	Merge pull request #1238 from Hestia-Homes/feature/epc-prediction Feature/epc prediction	2026-06-16 21:58:40 +08:00
Khalim Conn-Kowlessar	a43c03ed94	feat(epc-prediction): thread prediction injection points through the composition root build_first_run_pipeline now constructs epc_prediction=EpcPrediction() and accepts comparables_repo + prediction_attributes_reader as optional params, threading them into IngestionOrchestrator (ADR-0031). The on-switch is now just supplying those two arguments — no orchestrator/handler edits — once they exist: the cohort repo (its EPC client is the source client pending #1136) and the property_overrides attributes reader (built separately). Both default None, so the feature stays OFF and ingestion is unchanged until they're passed. The epc_property.source migration is live, so the predicted-EPC persistence slot (slice-5c) now works against the real DB. Handover updated to reflect the simpler composition-root step. pyright strict clean; handler + pipeline + ingestion-prediction tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 13:53:54 +00:00
Khalim Conn-Kowlessar	7ca1f815f6	refactor(epc-prediction): PR review — rename ComparableProperty, relocate PredictionTarget Two review points from @dancafc: 1) Rename the `Comparable` dataclass → `ComparableProperty` (it models one comparable property; the collection stays `ComparableProperties`). Applied across domain, repositories, orchestration, harness, scripts, and tests with a word-boundary rename so `ComparableProperties` is untouched. 2) Move `PredictionTarget` out of comparable_properties.py into prediction_target.py (where `PredictionTargetAttributes` + `build_prediction_target` already live). comparable_properties.py now imports it; no import cycle (prediction_target no longer depends on comparable_properties). Importers updated. 92 tests pass across the touched suites; pyright strict clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 13:34:44 +00:00
KhalimCK	e1e570fdc7	Merge pull request #1239 from Hestia-Homes/feature/per-cert-mapper-validation Feature/per cert mapper validation	2026-06-16 21:22:09 +08:00
Jun-te Kim	dad3044740	save skills and automation progress	2026-06-16 12:17:43 +00:00
Daniel Roth	11b80f03c5	Merge pull request #1240 from Hestia-Homes/feature/sharepoint-renamer-no-images Sharepoint renamer: ignore jpg and heic files. resolve filepath relatively	2026-06-16 11:12:39 +01:00
Daniel Roth	6063fe051e	ignore jpg and heic files. resolve filepath relatively	2026-06-16 10:06:39 +00:00
Jun-te Kim	b78e9c7768	Merge branch 'main' of https://github.com/Hestia-Homes/Model into feature/hyde_make_it_more_accurate_with_tests	2026-06-16 09:17:33 +00:00
Khalim Conn-Kowlessar	419e340477	test(worksheet): pin simulated case 43 at 1e-4 (RR + dry-line + mixed roof) Golden regression fixture for the multi-feature dwelling that surfaced the two Elmhurst-extractor bugs in `a33707f8`. case 43 is a 2-BP mid-terrace with a DETAILED room-in-roof (two slopes, two flat ceilings, party + exposed gables, two common walls), a MIXED-insulation multi-section roof (Main insulated + Extension uninsulated 2.30), a DRY-LINED extension solid wall, a mains-gas boiler (102 / control 2106) and a House-coal solid-fuel secondary (633). Routes the Summary PDF through the WHOLE extractor + mapper + calculator pipeline (no hand-built EpcPropertyData) and pins the §3 fabric + SAP-rating block at abs=1e-4: (29a) walls 74.5800, (30) roof 38.5008, (33) fabric 172.7844, continuous SAP 73.2332 = (258), CO2 3518.3037 = (272). Guards the detailed-RR slope/common_wall surfaces, the dry-lining R=0.17 adjustment, and the per-part mixed-roof billing together. Summary mirrored to backend/documents_parser/tests/fixtures/Summary_001431_case43.pdf; provider module mirrors the _case6/_case21 pattern, assertion in test_section_cascade_pins. Harness 47/47; regression = the 3 pre-existing fails; pyright net-zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 08:26:05 +00:00
Khalim Conn-Kowlessar	a33707f851	fix(elmhurst): read main-wall dry-lining + fix last-RR-row U over-read Two compensating Summary-extractor bugs surfaced by simulated case 43 (a 2-BP mid-terrace with a detailed room-in-roof + a dry-lined extension wall). Their fabric errors nearly cancelled (walls net −0.76 W/K), hiding both behind a deceptively small +0.05 SAP delta. Bug 1 — main/extension wall dry-lining never read. The §7 "Dry-lining: Yes/No" line was parsed only for ALTERNATIVE walls; the main/extension WallDetails dropped it, so a dry-lined solid wall was billed at its un-adjusted base U. RdSAP 10 §5.8 + Table 14: a dry-lined uninsulated wall adds R=0.17 → U = 1/(1/U_base + 0.17). Case 43 Ext1: solid brick 1.70 → 1.32. Added `WallDetails.dry_lined`, read it in the extractor (both the main-wall builder and the As-Main copy), threaded it to the domain `wall_dry_lined` (emit None when not dry-lined — cascade-equivalent to False, keeps the field absent for the non-dry-lined majority). Bug 2 — the LAST room-in-roof surface row's U over-read. The per-row token scan stops at the next RIR-row name; the final surface (no successor) over-read into the following section, shifting the trailing-token slotting and silently zeroing its `default_u` (case 43 Common Wall 2: 1.90 → 0.00 → the 2.4 m² common wall billed at U=0 instead of the main-wall 1.90). Stop the scan at the row's natural end — the "Yes"/"No" u_value_known flag plus the trailing u_value numeric. Case 43 now reproduces the P960 EXACTLY: (29a) walls 74.5800, (33) fabric 172.7844, continuous SAP 73.2332 = (258), CO2 3518.30 = (272), all <1e-4 (was SAP +0.0455 / CO2 −8.04). Harness 47/47 0 raised; regression = the 3 pre-existing fails; pyright net-zero (51=51). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 07:51:33 +00:00
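The dry-lining adjustment in Bug 1 is a single series-resistance step, sketched below. The function name is hypothetical; R=0.17 and the 1.70 → 1.32 example are the figures quoted in the commit, and `None` is treated like `False` as the message describes.

```python
from __future__ import annotations

DRY_LINING_RESISTANCE = 0.17  # m2K/W, the value quoted in the commit message


def u_wall_dry_lined(u_base: float, dry_lined: bool | None) -> float:
    """Apply U = 1 / (1/U_base + 0.17) when the wall is dry-lined.

    None is treated the same as False (no adjustment).
    """
    if not dry_lined:
        return u_base
    return 1.0 / (1.0 / u_base + DRY_LINING_RESISTANCE)


print(round(u_wall_dry_lined(1.70, True), 2))  # 1.32
print(u_wall_dry_lined(1.70, None))            # 1.7
```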
Khalim Conn-Kowlessar	8a70d22278	test(corpus): ratchet SAP ceiling 1.00->0.99 (detailed-RR common walls) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 06:23:11 +00:00
Khalim Conn-Kowlessar	26998152a7	fix(mapper): read dropped detailed room-in-roof common-wall surfaces Follow-on to the slope/stud slice. A Detailed RR (RdSAP 10 §3.9.2) can also lodge `common_wall_*` — the wall separating the room-in-roof from the rest of the cold roof void. Those fields were undeclared → `from_dict` dropped them → `_api_rir_detailed_surfaces` omitted the common walls → the RR undercounted wall heat loss → over-rate. Fix: declare `common_wall_length/height_1/2` on `RoomInRoofDetails` (21_0_0 + 21_0_1) and build `kind="common_wall"` surfaces (raw L × H area to 2 d.p.). The cascade's Detailed-RR branch already bills common walls at the storey-below main-wall U (Table 4 p.22 "Common wall") and deducts their area from the §3.10.1 residual roof — no calculator change. No insulation thickness is read: common walls take the main-wall U, not a Table 17 RR-element U. 6 /tmp certs carry detailed `common_wall_length_1`: cohort mean|err| 2.43 -> 1.25 (all were over-rating; e.g. 2877-3059 +4.55 -> +2.79). Gauges: corpus within-0.5 67.5% -> 67.6% (MAE 0.987 -> 0.979); /tmp 71.6% -> 71.7% (MAE 0.846 -> 0.838). Harness 47/47 0 raised; regression = the 3 pre-existing fails; pyright net-zero (65=65). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 06:22:45 +00:00
Khalim Conn-Kowlessar	e4adab0e88	test(corpus): ratchet SAP floor 0.65->0.67, ceiling 1.08->1.00 Lock in the detailed-RR slope + stud-wall gain (corpus within-0.5 67.3% -> 67.5%, MAE 1.020 -> 0.987). The corpus is a fixed 1000-cert deterministic gauge, so the thresholds track measured HEAD with a small margin per the ratchet convention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 05:57:27 +00:00
Khalim Conn-Kowlessar	363f14fbb2	fix(mapper): read dropped detailed room-in-roof slope + stud-wall surfaces The gov-EPC API lodges a Detailed RR (RdSAP 10 §3.9, Figure 4) with up to two sloping ceilings (`slope_`) and two vertical stud/knee walls (`stud_wall_`) in addition to the gable + flat-ceiling surfaces. Those slope/stud fields were undeclared on the 21.0.x schema, so `from_dict` silently dropped them and `_api_rir_detailed_surfaces` built ONLY the gable + flat-ceiling surfaces. The (large) sloping roof and the knee walls contributed ZERO heat loss → undercounted RR fabric loss → a systematic over-rate. Fix: declare `slope_`/`stud_wall_` on `RoomInRoofDetails` (rdsap_schema_21_0_0 + _21_0_1) and build `kind="slope"` / `kind="stud_wall"` surfaces in the mapper. The cascade's Detailed-RR branch already routes both to the roof aggregate via `u_rr_slope` (Table 17 col 1) and `u_rr_stud_wall` (Table 17 col 3) — RdSAP 10 §5.11.3, p.43-44 — so no calculator change is needed (Summary path worksheet-validated by the 000565 detailed-RR fixtures). insulation_type is left None to defer to the Table 17 col-(a) mineral-wool default, mirroring the existing flat_ceiling branch. 15 /tmp certs carry `slope_height_1`: cohort mean|err| 4.26 -> 2.05, signed +4.09 -> centred (14/15 were over-rating; e.g. 0390-2538 +5.95 -> +3.56). Gauges: corpus within-0.5 67.3% -> 67.5% (MAE 1.020 -> 0.987); /tmp 71.4% -> 71.6% (MAE 0.882 -> 0.846). Harness 47/47 0 raised; regression = the 3 pre-existing fails; pyright net-zero (65=65). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 05:56:47 +00:00
Khalim Conn-Kowlessar	b55b969b84	fix(water-heating): use lodged `cylinder_heat_loss` declared-loss factor The gov API lodges a manufacturer's declared cylinder loss factor (kWh/day) in `sap_heating.cylinder_heat_loss`, in which case it leaves the cylinder volume / insulation type / thickness None. That field was undeclared on the 21.0.x schemas, so `from_dict` dropped it — then `_cylinder_storage_loss_override` hit its insulation-None / volume-None guards and returned None, dropping the §4 storage loss ENTIRELY. The dwelling over-rated (the declared loss is typically ~1.5 kWh/day ≈ 550 kWh/yr). SAP 10.2 §4 branch a) (PDF p.136): when the declared loss factor is known, storage loss (50) = (48) declared loss × (49) Table-2b temperature factor — replacing the Table 2 V×L×VF computation. - declare `cylinder_heat_loss` on RdSapSchema21_0_0/21_0_1.SapHeating + EpcPropertyData.SapHeating; thread through the 21.0.x mappers. - `cylinder_storage_loss_monthly_kwh` gains `declared_loss_kwh_per_day`: when set, combined_55 = declared × TF (volume/insulation unused). - `_cylinder_storage_loss_override` resolves the declared loss BEFORE the insulation/volume guards (the gov omits those when the loss is lodged). 12 /tmp certs carry it (mean |err| 3.00 -> 2.51; the clean ones close hard, e.g. 2360 2.65 -> 0.30, 0245 2.25 -> 0.53). Corpus within-0.5 67.0% -> 67.3% (MAE 1.025 -> 1.020); /tmp 71.2% -> 71.4% (0.889 -> 0.882). Worksheet harness 47/47; regression = only the 3 pre-existing fails; pyright net-zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 05:27:47 +00:00
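The declared-loss branch from the commit above reduces to one multiplication per day. The sketch below is an assumption-laden illustration: the function name is hypothetical, the temperature factor of 0.6 is an illustrative number rather than a value read from Table 2b, and scaling the daily figure by days-in-month is assumed here rather than stated in the message.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def declared_storage_loss_monthly_kwh(
    declared_loss_kwh_per_day: float, temperature_factor: float
) -> list[float]:
    """Daily storage loss = declared loss x temperature factor, spread over each month.

    The monthly scaling by number of days is an assumption of this sketch.
    """
    daily = declared_loss_kwh_per_day * temperature_factor
    return [daily * days for days in DAYS_IN_MONTH]


# 1.5 kWh/day is the "typical" declared loss quoted in the commit;
# 0.6 is a placeholder temperature factor, not a Table 2b lookup.
monthly = declared_storage_loss_monthly_kwh(1.5, 0.6)
print(len(monthly))  # 12
```

With a temperature factor of 1.0 the annual total is 1.5 × 365 = 547.5 kWh, which is the "≈ 550 kWh/yr" order of magnitude the message cites.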
Khalim Conn-Kowlessar	7cfd54129b	fix(mapper): read the dropped `rafter_insulation_thickness` API field Roofs lodged insulated at rafters carry their thickness in a DEDICATED gov-EPC API field, `rafter_insulation_thickness` (e.g. "225mm"), while `roof_insulation_thickness` stays None (rafters aren't loft joists). That field was undeclared on the 21.0.x schemas, so `from_dict` silently dropped it — the rafter certs only looked redacted (roof EER 2-4 = insulated, yet no thickness), and the cascade fell to the Table 18 col (2) unknown default (2.30), badly under-rating them. - declare `rafter_insulation_thickness` on RdSapSchema21_0_0/21_0_1 + EpcPropertyData.SapBuildingPart (mirrors the existing sloping_ceiling_/flat_roof_insulation_thickness dropped-field handling). - thread it through `from_rdsap_schema_21_0_0/21_0_1` (older schemas get None via getattr). - `heat_transmission` prefers `rafter_insulation_thickness` over `roof_insulation_thickness` when the part is at-rafters, so the measured RdSAP 10 §5.11.2 Table 16 column (2) row applies (225 mm → 0.25). Completes the rafters roof fix: with the real thickness read, the rafter certs are recovered rather than over-stated — cert 3100-8675-0922-8628 (band E, rafters 225mm) +8.93 → +0.43 SAP. Corpus within-0.5 67.0% (MAE 1.025) and /tmp 71.2% (MAE 0.889) — both NET ABOVE the pre-rafters baseline (66.9% / 70.6%). Worksheet harness 47/47; regression = only the 3 pre-existing fails; pyright net-zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 05:04:39 +00:00
Khalim Conn-Kowlessar	5d556faf86	fix(roof): bill at-rafters insulation on RdSAP 10 Table 16/18 column (2) `u_roof` only implemented the joist column, so roofs lodged insulated at rafters (`roof_insulation_location == 1`) were mis-billed at the joist U on both the API and Summary paths — under-stating loss, over-rating SAP. RdSAP 10 §5.11.2 Table 16 (spec p.42-43) gives a distinct "insulation at rafters" column (2): the rafter cavity is shallower than a loft void, so the same depth yields a higher U (200 mm: rafters 0.29 vs joists 0.21). §5.11 Table 18 (p.45) likewise carries a rafters column (2) for unknown / as-built thickness (footnote (1): "The value from the table applies for unknown and as built") — band A-D = 2.30, E = 1.50, F = 0.68, diverging from the joist column's 100 mm-equivalent 0.40 default (footnote (4)). - add `_ROOF_RAFTERS_BY_THICKNESS` (Table 16 col 2) + `_ROOF_RAFTERS_BY_AGE` (Table 18 col 2) to rdsap_uvalues; `u_roof` selects them via a new `insulation_at_rafters` flag (ignored for flat / sloping-ceiling roofs). - `heat_transmission` derives the flag PER BUILDING PART from `roof_insulation_location` (gov-API int 1 / Summary "R Rafters"), which also fixes the multi-part dedup-roof-join problem: each part's own location now drives its U, replacing the unattributable joined `epc.roofs[]` description. Worksheet-validated to 1e-4: simulated case 41 (4-bp — Ext1 rafters 200mm → 0.29, Ext3 rafters As-Built band F → 0.68; roof total 24.8350) and case 42 (6 variants — rafters 50mm → 0.88, rafters unknown band C → 2.30, joists/none unchanged). Case 40 stays exact (roof 35.340, total 441.1606); worksheet harness 47/47. Corpus within-0.5 66.9% → 66.5% (gates 0.65/1.08 hold) — a spec-correct shift, NOT a regression: all 15 corpus rafter certs carry redacted (None) thickness yet lodge roof EER 2-4 (insulated), so the open API blanked a specified thickness and the spec's unknown-rafter 2.30 default correctly over-states them. Recovery needs a roof-EER→thickness inference on the API path (follow-up), not a change to the U-table. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 04:42:44 +00:00
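The rafters-column selection described in the commit above can be pictured as two small lookups. This is a partial sketch: the dictionaries contain only the values quoted in this log (not the full Table 16/18 columns), the table names echo the commit but their real shape is unknown, and `u_roof_at_rafters` is a hypothetical helper.

```python
from __future__ import annotations

# Partial tables: only the rafters-column values quoted in the commit messages.
_ROOF_RAFTERS_BY_THICKNESS = {50: 0.88, 200: 0.29, 225: 0.25}  # mm -> U
_ROOF_RAFTERS_BY_AGE = {
    "A": 2.30, "B": 2.30, "C": 2.30, "D": 2.30,  # bands A-D
    "E": 1.50,
    "F": 0.68,
}


def u_roof_at_rafters(thickness_mm: int | None, age_band: str) -> float:
    """Use the measured-thickness row when known, else the age-band default."""
    if thickness_mm is not None:
        return _ROOF_RAFTERS_BY_THICKNESS[thickness_mm]
    return _ROOF_RAFTERS_BY_AGE[age_band]


print(u_roof_at_rafters(200, "C"))   # 0.29
print(u_roof_at_rafters(None, "C"))  # 2.3
print(u_roof_at_rafters(None, "F"))  # 0.68
```

The gap between 0.29 and 2.30 for the same dwelling is why a dropped thickness field under-rated the rafter certs until it was read.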
Khalim Conn-Kowlessar	f66e2cb020	docs(epc-prediction): module README + end-to-end showcase test README at domain/epc_prediction/README.md — the flow diagram, where each piece lives, links to the ADRs/CONTEXT/handover/migration note, and a runnable test command. The team's entry point. tests/e2e/test_epc_prediction_e2e.py — the whole gap-fill flow against the REAL Postgres Unit of Work + EPC/Property repositories + EpcComparablePropertiesRepository + EpcPrediction, with only the three external HTTP clients faked (EPC API, geospatial S3, Solar). Proves: EPC-less Property → Ingestion predicts from its postcode cohort → persists to the predicted slot → reloaded Property resolves effective_epc via source_path == "predicted". The canonical "see it in action". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 04:13:30 +00:00
Khalim Conn-Kowlessar	b677448fa0	docs(epc-prediction): slice-5f production-wiring handover for Jun-te The gap-fill is wired end-to-end (slices 5a-5e) behind seams; this note is what's left to switch it on in production: (1) implement the PredictionTargetAttributesReader stub over property_overrides — with the override-value → API-code mapping select_comparables needs; (2) run the epc_property.source Drizzle migration; (3) pass the three optional collaborators at the IngestionOrchestrator composition root. Plus the open Validation-Cohort exclusion (no code path exists yet — exclude on source_path == "predicted" when one is built) and the anomaly dual-use pointer. No code change: the validation exclusion has no consumer to attach to today, and the structural signal (source_path == "predicted") already exists from slice-5a. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 04:05:00 +00:00
Khalim Conn-Kowlessar	5727ac53c1	feat(epc-prediction): slice-5e ingestion wiring (gate → predict → persist) Wire EPC Prediction gap-fill into IngestionOrchestrator (ADR-0031). When the predictor collaborators are injected (ComparablesRepo + PredictionAttributesReader + EpcPrediction), an EPC-less Property is predicted from its postcode cohort and persisted to the predicted slot; the eligibility gate (unknown property_type) and "a lodged EPC is never predicted over" both hold. The two-phase contract is kept: prediction attributes (Landlord Overrides) resolve in the unit prep phase, the cohort fetch + select + predict run in the no-unit IO phase, persistence in the write phase. All three collaborators are OPTIONAL — unwired, ingestion behaves exactly as before (existing tests unchanged). 3 tests (predict+persist, gate, lodged-wins); 228 pass across orchestration + epc_prediction + repositories; pyright strict clean. Production composition-root wiring (real ComparableProperties + override-attributes adapters) is part of the Jun-te handover. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 04:03:02 +00:00
Khalim Conn-Kowlessar	f2f954f459	feat(epc-prediction): slice-5d target assembly + eligibility gate build_prediction_target assembles an EPC-less Property's PredictionTarget from its identity (postcode), resolved coordinates, and Landlord-Override attributes (property_type / built_form / wall_construction). The eligibility GATE: a Property whose property_type is unknown returns None — never sized from a mixed-type cohort (ADR-0031). property_type is the hard cohort filter. The override attributes are read through a PredictionTargetAttributesReader port (stub seam) — the real adapter (a read over property_overrides) is being built separately by the team; ingestion wiring depends on the abstraction and tests substitute a fake. 2 tests (assembly + gate); pyright strict clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:56:57 +00:00
Khalim Conn-Kowlessar	fd43cf2d23	feat(epc-prediction): slice-5c predicted-EPC persistence slot Add a `source` discriminator (lodged | predicted) to the EPC store so a Property holds a lodged EPC and a predicted one (EPC Prediction gap-fill) at once (ADR-0031). EpcRepository.save gains source="lodged"; idempotent delete is now per-source (a predicted save no longer wipes lodged, and vice versa); get_for_property/get_for_properties filter lodged; new get_predicted_for_property / get_predicted_for_properties read predicted. PropertyPostgresRepository.get + get_many hydrate Property.predicted_epc, so the predicted picture reaches the modelling read (both load via get_many). FakeEpcRepo mirrors the dual slot. EpcPropertyModel gains `source` (default "lodged"); the test DB builds from the SQLModel mirror so this is exercised without the prod migration. The matching Drizzle change (column + per-(property_id,source) uniqueness) is the team's to action before merge — docs/MIGRATION_NOTE_predicted_epc_source.md. 3 store tests (coexist, idempotent predicted re-save leaves lodged, lodged-only has no predicted) + property-repo wiring; 85 pass across affected suites; new code pyright-clean (2 pre-existing wwhrs errors in epc_property_table untouched). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:50:19 +00:00
Khalim Conn-Kowlessar	6979607ace	feat(epc-prediction): slice-5b ComparableProperties repo port + adapter Build the cohort IO port ADR-0029 deferred (ADR-0031 slice-5b): `ComparablePropertiesRepository.candidates_for(postcode) -> list[Comparable]`, with an EPC-API + geospatial adapter that lists the postcode's lodged certs (search_by_postcode), fetches + maps each (get_by_certificate_number), and resolves their UPRNs to coordinates in ONE batched read. Register metadata the cert doesn't carry (address, registration date) is threaded off the search row; a UPRN-less or unparseable-date cert is kept, just uncoordinated / unweighted. The domain select_comparables then filters these candidates into the cohort. Thin CohortEpcClient / CohortGeospatial Protocols keep the adapter testable against fakes; EpcClientService + GeospatialS3Repository satisfy them structurally (no changes). 3 tests; pyright strict clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:40:59 +00:00
Khalim Conn-Kowlessar	086187ddc7	feat(epc-prediction): slice-5a predicted source path on Property Add a `predicted_epc` slot to the Property aggregate and a "predicted" branch to SourcePath / source_path / effective_epc (ADR-0031 decisions 1+3). A neighbour-synthesised EpcPropertyData resolves as the Effective EPC ONLY when there is neither a lodged EPC nor Site Notes — a real source always wins (prediction is last-resort gap-fill). The slot is distinct from `epc` so a predicted picture coexists with any lodged one (provenance is structural, not a flag on EpcPropertyData); downstream consumers are untouched. 3 tests: predicted resolves when sole source; lodged EPC wins over predicted; Site Notes win over predicted. 10/10 green, pyright strict clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:33:47 +00:00
Khalim Conn-Kowlessar	d1227fd0c6	docs(epc-prediction): ADR-0031 production wiring + CONTEXT gating rule Resolve the slice-5 design tree (grill-with-docs): estimation runs in Ingestion (refines ADR-0029 dec-3; drops the #1227 "shift to Modelling" — no surviving rationale, and stages communicate only via persisted state); predicted EPC is persisted in a DISTINCT slot (EPC table + source discriminator) so lodged + predicted coexist (enables EPC Anomaly Flags); provenance is structural (the slot), not a field on EpcPropertyData; effective_epc/source_path gain a "predicted" branch; slice-5 is gap-fill only; property_type is a REQUIRED input (hard cohort filter) from Landlord Overrides, and Properties with unknown type are gated out (no national defaults). OS postcode_search as a broader type source is a noted follow-on. CONTEXT EPC Prediction entry gains the gating rule. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:23:12 +00:00
Khalim Conn-Kowlessar	58d5b17145	chore(epc-prediction): dense-corpus fetcher + cross-postcode geo no-go Build a geographically DENSE postcode-clustered corpus to test cross-postcode geo expansion (the handover's anticipated "real geo payoff"). The gov EPC API has no area/prefix search (a partial postcode 400s; the old opendatacommunities partial-search API is decommissioned), so neighbourhood enumeration is external: seed K postcodes nationally, expand each via postcodes.io's nearest-postcode endpoint into every unit within RADIUS_M, pull each one's full EPC cohort. postcodes.io is a corpus-BUILD dependency only; the predictor stays pure. Same on-disk layout as the scattered corpus, so load_corpus + the coords resolver consume it unchanged. MEASURE-FIRST RESULT: cross-postcode expansion is a NO-GO. On a 2-seed pilot (York YO19 + Islington N51, 81 postcodes / 1558 certs, 140 SAP-10.2 targets), pooling nearby postcodes regresses accuracy across the board. same-postcode: FA_MAE 9.53, wall 92%, age 72%, floor_con 85%, cylinder 91%; cross <=0.3km: FA_MAE 13.1, wall 80%, age 61%, floor_con 82%, cylinder 79%. Even as a thin-cohort top-up it hurts (thin n=18: FA 5.24 -> 7.15). Root cause: the postcode boundary is itself a strong homogeneity prior (a postcode is one coherent street/development), so same-postcode neighbours beat geographically near cross-boundary ones even when the home postcode is sparse (and it rarely is: the median same-postcode cohort here is 34). Geo-proximity helps WITHIN a postcode (#1227) but does not survive crossing the boundary. Cross-postcode geo closed; geo weighting stays intra-postcode. Tooling kept (reusable). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 03:03:15 +00:00
Khalim Conn-Kowlessar	be3e51bae9	feat(epc-prediction): geo-proximity-weighted floor-area median Size the predicted dwelling from the geo-proximity-weighted median of the cohort's floor areas rather than the plain median: homes built together share a footprint, so a nearer neighbour's area should count for more (the same street signal #1227 already wired into age / wall / glazing). Reuses `_geo_weights` and adds `_weighted_median`, which reduces exactly to `statistics.median` under uniform weights (geo off / no target coordinates), including the even-count midpoint average, so the MAD-minimising guarantee is preserved. Measured over the 514-target SAP-10.2 corpus (leave-one-out): floor_area MAE 10.48 -> 9.73 m², MAPE 13.2% -> 12.2%. Re-baselines the n=36 fixture floor_area ceiling 11.8983 -> 12.0378 (a method change, not a loosening; the small fixture subset moved +0.14 the other way as sample noise while the population improved decisively). The ceiling still pins the new deterministic value exactly, so the tighten-only ratchet resumes. Investigation ruling out the adjacent floor-area levers (kept in the follow-up): lowering minimum_cohort (9.78-10.03, worse), hard same-form filter (10.19), mean instead of median (10.68), constant bias correction (10.47), extension-conditioning (oracle 9.50, not worth the misclassification cost) and room-in-roof conditioning/additive (RiR is a confound for large multi-part outliers: RiR area is only ~21% of total, and the increment breaks the homes already predicted exactly). Remaining cohort lever is built-form soft-weighting, gated on a denser corpus. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 00:08:05 +00:00
Khalim Conn-Kowlessar	b2b6f8e954	fix(mapper): map Elmhurst "Value known" cylinder to measured volume (code 6) The Elmhurst Summary §15.1 lodges "Cylinder Size: Value known" with the measured volume in the "Cylinder Volume (l)" line, the Summary-path equivalent of the gov-API "Exact" descriptor. The mapper had no entry for "Value known" so `_elmhurst_cylinder_size_code` raised UnmappedElmhurstLabel, and even once mapped the measured volume was never threaded through, so the cascade dropped the cylinder storage loss (~468 kWh/yr) from (219) water heating on every measured-volume-cylinder Summary. Per RdSAP 10 §10.5 Table 28 (p.55) a measured cylinder volume is used directly. Map "Value known" → cascade code 6 (Exact) and thread the §15.1 "Cylinder Volume (l)" value into SapHeating.cylinder_volume_measured_l, which `_cylinder_volume_l_from_code` (cert_to_inputs.py:5281) already reads for code 6, mirroring the gov-API path (mapper.py:1575/1885). Pins simulated case 39 (P960-0001-001431): an age-A mid-terrace on direct-acting electric room heaters (SAP code 691, cat 10, control 2602) with electric-immersion DHW off a 117 L "Value known" cylinder. The full extractor→mapper→calculator cascade now reproduces the worksheet's SAP-rating block EXACTLY: SAP value 36.6365 (band F) and (272) CO2 2056.0731 kg/yr, with (219) water heating 2637.5049 and (255) total energy cost 1802.0039. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 23:57:25 +00:00
Jun-te Kim	e289c1449b	docs: handoff for expanding the real-life cert SAP-accuracy corpus Strategy/context companion to the validate-cert-sap-accuracy skill: the per-cert loop, how to read the gov-API-vs-Elmhurst comparison, the code->value gotchas (immersion/cylinder/party-wall/baths/off-peak), known mapper gaps to chase (alt-wall drop), cert-selection for coverage, guardrails (corpus gauge, no tuning to one cert, no tolerance widening), and the current corpus state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:28:40 +00:00
Jun-te Kim	5c11fd35c8	Validate SAP calculator vs Elmhurst; fix reduced-field window U; add accuracy harness Reduced-field window U: heat_transmission derived the synthesised-window raw U from u_window(all None) -> the 2.5 placeholder regardless of glazing. Now routes the (uniform) glazing_type code through u_window (RdSAP Table 24) so e.g. double pre-2002 reads 2.8, not 2.5. Only the pre-SAP10 reduced-field path is affected (21.0.1 certs carry per-window U upstream); the RdSAP-21.0.1 corpus gauge is unchanged at 66.9% within-0.5. test_real_cert_sap_accuracy: pin uprn_10002468137 (RdSAP-17.1, all-electric storage heaters) at SAP 61, validated against Elmhurst on identical inputs (dual off-peak immersion, 110 L cylinder, 2 baths). Our engine reproduces Elmhurst's fuel cost to the penny; lodged 55 is the old SAP-2012 schema. Tooling to grow the accuracy corpus: scripts/fetch_real_life_epc_sample.py (capture a cert by UPRN into the corpus); scripts/compare_epc_paths.py (diff gov-API vs Elmhurst-summary EpcPropertyData and run both through the engine, localising mapper vs calculator differences); skill validate-cert-sap-accuracy (the end-to-end loop: capture -> Elmhurst inputs -> human builds -> compare -> reconcile -> pin in the test); skill epc-to-elmhurst-rdsap-inputs reference (corrected immersion (code 1=dual), cylinder size (code 2 = Normal/110 L), and bath-count (WWHRS sub-tab) mappings). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:26:11 +00:00
Khalim Conn-Kowlessar	da3fc92d53	docs(epc-prediction): handover for the accuracy backlog + geo work Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:12:00 +00:00
Daniel Roth	1fe67fe814	Merge pull request #1235 from Hestia-Homes/feature/deploy-sharepoint-renamer Sharepoint renamer: Remove breaking init file	2026-06-15 16:08:49 +01:00
Khalim Conn-Kowlessar	d8f015fb0e	feat(epc-prediction): report floor-area MAE + MAPE vs typical size Adds a floor_area line giving MAE (m2), MAPE (% of actual), and the typical (median actual) size, so the absolute error reads relative to dwelling size. Corpus: MAE 10.48 m2 / MAPE 13.2% / typical 61 m2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:07:22 +00:00
Khalim Conn-Kowlessar	aea2d7150f	test(epc-prediction): re-baseline modal_glazing floor after main merge main's 'ND' multiple_glazing_type mapper fix (`361abc12`) changes the mapped ground-truth glazing for one fixture cert, so modal_glazing_type re-baselines 0.5833 -> 0.5556 (21/36 -> 20/36). A mapper change shifts the deterministic fixture rates like a fixture change does — re-baseline, not a prediction regression. All other component floors + residual ceilings unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:04:34 +00:00
Khalim Conn-Kowlessar	0b2827e9ff	Merge remote-tracking branch 'origin/main' into feature/epc-prediction	2026-06-15 15:03:27 +00:00
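
Commit be3e51bae9 above describes a weighted median that reduces exactly to `statistics.median` under uniform weights, including the even-count midpoint average. The repository's `_weighted_median` and `_geo_weights` are not shown in this log, so the following is only a minimal sketch of that property; the function name `weighted_median`, its signature, and the sample floor areas are hypothetical.

```python
import math
import statistics


def weighted_median(values, weights):
    """Weighted median that matches statistics.median under uniform weights.

    Hypothetical sketch: not the repository's `_weighted_median`.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Drop zero-weight entries so they cannot affect a midpoint average.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("need at least one positive weight")
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for i, (value, weight) in enumerate(pairs):
        cumulative += weight
        # Exactly half the weight lies at or below this value: average it
        # with the next value (the even-count midpoint case).
        if math.isclose(cumulative, half, rel_tol=1e-12):
            return (value + pairs[i + 1][0]) / 2
        if cumulative > half:
            return value
    raise AssertionError("unreachable: cumulative weight always passes half")


# Assumed example cohort floor areas in m² (illustrative values only).
areas = [52.0, 58.0, 61.0, 95.0]
uniform = weighted_median(areas, [1.0] * len(areas))
nearest_heavy = weighted_median(areas, [1.0, 1.0, 1.0, 5.0])
```

With uniform weights the result equals `statistics.median(areas)`; giving one neighbour most of the weight pulls the result to that neighbour's area.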

7203 commits