Commit graph

7203 commits

Author SHA1 Message Date
Daniel Roth
03dc0a3eef add local handler and missing requirement 2026-06-15 15:03:07 +00:00
Khalim Conn-Kowlessar
1f26703dc5 feat(epc-prediction): geo-proximity weighting, per-component (#1227)
Folds a haversine distance kernel into the categorical-mode weighting so a
nearer neighbour counts for more — applied ONLY to the components that showed
a clear distance signal in the corpus pre-check (age band, wall + floor
construction, glazing: homes built/retrofitted together cluster). Roof
construction showed no decay and is excluded; heating keeps its coherent
donor. Predictor stays pure: weights come from target.coordinates vs each
Comparable.coordinates (resolved at the boundary); geo is OFF when the target
has no coords, neutral for a neighbour with none.

Scale chosen on the harness: _GEO_SCALE_KM=0.1 is the gate-safe optimum
(0.05 lifts the corpus more but regresses fixture floor_construction).
Corpus (150pc/514, geo off->on): age 0.564->0.572, age_pm1 0.841->0.847,
wall 0.902->0.912, floor_con 0.786->0.796, glazing 0.667->0.673; roof
unchanged. Fixture: glazing 0.5278->0.5833 (floor ratcheted), all else held.

Refactored recency into a reusable _recency_weights vector composed via
_combine, so similarity/recency/geo factors multiply uniformly. Fixture ships
a committed _coordinates.json (OGL OS OpenData; build script carries it from
the corpus sidecar on rebuild) so the gate exercises geo without S3.
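
A minimal sketch of the weighting described above, not the repository's code. It assumes an exponential decay kernel (the commit only says "haversine distance kernel"), and it reads "neutral" as a factor of 1.0. The names haversine_km, geo_weights, combine and weighted_mode are hypothetical; only the 0.1 km scale is taken from the message.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

Coordinates = Tuple[float, float]  # (lon, lat) in degrees, WGS84

GEO_SCALE_KM = 0.1  # value quoted in the commit message for _GEO_SCALE_KM


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def geo_weights(target: Optional[Coordinates],
                neighbours: Sequence[Optional[Coordinates]],
                scale_km: float = GEO_SCALE_KM) -> List[float]:
    """Distance-decay factor per neighbour (assumed form: exp(-d / scale)).

    Geo is off (all 1.0) when the target has no coordinates; a neighbour
    without coordinates gets a factor of 1.0.
    """
    if target is None:
        return [1.0] * len(neighbours)
    return [1.0 if n is None else math.exp(-haversine_km(target, n) / scale_km)
            for n in neighbours]


def combine(*factors: Sequence[float]) -> List[float]:
    """Multiply several weight vectors element-wise (similarity, recency, geo)."""
    return [math.prod(ws) for ws in zip(*factors)]


def weighted_mode(values: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the categorical value with the largest total weight."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return max(totals, key=totals.get)
```

With this shape, one neighbour a few metres away can outvote two neighbours several hundred metres away, which is the behaviour the message describes for age band, wall, floor construction and glazing.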

This is the per-component method applied to geography ([[feedback_per_component_best_method]]).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:58:42 +00:00
Daniel Roth
9b21cc5512 remove breaking init file 2026-06-15 14:52:48 +00:00
Khalim Conn-Kowlessar
fdc314c857 feat(epc-prediction): thread coordinates onto Comparable + target (#1227)
Adds coordinates: Optional[Coordinates] to Comparable and PredictionTarget
(data carriers — the pure predictor stays IO-free), and wires load_corpus to
read an optional _coordinates.json sidecar ({uprn: [lon, lat]}) and populate
each Comparable from its cert's uprn; iter_predictions threads the held-out
target's coordinates through. Absent sidecar -> geo-weighting stays off (no
behaviour change yet — weighting lands next slice). fetch_corpus_coordinates
now writes the sidecar into the corpus dir. load_corpus populates 99% of
corpus comparables.
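
A rough illustration of the sidecar read described above, under the assumption that _coordinates.json is a flat JSON object of the form {uprn: [lon, lat]} as the message states. The helper names load_coordinates_sidecar and coordinates_for are hypothetical and are not the repository's load_corpus.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (lon, lat)

SIDECAR_NAME = "_coordinates.json"  # file name taken from the commit message


def load_coordinates_sidecar(corpus_dir: Path) -> Dict[str, Coordinates]:
    """Read the optional sidecar; an absent file yields an empty mapping,
    which leaves every coordinate as None and geo-weighting off."""
    path = Path(corpus_dir) / SIDECAR_NAME
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    return {str(uprn): (float(lon), float(lat)) for uprn, (lon, lat) in raw.items()}


def coordinates_for(uprn: Optional[object],
                    sidecar: Dict[str, Coordinates]) -> Optional[Coordinates]:
    """Look up one cert's UPRN; uncovered or missing UPRNs resolve to None."""
    if uprn is None:
        return None
    return sidecar.get(str(uprn))
```

Keeping the file read at the loading boundary is what lets the predictor itself stay free of IO: it only ever sees an Optional coordinate pair on each data carrier.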

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:46:01 +00:00
Jun-te Kim
140ad39898 Map full-SAP code-based heating systems via sap_main_heating_code 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:40:59 +00:00
Jun-te Kim
345154c6b7 Map full-SAP measured ventilation: air permeability, MV kind, sheltered sides 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:37:52 +00:00
Daniel Roth
9d56cd7c1e
Merge pull request #1234 from Hestia-Homes/feature/deploy-sharepoint-renamer
Deploy sharepoint renamer: Correct dockerfile imports
2026-06-15 15:35:55 +01:00
Khalim Conn-Kowlessar
95719dd587 feat(geospatial): batch coordinates_for_uprns lookup (#1227)
Adds GeospatialRepository.coordinates_for_uprns(uprns) -> dict — a batch
coordinate lookup returning only covered UPRNs. The S3 adapter overrides it
to read the meta once, group UPRNs by their covering partition, and read each
partition once for all the UPRNs it covers; co-located (closely-numbered)
UPRNs share a partition, so an EPC Prediction cohort is typically one or two
reads instead of one per neighbour. Default port impl is a per-UPRN loop.

Feeds the EPC Prediction geo-proximity work: a cohort's UPRNs resolve to
coordinates in a couple of reads (validated at corpus scale: 170 partition
reads for 2683 UPRNs).
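
The batching idea can be sketched as below. This is an assumption-laden illustration: the real meta format is not shown in the log, so the sketch treats it as a sorted list of (min_uprn, max_uprn, partition_key) ranges, and read_partition is a hypothetical callable standing in for the S3 parquet read.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Coordinates = Tuple[float, float]  # (lon, lat)
# Assumed meta shape: non-overlapping UPRN ranges sorted by min_uprn.
PartitionMeta = List[Tuple[int, int, str]]


def coordinates_for_uprns(
    uprns: Iterable[int],
    meta: PartitionMeta,
    read_partition: Callable[[str], Dict[int, Coordinates]],
) -> Dict[int, Coordinates]:
    """Resolve many UPRNs, reading each covering partition at most once."""
    starts = [lo for lo, _, _ in meta]
    by_partition: Dict[str, Set[int]] = defaultdict(set)
    for uprn in uprns:
        i = bisect_right(starts, uprn) - 1  # last range starting at or below uprn
        if i >= 0 and uprn <= meta[i][1]:
            by_partition[meta[i][2]].add(uprn)

    found: Dict[int, Coordinates] = {}
    for key, wanted in by_partition.items():
        rows = read_partition(key)  # one read serves every UPRN in this partition
        found.update({u: rows[u] for u in wanted if u in rows})
    return found  # only covered UPRNs appear in the result
```

Because closely-numbered UPRNs fall into the same range, a single-postcode cohort usually lands in one or two groups, which matches the read counts quoted in the message.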

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:35:32 +00:00
Daniel Roth
b31db4b58b correct Dockerfile imports 2026-06-15 14:29:04 +00:00
Khalim Conn-Kowlessar
c0a1bcac95 feat(epc-prediction): resolve corpus UPRN coordinates from S3 (#1227 signal check)
One-time utility: resolves every corpus cert's uprn -> WGS84 lon/lat from the
OS Open-UPRN parquet (DATA_BUCKET/spatial/) via boto3, grouping UPRNs by their
covering partition so each ~1.7MB partition is read at most once (the efficient
batch lookup we intend to add to GeospatialRepository). Caches {uprn:[lon,lat]}
locally for the validation harness. Resolved 2609/2683 corpus UPRNs (97%).

Signal pre-check result (does intra-postcode proximity predict components?):
intra-postcode distances are non-trivial (median 44m, p90 138m, max ~1km),
and nearer neighbours match the target markedly better on age band (0.63 at
<20m -> 0.16 at >300m), wall, glazing and floor construction. Roof shows no
decay. => geo-proximity is worth building, per-component (strongest for age,
the weakest fabric component).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:28:39 +00:00
Jun-te Kim
c035d17f2b Map full-SAP certs end-to-end through the dispatch ladder and pin observed score 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:25:48 +00:00
Daniel Roth
0ed17cfd39 Merge branch 'main' into feature/deploy-sharepoint-renamer 2026-06-15 14:24:10 +00:00
Jun-te Kim
acd0ed485d Map full-SAP energy source, mains-gas inference and lighting bulbs 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:23:31 +00:00
Jun-te Kim
cb4d080da2 Map full-SAP heating systems onto the domain SapHeating model 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:18:01 +00:00
Jun-te Kim
6226575086
Merge pull request #1232 from Hestia-Homes/feature/deploy-sharepoint-renamer
Sharepoint renamer: fix terraform issue and add dry_run option
2026-06-15 15:13:02 +01:00
Jun-te Kim
125ff6f4dd Merge remote-tracking branch 'origin/main' into feature/hyde_make_it_more_accurate_with_tests
# Conflicts:
#	datatypes/epc/domain/mapper.py
2026-06-15 14:12:38 +00:00
Daniel Roth
8b27a5fda2 correct lambda name 2026-06-15 14:08:40 +00:00
Daniel Roth
1af9d84f94 Merge branch 'main' into feature/deploy-sharepoint-renamer 2026-06-15 14:07:27 +00:00
Daniel Roth
963b7d70fe fix terraform error and pass handler bool for dry runs 2026-06-15 14:06:54 +00:00
Khalim Conn-Kowlessar
4afab2c3d8 feat(epc-prediction): roof-insulation +/-1-bucket reporting
Adds roof_insulation_thickness_pm1 (mirrors construction_age_band_pm1, issue
#1222): adjacent RdSAP thickness buckets (0/NI,12mm..400mm+) carry near-
identical roof U-values, so an off-by-one bucket is a SAP-neutral hit. 'ND'
(no-data) is off the ordered scale, so only an exact match counts there.
Honest measurement of SAP-relevant roof-insulation quality.

Corpus (150pc/514): exact 49.3% -> +/-1 53.7% (the misses are often multiple
buckets or ND, so the band gain is smaller than age's). Fixture: exact ==
+/-1 (0.4118) — its misses are all >1 bucket; gate floor added at 0.4118.
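
The +/-1-bucket rule can be sketched as follows. The bucket labels in EXAMPLE_SCALE are illustrative placeholders only (the log elides the full list as "0/NI,12mm..400mm+"), and within_one_bucket is a hypothetical name rather than the repository's function.

```python
from typing import Sequence

# Illustrative ordered scale; the real RdSAP bucket list should be substituted.
EXAMPLE_SCALE = ["NI", "12mm", "25mm", "50mm", "75mm", "100mm",
                 "150mm", "200mm", "250mm", "300mm", "400mm+"]


def within_one_bucket(predicted: str, actual: str,
                      scale: Sequence[str] = EXAMPLE_SCALE) -> bool:
    """True for an exact match, or for adjacent buckets on the ordered scale.

    Values that are off the scale (such as 'ND') only count when they match
    exactly, because they have no position to be adjacent to.
    """
    if predicted == actual:
        return True
    if predicted not in scale or actual not in scale:
        return False
    return abs(scale.index(predicted) - scale.index(actual)) <= 1


def pm1_rate(pairs: Sequence[tuple]) -> float:
    """Share of (predicted, actual) pairs that are within one bucket."""
    return sum(within_one_bucket(p, a) for p, a in pairs) / len(pairs)
```

A fixture whose misses are all more than one bucket apart would score the same on the exact and +/-1 metrics, which is the situation the message reports.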

Also fixes two pre-existing pyright errors in the touched test file
(_epc main_fuel_type/main_heating_control were Optional but the
MainHeatingDetail attributes are non-optional Union[int, str]).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:04:18 +00:00
Jun-te Kim
5a3228ab5e
Merge pull request #1217 from Hestia-Homes/feature/per-cert-mapper-validation
Feature/per cert mapper validation
2026-06-15 15:03:05 +01:00
Jun-te Kim
5ebeb71090 Back-solve habitable-room count from full-SAP measured living area 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:58:03 +00:00
Khalim Conn-Kowlessar
fffb07d04b test(harness): re-pin golden-cert plans to the gain-maximising packages
Three more pre-existing failures (present at 9ee38211, before this branch's
recent commits; same family as the orchestration multi-measure re-pin) —
golden-cert plan expectations that predate the ASHP generator (ADR-0025)
and the optimiser folding forced dependencies into candidate gain (ADR-0016):

- test_console: a multi-measure plan now leads with air_source_heat_pump,
  not cavity_wall_insulation (which is dropped — its forced ventilation makes
  the pair net-negative). Assert a measure actually in the package.
- test_report 0330: package is now {solid_floor_insulation,
  air_source_heat_pump}; cavity_wall + forced mechanical_ventilation correctly
  excluded.
- test_report 0036: gain-maximising package is now {solid_floor_insulation,
  low_energy_lighting}.

Same verified-correct optimiser evolution as 077e3a39 (cavity_wall +2.9 SAP
alone but its forced fabric→ventilation dep drags the pair net-negative).
Re-pin to the actual packages + their trigger fields; the forced wall→vent
edge stays covered by test_measure_dependency / test_optimiser.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:57:27 +00:00
Jun-te Kim
af26688846 Derive heat-loss perimeter and party-wall length from full-SAP measured wall areas 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:56:31 +00:00
Khalim Conn-Kowlessar
7f48495ed5 feat(epc-prediction): surface CO2 + PEI calculator floors in the report (#1228)
The validation report showed only the SAP calculator floor (calc(actual) vs
lodged), so the headline PEI MAE (~40 kWh/m2) read as prediction error when
much of it is the calculator's own API-path residual. Adds the CO2 + PEI
floors alongside SAP.

Diagnostic (150pc/514): PEI floor MAE 15.73 (calc(actual) vs lodged) vs SAP
floor 1.57; calc(actual)/lodged PEI ratio ~1.06 (mean +10.7, ~+6% over-
estimate). That RULES OUT the suspected gross unit/definition mismatch (a
unit bug would be ~2x/3.6x, not 1.06) and reframes #1228: the PEI gap is a
modest calculator bias (~16 floor, calc-branch) plus a larger prediction-
sensitivity term (~24) — PEI is far more prediction-sensitive than SAP.
CO2 floor 0.20 t. Script-only; no gate impact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:55:20 +00:00
Khalim Conn-Kowlessar
06a66b3dd9 feat(epc-prediction): coherent heating donor selection (#1225)
Heating sub-fields can't be field-moded without breaking system coherence,
so the whole SapHeating cluster is now copied as a unit from a single
coherent donor rather than inherited from the structural template: the
neighbour matching the cohort's modal heating signature (main fuel +
category + cylinder presence), most recent among the matches (recent cert =
current system). Including cylinder presence in the signature is load-bearing
— it protects has_hot_water_cylinder + cylinder_insulation (a bare fuel+cat
signature regressed them).
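
One way to picture the donor rule above: find the cohort's most common heating signature, then take the most recently lodged neighbour that carries it. The Neighbour dataclass and its field names are hypothetical stand-ins for the real Comparable and SapHeating types.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass
class Neighbour:
    main_fuel: str
    main_category: str
    has_cylinder: bool
    lodged_on: date
    heating: dict = field(default_factory=dict)  # the cluster copied as a unit


def signature(n: Neighbour) -> Tuple[str, str, bool]:
    """Main fuel + category + cylinder presence, as listed in the message."""
    return (n.main_fuel, n.main_category, n.has_cylinder)


def pick_heating_donor(cohort: Sequence[Neighbour]) -> Optional[Neighbour]:
    """Most recent neighbour whose signature is the cohort's modal signature."""
    if not cohort:
        return None
    modal_signature, _ = Counter(signature(n) for n in cohort).most_common(1)[0]
    matches = [n for n in cohort if signature(n) == modal_signature]
    return max(matches, key=lambda n: n.lodged_on)
```

The predicted dwelling would then take donor.heating wholesale, so fuel, controls and cylinder fields always come from one real system rather than being voted on field by field.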

Corpus (150pc/514): heating_main_control 66.3 -> 73.9% (+7.6, the target),
main_fuel 92.8 -> 96.9, category 90.7 -> 95.7, water_fuel 92.8 -> 96.3,
water_code 88.5 -> 95.3, has_cylinder 81.1 -> 89.7, secondary 36.2 -> 42.0.
SAP MAE vs lodged 7.08 -> 6.00 (calculator floor 1.57). cylinder_insulation
-13.6 corpus (tiny-n) but +33pp on the fixture; AC requires control up +
fuel/category hold + SAP not worsened, all met.

Gate (36-target fixture): zero regression; ratcheted main_category
0.8889->0.9444, main_control 0.7500->0.8056, water_fuel 0.9167->0.9722,
water_code 0.8889->0.9444, cylinder_insulation_type 0.1667->0.5000. This is
the per-component heating method ([[feedback_per_component_best_method]]):
coherent donor, never field-mode.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:48:15 +00:00
Jun-te Kim
8746eabb70 Fail loud on unmapped full-SAP opening-type codes 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:48:14 +00:00
Jun-te Kim
dde98fb684 Collapse full-SAP roof-window openings onto sap_roof_windows 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:46:32 +00:00
Khalim Conn-Kowlessar
077e3a3947 test(orchestration): re-pin multi-measure plan to the gain-maximising package
The optimiser-package expectation was stale: it predated the optimiser
folding a triggered measure's forced dependency into its candidate gain
(ADR-0016). The run considers ALL measures (considered_measures defaults
to None — no restriction), so once the ASHP bundle became SAP-beneficial
(ADR-0025) the gain-maximising package shifted.

Verified the new package is CORRECT, not a regression: on the test EPC,
cavity-wall insulation earns +2.9 SAP alone but its forced fabric→
ventilation dependency (ADR-0016) drags the wall+ventilation pair to a
NET −1.8 SAP (−0.9 on top of the ASHP package), so the gain-maximising
Optimiser correctly excludes the wall and its forced ventilation. Update
the expected set to {air_source_heat_pump, suspended_floor_insulation,
low_energy_lighting, secondary_heating_removal} and drop the wall/vent-
specific assertions — the forced wall→ventilation edge is covered by
test_measure_dependency / test_optimiser; this integration test keeps its
end-to-end optimise→persist→telescope coverage on the chosen package.

Pre-existing failure (present before this branch's recent commits), outside
the handover regression gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:46:22 +00:00
Jun-te Kim
36929accf7 Collapse full-SAP door openings onto door count and area-weighted U-value 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:39:53 +00:00
Khalim Conn-Kowlessar
d762b25808 feat(epc-prediction): recency-weighted glazing mode (#1223)
Per-component method: glazing type is now the recency-weighted cohort mode
applied to every predicted window, rather than copied from the template.
Glazing is retrofitted over a dwelling's life (single -> double), so a
recent neighbour reflects the current state — same family as roof-insulation
thickness. Recency is the CORRECT weighting here: plain moding regressed the
fixture (-5.6pp) and was previously reverted; similarity weighting also
regressed it; recency improves BOTH (window geometry stays on the template,
only the glazing categorical moves).

modal_glazing_type: corpus (150pc/514) 60.7 -> 66.7% (+6.0pp); fixture
0.5000 -> 0.5278 (floor ratcheted up). Heating, geometry residuals and all
other components unchanged. Refactored _recency_weighted_mode to a reusable
_recency_weighted_choice(value_of) shared by roof insulation + glazing.
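
A minimal sketch of a recency-weighted mode with a pluggable value_of accessor. The decay form is an assumption (an exponential half-life, with a hypothetical half_life_years parameter); the log does not state how the real _recency_weighted_choice weights certificates.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def recency_weights(lodged: Sequence[date], today: date,
                    half_life_years: float = 5.0) -> List[float]:
    """Assumed decay: a cert's weight halves every half_life_years of age."""
    return [0.5 ** (((today - d).days / 365.25) / half_life_years) for d in lodged]


def recency_weighted_choice(
    items: Sequence[T],
    value_of: Callable[[T], Hashable],
    date_of: Callable[[T], date],
    today: date,
    half_life_years: float = 5.0,
) -> Hashable:
    """Return the value with the largest total recency weight across items."""
    weights = recency_weights([date_of(i) for i in items], today, half_life_years)
    totals = defaultdict(float)
    for item, weight in zip(items, weights):
        totals[value_of(item)] += weight
    return max(totals, key=totals.get)
```

Passing a different value_of (glazing type, roof-insulation thickness) reuses the same weighting, which is the point of the refactor described above. With old single-glazed certs and one recent double-glazed cert, the recent cert can win even though a plain mode would pick single glazing.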

Closes the #1223 per-component approach: floor-area (median estimate) +
glazing (recency) shipped as distinct best-fit methods rather than a global
recency template, which would have disturbed the coherence-coupled heating
cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:35:03 +00:00
Jun-te Kim
70460935b8 Collapse full-SAP window openings onto the engine's sap_windows model 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:32:20 +00:00
Khalim Conn-Kowlessar
4fdc23f83d test(worksheet): pin simulated case 38 — mains-gas secondary reproduces worksheet exactly
The realistic re-generation of case 37 (code-117 gas boiler, control 2102,
+ a MAINS-GAS condensing gas-fire secondary code 611, vs case 37's biogas
605). The full extractor -> mapper -> calculator pipeline reproduces the
worksheet's SAP-rating block EXACTLY: continuous SAP 60.9152 (Δ 2e-5) and
(272) CO2 5801.0770 (Δ ~0). This confirms the boiler-efficiency /
control-2102 −5pp interlock / secondary-fuel handling are all correct, and
that case 37's +7 gap was purely the biogas sub-fuel the Summary export
cannot carry.

Summary mirrored into backend/documents_parser/tests/fixtures so the pin
runs without the unstaged workspace. PE not pinned — it is a separate
DPER block (different scope) already guarded by the corpus PE gauge.
Worksheet harness 47/47 unchanged; pyright net-zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:31:36 +00:00
Khalim Conn-Kowlessar
51cdc25ce8 feat(epc-prediction): cohort-median floor-area estimate (#1223)
Per-component method, not a global template change: the predicted floor
area is now the cohort median (the MAD-minimising point estimate of the
target's size) rather than the area of whichever structural template was
selected. The calculator derives heat loss from building-part geometry, not
this scalar, so decoupling them is safe and the scalar becomes a better size
estimate.

floor_area mean|.|: corpus (150pc/514 targets) 10.62 -> 10.48; fixture
12.2175 -> 11.8983 (ceiling ratcheted down). No other component moves.
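
A small worked example of why the median is used, assuming MAD here means mean absolute deviation. The cohort areas are made-up illustrative numbers, not corpus data, and mean_abs_dev is a hypothetical helper.

```python
from statistics import mean, median
from typing import Sequence


def mean_abs_dev(areas: Sequence[float], estimate: float) -> float:
    """Average absolute gap between one point estimate and each cohort area."""
    return sum(abs(a - estimate) for a in areas) / len(areas)


# Made-up cohort floor areas in m2, with one large outlier.
cohort_areas = [62.0, 68.0, 71.0, 74.0, 140.0]

median_estimate = median(cohort_areas)  # 71.0
mean_estimate = mean(cohort_areas)      # 83.0

# The median gives the smallest mean absolute deviation of any single value,
# so it beats both the mean and any one neighbour's own area as an estimate.
candidates = {
    "median": mean_abs_dev(cohort_areas, median_estimate),
    "mean": mean_abs_dev(cohort_areas, mean_estimate),
    "outlier template": mean_abs_dev(cohort_areas, 140.0),
}
best = min(candidates, key=candidates.get)
```

In this toy cohort the median scores 16.8, the mean 22.8 and the outlier template 57.0, so copying the area of whichever neighbour happens to be the template can be far worse than the cohort median.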

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:30:33 +00:00
Jun-te Kim
0eaf87b106 Carry full-SAP measured fabric U-value descriptions into the domain model 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:10:05 +00:00
Jun-te Kim
c3fd9a6872 Map full-SAP cert identity and scalar fields to EpcPropertyData 🟩
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:05:38 +00:00
Daniel Roth
17420408e4
Merge pull request #1230 from Hestia-Homes/feature/deploy-sharepoint-renamer
Deploy sharepoint renamer
2026-06-15 13:45:52 +01:00
Daniel Roth
b9cbea367d correct import in test file 2026-06-15 12:21:32 +00:00
Jun-te Kim
0079752eab investigation with hyde values 2026-06-15 12:13:11 +00:00
Jun-te Kim
5923f8d072 Parse full-SAP SAP-Schema-17.1 certificate payloads 🟥
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:11:26 +00:00
Daniel Roth
a6050fc1c7 remove tests/ from pytest.ini 2026-06-15 12:04:33 +00:00
Daniel Roth
0fc81da4cf move input files out of scripts/ 2026-06-15 11:14:09 +00:00
Daniel Roth
5c314e2914 move tests out of scripts/ 2026-06-15 11:11:08 +00:00
Daniel Roth
38b9e63844 revert pytest.ini 2026-06-15 11:02:48 +00:00
Daniel Roth
beb4e5d0d9 Move SharePoint renamer logic from scripts/ into orchestrator and app-root handler 2026-06-15 11:01:51 +00:00
Daniel Roth
8cb0e986e6 Deploy SharePoint renamer as Lambda with SQS trigger 🟩 2026-06-15 10:52:52 +00:00
Daniel Roth
b3e9d858d9 SharePoint renamer Lambda handler stub created 🟥 2026-06-15 10:49:01 +00:00
Daniel Roth
383b8b0c37 SharePoint renamer build_canonical_filename behaviour verified by tests 🟩 2026-06-15 10:48:17 +00:00
Daniel Roth
9daf6a8668
Merge pull request #1221 from Hestia-Homes/improve-sharepoint-renamer
Sharepoint renamer recursively looks for files in subfolders
2026-06-15 11:18:03 +01:00
Khalim Conn-Kowlessar
c11eb46b8a fix(modelling): HHR overlay sets off-peak immersion type so HW Table 13 applies
The HHR-storage HeatingOverlay (ADR-0024) added an off-peak electric
immersion cylinder but never set `immersion_heating_type`, so the overlaid
cert left it None. The calculator then could not resolve `immersion_single`
for the SAP 10.2 Table 13 HW high-rate split and billed hot water 100% at
the off-peak low rate — £127.41 vs the relodged after-cert's £169.39,
overstating the overlay's SAP by +1.26 (CO2/PE matched, isolating it to the
HW cost path).

Add `immersion_heating_type` to HeatingOverlay, route it through
`_fold_heating` (it lives on `sap_heating`), and set it to 1 (single
off-peak immersion) on the HHR overlay to match the relodged reference.
Closes both `test_hhr_storage_overlay_reproduces_the_relodged_after_*`
cascade pins (electric-storage and no-system befores share the after).

Pre-existing failure (present before this branch's recent commits), outside
the handover regression gate. Full modelling suite 220 pass, pyright net-
zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 06:53:14 +00:00