# Space heating and hot water kWh are ML targets; UCL is folded into training labels

**Status: Accepted.** Supersedes [ADR-0006](0006-deterministic-kwh-no-baseline-ml.md).

The EPC ML Transform predicts **six targets**: `sap_score`, `co2_emissions`, `peui_raw`, `peui_ucl`, `space_heating_kwh`, `hot_water_kwh`. Two of these (`space_heating_kwh`, `hot_water_kwh`) were explicitly excluded from ML by ADR-0006. We reverse that decision for two independent reasons, the second of which was the deciding factor.
## Why baseline kWh becomes an ML target

The premise of ADR-0006 was that baseline kWh has no clean source in the government data and must be derived deterministically from SAP physics + UCL correction. That premise no longer holds:

1. The New EPC API exposes `renewable_heat_incentive.space_heating_existing_dwelling` and `renewable_heat_incentive.water_heating` directly as integers (kWh/yr delivered) on every SAP10 certificate. For a SAP10-baseline property, baseline kWh is a lookup, not a derivation, so no SAP-physics port is required.
2. **But** for the *Rebaselining* path (pre-SAP10 EPCs being scored against SAP10 methodology) and for *post-measure* impact prediction (the state after a measure is installed), no recorded kWh exists. The choice there is: derive deterministically (the ADR-0006 stance), or predict via ML alongside SAP / carbon / heat. The second of our two reasons, set out in the next section, resolves this in favour of ML.
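
The lookup-versus-predict split described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the certificate arrives as a parsed JSON dictionary with `renewable_heat_incentive` as a nested object, and the function name `baseline_kwh` is hypothetical. Only the two field names come from this ADR.

```python
from typing import Optional


def baseline_kwh(certificate: dict) -> Optional[dict]:
    """Return recorded baseline kWh if the certificate carries it, else None.

    Assumption: the certificate is a parsed JSON dict with the two fields
    nested under "renewable_heat_incentive". None signals that no recorded
    value exists, so the caller should fall back to the ML prediction
    (the Rebaselining and post-measure paths).
    """
    rhi = certificate.get("renewable_heat_incentive") or {}
    space = rhi.get("space_heating_existing_dwelling")
    water = rhi.get("water_heating")
    if space is None or water is None:
        return None
    return {"space_heating_kwh": int(space), "hot_water_kwh": int(water)}
```

Returning `None` rather than raising keeps the routing decision in the caller, which is where the choice between lookup and ML prediction would be made.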
## Why UCL is folded into training labels rather than applied at runtime

The UCL per-band correction (Few et al. 2023) is a piecewise-linear function of PEUI keyed on EPC band. Applied at runtime, post-prediction, it produces a **discontinuity at band boundaries**: when a simulated package of measures pushes a property from band D into band C, the per-band slope and intercept switch abruptly, and the UCL-adjusted kWh can move in the opposite direction to the underlying PEUI prediction. This was observed in practice on the legacy `model_engine`.

Folding UCL into the training labels (that is, computing UCL-corrected PEUI per training row using the row's recorded band, then fitting the model on the corrected target) means the trained model emits metered-equivalent PEUI directly. There is no per-band switching at inference, so the discontinuity disappears and the model learns a smooth function over the feature space.

The same logic motivates ML prediction of space heating and hot water kWh post-measure: deterministic derivation from a SAP-delta would reintroduce a similar band-boundary artefact at every step where heating efficiency or fuel changes. A single ML model emitting kWh directly is smooth across measure transitions.
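
The band-boundary artefact and the label-folding alternative can be illustrated with a small sketch. The slope and intercept values below are invented for the example and are not the published Few et al. (2023) coefficients. The names `HYPOTHETICAL_UCL`, `ucl_correct`, `fold_ucl_into_labels`, and the `recorded_band` column are likewise assumptions.

```python
# Hypothetical per-band (slope, intercept) pairs, invented for illustration.
# They are NOT the published Few et al. (2023) coefficients.
HYPOTHETICAL_UCL = {"C": (0.95, 10.0), "D": (0.80, 20.0)}


def ucl_correct(peui: float, band: str) -> float:
    """Apply the piecewise-linear per-band correction to a PEUI value."""
    slope, intercept = HYPOTHETICAL_UCL[band]
    return slope * peui + intercept


def fold_ucl_into_labels(rows: list[dict]) -> list[dict]:
    """Training-time path: add a peui_ucl label from each row's recorded band."""
    return [
        {**row, "peui_ucl": ucl_correct(row["peui_raw"], row["recorded_band"])}
        for row in rows
    ]


# Runtime path: a measure package lowers raw PEUI and moves the band D -> C.
before = ucl_correct(250.0, "D")
after = ucl_correct(240.0, "C")
raw_fell = 240.0 < 250.0
corrected_rose = after > before  # the adjusted value moves the wrong way

# Training path: the correction is applied once per row, before fitting.
training_rows = fold_ucl_into_labels([
    {"peui_raw": 250.0, "recorded_band": "D"},
    {"peui_raw": 240.0, "recorded_band": "C"},
])
```

With these made-up coefficients the raw PEUI falls while the runtime-corrected value rises, which is the kind of reversal described above. In the training path `ucl_correct` runs only when labels are built, so nothing switches coefficients at inference.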
## Scope of the reversal

| Quantity | ADR-0006 stance | ADR-0007 stance |
|---|---|---|
| Baseline SAP / carbon / heat demand | ML (unchanged) | ML (unchanged) |
| Baseline PEUI (`peui_raw`) | Read from EPC; UCL-corrected at runtime | Read from EPC at baseline; ML target with UCL-corrected variant (`peui_ucl`) at training time |
| Baseline space heating kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Baseline hot water kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Post-measure space heating kWh delta | Derived from SAP delta + heating fuel/COP | ML target (predicted directly post-measure) |
| Post-measure hot water kWh delta | Derived from SAP delta | ML target (predicted directly post-measure) |
| Fuel split, bills | Deterministic from kWh × Fuel Rates (unchanged) | Deterministic from kWh × Fuel Rates (unchanged) |
| Carbon factors → CO2 emissions | Deterministic from kWh × Carbon Factors (unchanged at runtime) | Deterministic from kWh × Carbon Factors (unchanged at runtime); ML target also separately for Rebaselining |
| UCL correction application point | Runtime, post-prediction, per band | Training time, folded into PEUI labels per row's recorded band |
## Dual PEUI training targets

We train two PEUI variants: `peui_raw` (the EPC's `energy_consumption_current` directly) and `peui_ucl` (the same value with the row's recorded-band UCL correction pre-applied). At v0.1.0 we compare both empirically. The variant with better held-out MAPE wins; the loser is dropped at v0.2.0.
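
A minimal sketch of the held-out comparison follows. This ADR does not say which reference values each variant is scored against, so the sketch assumes each variant is scored against its own held-out labels. The helper names `mape` and `pick_peui_variant` are hypothetical.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def pick_peui_variant(held_out: dict[str, tuple[list[float], list[float]]]) -> str:
    """Return the variant name with the lowest held-out MAPE.

    Assumption: held_out maps a variant name (for example "peui_raw" or
    "peui_ucl") to (actual labels, model predictions) on the held-out set.
    """
    scores = {name: mape(actual, pred) for name, (actual, pred) in held_out.items()}
    return min(scores, key=scores.get)
```

MAPE is undefined when an actual value is zero, so rows with zero PEUI would need filtering before this comparison.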
## Label coupling, not classical leakage

The UCL transform uses the row's recorded SAP-derived band to compute the PEUI label, and SAP score is itself an ML target. This couples the two targets at the label level. It is **not** classical leakage (the band is not in the feature set; the model never reads it as input). The PEUI prediction is independent of the SAP prediction at inference. We accept the coupling as the price of avoiding the band-boundary discontinuity, consistent with our explicit "park target-independence" decision: the six targets are predicted independently and small cross-target inconsistencies are tolerated for v1.

Practical safeguard: `energy_rating_current` and any other SAP-score-derived field (e.g. `current_energy_efficiency_band`) are **excluded from the feature set** in the EPC ML Transform, to avoid an entirely separate target-leakage path on the SAP prediction.
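
The exclusion safeguard could be enforced with a small guard in the transform, sketched below. The two excluded field names are the ones this ADR names. The constant `EXCLUDED_FEATURES` and the functions `select_features` and `assert_no_leakage` are hypothetical, and a real list of SAP-derived fields may be longer.

```python
# Fields named in this ADR as SAP-score-derived; a real list may be longer.
EXCLUDED_FEATURES = frozenset(
    {"energy_rating_current", "current_energy_efficiency_band"}
)


def select_features(columns: list[str], targets: set[str]) -> list[str]:
    """Drop target columns and SAP-derived fields, preserving column order."""
    return [c for c in columns if c not in EXCLUDED_FEATURES and c not in targets]


def assert_no_leakage(features: list[str]) -> None:
    """Fail fast if an excluded field slips back into the feature set."""
    leaked = EXCLUDED_FEATURES.intersection(features)
    if leaked:
        raise ValueError(f"SAP-derived fields in feature set: {sorted(leaked)}")
```

Calling `assert_no_leakage` just before the training data is written would catch a later change that reintroduces one of these columns.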
## Consequences

- `EpcEnergyDerivationService` is no longer the source of baseline kWh. Its remaining job is the deterministic step from kWh + Fuel Rates → fuel split + bills, and kWh + Carbon Factors → CO2 emissions. UCL is removed from its runtime path; the `AnnualBillSavings.adjust_energy_to_metered` port that ADR-0006 anticipated does not happen, because UCL moves into the training-side EPC ML Transform.
- The EPC ML Transform owns both feature definitions *and* the per-row UCL label transformation. It is the single artefact tying SAP-band semantics into the training data; cross-repo consumers (AutoGluon) see only post-transform parquet.
- `FuelRatesRepo`, `CarbonFactorsRepo`, and `HeatingSystemAssumptionsRepo` survive, but the consumers of `HeatingSystemAssumptionsRepo` shrink: the SAP-physics-decomposition path that ADR-0006 envisaged is unused.
- Adding more ML targets later (lighting kWh, appliance kWh, cooking kWh) becomes a feature-additive change rather than an architectural one, since the precedent of "kWh as ML target" is now established.
## What this ADR does not change

- Per-recommendation **cost** delta is still deterministic, from kWh delta × current Fuel Rates.
- Bills surfaced to the UI are always current-rate, never pinned to EPC inspection-date rates.
- `EpcEnergyDerivationService` is preserved as the bills/fuel-split service; only its responsibility shrinks.
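
The deterministic cost step amounts to a multiply-and-sum over fuels, sketched below. The per-fuel dictionaries and the function names `annual_cost` and `cost_delta` are assumptions made for the example. The actual interface of `EpcEnergyDerivationService` is not described in this ADR.

```python
def annual_cost(kwh_by_fuel: dict[str, float], rates: dict[str, float]) -> float:
    """Sum kWh x unit rate over fuels. Rates are in currency units per kWh."""
    return sum(kwh * rates[fuel] for fuel, kwh in kwh_by_fuel.items())


def cost_delta(
    before_kwh: dict[str, float],
    after_kwh: dict[str, float],
    current_rates: dict[str, float],
) -> float:
    """Post-measure cost minus baseline cost, both priced at current rates.

    Pricing both states with the same current rates keeps the delta a pure
    function of the kWh change, never of inspection-date tariffs.
    """
    return annual_cost(after_kwh, current_rates) - annual_cost(before_kwh, current_rates)
```

A negative result is a saving. A fuel switch, such as gas before and electricity after, is handled by the two dictionaries carrying different keys.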