# Space heating and hot water kWh are ML targets; UCL is folded into training labels

**Status: Accepted.** Supersedes [ADR-0006](0006-deterministic-kwh-no-baseline-ml.md).

The EPC ML Transform predicts **six targets**: `sap_score`, `co2_emissions`, `peui_raw`, `peui_ucl`, `space_heating_kwh`, `hot_water_kwh`. ADR-0006 explicitly excluded two of these (`space_heating_kwh`, `hot_water_kwh`) from ML. We reverse that decision for two independent reasons. The second reason was the deciding factor.

## Why baseline kWh becomes an ML target

The premise of ADR-0006 was that baseline kWh has no clean source in the gov data and must be derived deterministically from SAP physics + UCL correction. That premise no longer holds:

1. The New EPC API exposes `renewable_heat_incentive.space_heating_existing_dwelling` and `renewable_heat_incentive.water_heating` directly as integers (kWh/yr delivered) on every SAP10 certificate. For a SAP10-baseline property, baseline kWh is a lookup, not a derivation, so no SAP-physics port is required.
2. **But** no recorded kWh exists for the *Rebaselining* path (pre-SAP10 EPCs scored against SAP10 methodology) or for *post-measure* impact prediction (the state after a measure is installed). In those cases we can either derive kWh deterministically (the ADR-0006 stance) or predict it via ML alongside SAP / carbon / heat.

The second reason is the band-boundary discontinuity described below. It resolves this choice in favour of ML.

## Why UCL is folded into training labels rather than applied at runtime

The UCL per-band correction (Few et al. 2023) is a piecewise-linear function of PEUI keyed on EPC band. Applying it at runtime, after prediction, produces a **discontinuity at band boundaries**. When a simulated package of measures pushes a property from band D into band C, the per-band slope/intercept switches discontinuously. The UCL-adjusted kWh can then move in the opposite direction to the underlying PEUI prediction. We observed this in practice on the legacy `model_engine`.
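The following minimal Python sketch makes the runtime failure mode concrete. The `UCL_COEFFS` table and the `ucl_correct` helper are hypothetical. The slopes and intercepts are illustrative placeholders, not the published Few et al. (2023) coefficients. The sketch also does not claim that the legacy `model_engine` was structured this way.

```python
"""Sketch of the runtime band-boundary discontinuity (hypothetical coefficients)."""

# Hypothetical (slope, intercept) pairs per EPC band. Illustrative placeholders
# only; these are NOT the published Few et al. (2023) coefficients.
UCL_COEFFS: dict[str, tuple[float, float]] = {
    "C": (0.75, 25.0),
    "D": (0.50, 70.0),
}


def ucl_correct(peui: float, band: str) -> float:
    """Apply a per-band linear correction: slope * PEUI + intercept."""
    slope, intercept = UCL_COEFFS[band]
    return slope * peui + intercept


# Runtime application: the band comes from the *predicted* SAP score, so a
# package of measures that lowers PEUI can also flip the band used here.
before = ucl_correct(peui=250.0, band="D")  # pre-measure, band D
after = ucl_correct(peui=240.0, band="C")   # post-measure, band C

print(f"raw PEUI change: {240.0 - 250.0:+.1f}")      # raw PEUI change: -10.0
print(f"UCL-adjusted change: {after - before:+.1f}")  # UCL-adjusted change: +10.0
```

With these placeholder numbers, a measure package lowers raw PEUI by 10 and also tips the property from band D into band C. The UCL-adjusted value then rises by 10. The band comes from SAP score rather than PEUI, so nothing forces the two bands' lines to agree where a property crosses over. Any coefficient set with distinct per-band lines can therefore produce a jump like this.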
Folding UCL into the training labels means computing the UCL-corrected PEUI for each training row from that row's recorded band, then fitting the model on the corrected target. The trained model then emits metered-equivalent PEUI directly. No per-band switching happens at inference, so the discontinuity disappears: the model learns a smooth function over the feature space.

The same logic motivates ML prediction of space heating and hot water kWh post-measure. Deterministic derivation from a SAP delta would reintroduce a similar band-boundary artefact at every step where heating efficiency or fuel changes. A single ML model that emits kWh directly stays smooth across measure transitions.

## Scope of the reversal

| Quantity | ADR-0006 stance | ADR-0007 stance |
|---|---|---|
| Baseline SAP / carbon / heat demand | ML (unchanged) | ML (unchanged) |
| Baseline PEUI (`peui_raw`) | Read from EPC; UCL-corrected at runtime | Read from EPC at baseline; ML target, with a UCL-corrected variant (`peui_ucl`) computed at training time |
| Baseline space heating kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Baseline hot water kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Post-measure space heating kWh delta | Derived from SAP delta + heating fuel/COP | ML target (predicted directly post-measure) |
| Post-measure hot water kWh delta | Derived from SAP delta | ML target (predicted directly post-measure) |
| Fuel split, bills | Deterministic from kWh × Fuel Rates (unchanged) | Deterministic from kWh × Fuel Rates (unchanged) |
| Carbon factors → CO2 emissions | Deterministic from kWh × Carbon Factors (unchanged at runtime) | Deterministic from kWh × Carbon Factors (unchanged at runtime); also a separate ML target (`co2_emissions`) for Rebaselining |
| UCL correction application point | Runtime, post-prediction, per band | Training time, folded into PEUI labels using each row's recorded band |

## Dual PEUI training targets

We train two PEUI variants:

- `peui_raw` is the EPC's `energy_consumption_current`, used directly.
- `peui_ucl` is the same value with the row's recorded-band UCL correction pre-applied.

At v0.1.0 we compare both empirically. The variant with the better held-out MAPE wins, and we drop the other at v0.2.0.

## Label coupling, not classical leakage

The UCL transform uses the row's recorded SAP-derived band to compute the PEUI label, and SAP score is itself an ML target. This couples the two targets at the label level. It is **not** classical leakage: the band is not in the feature set, and the model never reads it as input. At inference, the PEUI prediction is independent of the SAP prediction. We accept the coupling as the price of avoiding the band-boundary discontinuity. This is consistent with our explicit "park target-independence" decision: the six targets are predicted independently, and we tolerate small cross-target inconsistencies for v1.

Practical safeguard: the EPC ML Transform **excludes from the feature set** `energy_rating_current` and any other SAP-score-derived field (e.g. `current_energy_efficiency_band`). This prevents an entirely separate target-leakage path on the SAP prediction.

## Consequences

- `EpcEnergyDerivationService` is no longer the source of baseline kWh. Its remaining job is the deterministic step from kWh + Fuel Rates → fuel split + bills, and from kWh + Carbon Factors → CO2 emissions. UCL is removed from its runtime path. The `AnnualBillSavings.adjust_energy_to_metered` port that ADR-0006 anticipated will not happen, because UCL moves into the training-side EPC ML Transform.
- The EPC ML Transform owns both the feature definitions *and* the per-row UCL label transformation. It is the single artefact that ties SAP-band semantics into the training data. Cross-repo consumers (AutoGluon) see only the post-transform parquet.
- `FuelRatesRepo`, `CarbonFactorsRepo`, and `HeatingSystemAssumptionsRepo` survive, but `HeatingSystemAssumptionsRepo` has fewer consumers because the SAP-physics-decomposition path that ADR-0006 envisaged is unused.
- Adding more ML targets later (lighting kWh, appliance kWh, cooking kWh) becomes an additive change rather than an architectural one, because this ADR establishes the precedent of "kWh as ML target".

## What this ADR does not change

- Per-recommendation **cost** delta is still deterministic: kWh delta × current Fuel Rates.
- Bills surfaced to the UI always use current rates, never EPC inspection-date rates.
- `EpcEnergyDerivationService` is preserved as the bills/fuel-split service; only its responsibility shrinks.
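The following Python sketch shows the deterministic cost step in a few lines. Everything it names is hypothetical. `annual_bill`, `cost_delta` and the rate table are illustrative, and the rates are placeholders rather than values from `FuelRatesRepo`. The sketch also assumes kWh has already been split by fuel. The only point it carries over from this ADR is that both functions take the *current* rate table, never inspection-date rates.

```python
"""Sketch: bills and per-recommendation cost delta from kWh x current fuel rates."""

# Hypothetical current rates in GBP per kWh. Placeholders only; in practice
# these would be the current Fuel Rates, never EPC inspection-date rates.
CURRENT_RATES_GBP_PER_KWH: dict[str, float] = {
    "electricity": 0.25,
    "mains_gas": 0.0625,
}


def annual_bill(kwh_by_fuel: dict[str, float], rates: dict[str, float]) -> float:
    """Annual bill: sum over fuels of delivered kWh x current rate."""
    return sum(kwh * rates[fuel] for fuel, kwh in kwh_by_fuel.items())


def cost_delta(
    before: dict[str, float],
    after: dict[str, float],
    rates: dict[str, float],
) -> float:
    """Cost delta for one recommendation: per-fuel kWh delta x current rate.

    A fuel missing from one side (e.g. gas removed entirely) counts as 0 kWh.
    """
    fuels = before.keys() | after.keys()
    return sum(
        (after.get(fuel, 0.0) - before.get(fuel, 0.0)) * rates[fuel]
        for fuel in fuels
    )


# Hypothetical example: baseline kWh vs. predicted post-measure kWh.
BEFORE = {"mains_gas": 12000.0, "electricity": 3000.0}
AFTER = {"mains_gas": 10400.0, "electricity": 3000.0}

print(annual_bill(BEFORE, CURRENT_RATES_GBP_PER_KWH))        # 1500.0
print(annual_bill(AFTER, CURRENT_RATES_GBP_PER_KWH))         # 1400.0
print(cost_delta(BEFORE, AFTER, CURRENT_RATES_GBP_PER_KWH))  # -100.0
```

Bills are linear in kWh, so `cost_delta` equals the difference between the two bills. The ML side only has to supply kWh. A rate change therefore updates every bill and cost figure without retraining.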