Model/docs/adr/0007-kwh-as-ml-target.md
2026-05-16 14:15:56 +00:00

Space heating and hot water kWh are ML targets; UCL is folded into training labels

Status: Accepted. Supersedes ADR-0006.

The EPC ML Transform predicts six targets: sap_score, co2_emissions, peui_raw, peui_ucl, space_heating_kwh, hot_water_kwh. Two of these (space_heating_kwh, hot_water_kwh) were explicitly excluded from ML by ADR-0006. We reverse that decision for two independent reasons, the second of which was the deciding factor.

Why baseline kWh becomes an ML target

The premise of ADR-0006 was that baseline kWh has no clean source in the gov data and must be derived deterministically from SAP physics + UCL correction. That premise no longer holds:

  1. The New EPC API exposes renewable_heat_incentive.space_heating_existing_dwelling and renewable_heat_incentive.water_heating directly as integers (kWh/yr delivered) on every SAP10 certificate. For a SAP10-baseline property, baseline kWh is a lookup, not a derivation — no SAP-physics port required.
  2. But for the Rebaselining path (pre-SAP10 EPCs being scored against SAP10 methodology) and for post-measure impact prediction (the state after a measure is installed), no recorded kWh exists. The choice there is: derive deterministically (the ADR-0006 stance), or predict via ML alongside SAP / carbon / heat. The second of our two reasons, the band-boundary discontinuity described in the next section, resolves this in favour of ML.
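The split between lookup and prediction can be sketched as follows. This is a minimal illustration, assuming the certificate arrives as a nested dict; the `is_sap10` flag and the function name `baseline_kwh_from_certificate` are hypothetical, while the two `renewable_heat_incentive` field names are the ones quoted above.

```python
from typing import Optional


def baseline_kwh_from_certificate(cert: dict, is_sap10: bool) -> Optional[dict]:
    """Return recorded baseline kWh/yr for a SAP10 certificate, else None.

    None signals that no recorded value exists (Rebaselining or post-measure),
    so the caller should fall back to the ML prediction instead.
    """
    if not is_sap10:
        return None
    rhi = cert.get("renewable_heat_incentive") or {}
    space = rhi.get("space_heating_existing_dwelling")
    water = rhi.get("water_heating")
    if space is None or water is None:
        # Assumption: a missing field is treated like "no recorded value".
        return None
    return {"space_heating_kwh": int(space), "hot_water_kwh": int(water)}
```

Returning `None` rather than raising keeps the decision about the fallback with the caller, which is one possible design and not necessarily the one used in the codebase.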

Why UCL is folded into training labels rather than applied at runtime

The UCL per-band correction (Few et al. 2023) is a piecewise-linear function of PEUI keyed on EPC band. Applied at runtime, post-prediction, it produces a discontinuity at band boundaries: when a simulated package of measures pushes a property from band D into band C, the per-band slope/intercept switches discontinuously, and the UCL-adjusted kWh can move in the opposite direction to the underlying PEUI prediction. This was observed in practice on the legacy model_engine.
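A small numeric sketch makes the sign flip concrete. The per-band slopes and intercepts below are invented purely for illustration and are not the coefficients published by Few et al. (2023); `ucl_adjust` is a hypothetical name, not the legacy model_engine function.

```python
# Hypothetical per-band (slope, intercept) pairs, chosen only to illustrate
# the effect. They are NOT the published UCL coefficients.
ILLUSTRATIVE_UCL = {
    "C": (0.9, 10.0),
    "D": (0.7, 20.0),
}


def ucl_adjust(peui: float, band: str) -> float:
    """Apply the band's linear correction to a PEUI value."""
    slope, intercept = ILLUSTRATIVE_UCL[band]
    return slope * peui + intercept


# Before the package of measures: band D. After: band C, with lower raw PEUI.
before_raw, before_band = 250.0, "D"
after_raw, after_band = 240.0, "C"

before_adj = ucl_adjust(before_raw, before_band)
after_adj = ucl_adjust(after_raw, after_band)

raw_delta = after_raw - before_raw  # negative: raw PEUI improves
adj_delta = after_adj - before_adj  # positive: adjusted PEUI gets worse
```

With these made-up coefficients the raw PEUI falls while the adjusted value rises, because the two values are corrected with different lines on either side of the D/C boundary.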

Folding UCL into the training labels — i.e. computing UCL-corrected PEUI per training row using the row's recorded band, then fitting the model on the corrected target — means the trained model emits metered-equivalent PEUI directly. There is no per-band switching at inference. The discontinuity disappears. The model learns a smooth function over the feature space.
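Label folding might look roughly like the sketch below. It assumes training rows are plain dicts, that the recorded band is available under a hypothetical `recorded_band` key, and it again uses placeholder coefficients and not the published ones; `add_peui_labels` is an illustrative name.

```python
# Hypothetical (slope, intercept) per band; placeholders, not published values.
ILLUSTRATIVE_UCL = {"C": (0.9, 10.0), "D": (0.7, 20.0)}

# Keys consumed while building labels and therefore withheld from the output.
LABEL_SOURCE_KEYS = {"recorded_band", "energy_consumption_current"}


def add_peui_labels(rows: list[dict]) -> list[dict]:
    """Attach peui_raw and peui_ucl labels to each training row.

    The band is used only here, at label-construction time, and is dropped
    from the output so the model cannot read it as a feature.
    """
    labelled = []
    for row in rows:
        slope, intercept = ILLUSTRATIVE_UCL[row["recorded_band"]]
        raw = float(row["energy_consumption_current"])
        out = {k: v for k, v in row.items() if k not in LABEL_SOURCE_KEYS}
        out["peui_raw"] = raw
        out["peui_ucl"] = slope * raw + intercept
        labelled.append(out)
    return labelled
```

Because the correction is applied once per row before fitting, nothing band-dependent remains to be evaluated at inference.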

The same logic motivates ML prediction of space heating and hot water kWh post-measure: deterministic derivation from a SAP-delta would reintroduce a similar band-boundary artefact at every step where heating efficiency or fuel changes. A single ML model emitting kWh directly is smooth across measure transitions.

Scope of the reversal

| Quantity | ADR-0006 stance | ADR-0007 stance |
| --- | --- | --- |
| Baseline SAP / carbon / heat demand | ML (unchanged) | ML (unchanged) |
| Baseline PEUI (peui_raw) | Read from EPC; UCL-corrected at runtime | Read from EPC at baseline; ML target with UCL-corrected variant (peui_ucl) at training time |
| Baseline space heating kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Baseline hot water kWh | Deterministic from SAP physics + UCL | Read from EPC for SAP10 baselines; ML for Rebaselining + post-measure |
| Post-measure space heating kWh delta | Derived from SAP delta + heating fuel/COP | ML target (predicted directly post-measure) |
| Post-measure hot water kWh delta | Derived from SAP delta | ML target (predicted directly post-measure) |
| Fuel split, bills | Deterministic from kWh × Fuel Rates (unchanged) | Deterministic from kWh × Fuel Rates (unchanged) |
| Carbon factors → CO2 emissions | Deterministic from kWh × Carbon Factors (unchanged at runtime) | Deterministic from kWh × Carbon Factors (unchanged at runtime); ML target also separately for Rebaselining |
| UCL correction application point | Runtime, post-prediction, per band | Training time, folded into PEUI labels per row's recorded band |

Dual PEUI training targets

We train two PEUI variants — peui_raw (the EPC's energy_consumption_current directly) and peui_ucl (the same value with the row's recorded-band UCL correction pre-applied). At v0.1.0 we compare both empirically. The variant with better held-out MAPE wins; the loser is dropped at v0.2.0.
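The comparison could be scripted along these lines. This is only a sketch: it assumes each variant is scored against its own held-out labels, which the ADR does not spell out, and `mape` and `pick_peui_variant` are hypothetical helper names.

```python
def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error, as a fraction (0.1 means 10%).

    Assumes every true value is non-zero, which should hold for PEUI labels.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_peui_variant(held_out: dict[str, tuple[list[float], list[float]]]) -> str:
    """Return the variant name with the lowest held-out MAPE.

    `held_out` maps a variant name to (labels, predictions) for that variant.
    """
    return min(held_out, key=lambda name: mape(*held_out[name]))
```

For example, `pick_peui_variant({"peui_raw": (labels_raw, preds_raw), "peui_ucl": (labels_ucl, preds_ucl)})` returns the name of the variant to keep.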

Label coupling, not classical leakage

The UCL transform uses the row's recorded SAP-derived band to compute the PEUI label, and SAP score is itself an ML target. This couples the two targets at the label level. It is not classical leakage (the band is not in the feature set; the model never reads it as input). The PEUI prediction is independent of the SAP prediction at inference. We accept the coupling as the price of avoiding the band-boundary discontinuity, consistent with our explicit "park target-independence" decision — the six targets are predicted independently and small cross-target inconsistencies are tolerated for v1.

Practical safeguard: energy_rating_current and any other SAP-score-derived field (e.g. current_energy_efficiency_band) are excluded from the feature set in the EPC ML Transform, to avoid an entirely separate target-leakage path on the SAP prediction.
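One way to enforce this safeguard is a denylist plus a fail-fast check, sketched below. The two field names come from the paragraph above; the constant and function names are hypothetical and the real EPC ML Transform may organise this differently.

```python
# Fields named in this ADR as SAP-score-derived; extend as others are found.
SAP_DERIVED_FIELDS = frozenset(
    {"energy_rating_current", "current_energy_efficiency_band"}
)


def select_feature_columns(columns: list[str]) -> list[str]:
    """Drop SAP-score-derived columns, preserving the original column order."""
    return [c for c in columns if c not in SAP_DERIVED_FIELDS]


def assert_no_sap_leakage(feature_columns: list[str]) -> None:
    """Fail fast if a SAP-derived field slipped into the feature set."""
    leaked = SAP_DERIVED_FIELDS.intersection(feature_columns)
    if leaked:
        raise ValueError(f"SAP-derived fields in feature set: {sorted(leaked)}")
```

Running the check just before the parquet is written would catch a leaked column even if a new field is added upstream and the selection step is bypassed.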

Consequences

  • EpcEnergyDerivationService is no longer the source of baseline kWh. Its remaining job is the deterministic step from kWh + Fuel Rates → fuel split + bills, and kWh + Carbon Factors → CO2 emissions. UCL is removed from its runtime path; the AnnualBillSavings.adjust_energy_to_metered port that ADR-0006 anticipated does not happen — UCL moves into the training-side EPC ML Transform.
  • The EPC ML Transform owns both feature definitions and the per-row UCL label transformation. It is the single artefact tying SAP-band semantics into the training data; cross-repo consumers (AutoGluon) see only post-transform parquet.
  • FuelRatesRepo, CarbonFactorsRepo, and HeatingSystemAssumptionsRepo survive, but HeatingSystemAssumptionsRepo has fewer consumers: the SAP-physics-decomposition path that ADR-0006 envisaged is unused.
  • Adding more ML targets later (lighting kWh, appliance kWh, cooking kWh) becomes a feature-additive change rather than an architectural one — the precedent of "kWh as ML target" is now established.
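The deterministic step that remains with EpcEnergyDerivationService reduces to two weighted sums. The sketch below assumes kWh has already been attributed to fuels and that rates and factors are plain per-fuel lookups; the function name and signature are illustrative and not the service's actual interface.

```python
def annual_bill_and_co2(
    kwh_by_fuel: dict[str, float],
    fuel_rates: dict[str, float],      # currency per kWh, current rates
    carbon_factors: dict[str, float],  # kg CO2e per kWh
) -> tuple[float, float]:
    """Return (annual bill, annual kg CO2e) from per-fuel delivered kWh."""
    bill = sum(kwh * fuel_rates[fuel] for fuel, kwh in kwh_by_fuel.items())
    co2 = sum(kwh * carbon_factors[fuel] for fuel, kwh in kwh_by_fuel.items())
    return bill, co2
```

A fuel missing from either lookup raises `KeyError`, which seems preferable to silently pricing it at zero.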

What this ADR does not change

  • Per-recommendation cost delta is still deterministic, from kWh delta × current Fuel Rates.
  • Bills surfaced to the UI are always current-rate, never pinned to EPC inspection-date rates.
  • EpcEnergyDerivationService is preserved as the bills/fuel-split service; only its responsibility shrinks.