docs(baseline): Bill Derivation design — fuel as calculator output + rebaselining is assemble-and-score

Captures a /grill-with-docs session resolving how BillDerivation gets the
fuel each end use burns, and what Rebaselining actually is.

- ADR-0014 amendment: per-end-use fuel is a calculator OUTPUT (resolved
  Table-32 codes on SapResult: main-1/main-2/secondary/HW + pv_exported_kwh);
  the adapter is a pure SapResult->EnergyBreakdown map. Corrects stale §3
  (is_gas_code... -> sap_fuel.sap_code_to_fuel). Adds COOLING section.
  Interim, pending ADR-0015.
- ADR-0013 amendment: the calculator is the SCORING ENGINE within
  Rebaselining (assemble the Effective EPC picture, then score), not the
  whole of it; the Rebaseliner exposes its SapResult so the orchestrator
  composes Effective Performance AND the Bill from one scoring.
- ADR-0015 (new): mappers own cert normalization; EpcPropertyData becomes a
  strict type. Explains why fuel resolution sits in the calculator today.
- CONTEXT.md: Effective EPC = the assembled picture; Rebaselining = assemble
  (overrides / neighbour-estimation / old-schema remap) then score.
- EpcPropertyData docstring points at ADR-0015.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-02 18:04:55 +00:00
parent c431453d75
commit 19a56461ba
5 changed files with 133 additions and 3 deletions

@@ -78,15 +78,15 @@ _Avoid_: patches (deprecated), corrections, manual EPC, edits
### Modelling
**Effective EPC**:
-The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance.
+The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current and gaps filled from neighbour predictions; or — when there is **no EPC** — components **estimated from surrounding properties**. Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
_Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC
**Rebaselining**:
-Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`, the calculator's target spec), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. kWh is included as ML targets per ADR-0007 — see [[epc-ml-transform]].
+Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; estimate components from surrounding properties when there is no EPC; re-map an old-schema EPC to current and gap-fill from neighbour predictions. The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly.
+Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
_Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)
**Baseline Performance**:
-A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, ) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
+A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, cooling) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
_Avoid_: baseline predictions, predicted baseline, rebaselined values
**Lodged Performance**:

@@ -566,6 +566,16 @@ class RenewableHeatIncentive:
@dataclass
class EpcPropertyData:
    """The cert aggregate every downstream stage reads.

    Currently **loosely typed** (`Union[int, str]` fuel/emitter fields, raw
    `Optional[int]` codes, `str` fallbacks) and filled by three mappers (EPC
    API, Elmhurst site notes, pashub) with different conventions, so
    normalization happens *downstream* (e.g. fuel resolution in the calculator's
    `cert_to_inputs`). The direction is to push normalization to the mappers and
    make this a strict type; see docs/adr/0015-mappers-own-cert-normalization.md.
    """

    # General
    dwelling_type: str  # TODO: make enum?
    inspection_date: date

@@ -107,3 +107,26 @@ Effective Performance; no third value-set); only the timing changes:
The `≥1000-cert parity` gate from ADR-0009/0010 still governs whether the calculator's figures are
*trusted as definitive* for the SAP-10.2 cohort, but it no longer gates *wiring* — pre-10.2 certs
have no current-spec lodged figure to fall back to, so the calculator is the only source there.
## Amendment (2026-06-02): the calculator is the *scoring engine* within Rebaselining, which also feeds Bill Derivation
This ADR's shorthand — "the calculator *is* the Rebaseliner" — is sharpened by the fuller picture of
Rebaselining. **Rebaselining is _assemble the Effective EPC picture, then score it_**: apply
**Landlord Overrides** (boiler → ASHP, wall insulated) as a simulation on `EpcPropertyData`; estimate
components from surrounding properties when there is no EPC; re-map an old-schema EPC and gap-fill from
neighbour predictions (the override/estimation work lands shortly). The `Sap10Calculator` is the
**scoring engine at the tail of that assembly**, not the whole of Rebaselining — so the calculator
call lives **inside** the Rebaseliner (after assembly), never hoisted up into the orchestrator.
Because [Bill Derivation](0014-bill-derivation-from-real-fuel-rates.md) prices the **same scored
picture**, the Rebaseliner **exposes its `SapResult` as a first-class part of its result** — not just
`(Performance, reason)`. The orchestrator runs the calculator **once** (via the Rebaseliner) and
composes two products from that one `SapResult`: Effective Performance, and the Bill
(`EnergyBreakdown.from_sap_result` → `BillDerivation`). Running the calculator a second time for bills
is rejected — it is the expensive step over the ~40k cohort and a second call could drift from the
first.
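A minimal sketch of "one scoring, two products" follows. Only `SapResult`, the Rebaseliner, and the `(Performance, reason)` pair come from the text above; the field names, the dict-based certificate, `RebaselineResult`, and `orchestrate` are hypothetical stand-ins, and the assembly step is reduced to a dict merge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SapResult:
    # Hypothetical subset of the calculator's output.
    sap_score: float
    space_heating_kwh: float
    hot_water_kwh: float


@dataclass(frozen=True)
class RebaselineResult:
    # Performance and reason as before, plus the SapResult they were read from.
    sap_score: float
    reason: str
    sap_result: SapResult


class Rebaseliner:
    def __init__(self, score: Callable[[dict], SapResult]) -> None:
        self._score = score  # stands in for Sap10Calculator

    def rebaseline(self, cert: dict, overrides: dict) -> RebaselineResult:
        effective_epc = {**cert, **overrides}  # assembly, heavily simplified
        result = self._score(effective_epc)    # scoring at the tail, exactly once
        reason = "overrides" if overrides else "methodology"
        return RebaselineResult(result.sap_score, reason, result)


def orchestrate(rebaseliner: Rebaseliner, cert: dict, overrides: dict) -> tuple:
    outcome = rebaseliner.rebaseline(cert, overrides)
    # Both products read the same SapResult; there is no second calculator call.
    effective_performance = {"sap_score": outcome.sap_score}
    billable_kwh = {
        "heating": outcome.sap_result.space_heating_kwh,
        "hot_water": outcome.sap_result.hot_water_kwh,
    }
    return effective_performance, billable_kwh


calls = []


def fake_calculator(epc: dict) -> SapResult:
    calls.append(epc)  # record each scoring so the single call is observable
    return SapResult(sap_score=72.0, space_heating_kwh=8000.0, hot_water_kwh=2000.0)


performance, kwh = orchestrate(
    Rebaseliner(fake_calculator), {"main_fuel_type": 1}, {"main_fuel_type": 30}
)
```

Because the orchestrator never holds the calculator itself, a second scoring for bills would have to go through the Rebaseliner again, which makes an accidental double call easier to spot in review.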
Corollary: once Overrides/estimation land, Effective Performance is the calculator's output **even for
`sap_version ≥ 10.2`** — a user-modified or estimated dwelling has no valid lodged figure to keep. The
"keep lodged ≥ 10.2" rule holds only for a real, current, un-overridden EPC; the **Bill always derives
from the `SapResult` regardless** (lodged figures carry no per-end-use kWh).
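The corollary can be read as a single predicate, sketched here under assumptions. The `Picture` flags and the function name are hypothetical, and the SAP version is modelled as a `(major, minor)` tuple purely to keep the comparison exact.

```python
from dataclasses import dataclass

CALCULATOR_TARGET_SPEC = (10, 2)  # the "sap_version < 10.2" threshold from the text


@dataclass(frozen=True)
class Picture:
    # Hypothetical flags describing how the Effective EPC picture was assembled.
    sap_version: tuple
    has_overrides: bool = False
    is_estimated_or_remapped: bool = False


def keep_lodged_performance(picture: Picture) -> bool:
    """True only for a real, current, un-overridden EPC."""
    return (
        picture.sap_version >= CALCULATOR_TARGET_SPEC
        and not picture.has_overrides
        and not picture.is_estimated_or_remapped
    )
```

This predicate only selects which figure becomes Effective Performance. It does not gate the Bill, which is derived from the `SapResult` in every case.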

@@ -101,3 +101,34 @@ production migration is FE-owned (Drizzle); `docs/migrations/` updated.
- **Bill at SAP Table 32 prices** — rejected: standardised rating prices, ~half real electricity.
- **JSON `bill_breakdown` block** — rejected: end-uses are fixed-cardinality, so flat columns are
clean and stay queryable (ADR-0004).
## Amendment (2026-06-02): fuel is a calculator *output*; §3's mapping helpers corrected
Wiring the `SapResult → EnergyBreakdown` adapter forced the question §3 left implicit: *where does
the fuel each end use burns come from?* Resolved in a `/grill-with-docs` session.
- **Decision: per-end-use fuel is calculator output.** The calculator resolves the fuel for each
billable end use (it already uses it to derive the delivered kWh and the rating cost), so it emits
the **resolved Table-32 fuel codes** on `SapResult` (main-1 / main-2 / secondary / hot water — the
electric end uses are electricity by construction), alongside `pv_exported_kwh` for the SEG credit.
`BillDerivation`'s adapter is then a **pure `SapResult → EnergyBreakdown` map** and can never price
the calculator's kWh at a fuel the calculator never used. Rejected: an adapter that re-reads raw
`EpcPropertyData` fuel fields and re-normalizes them — that duplicates `cert_to_inputs`
(`_main_fuel_code`, `_water_heating_fuel_code`, HW→main default, CHP blend, the `MissingMainFuelType`
strict-raise) and reopens divergence between the bill and the rating.
- **§3 correction.** §3 says the per-end-use fuel codes map to `Fuel` "via the existing
`is_gas_code` / `is_electric_fuel_code` / `is_liquid_fuel_code` helpers." That is not what shipped:
mapping is `domain/property_baseline/sap_fuel.py::sap_code_to_fuel`, a bounded **Table-32 fuel-code →
`Fuel`** dispatch that strict-raises `UnmappedSapCode` on an unmapped code. The "meet at one
vocabulary, not raw SAP codes" intent stands; the named helpers do not.
- **Interim, pending [ADR-0015](0015-mappers-own-cert-normalization.md).** Fuel resolution sits in
the calculator *because* `EpcPropertyData` is not yet a strict normalized type. Once ADR-0015 lands
(mappers normalize at the boundary), attribution can move upstream and the `SapResult` fuel-code
fields may be retired.
- **`COOLING` section added.** §1 listed cooling as an end use but §6's flat columns omitted it.
`BillSection` gains `COOLING` (kWh from `SapResult.space_cooling_fuel_kwh_per_yr`, electricity by
construction), so §6's layout gains a `cooling_kwh` + `cooling_cost_gbp` column pair (FE-owned
Drizzle migration).
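The decisions above can be sketched as a bounded code dispatch plus a pure adapter. This is an illustration under assumptions, not the shipped module: the code table uses placeholder values rather than a transcription of Table 32, the `Fuel` members and most `SapResult` field names are hypothetical, and the breakdown is a plain dict rather than the real `EnergyBreakdown`. Only `sap_code_to_fuel`, `UnmappedSapCode`, `pv_exported_kwh`, and `space_cooling_fuel_kwh_per_yr` are names taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Fuel(Enum):
    GAS = "gas"
    OIL = "oil"
    ELECTRICITY = "electricity"


class UnmappedSapCode(ValueError):
    """Raised instead of guessing a fuel for an unknown code."""


# Placeholder codes for illustration only; not a transcription of Table 32.
_CODE_TO_FUEL = {1: Fuel.GAS, 4: Fuel.OIL, 30: Fuel.ELECTRICITY}


def sap_code_to_fuel(code: int) -> Fuel:
    try:
        return _CODE_TO_FUEL[code]
    except KeyError:
        raise UnmappedSapCode(f"no Fuel mapped for SAP fuel code {code}") from None


@dataclass(frozen=True)
class SapResult:
    # Resolved fuel codes and delivered kWh as emitted by the calculator.
    main_1_fuel_code: int
    hot_water_fuel_code: int
    main_1_kwh: float
    hot_water_kwh: float
    space_cooling_fuel_kwh_per_yr: float
    pv_exported_kwh: float


def energy_breakdown_from_sap_result(result: SapResult) -> dict:
    """Pure map: reads only SapResult, never the raw cert fields."""
    return {
        "heating": (sap_code_to_fuel(result.main_1_fuel_code), result.main_1_kwh),
        "hot_water": (sap_code_to_fuel(result.hot_water_fuel_code), result.hot_water_kwh),
        # Cooling and PV export are electricity by construction.
        "cooling": (Fuel.ELECTRICITY, result.space_cooling_fuel_kwh_per_yr),
        "pv_export": (Fuel.ELECTRICITY, result.pv_exported_kwh),
    }


breakdown = energy_breakdown_from_sap_result(
    SapResult(1, 30, 9000.0, 2500.0, 120.0, 800.0)
)
```

Since the adapter takes no `EpcPropertyData` argument, it has no way to price the kWh at a fuel the calculator did not resolve, which is the property the decision is after.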

@@ -0,0 +1,66 @@
---
Status: accepted
---
# Mappers own cert normalization; `EpcPropertyData` becomes a strict normalized type
Names a direction that [ADR-0013](0013-calculator-produces-effective-performance-shadow-first.md)
already gestured at ("the strict-typing of `EpcPropertyData` that will close most of those gaps is
still pending") and that [ADR-0014](0014-bill-derivation-from-real-fuel-rates.md) ran into head-on.
Relates to [ADR-0001](0001-two-source-paths.md) (the two source paths). Decided in a
`/grill-with-docs` session (2026-06-02). This ADR records a **direction + a tracked piece of work**,
not a slice that has landed.
## Context
`EpcPropertyData` is the one cert aggregate every downstream stage reads, but it is **loosely
typed** — `main_fuel_type: Union[int, str]`, `heat_emitter_type: Union[int, str]`, bare
`Optional[int]` codes (`water_heating_fuel`, `secondary_fuel_type`), `str` fallbacks like
`'Unknown'` / `'Pre 2013'`. It is filled by **three mappers with different conventions**:
- the **EPC API** mapper (int codes),
- the **Elmhurst** site-notes mapper (string labels, e.g. `'Bulk LPG'`),
- **pashub**.
Because the cert arrives un-normalized, **normalization happens downstream in the calculator**
(`domain/sap10_calculator/rdsap/cert_to_inputs.py`): `_main_fuel_code` resolves the union and
**strict-raises `MissingMainFuelType`** on a non-int rather than defaulting; `_water_heating_fuel_code`
applies the "HW fuel defaults to the main system" rule; CHP/community blends are reassembled. This
logic is correct, but it lives in the wrong layer — it is *cert-shape* knowledge, not *physics*.
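To make the cert-shape knowledge concrete, here is a rough sketch of the two rules described above. The real `_main_fuel_code` and `_water_heating_fuel_code` signatures are not shown in this ADR, so the function names, parameters, and error message below are assumptions.

```python
from typing import Optional, Union


class MissingMainFuelType(ValueError):
    """The main fuel could not be resolved to an int code."""


def main_fuel_code(main_fuel_type: Union[int, str, None]) -> int:
    # bool is a subclass of int, so exclude it to avoid reading True as code 1.
    if isinstance(main_fuel_type, int) and not isinstance(main_fuel_type, bool):
        return main_fuel_type
    # Strict-raise rather than default: a wrong fuel would silently skew the result.
    raise MissingMainFuelType(f"main_fuel_type is not an int code: {main_fuel_type!r}")


def water_heating_fuel_code(water_heating_fuel: Optional[int], main_code: int) -> int:
    # "HW fuel defaults to the main system" when the cert leaves it unset.
    return main_code if water_heating_fuel is None else water_heating_fuel
```

Neither function involves any physics. Both only interpret how a certificate happens to be filled in, which is why they are candidates to move to the mappers.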
The trigger: [ADR-0014](0014-bill-derivation-from-real-fuel-rates.md)'s `BillDerivation` needs the
fuel each end use burns. The fuel fields *are* on `EpcPropertyData`, but reading them raw would mean
**re-implementing the calculator's normalization** (union resolution, HW→main default, strict-raise,
CHP blend) in a second place — and risk the bill pricing the calculator's delivered kWh at a fuel
the calculator never used. ADR-0014 therefore resolves fuel **inside the calculator** and emits it as
output. That is the right call *given today's loose cert*, but it is a **symptom**: the consumer is
paying for normalization that should have happened at the mapper boundary.
## Decision (direction)
1. **Normalization is a mapper responsibility.** Each mapper (API / Elmhurst / pashub) transforms its
source into a **single normalized shape**, resolving fuel labels→codes, applying defaults, and
raising on genuinely-missing required fields — at the boundary, once.
2. **`EpcPropertyData` becomes strict.** Replace `Union[int, str]` and raw `Optional[int]` code
fields with precise types (enums over SAP code ints; no string fallbacks in the domain object).
3. **Downstream consumers stop re-normalizing.** The calculator's `cert_to_inputs` normalization
shrinks to physics; a consumer like the bill adapter could then read fuel off a strict
`EpcPropertyData` safely (the "read it off the cert" option ADR-0014 rejected becomes sound).
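One possible shape for this direction is sketched below. Nothing here has landed: `FuelCode`, its placeholder values, the label table, `MissingRequiredField`, and the two normalizer functions are all hypothetical, and only the `'Bulk LPG'` label is taken from the Context section.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FuelCode(IntEnum):
    # Placeholder values for illustration; not a transcription of the SAP tables.
    MAINS_GAS = 1
    BULK_LPG = 2
    ELECTRICITY = 30


class MissingRequiredField(ValueError):
    """Raised at the mapper boundary, where the data gap can be fixed."""


_ELMHURST_LABELS = {"Mains gas": FuelCode.MAINS_GAS, "Bulk LPG": FuelCode.BULK_LPG}


def normalize_api_fuel(raw: Optional[int]) -> FuelCode:
    # EPC API mapper: int codes in, enum out.
    if raw is None:
        raise MissingRequiredField("main fuel code is missing")
    try:
        return FuelCode(raw)
    except ValueError:
        raise MissingRequiredField(f"unknown fuel code {raw}") from None


def normalize_elmhurst_fuel(label: Optional[str]) -> FuelCode:
    # Elmhurst mapper: string labels in, the same enum out.
    if label is None or label.strip() not in _ELMHURST_LABELS:
        raise MissingRequiredField(f"unknown fuel label {label!r}")
    return _ELMHURST_LABELS[label.strip()]


@dataclass(frozen=True)
class StrictEpcPropertyData:
    # No Union[int, str] and no string fallbacks in the domain object.
    main_fuel_type: FuelCode
    water_heating_fuel: FuelCode


cert = StrictEpcPropertyData(
    main_fuel_type=normalize_elmhurst_fuel("Bulk LPG"),
    water_heating_fuel=normalize_api_fuel(2),
)
```

With both mappers converging on one enum, a downstream consumer can compare or dispatch on `cert.main_fuel_type` without knowing which source produced it.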
## Consequences / affected areas
- **Calculator**: `cert_to_inputs` sheds its fuel/string normalization helpers; strict-raises move
to the mappers (the right place to fix a data gap).
- **Bill Derivation (ADR-0014)** — calculator-side fuel resolution on `SapResult` is an **interim
measure**, explicitly *because* the cert is not yet normalized. When this ADR lands, fuel attribution
can move upstream and the `SapResult` fuel-code fields may be retired.
- **The three mappers** — each gains normalization responsibility and its own conformance tests
(the strict-typing also makes mapper bugs fail loudly at the boundary, not deep in the cascade).
- **Reduced divergence risk** — one normalized vocabulary means the bill, the rating, and any future
consumer cannot silently disagree about a cert's fuels.
## Status of the work
Direction accepted; **not yet implemented**. To be broken into slices and tracked as an issue
parented to the Ara backend PRD (`#1128`). Until then, downstream normalization (and ADR-0014's
calculator-side fuel resolution) stands as the documented interim.