diff --git a/CONTEXT.md b/CONTEXT.md
index 3580b93e..f3ffd4fa 100644
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -78,15 +78,15 @@ _Avoid_: patches (deprecated), corrections, manual EPC, edits
 ### Modelling
 
 **Effective EPC**:
-The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance.
+The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current and gaps filled from neighbour predictions; or — when there is **no EPC** — components **estimated from surrounding properties**. Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
 _Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC
 
 **Rebaselining**:
-Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`, the calculator's target spec), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. kWh is included as ML targets per ADR-0007 — see [[epc-ml-transform]].
+Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; estimate components from surrounding properties when there is no EPC; re-map an old-schema EPC to current and gap-fill from neighbour predictions. The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
 _Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)
 
 **Baseline Performance**:
-A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, …) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
+A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, cooling) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
 _Avoid_: baseline predictions, predicted baseline, rebaselined values
 
 **Lodged Performance**:
diff --git a/datatypes/epc/domain/epc_property_data.py b/datatypes/epc/domain/epc_property_data.py
index 1048bed2..6cae0b52 100644
--- a/datatypes/epc/domain/epc_property_data.py
+++ b/datatypes/epc/domain/epc_property_data.py
@@ -566,6 +566,16 @@ class RenewableHeatIncentive:
 
 @dataclass
 class EpcPropertyData:
+    """The cert aggregate every downstream stage reads.
+
+    Currently **loosely typed** (`Union[int, str]` fuel/emitter fields, raw
+    `Optional[int]` codes, `str` fallbacks) and filled by three mappers — EPC
+    API, Elmhurst site notes, pashub — with different conventions, so
+    normalization happens *downstream* (e.g. fuel resolution in the calculator's
+    `cert_to_inputs`). The direction is to push normalization to the mappers and
+    make this a strict type — see docs/adr/0015-mappers-own-cert-normalization.md.
+    """
+
     # General
     dwelling_type: str  # TODO: make enum?
     inspection_date: date
diff --git a/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md b/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
index 6dd9a044..7012194c 100644
--- a/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
+++ b/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
@@ -107,3 +107,26 @@ Effective Performance; no third value-set); only the timing changes:
 The `≥1000-cert parity` gate from ADR-0009/0010 still governs whether the calculator's figures are
 *trusted as definitive* for the SAP-10.2 cohort, but it no longer gates *wiring* — pre-10.2 certs have
 no current-spec lodged figure to fall back to, so the calculator is the only source there.
+
+## Amendment (2026-06-02): the calculator is the *scoring engine* within Rebaselining, which also feeds Bill Derivation
+
+This ADR's shorthand — "the calculator *is* the Rebaseliner" — is sharpened by the fuller picture of
+Rebaselining. **Rebaselining is _assemble the Effective EPC picture, then score it_**: apply
+**Landlord Overrides** (boiler → ASHP, wall insulated) as a simulation on `EpcPropertyData`; estimate
+components from surrounding properties when there is no EPC; re-map an old-schema EPC and gap-fill from
+neighbour predictions (the override/estimation work lands shortly). The `Sap10Calculator` is the
+**scoring engine at the tail of that assembly**, not the whole of Rebaselining — so the calculator
+call lives **inside** the Rebaseliner (after assembly), never hoisted up into the orchestrator.
+
+Because [Bill Derivation](0014-bill-derivation-from-real-fuel-rates.md) prices the **same scored
+picture**, the Rebaseliner **exposes its `SapResult` as a first-class part of its result** — not just
+`(Performance, reason)`. The orchestrator runs the calculator **once** (via the Rebaseliner) and
+composes two products from that one `SapResult`: Effective Performance, and the Bill
+(`EnergyBreakdown.from_sap_result` → `BillDerivation`). Running the calculator a second time for bills
+is rejected — it is the expensive step over the ~40k cohort and a second call could drift from the
+first.
+
+Corollary: once Overrides/estimation land, Effective Performance is the calculator's output **even for
+`sap_version ≥ 10.2`** — a user-modified or estimated dwelling has no valid lodged figure to keep. The
+"keep lodged ≥ 10.2" rule holds only for a real, current, un-overridden EPC; the **Bill always derives
+from the `SapResult` regardless** (lodged figures carry no per-end-use kWh).
diff --git a/docs/adr/0014-bill-derivation-from-real-fuel-rates.md b/docs/adr/0014-bill-derivation-from-real-fuel-rates.md
index cf01b02a..e10d1f32 100644
--- a/docs/adr/0014-bill-derivation-from-real-fuel-rates.md
+++ b/docs/adr/0014-bill-derivation-from-real-fuel-rates.md
@@ -101,3 +101,34 @@ production migration is FE-owned (Drizzle); `docs/migrations/` updated.
 - **Bill at SAP Table 32 prices** — rejected: standardised rating prices, ~half real electricity.
 - **JSON `bill_breakdown` block** — rejected: end-uses are fixed-cardinality, so flat columns are
   clean and stay queryable (ADR-0004).
+
+## Amendment (2026-06-02): fuel is a calculator *output*; §3's mapping helpers corrected
+
+Wiring the `SapResult → EnergyBreakdown` adapter forced the question §3 left implicit: *where does
+the fuel each end use burns come from?* Resolved in a `/grill-with-docs` session.
+
+- **Decision: per-end-use fuel is calculator output.** The calculator resolves the fuel for each
+  billable end use (it already uses it to derive the delivered kWh and the rating cost), so it emits
+  the **resolved Table-32 fuel codes** on `SapResult` (main-1 / main-2 / secondary / hot water — the
+  electric end uses are electricity by construction), alongside `pv_exported_kwh` for the SEG credit.
+  `BillDerivation`'s adapter is then a **pure `SapResult → EnergyBreakdown` map** and can never price
+  the calculator's kWh at a fuel the calculator never used. Rejected: an adapter that re-reads raw
+  `EpcPropertyData` fuel fields and re-normalizes them — that duplicates `cert_to_inputs`
+  (`_main_fuel_code`, `_water_heating_fuel_code`, HW→main default, CHP blend, the `MissingMainFuelType`
+  strict-raise) and reopens divergence between the bill and the rating.
+
+- **§3 correction.** §3 says the per-end-use fuel codes map to `Fuel` "via the existing
+  `is_gas_code` / `is_electric_fuel_code` / `is_liquid_fuel_code` helpers." That is not what shipped:
+  mapping is `domain/property_baseline/sap_fuel.py::sap_code_to_fuel`, a bounded **Table-32 fuel-code
+  → `Fuel`** dispatch that strict-raises `UnmappedSapCode` on an unmapped code. The "meet at one
+  vocabulary, not raw SAP codes" intent stands; the named helpers do not.
+
+- **Interim, pending [ADR-0015](0015-mappers-own-cert-normalization.md).** Fuel resolution sits in
+  the calculator *because* `EpcPropertyData` is not yet a strict normalized type. Once ADR-0015 lands
+  (mappers normalize at the boundary), attribution can move upstream and the `SapResult` fuel-code
+  fields may be retired.
+
+- **`COOLING` section added.** §1 listed cooling as an end use but §6's flat columns omitted it.
+  `BillSection` gains `COOLING` (kWh from `SapResult.space_cooling_fuel_kwh_per_yr`, electricity by
+  construction), so §6's layout gains a `cooling_kwh` + `cooling_cost_gbp` column pair (FE-owned
+  Drizzle migration).
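Review sketch for the ADR-0014 amendment above: a minimal illustration of a bounded fuel-code dispatch feeding a pure `SapResult`-to-breakdown map. It is not the shipped `sap_fuel.py`. The `Fuel` members, the integer codes, the `energy_breakdown` helper, and every `SapResult` field except `space_cooling_fuel_kwh_per_yr` are assumptions made for illustration; the integers in particular are placeholders and are not taken from SAP Table 32.

```python
from dataclasses import dataclass
from enum import Enum


class Fuel(Enum):
    # Hypothetical members; the real `Fuel` vocabulary is not shown in the diff.
    MAINS_GAS = "mains_gas"
    LPG = "lpg"
    ELECTRICITY = "electricity"


class UnmappedSapCode(ValueError):
    """Raised when a fuel code has no entry in the bounded table."""


# Placeholder code -> Fuel table. These integers are illustrative only.
_CODE_TO_FUEL = {1: Fuel.MAINS_GAS, 2: Fuel.LPG, 30: Fuel.ELECTRICITY}


def sap_code_to_fuel(code: int) -> Fuel:
    """Bounded dispatch: a known code maps to a Fuel, anything else raises."""
    try:
        return _CODE_TO_FUEL[code]
    except KeyError:
        raise UnmappedSapCode(f"no Fuel mapping for fuel code {code!r}") from None


@dataclass(frozen=True)
class SapResult:
    # Hypothetical subset of fields; only the last name appears in the ADR.
    main_heating_kwh: float
    main_heating_fuel_code: int
    hot_water_kwh: float
    hot_water_fuel_code: int
    space_cooling_fuel_kwh_per_yr: float


def energy_breakdown(result: SapResult) -> dict[str, tuple[Fuel, float]]:
    """Pure map from the calculator's output; it never re-reads the cert."""
    return {
        "heating": (
            sap_code_to_fuel(result.main_heating_fuel_code),
            result.main_heating_kwh,
        ),
        "hot_water": (
            sap_code_to_fuel(result.hot_water_fuel_code),
            result.hot_water_kwh,
        ),
        # Cooling is electricity by construction, so no code lookup is needed.
        "cooling": (Fuel.ELECTRICITY, result.space_cooling_fuel_kwh_per_yr),
    }
```

Because the map only consumes fields the calculator already resolved, the bill cannot price kWh at a fuel the rating never used, and an unexpected code fails loudly instead of falling through to a default.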
diff --git a/docs/adr/0015-mappers-own-cert-normalization.md b/docs/adr/0015-mappers-own-cert-normalization.md
new file mode 100644
index 00000000..3ce14694
--- /dev/null
+++ b/docs/adr/0015-mappers-own-cert-normalization.md
@@ -0,0 +1,66 @@
+---
+Status: accepted
+---
+
+# Mappers own cert normalization; `EpcPropertyData` becomes a strict normalized type
+
+Names a direction that [ADR-0013](0013-calculator-produces-effective-performance-shadow-first.md)
+already gestured at ("the strict-typing of `EpcPropertyData` that will close most of those gaps is
+still pending") and that [ADR-0014](0014-bill-derivation-from-real-fuel-rates.md) ran into head-on.
+Relates to [ADR-0001](0001-two-source-paths.md) (the two source paths). Decided in a
+`/grill-with-docs` session (2026-06-02). This ADR records a **direction + a tracked piece of work**,
+not a slice that has landed.
+
+## Context
+
+`EpcPropertyData` is the one cert aggregate every downstream stage reads, but it is **loosely
+typed** — `main_fuel_type: Union[int, str]`, `heat_emitter_type: Union[int, str]`, bare
+`Optional[int]` codes (`water_heating_fuel`, `secondary_fuel_type`), `str` fallbacks like
+`'Unknown'` / `'Pre 2013'`. It is filled by **three mappers with different conventions**:
+
+- the **EPC API** mapper (int codes),
+- the **Elmhurst** site-notes mapper (string labels, e.g. `'Bulk LPG'`),
+- **pashub**.
+
+Because the cert arrives un-normalized, **normalization happens downstream in the calculator**
+(`domain/sap10_calculator/rdsap/cert_to_inputs.py`): `_main_fuel_code` resolves the union and
+**strict-raises `MissingMainFuelType`** on a non-int rather than defaulting; `_water_heating_fuel_code`
+applies the "HW fuel defaults to the main system" rule; CHP/community blends are reassembled. This
+logic is correct, but it lives in the wrong layer — it is *cert-shape* knowledge, not *physics*.
+
+The trigger: [ADR-0014](0014-bill-derivation-from-real-fuel-rates.md)'s `BillDerivation` needs the
+fuel each end use burns. The fuel fields *are* on `EpcPropertyData`, but reading them raw would mean
+**re-implementing the calculator's normalization** (union resolution, HW→main default, strict-raise,
+CHP blend) in a second place — and risk the bill pricing the calculator's delivered kWh at a fuel
+the calculator never used. ADR-0014 therefore resolves fuel **inside the calculator** and emits it as
+output. That is the right call *given today's loose cert*, but it is a **symptom**: the consumer is
+paying for normalization that should have happened at the mapper boundary.
+
+## Decision (direction)
+
+1. **Normalization is a mapper responsibility.** Each mapper (API / Elmhurst / pashub) transforms its
+   source into a **single normalized shape**, resolving fuel labels→codes, applying defaults, and
+   raising on genuinely-missing required fields — at the boundary, once.
+2. **`EpcPropertyData` becomes strict.** Replace `Union[int, str]` and raw `Optional[int]` code
+   fields with precise types (enums over SAP code ints; no string fallbacks in the domain object).
+3. **Downstream consumers stop re-normalizing.** The calculator's `cert_to_inputs` normalization
+   shrinks to physics; a consumer like the bill adapter could then read fuel off a strict
+   `EpcPropertyData` safely (the "read it off the cert" option ADR-0014 rejected becomes sound).
+
+## Consequences / affected areas
+
+- **Calculator** — `cert_to_inputs` sheds its fuel/string normalization helpers; strict-raises move
+  to the mappers (the right place to fix a data gap).
+- **Bill Derivation (ADR-0014)** — calculator-side fuel resolution on `SapResult` is an **interim
+  measure**, explicitly *because* the cert is not yet normalized. When this ADR lands, fuel attribution
+  can move upstream and the `SapResult` fuel-code fields may be retired.
+- **The three mappers** — each gains normalization responsibility and its own conformance tests
+  (the strict-typing also makes mapper bugs fail loudly at the boundary, not deep in the cascade).
+- **Reduced divergence risk** — one normalized vocabulary means the bill, the rating, and any future
+  consumer cannot silently disagree about a cert's fuels.
+
+## Status of the work
+
+Direction accepted; **not yet implemented**. To be broken into slices and tracked as an issue
+parented to the Ara backend PRD (`#1128`). Until then, downstream normalization (and ADR-0014's
+calculator-side fuel resolution) stands as the documented interim.
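Review sketch for ADR-0015's direction: one way mapper-side normalization onto a strict type could look, offered as an illustration under stated assumptions and not as the planned implementation. `FuelCode`, `StrictFuels`, `UnknownFuelLabel`, `normalize_fuel`, `normalize_fuels`, and the label table are hypothetical names; the enum values are placeholders, not real SAP codes. Only `MissingMainFuelType` and the "HW fuel defaults to the main system" rule come from the ADR text.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Union


class FuelCode(IntEnum):
    # Placeholder values, not real SAP fuel codes.
    MAINS_GAS = 1
    BULK_LPG = 2
    ELECTRICITY = 30


class MissingMainFuelType(ValueError):
    """A required main fuel is absent from the source cert."""


class UnknownFuelLabel(ValueError):
    """Hypothetical error for a string label the mapper cannot resolve."""


# Hypothetical label table for a string-label source such as Elmhurst.
_LABEL_TO_CODE = {
    "Mains gas": FuelCode.MAINS_GAS,
    "Bulk LPG": FuelCode.BULK_LPG,
    "Electricity": FuelCode.ELECTRICITY,
}


@dataclass(frozen=True)
class StrictFuels:
    """Strict slice of the cert: enum-typed, no unions, no string fallbacks."""

    main_fuel: FuelCode
    water_heating_fuel: FuelCode


def normalize_fuel(raw: Union[int, str, None]) -> Optional[FuelCode]:
    """Resolve one raw source value (int code or string label) to a FuelCode."""
    if raw is None:
        return None
    if isinstance(raw, str):
        if raw not in _LABEL_TO_CODE:
            raise UnknownFuelLabel(raw)
        return _LABEL_TO_CODE[raw]
    return FuelCode(raw)  # ValueError on an int outside the enum


def normalize_fuels(
    main_raw: Union[int, str, None],
    water_raw: Union[int, str, None],
) -> StrictFuels:
    """Mapper-boundary normalization: resolve, default, and raise once."""
    main = normalize_fuel(main_raw)
    if main is None:
        raise MissingMainFuelType("cert has no main fuel")
    water = normalize_fuel(water_raw)
    if water is None:
        water = main  # HW fuel defaults to the main system
    return StrictFuels(main_fuel=main, water_heating_fuel=water)
```

With the default and the strict-raise applied at the boundary, a downstream consumer such as the bill adapter could read `water_heating_fuel` directly without repeating either rule.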