Model/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
Khalim Conn-Kowlessar 19a56461ba docs(baseline): Bill Derivation design — fuel as calculator output + rebaselining is assemble-and-score
Captures a /grill-with-docs session resolving how BillDerivation gets the
fuel each end use burns, and what Rebaselining actually is.

- ADR-0014 amendment: per-end-use fuel is a calculator OUTPUT (resolved
  Table-32 codes on SapResult: main-1/main-2/secondary/HW + pv_exported_kwh);
  the adapter is a pure SapResult->EnergyBreakdown map. Corrects stale §3
  (is_gas_code... -> sap_fuel.sap_code_to_fuel). Adds COOLING section.
  Interim, pending ADR-0015.
- ADR-0013 amendment: the calculator is the SCORING ENGINE within
  Rebaselining (assemble the Effective EPC picture, then score), not the
  whole of it; the Rebaseliner exposes its SapResult so the orchestrator
  composes Effective Performance AND the Bill from one scoring.
- ADR-0015 (new): mappers own cert normalization; EpcPropertyData becomes a
  strict type. Explains why fuel resolution sits in the calculator today.
- CONTEXT.md: Effective EPC = the assembled picture; Rebaselining = assemble
  (overrides / neighbour-estimation / old-schema remap) then score.
- EpcPropertyData docstring points at ADR-0015.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 18:04:55 +00:00

---
Status: accepted
---
# The `Sap10Calculator` produces Effective Performance (it is the Rebaseliner); Calculated SAP10 Performance is not a persisted third value-set, and is wired in shadow first
Refines [ADR-0004](0004-baseline-performance-lodged-effective-pair.md) (the Lodged/Effective
pair), [ADR-0009](0009-deterministic-sap-calculator.md)/[ADR-0010](0010-sap10-calculator-spec-target-and-validation.md)
(the calculator + the **Calculated SAP10 Performance** term), [ADR-0011](0011-composable-stage-orchestrators.md)
(the `Rebaseliner` seam) and [ADR-0012](0012-unit-of-work-per-stage-batch-transaction.md)
(all-or-nothing per batch). Decided in a `/grill-with-docs` session (2026-06-01) before wiring
`Sap10Calculator` into `PropertyBaselineOrchestrator`.
## Context
The old `model_engine` (`backend/engine/engine.py`) called out to an **ML API**
(`model_api.predict_all` over `BASELINE_MODEL_PREFIXES`) to rebaseline the properties that needed
it. The rebuild replaces that round-trip with the **deterministic `Sap10Calculator`, run live**.
The handover and CONTEXT (line 100) framed **Calculated SAP10 Performance** as a *third* value-set
persisted *alongside* Lodged and Effective (`calculated_*` columns). Walking the baselining
scenarios shows that framing reifies a distinction that does not exist in the domain:
- real lodged SAP10 EPC, no overrides ⇒ Calculated = Lodged = Effective;
- real EPC + property/landlord overrides ⇒ Calculated = Lodged-plus-overrides = Effective;
- estimated EPC (± overrides), or a pre-SAP10 EPC ⇒ Calculated = Effective (no lodged SAP10 to
compare against — Lodged Performance exists only for a *real lodged* EPC).

In every scenario **Effective = Calculated**. There is no third quantity.
## Decision
**The calculator is the mechanism that produces Effective Performance** — i.e. the deterministic
`Rebaseliner` (ADR-0011's seam), superseding the old ML-API rebaseliner. "Calculated SAP10
Performance" is the *name of that output during validation*, **not** a separately-persisted third
value-set. No `calculated_*` columns are added; `property_baseline_performance` keeps its
Lodged/Effective shape (ADR-0004). The ADR-0009 ML model is repositioned as a *future residual head*
over the calculator, not the baseline producer.

**Shadow-first, then promotion.** The calculator still strict-raises (`UnmappedSapCode`,
`MissingMainFuelType`, `UnresolvedPcdbCombiLoss`) on cert mappings it has not yet hardened, and the
strict-typing of `EpcPropertyData` that will close most of those gaps is still pending. A ~40,000-property
test cohort is about to flow through baselining, so this lands in two steps:
1. **This slice — shadow.** Performance is still **defined by the input data**: `StubRebaseliner`
keeps producing Effective (`= Lodged` for the only live scenario, real SAP10 + no overrides).
The calculator runs *beside* it, on every Property's Effective EPC, **purely to be battle-tested
   in the wild**. It is **not load-bearing**, so:
   - a calculator raise is **caught and logged at `error` and never aborts the batch**; otherwise one
unmappable cert would lose the load-bearing Lodged/Effective write for the whole batch, and
over a 40k run most batches would never baseline;
   - on success, its output is **compared to Lodged and logged, not persisted**, at `warning` level when
     `|sap_continuous - lodged_sap| > 0.5`, or when PEUI / CO2 diverge beyond tolerance (CO2 after the
     kg→tonnes conversion). Each log is tagged with the cert's `sap_version` so SAP-10.2 divergence
(a real calculator signal) is separable from older-spec drift (expected — see
[ADR-0010](0010-sap10-calculator-spec-target-and-validation.md) Validation Cohort).
2. **Next slice or two — load-bearing.** When overrides + EPC estimation land (days away),
`StubRebaseliner` is replaced by a calculator-backed `Rebaseliner`: the calculator's output
**becomes Effective Performance**. The failure posture **flips to abort** per ADR-0012 — now that
the calculator *is* the baseline, a silent wrong answer is the expensive outcome, so a raise must
fail the batch noisily. Same exception, opposite handling, because the calculator went from
shadow to load-bearing. The shadow logging is then retired.
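
A minimal sketch of the step-1 shadow comparison, assuming Python (the old engine lives in `backend/engine/engine.py`). `shadow_compare`, `LodgedFigures`, `CalculatedFigures`, the logger name and the PEUI/CO2 tolerances are all hypothetical; only the 0.5 SAP-point threshold, the kg→tonnes conversion, the deliberately broad catch and the `property_id`/`sap_version` tagging come from this ADR. It also assumes the calculator reports CO2 in kg while the lodged figure is in tonnes.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

log = logging.getLogger("baseline.shadow")  # hypothetical logger name

# Only the 0.5 SAP-point threshold comes from the ADR; the other two are placeholders.
SAP_TOLERANCE = 0.5
PEUI_TOLERANCE = 5.0     # kWh/m²/yr, assumed
CO2_TOLERANCE_T = 0.1    # tonnes/yr, assumed


@dataclass(frozen=True)
class LodgedFigures:
    sap: float
    peui: float
    co2_tonnes: float


@dataclass(frozen=True)
class CalculatedFigures:  # hypothetical stand-in for the calculator's SapResult
    sap_continuous: float
    peui: float
    co2_kg: float         # assumed kg; converted to tonnes before comparing


def shadow_compare(
    property_id: str,
    sap_version: str,
    epc: Any,
    lodged: LodgedFigures,
    calculate: Callable[[Any], CalculatedFigures],
) -> Optional[List[str]]:
    """Run the calculator beside the stub. Never raises, never persists.

    Returns None if the calculator raised, else the names of diverging metrics.
    """
    tags = {"property_id": property_id, "sap_version": sap_version}
    try:
        calc = calculate(epc)
    except Exception as exc:  # deliberately broad: the point is to see what breaks
        log.error("shadow calculator raised %s: %s", type(exc).__name__, exc, extra=tags)
        return None

    diverged: List[str] = []
    if abs(calc.sap_continuous - lodged.sap) > SAP_TOLERANCE:
        diverged.append("sap")
    if abs(calc.peui - lodged.peui) > PEUI_TOLERANCE:
        diverged.append("peui")
    if abs(calc.co2_kg / 1000.0 - lodged.co2_tonnes) > CO2_TOLERANCE_T:  # kg -> tonnes
        diverged.append("co2")
    if diverged:
        log.warning("shadow divergence: %s", ", ".join(diverged), extra=tags)
    return diverged
```

The `None`-versus-list return exists only to make the sketch testable; since the shadow output is never persisted, a real runner need not return anything to the orchestrator.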
## Considered options
- **A third persisted `calculated_*` value-set on `PropertyBaselinePerformance`** (the handover's
recommendation) — rejected: `Effective = Calculated` in every scenario, so the columns would
  store a distinction with no domain reality, and the future "supersede effective" promotion would
  be a data migration instead of a no-op.
- **Promote the calculator to drive Effective immediately** — rejected for this one slice: it still
strict-raises on un-hardened mappings, so over the imminent 40k run it would gate the
load-bearing baseline write. Shadow-first surfaces every gap as an aggregatable error log without
blocking baselining.
- **A separate `calculator_shadow` validation table** — held in reserve: log-only is enough while
  the calculator is still changing and the shadow step is a 1-2 day stepping stone; we add a queryable table
only if log aggregation proves too weak.
## Consequences
- `property_baseline_performance` is **unchanged** this slice — no migration.
- CONTEXT **Calculated SAP10 Performance**, **Effective Performance**, and **Rebaselining** are
updated: the calculator (not ML) is the rebaseliner mechanism in the rebuilt engine; Calculated is
not a stored third set.
- The shadow runner's broad `except` is deliberate (the point is to discover *what* breaks in the
wild); each caught exception is logged with its type and `property_id`.
- This decision is short-lived in its shadow form by design; the durable half — "the calculator
produces Effective Performance; there is no third value-set" — outlives it.
## Amendment (2026-06-02): shadow collapsed — the calculator is load-bearing now
The shadow stepping-stone was right in shape but wrong in duration: the calculator was ready, and
wiring [Bill Derivation](0014-bill-derivation-from-real-fuel-rates.md) onto its delivered-kWh
breakdown makes it load-bearing for *bills on every property* — so the "shadow until overrides /
estimation land" timeline collapses to now. The durable decision stands (calculator produces
Effective Performance; no third value-set); only the timing changes:
- **`sap_version < 10.2`** → effective performance **is** the calculator's output (the
`StubRebaseliner` floor moves `10.0 → 10.2`; mechanism is the calculator, not ML).
- **`sap_version ≥ 10.2`** → effective = the API's lodged figures; the calculator still runs
**alongside, logging divergence** (the surviving half of the shadow runner) as a validation signal.
- **Failure posture flips to abort:** the calculator is load-bearing for Bill Derivation regardless
of version, so a strict-raise **aborts the batch** (ADR-0012) — the un-mapped cert is fixed
immediately rather than skipped. The shadow's catch-and-log of raises is retired; divergence
*warnings* on `≥ 10.2` certs remain.

The `≥1000-cert parity` gate from ADR-0009/0010 still governs whether the calculator's figures are
*trusted as definitive* for the SAP-10.2 cohort, but it no longer gates *wiring* — pre-10.2 certs
have no current-spec lodged figure to fall back to, so the calculator is the only source there.
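
The version routing and abort posture of this amendment can be sketched as follows. `baseline_property`, `run_batch`, `BaselineRow` and the dict row shape are hypothetical, and the all-or-nothing write is modelled by staging every row before a single `commit` call, not by ADR-0012's actual unit of work. The version string is parsed into an integer tuple (an assumption about how `sap_version` is stored) so that `"9.94"` compares below `"10.2"`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

CURRENT_SPEC = (10, 2)  # the StubRebaseliner floor after this amendment (10.0 -> 10.2)


def parse_sap_version(version: str) -> Tuple[int, ...]:
    """'9.94' -> (9, 94), which sorts below (10, 2)."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class BaselineRow:
    effective_source: str   # "calculator" or "lodged"
    effective_sap: float
    sap_result: Any         # kept for every property: Bill Derivation prices it


def baseline_property(row: Dict[str, Any], calculate: Callable[[Any], Any],
                      on_divergence: Callable[..., None]) -> BaselineRow:
    # No try/except: a strict raise (e.g. UnmappedSapCode) propagates to the batch.
    result = calculate(row["epc"])
    if parse_sap_version(row["sap_version"]) < CURRENT_SPEC:
        # No current-spec lodged figure to fall back to: the calculator is the only source.
        return BaselineRow("calculator", result.sap_continuous, result)
    if abs(result.sap_continuous - row["lodged_sap"]) > 0.5:
        on_divergence(row["property_id"], row["sap_version"], result.sap_continuous)
    return BaselineRow("lodged", row["lodged_sap"], result)


def run_batch(rows: Iterable[Dict[str, Any]], calculate: Callable[[Any], Any],
              commit: Callable[[List[BaselineRow]], None],
              on_divergence: Callable[..., None] = lambda *args: None) -> List[BaselineRow]:
    # Every row is computed before anything is written, so a single raise leaves
    # the whole batch uncommitted (the all-or-nothing posture of ADR-0012).
    staged = [baseline_property(r, calculate, on_divergence) for r in rows]
    commit(staged)
    return staged
```

The same exception that the shadow runner swallowed now escapes `run_batch`, which is the "opposite handling" this amendment brings forward.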
## Amendment (2026-06-02): the calculator is the *scoring engine* within Rebaselining, which also feeds Bill Derivation
This ADR's shorthand — "the calculator *is* the Rebaseliner" — is sharpened by the fuller picture of
Rebaselining. **Rebaselining is _assemble the Effective EPC picture, then score it_**: apply
**Landlord Overrides** (boiler → ASHP, wall insulated) as a simulation on `EpcPropertyData`; estimate
components from surrounding properties when there is no EPC; re-map an old-schema EPC and gap-fill from
neighbour predictions (the override/estimation work lands shortly). The `Sap10Calculator` is the
**scoring engine at the tail of that assembly**, not the whole of Rebaselining — so the calculator
call lives **inside** the Rebaseliner (after assembly), never hoisted up into the orchestrator.

Because [Bill Derivation](0014-bill-derivation-from-real-fuel-rates.md) prices the **same scored
picture**, the Rebaseliner **exposes its `SapResult` as a first-class part of its result** — not just
`(Performance, reason)`. The orchestrator runs the calculator **once** (via the Rebaseliner) and
composes two products from that one `SapResult`: Effective Performance, and the Bill
(`EnergyBreakdown.from_sap_result` → `BillDerivation`). Running the calculator a second time for bills
is rejected: it is the expensive step over the ~40k cohort, and a second call could drift from the
first.
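
A sketch of the single-scoring composition, assuming hypothetical shapes: `RebaselineResult`, `baseline_and_bill` and the `SapResult` fields below are illustrative, and `derive_bill` stands in for `EnergyBreakdown.from_sap_result` → `BillDerivation`. The calculator is injected into the Rebaseliner, so the orchestrator never calls it directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class SapResult:                      # hypothetical stand-in; field names are illustrative
    sap_continuous: float
    delivered_kwh: Dict[str, float]   # per end use, e.g. {"space_heating": ...}


@dataclass(frozen=True)
class RebaselineResult:
    performance: float    # Effective Performance (simplified here to the SAP score)
    reason: str
    sap_result: SapResult  # first-class, so the Bill reuses this one scoring


class Rebaseliner:
    def __init__(self, assemble: Callable[[Any], Any],
                 calculate: Callable[[Any], SapResult]) -> None:
        self._assemble = assemble    # overrides, neighbour estimation, old-schema remap
        self._calculate = calculate  # the scoring engine at the tail of assembly

    def rebaseline(self, epc: Any) -> RebaselineResult:
        effective_epc = self._assemble(epc)
        result = self._calculate(effective_epc)
        return RebaselineResult(result.sap_continuous, "calculated", result)


def baseline_and_bill(epc: Any, rebaseliner: Rebaseliner,
                      derive_bill: Callable[[SapResult], float]) -> Tuple[float, float]:
    scored = rebaseliner.rebaseline(epc)    # the only calculator call for this property
    bill = derive_bill(scored.sap_result)   # stands in for from_sap_result -> BillDerivation
    return scored.performance, bill
```

Keeping `assemble` and `calculate` inside `rebaseline` is what prevents the calculator call from being hoisted into the orchestrator; a test can count calls to confirm there is exactly one per property.
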

Corollary: once Overrides/estimation land, Effective Performance is the calculator's output **even for
`sap_version ≥ 10.2`** — a user-modified or estimated dwelling has no valid lodged figure to keep. The
"keep lodged ≥ 10.2" rule holds only for a real, current, un-overridden EPC; the **Bill always derives
from the `SapResult` regardless** (lodged figures carry no per-end-use kWh).
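
The corollary reduces to one predicate. A minimal sketch, with a hypothetical `EffectiveEpcProvenance` recording how the Effective EPC was assembled; only the three conditions (real, current, un-overridden) come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class EffectiveEpcProvenance:       # hypothetical: how the Effective EPC was assembled
    is_real_lodged: bool            # False for an EPC estimated from surrounding properties
    sap_version: Tuple[int, int]    # e.g. (10, 2); old-schema certs sort lower
    has_overrides: bool             # Landlord Overrides applied as a simulation


def keeps_lodged(p: EffectiveEpcProvenance) -> bool:
    # Only a real, current, un-overridden EPC keeps its lodged figures.
    return p.is_real_lodged and p.sap_version >= (10, 2) and not p.has_overrides


def effective_sap(p: EffectiveEpcProvenance, lodged_sap: float, sap_result: Any) -> float:
    return lodged_sap if keeps_lodged(p) else sap_result.sap_continuous
```

Bill Derivation never consults `keeps_lodged`: it reads the `SapResult` on every path.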