Model/docs/adr/0013-calculator-produces-effective-performance-shadow-first.md
Khalim Conn-Kowlessar 19a56461ba docs(baseline): Bill Derivation design — fuel as calculator output + rebaselining is assemble-and-score
Captures a /grill-with-docs session resolving how BillDerivation gets the
fuel each end use burns, and what Rebaselining actually is.

- ADR-0014 amendment: per-end-use fuel is a calculator OUTPUT (resolved
  Table-32 codes on SapResult: main-1/main-2/secondary/HW + pv_exported_kwh);
  the adapter is a pure SapResult->EnergyBreakdown map. Corrects stale §3
  (is_gas_code... -> sap_fuel.sap_code_to_fuel). Adds COOLING section.
  Interim, pending ADR-0015.
- ADR-0013 amendment: the calculator is the SCORING ENGINE within
  Rebaselining (assemble the Effective EPC picture, then score), not the
  whole of it; the Rebaseliner exposes its SapResult so the orchestrator
  composes Effective Performance AND the Bill from one scoring.
- ADR-0015 (new): mappers own cert normalization; EpcPropertyData becomes a
  strict type. Explains why fuel resolution sits in the calculator today.
- CONTEXT.md: Effective EPC = the assembled picture; Rebaselining = assemble
  (overrides / neighbour-estimation / old-schema remap) then score.
- EpcPropertyData docstring points at ADR-0015.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 18:04:55 +00:00


Status: accepted

The Sap10Calculator produces Effective Performance (it is the Rebaseliner); Calculated SAP10 Performance is not a persisted third value-set, and is wired in shadow first

Refines ADR-0004 (the Lodged/Effective pair), ADR-0009/ADR-0010 (the calculator + the Calculated SAP10 Performance term), ADR-0011 (the Rebaseliner seam) and ADR-0012 (all-or-nothing per batch). Decided in a /grill-with-docs session (2026-06-01) before wiring Sap10Calculator into PropertyBaselineOrchestrator.

Context

The old model_engine (backend/engine/engine.py) called out to an ML API (model_api.predict_all over BASELINE_MODEL_PREFIXES) to rebaseline the properties that needed it. The rebuild replaces that round-trip with the deterministic Sap10Calculator, run live.

The handover and CONTEXT (line 100) framed Calculated SAP10 Performance as a third value-set persisted alongside Lodged and Effective (calculated_* columns). Walking the baselining scenarios shows that framing reifies a distinction that does not exist in the domain:

  • real lodged SAP10 EPC, no overrides ⇒ Calculated = Lodged = Effective;
  • real EPC + property/landlord overrides ⇒ Calculated = Lodged-plus-overrides = Effective;
  • estimated EPC (± overrides), or a pre-SAP10 EPC ⇒ Calculated = Effective (no lodged SAP10 to compare against — Lodged Performance exists only for a real lodged EPC).

In every scenario Effective = Calculated. There is no third quantity.

Decision

The calculator is the mechanism that produces Effective Performance — i.e. the deterministic Rebaseliner (ADR-0011's seam), superseding the old ML-API rebaseliner. "Calculated SAP10 Performance" is the name of that output during validation, not a separately-persisted third value-set. No calculated_* columns are added; property_baseline_performance keeps its Lodged/Effective shape (ADR-0004). The ADR-0009 ML model is repositioned as a future residual head over the calculator, not the baseline producer.

Shadow-first, then promotion. The calculator still strict-raises (UnmappedSapCode, MissingMainFuelType, UnresolvedPcdbCombiLoss) on cert mappings it has not yet hardened, and the strict-typing of EpcPropertyData that will close most of those gaps is still pending. A ~40,000 property test cohort is about to flow through baselining. So this lands in two steps:

  1. This slice — shadow. Performance is still defined by the input data: StubRebaseliner keeps producing Effective (= Lodged for the only live scenario, real SAP10 + no overrides). The calculator runs beside it, on every Property's Effective EPC, purely to be battle-tested in the wild. It is not load-bearing, therefore:

    • a calculator raise is caught and logged at error, never aborts the batch — otherwise one unmappable cert would lose the load-bearing Lodged/Effective write for the whole batch, and over a 40k run most batches would never baseline;
    • on success, its output is compared to Lodged and logged, not persisted: a warning is emitted when |sap_continuous − lodged_sap| > 0.5, or when PEUI / CO2 diverge beyond tolerance (CO2 after the kg→tonnes conversion). Each log is tagged with the cert's sap_version so SAP-10.2 divergence (a real calculator signal) is separable from older-spec drift (expected; see ADR-0010 Validation Cohort).
  2. Next slice or two — load-bearing. When overrides + EPC estimation land (days away), StubRebaseliner is replaced by a calculator-backed Rebaseliner: the calculator's output becomes Effective Performance. The failure posture flips to abort per ADR-0012 — now that the calculator is the baseline, a silent wrong answer is the expensive outcome, so a raise must fail the batch noisily. Same exception, opposite handling, because the calculator went from shadow to load-bearing. The shadow logging is then retired.
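The shadow posture in step 1 can be sketched as follows. This is a minimal illustration, not the project's runner: `LodgedSnapshot`, `run_shadow`, the logger name and the `score` callable are hypothetical names, and only the SAP comparison is shown because the ADR does not state the PEUI / CO2 tolerances.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("calculator_shadow")  # hypothetical logger name

SAP_TOLERANCE = 0.5  # from the ADR: warn when |sap_continuous - lodged_sap| > 0.5


@dataclass(frozen=True)
class LodgedSnapshot:
    """Hypothetical stand-in for the lodged figures of one Property."""
    property_id: str
    sap_version: str
    lodged_sap: float


def run_shadow(lodged: LodgedSnapshot, score: Callable[[], float]) -> Optional[float]:
    """Score beside the load-bearing write; log the outcome, never raise or persist.

    Returns the absolute SAP delta, or None if the calculator raised.
    """
    try:
        sap_continuous = score()
    except Exception as exc:  # deliberately broad: discover what breaks in the wild
        logger.error(
            "shadow calculator raised %s property_id=%s sap_version=%s",
            type(exc).__name__, lodged.property_id, lodged.sap_version,
        )
        return None

    delta = abs(sap_continuous - lodged.lodged_sap)
    # Every log carries sap_version so SAP-10.2 divergence can be separated
    # from older-spec drift when the logs are aggregated.
    level = logging.WARNING if delta > SAP_TOLERANCE else logging.INFO
    logger.log(
        level,
        "shadow divergence delta=%.2f property_id=%s sap_version=%s",
        delta, lodged.property_id, lodged.sap_version,
    )
    return delta
```

Passing the scoring step in as a callable keeps the sketch independent of the real Sap10Calculator signature, which is not shown in this ADR.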

Considered options

  • A third persisted calculated_* value-set on PropertyBaselinePerformance (the handover's recommendation) — rejected: Effective = Calculated in every scenario, so the columns would store a distinction with no domain reality, and the future "supersede effective" promotion would be a data move instead of nothing.
  • Promote the calculator to drive Effective immediately — rejected for this one slice: it still strict-raises on un-hardened mappings, so over the imminent 40k run it would gate the load-bearing baseline write. Shadow-first surfaces every gap as an aggregatable error log without blocking baselining.
  • A separate calculator_shadow validation table (held in reserve): log-only is enough while the calculator is moving and the shadow step is a 1-2 day stepping stone; we add a queryable table only if log aggregation proves too weak.

Consequences

  • property_baseline_performance is unchanged this slice — no migration.
  • CONTEXT Calculated SAP10 Performance, Effective Performance, and Rebaselining are updated: the calculator (not ML) is the rebaseliner mechanism in the rebuilt engine; Calculated is not a stored third set.
  • The shadow runner's broad except is deliberate (the point is to discover what breaks in the wild); each caught exception is logged with its type and property_id.
  • This decision is short-lived in its shadow form by design; the durable half — "the calculator produces Effective Performance; there is no third value-set" — outlives it.

Amendment (2026-06-02): shadow collapsed — the calculator is load-bearing now

The shadow stepping-stone was right in shape but wrong in duration: the calculator was ready, and wiring Bill Derivation onto its delivered-kWh breakdown makes it load-bearing for bills on every property — so the "shadow until overrides / estimation land" timeline collapses to now. The durable decision stands (calculator produces Effective Performance; no third value-set); only the timing changes:

  • sap_version < 10.2 → effective performance is the calculator's output (the StubRebaseliner floor moves 10.0 → 10.2; mechanism is the calculator, not ML).
  • sap_version ≥ 10.2 → effective = the API's lodged figures; the calculator still runs alongside, logging divergence (the surviving half of the shadow runner) as a validation signal.
  • Failure posture flips to abort: the calculator is load-bearing for Bill Derivation regardless of version, so a strict-raise aborts the batch (ADR-0012) — the un-mapped cert is fixed immediately rather than skipped. The shadow's catch-and-log of raises is retired; divergence warnings on ≥ 10.2 certs remain.
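The three rules above could be expressed roughly as below. This is a sketch under assumptions: `Performance` is reduced to a single hypothetical `sap` field, `sap_version` is assumed to be a decimal string such as "9.94" or "10.2", and the 0.5 divergence threshold is carried over from the shadow step rather than restated by this amendment.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Tuple

logger = logging.getLogger("rebaseliner")  # hypothetical logger name

LODGED_FLOOR = 10.2   # from the amendment: lodged figures are kept only at >= 10.2
SAP_TOLERANCE = 0.5   # assumption: same threshold as the shadow comparison


@dataclass(frozen=True)
class Performance:
    """Hypothetical reduction of a performance value-set to one field."""
    sap: float


def effective_performance(
    sap_version: str,
    lodged: Performance,
    score: Callable[[], Performance],
) -> Tuple[Performance, str]:
    """Return (Effective Performance, reason) for one real lodged EPC."""
    # No try/except here: a strict-raise propagates so the caller aborts the batch.
    calculated = score()

    if float(sap_version) < LODGED_FLOOR:
        # Pre-10.2 cert: no current-spec lodged figure, the calculator is the source.
        return calculated, "calculated"

    # >= 10.2: keep the lodged figures, but still log divergence as a validation signal.
    if abs(calculated.sap - lodged.sap) > SAP_TOLERANCE:
        logger.warning(
            "divergence lodged=%.2f calculated=%.2f sap_version=%s",
            lodged.sap, calculated.sap, sap_version,
        )
    return lodged, "lodged"
```

Note that the calculator runs on both branches, which matches the amendment's point that it is load-bearing for Bill Derivation regardless of version.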

The ≥1000-cert parity gate from ADR-0009/0010 still governs whether the calculator's figures are trusted as definitive for the SAP-10.2 cohort, but it no longer gates wiring — pre-10.2 certs have no current-spec lodged figure to fall back to, so the calculator is the only source there.

Amendment (2026-06-02): the calculator is the scoring engine within Rebaselining, which also feeds Bill Derivation

This ADR's shorthand, "the calculator is the Rebaseliner", is sharpened by the fuller picture of Rebaselining. Rebaselining means assembling the Effective EPC picture and then scoring it. Assembly applies Landlord Overrides (boiler → ASHP, wall insulated) as a simulation on EpcPropertyData, estimates components from surrounding properties when there is no EPC, and re-maps an old-schema EPC, gap-filling from neighbour predictions (the override/estimation work lands shortly). The Sap10Calculator is the scoring engine at the tail of that assembly, not the whole of Rebaselining, so the calculator call lives inside the Rebaseliner (after assembly) and is never hoisted up into the orchestrator.

Because Bill Derivation prices the same scored picture, the Rebaseliner exposes its SapResult as a first-class part of its result, not just (Performance, reason). The orchestrator runs the calculator once (via the Rebaseliner) and composes two products from that one SapResult: Effective Performance, and the Bill (EnergyBreakdown.from_sap_result → BillDerivation). Running the calculator a second time for bills is rejected: it is the expensive step over the ~40k cohort, and a second call could drift from the first.
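A rough sketch of the "score once, compose twice" shape follows. Everything here is illustrative: the field names on `SapResult`, the `RebaselineResult` type, `derive_bill` and the tariff figures are assumptions made for the example, and the real path through EnergyBreakdown and BillDerivation is collapsed into one function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SapResult:
    """Hypothetical reduction of the calculator's output."""
    sap_continuous: float
    delivered_kwh_by_fuel: Dict[str, float]


@dataclass(frozen=True)
class RebaselineResult:
    """The Rebaseliner's result, with the SapResult exposed alongside the rest."""
    effective_sap: float
    reason: str
    sap_result: SapResult


class Rebaseliner:
    def __init__(self, score: Callable[[dict], SapResult]) -> None:
        self._score = score

    def rebaseline(self, effective_epc: dict) -> RebaselineResult:
        # Assembly (overrides, estimation, old-schema remap) would run here,
        # before scoring; the calculator call stays inside the Rebaseliner.
        sap_result = self._score(effective_epc)
        return RebaselineResult(sap_result.sap_continuous, "calculated", sap_result)


def derive_bill(sap_result: SapResult, pence_per_kwh: Dict[str, float]) -> float:
    """Stand-in for EnergyBreakdown.from_sap_result followed by BillDerivation."""
    pence = sum(kwh * pence_per_kwh[fuel]
                for fuel, kwh in sap_result.delivered_kwh_by_fuel.items())
    return pence / 100  # pounds


def baseline_property(
    rebaseliner: Rebaseliner, epc: dict, pence_per_kwh: Dict[str, float]
) -> Tuple[float, float]:
    result = rebaseliner.rebaseline(epc)                   # the only scoring call
    bill = derive_bill(result.sap_result, pence_per_kwh)   # reuses the same SapResult
    return result.effective_sap, bill
```

Carrying the SapResult on the result object means the orchestrator never needs a handle on the calculator itself, which keeps the single-call guarantee structural rather than a convention.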

Corollary: once Overrides/estimation land, Effective Performance is the calculator's output even for sap_version ≥ 10.2 — a user-modified or estimated dwelling has no valid lodged figure to keep. The "keep lodged ≥ 10.2" rule holds only for a real, current, un-overridden EPC; the Bill always derives from the SapResult regardless (lodged figures carry no per-end-use kWh).
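The corollary narrows the "keep lodged" rule to a single case, which a small predicate makes explicit. The `EpcProvenance` type and its field names are hypothetical, chosen only to illustrate the condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpcProvenance:
    """Hypothetical description of where the Effective EPC picture came from."""
    sap_version: float
    is_estimated: bool = False   # no real EPC; components estimated from neighbours
    has_overrides: bool = False  # Landlord Overrides applied to the picture


def keeps_lodged_figures(p: EpcProvenance) -> bool:
    """True only for a real, current, un-overridden EPC; otherwise the calculator's output is Effective."""
    return p.sap_version >= 10.2 and not p.is_estimated and not p.has_overrides
```

The Bill is unaffected by this predicate: per the corollary it derives from the SapResult in every case.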