# Deterministic SAP 10.3 calculator alongside the ML model; ML becomes a residual learner

**Status: Accepted.** Builds on [ADR-0007](0007-kwh-as-ml-target.md) (the SAP10 calculator is the ground truth ML approximates) and [ADR-0008](0008-physics-as-feature.md) (we already ship ~30% of a calculator as physics features). Decision point: do we keep grinding ML accuracy on `sap_score`, or do we *write the calculator* and have ML predict its residual?

## Grill outcomes (2026-05-17)

Eight open questions (numbered 0 to 7) were resolved through a `/grill-with-docs` session before Session A. Each records a binding scope decision for the implementation:

| # | Question | Decision |
|---|---|---|
| 0 | Domain placement | **Option B** — new term **Calculated SAP10 Performance**, parallel to Effective Performance (ML) and Lodged Performance (gov register). Effective Performance is **not** retired now; a future ADR may promote Calculated to its current role once parity is confirmed. Process named **SAP10 Calculation**. |
| 1 | PCDB heat-pump COP source for Session A | **Stub-seam.** Define `PcdbLookup` Protocol, ship `NoOpPcdbLookup` returning None, fall back to Table 4a. Session C bundles a CSV PCDB extract under `domain/sap10_calculator/tables/pcdb/data/` and implements the lookup. |
| 2 | MCS installation factors | **Boolean input on calculator inputs, default `False`.** Plumbing in Session A; no behaviour change until the input is populated. Slice 18f (separate, tracked in HANDOFF §7-D0) lifts `mcs_installed_heat_pump` from the gov API into `EpcPropertyData.MainHeatingDetail` so the calculator can apply the factor on the ~1.5% of HP certs that carry it. |
| 3 | Thermal bridging | **Global y factor** (the path SAP 10.3 specifies for RdSAP-driven assessments). Per-junction Table R2 sum requires junction-count inputs the cert doesn't carry — not available on the RdSAP-driven flow. |
| 4 | Living-area fraction default | **RdSAP 10 Table 27** — direct lookup from `habitable_rooms_count`. Unambiguous, one-line table. |
| 5 | Secondary-heating allocation | **SAP 10.2/10.3 Table 11** keyed on main heating type. RdSAP doesn't redefine the fraction — it identifies the type only. Forcing rule: when main is micro-CHP and Table N9 says non-zero secondary heat with no secondary specified, assume portable electric heaters. |
| 6 | Validation cohort | **Stratified random of 1000 certs**; report MAE per stratum. Session A success criterion = MAE ≤ 1.0 SAP-point on the **typical subset** (excluding sap_score ≤ 5, sap_score ≥ 100, multi-heating, conservatory, RIR). Global MAE reported alongside for honesty. |
| 7 | `MeasureOverrides` shape | **Rejected as phantom mid-layer.** `Sap10Calculator.calculate(epc) -> SapResult` takes a single immutable cert. A separate **MeasureApplicator** service translates Optimised Package → cert-field changes, returning the "ending state snapshot" EpcPropertyData that Plan Phase already persists. Three pure functions chained: applicator → calculator → result. |
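
A minimal sketch of the decision-1 seam, assuming the Protocol exposes a single lookup method. The ADR fixes only the names `PcdbLookup` and `NoOpPcdbLookup` and the Table 4a fallback; the method name `efficiency_for`, the `product_index` key and the helper `resolve_efficiency` are hypothetical.

```python
from typing import Optional, Protocol


class PcdbLookup(Protocol):
    """Seam for Product Characteristics Database lookups (grill decision 1)."""

    def efficiency_for(self, product_index: str) -> Optional[float]:
        """Return a PCDB efficiency/COP for the product, or None if it is not listed."""
        ...


class NoOpPcdbLookup:
    """Session A stub: never finds a product, so every lookup falls back to Table 4a."""

    def efficiency_for(self, product_index: str) -> Optional[float]:
        return None


def resolve_efficiency(
    product_index: Optional[str],
    table_4a_default: float,
    pcdb: PcdbLookup,
) -> float:
    """Prefer the PCDB value when the cert names a product and the lookup finds it."""
    if product_index is not None:
        pcdb_value = pcdb.efficiency_for(product_index)
        if pcdb_value is not None:
            return pcdb_value
    return table_4a_default
```

Because `NoOpPcdbLookup` always returns `None`, Session A behaves exactly like the Table 4a path; Session C could swap in a CSV-backed implementation without touching callers.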

## Additional findings from the grill that change Session A scope

- **SAP rating formula belongs to RdSAP, not SAP 10.3.** RdSAP §19 ("RdSAP10-specific SAP rating equations referred to as EER") defines the SAP-score equation used for RdSAP-driven assessments. SAP 10.3 §13 defines the rating for new-build assessments. The cert's `energy_rating_current` was computed by RdSAP §19, so parity validation must compute against RdSAP §19, not SAP 10.3 §13.
- **RdSAP 10 (June 2025) cross-references SAP 10.2 (March 2025) for heating-system identification (Appendix A).** RdSAP was published before SAP 10.3 (Jan 2026). Until BRE updates RdSAP to reference SAP 10.3, the calculator's heating-identification logic reads SAP 10.2 Appendix A while everything else reads SAP 10.3. Keep both PDFs in `domain/sap10_calculator/docs/specs/`.
- **RdSAP Table 29 ("Heating and hot water parameters") is a defaulting table of 20+ entries** that the `cascade_defaults.py` module needs to encode. The current scope of `rdsap_uvalues.py` is U-values only; Table 29 extends the cascade pattern to cylinder insulation, primary-pipework insulation, boiler interlock, emitter temperature, underfloor-heating routing, solar-panel parameters and heat-network defaults. Adds ~1–2 hrs to Session A (effectively a Session A.5 if not split out).
- **MCS field exists in gov API** but is dropped by the current mapper. Slice 18f (lift `mcs_installed_heat_pump` into `EpcPropertyData`) is a prerequisite for the MCS-factor path. ~30 min slice; can ship before Session A or in parallel.
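
The cascade pattern that Table 29 extends can be sketched as "use the cert value if lodged, otherwise the RdSAP default", recording which source won so it can surface in `SapResult.intermediate`. This is a hypothetical shape: `Resolved`, `cascade` and the provenance string are illustrative rather than existing code, and no Table 29 values are reproduced here.

```python
from dataclasses import dataclass
from typing import Generic, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Resolved(Generic[T]):
    value: T
    source: str  # "cert" when lodged, otherwise the name of the default table used


def cascade(field: str, cert_value: Optional[T], defaults: Mapping[str, T], table: str) -> Resolved[T]:
    """Use the lodged value when present (0 and False count as present), else the default."""
    if cert_value is not None:
        return Resolved(cert_value, "cert")
    return Resolved(defaults[field], table)
```

Testing `is not None` rather than truthiness matters for inputs such as the decision-2 MCS boolean, where a lodged `False` must not be replaced by a default.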

## Problem

After the physics-feature slices (18b/18c/18d/20a/20a.1), the ML model is at sap_score MAPE 3.63% and MAE 1.86 globally, with per-decile MAE of 3.86 (d0) and 2.25 (d9). Each new slice now improves d0 MAE by only ~0.05. The user's target is MAE ≤ 0.5 across all bands. The remaining error is dominated by:

1. **Catastrophic tail noise** — d0 has 3.3% of rows with `sap_score ≤ 20` (heritage / abandoned / data-anomaly homes). MAE on those rows is structurally large because the model's prediction floor is ~30 even for the worst inputs.
2. **Calculator nuance the physics features can't reach** — monthly heat balance with solar/internal gains and utilisation factor, full SAP §J hot-water variants, PCDB heat-pump overrides, dual-fuel allocation, conservatory modes, room-in-roof handling. Each of these is a deterministic line in the SAP10.3 spec but we model it via tree splits over input fields.

These cannot be closed by another tree feature. They require executing the calculator.

## Decision

Build a deterministic **`Sap10Calculator`** that reads `EpcPropertyData` and emits the same outputs the certificate's BRE-approved assessor software emits: `sap_score`, `co2_emissions`, `peui_raw`, `peui_ucl`, `space_heating_kwh`, `hot_water_kwh`. Target the SAP 10.3 specification (DESNZ/BRE, 13-01-2026) and the RdSAP 10 specification (BRE, 10-06-2025), both held in `domain/sap10_calculator/docs/specs/`.

The ML model is **not deprecated**. It is repurposed as a **residual learner** against `actual_sap − calculator_sap` (and similar deltas for the other five targets). Residual distributions should be much narrower than the raw target distributions (the working hypothesis is that the calculator lands within ~1 SAP-point of the cert on 95% of typical certs), so the residual head should fit the corrections with far fewer features and, if the hypothesis holds, reach the MAE ≤ 0.5 target.
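
The residual set-up is simple to state in code: the training target is the row-wise delta, and at inference the calculator output is the baseline to which the head's prediction is added. A minimal sketch with hypothetical function names; the LightGBM model itself is omitted.

```python
from typing import Sequence


def residual_targets(actual: Sequence[float], calculated: Sequence[float]) -> list[float]:
    """Training target for the residual head: actual minus calculator, row by row."""
    if len(actual) != len(calculated):
        raise ValueError("actual and calculated must align row-for-row")
    return [a - c for a, c in zip(actual, calculated)]


def combine(calculated: float, predicted_residual: float) -> float:
    """Inference: the calculator supplies the baseline, the ML head the correction."""
    return calculated + predicted_residual
```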

## Why now

1. **SAP 10.3 just dropped (Jan 2026).** Building against the new spec means the calculator outputs should match assessor software for certs lodged under SAP 10.3 from 2026 onward. Building against SAP 10.2 (March 2025) now would need re-derivation later.
2. **The retrofit-simulation use case demands transparency.** Surveyors, building physicists, and homeowners need to see exactly which physics line — wall U×A, ventilation ACH, solar gain on south-facing windows — contributes how much heat-loss/cost. Tree-model attribution doesn't supply that. Calculator does.
3. **30% of the calculator is already shipped.** `rdsap_uvalues.py` (Tables 6–10, 15–20, 24, 26), `sap_efficiencies.py` (Tables 4a, 4b, 32), `envelope.py` (Σ U·A + thermal bridging), partial `ventilation.py` (slice 20a tracer), partial `demand.py` (annual heat balance), `ecf.py` (Total fuel cost, ECF, log10ECF), PV credit (slice 17a), SAP §J hot-water port (slice 17b). The pivot is mostly re-platforming, not new physics.
4. **ML residual learning has a clean home for the noise.** The catastrophic-tail rows the calculator gets wrong (data anomalies, mis-described systems) are exactly where ML *should* live, because they're not closed-form solvable. Calculator + residual head is a cleaner split of responsibility than "ML approximates the deterministic spec".
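
Decision 3 and the existing `envelope.py` (Σ U·A + thermal bridging) combine into one fabric heat-loss coefficient. A minimal sketch, assuming the global-y path multiplies y by the total area of external elements (windows and doors included, party walls excluded), as we read SAP §3 and Appendix K; the `Element` type is hypothetical and y itself would come from the RdSAP defaults.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Element:
    name: str
    area_m2: float   # net area of one external element
    u_value: float   # W/m²K, from the cert or the RdSAP cascade tables


def fabric_heat_loss_w_per_k(elements: Iterable[Element], y_factor: float) -> float:
    """Σ(U·A) plus global-y thermal bridging, H_TB = y · Σ A_exposed."""
    items = list(elements)
    conduction = sum(e.u_value * e.area_m2 for e in items)
    thermal_bridging = y_factor * sum(e.area_m2 for e in items)
    return conduction + thermal_bridging
```

The per-junction Table R2 route would replace only the `thermal_bridging` term, which is why the global-y choice keeps the rest of the envelope code unchanged.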

## Scope of the calculator (Session A)

A full SAP 10.3 worksheet plus the data-extraction rules from RdSAP 10 Appendix S. Module organisation:

```
domain/sap10_calculator/
  __init__.py                    # Sap10Calculator entry point + SapResult dataclass
  worksheet/
    dimensions.py                # §1
    ventilation.py               # §2 + Table 5 + Appendix Q
    heat_transmission.py         # §3 + Appendix K (thermal bridging) + Tables 6–10/15–20/24/26
    hot_water.py                 # §4 + Appendix J + Appendix G (FGHRS/WWHRS/PV-diverters)
    internal_gains.py            # §5 + Appendix L (lighting)
    solar_gains.py               # §6 + Tables 6d/6e
    mean_temperature.py          # §7
    climate.py                   # §8 + Appendix U (region-from-postcode, monthly external temp/wind/solar)
    space_heating.py             # §9 + Appendices A/B/D/E/N (heating systems, efficiency, heat pumps)
    fuel_cost.py                 # §12 + Table 32 (fuel prices) + Appendix M (PV/wind/hydro generation)
    energy_cost_rating.py        # §13 (new-build rating) + RdSAP §19 rating equations (EER)
    co2_primary_energy.py        # §14 (emissions + primary energy)
    fee.py                       # §11 Fabric Energy Efficiency
  tables/
    table_4a_4b.py               # heating-system seasonal efficiency
    table_5.py                   # ventilation rate components
    table_6.py                   # monthly external temp by region
    table_6d.py                  # monthly solar flux by orientation by region
    table_32.py                  # fuel prices
    table_R.py                   # reference values (Appendix R)
  rdsap/
    appendix_s.py                # cert → calculator input mapping
    cascade_defaults.py          # the RdSAP10 "assume-typical" rules (currently in rdsap_uvalues.py)
```

The existing `domain.sap10_ml.*` modules stay where they are during Session A; they continue serving the live ML pipeline. Session B promotes them into `domain.sap10_calculator.*` once parity is reached.

## Sap10Calculator interface

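```python
from __future__ import annotations  # postponed annotations: referenced types live elsewhere

from dataclasses import dataclass
from typing import Optional

# EpcPropertyData, ClimateData, MonthlyBreakdown and PcdbLookup are defined in other
# modules; with postponed annotations this interface imports without them.


@dataclass(frozen=True)
class SapResult:
    sap_score: float
    energy_cost_rating: float          # alias for sap_score before band lookup
    sap_band: str                      # A-G
    co2_emissions_kgco2_per_m2: float
    peui_raw_kwh_per_m2: float
    peui_ucl_kwh_per_m2: float
    space_heating_kwh_per_yr: float
    hot_water_kwh_per_yr: float
    monthly_breakdown: MonthlyBreakdown
    intermediate: dict[str, float]     # every named worksheet quantity, for traceability


class Sap10Calculator:
    # pcdb=None: no PCDB overrides, so efficiencies fall back to Table 4a (grill decision 1)
    def __init__(self, climate: ClimateData, pcdb: Optional[PcdbLookup] = None) -> None: ...

    # Pure function of one immutable cert (grill decision 7)
    def calculate(self, epc: EpcPropertyData) -> SapResult: ...
```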

`intermediate` carries every named SAP10.3 worksheet variable (envelope conduction W/K, ventilation rate, solar gains by month, utilisation factor, heat-pump SCOP, ECF, ...) so consumers can drill down. This replaces ADR-0008's physics-as-feature columns for retrofit-simulation consumers; the ML pipeline keeps generating them as features until the residual head is trained and validated.
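
Grill decision 7's applicator → calculator → result chain can be sketched with small stand-ins. `Cert` and `Measure` are hypothetical placeholders for `EpcPropertyData` and an Optimised Package item; in the real chain `calculate` would be `Sap10Calculator(...).calculate`.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Cert:
    """Hypothetical two-field stand-in for EpcPropertyData."""
    wall_u_value: float
    loft_insulation_mm: int


@dataclass(frozen=True)
class Measure:
    """Hypothetical stand-in for one item of an Optimised Package."""
    field: str
    new_value: object


def apply_measures(cert: Cert, measures: list[Measure]) -> Cert:
    """MeasureApplicator: build the ending-state snapshot without mutating the input."""
    for measure in measures:
        cert = replace(cert, **{measure.field: measure.new_value})
    return cert


def simulate(cert: Cert, measures: list[Measure],
             calculate: Callable[[Cert], float]) -> tuple[float, float]:
    """Score both the current state and the ending-state snapshot."""
    return calculate(cert), calculate(apply_measures(cert, measures))
```

Because every step takes and returns immutable values, the ending-state snapshot that Plan Phase persists is exactly the object the calculator scored.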

## Validation

Two corpora:

1. **Calculator-vs-cert parity (Session B).** Run the calculator over a stratified random sample of 1000 RdSAP 10 certs (grill decision 6) from `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`. Compare `Sap10Calculator.calculate(epc).sap_score` to the cert's `energy_rating_current`. Target: within 1.0 SAP-point of the cert on 95% of typical certs, and MAE ≤ 1.0 on the typical subset; outliers are investigated case-by-case to find spec-interpretation gaps or PCDB requirements.
2. **Residual ML head (Session C+).** Train LightGBM on `actual_sap − calculator_sap` as the target. Validate that residual MAE is materially smaller than the current 1.86 global / 3.86 d0. If residual MAE falls to ≤ 0.5 in every decile, including d0 (currently the worst), the calculator + residual approach hits the user's target.

We do **not** retire the existing ML pipeline until both validations pass.
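
The parity run can be sketched as: draw the stratified cohort, keep the decision-6 typical subset, then report MAE per stratum. This assumes equal allocation per stratum (the ADR does not specify the allocation) and plain-dict rows; the field names `calculator_sap`, `multi_heating`, `conservatory` and `room_in_roof` are hypothetical, and `sap_score` here stands for the lodged `energy_rating_current`.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable

Row = dict  # hypothetical: one cert with its calculator output attached


def stratified_sample(rows: list[Row], stratum_of: Callable[[Row], Hashable],
                      per_stratum: int, seed: int = 0) -> list[Row]:
    """Equal allocation: up to `per_stratum` rows drawn at random from each stratum."""
    rng = random.Random(seed)  # seeded so the cohort is reproducible
    groups: dict[Hashable, list[Row]] = defaultdict(list)
    for row in rows:
        groups[stratum_of(row)].append(row)
    sample: list[Row] = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def is_typical(row: Row) -> bool:
    """Decision-6 typical subset: drop sap_score <= 5 or >= 100 and the edge-case types."""
    return (5 < row["sap_score"] < 100
            and not row["multi_heating"]
            and not row["conservatory"]
            and not row["room_in_roof"])


def mae_by_stratum(rows: list[Row], stratum_of: Callable[[Row], Hashable]) -> dict[Hashable, float]:
    """Mean absolute error of calculator_sap against the lodged sap_score, per stratum."""
    errors: dict[Hashable, list[float]] = defaultdict(list)
    for row in rows:
        errors[stratum_of(row)].append(abs(row["calculator_sap"] - row["sap_score"]))
    return {stratum: mean(errs) for stratum, errs in errors.items()}
```

Running `mae_by_stratum` over both the full cohort and the `is_typical` subset gives the global-alongside-typical reporting that decision 6 asks for.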

## What this ADR does *not* change

- **The six ML targets remain those from ADR-0007.** The residual head predicts deltas against the same six quantities.
- **ADR-0008's physics-as-feature pattern stays valid for the ML residual head.** The residual head probably needs fewer features, but the cascade U-value defaults and SAP efficiency lookups remain useful as feature builders if the calculator subset alone underfits.
- **`energy_rating_current` remains excluded from features.** Same leakage rule.
- **RdSAP 10 cert-extraction rules are now first-class in the codebase.** Rules that were ad-hoc in `transform.py` move into `domain.sap10_calculator.rdsap.appendix_s`.
- **The training parquet schema continues at v2.x.** A new column `calculator_sap_score` lands as a non-breaking addition once Session A reaches parity. The schema version bumps to v3.0.0 only when the residual targets replace the raw targets — a coordinated AutoGluon-repo deploy, per ADR-0008's cutover discipline.

## SAP 10.2 → SAP 10.3 implications

The newer spec replaces tables we already ship:

- Table 4a/4b (heating efficiencies) — likely identical, verify on read.
- Table 32 (fuel prices) — almost certainly different, re-derive from Appendix in 10.3.
- Table 6d (solar flux) — likely identical (climate data).
- Energy cost rating formula constants — expected to be unchanged from 10.2 unless DESNZ updated the deflator; verify on read. For RdSAP certs the binding equation is RdSAP §19, not SAP 10.3 §13.

Re-derivation work is bounded (a few hundred numbers across tables), and the `tables/table_*.py` modules already have a clean shape for the cutover.
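
A sketch of the ECF → rating → band step, for checking constants on read. Treat the numbers as placeholders: they are the SAP 2012 energy cost rating equations (117 − 121·log10 ECF at ECF ≥ 3.5, 100 − 13.95·ECF below), which we believe SAP 10.x retained, while RdSAP §19 is the binding source for RdSAP certs. The deflator is a parameter rather than a constant because it is the value this section expects to have changed. Band thresholds are the standard A (92+) to G (1–20) bands; round-half-up on the integer rating is an assumption. Function names are hypothetical.

```python
import math


def energy_cost_factor(total_cost_gbp: float, total_floor_area_m2: float, deflator: float) -> float:
    """ECF = deflator × total fuel cost / (TFA + 45)."""
    return deflator * total_cost_gbp / (total_floor_area_m2 + 45.0)


def sap_rating(ecf: float) -> float:
    """Continuous rating: log branch at ECF >= 3.5, linear branch below."""
    if ecf >= 3.5:
        return 117.0 - 121.0 * math.log10(ecf)
    return 100.0 - 13.95 * ecf


# Lower bound of each band on the integer rating; anything below 21 is G.
_BANDS = ((92, "A"), (81, "B"), (69, "C"), (55, "D"), (39, "E"), (21, "F"))


def sap_band(rating: float) -> str:
    integer_rating = math.floor(rating + 0.5)  # assumed round-half-up, not Python's round()
    for lower, band in _BANDS:
        if integer_rating >= lower:
            return band
    return "G"
```

The two branches meet almost exactly at ECF = 3.5 (about 51.17 either way), which is a quick sanity check that transcribed constants are right.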

## Session plan (carried from HANDOFF §High-value next slices)

- **Session A (3–4 hrs):** Implement ventilation per §2 (replacing the slice-20a tracer), 12-month heat balance per §6 + §8 + Appendix U, solar gains per §6 + Table 6d, internal gains per §5 + Appendix L, utilisation factor per §6.4, mean internal temperature per §7. End of Session A: `Sap10Calculator.calculate(epc) -> SapResult` runs on typical certs.
- **Session B (3–4 hrs):** Edge cases — conservatory modes, room-in-roof handling, multi-heating allocation, dual fuel, secondary heating fraction (Appendix A). Run parity validation across 1000 certs. Iterate on spec-interpretation gaps. End of Session B: 95% of typical certs within 1 SAP-point of cert value.
- **Session C (2–3 hrs):** PCDB integration for boiler + heat-pump overrides (Appendices D, N). Residual-head training on `actual_sap − calculator_sap`. ADR-0010 if any non-trivial calculator/ML hybrid pattern emerges that ADR-0009 didn't anticipate.
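
The §6.4 utilisation factor and the monthly space-heating step are compact enough to sketch. This follows the SAP Table 9a form as we understand it (τ = TMP/(3.6·HLP), a = 1 + τ/15, η from the gain/loss ratio γ) and the rule that June to September carry no space-heating requirement; verify both against the SAP 10.3 PDF. It deliberately skips §7, which evaluates its own utilisation factors per zone to produce the mean internal temperature passed in here. Function names are hypothetical.

```python
import math

SUMMER_MONTHS = frozenset({6, 7, 8, 9})  # June to September: no space-heating requirement


def time_constant_hours(tmp_kj_per_m2k: float, hlp_w_per_m2k: float) -> float:
    """tau = TMP / (3.6 × HLP), with TMP in kJ/m²K and HLP in W/m²K."""
    return tmp_kj_per_m2k / (3.6 * hlp_w_per_m2k)


def utilisation_factor(gamma: float, tau_hours: float) -> float:
    """Gain utilisation factor for gain/loss ratio gamma."""
    a = 1.0 + tau_hours / 15.0
    if gamma <= 0.0:
        return 1.0
    if math.isclose(gamma, 1.0):
        return a / (a + 1.0)
    return (1.0 - gamma ** a) / (1.0 - gamma ** (a + 1.0))


def monthly_space_heating_kwh(month: int, days: int, h_w_per_k: float, t_internal_c: float,
                              t_external_c: float, gains_w: float,
                              tmp_kj_per_m2k: float, hlp_w_per_m2k: float) -> float:
    """0.024 × (loss − η·gains) × days, clamped at zero; 0.024 converts W·day to kWh."""
    loss_w = h_w_per_k * (t_internal_c - t_external_c)
    if month in SUMMER_MONTHS or loss_w <= 0.0:
        return 0.0
    eta = utilisation_factor(gains_w / loss_w, time_constant_hours(tmp_kj_per_m2k, hlp_w_per_m2k))
    return max(0.0, 0.024 * (loss_w - eta * gains_w) * days)
```

Heavier dwellings (larger TMP) get a longer time constant, a larger exponent `a` and therefore a utilisation factor closer to 1, i.e. more of the solar and internal gains offset heating.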

## Caveats

- **Spec interpretation will need product input.** 5–10 questions per session on edge cases: multi-heating split logic, secondary heating threshold rules, PCDB-vs-Table-4b precedence, etc. These are not in the spec text and are real business decisions.
- **No reference BRE Python port is currently known.** If one surfaces, porting accelerates. If not, every line of the calculator is implemented from the spec PDF directly, with tests.
- **PCDB (Product Characteristics Database).** SAP 10.3 references the PCDB throughout for boiler/HP efficiency overrides. Without PCDB integration, the calculator is expected to be off by ~1 SAP-point on PCDB-listed equipment. Deferred to Session C.
- **The current ML pipeline keeps running through all three sessions.** No deprecation until residual validation lands. The branch `ara-backend-design-prd` (current ML grind) and the calculator work proceed in parallel.

## Consequences

- A new top-level domain area `domain.sap10_calculator.*` is introduced; over Sessions B/C it absorbs `domain.sap10_ml.{envelope,demand,ecf,rdsap_uvalues,sap_efficiencies,ventilation}.py`. The ML transform stops shipping those as standalone features once the residual head takes over.
- The codebase carries two SAP outputs: cert-reported `sap_score` (ground truth at training time) and calculator-emitted `sap_score` (ground truth at inference time for any RdSAP cert input). The product layer chooses; for "score this hypothetical post-retrofit state", calculator wins.
- The deterministic calculator is **version-bound to SAP 10.3.** A future SAP 10.4 is a calculator MAJOR bump and an ADR. The ML residual head is SAP-version-agnostic only insofar as the residual distribution it learns stays stationary; in practice a spec bump retrains the residual head.
- Spec PDFs live in `domain/sap10_calculator/docs/specs/` (this repo). The repo now carries the canonical reference for what the calculator computes. License: SAP 10.3 © Crown copyright 2026; RdSAP 10 © BRE — both are public-interest references for SAP-compliant software, included for traceability.