docs: ADR-0010 retargets calculator to SAP 10.2; rewrite handover

Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions of a
grill-with-docs session:

1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3 — no
   SAP-10.3-lodged certs exist in the corpus to validate against.
2. table_12_cert_calibration is deleted (not "re-derived at the end").
   It was pre-March-2025 spec prices fit against a mixture distribution
   of two spec-version regimes, with downstream-component bugs absorbed
   into the fit — not Elmhurst deviation.
3. Validation Cohort: filter the corpus to inspection_date ≥ 2025-07-01
   so every cert in the probe was lodged on SAP 10.2 (14-03-2025)
   prices. One spec, one signal.
4. PCDB integration is promoted from "Session C deferred" to
   prerequisite P4 — dominates residual variance on heat pumps and the
   78% of gas-boiler certs lodging main_heating_data_source=1.
5. Trace mode (SapResult.intermediate) and BRE worked-example fixtures
   replace the 7 cert-based golden fixtures, which contained
   compensating errors.
6. Strict-type EpcPropertyData via codes.csv-derived canonical enums
   (P6) — the in-source motivation lives at dimensions.py:74-82
   (Khalim's comment, included in this commit).
7. Worksheet-faithful structure is a sweep-time principle: each
   worksheet module mirrors SAP 10.2 worksheet line numbering.

CONTEXT.md additions:
- Refined "Calculated SAP10 Performance" and "SAP10 Calculation" to
  reference SAP 10.2 + ADR-0010.
- New term "SAP Spec Version" — domain-meaningful because the same
  EpcPropertyData yields different sap_score under different spec
  revisions.
- New term "Validation Cohort" — the version-locked sub-corpus.

HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the six
prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal was
stale prices, not Elmhurst deviation), §4 scope (PCDB IN, SAP 10.3
stays OUT), §5 approach (worksheet-faithful principle as §5.5), §7
tension dissolved, §7b findings re-framed, §8 dead-ends re-classified
as conditional, §9 cohort filter, §10 fixture strategy, §11 trace mode
as prerequisite, §12 prereqs-first, §13 Phase 0/Phase 1 workflow, §14
ADR-0010 reference, §15 final note.

P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice (probe
swap to spec prices).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 11:17:27 +00:00 · 2026-05-19 09:54:24 +00:00 · 2026-05-19 09:54:24 +00:00 · bb9c5ac017
commit bb9c5ac017
parent ac1aa56ab1
4 changed files with 551 additions and 209 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -98,12 +98,20 @@ The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actua
 _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values
 **Calculated SAP10 Performance**:
-The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs. ADR-0009.
+The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs lodged on the calculator's target spec version (see [[sap-spec-version]]). ADR-0009 (as amended by ADR-0010).
 _Avoid_: calculator output, computed performance, worksheet performance, SAP10 output
 **SAP10 Calculation**:
-The process that runs the deterministic SAP 10.3 worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 cert→input mapping, executes the 12-month heat balance per SAP 10.3 §§1-14, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009.
+The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates.
-_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run
+_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]])
 **SAP Spec Version**:
 The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version.
 _Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision
 **Validation Cohort**:
 The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010.
 _Avoid_: parity cohort, validation set, corpus sample
 **Measure Application**:
 The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
--- a/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
+++ b/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
@@ -0,0 +1,68 @@
 # Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort
 **Status: Accepted.** Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of [ADR-0009](0009-deterministic-sap-calculator.md). Adds strict typing of `EpcPropertyData` (P6) and a worksheet-faithful structural principle for the `domain/sap/worksheet/*` modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection).
 ## Why this ADR exists
 ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans **multiple SAP spec versions** (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's `domain.sap.tables.table_12_cert_calibration` layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec — closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component.
 This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation.
 ## Decisions
 ### 1. Active spec target is **SAP 10.2 (14-03-2025)**, not SAP 10.3
 ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is SAP 10.2 (14-03-2025 amendment) — both the document that RdSAP 10 (10-06-2025) cross-references for heating-system identification and the amendment that current assessor software is on.
 `packages/domain/src/domain/sap/tables/table_12.py` is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2.
 A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3).
 ### 2. `table_12_cert_calibration` is deleted
 The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak.
 This includes the `cert_calibration_e7_codes` extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates — Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is in §8 of the handover as a documented dead-end.
 `domain.sap.tables.table_12.unit_price_p_per_kwh` becomes the only price API. Parity probes are updated to use it.
 ### 3. Validation Cohort is filtered to a single spec-version window
 Probe MAE against the full 250k-cert corpus measures both calculator correctness *and* the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy.
 The **Validation Cohort** is the subset of corpus certs with `inspection_date ≥ 2025-07-01` — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go.
 This requires re-extracting the training parquet to include `inspection_date` (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice.
 ### 4. PCDB integration is promoted from Session C to a prerequisite
 ADR-0009 deferred PCDB to Session C and shipped a `NoOpPcdbLookup` stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging `main_heating_data_source=1` (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2).
 PCDB lookup against `main_heating_index_number` is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The `NoOpPcdbLookup` seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no `main_heating_index_number` or the PCDB has no matching record.
 ## Verification infrastructure (also prerequisites)
 Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal:
 1. **Trace mode populated.** ADR-0009 specced `SapResult.intermediate: dict[str, float]` and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on `intermediate` so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace.
 2. **BRE worked-example unit tests.** SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired.
 3. **Strict typing of `EpcPropertyData` via canonical domain enums.** Bare `str` and `Union[int, str]` fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's `dimensions.py:74-82` is Khalim's documented example. The domain holds one canonical enum per field, derived from `datatypes/epc/domain/epc_codes.csv` (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of `EpcPropertyData` (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright `strict` mode stays clean.
 These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5.
 ## Worksheet-faithful structure (sweep-time principle)
 Each `domain/sap/worksheet/*.py` module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. `heat_transfer_coefficient` aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values).
 ## Consequences
 - ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter.
 - The training parquet schema bumps when `inspection_date` is added — a non-breaking MINOR addition under [ADR-0008](0008-physics-as-feature.md)'s `Feature Schema Version` discipline.
 - The handover document `docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md` is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the six prerequisites.
 - Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available.
 ## Considered alternatives
 - **Build versioned Table 12 (pre/post 14-March-2025) keyed on `inspection_date` and validate across the full corpus.** Rejected as more work for no signal benefit during the spec sweep — the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly.
 - **Keep cert-cal during the sweep and re-derive at the end** (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest.
 - **Pay for an Elmhurst license, lock fixtures to its output.** Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite.
--- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
+++ b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
@@ -24,10 +24,12 @@ The SAP/RdSAP energy assessment splits cleanly into two roles:
   takes the lodged fields and produces SAP score, CO2 emissions,
   primary energy (PEUI), CO2 per m², EI rating, etc.
-**Our calculator is replicating role #2.** Where Elmhurst's
-implementation diverges from spec, we follow Elmhurst, but we don't
-guess at divergence; we localise it via reference traces or
-empirically against the cert corpus.
+**Our calculator is replicating role #2.** Assessor software
+implements the SAP 10.2 spec faithfully; the question of "where does
+Elmhurst diverge from spec?" is no longer the operative one (per
+ADR-0010 + §3 below). Our job is to enumerate every spec
 table / formula / footnote and verify each against the published SAP
 10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs.
 There is no "assessor judgement" knob to tune. Each field on the cert
 has a deterministic interpretation per the spec. Each spec table /
@@ -57,109 +59,259 @@ all of them and verify each.
  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
  caveat: some of these are compensating-error matches (e.g. cert
  `7536-3827`'s PE matches but cost is £143 under cert's implied cost
-  due to multi-factor offsetting bugs).
+  due to multi-factor offsetting bugs). **These fixtures are retired
  per ADR-0010 and §10 below — they lock buggy compensating outputs
  in place and will fight the spec sweep.**
 > **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md)
 > supersedes the spec-version target, the PCDB sequencing, and the
 > cert-calibration layer of ADR-0009. This handover document was
 > originally written under the rejected framing; §3, §4, §7, §7b,
 > §10 below have been rewritten in lockstep. §2.5 lists the six
 > prerequisites that land **before** the section-by-section sweep
 > starts.
 ---
-## 3. Why we are pivoting to systematic review
+## 2.5. Prerequisites before the sweep starts
 Six blockers, in dependency order. The section sweep does not start
 until all six are merged. Together they convert the parity probe
 from a noisy mixture-distribution signal into a clean per-section
 verification tool.
 ### P1 — Re-extract the training parquet with `inspection_date`
 The 250k-cert parquet has 202 columns; **none of them are dates**.
 Without `inspection_date` on each cert we cannot construct the
 Validation Cohort (P3). The ETL currently drops the dates; add them
 back as a non-breaking MINOR Feature Schema Version bump (per
 ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date`
 both exist on the domain object and are populated upstream — the
 parquet writer just needs to include them.
 ### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12`
 Per ADR-0010 §2 and §1:
 - Remove `table_12_cert_calibration.py` and every call site
  (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the
  `PriceTable` constructor argument that defaults to it).
 - Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`.
 - Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors).
 - Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p
  is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025).
 ### P3 — Filter the parity probe to the Validation Cohort
 `Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3:
 `inspection_date ≥ 2025-07-01`. Modify
 `services/ml_training_data/src/ml_training_data/sap_parity_probe.py`
 to apply the filter before sampling. The probe sample size and seed
 remain configurable; `sap_score ∈ [5, 99]` remains the typicality
 filter on top of the cohort filter.
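The two filters above can be sketched as follows. This is a minimal illustration, assuming each probe row is available as a plain record carrying `inspection_date` and `sap_score`; `CertRow`, `in_validation_cohort`, `is_typical`, and `select_probe_rows` are hypothetical names, not the probe's actual API.

```python
from dataclasses import dataclass
from datetime import date

# Cut-off from ADR-0010 §3: a buffer past the 14-03-2025 amendment
# to allow for vendor rollout.
COHORT_START = date(2025, 7, 1)


@dataclass(frozen=True)
class CertRow:
    """Hypothetical stand-in for one row of the training parquet."""
    cert_id: str
    inspection_date: date
    sap_score: int


def in_validation_cohort(row: CertRow) -> bool:
    """Cohort filter: inspected on or after the cut-off date."""
    return row.inspection_date >= COHORT_START


def is_typical(row: CertRow) -> bool:
    """Typicality filter applied on top of the cohort filter."""
    return 5 <= row.sap_score <= 99


def select_probe_rows(rows: list[CertRow]) -> list[CertRow]:
    """Apply both filters before any sampling takes place."""
    return [r for r in rows if in_validation_cohort(r) and is_typical(r)]
```

Filtering before sampling keeps the configured sample size meaningful: every sampled cert is already in the cohort, so no sampled row is discarded afterwards.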
 ### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`)
 Per ADR-0010 §4. Download boiler + heat-pump CSVs from
 https://www.ncm-pcdb.org.uk. Build a lookup keyed on
 `main_heating_index_number`. Surface seasonal efficiency, secondary
 efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day
 of work per the original handover estimate. The
 `Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from
 ADR-0009 grill outcome #1 is the integration point; no calculator-side
 changes needed beyond reading `index_number` and routing PCDB-returns
 to space-heating / hot-water efficiency lookups instead of Table 4a.
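A minimal sketch of the lookup seam and its fallback rule, under stated assumptions: the `find` method name, the `PcdbRecord` fields, and the `DictPcdbLookup` and `main_heating_efficiency` helpers are hypothetical and only illustrate the "PCDB first, Table 4a otherwise" routing described above.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical subset of the per-product fields named above."""
    index_number: int
    seasonal_efficiency: float  # fraction for boilers, SCOP for heat pumps


class PcdbLookup(Protocol):
    """Assumed shape of the seam; the real interface may differ."""
    def find(self, index_number: int) -> Optional[PcdbRecord]: ...


class NoOpPcdbLookup:
    """Stub behaviour: never finds a product."""
    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return None


class DictPcdbLookup:
    """In-memory lookup keyed on main_heating_index_number (illustrative)."""
    def __init__(self, records: dict[int, PcdbRecord]) -> None:
        self._records = records

    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return self._records.get(index_number)


def main_heating_efficiency(
    index_number: Optional[int],
    table_4a_default: float,
    pcdb: PcdbLookup,
) -> float:
    """Prefer the PCDB value; fall back to Table 4a only when the cert
    lodges no index number or the PCDB has no matching record."""
    if index_number is not None:
        record = pcdb.find(index_number)
        if record is not None:
            return record.seasonal_efficiency
    return table_4a_default
```

With `NoOpPcdbLookup` every cert takes the Table 4a path, which is the current behaviour; swapping in a populated lookup changes the result only for certs whose index number resolves.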
 ### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples
 Per ADR-0010 "Verification infrastructure":
 - Populate every named SAP 10.2 worksheet variable on
  `SapResult.intermediate` as sketched in §11. This is mechanical —
  thread the values from each worksheet module into the dict.
 - Transcribe the BRE worked examples from the SAP 10.2 appendices and
  RdSAP 10 worked-example annex into unit tests
  (`tests/test_bre_worked_examples.py`) that lock per-intermediate
  values, not aggregate SAP. These replace the retired cert fixtures.
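One way a per-intermediate assertion could look, as a sketch only: `SapResult` here is a reduced stand-in carrying just the score and the trace dict, and `diff_intermediates` is a hypothetical helper. The variable names passed to it would come from the worksheet, and the expected values from the transcribed BRE examples; none are supplied here.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SapResult:
    """Reduced stand-in: the real class carries more fields."""
    sap_score: int
    intermediate: dict[str, float] = field(default_factory=dict)


def diff_intermediates(
    actual: dict[str, float],
    expected: dict[str, float],
    rel_tol: float = 1e-3,
) -> dict[str, tuple[Optional[float], float]]:
    """Return {name: (actual, expected)} for every expected variable
    that is missing from the trace or outside the relative tolerance."""
    mismatches: dict[str, tuple[Optional[float], float]] = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            mismatches[name] = (got, want)
    return mismatches
```

A worked-example test would then assert that `diff_intermediates(result.intermediate, expected)` is empty, so a failure names the exact worksheet variable that drifted instead of reporting only a changed SAP score.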
 ### P6 — Strict-type `EpcPropertyData` via canonical domain enums
 The current `EpcPropertyData` and its nested types carry many bare
 `str` fields and `Union[int, str]` fields (the latter because the
 gov API gives ints and Site Notes give strings). The defensive
 type-handling cascades into the calculator (`cert_to_inputs.py`,
 `dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented
 example: `SapBuildingPart.identifier` carries main-vs-extension
 information but is bare `str`, so the dimensions code defensively
 iterates instead of dispatching on a typed kind.
 The fix:
 1. **One canonical enum per field**, union of all keys appearing
   across all schema versions in
   `datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum
   classes (`built_form`, `construction_age_band`, `energy_tariff`,
   `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`,
   `mechanical_ventilation`, `property_type`, `tenure`,
   `transaction_type`, `ventilation_type`, `water_heating_fuel`,
   `cylinder_insulation_thickness`, `energy_efficiency_rating`,
   `improvement_description`, `improvement_summary`, `code`) plus
   `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is
   the reference; a dedup script can optionally verify coverage but
   is not a build dependency.
 2. **The API mapper** parses raw ints into the canonical enum.
 3. **The Site Notes mapper** parses raw strings into the canonical
   enum.
 4. **The domain object** (`EpcPropertyData` and nested) holds only
   the canonical enums — no `Union[int, str]`, no bare `str` for
   coded fields.
 5. **Every consumer** (calculator, ML pipeline, recommendations,
   ETL, scenario builder) reads from the typed fields.
 **Constraint**: repo-wide tests must keep passing. The calculator
 is one consumer; the ML pipeline, recommendations, and the Site
 Notes ingestion path also consume `EpcPropertyData`. Each mapper-
 layer change is paired with adapter updates that preserve the
 behaviour the existing tests cover.
 Pyright `strict` mode must remain clean (CLAUDE.md).
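A minimal sketch of the `BuildingPartKind` idea, assuming the raw identifier strings look like "Main Dwelling" and "Extension 1" (inferred from the description above, not checked against the real data). `BuildingPartId` and `parse_building_part_identifier` are hypothetical names.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BuildingPartKind(Enum):
    MAIN_DWELLING = "main_dwelling"
    EXTENSION = "extension"


@dataclass(frozen=True)
class BuildingPartId:
    """Typed replacement for a bare-str identifier (illustrative)."""
    kind: BuildingPartKind
    extension_number: Optional[int] = None


# Assumed raw format: "Extension <N>", case-insensitive.
_EXTENSION_RE = re.compile(r"^extension\s+(\d+)$", re.IGNORECASE)


def parse_building_part_identifier(raw: str) -> BuildingPartId:
    """Mapper-side parse: raw string in, canonical typed value out.
    Unknown values fail loudly instead of flowing into consumers."""
    text = raw.strip()
    if text.lower() == "main dwelling":
        return BuildingPartId(BuildingPartKind.MAIN_DWELLING)
    match = _EXTENSION_RE.match(text)
    if match:
        return BuildingPartId(BuildingPartKind.EXTENSION, int(match.group(1)))
    raise ValueError(f"unrecognised building part identifier: {raw!r}")
```

Once the mappers do this parsing, a consumer such as the dimensions code can dispatch on `part.kind` directly and no longer needs to inspect strings.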
 ### Expected outcome of P1–P6
 After all six land, run the probe against the Validation Cohort. The
 expected baseline MAE on the clean probe is much smaller than the
 current 4.61 — likely 1.5–2.5 SAP-points based on what we know about
 the residual breakdown (heat pumps closed by P4, gas boilers tightened
 by P4, price-version noise removed by P2+P3). The remaining residual
 is the genuine spec sweep target — and per-section fixes will move
 the probe in measurable, distinguishable amounts because there's no
 compensating layer to mask them, and there's no defensive type
 branching obscuring which input value drove which intermediate.
 ---
 ## 3. Why the prior diagnosis was wrong and how we fixed it
 The prior session shipped ten slices (S-B23 → S-B31) by debugging the
 biggest residuals one at a time:
 - **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
  on the demand-side calculation.
-- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is
-  bottlenecked by cert-calibration prices that absorb multiple
-  structural deviations from spec, making any single slice that fixes
-  one component break the calibration for others.
-Two failed slice attempts in the prior session exposed the pattern:
-- **Standing charges**: spec note Table 12 (a) clearly says gas standing
-  charge of £92 is added to space + water heating costs for energy
-  ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62.
-  Reverted before committing.
-- **Cat=10 room heaters off-peak routing**: Table 12a clearly says
-  "Other direct-acting electric heating" bills 100% high rate on
-  7-hour tariff. Empirically: switching cat=10 from off-peak to
-  standard rate inverted the bias from +5.88 to −6.00 without
-  improving MAE. Reverted before committing.
-- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer +
-  Table 3 footer clearly say combi boilers using Table 4b efficiency
-  have zero storage + primary loss. Empirically: zeroing them dropped
-  PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke
-  3 of 7 golden fixtures. Reverted because no way to know whether to
-  follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without
-  reference traces.
-The pattern: **the cert-calibration prices** (in
-`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
-to match Elmhurst's output assuming all our other calculations are
-correct.** When we fix a spec-violation bug in some other component, we
-break the calibration and SAP MAE goes up even though we're more
-spec-correct.
-This means **whack-a-mole on the biggest residual won't converge**. We
-need to systematically verify every component against the spec, then
-re-derive the cert-calibration once at the end.
+- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time
+  as "cert-calibration absorbs multiple spec deviations".
+Three slice attempts looked like they "proved" the cert-cal-absorbs-
+deviations diagnosis:
+- **Standing charges**: spec Table 12 note (a) requires £92/yr gas
+  standing charge on space + water heating. Adding it pushed SAP bias
+  +0.98 → −2.62. Reverted.
+- **Cat=10 room heaters off-peak routing**: Table 12a says "other
+  direct-acting electric heating" bills 100 % high rate on 7-hour
+  tariff. Switching cat=10 from off-peak to standard rate inverted
+  the bias +5.88 → −6.00. Reverted.
+- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table
+  3 footers require zero storage + primary loss when efficiency comes
+  from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP
+  MAE +0.39 and broke 3 of 7 golden fixtures. Reverted.
+The prior agent concluded: *cert-calibration absorbs Elmhurst's
+deviations from spec — we can't fix one without re-deriving the
+calibration, so do a full spec sweep first and re-derive cert-cal at
+the end.* This diagnosis is **wrong** and the proposed remedy
+amplifies the problem.
+### What was actually going on
+
+The 250k-cert corpus spans multiple SAP spec-version regimes:
+The 250k-cert corpus spans multiple SAP spec-version regimes:
 - **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment
  0 prices — mains gas ~3.48 p, standard electricity 13.19 p.
 - **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices
  — mains gas 3.64 p, standard electricity 16.49 p.
 The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the
 older spec's prices**, not Elmhurst deviations from the spec. They
 are an empirical "best fit" across a mixture distribution of two
 price regimes, with downstream-component bugs (PCDB absence, HW
 cylinder loss applied to combi, etc.) absorbed into the fit. The
 table looks like compensation for assessor-software quirks because we
 were never told which spec each cert was on.
 Each "spec-correct fix that worsened MAE" in the failed slices above
 was actually correct. The MAE regressed because:
 1. The cert-cal prices (pre-March-2025 spec) cancelled with one set
   of downstream errors to produce a quasi-stable cost.
 2. The spec-correct fix landed → that cancellation broke → the
   probe MAE went up.
 3. But the spec-correct fix was *right* — what regressed was a
   compensating-error equilibrium, not the calculator's truth.
 The prior session's "re-derive cert-cal at the end" plan would
 re-establish a new compensating-error equilibrium across the new bug
 set. It does not converge on spec-correctness.
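The cancellation mechanism can be shown with a small arithmetic sketch. The two electricity prices are the ones quoted in this section; the demand figures are hypothetical, chosen only so that a 25 % demand overestimate roughly offsets the stale price for a cert lodged on the 14-03-2025 prices.

```python
# Prices quoted in this section (pence per kWh, standard electricity).
OLD_PRICE_P = 13.19   # cert-cal / pre-March-2025 value
SPEC_PRICE_P = 16.49  # SAP 10.2 (14-03-2025) value

# Hypothetical demand figures, for illustration only.
TRUE_KWH = 4000.0
BUGGY_KWH = 5000.0  # a downstream bug overestimates demand by 25 %


def cost_pounds(kwh: float, price_p: float) -> float:
    """Annual cost in pounds from demand (kWh) and unit price (pence)."""
    return kwh * price_p / 100.0


# What the lodged cert reflects: true demand at the spec price.
lodged_cost = cost_pounds(TRUE_KWH, SPEC_PRICE_P)
# Before the fix: inflated demand and stale price nearly cancel.
equilibrium_cost = cost_pounds(BUGGY_KWH, OLD_PRICE_P)
# After a spec-correct demand fix, with the stale price still in place.
after_fix_cost = cost_pounds(TRUE_KWH, OLD_PRICE_P)

error_before = abs(equilibrium_cost - lodged_cost)
error_after = abs(after_fix_cost - lodged_cost)
```

Under these assumed numbers the costs come out at roughly £659.60 (lodged), £659.50 (equilibrium), and £527.60 (after the fix): the correct demand fix raises the cost error from about £0.10 to about £132, even though the calculator is now closer to spec. Re-fitting the price afterwards would only hide whichever bugs remain.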
 ### The fix (per ADR-0010)
 1. **Stop fitting against a mixture distribution.** Filter the
   validation corpus to a single spec-version window (Validation
   Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort
   was lodged on SAP 10.2 (14-03-2025) prices.
 2. **Delete the cert-calibration layer.** Use spec prices everywhere
   (`domain.sap.tables.table_12`). The only price-routing decision
   left is Table 12a fractional high-rate blending — a real spec
   feature, not a calibration.
 3. **Build PCDB**, because it dominates residual variance and the
   reason it was deferred (cert-cal-absorbs-PCDB) no longer holds.
 4. **Build trace mode and BRE worked-example fixtures**, so
   per-section verification works against single-cert intermediates
   instead of aggregate corpus MAE.
 This is what §2.5 lists as the six prerequisites. Once they land,
 the section-by-section spec sweep produces clean, monotonic
 improvements.
 ---
-## 4. Scope decisions
+## 4. Scope decisions (per ADR-0010)
 ### IN scope
-- **RdSAP 10 specification (10-06-2025)** — full document, all sections
-  (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
-- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables,
-  appendices that RdSAP 10 references
-  (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).
+- **SAP 10.2 (14-03-2025 amendment)** is the active spec target.
+  `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages.
+- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that
+  cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`,
+  114 pages.
 - **PCDB integration.** Moved from "Session C deferred" to **P4
  prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs
  lodging `main_heating_data_source=1` need PCDB-sourced efficiency
  for the calculator to be spec-correct. Data source:
  https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`;
  fields: seasonal efficiency, secondary efficiency, output kW,
  flow-temperature curve (HPs).
 - **All RdSAP 10 sections in document order.** §1 → §19, plus
  Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is
  unchanged — only the precondition changes: the sweep runs against a
  clean probe (Validation Cohort + spec prices + PCDB + trace mode).
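The PCDB bullet above describes a lookup keyed on `main_heating_index_number` with a fallback when no product is found. A minimal sketch of that seam follows, assuming a hypothetical `PcdbRecord` shape and a hypothetical `main_heating_efficiency` helper; only the `PcdbLookup` and `NoOpPcdbLookup` names come from this handover, and the index number and efficiency in the usage note are illustrative, not PCDB data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical per-product row; field names are assumptions."""
    index_number: int
    seasonal_efficiency: float
    output_kw: Optional[float] = None


class PcdbLookup(Protocol):
    def get(self, index_number: int) -> Optional[PcdbRecord]: ...


class NoOpPcdbLookup:
    """Stub seam: never finds a product, so callers always fall back."""

    def get(self, index_number: int) -> Optional[PcdbRecord]:
        return None


class InMemoryPcdbLookup:
    """Illustrative lookup backed by a dict keyed on the product index."""

    def __init__(self, records: Iterable[PcdbRecord]) -> None:
        self._by_index = {r.index_number: r for r in records}

    def get(self, index_number: int) -> Optional[PcdbRecord]:
        return self._by_index.get(index_number)


def main_heating_efficiency(
    index_number: Optional[int], fallback: float, pcdb: PcdbLookup
) -> float:
    """Prefer the PCDB seasonal efficiency; otherwise use the table fallback."""
    if index_number is not None:
        record = pcdb.get(index_number)
        if record is not None:
            return record.seasonal_efficiency
    return fallback
```

With `InMemoryPcdbLookup([PcdbRecord(12345, 0.91)])`, a cert lodging index 12345 resolves to 0.91, while the same call against `NoOpPcdbLookup()` returns whatever fallback the caller passes (for example the 0.80 category default).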
-### OUT of scope (for now)
+### OUT of scope
- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
+- **Full SAP assessments.** Full-SAP certs lodge measured/calculated
-  U-value in `walls[i].description` (e.g.
+  U-values in `walls[i].description` (e.g.
  "Average thermal transmittance 0.18 W/m²K"). These are a separate
-  calculation path (BS EN ISO 6946) and a different corpus. **Park them
+  calculation path (BS EN ISO 6946) and a different corpus. Park
-  until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
+  until RdSAP 10 base-case parity is reached. S-B24 / S-B29
  attempted partial handling; those slices can stay or be reverted at
  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- **PCDB (Product Characteristics Database).** ADR-0009 deferred this
+- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus,
-  to Session C. **This is a real future task, not a permanent
+  so it cannot be validated. Calculator targets SAP 10.2 until the
-  exclusion.** Heat pumps (cat=4) have catastrophic per-cert MAE (19
+  corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP
-  SAP points) because we use Table 4a fallback efficiency 2.30
+  to reference SAP 10.3). Note: `table_12.py` currently mixes SAP
-  instead of PCDB SCOP (typically 2.80-3.50). Gas boilers with
+  10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2.
-  `main_heating_data_source=1` (78% of corpus boiler certs) fall back
+- **Historical-spec cert reproduction.** Calculating what cert SAP
-  to a category-default 0.80 vs typical PCDB-listed condensing-boiler
+  *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is
-  efficiencies of 0.88-0.94 — that's most of the per-cert SAP residual
+  not the calculator's job. Lodged Performance carries the historical
-  variance on gas certs.
+  value; Calculated SAP10 Performance is current-spec only. The
-
+  Validation Cohort filter operationalises this — older certs are
-  A `NoOpPcdbLookup` stub seam exists in Session A (per ADR-0009 grill
+  out of the validation loop, not because they're "wrong" but because
-  outcome #1). The fetch+parse work is non-trivial:
+  they're a different spec's output.
-  - **Data source**: BRE PCDB at https://www.ncm-pcdb.org.uk —
+- **Re-deriving cert-cal at the end.** The prior session's plan. The
-    boilers + heat pumps are downloadable CSVs (thousands of rows
+  cert-calibration layer is deleted in P2, not re-fit.
    each).
  - **Lookup key**: cert lodges `main_heating_index_number` which is
    the PCDB product ID. Match by that.
  - **Per-product fields needed**: seasonal efficiency, secondary
    efficiency, output kW, flow-temperature curve (for HPs).
  - **Effort**: ~half-day for the lookup + tests; ongoing maintenance
    when BRE publishes new PCDB revisions.
  **Recommended sequencing**: complete the systematic RdSAP spec
  sweep first. Once the spec-correct engine is built and cert-cal
  re-derived, PCDB integration should drop heat-pump residuals from
  19 SAP points to ~1, and tighten the gas-boiler residual variance.
  At that point heat pumps (cat=4) and PCDB-listed boilers
  (`main_heating_data_source=1`) become accessible.
  **Why not now**: the cert-calibration prices currently absorb the
  missing PCDB efficiency (HP costs at off-peak rate compensates for
  too-low SCOP). Fixing PCDB without re-deriving cert-cal would push
  HP certs in the wrong direction. Same lesson as the other reverted
  fixes in §7b — fix the spec layer first, the calibration layer
  later.
 - **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
  identical Table 12 codes (only values shift). Don't update spec
  references to 10.3 until the corpus migrates.
 ---
@ -208,22 +360,49 @@ For each table, formula, footnote, exception:
 3. Are there spec-defined edge cases / footnotes we're missing?
 ### 5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour.
+- Write a failing unit test that asserts the spec-correct behaviour
  — wherever possible, write it as an assertion on `intermediate`
  values rather than on aggregate SAP, using a BRE worked example
  if one covers the section.
 - Implement the fix.
- Run **all 7 golden fixtures** plus the broader probe. Note both
+- Run `test_bre_worked_examples.py` plus the Validation Cohort
-  direction and magnitude of change.
+  probe. Note both direction and magnitude of change.
- If the fix is spec-correct but breaks a golden fixture, this is
+- If a BRE worked-example breaks, the new code is wrong (revert).
-  evidence that the fixture was a compensating-error case — proceed
+  BRE examples are spec-derived and cannot regress from a
-  with the spec-correct fix and update the fixture (with a comment
+  spec-correct change.
-  noting it was a compensating case).
+- Commit per-slice: one section → one commit. Reference the spec
- Commit per-slice as before: one section → one commit. Reference the
+  section in the commit message.
  spec section in the commit message.
-### 5.5. Use trace mode when you need it
+### 5.5. Sweep-time principle: worksheet-faithful structure
-ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
+
-that was never populated. Adding this is highly recommended for the
+Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet
-systematic pass — each section's verification benefits from
+structure for its section. As you verify a section, also restructure
-inspecting the intermediate values. See §11 below for a sketch.
+its module so that:
 1. **Each function name references its worksheet-line origin** (e.g.
   `heat_transfer_coefficient` aligns with worksheet line (39);
   `mean_internal_temperature` aligns with worksheet line (93)).
 2. **Compound calculations are split** into one function per
   worksheet line where possible — easier to verify against
   `intermediate[...]` and against BRE worked-example values.
 3. **Defensive type-handling disappears**. Once P6 lands, the input
   is a typed enum or numeric — branching on `isinstance(x, int)` is
   replaced by enum dispatch.
 4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind ==
   BuildingPartKind.MAIN_DWELLING` replaces string sniffing of
   `identifier`. The dimensions.py "unnecessarily complicated"
   pattern Khalim flagged is the canonical example of what *not*
   to do.
 The principle applies during section-sweep slices. It is **not**
 a separate prerequisite — the refactor lands with the verification
 slice for the section it touches.
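Points 3 and 4 above can be sketched as follows: the free-text `identifier` is parsed once at the mapping boundary, and worksheet code then dispatches on the enum. This is a minimal sketch; `kind_from_identifier` is a hypothetical helper, the enum members other than `MAIN_DWELLING` are assumptions, and the exact "Main Dwelling" string is assumed from the in-source comment rather than taken from `codes.csv`.

```python
from enum import Enum


class BuildingPartKind(Enum):
    MAIN_DWELLING = "main_dwelling"
    EXTENSION = "extension"  # assumed member name


def kind_from_identifier(identifier: str) -> BuildingPartKind:
    """Parse the cert's identifier string once, where the cert is mapped."""
    normalised = identifier.strip().lower()
    if normalised == "main dwelling":  # assumed lodged spelling
        return BuildingPartKind.MAIN_DWELLING
    if normalised.startswith("extension"):  # e.g. "Extension 1"
        return BuildingPartKind.EXTENSION
    # Fail loudly rather than guessing: an unknown value is a mapping bug.
    raise ValueError(f"unrecognised building-part identifier: {identifier!r}")


def is_main_dwelling(kind: BuildingPartKind) -> bool:
    """Worksheet-side check: enum identity, no string sniffing."""
    return kind is BuildingPartKind.MAIN_DWELLING
```

Raising on unknown identifiers keeps the defensive handling in one place, so the worksheet modules can assume a valid `BuildingPartKind`.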
 ### 5.6. Use trace mode when you need it
 P5 populates `SapResult.intermediate: dict[str, float]` with every
 named SAP 10.2 worksheet variable. Each section's verification
 benefits from inspecting these values per-cert. See §11 below for
 the sketch.
 ---
@ -350,59 +529,60 @@ touched and what the current state is.
 ---
-## 7. The cert-calibration vs spec-correctness tension
+## 7. The cert-calibration "tension" is dissolved (per ADR-0010)
-This is THE central architectural decision you have to make as you
+This section originally framed cert-calibration vs spec-correctness as
-work through the spec.
+two end-states the calculator had to choose between. That framing is
 wrong (see §3 for the actual diagnosis): the cert-cal values are
 pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2.
 Once the corpus is filtered to the Validation Cohort (P3) and the
 cert-cal layer is deleted (P2), the false dichotomy disappears.
-### Two tables of fuel prices
+### What replaces this section
 - `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec
  values (3.64p gas, 16.49p standard elec).
 - `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`
  — empirically lower values (3.48p gas, 13.19p elec) that match the
  cert assessor software's output.
-### Two possible end states for the calculator
+- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP
  10.2 14-03-2025 amendment, CO2 factors corrected per P2).
 - **One validation cohort.** `inspection_date ≥ 2025-07-01`, every
  cert lodged on the calculator's target spec version.
 - **One verification mechanism.** Trace-mode intermediates + BRE
  worked-example unit tests for per-section verification; Validation
  Cohort probe MAE for aggregate go/no-go.
-**End state A — Spec-perfect.** Use spec prices, apply every spec rule
+Cert-software deviations from spec, if they exist at all, are
-(standing charges, Table 12a fractions, combi zero-loss, etc.). The
+expected to be small and localised. They surface as residual after
-calculator output is then what a *correct SAP 10.2 implementation*
+the spec sweep completes against a clean probe — and at that point
-would produce. SAP MAE against the corpus will likely worsen because
+the question is whether to chase them at all (Elmhurst-deviation
-Elmhurst doesn't perfectly implement spec.
+fixes have low domain value compared to spec-correctness, given the
-
+calculator's product use case is scoring counterfactuals for the
-**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce
+MeasureApplicator chain, not reproducing historical certs).
 Elmhurst's deviations exactly. The calculator output matches cert
 SAP scores. The calculator becomes a "reverse-engineered Elmhurst
 clone" rather than a SAP 10.2 implementation.
 ### The pragmatic recommendation
 **Aim for state A but track state B as the parity probe.** Concretely:
 1. Verify each spec section in isolation; fix spec violations
   regardless of MAE impact, but commit each fix WITH a measured
   probe delta in the commit message.
 2. After the spec sweep is complete, the calculator's output is
   spec-correct. The corpus residual at that point is Elmhurst's
   deviation from spec.
 3. THEN re-derive the cert-calibration prices to match Elmhurst's
   deviation pattern. The calibration becomes a thin Elmhurst-
   compatibility layer on top of a spec-correct engine.
 This avoids the whack-a-mole problem because state A is unambiguous:
 each fix is either spec-correct or not. State B is iterative on top
 of state A, not entangled with it.
 ---
 ## 7b. Outstanding findings to pick up during the systematic pass
 The prior session identified several spec-correct fixes that were
-**reverted because they made SAP MAE worse against the corpus, but the
+reverted because they made SAP MAE worse against the **full corpus**.
-spec basis is unambiguous and the fixes WILL be the right answer once
+The empirical signal that "reverted" them was version-mixture noise
-the cert-calibration is re-derived against a clean engine.** Treat
+(see §3) plus compensating-error breakage in the 7 retired golden
-these as TODOs the systematic pass should encounter when it reaches
+fixtures. Each fix below is **expected to land cleanly** once the
-the relevant section. They're listed here so the work isn't lost.
+six prerequisites in §2.5 are done, because:
 - The Validation Cohort (P3) is on a single spec version — the price
  mismatch that drove the bias regression on standing charges and
  cat=10 routing disappears.
 - The cert-cal layer is gone (P2) — no calibration to "break".
 - PCDB is integrated (P4) — the heat-pump and gas-boiler residuals
  that dominated per-cert MAE collapse before any of these findings
  even matter.
 - The fixtures are now BRE worked examples (P5 + §10) — they cannot
  be broken by spec-correct changes because they are themselves
  derived from the spec.
 Treat each finding as a section-sweep TODO. The empirical impacts
 below were measured against the **dirty probe** (full corpus + cert-cal
 + no PCDB) and are **not predictive** of behaviour on the clean probe.
 Re-measure each fix against the Validation Cohort after prerequisites
 land.
 ### Finding 1 — HW cylinder zero-loss rule for combi boilers
 **Status**: spec-correct fix exists in working-tree-only form
@ -587,32 +767,45 @@ direction.
 ## 8. Don't repeat — known dead-ends
 > **Re-read after §3 + §7b.** Three entries below were classified as
 > "dead-ends because cert-cal absorbs" — that diagnosis is wrong.
 > They are spec-correct fixes that were measured under a noisy probe.
 > Now flagged as **conditional dead-ends**: dead only if you try them
 > before P1–P5 land. After prerequisites: they are expected
 > improvements, not dead-ends. See ADR-0010.
 - ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
  over-corrected because it routed to the (Unfilled cavity, 50mm) row
  instead of the dedicated Filled cavity row. The right fix landed in
  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
 - ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
  (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
-  intentionally conservative; PCDB is needed for real efficiency.
+  intentionally conservative; PCDB (P4 prerequisite) supplies the
- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor
+  real efficiency.
-  uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
+- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under
-  Use `cert_calibration_prices()` for the probe.
+  the dirty probe, cert-cal prices fit better. **Inverts under the
  clean probe (P2 + P3): SAP 10.2 spec prices are correct because the
  Validation Cohort is on the 14-03-2025 amendment.** Listed here
  only as a warning if you start the sweep before prerequisites land.
 - ❌ **Always applying 10% secondary heating** — must be conditional on
  cert lodging or main system being electric storage (S-B20). See
  spec Appendix A.4.
 - ❌ **Respecting `main_heating_fraction` for secondary allocation**
  (failed S-B30) — the field is the multi-main allocation (system 1 vs
  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
+- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
-  spec-correct per Table 12a but inverts bias direction. Cert-cal
+  spec-correct per Table 12a. The bias inversion under the dirty
-  calibration absorbs the deviation.
+  probe was driven by cert-cal compensating; on the clean probe this
- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) —
+  is just spec-correct. Land as part of the §12 spec sweep after
-  spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
+  prerequisites.
-  to −2.62. Cert-cal calibration absorbs.
+- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) —
- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
+  spec-correct per Table 12 note (a). Same logic: bias drift under
-  S-B32) — spec-correct per Table 2 + Table 3 footers and drops PE
+  dirty probe is version-mixture + missing-PCDB noise, not Elmhurst
-  MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden
+  deviation. Land as part of §12 spec sweep.
-  fixtures. Decision deferred to systematic pass.
+- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted
  S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE
  regression was driven by the now-retired golden fixtures (§10) and
  cert-cal absorption. Land as part of §4 / Appendix J sweep.
 ---
@ -620,10 +813,15 @@ direction.
 ### Sample
 `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
-250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
+250k-cert parquet. **After P1 lands** the parquet carries
 `inspection_date`; the probe then filters to the **Validation Cohort**
 (`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and
 samples 300 at seed=7 by default. Filtering rationale:
- ≤ 5 is heritage/anomaly stock (sub-3% of corpus)
+- ≤ 5 is heritage/anomaly stock (sub-3 % of corpus)
 - ≥ 99 is full-SAP new-builds the parquet excludes anyway
 - `inspection_date ≥ 2025-07-01` ensures every cert was lodged on
  SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md)
  / "Validation Cohort" and ADR-0010 §3.
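The filter and sampling described above can be sketched in a few lines. This is a sketch under stated assumptions, not the probe's actual code: rows are modelled as plain dicts rather than parquet records, `in_validation_cohort` and `sample_probe` are hypothetical names, and the `sap_score` bounds are treated as inclusive, following the `[5, 99]` notation.

```python
import random
from datetime import date

# Cut-off from the Validation Cohort definition.
VALIDATION_COHORT_START = date(2025, 7, 1)


def in_validation_cohort(cert: dict) -> bool:
    """True when the cert is in the version-locked cohort and score band."""
    return (
        cert["inspection_date"] >= VALIDATION_COHORT_START
        and 5 <= cert["sap_score"] <= 99  # inclusive bounds are an assumption
    )


def sample_probe(certs: list, n: int = 300, seed: int = 7) -> list:
    """Filter to the cohort, then draw a reproducible sample of up to n certs."""
    cohort = [c for c in certs if in_validation_cohort(c)]
    rng = random.Random(seed)  # local RNG so the global seed is untouched
    return rng.sample(cohort, min(n, len(cohort)))
```

Using a local `random.Random(seed)` keeps the draw reproducible without affecting other code that relies on the module-level generator.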
 ### Run the probe
 ```bash
@ -654,38 +852,58 @@ main(['300','7'])
 ---
-## 10. The 7 golden fixtures
+## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5)
 The 7 cert-based fixtures at
 `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
-locks 7 corpus certs as regression anchors:
+were locked in against the current calculator state — *with* cert-cal,
 *without* PCDB, *with* HW cylinder loss always applied, *with* the
 lighting heuristic, etc. They are documented in §3 / the prior
 handover as containing compensating errors. Once the prerequisites
 land, every spec-correct fix breaks at least one of them. They will
 fight the spec sweep.
-| Cert | TFA | Cat | Notes |
+### Replacement strategy
 |---|---|---|---|
 | `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
 | `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
 | `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
 | `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
 | `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
 | `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
 | `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
-Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as
+**Primary regression suite: BRE worked-example fixtures.**
 the spec sweep progresses.**
-The cert JSONs are stored under `fixtures/golden/<cert>.json` —
+Transcribe the worked examples from:
-frozen at extraction time so the test is reproducible without
+- SAP 10.2 spec appendices (especially Appendix R — reference values
-bulk-zip access. The probe extraction script for new fixtures is
+  and the worked example dwelling).
-inlined in the test history (see commit `f4a8d2a0`).
+- RdSAP 10 (10-06-2025) worked-example annex.
-**Important caveat**: some of these 7 are compensating-error matches
+Each worked example becomes a unit test that locks **per-intermediate
-(see §3). When a spec-correct slice breaks one, the fixture is
+expected values** (HLP, HTC, monthly mean internal temperature (MIT),
-probably the compensating case — investigate before reverting.
+ECF, SAP score) rather than the aggregate SAP score alone. Because
 they are spec-derived, no spec-correct change can break them — any
 break is an implementation bug, unambiguously.
 These tests live at
 `packages/domain/src/domain/sap/tests/test_bre_worked_examples.py`
 (new module — separate from the cert-based fixtures module).
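One way to lock per-intermediate values is a small comparison helper that reports every mismatch at once instead of stopping at the first. The sketch below assumes `intermediate` is a flat `dict[str, float]`; `intermediate_mismatches` is a hypothetical name, the tolerance is an arbitrary choice, and the keys and numbers in the usage note are toy values, not figures from a BRE worked example.

```python
import math


def intermediate_mismatches(
    actual: dict, expected: dict, rel_tol: float = 1e-3
) -> list:
    """Return one message per expected value that is missing or out of tolerance."""
    problems = []
    for name, want in expected.items():
        if name not in actual:
            problems.append(f"{name}: missing from intermediate")
        elif not math.isclose(actual[name], want, rel_tol=rel_tol):
            problems.append(f"{name}: got {actual[name]}, expected {want}")
    return problems
```

A worked-example test would then assert `intermediate_mismatches(result.intermediate, EXPECTED) == []`, so a failure names each worksheet value that drifted. For instance, comparing `{"htc": 100.0, "hlp": 1.25}` against `{"htc": 100.0, "hlp": 1.30, "ecf": 2.0}` reports `hlp` as out of tolerance and `ecf` as missing.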
 **Cert-based fixtures retired.**
 The current `test_golden_fixtures.py` is either deleted or repurposed
 as a *very loose* smoke-test integration suite (e.g. `|SAP residual|
 ≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs
 under `fixtures/golden/<cert>.json` can be kept on disk as reference
 data, but they no longer drive go/no-go decisions in the sweep.
 **Optional future addition.**
 If/when a current Elmhurst (or Stroma / Quidos / NHER) license is
 available, run a handful of representative corpus certs through it
 and lock those outputs as a second-tier regression suite — Elmhurst-
 parity fixtures alongside spec-parity fixtures. Not a prerequisite.
 ---
-## 11. Trace mode (recommended infrastructure)
+## 11. Trace mode (prerequisite P5 — implementation sketch)
-ADR-0009 proposed:
+This section was originally labelled "recommended"; it is now
 **prerequisite P5** per ADR-0010. The sweep does not start until
 `intermediate` is populated everywhere. ADR-0009 proposed:
 ```python
@dataclass(frozen=True)
 class SapResult:
@ -765,14 +983,39 @@ This single session should produce zero behaviour changes if §1-3 are
 correctly implemented, but expect to find at least one issue in §3
 geometry (per the reviewer's "biggest SAP error sources" list).
-Run the golden fixtures + probe at the end of each session; expect no
+**Important:** Session 1 only starts after all six prerequisites in
-movement until you start hitting actual gaps.
+§2.5 have landed and the Validation Cohort probe baseline has been
 captured. Until then, running per-section verification produces noisy
 signal.
 Run the BRE worked-example fixtures (P5) + Validation Cohort probe
 (P3) at the end of each session; expect no movement until you start
 hitting actual gaps.
 ---
 ## 13. Workflow recap
-For each section, in order:
+**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency
 order:
 | | Slice | Depends on |
 |---|---|---|
 | P1 | Re-extract parquet with `inspection_date` | — |
 | P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — |
 | P3 | Filter parity probe to Validation Cohort | P1 |
 | P4 | Implement `PcdbLookup` | — (P2 helpful) |
 | P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — |
 | P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — |
 P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a
 Validation Cohort probe baseline once all six land — that is the new
 MAE starting line. Repo-wide tests stay green throughout P6 (Site
 Notes consumers, ML pipeline, recommendations, etc. all need the
 mapper updates that accompany each typing change).
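The dependency table above can be checked mechanically. The sketch below encodes only the hard dependency (P3 on P1) and uses the standard-library `graphlib` to group the slices into waves that can land in parallel; `PREREQUISITES` and `landing_waves` are hypothetical names, and the soft "P2 helpful" note on P4 is deliberately left out.

```python
from graphlib import TopologicalSorter

# Slice -> set of slices it depends on (hard dependencies only).
PREREQUISITES = {
    "P1": set(),
    "P2": set(),
    "P3": {"P1"},
    "P4": set(),
    "P5": set(),
    "P6": set(),
}


def landing_waves(deps: dict) -> list:
    """Group slices into waves; every slice in a wave can land in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the table ever gains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Calling `landing_waves(PREREQUISITES)` yields two waves: P1, P2, P4, P5 and P6 together, then P3 on its own.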
 **Phase 1 — Section sweep.** For each RdSAP 10 section, in document
 order:
 1. Read the spec section text + cited tables.
 2. Identify code location(s).
@ -780,22 +1023,36 @@ For each section, in order:
   - Does our code implement it?
   - Does the implementation match?
   - Edge cases / fallback paths handled?
-4. For each gap: AAA unit test → minimal implementation → commit.
+4. For each gap: AAA unit test (preferring a BRE worked-example
-5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`)
+   assertion on `intermediate` values when possible) → minimal
-   and the parity probe. Note both deltas in the commit message.
+   implementation → commit.
-6. If a golden fixture breaks: investigate. Either fixture was a
+5. **Apply the worksheet-faithful structure principle** (§5.5) as
-   compensating case (acceptable to break) or the new code is wrong
+   part of this slice: name functions after worksheet lines, split
-   (revert).
+   compound calculations, replace any remaining defensive
   type-handling with typed-enum dispatch.
 6. After each commit: run `test_bre_worked_examples.py` + Validation
   Cohort probe. Note both deltas in the commit message.
 7. If a BRE worked-example breaks: the new code is wrong (revert).
   The worked examples are spec-derived and cannot be broken by
   spec-correct changes.
 Stick to this. The prior session's mistake was jumping between
-sections based on residual-size. Don't.
+sections based on residual-size **on a dirty probe**. Clean probe
 plus document-order discipline plus worksheet-faithful structure is
 what makes the sweep converge.
 ---
 ## 14. Useful references
 - **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md`
  — the binding decisions reflected in this rewrite: SAP 10.2 target,
  cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture
  retirement. **Read first.**
 - **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
-  decision rationale + Session A/B/C plan.
+  original calculator decision rationale + Session A/B/C plan. Read
  for context; spec-version target / PCDB sequencing / cert-cal
  rationale are superseded by ADR-0010.
 - **Spec coverage map**
  `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
  Update as you go.
@ -817,19 +1074,19 @@ sections based on residual-size. Don't.
 ## 15. Final note
-The prior session demonstrated that **moving SAP MAE down requires
+The prior session's framing — *"the cert-calibration layer absorbs
-either spec-correctness OR Elmhurst-perfect calibration, not both
+Elmhurst's spec deviations; we'll re-derive it at the end"* — was
-simultaneously**. The cert-cal layer absorbs Elmhurst's spec
+load-bearing on a false diagnosis. The cert-cal layer is
-deviations; any spec-correct fix risks breaking it.
+pre-March-2025 SAP prices fit against a mixture distribution of two
 spec-version regimes. Once you separate the regimes (Validation
 Cohort) and use spec prices everywhere, the "tension" disappears.
-The systematic pass clears this by separating the layers:
+After P1–P5 land, the section sweep is straightforward: every
-1. Build the spec-correct engine first.
+spec-correct fix is unambiguously the right answer, BRE
-2. Re-fit the cert-cal compatibility layer once at the end.
+worked-example fixtures lock the result, and Validation Cohort probe
 MAE moves monotonically downward. The fixes the prior session marked
 as "spec-correct but probe-regressed" become trivially landable.
-Don't be discouraged when SAP MAE rises temporarily during the spec
+**Welcome to the project. Read ADR-0010, land the six prerequisites,
-sweep. PE residual is the truer signal of engine correctness. SAP
+then walk the spec in document order. The deterministic answer is in
-MAE convergence will follow once cert-cal is re-derived against the
+there.**
 clean engine.
 **Welcome to the project. Read the spec, follow the order, commit one
 section at a time. The deterministic answer is in there.**
--- a/packages/domain/src/domain/sap/worksheet/dimensions.py
+++ b/packages/domain/src/domain/sap/worksheet/dimensions.py
@ -21,7 +21,6 @@ from typing import Final
 from datatypes.epc.domain.epc_property_data import EpcPropertyData, SapBuildingPart
 _DEFAULT_STOREY_HEIGHT_M: Final[float] = 2.5
@ -72,6 +71,16 @@ def dimensions_from_cert(epc: EpcPropertyData) -> Dimensions:
    """Build the `Dimensions` aggregate from an EpcPropertyData."""
    parts = epc.sap_building_parts or []
    # Khalim Comments - this section seems to implement the
    # worksheet section on page 132 and is unnecessarily
    # complicated. The sap building parts are pre-ordered, from
    # the main building part to the extensions, and the
    # "identifier" field tells us if the part is the Main Dwelling
    # or an extension. E.g. if it's an extension, identifier
    # should be "Extension 1".
    # We should strictly type the values on the EpcPropertyData
    # domain model.
    ground_area = 0.0
    ground_perim = 0.0
    top_area = 0.0