docs: ADR-0010 retargets calculator to SAP 10.2; rewrite handover

Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions of a
grill-with-docs session:

1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3: no
   SAP-10.3-lodged certs exist in the corpus to validate against.
2. table_12_cert_calibration is deleted (not "re-derived at the end").
   It was pre-March-2025 spec prices fit against a mixture distribution
   of two spec-version regimes, with downstream-component bugs absorbed
   into the fit. It was not Elmhurst deviation.
3. Validation Cohort: filter the corpus to inspection_date ≥ 2025-07-01
   so every cert in the probe was lodged on SAP 10.2 (14-03-2025)
   prices. One spec, one signal.
4. PCDB integration is promoted from "Session C deferred" to
   prerequisite P4: it dominates residual variance on heat pumps and
   the 78% of gas-boiler certs lodging main_heating_data_source=1.
5. Trace mode (SapResult.intermediate) and BRE worked-example fixtures
   replace the 7 cert-based golden fixtures, which contained
   compensating errors.
6. Strict-type EpcPropertyData via codes.csv-derived canonical enums
   (P6). The in-source motivation lives at dimensions.py:74-82
   (Khalim's comment, included in this commit).
7. Worksheet-faithful structure is a sweep-time principle: each
   worksheet module mirrors SAP 10.2 worksheet line numbering.

CONTEXT.md additions:
- Refined "Calculated SAP10 Performance" and "SAP10 Calculation" to
  reference SAP 10.2 + ADR-0010.
- New term "SAP Spec Version": domain-meaningful because the same
  EpcPropertyData yields different sap_score under different spec
  revisions.
- New term "Validation Cohort": the version-locked sub-corpus.

HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the six
prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal was
stale prices, not Elmhurst deviation), §4 scope (PCDB IN, SAP 10.3
stays OUT), §5 approach (worksheet-faithful principle as §5.5), §7
tension dissolved, §7b findings re-framed, §8 dead-ends re-classified
as conditional, §9 cohort filter, §10 fixture strategy, §11 trace mode
as prerequisite, §12 prereqs-first, §13 Phase 0/Phase 1 workflow, §14
ADR-0010 reference, §15 final note.

P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice (probe
swap to spec prices).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 11:17:27 +00:00 · 2026-05-19 09:54:24 +00:00 · 2026-05-19 09:54:24 +00:00 · bb9c5ac017
commit bb9c5ac017
parent ac1aa56ab1
4 changed files with 551 additions and 209 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -98,12 +98,20 @@ The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actua
 _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values

 **Calculated SAP10 Performance**:
-The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs. ADR-0009.
+The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs lodged on the calculator's target spec version (see [[sap-spec-version]]). ADR-0009 (as amended by ADR-0010).
 _Avoid_: calculator output, computed performance, worksheet performance, SAP10 output

 **SAP10 Calculation**:
-The process that runs the deterministic SAP 10.3 worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 cert→input mapping, executes the 12-month heat balance per SAP 10.3 §§1-14, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009.
-_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run
+The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates.
+_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]])
+
+**SAP Spec Version**:
+The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version.
+_Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision
+
+**Validation Cohort**:
+The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010.
+_Avoid_: parity cohort, validation set, corpus sample

 **Measure Application**:
 The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
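
Reviewer sketch (not part of the commit): the **Validation Cohort** entry added to CONTEXT.md above reduces to a single date predicate. The following is a minimal illustration of that filter, assuming a hypothetical `Cert` record with an optional `inspection_date`; the real probe and parquet schema may look quite different, and only the `2025-07-01` cutoff comes from the diff.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# Cutoff taken from the Validation Cohort definition (inspection_date >= 2025-07-01).
COHORT_START = date(2025, 7, 1)


@dataclass(frozen=True)
class Cert:
    """Hypothetical stand-in for a corpus row; not the project's real type."""
    cert_id: str
    inspection_date: Optional[date]


def in_validation_cohort(cert: Cert) -> bool:
    # Certs with no inspection date cannot be assigned to a spec version,
    # so this sketch excludes them rather than guessing.
    return cert.inspection_date is not None and cert.inspection_date >= COHORT_START


def validation_cohort(certs: Iterable[Cert]) -> List[Cert]:
    """Keep only certs lodged inside the single spec-version window."""
    return [c for c in certs if in_validation_cohort(c)]
```

Excluding undated certs is a choice made for this sketch; the diff only states that the ETL currently drops the date columns, not how missing dates should be handled.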
--- a/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
+++ b/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
@@ -0,0 +1,68 @@
+# Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort
+
+**Status: Accepted.** Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of [ADR-0009](0009-deterministic-sap-calculator.md). Adds strict typing of `EpcPropertyData` (P6) and a worksheet-faithful structural principle for the `domain/sap/worksheet/*` modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection).
+
+## Why this ADR exists
+
+ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans **multiple SAP spec versions** (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's `domain.sap.tables.table_12_cert_calibration` layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec — closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component.
+
+This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation.
+
+## Decisions
+
+### 1. Active spec target is **SAP 10.2 (14-03-2025)**, not SAP 10.3
+
+ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is SAP 10.2 (14-03-2025 amendment): it is both the document that RdSAP 10 (10-06-2025) cross-references for heating-system identification and the amendment that current assessor software is on.
+
+`packages/domain/src/domain/sap/tables/table_12.py` is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2.
+
+A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3).
+
+### 2. `table_12_cert_calibration` is deleted
+
+The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak.
+
+This includes the `cert_calibration_e7_codes` extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates — Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is in §8 of the handover as a documented dead-end.
+
+`domain.sap.tables.table_12.unit_price_p_per_kwh` becomes the only price API. Parity probes are updated to use it.
+
+### 3. Validation Cohort is filtered to a single spec-version window
+
+Probe MAE against the full 250k-cert corpus measures both calculator correctness *and* the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy.
+
+The **Validation Cohort** is the subset of corpus certs with `inspection_date ≥ 2025-07-01` — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go.
+
+This requires re-extracting the training parquet to include `inspection_date` (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice.
+
+### 4. PCDB integration is promoted from Session C to a prerequisite
+
+ADR-0009 deferred PCDB to Session C and shipped a `NoOpPcdbLookup` stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging `main_heating_data_source=1` (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2).
+
+PCDB lookup against `main_heating_index_number` is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The `NoOpPcdbLookup` seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no `main_heating_index_number` or the PCDB has no matching record.
+
+## Verification infrastructure (also prerequisites)
+
+Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal:
+
+1. **Trace mode populated.** ADR-0009 specced `SapResult.intermediate: dict[str, float]` and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on `intermediate` so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace.
+2. **BRE worked-example unit tests.** SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired.
+3. **Strict typing of `EpcPropertyData` via canonical domain enums.** Bare `str` and `Union[int, str]` fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's `dimensions.py:74-82` is Khalim's documented example. The domain holds one canonical enum per field, derived from `datatypes/epc/domain/epc_codes.csv` (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of `EpcPropertyData` (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright `strict` mode stays clean.
+
+These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5.
+
+## Worksheet-faithful structure (sweep-time principle)
+
+Each `domain/sap/worksheet/*.py` module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. `heat_transfer_coefficient` aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values).
+
+## Consequences
+
+- ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter.
+- The training parquet schema bumps when `inspection_date` is added — a non-breaking MINOR addition under [ADR-0008](0008-physics-as-feature.md)'s `Feature Schema Version` discipline.
+- The handover document `docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md` is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the six prerequisites (P1–P6).
+- Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available.
+
+## Considered alternatives
+
+- **Build versioned Table 12 (pre/post 14-March-2025) keyed on `inspection_date` and validate across the full corpus.** Rejected as more work for no signal benefit during the spec sweep — the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly.
+- **Keep cert-cal during the sweep and re-derive at the end** (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest.
+- **Pay for an Elmhurst license, lock fixtures to its output.** Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite.
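
Reviewer sketch (not part of the commit): decision 4 of the ADR above describes a lookup seam where the calculator uses a PCDB efficiency when one is found and falls back to a Table 4a default otherwise. A minimal illustration of that fallback rule follows. `PcdbLookup` and `NoOpPcdbLookup` are names from the ADR, but the `seasonal_efficiency` method, the `DictPcdbLookup` class, and `resolve_efficiency` are assumptions made for this example and may not match the real interface.

```python
from __future__ import annotations

from typing import Optional, Protocol


class PcdbLookup(Protocol):
    # Assumed method name and signature; the ADR does not specify them.
    def seasonal_efficiency(self, index_number: int) -> Optional[float]: ...


class NoOpPcdbLookup:
    """Stub behaviour as described in the ADR: always reports no record."""

    def seasonal_efficiency(self, index_number: int) -> Optional[float]:
        return None


class DictPcdbLookup:
    """Hypothetical in-memory lookup keyed on main_heating_index_number."""

    def __init__(self, records: dict[int, float]) -> None:
        self._records = records

    def seasonal_efficiency(self, index_number: int) -> Optional[float]:
        return self._records.get(index_number)


def resolve_efficiency(
    pcdb: PcdbLookup,
    index_number: Optional[int],
    table_4a_default: float,
) -> float:
    """Prefer the PCDB value; fall back to the Table 4a default only when
    the cert lodges no index number or the PCDB has no matching record."""
    if index_number is None:
        return table_4a_default
    value = pcdb.seasonal_efficiency(index_number)
    return table_4a_default if value is None else value
```

With the stub, every cert takes the fallback (for example the 0.80 gas-boiler category default quoted in the ADR); swapping in a populated lookup changes the result without touching the caller, which is the point of the seam.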
--- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
+++ b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
@@ -24,10 +24,12 @@ The SAP/RdSAP energy assessment splits cleanly into two roles:
   takes the lodged fields and produces SAP score, CO2 emissions,
   primary energy (PEUI), CO2 per m², EI rating, etc.

-**Our calculator is replicating role #2.** Where Elmhurst's
-implementation diverges from spec, we follow Elmhurst, but we don't
-guess at divergence; we localise it via reference traces or
-empirically against the cert corpus.
+**Our calculator is replicating role #2.** Assessor software
+implements the SAP 10.2 spec faithfully; the question of "where does
+Elmhurst diverge from spec?" is no longer the operative one (per
+ADR-0010 + §3 below). Our job is to enumerate every spec
+table / formula / footnote and verify each against the published SAP
+10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs.

 There is no "assessor judgement" knob to tune. Each field on the cert
 has a deterministic interpretation per the spec. Each spec table /
@@ -57,109 +59,259 @@ all of them and verify each.
  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
  caveat: some of these are compensating-error matches (e.g. cert
  `7536-3827`'s PE matches but cost is £143 under cert's implied cost
-  due to multi-factor offsetting bugs).
+  due to multi-factor offsetting bugs). **These fixtures are retired
+  per ADR-0010 and §10 below — they lock buggy compensating outputs
+  in place and will fight the spec sweep.**
+
+> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md)
+> supersedes the spec-version target, the PCDB sequencing, and the
+> cert-calibration layer of ADR-0009. This handover document was
+> originally written under the rejected framing; §3, §4, §7, §7b,
+> §10 below have been rewritten in lockstep. §2.5 lists the six
+> prerequisites that land **before** the section-by-section sweep
+> starts.

 ---

-## 3. Why we are pivoting to systematic review
+## 2.5. Prerequisites before the sweep starts
+
+Six blockers (P1–P6), in dependency order. The section sweep does not
+start until all six are merged. Together they convert the parity probe
+from a noisy mixture-distribution signal into a clean per-section
+verification tool.
+
+### P1 — Re-extract the training parquet with `inspection_date`
+
+The 250k-cert parquet has 202 columns; **none of them are dates**.
+Without `inspection_date` on each cert we cannot construct the
+Validation Cohort (P3). The ETL currently drops the dates; add them
+back as a non-breaking MINOR Feature Schema Version bump (per
+ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date`
+both exist on the domain object and are populated upstream — the
+parquet writer just needs to include them.
+
+### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12`
+
+Per ADR-0010 §2 and §1:
+- Remove `table_12_cert_calibration.py` and every call site
+  (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the
+  `PriceTable` constructor argument that defaults to it).
+- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`.
+- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors).
+- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p
+  is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025).
+
+### P3 — Filter the parity probe to the Validation Cohort
+
+`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3:
+`inspection_date ≥ 2025-07-01`. Modify
+`services/ml_training_data/src/ml_training_data/sap_parity_probe.py`
+to apply the filter before sampling. The probe sample size and seed
+remain configurable; `sap_score ∈ [5, 99]` remains the typicality
+filter on top of the cohort filter.
+
+### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`)
+
+Per ADR-0010 §4. Download boiler + heat-pump CSVs from
+https://www.ncm-pcdb.org.uk. Build a lookup keyed on
+`main_heating_index_number`. Surface seasonal efficiency, secondary
+efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day
+of work per the original handover estimate. The
+`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from
+ADR-0009 grill outcome #1 is the integration point; no calculator-side
+changes needed beyond reading `index_number` and routing PCDB-returns
+to space-heating / hot-water efficiency lookups instead of Table 4a.
+
+### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples
+
+Per ADR-0010 "Verification infrastructure":
+- Populate every named SAP 10.2 worksheet variable on
+  `SapResult.intermediate` as sketched in §11. This is mechanical —
+  thread the values from each worksheet module into the dict.
+- Transcribe the BRE worked examples from the SAP 10.2 appendices and
+  RdSAP 10 worked-example annex into unit tests
+  (`tests/test_bre_worked_examples.py`) that lock per-intermediate
+  values, not aggregate SAP. These replace the retired cert fixtures.
+
+### P6 — Strict-type `EpcPropertyData` via canonical domain enums
+
+The current `EpcPropertyData` and its nested types carry many bare
+`str` fields and `Union[int, str]` fields (the latter because the
+gov API gives ints and Site Notes give strings). The defensive
+type-handling cascades into the calculator (`cert_to_inputs.py`,
+`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented
+example: `SapBuildingPart.identifier` carries main-vs-extension
+information but is bare `str`, so the dimensions code defensively
+iterates instead of dispatching on a typed kind.
+
+The fix:
+1. **One canonical enum per field**, union of all keys appearing
+   across all schema versions in
+   `datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum
+   classes (`built_form`, `construction_age_band`, `energy_tariff`,
+   `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`,
+   `mechanical_ventilation`, `property_type`, `tenure`,
+   `transaction_type`, `ventilation_type`, `water_heating_fuel`,
+   `cylinder_insulation_thickness`, `energy_efficiency_rating`,
+   `improvement_description`, `improvement_summary`, `code`) plus
+   `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is
+   the reference; a dedup script can optionally verify coverage but
+   is not a build dependency.
+2. **The API mapper** parses raw ints into the canonical enum.
+3. **The Site Notes mapper** parses raw strings into the canonical
+   enum.
+4. **The domain object** (`EpcPropertyData` and nested) holds only
+   the canonical enums — no `Union[int, str]`, no bare `str` for
+   coded fields.
+5. **Every consumer** (calculator, ML pipeline, recommendations,
+   ETL, scenario builder) reads from the typed fields.
+
+**Constraint**: repo-wide tests must keep passing. The calculator
+is one consumer; the ML pipeline, recommendations, and the Site
+Notes ingestion path also consume `EpcPropertyData`. Each mapper-
+layer change is paired with adapter updates that preserve the
+behaviour the existing tests cover.
+
+Pyright `strict` mode must remain clean (CLAUDE.md).
+
+### Expected outcome of P1–P6
+
+After all six land, run the probe against the Validation Cohort. The
+expected baseline MAE on the clean probe is much smaller than the
+current 4.61 — likely 1.5–2.5 SAP-points based on what we know about
+the residual breakdown (heat pumps closed by P4, gas boilers tightened
+by P4, price-version noise removed by P2+P3). The remaining residual
+is the genuine spec sweep target — and per-section fixes will move
+the probe in measurable, distinguishable amounts because there's no
+compensating layer to mask them, and there's no defensive type
+branching obscuring which input value drove which intermediate.
+
+---
+
+## 3. Why the prior diagnosis was wrong and how we fixed it

 The prior session shipped ten slices (S-B23 → S-B31) by debugging the
 biggest residuals one at a time:

 - **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
  on the demand-side calculation.
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is
-  bottlenecked by cert-calibration prices that absorb multiple
-  structural deviations from spec, making any single slice that fixes
-  one component break the calibration for others.
+- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time
+  as "cert-calibration absorbs multiple spec deviations".

-Two failed slice attempts in the prior session exposed the pattern:
+Three slice attempts looked like they "proved" the cert-cal-absorbs-
+deviations diagnosis:

- **Standing charges**: spec note Table 12 (a) clearly says gas standing
-  charge of £92 is added to space + water heating costs for energy
-  ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62.
-  Reverted before committing.
- **Cat=10 room heaters off-peak routing**: Table 12a clearly says
-  "Other direct-acting electric heating" bills 100% high rate on
-  7-hour tariff. Empirically: switching cat=10 from off-peak to
-  standard rate inverted the bias from +5.88 to −6.00 without
-  improving MAE. Reverted before committing.
- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer +
-  Table 3 footer clearly say combi boilers using Table 4b efficiency
-  have zero storage + primary loss. Empirically: zeroing them dropped
-  PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke
-  3 of 7 golden fixtures. Reverted because no way to know whether to
-  follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without
-  reference traces.
+- **Standing charges**: spec Table 12 note (a) requires £92/yr gas
+  standing charge on space + water heating. Adding it pushed SAP bias
+  +0.98 → −2.62. Reverted.
+- **Cat=10 room heaters off-peak routing**: Table 12a says "other
+  direct-acting electric heating" bills 100 % high rate on 7-hour
+  tariff. Switching cat=10 from off-peak to standard rate inverted
+  the bias +5.88 → −6.00. Reverted.
+- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table
+  3 footers require zero storage + primary loss when efficiency comes
+  from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP
+  MAE +0.39 and broke 3 of 7 golden fixtures. Reverted.

-The pattern: **the cert-calibration prices** (in
-`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
-to match Elmhurst's output assuming all our other calculations are
-correct.** When we fix a spec-violation bug in some other component, we
-break the calibration and SAP MAE goes up even though we're more
-spec-correct.
+The prior agent concluded: *cert-calibration absorbs Elmhurst's
+deviations from spec — we can't fix one without re-deriving the
+calibration, so do a full spec sweep first and re-derive cert-cal at
+the end.* This diagnosis is **wrong** and the proposed remedy
+amplifies the problem.

-This means **whack-a-mole on the biggest residual won't converge**. We
-need to systematically verify every component against the spec, then
-re-derive the cert-calibration once at the end.
+### What was actually going on
+
+The 250k-cert corpus spans multiple SAP spec-version regimes:
+- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment
+  0 prices — mains gas ~3.48 p, standard electricity 13.19 p.
+- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices
+  — mains gas 3.64 p, standard electricity 16.49 p.
+
+The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the
+older spec's prices**, not Elmhurst deviations from the spec. They
+are an empirical "best fit" across a mixture distribution of two
+price regimes, with downstream-component bugs (PCDB absence, HW
+cylinder loss applied to combi, etc.) absorbed into the fit. The
+table looks like compensation for assessor-software quirks because we
+were never told which spec each cert was on.
+
+Each "spec-correct fix that worsened MAE" in the failed slices above
+was actually correct. The MAE regressed because:
+1. The cert-cal prices (pre-March-2025 spec) cancelled with one set
+   of downstream errors to produce a quasi-stable cost.
+2. The spec-correct fix landed → that cancellation broke → the
+   probe MAE went up.
+3. But the spec-correct fix was *right* — what regressed was a
+   compensating-error equilibrium, not the calculator's truth.
+
+The prior session's "re-derive cert-cal at the end" plan would
+re-establish a new compensating-error equilibrium across the new bug
+set. It does not converge on spec-correctness.
+
+### The fix (per ADR-0010)
+
+1. **Stop fitting against a mixture distribution.** Filter the
+   validation corpus to a single spec-version window (Validation
+   Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort
+   was lodged on SAP 10.2 (14-03-2025) prices.
+2. **Delete the cert-calibration layer.** Use spec prices everywhere
+   (`domain.sap.tables.table_12`). The only price-routing decision
+   left is Table 12a fractional high-rate blending — a real spec
+   feature, not a calibration.
+3. **Build PCDB**, because it dominates residual variance and the
+   reason it was deferred (cert-cal-absorbs-PCDB) no longer holds.
+4. **Build trace mode and BRE worked-example fixtures**, so
+   per-section verification works against single-cert intermediates
+   instead of aggregate corpus MAE.
+
+These steps correspond to P1–P5 in §2.5; P6 (strict typing) is the
+sixth prerequisite listed there. Once all six land, the
+section-by-section spec sweep produces clean, monotonic improvements.

 ---

-## 4. Scope decisions
+## 4. Scope decisions (per ADR-0010)

 ### IN scope
- **RdSAP 10 specification (10-06-2025)** — full document, all sections
-  (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables,
-  appendices that RdSAP 10 references
-  (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).
+- **SAP 10.2 (14-03-2025 amendment)** is the active spec target.
+  `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages.
+- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that
+  cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`,
+  114 pages.
+- **PCDB integration.** Moved from "Session C deferred" to **P4
+  prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs
+  lodging `main_heating_data_source=1` need PCDB-sourced efficiency
+  for the calculator to be spec-correct. Data source:
+  https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`;
+  fields: seasonal efficiency, secondary efficiency, output kW,
+  flow-temperature curve (HPs).
+- **All RdSAP 10 sections in document order.** §1 → §19, plus
+  Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is
+  unchanged — only the precondition changes: the sweep runs against a
+  clean probe (Validation Cohort + spec prices + PCDB + trace mode).
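
One possible shape for the P4 lookup seam is sketched below. Only the name `PcdbLookup`, the lookup key `main_heating_index_number`, and the field list come from this document; the record class, the `find` method, the in-memory implementation, the integer key type, and the product ID and efficiency used in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical subset of the per-product fields listed above."""

    seasonal_efficiency: float
    output_kw: Optional[float] = None


class PcdbLookup(Protocol):
    """Seam the calculator depends on; implementations are swappable."""

    def find(self, index_number: Optional[int]) -> Optional[PcdbRecord]: ...


class InMemoryPcdbLookup:
    """Illustrative implementation backed by a pre-parsed mapping."""

    def __init__(self, records: Mapping[int, PcdbRecord]) -> None:
        self._records = dict(records)

    def find(self, index_number: Optional[int]) -> Optional[PcdbRecord]:
        if index_number is None:
            return None
        return self._records.get(index_number)


def resolve_efficiency(
    lookup: PcdbLookup,
    index_number: Optional[int],
    fallback_efficiency: float,
) -> float:
    """Prefer the PCDB efficiency; otherwise use the table fallback."""
    record = lookup.find(index_number)
    if record is not None:
        return record.seasonal_efficiency
    return fallback_efficiency


# Made-up product ID and efficiency, for illustration only.
demo_lookup = InMemoryPcdbLookup({12345: PcdbRecord(seasonal_efficiency=0.89)})
demo_hit = resolve_efficiency(demo_lookup, 12345, fallback_efficiency=0.80)
demo_miss = resolve_efficiency(demo_lookup, None, fallback_efficiency=0.80)
```

Keeping the fallback as an explicit argument makes it visible in the trace whether a cert used a PCDB value or a table default.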

-### OUT of scope (for now)
- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
-  U-value in `walls[i].description` (e.g.
+### OUT of scope
+- **Full SAP assessments.** Full-SAP certs lodge measured/calculated
+  U-values in `walls[i].description` (e.g.
  "Average thermal transmittance 0.18 W/m²K"). These are a separate
-  calculation path (BS EN ISO 6946) and a different corpus. **Park them
-  until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
+  calculation path (BS EN ISO 6946) and a different corpus. Park
+  until the RdSAP 10 base case parity is reached. S-B24 / S-B29
  attempted partial handling; those slices can stay or be reverted at
  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- **PCDB (Product Characteristics Database).** ADR-0009 deferred this
-  to Session C. **This is a real future task, not a permanent
-  exclusion.** Heat pumps (cat=4) have catastrophic per-cert MAE (19
-  SAP points) because we use Table 4a fallback efficiency 2.30
-  instead of PCDB SCOP (typically 2.80-3.50). Gas boilers with
-  `main_heating_data_source=1` (78% of corpus boiler certs) fall back
-  to a category-default 0.80 vs typical PCDB-listed condensing-boiler
-  efficiencies of 0.88-0.94 — that's most of the per-cert SAP residual
-  variance on gas certs.
-
-  A `NoOpPcdbLookup` stub seam exists in Session A (per ADR-0009 grill
-  outcome #1). The fetch+parse work is non-trivial:
-  - **Data source**: BRE PCDB at https://www.ncm-pcdb.org.uk —
-    boilers + heat pumps are downloadable CSVs (thousands of rows
-    each).
-  - **Lookup key**: cert lodges `main_heating_index_number` which is
-    the PCDB product ID. Match by that.
-  - **Per-product fields needed**: seasonal efficiency, secondary
-    efficiency, output kW, flow-temperature curve (for HPs).
-  - **Effort**: ~half-day for the lookup + tests; ongoing maintenance
-    when BRE publishes new PCDB revisions.
-
-  **Recommended sequencing**: complete the systematic RdSAP spec
-  sweep first. Once the spec-correct engine is built and cert-cal
-  re-derived, PCDB integration should drop heat-pump residuals from
-  19 SAP points to ~1, and tighten the gas-boiler residual variance.
-  At that point heat pumps (cat=4) and PCDB-listed boilers
-  (`main_heating_data_source=1`) become accessible.
-
-  **Why not now**: the cert-calibration prices currently absorb the
-  missing PCDB efficiency (HP costs at off-peak rate compensates for
-  too-low SCOP). Fixing PCDB without re-deriving cert-cal would push
-  HP certs in the wrong direction. Same lesson as the other reverted
-  fixes in §7b — fix the spec layer first, the calibration layer
-  later.
- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
-  identical Table 12 codes (only values shift). Don't update spec
-  references to 10.3 until the corpus migrates.
+- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus,
+  so it cannot be validated. Calculator targets SAP 10.2 until the
+  corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP
+  to reference SAP 10.3). Note: `table_12.py` currently mixes SAP
+  10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2.
+- **Historical-spec cert reproduction.** Calculating what cert SAP
+  *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is
+  not the calculator's job. Lodged Performance carries the historical
+  value; Calculated SAP10 Performance is current-spec only. The
+  Validation Cohort filter operationalises this — older certs are
+  out of the validation loop, not because they're "wrong" but because
+  they're a different spec's output.
+- **Re-deriving cert-cal at the end.** The prior session's plan. The
+  cert-calibration layer is deleted in P2, not re-fit.

 ---

@ -208,22 +360,49 @@ For each table, formula, footnote, exception:
 3. Are there spec-defined edge cases / footnotes we're missing?

 ### 5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour.
+- Write a failing unit test that asserts the spec-correct behaviour
+  — wherever possible, write it as an assertion on `intermediate`
+  values rather than on aggregate SAP, using a BRE worked example
+  if one covers the section.
 - Implement the fix.
- Run **all 7 golden fixtures** plus the broader probe. Note both
-  direction and magnitude of change.
- If the fix is spec-correct but breaks a golden fixture, this is
-  evidence that the fixture was a compensating-error case — proceed
-  with the spec-correct fix and update the fixture (with a comment
-  noting it was a compensating case).
- Commit per-slice as before: one section → one commit. Reference the
-  spec section in the commit message.
+- Run `test_bre_worked_examples.py` plus the Validation Cohort
+  probe. Note both direction and magnitude of change.
+- If a BRE worked-example breaks, the new code is wrong (revert).
+  BRE examples are spec-derived and cannot regress from a
+  spec-correct change.
+- Commit per-slice: one section → one commit. Reference the spec
+  section in the commit message.

-### 5.5. Use trace mode when you need it
-ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
-that was never populated. Adding this is highly recommended for the
-systematic pass — each section's verification benefits from
-inspecting the intermediate values. See §11 below for a sketch.
+### 5.5. Sweep-time principle: worksheet-faithful structure
+
+Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet
+structure for its section. As you verify a section, also restructure
+its module so that:
+
+1. **Each function name references its worksheet-line origin** (e.g.
+   `heat_transfer_coefficient` aligns with worksheet line (40);
+   `mean_internal_temperature` aligns with worksheet line (93)).
+2. **Compound calculations are split** into one function per
+   worksheet line where possible — easier to verify against
+   `intermediate[...]` and against BRE worked-example values.
+3. **Defensive type-handling disappears**. Once P6 lands, the input
+   is a typed enum or numeric — branching on `isinstance(x, int)` is
+   replaced by enum dispatch.
+4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind ==
+   BuildingPartKind.MAIN_DWELLING` replaces string sniffing of
+   `identifier`. The dimensions.py "unnecessarily complicated"
+   pattern Khalim flagged is the canonical example of what *not*
+   to do.
+
+The principle applies during section-sweep slices. It is **not**
+a separate prerequisite — the refactor lands with the verification
+slice for the section it touches.
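
Points 3 and 4 can be illustrated with a small sketch: parse `identifier` once at the mapper boundary, then work with an enum everywhere downstream. `BuildingPartKind.MAIN_DWELLING` is named above; the `EXTENSION` member, the function `parse_building_part_kind`, and the exact lodged strings ("Main Dwelling", "Extension 1") are assumptions based on the comment in `dimensions.py`, not on codes.csv.

```python
from enum import Enum


class BuildingPartKind(Enum):
    MAIN_DWELLING = "main_dwelling"
    EXTENSION = "extension"  # assumed member; not confirmed by the source


def parse_building_part_kind(identifier: str) -> BuildingPartKind:
    """Convert the lodged identifier string into a typed kind, once.

    Raising on unknown input (rather than guessing) keeps the
    worksheet modules free of defensive string handling.
    """
    normalised = identifier.strip().lower()
    if normalised == "main dwelling":
        return BuildingPartKind.MAIN_DWELLING
    if normalised.startswith("extension"):
        return BuildingPartKind.EXTENSION
    raise ValueError(f"Unrecognised building-part identifier: {identifier!r}")


# Downstream code compares enum members instead of sniffing strings.
kinds = [parse_building_part_kind(s) for s in ("Main Dwelling", "Extension 1")]
extension_count = sum(1 for k in kinds if k is BuildingPartKind.EXTENSION)
```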
+
+### 5.6. Use trace mode when you need it
+P5 populates `SapResult.intermediate: dict[str, float]` with every
+named SAP 10.2 worksheet variable. Each section's verification
+benefits from inspecting these values per-cert. See §11 below for
+the sketch.

 ---

@ -350,59 +529,60 @@ touched and what the current state is.

 ---

-## 7. The cert-calibration vs spec-correctness tension
+## 7. The cert-calibration "tension" is dissolved (per ADR-0010)

-This is THE central architectural decision you have to make as you
-work through the spec.
+This section originally framed cert-calibration vs spec-correctness as
+two end-states the calculator had to choose between. That framing is
+wrong (see §3 for the actual diagnosis): the cert-cal values are
+pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2.
+Once the corpus is filtered to the Validation Cohort (P3) and the
+cert-cal layer is deleted (P2), the false dichotomy disappears.

-### Two tables of fuel prices
- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec
-  values (3.64p gas, 16.49p standard elec).
- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`
-  — empirically lower values (3.48p gas, 13.19p elec) that match the
-  cert assessor software's output.
+### What replaces this section

-### Two possible end states for the calculator
+- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP
+  10.2 14-03-2025 amendment, CO2 factors corrected per P2).
+- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every
+  cert lodged on the calculator's target spec version.
+- **One verification mechanism.** Trace-mode intermediates + BRE
+  worked-example unit tests for per-section verification; Validation
+  Cohort probe MAE for aggregate go/no-go.

-**End state A — Spec-perfect.** Use spec prices, apply every spec rule
-(standing charges, Table 12a fractions, combi zero-loss, etc.). The
-calculator output is then what a *correct SAP 10.2 implementation*
-would produce. SAP MAE against the corpus will likely worsen because
-Elmhurst doesn't perfectly implement spec.
-
-**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce
-Elmhurst's deviations exactly. The calculator output matches cert
-SAP scores. The calculator becomes a "reverse-engineered Elmhurst
-clone" rather than a SAP 10.2 implementation.
-
-### The pragmatic recommendation
-
-**Aim for state A but track state B as the parity probe.** Concretely:
-
-1. Verify each spec section in isolation; fix spec violations
-   regardless of MAE impact, but commit each fix WITH a measured
-   probe delta in the commit message.
-2. After the spec sweep is complete, the calculator's output is
-   spec-correct. The corpus residual at that point is Elmhurst's
-   deviation from spec.
-3. THEN re-derive the cert-calibration prices to match Elmhurst's
-   deviation pattern. The calibration becomes a thin Elmhurst-
-   compatibility layer on top of a spec-correct engine.
-
-This avoids the whack-a-mole problem because state A is unambiguous:
-each fix is either spec-correct or not. State B is iterative on top
-of state A, not entangled with it.
+Cert-software deviations from spec, if they exist at all, are
+expected to be small and localised. They surface as residual after
+the spec sweep completes against a clean probe — and at that point
+the question is whether to chase them at all (Elmhurst-deviation
+fixes have low domain value compared to spec-correctness, given the
+calculator's product use case is scoring counterfactuals for the
+MeasureApplicator chain, not reproducing historical certs).

 ---

 ## 7b. Outstanding findings to pick up during the systematic pass

 The prior session identified several spec-correct fixes that were
-**reverted because they made SAP MAE worse against the corpus, but the
-spec basis is unambiguous and the fixes WILL be the right answer once
-the cert-calibration is re-derived against a clean engine.** Treat
-these as TODOs the systematic pass should encounter when it reaches
-the relevant section. They're listed here so the work isn't lost.
+reverted because they made SAP MAE worse against the **full corpus**.
+The empirical signal that "reverted" them was version-mixture noise
+(see §3) plus compensating-error breakage in the 7 retired golden
+fixtures. Each fix below is **expected to land cleanly** once the
+six prerequisites in §2.5 are done, because:
+
+- The Validation Cohort (P3) is on a single spec version — the price
+  mismatch that drove the bias regression on standing charges and
+  cat=10 routing disappears.
+- The cert-cal layer is gone (P2) — no calibration to "break".
+- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals
+  that dominated per-cert MAE collapse before any of these findings
+  even matter.
+- The fixtures are now BRE worked examples (P5 + §10) — they cannot
+  be broken by spec-correct changes because they are themselves
+  derived from the spec.
+
+Treat each finding as a section-sweep TODO. The empirical impacts
+below were measured against the **dirty probe** (full corpus, cert-cal,
+no PCDB) and are **not predictive** of behaviour on the clean probe.
+Re-measure each fix against the Validation Cohort after prerequisites
+land.

 ### Finding 1 — HW cylinder zero-loss rule for combi boilers
 **Status**: spec-correct fix exists in working-tree-only form
@ -587,32 +767,45 @@ direction.

 ## 8. Don't repeat — known dead-ends

+> **Re-read after §3 + §7b.** Three entries below were classified as
+> "dead-ends because cert-cal absorbs" — that diagnosis is wrong.
+> They are spec-correct fixes that were measured under a noisy probe.
+> Now flagged as **conditional dead-ends**: dead only if you try them
+> before P1–P5 land. After prerequisites: they are expected
+> improvements, not dead-ends. See ADR-0010.
+
 - ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
  over-corrected because it routed to the (Unfilled cavity, 50mm) row
  instead of the dedicated Filled cavity row. The right fix landed in
  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
 - ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
  (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
-  intentionally conservative; PCDB is needed for real efficiency.
- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor
-  uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
-  Use `cert_calibration_prices()` for the probe.
+  intentionally conservative; PCDB (P4 prerequisite) supplies the
+  real efficiency.
+- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under
+  the dirty probe, cert-cal prices fit better. **Inverts under the
+  clean probe (P2 + P3): SAP 10.2 spec prices are correct because the
+  Validation Cohort is on the 14-03-2025 amendment.** Listed here
+  only as a warning if you start the sweep before prerequisites land.
 - ❌ **Always applying 10% secondary heating** — must be conditional on
  cert lodging or main system being electric storage (S-B20). See
  spec Appendix A.4.
 - ❌ **Respecting `main_heating_fraction` for secondary allocation**
  (failed S-B30) — the field is the multi-main allocation (system 1 vs
  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
-  spec-correct per Table 12a but inverts bias direction. Cert-cal
-  calibration absorbs the deviation.
- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) —
-  spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
-  to −2.62. Cert-cal calibration absorbs.
- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
-  S-B32) — spec-correct per Table 2 + Table 3 footers and drops PE
-  MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden
-  fixtures. Decision deferred to systematic pass.
+- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
+  spec-correct per Table 12a. The bias inversion under the dirty
+  probe was driven by cert-cal compensating; on the clean probe this
+  is just spec-correct. Land as part of the §12 spec sweep after
+  prerequisites.
+- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) —
+  spec-correct per Table 12 note (a). Same logic: bias drift under
+  dirty probe is version-mixture + missing-PCDB noise, not Elmhurst
+  deviation. Land as part of §12 spec sweep.
+- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted
+  S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE
+  regression was driven by the now-retired golden fixtures (§10) and
+  cert-cal absorption. Land as part of §4 / Appendix J sweep.

 ---

@ -620,10 +813,15 @@ direction.

 ### Sample
 `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
-250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
+250k-cert parquet. **After P1 lands** the parquet carries
+`inspection_date`; the probe then filters to the **Validation Cohort**
+(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and
 samples 300 at seed=7 by default. Filtering rationale:
- ≤ 5 is heritage/anomaly stock (sub-3% of corpus)
+- ≤ 5 is heritage/anomaly stock (sub-3 % of corpus)
 - ≥ 99 is full-SAP new-builds the parquet excludes anyway
+- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on
+  SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md)
+  / "Validation Cohort" and ADR-0010 §3.
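
The filtering and sampling described above can be sketched with the standard library alone. This is not the probe's actual code: the dict-based cert rows, the function names, and the inclusive reading of `sap_score ∈ [5, 99]` are assumptions; only the cutoff date, the score range, the sample size of 300, and seed 7 come from this section.

```python
import random
from datetime import date

COHORT_START = date(2025, 7, 1)


def in_validation_cohort(cert: dict) -> bool:
    """True when the cert passes both the date and the score filter.

    Bounds are treated as inclusive here, which is an assumption.
    """
    return (
        cert["inspection_date"] >= COHORT_START
        and 5 <= cert["sap_score"] <= 99
    )


def sample_probe(certs: list, n: int = 300, seed: int = 7) -> list:
    """Filter to the cohort, then draw a reproducible sample."""
    cohort = [c for c in certs if in_validation_cohort(c)]
    rng = random.Random(seed)
    return rng.sample(cohort, min(n, len(cohort)))


# Synthetic rows for illustration; not corpus data.
demo_certs = [
    {"id": "a", "inspection_date": date(2025, 6, 30), "sap_score": 70},
    {"id": "b", "inspection_date": date(2025, 7, 1), "sap_score": 70},
    {"id": "c", "inspection_date": date(2025, 9, 1), "sap_score": 3},
]
demo_probe_ids = [c["id"] for c in sample_probe(demo_certs)]
```

Only row "b" survives: "a" predates the cutoff and "c" falls below the score range.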

 ### Run the probe
 ```bash
@ -654,38 +852,58 @@ main(['300','7'])

 ---

-## 10. The 7 golden fixtures
+## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5)

+The 7 cert-based fixtures at
 `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
-locks 7 corpus certs as regression anchors:
+were locked in against the current calculator state — *with* cert-cal,
+*without* PCDB, *with* HW cylinder loss always applied, *with* the
+lighting heuristic, etc. They are documented in §3 / the prior
+handover as containing compensating errors. Once the prerequisites
+land, every spec-correct fix breaks at least one of them. They will
+fight the spec sweep.

-| Cert | TFA | Cat | Notes |
-|---|---|---|---|
-| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
-| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
-| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
-| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
-| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
-| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
-| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
+### Replacement strategy

-Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as
-the spec sweep progresses.**
+**Primary regression suite: BRE worked-example fixtures.**

-The cert JSONs are stored under `fixtures/golden/<cert>.json` —
-frozen at extraction time so the test is reproducible without
-bulk-zip access. The probe extraction script for new fixtures is
-inlined in the test history (see commit `f4a8d2a0`).
+Transcribe the worked examples from:
+- SAP 10.2 spec appendices (especially Appendix R — reference values
+  and the worked example dwelling).
+- RdSAP 10 (10-06-2025) worked-example annex.

-**Important caveat**: some of these 7 are compensating-error matches
-(see §3). When a spec-correct slice breaks one, the fixture is
-probably the compensating case — investigate before reverting.
+Each worked example becomes a unit test that locks **per-intermediate
+expected values** (HLP, HTC, monthly mean internal temperature (MIT),
+ECF, SAP score) rather than the aggregate SAP score alone. Because
+they are spec-derived, no spec-correct change can break them — any
+break is an implementation bug, unambiguously.
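
A per-intermediate comparison could look like the sketch below. The helper name `diverging_intermediates`, the key names, the tolerance, and every number are placeholders chosen for illustration; real expected values would have to be transcribed from the worked examples.

```python
import math


def diverging_intermediates(
    actual: dict,
    expected: dict,
    rel_tol: float = 1e-3,
) -> dict:
    """Return {name: (actual, expected)} for each expected value that
    is missing from `actual` or differs beyond the relative tolerance."""
    failures = {}
    for name, want in expected.items():
        got = actual.get(name, math.nan)  # NaN never compares as close
        if not math.isclose(got, want, rel_tol=rel_tol):
            failures[name] = (got, want)
    return failures


# Placeholder values only; NOT taken from any BRE worked example.
demo_expected = {"htc": 250.0, "hlp": 2.5}
demo_actual = {"htc": 250.1, "hlp": 2.9}
demo_failures = diverging_intermediates(demo_actual, demo_expected)
```

Reporting every diverging name at once, rather than stopping at the first failed assertion, shows which worksheet line the error first enters at.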
+
+These tests live at
+`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py`
+(new module — separate from the cert-based fixtures module).
+
+**Cert-based fixtures retired.**
+
+The current `test_golden_fixtures.py` is either deleted or repurposed
+as a *very loose* smoke-test integration suite (e.g. `|SAP residual|
+≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs
+under `fixtures/golden/<cert>.json` can be kept on disk as reference
+data, but they no longer drive go/no-go decisions in the sweep.
+
+**Optional future addition.**
+
+If/when a current Elmhurst (or Stroma / Quidos / NHER) license is
+available, run a handful of representative corpus certs through it
+and lock those outputs as a second-tier regression suite — Elmhurst-
+parity fixtures alongside spec-parity fixtures. Not a prerequisite.

 ---

-## 11. Trace mode (recommended infrastructure)
+## 11. Trace mode (prerequisite P5 — implementation sketch)

-ADR-0009 proposed:
+This section was originally labelled "recommended"; it is now
+**prerequisite P5** per ADR-0010. The sweep does not start until
+`intermediate` is populated everywhere. ADR-0009 proposed:
 ```python
@dataclass(frozen=True)
 class SapResult:
@ -765,14 +983,39 @@ This single session should produce zero behaviour changes if §1-3 are
 correctly implemented, but expect to find at least one issue in §3
 geometry (per the reviewer's "biggest SAP error sources" list).

-Run the golden fixtures + probe at the end of each session; expect no
-movement until you start hitting actual gaps.
+**Important:** Session 1 only starts after all six prerequisites in
+§2.5 have landed and the Validation Cohort probe baseline has been
+captured. Until then, running per-section verification produces noisy
+signal.
+
+Run the BRE worked-example fixtures (P5) + Validation Cohort probe
+(P3) at the end of each session; expect no movement until you start
+hitting actual gaps.

 ---

 ## 13. Workflow recap

-For each section, in order:
+**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency
+order:
+
+| | Slice | Depends on |
+|---|---|---|
+| P1 | Re-extract parquet with `inspection_date` | — |
+| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — |
+| P3 | Filter parity probe to Validation Cohort | P1 |
+| P4 | Implement `PcdbLookup` | — (P2 helpful) |
+| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — |
+| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — |
+
+P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a
+Validation Cohort probe baseline once all six land — that is the new
+MAE starting line. Repo-wide tests must stay green throughout P6:
+Site Notes consumers, the ML pipeline, recommendations, etc. all need
+the mapper updates that accompany each typing change.
+
+**Phase 1 — Section sweep.** For each RdSAP 10 section, in document
+order:

 1. Read the spec section text + cited tables.
 2. Identify code location(s).
@ -780,22 +1023,36 @@ For each section, in order:
   - Does our code implement it?
   - Does the implementation match?
   - Edge cases / fallback paths handled?
-4. For each gap: AAA unit test → minimal implementation → commit.
-5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`)
-   and the parity probe. Note both deltas in the commit message.
-6. If a golden fixture breaks: investigate. Either fixture was a
-   compensating case (acceptable to break) or the new code is wrong
-   (revert).
+4. For each gap: AAA unit test (preferring a BRE worked-example
+   assertion on `intermediate` values when possible) → minimal
+   implementation → commit.
+5. **Apply the worksheet-faithful structure principle** (§5.5) as
+   part of this slice: name functions after worksheet lines, split
+   compound calculations, replace any remaining defensive
+   type-handling with typed-enum dispatch.
+6. After each commit: run `test_bre_worked_examples.py` + Validation
+   Cohort probe. Note both deltas in the commit message.
+7. If a BRE worked-example breaks: the new code is wrong (revert).
+   The worked examples are spec-derived and cannot be broken by
+   spec-correct changes.
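
For step 6, the probe deltas can be formatted for the commit message with a helper along these lines. It is a sketch under stated assumptions: the function names are hypothetical, the residual sign convention (calculated minus lodged) is assumed, and the scores are synthetic.

```python
from statistics import fmean


def mae_and_bias(calculated: list, lodged: list) -> tuple:
    """Mean absolute error and signed mean residual.

    Residual = calculated - lodged (an assumed sign convention).
    """
    residuals = [c - l for c, l in zip(calculated, lodged)]
    return fmean([abs(r) for r in residuals]), fmean(residuals)


def commit_note(before: tuple, after: tuple) -> str:
    """Format both probe deltas as one line for the commit message."""
    (b_mae, b_bias), (a_mae, a_bias) = before, after
    return (
        f"probe SAP MAE {b_mae:.2f} -> {a_mae:.2f}, "
        f"bias {b_bias:+.2f} -> {a_bias:+.2f}"
    )


# Synthetic scores for illustration; not corpus data.
lodged_scores = [70, 60, 50]
note = commit_note(
    mae_and_bias([72, 59, 53], lodged_scores),
    mae_and_bias([71, 60, 51], lodged_scores),
)
```

Recording bias alongside MAE matters because a fix can leave MAE flat while shifting the bias direction.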

 Stick to this. The prior session's mistake was jumping between
-sections based on residual-size. Don't.
+sections based on residual-size **on a dirty probe**. Clean probe
+plus document-order discipline plus worksheet-faithful structure is
+what makes the sweep converge.

 ---

 ## 14. Useful references

+- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md`
+  — the binding decisions reflected in this rewrite: SAP 10.2 target,
+  cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture
+  retirement. **Read first.**
 - **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
-  decision rationale + Session A/B/C plan.
+  original calculator decision rationale + Session A/B/C plan. Read
+  for context; spec-version target / PCDB sequencing / cert-cal
+  rationale are superseded by ADR-0010.
 - **Spec coverage map**
  `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
  Update as you go.
@ -817,19 +1074,19 @@ sections based on residual-size. Don't.

 ## 15. Final note

-The prior session demonstrated that **moving SAP MAE down requires
-either spec-correctness OR Elmhurst-perfect calibration, not both
-simultaneously**. The cert-cal layer absorbs Elmhurst's spec
-deviations; any spec-correct fix risks breaking it.
+The prior session's framing — *"the cert-calibration layer absorbs
+Elmhurst's spec deviations; we'll re-derive it at the end"* — was
+load-bearing on a false diagnosis. The cert-cal layer is
+pre-March-2025 SAP prices fit against a mixture distribution of two
+spec-version regimes. Once you separate the regimes (Validation
+Cohort) and use spec prices everywhere, the "tension" disappears.

-The systematic pass clears this by separating the layers:
-1. Build the spec-correct engine first.
-2. Re-fit the cert-cal compatibility layer once at the end.
+After P1–P5 land, the section sweep is straightforward: every
+spec-correct fix is unambiguously the right answer, BRE
+worked-example fixtures lock the result, and Validation Cohort probe
+MAE moves monotonically downward. The fixes the prior session marked
+as "spec-correct but probe-regressed" become trivially landable.

-Don't be discouraged when SAP MAE rises temporarily during the spec
-sweep. PE residual is the truer signal of engine correctness. SAP
-MAE convergence will follow once cert-cal is re-derived against the
-clean engine.
-
-**Welcome to the project. Read the spec, follow the order, commit one
-section at a time. The deterministic answer is in there.**
+**Welcome to the project. Read ADR-0010, land the six prerequisites,
+then walk the spec in document order. The deterministic answer is in
+there.**
--- a/packages/domain/src/domain/sap/worksheet/dimensions.py
+++ b/packages/domain/src/domain/sap/worksheet/dimensions.py
@ -21,7 +21,6 @@ from typing import Final

 from datatypes.epc.domain.epc_property_data import EpcPropertyData, SapBuildingPart

-
 _DEFAULT_STOREY_HEIGHT_M: Final[float] = 2.5


@ -72,6 +71,16 @@ def dimensions_from_cert(epc: EpcPropertyData) -> Dimensions:
    """Build the `Dimensions` aggregate from an EpcPropertyData."""
    parts = epc.sap_building_parts or []

+    # Khalim Comments - this section seems to implement the
+    # worksheet section on page 132 and is unnecessarily
+    # complicated. The sap building parts are pre-ordered, from
+    # the main building part to the extensions, and the
+    # "identifier" field tells us if the part is the Main Dwelling
+    # or if it's an extension. E.g. if it's an extension, identifier
+    # should be "Extension 1".
+    # We should strictly type the values on the EpcPropertyData
+    # domain model.
+
    ground_area = 0.0
    ground_perim = 0.0
    top_area = 0.0