From 49e8c65ae88eaa8659e8ffd1cee45ad700a4f227 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar <kconnkowlessar@gmail.com>
Date: Wed, 20 May 2026 13:03:09 +0000
Subject: [PATCH] =?UTF-8?q?Handover:=20replace=20stale=20docs=20with=20foc?=
 =?UTF-8?q?used=20=C2=A73-close=20+=20Table-11=20brief?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Delete HANDOVER_FRESH_REVIEW (22-slice, MAE-5.34 era) and
HANDOVER_SYSTEMATIC_REVIEW (pre-Elmhurst-conformance). Both described
a state the Elmhurst worksheet work has since superseded.

Add HANDOVER_S3_CLOSE.md with:
- Accurate §3 status: §1/§2 fully done; LINE_31/LINE_36 exact for
  non-RR fixtures; LINE_33 gap diagnosed as missing floor_construction
  codes (not a window-area problem as previously assumed)
- Concrete investigation steps to close LINE_33 for 000474 + 000490
- Table 11 Secondary Heating framed as next slice after §3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 docs/sap-spec/HANDOVER_FRESH_REVIEW.md      |  136 ---
 docs/sap-spec/HANDOVER_S3_CLOSE.md          |  149 +++
 docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md | 1092 -------------------
 3 files changed, 149 insertions(+), 1228 deletions(-)
 delete mode 100644 docs/sap-spec/HANDOVER_FRESH_REVIEW.md
 create mode 100644 docs/sap-spec/HANDOVER_S3_CLOSE.md
 delete mode 100644 docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md

diff --git a/docs/sap-spec/HANDOVER_FRESH_REVIEW.md b/docs/sap-spec/HANDOVER_FRESH_REVIEW.md
deleted file mode 100644
index 6f9ca0b5..00000000
--- a/docs/sap-spec/HANDOVER_FRESH_REVIEW.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Handover: fresh-context review of the SAP 10.2 calculator
-
-Audience: a fresh agent in a new context window. Read this first, then the SAP 10.2 + RdSAP 10 spec PDFs, then the calculator code. Your job is to find spec-vs-implementation gaps that the previous (long-context) agent has missed or got wrong.
-
-## TL;DR — where we are
-
-- Deterministic SAP 10.2 calculator at `packages/domain/src/domain/sap/`.
-- 22 slices shipped under ADR-0009.
-- 300-cert parity probe: **SAP MAE 5.34, bias +0.29** (we're slightly over-predicting SAP score on average).
-- **Primary-energy bias +51.6 kWh/m²** ← biggest surprise; we over-predict primary energy by ~50%. This was discovered just before this handover; previous slices weren't accounting for it correctly.
-- 17/300 (5.7%) certs match the cert's `energy_rating_current` exactly.
-
-Goal per ADR-0009: typical-subset SAP MAE ≤ 1.0.
-
-## Critical context
-
-1. **Two truth-sources collide.** `tables/table_12.py` carries the spec-correct SAP 10.2/10.3 prices (mains gas 3.64p, std elec 16.49p). `tables/table_12_cert_calibration.py` carries the empirical lower prices that match the cert assessor's actual output (3.48p, 13.19p). The parity probe uses the cert-calibration table; the engine's default is spec.
-2. **The cert assessor diverges from the published SAP 10.2 spec in several places** we've found:
-   - Unit prices: cert uses ~10-25% lower than published Table 12
-   - Tariff routing: cert applies off-peak to electric room heaters (code 691) when meter_type=1 (Dual), even when Table 12a says these should bill at the high rate
-   - Unknown meter (RdSAP energy_tariff=3): cert defaults to Single (per Elmhurst test), our code also matches this
-3. **PEUI bias was discovered right at handover time.** Our `primary_energy_kwh_per_m2` runs +51 kWh/m² over the cert's `energy_consumption_current`. This is the biggest clue and the most efficient next dig.
-
-## Repo layout
-
-```
-packages/domain/src/domain/sap/
-├── calculator.py                    # Sap10Calculator + calculate_sap_from_inputs
-├── tables/
-│   ├── table_12.py                  # SAP 10.2 spec prices, CO2, PEF
-│   └── table_12_cert_calibration.py # empirical cert prices
-├── worksheet/
-│   ├── dimensions.py                # §1
-│   ├── ventilation.py               # §2 (incl wind shelter S-B21)
-│   ├── heat_transmission.py         # §3 (incl DwellingExposure)
-│   ├── internal_gains.py            # §5 + Appendix L
-│   ├── solar_gains.py               # §6 + Appendix U §U3.2
-│   ├── utilisation_factor.py        # Table 9a
-│   ├── mean_internal_temperature.py # §7 + Table 9/9b/9c
-│   ├── space_heating.py             # §9
-│   └── rating.py                    # §13 (SAP rating equations)
-├── climate/
-│   └── appendix_u.py                # Tables U1/U2/U3 + solar declination
-├── rdsap/
-│   └── cert_to_inputs.py            # EpcPropertyData → CalculatorInputs mapping
-├── validation/
-│   └── parity_report.py             # ParityReport aggregator
-└── tests/                           # 103 unit tests
-
-services/ml_training_data/src/ml_training_data/
-└── sap_parity_probe.py              # runs calculator on N random certs from corpus
-
-docs/sap-spec/
-├── sap-10-2-full-specification-2025-03-14.pdf  (199pp) — primary spec
-├── sap-10-3-full-specification-2026-01-13.pdf  (201pp) — newer spec (Table 12 identical)
-├── rdsap-10-specification-2025-06-10.pdf       (114pp) — RdSAP rules (separate from SAP)
-├── SPEC_COVERAGE.md                            — our coverage map
-└── PARITY_FINDINGS.md                          — earlier probe findings
-
-docs/adr/0009-deterministic-sap-calculator.md   — accepted ADR
-```
-
-## How to run the parity probe
-
-```bash
-python -c "
-import sys
-sys.path.insert(0, 'packages/domain/src')
-sys.path.insert(0, '.')
-sys.path.insert(0, 'services/ml_training_data/src')
-from ml_training_data.sap_parity_probe import main
-main(['300','7'])  # 300 certs, seed=7
-"
-```
-
-## Where to dig (priority-ordered, by likely MAE impact)
-
-### Tier 1 — the PEUI mystery (50% over)
-
-Our `primary_energy_kwh_per_m2` runs +51 kWh/m² over the cert's `energy_consumption_current`. Possibilities:
-
-- **Wrong primary energy factors in `tables/table_12.py PRIMARY_ENERGY_FACTOR`**. I populated this from approximate spec values; verify each one against SAP 10.2 Table 12 (page 189). Especially electricity PEF=1.501 — that's ~30% of corpus uses electricity for some end-use.
-- **HW demand over-counted.** Look at `domain.ml.demand.predicted_hot_water_kwh`. Cylinder loss + primary circuit loss may be over-stated. SAP §J + Appendix J details exact formulas. We use bucket-rounded `_STORAGE_LOSS_FACTOR` instead of interpolation.
-- **Space heating demand over-counted.** Could come from:
-  - Living-area-fraction defaults (Table 27): we use {1:0.75, 2:0.50, 3:0.30, 4:0.25, ≥5:0.21}; double-check against the RdSAP 10 PDF.
-  - Control-temperature adjustment (Table 4e): we always pass 0; spec applies ~-0.7°C in some configurations.
-  - Thermal mass parameter: we use 250 kJ/m²K always; spec varies by construction type.
-- **Lighting/pumps over-counted.** Currently using Appendix L existing-dwelling fallback (no fixed lighting). Newer dwellings should use lower lighting energy.
-
-### Tier 2 — wall U-value cascade
-
-Worst-residual certs have `wall_construction=4 (cavity)`, `wall_insulation_type=2`, `wall_insulation_thickness="NI"`. We treat as uninsulated cavity (column 0). Cert assessor may know it's insulated (the type=2 code says so). See `domain.ml.rdsap_uvalues._insulation_bucket` — when `thickness=0` AND `present=True`, spec says use 50mm row but our parser converts "NI"→0 which short-circuits to "uninsulated".
-
-I tried switching "NI"→None in S-B5 cycle but it over-corrected aggregate MAE. Worth re-trying with the new understanding (compare PRIMARY energy delta on affected certs specifically).
-
-### Tier 3 — cost-side residuals
-
-Per S-B17 hand-trace: cert 2389-4472 has correct delivered energy but our SAP is 10 points lower than the cert's. Implied cert blended unit-cost rate is lower than ours. Likely cause: cert assessor applies different rate logic in edge cases (oil + off-peak meter, electricity-and-gas mix, etc.). Worth tracing more carefully.
-
-### Tier 4 — known unimplemented spec pieces
-
-(per `SPEC_COVERAGE.md`)
-- Cooling §10 (rare)
-- FEE §11 (new-build only)
-- Per-junction thermal bridging Table R2 (ADR says defer)
-- Multi-main heating Table 11 with non-zero secondary (we have this conditionally)
-- Standing charges (Table 12 note (a))
-
-## What's been validated
-
-- §13 SAP rating equations: 108.8 − 120.5 log10(ECF) for ECF ≥ 3.5, else 100 − 16.21·ECF. Verified against SAP 10.2 PDF page 38.
-- §12.2 fuel price rule: "Other prices must not be used". We have spec-correct prices + cert-calibration prices as separate tables.
-- Appendix U: tables verbatim.
-- Appendix U rating-uses-UK-average rule: applied (S-B18).
-- Solar gains §6.1 + Appendix U §U3.2 polynomial: implemented.
-
-## Suggested first session
-
-1. **Read SAP 10.2 §§4 + Appendix J carefully** (hot water demand). Map every formula against our `domain.ml.demand.predicted_hot_water_kwh`. Note divergences. The PEUI bias is largely driven by HW + heating demand.
-2. **Read SAP 10.2 §14** (CO2 and primary energy). Compare to our `calculate_sap_from_inputs` primary_energy aggregation. Note especially: does the cert's `energy_consumption_current` use the same end-use list (space + HW + lighting + pumps/fans) or a different one?
-3. **Read RdSAP 10 §11 (Heating)**. Check our `domain.ml.sap_efficiencies.seasonal_efficiency` cascade against the RdSAP rules. Especially heat pump efficiency (we use 2.30 for category 4 fallback).
-4. Open issues in the parity-decomp data:
-   - 26 certs with correct energy but SAP MAE 4.12 → cost-side
-   - 51 kWh/m² primary-energy bias → demand-side
-
-## Don't repeat these dead-ends
-
-- ❌ Switching "NI" wall thickness to None — over-corrected in aggregate (S-B5)
-- ❌ Aggressive efficiency rescue for missing sap_main_heating_code — over-corrected (S-B5)
-- ❌ Using SAP 10.2 spec prices for parity validation — the cert assessor uses legacy lower prices despite reporting sap_version=10.2 (S-B9, S-B10)
-- ❌ Applying off-peak to electric main heating regardless of meter_type — the meter_type field is the truth (S-B15)
-- ❌ Always applying 10% secondary heating — should be conditional on cert lodging or main system being electric storage (S-B20)
-
-## Commit history
-
-The last 22 commits are S-B1..S-B22. Each commit message documents the slice's hypothesis, change, and measured impact. Worth reading 5-10 of the latest commit messages for context on what's been tried.
diff --git a/docs/sap-spec/HANDOVER_S3_CLOSE.md b/docs/sap-spec/HANDOVER_S3_CLOSE.md
new file mode 100644
index 00000000..228c065c
--- /dev/null
+++ b/docs/sap-spec/HANDOVER_S3_CLOSE.md
@@ -0,0 +1,149 @@
+# Handover — Close §3, then Table 11 Secondary Heating
+
+**Audience:** Fresh agent continuing the deterministic SAP 10.2 calculator
+(`packages/domain/src/domain/sap/`). Read this document first, then skim
+the files listed under "Key files to read" below.
+
+---
+
+## What we're building
+
+A deterministic SAP 10.2 calculator that replicates cert-software output
+(Elmhurst, Stroma, etc.) exactly for RdSAP 10 input certs. The domain
+concept is **Calculated SAP10 Performance** — see
+`docs/adr/0009-deterministic-sap-calculator.md`. Progress is tracked in
+`docs/sap-spec/SPEC_COVERAGE.md`.
+
+The workflow is strict TDD: **one failing test → minimal implementation →
+commit**. Each commit is one slice.
+
+---
+
+## Current state
+
+### §1 Dimensions — DONE
+All 6 Elmhurst fixtures pass exactly (`test_section_1_matches_elmhurst_worksheet`).
+
+### §2 Ventilation — DONE
+All 6 Elmhurst fixtures pass exactly (`test_section_2_matches_elmhurst_worksheet`).
+
+### §3 Heat transmission — PARTIALLY DONE
+
+What passes today:
+- **Internal invariants** (all 6 fixtures): `(33) = Σ per-element`,
+  `(37) = (33) + (36)`.
+- **Exact LINE_31 + LINE_36** (non-RR fixtures 000474 and 000490 only):
+  `test_section_3_non_rr_line_31_and_36_match_elmhurst_worksheet`.
+
+What does NOT yet pass:
+- **Exact LINE_33** (fabric heat loss) for any fixture. This is the
+  remaining §3 close task (see below).
+- **RR sub-areas** (fixtures 000487, 000480, 000477, 000516): gable/
+  slope/stud-wall/flat-ceiling areas are not in `SapRoomInRoof`; these
+  fixtures are **formally deferred** — see gap notes in
+  `test_section_3_partial_match_against_elmhurst_worksheet`.
+
+---
+
+## Task A — Close LINE_33 for non-RR fixtures (investigation slice)
+
+**Goal:** assert exact LINE_33 and LINE_37 for 000474 and 000490.
+
+### The diagnostic gap
+
+Running `heat_transmission_from_cert(epc, window_total_area_m2=0, door_count=actual)` on
+000474 gives `fabric = 193.83 W/K`. The Elmhurst `LINE_33 = 209.11 W/K`.
+The gap is +15.28 W/K, and the missing window area cannot explain it:
+`u_wall (1.5) > u_window_eff (1.33)`, so adding windows (which displace
+wall area) would *decrease* fabric heat loss, not increase it.
+
+The gap is therefore in one or more of the other elements. Most likely
+culprits, in priority order:
+
+1. **Floor construction missing from fixture.**
+   `SapFloorDimension.floor_construction` is `None` in all Elmhurst
+   fixture files (field not set). Our `u_floor` fallback may not match
+   the Elmhurst value. The 000490 fixture comment records the expected
+   U-values explicitly: *"suspended timber ground floor on main (U=0.71),
+   exposed timber floor on Extension 1 (U=1.20)"*. Set the correct
+   `floor_construction` and `floor_insulation` codes on each
+   `SapFloorDimension` and see if the gap closes.
+
+2. **Roof construction / insulation thickness missing from fixture.**
+   Similarly, `roof_insulation_thickness` may not be set on the building
+   parts. The Elmhurst cert will have a specific roof type and insulation
+   depth that drives a specific `u_roof`.
+
+3. **Wall insulation re-check.** All fixtures use `wall_insulation_type=4`
+   (`_WALL_INSULATION_NONE`), giving `u_wall = 1.5` for cavity age B.
+   Confirm this matches the actual Elmhurst worksheet row.
+
+### How to proceed
+
+1. Read the EPC API field encoding for `floor_construction` and
+   `floor_insulation` in `datatypes/epc/domain/epc_property_data.py`
+   and `packages/domain/src/domain/ml/rdsap_uvalues.py` (the `u_floor`
+   function + its construction constants).
+2. Look up the actual floor type for 000474 and 000490 from the PDF
+   (ask the user — PDFs were supplied manually; not stored in repo).
+3. Set `floor_construction` + `floor_insulation` + `floor_insulation_thickness`
+   on the `SapFloorDimension` objects in the fixture files.
+4. Re-run the debug calc (`r0.fabric` with `window_total_area_m2=0`) and
+   check whether the gap collapses.
+5. Once floor/roof are resolved, back-calculate window area:
+   `A_w = (LINE_33 - r0.fabric) / (window_u_eff - u_wall)`.
+   The denominator is negative, so `r0.fabric` must now exceed LINE_33 for
+   `A_w` to be positive and plausible (5–15 m² for a 2-storey terrace).
+6. Add `WINDOW_TOTAL_AREA_M2: float` and `WINDOW_AVG_U_VALUE: float = 1.4`
+   constants to each non-RR fixture file.
+7. Write a new parametrised test asserting exact LINE_33 and LINE_37 for
+   000474 and 000490. Commit as one slice.
+
+---
+
+## Task B — Table 11 Secondary Heating (highest-MAE-impact gap)
+
+Per `SPEC_COVERAGE.md`, this is the **next priority after §3**.
+
+Most boiler-main certs allocate ~10 % of space heating to a secondary
+system (electric room heater or similar). We currently model 0 %. This
+causes a systematic bias on the large majority of boiler certs.
+
+**SAP 10.2 Table 11** gives the secondary fraction keyed on main-heating
+type. **RdSAP 10 Appendix A** identifies the heating type from cert codes.
+
+Starting point: `packages/domain/src/domain/sap/calculator.py` (entry
+point) and `packages/domain/src/domain/sap/rdsap/cert_to_inputs.py`
+(cert→inputs adapter). The `SapInputs` struct carries `main_heating_*`
+fields — see how space heating demand is calculated and where a secondary
+fraction would hook in.
+
+---
+
+## Key files to read
+
+| File | Why |
+|---|---|
+| `packages/domain/src/domain/sap/worksheet/heat_transmission.py` | §3 implementation — `heat_transmission_from_cert` |
+| `packages/domain/src/domain/sap/worksheet/tests/test_heat_transmission.py` | all §3 tests including the partial Elmhurst conformance test |
+| `packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000474.py` | non-RR fixture to close |
+| `packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000490.py` | non-RR fixture to close |
+| `packages/domain/src/domain/ml/rdsap_uvalues.py` | all U-value lookups — `u_floor`, `u_wall`, `u_roof` |
+| `docs/sap-spec/SPEC_COVERAGE.md` | overall progress tracker |
+| `docs/adr/0009-deterministic-sap-calculator.md` | scope + architectural decisions |
+
+Spec PDFs are at `docs/sap-spec/` — SAP 10.2 (March 2025), SAP 10.3
+(Jan 2026), RdSAP 10 (June 2025).
+
+The canonical reference Excel worksheet is at the repo root:
+`2026-05-19-17-18 RdSap10Worksheet.xlsx`. A loader for it is at
+`packages/domain/src/domain/sap/worksheet/tests/_xlsx_loader.py`.
+
+---
+
+## Test suite
+
+```bash
+python -m pytest packages/domain/src/domain/sap/worksheet/tests/ -q
+# Should show 122 passed
+```
diff --git a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
deleted file mode 100644
index a3d23a07..00000000
--- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
+++ /dev/null
@@ -1,1092 +0,0 @@
-# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review
-
-**Audience:** A fresh agent picking up the deterministic SAP calculator at
-`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs,
-then the code.
-
-**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly
-for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical
-calculation** — not a model — so MAE should approach zero on certs whose
-inputs are fully populated.
-
----
-
-## 1. Critical framing — this is NOT a judgement call
-
-The SAP/RdSAP energy assessment splits cleanly into two roles:
-
-1. **The assessor** — a person who surveys the dwelling and lodges
-   measured/observed fields onto the cert (areas, perimeters,
-   construction codes, insulation thicknesses, fuel types, etc.).
-   The assessor makes NO calculation decisions.
-2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a
-   deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It
-   takes the lodged fields and produces SAP score, CO2 emissions,
-   primary energy (PEUI), CO2 per m², EI rating, etc.
-
-**Our calculator is replicating role #2.** Assessor software
-implements the SAP 10.2 spec faithfully; the question of "where does
-Elmhurst diverge from spec?" is no longer the operative one (per
-ADR-0010 + §3 below). Our job is to enumerate every spec
-table / formula / footnote and verify each against the published SAP
-10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs.
-
-There is no "assessor judgement" knob to tune. Each field on the cert
-has a deterministic interpretation per the spec. Each spec table /
-formula has a deterministic implementation. Our job is to enumerate
-all of them and verify each.
-
----
-
-## 2. Current state (2026-05-19)
-
-- Branch: `ara-backend-design-prd`
-- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
-- 301 tests passing
-- Parity probe (300 random certs from
-  `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7,
-  `sap_score ∈ [5, 99]`):
-
-  | Metric | Value |
-  |---|---|
-  | SAP MAE | 4.61 |
-  | SAP bias | +0.87 |
-  | PE MAE | 43.32 kWh/m² |
-  | PE bias | +37.69 kWh/m² |
-
-- 7 "golden" regression certs locked in
-  `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`.
-  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
-  caveat: some of these are compensating-error matches (e.g. cert
-  `7536-3827`'s PE matches but cost is £143 under cert's implied cost
-  due to multi-factor offsetting bugs). **These fixtures are retired
-  per ADR-0010 and §10 below — they lock buggy compensating outputs
-  in place and will fight the spec sweep.**
-
-> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md)
-> supersedes the spec-version target, the PCDB sequencing, and the
-> cert-calibration layer of ADR-0009. This handover document was
-> originally written under the rejected framing; §3, §4, §7, §7b,
-> §10 below have been rewritten in lockstep. §2.5 lists the five
-> prerequisites that land **before** the section-by-section sweep
-> starts.
-
----
-
-## 2.5. Prerequisites before the sweep starts
-
-Five blockers, in dependency order. The section sweep does not start
-until all five are merged. Together they convert the parity probe
-from a noisy mixture-distribution signal into a clean per-section
-verification tool.
-
-### P1 — Re-extract the training parquet with `inspection_date`
-
-The 250k-cert parquet has 202 columns; **none of them are dates**.
-Without `inspection_date` on each cert we cannot construct the
-Validation Cohort (P3). The ETL currently drops the dates; add them
-back as a non-breaking MINOR Feature Schema Version bump (per
-ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date`
-both exist on the domain object and are populated upstream — the
-parquet writer just needs to include them.
-
-### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12`
-
-Per ADR-0010 §2 and §1:
-- Remove `table_12_cert_calibration.py` and every call site
-  (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the
-  `PriceTable` constructor argument that defaults to it).
-- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`.
-- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors).
-- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p
-  is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025).
-
-### P3 — Filter the parity probe to the Validation Cohort
-
-`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3:
-`inspection_date ≥ 2025-07-01`. Modify
-`services/ml_training_data/src/ml_training_data/sap_parity_probe.py`
-to apply the filter before sampling. The probe sample size and seed
-remain configurable; `sap_score ∈ [5, 99]` remains the typicality
-filter on top of the cohort filter.
-
-### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`)
-
-Per ADR-0010 §4. Download boiler + heat-pump CSVs from
-https://www.ncm-pcdb.org.uk. Build a lookup keyed on
-`main_heating_index_number`. Surface seasonal efficiency, secondary
-efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day
-of work per the original handover estimate. The
-`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from
-ADR-0009 grill outcome #1 is the integration point; no calculator-side
-changes needed beyond reading `index_number` and routing PCDB-returns
-to space-heating / hot-water efficiency lookups instead of Table 4a.
-
-### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples
-
-Per ADR-0010 "Verification infrastructure":
-- Populate every named SAP 10.2 worksheet variable on
-  `SapResult.intermediate` as sketched in §11. This is mechanical —
-  thread the values from each worksheet module into the dict.
-- Transcribe the BRE worked examples from the SAP 10.2 appendices and
-  RdSAP 10 worked-example annex into unit tests
-  (`tests/test_bre_worked_examples.py`) that lock per-intermediate
-  values, not aggregate SAP. These replace the retired cert fixtures.
-
-### P6 — Strict-type `EpcPropertyData` via canonical domain enums
-
-The current `EpcPropertyData` and its nested types carry many bare
-`str` fields and `Union[int, str]` fields (the latter because the
-gov API gives ints and Site Notes give strings). The defensive
-type-handling cascades into the calculator (`cert_to_inputs.py`,
-`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented
-example: `SapBuildingPart.identifier` carries main-vs-extension
-information but is bare `str`, so the dimensions code defensively
-iterates instead of dispatching on a typed kind.
-
-The fix:
-1. **One canonical enum per field**, union of all keys appearing
-   across all schema versions in
-   `datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum
-   classes (`built_form`, `construction_age_band`, `energy_tariff`,
-   `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`,
-   `mechanical_ventilation`, `property_type`, `tenure`,
-   `transaction_type`, `ventilation_type`, `water_heating_fuel`,
-   `cylinder_insulation_thickness`, `energy_efficiency_rating`,
-   `improvement_description`, `improvement_summary`, `code`) plus
-   `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is
-   the reference; a dedup script can optionally verify coverage but
-   is not a build dependency.
-2. **The API mapper** parses raw ints into the canonical enum.
-3. **The Site Notes mapper** parses raw strings into the canonical
-   enum.
-4. **The domain object** (`EpcPropertyData` and nested) holds only
-   the canonical enums — no `Union[int, str]`, no bare `str` for
-   coded fields.
-5. **Every consumer** (calculator, ML pipeline, recommendations,
-   ETL, scenario builder) reads from the typed fields.
-
-**Constraint**: repo-wide tests must keep passing. The calculator
-is one consumer; the ML pipeline, recommendations, and the Site
-Notes ingestion path also consume `EpcPropertyData`. Each mapper-
-layer change is paired with adapter updates that preserve the
-behaviour the existing tests cover.
-
-Pyright `strict` mode must remain clean (CLAUDE.md).
-
-### Expected outcome of P1–P6
-
-After all six land, run the probe against the Validation Cohort. The
-expected baseline MAE on the clean probe is much smaller than the
-current 4.61 — likely 1.5–2.5 SAP-points based on what we know about
-the residual breakdown (heat pumps closed by P4, gas boilers tightened
-by P4, price-version noise removed by P2+P3). The remaining residual
-is the genuine spec sweep target — and per-section fixes will move
-the probe in measurable, distinguishable amounts because there's no
-compensating layer to mask them, and there's no defensive type
-branching obscuring which input value drove which intermediate.
-
----
-
-## 3. Why the prior diagnosis was wrong and how we fixed it
-
-The prior session shipped ten slices (S-B23 → S-B31) by debugging the
-biggest residuals one at a time:
-
-- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
-  on the demand-side calculation.
-- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time
-  as "cert-calibration absorbs multiple spec deviations".
-
-Three slice attempts looked like they "proved" the cert-cal-absorbs-
-deviations diagnosis:
-
-- **Standing charges**: spec Table 12 note (a) requires £92/yr gas
-  standing charge on space + water heating. Adding it pushed SAP bias
-  +0.98 → −2.62. Reverted.
-- **Cat=10 room heaters off-peak routing**: Table 12a says "other
-  direct-acting electric heating" bills 100 % high rate on 7-hour
-  tariff. Switching cat=10 from off-peak to standard rate inverted
-  the bias +5.88 → −6.00. Reverted.
-- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table
-  3 footers require zero storage + primary loss when efficiency comes
-  from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP
-  MAE +0.39 and broke 3 of 7 golden fixtures. Reverted.
-
-The prior agent concluded: *cert-calibration absorbs Elmhurst's
-deviations from spec — we can't fix one without re-deriving the
-calibration, so do a full spec sweep first and re-derive cert-cal at
-the end.* This diagnosis is **wrong** and the proposed remedy
-amplifies the problem.
-
-### What was actually going on
-
-The 250k-cert corpus spans multiple SAP spec-version regimes:
-- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment
-  0 prices — mains gas ~3.48 p, standard electricity 13.19 p.
-- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices
-  — mains gas 3.64 p, standard electricity 16.49 p.
-
-The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the
-older spec's prices**, not Elmhurst deviations from the spec. They
-are an empirical "best fit" across a mixture distribution of two
-price regimes, with downstream-component bugs (PCDB absence, HW
-cylinder loss applied to combi, etc.) absorbed into the fit. The
-table looks like compensation for assessor-software quirks because we
-were never told which spec each cert was on.
-
-Each "spec-correct fix that worsened MAE" in the failed slices above
-was actually correct. The MAE regressed because:
-1. The cert-cal prices (pre-March-2025 spec) cancelled with one set
-   of downstream errors to produce a quasi-stable cost.
-2. The spec-correct fix landed → that cancellation broke → the
-   probe MAE went up.
-3. But the spec-correct fix was *right* — what regressed was a
-   compensating-error equilibrium, not the calculator's truth.
-
-The prior session's "re-derive cert-cal at the end" plan would
-re-establish a new compensating-error equilibrium across the new bug
-set. It does not converge on spec-correctness.
-
-### The fix (per ADR-0010)
-
-1. **Stop fitting against a mixture distribution.** Filter the
-   validation corpus to a single spec-version window (Validation
-   Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort
-   was lodged on SAP 10.2 (14-03-2025) prices.
-2. **Delete the cert-calibration layer.** Use spec prices everywhere
-   (`domain.sap.tables.table_12`). The only price-routing decision
-   left is Table 12a fractional high-rate blending — a real spec
-   feature, not a calibration.
-3. **Build PCDB**, because it dominates residual variance and the
-   reason it was deferred (cert-cal-absorbs-PCDB) no longer holds.
-4. **Build trace mode and BRE worked-example fixtures**, so
-   per-section verification works against single-cert intermediates
-   instead of aggregate corpus MAE.
-
-This is what §2.5 lists as the five prerequisites. Once they land,
-the section-by-section spec sweep produces clean, monotonic
-improvements.
-
----
-
-## 4. Scope decisions (per ADR-0010)
-
-### IN scope
-- **SAP 10.2 (14-03-2025 amendment)** is the active spec target.
-  `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages.
-- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that
-  cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`,
-  114 pages.
-- **PCDB integration.** Moved from "Session C deferred" to **P4
-  prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs
-  lodging `main_heating_data_source=1` need PCDB-sourced efficiency
-  for the calculator to be spec-correct. Data source:
-  https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`;
-  fields: seasonal efficiency, secondary efficiency, output kW,
-  flow-temperature curve (HPs).
-- **All RdSAP 10 sections in document order.** §1 → §§19, plus
-  Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is
-  unchanged — only the precondition changes: the sweep runs against a
-  clean probe (Validation Cohort + spec prices + PCDB + trace mode).
-
-### OUT of scope
-- **Full SAP assessments.** Full-SAP certs lodge measured/calculated
-  U-values in `walls[i].description` (e.g.
-  "Average thermal transmittance 0.18 W/m²K"). These are a separate
-  calculation path (BS EN ISO 6946) and a different corpus. Park
-  until the RdSAP 10 base case parity is reached. S-B24 / S-B29
-  attempted partial handling; those slices can stay or be reverted at
-  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
-- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus,
-  so it cannot be validated. Calculator targets SAP 10.2 until the
-  corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP
-  to reference SAP 10.3). Note: `table_12.py` currently mixes SAP
-  10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2.
-- **Historical-spec cert reproduction.** Calculating what a cert's SAP
-  *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is
-  not the calculator's job. Lodged Performance carries the historical
-  value; Calculated SAP10 Performance is current-spec only. The
-  Validation Cohort filter operationalises this — older certs are
-  out of the validation loop, not because they're "wrong" but because
-  they're a different spec's output.
-- **Re-deriving cert-cal at the end.** The prior session's plan. The
-  cert-calibration layer is deleted in P2, not re-fit.
-
----
-
-## 5. The approach — section-by-section spec verification
-
-Work through the RdSAP 10 spec **in document order**, starting at
-§1. For each section:
-
-### 5.1. Read the spec section
-Read the section text fully. Note every rule, table reference, and
-defaulting cascade.
-
-### 5.2. Find the corresponding code
-Map the section to the source file(s) implementing it. The current
-mapping (some sections are split across modules):
-
-| RdSAP 10 section | Code location |
-|---|---|
-| §1 Introduction / general | n/a |
-| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` |
-| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` |
-| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` |
-| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` |
-| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` |
-| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` |
-| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` |
-| §9 Heat emitters / flow temperatures | not implemented |
-| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` |
-| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) |
-| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) |
-| §13 Addendum to EPCs | n/a |
-| §14 Special cases (e.g. flats above commercial) | not implemented |
-| §15 Improvements (recommendations) | n/a (not rating) |
-| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` |
-| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` |
-| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` |
-| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` |
-| Table 30 — Mechanical ventilation | not implemented |
-| Table 31 — Data to be collected | n/a |
-
-### 5.3. For each spec rule in the section, check our code
-For each table, formula, footnote, exception:
-
-1. Does our code implement it?
-2. Does the implementation match the spec values exactly?
-3. Are there spec-defined edge cases / footnotes we're missing?
-
-### 5.4. When a gap is found
-- Write a failing unit test that asserts the spec-correct behaviour
-  — wherever possible, write it as an assertion on `intermediate`
-  values rather than on aggregate SAP, using a BRE worked example
-  if one covers the section.
-- Implement the fix.
-- Run `test_bre_worked_examples.py` plus the Validation Cohort
-  probe. Note both direction and magnitude of change.
-- If a BRE worked-example breaks, the new code is wrong (revert).
-  BRE examples are spec-derived and cannot regress from a
-  spec-correct change.
-- Commit per-slice: one section → one commit. Reference the spec
-  section in the commit message.
-
-### 5.5. Sweep-time principle: worksheet-faithful structure
-
-Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet
-structure for its section. As you verify a section, also restructure
-its module so that:
-
-1. **Each function name references its worksheet-line origin** (e.g.
-   `heat_transfer_coefficient` aligns with worksheet line (39);
-   `mean_internal_temperature` aligns with worksheet line (93)).
-2. **Compound calculations are split** into one function per
-   worksheet line where possible — easier to verify against
-   `intermediate[...]` and against BRE worked-example values.
-3. **Defensive type-handling disappears**. Once P6 lands, the input
-   is a typed enum or numeric — branching on `isinstance(x, int)` is
-   replaced by enum dispatch.
-4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind ==
-   BuildingPartKind.MAIN_DWELLING` replaces string sniffing of
-   `identifier`. The dimensions.py "unnecessarily complicated"
-   pattern Khalim flagged is the canonical example of what *not*
-   to do.
-
-The principle applies during section-sweep slices. It is **not**
-a separate prerequisite — the refactor lands with the verification
-slice for the section it touches.
-
-### 5.6. Use trace mode when you need it
-P5 populates `SapResult.intermediate: dict[str, float]` with every
-named SAP 10.2 worksheet variable. Each section's verification
-benefits from inspecting these values per-cert. See §11 below for
-the sketch.
-
----
-
-## 6. What's already been done — section by section
-
-This is your starting map. Each row says whether the section has been
-touched and what the current state is.
-
-### Walls / construction (§5)
-- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch
-  when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored.
-- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for
-  "Average thermal transmittance X W/m²K". **PARK** — full-SAP path.
-- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity
-  "as built, insulated (assumed)" + similar (type=4 with descriptive
-  signal). Spec-anchored via legacy `epc_wall_description_map`.
-- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50`
-  fix (the "NI" thickness sentinel) + description-based override of
-  `wall_ins_present` for non-cavity walls. Spec footnote (Table 6).
-- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog —
-  Table 19 footnote (2) "max(50, age-band default)" when description
-  signals retrofit.
-- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated
-  description → §5.11.4 footnote 50mm joist row.
-- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal
-  transmittance" parse. **PARK** — full-SAP path.
-
-**Still to verify in §5**:
-- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only
-  England is fully transcribed; country overrides are partial.
-- Cob U-values (§5.6) — table only, no formula implementation.
-- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
-- Curtain wall §5.18 — not implemented.
-- Party wall U-values (Table 15) — implemented in `u_party_wall`,
-  verify table values.
-- Thermal bridging (Table 21) — implemented as global `y` factor,
-  verify per-age-band values.
-- §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched
-  by construction type with internal insulation). Currently we
-  hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`).
-  This is wrong for timber-frame / cob / internally-insulated masonry
-  (should be 100).
-
-### Heating systems (§§7-10, SAP Appendix A)
-- **S-B20 (in history)**: Table 11 secondary heating allocation,
-  conditional on cert lodging secondary or being electric storage.
-- **Failed S-B30 (reverted)**: respect `main_heating_fraction` —
-  shown empirically wrong. Field is multi-main allocation, not
-  main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
-- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main.
-  Spec §C3.1 + Table 12c.
-- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a
-  says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our
-  cert-cal extends off-peak to codes 691-696. Spec-correct fix
-  inverted bias direction — calibration was absorbing this.
-- **Uncommitted HW cylinder fix**: spec-correct (combi → zero
-  storage/primary loss per Table 2 + Table 3 footers) but breaks 3
-  golden fixtures. Decision deferred to systematic pass.
-
-**Still to verify in heating**:
-- Table 4a efficiency values for every code (heat pumps, storage
-  heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30)
-  is documented as a known limitation.
-- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of
-  gas and liquid fuel boilers for both space and water heating is
-  reduced by 5% if the boiler is not interlocked for space and water
-  heating." We don't apply this. Known gap.
-- Table 4c condensing-boiler / heat-pump emitter-temperature
-  adjustment — we don't apply this.
-- Table 12a high-rate fractions for off-peak dwellings — we apply
-  100% off-peak or 100% standard, never fractional blending.
-
-### Hot water (§4 SAP + Appendix J)
-- Storage loss factor table (Table 2) — current values in
-  `domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec
-  (verified). Known under-prediction of cylinder loss for storage
-  systems; cancelled by over-prediction of primary loss for combi
-  systems in aggregate.
-- Primary loss formula (Table 3) — implemented as 245/60 kWh by age
-  band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]`
-  with `p` (pipework insulation fraction) and `h` (circulation hours).
-  Known approximation.
-- Combi-boiler zero-loss rule (Table 2 + Table 3 footers) — currently
-  NOT applied (the failed uncommitted slice). Adding this drops PE
-  MAE −6.64 but raises SAP MAE +0.39.
-- Appendix J Vd formula `25N + 36` — currently the simple form, not
-  the full per-component (shower / bath / other) breakdown. Useful
-  HW demand is ~7% under spec value.
-- ΔT — currently 43°C constant (55−12). Spec uses monthly Tcold and
-  hot at 52°C, not 55°C. Per-month variance unmodelled.
-
-### Lighting (Appendix L)
-- `predicted_lighting_kwh` in `domain.ml.demand` uses `9.3 × TFA ×
-  (1 − 0.5·led_share − 0.4·cfl_share)` heuristic.
-- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up
-  + portable shares, monthly profile.
-- For a 100 m² LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh,
-  spec gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
-
-### Internal gains (§5 SAP)
-- `worksheet/internal_gains.py` implements metabolic + cooking +
-  appliances + lighting (the four positive rows of Table 5).
-- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e.
-  HW losses recycled as heated-space gains) and Losses row (`−40 × N`
-  for cold inflow + evaporation). Both documented in S-B23 gap list.
-
-### Ventilation (§4 / Table 5)
-- Wind-shelter factor implemented in S-B21.
-- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert
-  rarely lodges. Spec §4.2 + Table 4g.
-- Pressure-test override (worksheet lines 17-18) — not implemented.
-
-### Tariff / cost (§12 + Table 12 / 12a / 12c)
-- Cert-calibration prices in
-  `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit
-  to Elmhurst's output. They are LOWER than the published Table 12
-  spec values by 4-25%. Known divergence; investigation deferred.
-- Standing charges (Table 12 note (a)) — NOT applied. Adding them
-  empirically worsens MAE (calibration absorbs).
-- Table 12a high-rate fractions: currently 100% off-peak for
-  E7-eligible codes, 100% standard otherwise. No fractional blending.
-- Heat network DLF (Table 12c) — applied per S-B31 only to main
-  heating + HW from main. HW-only-from-heat-network is a separate slice.
-
----
-
-## 7. The cert-calibration "tension" is dissolved (per ADR-0010)
-
-This section originally framed cert-calibration vs spec-correctness as
-two end-states the calculator had to choose between. That framing is
-wrong (see §3 for the actual diagnosis): the cert-cal values are
-pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2.
-Once the corpus is filtered to the Validation Cohort (P3) and the
-cert-cal layer is deleted (P2), the false dichotomy disappears.
-
-### What replaces this section
-
-- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP
-  10.2 14-03-2025 amendment, CO2 factors corrected per P2).
-- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every
-  cert lodged on the calculator's target spec version.
-- **One verification mechanism.** Trace-mode intermediates + BRE
-  worked-example unit tests for per-section verification; Validation
-  Cohort probe MAE for aggregate go/no-go.
-
-Cert-software deviations from spec, if they exist at all, are
-expected to be small and localised. They surface as residual after
-the spec sweep completes against a clean probe — and at that point
-the question is whether to chase them at all (Elmhurst-deviation
-fixes have low domain value compared to spec-correctness, given the
-calculator's product use case is scoring counterfactuals for the
-MeasureApplicator chain, not reproducing historical certs).
-
----
-
-## 7b. Outstanding findings to pick up during the systematic pass
-
-The prior session identified several spec-correct fixes that were
-reverted because they made SAP MAE worse against the **full corpus**.
-The empirical signal that "reverted" them was version-mixture noise
-(see §3) plus compensating-error breakage in the 7 retired golden
-fixtures. Each fix below is **expected to land cleanly** once the
-five prerequisites in §2.5 are done, because:
-
-- The Validation Cohort (P3) is on a single spec version — the price
-  mismatch that drove the bias regression on standing charges and
-  cat=10 routing disappears.
-- The cert-cal layer is gone (P2) — no calibration to "break".
-- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals
-  that dominated per-cert MAE collapse before any of these findings
-  even matter.
-- The fixtures are now BRE worked examples (P5 + §10) — they cannot
-  be broken by spec-correct changes because they are themselves
-  derived from the spec.
-
-Treat each finding as a section-sweep TODO. The empirical impacts
-below were measured against the **dirty probe** (full corpus + cert-cal
-+ no PCDB) and are **not predictive** of behaviour on the clean probe.
-Re-measure each fix against the Validation Cohort after prerequisites
-land.
-
-### Finding 1 — HW cylinder zero-loss rule for combi boilers
-**Status**: spec-correct fix exists in working-tree-only form
-(uncommitted). Reverted at end of last session.
-
-**Spec basis**:
-- **SAP 10.2 Table 2 footer (page 158)**: "In the case of a
-  combination boiler: a) the storage loss factor is zero if the
-  efficiency is taken from Table 4b"
-- **SAP 10.2 Table 3 footer (page 160)**: "Primary loss is set to
-  zero for the following: Electric immersion heater, Combi boiler
-  (including when it is part of a combined heat pump and boiler
-  package and provides all the hot water), CPSU (including electric
-  CPSU), Boiler and thermal store within a single casing, Separate
-  boiler and thermal store connected by no more than 1.5 m of
-  insulated pipework, Direct-acting electric boiler, Heat pump (not
-  combined heat pump and boiler package with a non-combi boiler)
-  from PCDB with hot water vessel integral to package"
-
-**The bug**: our calculator currently adds storage loss (~135 kWh)
-and primary loss (~245 kWh) for ALL certs with an age band lodged,
-ignoring whether the dwelling has a cylinder. **67% of corpus certs
-explicitly lodge `has_hot_water_cylinder=False`** (the modal combi
-boiler case) — we add 380 kWh of fictional HW losses for each.
-
-**The fix** (sketch, ~10 lines):
-1. Add `has_cylinder: bool = True` keyword to
-   `predicted_hot_water_kwh` in `packages/domain/src/domain/ml/demand.py`.
-2. When `has_cylinder=False`, set `storage_loss = 0` and `primary_loss = 0`.
-3. In `cert_to_inputs.py` (around line 829), pass
-   `has_cylinder=epc.has_hot_water_cylinder and not is_instantaneous`.
-
-**Empirical impact** (measured on 300-cert probe):
-- **PE MAE: 43.32 → 36.68 (−6.64) ← biggest single fix found this session**
-- PE bias: 37.69 → 30.41 (−7.28)
-- SAP MAE: 4.61 → 5.00 (+0.39, regression)
-- 3 of 7 golden fixtures break
-
-**Why it was reverted**: the SAP regression + broken fixtures indicate
-the fictional HW losses were partially compensating for OTHER bugs
-(likely lighting over-prediction for LED-dominant homes). The right
-ordering is: fix the spec-clear cases (HW cylinder, lighting per
-Appendix L, etc.) together, then re-derive cert-cal.
-
-**When to pick up**: when you reach §4 / Appendix J during the
-systematic pass. Pair with the lighting Appendix L fix to avoid
-breaking the golden fixtures individually.
-
-### Finding 2 — Standing charges (Table 12 note (a))
-**Status**: spec-correct, never implemented. Empirically rejected by
-4-mode probe.
-
-**Spec basis**: SAP 10.2 Table 12 note (a), page 190:
-> "For calculations including regulated energy uses only (e.g.
-> regulation compliance, energy ratings):
-> - The standing charge for electricity standard tariff is omitted
-> - The standing charge for off-peak electricity is added to space
->   and water heating costs where either main heating or hot water
->   uses off-peak electricity
-> - The standing charge for gas fuels is added to space and water
->   heating costs where the gas fuel is used for space heating
->   (main or secondary) or for water heating"
-
-**The bug**: our calculator never adds standing charges. Per spec, a
-gas-heated dwelling should have £92/yr added to the ECF numerator.
-
-**Empirical impact** (4-mode probe, 300 certs):
-| Mode | All certs | Gas-only |
-|---|---|---|
-| cert-cal, no standing (current) | MAE 4.69, bias +0.98 | MAE 4.01, bias +0.80 |
-| cert-cal + gas standing | MAE 4.94, bias **−2.62** | MAE 4.31, bias **−3.53** |
-
-Adding standing charges shifts SAP bias downward by 3.60 points (all certs)
-and 4.33 points (gas-only): clearly the wrong direction. The cert-cal prices
-(3.48p gas vs spec 3.64p) implicitly absorb the standing-charge contribution.
-
-**When to pick up**: when you reach §12 / Table 12. Apply alongside
-spec-correct unit prices (3.64p gas, 16.49p elec) and re-measure against
-the Validation Cohort; the cert-cal layer is deleted in P2, not re-fit.
-
-### Finding 3 — Cat=10 room heaters off-peak routing
-**Status**: spec-correct fix identified but empirically rejected. The
-current code bills room heaters at the off-peak rate on E7 dwellings.
-
-**Spec basis**: SAP 10.2 Table 12a (page 191):
-> "Other direct-acting electric heating (including electric secondary
-> heating): 7-hour tariff 1.00 high rate; 10-hour tariff 0.50 high rate"
-
-**The bug**: our cert-calibration (`cert_calibration_e7_codes`)
-extends the off-peak set to include codes 691-696 (room heaters).
-That's the S-B14 empirical extension — the previous agent found it
-helped some specific certs. Per Table 12a it's WRONG: room heaters
-on E7 should bill 100% at HIGH rate, not at low rate.
-
-**Empirical impact**: switching from off-peak (5.50p cert-cal) to
-standard rate (13.19p) — closer to spec but still not the high rate
-(15.29p cert-cal) — inverted the bias from +5.88 to −6.00 without
-improving MAE.
-
-**The real issue**: Table 12a defines FRACTIONAL blending (e.g.
-"90% high, 10% low" for direct-acting electric boiler on 7-hour
-tariff), not binary on/off-peak. Our calculator only supports binary.
-A proper implementation needs per-system high-rate fractions.
-
-**When to pick up**: when you reach §12 / Table 12a. Implement
-fractional blending for all the rows of Table 12a, not just cat=10.
-
-### Finding 4 — Lighting (Appendix L proper)
-**Status**: gap. Current code uses a 9.3 kWh/m² heuristic with simple
-LED/CFL reductions; spec is the L1-L12 cascade with daylight
-correction, fixed-lighting capacity, top-up + portable shares,
-monthly profile.
-
-**Spec basis**: SAP 10.2 Appendix L §L1 (pages 88-90), equations
-L1-L12.
-
-**The bug**: for a 100 m² LED-dominant home (e.g. cert 7536-3827 with
-51 LEDs), our heuristic returns 465 kWh/yr; spec returns ~94 kWh/yr.
-Over-prediction by ~5× on LED-dominant homes (which is most modern
-stock).
-
-**Empirical impact** (estimated):
-- ~5-6 kWh/m² PEUI over-prediction for LED-dominant population
-- Corpus-weighted: ~3-4 kWh/m² PEUI bias contribution
-
-**When to pick up**: when you reach Appendix L. Pair with the HW
-cylinder fix (Finding 1) to avoid the SAP MAE regression.
-
-### Finding 5 — Internal-gains Table 5 missing rows
-**Status**: gap. Spec Table 5 has 7 rows for internal gains; our
-`worksheet/internal_gains.py` implements 4.
-
-**Spec basis**: SAP 10.2 Table 5 (page 177).
-
-**Missing rows**:
-- **Water heating**: `1000 × (65)ₘ / (nₘ × 24)` W — the HW losses
-  (cylinder + distribution + primary) recycled as heated-space gains
-  via worksheet line (65). Reduces space heating demand.
-- **Losses**: `−40 × N` W — heat to incoming cold water and
-  evaporation. Negative contribution.
-
-**Empirical impact** (estimated):
-- For N=2.7: HW gains ≈+75 W, losses ≈−108 W, net ≈−33 W. Currently
-  we miss both → our gains are 33 W too high → space heating demand
-  too low → PE under-predicted by ~3 kWh/m² (rough).
-
-**When to pick up**: when you reach §5 / Table 5. Worksheet line (65)
-also needs implementation — the HW losses already exist in our calc
-(see `demand.py:_cylinder_storage_loss_kwh` etc.), they just need
-piping into internal_gains.
-
-### Finding 6 — Storage-loss-factor table values are wrong
-**Status**: gap. Affects only certs with `has_hot_water_cylinder=True`
-(33% of corpus).
-
-**Spec basis**: SAP 10.2 Table 2 (page 158).
-
-**The bug**: `domain.ml.demand:_STORAGE_LOSS_FACTOR` values are ~3×
-LOWER than spec. E.g. for 38mm foam our value is 0.0056, spec is
-0.0181. Effect: we UNDER-predict cylinder storage loss by ~300 kWh
-for storage systems, partly cancelling the over-prediction from
-Finding 1.
-
-**When to pick up**: when you reach §4 / Table 2. Fix WITH Finding 1
-(combi zero-loss) so the cancellation doesn't dominate the
-direction.
-
-### Finding 7 — Heat-pump fallback efficiency 2.30
-**Status**: gap that requires PCDB. See §8b.
-
-### Finding 8 — Other smaller gaps (carry forward)
-- Boiler interlock −5% penalty (§9.2.1) — never applied
-- Table 4c condensing boiler / HP emitter temperature adjustment — never applied
-- Control-temperature adjustment from Table 4e — always 0 in code, spec varies
-- Wall U-values for Scotland / Wales / NIR — only England fully transcribed
-- Per-junction thermal bridging (Table R2) — global y approximation only
-- Multi-main heating (`main_heating_fraction` ≠ 1) — first main only
-- Cooling §10 — not implemented (rare in UK)
-- FEE §11 — not implemented (new-build only)
-
----
-
-## 8. Don't repeat — known dead-ends
-
-> **Re-read after §3 + §7b.** Three entries below were classified as
-> "dead-ends because cert-cal absorbs" — that diagnosis is wrong.
-> They are spec-correct fixes that were measured under a noisy probe.
-> Now flagged as **conditional dead-ends**: dead only if you try them
-> before P1–P5 land. After prerequisites: they are expected
-> improvements, not dead-ends. See ADR-0010.
-
-- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
-  over-corrected because it routed to the (Unfilled cavity, 50mm) row
-  instead of the dedicated Filled cavity row. The right fix landed in
-  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
-- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
-  (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
-  intentionally conservative; PCDB (P4 prerequisite) supplies the
-  real efficiency.
-- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under
-  the dirty probe, cert-cal prices fit better. **Inverts under the
-  clean probe (P2 + P3): SAP 10.2 spec prices are correct because the
-  Validation Cohort is on the 14-03-2025 amendment.** Listed here
-  only as a warning if you start the sweep before prerequisites land.
-- ❌ **Always applying 10% secondary heating** — must be conditional on
-  cert lodging or main system being electric storage (S-B20). See
-  spec Appendix A.4.
-- ❌ **Respecting `main_heating_fraction` for secondary allocation**
-  (failed S-B30) — the field is the multi-main allocation (system 1 vs
-  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
-- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
-  spec-correct per Table 12a. The bias inversion under the dirty
-  probe was driven by cert-cal compensating; on the clean probe this
-  is just spec-correct. Land as part of the §12 spec sweep after
-  prerequisites.
-- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) —
-  spec-correct per Table 12 note (a). Same logic: bias drift under
-  dirty probe is version-mixture + missing-PCDB noise, not Elmhurst
-  deviation. Land as part of §12 spec sweep.
-- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted;
-  Finding 1 in §7b): spec-correct per Table 2 + Table 3 footers. SAP MAE
-  regression was driven by the now-retired golden fixtures (§10) and
-  cert-cal absorption. Land as part of §4 / Appendix J sweep.
-
----
-
-## 9. The cert corpus and parity probe
-
-### Sample
-`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
-250k-cert parquet. **After P1 lands** the parquet carries
-`inspection_date`; the probe then filters to the **Validation Cohort**
-(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and
-samples 300 at seed=7 by default. Filtering rationale:
-- < 5 is heritage/anomaly stock (sub-3 % of corpus)
-- > 99 is full-SAP new-builds the parquet excludes anyway
-- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on
-  SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md)
-  / "Validation Cohort" and ADR-0010 §3.
-
-### Run the probe
-```bash
-python -c "
-import sys
-sys.path.insert(0, 'packages/domain/src')
-sys.path.insert(0, '.')
-sys.path.insert(0, 'services/ml_training_data/src')
-from ml_training_data.sap_parity_probe import main
-main(['300','7'])
-"
-```
-
-### What the probe shows
-- Aggregate SAP MAE / RMSE / bias
-- Aggregate PE MAE / RMSE / bias
-- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
-- Stratification by `main_heating_category`, `construction_age_band`,
-  `dwelling_type`
-- Worst-15 residuals (SAP and PE)
-
-### Known parquet limitations
-- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in
-  the raw bulk-zip. The parquet filters out full-SAP new-builds
-  upstream. Don't measure full-SAP-path slices against the parquet.
-- Heat-pump certs (cat=4) are under-represented and concentrated in
-  the worst-residual tail because PCDB efficiency is unavailable.
-
----
-
-## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5)
-
-The 7 cert-based fixtures at
-`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
-were locked in against the current calculator state — *with* cert-cal,
-*without* PCDB, *with* HW cylinder loss always applied, *with* the
-lighting heuristic, etc. They are documented in §3 / the prior
-handover as containing compensating errors. Once the prerequisites
-land, every spec-correct fix breaks at least one of them. They will
-fight the spec sweep.
-
-### Replacement strategy
-
-**Primary regression suite: BRE worked-example fixtures.**
-
-Transcribe the worked examples from:
-- SAP 10.2 spec appendices (especially Appendix R — reference values
-  and the worked example dwelling).
-- RdSAP 10 (10-06-2025) worked-example annex.
-
-Each worked example becomes a unit test that locks **per-intermediate
-expected values** (HLP, HTC, monthly mean internal temperature (MIT),
-ECF, SAP score) rather than the aggregate SAP score alone. Because
-they are spec-derived, no spec-correct change can break them — any
-break is an implementation bug, unambiguously.
-
-These tests live at
-`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py`
-(new module — separate from the cert-based fixtures module).
-
-**Cert-based fixtures retired.**
-
-The current `test_golden_fixtures.py` is either deleted or repurposed
-as a *very loose* smoke-test integration suite (e.g. `|SAP residual|
-≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs
-under `fixtures/golden/<cert>.json` can be kept on disk as reference
-data, but they no longer drive go/no-go decisions in the sweep.
-
-**Optional future addition.**
-
-If/when a current Elmhurst (or Stroma / Quidos / NHER) license is
-available, run a handful of representative corpus certs through it
-and lock those outputs as a second-tier regression suite:
-Elmhurst-parity fixtures alongside spec-parity fixtures. Not a prerequisite.
-
----
-
-## 11. Trace mode (prerequisite P5 — implementation sketch)
-
-This section was originally labelled "recommended"; it is now
-**prerequisite P5** per ADR-0010. The sweep does not start until
-`intermediate` is populated everywhere. ADR-0009 proposed:
-```python
-@dataclass(frozen=True)
-class SapResult:
-    sap_score: float
-    ...
-    intermediate: dict[str, float]
-```
-
-The `intermediate` field was never populated. Suggested implementation
-for the systematic pass:
-
-```python
-intermediate = {
-    # §1 dimensions
-    "tfa_m2": tfa,
-    "volume_m3": volume,
-    "storey_count": storeys,
-    # §3 heat transmission
-    "walls_w_per_k": ht.walls_w_per_k,
-    "roof_w_per_k": ht.roof_w_per_k,
-    "floor_w_per_k": ht.floor_w_per_k,
-    "party_walls_w_per_k": ht.party_walls_w_per_k,
-    "windows_w_per_k": ht.windows_w_per_k,
-    "doors_w_per_k": ht.doors_w_per_k,
-    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
-    "infiltration_ach": infiltration,
-    "infiltration_w_per_k": infiltration * volume * 0.33,
-    "heat_transfer_coefficient_w_per_k": hlc,
-    "heat_loss_parameter_w_per_m2k": hlp,
-    "time_constant_h": tau_h,
-    # §5 internal gains (annual averages)
-    "internal_gains_annual_avg_w": ...,
-    # §7 mean internal temperature (annual avg)
-    "mean_internal_temp_annual_avg_c": ...,
-    # §9 space heating
-    "useful_space_heating_kwh_per_yr": space_heating_kwh,
-    # §12 fuel costs (per end-use)
-    "main_heating_cost_gbp": ...,
-    "hot_water_cost_gbp": ...,
-    "lighting_cost_gbp": ...,
-    "pumps_fans_cost_gbp": ...,
-    # §13 rating
-    "ecf": ecf,
-    "deflator": 0.36,
-    # §14 primary energy and CO2 per end-use
-    "space_heating_pe_kwh_per_m2": ...,
-    "hot_water_pe_kwh_per_m2": ...,
-    ...
-}
-```
-
-Once populated, the differential debugging the reviewer recommended
-becomes possible: change one input field, compare deltas against an
-Elmhurst export.
-
----
-
-## 12. Specific section-1 starting tasks (suggested first session)
-
-A concrete pickup point:
-
-### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
-- §1 is prose; nothing to verify.
-- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2
-  enumerates is present and correctly typed on the domain object.
-  Specifically check: `dwelling_type`, `built_form`, `property_type`,
-  `construction_age_band`, `country_code`. Note that
-  `construction_age_band` is per-building-part, not dwelling-level,
-  and the primary age band drives most defaults.
-- §3 maps to `worksheet/dimensions.py`. Verify:
-  - Total floor area sum across building parts equals TFA
-  - Volume calculation (per-storey area × height)
-  - Storey count handling for extensions and room-in-roof
-  - Multi-storey heat-loss-perimeter rules
-
-This single session should produce zero behaviour changes if §1-3 are
-correctly implemented, but expect to find at least one issue in §3
-geometry (per the reviewer's "biggest SAP error sources" list).
-
-**Important:** Session 1 only starts after all five prerequisites in
-§2.5 have landed and the Validation Cohort probe baseline has been
-captured. Until then, running per-section verification produces noisy
-signal.
-
-Run the BRE worked-example fixtures (P5) + Validation Cohort probe
-(P3) at the end of each session; expect no movement until you start
-hitting actual gaps.
-
----
-
-## 13. Workflow recap
-
-**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency
-order:
-
-| | Slice | Depends on |
-|---|---|---|
-| P1 | Re-extract parquet with `inspection_date` | — |
-| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — |
-| P3 | Filter parity probe to Validation Cohort | P1 |
-| P4 | Implement `PcdbLookup` | — (P2 helpful) |
-| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — |
-| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — |
-
-P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a
-Validation Cohort probe baseline once all six land — that is the new
-MAE starting line. Repo-wide tests stay green throughout P6 (Site
-Notes consumers, ML pipeline, recommendations, etc. all need the
-mapper updates that accompany each typing change).
-
-**Phase 1 — Section sweep.** For each RdSAP 10 section, in document
-order:
-
-1. Read the spec section text + cited tables.
-2. Identify code location(s).
-3. For each rule / table / footnote:
-   - Does our code implement it?
-   - Does the implementation match?
-   - Edge cases / fallback paths handled?
-4. For each gap: AAA unit test (preferring a BRE worked-example
-   assertion on `intermediate` values when possible) → minimal
-   implementation → commit.
-5. **Apply the worksheet-faithful structure principle** (§5.5) as
-   part of this slice: name functions after worksheet lines, split
-   compound calculations, replace any remaining defensive
-   type-handling with typed-enum dispatch.
-6. After each commit: run `test_bre_worked_examples.py` + Validation
-   Cohort probe. Note both deltas in the commit message.
-7. If a BRE worked-example breaks: the new code is wrong (revert).
-   The worked examples are spec-derived and cannot be broken by
-   spec-correct changes.
-
-Stick to this. The prior session's mistake was jumping between
-sections based on residual-size **on a dirty probe**. Clean probe
-plus document-order discipline plus worksheet-faithful structure is
-what makes the sweep converge.
-
----
-
-## 14. Useful references
-
-- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md`
-  — the binding decisions reflected in this rewrite: SAP 10.2 target,
-  cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture
-  retirement. **Read first.**
-- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
-  original calculator decision rationale + Session A/B/C plan. Read
-  for context; spec-version target / PCDB sequencing / cert-cal
-  rationale are superseded by ADR-0010.
-- **Spec coverage map**
-  `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
-  Update as you go.
-- **Parity findings**
-  `docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior
-  sessions.
-- **Earlier handover**
-  `docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the
-  previous fresh-context pass.
-- **Reviewer feedback (informal)** — ChatGPT critique of the slice-by-
-  slice approach. Key recommendations: two-layer architecture
-  (RdSAP expansion → SAP worksheet), trace mode, golden-master
-  methodology, differential debugging, reference traces from
-  Elmhurst/Stroma/Quidos.
-- **Commit log** — `git log --oneline` shows the slice history; each
-  S-Bxx commit message documents the spec ref + measured impact.
-
----
-
-## 15. Final note
-
-The prior session's framing — *"the cert-calibration layer absorbs
-Elmhurst's spec deviations; we'll re-derive it at the end"* — was
-load-bearing on a false diagnosis. The cert-cal layer is
-pre-March-2025 SAP prices fit against a mixture distribution of two
-spec-version regimes. Once you separate the regimes (Validation
-Cohort) and use spec prices everywhere, the "tension" disappears.
-
-After P1–P6 land, the section sweep is straightforward: every
-spec-correct fix is unambiguously the right answer, BRE
-worked-example fixtures lock the result, and Validation Cohort probe
-MAE moves monotonically downward. The fixes the prior session marked
-as "spec-correct but probe-regressed" become trivially landable.
-
-**Welcome to the project. Read ADR-0010, land the six prerequisites,
-then walk the spec in document order. The deterministic answer is in
-there.**