docs: handover for systematic section-by-section RdSAP 10 review

The slice-by-slice "fix the biggest residual" approach has hit a ceiling at SAP MAE ~4.6 because the cert-calibration prices absorb multiple structural deviations from spec. Any spec-correct fix in one component breaks the calibration for others. Three failed slices this session (standing charges, cat=10 routing, combi zero-loss) made the pattern unambiguous.

Pivot: systematic section-by-section spec verification. Read the RdSAP 10 + SAP 10.2 spec in order, check each table / formula / footnote against the corresponding code, and fix gaps one at a time. Build the spec-correct engine first; re-derive the cert-cal calibration once at the end as a thin Elmhurst-compatibility layer.

Handover doc covers:
- Critical framing (deterministic, not assessor judgement)
- Current state (SAP MAE 4.61, PE MAE 43.32 at f4a8d2a0)
- Why the slice-by-slice approach won't converge
- Scope decisions (RdSAP 10 + SAP 10.2 only; park full-SAP + PCDB)
- Section-to-code mapping
- Known dead-ends to skip
- Cert-calibration vs spec-correctness tension and how to resolve it
- The 7 golden fixtures and their compensating-error caveats
- Trace mode recommendation (ADR-0009's `intermediate` field)
- Specific §1-3 starting tasks
- Workflow recap

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-07-27 23:35:01 +00:00 · 2026-05-19 07:30:27 +00:00 · 2026-05-19 07:30:27 +00:00 · 3363f63f5e
commit 3363f63f5e
parent f4a8d2a017
1 changed files with 615 additions and 0 deletions
--- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
+++ b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
@ -0,0 +1,615 @@
+# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review
+
+**Audience:** A fresh agent picking up the deterministic SAP calculator at
+`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs,
+then the code.
+
+**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly
+for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical
+calculation** — not a model — so MAE should approach zero on certs whose
+inputs are fully populated.
+
+---
+
+## 1. Critical framing — this is NOT a judgement call
+
+The SAP/RdSAP energy assessment splits cleanly into two roles:
+
+1. **The assessor** — a person who surveys the dwelling and lodges
+   measured/observed fields onto the cert (areas, perimeters,
+   construction codes, insulation thicknesses, fuel types, etc.).
+   The assessor makes NO calculation decisions.
+2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a
+   deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It
+   takes the lodged fields and produces SAP score, CO2 emissions,
+   primary energy (PEUI), CO2 per m², EI rating, etc.
+
+**Our calculator is replicating role #2.** Where Elmhurst's
+implementation diverges from spec, we follow Elmhurst, but we don't
+guess at divergence; we localise it via reference traces or
+empirically against the cert corpus.
+
+There is no "assessor judgement" knob to tune. Each field on the cert
+has a deterministic interpretation per the spec. Each spec table /
+formula has a deterministic implementation. Our job is to enumerate
+all of them and verify each.
+
+---
+
+## 2. Current state (2026-05-19)
+
+- Branch: `ara-backend-design-prd`
+- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
+- 301 tests passing
+- Parity probe (300 random certs from
+  `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7,
+  `sap_score ∈ [5, 99]`):
+
+  | Metric | Value |
+  |---|---|
+  | SAP MAE | 4.61 |
+  | SAP bias | +0.87 |
+  | PE MAE | 43.32 kWh/m² |
+  | PE bias | +37.69 kWh/m² |
+
+- 7 "golden" regression certs locked in
+  `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`.
+  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
+  caveat: some of these are compensating-error matches (e.g. cert
+  `7536-3827`'s PE matches, but its cost is £143 below the cert's implied
+  cost because several bugs offset one another).
+
+---
+
+## 3. Why we are pivoting to systematic review
+
+The prior session shipped ten slices (S-B23 → S-B31) by debugging the
+biggest residuals one at a time:
+
+- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
+  on the demand-side calculation.
+- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is
+  bottlenecked by cert-calibration prices that absorb multiple
+  structural deviations from spec, so any single slice that fixes one
+  component breaks the calibration for the others.
+
+Three failed slice attempts in the prior session exposed the pattern:
+
+- **Standing charges**: spec Table 12 note (a) states that the gas
+  standing charge of £92 is added to space + water heating costs for
+  energy ratings. Empirically, adding it pushed SAP bias from +0.98 to
+  −2.62. Reverted before committing.
+- **Cat=10 room heaters off-peak routing**: Table 12a states that
+  "Other direct-acting electric heating" bills 100% at the high rate on
+  the 7-hour tariff. Empirically, switching cat=10 from off-peak to
+  standard rate inverted the bias from +5.88 to −6.00 without
+  improving MAE. Reverted before committing.
+- **Hot water cylinder loss (uncommitted)**: the spec Table 2 and
+  Table 3 footers state that combi boilers using Table 4b efficiency
+  have zero storage + primary loss. Empirically, zeroing them cut PE
+  MAE by 6.64 kWh/m² (a large improvement) but raised SAP MAE by 0.39
+  AND broke 3 of 7 golden fixtures. Reverted because there is no way to
+  know whether to follow the spec (PE-correct) or Elmhurst
+  (SAP-MAE-correct) without reference traces.
+
+The pattern: **the cert-calibration prices** (in
+`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
+to match Elmhurst's output assuming all our other calculations are
+correct.** When we fix a spec-violation bug in some other component, we
+break the calibration and SAP MAE goes up even though we're more
+spec-correct.
+
+This means **whack-a-mole on the biggest residual won't converge**. We
+need to systematically verify every component against the spec, then
+re-derive the cert-calibration once at the end.
+
+---
+
+## 4. Scope decisions
+
+### IN scope
+- **RdSAP 10 specification (10-06-2025)** — full document, all sections
+  (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
+- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables,
+  appendices that RdSAP 10 references
+  (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).
+
+### OUT of scope (for now)
+- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
+  U-value in `walls[i].description` (e.g.
+  "Average thermal transmittance 0.18 W/m²K"). These are a separate
+  calculation path (BS EN ISO 6946) and a different corpus. **Park them
+  until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
+  attempted partial handling; those slices can stay or be reverted at
+  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
+- **PCDB (Product Characteristics Database).** ADR-0009 schedules this
+  for Session C. Heat pumps (cat=4) have catastrophic per-cert MAE
+  because we use the Table 4a fallback efficiency of 2.30 instead of the
+  PCDB SCOP. A `NoOpPcdbLookup` stub seam from Session A is already in
+  place; the data fetch and parser are a milestone of their own.
+- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
+  identical Table 12 codes (only values shift). Don't update spec
+  references to 10.3 until the corpus migrates.
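The PCDB seam described above can be pictured roughly as follows. This is a minimal sketch: `PcdbLookup`, `resolve_main_efficiency` and the dict-based fallback table are hypothetical names, not the repo's actual API. Only the cat=4 → 2.30 fallback quoted in this handover is filled in.

```python
from typing import Optional, Protocol


class PcdbLookup(Protocol):
    """Hypothetical interface for a PCDB efficiency lookup."""

    def efficiency(self, pcdb_index: str) -> Optional[float]:
        """Return the PCDB efficiency (e.g. SCOP), or None if unknown."""
        ...


class NoOpPcdbLookup:
    """Stub that never finds a product, so the Table 4a fallback always wins."""

    def efficiency(self, pcdb_index: str) -> Optional[float]:
        return None


# Only the heat-pump fallback quoted in this handover; other categories omitted.
TABLE_4A_CATEGORY_FALLBACK: dict[int, float] = {4: 2.30}


def resolve_main_efficiency(
    category: int, pcdb_index: Optional[str], pcdb: PcdbLookup
) -> float:
    """Prefer a PCDB value when an index is lodged and found; else fall back."""
    if pcdb_index is not None:
        found = pcdb.efficiency(pcdb_index)
        if found is not None:
            return found
    return TABLE_4A_CATEGORY_FALLBACK[category]
```

With the stub in place, `resolve_main_efficiency(4, "123456", NoOpPcdbLookup())` still returns 2.30. Swapping in a real lookup changes the result without touching the call sites, which is the point of the seam.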
+
+---
+
+## 5. The approach — section-by-section spec verification
+
+Work through the RdSAP 10 spec **in document order**, starting at
+§1. For each section:
+
+### 5.1. Read the spec section
+Read the section text fully. Note every rule, table reference, and
+defaulting cascade.
+
+### 5.2. Find the corresponding code
+Map the section to the source file(s) implementing it. The current
+mapping (some sections are split across modules):
+
+| RdSAP 10 section | Code location |
+|---|---|
+| §1 Introduction / general | n/a |
+| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` |
+| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` |
+| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` |
+| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` |
+| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` |
+| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` |
+| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` |
+| §9 Heat emitters / flow temperatures | not implemented |
+| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` |
+| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) |
+| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) |
+| §13 Addendum to EPCs | n/a |
+| §14 Special cases (e.g. flats above commercial) | not implemented |
+| §15 Improvements (recommendations) | n/a (not rating) |
+| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` |
+| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` |
+| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` |
+| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` |
+| Table 30 — Mechanical ventilation | not implemented |
+| Table 31 — Data to be collected | n/a |
+
+### 5.3. For each spec rule in the section, check our code
+For each table, formula, footnote, exception:
+
+1. Does our code implement it?
+2. Does the implementation match the spec values exactly?
+3. Are there spec-defined edge cases / footnotes we're missing?
+
+### 5.4. When a gap is found
+- Write a failing unit test that asserts the spec-correct behaviour.
+- Implement the fix.
+- Run **all 7 golden fixtures** plus the broader probe. Note both
+  direction and magnitude of change.
+- If the fix is spec-correct but breaks a golden fixture, this is
+  evidence that the fixture was a compensating-error case — proceed
+  with the spec-correct fix and update the fixture (with a comment
+  noting it was a compensating case).
+- Commit per-slice as before: one section → one commit. Reference the
+  spec section in the commit message.
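A spec-anchored test in Arrange / Act / Assert form might look like the sketch below. The function under test, `hw_storage_and_primary_loss_kwh`, is a hypothetical stand-in defined inline so the example runs on its own. In the repo, the test would target the real code path and would fail until the fix lands. The rule shown is the Table 2 + Table 3 footer combi rule from §3, used purely as an illustration. Whether to adopt it is still the open question discussed in §7.

```python
def hw_storage_and_primary_loss_kwh(
    is_combi: bool,
    uses_table_4b_efficiency: bool,
    storage_loss_kwh: float,
    primary_loss_kwh: float,
) -> float:
    """Hypothetical stand-in for the code path under test."""
    # Table 2 + Table 3 footers: combis rated via Table 4b carry neither loss.
    if is_combi and uses_table_4b_efficiency:
        return 0.0
    return storage_loss_kwh + primary_loss_kwh


def test_combi_with_table_4b_efficiency_has_zero_storage_and_primary_loss() -> None:
    # Arrange: non-zero inputs, so a missing rule shows up as a non-zero result.
    storage_kwh, primary_kwh = 120.0, 245.0

    # Act
    loss = hw_storage_and_primary_loss_kwh(
        is_combi=True,
        uses_table_4b_efficiency=True,
        storage_loss_kwh=storage_kwh,
        primary_loss_kwh=primary_kwh,
    )

    # Assert
    assert loss == 0.0


def test_regular_boiler_keeps_storage_and_primary_loss() -> None:
    # Arrange
    storage_kwh, primary_kwh = 120.0, 245.0

    # Act
    loss = hw_storage_and_primary_loss_kwh(
        is_combi=False,
        uses_table_4b_efficiency=True,
        storage_loss_kwh=storage_kwh,
        primary_loss_kwh=primary_kwh,
    )

    # Assert
    assert loss == storage_kwh + primary_kwh
```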
+
+### 5.5. Use trace mode when you need it
+ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
+that was never populated. Adding this is highly recommended for the
+systematic pass — each section's verification benefits from
+inspecting the intermediate values. See §11 below for a sketch.
+
+---
+
+## 6. What's already been done — section by section
+
+This is your starting map. Each row says whether the section has been
+touched and what the current state is.
+
+### Walls / construction (§5)
+- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch
+  when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored.
+- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for
+  "Average thermal transmittance X W/m²K". **PARK** — full-SAP path.
+- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity
+  "as built, insulated (assumed)" + similar (type=4 with descriptive
+  signal). Spec-anchored via legacy `epc_wall_description_map`.
+- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50`
+  fix (the "NI" thickness sentinel) + description-based override of
+  `wall_ins_present` for non-cavity walls. Spec footnote (Table 6).
+- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog —
+  Table 19 footnote (2) "max(50, age-band default)" when description
+  signals retrofit.
+- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated
+  description → §5.11.4 footnote 50mm joist row.
+- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal
+  transmittance" parse. **PARK** — full-SAP path.
+
+**Still to verify in §5**:
+- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only
+  England is fully transcribed; country overrides are partial.
+- Cob U-values (§5.6) — table only, no formula implementation.
+- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
+- Curtain wall §5.18 — not implemented.
+- Party wall U-values (Table 15) — implemented in `u_party_wall`,
+  verify table values.
+- Thermal bridging (Table 21) — implemented as global `y` factor,
+  verify per-age-band values.
+- §5.16 Thermal mass: Table 22 has only two values (100 and
+  250 kJ/m²K), selected by construction type and the presence of
+  internal insulation. Currently we hardcode 250 (see
+  `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`).
+  This is wrong for timber-frame / cob / internally-insulated masonry,
+  which should be 100.
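The following is a minimal sketch of a Table 22 dispatch that could replace the hardcoded 250. It assumes the rule exactly as stated above: 100 kJ/m²K for timber-frame, cob and internally insulated masonry, and 250 otherwise. The `WallConstruction` labels and the function name are hypothetical. The real dispatch keys should be checked against Table 22 and the cert's `wall_construction` codes.

```python
from enum import Enum


class WallConstruction(Enum):
    # Hypothetical labels; a real version would map from cert wall_construction codes.
    MASONRY = "masonry"
    TIMBER_FRAME = "timber_frame"
    COB = "cob"


LOW_TMP_KJ_PER_M2K = 100.0
HIGH_TMP_KJ_PER_M2K = 250.0


def thermal_mass_parameter(
    construction: WallConstruction, internally_insulated: bool
) -> float:
    """Pick one of the two Table 22 values instead of always returning 250."""
    if construction in (WallConstruction.TIMBER_FRAME, WallConstruction.COB):
        return LOW_TMP_KJ_PER_M2K
    if construction is WallConstruction.MASONRY and internally_insulated:
        return LOW_TMP_KJ_PER_M2K
    return HIGH_TMP_KJ_PER_M2K
```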
+
+### Heating systems (§§7-10, SAP Appendix A)
+- **S-B20 (in history)**: Table 11 secondary heating allocation,
+  conditional on cert lodging secondary or being electric storage.
+- **Failed S-B30 (reverted)**: respect `main_heating_fraction` —
+  shown empirically wrong. Field is multi-main allocation, not
+  main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
+- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main.
+  Spec §C3.1 + Table 12c.
+- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a
+  says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our
+  cert-cal extends off-peak to codes 691-696. Spec-correct fix
+  inverted bias direction — calibration was absorbing this.
+- **Uncommitted HW cylinder fix**: spec-correct (combi → zero
+  storage/primary loss per Table 2 + Table 3 footers) but breaks 3
+  golden fixtures. Decision deferred to systematic pass.
+
+**Still to verify in heating**:
+- Table 4a efficiency values for every code (heat pumps, storage
+  heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30)
+  is documented as a known limitation.
+- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of
+  gas and liquid fuel boilers for both space and water heating is
+  reduced by 5% if the boiler is not interlocked for space and water
+  heating." We don't apply this. Known gap.
+- Table 4c condensing-boiler / heat-pump emitter-temperature
+  adjustment — we don't apply this.
+- Table 12a high-rate fractions for off-peak dwellings — we apply
+  100% off-peak or 100% standard, never fractional blending.
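The interlock penalty is short to implement once the inputs are known. This sketch assumes the "reduced by 5%" wording means 5 percentage points off an efficiency expressed in %, in line with the additive Table 4c-style adjustments. If the text turns out to mean a relative 5%, replace the subtraction with `efficiency_pct * 0.95`. The function name and the fuel keys are hypothetical.

```python
# Hypothetical fuel keys for "gas and liquid fuel" boilers.
GAS_AND_LIQUID_FUELS = frozenset({"mains_gas", "lpg", "oil"})

# Assumed to be percentage points, not a relative reduction.
INTERLOCK_PENALTY_PCT_POINTS = 5.0


def apply_interlock_penalty(
    efficiency_pct: float, fuel: str, is_boiler: bool, interlocked: bool
) -> float:
    """Reduce boiler efficiency when there is no boiler interlock.

    Applies only to gas and liquid-fuel boilers, and would be applied to both
    the space-heating and the water-heating efficiency.
    """
    if is_boiler and fuel in GAS_AND_LIQUID_FUELS and not interlocked:
        return efficiency_pct - INTERLOCK_PENALTY_PCT_POINTS
    return efficiency_pct
```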
+
+### Hot water (§4 SAP + Appendix J)
+- Storage loss factor table (Table 2): current values in
+  `domain.ml.demand:_STORAGE_LOSS_FACTOR` are roughly 3× lower than the
+  spec values (verified). This under-predicts cylinder loss for storage
+  systems; in aggregate it is cancelled by over-predicted primary loss
+  for combi systems.
+- Primary loss formula (Table 3) — implemented as 245/60 kWh by age
+  band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]`
+  with `p` (pipework insulation fraction) and `h` (circulation hours).
+  Known approximation.
+- Combi-boiler zero-loss rule (Table 2 + Table 3 footers): currently
+  NOT applied (the failed uncommitted slice). Adding it cuts PE MAE by
+  6.64 kWh/m² but raises SAP MAE by 0.39.
+- Appendix J Vd formula `25N + 36`: currently the simple form, not
+  the full per-component (shower / bath / other) breakdown. Useful
+  HW demand is ~7% below the spec value.
+- ΔT: currently a constant 43°C (55 − 12). The spec uses a monthly
+  cold-water temperature Tcold,m and a hot-water temperature of 52°C,
+  not 55°C, so the per-month variation is unmodelled.
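Two of the hot-water formulas above are sketched in Python below, so their outputs can be compared with the current constants. `primary_loss_kwh_by_month` transcribes the Table 3 expression quoted above. The caller passes in `p` and `h`, because their lookup rules are not reproduced here. `assumed_occupancy` uses the standard SAP Table 1b occupancy formula, which we assume matches the SAP 10.2 text. `vd_simple` is the `25N + 36` form the code uses today, not the spec's per-component breakdown. All function names are hypothetical.

```python
import math

# Days per month n_m, January to December (non-leap year).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh_by_month(p: float, h: float) -> list[float]:
    """Table 3: n_m * 14 * [(0.0091 p + 0.0245 (1 - p)) h + 0.0263] kWh/month.

    p: fraction of primary pipework that is insulated (0..1)
    h: hours per day the primary circuit is hot
    """
    per_day = 14 * ((0.0091 * p + 0.0245 * (1 - p)) * h + 0.0263)
    return [n_m * per_day for n_m in DAYS_IN_MONTH]


def assumed_occupancy(tfa_m2: float) -> float:
    """Assumed occupancy N from total floor area (SAP Table 1b form)."""
    if tfa_m2 <= 13.9:
        return 1.0
    x = tfa_m2 - 13.9
    return 1 + 1.76 * (1 - math.exp(-0.000349 * x ** 2)) + 0.0013 * x


def vd_simple(tfa_m2: float) -> float:
    """Simplified average daily hot-water volume 25N + 36 (litres/day)."""
    return 25 * assumed_occupancy(tfa_m2) + 36
```

With fully insulated pipework (p = 1) and an illustrative h of 3 hours/day, `sum(primary_loss_kwh_by_month(1.0, 3.0))` comes to about 274 kWh/yr. That figure can be set against the fixed 245/60 kWh values the code uses now.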
+
+### Lighting (Appendix L)
+- `predicted_lighting_kwh` in `domain.ml.demand` uses `9.3 × TFA ×
+  (1 − 0.5·led_share − 0.4·cfl_share)` heuristic.
+- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up
+  + portable shares, monthly profile.
+- For an LED-dominant home (50+ LEDs), our heuristic gives ~465 kWh while
+  the spec gives ~94 kWh: a known ~5× over-prediction for new-build LED homes.
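The current heuristic is simple enough to check by hand. The sketch below transcribes it as a function; the function name is hypothetical and the coefficients are the ones quoted above. For an illustrative 100 m² all-LED dwelling it gives 9.3 × 100 × 0.5 = 465 kWh, which matches the ~465 kWh figure above. That is roughly 5× the ~94 kWh spec value.

```python
def heuristic_lighting_kwh(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """Current domain.ml.demand heuristic, not the Appendix L L1-L12 method."""
    return 9.3 * tfa_m2 * (1 - 0.5 * led_share - 0.4 * cfl_share)


SPEC_LED_HOME_KWH = 94.0  # spec figure quoted in this handover

heuristic_kwh = heuristic_lighting_kwh(100.0, led_share=1.0, cfl_share=0.0)
over_prediction_ratio = heuristic_kwh / SPEC_LED_HOME_KWH  # just under 5
```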
+
+### Internal gains (§5 SAP)
+- `worksheet/internal_gains.py` implements metabolic + cooking +
+  appliances + lighting (the four positive rows of Table 5).
+- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e.
+  HW losses recycled as heated-space gains) and Losses row (`−40 × N`
+  for cold inflow + evaporation). Both are documented in the S-B23 gap list.
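The two missing Table 5 rows are straightforward once the monthly inputs exist. The sketch below assumes `hw_gains_kwh_m` holds worksheet line (65) for each month and `occupancy` is N. Both function names are hypothetical.

```python
# Days per month n_m, January to December (non-leap year).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def water_heating_gains_w(hw_gains_kwh_m: list[float]) -> list[float]:
    """Table 5 water-heating row: 1000 * (65)_m / (n_m * 24), in watts."""
    if len(hw_gains_kwh_m) != len(DAYS_IN_MONTH):
        raise ValueError("expected 12 monthly values of line (65)")
    return [
        1000 * kwh / (n_m * 24) for kwh, n_m in zip(hw_gains_kwh_m, DAYS_IN_MONTH)
    ]


def losses_w(occupancy: float) -> float:
    """Table 5 losses row (cold inflow + evaporation): -40 * N watts, every month."""
    return -40 * occupancy
```

As a check, 74.4 kWh of line (65) in January (31 days) converts to 100 W of gains. A dwelling with N = 2.5 loses a constant 100 W.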
+
+### Ventilation (§4 / Table 5)
+- Wind-shelter factor implemented in S-B21.
+- Mechanical ventilation (MVHR, MEV, PIV): not implemented; certs
+  rarely lodge it. Spec §4.2 + Table 4g.
+- Pressure-test override (worksheet lines 17-18) — not implemented.
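If the pressure-test override is added, the worksheet conversion is short. This sketch assumes the SAP 10.2 line (18) rule for a blower-door result: (18) = (17) ÷ 20 + (8). Here (17) is q50 in m³/(h·m²) and (8) is the infiltration from chimneys, flues and fans in ach. The sketch does not cover the low-pressure pulse method, and the function name is hypothetical. Check the exact worksheet wording before implementing.

```python
from typing import Optional


def infiltration_rate_ach(
    q50_m3_per_h_m2: Optional[float],
    openings_ach: float,
    component_infiltration_ach: float,
) -> float:
    """Worksheet line (18): use the pressure test when lodged, else line (16).

    q50_m3_per_h_m2: line (17), air permeability at 50 Pa (None if no test)
    openings_ach: line (8), infiltration from chimneys, flues and fans
    component_infiltration_ach: line (16), the component-based estimate
    """
    if q50_m3_per_h_m2 is not None:
        return q50_m3_per_h_m2 / 20 + openings_ach
    return component_infiltration_ach
```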
+
+### Tariff / cost (§12 + Table 12 / 12a / 12c)
+- Cert-calibration prices in
+  `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit
+  to Elmhurst's output. They are LOWER than the published Table 12
+  spec values by 4-25%. Known divergence; investigation deferred.
+- Standing charges (Table 12 note (a)) — NOT applied. Adding them
+  empirically worsens MAE (calibration absorbs).
+- Table 12a high-rate fractions — currently 100% off-peak for E7-
+  eligible codes, 100% standard otherwise. No fractional blending.
+- Heat network DLF (Table 12c) — applied per S-B31 only to main
+  heating + HW from main. HW-only-from-heat-network is a separate slice.
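Fractional blending would replace the all-or-nothing routing with a weighted unit price. This sketch assumes the high-rate fraction has already been looked up from Table 12a for the system and tariff. The prices come from whichever price table is active. The function names are hypothetical.

```python
def blended_unit_price_p_per_kwh(
    high_rate_fraction: float, high_rate_p: float, low_rate_p: float
) -> float:
    """Weighted two-rate price: f * high + (1 - f) * low, in p/kWh."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must be between 0 and 1")
    return high_rate_fraction * high_rate_p + (1 - high_rate_fraction) * low_rate_p


def electricity_cost_gbp(
    kwh: float, high_rate_fraction: float, high_rate_p: float, low_rate_p: float
) -> float:
    """Annual cost in pounds for one end-use on a two-rate tariff."""
    price_p = blended_unit_price_p_per_kwh(high_rate_fraction, high_rate_p, low_rate_p)
    return kwh * price_p / 100
```

A fraction of 1.0 reproduces the "100% high rate" case from Table 12a, and 0.0 reproduces today's 100% off-peak routing. Intermediate Table 12a fractions fall between the two.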
+
+---
+
+## 7. The cert-calibration vs spec-correctness tension
+
+This is THE central architectural decision you have to make as you
+work through the spec.
+
+### Two tables of fuel prices
+- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec
+  values (3.64p gas, 16.49p standard elec).
+- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`
+  — empirically lower values (3.48p gas, 13.19p elec) that match the
+  cert assessor software's output.
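The size of the gap between the two tables can be checked directly from the figures above. In the sketch below, the dict layout is illustrative and does not mirror the modules' actual structure.

```python
# p/kWh figures quoted in this handover.
SPEC_P_PER_KWH = {"gas": 3.64, "standard_elec": 16.49}
CERT_CAL_P_PER_KWH = {"gas": 3.48, "standard_elec": 13.19}


def cert_cal_discount(fuel: str) -> float:
    """Fraction by which the cert-calibration price sits below the spec price."""
    return 1 - CERT_CAL_P_PER_KWH[fuel] / SPEC_P_PER_KWH[fuel]


for fuel in SPEC_P_PER_KWH:
    print(f"{fuel}: {cert_cal_discount(fuel):.1%} below spec")
# gas: 4.4% below spec
# standard_elec: 20.0% below spec
```

Both gaps fall inside the 4-25% range noted in §6, but they differ by a factor of about 4.5 between the two fuels.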
+
+### Two possible end states for the calculator
+
+**End state A — Spec-perfect.** Use spec prices, apply every spec rule
+(standing charges, Table 12a fractions, combi zero-loss, etc.). The
+calculator output is then what a *correct SAP 10.2 implementation*
+would produce. SAP MAE against the corpus will likely worsen because
+Elmhurst doesn't perfectly implement spec.
+
+**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce
+Elmhurst's deviations exactly. The calculator output matches cert
+SAP scores. The calculator becomes a "reverse-engineered Elmhurst
+clone" rather than a SAP 10.2 implementation.
+
+### The pragmatic recommendation
+
+**Aim for state A but track state B as the parity probe.** Concretely:
+
+1. Verify each spec section in isolation; fix spec violations
+   regardless of MAE impact, but commit each fix WITH a measured
+   probe delta in the commit message.
+2. After the spec sweep is complete, the calculator's output is
+   spec-correct. The corpus residual at that point is Elmhurst's
+   deviation from spec.
+3. THEN re-derive the cert-calibration prices to match Elmhurst's
+   deviation pattern. The calibration becomes a thin Elmhurst-
+   compatibility layer on top of a spec-correct engine.
+
+This avoids the whack-a-mole problem because state A is unambiguous:
+each fix is either spec-correct or not. State B is iterative on top
+of state A, not entangled with it.
+
+---
+
+## 8. Don't repeat — known dead-ends
+
+- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
+  over-corrected because it routed to the (Unfilled cavity, 50mm) row
+  instead of the dedicated Filled cavity row. The right fix landed in
+  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
+- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
+  (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
+  intentionally conservative; PCDB is needed for real efficiency.
+- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor
+  uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
+  Use `cert_calibration_prices()` for the probe.
+- ❌ **Always applying 10% secondary heating** — must be conditional on
+  cert lodging or main system being electric storage (S-B20). See
+  spec Appendix A.4.
+- ❌ **Respecting `main_heating_fraction` for secondary allocation**
+  (failed S-B30) — the field is the multi-main allocation (system 1 vs
+  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
+- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
+  spec-correct per Table 12a but inverts bias direction. Cert-cal
+  calibration absorbs the deviation.
+- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) —
+  spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
+  to −2.62. Cert-cal calibration absorbs.
+- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
+  S-B32): spec-correct per Table 2 + Table 3 footers, and cuts PE MAE
+  by 6.64 kWh/m² (a large win), BUT raises SAP MAE by 0.39 and breaks 3 golden
+  fixtures. Decision deferred to systematic pass.
+
+---
+
+## 9. The cert corpus and parity probe
+
+### Sample
+`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
+250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
+samples 300 at seed=7 by default. Filtering rationale:
+- below 5 is heritage/anomaly stock (sub-3% of corpus)
+- above 99 is full-SAP new-builds, which the parquet excludes anyway
+
+### Run the probe
+```bash
+# Run from the repository root. PYTHONPATH order matches the original
+# sys.path inserts; the two positional args are sample size and seed.
+PYTHONPATH=services/ml_training_data/src:.:packages/domain/src \
+  python -c "from ml_training_data.sap_parity_probe import main; main(['300', '7'])"
+```
+
+### What the probe shows
+- Aggregate SAP MAE / RMSE / bias
+- Aggregate PE MAE / RMSE / bias
+- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
+- Stratification by `main_heating_category`, `construction_age_band`,
+  `dwelling_type`
+- Worst-15 residuals (SAP and PE)
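For reference when reading the probe output, the aggregate metrics and the worst-N listing reduce to a few lines. This is a sketch, not the probe's code. It assumes residual = calculated − cert, so a positive bias means the calculator over-predicts. Confirm the sign convention in `sap_parity_probe` before comparing numbers.

```python
import math
from typing import Sequence


def residual_summary(calculated: Sequence[float], cert: Sequence[float]) -> dict[str, float]:
    """MAE, RMSE and mean bias of calculated-minus-cert residuals."""
    if len(calculated) != len(cert) or not cert:
        raise ValueError("need two equal-length, non-empty sequences")
    residuals = [c - r for c, r in zip(calculated, cert)]
    n = len(residuals)
    return {
        "mae": sum(abs(e) for e in residuals) / n,
        "rmse": math.sqrt(sum(e * e for e in residuals) / n),
        "bias": sum(residuals) / n,
    }


def worst_residuals(
    ids: Sequence[str], calculated: Sequence[float], cert: Sequence[float], n: int = 15
) -> list[tuple[str, float]]:
    """Largest absolute residuals first, like the probe's worst-15 listing."""
    rows = [(i, c - r) for i, c, r in zip(ids, calculated, cert)]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)[:n]
```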
+
+### Known parquet limitations
+- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in
+  the raw bulk-zip. The parquet filters out full-SAP new-builds
+  upstream. Don't measure full-SAP-path slices against the parquet.
+- Heat-pump certs (cat=4) are under-represented and concentrated in
+  the worst-residual tail because PCDB efficiency is unavailable.
+
+---
+
+## 10. The 7 golden fixtures
+
+`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
+locks 7 corpus certs as regression anchors:
+
+| Cert | TFA (m²) | Main heating cat. | Notes |
+|---|---|---|---|
+| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
+| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
+| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
+| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
+| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
+| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
+| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
+
+Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. **Tighten as
+the spec sweep progresses.**
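The tolerance check, and what tightening it means, can be expressed as a small predicate. This is a sketch with hypothetical names; the real assertions live in `test_golden_fixtures.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenTolerance:
    sap_points: float = 1.0      # bound on |SAP residual|
    pe_kwh_per_m2: float = 10.0  # bound on |PE residual|


def within_tolerance(
    sap_residual: float, pe_residual: float, tol: GoldenTolerance = GoldenTolerance()
) -> bool:
    """True when both residuals sit inside the (inclusive) tolerance band."""
    return abs(sap_residual) <= tol.sap_points and abs(pe_residual) <= tol.pe_kwh_per_m2
```

Tightening then means passing a smaller `GoldenTolerance`, for example `GoldenTolerance(0.5, 2.0)`, once the spec sweep has removed the compensating errors.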
+
+The cert JSONs are stored under `fixtures/golden/<cert>.json` —
+frozen at extraction time so the test is reproducible without
+bulk-zip access. The probe extraction script for new fixtures is
+inlined in the test history (see commit `f4a8d2a0`).
+
+**Important caveat**: some of these 7 are compensating-error matches
+(see §3). When a spec-correct slice breaks one, the fixture is
+probably the compensating case — investigate before reverting.
+
+---
+
+## 11. Trace mode (recommended infrastructure)
+
+ADR-0009 proposed:
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class SapResult:
+    sap_score: float
+    ...  # other result fields elided
+    intermediate: dict[str, float]
+```
+
+The `intermediate` field was never populated. Suggested implementation
+for the systematic pass:
+
+```python
+intermediate = {
+    # §1 dimensions
+    "tfa_m2": tfa,
+    "volume_m3": volume,
+    "storey_count": storeys,
+    # §3 heat transmission
+    "walls_w_per_k": ht.walls_w_per_k,
+    "roof_w_per_k": ht.roof_w_per_k,
+    "floor_w_per_k": ht.floor_w_per_k,
+    "party_walls_w_per_k": ht.party_walls_w_per_k,
+    "windows_w_per_k": ht.windows_w_per_k,
+    "doors_w_per_k": ht.doors_w_per_k,
+    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
+    "infiltration_ach": infiltration,
+    "infiltration_w_per_k": infiltration * volume * 0.33,
+    "heat_transfer_coefficient_w_per_k": hlc,
+    "heat_loss_parameter_w_per_m2k": hlp,
+    "time_constant_h": tau_h,
+    # §5 internal gains (annual averages)
+    "internal_gains_annual_avg_w": ...,
+    # §7 mean internal temperature (annual avg)
+    "mean_internal_temp_annual_avg_c": ...,
+    # §8 space heating requirement
+    "useful_space_heating_kwh_per_yr": space_heating_kwh,
+    # §10 fuel costs (per end-use)
+    "main_heating_cost_gbp": ...,
+    "hot_water_cost_gbp": ...,
+    "lighting_cost_gbp": ...,
+    "pumps_fans_cost_gbp": ...,
+    # §11 SAP rating
+    "ecf": ecf,
+    "deflator": 0.36,
+    # §12-13 CO2 emissions and primary energy per end-use
+    "space_heating_pe_kwh_per_m2": ...,
+    "hot_water_pe_kwh_per_m2": ...,
+    # ... further per-end-use PE / CO2 entries
+}
+```
+
+Once populated, the differential debugging the reviewer recommended
+becomes possible: change one input field, compare deltas against an
+Elmhurst export.
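Once two traces exist, the comparison itself is mechanical. This sketch assumes both sides are flat `dict[str, float]` maps keyed like the `intermediate` sketch above. It also assumes an Elmhurst export can be mapped into the same keys; that mapping is not shown. The function name is hypothetical. Sorting by relative delta puts the most divergent worksheet quantities at the top.

```python
import math


def diff_traces(
    ours: dict[str, float], reference: dict[str, float], rel_tol: float = 1e-3
) -> list[tuple[str, float, float, float]]:
    """Return (key, ours, reference, relative delta) for keys that disagree.

    Only keys present in both traces are compared; largest relative delta first.
    """
    rows = []
    for key in ours.keys() & reference.keys():
        a, b = ours[key], reference[key]
        if math.isclose(a, b, rel_tol=rel_tol):
            continue
        rel = abs(a - b) / max(abs(b), 1e-12)
        rows.append((key, a, b, rel))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

For differential debugging, run the calculator twice with one input field changed and diff each run against the matching reference export. Keys whose deltas move together point at the worksheet stage that consumes that field.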
+
+---
+
+## 12. Specific §1-3 starting tasks (suggested first session)
+
+A concrete pickup point:
+
+### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
+- §1 is prose; nothing to verify.
+- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2
+  enumerates is present and correctly typed on the domain object.
+  Specifically check: `dwelling_type`, `built_form`, `property_type`,
+  `construction_age_band`, `country_code`. Note that
+  `construction_age_band` is per-building-part, not dwelling-level,
+  and the primary age band drives most defaults.
+- §3 maps to `worksheet/dimensions.py`. Verify:
+  - Total floor area sum across building parts equals TFA
+  - Volume calculation per storey × area × height
+  - Storey count handling for extensions and room-in-roof
+  - Multi-storey heat-loss-perimeter rules
+
+This single session should produce zero behaviour changes if §1-3 are
+correctly implemented, but expect to find at least one issue in §3
+geometry (per the reviewer's "biggest SAP error sources" list).
+
+Run the golden fixtures + probe at the end of each session; expect no
+movement until you start hitting actual gaps.
+
+---
+
+## 13. Workflow recap
+
+For each section, in order:
+
+1. Read the spec section text + cited tables.
+2. Identify code location(s).
+3. For each rule / table / footnote:
+   - Does our code implement it?
+   - Does the implementation match?
+   - Edge cases / fallback paths handled?
+4. For each gap: AAA unit test → minimal implementation → commit.
+5. After each commit: run the golden fixtures
+   (`pytest packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`)
+   and the parity probe. Note both deltas in the commit message.
+6. If a golden fixture breaks: investigate. Either fixture was a
+   compensating case (acceptable to break) or the new code is wrong
+   (revert).
+
+Stick to this. The prior session's mistake was jumping between
+sections based on residual-size. Don't.
+
+---
+
+## 14. Useful references
+
+- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
+  decision rationale + Session A/B/C plan.
+- **Spec coverage map**
+  `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
+  Update as you go.
+- **Parity findings**
+  `docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior
+  sessions.
+- **Earlier handover**
+  `docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the
+  previous fresh-context pass.
+- **Reviewer feedback (informal)** — ChatGPT critique of the slice-by-
+  slice approach. Key recommendations: two-layer architecture
+  (RdSAP expansion → SAP worksheet), trace mode, golden-master
+  methodology, differential debugging, reference traces from
+  Elmhurst/Stroma/Quidos.
+- **Commit log** — `git log --oneline` shows the slice history; each
+  S-Bxx commit message documents the spec ref + measured impact.
+
+---
+
+## 15. Final note
+
+The prior session demonstrated that **moving SAP MAE down requires
+either spec-correctness OR Elmhurst-perfect calibration, not both
+simultaneously**. The cert-cal layer absorbs Elmhurst's spec
+deviations; any spec-correct fix risks breaking it.
+
+The systematic pass clears this by separating the layers:
+1. Build the spec-correct engine first.
+2. Re-fit the cert-cal compatibility layer once at the end.
+
+Don't be discouraged when SAP MAE rises temporarily during the spec
+sweep. PE residual is the truer signal of engine correctness. SAP
+MAE convergence will follow once cert-cal is re-derived against the
+clean engine.
+
+**Welcome to the project. Read the spec, follow the order, commit one
+section at a time. The deterministic answer is in there.**