diff --git a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md new file mode 100644 index 00000000..bf90df2a --- /dev/null +++ b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md @@ -0,0 +1,615 @@ +# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review + +**Audience:** A fresh agent picking up the deterministic SAP calculator at +`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs, +then the code. + +**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly +for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical +calculation** — not a model — so MAE should approach zero on certs whose +inputs are fully populated. + +--- + +## 1. Critical framing — this is NOT a judgement call + +The SAP/RdSAP energy assessment splits cleanly into two roles: + +1. **The assessor** — a person who surveys the dwelling and lodges + measured/observed fields onto the cert (areas, perimeters, + construction codes, insulation thicknesses, fuel types, etc.). + The assessor makes NO calculation decisions. +2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a + deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It + takes the lodged fields and produces SAP score, CO2 emissions, + primary energy (PEUI), CO2 per m², EI rating, etc. + +**Our calculator is replicating role #2.** Where Elmhurst's +implementation diverges from spec, we follow Elmhurst, but we don't +guess at divergence; we localise it via reference traces or +empirically against the cert corpus. + +There is no "assessor judgement" knob to tune. Each field on the cert +has a deterministic interpretation per the spec. Each spec table / +formula has a deterministic implementation. Our job is to enumerate +all of them and verify each. + +--- + +## 2. 
Current state (2026-05-19) + +- Branch: `ara-backend-design-prd` +- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs") +- 301 tests passing +- Parity probe (300 random certs from + `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7, + `sap_score ∈ [5, 99]`): + + | Metric | Value | + |---|---| + | SAP MAE | 4.61 | + | SAP bias | +0.87 | + | PE MAE | 43.32 kWh/m² | + | PE bias | +37.69 kWh/m² | + +- 7 "golden" regression certs locked in + `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`. + Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known + caveat: some of these are compensating-error matches (e.g. cert + `7536-3827`'s PE matches but its cost is £143 under the cert's implied cost + due to multi-factor offsetting bugs). + +--- + +## 3. Why we are pivoting to systematic review + +The prior session shipped ten slices (S-B23 → S-B31) by debugging the +biggest residuals one at a time: + +- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress + on the demand-side calculation. +- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is + bottlenecked by cert-calibration prices that absorb multiple + structural deviations from spec, so any single slice that fixes + one component breaks the calibration for the others. + +Three failed slice attempts in the prior session exposed the pattern: + +- **Standing charges**: spec Table 12 note (a) clearly says a gas standing + charge of £92 is added to space + water heating costs for energy + ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62. + Reverted before committing. +- **Cat=10 room heaters off-peak routing**: Table 12a clearly says + "Other direct-acting electric heating" bills 100% high rate on + 7-hour tariff. Empirically: switching cat=10 from off-peak to + standard rate inverted the bias from +5.88 to −6.00 without + improving MAE. Reverted before committing. 
+- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer + + Table 3 footer clearly say combi boilers using Table 4b efficiency + have zero storage + primary loss. Empirically: zeroing them dropped + PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke + 3 of 7 golden fixtures. Reverted because no way to know whether to + follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without + reference traces. + +The pattern: **the cert-calibration prices** (in +`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered +to match Elmhurst's output assuming all our other calculations are +correct.** When we fix a spec-violation bug in some other component, we +break the calibration and SAP MAE goes up even though we're more +spec-correct. + +This means **whack-a-mole on the biggest residual won't converge**. We +need to systematically verify every component against the spec, then +re-derive the cert-calibration once at the end. + +--- + +## 4. Scope decisions + +### IN scope +- **RdSAP 10 specification (10-06-2025)** — full document, all sections + (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages). +- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables, + appendices that RdSAP 10 references + (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages). + +### OUT of scope (for now) +- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated + U-value in `walls[i].description` (e.g. + "Average thermal transmittance 0.18 W/m²K"). These are a separate + calculation path (BS EN ISO 6946) and a different corpus. **Park them + until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29 + attempted partial handling; those slices can stay or be reverted at + your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2. +- **PCDB (Product Characteristics Database).** ADR-0009 says Session C. 
+ Heat pumps (cat=4) have catastrophic per-cert MAE because we use + Table 4a fallback efficiency 2.30 instead of PCDB SCOP. There's a + `NoOpPcdbLookup` stub seam ready in Session A; data fetch + parser + is its own milestone. +- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has + identical Table 12 codes (only values shift). Don't update spec + references to 10.3 until the corpus migrates. + +--- + +## 5. The approach — section-by-section spec verification + +Work through the RdSAP 10 spec **in document order**, starting at +§1. For each section: + +### 5.1. Read the spec section +Read the section text fully. Note every rule, table reference, and +defaulting cascade. + +### 5.2. Find the corresponding code +Map the section to the source file(s) implementing it. The current +mapping (some sections are split across modules): + +| RdSAP 10 section | Code location | +|---|---| +| §1 Introduction / general | n/a | +| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` | +| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` | +| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` | +| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` | +| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` | +| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` | +| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` | +| §9 Heat emitters / flow temperatures | not implemented | +| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` | +| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) | +| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) | +| §13 Addendum to EPCs | n/a | +| §14 Special cases (e.g. 
flats above commercial) | not implemented | +| §15 Improvements (recommendations) | n/a (not rating) | +| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` | +| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` | +| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` | +| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` | +| Table 30 — Mechanical ventilation | not implemented | +| Table 31 — Data to be collected | n/a | + +### 5.3. For each spec rule in the section, check our code +For each table, formula, footnote, exception: + +1. Does our code implement it? +2. Does the implementation match the spec values exactly? +3. Are there spec-defined edge cases / footnotes we're missing? + +### 5.4. When a gap is found +- Write a failing unit test that asserts the spec-correct behaviour. +- Implement the fix. +- Run **all 7 golden fixtures** plus the broader probe. Note both + direction and magnitude of change. +- If the fix is spec-correct but breaks a golden fixture, this is + evidence that the fixture was a compensating-error case — proceed + with the spec-correct fix and update the fixture (with a comment + noting it was a compensating case). +- Commit per-slice as before: one section → one commit. Reference the + spec section in the commit message. + +### 5.5. Use trace mode when you need it +ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field +that was never populated. Adding this is highly recommended for the +systematic pass — each section's verification benefits from +inspecting the intermediate values. See §11 below for a sketch. + +--- + +## 6. What's already been done — section by section + +This is your starting map. Each row says whether the section has been +touched and what the current state is. 
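
Before walking the map, the fixture check from §5.4 can be made concrete. The sketch below is a minimal illustration, not the code in `test_golden_fixtures.py`: the `Residual` type and `classify_change` helper are hypothetical names, and it assumes a residual is "calculated minus lodged". Only the two tolerances come from this document.

```python
from dataclasses import dataclass

# Tolerances quoted in this handover for the golden fixtures.
SAP_TOLERANCE = 1.0           # |SAP residual| <= 1
PE_TOLERANCE_KWH_M2 = 10.0    # |PE residual| <= 10 kWh/m2


@dataclass(frozen=True)
class Residual:
    """Hypothetical container: calculated minus lodged (assumed convention)."""
    sap: float
    pe_kwh_per_m2: float

    def within_tolerance(self) -> bool:
        return (abs(self.sap) <= SAP_TOLERANCE
                and abs(self.pe_kwh_per_m2) <= PE_TOLERANCE_KWH_M2)


def classify_change(before: Residual, after: Residual) -> str:
    """Label what a slice did to one fixture, for the commit message."""
    if after.within_tolerance():
        return "pass"
    if before.within_tolerance():
        # Was passing, now fails: a candidate compensating-error fixture.
        return "newly_broken"
    return "still_failing"
```

A `newly_broken` result after a spec-correct fix is the case §5.4 describes: investigate the fixture before considering a revert.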
+ +### Walls / construction (§5) +- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch + when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored. +- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for + "Average thermal transmittance X W/m²K". **PARK** — full-SAP path. +- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity + "as built, insulated (assumed)" + similar (type=4 with descriptive + signal). Spec-anchored via legacy `epc_wall_description_map`. +- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50` + fix (the "NI" thickness sentinel) + description-based override of + `wall_ins_present` for non-cavity walls. Spec footnote (Table 6). +- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog — + Table 19 footnote (2) "max(50, age-band default)" when description + signals retrofit. +- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated + description → §5.11.4 footnote 50mm joist row. +- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal + transmittance" parse. **PARK** — full-SAP path. + +**Still to verify in §5**: +- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only + England is fully transcribed; country overrides are partial. +- Cob U-values (§5.6) — table only, no formula implementation. +- Stone formula §5.6 / §5.7 for non-standard wall thicknesses. +- Curtain wall §5.18 — not implemented. +- Party wall U-values (Table 15) — implemented in `u_party_wall`, + verify table values. +- Thermal bridging (Table 21) — implemented as global `y` factor, + verify per-age-band values. +- §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched + by construction type with internal insulation). Currently we + hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`). + This is wrong for timber-frame / cob / internally-insulated masonry + (should be 100). 
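
The thermal-mass gap in the last item could be closed with a small dispatcher. This is a sketch only: the function name and the construction labels (`timber_frame`, `cob`, `masonry`) are hypothetical and not taken from `cert_to_inputs.py`, and the 100 / 250 split follows the description above, so it should be checked against Table 22 before use.

```python
# Hypothetical sketch: identifiers and construction labels are illustrative.
LOW_TMP_KJ_PER_M2K = 100    # timber frame, cob, internally insulated masonry
HIGH_TMP_KJ_PER_M2K = 250   # all other constructions (the current hardcoded value)

_LOW_MASS_CONSTRUCTIONS = frozenset({"timber_frame", "cob"})


def thermal_mass_parameter_kj_per_m2k(construction: str,
                                      internally_insulated: bool) -> int:
    """Pick the thermal mass parameter instead of always returning 250."""
    if construction in _LOW_MASS_CONSTRUCTIONS or internally_insulated:
        return LOW_TMP_KJ_PER_M2K
    return HIGH_TMP_KJ_PER_M2K
```

Keeping the two values as named constants makes the eventual Table 22 verification a one-line comparison per constant.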
+ +### Heating systems (§§7-10, SAP Appendix A) +- **S-B20 (in history)**: Table 11 secondary heating allocation, + conditional on cert lodging secondary or being electric storage. +- **Failed S-B30 (reverted)**: respect `main_heating_fraction` — + shown empirically wrong. Field is multi-main allocation, not + main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4. +- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main. + Spec §C3.1 + Table 12c. +- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a + says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our + cert-cal extends off-peak to codes 691-696. Spec-correct fix + inverted bias direction — calibration was absorbing this. +- **Uncommitted HW cylinder fix**: spec-correct (combi → zero + storage/primary loss per Table 2 + Table 3 footers) but breaks 3 + golden fixtures. Decision deferred to systematic pass. + +**Still to verify in heating**: +- Table 4a efficiency values for every code (heat pumps, storage + heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30) + is documented as a known limitation. +- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of + gas and liquid fuel boilers for both space and water heating is + reduced by 5% if the boiler is not interlocked for space and water + heating." We don't apply this. Known gap. +- Table 4c condensing-boiler / heat-pump emitter-temperature + adjustment — we don't apply this. +- Table 12a high-rate fractions for off-peak dwellings — we apply + 100% off-peak or 100% standard, never fractional blending. + +### Hot water (§4 SAP + Appendix J) +- Storage loss factor table (Table 2) — current values in + `domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec + (verified). Known under-prediction of cylinder loss for storage + systems; cancelled by over-prediction of primary loss for combi + systems in aggregate. 
+- Primary loss formula (Table 3) — implemented as 245/60 kWh by age + band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]` + with `p` (pipework insulation fraction) and `h` (circulation hours). + Known approximation. +- Combi-boiler zero-loss rule (Table 2 + Table 3 footers) — currently + NOT applied (the failed uncommitted slice). Adding this drops PE + MAE −6.64 but raises SAP MAE +0.39. +- Appendix J Vd formula `25N + 36` — currently the simple form, not + the full per-component (shower / bath / other) breakdown. Useful + HW demand is ~7% under spec value. +- ΔT — currently 43°C constant (55−12). Spec uses monthly Tcold and + hot at 52°C, not 55°C. Per-month variance unmodelled. + +### Lighting (Appendix L) +- `predicted_lighting_kwh` in `domain.ml.demand` uses `9.3 × TFA × + (1 − 0.5·led_share − 0.4·cfl_share)` heuristic. +- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up + + portable shares, monthly profile. +- For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec + gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes. + +### Internal gains (§5 SAP) +- `worksheet/internal_gains.py` implements metabolic + cooking + + appliances + lighting (the four positive rows of Table 5). +- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e. + HW losses recycled as heated-space gains) and Losses row (`−40 × N` + for cold inflow + evaporation). Both documented in S-B23 gap list. + +### Ventilation (§4 / Table 5) +- Wind-shelter factor implemented in S-B21. +- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert + rarely lodges. Spec §4.2 + Table 4g. +- Pressure-test override (worksheet lines 17-18) — not implemented. + +### Tariff / cost (§12 + Table 12 / 12a / 12c) +- Cert-calibration prices in + `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit + to Elmhurst's output. They are LOWER than the published Table 12 + spec values by 4-25%. 
Known divergence; investigation deferred. +- Standing charges (Table 12 note (a)) — NOT applied. Adding them + empirically worsens MAE (calibration absorbs). +- Table 12a high-rate fractions — currently 100% off-peak for E7- + eligible codes, 100% standard otherwise. No fractional blending. +- Heat network DLF (Table 12c) — applied per S-B31 only to main + heating + HW from main. HW-only-from-heat-network is a separate slice. + +--- + +## 7. The cert-calibration vs spec-correctness tension + +This is THE central architectural decision you have to make as you +work through the spec. + +### Two tables of fuel prices +- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec + values (3.64p gas, 16.49p standard elec). +- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH` + — empirically lower values (3.48p gas, 13.19p elec) that match the + cert assessor software's output. + +### Two possible end states for the calculator + +**End state A — Spec-perfect.** Use spec prices, apply every spec rule +(standing charges, Table 12a fractions, combi zero-loss, etc.). The +calculator output is then what a *correct SAP 10.2 implementation* +would produce. SAP MAE against the corpus will likely worsen because +Elmhurst doesn't perfectly implement spec. + +**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce +Elmhurst's deviations exactly. The calculator output matches cert +SAP scores. The calculator becomes a "reverse-engineered Elmhurst +clone" rather than a SAP 10.2 implementation. + +### The pragmatic recommendation + +**Aim for state A but track state B as the parity probe.** Concretely: + +1. Verify each spec section in isolation; fix spec violations + regardless of MAE impact, but commit each fix WITH a measured + probe delta in the commit message. +2. After the spec sweep is complete, the calculator's output is + spec-correct. The corpus residual at that point is Elmhurst's + deviation from spec. +3. 
THEN re-derive the cert-calibration prices to match Elmhurst's + deviation pattern. The calibration becomes a thin Elmhurst- + compatibility layer on top of a spec-correct engine. + +This avoids the whack-a-mole problem because state A is unambiguous: +each fix is either spec-correct or not. State B is iterative on top +of state A, not entangled with it. + +--- + +## 8. Don't repeat — known dead-ends + +- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) — + over-corrected because it routed to the (Unfilled cavity, 50mm) row + instead of the dedicated Filled cavity row. The right fix landed in + S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher. +- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`** + (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is + intentionally conservative; PCDB is needed for real efficiency. +- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor + uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10). + Use `cert_calibration_prices()` for the probe. +- ❌ **Always applying 10% secondary heating** — must be conditional on + cert lodging or main system being electric storage (S-B20). See + spec Appendix A.4. +- ❌ **Respecting `main_heating_fraction` for secondary allocation** + (failed S-B30) — the field is the multi-main allocation (system 1 vs + system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse). +- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) — + spec-correct per Table 12a but inverts bias direction. Cert-cal + calibration absorbs the deviation. +- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) — + spec-correct per Table 12 note (a) but pushes SAP bias from +0.98 + to −2.62. Cert-cal calibration absorbs. 
+- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted + slice) — spec-correct per Table 2 + Table 3 footers and drops PE + MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden + fixtures. Decision deferred to systematic pass. + +--- + +## 9. The cert corpus and parity probe + +### Sample +`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the +250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and +samples 300 at seed=7 by default. Filtering rationale: +- < 5 is heritage/anomaly stock (sub-3% of corpus) +- > 99 is full-SAP new-builds the parquet excludes anyway + +### Run the probe +```bash +python -c " +import sys +sys.path.insert(0, 'packages/domain/src') +sys.path.insert(0, '.') +sys.path.insert(0, 'services/ml_training_data/src') +from ml_training_data.sap_parity_probe import main +main(['300','7']) +" +``` + +### What the probe shows +- Aggregate SAP MAE / RMSE / bias +- Aggregate PE MAE / RMSE / bias +- Per-end-use PEUI breakdown (space / HW / lighting / pumps) +- Stratification by `main_heating_category`, `construction_age_band`, + `dwelling_type` +- Worst-15 residuals (SAP and PE) + +### Known parquet limitations +- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in + the raw bulk-zip. The parquet filters out full-SAP new-builds + upstream. Don't measure full-SAP-path slices against the parquet. +- Heat-pump certs (cat=4) are under-represented and concentrated in + the worst-residual tail because PCDB efficiency is unavailable. + +--- + +## 10. 
The 7 golden fixtures + +`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py` +locks 7 corpus certs as regression anchors: + +| Cert | TFA | Cat | Notes | +|---|---|---|---| +| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 | +| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB | +| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB | +| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 | +| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) | +| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB | +| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 | + +Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as +the spec sweep progresses.** + +The cert JSONs are stored under `fixtures/golden/.json` — +frozen at extraction time so the test is reproducible without +bulk-zip access. The probe extraction script for new fixtures is +inlined in the test history (see commit `f4a8d2a0`). + +**Important caveat**: some of these 7 are compensating-error matches +(see §3). When a spec-correct slice breaks one, the fixture is +probably the compensating case — investigate before reverting. + +--- + +## 11. Trace mode (recommended infrastructure) + +ADR-0009 proposed: +```python +@dataclass(frozen=True) +class SapResult: + sap_score: float + ... + intermediate: dict[str, float] +``` + +The `intermediate` field was never populated. 
Suggested implementation +for the systematic pass: + +```python +intermediate = { + # §1 dimensions + "tfa_m2": tfa, + "volume_m3": volume, + "storey_count": storeys, + # §3 heat transmission + "walls_w_per_k": ht.walls_w_per_k, + "roof_w_per_k": ht.roof_w_per_k, + "floor_w_per_k": ht.floor_w_per_k, + "party_walls_w_per_k": ht.party_walls_w_per_k, + "windows_w_per_k": ht.windows_w_per_k, + "doors_w_per_k": ht.doors_w_per_k, + "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k, + "infiltration_ach": infiltration, + "infiltration_w_per_k": infiltration * volume * 0.33, + "heat_transfer_coefficient_w_per_k": hlc, + "heat_loss_parameter_w_per_m2k": hlp, + "time_constant_h": tau_h, + # §5 internal gains (annual averages) + "internal_gains_annual_avg_w": ..., + # §7 mean internal temperature (annual avg) + "mean_internal_temp_annual_avg_c": ..., + # §9 space heating + "useful_space_heating_kwh_per_yr": space_heating_kwh, + # §12 fuel costs (per end-use) + "main_heating_cost_gbp": ..., + "hot_water_cost_gbp": ..., + "lighting_cost_gbp": ..., + "pumps_fans_cost_gbp": ..., + # §13 rating + "ecf": ecf, + "deflator": 0.36, + # §14 primary energy and CO2 per end-use + "space_heating_pe_kwh_per_m2": ..., + "hot_water_pe_kwh_per_m2": ..., + ... +} +``` + +Once populated, the differential debugging the reviewer recommended +becomes possible: change one input field, compare deltas against an +Elmhurst export. + +--- + +## 12. Specific section-1 starting tasks (suggested first session) + +A concrete pickup point: + +### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions) +- §1 is prose; nothing to verify. +- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2 + enumerates is present and correctly typed on the domain object. + Specifically check: `dwelling_type`, `built_form`, `property_type`, + `construction_age_band`, `country_code`. 
Note that + `construction_age_band` is per-building-part, not dwelling-level, + and the primary age band drives most defaults. +- §3 maps to `worksheet/dimensions.py`. Verify: + - Total floor area sum across building parts equals TFA + - Volume calculation per storey × area × height + - Storey count handling for extensions and room-in-roof + - Multi-storey heat-loss-perimeter rules + +This single session should produce zero behaviour changes if §1-3 are +correctly implemented, but expect to find at least one issue in §3 +geometry (per the reviewer's "biggest SAP error sources" list). + +Run the golden fixtures + probe at the end of each session; expect no +movement until you start hitting actual gaps. + +--- + +## 13. Workflow recap + +For each section, in order: + +1. Read the spec section text + cited tables. +2. Identify code location(s). +3. For each rule / table / footnote: + - Does our code implement it? + - Does the implementation match? + - Edge cases / fallback paths handled? +4. For each gap: AAA unit test → minimal implementation → commit. +5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`) + and the parity probe. Note both deltas in the commit message. +6. If a golden fixture breaks: investigate. Either fixture was a + compensating case (acceptable to break) or the new code is wrong + (revert). + +Stick to this. The prior session's mistake was jumping between +sections based on residual-size. Don't. + +--- + +## 14. Useful references + +- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` — + decision rationale + Session A/B/C plan. +- **Spec coverage map** + `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker. + Update as you go. +- **Parity findings** + `docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior + sessions. +- **Earlier handover** + `docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the + previous fresh-context pass. 
+- **Reviewer feedback (informal)** — ChatGPT critique of the slice-by-slice + approach. Key recommendations: two-layer architecture + (RdSAP expansion → SAP worksheet), trace mode, golden-master + methodology, differential debugging, reference traces from + Elmhurst/Stroma/Quidos. +- **Commit log** — `git log --oneline` shows the slice history; each + S-Bxx commit message documents the spec ref + measured impact. + +--- + +## 15. Final note + +The prior session demonstrated that **moving SAP MAE down requires +either spec-correctness OR Elmhurst-perfect calibration, not both +simultaneously**. The cert-cal layer absorbs Elmhurst's spec +deviations; any spec-correct fix risks breaking it. + +The systematic pass resolves this by separating the layers: +1. Build the spec-correct engine first. +2. Re-fit the cert-cal compatibility layer once at the end. + +Don't be discouraged when SAP MAE rises temporarily during the spec +sweep. PE residual is the truer signal of engine correctness. SAP +MAE convergence will follow once cert-cal is re-derived against the +clean engine. + +**Welcome to the project. Read the spec, follow the order, commit one +section at a time. The deterministic answer is in there.**