From 49e8c65ae88eaa8659e8ffd1cee45ad700a4f227 Mon Sep 17 00:00:00 2001 From: Khalim Conn-Kowlessar Date: Wed, 20 May 2026 13:03:09 +0000 Subject: [PATCH] =?UTF-8?q?Handover:=20replace=20stale=20docs=20with=20foc?= =?UTF-8?q?used=20=C2=A73-close=20+=20Table-11=20brief?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Delete HANDOVER_FRESH_REVIEW (22-slice, MAE-5.34 era) and HANDOVER_SYSTEMATIC_REVIEW (pre-Elmhurst-conformance). Both described a state the Elmhurst worksheet work has since superseded. Add HANDOVER_S3_CLOSE.md with: - Accurate §3 status: §1/§2 fully done; LINE_31/LINE_36 exact for non-RR fixtures; LINE_33 gap diagnosed as missing floor_construction codes (not a window-area problem as previously assumed) - Concrete investigation steps to close LINE_33 for 000474 + 000490 - Table 11 Secondary Heating framed as next slice after §3 Co-Authored-By: Claude Sonnet 4.6 --- docs/sap-spec/HANDOVER_FRESH_REVIEW.md | 136 --- docs/sap-spec/HANDOVER_S3_CLOSE.md | 149 +++ docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md | 1092 ------------------- 3 files changed, 149 insertions(+), 1228 deletions(-) delete mode 100644 docs/sap-spec/HANDOVER_FRESH_REVIEW.md create mode 100644 docs/sap-spec/HANDOVER_S3_CLOSE.md delete mode 100644 docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md diff --git a/docs/sap-spec/HANDOVER_FRESH_REVIEW.md b/docs/sap-spec/HANDOVER_FRESH_REVIEW.md deleted file mode 100644 index 6f9ca0b5..00000000 --- a/docs/sap-spec/HANDOVER_FRESH_REVIEW.md +++ /dev/null @@ -1,136 +0,0 @@ -# Handover: fresh-context review of the SAP 10.2 calculator - -Audience: a fresh agent in a new context window. Read this first, then the SAP 10.2 + RdSAP 10 spec PDFs, then the calculator code. Your job is to find spec-vs-implementation gaps that the previous (long-context) agent has missed or got wrong. - -## TL;DR — where we are - -- Deterministic SAP 10.2 calculator at `packages/domain/src/domain/sap/`. -- 22 slices shipped under ADR-0009. 
-- 300-cert parity probe: **SAP MAE 5.34, bias +0.29** (we're slightly over-predicting SAP score on average). -- **Primary-energy bias +51.6 kWh/m²** ← biggest surprise; we over-predict primary energy by ~50%. This was discovered just before this handover; previous slices weren't accounting for it correctly. -- 17/300 (5.7%) certs match the cert's `energy_rating_current` exactly. - -Goal per ADR-0009: typical-subset SAP MAE ≤ 1.0. - -## Critical context - -1. **Two truth-sources collide.** `tables/table_12.py` carries the spec-correct SAP 10.2/10.3 prices (mains gas 3.64p, std elec 16.49p). `tables/table_12_cert_calibration.py` carries the empirical lower prices that match the cert assessor's actual output (3.48p, 13.19p). The parity probe uses the cert-calibration table; the engine's default is spec. -2. **The cert assessor diverges from the published SAP 10.2 spec in several places** we've found: - - Unit prices: cert uses ~10-25% lower than published Table 12 - - Tariff routing: cert applies off-peak to electric room heaters (code 691) when meter_type=1 (Dual), even when Table 12a says these should bill at the high rate - - Unknown meter (RdSAP energy_tariff=3): cert defaults to Single (per Elmhurst test), our code also matches this -3. **PEUI bias was discovered right at handover time.** Our `primary_energy_kwh_per_m2` runs +51 kWh/m² over the cert's `energy_consumption_current`. This is the biggest clue and the most efficient next dig. 
- -## Repo layout - -``` -packages/domain/src/domain/sap/ -├── calculator.py # Sap10Calculator + calculate_sap_from_inputs -├── tables/ -│ ├── table_12.py # SAP 10.2 spec prices, CO2, PEF -│ └── table_12_cert_calibration.py # empirical cert prices -├── worksheet/ -│ ├── dimensions.py # §1 -│ ├── ventilation.py # §2 (incl wind shelter S-B21) -│ ├── heat_transmission.py # §3 (incl DwellingExposure) -│ ├── internal_gains.py # §5 + Appendix L -│ ├── solar_gains.py # §6 + Appendix U §U3.2 -│ ├── utilisation_factor.py # Table 9a -│ ├── mean_internal_temperature.py # §7 + Table 9/9b/9c -│ ├── space_heating.py # §9 -│ └── rating.py # §13 (SAP rating equations) -├── climate/ -│ └── appendix_u.py # Tables U1/U2/U3 + solar declination -├── rdsap/ -│ └── cert_to_inputs.py # EpcPropertyData → CalculatorInputs mapping -├── validation/ -│ └── parity_report.py # ParityReport aggregator -└── tests/ # 103 unit tests - -services/ml_training_data/src/ml_training_data/ -└── sap_parity_probe.py # runs calculator on N random certs from corpus - -docs/sap-spec/ -├── sap-10-2-full-specification-2025-03-14.pdf (199pp) — primary spec -├── sap-10-3-full-specification-2026-01-13.pdf (201pp) — newer spec (Table 12 identical) -├── rdsap-10-specification-2025-06-10.pdf (114pp) — RdSAP rules (separate from SAP) -├── SPEC_COVERAGE.md — our coverage map -└── PARITY_FINDINGS.md — earlier probe findings - -docs/adr/0009-deterministic-sap-calculator.md — accepted ADR -``` - -## How to run the parity probe - -```bash -python -c " -import sys -sys.path.insert(0, 'packages/domain/src') -sys.path.insert(0, '.') -sys.path.insert(0, 'services/ml_training_data/src') -from ml_training_data.sap_parity_probe import main -main(['300','7']) # 300 certs, seed=7 -" -``` - -## Where to dig (priority-ordered, by likely MAE impact) - -### Tier 1 — the PEUI mystery (50% over) - -Our `primary_energy_kwh_per_m2` runs +51 kWh/m² over the cert's `energy_consumption_current`. 
Possibilities: - -- **Wrong primary energy factors in `tables/table_12.py PRIMARY_ENERGY_FACTOR`**. I populated this from approximate spec values; verify each one against SAP 10.2 Table 12 (page 189). Especially electricity PEF=1.501 — that's ~30% of corpus uses electricity for some end-use. -- **HW demand over-counted.** Look at `domain.ml.demand.predicted_hot_water_kwh`. Cylinder loss + primary circuit loss may be over-stated. SAP §J + Appendix J details exact formulas. We use bucket-rounded `_STORAGE_LOSS_FACTOR` instead of interpolation. -- **Space heating demand over-counted.** Could come from: - - Living-area-fraction defaults (Table 27): we use {1:0.75, 2:0.50, 3:0.30, 4:0.25, ≥5:0.21}; double-check against the RdSAP 10 PDF. - - Control-temperature adjustment (Table 4e): we always pass 0; spec applies ~-0.7°C in some configurations. - - Thermal mass parameter: we use 250 kJ/m²K always; spec varies by construction type. -- **Lighting/pumps over-counted.** Currently using Appendix L existing-dwelling fallback (no fixed lighting). Newer dwellings should use lower lighting energy. - -### Tier 2 — wall U-value cascade - -Worst-residual certs have `wall_construction=4 (cavity)`, `wall_insulation_type=2`, `wall_insulation_thickness="NI"`. We treat as uninsulated cavity (column 0). Cert assessor may know it's insulated (the type=2 code says so). See `domain.ml.rdsap_uvalues._insulation_bucket` — when `thickness=0` AND `present=True`, spec says use 50mm row but our parser converts "NI"→0 which short-circuits to "uninsulated". - -I tried switching "NI"→None in S-B5 cycle but it over-corrected aggregate MAE. Worth re-trying with the new understanding (compare PRIMARY energy delta on affected certs specifically). - -### Tier 3 — cost-side residuals - -Per S-B17 hand-trace: cert 2389-4472 has correct delivered energy but our SAP is 10 points lower than the cert's. Implied cert blended unit-cost rate is lower than ours. 
Likely cause: cert assessor applies different rate logic in edge cases (oil + off-peak meter, electricity-and-gas mix, etc.). Worth tracing more carefully. - -### Tier 4 — known unimplemented spec pieces - -(per `SPEC_COVERAGE.md`) -- Cooling §10 (rare) -- FEE §11 (new-build only) -- Per-junction thermal bridging Table R2 (ADR says defer) -- Multi-main heating Table 11 with non-zero secondary (we have this conditionally) -- Standing charges (Table 12 note (a)) - -## What's been validated - -- §13 SAP rating equations: 108.8 − 120.5 log10(ECF) for ECF ≥ 3.5, else 100 − 16.21·ECF. Verified against SAP 10.2 PDF page 38. -- §12.2 fuel price rule: "Other prices must not be used". We have spec-correct prices + cert-calibration prices as separate tables. -- Appendix U: tables verbatim. -- Appendix U rating-uses-UK-average rule: applied (S-B18). -- Solar gains §6.1 + Appendix U §U3.2 polynomial: implemented. - -## Suggested first session - -1. **Read SAP 10.2 §§4 + Appendix J carefully** (hot water demand). Map every formula against our `domain.ml.demand.predicted_hot_water_kwh`. Note divergences. The PEUI bias is largely driven by HW + heating demand. -2. **Read SAP 10.2 §14** (CO2 and primary energy). Compare to our `calculate_sap_from_inputs` primary_energy aggregation. Note especially: does the cert's `energy_consumption_current` use the same end-use list (space + HW + lighting + pumps/fans) or a different one? -3. **Read RdSAP 10 §11 (Heating)**. Check our `domain.ml.sap_efficiencies.seasonal_efficiency` cascade against the RdSAP rules. Especially heat pump efficiency (we use 2.30 for category 4 fallback). -4. 
Open issues in the parity-decomp data: - - 26 certs with correct energy but SAP MAE 4.12 → cost-side - - 51 kWh/m² primary-energy bias → demand-side - -## Don't repeat these dead-ends - -- ❌ Switching "NI" wall thickness to None — over-corrected in aggregate (S-B5) -- ❌ Aggressive efficiency rescue for missing sap_main_heating_code — over-corrected (S-B5) -- ❌ Using SAP 10.2 spec prices for parity validation — the cert assessor uses legacy lower prices despite reporting sap_version=10.2 (S-B9, S-B10) -- ❌ Applying off-peak to electric main heating regardless of meter_type — the meter_type field is the truth (S-B15) -- ❌ Always applying 10% secondary heating — should be conditional on cert lodging or main system being electric storage (S-B20) - -## Commit history - -The last 22 commits are S-B1..S-B22. Each commit message documents the slice's hypothesis, change, and measured impact. Worth reading 5-10 of the latest commit messages for context on what's been tried. diff --git a/docs/sap-spec/HANDOVER_S3_CLOSE.md b/docs/sap-spec/HANDOVER_S3_CLOSE.md new file mode 100644 index 00000000..228c065c --- /dev/null +++ b/docs/sap-spec/HANDOVER_S3_CLOSE.md @@ -0,0 +1,149 @@ +# Handover — Close §3, then Table 11 Secondary Heating + +**Audience:** Fresh agent continuing the deterministic SAP 10.2 calculator +(`packages/domain/src/domain/sap/`). Read this document first, then skim +the two key source files listed below. + +--- + +## What we're building + +A deterministic SAP 10.2 calculator that replicates cert-software output +(Elmhurst, Stroma, etc.) exactly for RdSAP 10 input certs. The domain +concept is **Calculated SAP10 Performance** — see +`docs/adr/0009-deterministic-sap-calculator.md`. Progress is tracked in +`docs/sap-spec/SPEC_COVERAGE.md`. + +The workflow is strict TDD: **one failing test → minimal implementation → +commit**. Each commit is one slice. 
+ +--- + +## Current state + +### §1 Dimensions — DONE +All 6 Elmhurst fixtures pass exactly (`test_section_1_matches_elmhurst_worksheet`). + +### §2 Ventilation — DONE +All 6 Elmhurst fixtures pass exactly (`test_section_2_matches_elmhurst_worksheet`). + +### §3 Heat transmission — PARTIALLY DONE + +What passes today: +- **Internal invariants** (all 6 fixtures): `(33) = Σ per-element`, + `(37) = (33) + (36)`. +- **Exact LINE_31 + LINE_36** (non-RR fixtures 000474 and 000490 only): + `test_section_3_non_rr_line_31_and_36_match_elmhurst_worksheet`. + +What does NOT yet pass: +- **Exact LINE_33** (fabric heat loss) for any fixture. This is the + remaining §3 close task (see below). +- **RR sub-areas** (fixtures 000487, 000480, 000477, 000516): gable/ + slope/stud-wall/flat-ceiling areas are not in `SapRoomInRoof`; these + fixtures are **formally deferred** — see gap notes in + `test_section_3_partial_match_against_elmhurst_worksheet`. + +--- + +## Task A — Close LINE_33 for non-RR fixtures (investigation slice) + +**Goal:** assert exact LINE_33 and LINE_37 for 000474 and 000490. + +### The diagnostic gap + +Running `heat_transmission_from_cert(epc, window_total_area_m2=0, door_count=actual)` on +000474 gives `fabric = 193.83 W/K`. The Elmhurst `LINE_33 = 209.11 W/K`. +The gap is +15.28 W/K — and it cannot be explained by window area alone, +because `u_wall (1.5) > u_window_eff (1.33)`, so adding windows would +*decrease* fabric heat loss, not increase it. + +The gap is therefore in one or more of the other elements. Most likely +culprits, in priority order: + +1. **Floor construction missing from fixture.** + `SapFloorDimension.floor_construction` is `None` in all Elmhurst + fixture files (field not set). Our `u_floor` fallback may not match + the Elmhurst value. The 000490 fixture comment records the expected + U-values explicitly: *"suspended timber ground floor on main (U=0.71), + exposed timber floor on Extension 1 (U=1.20)"*. 
Set the correct + `floor_construction` and `floor_insulation` codes on each + `SapFloorDimension` and see if the gap closes. + +2. **Roof construction / insulation thickness missing from fixture.** + Similarly, `roof_insulation_thickness` may not be set on the building + parts. The Elmhurst cert will have a specific roof type and insulation + depth that drives a specific `u_roof`. + +3. **Wall insulation re-check.** All fixtures use `wall_insulation_type=4` + (`_WALL_INSULATION_NONE`), giving `u_wall = 1.5` for cavity age B. + Confirm this matches the actual Elmhurst worksheet row. + +### How to proceed + +1. Read the EPC API field encoding for `floor_construction` and + `floor_insulation` in `datatypes/epc/domain/epc_property_data.py` + and `packages/domain/src/domain/ml/rdsap_uvalues.py` (the `u_floor` + function + its construction constants). +2. Look up the actual floor type for 000474 and 000490 from the PDF + (ask the user — PDFs were supplied manually; not stored in repo). +3. Set `floor_construction` + `floor_insulation` + `floor_insulation_thickness` + on the `SapFloorDimension` objects in the fixture files. +4. Re-run the debug calc (`r0.fabric` with `window_area=0`) and check + whether the gap collapses. +5. Once floor/roof are resolved, back-calculate window area: + `A_w = (LINE_33 - r0.fabric) / (window_u_eff - u_wall)`. + Because `window_u_eff < u_wall`, the denominator is negative, so this + formula gives a physically plausible positive area (5–15 m² for a + 2-storey terrace) only once `r0.fabric` exceeds LINE_33. +6. Add `WINDOW_TOTAL_AREA_M2: float` and `WINDOW_AVG_U_VALUE: float = 1.4` + constants to each non-RR fixture file. +7. Write a new parametrised test asserting exact LINE_33 and LINE_37 for + 000474 and 000490. Commit as one slice. + +--- + +## Task B — Table 11 Secondary Heating (highest-MAE-impact gap) + +Per `SPEC_COVERAGE.md`, this is the **next priority after §3**. + +Most boiler-main certs allocate ~10 % of space heating to a secondary +system (electric room heater or similar). 
We currently model 0 %. This +causes a systematic bias on the large majority of boiler certs. + +**SAP 10.2 Table 11** gives the secondary fraction keyed on main-heating +type. **RdSAP 10 Appendix A** identifies the heating type from cert codes. + +Starting point: `packages/domain/src/domain/sap/calculator.py` (entry +point) and `packages/domain/src/domain/sap/rdsap/cert_to_inputs.py` +(cert→inputs adapter). The `SapInputs` struct carries `main_heating_*` +fields — see how space heating demand is calculated and where a secondary +fraction would hook in. + +--- + +## Key files to read + +| File | Why | +|---|---| +| `packages/domain/src/domain/sap/worksheet/heat_transmission.py` | §3 implementation — `heat_transmission_from_cert` | +| `packages/domain/src/domain/sap/worksheet/tests/test_heat_transmission.py` | all §3 tests including the partial Elmhurst conformance test | +| `packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000474.py` | non-RR fixture to close | +| `packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000490.py` | non-RR fixture to close | +| `packages/domain/src/domain/ml/rdsap_uvalues.py` | all U-value lookups — `u_floor`, `u_wall`, `u_roof` | +| `docs/sap-spec/SPEC_COVERAGE.md` | overall progress tracker | +| `docs/adr/0009-deterministic-sap-calculator.md` | scope + architectural decisions | + +Spec PDFs are at `docs/sap-spec/` — SAP 10.2 (March 2025), SAP 10.3 +(Jan 2026), RdSAP 10 (June 2025). + +The canonical reference Excel worksheet is at the repo root: +`2026-05-19-17-18 RdSap10Worksheet.xlsx`. A loader for it is at +`packages/domain/src/domain/sap/worksheet/tests/_xlsx_loader.py`. 
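Editorial note on Task A step 5 (not part of the committed file): the window-area back-calculation can be sketched as below, using only the 000474 figures quoted in the handover (LINE_33 = 209.11 W/K, zero-window fabric = 193.83 W/K, `u_wall` = 1.5, `u_window_eff` = 1.33). The function names `back_calculate_window_area` and `required_zero_window_fabric` are hypothetical helpers for illustration and do not exist in the repo. The sketch assumes the only difference between the zero-window run and the worksheet is wall area swapped for window area.

```python
def back_calculate_window_area(
    line_33: float,
    fabric_zero_windows: float,
    u_window_eff: float,
    u_wall: float,
) -> float:
    """Solve LINE_33 = fabric_zero_windows + A_w * (u_window_eff - u_wall) for A_w.

    Hypothetical helper: assumes each m2 of window replaces one m2 of wall.
    """
    delta_u = u_window_eff - u_wall
    if delta_u == 0:
        raise ValueError("window and wall U-values are equal; area is undetermined")
    return (line_33 - fabric_zero_windows) / delta_u


def required_zero_window_fabric(
    line_33: float,
    window_area_m2: float,
    u_window_eff: float,
    u_wall: float,
) -> float:
    """Zero-window fabric value that would be consistent with a given window area."""
    return line_33 - window_area_m2 * (u_window_eff - u_wall)


# Figures quoted for fixture 000474 before any floor/roof fix.
area_today = back_calculate_window_area(209.11, 193.83, 1.33, 1.5)

# Zero-window fabric that a plausible 10 m2 of glazing would imply.
fabric_needed = required_zero_window_fabric(209.11, 10.0, 1.33, 1.5)

print(f"implied window area today: {area_today:.2f} m2")
print(f"zero-window fabric needed for 10 m2: {fabric_needed:.2f} W/K")
```

With today's numbers the implied area is negative (roughly -89.9 m²), which restates why windows cannot explain the gap. Under the same assumption, 10 m² of glazing would need the zero-window fabric to sit near 210.81 W/K, that is, slightly above LINE_33 rather than 15.28 W/K below it.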
+ +--- + +## Test suite + +``` +python -m pytest packages/domain/src/domain/sap/worksheet/tests/ -q +# Should show 122 passed +``` diff --git a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md deleted file mode 100644 index a3d23a07..00000000 --- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md +++ /dev/null @@ -1,1092 +0,0 @@ -# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review - -**Audience:** A fresh agent picking up the deterministic SAP calculator at -`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs, -then the code. - -**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly -for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical -calculation** — not a model — so MAE should approach zero on certs whose -inputs are fully populated. - ---- - -## 1. Critical framing — this is NOT a judgement call - -The SAP/RdSAP energy assessment splits cleanly into two roles: - -1. **The assessor** — a person who surveys the dwelling and lodges - measured/observed fields onto the cert (areas, perimeters, - construction codes, insulation thicknesses, fuel types, etc.). - The assessor makes NO calculation decisions. -2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a - deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It - takes the lodged fields and produces SAP score, CO2 emissions, - primary energy (PEUI), CO2 per m², EI rating, etc. - -**Our calculator is replicating role #2.** Assessor software -implements the SAP 10.2 spec faithfully; the question of "where does -Elmhurst diverge from spec?" is no longer the operative one (per -ADR-0010 + §3 below). Our job is to enumerate every spec -table / formula / footnote and verify each against the published SAP -10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs. - -There is no "assessor judgement" knob to tune. Each field on the cert -has a deterministic interpretation per the spec. 
Each spec table / -formula has a deterministic implementation. Our job is to enumerate -all of them and verify each. - ---- - -## 2. Current state (2026-05-19) - -- Branch: `ara-backend-design-prd` -- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs") -- 301 tests passing -- Parity probe (300 random certs from - `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7, - `sap_score ∈ [5, 99]`): - - | Metric | Value | - |---|---| - | SAP MAE | 4.61 | - | SAP bias | +0.87 | - | PE MAE | 43.32 kWh/m² | - | PE bias | +37.69 kWh/m² | - -- 7 "golden" regression certs locked in - `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`. - Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known - caveat: some of these are compensating-error matches (e.g. cert - `7536-3827`'s PE matches but cost is £143 under cert's implied cost - due to multi-factor offsetting bugs). **These fixtures are retired - per ADR-0010 and §10 below — they lock buggy compensating outputs - in place and will fight the spec sweep.** - -> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md) -> supersedes the spec-version target, the PCDB sequencing, and the -> cert-calibration layer of ADR-0009. This handover document was -> originally written under the rejected framing; §3, §4, §7, §7b, -> §10 below have been rewritten in lockstep. §2.5 lists the five -> prerequisites that land **before** the section-by-section sweep -> starts. - ---- - -## 2.5. Prerequisites before the sweep starts - -Five blockers, in dependency order. The section sweep does not start -until all five are merged. Together they convert the parity probe -from a noisy mixture-distribution signal into a clean per-section -verification tool. - -### P1 — Re-extract the training parquet with `inspection_date` - -The 250k-cert parquet has 202 columns; **none of them are dates**. 
-Without `inspection_date` on each cert we cannot construct the -Validation Cohort (P3). The ETL currently drops the dates; add them -back as a non-breaking MINOR Feature Schema Version bump (per -ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date` -both exist on the domain object and are populated upstream — the -parquet writer just needs to include them. - -### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12` - -Per ADR-0010 §2 and §1: -- Remove `table_12_cert_calibration.py` and every call site - (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the - `PriceTable` constructor argument that defaults to it). -- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`. -- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors). -- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p - is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025). - -### P3 — Filter the parity probe to the Validation Cohort - -`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3: -`inspection_date ≥ 2025-07-01`. Modify -`services/ml_training_data/src/ml_training_data/sap_parity_probe.py` -to apply the filter before sampling. The probe sample size and seed -remain configurable; `sap_score ∈ [5, 99]` remains the typicality -filter on top of the cohort filter. - -### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`) - -Per ADR-0010 §4. Download boiler + heat-pump CSVs from -https://www.ncm-pcdb.org.uk. Build a lookup keyed on -`main_heating_index_number`. Surface seasonal efficiency, secondary -efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day -of work per the original handover estimate. 
The -`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from -ADR-0009 grill outcome #1 is the integration point; no calculator-side -changes needed beyond reading `index_number` and routing PCDB-returns -to space-heating / hot-water efficiency lookups instead of Table 4a. - -### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples - -Per ADR-0010 "Verification infrastructure": -- Populate every named SAP 10.2 worksheet variable on - `SapResult.intermediate` as sketched in §11. This is mechanical — - thread the values from each worksheet module into the dict. -- Transcribe the BRE worked examples from the SAP 10.2 appendices and - RdSAP 10 worked-example annex into unit tests - (`tests/test_bre_worked_examples.py`) that lock per-intermediate - values, not aggregate SAP. These replace the retired cert fixtures. - -### P6 — Strict-type `EpcPropertyData` via canonical domain enums - -The current `EpcPropertyData` and its nested types carry many bare -`str` fields and `Union[int, str]` fields (the latter because the -gov API gives ints and Site Notes give strings). The defensive -type-handling cascades into the calculator (`cert_to_inputs.py`, -`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented -example: `SapBuildingPart.identifier` carries main-vs-extension -information but is bare `str`, so the dimensions code defensively -iterates instead of dispatching on a typed kind. - -The fix: -1. **One canonical enum per field**, union of all keys appearing - across all schema versions in - `datatypes/epc/domain/epc_codes.csv`. 
Hand-author the 18 enum - classes (`built_form`, `construction_age_band`, `energy_tariff`, - `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`, - `mechanical_ventilation`, `property_type`, `tenure`, - `transaction_type`, `ventilation_type`, `water_heating_fuel`, - `cylinder_insulation_thickness`, `energy_efficiency_rating`, - `improvement_description`, `improvement_summary`, `code`) plus - `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is - the reference; a dedup script can optionally verify coverage but - is not a build dependency. -2. **The API mapper** parses raw ints into the canonical enum. -3. **The Site Notes mapper** parses raw strings into the canonical - enum. -4. **The domain object** (`EpcPropertyData` and nested) holds only - the canonical enums — no `Union[int, str]`, no bare `str` for - coded fields. -5. **Every consumer** (calculator, ML pipeline, recommendations, - ETL, scenario builder) reads from the typed fields. - -**Constraint**: repo-wide tests must keep passing. The calculator -is one consumer; the ML pipeline, recommendations, and the Site -Notes ingestion path also consume `EpcPropertyData`. Each mapper- -layer change is paired with adapter updates that preserve the -behaviour the existing tests cover. - -Pyright `strict` mode must remain clean (CLAUDE.md). - -### Expected outcome of P1–P6 - -After all six land, run the probe against the Validation Cohort. The -expected baseline MAE on the clean probe is much smaller than the -current 4.61 — likely 1.5–2.5 SAP-points based on what we know about -the residual breakdown (heat pumps closed by P4, gas boilers tightened -by P4, price-version noise removed by P2+P3). The remaining residual -is the genuine spec sweep target — and per-section fixes will move -the probe in measurable, distinguishable amounts because there's no -compensating layer to mask them, and there's no defensive type -branching obscuring which input value drove which intermediate. - ---- - -## 3. 
Why the prior diagnosis was wrong and how we fixed it - -The prior session shipped ten slices (S-B23 → S-B31) by debugging the -biggest residuals one at a time: - -- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress - on the demand-side calculation. -- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time - as "cert-calibration absorbs multiple spec deviations". - -Three slice attempts looked like they "proved" the cert-cal-absorbs- -deviations diagnosis: - -- **Standing charges**: spec Table 12 note (a) requires £92/yr gas - standing charge on space + water heating. Adding it pushed SAP bias - +0.98 → −2.62. Reverted. -- **Cat=10 room heaters off-peak routing**: Table 12a says "other - direct-acting electric heating" bills 100 % high rate on 7-hour - tariff. Switching cat=10 from off-peak to standard rate inverted - the bias +5.88 → −6.00. Reverted. -- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table - 3 footers require zero storage + primary loss when efficiency comes - from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP - MAE +0.39 and broke 3 of 7 golden fixtures. Reverted. - -The prior agent concluded: *cert-calibration absorbs Elmhurst's -deviations from spec — we can't fix one without re-deriving the -calibration, so do a full spec sweep first and re-derive cert-cal at -the end.* This diagnosis is **wrong** and the proposed remedy -amplifies the problem. - -### What was actually going on - -The 250k-cert corpus spans multiple SAP spec-version regimes: -- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment - 0 prices — mains gas ~3.48 p, standard electricity 13.19 p. -- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices - — mains gas 3.64 p, standard electricity 16.49 p. - -The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the -older spec's prices**, not Elmhurst deviations from the spec. 
They -are an empirical "best fit" across a mixture distribution of two -price regimes, with downstream-component bugs (PCDB absence, HW -cylinder loss applied to combi, etc.) absorbed into the fit. The -table looks like compensation for assessor-software quirks because we -were never told which spec each cert was on. - -Each "spec-correct fix that worsened MAE" in the failed slices above -was actually correct. The MAE regressed because: -1. The cert-cal prices (pre-March-2025 spec) cancelled with one set - of downstream errors to produce a quasi-stable cost. -2. The spec-correct fix landed → that cancellation broke → the - probe MAE went up. -3. But the spec-correct fix was *right* — what regressed was a - compensating-error equilibrium, not the calculator's truth. - -The prior session's "re-derive cert-cal at the end" plan would -re-establish a new compensating-error equilibrium across the new bug -set. It does not converge on spec-correctness. - -### The fix (per ADR-0010) - -1. **Stop fitting against a mixture distribution.** Filter the - validation corpus to a single spec-version window (Validation - Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort - was lodged on SAP 10.2 (14-03-2025) prices. -2. **Delete the cert-calibration layer.** Use spec prices everywhere - (`domain.sap.tables.table_12`). The only price-routing decision - left is Table 12a fractional high-rate blending — a real spec - feature, not a calibration. -3. **Build PCDB**, because it dominates residual variance and the - reason it was deferred (cert-cal-absorbs-PCDB) no longer holds. -4. **Build trace mode and BRE worked-example fixtures**, so - per-section verification works against single-cert intermediates - instead of aggregate corpus MAE. - -This is what §2.5 lists as the five prerequisites. Once they land, -the section-by-section spec sweep produces clean, monotonic -improvements. - ---- - -## 4. 
Scope decisions (per ADR-0010) - -### IN scope -- **SAP 10.2 (14-03-2025 amendment)** is the active spec target. - `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages. -- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that - cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, - 114 pages. -- **PCDB integration.** Moved from "Session C deferred" to **P4 - prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs - lodging `main_heating_data_source=1` need PCDB-sourced efficiency - for the calculator to be spec-correct. Data source: - https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`; - fields: seasonal efficiency, secondary efficiency, output kW, - flow-temperature curve (HPs). -- **All RdSAP 10 sections in document order.** §1 → §§19, plus - Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is - unchanged — only the precondition changes: the sweep runs against a - clean probe (Validation Cohort + spec prices + PCDB + trace mode). - -### OUT of scope -- **Full SAP assessments.** Full-SAP certs lodge measured/calculated - U-values in `walls[i].description` (e.g. - "Average thermal transmittance 0.18 W/m²K"). These are a separate - calculation path (BS EN ISO 6946) and a different corpus. Park - until the RdSAP 10 base case parity is reached. S-B24 / S-B29 - attempted partial handling; those slices can stay or be reverted at - your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2. -- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus, - so it cannot be validated. Calculator targets SAP 10.2 until the - corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP - to reference SAP 10.3). Note: `table_12.py` currently mixes SAP - 10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2. 
-- **Historical-spec cert reproduction.** Calculating what cert SAP - *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is - not the calculator's job. Lodged Performance carries the historical - value; Calculated SAP10 Performance is current-spec only. The - Validation Cohort filter operationalises this — older certs are - out of the validation loop, not because they're "wrong" but because - they're a different spec's output. -- **Re-deriving cert-cal at the end.** The prior session's plan. The - cert-calibration layer is deleted in P2, not re-fit. - ---- - -## 5. The approach — section-by-section spec verification - -Work through the RdSAP 10 spec **in document order**, starting at -§1. For each section: - -### 5.1. Read the spec section -Read the section text fully. Note every rule, table reference, and -defaulting cascade. - -### 5.2. Find the corresponding code -Map the section to the source file(s) implementing it. The current -mapping (some sections are split across modules): - -| RdSAP 10 section | Code location | -|---|---| -| §1 Introduction / general | n/a | -| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` | -| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` | -| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` | -| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` | -| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` | -| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` | -| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` | -| §9 Heat emitters / flow temperatures | not implemented | -| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` | -| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) | -| §12 Electricity tariff | 
`rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) | -| §13 Addendum to EPCs | n/a | -| §14 Special cases (e.g. flats above commercial) | not implemented | -| §15 Improvements (recommendations) | n/a (not rating) | -| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` | -| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` | -| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` | -| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` | -| Table 30 — Mechanical ventilation | not implemented | -| Table 31 — Data to be collected | n/a | - -### 5.3. For each spec rule in the section, check our code -For each table, formula, footnote, exception: - -1. Does our code implement it? -2. Does the implementation match the spec values exactly? -3. Are there spec-defined edge cases / footnotes we're missing? - -### 5.4. When a gap is found -- Write a failing unit test that asserts the spec-correct behaviour - — wherever possible, write it as an assertion on `intermediate` - values rather than on aggregate SAP, using a BRE worked example - if one covers the section. -- Implement the fix. -- Run `test_bre_worked_examples.py` plus the Validation Cohort - probe. Note both direction and magnitude of change. -- If a BRE worked-example breaks, the new code is wrong (revert). - BRE examples are spec-derived and cannot regress from a - spec-correct change. -- Commit per-slice: one section → one commit. Reference the spec - section in the commit message. - -### 5.5. Sweep-time principle: worksheet-faithful structure - -Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet -structure for its section. As you verify a section, also restructure -its module so that: - -1. **Each function name references its worksheet-line origin** (e.g. - `heat_transfer_coefficient` aligns with worksheet line (40); - `mean_internal_temperature` aligns with worksheet line (93)). -2. 
**Compound calculations are split** into one function per - worksheet line where possible — easier to verify against - `intermediate[...]` and against BRE worked-example values. -3. **Defensive type-handling disappears**. Once P6 lands, the input - is a typed enum or numeric — branching on `isinstance(x, int)` is - replaced by enum dispatch. -4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind == - BuildingPartKind.MAIN_DWELLING` replaces string sniffing of - `identifier`. The dimensions.py "unnecessarily complicated" - pattern Khalim flagged is the canonical example of what *not* - to do. - -The principle applies during section-sweep slices. It is **not** -a separate prerequisite — the refactor lands with the verification -slice for the section it touches. - -### 5.6. Use trace mode when you need it -P5 populates `SapResult.intermediate: dict[str, float]` with every -named SAP 10.2 worksheet variable. Each section's verification -benefits from inspecting these values per-cert. See §11 below for -the sketch. - ---- - -## 6. What's already been done — section by section - -This is your starting map. Each row says whether the section has been -touched and what the current state is. - -### Walls / construction (§5) -- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch - when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored. -- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for - "Average thermal transmittance X W/m²K". **PARK** — full-SAP path. -- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity - "as built, insulated (assumed)" + similar (type=4 with descriptive - signal). Spec-anchored via legacy `epc_wall_description_map`. -- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50` - fix (the "NI" thickness sentinel) + description-based override of - `wall_ins_present` for non-cavity walls. Spec footnote (Table 6). 
-- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog — - Table 19 footnote (2) "max(50, age-band default)" when description - signals retrofit. -- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated - description → §5.11.4 footnote 50mm joist row. -- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal - transmittance" parse. **PARK** — full-SAP path. - -**Still to verify in §5**: -- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only - England is fully transcribed; country overrides are partial. -- Cob U-values (§5.6) — table only, no formula implementation. -- Stone formula §5.6 / §5.7 for non-standard wall thicknesses. -- Curtain wall §5.18 — not implemented. -- Party wall U-values (Table 15) — implemented in `u_party_wall`, - verify table values. -- Thermal bridging (Table 21) — implemented as global `y` factor, - verify per-age-band values. -- §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched - by construction type with internal insulation). Currently we - hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`). - This is wrong for timber-frame / cob / internally-insulated masonry - (should be 100). - -### Heating systems (§§7-10, SAP Appendix A) -- **S-B20 (in history)**: Table 11 secondary heating allocation, - conditional on cert lodging secondary or being electric storage. -- **Failed S-B30 (reverted)**: respect `main_heating_fraction` — - shown empirically wrong. Field is multi-main allocation, not - main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4. -- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main. - Spec §C3.1 + Table 12c. -- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a - says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our - cert-cal extends off-peak to codes 691-696. Spec-correct fix - inverted bias direction — calibration was absorbing this. 
-- **Uncommitted HW cylinder fix**: spec-correct (combi → zero - storage/primary loss per Table 2 + Table 3 footers) but breaks 3 - golden fixtures. Decision deferred to systematic pass. - -**Still to verify in heating**: -- Table 4a efficiency values for every code (heat pumps, storage - heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30) - is documented as a known limitation. -- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of - gas and liquid fuel boilers for both space and water heating is - reduced by 5% if the boiler is not interlocked for space and water - heating." We don't apply this. Known gap. -- Table 4c condensing-boiler / heat-pump emitter-temperature - adjustment — we don't apply this. -- Table 12a high-rate fractions for off-peak dwellings — we apply - 100% off-peak or 100% standard, never fractional blending. - -### Hot water (§4 SAP + Appendix J) -- Storage loss factor table (Table 2) — current values in - `domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec - (verified). Known under-prediction of cylinder loss for storage - systems; cancelled by over-prediction of primary loss for combi - systems in aggregate. -- Primary loss formula (Table 3) — implemented as 245/60 kWh by age - band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]` - with `p` (pipework insulation fraction) and `h` (circulation hours). - Known approximation. -- Combi-boiler zero-loss rule (Table 2 + Table 3 footers) — currently - NOT applied (the failed uncommitted slice). Adding this drops PE - MAE −6.64 but raises SAP MAE +0.39. -- Appendix J Vd formula `25N + 36` — currently the simple form, not - the full per-component (shower / bath / other) breakdown. Useful - HW demand is ~7% under spec value. -- ΔT — currently 43°C constant (55−12). Spec uses monthly Tcold and - hot at 52°C, not 55°C. Per-month variance unmodelled. 
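The per-month primary-loss formula quoted in the hot-water notes above can be sketched as below. This is a minimal illustration, not the calculator's code: the function names, argument names, and the fixed non-leap day counts are assumptions, and holding the circulation hours constant across the year is a simplification. Only the coefficients are taken from the formula as quoted in this handover.

```python
# Hypothetical sketch of the Table 3 primary-loss formula quoted above:
#   n_m * 14 * [{0.0091*p + 0.0245*(1 - p)} * h + 0.0263]
# p = fraction of primary pipework insulated, h = circulation hours per day.

DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh(days_in_month: int, p: float, h: float) -> float:
    """Primary circuit loss for one month, in kWh."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("pipework insulation fraction must be in [0, 1]")
    per_hour_term = 0.0091 * p + 0.0245 * (1.0 - p)
    return days_in_month * 14 * (per_hour_term * h + 0.0263)


def annual_primary_loss_kwh(p: float, h: float) -> float:
    """Sum of the monthly losses, assuming h is the same in every month."""
    return sum(primary_loss_kwh(n, p, h) for n in DAYS_IN_MONTH)
```

Keeping the monthly function separate from the annual sum makes it straightforward to pass a different `h` per month later, and to compare each month against a worked-example value.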
- -### Lighting (Appendix L) -- `predicted_lighting_kwh` in `domain.ml.demand` uses `9.3 × TFA × - (1 − 0.5·led_share − 0.4·cfl_share)` heuristic. -- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up - + portable shares, monthly profile. -- For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec - gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes. - -### Internal gains (§5 SAP) -- `worksheet/internal_gains.py` implements metabolic + cooking + - appliances + lighting (the four positive rows of Table 5). -- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e. - HW losses recycled as heated-space gains) and Losses row (`−40 × N` - for cold inflow + evaporation). Both documented in S-B23 gap list. - -### Ventilation (§4 / Table 5) -- Wind-shelter factor implemented in S-B21. -- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert - rarely lodges. Spec §4.2 + Table 4g. -- Pressure-test override (worksheet lines 17-18) — not implemented. - -### Tariff / cost (§12 + Table 12 / 12a / 12c) -- Cert-calibration prices in - `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit - to Elmhurst's output. They are LOWER than the published Table 12 - spec values by 4-25%. Known divergence; investigation deferred. -- Standing charges (Table 12 note (a)) — NOT applied. Adding them - empirically worsens MAE (calibration absorbs). -- Table 12a high-rate fractions — currently 100% off-peak for E7- - eligible codes, 100% standard otherwise. No fractional blending. -- Heat network DLF (Table 12c) — applied per S-B31 only to main - heating + HW from main. HW-only-from-heat-network is a separate slice. - ---- - -## 7. The cert-calibration "tension" is dissolved (per ADR-0010) - -This section originally framed cert-calibration vs spec-correctness as -two end-states the calculator had to choose between. 
That framing is -wrong (see §3 for the actual diagnosis): the cert-cal values are -pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2. -Once the corpus is filtered to the Validation Cohort (P3) and the -cert-cal layer is deleted (P2), the false dichotomy disappears. - -### What replaces this section - -- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP - 10.2 14-03-2025 amendment, CO2 factors corrected per P2). -- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every - cert lodged on the calculator's target spec version. -- **One verification mechanism.** Trace-mode intermediates + BRE - worked-example unit tests for per-section verification; Validation - Cohort probe MAE for aggregate go/no-go. - -Cert-software deviations from spec, if they exist at all, are -expected to be small and localised. They surface as residual after -the spec sweep completes against a clean probe — and at that point -the question is whether to chase them at all (Elmhurst-deviation -fixes have low domain value compared to spec-correctness, given the -calculator's product use case is scoring counterfactuals for the -MeasureApplicator chain, not reproducing historical certs). - ---- - -## 7b. Outstanding findings to pick up during the systematic pass - -The prior session identified several spec-correct fixes that were -reverted because they made SAP MAE worse against the **full corpus**. -The empirical signal that "reverted" them was version-mixture noise -(see §3) plus compensating-error breakage in the 7 retired golden -fixtures. Each fix below is **expected to land cleanly** once the -five prerequisites in §2.5 are done, because: - -- The Validation Cohort (P3) is on a single spec version — the price - mismatch that drove the bias regression on standing charges and - cat=10 routing disappears. -- The cert-cal layer is gone (P2) — no calibration to "break". 
-- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals - that dominated per-cert MAE collapse before any of these findings - even matter. -- The fixtures are now BRE worked examples (P5 + §10) — they cannot - be broken by spec-correct changes because they are themselves - derived from the spec. - -Treat each finding as a section-sweep TODO. The empirical impacts -below were measured against the **dirty probe** (full corpus + cert-cal -+ no PCDB) and are **not predictive** of behaviour on the clean probe. -Re-measure each fix against the Validation Cohort after prerequisites -land. - -### Finding 1 — HW cylinder zero-loss rule for combi boilers -**Status**: spec-correct fix exists in working-tree-only form -(uncommitted). Reverted at end of last session. - -**Spec basis**: -- **SAP 10.2 Table 2 footer (page 158)**: "In the case of a - combination boiler: a) the storage loss factor is zero if the - efficiency is taken from Table 4b" -- **SAP 10.2 Table 3 footer (page 160)**: "Primary loss is set to - zero for the following: Electric immersion heater, Combi boiler - (including when it is part of a combined heat pump and boiler - package and provides all the hot water), CPSU (including electric - CPSU), Boiler and thermal store within a single casing, Separate - boiler and thermal store connected by no more than 1.5 m of - insulated pipework, Direct-acting electric boiler, Heat pump (not - combined heat pump and boiler package with a non-combi boiler) - from PCDB with hot water vessel integral to package" - -**The bug**: our calculator currently adds storage loss (~135 kWh) -and primary loss (~245 kWh) for ALL certs with an age band lodged, -ignoring whether the dwelling has a cylinder. **67% of corpus certs -explicitly lodge `has_hot_water_cylinder=False`** (the modal combi -boiler case) — we add 380 kWh of fictional HW losses for each. - -**The fix** (sketch, ~10 lines): -1. 
Add `has_cylinder: bool = True` keyword to - `predicted_hot_water_kwh` in `packages/domain/src/domain/ml/demand.py`. -2. When `has_cylinder=False`, set `storage_loss = 0` and `primary_loss = 0`. -3. In `cert_to_inputs.py` (around line 829), pass - `has_cylinder=epc.has_hot_water_cylinder and not is_instantaneous`. - -**Empirical impact** (measured on 300-cert probe): -- **PE MAE: 43.32 → 36.68 (−6.64) ← biggest single fix found this session** -- PE bias: 37.69 → 30.41 (−7.28) -- SAP MAE: 4.61 → 5.00 (+0.39, regression) -- 3 of 7 golden fixtures break - -**Why it was reverted**: the SAP regression + broken fixtures indicate -the fictional HW losses were partially compensating for OTHER bugs -(likely lighting over-prediction for LED-dominant homes). The right -ordering is: fix the spec-clear cases (HW cylinder, lighting per -Appendix L, etc.) together, then re-derive cert-cal. - -**When to pick up**: when you reach §4 / Appendix J during the -systematic pass. Pair with the lighting Appendix L fix to avoid -breaking the golden fixtures individually. - -### Finding 2 — Standing charges (Table 12 note (a)) -**Status**: spec-correct, never implemented. Empirically rejected by -4-mode probe. - -**Spec basis**: SAP 10.2 Table 12 note (a), page 190: -> "For calculations including regulated energy uses only (e.g. -> regulation compliance, energy ratings): -> - The standing charge for electricity standard tariff is omitted -> - The standing charge for off-peak electricity is added to space -> and water heating costs where either main heating or hot water -> uses off-peak electricity -> - The standing charge for gas fuels is added to space and water -> heating costs where the gas fuel is used for space heating -> (main or secondary) or for water heating" - -**The bug**: our calculator never adds standing charges. Per spec, a -gas-heated dwelling should have £92/yr added to the ECF numerator. 
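The three rules in note (a) reduce to a small dispatch, sketched below under stated assumptions. The function name and its keyword arguments are hypothetical and do not exist in the codebase. The charge values are supplied by the caller: this handover quotes only the £92/yr gas figure, and the off-peak electricity charge is not stated here.

```python
# Hypothetical sketch of Table 12 note (a) for regulated-energy calculations.
# Standing-charge values are inputs; none are hard-coded here.


def regulated_standing_charge_gbp(
    *,
    gas_used_for_space_or_water: bool,
    off_peak_used_for_main_or_water: bool,
    gas_standing_gbp: float,
    off_peak_standing_gbp: float,
) -> float:
    """Standing charges to add to space and water heating costs."""
    total = 0.0
    # Gas: added when gas is used for space heating (main or secondary)
    # or for water heating.
    if gas_used_for_space_or_water:
        total += gas_standing_gbp
    # Off-peak electricity: added when main heating or hot water uses it.
    if off_peak_used_for_main_or_water:
        total += off_peak_standing_gbp
    # Standard-tariff electricity standing charge is omitted, so there is
    # no branch for it.
    return total
```

For the gas-heated case described above, calling this with `gas_used_for_space_or_water=True` and `gas_standing_gbp=92.0` yields the amount to add to the ECF numerator.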
- -**Empirical impact** (4-mode probe, 300 certs): -| Mode | All certs | Gas-only | -|---|---|---| -| cert-cal, no standing (current) | MAE 4.69, bias +0.98 | MAE 4.01, bias +0.80 | -| cert-cal + gas standing | MAE 4.94, bias **−2.62** | MAE 4.31, bias **−3.53** | - -Adding standing charges shifts SAP bias by ~3.5 points downward — -clearly the wrong direction. The cert-cal prices (3.48p gas vs spec -3.64p) implicitly absorb the standing-charge contribution. - -**When to pick up**: when you reach §12 / Table 12. Apply alongside -spec-correct unit prices (3.64p gas, 16.49p elec) and re-derive -cert-cal to match Elmhurst's residual deviation pattern. - -### Finding 3 — Cat=10 room heaters off-peak routing -**Status**: spec-correct, currently bills room heaters at off-peak -rate on E7 dwellings. Empirically rejected. - -**Spec basis**: SAP 10.2 Table 12a (page 191): -> "Other direct-acting electric heating (including electric secondary -> heating): 7-hour tariff 1.00 high rate; 10-hour tariff 0.50 high rate" - -**The bug**: our cert-calibration (`cert_calibration_e7_codes`) -extends the off-peak set to include codes 691-696 (room heaters). -That's the S-B14 empirical extension — the previous agent found it -helped some specific certs. Per Table 12a it's WRONG: room heaters -on E7 should bill 100% at HIGH rate, not at low rate. - -**Empirical impact**: switching from off-peak (5.50p cert-cal) to -standard rate (13.19p) — closer to spec but still not the high rate -(15.29p cert-cal) — inverted the bias from +5.88 to −6.00 without -improving MAE. - -**The real issue**: Table 12a defines FRACTIONAL blending (e.g. -"90% high, 10% low" for direct-acting electric boiler on 7-hour -tariff), not binary on/off-peak. Our calculator only supports binary. -A proper implementation needs per-system high-rate fractions. - -**When to pick up**: when you reach §12 / Table 12a. Implement -fractional blending for all the rows of Table 12a, not just cat=10. 
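Fractional blending amounts to a weighted average of the high and low unit rates. A minimal sketch follows, assuming a hypothetical `HIGH_RATE_FRACTION` lookup keyed on invented system and tariff labels. The three fractions shown are the only Table 12a values quoted in this handover; the remaining rows would need transcribing from the spec.

```python
# Hypothetical sketch of Table 12a fractional blending.
# Keys are illustrative labels, not identifiers from the codebase.

HIGH_RATE_FRACTION = {
    ("other_direct_acting_electric", "7_hour"): 1.00,
    ("other_direct_acting_electric", "10_hour"): 0.50,
    ("direct_acting_electric_boiler", "7_hour"): 0.90,
}


def blended_unit_price(
    system: str,
    tariff: str,
    high_rate_p_per_kwh: float,
    low_rate_p_per_kwh: float,
) -> float:
    """Effective unit price (p/kWh) for a system on an off-peak tariff."""
    high_fraction = HIGH_RATE_FRACTION[(system, tariff)]
    return (
        high_fraction * high_rate_p_per_kwh
        + (1.0 - high_fraction) * low_rate_p_per_kwh
    )
```

With the cert-cal rates mentioned above (15.29p high, 5.50p low), a room heater on a 7-hour tariff bills entirely at the high rate, while a direct-acting electric boiler blends 90% high with 10% low. Binary on/off-peak routing is then just the special case where the fraction is 0 or 1.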
- -### Finding 4 — Lighting (Appendix L proper) -**Status**: gap. Current code uses a 9.3 kWh/m² heuristic with simple -LED/CFL reductions; spec is the L1-L12 cascade with daylight -correction, fixed-lighting capacity, top-up + portable shares, -monthly profile. - -**Spec basis**: SAP 10.2 Appendix L §L1 (pages 88-90), equations -L1-L12. - -**The bug**: for a 100 m² LED-dominant home (e.g. cert 7536-3827 with -51 LEDs), our heuristic returns 465 kWh/yr; spec returns ~94 kWh/yr. -Over-prediction by ~5× on LED-dominant homes (which is most modern -stock). - -**Empirical impact** (estimated): -- ~5-6 kWh/m² PEUI over-prediction for LED-dominant population -- Corpus-weighted: ~3-4 kWh/m² PEUI bias contribution - -**When to pick up**: when you reach Appendix L. Pair with the HW -cylinder fix (Finding 1) to avoid the SAP MAE regression. - -### Finding 5 — Internal-gains Table 5 missing rows -**Status**: gap. Spec Table 5 has 7 rows for internal gains; our -`worksheet/internal_gains.py` implements 4. - -**Spec basis**: SAP 10.2 Table 5 (page 177). - -**Missing rows**: -- **Water heating**: `1000 × (65)ₘ / (nₘ × 24)` W — the HW losses - (cylinder + distribution + primary) recycled as heated-space gains - via worksheet line (65). Reduces space heating demand. -- **Losses**: `−40 × N` W — heat to incoming cold water and - evaporation. Negative contribution. - -**Empirical impact** (estimated): -- For N=2.7: HW gains ≈+75 W, losses ≈−108 W, net ≈−33 W. Currently - we miss both → our gains are 33 W too high → space heating demand - too low → PE under-predicted by ~3 kWh/m² (rough). - -**When to pick up**: when you reach §5 / Table 5. Worksheet line (65) -also needs implementation — the HW losses already exist in our calc -(see `demand.py:_cylinder_storage_loss_kwh` etc.), they just need -piping into internal_gains. - -### Finding 6 — Storage-loss-factor table values are wrong -**Status**: gap. Affects only certs with `has_hot_water_cylinder=True` -(33% of corpus). 
- -**Spec basis**: SAP 10.2 Table 2 (page 158). - -**The bug**: `domain.ml.demand:_STORAGE_LOSS_FACTOR` values are ~3× -LOWER than spec. E.g. for 38mm foam our value is 0.0056, spec is -0.0181. Effect: we UNDER-predict cylinder storage loss by ~300 kWh -for storage systems, partly cancelling the over-prediction from -Finding 1. - -**When to pick up**: when you reach §4 / Table 2. Fix WITH Finding 1 -(combi zero-loss) so the cancellation doesn't dominate the -direction. - -### Finding 7 — Heat-pump fallback efficiency 2.30 -**Status**: gap that requires PCDB. See §8b. - -### Finding 8 — Other smaller gaps (carry forward) -- Boiler interlock −5% penalty (§9.2.1) — never applied -- Table 4c condensing boiler / HP emitter temperature adjustment — never applied -- Control-temperature adjustment from Table 4e — always 0 in code, spec varies -- Wall U-values for Scotland / Wales / NIR — only England fully transcribed -- Per-junction thermal bridging (Table R2) — global y approximation only -- Multi-main heating (`main_heating_fraction` ≠ 1) — first main only -- Cooling §10 — not implemented (rare in UK) -- FEE §11 — not implemented (new-build only) - ---- - -## 8. Don't repeat — known dead-ends - -> **Re-read after §3 + §7b.** Three entries below were classified as -> "dead-ends because cert-cal absorbs" — that diagnosis is wrong. -> They are spec-correct fixes that were measured under a noisy probe. -> Now flagged as **conditional dead-ends**: dead only if you try them -> before P1–P5 land. After prerequisites: they are expected -> improvements, not dead-ends. See ADR-0010. - -- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) — - over-corrected because it routed to the (Unfilled cavity, 50mm) row - instead of the dedicated Filled cavity row. The right fix landed in - S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher. -- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`** - (S-B5) — over-corrected. 
The category fallback (cat=4 → 2.30) is - intentionally conservative; PCDB (P4 prerequisite) supplies the - real efficiency. -- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under - the dirty probe, cert-cal prices fit better. **Inverts under the - clean probe (P2 + P3): SAP 10.2 spec prices are correct because the - Validation Cohort is on the 14-03-2025 amendment.** Listed here - only as a warning if you start the sweep before prerequisites land. -- ❌ **Always applying 10% secondary heating** — must be conditional on - cert lodging or main system being electric storage (S-B20). See - spec Appendix A.4. -- ❌ **Respecting `main_heating_fraction` for secondary allocation** - (failed S-B30) — the field is the multi-main allocation (system 1 vs - system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse). -- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) — - spec-correct per Table 12a. The bias inversion under the dirty - probe was driven by cert-cal compensating; on the clean probe this - is just spec-correct. Land as part of the §12 spec sweep after - prerequisites. -- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) — - spec-correct per Table 12 note (a). Same logic: bias drift under - dirty probe is version-mixture + missing-PCDB noise, not Elmhurst - deviation. Land as part of §12 spec sweep. -- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted - S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE - regression was driven by the now-retired golden fixtures (§10) and - cert-cal absorption. Land as part of §4 / Appendix J sweep. - ---- - -## 9. The cert corpus and parity probe - -### Sample -`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the -250k-cert parquet. **After P1 lands** the parquet carries -`inspection_date`; the probe then filters to the **Validation Cohort** -(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and -samples 300 at seed=7 by default. 
Filtering rationale: -- ≤ 5 is heritage/anomaly stock (sub-3 % of corpus) -- ≥ 99 is full-SAP new-builds the parquet excludes anyway -- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on - SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md) - / "Validation Cohort" and ADR-0010 §3. - -### Run the probe -```bash -python -c " -import sys -sys.path.insert(0, 'packages/domain/src') -sys.path.insert(0, '.') -sys.path.insert(0, 'services/ml_training_data/src') -from ml_training_data.sap_parity_probe import main -main(['300','7']) -" -``` - -### What the probe shows -- Aggregate SAP MAE / RMSE / bias -- Aggregate PE MAE / RMSE / bias -- Per-end-use PEUI breakdown (space / HW / lighting / pumps) -- Stratification by `main_heating_category`, `construction_age_band`, - `dwelling_type` -- Worst-15 residuals (SAP and PE) - -### Known parquet limitations -- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in - the raw bulk-zip. The parquet filters out full-SAP new-builds - upstream. Don't measure full-SAP-path slices against the parquet. -- Heat-pump certs (cat=4) are under-represented and concentrated in - the worst-residual tail because PCDB efficiency is unavailable. - ---- - -## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5) - -The 7 cert-based fixtures at -`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py` -were locked in against the current calculator state — *with* cert-cal, -*without* PCDB, *with* HW cylinder loss always applied, *with* the -lighting heuristic, etc. They are documented in §3 / the prior -handover as containing compensating errors. Once the prerequisites -land, every spec-correct fix breaks at least one of them. They will -fight the spec sweep. 
- -### Replacement strategy - -**Primary regression suite: BRE worked-example fixtures.** - -Transcribe the worked examples from: -- SAP 10.2 spec appendices (especially Appendix R — reference values - and the worked example dwelling). -- RdSAP 10 (10-06-2025) worked-example annex. - -Each worked example becomes a unit test that locks **per-intermediate -expected values** (HLP, HTC, mean internal temperature monthly, MIT, -ECF, SAP score) rather than the aggregate SAP score alone. Because -they are spec-derived, no spec-correct change can break them — any -break is an implementation bug, unambiguously. - -These tests live at -`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py` -(new module — separate from the cert-based fixtures module). - -**Cert-based fixtures retired.** - -The current `test_golden_fixtures.py` is either deleted or repurposed -as a *very loose* smoke-test integration suite (e.g. `|SAP residual| -≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs -under `fixtures/golden/.json` can be kept on disk as reference -data, but they no longer drive go/no-go decisions in the sweep. - -**Optional future addition.** - -If/when a current Elmhurst (or Stroma / Quidos / NHER) license is -available, run a handful of representative corpus certs through it -and lock those outputs as a second-tier regression suite — Elmhurst- -parity fixtures alongside spec-parity fixtures. Not a prerequisite. - ---- - -## 11. Trace mode (prerequisite P5 — implementation sketch) - -This section was originally labelled "recommended"; it is now -**prerequisite P5** per ADR-0010. The sweep does not start until -`intermediate` is populated everywhere. ADR-0009 proposed: -```python -@dataclass(frozen=True) -class SapResult: - sap_score: float - ... - intermediate: dict[str, float] -``` - -The `intermediate` field was never populated. 
Suggested implementation -for the systematic pass: - -```python -intermediate = { - # §1 dimensions - "tfa_m2": tfa, - "volume_m3": volume, - "storey_count": storeys, - # §3 heat transmission - "walls_w_per_k": ht.walls_w_per_k, - "roof_w_per_k": ht.roof_w_per_k, - "floor_w_per_k": ht.floor_w_per_k, - "party_walls_w_per_k": ht.party_walls_w_per_k, - "windows_w_per_k": ht.windows_w_per_k, - "doors_w_per_k": ht.doors_w_per_k, - "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k, - "infiltration_ach": infiltration, - "infiltration_w_per_k": infiltration * volume * 0.33, - "heat_transfer_coefficient_w_per_k": hlc, - "heat_loss_parameter_w_per_m2k": hlp, - "time_constant_h": tau_h, - # §5 internal gains (annual averages) - "internal_gains_annual_avg_w": ..., - # §7 mean internal temperature (annual avg) - "mean_internal_temp_annual_avg_c": ..., - # §9 space heating - "useful_space_heating_kwh_per_yr": space_heating_kwh, - # §12 fuel costs (per end-use) - "main_heating_cost_gbp": ..., - "hot_water_cost_gbp": ..., - "lighting_cost_gbp": ..., - "pumps_fans_cost_gbp": ..., - # §13 rating - "ecf": ecf, - "deflator": 0.36, - # §14 primary energy and CO2 per end-use - "space_heating_pe_kwh_per_m2": ..., - "hot_water_pe_kwh_per_m2": ..., - ... -} -``` - -Once populated, the differential debugging the reviewer recommended -becomes possible: change one input field, compare deltas against an -Elmhurst export. - ---- - -## 12. Specific section-1 starting tasks (suggested first session) - -A concrete pickup point: - -### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions) -- §1 is prose; nothing to verify. -- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2 - enumerates is present and correctly typed on the domain object. - Specifically check: `dwelling_type`, `built_form`, `property_type`, - `construction_age_band`, `country_code`. 
Note that - `construction_age_band` is per-building-part, not dwelling-level, - and the primary age band drives most defaults. -- §3 maps to `worksheet/dimensions.py`. Verify: - - Total floor area sum across building parts equals TFA - - Volume calculation per storey × area × height - - Storey count handling for extensions and room-in-roof - - Multi-storey heat-loss-perimeter rules - -This single session should produce zero behaviour changes if §1-3 are -correctly implemented, but expect to find at least one issue in §3 -geometry (per the reviewer's "biggest SAP error sources" list). - -**Important:** Session 1 only starts after all five prerequisites in -§2.5 have landed and the Validation Cohort probe baseline has been -captured. Until then, running per-section verification produces noisy -signal. - -Run the BRE worked-example fixtures (P5) + Validation Cohort probe -(P3) at the end of each session; expect no movement until you start -hitting actual gaps. - ---- - -## 13. Workflow recap - -**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency -order: - -| | Slice | Depends on | -|---|---|---| -| P1 | Re-extract parquet with `inspection_date` | — | -| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — | -| P3 | Filter parity probe to Validation Cohort | P1 | -| P4 | Implement `PcdbLookup` | — (P2 helpful) | -| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — | -| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — | - -P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a -Validation Cohort probe baseline once all six land — that is the new -MAE starting line. Repo-wide tests stay green throughout P6 (Site -Notes consumers, ML pipeline, recommendations, etc. all need the -mapper updates that accompany each typing change). - -**Phase 1 — Section sweep.** For each RdSAP 10 section, in document -order: - -1. Read the spec section text + cited tables. -2. Identify code location(s). -3. 
For each rule / table / footnote: - - Does our code implement it? - - Does the implementation match? - - Edge cases / fallback paths handled? -4. For each gap: AAA unit test (preferring a BRE worked-example - assertion on `intermediate` values when possible) → minimal - implementation → commit. -5. **Apply the worksheet-faithful structure principle** (§5.5) as - part of this slice: name functions after worksheet lines, split - compound calculations, replace any remaining defensive - type-handling with typed-enum dispatch. -6. After each commit: run `test_bre_worked_examples.py` + Validation - Cohort probe. Note both deltas in the commit message. -7. If a BRE worked-example breaks: the new code is wrong (revert). - The worked examples are spec-derived and cannot be broken by - spec-correct changes. - -Stick to this. The prior session's mistake was jumping between -sections based on residual-size **on a dirty probe**. Clean probe -plus document-order discipline plus worksheet-faithful structure is -what makes the sweep converge. - ---- - -## 14. Useful references - -- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md` - — the binding decisions reflected in this rewrite: SAP 10.2 target, - cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture - retirement. **Read first.** -- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` — - original calculator decision rationale + Session A/B/C plan. Read - for context; spec-version target / PCDB sequencing / cert-cal - rationale are superseded by ADR-0010. -- **Spec coverage map** - `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker. - Update as you go. -- **Parity findings** - `docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior - sessions. -- **Earlier handover** - `docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the - previous fresh-context pass. -- **Reviewer feedback (informal)** — ChatGPT critique of the slice-by- - slice approach. 
Key recommendations: two-layer architecture - (RdSAP expansion → SAP worksheet), trace mode, golden-master - methodology, differential debugging, reference traces from - Elmhurst/Stroma/Quidos. -- **Commit log** — `git log --oneline` shows the slice history; each - S-Bxx commit message documents the spec ref + measured impact. - ---- - -## 15. Final note - -The prior session's framing — *"the cert-calibration layer absorbs -Elmhurst's spec deviations; we'll re-derive it at the end"* — was -load-bearing on a false diagnosis. The cert-cal layer is -pre-March-2025 SAP prices fit against a mixture distribution of two -spec-version regimes. Once you separate the regimes (Validation -Cohort) and use spec prices everywhere, the "tension" disappears. - -After P1–P5 land, the section sweep is straightforward: every -spec-correct fix is unambiguously the right answer, BRE -worked-example fixtures lock the result, and Validation Cohort probe -MAE moves monotonically downward. The fixes the prior session marked -as "spec-correct but probe-regressed" become trivially landable. - -**Welcome to the project. Read ADR-0010, land the five prerequisites, -then walk the spec in document order. The deterministic answer is in -there.**