docs: handover for systematic section-by-section RdSAP 10 review

The slice-by-slice "fix the biggest residual" approach has hit a
ceiling at SAP MAE ~4.6 because the cert-calibration prices absorb
multiple structural deviations from spec. Any spec-correct fix in one
component breaks the calibration for others. Three failed slices this
session (standing charges, cat=10 routing, combi zero-loss) made the
pattern unambiguous.

Pivot: systematic section-by-section spec verification. Read the
RdSAP 10 + SAP 10.2 spec in order, check each table / formula /
footnote against the corresponding code, fix gaps one at a time.
Build the spec-correct engine first; re-derive cert-cal calibration
once at the end as a thin Elmhurst-compatibility layer.

Handover doc covers:
- Critical framing (deterministic, not assessor judgement)
- Current state (SAP MAE 4.61, PE MAE 43.32 at f4a8d2a0)
- Why the slice-by-slice approach won't converge
- Scope decisions (RdSAP 10 + SAP 10.2 only; park full-SAP + PCDB)
- Section-to-code mapping
- Known dead-ends to skip
- Cert-calibration vs spec-correctness tension and how to resolve it
- The 7 golden fixtures and their compensating-error caveats
- Trace mode recommendation (ADR-0009's `intermediate` field)
- Specific §1-3 starting tasks
- Workflow recap

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-19 07:30:27 +00:00
parent f4a8d2a017
commit 3363f63f5e


@@ -0,0 +1,615 @@
# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review
**Audience:** A fresh agent picking up the deterministic SAP calculator at
`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs,
then the code.
**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly
for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical
calculation** — not a model — so MAE should approach zero on certs whose
inputs are fully populated.
---
## 1. Critical framing — this is NOT a judgement call
The SAP/RdSAP energy assessment splits cleanly into two roles:
1. **The assessor** — a person who surveys the dwelling and lodges
measured/observed fields onto the cert (areas, perimeters,
construction codes, insulation thicknesses, fuel types, etc.).
The assessor makes NO calculation decisions.
2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a
deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It
takes the lodged fields and produces SAP score, CO2 emissions,
primary energy (PEUI), CO2 per m², EI rating, etc.
**Our calculator is replicating role #2.** Where Elmhurst's
implementation diverges from spec, we follow Elmhurst, but we don't
guess at divergence; we localise it via reference traces or
empirically against the cert corpus.
There is no "assessor judgement" knob to tune. Each field on the cert
has a deterministic interpretation per the spec. Each spec table /
formula has a deterministic implementation. Our job is to enumerate
all of them and verify each.
---
## 2. Current state (2026-05-19)
- Branch: `ara-backend-design-prd`
- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
- 301 tests passing
- Parity probe (300 random certs from
`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7,
`sap_score ∈ [5, 99]`):
| Metric | Value |
|---|---|
| SAP MAE | 4.61 |
| SAP bias | +0.87 |
| PE MAE | 43.32 kWh/m² |
| PE bias | +37.69 kWh/m² |
- 7 "golden" regression certs locked in
`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`.
Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
caveat: some of these are compensating-error matches (e.g. cert
`7536-3827`'s PE matches but cost is £143 under cert's implied cost
due to multi-factor offsetting bugs).
---
## 3. Why we are pivoting to systematic review
The prior session shipped ten slices (S-B23 → S-B31) by debugging the
biggest residuals one at a time:
- **PE MAE dropped substantially: 57.28 → 43.32 (−13.96)**: real progress
on the demand-side calculation.
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)**: the cost-side is
bottlenecked by cert-calibration prices that absorb multiple
structural deviations from spec, making any single slice that fixes
one component break the calibration for others.
Three failed slice attempts in the prior session exposed the pattern:
- **Standing charges**: Table 12 note (a) clearly says a gas standing
charge of £92 is added to space + water heating costs for energy
ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62.
Reverted before committing.
- **Cat=10 room heaters off-peak routing**: Table 12a clearly says
"Other direct-acting electric heating" bills 100% high rate on
7-hour tariff. Empirically: switching cat=10 from off-peak to
standard rate inverted the bias from +5.88 to −6.00 without
improving MAE. Reverted before committing.
- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer +
Table 3 footer clearly say combi boilers using Table 4b efficiency
have zero storage + primary loss. Empirically: zeroing them dropped
PE MAE by 6.64 (huge improvement) but raised SAP MAE by 0.39 AND broke
3 of 7 golden fixtures. Reverted because there is no way to know whether to
follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without
reference traces.
The pattern: **the cert-calibration prices** (in
`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
to match Elmhurst's output assuming all our other calculations are
correct.** When we fix a spec-violation bug in some other component, we
break the calibration and SAP MAE goes up even though we're more
spec-correct.
This means **whack-a-mole on the biggest residual won't converge**. We
need to systematically verify every component against the spec, then
re-derive the cert-calibration once at the end.
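The absorption effect can be shown with a toy calculation. Every number below is invented for illustration except the 3.64 p/kWh spec gas price, and `fitted_price` is a hypothetical helper, not the project's calibration code. A price fitted against a biased demand estimate reproduces the reference cost exactly, then stops doing so as soon as the demand bug is fixed:

```python
def fitted_price(reference_cost_p: float, our_demand_kwh: float) -> float:
    """Price (p/kWh) that makes our demand reproduce the reference cost."""
    return reference_cost_p / our_demand_kwh


# Hypothetical reference software: 10,000 kWh at the 3.64 p/kWh spec price.
reference_cost_p = 10_000.0 * 3.64

# Hypothetical engine bug: demand over-predicted by 10%.
biased_demand_kwh = 11_000.0
calibrated_price = fitted_price(reference_cost_p, biased_demand_kwh)

# With the bug in place, the calibrated price hides it completely.
cost_before_fix_p = biased_demand_kwh * calibrated_price

# After a spec-correct demand fix, the same price under-costs the dwelling.
fixed_demand_kwh = 10_000.0
cost_after_fix_p = fixed_demand_kwh * calibrated_price

residual_before_p = cost_before_fix_p - reference_cost_p
residual_after_p = cost_after_fix_p - reference_cost_p
```

In this toy case the residual is zero before the fix and clearly negative after it, even though the engine became more correct. That is the shape of the three reverted slices above.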
---
## 4. Scope decisions
### IN scope
- **RdSAP 10 specification (10-06-2025)** — full document, all sections
(`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables,
appendices that RdSAP 10 references
(`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).
### OUT of scope (for now)
- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
U-value in `walls[i].description` (e.g.
"Average thermal transmittance 0.18 W/m²K"). These are a separate
calculation path (BS EN ISO 6946) and a different corpus. **Park them
until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
attempted partial handling; those slices can stay or be reverted at
your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- **PCDB (Product Characteristics Database).** ADR-0009 says Session C.
Heat pumps (cat=4) have catastrophic per-cert MAE because we use
Table 4a fallback efficiency 2.30 instead of PCDB SCOP. There's a
`NoOpPcdbLookup` stub seam ready in Session A; data fetch + parser
is its own milestone.
- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
identical Table 12 codes (only values shift). Don't update spec
references to 10.3 until the corpus migrates.
---
## 5. The approach — section-by-section spec verification
Work through the RdSAP 10 spec **in document order**, starting at
§1. For each section:
### 5.1. Read the spec section
Read the section text fully. Note every rule, table reference, and
defaulting cascade.
### 5.2. Find the corresponding code
Map the section to the source file(s) implementing it. The current
mapping (some sections are split across modules):
| RdSAP 10 section | Code location |
|---|---|
| §1 Introduction / general | n/a |
| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` |
| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` |
| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` |
| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` |
| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` |
| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` |
| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` |
| §9 Heat emitters / flow temperatures | not implemented |
| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` |
| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) |
| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) |
| §13 Addendum to EPCs | n/a |
| §14 Special cases (e.g. flats above commercial) | not implemented |
| §15 Improvements (recommendations) | n/a (not rating) |
| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` |
| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` |
| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` |
| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` |
| Table 30 — Mechanical ventilation | not implemented |
| Table 31 — Data to be collected | n/a |
### 5.3. For each spec rule in the section, check our code
For each table, formula, footnote, exception:
1. Does our code implement it?
2. Does the implementation match the spec values exactly?
3. Are there spec-defined edge cases / footnotes we're missing?
### 5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour.
- Implement the fix.
- Run **all 7 golden fixtures** plus the broader probe. Note both
direction and magnitude of change.
- If the fix is spec-correct but breaks a golden fixture, this is
evidence that the fixture was a compensating-error case — proceed
with the spec-correct fix and update the fixture (with a comment
noting it was a compensating case).
- Commit per-slice as before: one section → one commit. Reference the
spec section in the commit message.
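Recording direction and magnitude is easier if every commit uses the same format. A minimal sketch, assuming a hypothetical helper `format_probe_delta` (not part of the existing probe) that renders a before/after pair with a signed delta:

```python
def format_probe_delta(metric: str, before: float, after: float) -> str:
    """Render 'metric before → after (signed delta)' for a commit message."""
    return f"{metric} {before:.2f} → {after:.2f} ({after - before:+.2f})"


# Figures taken from the prior session's summary in section 3.
sap_line = format_probe_delta("SAP MAE", 5.34, 4.61)
pe_line = format_probe_delta("PE MAE", 57.28, 43.32)
```

The explicit sign matters: a bare magnitude cannot distinguish an improvement from a regression once the text is copied elsewhere.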
### 5.5. Use trace mode when you need it
ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
that was never populated. Adding this is highly recommended for the
systematic pass — each section's verification benefits from
inspecting the intermediate values. See §11 below for a sketch.
---
## 6. What's already been done — section by section
This is your starting map. Each row says whether the section has been
touched and what the current state is.
### Walls / construction (§5)
- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch
when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored.
- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for
"Average thermal transmittance X W/m²K". **PARK** — full-SAP path.
- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity
"as built, insulated (assumed)" + similar (type=4 with descriptive
signal). Spec-anchored via legacy `epc_wall_description_map`.
- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50`
fix (the "NI" thickness sentinel) + description-based override of
`wall_ins_present` for non-cavity walls. Spec footnote (Table 6).
- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog —
Table 19 footnote (2) "max(50, age-band default)" when description
signals retrofit.
- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated
description → §5.11.4 footnote 50mm joist row.
- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal
transmittance" parse. **PARK** — full-SAP path.
**Still to verify in §5**:
- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only
England is fully transcribed; country overrides are partial.
- Cob U-values (§5.6) — table only, no formula implementation.
- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
- Curtain wall §5.18 — not implemented.
- Party wall U-values (Table 15) — implemented in `u_party_wall`,
verify table values.
- Thermal bridging (Table 21) — implemented as global `y` factor,
verify per-age-band values.
- §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched
by construction type with internal insulation). Currently we
hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`).
This is wrong for timber-frame / cob / internally-insulated masonry
(should be 100).
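The thermal-mass gap could be closed with a small dispatcher along these lines. This is a sketch under assumptions: the function name, the string labels, and the rule that any internal insulation selects the low value are all illustrative readings of the bullet above, and must be checked against Table 22 before use.

```python
TMP_LOW_KJ_PER_M2_K = 100.0
TMP_HIGH_KJ_PER_M2_K = 250.0

# Hypothetical labels; the real code would dispatch on cert construction codes.
LOW_TMP_CONSTRUCTIONS = frozenset({"timber_frame", "cob"})


def thermal_mass_parameter(construction: str, internally_insulated: bool) -> float:
    """Return the thermal mass parameter in kJ/m²K (assumed Table 22 dispatch)."""
    if construction in LOW_TMP_CONSTRUCTIONS or internally_insulated:
        return TMP_LOW_KJ_PER_M2_K
    return TMP_HIGH_KJ_PER_M2_K
```

For example, `thermal_mass_parameter("solid_brick", internally_insulated=True)` would select the low value, where the current hardcoded constant always yields 250.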
### Heating systems (§§7-10, SAP Appendix A)
- **S-B20 (in history)**: Table 11 secondary heating allocation,
conditional on cert lodging secondary or being electric storage.
- **Failed S-B30 (reverted)**: respecting `main_heating_fraction` was
shown empirically wrong. The field is the multi-main allocation, not
main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main.
Spec §C3.1 + Table 12c.
- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a
says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our
cert-cal extends off-peak to codes 691-696. Spec-correct fix
inverted bias direction — calibration was absorbing this.
- **Uncommitted HW cylinder fix**: spec-correct (combi → zero
storage/primary loss per Table 2 + Table 3 footers) but breaks 3
golden fixtures. Decision deferred to systematic pass.
**Still to verify in heating**:
- Table 4a efficiency values for every code (heat pumps, storage
heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30)
is documented as a known limitation.
- Boiler interlock penalty (5%) — spec §9.2.1: "The efficiency of
gas and liquid fuel boilers for both space and water heating is
reduced by 5% if the boiler is not interlocked for space and water
heating." We don't apply this. Known gap.
- Table 4c condensing-boiler / heat-pump emitter-temperature
adjustment — we don't apply this.
- Table 12a high-rate fractions for off-peak dwellings — we apply
100% off-peak or 100% standard, never fractional blending.
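Fractional blending, the last item above, amounts to a weighted average of the two tariff rates. A minimal sketch, assuming a hypothetical helper `blended_unit_price`; the fractions themselves would come from Table 12a and are not reproduced here:

```python
def blended_unit_price(
    high_rate_fraction: float,
    high_rate_p_per_kwh: float,
    low_rate_p_per_kwh: float,
) -> float:
    """Effective p/kWh when a share of consumption bills at the high rate."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must lie in [0, 1]")
    low_rate_fraction = 1.0 - high_rate_fraction
    return (
        high_rate_fraction * high_rate_p_per_kwh
        + low_rate_fraction * low_rate_p_per_kwh
    )
```

The current all-or-nothing behaviour corresponds to calling this with a fraction of exactly 0.0 or 1.0, so the helper could be introduced without changing results and the real fractions added per heating code afterwards.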
### Hot water (§4 SAP + Appendix J)
- Storage loss factor table (Table 2) — current values in
`domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec
(verified). Known under-prediction of cylinder loss for storage
systems; cancelled by over-prediction of primary loss for combi
systems in aggregate.
- Primary loss formula (Table 3) — implemented as 245/60 kWh by age
band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]`
with `p` (pipework insulation fraction) and `h` (circulation hours).
Known approximation.
- Combi-boiler zero-loss rule (Table 2 + Table 3 footers) — currently
NOT applied (the failed uncommitted slice). Adding this drops PE
MAE by 6.64 but raises SAP MAE by 0.39.
- Appendix J Vd formula `25N + 36` — currently the simple form, not
the full per-component (shower / bath / other) breakdown. Useful
HW demand is ~7% under spec value.
- ΔT: currently a constant 43°C (55 − 12). Spec uses monthly Tcold and
hot at 52°C, not 55°C. Per-month variance unmodelled.
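The per-month primary loss formula quoted above transcribes directly into code. This is a sketch, not the project's implementation: the function name is hypothetical, and the `p` and `h` values a caller passes must come from the spec's rules for pipework insulation and circulation hours, which are not reproduced here.

```python
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh(days_in_month: int, p: float, h: float) -> float:
    """Monthly primary circuit loss: n_m × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263].

    p is the pipework insulation fraction (0..1), h the circulation hours per day.
    """
    per_hour_term = 0.0091 * p + 0.0245 * (1.0 - p)
    return days_in_month * 14 * (per_hour_term * h + 0.0263)


def annual_primary_loss_kwh(p: float, h: float) -> float:
    """Sum the monthly losses, assuming p and h are constant across the year."""
    return sum(primary_loss_kwh(days, p, h) for days in DAYS_IN_MONTH)
```

Holding `h` constant across the year is a simplification made for this sketch; a faithful implementation would take a per-month value.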
### Lighting (Appendix L)
- `predicted_lighting_kwh` in `domain.ml.demand` uses `9.3 × TFA ×
(1 − 0.5·led_share − 0.4·cfl_share)` heuristic.
- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up
+ portable shares, monthly profile.
- For an LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec
gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
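The heuristic is small enough to restate in full. The sketch below is a standalone re-implementation for illustration, not the code in `domain.ml.demand`; the 100 m² floor area is an assumed figure, chosen because it reproduces the ~465 kWh quoted above for a fully LED dwelling.

```python
def lighting_kwh_heuristic(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """Current heuristic: 9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share)."""
    return 9.3 * tfa_m2 * (1.0 - 0.5 * led_share - 0.4 * cfl_share)


# Assumed example dwelling: 100 m², every fitting LED.
all_led_kwh = lighting_kwh_heuristic(tfa_m2=100.0, led_share=1.0, cfl_share=0.0)
```

Because the LED discount is capped at 50%, the heuristic cannot fall below half of the unimproved figure, which is why it cannot reach the spec value for LED-dominant homes.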
### Internal gains (§5 SAP)
- `worksheet/internal_gains.py` implements metabolic + cooking +
appliances + lighting (the four positive rows of Table 5).
- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e.
HW losses recycled as heated-space gains) and Losses row (`−40 × N`
for cold inflow + evaporation). Both documented in S-B23 gap list.
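The two missing rows could be sketched as follows. The function names are hypothetical, and entering the Losses row as a negative gain is an assumption about sign convention that should be confirmed against Table 5 before wiring this into `worksheet/internal_gains.py`.

```python
def water_heating_gain_w(heat_gains_kwh_month: float, days_in_month: int) -> float:
    """Water heating row: 1000 × (65)m / (n_m × 24), converting kWh/month to W."""
    return 1000.0 * heat_gains_kwh_month / (days_in_month * 24)


def losses_gain_w(occupancy_n: float) -> float:
    """Losses row: 40 W per occupant, entered here as a negative gain (assumed sign)."""
    return -40.0 * occupancy_n
```

Both functions return watts, so their results can be summed with the four existing positive rows for each month.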
### Ventilation (§4 / Table 5)
- Wind-shelter factor implemented in S-B21.
- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert
rarely lodges. Spec §4.2 + Table 4g.
- Pressure-test override (worksheet lines 17-18) — not implemented.
### Tariff / cost (§12 + Table 12 / 12a / 12c)
- Cert-calibration prices in
`domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit
to Elmhurst's output. They are LOWER than the published Table 12
spec values by 4-25%. Known divergence; investigation deferred.
- Standing charges (Table 12 note (a)) — NOT applied. Adding them
empirically worsens MAE (calibration absorbs).
- Table 12a high-rate fractions — currently 100% off-peak for E7-
eligible codes, 100% standard otherwise. No fractional blending.
- Heat network DLF (Table 12c) — applied per S-B31 only to main
heating + HW from main. HW-only-from-heat-network is a separate slice.
---
## 7. The cert-calibration vs spec-correctness tension
This is THE central architectural decision you have to make as you
work through the spec.
### Two tables of fuel prices
- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec
values (3.64p gas, 16.49p standard elec).
- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`
— empirically lower values (3.48p gas, 13.19p elec) that match the
cert assessor software's output.
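The size of the gap between the two tables can be computed from the four prices quoted above. The dictionary keys and the helper name are illustrative only and do not mirror the real module layout:

```python
SPEC_P_PER_KWH = {"gas": 3.64, "electricity_standard": 16.49}
CERT_CAL_P_PER_KWH = {"gas": 3.48, "electricity_standard": 13.19}


def relative_discount(spec_price: float, calibrated_price: float) -> float:
    """Fraction by which the calibrated price sits below the spec price."""
    return (spec_price - calibrated_price) / spec_price


discounts = {
    fuel: relative_discount(SPEC_P_PER_KWH[fuel], CERT_CAL_P_PER_KWH[fuel])
    for fuel in SPEC_P_PER_KWH
}
```

For these two fuels the calibrated prices sit roughly 4% (gas) and 20% (standard electricity) below spec.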
### Two possible end states for the calculator
**End state A — Spec-perfect.** Use spec prices, apply every spec rule
(standing charges, Table 12a fractions, combi zero-loss, etc.). The
calculator output is then what a *correct SAP 10.2 implementation*
would produce. SAP MAE against the corpus will likely worsen because
Elmhurst doesn't perfectly implement spec.
**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce
Elmhurst's deviations exactly. The calculator output matches cert
SAP scores. The calculator becomes a "reverse-engineered Elmhurst
clone" rather than a SAP 10.2 implementation.
### The pragmatic recommendation
**Aim for state A but track state B as the parity probe.** Concretely:
1. Verify each spec section in isolation; fix spec violations
regardless of MAE impact, but commit each fix WITH a measured
probe delta in the commit message.
2. After the spec sweep is complete, the calculator's output is
spec-correct. The corpus residual at that point is Elmhurst's
deviation from spec.
3. THEN re-derive the cert-calibration prices to match Elmhurst's
deviation pattern. The calibration becomes a thin Elmhurst-
compatibility layer on top of a spec-correct engine.
This avoids the whack-a-mole problem because state A is unambiguous:
each fix is either spec-correct or not. State B is iterative on top
of state A, not entangled with it.
---
## 8. Don't repeat — known dead-ends
- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
over-corrected because it routed to the (Unfilled cavity, 50mm) row
instead of the dedicated Filled cavity row. The right fix landed in
S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
(S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
intentionally conservative; PCDB is needed for real efficiency.
- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor
uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
Use `cert_calibration_prices()` for the probe.
- ❌ **Always applying 10% secondary heating** — must be conditional on
cert lodging or main system being electric storage (S-B20). See
spec Appendix A.4.
- ❌ **Respecting `main_heating_fraction` for secondary allocation**
(failed S-B30) — the field is the multi-main allocation (system 1 vs
system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
spec-correct per Table 12a but inverts bias direction. Cert-cal
calibration absorbs the deviation.
- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) —
spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
to −2.62. Cert-cal calibration absorbs.
- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
S-B32) — spec-correct per Table 2 + Table 3 footers and drops PE
MAE by 6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden
fixtures. Decision deferred to systematic pass.
---
## 9. The cert corpus and parity probe
### Sample
`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
samples 300 at seed=7 by default. Filtering rationale:
- ≤ 5 is heritage/anomaly stock (sub-3% of corpus)
- ≥ 99 is full-SAP new-builds the parquet excludes anyway
### Run the probe
```bash
python -c "
import sys
sys.path.insert(0, 'packages/domain/src')
sys.path.insert(0, '.')
sys.path.insert(0, 'services/ml_training_data/src')
from ml_training_data.sap_parity_probe import main
main(['300','7'])
"
```
### What the probe shows
- Aggregate SAP MAE / RMSE / bias
- Aggregate PE MAE / RMSE / bias
- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
- Stratification by `main_heating_category`, `construction_age_band`,
`dwelling_type`
- Worst-15 residuals (SAP and PE)
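For reference, the three aggregate metrics can be computed as below. This is a standalone sketch, not the probe's own code; in particular, defining the residual as predicted minus lodged (so a positive bias means over-prediction) is an assumption about the probe's sign convention.

```python
import math


def probe_metrics(predicted: list[float], lodged: list[float]) -> dict[str, float]:
    """Return MAE, RMSE and bias for paired predictions and lodged cert values."""
    if len(predicted) != len(lodged) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - c for p, c in zip(predicted, lodged)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "bias": sum(residuals) / n,
    }
```

MAE and bias together are more informative than either alone: a large MAE with near-zero bias points to offsetting errors across certs, while MAE close to the absolute bias points to a systematic shift.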
### Known parquet limitations
- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in
the raw bulk-zip. The parquet filters out full-SAP new-builds
upstream. Don't measure full-SAP-path slices against the parquet.
- Heat-pump certs (cat=4) are under-represented and concentrated in
the worst-residual tail because PCDB efficiency is unavailable.
---
## 10. The 7 golden fixtures
`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
locks 7 corpus certs as regression anchors:
| Cert | TFA | Cat | Notes |
|---|---|---|---|
| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (0.29 kWh/m²) |
| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as
the spec sweep progresses.**
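The tolerance rule can be expressed as a single predicate, which makes tightening it a one-line change. The constant and function names below are hypothetical and may not match the test file:

```python
SAP_TOLERANCE = 1.0
PE_TOLERANCE_KWH_PER_M2 = 10.0


def within_golden_tolerance(sap_residual: float, pe_residual_kwh_per_m2: float) -> bool:
    """True when both residuals fall inside the golden-fixture tolerance."""
    return (
        abs(sap_residual) <= SAP_TOLERANCE
        and abs(pe_residual_kwh_per_m2) <= PE_TOLERANCE_KWH_PER_M2
    )
```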
The cert JSONs are stored under `fixtures/golden/<cert>.json`,
frozen at extraction time so the test is reproducible without
bulk-zip access. The probe extraction script for new fixtures is
inlined in the test history (see commit `f4a8d2a0`).
**Important caveat**: some of these 7 are compensating-error matches
(see §3). When a spec-correct slice breaks one, the fixture is
probably the compensating case — investigate before reverting.
---
## 11. Trace mode (recommended infrastructure)
ADR-0009 proposed:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SapResult:
    sap_score: float
    ...  # other result fields elided
    intermediate: dict[str, float]  # per-section trace values
```
The `intermediate` field was never populated. Suggested implementation
for the systematic pass:
```python
intermediate = {
# §1 dimensions
"tfa_m2": tfa,
"volume_m3": volume,
"storey_count": storeys,
# §3 heat transmission
"walls_w_per_k": ht.walls_w_per_k,
"roof_w_per_k": ht.roof_w_per_k,
"floor_w_per_k": ht.floor_w_per_k,
"party_walls_w_per_k": ht.party_walls_w_per_k,
"windows_w_per_k": ht.windows_w_per_k,
"doors_w_per_k": ht.doors_w_per_k,
"thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
"infiltration_ach": infiltration,
"infiltration_w_per_k": infiltration * volume * 0.33,
"heat_transfer_coefficient_w_per_k": hlc,
"heat_loss_parameter_w_per_m2k": hlp,
"time_constant_h": tau_h,
# §5 internal gains (annual averages)
"internal_gains_annual_avg_w": ...,
# §7 mean internal temperature (annual avg)
"mean_internal_temp_annual_avg_c": ...,
# §9 space heating
"useful_space_heating_kwh_per_yr": space_heating_kwh,
# §12 fuel costs (per end-use)
"main_heating_cost_gbp": ...,
"hot_water_cost_gbp": ...,
"lighting_cost_gbp": ...,
"pumps_fans_cost_gbp": ...,
# §13 rating
"ecf": ecf,
"deflator": 0.36,
# §14 primary energy and CO2 per end-use
"space_heating_pe_kwh_per_m2": ...,
"hot_water_pe_kwh_per_m2": ...,
# ... remaining per-end-use primary energy and CO2 entries
}
```
Once populated, the differential debugging the reviewer recommended
becomes possible: change one input field, compare deltas against an
Elmhurst export.
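A differential comparison of two traces could look like this. It is a sketch assuming `intermediate` is the flat `dict[str, float]` described above; `diff_intermediate` is a hypothetical helper.

```python
def diff_intermediate(
    baseline: dict[str, float],
    variant: dict[str, float],
    tolerance: float = 1e-9,
) -> dict[str, float]:
    """Return variant minus baseline for every shared key that actually moved."""
    shared_keys = baseline.keys() & variant.keys()
    return {
        key: variant[key] - baseline[key]
        for key in sorted(shared_keys)
        if abs(variant[key] - baseline[key]) > tolerance
    }
```

Running the calculator twice with one cert field changed and diffing the two traces shows which worksheet values that field feeds. Any key that moves unexpectedly, or fails to move, localises the bug to a section.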
---
## 12. Specific section-1 starting tasks (suggested first session)
A concrete pickup point:
### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
- §1 is prose; nothing to verify.
- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2
enumerates is present and correctly typed on the domain object.
Specifically check: `dwelling_type`, `built_form`, `property_type`,
`construction_age_band`, `country_code`. Note that
`construction_age_band` is per-building-part, not dwelling-level,
and the primary age band drives most defaults.
- §3 maps to `worksheet/dimensions.py`. Verify:
- Total floor area sum across building parts equals TFA
- Volume calculation per storey × area × height
- Storey count handling for extensions and room-in-roof
- Multi-storey heat-loss-perimeter rules
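The first two checks reduce to simple sums, sketched below. `Storey` is a hypothetical stand-in for whatever `worksheet/dimensions.py` actually uses, and `height_m` is assumed to already include any storey-height adjustment the spec requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Storey:
    floor_area_m2: float
    height_m: float


def total_floor_area_m2(storeys: list[Storey]) -> float:
    """TFA as the sum of storey areas across all building parts."""
    return sum(s.floor_area_m2 for s in storeys)


def volume_m3(storeys: list[Storey]) -> float:
    """Dwelling volume as the sum of area × height per storey."""
    return sum(s.floor_area_m2 * s.height_m for s in storeys)
```

A verification test can then assert that the summed area equals the lodged TFA for each golden fixture, which catches building parts that are dropped or double-counted.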
This single session should produce zero behaviour changes if §1-3 are
correctly implemented, but expect to find at least one issue in §3
geometry (per the reviewer's "biggest SAP error sources" list).
Run the golden fixtures + probe at the end of each session; expect no
movement until you start hitting actual gaps.
---
## 13. Workflow recap
For each section, in order:
1. Read the spec section text + cited tables.
2. Identify code location(s).
3. For each rule / table / footnote:
- Does our code implement it?
- Does the implementation match?
- Edge cases / fallback paths handled?
4. For each gap: AAA unit test → minimal implementation → commit.
5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`)
and the parity probe. Note both deltas in the commit message.
6. If a golden fixture breaks: investigate. Either fixture was a
compensating case (acceptable to break) or the new code is wrong
(revert).
Stick to this. The prior session's mistake was jumping between
sections based on residual-size. Don't.
---
## 14. Useful references
- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md`:
decision rationale + Session A/B/C plan.
- **Spec coverage map**
`docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
Update as you go.
- **Parity findings**
`docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior
sessions.
- **Earlier handover**
`docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the
previous fresh-context pass.
- **Reviewer feedback (informal)**: ChatGPT critique of the
slice-by-slice approach. Key recommendations: two-layer architecture
(RdSAP expansion → SAP worksheet), trace mode, golden-master
methodology, differential debugging, reference traces from
Elmhurst/Stroma/Quidos.
- **Commit log**: `git log --oneline` shows the slice history; each
S-Bxx commit message documents the spec ref + measured impact.
---
## 15. Final note
The prior session demonstrated that **moving SAP MAE down requires
either spec-correctness OR Elmhurst-perfect calibration, not both
simultaneously**. The cert-cal layer absorbs Elmhurst's spec
deviations; any spec-correct fix risks breaking it.
The systematic pass clears this by separating the layers:
1. Build the spec-correct engine first.
2. Re-fit the cert-cal compatibility layer once at the end.
Don't be discouraged when SAP MAE rises temporarily during the spec
sweep. PE residual is the truer signal of engine correctness. SAP
MAE convergence will follow once cert-cal is re-derived against the
clean engine.
**Welcome to the project. Read the spec, follow the order, commit one
section at a time. The deterministic answer is in there.**