Model/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
Khalim Conn-Kowlessar 3363f63f5e docs: handover for systematic section-by-section RdSAP 10 review
The slice-by-slice "fix the biggest residual" approach has hit a
ceiling at SAP MAE ~4.6 because the cert-calibration prices absorb
multiple structural deviations from spec. Any spec-correct fix in one
component breaks the calibration for others. Three failed slices this
session (standing charges, cat=10 routing, combi zero-loss) made the
pattern unambiguous.

Pivot: systematic section-by-section spec verification. Read the
RdSAP 10 + SAP 10.2 spec in order, check each table / formula /
footnote against the corresponding code, fix gaps one at a time.
Build the spec-correct engine first; re-derive cert-cal calibration
once at the end as a thin Elmhurst-compatibility layer.

Handover doc covers:
- Critical framing (deterministic, not assessor judgement)
- Current state (SAP MAE 4.61, PE MAE 43.32 at f4a8d2a0)
- Why the slice-by-slice approach won't converge
- Scope decisions (RdSAP 10 + SAP 10.2 only; park full-SAP + PCDB)
- Section-to-code mapping
- Known dead-ends to skip
- Cert-calibration vs spec-correctness tension and how to resolve it
- The 7 golden fixtures and their compensating-error caveats
- Trace mode recommendation (ADR-0009's `intermediate` field)
- Specific §1-3 starting tasks
- Workflow recap

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 07:30:27 +00:00


Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review

Audience: A fresh agent picking up the deterministic SAP calculator at packages/domain/src/domain/sap/. Read this first, then the spec PDFs, then the code.

Goal: Match the cert software (Elmhurst / Stroma / etc.) output exactly for RdSAP 10 / SAP 10.2 input certs. This is a deterministic, mechanical calculation — not a model — so MAE should approach zero on certs whose inputs are fully populated.


1. Critical framing — this is NOT a judgement call

The SAP/RdSAP energy assessment splits cleanly into two roles:

  1. The assessor — a person who surveys the dwelling and lodges measured/observed fields onto the cert (areas, perimeters, construction codes, insulation thicknesses, fuel types, etc.). The assessor makes NO calculation decisions.
  2. The cert software (Elmhurst, Stroma, Quidos, NHER, ECMK) — a deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It takes the lodged fields and produces SAP score, CO2 emissions, primary energy (PEUI), CO2 per m², EI rating, etc.

Our calculator is replicating role #2. Where Elmhurst's implementation diverges from spec, we follow Elmhurst, but we don't guess at divergence; we localise it via reference traces or empirically against the cert corpus.

There is no "assessor judgement" knob to tune. Each field on the cert has a deterministic interpretation per the spec. Each spec table / formula has a deterministic implementation. Our job is to enumerate all of them and verify each.


2. Current state (2026-05-19)

  • Branch: ara-backend-design-prd

  • Last clean commit: f4a8d2a0 ("tests: golden-fixture regression set — 7 currently-correct corpus certs")

  • 301 tests passing

  • Parity probe (300 random certs from data/ml_training/runs/2025_2026_n250000_v18a/data.parquet, seed=7, sap_score ∈ [5, 99]):

    | Metric | Value |
    |---|---|
    | SAP MAE | 4.61 |
    | SAP bias | +0.87 |
    | PE MAE | 43.32 kWh/m² |
    | PE bias | +37.69 kWh/m² |
  • 7 "golden" regression certs locked in packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py. Tolerance: |SAP residual| ≤ 1, |PE residual| ≤ 10 kWh/m². Known caveat: some of these are compensating-error matches (e.g. cert 7536-3827's PE matches but cost is £143 under cert's implied cost due to multi-factor offsetting bugs).


3. Why we are pivoting to systematic review

The prior session shipped ten slices (S-B23 → S-B31) by debugging the biggest residuals one at a time:

  • PE MAE dropped substantially: 57.28 → 43.32 (−13.96). This is real progress on the demand-side calculation.
  • SAP MAE barely moved: 5.34 → 4.61 (−0.73). The cost side is bottlenecked by cert-calibration prices that absorb multiple structural deviations from spec, so any single slice that fixes one component breaks the calibration for the others.

Three failed slice attempts in the prior session exposed the pattern:

  • Standing charges: Table 12 note (a) clearly says the gas standing charge of £92 is added to space + water heating costs for energy ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62. Reverted before committing.
  • Cat=10 room heaters off-peak routing: Table 12a clearly says "Other direct-acting electric heating" bills 100% high rate on 7-hour tariff. Empirically: switching cat=10 from off-peak to standard rate inverted the bias from +5.88 to −6.00 without improving MAE. Reverted before committing.
  • Hot water cylinder loss (uncommitted): spec Table 2 footer + Table 3 footer clearly say combi boilers using Table 4b efficiency have zero storage + primary loss. Empirically: zeroing them dropped PE MAE by 6.64 (huge improvement) but raised SAP MAE by 0.39 AND broke 3 of 7 golden fixtures. Reverted because there is no way to know whether to follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without reference traces.

The pattern: the cert-calibration prices (in domain.sap.tables.table_12_cert_calibration) were reverse-engineered to match Elmhurst's output assuming all our other calculations are correct. When we fix a spec-violation bug in some other component, we break the calibration and SAP MAE goes up even though we're more spec-correct.

This means whack-a-mole on the biggest residual won't converge. We need to systematically verify every component against the spec, then re-derive the cert-calibration once at the end.
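The absorption effect described above can be reproduced with a toy calculation. This is a minimal sketch with made-up numbers (the demand figures and the 4.00 p/kWh price are illustrative assumptions, not corpus or spec values): a price is fitted so that a demand-side bug is hidden, and fixing the bug then makes the cost residual worse.

```python
# Toy numbers only: none of these values come from the corpus or the spec.
REFERENCE_PRICE_P_PER_KWH = 4.00  # price the reference software is assumed to use
TRUE_DEMAND_KWH = 10_000.0        # demand the reference software computes
BUGGY_DEMAND_KWH = 12_500.0       # our engine before the fix (over-predicts by 25%)


def cost_gbp(demand_kwh: float, price_p_per_kwh: float) -> float:
    """Annual fuel cost in pounds for a demand in kWh and a price in pence/kWh."""
    return demand_kwh * price_p_per_kwh / 100.0


reference_cost = cost_gbp(TRUE_DEMAND_KWH, REFERENCE_PRICE_P_PER_KWH)

# Calibration step: choose the price that makes the buggy engine hit the reference cost.
calibrated_price = reference_cost / BUGGY_DEMAND_KWH * 100.0

cost_before_fix = cost_gbp(BUGGY_DEMAND_KWH, calibrated_price)
cost_after_fix = cost_gbp(TRUE_DEMAND_KWH, calibrated_price)

print(f"calibrated price: {calibrated_price:.2f} p/kWh")
print(f"cost error before demand fix: {cost_before_fix - reference_cost:+.2f} GBP")
print(f"cost error after demand fix:  {cost_after_fix - reference_cost:+.2f} GBP")
```

The fitted price sits below the reference price, and the spec-correct demand fix turns a zero cost error into a negative one. The real calibration absorbs several such deviations at once, which is why a single fix cannot be judged on SAP MAE alone.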


4. Scope decisions

IN scope

  • RdSAP 10 specification (10-06-2025) — full document, all sections (docs/sap-spec/rdsap-10-specification-2025-06-10.pdf, 114 pages).
  • SAP 10.2 full specification (14-03-2025) — the worksheet, tables, appendices that RdSAP 10 references (docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf, 199 pages).

OUT of scope (for now)

  • Full SAP assessments. Full-SAP certs lodge a measured/calculated U-value in walls[i].description (e.g. "Average thermal transmittance 0.18 W/m²K"). These are a separate calculation path (BS EN ISO 6946) and a different corpus. Park them until the RdSAP 10 base case matches Elmhurst. S-B24 / S-B29 attempted partial handling; those slices can stay or be reverted at your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
  • PCDB (Product Characteristics Database). ADR-0009 says Session C. Heat pumps (cat=4) have catastrophic per-cert MAE because we use Table 4a fallback efficiency 2.30 instead of PCDB SCOP. There's a NoOpPcdbLookup stub seam ready in Session A; data fetch + parser is its own milestone.
  • SAP 10.3 (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has identical Table 12 codes (only values shift). Don't update spec references to 10.3 until the corpus migrates.

5. The approach — section-by-section spec verification

Work through the RdSAP 10 spec in document order, starting at §1. For each section:

5.1. Read the spec section

Read the section text fully. Note every rule, table reference, and defaulting cascade.

5.2. Find the corresponding code

Map the section to the source file(s) implementing it. The current mapping (some sections are split across modules):

| RdSAP 10 section | Code location |
|---|---|
| §1 Introduction / general | n/a |
| §2 Property descriptors | datatypes/epc/domain/epc_property_data.py |
| §3 Dimensions | packages/domain/src/domain/sap/worksheet/dimensions.py |
| §4 Ventilation | packages/domain/src/domain/sap/worksheet/ventilation.py |
| §5 Construction / U-values | packages/domain/src/domain/ml/rdsap_uvalues.py + worksheet/heat_transmission.py |
| §6 Windows / doors / overshading | worksheet/solar_gains.py + rdsap/cert_to_inputs.py |
| §7 Heating systems (refers to SAP 10.2 Appendix A) | domain.ml.sap_efficiencies + rdsap/cert_to_inputs.py |
| §8 Heating controls (Table 4e) | rdsap/cert_to_inputs.py |
| §9 Heat emitters / flow temperatures | not implemented |
| §10 Space and water heating (Appendix A) | rdsap/cert_to_inputs.py |
| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in cert_to_inputs.py (PV only) |
| §12 Electricity tariff | rdsap/cert_to_inputs.py (_is_off_peak_meter, fuel routing) |
| §13 Addendum to EPCs | n/a |
| §14 Special cases (e.g. flats above commercial) | not implemented |
| §15 Improvements (recommendations) | n/a (not rating) |
| §16-19 RdSAP-specific SAP rating equations | worksheet/rating.py |
| Table 27: Living-area fraction | rdsap/cert_to_inputs.py:_living_area_fraction |
| Table 28: Cylinder size defaults | domain.ml.demand:_CYLINDER_VOLUME_L |
| Table 29: Heating + HW parameters | partial in cert_to_inputs.py |
| Table 30: Mechanical ventilation | not implemented |
| Table 31: Data to be collected | n/a |

5.3. For each spec rule in the section, check our code

For each table, formula, footnote, exception:

  1. Does our code implement it?
  2. Does the implementation match the spec values exactly?
  3. Are there spec-defined edge cases / footnotes we're missing?

5.4. When a gap is found

  • Write a failing unit test that asserts the spec-correct behaviour.
  • Implement the fix.
  • Run all 7 golden fixtures plus the broader probe. Note both direction and magnitude of change.
  • If the fix is spec-correct but breaks a golden fixture, this is evidence that the fixture was a compensating-error case — proceed with the spec-correct fix and update the fixture (with a comment noting it was a compensating case).
  • Commit per-slice as before: one section → one commit. Reference the spec section in the commit message.

5.5. Use trace mode when you need it

ADR-0009 specifies a SapResult.intermediate: dict[str, float] field that was never populated. Adding this is highly recommended for the systematic pass — each section's verification benefits from inspecting the intermediate values. See §11 below for a sketch.


6. What's already been done — section by section

This is your starting map. Each row says whether the section has been touched and what the current state is.

Walls / construction (§5)

  • S-B23 (committed 9a509e41): Table 6 "Filled cavity" row dispatch when wall_insulation_type=2 AND wall_construction=4. Spec-anchored.
  • S-B24 (committed 15613309): Parse walls[i].description for "Average thermal transmittance X W/m²K". PARK — full-SAP path.
  • S-B25 (committed 6b934710): Description-based dispatch for cavity "as built, insulated (assumed)" + similar (type=4 with descriptive signal). Spec-anchored via legacy epc_wall_description_map.
  • S-B26 (committed 361f9154): _insulation_bucket(0, True) → 50 fix (the "NI" thickness sentinel) + description-based override of wall_ins_present for non-cavity walls. Spec footnote (Table 6).
  • S-B27 (committed 1f49fa03): Floor _insulation_bucket analog — Table 19 footnote (2) "max(50, age-band default)" when description signals retrofit.
  • S-B28 (committed 25261d5c): Roof NI thickness + insulated description → §5.11.4 footnote 50mm joist row.
  • S-B29 (committed 3ab09845): Floor + roof "Average thermal transmittance" parse. PARK — full-SAP path.

Still to verify in §5:

  • Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only England is fully transcribed; country overrides are partial.
  • Cob U-values (§5.6) — table only, no formula implementation.
  • Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
  • Curtain wall §5.18 — not implemented.
  • Party wall U-values (Table 15) — implemented in u_party_wall, verify table values.
  • Thermal bridging (Table 21) — implemented as global y factor, verify per-age-band values.
  • §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched by construction type with internal insulation). Currently we hardcode 250 (see cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K). This is wrong for timber-frame / cob / internally-insulated masonry (should be 100).
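A gap such as the thermal mass item above lends itself to a spec-anchored test written before the fix. The sketch below is illustrative only: `WallType`, `thermal_mass_parameter` and the membership of the low-mass set are hypothetical names and assumptions drawn from the note above, and every row should be confirmed against Table 22 before anything like this is committed.

```python
from enum import Enum


class WallType(Enum):
    """Hypothetical wall categories; the real code works from lodged construction codes."""

    SOLID_MASONRY = "solid_masonry"
    MASONRY_INTERNAL_INSULATION = "masonry_internal_insulation"
    TIMBER_FRAME = "timber_frame"
    COB = "cob"


LOW_THERMAL_MASS_KJ_PER_M2_K = 100.0
HIGH_THERMAL_MASS_KJ_PER_M2_K = 250.0

# Assumption taken from the handover note; confirm each entry against Table 22.
_LOW_MASS_WALLS = frozenset(
    {WallType.TIMBER_FRAME, WallType.COB, WallType.MASONRY_INTERNAL_INSULATION}
)


def thermal_mass_parameter(wall_type: WallType) -> float:
    """Return the thermal mass parameter in kJ/m²K for a wall category."""
    if wall_type in _LOW_MASS_WALLS:
        return LOW_THERMAL_MASS_KJ_PER_M2_K
    return HIGH_THERMAL_MASS_KJ_PER_M2_K


def test_timber_frame_uses_low_thermal_mass() -> None:
    # Arrange
    wall = WallType.TIMBER_FRAME
    # Act
    tmp = thermal_mass_parameter(wall)
    # Assert
    assert tmp == 100.0
```

Against the current hardcoded 250 this test would fail first, which is the intended starting point for the slice.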

Heating systems (§§7-10, SAP Appendix A)

  • S-B20 (in history): Table 11 secondary heating allocation, conditional on cert lodging secondary or being electric storage.
  • Failed S-B30 (reverted): respect main_heating_fraction — shown empirically wrong. Field is multi-main allocation, not main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
  • S-B31 (committed afdf297f): Table 12c DLF on heat-network main. Spec §C3.1 + Table 12c.
  • Failed S-B32 (room heater off-peak routing, reverted): Table 12a says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our cert-cal extends off-peak to codes 691-696. Spec-correct fix inverted bias direction — calibration was absorbing this.
  • Uncommitted HW cylinder fix: spec-correct (combi → zero storage/primary loss per Table 2 + Table 3 footers) but breaks 3 golden fixtures. Decision deferred to systematic pass.

Still to verify in heating:

  • Table 4a efficiency values for every code (heat pumps, storage heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30) is documented as a known limitation.
  • Boiler interlock penalty (5%) — spec §9.2.1: "The efficiency of gas and liquid fuel boilers for both space and water heating is reduced by 5% if the boiler is not interlocked for space and water heating." We don't apply this. Known gap.
  • Table 4c condensing-boiler / heat-pump emitter-temperature adjustment — we don't apply this.
  • Table 12a high-rate fractions for off-peak dwellings — we apply 100% off-peak or 100% standard, never fractional blending.
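The fractional blending that the last item refers to is a weighted average of the two tariff rates. The following is a minimal sketch; the function name and the example fraction and prices are hypothetical, and the real fractions would have to be transcribed from Table 12a per system type and tariff.

```python
def blended_price_p_per_kwh(
    high_rate_fraction: float,
    high_rate_price_p: float,
    low_rate_price_p: float,
) -> float:
    """Effective unit price when a fraction of consumption is billed at the high rate."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must lie in [0, 1]")
    return (
        high_rate_fraction * high_rate_price_p
        + (1.0 - high_rate_fraction) * low_rate_price_p
    )


# Illustrative values only, not Table 12 or Table 12a entries.
print(blended_price_p_per_kwh(0.2, 20.0, 10.0))
```

The current behaviour corresponds to passing only 0.0 (all off-peak) or 1.0 (all standard rate), so a blended implementation should reproduce today's numbers at those two endpoints.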

Hot water (§4 SAP + Appendix J)

  • Storage loss factor table (Table 2) — current values in domain.ml.demand:_STORAGE_LOSS_FACTOR are ~3× off from spec (verified). Known under-prediction of cylinder loss for storage systems; cancelled by over-prediction of primary loss for combi systems in aggregate.
  • Primary loss formula (Table 3) — implemented as 245/60 kWh by age band. Spec is a per-month formula nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263] with p (pipework insulation fraction) and h (circulation hours). Known approximation.
  • Combi-boiler zero-loss rule (Table 2 + Table 3 footers): currently NOT applied (the failed uncommitted slice). Adding this drops PE MAE by 6.64 but raises SAP MAE by 0.39.
  • Appendix J Vd formula 25N + 36 — currently the simple form, not the full per-component (shower / bath / other) breakdown. Useful HW demand is ~7% under spec value.
  • ΔT: currently a constant 43°C (55 − 12). Spec uses monthly Tcold and hot at 52°C, not 55°C. Per-month variance unmodelled.
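The per-month primary loss formula and the simple Vd form quoted in the list above translate directly into code. This is a sketch under stated assumptions: the function names are hypothetical, the values of p and h must come from the spec's own defaulting rules (not shown here), and a non-leap year is assumed.

```python
from typing import Sequence

DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh_month(days: int, p: float, h: float) -> float:
    """Primary circuit loss for one month.

    days: number of days in the month (n_m)
    p:    fraction of primary pipework that is insulated
    h:    hours per day of circulation pump operation
    """
    return days * 14.0 * ((0.0091 * p + 0.0245 * (1.0 - p)) * h + 0.0263)


def primary_loss_kwh_year(p: float, hours_by_month: Sequence[float]) -> float:
    """Sum the monthly losses; hours_by_month holds twelve values of h."""
    if len(hours_by_month) != 12:
        raise ValueError("hours_by_month needs exactly 12 entries")
    return sum(
        primary_loss_kwh_month(days, p, h)
        for days, h in zip(DAYS_IN_MONTH, hours_by_month)
    )


def daily_hot_water_litres_simple(occupancy_n: float) -> float:
    """Simple Vd = 25N + 36 form, not the per-component Appendix J breakdown."""
    return 25.0 * occupancy_n + 36.0
```

Comparing the annual sum from this formula against the current 245/60 kWh constants for a few representative p and h combinations is a quick way to size the approximation before changing the engine.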

Lighting (Appendix L)

  • predicted_lighting_kwh in domain.ml.demand uses a 9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share) heuristic.
  • Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up + portable shares, monthly profile.
  • For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
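The size of the heuristic's over-prediction is easy to check by hand. The sketch below re-states the heuristic as a standalone function (the name is hypothetical and the real signature of predicted_lighting_kwh may differ); the 100 m² floor area and the all-LED share are assumptions chosen because they reproduce the ~465 kWh figure quoted above.

```python
def lighting_kwh_heuristic(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """Current heuristic: 9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share)."""
    return 9.3 * tfa_m2 * (1.0 - 0.5 * led_share - 0.4 * cfl_share)


# Assumed inputs: 100 m² dwelling, every lamp an LED.
heuristic_kwh = lighting_kwh_heuristic(100.0, led_share=1.0, cfl_share=0.0)
spec_kwh = 94.0  # spec figure quoted in this section for the LED-dominant case

print(f"heuristic: {heuristic_kwh:.0f} kWh, ratio to spec: {heuristic_kwh / spec_kwh:.1f}x")
```

Because the heuristic can never fall below half of 9.3 × TFA, no choice of lamp shares can bring it close to the spec value for an efficient home, which suggests replacing the formula rather than retuning its coefficients.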

Internal gains (§5 SAP)

  • worksheet/internal_gains.py implements metabolic + cooking + appliances + lighting (the four positive rows of Table 5).
  • Missing: Water heating row (1000 × (65)ₘ / (nₘ × 24), i.e. HW losses recycled as heated-space gains) and Losses row (−40 × N for cold inflow + evaporation). Both documented in S-B23 gap list.
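The two missing rows are both one-line conversions. A minimal sketch follows, with hypothetical function names; the losses row is treated here as a negative gain, which is an assumption about sign convention that should be checked against Table 5 and against how internal_gains.py sums its rows.

```python
def water_heating_gains_w(heat_gains_kwh_month: float, days_in_month: int) -> float:
    """Water-heating row: 1000 × (65)m / (nm × 24).

    Converts a monthly heat gain in kWh into an average power in watts.
    """
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    return 1000.0 * heat_gains_kwh_month / (days_in_month * 24.0)


def losses_w(occupancy_n: float) -> float:
    """Losses row: 40 W per occupant, entered as a negative gain (assumed sign)."""
    return -40.0 * occupancy_n


# Example: 74.4 kWh of gains in a 31-day month is an average of 100 W.
print(water_heating_gains_w(74.4, 31), losses_w(2.5))
```

Both rows depend on values computed elsewhere (worksheet line (65) and the occupancy N), so adding them also fixes the order in which the hot water and internal gains sections have to run.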

Ventilation (§4 / Table 5)

  • Wind-shelter factor implemented in S-B21.
  • Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert rarely lodges. Spec §4.2 + Table 4g.
  • Pressure-test override (worksheet lines 17-18) — not implemented.

Tariff / cost (§12 + Table 12 / 12a / 12c)

  • Cert-calibration prices in domain.sap.tables.table_12_cert_calibration are an EMPIRICAL fit to Elmhurst's output. They are LOWER than the published Table 12 spec values by 4-25%. Known divergence; investigation deferred.
  • Standing charges (Table 12 note (a)) — NOT applied. Adding them empirically worsens MAE (calibration absorbs).
  • Table 12a high-rate fractions — currently 100% off-peak for E7-eligible codes, 100% standard otherwise. No fractional blending.
  • Heat network DLF (Table 12c) — applied per S-B31 only to main heating + HW from main. HW-only-from-heat-network is a separate slice.

7. The cert-calibration vs spec-correctness tension

This is THE central architectural decision you have to make as you work through the spec.

Two tables of fuel prices

  • domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH — SAP 10.2 spec values (3.64p gas, 16.49p standard elec).
  • domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH — empirically lower values (3.48p gas, 13.19p elec) that match the cert assessor software's output.

Two possible end states for the calculator

End state A — Spec-perfect. Use spec prices, apply every spec rule (standing charges, Table 12a fractions, combi zero-loss, etc.). The calculator output is then what a correct SAP 10.2 implementation would produce. SAP MAE against the corpus will likely worsen because Elmhurst doesn't perfectly implement spec.

End state B — Elmhurst-perfect. Use cert-cal prices and reproduce Elmhurst's deviations exactly. The calculator output matches cert SAP scores. The calculator becomes a "reverse-engineered Elmhurst clone" rather than a SAP 10.2 implementation.

The pragmatic recommendation

Aim for state A but track state B as the parity probe. Concretely:

  1. Verify each spec section in isolation; fix spec violations regardless of MAE impact, but commit each fix WITH a measured probe delta in the commit message.
  2. After the spec sweep is complete, the calculator's output is spec-correct. The corpus residual at that point is Elmhurst's deviation from spec.
  3. THEN re-derive the cert-calibration prices to match Elmhurst's deviation pattern. The calibration becomes a thin Elmhurst-compatibility layer on top of a spec-correct engine.

This avoids the whack-a-mole problem because state A is unambiguous: each fix is either spec-correct or not. State B is iterative on top of state A, not entangled with it.
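One way to keep the two states from entangling is to make the price table an explicit input to the cost step, so the engine output is identical in both modes. The sketch below is illustrative: the fuel keys, function name and delivered-energy figures are hypothetical, and only the four unit prices are taken from the two tables listed above.

```python
from typing import Mapping

# Unit prices in pence/kWh, as quoted in this section.
SPEC_PRICES: Mapping[str, float] = {"mains_gas": 3.64, "standard_electricity": 16.49}
CERT_CAL_PRICES: Mapping[str, float] = {"mains_gas": 3.48, "standard_electricity": 13.19}


def fuel_cost_gbp(
    delivered_kwh_by_fuel: Mapping[str, float],
    prices_p_per_kwh: Mapping[str, float],
) -> float:
    """Price a fixed engine output (delivered energy per fuel) with a chosen table."""
    total_pence = sum(
        kwh * prices_p_per_kwh[fuel] for fuel, kwh in delivered_kwh_by_fuel.items()
    )
    return total_pence / 100.0


# Hypothetical engine output for one dwelling.
delivered = {"mains_gas": 12_000.0, "standard_electricity": 3_000.0}

state_a_cost = fuel_cost_gbp(delivered, SPEC_PRICES)      # spec-perfect
state_b_cost = fuel_cost_gbp(delivered, CERT_CAL_PRICES)  # parity probe
print(f"state A: {state_a_cost:.2f} GBP, state B: {state_b_cost:.2f} GBP")
```

With this shape, a spec fix changes `delivered` once and both costs move together; re-deriving the calibration later only touches the second table.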


8. Don't repeat — known dead-ends

  • Switching "NI" wall thickness to None alone (S-B5 in history) — over-corrected because it routed to the (Unfilled cavity, 50mm) row instead of the dedicated Filled cavity row. The right fix landed in S-B23 with a WALL_INSULATION_FILLED_CAVITY dispatcher.
  • Aggressive efficiency rescue for missing sap_main_heating_code (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is intentionally conservative; PCDB is needed for real efficiency.
  • Using SAP 10.2 spec prices for parity validation — cert assessor uses lower prices despite reporting sap_version=10.2 (S-B9, S-B10). Use cert_calibration_prices() for the probe.
  • Always applying 10% secondary heating — must be conditional on cert lodging or main system being electric storage (S-B20). See spec Appendix A.4.
  • Respecting main_heating_fraction for secondary allocation (failed S-B30) — the field is the multi-main allocation (system 1 vs system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
  • Switching cat=10 room heaters off off-peak (failed S-B32) — spec-correct per Table 12a but inverts bias direction. Cert-cal calibration absorbs the deviation.
  • Adding gas standing charges (4-mode probe, unimplemented): spec-correct per Table 12 note (a) but pushes SAP bias from +0.98 to −2.62. Cert-cal calibration absorbs.
  • Zeroing storage + primary loss for combi boilers (uncommitted S-B32): spec-correct per Table 2 + Table 3 footers and drops PE MAE by 6.64 (huge win) BUT raises SAP MAE by 0.39 and breaks 3 golden fixtures. Decision deferred to systematic pass.

9. The cert corpus and parity probe

Sample

data/ml_training/runs/2025_2026_n250000_v18a/data.parquet is the 250k-cert parquet. The probe filters to sap_score ∈ [5, 99] and samples 300 at seed=7 by default. Filtering rationale:

  • Scores below 5 are heritage/anomaly stock (sub-3% of corpus)
  • Scores above 99 are full-SAP new-builds the parquet excludes anyway

Run the probe

python -c "
import sys
sys.path.insert(0, 'packages/domain/src')
sys.path.insert(0, '.')
sys.path.insert(0, 'services/ml_training_data/src')
from ml_training_data.sap_parity_probe import main
main(['300','7'])
"

What the probe shows

  • Aggregate SAP MAE / RMSE / bias
  • Aggregate PE MAE / RMSE / bias
  • Per-end-use PEUI breakdown (space / HW / lighting / pumps)
  • Stratification by main_heating_category, construction_age_band, dwelling_type
  • Worst-15 residuals (SAP and PE)
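For reference, the aggregate metrics in the first two items can be computed as follows. This is a sketch, not the probe's code: the function name is hypothetical, and the sign convention (residual = predicted minus lodged, so a positive bias means over-prediction) is an assumption that should be checked against sap_parity_probe.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def residual_stats(predicted: Sequence[float], lodged: Sequence[float]) -> dict[str, float]:
    """Return MAE, RMSE and bias for paired predicted and lodged values."""
    if len(predicted) != len(lodged) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    residuals = [p - c for p, c in zip(predicted, lodged)]
    return {
        "mae": fmean(abs(r) for r in residuals),
        "rmse": sqrt(fmean(r * r for r in residuals)),
        "bias": fmean(residuals),
    }


# Three made-up certs: residuals are +2, -3 and 0.
print(residual_stats([70.0, 60.0, 50.0], [68.0, 63.0, 50.0]))
```

MAE and bias are worth reading together: a fix that leaves MAE flat while flipping the sign of the bias, as in the failed cat=10 slice, shows up only in the bias.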

Known parquet limitations

  • ~0.7% of parquet certs have construction_age_band=None vs 15% in the raw bulk-zip. The parquet filters out full-SAP new-builds upstream. Don't measure full-SAP-path slices against the parquet.
  • Heat-pump certs (cat=4) are under-represented and concentrated in the worst-residual tail because PCDB efficiency is unavailable.

10. The 7 golden fixtures

packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py locks 7 corpus certs as regression anchors:

| Cert | TFA | Cat | Notes |
|---|---|---|---|
| 0240-0200-5706-2365-8010 | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
| 0300-2747-7640-2526-2135 | 526 | 2 | Semi-detached, age D, gas PCDB |
| 0390-2954-3640-2196-4175 | 360 | 2 | Detached, age F, oil PCDB |
| 6035-7729-2309-0879-2296 | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
| 7536-3827-0600-0600-0276 | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (0.29 kWh/m²) |
| 8135-1728-8500-0511-3296 | 102 | 2 | Semi-detached, age C, gas PCDB |
| 9390-2722-3520-2105-8715 | 75 | 6 | Mid-floor flat, age D, heat network code 301 |

Tolerance: |SAP residual| ≤ 1, |PE residual| ≤ 10. Tighten as the spec sweep progresses.
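A tolerance check that reports which limit was exceeded, and by how much, makes fixture breaks easier to triage. The helper below is a hypothetical sketch and not the code in test_golden_fixtures.py; only the two tolerance values come from this section.

```python
SAP_TOLERANCE = 1.0
PE_TOLERANCE_KWH_PER_M2 = 10.0


def golden_violations(
    sap_predicted: float,
    sap_lodged: float,
    pe_predicted: float,
    pe_lodged: float,
) -> list[str]:
    """Return a message per breached tolerance; an empty list means the fixture passes."""
    violations = []
    sap_residual = sap_predicted - sap_lodged
    pe_residual = pe_predicted - pe_lodged
    if abs(sap_residual) > SAP_TOLERANCE:
        violations.append(f"SAP residual {sap_residual:+.2f} exceeds ±{SAP_TOLERANCE}")
    if abs(pe_residual) > PE_TOLERANCE_KWH_PER_M2:
        violations.append(
            f"PE residual {pe_residual:+.2f} kWh/m² exceeds ±{PE_TOLERANCE_KWH_PER_M2}"
        )
    return violations
```

Keeping the two tolerances as named constants also gives a single place to tighten them as the sweep progresses.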

The cert JSONs are stored under fixtures/golden/<cert>.json — frozen at extraction time so the test is reproducible without bulk-zip access. The probe extraction script for new fixtures is inlined in the test history (see commit f4a8d2a0).

Important caveat: some of these 7 are compensating-error matches (see §3). When a spec-correct slice breaks one, the fixture is probably the compensating case — investigate before reverting.


11. Trace mode (ADR-0009's intermediate field)

ADR-0009 proposed:

from dataclasses import dataclass


@dataclass(frozen=True)
class SapResult:
    sap_score: float
    ...  # remaining result fields elided
    intermediate: dict[str, float]

The intermediate field was never populated. Suggested implementation for the systematic pass:

intermediate = {
    # §1 dimensions
    "tfa_m2": tfa,
    "volume_m3": volume,
    "storey_count": storeys,
    # §3 heat transmission
    "walls_w_per_k": ht.walls_w_per_k,
    "roof_w_per_k": ht.roof_w_per_k,
    "floor_w_per_k": ht.floor_w_per_k,
    "party_walls_w_per_k": ht.party_walls_w_per_k,
    "windows_w_per_k": ht.windows_w_per_k,
    "doors_w_per_k": ht.doors_w_per_k,
    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
    "infiltration_ach": infiltration,
    "infiltration_w_per_k": infiltration * volume * 0.33,
    "heat_transfer_coefficient_w_per_k": hlc,
    "heat_loss_parameter_w_per_m2k": hlp,
    "time_constant_h": tau_h,
    # §5 internal gains (annual averages)
    "internal_gains_annual_avg_w": ...,
    # §7 mean internal temperature (annual avg)
    "mean_internal_temp_annual_avg_c": ...,
    # §9 space heating
    "useful_space_heating_kwh_per_yr": space_heating_kwh,
    # §12 fuel costs (per end-use)
    "main_heating_cost_gbp": ...,
    "hot_water_cost_gbp": ...,
    "lighting_cost_gbp": ...,
    "pumps_fans_cost_gbp": ...,
    # §13 rating
    "ecf": ecf,
    "deflator": 0.36,
    # §14 primary energy and CO2 per end-use
    "space_heating_pe_kwh_per_m2": ...,
    "hot_water_pe_kwh_per_m2": ...,
    # ... remaining per-end-use PE and CO2 entries elided
}

Once populated, the differential debugging the reviewer recommended becomes possible: change one input field, compare deltas against an Elmhurst export.
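The comparison step of differential debugging could look like the sketch below. It assumes two populated intermediate dicts (one baseline run, one with a single input field changed); the function name and the example keys and values are hypothetical.

```python
from typing import Mapping


def diff_intermediate(
    baseline: Mapping[str, float],
    perturbed: Mapping[str, float],
    rel_tol: float = 1e-9,
) -> dict[str, float]:
    """Return perturbed minus baseline for every shared key that actually moved."""
    deltas = {}
    for key in sorted(baseline.keys() & perturbed.keys()):
        delta = perturbed[key] - baseline[key]
        # Ignore floating-point noise relative to the baseline magnitude.
        if abs(delta) > rel_tol * max(1.0, abs(baseline[key])):
            deltas[key] = delta
    return deltas


# Made-up traces: only the wall term and the downstream ECF change.
baseline = {"walls_w_per_k": 80.0, "roof_w_per_k": 20.0, "ecf": 2.0}
perturbed = {"walls_w_per_k": 95.0, "roof_w_per_k": 20.0, "ecf": 2.1}
print(diff_intermediate(baseline, perturbed))
```

Keys that move when they should not, or stay still when the spec says they should respond, point at the section to verify next.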


12. Specific section-1 starting tasks (suggested first session)

A concrete pickup point:

Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)

  • §1 is prose; nothing to verify.
  • §2 maps to EpcPropertyData. Verify that every field RdSAP §2 enumerates is present and correctly typed on the domain object. Specifically check: dwelling_type, built_form, property_type, construction_age_band, country_code. Note that construction_age_band is per-building-part, not dwelling-level, and the primary age band drives most defaults.
  • §3 maps to worksheet/dimensions.py. Verify:
    • Total floor area sum across building parts equals TFA
    • Volume calculated per storey as area × height, then summed
    • Storey count handling for extensions and room-in-roof
    • Multi-storey heat-loss-perimeter rules

This single session should produce zero behaviour changes if §1-3 are correctly implemented, but expect to find at least one issue in §3 geometry (per the reviewer's "biggest SAP error sources" list).

Run the golden fixtures + probe at the end of each session; expect no movement until you start hitting actual gaps.
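The first two §3 checks can be expressed as small invariants over building parts. The sketch below uses hypothetical Storey and BuildingPart types rather than the fields of EpcPropertyData, and it deliberately ignores any spec-mandated adjustments to storey height, which are exactly what the §3 read-through needs to confirm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Storey:
    floor_area_m2: float
    room_height_m: float


@dataclass(frozen=True)
class BuildingPart:
    """Main dwelling or an extension; each part lodges its own storeys."""

    storeys: tuple[Storey, ...]


def total_floor_area_m2(parts: Sequence[BuildingPart]) -> float:
    """TFA as the sum of storey areas across all building parts."""
    return sum(s.floor_area_m2 for part in parts for s in part.storeys)


def volume_m3(parts: Sequence[BuildingPart]) -> float:
    """Volume as the sum over storeys of area × height (no height adjustments)."""
    return sum(s.floor_area_m2 * s.room_height_m for part in parts for s in part.storeys)


# Made-up dwelling: two-storey main part plus a single-storey extension.
main = BuildingPart((Storey(50.0, 2.4), Storey(50.0, 2.4)))
extension = BuildingPart((Storey(10.0, 2.4),))
print(total_floor_area_m2([main, extension]), volume_m3([main, extension]))
```

Running the same two sums over the golden fixture JSONs and comparing them with the lodged TFA gives a quick first signal before reading dimensions.py line by line.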


13. Workflow recap

For each section, in order:

  1. Read the spec section text + cited tables.
  2. Identify code location(s).
  3. For each rule / table / footnote:
    • Does our code implement it?
    • Does the implementation match?
    • Edge cases / fallback paths handled?
  4. For each gap: AAA unit test → minimal implementation → commit.
  5. After each commit: run golden fixtures (pytest test_golden_fixtures.py) and the parity probe. Note both deltas in the commit message.
  6. If a golden fixture breaks: investigate. Either fixture was a compensating case (acceptable to break) or the new code is wrong (revert).

Stick to this. The prior session's mistake was jumping between sections based on residual-size. Don't.


14. Useful references

  • ADR-0009 docs/adr/0009-deterministic-sap-calculator.md — decision rationale + Session A/B/C plan.
  • Spec coverage map docs/sap-spec/SPEC_COVERAGE.md — pre-existing coverage tracker. Update as you go.
  • Parity findings docs/sap-spec/PARITY_FINDINGS.md — empirical findings from prior sessions.
  • Earlier handover docs/sap-spec/HANDOVER_FRESH_REVIEW.md — orientation from the previous fresh-context pass.
  • Reviewer feedback (informal): ChatGPT critique of the slice-by-slice approach. Key recommendations: two-layer architecture (RdSAP expansion → SAP worksheet), trace mode, golden-master methodology, differential debugging, reference traces from Elmhurst/Stroma/Quidos.
  • Commit log: git log --oneline shows the slice history; each S-Bxx commit message documents the spec ref + measured impact.

15. Final note

The prior session demonstrated that moving SAP MAE down requires either spec-correctness OR Elmhurst-perfect calibration, not both simultaneously. The cert-cal layer absorbs Elmhurst's spec deviations; any spec-correct fix risks breaking it.

The systematic pass clears this by separating the layers:

  1. Build the spec-correct engine first.
  2. Re-fit the cert-cal compatibility layer once at the end.

Don't be discouraged when SAP MAE rises temporarily during the spec sweep. PE residual is the truer signal of engine correctness. SAP MAE convergence will follow once cert-cal is re-derived against the clean engine.

Welcome to the project. Read the spec, follow the order, commit one section at a time. The deterministic answer is in there.