The slice-by-slice "fix the biggest residual" approach has hit a
ceiling at SAP MAE ~4.6 because the cert-calibration prices absorb
multiple structural deviations from spec. Any spec-correct fix in one
component breaks the calibration for others. Three failed slices this
session (standing charges, cat=10 routing, combi zero-loss) made the
pattern unambiguous.
Pivot: systematic section-by-section spec verification. Read the
RdSAP 10 + SAP 10.2 spec in order, check each table / formula /
footnote against the corresponding code, fix gaps one at a time.
Build the spec-correct engine first; re-derive cert-cal calibration
once at the end as a thin Elmhurst-compatibility layer.
Handover doc covers:
- Critical framing (deterministic, not assessor judgement)
- Current state (SAP MAE 4.61, PE MAE 43.32 at f4a8d2a0)
- Why the slice-by-slice approach won't converge
- Scope decisions (RdSAP 10 + SAP 10.2 only; park full-SAP + PCDB)
- Section-to-code mapping
- Known dead-ends to skip
- Cert-calibration vs spec-correctness tension and how to resolve it
- The 7 golden fixtures and their compensating-error caveats
- Trace mode recommendation (ADR-0009's `intermediate` field)
- Specific §1-3 starting tasks
- Workflow recap
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review
Audience: A fresh agent picking up the deterministic SAP calculator at
packages/domain/src/domain/sap/. Read this first, then the spec PDFs,
then the code.
Goal: Match the cert software (Elmhurst / Stroma / etc.) output exactly for RdSAP 10 / SAP 10.2 input certs. This is a deterministic, mechanical calculation — not a model — so MAE should approach zero on certs whose inputs are fully populated.
1. Critical framing — this is NOT a judgement call
The SAP/RdSAP energy assessment splits cleanly into two roles:
- The assessor — a person who surveys the dwelling and lodges measured/observed fields onto the cert (areas, perimeters, construction codes, insulation thicknesses, fuel types, etc.). The assessor makes NO calculation decisions.
- The cert software (Elmhurst, Stroma, Quidos, NHER, ECMK) — a deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It takes the lodged fields and produces SAP score, CO2 emissions, primary energy (PEUI), CO2 per m², EI rating, etc.
Our calculator is replicating role #2. Where Elmhurst's implementation diverges from spec, we follow Elmhurst, but we don't guess at divergence; we localise it via reference traces or empirically against the cert corpus.
There is no "assessor judgement" knob to tune. Each field on the cert has a deterministic interpretation per the spec. Each spec table / formula has a deterministic implementation. Our job is to enumerate all of them and verify each.
2. Current state (2026-05-19)
- Branch: ara-backend-design-prd
- Last clean commit: f4a8d2a0 ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
- 301 tests passing
- Parity probe (300 random certs from data/ml_training/runs/2025_2026_n250000_v18a/data.parquet, seed=7, sap_score ∈ [5, 99]):

| Metric | Value |
|---|---|
| SAP MAE | 4.61 |
| SAP bias | +0.87 |
| PE MAE | 43.32 kWh/m² |
| PE bias | +37.69 kWh/m² |

- 7 "golden" regression certs locked in packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py. Tolerance: |SAP residual| ≤ 1, |PE residual| ≤ 10 kWh/m². Known caveat: some of these are compensating-error matches (e.g. cert 7536-3827's PE matches but cost is £143 under cert's implied cost due to multi-factor offsetting bugs).
3. Why we are pivoting to systematic review
The prior session shipped ten slices (S-B23 → S-B31) by debugging the biggest residuals one at a time:
- PE MAE dropped substantially: 57.28 → 43.32 (−14) — real progress on the demand-side calculation.
- SAP MAE barely moved: 5.34 → 4.61 (−0.73) — the cost-side is bottlenecked by cert-calibration prices that absorb multiple structural deviations from spec, making any single slice that fixes one component break the calibration for others.
Three failed slice attempts in the prior session exposed the pattern:
- Standing charges: spec note Table 12 (a) clearly says gas standing charge of £92 is added to space + water heating costs for energy ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62. Reverted before committing.
- Cat=10 room heaters off-peak routing: Table 12a clearly says "Other direct-acting electric heating" bills 100% high rate on 7-hour tariff. Empirically: switching cat=10 from off-peak to standard rate inverted the bias from +5.88 to −6.00 without improving MAE. Reverted before committing.
- Hot water cylinder loss (uncommitted): spec Table 2 footer + Table 3 footer clearly say combi boilers using Table 4b efficiency have zero storage + primary loss. Empirically: zeroing them dropped PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke 3 of 7 golden fixtures. Reverted because no way to know whether to follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without reference traces.
The pattern: the cert-calibration prices (in
domain.sap.tables.table_12_cert_calibration) were reverse-engineered
to match Elmhurst's output assuming all our other calculations are
correct. When we fix a spec-violation bug in some other component, we
break the calibration and SAP MAE goes up even though we're more
spec-correct.
This means whack-a-mole on the biggest residual won't converge. We need to systematically verify every component against the spec, then re-derive the cert-calibration once at the end.
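A toy illustration of the absorption effect may help. Every number below is invented for the example (nothing comes from the corpus, the spec, or the real calibration table); it only shows that a price fitted against a buggy engine stops matching as soon as the bug is fixed.

```python
# Toy model: cost = demand_kwh * price. All values are illustrative assumptions.
TRUE_DEMAND_KWH = 10_000.0   # demand the reference software computes
REFERENCE_PRICE = 0.0364     # GBP/kWh the reference software applies
reference_cost = TRUE_DEMAND_KWH * REFERENCE_PRICE

# Our engine carries a bug that over-predicts demand by 5%.
buggy_demand = TRUE_DEMAND_KWH * 1.05

# Calibration: choose the price that makes the buggy engine reproduce the reference cost.
calibrated_price = reference_cost / buggy_demand


def cost_error(demand_kwh: float, price: float) -> float:
    """Signed cost error against the reference, in GBP."""
    return demand_kwh * price - reference_cost


before_fix = cost_error(buggy_demand, calibrated_price)    # ~0: the bug is absorbed
after_fix = cost_error(TRUE_DEMAND_KWH, calibrated_price)  # negative: calibration is now stale
print(f"before fix: {before_fix:+.2f} GBP, after fix: {after_fix:+.2f} GBP")
```

The demand fix is correct, yet the cost residual moves away from zero. That is the same shape as the three reverted slices.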
4. Scope decisions
IN scope
- RdSAP 10 specification (10-06-2025): full document, all sections (docs/sap-spec/rdsap-10-specification-2025-06-10.pdf, 114 pages).
- SAP 10.2 full specification (14-03-2025): the worksheet, tables, and appendices that RdSAP 10 references (docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf, 199 pages).
OUT of scope (for now)
- Full SAP assessments. Full-SAP certs lodge a measured/calculated U-value in walls[i].description (e.g. "Average thermal transmittance 0.18 W/m²K"). These are a separate calculation path (BS EN ISO 6946) and a different corpus. Park them until the RdSAP 10 base case matches Elmhurst. S-B24 / S-B29 attempted partial handling; those slices can stay or be reverted at your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- PCDB (Product Characteristics Database). ADR-0009 says Session C. Heat pumps (cat=4) have catastrophic per-cert MAE because we use Table 4a fallback efficiency 2.30 instead of PCDB SCOP. There's a NoOpPcdbLookup stub seam ready in Session A; data fetch + parser is its own milestone.
- SAP 10.3 (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has identical Table 12 codes (only values shift). Don't update spec references to 10.3 until the corpus migrates.
5. The approach — section-by-section spec verification
Work through the RdSAP 10 spec in document order, starting at §1. For each section:
5.1. Read the spec section
Read the section text fully. Note every rule, table reference, and defaulting cascade.
5.2. Find the corresponding code
Map the section to the source file(s) implementing it. The current mapping (some sections are split across modules):
| RdSAP 10 section | Code location |
|---|---|
| §1 Introduction / general | n/a |
| §2 Property descriptors | datatypes/epc/domain/epc_property_data.py |
| §3 Dimensions | packages/domain/src/domain/sap/worksheet/dimensions.py |
| §4 Ventilation | packages/domain/src/domain/sap/worksheet/ventilation.py |
| §5 Construction / U-values | packages/domain/src/domain/ml/rdsap_uvalues.py + worksheet/heat_transmission.py |
| §6 Windows / doors / overshading | worksheet/solar_gains.py + rdsap/cert_to_inputs.py |
| §7 Heating systems (refers to SAP 10.2 Appendix A) | domain.ml.sap_efficiencies + rdsap/cert_to_inputs.py |
| §8 Heating controls (Table 4e) | rdsap/cert_to_inputs.py |
| §9 Heat emitters / flow temperatures | not implemented |
| §10 Space and water heating (Appendix A) | rdsap/cert_to_inputs.py |
| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in cert_to_inputs.py (PV only) |
| §12 Electricity tariff | rdsap/cert_to_inputs.py (_is_off_peak_meter, fuel routing) |
| §13 Addendum to EPCs | n/a |
| §14 Special cases (e.g. flats above commercial) | not implemented |
| §15 Improvements (recommendations) | n/a (not rating) |
| §16-19 RdSAP-specific SAP rating equations | worksheet/rating.py |
| Table 27 — Living-area fraction | rdsap/cert_to_inputs.py:_living_area_fraction |
| Table 28 — Cylinder size defaults | domain.ml.demand:_CYLINDER_VOLUME_L |
| Table 29 — Heating + HW parameters | partial in cert_to_inputs.py |
| Table 30 — Mechanical ventilation | not implemented |
| Table 31 — Data to be collected | n/a |
5.3. For each spec rule in the section, check our code
For each table, formula, footnote, exception:
- Does our code implement it?
- Does the implementation match the spec values exactly?
- Are there spec-defined edge cases / footnotes we're missing?
5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour.
- Implement the fix.
- Run all 7 golden fixtures plus the broader probe. Note both direction and magnitude of change.
- If the fix is spec-correct but breaks a golden fixture, this is evidence that the fixture was a compensating-error case — proceed with the spec-correct fix and update the fixture (with a comment noting it was a compensating case).
- Commit per-slice as before: one section → one commit. Reference the spec section in the commit message.
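As a sketch of the first step above, a spec-anchored unit test in Arrange/Act/Assert form might look like the following. The function name `cylinder_storage_loss_kwh` and its placeholder arithmetic are hypothetical stand-ins, not the real code under `domain.ml.demand`; only the combi zero-loss rule itself is taken from this document.

```python
def cylinder_storage_loss_kwh(is_combi: bool, cylinder_volume_l: float) -> float:
    """Hypothetical stand-in for the production function under test."""
    if is_combi:
        return 0.0  # the spec-correct branch the fix would add
    return 1.5 * cylinder_volume_l  # placeholder arithmetic, NOT a spec value


def test_combi_boiler_has_zero_storage_loss() -> None:
    # Arrange: a combi boiler; the volume is arbitrary and should be ignored.
    is_combi, volume_l = True, 110.0
    # Act
    loss = cylinder_storage_loss_kwh(is_combi, volume_l)
    # Assert: Table 2 + Table 3 footers, as summarised in this handover.
    assert loss == 0.0


test_combi_boiler_has_zero_storage_loss()
```

Written before the fix, the same test fails against the current behaviour, which is what makes the later commit easy to review against the spec reference in its message.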
5.5. Use trace mode when you need it
ADR-0009 specifies a SapResult.intermediate: dict[str, float] field
that was never populated. Adding this is highly recommended for the
systematic pass — each section's verification benefits from
inspecting the intermediate values. See §11 below for a sketch.
6. What's already been done — section by section
This is your starting map. Each row says whether the section has been touched and what the current state is.
Walls / construction (§5)
- S-B23 (committed 9a509e41): Table 6 "Filled cavity" row dispatch when wall_insulation_type=2 AND wall_construction=4. Spec-anchored.
- S-B24 (committed 15613309): Parse walls[i].description for "Average thermal transmittance X W/m²K". PARK: full-SAP path.
- S-B25 (committed 6b934710): Description-based dispatch for cavity "as built, insulated (assumed)" + similar (type=4 with descriptive signal). Spec-anchored via legacy epc_wall_description_map.
- S-B26 (committed 361f9154): _insulation_bucket(0, True) → 50 fix (the "NI" thickness sentinel) + description-based override of wall_ins_present for non-cavity walls. Spec footnote (Table 6).
- S-B27 (committed 1f49fa03): Floor _insulation_bucket analog: Table 19 footnote (2) "max(50, age-band default)" when description signals retrofit.
- S-B28 (committed 25261d5c): Roof NI thickness + insulated description → §5.11.4 footnote 50 mm joist row.
- S-B29 (committed 3ab09845): Floor + roof "Average thermal transmittance" parse. PARK: full-SAP path.
Still to verify in §5:
- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10): only England is fully transcribed; country overrides are partial.
- Cob U-values (§5.6): table only, no formula implementation.
- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
- Curtain wall §5.18: not implemented.
- Party wall U-values (Table 15): implemented in u_party_wall, verify table values.
- Thermal bridging (Table 21): implemented as global y factor, verify per-age-band values.
- §5.16 Thermal mass: Table 22 (only 100 / 250 kJ/m²K, dispatched by construction type with internal insulation). Currently we hardcode 250 (see cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K). This is wrong for timber-frame / cob / internally-insulated masonry (should be 100).
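A minimal sketch of what the thermal-mass dispatch could look like, based only on the 100 / 250 split described in the last bullet. The construction labels and the function name are hypothetical, and the exact Table 22 conditions need checking against the spec before anything like this lands.

```python
TMP_LOW_KJ_PER_M2_K = 100.0
TMP_HIGH_KJ_PER_M2_K = 250.0

# Hypothetical labels; the real code dispatches on lodged construction codes.
_LOW_MASS_CONSTRUCTIONS = frozenset({"timber_frame", "cob"})


def thermal_mass_parameter(construction: str, internally_insulated: bool) -> float:
    """Return 100 for low-mass or internally insulated walls, else 250 (kJ/m²K)."""
    if construction in _LOW_MASS_CONSTRUCTIONS or internally_insulated:
        return TMP_LOW_KJ_PER_M2_K
    return TMP_HIGH_KJ_PER_M2_K
```

For example, `thermal_mass_parameter("solid_brick", False)` keeps today's 250, while the same wall with internal insulation drops to 100.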
Heating systems (§§7-10, SAP Appendix A)
- S-B20 (in history): Table 11 secondary heating allocation, conditional on cert lodging secondary or being electric storage.
- Failed S-B30 (reverted): respect main_heating_fraction; shown empirically wrong. Field is multi-main allocation, not main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
- S-B31 (committed afdf297f): Table 12c DLF on heat-network main. Spec §C3.1 + Table 12c.
- Failed S-B32 (room heater off-peak routing, reverted): Table 12a says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our cert-cal extends off-peak to codes 691-696. Spec-correct fix inverted bias direction; calibration was absorbing this.
- Uncommitted HW cylinder fix: spec-correct (combi → zero storage/primary loss per Table 2 + Table 3 footers) but breaks 3 golden fixtures. Decision deferred to systematic pass.
Still to verify in heating:
- Table 4a efficiency values for every code (heat pumps, storage heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30) is documented as a known limitation.
- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of gas and liquid fuel boilers for both space and water heating is reduced by 5% if the boiler is not interlocked for space and water heating." We don't apply this. Known gap.
- Table 4c condensing-boiler / heat-pump emitter-temperature adjustment — we don't apply this.
- Table 12a high-rate fractions for off-peak dwellings — we apply 100% off-peak or 100% standard, never fractional blending.
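For the last gap, fractional blending amounts to a weighted average of the two tariff rates. The sketch below assumes that reading of Table 12a; the function name is hypothetical and the prices in the usage note are made up, not Table 12 values.

```python
def blended_price_p_per_kwh(
    high_rate_fraction: float,
    high_rate_price: float,
    low_rate_price: float,
) -> float:
    """Weighted unit price for an off-peak tariff, in p/kWh."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must lie in [0, 1]")
    return (
        high_rate_fraction * high_rate_price
        + (1.0 - high_rate_fraction) * low_rate_price
    )
```

With an invented 20p high rate and 10p low rate, a fraction of 0.25 gives 12.5p. The current all-or-nothing behaviour corresponds to fractions of exactly 0 or 1.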
Hot water (§4 SAP + Appendix J)
- Storage loss factor table (Table 2): current values in domain.ml.demand:_STORAGE_LOSS_FACTOR are ~3× off from spec (verified). Known under-prediction of cylinder loss for storage systems; cancelled by over-prediction of primary loss for combi systems in aggregate.
- Primary loss formula (Table 3): implemented as 245/60 kWh by age band. Spec is a per-month formula nₘ × 14 × [{0.0091·p + 0.0245·(1−p)}·h + 0.0263] with p (pipework insulation fraction) and h (circulation hours). Known approximation.
- Combi-boiler zero-loss rule (Table 2 + Table 3 footers): currently NOT applied (the failed uncommitted slice). Adding this drops PE MAE −6.64 but raises SAP MAE +0.39.
- Appendix J Vd formula 25N + 36: currently the simple form, not the full per-component (shower / bath / other) breakdown. Useful HW demand is ~7% under spec value.
- ΔT: currently 43°C constant (55−12). Spec uses monthly Tcold and hot at 52°C, not 55°C. Per-month variance unmodelled.
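The two formulas quoted in this list can be transcribed directly. This is a sketch of the arithmetic only: the function names are hypothetical, and the values of p and h for a given dwelling have to come from the Table 3 rules, which are not reproduced here.

```python
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh(days: int, p: float, h: float) -> float:
    """Monthly primary loss: n_m × 14 × [{0.0091·p + 0.0245·(1−p)}·h + 0.0263]."""
    return days * 14.0 * ((0.0091 * p + 0.0245 * (1.0 - p)) * h + 0.0263)


def annual_primary_loss_kwh(p: float, hours_by_month: tuple[float, ...]) -> float:
    """Sum the monthly formula over a year; hours_by_month holds h for each month."""
    if len(hours_by_month) != 12:
        raise ValueError("expected 12 monthly circulation-hour values")
    return sum(
        primary_loss_kwh(n, p, h) for n, h in zip(DAYS_IN_MONTH, hours_by_month)
    )


def daily_hot_water_litres(occupants: float) -> float:
    """Simple-form Appendix J volume, Vd = 25N + 36 (litres/day)."""
    return 25.0 * occupants + 36.0
```

Comparing `annual_primary_loss_kwh` against the flat 245/60 kWh constants for a few representative (p, h) combinations is a cheap way to size this gap before touching the production code.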
Lighting (Appendix L)
- predicted_lighting_kwh in domain.ml.demand uses the heuristic 9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share).
- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up + portable shares, monthly profile.
- For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
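The heuristic is small enough to restate as a standalone function, which makes its floor visible. The function name is hypothetical; the formula is the one quoted above. The 100 m² floor area in the usage note is an assumption chosen because it reproduces the ~465 kWh figure.

```python
def heuristic_lighting_kwh(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """Current heuristic: 9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share)."""
    return 9.3 * tfa_m2 * (1.0 - 0.5 * led_share - 0.4 * cfl_share)


all_led = heuristic_lighting_kwh(100.0, led_share=1.0, cfl_share=0.0)
no_low_energy = heuristic_lighting_kwh(100.0, led_share=0.0, cfl_share=0.0)
print(f"all LED: {all_led:.0f} kWh, no low-energy lamps: {no_low_energy:.0f} kWh")
```

Because the LED term is capped at 0.5, an all-LED dwelling can never fall below half of the no-low-energy value under this heuristic. No choice of lamp shares can reach the ~94 kWh spec result, so this needs the L1-L12 procedure and not a retuned coefficient.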
Internal gains (§5 SAP)
- worksheet/internal_gains.py implements metabolic + cooking + appliances + lighting (the four positive rows of Table 5).
- Missing: Water heating row (1000 × (65)ₘ / (nₘ × 24), i.e. HW losses recycled as heated-space gains) and Losses row (−40 × N for cold inflow + evaporation). Both documented in S-B23 gap list.
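The two missing rows are simple unit conversions. A sketch using the expressions quoted above follows; the function and parameter names are hypothetical, and `hw_heat_gain_kwh` stands for worksheet value (65)ₘ.

```python
def water_heating_gain_w(hw_heat_gain_kwh: float, days_in_month: int) -> float:
    """Table 5 water heating row: 1000 × (65)m / (nm × 24), monthly kWh to average W."""
    return 1000.0 * hw_heat_gain_kwh / (days_in_month * 24.0)


def losses_gain_w(occupants: float) -> float:
    """Table 5 losses row: −40 × N (a negative gain, in W)."""
    return -40.0 * occupants
```

As a sanity check on units, 74.4 kWh of recoverable hot-water heat over a 31-day month (744 hours) corresponds to an average of about 100 W.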
Ventilation (§4 / Table 5)
- Wind-shelter factor implemented in S-B21.
- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert rarely lodges. Spec §4.2 + Table 4g.
- Pressure-test override (worksheet lines 17-18) — not implemented.
Tariff / cost (§12 + Table 12 / 12a / 12c)
- Cert-calibration prices in domain.sap.tables.table_12_cert_calibration are an EMPIRICAL fit to Elmhurst's output. They are LOWER than the published Table 12 spec values by 4-25%. Known divergence; investigation deferred.
- Standing charges (Table 12 note (a)): NOT applied. Adding them empirically worsens MAE (calibration absorbs).
- Table 12a high-rate fractions: currently 100% off-peak for E7-eligible codes, 100% standard otherwise. No fractional blending.
- Heat network DLF (Table 12c): applied per S-B31 only to main heating + HW from main. HW-only-from-heat-network is a separate slice.
7. The cert-calibration vs spec-correctness tension
This is THE central architectural decision you have to make as you work through the spec.
Two tables of fuel prices
- domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH: SAP 10.2 spec values (3.64p gas, 16.49p standard elec).
- domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH: empirically lower values (3.48p gas, 13.19p elec) that match the cert software's output.
Two possible end states for the calculator
End state A — Spec-perfect. Use spec prices, apply every spec rule (standing charges, Table 12a fractions, combi zero-loss, etc.). The calculator output is then what a correct SAP 10.2 implementation would produce. SAP MAE against the corpus will likely worsen because Elmhurst doesn't perfectly implement spec.
End state B — Elmhurst-perfect. Use cert-cal prices and reproduce Elmhurst's deviations exactly. The calculator output matches cert SAP scores. The calculator becomes a "reverse-engineered Elmhurst clone" rather than a SAP 10.2 implementation.
The pragmatic recommendation
Aim for state A but track state B as the parity probe. Concretely:
- Verify each spec section in isolation; fix spec violations regardless of MAE impact, but commit each fix WITH a measured probe delta in the commit message.
- After the spec sweep is complete, the calculator's output is spec-correct. The corpus residual at that point is Elmhurst's deviation from spec.
- THEN re-derive the cert-calibration prices to match Elmhurst's deviation pattern. The calibration becomes a thin Elmhurst-compatibility layer on top of a spec-correct engine.
This avoids the whack-a-mole problem because state A is unambiguous: each fix is either spec-correct or not. State B is iterative on top of state A, not entangled with it.
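One way to keep the two layers separate is to pass the price table into the cost step so the engine never hard-codes either one. The sketch below uses the four prices quoted earlier in this section; the dictionary keys, the function name, and the demand figures are hypothetical.

```python
from typing import Mapping

# Prices in p/kWh as quoted in this section; the fuel keys are illustrative.
SPEC_PRICES = {"mains_gas": 3.64, "electricity_standard": 16.49}
CERT_CAL_PRICES = {"mains_gas": 3.48, "electricity_standard": 13.19}


def annual_fuel_cost_gbp(
    delivered_kwh_by_fuel: Mapping[str, float],
    prices_p_per_kwh: Mapping[str, float],
) -> float:
    """Cost in GBP for one dwelling under the supplied price table."""
    pence = sum(
        kwh * prices_p_per_kwh[fuel] for fuel, kwh in delivered_kwh_by_fuel.items()
    )
    return pence / 100.0


demand = {"mains_gas": 10_000.0, "electricity_standard": 2_000.0}  # invented demand
spec_cost = annual_fuel_cost_gbp(demand, SPEC_PRICES)          # state A
parity_cost = annual_fuel_cost_gbp(demand, CERT_CAL_PRICES)    # state B probe
```

With this shape, every spec fix changes only the demand side, and re-deriving the calibration at the end means swapping one table.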
8. Don't repeat — known dead-ends
- ❌ Switching "NI" wall thickness to None alone (S-B5 in history): over-corrected because it routed to the (Unfilled cavity, 50mm) row instead of the dedicated Filled cavity row. The right fix landed in S-B23 with a WALL_INSULATION_FILLED_CAVITY dispatcher.
- ❌ Aggressive efficiency rescue for missing sap_main_heating_code (S-B5): over-corrected. The category fallback (cat=4 → 2.30) is intentionally conservative; PCDB is needed for real efficiency.
- ❌ Using SAP 10.2 spec prices for parity validation: the cert software uses lower prices despite reporting sap_version=10.2 (S-B9, S-B10). Use cert_calibration_prices() for the probe.
- ❌ Always applying 10% secondary heating: must be conditional on cert lodging or main system being electric storage (S-B20). See spec Appendix A.4.
- ❌ Respecting main_heating_fraction for secondary allocation (failed S-B30): the field is the multi-main allocation (system 1 vs system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ❌ Switching cat=10 room heaters off off-peak (failed S-B32): spec-correct per Table 12a but inverts bias direction. Cert-cal calibration absorbs the deviation.
- ❌ Adding gas standing charges (4-mode probe, unimplemented): spec-correct per Table 12 note (a) but pushes SAP bias from +0.98 to −2.62. Cert-cal calibration absorbs.
- ❌ Zeroing storage + primary loss for combi boilers (uncommitted S-B32): spec-correct per Table 2 + Table 3 footers and drops PE MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden fixtures. Decision deferred to systematic pass.
9. The cert corpus and parity probe
Sample
data/ml_training/runs/2025_2026_n250000_v18a/data.parquet is the
250k-cert parquet. The probe filters to sap_score ∈ [5, 99] and
samples 300 at seed=7 by default. Filtering rationale:
- < 5 is heritage/anomaly stock (sub-3% of corpus)
- > 99 is full-SAP new-builds the parquet excludes anyway
Run the probe
python -c "
import sys
sys.path.insert(0, 'packages/domain/src')
sys.path.insert(0, '.')
sys.path.insert(0, 'services/ml_training_data/src')
from ml_training_data.sap_parity_probe import main
main(['300','7'])
"
What the probe shows
- Aggregate SAP MAE / RMSE / bias
- Aggregate PE MAE / RMSE / bias
- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
- Stratification by main_heating_category, construction_age_band, dwelling_type
- Worst-15 residuals (SAP and PE)
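For reference, the three aggregate statistics can be computed as below. This is a stdlib-only sketch, not the probe's actual code; it assumes the residual is defined as predicted minus lodged, so a positive bias means the calculator scores higher than the cert.

```python
import math
from typing import Sequence


def residual_stats(predicted: Sequence[float], lodged: Sequence[float]) -> dict[str, float]:
    """MAE, RMSE and signed bias of (predicted - lodged)."""
    if len(predicted) != len(lodged) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [p - c for p, c in zip(predicted, lodged)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "bias": sum(residuals) / n,
    }
```

Reading MAE and bias together matters here: the reverted room-heater slice flipped the bias from +5.88 to −6.00 with no MAE improvement, which MAE alone would have hidden.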
Known parquet limitations
- ~0.7% of parquet certs have construction_age_band=None vs 15% in the raw bulk-zip. The parquet filters out full-SAP new-builds upstream. Don't measure full-SAP-path slices against the parquet.
- Heat-pump certs (cat=4) are under-represented and concentrated in the worst-residual tail because PCDB efficiency is unavailable.
10. The 7 golden fixtures
packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py
locks 7 corpus certs as regression anchors:
| Cert | TFA (m²) | Cat | Notes |
|---|---|---|---|
| 0240-0200-5706-2365-8010 | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
| 0300-2747-7640-2526-2135 | 526 | 2 | Semi-detached, age D, gas PCDB |
| 0390-2954-3640-2196-4175 | 360 | 2 | Detached, age F, oil PCDB |
| 6035-7729-2309-0879-2296 | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
| 7536-3827-0600-0600-0276 | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
| 8135-1728-8500-0511-3296 | 102 | 2 | Semi-detached, age C, gas PCDB |
| 9390-2722-3520-2105-8715 | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
Tolerance: |SAP residual| ≤ 1, |PE residual| ≤ 10. Tighten as
the spec sweep progresses.
The cert JSONs are stored under fixtures/golden/<cert>.json —
frozen at extraction time so the test is reproducible without
bulk-zip access. The probe extraction script for new fixtures is
inlined in the test history (see commit f4a8d2a0).
Important caveat: some of these 7 are compensating-error matches (see §3). When a spec-correct slice breaks one, the fixture is probably the compensating case — investigate before reverting.
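The tolerance rule reduces to a two-sided check on both residuals. A sketch follows, with hypothetical names and the two thresholds taken from this section; keeping them as named constants makes the later tightening a one-line change.

```python
SAP_TOLERANCE = 1.0             # |SAP residual| ≤ 1
PE_TOLERANCE_KWH_PER_M2 = 10.0  # |PE residual| ≤ 10 kWh/m²


def within_golden_tolerance(
    sap_predicted: float,
    sap_lodged: float,
    pe_predicted: float,
    pe_lodged: float,
) -> bool:
    """True when both residuals sit inside the golden-fixture tolerance."""
    sap_ok = abs(sap_predicted - sap_lodged) <= SAP_TOLERANCE
    pe_ok = abs(pe_predicted - pe_lodged) <= PE_TOLERANCE_KWH_PER_M2
    return sap_ok and pe_ok
```

A fixture that passes this check can still be a compensating-error match, because the check looks only at the two headline outputs and not at the intermediate values behind them.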
11. Trace mode (recommended infrastructure)
ADR-0009 proposed:
from dataclasses import dataclass

@dataclass(frozen=True)
class SapResult:
    sap_score: float
    ...
    intermediate: dict[str, float]
The intermediate field was never populated. Suggested implementation
for the systematic pass:
intermediate = {
    # §1 dimensions
    "tfa_m2": tfa,
    "volume_m3": volume,
    "storey_count": storeys,
    # §3 heat transmission
    "walls_w_per_k": ht.walls_w_per_k,
    "roof_w_per_k": ht.roof_w_per_k,
    "floor_w_per_k": ht.floor_w_per_k,
    "party_walls_w_per_k": ht.party_walls_w_per_k,
    "windows_w_per_k": ht.windows_w_per_k,
    "doors_w_per_k": ht.doors_w_per_k,
    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
    "infiltration_ach": infiltration,
    "infiltration_w_per_k": infiltration * volume * 0.33,
    "heat_transfer_coefficient_w_per_k": hlc,
    "heat_loss_parameter_w_per_m2k": hlp,
    "time_constant_h": tau_h,
    # §5 internal gains (annual averages)
    "internal_gains_annual_avg_w": ...,
    # §7 mean internal temperature (annual avg)
    "mean_internal_temp_annual_avg_c": ...,
    # §9 space heating
    "useful_space_heating_kwh_per_yr": space_heating_kwh,
    # §12 fuel costs (per end-use)
    "main_heating_cost_gbp": ...,
    "hot_water_cost_gbp": ...,
    "lighting_cost_gbp": ...,
    "pumps_fans_cost_gbp": ...,
    # §13 rating
    "ecf": ecf,
    "deflator": 0.36,
    # §14 primary energy and CO2 per end-use
    "space_heating_pe_kwh_per_m2": ...,
    "hot_water_pe_kwh_per_m2": ...,
    ...
}
Once populated, the differential debugging the reviewer recommended becomes possible: change one input field, compare deltas against an Elmhurst export.
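A sketch of the comparison step, assuming two traces are available as plain `dict[str, float]` values (one baseline run and one run with a single input changed). The helper name is hypothetical and the trace values in the usage note are invented.

```python
def trace_delta(
    baseline: dict[str, float],
    variant: dict[str, float],
    tol: float = 1e-9,
) -> dict[str, float]:
    """Return variant - baseline for every shared key that moved by more than tol."""
    shared = sorted(baseline.keys() & variant.keys())
    return {
        key: variant[key] - baseline[key]
        for key in shared
        if abs(variant[key] - baseline[key]) > tol
    }


# Invented example: only the wall heat loss and the coefficient built on it move.
before = {"walls_w_per_k": 80.0, "roof_w_per_k": 20.0, "heat_transfer_coefficient_w_per_k": 250.0}
after = {"walls_w_per_k": 60.0, "roof_w_per_k": 20.0, "heat_transfer_coefficient_w_per_k": 230.0}
print(trace_delta(before, after))
```

Keys that move unexpectedly after a one-field change point straight at the component that needs checking against the spec.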
12. Specific §1-3 starting tasks (suggested first session)
A concrete pickup point:
Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
- §1 is prose; nothing to verify.
- §2 maps to EpcPropertyData. Verify that every field RdSAP §2 enumerates is present and correctly typed on the domain object. Specifically check: dwelling_type, built_form, property_type, construction_age_band, country_code. Note that construction_age_band is per-building-part, not dwelling-level, and the primary age band drives most defaults.
- §3 maps to worksheet/dimensions.py. Verify:
  - Total floor area sum across building parts equals TFA
  - Volume calculation: per storey, area × height
  - Storey count handling for extensions and room-in-roof
  - Multi-storey heat-loss-perimeter rules
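The first two geometry checks can be expressed as a small reference calculation to compare against `worksheet/dimensions.py`. The `StoreyDims` type and both function names are hypothetical; room-in-roof and perimeter rules are deliberately left out because they need the spec text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StoreyDims:
    """Hypothetical per-storey record for one building part."""
    floor_area_m2: float
    height_m: float


def total_floor_area_m2(storeys: Iterable[StoreyDims]) -> float:
    """TFA as the sum of storey floor areas across all building parts."""
    return sum(s.floor_area_m2 for s in storeys)


def volume_m3(storeys: Iterable[StoreyDims]) -> float:
    """Dwelling volume as the sum of area × height per storey."""
    return sum(s.floor_area_m2 * s.height_m for s in storeys)


example = [StoreyDims(50.0, 2.4), StoreyDims(45.0, 2.5)]
```

Running a reference like this over the golden fixtures and comparing it with the lodged TFA is a quick way to find building parts that are dropped or double-counted.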
This single session should produce zero behaviour changes if §1-3 are correctly implemented, but expect to find at least one issue in §3 geometry (per the reviewer's "biggest SAP error sources" list).
Run the golden fixtures + probe at the end of each session; expect no movement until you start hitting actual gaps.
13. Workflow recap
For each section, in order:
- Read the spec section text + cited tables.
- Identify code location(s).
- For each rule / table / footnote:
- Does our code implement it?
- Does the implementation match?
- Edge cases / fallback paths handled?
- For each gap: AAA unit test → minimal implementation → commit.
- After each commit: run golden fixtures (pytest test_golden_fixtures.py) and the parity probe. Note both deltas in the commit message.
- If a golden fixture breaks: investigate. Either fixture was a compensating case (acceptable to break) or the new code is wrong (revert).
Stick to this. The prior session's mistake was jumping between sections based on residual-size. Don't.
14. Useful references
- ADR-0009 (docs/adr/0009-deterministic-sap-calculator.md): decision rationale + Session A/B/C plan.
- Spec coverage map (docs/sap-spec/SPEC_COVERAGE.md): pre-existing coverage tracker. Update as you go.
- Parity findings (docs/sap-spec/PARITY_FINDINGS.md): empirical findings from prior sessions.
- Earlier handover (docs/sap-spec/HANDOVER_FRESH_REVIEW.md): orientation from the previous fresh-context pass.
- Reviewer feedback (informal): ChatGPT critique of the slice-by-slice approach. Key recommendations: two-layer architecture (RdSAP expansion → SAP worksheet), trace mode, golden-master methodology, differential debugging, reference traces from Elmhurst/Stroma/Quidos.
- Commit log: git log --oneline shows the slice history; each S-Bxx commit message documents the spec ref + measured impact.
15. Final note
The prior session demonstrated that moving SAP MAE down requires either spec-correctness OR Elmhurst-perfect calibration, not both simultaneously. The cert-cal layer absorbs Elmhurst's spec deviations; any spec-correct fix risks breaking it.
The systematic pass clears this by separating the layers:
- Build the spec-correct engine first.
- Re-fit the cert-cal compatibility layer once at the end.
Don't be discouraged when SAP MAE rises temporarily during the spec sweep. PE residual is the truer signal of engine correctness. SAP MAE convergence will follow once cert-cal is re-derived against the clean engine.
Welcome to the project. Read the spec, follow the order, commit one section at a time. The deterministic answer is in there.