mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
docs: handover for systematic section-by-section RdSAP 10 review
The slice-by-slice "fix the biggest residual" approach has hit a
ceiling at SAP MAE ~4.6 because the cert-calibration prices absorb
multiple structural deviations from spec. Any spec-correct fix in one
component breaks the calibration for others. Three failed slices this
session (standing charges, cat=10 routing, combi zero-loss) made the
pattern unambiguous.
Pivot: systematic section-by-section spec verification. Read the
RdSAP 10 + SAP 10.2 spec in order, check each table / formula /
footnote against the corresponding code, fix gaps one at a time.
Build the spec-correct engine first; re-derive cert-cal calibration
once at the end as a thin Elmhurst-compatibility layer.
Handover doc covers:
- Critical framing (deterministic, not assessor judgement)
- Current state (SAP MAE 4.61, PE MAE 43.32 at f4a8d2a0)
- Why the slice-by-slice approach won't converge
- Scope decisions (RdSAP 10 + SAP 10.2 only; park full-SAP + PCDB)
- Section-to-code mapping
- Known dead-ends to skip
- Cert-calibration vs spec-correctness tension and how to resolve it
- The 7 golden fixtures and their compensating-error caveats
- Trace mode recommendation (ADR-0009's `intermediate` field)
- Specific §1-3 starting tasks
- Workflow recap
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
f4a8d2a017
commit
3363f63f5e
1 changed files with 615 additions and 0 deletions
615
docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md
Normal file
@ -0,0 +1,615 @@
# Handover: Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review

**Audience:** A fresh agent picking up the deterministic SAP calculator at
`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs,
then the code.

**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly
for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical
calculation**, not a model, so MAE should approach zero on certs whose
inputs are fully populated.

---

## 1. Critical framing: this is NOT a judgement call

The SAP/RdSAP energy assessment splits cleanly into two roles:

1. **The assessor**: a person who surveys the dwelling and lodges
   measured/observed fields onto the cert (areas, perimeters,
   construction codes, insulation thicknesses, fuel types, etc.).
   The assessor makes NO calculation decisions.
2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK): a
   deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It
   takes the lodged fields and produces SAP score, CO2 emissions,
   primary energy (PEUI), CO2 per m², EI rating, etc.

**Our calculator is replicating role #2.** Where Elmhurst's
implementation diverges from spec, we follow Elmhurst, but we don't
guess at divergence; we localise it via reference traces or
empirically against the cert corpus.

There is no "assessor judgement" knob to tune. Each field on the cert
has a deterministic interpretation per the spec. Each spec table /
formula has a deterministic implementation. Our job is to enumerate
all of them and verify each.

---

## 2. Current state (2026-05-19)

- Branch: `ara-backend-design-prd`
- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
- 301 tests passing
- Parity probe (300 random certs from
  `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7,
  `sap_score ∈ [5, 99]`):

  | Metric | Value |
  |---|---|
  | SAP MAE | 4.61 |
  | SAP bias | +0.87 |
  | PE MAE | 43.32 kWh/m² |
  | PE bias | +37.69 kWh/m² |

- 7 "golden" regression certs locked in
  `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`.
  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
  caveat: some of these are compensating-error matches (e.g. cert
  `7536-3827`'s PE matches but cost is £143 under cert's implied cost
  due to multi-factor offsetting bugs).

---

## 3. Why we are pivoting to systematic review

The prior session shipped ten slices (S-B23 → S-B31) by debugging the
biggest residuals one at a time:

- **PE MAE dropped substantially: 57.28 → 43.32 (−14).** This is real
  progress on the demand-side calculation.
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73).** The cost side is
  bottlenecked by cert-calibration prices that absorb multiple
  structural deviations from spec, so any single slice that fixes
  one component breaks the calibration for others.

Three failed slice attempts in the prior session exposed the pattern:

- **Standing charges**: Table 12 note (a) clearly says the gas standing
  charge of £92 is added to space + water heating costs for energy
  ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62.
  Reverted before committing.
- **Cat=10 room heaters off-peak routing**: Table 12a clearly says
  "Other direct-acting electric heating" bills 100% high rate on
  7-hour tariff. Empirically: switching cat=10 from off-peak to
  standard rate inverted the bias from +5.88 to −6.00 without
  improving MAE. Reverted before committing.
- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer +
  Table 3 footer clearly say combi boilers using Table 4b efficiency
  have zero storage + primary loss. Empirically: zeroing them dropped
  PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke
  3 of 7 golden fixtures. Reverted because there is no way to know
  whether to follow spec (PE-correct) or Elmhurst (SAP-MAE-correct)
  without reference traces.

The pattern: **the cert-calibration prices** (in
`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
to match Elmhurst's output assuming all our other calculations are
correct.** When we fix a spec-violation bug in some other component, we
break the calibration and SAP MAE goes up even though we're more
spec-correct.

This means **whack-a-mole on the biggest residual won't converge**. We
need to systematically verify every component against the spec, then
re-derive the cert-calibration once at the end.
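
The absorption effect can be reproduced on a toy corpus. The sketch below is illustrative only: the dwellings, the 25% demand over-prediction, and the single-price least-bias calibration are all assumptions made up for the example; only the 3.64 p/kWh gas price and the £92 standing charge come from this document.

```python
# Toy illustration only: every number except the 3.64 p/kWh gas price and
# the GBP 92 standing charge quoted in this document is made up.
SPEC_GAS_PRICE_P_PER_KWH = 3.64
GAS_STANDING_CHARGE_GBP = 92.0

true_kwh = [8000.0, 12000.0, 16000.0]  # hypothetical dwellings
reference_cost = [
    k * SPEC_GAS_PRICE_P_PER_KWH / 100 + GAS_STANDING_CHARGE_GBP
    for k in true_kwh
]

# Hypothetical engine bugs: demand is over-predicted by 25% and the
# standing charge is missing.
engine_kwh = [k * 1.25 for k in true_kwh]

# "Calibrate" one unit price so aggregate engine cost matches the reference.
calibrated_price = 100 * sum(reference_cost) / sum(engine_kwh)


def mean_bias(predicted, reference):
    """Mean signed error, predicted minus reference."""
    return sum(p - r for p, r in zip(predicted, reference)) / len(reference)


cost_before = [k * calibrated_price / 100 for k in engine_kwh]
# Spec-correct fix applied on top of the old calibration.
cost_after = [c + GAS_STANDING_CHARGE_GBP for c in cost_before]

bias_before = mean_bias(cost_before, reference_cost)
bias_after = mean_bias(cost_after, reference_cost)
print(f"calibrated price: {calibrated_price:.2f} p/kWh")
print(f"cost bias before fix: {bias_before:+.2f}, after fix: {bias_after:+.2f}")
```

The calibrated price lands below the spec price because it has silently absorbed both bugs, so fixing only one of them moves the aggregate bias away from zero.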

---

## 4. Scope decisions

### IN scope
- **RdSAP 10 specification (10-06-2025)**: full document, all sections
  (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
- **SAP 10.2 full specification (14-03-2025)**: the worksheet, tables,
  appendices that RdSAP 10 references
  (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).

### OUT of scope (for now)
- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
  U-value in `walls[i].description` (e.g.
  "Average thermal transmittance 0.18 W/m²K"). These are a separate
  calculation path (BS EN ISO 6946) and a different corpus. **Park them
  until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
  attempted partial handling; those slices can stay or be reverted at
  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- **PCDB (Product Characteristics Database).** ADR-0009 says Session C.
  Heat pumps (cat=4) have catastrophic per-cert MAE because we use
  Table 4a fallback efficiency 2.30 instead of PCDB SCOP. There's a
  `NoOpPcdbLookup` stub seam ready in Session A; data fetch + parser
  is its own milestone.
- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
  identical Table 12 codes (only values shift). Don't update spec
  references to 10.3 until the corpus migrates.

---

## 5. The approach: section-by-section spec verification

Work through the RdSAP 10 spec **in document order**, starting at
§1. For each section:

### 5.1. Read the spec section
Read the section text fully. Note every rule, table reference, and
defaulting cascade.

### 5.2. Find the corresponding code
Map the section to the source file(s) implementing it. The current
mapping (some sections are split across modules):

| RdSAP 10 section | Code location |
|---|---|
| §1 Introduction / general | n/a |
| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` |
| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` |
| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` |
| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` |
| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` |
| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` |
| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` |
| §9 Heat emitters / flow temperatures | not implemented |
| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` |
| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) |
| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) |
| §13 Addendum to EPCs | n/a |
| §14 Special cases (e.g. flats above commercial) | not implemented |
| §15 Improvements (recommendations) | n/a (not rating) |
| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` |
| Table 27: Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` |
| Table 28: Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` |
| Table 29: Heating + HW parameters | partial in `cert_to_inputs.py` |
| Table 30: Mechanical ventilation | not implemented |
| Table 31: Data to be collected | n/a |

### 5.3. For each spec rule in the section, check our code
For each table, formula, footnote, exception:

1. Does our code implement it?
2. Does the implementation match the spec values exactly?
3. Are there spec-defined edge cases / footnotes we're missing?

### 5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour.
- Implement the fix.
- Run **all 7 golden fixtures** plus the broader probe. Note both
  direction and magnitude of change.
- If the fix is spec-correct but breaks a golden fixture, this is
  evidence that the fixture was a compensating-error case. Proceed
  with the spec-correct fix and update the fixture (with a comment
  noting it was a compensating case).
- Commit per-slice as before: one section → one commit. Reference the
  spec section in the commit message.

### 5.5. Use trace mode when you need it
ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
that was never populated. Adding this is highly recommended for the
systematic pass: each section's verification benefits from
inspecting the intermediate values. See §11 below for a sketch.

---

## 6. What's already been done, section by section

This is your starting map. Each row says whether the section has been
touched and what the current state is.

### Walls / construction (§5)
- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch
  when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored.
- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for
  "Average thermal transmittance X W/m²K". **PARK**: full-SAP path.
- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity
  "as built, insulated (assumed)" + similar (type=4 with descriptive
  signal). Spec-anchored via legacy `epc_wall_description_map`.
- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50`
  fix (the "NI" thickness sentinel) + description-based override of
  `wall_ins_present` for non-cavity walls. Spec footnote (Table 6).
- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog:
  Table 19 footnote (2) "max(50, age-band default)" when description
  signals retrofit.
- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated
  description → §5.11.4 footnote 50mm joist row.
- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal
  transmittance" parse. **PARK**: full-SAP path.

**Still to verify in §5**:
- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10): only
  England is fully transcribed; country overrides are partial.
- Cob U-values (§5.6): table only, no formula implementation.
- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
- Curtain wall §5.18: not implemented.
- Party wall U-values (Table 15): implemented in `u_party_wall`,
  verify table values.
- Thermal bridging (Table 21): implemented as global `y` factor,
  verify per-age-band values.
- §5.16 Thermal mass: Table 22 (only 100 / 250 kJ/m²K, dispatched
  by construction type with internal insulation). Currently we
  hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`).
  This is wrong for timber-frame / cob / internally-insulated masonry
  (should be 100).
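
As an example of the "failing test first" step applied to the thermal mass gap, here is a minimal sketch. The enum, the function name, and the dispatch rule are hypothetical; the rule simply encodes the bullet above (timber-frame, cob, or internally insulated → 100, otherwise 250) and must be checked against Table 22 before being trusted.

```python
from enum import Enum


class WallConstruction(Enum):
    """Hypothetical construction categories, not the real cert codes."""
    MASONRY = "masonry"
    TIMBER_FRAME = "timber_frame"
    COB = "cob"


LOW_THERMAL_MASS_KJ_PER_M2_K = 100.0
HIGH_THERMAL_MASS_KJ_PER_M2_K = 250.0


def thermal_mass_parameter_kj_per_m2_k(
    construction: WallConstruction, internally_insulated: bool
) -> float:
    """Dispatch as described in the bullet above; verify against Table 22."""
    lightweight = construction in (
        WallConstruction.TIMBER_FRAME,
        WallConstruction.COB,
    )
    if lightweight or internally_insulated:
        return LOW_THERMAL_MASS_KJ_PER_M2_K
    return HIGH_THERMAL_MASS_KJ_PER_M2_K


def test_timber_frame_uses_low_thermal_mass():
    # Arrange
    construction = WallConstruction.TIMBER_FRAME
    # Act
    tmp = thermal_mass_parameter_kj_per_m2_k(construction, internally_insulated=False)
    # Assert
    assert tmp == 100.0


def test_uninsulated_masonry_keeps_high_thermal_mass():
    tmp = thermal_mass_parameter_kj_per_m2_k(
        WallConstruction.MASONRY, internally_insulated=False
    )
    assert tmp == 250.0
```

Against the current hardcoded 250, the first test would fail, which is the intended starting point for the slice.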

### Heating systems (§§7-10, SAP Appendix A)
- **S-B20 (in history)**: Table 11 secondary heating allocation,
  conditional on cert lodging secondary or being electric storage.
- **Failed S-B30 (reverted)**: respect `main_heating_fraction`,
  shown empirically wrong. Field is multi-main allocation, not
  main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main.
  Spec §C3.1 + Table 12c.
- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a
  says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our
  cert-cal extends off-peak to codes 691-696. Spec-correct fix
  inverted bias direction; calibration was absorbing this.
- **Uncommitted HW cylinder fix**: spec-correct (combi → zero
  storage/primary loss per Table 2 + Table 3 footers) but breaks 3
  golden fixtures. Decision deferred to systematic pass.

**Still to verify in heating**:
- Table 4a efficiency values for every code (heat pumps, storage
  heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30)
  is documented as a known limitation.
- Boiler interlock penalty (−5%), spec §9.2.1: "The efficiency of
  gas and liquid fuel boilers for both space and water heating is
  reduced by 5% if the boiler is not interlocked for space and water
  heating." We don't apply this. Known gap.
- Table 4c condensing-boiler / heat-pump emitter-temperature
  adjustment: we don't apply this.
- Table 12a high-rate fractions for off-peak dwellings: we apply
  100% off-peak or 100% standard, never fractional blending.

### Hot water (§4 SAP + Appendix J)
- Storage loss factor table (Table 2): current values in
  `domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec
  (verified). Known under-prediction of cylinder loss for storage
  systems; cancelled by over-prediction of primary loss for combi
  systems in aggregate.
- Primary loss formula (Table 3): implemented as 245/60 kWh by age
  band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]`
  with `p` (pipework insulation fraction) and `h` (circulation hours).
  Known approximation.
- Combi-boiler zero-loss rule (Table 2 + Table 3 footers): currently
  NOT applied (the failed uncommitted slice). Adding this drops PE
  MAE −6.64 but raises SAP MAE +0.39.
- Appendix J Vd formula `25N + 36`: currently the simple form, not
  the full per-component (shower / bath / other) breakdown. Useful
  HW demand is ~7% under spec value.
- ΔT: currently 43°C constant (55−12). Spec uses monthly Tcold and
  hot at 52°C, not 55°C. Per-month variance unmodelled.
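
The per-month primary loss formula quoted above can be sketched directly. The function and parameter names are hypothetical, and the values of `p` and `h` to use for a given system and month are left as inputs here; take them from the Table 3 notes rather than from this example.

```python
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def monthly_primary_loss_kwh(
    days_in_month: int,
    pipework_insulation_fraction: float,
    circulation_hours_per_day: float,
) -> float:
    """n_m x 14 x [{0.0091 p + 0.0245 (1 - p)} h + 0.0263], as quoted above."""
    p = pipework_insulation_fraction
    h = circulation_hours_per_day
    return days_in_month * 14 * ((0.0091 * p + 0.0245 * (1 - p)) * h + 0.0263)


def annual_primary_loss_kwh(
    pipework_insulation_fraction: float,
    circulation_hours_by_month: list[float],
) -> float:
    """Sum the monthly formula over a 12-entry list of circulation hours."""
    if len(circulation_hours_by_month) != 12:
        raise ValueError("expected one circulation-hours value per month")
    return sum(
        monthly_primary_loss_kwh(n, pipework_insulation_fraction, h)
        for n, h in zip(DAYS_IN_MONTH, circulation_hours_by_month)
    )


# Illustrative inputs only (fully insulated pipework, 3 h/day all year).
january = monthly_primary_loss_kwh(31, 1.0, 3.0)
annual = annual_primary_loss_kwh(1.0, [3.0] * 12)
```

Keeping `p` and `h` as explicit arguments makes it easy to unit-test the formula independently of whatever cert-to-input defaulting supplies them.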

### Lighting (Appendix L)
- `predicted_lighting_kwh` in `domain.ml.demand` uses
  `9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share)` heuristic.
- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up
  + portable shares, monthly profile.
- For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec
  gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
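
The heuristic is simple enough to reproduce in a few lines. This is a standalone sketch of the formula as quoted above, not the actual `predicted_lighting_kwh` source; the function name and the 100 m² floor area are assumptions (100 m² is chosen because it reproduces the ~465 kWh figure for an all-LED home).

```python
def lighting_kwh_heuristic(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """9.3 x TFA x (1 - 0.5 led_share - 0.4 cfl_share), as quoted above."""
    return 9.3 * tfa_m2 * (1 - 0.5 * led_share - 0.4 * cfl_share)


baseline_kwh = lighting_kwh_heuristic(100.0, led_share=0.0, cfl_share=0.0)
all_led_kwh = lighting_kwh_heuristic(100.0, led_share=1.0, cfl_share=0.0)
print(f"no low-energy lamps: {baseline_kwh:.0f} kWh, all LED: {all_led_kwh:.0f} kWh")
```

The structural limit is visible here: an all-LED home can never fall below half of the baseline under this heuristic, whereas the spec figure quoted above is roughly a fifth of the heuristic's all-LED result.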

### Internal gains (§5 SAP)
- `worksheet/internal_gains.py` implements metabolic + cooking +
  appliances + lighting (the four positive rows of Table 5).
- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)`, i.e.
  HW losses recycled as heated-space gains) and Losses row (`−40 × N`
  for cold inflow + evaporation). Both documented in S-B23 gap list.
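
The two missing rows are one-liners once the inputs exist. A minimal sketch, assuming worksheet line (65)ₘ is available as monthly heat gains from water heating in kWh and `N` is the assumed occupancy; the function names are hypothetical.

```python
def water_heating_gain_w(heat_gains_from_water_heating_kwh: float, days_in_month: int) -> float:
    """Table 5 water heating row: 1000 x (65)m / (n_m x 24), in watts."""
    return 1000 * heat_gains_from_water_heating_kwh / (days_in_month * 24)


def losses_gain_w(occupancy: float) -> float:
    """Table 5 losses row: -40 x N, in watts (a negative gain)."""
    return -40 * occupancy


# Illustrative inputs only: 74.4 kWh of gains in a 31-day month, N = 2.5.
january_hw_gain_w = water_heating_gain_w(74.4, 31)
losses_w = losses_gain_w(2.5)
```

With these illustrative inputs the two rows happen to cancel, which shows why omitting both at once can hide in aggregate residuals while still being wrong per dwelling.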

### Ventilation (§4 / Table 5)
- Wind-shelter factor implemented in S-B21.
- Mechanical ventilation (MVHR, MEV, PIV): not implemented; cert
  rarely lodges. Spec §4.2 + Table 4g.
- Pressure-test override (worksheet lines 17-18): not implemented.

### Tariff / cost (§12 + Table 12 / 12a / 12c)
- Cert-calibration prices in
  `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit
  to Elmhurst's output. They are LOWER than the published Table 12
  spec values by 4-25%. Known divergence; investigation deferred.
- Standing charges (Table 12 note (a)): NOT applied. Adding them
  empirically worsens MAE (calibration absorbs).
- Table 12a high-rate fractions: currently 100% off-peak for
  E7-eligible codes, 100% standard otherwise. No fractional blending.
- Heat network DLF (Table 12c): applied per S-B31 only to main
  heating + HW from main. HW-only-from-heat-network is a separate slice.

---

## 7. The cert-calibration vs spec-correctness tension

This is THE central architectural decision you have to make as you
work through the spec.

### Two tables of fuel prices
- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH`: SAP 10.2 spec
  values (3.64p gas, 16.49p standard elec).
- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`:
  empirically lower values (3.48p gas, 13.19p elec) that match the
  cert assessor software's output.
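
The size of the gap between the two tables is easy to quantify for the two fuels quoted above. The dictionary keys below are hypothetical placeholders, not the real fuel codes used by `UNIT_PRICE_P_PER_KWH`.

```python
# Prices in p/kWh as quoted in this section; keys are illustrative only.
SPEC_PRICE_P_PER_KWH = {"mains_gas": 3.64, "standard_electricity": 16.49}
CERT_CAL_PRICE_P_PER_KWH = {"mains_gas": 3.48, "standard_electricity": 13.19}


def calibration_discount_pct(
    spec: dict[str, float], calibrated: dict[str, float]
) -> dict[str, float]:
    """Percentage by which each calibrated price sits below the spec price."""
    return {fuel: 100 * (1 - calibrated[fuel] / spec[fuel]) for fuel in spec}


discount = calibration_discount_pct(SPEC_PRICE_P_PER_KWH, CERT_CAL_PRICE_P_PER_KWH)
for fuel, pct in discount.items():
    print(f"{fuel}: {pct:.1f}% below spec")
```

Gas is only a few percent below spec while standard electricity is about a fifth lower, so the calibration is not a uniform scale factor and is unlikely to be explained by a single deflator.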

### Two possible end states for the calculator

**End state A: Spec-perfect.** Use spec prices, apply every spec rule
(standing charges, Table 12a fractions, combi zero-loss, etc.). The
calculator output is then what a *correct SAP 10.2 implementation*
would produce. SAP MAE against the corpus will likely worsen because
Elmhurst doesn't perfectly implement spec.

**End state B: Elmhurst-perfect.** Use cert-cal prices and reproduce
Elmhurst's deviations exactly. The calculator output matches cert
SAP scores. The calculator becomes a "reverse-engineered Elmhurst
clone" rather than a SAP 10.2 implementation.

### The pragmatic recommendation

**Aim for state A but track state B as the parity probe.** Concretely:

1. Verify each spec section in isolation; fix spec violations
   regardless of MAE impact, but commit each fix WITH a measured
   probe delta in the commit message.
2. After the spec sweep is complete, the calculator's output is
   spec-correct. The corpus residual at that point is Elmhurst's
   deviation from spec.
3. THEN re-derive the cert-calibration prices to match Elmhurst's
   deviation pattern. The calibration becomes a thin
   Elmhurst-compatibility layer on top of a spec-correct engine.

This avoids the whack-a-mole problem because state A is unambiguous:
each fix is either spec-correct or not. State B is iterative on top
of state A, not entangled with it.

---

## 8. Don't repeat: known dead-ends

- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history):
  over-corrected because it routed to the (Unfilled cavity, 50mm) row
  instead of the dedicated Filled cavity row. The right fix landed in
  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
  (S-B5): over-corrected. The category fallback (cat=4 → 2.30) is
  intentionally conservative; PCDB is needed for real efficiency.
- ❌ **Using SAP 10.2 spec prices for parity validation**: cert assessor
  uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
  Use `cert_calibration_prices()` for the probe.
- ❌ **Always applying 10% secondary heating**: must be conditional on
  cert lodging or main system being electric storage (S-B20). See
  spec Appendix A.4.
- ❌ **Respecting `main_heating_fraction` for secondary allocation**
  (failed S-B30): the field is the multi-main allocation (system 1 vs
  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32):
  spec-correct per Table 12a but inverts bias direction. Cert-cal
  calibration absorbs the deviation.
- ❌ **Adding gas standing charges** (4-mode probe, unimplemented):
  spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
  to −2.62. Cert-cal calibration absorbs.
- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
  S-B32): spec-correct per Table 2 + Table 3 footers and drops PE
  MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden
  fixtures. Decision deferred to systematic pass.

---

## 9. The cert corpus and parity probe

### Sample
`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
samples 300 at seed=7 by default. Filtering rationale:
- < 5 is heritage/anomaly stock (sub-3% of corpus)
- > 99 is full-SAP new-builds the parquet excludes anyway

### Run the probe
```bash
python -c "
import sys
sys.path.insert(0, 'packages/domain/src')
sys.path.insert(0, '.')
sys.path.insert(0, 'services/ml_training_data/src')
from ml_training_data.sap_parity_probe import main
main(['300','7'])
"
```

### What the probe shows
- Aggregate SAP MAE / RMSE / bias
- Aggregate PE MAE / RMSE / bias
- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
- Stratification by `main_heating_category`, `construction_age_band`,
  `dwelling_type`
- Worst-15 residuals (SAP and PE)
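
For reference, the three aggregate statistics can be computed as below. This is a standalone sketch, not the probe's own code; in particular the sign convention (calculator output minus cert value) is an assumption, so check it against `sap_parity_probe` before comparing bias signs.

```python
import math


def residual_stats(predicted: list[float], reference: list[float]) -> dict[str, float]:
    """MAE, RMSE and bias of (predicted - reference) residuals."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    return {
        "mae": sum(abs(e) for e in residuals) / n,
        "rmse": math.sqrt(sum(e * e for e in residuals) / n),
        "bias": sum(residuals) / n,
    }


# Hypothetical SAP scores for three certs (not taken from the corpus).
stats = residual_stats(predicted=[70.0, 65.0, 80.0], reference=[68.0, 66.0, 80.0])
```

A bias close to zero alongside a large MAE is the signature of offsetting errors, which is why the probe reports both.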

### Known parquet limitations
- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in
  the raw bulk-zip. The parquet filters out full-SAP new-builds
  upstream. Don't measure full-SAP-path slices against the parquet.
- Heat-pump certs (cat=4) are under-represented and concentrated in
  the worst-residual tail because PCDB efficiency is unavailable.

---

## 10. The 7 golden fixtures

`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
locks 7 corpus certs as regression anchors:

| Cert | TFA | Cat | Notes |
|---|---|---|---|
| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |

Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as
the spec sweep progresses.**
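
One way to make the tolerances easy to tighten is to keep them as named parameters of a single check. The sketch below is hypothetical and does not mirror the actual test module; the example scores are made up apart from the −0.29 kWh/m² PE residual quoted in the table.

```python
SAP_TOLERANCE = 1.0                 # |SAP residual| <= 1
PE_TOLERANCE_KWH_PER_M2 = 10.0      # |PE residual| <= 10 kWh/m2


def within_golden_tolerance(
    sap_predicted: float,
    sap_cert: float,
    pe_predicted: float,
    pe_cert: float,
    sap_tolerance: float = SAP_TOLERANCE,
    pe_tolerance: float = PE_TOLERANCE_KWH_PER_M2,
) -> bool:
    """True when both residuals fall inside the golden-fixture tolerances."""
    sap_ok = abs(sap_predicted - sap_cert) <= sap_tolerance
    pe_ok = abs(pe_predicted - pe_cert) <= pe_tolerance
    return sap_ok and pe_ok


# Hypothetical values: SAP off by 0.4, PE off by -0.29 kWh/m2.
passes_now = within_golden_tolerance(72.4, 72.0, 179.71, 180.0)
# The same cert under a tightened SAP tolerance of 0.25.
passes_tightened = within_golden_tolerance(72.4, 72.0, 179.71, 180.0, sap_tolerance=0.25)
```

Passing the tolerances as arguments lets a later slice ratchet them down per fixture without touching the comparison logic.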

The cert JSONs are stored under `fixtures/golden/<cert>.json`,
frozen at extraction time so the test is reproducible without
bulk-zip access. The probe extraction script for new fixtures is
inlined in the test history (see commit `f4a8d2a0`).

**Important caveat**: some of these 7 are compensating-error matches
(see §3). When a spec-correct slice breaks one, the fixture is
probably the compensating case; investigate before reverting.

---

## 11. Trace mode (recommended infrastructure)

ADR-0009 proposed:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SapResult:
    sap_score: float
    ...
    intermediate: dict[str, float]
```

The `intermediate` field was never populated. Suggested implementation
for the systematic pass:

```python
# Sketch only: names such as tfa, ht, hlc and ecf stand for values the
# calculator already has in scope when it builds the result.
intermediate = {
    # §1 dimensions
    "tfa_m2": tfa,
    "volume_m3": volume,
    "storey_count": storeys,
    # §3 heat transmission
    "walls_w_per_k": ht.walls_w_per_k,
    "roof_w_per_k": ht.roof_w_per_k,
    "floor_w_per_k": ht.floor_w_per_k,
    "party_walls_w_per_k": ht.party_walls_w_per_k,
    "windows_w_per_k": ht.windows_w_per_k,
    "doors_w_per_k": ht.doors_w_per_k,
    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
    "infiltration_ach": infiltration,
    "infiltration_w_per_k": infiltration * volume * 0.33,
    "heat_transfer_coefficient_w_per_k": hlc,
    "heat_loss_parameter_w_per_m2k": hlp,
    "time_constant_h": tau_h,
    # §5 internal gains (annual averages)
    "internal_gains_annual_avg_w": ...,
    # §7 mean internal temperature (annual avg)
    "mean_internal_temp_annual_avg_c": ...,
    # §9 space heating
    "useful_space_heating_kwh_per_yr": space_heating_kwh,
    # §12 fuel costs (per end-use)
    "main_heating_cost_gbp": ...,
    "hot_water_cost_gbp": ...,
    "lighting_cost_gbp": ...,
    "pumps_fans_cost_gbp": ...,
    # §13 rating
    "ecf": ecf,
    "deflator": 0.36,
    # §14 primary energy and CO2 per end-use
    "space_heating_pe_kwh_per_m2": ...,
    "hot_water_pe_kwh_per_m2": ...,
    # ...
}
```

Once populated, the differential debugging the reviewer recommended
becomes possible: change one input field, compare deltas against an
Elmhurst export.
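
A minimal sketch of that differential step, assuming two `intermediate` dicts are available (for example a baseline run and a run with one input field changed). The helper name and the sample values are hypothetical.

```python
def diff_intermediate(
    baseline: dict[str, float],
    perturbed: dict[str, float],
    min_abs_delta: float = 1e-9,
) -> list[tuple[str, float]]:
    """Return (key, perturbed - baseline) pairs, largest absolute change first."""
    deltas = {}
    for key in baseline.keys() & perturbed.keys():
        delta = perturbed[key] - baseline[key]
        if abs(delta) > min_abs_delta:
            deltas[key] = delta
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up traces: the only input change was a lower wall U-value.
baseline_trace = {"tfa_m2": 100.0, "walls_w_per_k": 80.0, "ecf": 2.10}
perturbed_trace = {"tfa_m2": 100.0, "walls_w_per_k": 65.0, "ecf": 1.95}

changes = diff_intermediate(baseline_trace, perturbed_trace)
for key, delta in changes:
    print(f"{key}: {delta:+.3f}")
```

Keys that should not react to the changed input but still appear in the output are the first place to look for a coupling bug.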

---

## 12. Specific §1-3 starting tasks (suggested first session)

A concrete pickup point:

### Session 1: §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
- §1 is prose; nothing to verify.
- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2
  enumerates is present and correctly typed on the domain object.
  Specifically check: `dwelling_type`, `built_form`, `property_type`,
  `construction_age_band`, `country_code`. Note that
  `construction_age_band` is per-building-part, not dwelling-level,
  and the primary age band drives most defaults.
- §3 maps to `worksheet/dimensions.py`. Verify:
  - Total floor area sum across building parts equals TFA
  - Volume calculation per storey × area × height
  - Storey count handling for extensions and room-in-roof
  - Multi-storey heat-loss-perimeter rules
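
The first two §3 checks amount to simple sums, which makes them a good first unit test. The sketch below uses a hypothetical `Storey` type and plain area × height per storey; any spec-defined storey-height adjustments are deliberately not modelled and need to be taken from RdSAP §3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Storey:
    """Hypothetical stand-in for one storey of one building part."""
    floor_area_m2: float
    height_m: float


def total_floor_area_m2(building_parts: list[list[Storey]]) -> float:
    """Sum floor area over every storey of every building part."""
    return sum(s.floor_area_m2 for part in building_parts for s in part)


def volume_m3(building_parts: list[list[Storey]]) -> float:
    """Sum area x height over every storey of every building part."""
    return sum(s.floor_area_m2 * s.height_m for part in building_parts for s in part)


# Made-up dwelling: two-storey main part plus a single-storey extension.
main_part = [Storey(50.0, 2.4), Storey(50.0, 2.5)]
extension = [Storey(12.0, 2.3)]

tfa = total_floor_area_m2([main_part, extension])
volume = volume_m3([main_part, extension])
```

Comparing `tfa` against the lodged total floor area on each golden fixture is a cheap way to catch building parts that are being dropped or double-counted.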

This single session should produce zero behaviour changes if §1-3 are
correctly implemented, but expect to find at least one issue in §3
geometry (per the reviewer's "biggest SAP error sources" list).

Run the golden fixtures + probe at the end of each session; expect no
movement until you start hitting actual gaps.

---

## 13. Workflow recap

For each section, in order:

1. Read the spec section text + cited tables.
2. Identify code location(s).
3. For each rule / table / footnote:
   - Does our code implement it?
   - Does the implementation match?
   - Edge cases / fallback paths handled?
4. For each gap: AAA unit test → minimal implementation → commit.
5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`)
   and the parity probe. Note both deltas in the commit message.
6. If a golden fixture breaks: investigate. Either fixture was a
   compensating case (acceptable to break) or the new code is wrong
   (revert).

Stick to this. The prior session's mistake was jumping between
sections based on residual-size. Don't.

---

## 14. Useful references

- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md`:
  decision rationale + Session A/B/C plan.
- **Spec coverage map**
  `docs/sap-spec/SPEC_COVERAGE.md`: pre-existing coverage tracker.
  Update as you go.
- **Parity findings**
  `docs/sap-spec/PARITY_FINDINGS.md`: empirical findings from prior
  sessions.
- **Earlier handover**
  `docs/sap-spec/HANDOVER_FRESH_REVIEW.md`: orientation from the
  previous fresh-context pass.
- **Reviewer feedback (informal)**: ChatGPT critique of the
  slice-by-slice approach. Key recommendations: two-layer architecture
  (RdSAP expansion → SAP worksheet), trace mode, golden-master
  methodology, differential debugging, reference traces from
  Elmhurst/Stroma/Quidos.
- **Commit log**: `git log --oneline` shows the slice history; each
  S-Bxx commit message documents the spec ref + measured impact.

---

## 15. Final note

The prior session demonstrated that **moving SAP MAE down requires
either spec-correctness OR Elmhurst-perfect calibration, not both
simultaneously**. The cert-cal layer absorbs Elmhurst's spec
deviations; any spec-correct fix risks breaking it.

The systematic pass clears this by separating the layers:
1. Build the spec-correct engine first.
2. Re-fit the cert-cal compatibility layer once at the end.

Don't be discouraged when SAP MAE rises temporarily during the spec
sweep. PE residual is the truer signal of engine correctness. SAP
MAE convergence will follow once cert-cal is re-derived against the
clean engine.

**Welcome to the project. Read the spec, follow the order, commit one
section at a time. The deterministic answer is in there.**