mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions
of a grill-with-docs session:
1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3 — no
SAP-10.3-lodged certs exist in the corpus to validate against.
2. table_12_cert_calibration is deleted (not "re-derived at the
end"). It was pre-March-2025 spec prices fit against a mixture
distribution of two spec-version regimes, with downstream-
component bugs absorbed into the fit — not Elmhurst deviation.
3. Validation Cohort: filter the corpus to inspection_date ≥
2025-07-01 so every cert in the probe was lodged on SAP 10.2
(14-03-2025) prices. One spec, one signal.
4. PCDB integration is promoted from "Session C deferred" to
prerequisite P4 — dominates residual variance on heat pumps and
the 78% of gas-boiler certs lodging main_heating_data_source=1.
5. Trace mode (SapResult.intermediate) and BRE worked-example
fixtures replace the 7 cert-based golden fixtures, which
contained compensating errors.
6. Strict-type EpcPropertyData via codes.csv-derived canonical
enums (P6) — the in-source motivation lives at
dimensions.py:74-82 (Khalim's comment, included in this commit).
7. Worksheet-faithful structure is a sweep-time principle: each
worksheet module mirrors SAP 10.2 worksheet line numbering.
CONTEXT.md additions:
- Refined "Calculated SAP10 Performance" and "SAP10 Calculation"
to reference SAP 10.2 + ADR-0010.
- New term "SAP Spec Version" — domain-meaningful because the
same EpcPropertyData yields different sap_score under different
spec revisions.
- New term "Validation Cohort" — the version-locked sub-corpus.
HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the
six prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal
was stale prices, not Elmhurst deviation), §4 scope (PCDB IN,
SAP 10.3 stays OUT), §5 approach (worksheet-faithful principle as
§5.5), §7 tension dissolved, §7b findings re-framed, §8 dead-ends
re-classified as conditional, §9 cohort filter, §10 fixture
strategy, §11 trace mode as prerequisite, §12 prereqs-first,
§13 Phase 0/Phase 1 workflow, §14 ADR-0010 reference, §15 final
note.
P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice
(probe swap to spec prices).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1092 lines
50 KiB
Markdown
# Handover — Systematic Section-by-Section RdSAP 10 / SAP 10.2 Review

**Audience:** A fresh agent picking up the deterministic SAP calculator at
`packages/domain/src/domain/sap/`. Read this first, then the spec PDFs,
then the code.

**Goal:** Match the cert software (Elmhurst / Stroma / etc.) output exactly
for RdSAP 10 / SAP 10.2 input certs. This is a **deterministic, mechanical
calculation** — not a model — so MAE should approach zero on certs whose
inputs are fully populated.

---

## 1. Critical framing — this is NOT a judgement call

The SAP/RdSAP energy assessment splits cleanly into two roles:

1. **The assessor** — a person who surveys the dwelling and lodges
   measured/observed fields onto the cert (areas, perimeters,
   construction codes, insulation thicknesses, fuel types, etc.).
   The assessor makes NO calculation decisions.
2. **The cert software** (Elmhurst, Stroma, Quidos, NHER, ECMK) — a
   deterministic implementation of the RdSAP 10 + SAP 10.2 specs. It
   takes the lodged fields and produces SAP score, CO2 emissions,
   primary energy (PEUI), CO2 per m², EI rating, etc.

**Our calculator is replicating role #2.** Assessor software
implements the SAP 10.2 spec faithfully; the question of "where does
Elmhurst diverge from spec?" is no longer the operative one (per
ADR-0010 + §3 below). Our job is to enumerate every spec
table / formula / footnote and verify each against the published SAP
10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs.

There is no "assessor judgement" knob to tune. Each field on the cert
has a deterministic interpretation per the spec. Each spec table /
formula has a deterministic implementation. Our job is to enumerate
all of them and verify each.

---

## 2. Current state (2026-05-19)

- Branch: `ara-backend-design-prd`
- Last clean commit: `f4a8d2a0` ("tests: golden-fixture regression set — 7 currently-correct corpus certs")
- 301 tests passing
- Parity probe (300 random certs from
  `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet`, seed=7,
  `sap_score ∈ [5, 99]`):

  | Metric | Value |
  |---|---|
  | SAP MAE | 4.61 |
  | SAP bias | +0.87 |
  | PE MAE | 43.32 kWh/m² |
  | PE bias | +37.69 kWh/m² |

- 7 "golden" regression certs locked in
  `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`.
  Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
  caveat: some of these are compensating-error matches (e.g. cert
  `7536-3827`'s PE matches but cost is £143 under cert's implied cost
  due to multi-factor offsetting bugs). **These fixtures are retired
  per ADR-0010 and §10 below — they lock buggy compensating outputs
  in place and will fight the spec sweep.**

> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md)
> supersedes the spec-version target, the PCDB sequencing, and the
> cert-calibration layer of ADR-0009. This handover document was
> originally written under the rejected framing; §3, §4, §7, §7b,
> §10 below have been rewritten in lockstep. §2.5 lists the six
> prerequisites that land **before** the section-by-section sweep
> starts.

---

## 2.5. Prerequisites before the sweep starts

Six blockers (P1–P6), in dependency order. The section sweep does not
start until all six are merged. Together they convert the parity probe
from a noisy mixture-distribution signal into a clean per-section
verification tool.

### P1 — Re-extract the training parquet with `inspection_date`

The 250k-cert parquet has 202 columns; **none of them are dates**.
Without `inspection_date` on each cert we cannot construct the
Validation Cohort (P3). The ETL currently drops the dates; add them
back as a non-breaking MINOR Feature Schema Version bump (per
ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date`
both exist on the domain object and are populated upstream — the
parquet writer just needs to include them.

### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12`

Per ADR-0010 §2 and §1:
- Remove `table_12_cert_calibration.py` and every call site
  (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the
  `PriceTable` constructor argument that defaults to it).
- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`.
- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors).
- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p
  is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025).

### P3 — Filter the parity probe to the Validation Cohort

`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3:
`inspection_date ≥ 2025-07-01`. Modify
`services/ml_training_data/src/ml_training_data/sap_parity_probe.py`
to apply the filter before sampling. The probe sample size and seed
remain configurable; `sap_score ∈ [5, 99]` remains the typicality
filter on top of the cohort filter.
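The filter order described in P3 can be sketched as below. This is a minimal illustration, not the probe's actual code: `CertRow` and `select_probe_sample` are hypothetical names, and the real probe reads a parquet file rather than a list of dataclasses. Only the cutoff date and the `sap_score` window come from the text above.

```python
import random
from dataclasses import dataclass
from datetime import date

# Cohort cutoff as quoted from ADR-0010 §3 in this handover.
COHORT_START = date(2025, 7, 1)


@dataclass(frozen=True)
class CertRow:
    """Hypothetical stand-in for one row of the training parquet."""
    cert_id: str
    inspection_date: date
    sap_score: int


def select_probe_sample(rows: list[CertRow], sample_size: int, seed: int) -> list[CertRow]:
    # 1. Validation Cohort filter: single spec-version window.
    cohort = [r for r in rows if r.inspection_date >= COHORT_START]
    # 2. Typicality filter, applied on top of the cohort filter.
    typical = [r for r in cohort if 5 <= r.sap_score <= 99]
    # 3. Seeded sampling, so that a probe run is reproducible.
    rng = random.Random(seed)
    return rng.sample(typical, min(sample_size, len(typical)))
```

Filtering before sampling matters: sampling first and filtering afterwards would shrink the probe by an unpredictable amount and change which certs a given seed selects.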
### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`)

Per ADR-0010 §4. Download boiler + heat-pump CSVs from
https://www.ncm-pcdb.org.uk. Build a lookup keyed on
`main_heating_index_number`. Surface seasonal efficiency, secondary
efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day
of work per the original handover estimate. The
`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from
ADR-0009 grill outcome #1 is the integration point; no calculator-side
changes needed beyond reading `index_number` and routing PCDB-returns
to space-heating / hot-water efficiency lookups instead of Table 4a.
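One possible shape for the lookup and its fallback is sketched below. Everything beyond the names `PcdbLookup` and `main_heating_index_number` is an assumption: the record fields, the `get` method, `DictPcdbLookup`, and `space_heating_efficiency` are hypothetical, and the real PCDB CSV columns are not shown in this handover.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical subset of the fields P4 asks the lookup to surface."""
    seasonal_efficiency: float
    secondary_efficiency: Optional[float]
    output_kw: Optional[float]


class PcdbLookup(Protocol):
    # Assumed method name; the real interface may differ.
    def get(self, index_number: int) -> Optional[PcdbRecord]: ...


class DictPcdbLookup:
    """In-memory lookup keyed on main_heating_index_number."""

    def __init__(self, records: dict[int, PcdbRecord]) -> None:
        self._records = dict(records)

    def get(self, index_number: int) -> Optional[PcdbRecord]:
        return self._records.get(index_number)


def space_heating_efficiency(
    index_number: Optional[int],
    pcdb: Optional[PcdbLookup],
    table_4a_default: float,
) -> float:
    # Prefer the PCDB value when the cert lodges an index number that resolves;
    # otherwise fall back to the Table 4a default.
    if index_number is not None and pcdb is not None:
        record = pcdb.get(index_number)
        if record is not None:
            return record.seasonal_efficiency
    return table_4a_default
```

Keeping the fallback in one function makes the routing decision visible in a single place, which should help when comparing per-cert intermediates before and after P4.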
### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples

Per ADR-0010 "Verification infrastructure":
- Populate every named SAP 10.2 worksheet variable on
  `SapResult.intermediate` as sketched in §11. This is mechanical —
  thread the values from each worksheet module into the dict.
- Transcribe the BRE worked examples from the SAP 10.2 appendices and
  RdSAP 10 worked-example annex into unit tests
  (`tests/test_bre_worked_examples.py`) that lock per-intermediate
values, not aggregate SAP. These replace the retired cert fixtures.
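A per-intermediate assertion could look roughly like the helper below. It is a sketch under stated assumptions: the helper name, the tolerance, and the dictionary keys are hypothetical, and the numbers in the usage comment are placeholders rather than values from any BRE worked example.

```python
import math


def assert_intermediates_match(
    actual: dict[str, float],
    expected: dict[str, float],
    rel_tol: float = 1e-4,
) -> None:
    """Compare named worksheet values one by one and report every mismatch."""
    mismatches: list[str] = []
    for name, want in expected.items():
        if name not in actual:
            mismatches.append(f"{name}: missing from intermediate")
            continue
        got = actual[name]
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=1e-9):
            mismatches.append(f"{name}: got {got}, expected {want}")
    if mismatches:
        raise AssertionError("; ".join(mismatches))


# Usage with placeholder numbers (NOT taken from the spec):
# assert_intermediates_match(
#     result.intermediate,
#     {"heat_transfer_coefficient": 123.4, "mean_internal_temperature": 18.5},
# )
```

Collecting all mismatches before failing shows which worksheet lines diverge together, which a single aggregate SAP assertion cannot do.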
### P6 — Strict-type `EpcPropertyData` via canonical domain enums

The current `EpcPropertyData` and its nested types carry many bare
`str` fields and `Union[int, str]` fields (the latter because the
gov API gives ints and Site Notes give strings). The defensive
type-handling cascades into the calculator (`cert_to_inputs.py`,
`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented
example: `SapBuildingPart.identifier` carries main-vs-extension
information but is bare `str`, so the dimensions code defensively
iterates instead of dispatching on a typed kind.

The fix:
1. **One canonical enum per field**, union of all keys appearing
   across all schema versions in
   `datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum
   classes (`built_form`, `construction_age_band`, `energy_tariff`,
   `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`,
   `mechanical_ventilation`, `property_type`, `tenure`,
   `transaction_type`, `ventilation_type`, `water_heating_fuel`,
   `cylinder_insulation_thickness`, `energy_efficiency_rating`,
   `improvement_description`, `improvement_summary`, `code`) plus
   `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is
   the reference; a dedup script can optionally verify coverage but
   is not a build dependency.
2. **The API mapper** parses raw ints into the canonical enum.
3. **The Site Notes mapper** parses raw strings into the canonical
   enum.
4. **The domain object** (`EpcPropertyData` and nested) holds only
   the canonical enums — no `Union[int, str]`, no bare `str` for
   coded fields.
5. **Every consumer** (calculator, ML pipeline, recommendations,
   ETL, scenario builder) reads from the typed fields.

**Constraint**: repo-wide tests must keep passing. The calculator
is one consumer; the ML pipeline, recommendations, and the Site
Notes ingestion path also consume `EpcPropertyData`. Each mapper-layer
change is paired with adapter updates that preserve the
behaviour the existing tests cover.
Pyright `strict` mode must remain clean (CLAUDE.md).
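The two-mapper pattern in steps 2 and 3 might look like the sketch below for a single field. The enum members and both code tables are illustrative placeholders; the authoritative keys live in `epc_codes.csv`, which is not reproduced here, and the function names are hypothetical.

```python
from enum import Enum


class BuiltForm(Enum):
    """Canonical enum for one coded field (members are illustrative)."""
    DETACHED = "detached"
    SEMI_DETACHED = "semi_detached"


# Placeholder code tables; real keys come from epc_codes.csv.
_API_CODES: dict[int, BuiltForm] = {
    1: BuiltForm.DETACHED,
    2: BuiltForm.SEMI_DETACHED,
}
_SITE_NOTES_LABELS: dict[str, BuiltForm] = {
    "detached": BuiltForm.DETACHED,
    "semi-detached": BuiltForm.SEMI_DETACHED,
}


def built_form_from_api(code: int) -> BuiltForm:
    # The API mapper sees ints only.
    try:
        return _API_CODES[code]
    except KeyError:
        raise ValueError(f"unknown built_form code: {code}") from None


def built_form_from_site_notes(label: str) -> BuiltForm:
    # The Site Notes mapper sees strings only; normalise before lookup.
    key = label.strip().lower()
    try:
        return _SITE_NOTES_LABELS[key]
    except KeyError:
        raise ValueError(f"unknown built_form label: {label!r}") from None
```

With both mappers converging on one enum, downstream code can compare `BuiltForm` members directly and never needs an `isinstance` check on the raw value.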
### Expected outcome of P1–P6

After all six land, run the probe against the Validation Cohort. The
expected baseline MAE on the clean probe is much smaller than the
current 4.61 — likely 1.5–2.5 SAP-points based on what we know about
the residual breakdown (heat pumps closed by P4, gas boilers tightened
by P4, price-version noise removed by P2+P3). The remaining residual
is the genuine spec sweep target — and per-section fixes will move
the probe in measurable, distinguishable amounts because there's no
compensating layer to mask them, and there's no defensive type
branching obscuring which input value drove which intermediate.

---

## 3. Why the prior diagnosis was wrong and how we fixed it

The prior session shipped ten slices (S-B23 → S-B31) by debugging the
biggest residuals one at a time:

- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
  on the demand-side calculation.
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time
  as "cert-calibration absorbs multiple spec deviations".

Three slice attempts looked like they "proved" the
cert-cal-absorbs-deviations diagnosis:

- **Standing charges**: spec Table 12 note (a) requires £92/yr gas
  standing charge on space + water heating. Adding it pushed SAP bias
  +0.98 → −2.62. Reverted.
- **Cat=10 room heaters off-peak routing**: Table 12a says "other
  direct-acting electric heating" bills 100 % high rate on 7-hour
  tariff. Switching cat=10 from off-peak to standard rate inverted
  the bias +5.88 → −6.00. Reverted.
- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table
  3 footers require zero storage + primary loss when efficiency comes
  from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP
  MAE +0.39 and broke 3 of 7 golden fixtures. Reverted.

The prior agent concluded: *cert-calibration absorbs Elmhurst's
deviations from spec — we can't fix one without re-deriving the
calibration, so do a full spec sweep first and re-derive cert-cal at
the end.* This diagnosis is **wrong** and the proposed remedy
amplifies the problem.

### What was actually going on

The 250k-cert corpus spans multiple SAP spec-version regimes:
- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment
  0 prices — mains gas ~3.48 p, standard electricity 13.19 p.
- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices
  — mains gas 3.64 p, standard electricity 16.49 p.

The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the
older spec's prices**, not Elmhurst deviations from the spec. They
are an empirical "best fit" across a mixture distribution of two
price regimes, with downstream-component bugs (PCDB absence, HW
cylinder loss applied to combi, etc.) absorbed into the fit. The
table looks like compensation for assessor-software quirks because we
were never told which spec each cert was on.

Each "spec-correct fix that worsened MAE" in the failed slices above
was actually correct. The MAE regressed because:
1. The cert-cal prices (pre-March-2025 spec) cancelled with one set
   of downstream errors to produce a quasi-stable cost.
2. The spec-correct fix landed → that cancellation broke → the
   probe MAE went up.
3. But the spec-correct fix was *right* — what regressed was a
   compensating-error equilibrium, not the calculator's truth.

The prior session's "re-derive cert-cal at the end" plan would
re-establish a new compensating-error equilibrium across the new bug
set. It does not converge on spec-correctness.

### The fix (per ADR-0010)

1. **Stop fitting against a mixture distribution.** Filter the
   validation corpus to a single spec-version window (Validation
   Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort
   was lodged on SAP 10.2 (14-03-2025) prices.
2. **Delete the cert-calibration layer.** Use spec prices everywhere
   (`domain.sap.tables.table_12`). The only price-routing decision
   left is Table 12a fractional high-rate blending — a real spec
   feature, not a calibration.
3. **Build PCDB**, because it dominates residual variance and the
   reason it was deferred (cert-cal-absorbs-PCDB) no longer holds.
4. **Build trace mode and BRE worked-example fixtures**, so
   per-section verification works against single-cert intermediates
   instead of aggregate corpus MAE.

§2.5 turns these steps into the six prerequisites P1–P6 (P1 supplies
the dates the cohort filter needs; P6 adds strict typing). Once they land,
the section-by-section spec sweep produces clean, monotonic
improvements.

---

## 4. Scope decisions (per ADR-0010)

### IN scope
- **SAP 10.2 (14-03-2025 amendment)** is the active spec target.
  `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages.
- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that
  cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`,
  114 pages.
- **PCDB integration.** Moved from "Session C deferred" to **P4
  prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs
  lodging `main_heating_data_source=1` need PCDB-sourced efficiency
  for the calculator to be spec-correct. Data source:
  https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`;
  fields: seasonal efficiency, secondary efficiency, output kW,
  flow-temperature curve (HPs).
- **All RdSAP 10 sections in document order.** §1 → §19, plus
  Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is
  unchanged — only the precondition changes: the sweep runs against a
  clean probe (Validation Cohort + spec prices + PCDB + trace mode).

### OUT of scope
- **Full SAP assessments.** Full-SAP certs lodge measured/calculated
  U-values in `walls[i].description` (e.g.
  "Average thermal transmittance 0.18 W/m²K"). These are a separate
  calculation path (BS EN ISO 6946) and a different corpus. Park
  until the RdSAP 10 base case parity is reached. S-B24 / S-B29
  attempted partial handling; those slices can stay or be reverted at
  your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus,
  so it cannot be validated. Calculator targets SAP 10.2 until the
  corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP
  to reference SAP 10.3). Note: `table_12.py` currently mixes SAP
  10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2.
- **Historical-spec cert reproduction.** Calculating what cert SAP
  *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is
  not the calculator's job. Lodged Performance carries the historical
  value; Calculated SAP10 Performance is current-spec only. The
  Validation Cohort filter operationalises this — older certs are
  out of the validation loop, not because they're "wrong" but because
  they're a different spec's output.
- **Re-deriving cert-cal at the end.** The prior session's plan. The
  cert-calibration layer is deleted in P2, not re-fit.

---

## 5. The approach — section-by-section spec verification

Work through the RdSAP 10 spec **in document order**, starting at
§1. For each section:

### 5.1. Read the spec section
Read the section text fully. Note every rule, table reference, and
defaulting cascade.

### 5.2. Find the corresponding code
Map the section to the source file(s) implementing it. The current
mapping (some sections are split across modules):

| RdSAP 10 section | Code location |
|---|---|
| §1 Introduction / general | n/a |
| §2 Property descriptors | `datatypes/epc/domain/epc_property_data.py` |
| §3 Dimensions | `packages/domain/src/domain/sap/worksheet/dimensions.py` |
| §4 Ventilation | `packages/domain/src/domain/sap/worksheet/ventilation.py` |
| §5 Construction / U-values | `packages/domain/src/domain/ml/rdsap_uvalues.py` + `worksheet/heat_transmission.py` |
| §6 Windows / doors / overshading | `worksheet/solar_gains.py` + `rdsap/cert_to_inputs.py` |
| §7 Heating systems (refers to SAP 10.2 Appendix A) | `domain.ml.sap_efficiencies` + `rdsap/cert_to_inputs.py` |
| §8 Heating controls (Table 4e) | `rdsap/cert_to_inputs.py` |
| §9 Heat emitters / flow temperatures | not implemented |
| §10 Space and water heating (Appendix A) | `rdsap/cert_to_inputs.py` |
| §11 Additional items (PV, batteries, wind, hydro, shutters) | partial in `cert_to_inputs.py` (PV only) |
| §12 Electricity tariff | `rdsap/cert_to_inputs.py` (`_is_off_peak_meter`, fuel routing) |
| §13 Addendum to EPCs | n/a |
| §14 Special cases (e.g. flats above commercial) | not implemented |
| §15 Improvements (recommendations) | n/a (not rating) |
| §16-19 RdSAP-specific SAP rating equations | `worksheet/rating.py` |
| Table 27 — Living-area fraction | `rdsap/cert_to_inputs.py:_living_area_fraction` |
| Table 28 — Cylinder size defaults | `domain.ml.demand:_CYLINDER_VOLUME_L` |
| Table 29 — Heating + HW parameters | partial in `cert_to_inputs.py` |
| Table 30 — Mechanical ventilation | not implemented |
| Table 31 — Data to be collected | n/a |

### 5.3. For each spec rule in the section, check our code
For each table, formula, footnote, exception:

1. Does our code implement it?
2. Does the implementation match the spec values exactly?
3. Are there spec-defined edge cases / footnotes we're missing?

### 5.4. When a gap is found
- Write a failing unit test that asserts the spec-correct behaviour
  — wherever possible, write it as an assertion on `intermediate`
  values rather than on aggregate SAP, using a BRE worked example
  if one covers the section.
- Implement the fix.
- Run `test_bre_worked_examples.py` plus the Validation Cohort
  probe. Note both direction and magnitude of change.
- If a BRE worked-example breaks, the new code is wrong (revert).
  BRE examples are spec-derived and cannot regress from a
  spec-correct change.
- Commit per-slice: one section → one commit. Reference the spec
  section in the commit message.

### 5.5. Sweep-time principle: worksheet-faithful structure

Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet
structure for its section. As you verify a section, also restructure
its module so that:

1. **Each function name references its worksheet-line origin** (e.g.
   `heat_transfer_coefficient` aligns with worksheet line (40);
   `mean_internal_temperature` aligns with worksheet line (93)).
2. **Compound calculations are split** into one function per
   worksheet line where possible — easier to verify against
   `intermediate[...]` and against BRE worked-example values.
3. **Defensive type-handling disappears**. Once P6 lands, the input
   is a typed enum or numeric — branching on `isinstance(x, int)` is
   replaced by enum dispatch.
4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind ==
   BuildingPartKind.MAIN_DWELLING` replaces string sniffing of
   `identifier`. The dimensions.py "unnecessarily complicated"
   pattern Khalim flagged is the canonical example of what *not*
   to do.

The principle applies during section-sweep slices. It is **not**
a separate prerequisite — the refactor lands with the verification
slice for the section it touches.

### 5.6. Use trace mode when you need it
P5 populates `SapResult.intermediate: dict[str, float]` with every
named SAP 10.2 worksheet variable. Each section's verification
benefits from inspecting these values per-cert. See §11 below for
the sketch.

---

## 6. What's already been done — section by section

This is your starting map. Each row says whether the section has been
touched and what the current state is.

### Walls / construction (§5)
- **S-B23 (committed `9a509e41`)**: Table 6 "Filled cavity" row dispatch
  when `wall_insulation_type=2` AND `wall_construction=4`. Spec-anchored.
- **S-B24 (committed `15613309`)**: Parse `walls[i].description` for
  "Average thermal transmittance X W/m²K". **PARK** — full-SAP path.
- **S-B25 (committed `6b934710`)**: Description-based dispatch for cavity
  "as built, insulated (assumed)" + similar (type=4 with descriptive
  signal). Spec-anchored via legacy `epc_wall_description_map`.
- **S-B26 (committed `361f9154`)**: `_insulation_bucket(0, True) → 50`
  fix (the "NI" thickness sentinel) + description-based override of
  `wall_ins_present` for non-cavity walls. Spec footnote (Table 6).
- **S-B27 (committed `1f49fa03`)**: Floor `_insulation_bucket` analog —
  Table 19 footnote (2) "max(50, age-band default)" when description
  signals retrofit.
- **S-B28 (committed `25261d5c`)**: Roof NI thickness + insulated
  description → §5.11.4 footnote 50mm joist row.
- **S-B29 (committed `3ab09845`)**: Floor + roof "Average thermal
  transmittance" parse. **PARK** — full-SAP path.

**Still to verify in §5**:
- Stone wall U-values for Scotland / Wales / NIR (Tables 7-10) — only
  England is fully transcribed; country overrides are partial.
- Cob U-values (§5.6) — table only, no formula implementation.
- Stone formula §5.6 / §5.7 for non-standard wall thicknesses.
- Curtain wall §5.18 — not implemented.
- Party wall U-values (Table 15) — implemented in `u_party_wall`,
  verify table values.
- Thermal bridging (Table 21) — implemented as global `y` factor,
  verify per-age-band values.
- §5.16 Thermal mass — Table 22 (only 100 / 250 kJ/m²K, dispatched
  by construction type with internal insulation). Currently we
  hardcode 250 (see `cert_to_inputs.py:_DEFAULT_THERMAL_MASS_PARAMETER_KJ_PER_M2_K`).
  This is wrong for timber-frame / cob / internally-insulated masonry
  (should be 100).

### Heating systems (§§7-10, SAP Appendix A)
- **S-B20 (in history)**: Table 11 secondary heating allocation,
  conditional on cert lodging secondary or being electric storage.
- **Failed S-B30 (reverted)**: respect `main_heating_fraction` —
  shown empirically wrong. Field is multi-main allocation, not
  main-vs-secondary. Spec verified against SAP 10.2 Appendix A1/A4.
- **S-B31 (committed `afdf297f`)**: Table 12c DLF on heat-network main.
  Spec §C3.1 + Table 12c.
- **Failed S-B32 (room heater off-peak routing, reverted)**: Table 12a
  says cat=10 room heaters on 7-hour tariff bill 100% high rate. Our
  cert-cal extends off-peak to codes 691-696. Spec-correct fix
  inverted bias direction — calibration was absorbing this.
- **Uncommitted HW cylinder fix**: spec-correct (combi → zero
  storage/primary loss per Table 2 + Table 3 footers) but breaks 3
  golden fixtures. Decision deferred to systematic pass.

**Still to verify in heating**:
- Table 4a efficiency values for every code (heat pumps, storage
  heaters, room heaters, CPSU). The category-fallback (cat=4 → 2.30)
  is documented as a known limitation.
- Boiler interlock penalty (−5%) — spec §9.2.1: "The efficiency of
  gas and liquid fuel boilers for both space and water heating is
  reduced by 5% if the boiler is not interlocked for space and water
  heating." We don't apply this. Known gap.
- Table 4c condensing-boiler / heat-pump emitter-temperature
  adjustment — we don't apply this.
- Table 12a high-rate fractions for off-peak dwellings — we apply
  100% off-peak or 100% standard, never fractional blending.

### Hot water (§4 SAP + Appendix J)
- Storage loss factor table (Table 2) — current values in
  `domain.ml.demand:_STORAGE_LOSS_FACTOR` are ~3× off from spec
  (verified). Known under-prediction of cylinder loss for storage
  systems; cancelled by over-prediction of primary loss for combi
  systems in aggregate.
- Primary loss formula (Table 3) — implemented as 245/60 kWh by age
  band. Spec is a per-month formula `nₘ × 14 × [{0.0091·p + 0.0245·(1-p)}·h + 0.0263]`
  with `p` (pipework insulation fraction) and `h` (circulation hours).
  Known approximation.
- Combi-boiler zero-loss rule (Table 2 + Table 3 footers) — currently
  NOT applied (the failed uncommitted slice). Adding this drops PE
  MAE −6.64 but raises SAP MAE +0.39.
- Appendix J Vd formula `25N + 36` — currently the simple form, not
  the full per-component (shower / bath / other) breakdown. Useful
  HW demand is ~7% under spec value.
- ΔT — currently 43°C constant (55−12). Spec uses monthly Tcold and
hot at 52°C, not 55°C. Per-month variance unmodelled.
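The per-month primary loss formula quoted in the list above can be written out directly. This is a minimal sketch: the function names are hypothetical, a non-leap year is assumed for `nₘ`, and the `p` and `h` values must come from the spec tables, which are not reproduced here.

```python
# Days per month, n_m, for a non-leap year (assumption of this sketch).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def primary_loss_kwh(days: int, p: float, h: float) -> float:
    """Monthly primary loss: n_m x 14 x [{0.0091 p + 0.0245 (1 - p)} h + 0.0263].

    p is the pipework insulation fraction (0..1), h the circulation hours.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return days * 14 * ((0.0091 * p + 0.0245 * (1 - p)) * h + 0.0263)


def annual_primary_loss_kwh(p: float, hours_by_month: list[float]) -> float:
    """Sum the monthly formula over twelve months of circulation hours."""
    if len(hours_by_month) != 12:
        raise ValueError("expected 12 monthly values for h")
    return sum(
        primary_loss_kwh(days, p, h)
        for days, h in zip(DAYS_IN_MONTH, hours_by_month)
    )
```

Because the result depends on `p` and on a per-month `h`, a fixed annual constant per age band cannot reproduce it except by coincidence, which is why the list flags the current implementation as an approximation.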
### Lighting (Appendix L)
- `predicted_lighting_kwh` in `domain.ml.demand` uses a
  `9.3 × TFA × (1 − 0.5·led_share − 0.4·cfl_share)` heuristic.
- Spec is L1-L12: daylight correction, fixed-lighting capacity, top-up
  + portable shares, monthly profile.
- For LED-dominant home (50+ LEDs): our heuristic gives ~465 kWh, spec
gives ~94 kWh. Known over-prediction by ~5× for new-build LED homes.
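For reference, the heuristic quoted above is a one-line function. The sketch below restates it with a hypothetical function name; it does not implement Appendix L (L1-L12). The ~465 kWh figure is consistent with an all-LED dwelling of 100 m² TFA, though the dwelling behind that figure is not stated in this handover, so treat the 100 m² as an assumption.

```python
def heuristic_lighting_kwh(tfa_m2: float, led_share: float, cfl_share: float) -> float:
    """Current heuristic: 9.3 x TFA x (1 - 0.5 led_share - 0.4 cfl_share)."""
    if led_share < 0 or cfl_share < 0 or led_share + cfl_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    return 9.3 * tfa_m2 * (1 - 0.5 * led_share - 0.4 * cfl_share)


# Assumed example: 100 m2 dwelling, every lamp an LED.
all_led_kwh = heuristic_lighting_kwh(100.0, led_share=1.0, cfl_share=0.0)
```

The heuristic can never drop below half of `9.3 × TFA`, so it has a built-in floor that the spec calculation does not share; that floor is what the ~5× over-prediction reflects.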
### Internal gains (§5 SAP)
- `worksheet/internal_gains.py` implements metabolic + cooking +
  appliances + lighting (the four positive rows of Table 5).
- **Missing**: Water heating row (`1000 × (65)ₘ / (nₘ × 24)` — i.e.
  HW losses recycled as heated-space gains) and Losses row (`−40 × N`
for cold inflow + evaporation). Both documented in S-B23 gap list.
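The two missing rows are simple enough to sketch from the expressions quoted above. The function names are hypothetical, and the sketch assumes `(65)ₘ` is a monthly figure in kWh and `N` is the assumed occupancy, which is how the expressions read but is not spelled out in this handover.

```python
def water_heating_gain_w(gains_65_kwh: float, days_in_month: int) -> float:
    """Water heating row: 1000 x (65)m / (n_m x 24).

    Converts a monthly kWh figure into an average gain in watts.
    """
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    return 1000 * gains_65_kwh / (days_in_month * 24)


def losses_gain_w(occupancy_n: float) -> float:
    """Losses row: -40 x N (a negative gain)."""
    return -40 * occupancy_n


def missing_rows_total_w(gains_65_kwh: float, days_in_month: int, occupancy_n: float) -> float:
    # Net contribution of the two rows that internal_gains.py lacks today.
    return water_heating_gain_w(gains_65_kwh, days_in_month) + losses_gain_w(occupancy_n)
```

The two rows have opposite signs, so omitting both can partly cancel in aggregate while still being wrong for any individual dwelling.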
### Ventilation (§4 / Table 5)
- Wind-shelter factor implemented in S-B21.
- Mechanical ventilation (MVHR, MEV, PIV) — not implemented; cert
  rarely lodges. Spec §4.2 + Table 4g.
- Pressure-test override (worksheet lines 17-18) — not implemented.

### Tariff / cost (§12 + Table 12 / 12a / 12c)
- Cert-calibration prices in
  `domain.sap.tables.table_12_cert_calibration` are an EMPIRICAL fit
  to Elmhurst's output. They are LOWER than the published Table 12
  spec values by 4-25%. Known divergence; investigation deferred.
- Standing charges (Table 12 note (a)) — NOT applied. Adding them
  empirically worsens MAE (calibration absorbs).
- Table 12a high-rate fractions — currently 100% off-peak for
  E7-eligible codes, 100% standard otherwise. No fractional blending.
- Heat network DLF (Table 12c) — applied per S-B31 only to main
heating + HW from main. HW-only-from-heat-network is a separate slice.
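Fractional blending, as opposed to the all-or-nothing routing described in the list above, amounts to a weighted average of the two unit prices. The sketch below assumes that reading of Table 12a; the function name is hypothetical and no fractions or off-peak prices from the spec are reproduced here.

```python
def blended_unit_price(high_rate_p: float, low_rate_p: float, high_rate_fraction: float) -> float:
    """Effective price in p/kWh when a fraction of the energy bills at high rate."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must be between 0 and 1")
    return high_rate_fraction * high_rate_p + (1 - high_rate_fraction) * low_rate_p
```

The current behaviour corresponds to calling this with a fraction of exactly 0 or exactly 1; any Table 12a entry strictly between those produces a price the calculator cannot reach today.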
|
||
|
||
---
|
||
|
||
## 7. The cert-calibration "tension" is dissolved (per ADR-0010)

This section originally framed cert-calibration vs spec-correctness as
two end-states the calculator had to choose between. That framing is
wrong (see §3 for the actual diagnosis): the cert-cal values are
pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2.
Once the corpus is filtered to the Validation Cohort (P3) and the
cert-cal layer is deleted (P2), the false dichotomy disappears.

### What replaces this section

- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP
  10.2 14-03-2025 amendment, CO2 factors corrected per P2).
- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every
  cert lodged on the calculator's target spec version.
- **One verification mechanism.** Trace-mode intermediates + BRE
  worked-example unit tests for per-section verification; Validation
  Cohort probe MAE for aggregate go/no-go.

Cert-software deviations from spec, if they exist at all, are
expected to be small and localised. They surface as residual after
the spec sweep completes against a clean probe — and at that point
the question is whether to chase them at all (Elmhurst-deviation
fixes have low domain value compared to spec-correctness, given the
calculator's product use case is scoring counterfactuals for the
MeasureApplicator chain, not reproducing historical certs).

---

## 7b. Outstanding findings to pick up during the systematic pass

The prior session identified several spec-correct fixes that were
reverted because they made SAP MAE worse against the **full corpus**.
The empirical signal that "reverted" them was version-mixture noise
(see §3) plus compensating-error breakage in the 7 retired golden
fixtures. Each fix below is **expected to land cleanly** once the
six prerequisites in §2.5 are done, because:

- The Validation Cohort (P3) is on a single spec version — the price
  mismatch that drove the bias regression on standing charges and
  cat=10 routing disappears.
- The cert-cal layer is gone (P2) — no calibration to "break".
- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals
  that dominated per-cert MAE collapse before any of these findings
  even matter.
- The fixtures are now BRE worked examples (P5 + §10) — they cannot
  be broken by spec-correct changes because they are themselves
  derived from the spec.

Treat each finding as a section-sweep TODO. The empirical impacts
below were measured against the **dirty probe** (full corpus +
cert-cal + no PCDB) and are **not predictive** of behaviour on the
clean probe. Re-measure each fix against the Validation Cohort after
prerequisites land.

### Finding 1 — HW cylinder zero-loss rule for combi boilers
**Status**: spec-correct fix exists in working-tree-only form
(uncommitted). Reverted at end of last session.

**Spec basis**:
- **SAP 10.2 Table 2 footer (page 158)**: "In the case of a
  combination boiler: a) the storage loss factor is zero if the
  efficiency is taken from Table 4b"
- **SAP 10.2 Table 3 footer (page 160)**: "Primary loss is set to
  zero for the following: Electric immersion heater, Combi boiler
  (including when it is part of a combined heat pump and boiler
  package and provides all the hot water), CPSU (including electric
  CPSU), Boiler and thermal store within a single casing, Separate
  boiler and thermal store connected by no more than 1.5 m of
  insulated pipework, Direct-acting electric boiler, Heat pump (not
  combined heat pump and boiler package with a non-combi boiler)
  from PCDB with hot water vessel integral to package"

**The bug**: our calculator currently adds storage loss (~135 kWh)
and primary loss (~245 kWh) for ALL certs with an age band lodged,
ignoring whether the dwelling has a cylinder. **67% of corpus certs
explicitly lodge `has_hot_water_cylinder=False`** (the modal combi
boiler case) — we add 380 kWh of fictional HW losses for each.

**The fix** (sketch, ~10 lines):
1. Add `has_cylinder: bool = True` keyword to
   `predicted_hot_water_kwh` in `packages/domain/src/domain/ml/demand.py`.
2. When `has_cylinder=False`, set `storage_loss = 0` and `primary_loss = 0`.
3. In `cert_to_inputs.py` (around line 829), pass
   `has_cylinder=epc.has_hot_water_cylinder and not is_instantaneous`.

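
The three steps of the fix can be sketched as follows. This is a minimal illustration only: the real `predicted_hot_water_kwh` signature is not shown in this handover, so the positional parameters, the `cylinder_flag` helper and the 2000 kWh demand figure are assumptions made for the example. The 135 kWh and 245 kWh losses are the approximate figures quoted above.

```python
def predicted_hot_water_kwh(
    demand_kwh: float,
    storage_loss_kwh: float,
    primary_loss_kwh: float,
    *,
    has_cylinder: bool = True,
) -> float:
    """Annual hot-water energy in kWh (hypothetical, simplified signature)."""
    if not has_cylinder:
        # Table 2 / Table 3 footers: no storage or primary loss without a cylinder.
        storage_loss_kwh = 0.0
        primary_loss_kwh = 0.0
    return demand_kwh + storage_loss_kwh + primary_loss_kwh


def cylinder_flag(has_hot_water_cylinder: bool, is_instantaneous: bool) -> bool:
    """Hypothetical helper mirroring the expression passed in step 3."""
    return has_hot_water_cylinder and not is_instantaneous


# Illustrative demand of 2000 kWh; loss figures are the approximate ones quoted above.
combi_kwh = predicted_hot_water_kwh(
    2000.0, 135.0, 245.0, has_cylinder=cylinder_flag(False, False)
)
stored_kwh = predicted_hot_water_kwh(
    2000.0, 135.0, 245.0, has_cylinder=cylinder_flag(True, False)
)
print(stored_kwh - combi_kwh)  # the fictional loss removed for a combi cert
```

Defaulting `has_cylinder` to `True` keeps existing call sites unchanged until the mapper passes the flag explicitly.
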
**Empirical impact** (measured on 300-cert probe):
- **PE MAE: 43.32 → 36.68 (−6.64) ← biggest single fix found this session**
- PE bias: 37.69 → 30.41 (−7.28)
- SAP MAE: 4.61 → 5.00 (+0.39, regression)
- 3 of 7 golden fixtures break

**Why it was reverted**: the SAP regression + broken fixtures indicate
the fictional HW losses were partially compensating for OTHER bugs
(likely lighting over-prediction for LED-dominant homes). The ordering
proposed at the time was to fix the spec-clear cases (HW cylinder,
lighting per Appendix L, etc.) together and then re-derive cert-cal;
per ADR-0010 the cert-cal layer is deleted (P2) rather than re-derived.

**When to pick up**: when you reach §4 / Appendix J during the
systematic pass. Pair with the lighting Appendix L fix (Finding 4):
the two errors likely compensate, and the golden fixtures this fix
broke are retired (§10).

### Finding 2 — Standing charges (Table 12 note (a))
**Status**: spec-correct, never implemented. Empirically rejected by
4-mode probe.

**Spec basis**: SAP 10.2 Table 12 note (a), page 190:
> "For calculations including regulated energy uses only (e.g.
> regulation compliance, energy ratings):
> - The standing charge for electricity standard tariff is omitted
> - The standing charge for off-peak electricity is added to space
>   and water heating costs where either main heating or hot water
>   uses off-peak electricity
> - The standing charge for gas fuels is added to space and water
>   heating costs where the gas fuel is used for space heating
>   (main or secondary) or for water heating"

**The bug**: our calculator never adds standing charges. Per spec, a
gas-heated dwelling should have £92/yr added to the ECF numerator.

**Empirical impact** (4-mode probe, 300 certs):

| Mode | All certs | Gas-only |
|---|---|---|
| cert-cal, no standing (current) | MAE 4.69, bias +0.98 | MAE 4.01, bias +0.80 |
| cert-cal + gas standing | MAE 4.94, bias **−2.62** | MAE 4.31, bias **−3.53** |

Adding standing charges shifts SAP bias downward by ~3.6 points across
all certs (~4.3 on gas-only), clearly the wrong direction. The
cert-cal prices (3.48p gas vs spec 3.64p) implicitly absorb the
standing-charge contribution.

**When to pick up**: when you reach §12 / Table 12. Apply alongside
spec-correct unit prices (3.64p gas, 16.49p elec). The earlier plan to
re-derive cert-cal afterwards is superseded: ADR-0010 deletes the
layer (P2).

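
The three bullets of note (a) quoted in Finding 2 translate into a small rule. The sketch below is one possible reading, not the calculator's code: the fuel labels (`"gas"`, `"electricity_off_peak"`, `"electricity_standard"`) and the function name are hypothetical, all gas fuels are collapsed into one label for brevity, and the off-peak standing charge is a parameter because this handover does not quote its value. Only the £92 gas figure comes from the text above.

```python
from typing import Optional

GAS_STANDING_CHARGE_GBP = 92.0  # figure quoted in Finding 2


def regulated_standing_charges_gbp(
    *,
    main_heating_fuel: str,
    secondary_heating_fuel: Optional[str],
    water_heating_fuel: str,
    off_peak_standing_charge_gbp: float,
) -> float:
    """Standing charges to add to space + water heating cost (regulated uses only)."""
    total = 0.0
    # Off-peak electricity: added when main heating or hot water uses it.
    if "electricity_off_peak" in {main_heating_fuel, water_heating_fuel}:
        total += off_peak_standing_charge_gbp
    # Gas: added when used for main heating, secondary heating, or hot water.
    if "gas" in {main_heating_fuel, secondary_heating_fuel, water_heating_fuel}:
        total += GAS_STANDING_CHARGE_GBP
    # Standard-tariff electricity standing charge is omitted, so nothing is added for it.
    return total


gas_dwelling = regulated_standing_charges_gbp(
    main_heating_fuel="gas",
    secondary_heating_fuel=None,
    water_heating_fuel="gas",
    off_peak_standing_charge_gbp=0.0,
)
print(gas_dwelling)
```

Keeping the rule in one pure function makes it easy to pin with a unit test before it is wired into the ECF numerator.
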
### Finding 3 — Cat=10 room heaters off-peak routing
**Status**: spec-correct fix identified but empirically rejected. The
code currently bills room heaters at the off-peak rate on E7 dwellings.

**Spec basis**: SAP 10.2 Table 12a (page 191):
> "Other direct-acting electric heating (including electric secondary
> heating): 7-hour tariff 1.00 high rate; 10-hour tariff 0.50 high rate"

**The bug**: our cert-calibration (`cert_calibration_e7_codes`)
extends the off-peak set to include codes 691-696 (room heaters).
That's the S-B14 empirical extension — the previous agent found it
helped some specific certs. Per Table 12a it's WRONG: room heaters
on E7 should bill 100% at HIGH rate, not at low rate.

**Empirical impact**: switching from off-peak (5.50p cert-cal) to
standard rate (13.19p) — closer to spec but still not the high rate
(15.29p cert-cal) — inverted the bias from +5.88 to −6.00 without
improving MAE.

**The real issue**: Table 12a defines FRACTIONAL blending (e.g.
"90% high, 10% low" for direct-acting electric boiler on 7-hour
tariff), not binary on/off-peak. Our calculator only supports binary.
A proper implementation needs per-system high-rate fractions.

**When to pick up**: when you reach §12 / Table 12a. Implement
fractional blending for all the rows of Table 12a, not just cat=10.

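
Fractional blending as described in Finding 3 amounts to a weighted average of the high and low unit prices. The sketch below assumes a hypothetical lookup keyed by (system, tariff); only the two fractions quoted from Table 12a above are filled in, and the 15.0p / 5.0p prices are round illustrative numbers, not spec values.

```python
# High-rate fractions keyed by (system label, tariff). Labels are hypothetical;
# the two fractions are the ones quoted from Table 12a in Finding 3.
HIGH_RATE_FRACTION = {
    ("other_direct_acting_electric", "7_hour"): 1.00,
    ("other_direct_acting_electric", "10_hour"): 0.50,
}


def blended_price_p_per_kwh(
    high_rate_fraction: float, high_rate_p: float, low_rate_p: float
) -> float:
    """Weighted unit price: fraction at high rate, remainder at low rate."""
    if not 0.0 <= high_rate_fraction <= 1.0:
        raise ValueError("high_rate_fraction must lie in [0, 1]")
    return high_rate_fraction * high_rate_p + (1.0 - high_rate_fraction) * low_rate_p


# Illustrative prices only (pence per kWh).
seven_hour = blended_price_p_per_kwh(
    HIGH_RATE_FRACTION[("other_direct_acting_electric", "7_hour")], 15.0, 5.0
)
ten_hour = blended_price_p_per_kwh(
    HIGH_RATE_FRACTION[("other_direct_acting_electric", "10_hour")], 15.0, 5.0
)
print(seven_hour, ten_hour)
```

With a fraction of 1.00 the blend collapses to the high rate, which is the room-heater case the binary implementation currently gets wrong.
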
### Finding 4 — Lighting (Appendix L proper)
**Status**: gap. Current code uses a 9.3 kWh/m² heuristic with simple
LED/CFL reductions; spec is the L1-L12 cascade with daylight
correction, fixed-lighting capacity, top-up + portable shares,
monthly profile.

**Spec basis**: SAP 10.2 Appendix L §L1 (pages 88-90), equations
L1-L12.

**The bug**: for a 100 m² LED-dominant home (e.g. cert 7536-3827 with
51 LEDs), our heuristic returns 465 kWh/yr; spec returns ~94 kWh/yr.
Over-prediction by ~5× on LED-dominant homes (which is most modern
stock).

**Empirical impact** (estimated):
- ~5-6 kWh/m² PEUI over-prediction for LED-dominant population
- Corpus-weighted: ~3-4 kWh/m² PEUI bias contribution

**When to pick up**: when you reach Appendix L. Pair with the HW
cylinder fix (Finding 1) to avoid the SAP MAE regression.

### Finding 5 — Internal-gains Table 5 missing rows
**Status**: gap. Spec Table 5 has 7 rows for internal gains; our
`worksheet/internal_gains.py` implements 4.

**Spec basis**: SAP 10.2 Table 5 (page 177).

**Missing rows**:
- **Water heating**: `1000 × (65)ₘ / (nₘ × 24)` W — the HW losses
  (cylinder + distribution + primary) recycled as heated-space gains
  via worksheet line (65). Reduces space heating demand.
- **Losses**: `−40 × N` W — heat to incoming cold water and
  evaporation. Negative contribution.

**Empirical impact** (estimated):
- For N=2.7: HW gains ≈+75 W, losses ≈−108 W, net ≈−33 W. Currently
  we miss both → our gains are 33 W too high → space heating demand
  too low → PE under-predicted by ~3 kWh/m² (rough).

**When to pick up**: when you reach §5 / Table 5. Worksheet line (65)
also needs implementation — the HW losses already exist in our calc
(see `demand.py:_cylinder_storage_loss_kwh` etc.), they just need
piping into internal_gains.

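
The two missing Table 5 rows in Finding 5 are plain arithmetic, so a worked sketch may help when wiring them in. The function names are hypothetical, and the 55.8 kWh value for worksheet line (65) in a 31-day month is an illustrative input chosen to reproduce the ≈+75 W figure quoted above; it is not a spec or corpus value.

```python
def water_heating_gains_w(line_65_kwh: float, days_in_month: int) -> float:
    """Table 5 water-heating row: 1000 x (65)m / (n_m x 24), in watts."""
    return 1000.0 * line_65_kwh / (days_in_month * 24)


def losses_gains_w(occupancy_n: float) -> float:
    """Table 5 losses row: -40 x N, in watts (always a negative gain)."""
    return -40.0 * occupancy_n


# Illustrative month: 55.8 kWh on line (65) over 31 days, occupancy N = 2.7.
hw_gain_w = water_heating_gains_w(55.8, 31)
loss_w = losses_gains_w(2.7)
net_w = hw_gain_w + loss_w
print(round(hw_gain_w), round(loss_w), round(net_w))
```

For these inputs the net contribution comes out at roughly −33 W, matching the estimate in the finding.
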
### Finding 6 — Storage-loss-factor table values are wrong
**Status**: gap. Affects only certs with `has_hot_water_cylinder=True`
(33% of corpus).

**Spec basis**: SAP 10.2 Table 2 (page 158).

**The bug**: `domain.ml.demand:_STORAGE_LOSS_FACTOR` values are ~3×
LOWER than spec. E.g. for 38mm foam our value is 0.0056, spec is
0.0181. Effect: we UNDER-predict cylinder storage loss by ~300 kWh
for storage systems, partly cancelling the over-prediction from
Finding 1.

**When to pick up**: when you reach §4 / Table 2. Fix WITH Finding 1
(combi zero-loss) so the cancellation doesn't dominate the
direction.

### Finding 7 — Heat-pump fallback efficiency 2.30
**Status**: gap that requires PCDB. See §8b.

### Finding 8 — Other smaller gaps (carry forward)
- Boiler interlock −5% penalty (§9.2.1) — never applied
- Table 4c condensing boiler / HP emitter temperature adjustment — never applied
- Control-temperature adjustment from Table 4e — always 0 in code, spec varies
- Wall U-values for Scotland / Wales / NIR — only England fully transcribed
- Per-junction thermal bridging (Table R2) — global y approximation only
- Multi-main heating (`main_heating_fraction` ≠ 1) — first main only
- Cooling §10 — not implemented (rare in UK)
- FEE §11 — not implemented (new-build only)

---

## 8. Don't repeat — known dead-ends

> **Re-read after §3 + §7b.** Three entries below were classified as
> "dead-ends because cert-cal absorbs" — that diagnosis is wrong.
> They are spec-correct fixes that were measured under a noisy probe.
> Now flagged as **conditional dead-ends**: dead only if you try them
> before P1–P5 land. After prerequisites: they are expected
> improvements, not dead-ends. See ADR-0010.

- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
  over-corrected because it routed to the (Unfilled cavity, 50mm) row
  instead of the dedicated Filled cavity row. The right fix landed in
  S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
  (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
  intentionally conservative; PCDB (P4 prerequisite) supplies the
  real efficiency.
- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under
  the dirty probe, cert-cal prices fit better. **Inverts under the
  clean probe (P2 + P3): SAP 10.2 spec prices are correct because the
  Validation Cohort is on the 14-03-2025 amendment.** Listed here
  only as a warning if you start the sweep before prerequisites land.
- ❌ **Always applying 10% secondary heating** — must be conditional on
  cert lodging or main system being electric storage (S-B20). See
  spec Appendix A.4.
- ❌ **Respecting `main_heating_fraction` for secondary allocation**
  (failed S-B30) — the field is the multi-main allocation (system 1 vs
  system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
  spec-correct per Table 12a. The bias inversion under the dirty
  probe was driven by cert-cal compensating; on the clean probe this
  is just spec-correct. Land as part of the §12 spec sweep after
  prerequisites.
- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) —
  spec-correct per Table 12 note (a). Same logic: bias drift under
  dirty probe is version-mixture + missing-PCDB noise, not Elmhurst
  deviation. Land as part of §12 spec sweep.
- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted
  S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE
  regression was driven by the now-retired golden fixtures (§10) and
  cert-cal absorption. Land as part of §4 / Appendix J sweep.

---

## 9. The cert corpus and parity probe

### Sample
`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
250k-cert parquet. **After P1 lands** the parquet carries
`inspection_date`; the probe then filters to the **Validation Cohort**
(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and
samples 300 at seed=7 by default. Filtering rationale:
- ≤ 5 is heritage/anomaly stock (sub-3% of corpus)
- ≥ 99 is full-SAP new-builds the parquet excludes anyway
- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on
  SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md)
  / "Validation Cohort" and ADR-0010 §3.

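
The cohort filter and seeded sample described under "Sample" can be sketched with the standard library alone. This is not the probe's implementation: the record layout (plain dicts), the function names and the use of `random.Random(seed).sample` are assumptions, and the score bounds are treated as inclusive to match the `[5, 99]` interval.

```python
import random
from datetime import date

COHORT_START = date(2025, 7, 1)  # Validation Cohort cut-off from ADR-0010


def in_validation_cohort(cert: dict) -> bool:
    """True when the cert is inside the date window and the SAP score band."""
    return cert["inspection_date"] >= COHORT_START and 5 <= cert["sap_score"] <= 99


def sample_probe(certs: list, n: int = 300, seed: int = 7) -> list:
    """Filter to the cohort, then draw a reproducible sample of up to n certs."""
    cohort = [c for c in certs if in_validation_cohort(c)]
    rng = random.Random(seed)
    return rng.sample(cohort, min(n, len(cohort)))


certs = [
    {"id": "a", "inspection_date": date(2025, 6, 30), "sap_score": 70},  # too early
    {"id": "b", "inspection_date": date(2025, 7, 1), "sap_score": 70},
    {"id": "c", "inspection_date": date(2025, 9, 1), "sap_score": 3},  # score too low
    {"id": "d", "inspection_date": date(2026, 1, 15), "sap_score": 99},
]
probe = sample_probe(certs)
print(sorted(c["id"] for c in probe))
```

A fixed seed keeps the 300-cert sample stable between runs, so MAE deltas reflect code changes rather than resampling.
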
### Run the probe
```bash
python -c "
import sys
sys.path.insert(0, 'packages/domain/src')
sys.path.insert(0, '.')
sys.path.insert(0, 'services/ml_training_data/src')
from ml_training_data.sap_parity_probe import main
main(['300','7'])
"
```

### What the probe shows
- Aggregate SAP MAE / RMSE / bias
- Aggregate PE MAE / RMSE / bias
- Per-end-use PEUI breakdown (space / HW / lighting / pumps)
- Stratification by `main_heating_category`, `construction_age_band`,
  `dwelling_type`
- Worst-15 residuals (SAP and PE)

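
For reference, the three aggregate statistics the probe reports can be computed as below. This is a sketch rather than the probe's own code; in particular the sign convention (residual = predicted − lodged, so a positive bias means over-prediction) is an assumption that should be checked against `sap_parity_probe` before comparing numbers.

```python
import math


def residual_stats(predicted: list, lodged: list) -> dict:
    """MAE, RMSE and bias of predicted-minus-lodged residuals."""
    if len(predicted) != len(lodged) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - l for p, l in zip(predicted, lodged)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "bias": sum(residuals) / n,  # signed mean; positive = over-prediction
    }


# Residuals here are [+2, -3, 0].
stats = residual_stats([70.0, 60.0, 50.0], [68.0, 63.0, 50.0])
print(stats)
```

MAE and bias together show whether an error is systematic: a bias close to ±MAE means nearly all residuals share one sign.
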
### Known parquet limitations
- ~0.7% of parquet certs have `construction_age_band=None` vs 15% in
  the raw bulk-zip. The parquet filters out full-SAP new-builds
  upstream. Don't measure full-SAP-path slices against the parquet.
- Heat-pump certs (cat=4) are under-represented and concentrated in
  the worst-residual tail because PCDB efficiency is unavailable.

---

## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5)

The 7 cert-based fixtures at
`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
were locked in against the current calculator state — *with* cert-cal,
*without* PCDB, *with* HW cylinder loss always applied, *with* the
lighting heuristic, etc. They are documented in §3 / the prior
handover as containing compensating errors. Once the prerequisites
land, every spec-correct fix breaks at least one of them. They will
fight the spec sweep.

### Replacement strategy

**Primary regression suite: BRE worked-example fixtures.**

Transcribe the worked examples from:
- SAP 10.2 spec appendices (especially Appendix R — reference values
  and the worked example dwelling).
- RdSAP 10 (10-06-2025) worked-example annex.

Each worked example becomes a unit test that locks **per-intermediate
expected values** (HLP, HTC, monthly mean internal temperature (MIT),
ECF, SAP score) rather than the aggregate SAP score alone. Because
they are spec-derived, no spec-correct change can break them — any
break is an implementation bug, unambiguously.

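
A worked-example test that locks per-intermediate values could take roughly the following shape. Everything here is a hypothetical sketch: the helper name, the relative tolerance and above all the numbers, which are placeholders and not values from any BRE worked example. Real tests would transcribe the expected values from the spec documents listed above.

```python
import math


def mismatched_intermediates(
    actual: dict, expected: dict, rel_tol: float = 1e-3
) -> list:
    """Names of expected intermediates that are missing or outside tolerance."""
    bad = []
    for name in sorted(expected):
        if name not in actual or not math.isclose(
            actual[name], expected[name], rel_tol=rel_tol
        ):
            bad.append(name)
    return bad


# Placeholder numbers only, NOT taken from a BRE worked example.
expected = {"heat_loss_parameter_w_per_m2k": 2.50, "ecf": 1.80, "sap_score": 69.0}
actual = {"heat_loss_parameter_w_per_m2k": 2.5001, "ecf": 1.95, "sap_score": 69.0}

failures = mismatched_intermediates(actual, expected)
print(failures)
```

Reporting the names of the diverging intermediates, rather than only the final score, points directly at the worksheet section that needs attention.
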
These tests live at
`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py`
(new module — separate from the cert-based fixtures module).

**Cert-based fixtures retired.**

The current `test_golden_fixtures.py` is either deleted or repurposed
as a *very loose* smoke-test integration suite (e.g.
`|SAP residual| ≤ 5`) that catches catastrophic regressions only. The
7 cert JSONs under `fixtures/golden/<cert>.json` can be kept on disk
as reference data, but they no longer drive go/no-go decisions in the
sweep.

**Optional future addition.**

If/when a current Elmhurst (or Stroma / Quidos / NHER) license is
available, run a handful of representative corpus certs through it
and lock those outputs as a second-tier regression suite:
Elmhurst-parity fixtures alongside spec-parity fixtures. Not a
prerequisite.

---

## 11. Trace mode (prerequisite P5 — implementation sketch)

This section was originally labelled "recommended"; it is now
**prerequisite P5** per ADR-0010. The sweep does not start until
`intermediate` is populated everywhere. ADR-0009 proposed:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SapResult:
    sap_score: float
    ...
    intermediate: dict[str, float]
```

The `intermediate` field was never populated. Suggested implementation
for the systematic pass:

```python
intermediate = {
    # §1 dimensions
    "tfa_m2": tfa,
    "volume_m3": volume,
    "storey_count": storeys,
    # §3 heat transmission
    "walls_w_per_k": ht.walls_w_per_k,
    "roof_w_per_k": ht.roof_w_per_k,
    "floor_w_per_k": ht.floor_w_per_k,
    "party_walls_w_per_k": ht.party_walls_w_per_k,
    "windows_w_per_k": ht.windows_w_per_k,
    "doors_w_per_k": ht.doors_w_per_k,
    "thermal_bridging_w_per_k": ht.thermal_bridging_w_per_k,
    "infiltration_ach": infiltration,
    "infiltration_w_per_k": infiltration * volume * 0.33,
    "heat_transfer_coefficient_w_per_k": hlc,
    "heat_loss_parameter_w_per_m2k": hlp,
    "time_constant_h": tau_h,
    # §5 internal gains (annual averages)
    "internal_gains_annual_avg_w": ...,
    # §7 mean internal temperature (annual avg)
    "mean_internal_temp_annual_avg_c": ...,
    # §9 space heating
    "useful_space_heating_kwh_per_yr": space_heating_kwh,
    # §12 fuel costs (per end-use)
    "main_heating_cost_gbp": ...,
    "hot_water_cost_gbp": ...,
    "lighting_cost_gbp": ...,
    "pumps_fans_cost_gbp": ...,
    # §13 rating
    "ecf": ecf,
    "deflator": 0.36,
    # §14 primary energy and CO2 per end-use
    "space_heating_pe_kwh_per_m2": ...,
    "hot_water_pe_kwh_per_m2": ...,
    # ... remaining per-end-use entries elided
}
```

Once populated, the differential debugging the reviewer recommended
becomes possible: change one input field, compare deltas against an
Elmhurst export.

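
As a rough illustration of that differential workflow, two `intermediate` dicts (a baseline run and a run with one input changed) can be diffed key by key. The helper name and the example numbers below are hypothetical; only the key names are taken from the suggested `intermediate` layout in this section.

```python
def intermediate_deltas(
    baseline: dict, variant: dict, min_abs_delta: float = 1e-9
) -> dict:
    """Per-key change (variant - baseline) for keys present in both traces."""
    deltas = {}
    for key in sorted(baseline.keys() & variant.keys()):
        delta = variant[key] - baseline[key]
        if abs(delta) > min_abs_delta:
            deltas[key] = delta
    return deltas


# Hypothetical traces: the variant run has better-insulated walls.
baseline = {"walls_w_per_k": 60.0, "roof_w_per_k": 20.0, "ecf": 2.0}
variant = {"walls_w_per_k": 45.0, "roof_w_per_k": 20.0, "ecf": 1.75}

print(intermediate_deltas(baseline, variant))
```

Unchanged intermediates drop out of the result, so the output lists only the quantities the input change actually moved.
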

---

## 12. Specific section-1 starting tasks (suggested first session)

A concrete pickup point:

### Session 1 — §1 (Introduction), §2 (Property Descriptors), §3 (Dimensions)
- §1 is prose; nothing to verify.
- §2 maps to `EpcPropertyData`. Verify that every field RdSAP §2
  enumerates is present and correctly typed on the domain object.
  Specifically check: `dwelling_type`, `built_form`, `property_type`,
  `construction_age_band`, `country_code`. Note that
  `construction_age_band` is per-building-part, not dwelling-level,
  and the primary age band drives most defaults.
- §3 maps to `worksheet/dimensions.py`. Verify:
  - Total floor area sum across building parts equals TFA
  - Volume calculation (per-storey area × height)
  - Storey count handling for extensions and room-in-roof
  - Multi-storey heat-loss-perimeter rules

This single session should produce zero behaviour changes if §1-3 are
correctly implemented, but expect to find at least one issue in §3
geometry (per the reviewer's "biggest SAP error sources" list).

**Important:** Session 1 only starts after all six prerequisites in
§2.5 have landed and the Validation Cohort probe baseline has been
captured. Until then, running per-section verification produces noisy
signal.

Run the BRE worked-example fixtures (P5) + Validation Cohort probe
(P3) at the end of each session; expect no movement until you start
hitting actual gaps.

---

## 13. Workflow recap

**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency
order:

| Prereq | Slice | Depends on |
|---|---|---|
| P1 | Re-extract parquet with `inspection_date` | — |
| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — |
| P3 | Filter parity probe to Validation Cohort | P1 |
| P4 | Implement `PcdbLookup` | — (P2 helpful) |
| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — |
| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — |

P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a
Validation Cohort probe baseline once all six land — that is the new
MAE starting line. Repo-wide tests stay green throughout P6 (Site
Notes consumers, ML pipeline, recommendations, etc. all need the
mapper updates that accompany each typing change).

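
The dependency table can be checked mechanically. The sketch below encodes only the hard dependency (P3 on P1) and uses the standard-library `graphlib.TopologicalSorter` to group prerequisites into batches that can run in parallel; the "P2 helpful" hint for P4 is deliberately left out because it is advisory, and the function name is made up for this example.

```python
from graphlib import TopologicalSorter

# Node -> set of prerequisites it depends on (hard dependencies only).
PREREQUISITES = {
    "P1": set(),
    "P2": set(),
    "P3": {"P1"},
    "P4": set(),
    "P5": set(),
    "P6": set(),
}


def parallel_batches(deps: dict) -> list:
    """Group nodes into successive batches whose dependencies are all satisfied."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(PREREQUISITES))
# [['P1', 'P2', 'P4', 'P5', 'P6'], ['P3']]
```

The first batch is the set that can start immediately; P3 only becomes ready once P1 is marked done.
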
**Phase 1 — Section sweep.** For each RdSAP 10 section, in document
order:

1. Read the spec section text + cited tables.
2. Identify code location(s).
3. For each rule / table / footnote:
   - Does our code implement it?
   - Does the implementation match?
   - Edge cases / fallback paths handled?
4. For each gap: AAA unit test (preferring a BRE worked-example
   assertion on `intermediate` values when possible) → minimal
   implementation → commit.
5. **Apply the worksheet-faithful structure principle** (§5.5) as
   part of this slice: name functions after worksheet lines, split
   compound calculations, replace any remaining defensive
   type-handling with typed-enum dispatch.
6. After each commit: run `test_bre_worked_examples.py` + Validation
   Cohort probe. Note both deltas in the commit message.
7. If a BRE worked-example breaks: the new code is wrong (revert).
   The worked examples are spec-derived and cannot be broken by
   spec-correct changes.

Stick to this. The prior session's mistake was jumping between
sections based on residual-size **on a dirty probe**. Clean probe
plus document-order discipline plus worksheet-faithful structure is
what makes the sweep converge.

---

## 14. Useful references

- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md`
  — the binding decisions reflected in this rewrite: SAP 10.2 target,
  cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture
  retirement. **Read first.**
- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
  original calculator decision rationale + Session A/B/C plan. Read
  for context; spec-version target / PCDB sequencing / cert-cal
  rationale are superseded by ADR-0010.
- **Spec coverage map**
  `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
  Update as you go.
- **Parity findings**
  `docs/sap-spec/PARITY_FINDINGS.md` — empirical findings from prior
  sessions.
- **Earlier handover**
  `docs/sap-spec/HANDOVER_FRESH_REVIEW.md` — orientation from the
  previous fresh-context pass.
- **Reviewer feedback (informal)** — ChatGPT critique of the
  slice-by-slice approach. Key recommendations: two-layer architecture
  (RdSAP expansion → SAP worksheet), trace mode, golden-master
  methodology, differential debugging, reference traces from
  Elmhurst/Stroma/Quidos.
- **Commit log** — `git log --oneline` shows the slice history; each
  S-Bxx commit message documents the spec ref + measured impact.

---

## 15. Final note

The prior session's framing — *"the cert-calibration layer absorbs
Elmhurst's spec deviations; we'll re-derive it at the end"* — rested
on a false diagnosis. The cert-cal layer is
pre-March-2025 SAP prices fit against a mixture distribution of two
spec-version regimes. Once you separate the regimes (Validation
Cohort) and use spec prices everywhere, the "tension" disappears.

After P1–P5 land, the section sweep is straightforward: every
spec-correct fix is unambiguously the right answer, BRE
worked-example fixtures lock the result, and Validation Cohort probe
MAE moves monotonically downward. The fixes the prior session marked
as "spec-correct but probe-regressed" become trivially landable.

**Welcome to the project. Read ADR-0010, land the six prerequisites,
then walk the spec in document order. The deterministic answer is in
there.**