mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
docs: ADR-0010 retargets calculator to SAP 10.2; rewrite handover
Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions
of a grill-with-docs session:
1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3 — no
SAP-10.3-lodged certs exist in the corpus to validate against.
2. table_12_cert_calibration is deleted (not "re-derived at the
end"). It was pre-March-2025 spec prices fit against a mixture
distribution of two spec-version regimes, with downstream-
component bugs absorbed into the fit — not Elmhurst deviation.
3. Validation Cohort: filter the corpus to inspection_date ≥
2025-07-01 so every cert in the probe was lodged on SAP 10.2
(14-03-2025) prices. One spec, one signal.
4. PCDB integration is promoted from "Session C deferred" to
prerequisite P4 — dominates residual variance on heat pumps and
the 78% of gas-boiler certs lodging main_heating_data_source=1.
5. Trace mode (SapResult.intermediate) and BRE worked-example
fixtures replace the 7 cert-based golden fixtures, which
contained compensating errors.
6. Strict-type EpcPropertyData via codes.csv-derived canonical
enums (P6) — the in-source motivation lives at
dimensions.py:74-82 (Khalim's comment, included in this commit).
7. Worksheet-faithful structure is a sweep-time principle: each
worksheet module mirrors SAP 10.2 worksheet line numbering.
CONTEXT.md additions:
- Refined "Calculated SAP10 Performance" and "SAP10 Calculation"
to reference SAP 10.2 + ADR-0010.
- New term "SAP Spec Version" — domain-meaningful because the
same EpcPropertyData yields different sap_score under different
spec revisions.
- New term "Validation Cohort" — the version-locked sub-corpus.
HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the
six prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal
was stale prices, not Elmhurst deviation), §4 scope (PCDB IN,
SAP 10.3 stays OUT), §5 approach (worksheet-faithful principle as
§5.5), §7 tension dissolved, §7b findings re-framed, §8 dead-ends
re-classified as conditional, §9 cohort filter, §10 fixture
strategy, §11 trace mode as prerequisite, §12 prereqs-first,
§13 Phase 0/Phase 1 workflow, §14 ADR-0010 reference, §15 final
note.
P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice
(probe swap to spec prices).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
ac1aa56ab1
commit
bb9c5ac017
4 changed files with 551 additions and 209 deletions
14
CONTEXT.md
14
CONTEXT.md
|
|
@ -98,12 +98,20 @@ The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actua
|
|||
_Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values
|
||||
|
||||
**Calculated SAP10 Performance**:
|
||||
The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs. ADR-0009.
|
||||
The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs lodged on the calculator's target spec version (see [[sap-spec-version]]). ADR-0009 (as amended by ADR-0010).
|
||||
_Avoid_: calculator output, computed performance, worksheet performance, SAP10 output
|
||||
|
||||
**SAP10 Calculation**:
|
||||
The process that runs the deterministic SAP 10.3 worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 cert→input mapping, executes the 12-month heat balance per SAP 10.3 §§1-14, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009.
|
||||
_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run
|
||||
The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates.
|
||||
_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]])
|
||||
|
||||
**SAP Spec Version**:
|
||||
The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version.
|
||||
_Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision
|
||||
|
||||
**Validation Cohort**:
|
||||
The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010.
|
||||
_Avoid_: parity cohort, validation set, corpus sample
|
||||
|
||||
**Measure Application**:
|
||||
The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
|
||||
|
|
|
|||
68
docs/adr/0010-sap10-calculator-spec-target-and-validation.md
Normal file
68
docs/adr/0010-sap10-calculator-spec-target-and-validation.md
Normal file
|
|
@ -0,0 +1,68 @@
|
|||
# Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort
|
||||
|
||||
**Status: Accepted.** Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of [ADR-0009](0009-deterministic-sap-calculator.md). Adds strict typing of `EpcPropertyData` (P6) and a worksheet-faithful structural principle for the `domain/sap/worksheet/*` modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection).
|
||||
|
||||
## Why this ADR exists
|
||||
|
||||
ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans **multiple SAP spec versions** (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's `domain.sap.tables.table_12_cert_calibration` layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec — closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component.
|
||||
|
||||
This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation.
|
||||
|
||||
## Decisions
|
||||
|
||||
### 1. Active spec target is **SAP 10.2 (14-03-2025)**, not SAP 10.3
|
||||
|
||||
ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is SAP 10.2 (14-03-2025 amendment) — both the document RdSAP 10 (10-06-2025) cross-references for heating-system identification, and the amendment that current assessor software is on.
|
||||
|
||||
`packages/domain/src/domain/sap/tables/table_12.py` is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2.
|
||||
|
||||
A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3).
|
||||
|
||||
### 2. `table_12_cert_calibration` is deleted
|
||||
|
||||
The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak.
|
||||
|
||||
This includes the `cert_calibration_e7_codes` extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates — Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is in §8 of the handover as a documented dead-end.
|
||||
|
||||
`domain.sap.tables.table_12.unit_price_p_per_kwh` becomes the only price API. Parity probes are updated to use it.
|
||||
|
||||
### 3. Validation Cohort is filtered to a single spec-version window
|
||||
|
||||
Probe MAE against the full 250k-cert corpus measures both calculator correctness *and* the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy.
|
||||
|
||||
The **Validation Cohort** is the subset of corpus certs with `inspection_date ≥ 2025-07-01` — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go.
|
||||
|
||||
This requires re-extracting the training parquet to include `inspection_date` (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice.
|
||||
|
||||
### 4. PCDB integration is promoted from Session C to a prerequisite
|
||||
|
||||
ADR-0009 deferred PCDB to Session C and shipped a `NoOpPcdbLookup` stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging `main_heating_data_source=1` (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2).
|
||||
|
||||
PCDB lookup against `main_heating_index_number` is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The `NoOpPcdbLookup` seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no `main_heating_index_number` or the PCDB has no matching record.
|
||||
|
||||
## Verification infrastructure (also prerequisites)
|
||||
|
||||
Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal:
|
||||
|
||||
1. **Trace mode populated.** ADR-0009 specced `SapResult.intermediate: dict[str, float]` and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on `intermediate` so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace.
|
||||
2. **BRE worked-example unit tests.** SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired.
|
||||
3. **Strict typing of `EpcPropertyData` via canonical domain enums.** Bare `str` and `Union[int, str]` fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's `dimensions.py:74-82` is Khalim's documented example. The domain holds one canonical enum per field, derived from `datatypes/epc/domain/epc_codes.csv` (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of `EpcPropertyData` (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright `strict` mode stays clean.
|
||||
|
||||
These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5.
|
||||
|
||||
## Worksheet-faithful structure (sweep-time principle)
|
||||
|
||||
Each `domain/sap/worksheet/*.py` module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. `heat_transfer_coefficient` aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values).
|
||||
|
||||
## Consequences
|
||||
|
||||
- ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter.
|
||||
- The training parquet schema bumps when `inspection_date` is added — a non-breaking MINOR addition under [ADR-0008](0008-physics-as-feature.md)'s `Feature Schema Version` discipline.
|
||||
- The handover document `docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md` is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the five prerequisites.
|
||||
- Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available.
|
||||
|
||||
## Considered alternatives
|
||||
|
||||
- **Build versioned Table 12 (pre/post 14-March-2025) keyed on `inspection_date` and validate across the full corpus.** Rejected as more work for no signal benefit during the spec sweep — the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly.
|
||||
- **Keep cert-cal during the sweep and re-derive at the end** (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest.
|
||||
- **Pay for an Elmhurst license, lock fixtures to its output.** Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite.
|
||||
|
|
@ -24,10 +24,12 @@ The SAP/RdSAP energy assessment splits cleanly into two roles:
|
|||
takes the lodged fields and produces SAP score, CO2 emissions,
|
||||
primary energy (PEUI), CO2 per m², EI rating, etc.
|
||||
|
||||
**Our calculator is replicating role #2.** Where Elmhurst's
|
||||
implementation diverges from spec, we follow Elmhurst, but we don't
|
||||
guess at divergence; we localise it via reference traces or
|
||||
empirically against the cert corpus.
|
||||
**Our calculator is replicating role #2.** Assessor software
|
||||
implements the SAP 10.2 spec faithfully; the question of "where does
|
||||
Elmhurst diverge from spec?" is no longer the operative one (per
|
||||
ADR-0010 + §3 below). Our job is to enumerate every spec
|
||||
table / formula / footnote and verify each against the published SAP
|
||||
10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs.
|
||||
|
||||
There is no "assessor judgement" knob to tune. Each field on the cert
|
||||
has a deterministic interpretation per the spec. Each spec table /
|
||||
|
|
@ -57,109 +59,259 @@ all of them and verify each.
|
|||
Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known
|
||||
caveat: some of these are compensating-error matches (e.g. cert
|
||||
`7536-3827`'s PE matches but cost is £143 under cert's implied cost
|
||||
due to multi-factor offsetting bugs).
|
||||
due to multi-factor offsetting bugs). **These fixtures are retired
|
||||
per ADR-0010 and §10 below — they lock buggy compensating outputs
|
||||
in place and will fight the spec sweep.**
|
||||
|
||||
> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md)
|
||||
> supersedes the spec-version target, the PCDB sequencing, and the
|
||||
> cert-calibration layer of ADR-0009. This handover document was
|
||||
> originally written under the rejected framing; §3, §4, §7, §7b,
|
||||
> §10 below have been rewritten in lockstep. §2.5 lists the five
|
||||
> prerequisites that land **before** the section-by-section sweep
|
||||
> starts.
|
||||
|
||||
---
|
||||
|
||||
## 3. Why we are pivoting to systematic review
|
||||
## 2.5. Prerequisites before the sweep starts
|
||||
|
||||
Five blockers, in dependency order. The section sweep does not start
|
||||
until all five are merged. Together they convert the parity probe
|
||||
from a noisy mixture-distribution signal into a clean per-section
|
||||
verification tool.
|
||||
|
||||
### P1 — Re-extract the training parquet with `inspection_date`
|
||||
|
||||
The 250k-cert parquet has 202 columns; **none of them are dates**.
|
||||
Without `inspection_date` on each cert we cannot construct the
|
||||
Validation Cohort (P3). The ETL currently drops the dates; add them
|
||||
back as a non-breaking MINOR Feature Schema Version bump (per
|
||||
ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date`
|
||||
both exist on the domain object and are populated upstream — the
|
||||
parquet writer just needs to include them.
|
||||
|
||||
### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12`
|
||||
|
||||
Per ADR-0010 §2 and §1:
|
||||
- Remove `table_12_cert_calibration.py` and every call site
|
||||
(`cert_calibration_prices()`, `cert_calibration_e7_codes`, the
|
||||
`PriceTable` constructor argument that defaults to it).
|
||||
- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`.
|
||||
- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors).
|
||||
- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p
|
||||
is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025).
|
||||
|
||||
### P3 — Filter the parity probe to the Validation Cohort
|
||||
|
||||
`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3:
|
||||
`inspection_date ≥ 2025-07-01`. Modify
|
||||
`services/ml_training_data/src/ml_training_data/sap_parity_probe.py`
|
||||
to apply the filter before sampling. The probe sample size and seed
|
||||
remain configurable; `sap_score ∈ [5, 99]` remains the typicality
|
||||
filter on top of the cohort filter.
|
||||
|
||||
### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`)
|
||||
|
||||
Per ADR-0010 §4. Download boiler + heat-pump CSVs from
|
||||
https://www.ncm-pcdb.org.uk. Build a lookup keyed on
|
||||
`main_heating_index_number`. Surface seasonal efficiency, secondary
|
||||
efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day
|
||||
of work per the original handover estimate. The
|
||||
`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from
|
||||
ADR-0009 grill outcome #1 is the integration point; no calculator-side
|
||||
changes needed beyond reading `index_number` and routing PCDB-returns
|
||||
to space-heating / hot-water efficiency lookups instead of Table 4a.
|
||||
|
||||
### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples
|
||||
|
||||
Per ADR-0010 "Verification infrastructure":
|
||||
- Populate every named SAP 10.2 worksheet variable on
|
||||
`SapResult.intermediate` as sketched in §11. This is mechanical —
|
||||
thread the values from each worksheet module into the dict.
|
||||
- Transcribe the BRE worked examples from the SAP 10.2 appendices and
|
||||
RdSAP 10 worked-example annex into unit tests
|
||||
(`tests/test_bre_worked_examples.py`) that lock per-intermediate
|
||||
values, not aggregate SAP. These replace the retired cert fixtures.
|
||||
|
||||
### P6 — Strict-type `EpcPropertyData` via canonical domain enums
|
||||
|
||||
The current `EpcPropertyData` and its nested types carry many bare
|
||||
`str` fields and `Union[int, str]` fields (the latter because the
|
||||
gov API gives ints and Site Notes give strings). The defensive
|
||||
type-handling cascades into the calculator (`cert_to_inputs.py`,
|
||||
`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented
|
||||
example: `SapBuildingPart.identifier` carries main-vs-extension
|
||||
information but is bare `str`, so the dimensions code defensively
|
||||
iterates instead of dispatching on a typed kind.
|
||||
|
||||
The fix:
|
||||
1. **One canonical enum per field**, union of all keys appearing
|
||||
across all schema versions in
|
||||
`datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum
|
||||
classes (`built_form`, `construction_age_band`, `energy_tariff`,
|
||||
`glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`,
|
||||
`mechanical_ventilation`, `property_type`, `tenure`,
|
||||
`transaction_type`, `ventilation_type`, `water_heating_fuel`,
|
||||
`cylinder_insulation_thickness`, `energy_efficiency_rating`,
|
||||
`improvement_description`, `improvement_summary`, `code`) plus
|
||||
`BuildingPartKind` (Main Dwelling / Extension N). codes.csv is
|
||||
the reference; a dedup script can optionally verify coverage but
|
||||
is not a build dependency.
|
||||
2. **The API mapper** parses raw ints into the canonical enum.
|
||||
3. **The Site Notes mapper** parses raw strings into the canonical
|
||||
enum.
|
||||
4. **The domain object** (`EpcPropertyData` and nested) holds only
|
||||
the canonical enums — no `Union[int, str]`, no bare `str` for
|
||||
coded fields.
|
||||
5. **Every consumer** (calculator, ML pipeline, recommendations,
|
||||
ETL, scenario builder) reads from the typed fields.
|
||||
|
||||
**Constraint**: repo-wide tests must keep passing. The calculator
|
||||
is one consumer; the ML pipeline, recommendations, and the Site
|
||||
Notes ingestion path also consume `EpcPropertyData`. Each mapper-
|
||||
layer change is paired with adapter updates that preserve the
|
||||
behaviour the existing tests cover.
|
||||
|
||||
Pyright `strict` mode must remain clean (CLAUDE.md).
|
||||
|
||||
### Expected outcome of P1–P6
|
||||
|
||||
After all six land, run the probe against the Validation Cohort. The
|
||||
expected baseline MAE on the clean probe is much smaller than the
|
||||
current 4.61 — likely 1.5–2.5 SAP-points based on what we know about
|
||||
the residual breakdown (heat pumps closed by P4, gas boilers tightened
|
||||
by P4, price-version noise removed by P2+P3). The remaining residual
|
||||
is the genuine spec sweep target — and per-section fixes will move
|
||||
the probe in measurable, distinguishable amounts because there's no
|
||||
compensating layer to mask them, and there's no defensive type
|
||||
branching obscuring which input value drove which intermediate.
|
||||
|
||||
---
|
||||
|
||||
## 3. Why the prior diagnosis was wrong and how we fixed it
|
||||
|
||||
The prior session shipped ten slices (S-B23 → S-B31) by debugging the
|
||||
biggest residuals one at a time:
|
||||
|
||||
- **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress
|
||||
on the demand-side calculation.
|
||||
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is
|
||||
bottlenecked by cert-calibration prices that absorb multiple
|
||||
structural deviations from spec, making any single slice that fixes
|
||||
one component break the calibration for others.
|
||||
- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time
|
||||
as "cert-calibration absorbs multiple spec deviations".
|
||||
|
||||
Two failed slice attempts in the prior session exposed the pattern:
|
||||
Three slice attempts looked like they "proved" the cert-cal-absorbs-
|
||||
deviations diagnosis:
|
||||
|
||||
- **Standing charges**: spec note Table 12 (a) clearly says gas standing
|
||||
charge of £92 is added to space + water heating costs for energy
|
||||
ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62.
|
||||
Reverted before committing.
|
||||
- **Cat=10 room heaters off-peak routing**: Table 12a clearly says
|
||||
"Other direct-acting electric heating" bills 100% high rate on
|
||||
7-hour tariff. Empirically: switching cat=10 from off-peak to
|
||||
standard rate inverted the bias from +5.88 to −6.00 without
|
||||
improving MAE. Reverted before committing.
|
||||
- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer +
|
||||
Table 3 footer clearly say combi boilers using Table 4b efficiency
|
||||
have zero storage + primary loss. Empirically: zeroing them dropped
|
||||
PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke
|
||||
3 of 7 golden fixtures. Reverted because no way to know whether to
|
||||
follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without
|
||||
reference traces.
|
||||
- **Standing charges**: spec Table 12 note (a) requires £92/yr gas
|
||||
standing charge on space + water heating. Adding it pushed SAP bias
|
||||
+0.98 → −2.62. Reverted.
|
||||
- **Cat=10 room heaters off-peak routing**: Table 12a says "other
|
||||
direct-acting electric heating" bills 100 % high rate on 7-hour
|
||||
tariff. Switching cat=10 from off-peak to standard rate inverted
|
||||
the bias +5.88 → −6.00. Reverted.
|
||||
- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table
|
||||
3 footers require zero storage + primary loss when efficiency comes
|
||||
from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP
|
||||
MAE +0.39 and broke 3 of 7 golden fixtures. Reverted.
|
||||
|
||||
The pattern: **the cert-calibration prices** (in
|
||||
`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered
|
||||
to match Elmhurst's output assuming all our other calculations are
|
||||
correct.** When we fix a spec-violation bug in some other component, we
|
||||
break the calibration and SAP MAE goes up even though we're more
|
||||
spec-correct.
|
||||
The prior agent concluded: *cert-calibration absorbs Elmhurst's
|
||||
deviations from spec — we can't fix one without re-deriving the
|
||||
calibration, so do a full spec sweep first and re-derive cert-cal at
|
||||
the end.* This diagnosis is **wrong** and the proposed remedy
|
||||
amplifies the problem.
|
||||
|
||||
This means **whack-a-mole on the biggest residual won't converge**. We
|
||||
need to systematically verify every component against the spec, then
|
||||
re-derive the cert-calibration once at the end.
|
||||
### What was actually going on
|
||||
|
||||
The 250k-cert corpus spans multiple SAP spec-version regimes:
|
||||
- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment
|
||||
0 prices — mains gas ~3.48 p, standard electricity 13.19 p.
|
||||
- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices
|
||||
— mains gas 3.64 p, standard electricity 16.49 p.
|
||||
|
||||
The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the
|
||||
older spec's prices**, not Elmhurst deviations from the spec. They
|
||||
are an empirical "best fit" across a mixture distribution of two
|
||||
price regimes, with downstream-component bugs (PCDB absence, HW
|
||||
cylinder loss applied to combi, etc.) absorbed into the fit. The
|
||||
table looks like compensation for assessor-software quirks because we
|
||||
were never told which spec each cert was on.
|
||||
|
||||
Each "spec-correct fix that worsened MAE" in the failed slices above
|
||||
was actually correct. The MAE regressed because:
|
||||
1. The cert-cal prices (pre-March-2025 spec) cancelled with one set
|
||||
of downstream errors to produce a quasi-stable cost.
|
||||
2. The spec-correct fix landed → that cancellation broke → the
|
||||
probe MAE went up.
|
||||
3. But the spec-correct fix was *right* — what regressed was a
|
||||
compensating-error equilibrium, not the calculator's truth.
|
||||
|
||||
The prior session's "re-derive cert-cal at the end" plan would
|
||||
re-establish a new compensating-error equilibrium across the new bug
|
||||
set. It does not converge on spec-correctness.
|
||||
|
||||
### The fix (per ADR-0010)
|
||||
|
||||
1. **Stop fitting against a mixture distribution.** Filter the
|
||||
validation corpus to a single spec-version window (Validation
|
||||
Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort
|
||||
was lodged on SAP 10.2 (14-03-2025) prices.
|
||||
2. **Delete the cert-calibration layer.** Use spec prices everywhere
|
||||
(`domain.sap.tables.table_12`). The only price-routing decision
|
||||
left is Table 12a fractional high-rate blending — a real spec
|
||||
feature, not a calibration.
|
||||
3. **Build PCDB**, because it dominates residual variance and the
|
||||
reason it was deferred (cert-cal-absorbs-PCDB) no longer holds.
|
||||
4. **Build trace mode and BRE worked-example fixtures**, so
|
||||
per-section verification works against single-cert intermediates
|
||||
instead of aggregate corpus MAE.
|
||||
|
||||
This is what §2.5 lists as the five prerequisites. Once they land,
|
||||
the section-by-section spec sweep produces clean, monotonic
|
||||
improvements.
|
||||
|
||||
---
|
||||
|
||||
## 4. Scope decisions
|
||||
## 4. Scope decisions (per ADR-0010)
|
||||
|
||||
### IN scope
|
||||
- **RdSAP 10 specification (10-06-2025)** — full document, all sections
|
||||
(`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages).
|
||||
- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables,
|
||||
appendices that RdSAP 10 references
|
||||
(`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages).
|
||||
- **SAP 10.2 (14-03-2025 amendment)** is the active spec target.
|
||||
`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages.
|
||||
- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that
|
||||
cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`,
|
||||
114 pages.
|
||||
- **PCDB integration.** Moved from "Session C deferred" to **P4
|
||||
prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs
|
||||
lodging `main_heating_data_source=1` need PCDB-sourced efficiency
|
||||
for the calculator to be spec-correct. Data source:
|
||||
https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`;
|
||||
fields: seasonal efficiency, secondary efficiency, output kW,
|
||||
flow-temperature curve (HPs).
|
||||
- **All RdSAP 10 sections in document order.** §1 → §§19, plus
|
||||
Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is
|
||||
unchanged — only the precondition changes: the sweep runs against a
|
||||
clean probe (Validation Cohort + spec prices + PCDB + trace mode).
|
||||
|
||||
### OUT of scope (for now)
|
||||
- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated
|
||||
U-value in `walls[i].description` (e.g.
|
||||
### OUT of scope
|
||||
- **Full SAP assessments.** Full-SAP certs lodge measured/calculated
|
||||
U-values in `walls[i].description` (e.g.
|
||||
"Average thermal transmittance 0.18 W/m²K"). These are a separate
|
||||
calculation path (BS EN ISO 6946) and a different corpus. **Park them
|
||||
until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29
|
||||
calculation path (BS EN ISO 6946) and a different corpus. Park
|
||||
until the RdSAP 10 base case parity is reached. S-B24 / S-B29
|
||||
attempted partial handling; those slices can stay or be reverted at
|
||||
your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2.
|
||||
- **PCDB (Product Characteristics Database).** ADR-0009 deferred this
|
||||
to Session C. **This is a real future task, not a permanent
|
||||
exclusion.** Heat pumps (cat=4) have catastrophic per-cert MAE (19
|
||||
SAP points) because we use Table 4a fallback efficiency 2.30
|
||||
instead of PCDB SCOP (typically 2.80-3.50). Gas boilers with
|
||||
`main_heating_data_source=1` (78% of corpus boiler certs) fall back
|
||||
to a category-default 0.80 vs typical PCDB-listed condensing-boiler
|
||||
efficiencies of 0.88-0.94 — that's most of the per-cert SAP residual
|
||||
variance on gas certs.
|
||||
|
||||
A `NoOpPcdbLookup` stub seam exists in Session A (per ADR-0009 grill
|
||||
outcome #1). The fetch+parse work is non-trivial:
|
||||
- **Data source**: BRE PCDB at https://www.ncm-pcdb.org.uk —
|
||||
boilers + heat pumps are downloadable CSVs (thousands of rows
|
||||
each).
|
||||
- **Lookup key**: cert lodges `main_heating_index_number` which is
|
||||
the PCDB product ID. Match by that.
|
||||
- **Per-product fields needed**: seasonal efficiency, secondary
|
||||
efficiency, output kW, flow-temperature curve (for HPs).
|
||||
- **Effort**: ~half-day for the lookup + tests; ongoing maintenance
|
||||
when BRE publishes new PCDB revisions.
|
||||
|
||||
**Recommended sequencing**: complete the systematic RdSAP spec
|
||||
sweep first. Once the spec-correct engine is built and cert-cal
|
||||
re-derived, PCDB integration should drop heat-pump residuals from
|
||||
19 SAP points to ~1, and tighten the gas-boiler residual variance.
|
||||
At that point heat pumps (cat=4) and PCDB-listed boilers
|
||||
(`main_heating_data_source=1`) become accessible.
|
||||
|
||||
**Why not now**: the cert-calibration prices currently absorb the
|
||||
missing PCDB efficiency (HP costs at off-peak rate compensates for
|
||||
too-low SCOP). Fixing PCDB without re-deriving cert-cal would push
|
||||
HP certs in the wrong direction. Same lesson as the other reverted
|
||||
fixes in §7b — fix the spec layer first, the calibration layer
|
||||
later.
|
||||
- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has
|
||||
identical Table 12 codes (only values shift). Don't update spec
|
||||
references to 10.3 until the corpus migrates.
|
||||
- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus,
|
||||
so it cannot be validated. Calculator targets SAP 10.2 until the
|
||||
corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP
|
||||
to reference SAP 10.3). Note: `table_12.py` currently mixes SAP
|
||||
10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2.
|
||||
- **Historical-spec cert reproduction.** Calculating what cert SAP
|
||||
*would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is
|
||||
not the calculator's job. Lodged Performance carries the historical
|
||||
value; Calculated SAP10 Performance is current-spec only. The
|
||||
Validation Cohort filter operationalises this — older certs are
|
||||
out of the validation loop, not because they're "wrong" but because
|
||||
they're a different spec's output.
|
||||
- **Re-deriving cert-cal at the end.** The prior session's plan. The
|
||||
cert-calibration layer is deleted in P2, not re-fit.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -208,22 +360,49 @@ For each table, formula, footnote, exception:
|
|||
3. Are there spec-defined edge cases / footnotes we're missing?
|
||||
|
||||
### 5.4. When a gap is found
|
||||
- Write a failing unit test that asserts the spec-correct behaviour.
|
||||
- Write a failing unit test that asserts the spec-correct behaviour
|
||||
— wherever possible, write it as an assertion on `intermediate`
|
||||
values rather than on aggregate SAP, using a BRE worked example
|
||||
if one covers the section.
|
||||
- Implement the fix.
|
||||
- Run **all 7 golden fixtures** plus the broader probe. Note both
|
||||
direction and magnitude of change.
|
||||
- If the fix is spec-correct but breaks a golden fixture, this is
|
||||
evidence that the fixture was a compensating-error case — proceed
|
||||
with the spec-correct fix and update the fixture (with a comment
|
||||
noting it was a compensating case).
|
||||
- Commit per-slice as before: one section → one commit. Reference the
|
||||
spec section in the commit message.
|
||||
- Run `test_bre_worked_examples.py` plus the Validation Cohort
|
||||
probe. Note both direction and magnitude of change.
|
||||
- If a BRE worked-example breaks, the new code is wrong (revert).
|
||||
BRE examples are spec-derived and cannot regress from a
|
||||
spec-correct change.
|
||||
- Commit per-slice: one section → one commit. Reference the spec
|
||||
section in the commit message.
|
||||
|
||||
### 5.5. Use trace mode when you need it
|
||||
ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field
|
||||
that was never populated. Adding this is highly recommended for the
|
||||
systematic pass — each section's verification benefits from
|
||||
inspecting the intermediate values. See §11 below for a sketch.
|
||||
### 5.5. Sweep-time principle: worksheet-faithful structure
|
||||
|
||||
Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet
|
||||
structure for its section. As you verify a section, also restructure
|
||||
its module so that:
|
||||
|
||||
1. **Each function name references its worksheet-line origin** (e.g.
|
||||
`heat_transfer_coefficient` aligns with worksheet line (40);
|
||||
`mean_internal_temperature` aligns with worksheet line (93)).
|
||||
2. **Compound calculations are split** into one function per
|
||||
worksheet line where possible — easier to verify against
|
||||
`intermediate[...]` and against BRE worked-example values.
|
||||
3. **Defensive type-handling disappears**. Once P6 lands, the input
|
||||
is a typed enum or numeric — branching on `isinstance(x, int)` is
|
||||
replaced by enum dispatch.
|
||||
4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind ==
|
||||
BuildingPartKind.MAIN_DWELLING` replaces string sniffing of
|
||||
`identifier`. The dimensions.py "unnecessarily complicated"
|
||||
pattern Khalim flagged is the canonical example of what *not*
|
||||
to do.
|
||||
|
||||
The principle applies during section-sweep slices. It is **not**
|
||||
a separate prerequisite — the refactor lands with the verification
|
||||
slice for the section it touches.
|
||||
|
||||
### 5.6. Use trace mode when you need it
|
||||
P5 populates `SapResult.intermediate: dict[str, float]` with every
|
||||
named SAP 10.2 worksheet variable. Each section's verification
|
||||
benefits from inspecting these values per-cert. See §11 below for
|
||||
the sketch.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -350,59 +529,60 @@ touched and what the current state is.
|
|||
|
||||
---
|
||||
|
||||
## 7. The cert-calibration vs spec-correctness tension
|
||||
## 7. The cert-calibration "tension" is dissolved (per ADR-0010)
|
||||
|
||||
This is THE central architectural decision you have to make as you
|
||||
work through the spec.
|
||||
This section originally framed cert-calibration vs spec-correctness as
|
||||
two end-states the calculator had to choose between. That framing is
|
||||
wrong (see §3 for the actual diagnosis): the cert-cal values are
|
||||
pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2.
|
||||
Once the corpus is filtered to the Validation Cohort (P3) and the
|
||||
cert-cal layer is deleted (P2), the false dichotomy disappears.
|
||||
|
||||
### Two tables of fuel prices
|
||||
- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec
|
||||
values (3.64p gas, 16.49p standard elec).
|
||||
- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH`
|
||||
— empirically lower values (3.48p gas, 13.19p elec) that match the
|
||||
cert assessor software's output.
|
||||
### What replaces this section
|
||||
|
||||
### Two possible end states for the calculator
|
||||
- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP
|
||||
10.2 14-03-2025 amendment, CO2 factors corrected per P2).
|
||||
- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every
|
||||
cert lodged on the calculator's target spec version.
|
||||
- **One verification mechanism.** Trace-mode intermediates + BRE
|
||||
worked-example unit tests for per-section verification; Validation
|
||||
Cohort probe MAE for aggregate go/no-go.
|
||||
|
||||
**End state A — Spec-perfect.** Use spec prices, apply every spec rule
|
||||
(standing charges, Table 12a fractions, combi zero-loss, etc.). The
|
||||
calculator output is then what a *correct SAP 10.2 implementation*
|
||||
would produce. SAP MAE against the corpus will likely worsen because
|
||||
Elmhurst doesn't perfectly implement spec.
|
||||
|
||||
**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce
|
||||
Elmhurst's deviations exactly. The calculator output matches cert
|
||||
SAP scores. The calculator becomes a "reverse-engineered Elmhurst
|
||||
clone" rather than a SAP 10.2 implementation.
|
||||
|
||||
### The pragmatic recommendation
|
||||
|
||||
**Aim for state A but track state B as the parity probe.** Concretely:
|
||||
|
||||
1. Verify each spec section in isolation; fix spec violations
|
||||
regardless of MAE impact, but commit each fix WITH a measured
|
||||
probe delta in the commit message.
|
||||
2. After the spec sweep is complete, the calculator's output is
|
||||
spec-correct. The corpus residual at that point is Elmhurst's
|
||||
deviation from spec.
|
||||
3. THEN re-derive the cert-calibration prices to match Elmhurst's
|
||||
deviation pattern. The calibration becomes a thin Elmhurst-
|
||||
compatibility layer on top of a spec-correct engine.
|
||||
|
||||
This avoids the whack-a-mole problem because state A is unambiguous:
|
||||
each fix is either spec-correct or not. State B is iterative on top
|
||||
of state A, not entangled with it.
|
||||
Cert-software deviations from spec, if they exist at all, are
|
||||
expected to be small and localised. They surface as residual after
|
||||
the spec sweep completes against a clean probe — and at that point
|
||||
the question is whether to chase them at all (Elmhurst-deviation
|
||||
fixes have low domain value compared to spec-correctness, given the
|
||||
calculator's product use case is scoring counterfactuals for the
|
||||
MeasureApplicator chain, not reproducing historical certs).
|
||||
|
||||
---
|
||||
|
||||
## 7b. Outstanding findings to pick up during the systematic pass
|
||||
|
||||
The prior session identified several spec-correct fixes that were
|
||||
**reverted because they made SAP MAE worse against the corpus, but the
|
||||
spec basis is unambiguous and the fixes WILL be the right answer once
|
||||
the cert-calibration is re-derived against a clean engine.** Treat
|
||||
these as TODOs the systematic pass should encounter when it reaches
|
||||
the relevant section. They're listed here so the work isn't lost.
|
||||
reverted because they made SAP MAE worse against the **full corpus**.
|
||||
The empirical signal that "reverted" them was version-mixture noise
|
||||
(see §3) plus compensating-error breakage in the 7 retired golden
|
||||
fixtures. Each fix below is **expected to land cleanly** once the
|
||||
five prerequisites in §2.5 are done, because:
|
||||
|
||||
- The Validation Cohort (P3) is on a single spec version — the price
|
||||
mismatch that drove the bias regression on standing charges and
|
||||
cat=10 routing disappears.
|
||||
- The cert-cal layer is gone (P2) — no calibration to "break".
|
||||
- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals
|
||||
that dominated per-cert MAE collapse before any of these findings
|
||||
even matter.
|
||||
- The fixtures are now BRE worked examples (P5 + §10) — they cannot
|
||||
be broken by spec-correct changes because they are themselves
|
||||
derived from the spec.
|
||||
|
||||
Treat each finding as a section-sweep TODO. The empirical impacts
|
||||
below were measured against the **dirty probe** (full corpus + cert-cal
|
||||
+ no PCDB) and are **not predictive** of behaviour on the clean probe.
|
||||
Re-measure each fix against the Validation Cohort after prerequisites
|
||||
land.
|
||||
|
||||
### Finding 1 — HW cylinder zero-loss rule for combi boilers
|
||||
**Status**: spec-correct fix exists in working-tree-only form
|
||||
|
|
@ -587,32 +767,45 @@ direction.
|
|||
|
||||
## 8. Don't repeat — known dead-ends
|
||||
|
||||
> **Re-read after §3 + §7b.** Three entries below were classified as
|
||||
> "dead-ends because cert-cal absorbs" — that diagnosis is wrong.
|
||||
> They are spec-correct fixes that were measured under a noisy probe.
|
||||
> Now flagged as **conditional dead-ends**: dead only if you try them
|
||||
> before P1–P5 land. After prerequisites: they are expected
|
||||
> improvements, not dead-ends. See ADR-0010.
|
||||
|
||||
- ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) —
|
||||
over-corrected because it routed to the (Unfilled cavity, 50mm) row
|
||||
instead of the dedicated Filled cavity row. The right fix landed in
|
||||
S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher.
|
||||
- ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`**
|
||||
(S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is
|
||||
intentionally conservative; PCDB is needed for real efficiency.
|
||||
- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor
|
||||
uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10).
|
||||
Use `cert_calibration_prices()` for the probe.
|
||||
intentionally conservative; PCDB (P4 prerequisite) supplies the
|
||||
real efficiency.
|
||||
- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under
|
||||
the dirty probe, cert-cal prices fit better. **Inverts under the
|
||||
clean probe (P2 + P3): SAP 10.2 spec prices are correct because the
|
||||
Validation Cohort is on the 14-03-2025 amendment.** Listed here
|
||||
only as a warning if you start the sweep before prerequisites land.
|
||||
- ❌ **Always applying 10% secondary heating** — must be conditional on
|
||||
cert lodging or main system being electric storage (S-B20). See
|
||||
spec Appendix A.4.
|
||||
- ❌ **Respecting `main_heating_fraction` for secondary allocation**
|
||||
(failed S-B30) — the field is the multi-main allocation (system 1 vs
|
||||
system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse).
|
||||
- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
|
||||
spec-correct per Table 12a but inverts bias direction. Cert-cal
|
||||
calibration absorbs the deviation.
|
||||
- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) —
|
||||
spec-correct per Table 12 note (a) but pushes SAP bias from +0.98
|
||||
to −2.62. Cert-cal calibration absorbs.
|
||||
- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted
|
||||
S-B32) — spec-correct per Table 2 + Table 3 footers and drops PE
|
||||
MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden
|
||||
fixtures. Decision deferred to systematic pass.
|
||||
- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) —
|
||||
spec-correct per Table 12a. The bias inversion under the dirty
|
||||
probe was driven by cert-cal compensating; on the clean probe this
|
||||
is just spec-correct. Land as part of the §12 spec sweep after
|
||||
prerequisites.
|
||||
- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) —
|
||||
spec-correct per Table 12 note (a). Same logic: bias drift under
|
||||
dirty probe is version-mixture + missing-PCDB noise, not Elmhurst
|
||||
deviation. Land as part of §12 spec sweep.
|
||||
- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted
|
||||
S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE
|
||||
regression was driven by the now-retired golden fixtures (§10) and
|
||||
cert-cal absorption. Land as part of §4 / Appendix J sweep.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -620,10 +813,15 @@ direction.
|
|||
|
||||
### Sample
|
||||
`data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the
|
||||
250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and
|
||||
250k-cert parquet. **After P1 lands** the parquet carries
|
||||
`inspection_date`; the probe then filters to the **Validation Cohort**
|
||||
(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and
|
||||
samples 300 at seed=7 by default. Filtering rationale:
|
||||
- ≤ 5 is heritage/anomaly stock (sub-3% of corpus)
|
||||
- ≤ 5 is heritage/anomaly stock (sub-3 % of corpus)
|
||||
- ≥ 99 is full-SAP new-builds the parquet excludes anyway
|
||||
- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on
|
||||
SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md)
|
||||
/ "Validation Cohort" and ADR-0010 §3.
|
||||
|
||||
### Run the probe
|
||||
```bash
|
||||
|
|
@ -654,38 +852,58 @@ main(['300','7'])
|
|||
|
||||
---
|
||||
|
||||
## 10. The 7 golden fixtures
|
||||
## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5)
|
||||
|
||||
The 7 cert-based fixtures at
|
||||
`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`
|
||||
locks 7 corpus certs as regression anchors:
|
||||
were locked in against the current calculator state — *with* cert-cal,
|
||||
*without* PCDB, *with* HW cylinder loss always applied, *with* the
|
||||
lighting heuristic, etc. They are documented in §3 / the prior
|
||||
handover as containing compensating errors. Once the prerequisites
|
||||
land, every spec-correct fix breaks at least one of them. They will
|
||||
fight the spec sweep.
|
||||
|
||||
| Cert | TFA | Cat | Notes |
|
||||
|---|---|---|---|
|
||||
| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 |
|
||||
| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB |
|
||||
| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB |
|
||||
| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 |
|
||||
| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) |
|
||||
| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB |
|
||||
| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 |
|
||||
### Replacement strategy
|
||||
|
||||
Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as
|
||||
the spec sweep progresses.**
|
||||
**Primary regression suite: BRE worked-example fixtures.**
|
||||
|
||||
The cert JSONs are stored under `fixtures/golden/<cert>.json` —
|
||||
frozen at extraction time so the test is reproducible without
|
||||
bulk-zip access. The probe extraction script for new fixtures is
|
||||
inlined in the test history (see commit `f4a8d2a0`).
|
||||
Transcribe the worked examples from:
|
||||
- SAP 10.2 spec appendices (especially Appendix R — reference values
|
||||
and the worked example dwelling).
|
||||
- RdSAP 10 (10-06-2025) worked-example annex.
|
||||
|
||||
**Important caveat**: some of these 7 are compensating-error matches
|
||||
(see §3). When a spec-correct slice breaks one, the fixture is
|
||||
probably the compensating case — investigate before reverting.
|
||||
Each worked example becomes a unit test that locks **per-intermediate
|
||||
expected values** (HLP, HTC, mean internal temperature monthly, MIT,
|
||||
ECF, SAP score) rather than the aggregate SAP score alone. Because
|
||||
they are spec-derived, no spec-correct change can break them — any
|
||||
break is an implementation bug, unambiguously.
|
||||
|
||||
These tests live at
|
||||
`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py`
|
||||
(new module — separate from the cert-based fixtures module).
|
||||
|
||||
**Cert-based fixtures retired.**
|
||||
|
||||
The current `test_golden_fixtures.py` is either deleted or repurposed
|
||||
as a *very loose* smoke-test integration suite (e.g. `|SAP residual|
|
||||
≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs
|
||||
under `fixtures/golden/<cert>.json` can be kept on disk as reference
|
||||
data, but they no longer drive go/no-go decisions in the sweep.
|
||||
|
||||
**Optional future addition.**
|
||||
|
||||
If/when a current Elmhurst (or Stroma / Quidos / NHER) license is
|
||||
available, run a handful of representative corpus certs through it
|
||||
and lock those outputs as a second-tier regression suite — Elmhurst-
|
||||
parity fixtures alongside spec-parity fixtures. Not a prerequisite.
|
||||
|
||||
---
|
||||
|
||||
## 11. Trace mode (recommended infrastructure)
|
||||
## 11. Trace mode (prerequisite P5 — implementation sketch)
|
||||
|
||||
ADR-0009 proposed:
|
||||
This section was originally labelled "recommended"; it is now
|
||||
**prerequisite P5** per ADR-0010. The sweep does not start until
|
||||
`intermediate` is populated everywhere. ADR-0009 proposed:
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class SapResult:
|
||||
|
|
@ -765,14 +983,39 @@ This single session should produce zero behaviour changes if §1-3 are
|
|||
correctly implemented, but expect to find at least one issue in §3
|
||||
geometry (per the reviewer's "biggest SAP error sources" list).
|
||||
|
||||
Run the golden fixtures + probe at the end of each session; expect no
|
||||
movement until you start hitting actual gaps.
|
||||
**Important:** Session 1 only starts after all five prerequisites in
|
||||
§2.5 have landed and the Validation Cohort probe baseline has been
|
||||
captured. Until then, running per-section verification produces noisy
|
||||
signal.
|
||||
|
||||
Run the BRE worked-example fixtures (P5) + Validation Cohort probe
|
||||
(P3) at the end of each session; expect no movement until you start
|
||||
hitting actual gaps.
|
||||
|
||||
---
|
||||
|
||||
## 13. Workflow recap
|
||||
|
||||
For each section, in order:
|
||||
**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency
|
||||
order:
|
||||
|
||||
| | Slice | Depends on |
|
||||
|---|---|---|
|
||||
| P1 | Re-extract parquet with `inspection_date` | — |
|
||||
| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — |
|
||||
| P3 | Filter parity probe to Validation Cohort | P1 |
|
||||
| P4 | Implement `PcdbLookup` | — (P2 helpful) |
|
||||
| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — |
|
||||
| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — |
|
||||
|
||||
P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a
|
||||
Validation Cohort probe baseline once all six land — that is the new
|
||||
MAE starting line. Repo-wide tests stay green throughout P6 (Site
|
||||
Notes consumers, ML pipeline, recommendations, etc. all need the
|
||||
mapper updates that accompany each typing change).
|
||||
|
||||
**Phase 1 — Section sweep.** For each RdSAP 10 section, in document
|
||||
order:
|
||||
|
||||
1. Read the spec section text + cited tables.
|
||||
2. Identify code location(s).
|
||||
|
|
@ -780,22 +1023,36 @@ For each section, in order:
|
|||
- Does our code implement it?
|
||||
- Does the implementation match?
|
||||
- Edge cases / fallback paths handled?
|
||||
4. For each gap: AAA unit test → minimal implementation → commit.
|
||||
5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`)
|
||||
and the parity probe. Note both deltas in the commit message.
|
||||
6. If a golden fixture breaks: investigate. Either fixture was a
|
||||
compensating case (acceptable to break) or the new code is wrong
|
||||
(revert).
|
||||
4. For each gap: AAA unit test (preferring a BRE worked-example
|
||||
assertion on `intermediate` values when possible) → minimal
|
||||
implementation → commit.
|
||||
5. **Apply the worksheet-faithful structure principle** (§5.5) as
|
||||
part of this slice: name functions after worksheet lines, split
|
||||
compound calculations, replace any remaining defensive
|
||||
type-handling with typed-enum dispatch.
|
||||
6. After each commit: run `test_bre_worked_examples.py` + Validation
|
||||
Cohort probe. Note both deltas in the commit message.
|
||||
7. If a BRE worked-example breaks: the new code is wrong (revert).
|
||||
The worked examples are spec-derived and cannot be broken by
|
||||
spec-correct changes.
|
||||
|
||||
Stick to this. The prior session's mistake was jumping between
|
||||
sections based on residual-size. Don't.
|
||||
sections based on residual-size **on a dirty probe**. Clean probe
|
||||
plus document-order discipline plus worksheet-faithful structure is
|
||||
what makes the sweep converge.
|
||||
|
||||
---
|
||||
|
||||
## 14. Useful references
|
||||
|
||||
- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md`
|
||||
— the binding decisions reflected in this rewrite: SAP 10.2 target,
|
||||
cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture
|
||||
retirement. **Read first.**
|
||||
- **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` —
|
||||
decision rationale + Session A/B/C plan.
|
||||
original calculator decision rationale + Session A/B/C plan. Read
|
||||
for context; spec-version target / PCDB sequencing / cert-cal
|
||||
rationale are superseded by ADR-0010.
|
||||
- **Spec coverage map**
|
||||
`docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker.
|
||||
Update as you go.
|
||||
|
|
@ -817,19 +1074,19 @@ sections based on residual-size. Don't.
|
|||
|
||||
## 15. Final note
|
||||
|
||||
The prior session demonstrated that **moving SAP MAE down requires
|
||||
either spec-correctness OR Elmhurst-perfect calibration, not both
|
||||
simultaneously**. The cert-cal layer absorbs Elmhurst's spec
|
||||
deviations; any spec-correct fix risks breaking it.
|
||||
The prior session's framing — *"the cert-calibration layer absorbs
|
||||
Elmhurst's spec deviations; we'll re-derive it at the end"* — was
|
||||
load-bearing on a false diagnosis. The cert-cal layer is
|
||||
pre-March-2025 SAP prices fit against a mixture distribution of two
|
||||
spec-version regimes. Once you separate the regimes (Validation
|
||||
Cohort) and use spec prices everywhere, the "tension" disappears.
|
||||
|
||||
The systematic pass clears this by separating the layers:
|
||||
1. Build the spec-correct engine first.
|
||||
2. Re-fit the cert-cal compatibility layer once at the end.
|
||||
After P1–P5 land, the section sweep is straightforward: every
|
||||
spec-correct fix is unambiguously the right answer, BRE
|
||||
worked-example fixtures lock the result, and Validation Cohort probe
|
||||
MAE moves monotonically downward. The fixes the prior session marked
|
||||
as "spec-correct but probe-regressed" become trivially landable.
|
||||
|
||||
Don't be discouraged when SAP MAE rises temporarily during the spec
|
||||
sweep. PE residual is the truer signal of engine correctness. SAP
|
||||
MAE convergence will follow once cert-cal is re-derived against the
|
||||
clean engine.
|
||||
|
||||
**Welcome to the project. Read the spec, follow the order, commit one
|
||||
section at a time. The deterministic answer is in there.**
|
||||
**Welcome to the project. Read ADR-0010, land the five prerequisites,
|
||||
then walk the spec in document order. The deterministic answer is in
|
||||
there.**
|
||||
|
|
|
|||
|
|
@ -21,7 +21,6 @@ from typing import Final
|
|||
|
||||
from datatypes.epc.domain.epc_property_data import EpcPropertyData, SapBuildingPart
|
||||
|
||||
|
||||
_DEFAULT_STOREY_HEIGHT_M: Final[float] = 2.5
|
||||
|
||||
|
||||
|
|
@ -72,6 +71,16 @@ def dimensions_from_cert(epc: EpcPropertyData) -> Dimensions:
|
|||
"""Build the `Dimensions` aggregate from an EpcPropertyData."""
|
||||
parts = epc.sap_building_parts or []
|
||||
|
||||
# Khalim Comments - this section seems to implement the
|
||||
# worksheet section in page 132 and is unnecessarily
|
||||
# complicated. The sap building parts are pre-ordered, form
|
||||
# main building part to the extensions and the
|
||||
# "identifier" field tells us if the part is the Main Dwelling
|
||||
# of it's an extension. E.g. if it's an extension, identifier
|
||||
# should be "Extension 1".
|
||||
# We should strictly type the values on the EpcPropertyData
|
||||
# domain model
|
||||
|
||||
ground_area = 0.0
|
||||
ground_perim = 0.0
|
||||
top_area = 0.0
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue