diff --git a/CONTEXT.md b/CONTEXT.md index 9ea32088..34afd5fe 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -98,12 +98,20 @@ The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actua _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values **Calculated SAP10 Performance**: -The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs. ADR-0009. +The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. Distinct from Effective Performance (ML output) and Lodged Performance (gov register) during the validation phase. Surfaced alongside Effective Performance in the UI; may supersede Effective Performance in a later ADR once parity is confirmed against the cert-reported SAP across ≥1000 sample certs lodged on the calculator's target spec version (see [[sap-spec-version]]). ADR-0009 (as amended by ADR-0010). _Avoid_: calculator output, computed performance, worksheet performance, SAP10 output **SAP10 Calculation**: -The process that runs the deterministic SAP 10.3 worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 cert→input mapping, executes the 12-month heat balance per SAP 10.3 §§1-14, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009. -_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run +The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap/`. Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates. +_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]]) + +**SAP Spec Version**: +The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version. +_Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision + +**Validation Cohort**: +The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010. +_Avoid_: parity cohort, validation set, corpus sample **Measure Application**: The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009. diff --git a/docs/adr/0010-sap10-calculator-spec-target-and-validation.md b/docs/adr/0010-sap10-calculator-spec-target-and-validation.md new file mode 100644 index 00000000..8ad06da9 --- /dev/null +++ b/docs/adr/0010-sap10-calculator-spec-target-and-validation.md @@ -0,0 +1,68 @@ +# Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort + +**Status: Accepted.** Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of [ADR-0009](0009-deterministic-sap-calculator.md). Adds strict typing of `EpcPropertyData` (P6) and a worksheet-faithful structural principle for the `domain/sap/worksheet/*` modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection). + +## Why this ADR exists + +ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans **multiple SAP spec versions** (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's `domain.sap.tables.table_12_cert_calibration` layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec — closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component. + +This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation. + +## Decisions + +### 1. Active spec target is **SAP 10.2 (14-03-2025)**, not SAP 10.3 + +ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is SAP 10.2 (14-03-2025 amendment) — both the document RdSAP 10 (10-06-2025) cross-references for heating-system identification, and the amendment that current assessor software is on. + +`packages/domain/src/domain/sap/tables/table_12.py` is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2. + +A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3). + +### 2. `table_12_cert_calibration` is deleted + +The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak. + +This includes the `cert_calibration_e7_codes` extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates — Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is in §8 of the handover as a documented dead-end. + +`domain.sap.tables.table_12.unit_price_p_per_kwh` becomes the only price API. Parity probes are updated to use it. + +### 3. Validation Cohort is filtered to a single spec-version window + +Probe MAE against the full 250k-cert corpus measures both calculator correctness *and* the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy. + +The **Validation Cohort** is the subset of corpus certs with `inspection_date ≥ 2025-07-01` — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go. + +This requires re-extracting the training parquet to include `inspection_date` (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice. + +### 4. PCDB integration is promoted from Session C to a prerequisite + +ADR-0009 deferred PCDB to Session C and shipped a `NoOpPcdbLookup` stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging `main_heating_data_source=1` (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2). + +PCDB lookup against `main_heating_index_number` is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The `NoOpPcdbLookup` seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no `main_heating_index_number` or the PCDB has no matching record. + +## Verification infrastructure (also prerequisites) + +Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal: + +1. **Trace mode populated.** ADR-0009 specced `SapResult.intermediate: dict[str, float]` and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on `intermediate` so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace. +2. **BRE worked-example unit tests.** SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired. +3. **Strict typing of `EpcPropertyData` via canonical domain enums.** Bare `str` and `Union[int, str]` fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's `dimensions.py:74-82` is Khalim's documented example. The domain holds one canonical enum per field, derived from `datatypes/epc/domain/epc_codes.csv` (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of `EpcPropertyData` (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright `strict` mode stays clean. + +These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5. + +## Worksheet-faithful structure (sweep-time principle) + +Each `domain/sap/worksheet/*.py` module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. `heat_transfer_coefficient` aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values). + +## Consequences + +- ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter. +- The training parquet schema bumps when `inspection_date` is added — a non-breaking MINOR addition under [ADR-0008](0008-physics-as-feature.md)'s `Feature Schema Version` discipline. +- The handover document `docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md` is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the five prerequisites. +- Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available. + +## Considered alternatives + +- **Build versioned Table 12 (pre/post 14-March-2025) keyed on `inspection_date` and validate across the full corpus.** Rejected as more work for no signal benefit during the spec sweep — the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly. +- **Keep cert-cal during the sweep and re-derive at the end** (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest. +- **Pay for an Elmhurst license, lock fixtures to its output.** Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite. diff --git a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md index eb001368..a3d23a07 100644 --- a/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md +++ b/docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md @@ -24,10 +24,12 @@ The SAP/RdSAP energy assessment splits cleanly into two roles: takes the lodged fields and produces SAP score, CO2 emissions, primary energy (PEUI), CO2 per m², EI rating, etc. -**Our calculator is replicating role #2.** Where Elmhurst's -implementation diverges from spec, we follow Elmhurst, but we don't -guess at divergence; we localise it via reference traces or -empirically against the cert corpus. +**Our calculator is replicating role #2.** Assessor software +implements the SAP 10.2 spec faithfully; the question of "where does +Elmhurst diverge from spec?" is no longer the operative one (per +ADR-0010 + §3 below). Our job is to enumerate every spec +table / formula / footnote and verify each against the published SAP +10.2 (14-03-2025) and RdSAP 10 (10-06-2025) PDFs. There is no "assessor judgement" knob to tune. Each field on the cert has a deterministic interpretation per the spec. Each spec table / @@ -57,109 +59,259 @@ all of them and verify each. Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10 kWh/m²`. Known caveat: some of these are compensating-error matches (e.g. cert `7536-3827`'s PE matches but cost is £143 under cert's implied cost - due to multi-factor offsetting bugs). + due to multi-factor offsetting bugs). **These fixtures are retired + per ADR-0010 and §10 below — they lock buggy compensating outputs + in place and will fight the spec sweep.** + +> **Read this before anything else.** [ADR-0010](../adr/0010-sap10-calculator-spec-target-and-validation.md) +> supersedes the spec-version target, the PCDB sequencing, and the +> cert-calibration layer of ADR-0009. This handover document was +> originally written under the rejected framing; §3, §4, §7, §7b, +> §10 below have been rewritten in lockstep. §2.5 lists the five +> prerequisites that land **before** the section-by-section sweep +> starts. --- -## 3. Why we are pivoting to systematic review +## 2.5. Prerequisites before the sweep starts + +Five blockers, in dependency order. The section sweep does not start +until all five are merged. Together they convert the parity probe +from a noisy mixture-distribution signal into a clean per-section +verification tool. + +### P1 — Re-extract the training parquet with `inspection_date` + +The 250k-cert parquet has 202 columns; **none of them are dates**. +Without `inspection_date` on each cert we cannot construct the +Validation Cohort (P3). The ETL currently drops the dates; add them +back as a non-breaking MINOR Feature Schema Version bump (per +ADR-0008). `EpcPropertyData.inspection_date` and `.registration_date` +both exist on the domain object and are populated upstream — the +parquet writer just needs to include them. + +### P2 — Delete `domain.sap.tables.table_12_cert_calibration`; correct `domain.sap.tables.table_12` + +Per ADR-0010 §2 and §1: +- Remove `table_12_cert_calibration.py` and every call site + (`cert_calibration_prices()`, `cert_calibration_e7_codes`, the + `PriceTable` constructor argument that defaults to it). +- Re-label `table_12.py` as `SAP 10.2 Table 12 (14-03-2025 amendment)`. +- Correct CO2 factors: mains gas 0.214 → **0.210**, standard electricity 0.086 → **0.136** (the file currently mixes SAP 10.2 prices with SAP 10.3 CO2 factors). +- Delete the misleading "+25 % shift from SAP 10.2" comment — 13.19 p + is SAP 10.1 (or SAP 10.2 amendment 0), not SAP 10.2 (14-03-2025). + +### P3 — Filter the parity probe to the Validation Cohort + +`Validation Cohort` is defined in `CONTEXT.md` and ADR-0010 §3: +`inspection_date ≥ 2025-07-01`. Modify +`services/ml_training_data/src/ml_training_data/sap_parity_probe.py` +to apply the filter before sampling. The probe sample size and seed +remain configurable; `sap_score ∈ [5, 99]` remains the typicality +filter on top of the cohort filter. + +### P4 — Implement `PcdbLookup` (replace `NoOpPcdbLookup`) + +Per ADR-0010 §4. Download boiler + heat-pump CSVs from +https://www.ncm-pcdb.org.uk. Build a lookup keyed on +`main_heating_index_number`. Surface seasonal efficiency, secondary +efficiency, output kW, and (for HPs) flow-temperature curve. ~half-day +of work per the original handover estimate. The +`Sap10Calculator.__init__(pcdb: Optional[PcdbLookup])` seam from +ADR-0009 grill outcome #1 is the integration point; no calculator-side +changes needed beyond reading `index_number` and routing PCDB-returns +to space-heating / hot-water efficiency lookups instead of Table 4a. + +### P5 — Populate `SapResult.intermediate` + transcribe BRE worked examples + +Per ADR-0010 "Verification infrastructure": +- Populate every named SAP 10.2 worksheet variable on + `SapResult.intermediate` as sketched in §11. This is mechanical — + thread the values from each worksheet module into the dict. +- Transcribe the BRE worked examples from the SAP 10.2 appendices and + RdSAP 10 worked-example annex into unit tests + (`tests/test_bre_worked_examples.py`) that lock per-intermediate + values, not aggregate SAP. These replace the retired cert fixtures. + +### P6 — Strict-type `EpcPropertyData` via canonical domain enums + +The current `EpcPropertyData` and its nested types carry many bare +`str` fields and `Union[int, str]` fields (the latter because the +gov API gives ints and Site Notes give strings). The defensive +type-handling cascades into the calculator (`cert_to_inputs.py`, +`dimensions.py`, etc.) — `dimensions.py:74-82` is Khalim's documented +example: `SapBuildingPart.identifier` carries main-vs-extension +information but is bare `str`, so the dimensions code defensively +iterates instead of dispatching on a typed kind. + +The fix: +1. **One canonical enum per field**, union of all keys appearing + across all schema versions in + `datatypes/epc/domain/epc_codes.csv`. Hand-author the 18 enum + classes (`built_form`, `construction_age_band`, `energy_tariff`, + `glazed_area`, `glazed_type`, `heat_loss_corridor`, `main_fuel`, + `mechanical_ventilation`, `property_type`, `tenure`, + `transaction_type`, `ventilation_type`, `water_heating_fuel`, + `cylinder_insulation_thickness`, `energy_efficiency_rating`, + `improvement_description`, `improvement_summary`, `code`) plus + `BuildingPartKind` (Main Dwelling / Extension N). codes.csv is + the reference; a dedup script can optionally verify coverage but + is not a build dependency. +2. **The API mapper** parses raw ints into the canonical enum. +3. **The Site Notes mapper** parses raw strings into the canonical + enum. +4. **The domain object** (`EpcPropertyData` and nested) holds only + the canonical enums — no `Union[int, str]`, no bare `str` for + coded fields. +5. **Every consumer** (calculator, ML pipeline, recommendations, + ETL, scenario builder) reads from the typed fields. + +**Constraint**: repo-wide tests must keep passing. The calculator +is one consumer; the ML pipeline, recommendations, and the Site +Notes ingestion path also consume `EpcPropertyData`. Each mapper- +layer change is paired with adapter updates that preserve the +behaviour the existing tests cover. + +Pyright `strict` mode must remain clean (CLAUDE.md). + +### Expected outcome of P1–P6 + +After all six land, run the probe against the Validation Cohort. The +expected baseline MAE on the clean probe is much smaller than the +current 4.61 — likely 1.5–2.5 SAP-points based on what we know about +the residual breakdown (heat pumps closed by P4, gas boilers tightened +by P4, price-version noise removed by P2+P3). The remaining residual +is the genuine spec sweep target — and per-section fixes will move +the probe in measurable, distinguishable amounts because there's no +compensating layer to mask them, and there's no defensive type +branching obscuring which input value drove which intermediate. + +--- + +## 3. Why the prior diagnosis was wrong and how we fixed it The prior session shipped ten slices (S-B23 → S-B31) by debugging the biggest residuals one at a time: - **PE MAE dropped substantially: 57.28 → 43.32 (−14)** — real progress on the demand-side calculation. -- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — the cost-side is - bottlenecked by cert-calibration prices that absorb multiple - structural deviations from spec, making any single slice that fixes - one component break the calibration for others. +- **SAP MAE barely moved: 5.34 → 4.61 (−0.73)** — diagnosed at the time + as "cert-calibration absorbs multiple spec deviations". -Two failed slice attempts in the prior session exposed the pattern: +Three slice attempts looked like they "proved" the cert-cal-absorbs- +deviations diagnosis: -- **Standing charges**: spec note Table 12 (a) clearly says gas standing - charge of £92 is added to space + water heating costs for energy - ratings. Empirically: adding it pushed SAP bias from +0.98 to −2.62. - Reverted before committing. -- **Cat=10 room heaters off-peak routing**: Table 12a clearly says - "Other direct-acting electric heating" bills 100% high rate on - 7-hour tariff. Empirically: switching cat=10 from off-peak to - standard rate inverted the bias from +5.88 to −6.00 without - improving MAE. Reverted before committing. -- **Hot water cylinder loss (uncommitted)**: spec Table 2 footer + - Table 3 footer clearly say combi boilers using Table 4b efficiency - have zero storage + primary loss. Empirically: zeroing them dropped - PE MAE −6.64 (huge improvement) but raised SAP MAE +0.39 AND broke - 3 of 7 golden fixtures. Reverted because no way to know whether to - follow spec (PE-correct) or Elmhurst (SAP-MAE-correct) without - reference traces. +- **Standing charges**: spec Table 12 note (a) requires £92/yr gas + standing charge on space + water heating. Adding it pushed SAP bias + +0.98 → −2.62. Reverted. +- **Cat=10 room heaters off-peak routing**: Table 12a says "other + direct-acting electric heating" bills 100 % high rate on 7-hour + tariff. Switching cat=10 from off-peak to standard rate inverted + the bias +5.88 → −6.00. Reverted. +- **HW cylinder zero-loss for combi** (uncommitted): Table 2 + Table + 3 footers require zero storage + primary loss when efficiency comes + from Table 4b. Zeroing them dropped PE MAE −6.64 but raised SAP + MAE +0.39 and broke 3 of 7 golden fixtures. Reverted. -The pattern: **the cert-calibration prices** (in -`domain.sap.tables.table_12_cert_calibration`) **were reverse-engineered -to match Elmhurst's output assuming all our other calculations are -correct.** When we fix a spec-violation bug in some other component, we -break the calibration and SAP MAE goes up even though we're more -spec-correct. +The prior agent concluded: *cert-calibration absorbs Elmhurst's +deviations from spec — we can't fix one without re-deriving the +calibration, so do a full spec sweep first and re-derive cert-cal at +the end.* This diagnosis is **wrong** and the proposed remedy +amplifies the problem. -This means **whack-a-mole on the biggest residual won't converge**. We -need to systematically verify every component against the spec, then -re-derive the cert-calibration once at the end. +### What was actually going on + +The 250k-cert corpus spans multiple SAP spec-version regimes: +- **Pre-2025-03-14**: certs lodged under SAP 10.1 / SAP 10.2 amendment + 0 prices — mains gas ~3.48 p, standard electricity 13.19 p. +- **Post-2025-03-14**: certs lodged under SAP 10.2 (14-03-2025) prices + — mains gas 3.64 p, standard electricity 16.49 p. + +The `table_12_cert_calibration` prices (3.48 p / 13.19 p) are **the +older spec's prices**, not Elmhurst deviations from the spec. They +are an empirical "best fit" across a mixture distribution of two +price regimes, with downstream-component bugs (PCDB absence, HW +cylinder loss applied to combi, etc.) absorbed into the fit. The +table looks like compensation for assessor-software quirks because we +were never told which spec each cert was on. + +Each "spec-correct fix that worsened MAE" in the failed slices above +was actually correct. The MAE regressed because: +1. The cert-cal prices (pre-March-2025 spec) cancelled with one set + of downstream errors to produce a quasi-stable cost. +2. The spec-correct fix landed → that cancellation broke → the + probe MAE went up. +3. But the spec-correct fix was *right* — what regressed was a + compensating-error equilibrium, not the calculator's truth. + +The prior session's "re-derive cert-cal at the end" plan would +re-establish a new compensating-error equilibrium across the new bug +set. It does not converge on spec-correctness. + +### The fix (per ADR-0010) + +1. **Stop fitting against a mixture distribution.** Filter the + validation corpus to a single spec-version window (Validation + Cohort, `inspection_date ≥ 2025-07-01`). Every cert in the cohort + was lodged on SAP 10.2 (14-03-2025) prices. +2. **Delete the cert-calibration layer.** Use spec prices everywhere + (`domain.sap.tables.table_12`). The only price-routing decision + left is Table 12a fractional high-rate blending — a real spec + feature, not a calibration. +3. **Build PCDB**, because it dominates residual variance and the + reason it was deferred (cert-cal-absorbs-PCDB) no longer holds. +4. **Build trace mode and BRE worked-example fixtures**, so + per-section verification works against single-cert intermediates + instead of aggregate corpus MAE. + +This is what §2.5 lists as the five prerequisites. Once they land, +the section-by-section spec sweep produces clean, monotonic +improvements. --- -## 4. Scope decisions +## 4. Scope decisions (per ADR-0010) ### IN scope -- **RdSAP 10 specification (10-06-2025)** — full document, all sections - (`docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, 114 pages). -- **SAP 10.2 full specification (14-03-2025)** — the worksheet, tables, - appendices that RdSAP 10 references - (`docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages). +- **SAP 10.2 (14-03-2025 amendment)** is the active spec target. + `docs/sap-spec/sap-10-2-full-specification-2025-03-14.pdf`, 199 pages. +- **RdSAP 10 (10-06-2025)** — the cert→input mapping layer that + cross-references SAP 10.2. `docs/sap-spec/rdsap-10-specification-2025-06-10.pdf`, + 114 pages. +- **PCDB integration.** Moved from "Session C deferred" to **P4 + prerequisite** (§2.5). Heat pumps and the 78 % of gas-boiler certs + lodging `main_heating_data_source=1` need PCDB-sourced efficiency + for the calculator to be spec-correct. Data source: + https://www.ncm-pcdb.org.uk; lookup keyed on `main_heating_index_number`; + fields: seasonal efficiency, secondary efficiency, output kW, + flow-temperature curve (HPs). +- **All RdSAP 10 sections in document order.** §1 → §§19, plus + Tables 27 / 28 / 29 / 30 / 31. The verification approach in §5 is + unchanged — only the precondition changes: the sweep runs against a + clean probe (Validation Cohort + spec prices + PCDB + trace mode). -### OUT of scope (for now) -- **Full SAP assessments.** Full-SAP certs lodge a measured/calculated - U-value in `walls[i].description` (e.g. +### OUT of scope +- **Full SAP assessments.** Full-SAP certs lodge measured/calculated + U-values in `walls[i].description` (e.g. "Average thermal transmittance 0.18 W/m²K"). These are a separate - calculation path (BS EN ISO 6946) and a different corpus. **Park them - until the RdSAP 10 base case matches Elmhurst.** S-B24 / S-B29 + calculation path (BS EN ISO 6946) and a different corpus. Park + until the RdSAP 10 base case parity is reached. S-B24 / S-B29 attempted partial handling; those slices can stay or be reverted at your discretion when you reach §§4-7 of RdSAP and §3 of SAP 10.2. -- **PCDB (Product Characteristics Database).** ADR-0009 deferred this - to Session C. **This is a real future task, not a permanent - exclusion.** Heat pumps (cat=4) have catastrophic per-cert MAE (19 - SAP points) because we use Table 4a fallback efficiency 2.30 - instead of PCDB SCOP (typically 2.80-3.50). Gas boilers with - `main_heating_data_source=1` (78% of corpus boiler certs) fall back - to a category-default 0.80 vs typical PCDB-listed condensing-boiler - efficiencies of 0.88-0.94 — that's most of the per-cert SAP residual - variance on gas certs. - - A `NoOpPcdbLookup` stub seam exists in Session A (per ADR-0009 grill - outcome #1). The fetch+parse work is non-trivial: - - **Data source**: BRE PCDB at https://www.ncm-pcdb.org.uk — - boilers + heat pumps are downloadable CSVs (thousands of rows - each). - - **Lookup key**: cert lodges `main_heating_index_number` which is - the PCDB product ID. Match by that. - - **Per-product fields needed**: seasonal efficiency, secondary - efficiency, output kW, flow-temperature curve (for HPs). - - **Effort**: ~half-day for the lookup + tests; ongoing maintenance - when BRE publishes new PCDB revisions. - - **Recommended sequencing**: complete the systematic RdSAP spec - sweep first. Once the spec-correct engine is built and cert-cal - re-derived, PCDB integration should drop heat-pump residuals from - 19 SAP points to ~1, and tighten the gas-boiler residual variance. - At that point heat pumps (cat=4) and PCDB-listed boilers - (`main_heating_data_source=1`) become accessible. - - **Why not now**: the cert-calibration prices currently absorb the - missing PCDB efficiency (HP costs at off-peak rate compensates for - too-low SCOP). Fixing PCDB without re-deriving cert-cal would push - HP certs in the wrong direction. Same lesson as the other reverted - fixes in §7b — fix the spec layer first, the calibration layer - later. -- **SAP 10.3** (13-01-2026). The corpus is SAP 10.2. SAP 10.3 has - identical Table 12 codes (only values shift). Don't update spec - references to 10.3 until the corpus migrates. +- **SAP 10.3 (13-01-2026).** No SAP-10.3-lodged certs in the corpus, + so it cannot be validated. Calculator targets SAP 10.2 until the + corpus migrates (expected late 2026 / 2027 once BRE updates RdSAP + to reference SAP 10.3). Note: `table_12.py` currently mixes SAP + 10.2 prices with SAP 10.3 CO2 factors — corrected as part of P2. +- **Historical-spec cert reproduction.** Calculating what cert SAP + *would have been* under SAP 10.1 / pre-March-2025 SAP 10.2 prices is + not the calculator's job. Lodged Performance carries the historical + value; Calculated SAP10 Performance is current-spec only. The + Validation Cohort filter operationalises this — older certs are + out of the validation loop, not because they're "wrong" but because + they're a different spec's output. +- **Re-deriving cert-cal at the end.** The prior session's plan. The + cert-calibration layer is deleted in P2, not re-fit. --- @@ -208,22 +360,49 @@ For each table, formula, footnote, exception: 3. Are there spec-defined edge cases / footnotes we're missing? ### 5.4. When a gap is found -- Write a failing unit test that asserts the spec-correct behaviour. +- Write a failing unit test that asserts the spec-correct behaviour + — wherever possible, write it as an assertion on `intermediate` + values rather than on aggregate SAP, using a BRE worked example + if one covers the section. - Implement the fix. -- Run **all 7 golden fixtures** plus the broader probe. Note both - direction and magnitude of change. -- If the fix is spec-correct but breaks a golden fixture, this is - evidence that the fixture was a compensating-error case — proceed - with the spec-correct fix and update the fixture (with a comment - noting it was a compensating case). -- Commit per-slice as before: one section → one commit. Reference the - spec section in the commit message. +- Run `test_bre_worked_examples.py` plus the Validation Cohort + probe. Note both direction and magnitude of change. +- If a BRE worked-example breaks, the new code is wrong (revert). + BRE examples are spec-derived and cannot regress from a + spec-correct change. +- Commit per-slice: one section → one commit. Reference the spec + section in the commit message. -### 5.5. Use trace mode when you need it -ADR-0009 specifies a `SapResult.intermediate: dict[str, float]` field -that was never populated. Adding this is highly recommended for the -systematic pass — each section's verification benefits from -inspecting the intermediate values. See §11 below for a sketch. +### 5.5. Sweep-time principle: worksheet-faithful structure + +Each `worksheet/*.py` module must mirror the SAP 10.2 worksheet +structure for its section. As you verify a section, also restructure +its module so that: + +1. **Each function name references its worksheet-line origin** (e.g. + `heat_transfer_coefficient` aligns with worksheet line (40); + `mean_internal_temperature` aligns with worksheet line (93)). +2. **Compound calculations are split** into one function per + worksheet line where possible — easier to verify against + `intermediate[...]` and against BRE worked-example values. +3. **Defensive type-handling disappears**. Once P6 lands, the input + is a typed enum or numeric — branching on `isinstance(x, int)` is + replaced by enum dispatch. +4. **Domain-typed inputs flow directly**. `SapBuildingPart.kind == + BuildingPartKind.MAIN_DWELLING` replaces string sniffing of + `identifier`. The dimensions.py "unnecessarily complicated" + pattern Khalim flagged is the canonical example of what *not* + to do. + +The principle applies during section-sweep slices. It is **not** +a separate prerequisite — the refactor lands with the verification +slice for the section it touches. + +### 5.6. Use trace mode when you need it +P5 populates `SapResult.intermediate: dict[str, float]` with every +named SAP 10.2 worksheet variable. Each section's verification +benefits from inspecting these values per-cert. See §11 below for +the sketch. --- @@ -350,59 +529,60 @@ touched and what the current state is. --- -## 7. The cert-calibration vs spec-correctness tension +## 7. The cert-calibration "tension" is dissolved (per ADR-0010) -This is THE central architectural decision you have to make as you -work through the spec. +This section originally framed cert-calibration vs spec-correctness as +two end-states the calculator had to choose between. That framing is +wrong (see §3 for the actual diagnosis): the cert-cal values are +pre-March-2025 SAP prices, not Elmhurst deviations from SAP 10.2. +Once the corpus is filtered to the Validation Cohort (P3) and the +cert-cal layer is deleted (P2), the false dichotomy disappears. -### Two tables of fuel prices -- `domain.sap.tables.table_12.UNIT_PRICE_P_PER_KWH` — SAP 10.2 spec - values (3.64p gas, 16.49p standard elec). -- `domain.sap.tables.table_12_cert_calibration.UNIT_PRICE_P_PER_KWH` - — empirically lower values (3.48p gas, 13.19p elec) that match the - cert assessor software's output. +### What replaces this section -### Two possible end states for the calculator +- **One price table.** `domain.sap.tables.table_12` (re-labelled SAP + 10.2 14-03-2025 amendment, CO2 factors corrected per P2). +- **One validation cohort.** `inspection_date ≥ 2025-07-01`, every + cert lodged on the calculator's target spec version. +- **One verification mechanism.** Trace-mode intermediates + BRE + worked-example unit tests for per-section verification; Validation + Cohort probe MAE for aggregate go/no-go. -**End state A — Spec-perfect.** Use spec prices, apply every spec rule -(standing charges, Table 12a fractions, combi zero-loss, etc.). The -calculator output is then what a *correct SAP 10.2 implementation* -would produce. SAP MAE against the corpus will likely worsen because -Elmhurst doesn't perfectly implement spec. - -**End state B — Elmhurst-perfect.** Use cert-cal prices and reproduce -Elmhurst's deviations exactly. The calculator output matches cert -SAP scores. The calculator becomes a "reverse-engineered Elmhurst -clone" rather than a SAP 10.2 implementation. - -### The pragmatic recommendation - -**Aim for state A but track state B as the parity probe.** Concretely: - -1. Verify each spec section in isolation; fix spec violations - regardless of MAE impact, but commit each fix WITH a measured - probe delta in the commit message. -2. After the spec sweep is complete, the calculator's output is - spec-correct. The corpus residual at that point is Elmhurst's - deviation from spec. -3. THEN re-derive the cert-calibration prices to match Elmhurst's - deviation pattern. The calibration becomes a thin Elmhurst- - compatibility layer on top of a spec-correct engine. - -This avoids the whack-a-mole problem because state A is unambiguous: -each fix is either spec-correct or not. State B is iterative on top -of state A, not entangled with it. +Cert-software deviations from spec, if they exist at all, are +expected to be small and localised. They surface as residual after +the spec sweep completes against a clean probe — and at that point +the question is whether to chase them at all (Elmhurst-deviation +fixes have low domain value compared to spec-correctness, given the +calculator's product use case is scoring counterfactuals for the +MeasureApplicator chain, not reproducing historical certs). --- ## 7b. Outstanding findings to pick up during the systematic pass The prior session identified several spec-correct fixes that were -**reverted because they made SAP MAE worse against the corpus, but the -spec basis is unambiguous and the fixes WILL be the right answer once -the cert-calibration is re-derived against a clean engine.** Treat -these as TODOs the systematic pass should encounter when it reaches -the relevant section. They're listed here so the work isn't lost. +reverted because they made SAP MAE worse against the **full corpus**. +The empirical signal that "reverted" them was version-mixture noise +(see §3) plus compensating-error breakage in the 7 retired golden +fixtures. Each fix below is **expected to land cleanly** once the +five prerequisites in §2.5 are done, because: + +- The Validation Cohort (P3) is on a single spec version — the price + mismatch that drove the bias regression on standing charges and + cat=10 routing disappears. +- The cert-cal layer is gone (P2) — no calibration to "break". +- PCDB is integrated (P4) — the heat-pump and gas-boiler residuals + that dominated per-cert MAE collapse before any of these findings + even matter. +- The fixtures are now BRE worked examples (P5 + §10) — they cannot + be broken by spec-correct changes because they are themselves + derived from the spec. + +Treat each finding as a section-sweep TODO. The empirical impacts +below were measured against the **dirty probe** (full corpus + cert-cal ++ no PCDB) and are **not predictive** of behaviour on the clean probe. +Re-measure each fix against the Validation Cohort after prerequisites +land. ### Finding 1 — HW cylinder zero-loss rule for combi boilers **Status**: spec-correct fix exists in working-tree-only form @@ -587,32 +767,45 @@ direction. ## 8. Don't repeat — known dead-ends +> **Re-read after §3 + §7b.** Three entries below were classified as +> "dead-ends because cert-cal absorbs" — that diagnosis is wrong. +> They are spec-correct fixes that were measured under a noisy probe. +> Now flagged as **conditional dead-ends**: dead only if you try them +> before P1–P5 land. After prerequisites: they are expected +> improvements, not dead-ends. See ADR-0010. + - ❌ **Switching "NI" wall thickness to None alone** (S-B5 in history) — over-corrected because it routed to the (Unfilled cavity, 50mm) row instead of the dedicated Filled cavity row. The right fix landed in S-B23 with a `WALL_INSULATION_FILLED_CAVITY` dispatcher. - ❌ **Aggressive efficiency rescue for missing `sap_main_heating_code`** (S-B5) — over-corrected. The category fallback (cat=4 → 2.30) is - intentionally conservative; PCDB is needed for real efficiency. -- ❌ **Using SAP 10.2 spec prices for parity validation** — cert assessor - uses lower prices despite reporting `sap_version=10.2` (S-B9, S-B10). - Use `cert_calibration_prices()` for the probe. + intentionally conservative; PCDB (P4 prerequisite) supplies the + real efficiency. +- ⚠️ **Using SAP 10.2 spec prices for parity validation** — under + the dirty probe, cert-cal prices fit better. **Inverts under the + clean probe (P2 + P3): SAP 10.2 spec prices are correct because the + Validation Cohort is on the 14-03-2025 amendment.** Listed here + only as a warning if you start the sweep before prerequisites land. - ❌ **Always applying 10% secondary heating** — must be conditional on cert lodging or main system being electric storage (S-B20). See spec Appendix A.4. - ❌ **Respecting `main_heating_fraction` for secondary allocation** (failed S-B30) — the field is the multi-main allocation (system 1 vs system 2), not main-vs-secondary. SAP MAE 4.69 → 4.85 (worse). -- ❌ **Switching cat=10 room heaters off off-peak** (failed S-B32) — - spec-correct per Table 12a but inverts bias direction. Cert-cal - calibration absorbs the deviation. -- ❌ **Adding gas standing charges** (4-mode probe, unimplemented) — - spec-correct per Table 12 note (a) but pushes SAP bias from +0.98 - to −2.62. Cert-cal calibration absorbs. -- ❌ **Zeroing storage + primary loss for combi boilers** (uncommitted - S-B32) — spec-correct per Table 2 + Table 3 footers and drops PE - MAE −6.64 (huge win) BUT raises SAP MAE +0.39 and breaks 3 golden - fixtures. Decision deferred to systematic pass. +- ⚠️ **Switching cat=10 room heaters off off-peak** (failed S-B32) — + spec-correct per Table 12a. The bias inversion under the dirty + probe was driven by cert-cal compensating; on the clean probe this + is just spec-correct. Land as part of the §12 spec sweep after + prerequisites. +- ⚠️ **Adding gas standing charges** (4-mode probe, unimplemented) — + spec-correct per Table 12 note (a). Same logic: bias drift under + dirty probe is version-mixture + missing-PCDB noise, not Elmhurst + deviation. Land as part of §12 spec sweep. +- ⚠️ **Zeroing storage + primary loss for combi boilers** (uncommitted + S-B32) — spec-correct per Table 2 + Table 3 footers. SAP MAE + regression was driven by the now-retired golden fixtures (§10) and + cert-cal absorption. Land as part of §4 / Appendix J sweep. --- @@ -620,10 +813,15 @@ direction. ### Sample `data/ml_training/runs/2025_2026_n250000_v18a/data.parquet` is the -250k-cert parquet. The probe filters to `sap_score ∈ [5, 99]` and +250k-cert parquet. **After P1 lands** the parquet carries +`inspection_date`; the probe then filters to the **Validation Cohort** +(`inspection_date ≥ 2025-07-01`) plus `sap_score ∈ [5, 99]` and samples 300 at seed=7 by default. Filtering rationale: -- ≤ 5 is heritage/anomaly stock (sub-3% of corpus) +- ≤ 5 is heritage/anomaly stock (sub-3 % of corpus) - ≥ 99 is full-SAP new-builds the parquet excludes anyway +- `inspection_date ≥ 2025-07-01` ensures every cert was lodged on + SAP 10.2 (14-03-2025 amendment) — see [CONTEXT.md](../../CONTEXT.md) + / "Validation Cohort" and ADR-0010 §3. ### Run the probe ```bash @@ -654,38 +852,58 @@ main(['300','7']) --- -## 10. The 7 golden fixtures +## 10. Fixtures: retire the 7 cert-based golden fixtures, replace with BRE worked examples (per ADR-0010 + P5) +The 7 cert-based fixtures at `packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py` -locks 7 corpus certs as regression anchors: +were locked in against the current calculator state — *with* cert-cal, +*without* PCDB, *with* HW cylinder loss always applied, *with* the +lighting heuristic, etc. They are documented in §3 / the prior +handover as containing compensating errors. Once the prerequisites +land, every spec-correct fix breaks at least one of them. They will +fight the spec sweep. -| Cert | TFA | Cat | Notes | -|---|---|---|---| -| `0240-0200-5706-2365-8010` | 202 | 2 | Detached, age J, oil boiler, Table 4b code 130 | -| `0300-2747-7640-2526-2135` | 526 | 2 | Semi-detached, age D, gas PCDB | -| `0390-2954-3640-2196-4175` | 360 | 2 | Detached, age F, oil PCDB | -| `6035-7729-2309-0879-2296` | 128 | 2 | Mid-terrace, age A, gas combi code 104 | -| `7536-3827-0600-0600-0276` | 152 | 2 | Detached + extensions, age D, gas PCDB. Cleanest PE match (−0.29 kWh/m²) | -| `8135-1728-8500-0511-3296` | 102 | 2 | Semi-detached, age C, gas PCDB | -| `9390-2722-3520-2105-8715` | 75 | 6 | Mid-floor flat, age D, heat network code 301 | +### Replacement strategy -Tolerance: `|SAP residual| ≤ 1`, `|PE residual| ≤ 10`. **Tighten as -the spec sweep progresses.** +**Primary regression suite: BRE worked-example fixtures.** -The cert JSONs are stored under `fixtures/golden/.json` — -frozen at extraction time so the test is reproducible without -bulk-zip access. The probe extraction script for new fixtures is -inlined in the test history (see commit `f4a8d2a0`). +Transcribe the worked examples from: +- SAP 10.2 spec appendices (especially Appendix R — reference values + and the worked example dwelling). +- RdSAP 10 (10-06-2025) worked-example annex. -**Important caveat**: some of these 7 are compensating-error matches -(see §3). When a spec-correct slice breaks one, the fixture is -probably the compensating case — investigate before reverting. +Each worked example becomes a unit test that locks **per-intermediate +expected values** (HLP, HTC, mean internal temperature monthly, MIT, +ECF, SAP score) rather than the aggregate SAP score alone. Because +they are spec-derived, no spec-correct change can break them — any +break is an implementation bug, unambiguously. + +These tests live at +`packages/domain/src/domain/sap/tests/test_bre_worked_examples.py` +(new module — separate from the cert-based fixtures module). + +**Cert-based fixtures retired.** + +The current `test_golden_fixtures.py` is either deleted or repurposed +as a *very loose* smoke-test integration suite (e.g. `|SAP residual| +≤ 5`) that catches catastrophic regressions only. The 7 cert JSONs +under `fixtures/golden/.json` can be kept on disk as reference +data, but they no longer drive go/no-go decisions in the sweep. + +**Optional future addition.** + +If/when a current Elmhurst (or Stroma / Quidos / NHER) license is +available, run a handful of representative corpus certs through it +and lock those outputs as a second-tier regression suite — Elmhurst- +parity fixtures alongside spec-parity fixtures. Not a prerequisite. --- -## 11. Trace mode (recommended infrastructure) +## 11. Trace mode (prerequisite P5 — implementation sketch) -ADR-0009 proposed: +This section was originally labelled "recommended"; it is now +**prerequisite P5** per ADR-0010. The sweep does not start until +`intermediate` is populated everywhere. ADR-0009 proposed: ```python @dataclass(frozen=True) class SapResult: @@ -765,14 +983,39 @@ This single session should produce zero behaviour changes if §1-3 are correctly implemented, but expect to find at least one issue in §3 geometry (per the reviewer's "biggest SAP error sources" list). -Run the golden fixtures + probe at the end of each session; expect no -movement until you start hitting actual gaps. +**Important:** Session 1 only starts after all five prerequisites in +§2.5 have landed and the Validation Cohort probe baseline has been +captured. Until then, running per-section verification produces noisy +signal. + +Run the BRE worked-example fixtures (P5) + Validation Cohort probe +(P3) at the end of each session; expect no movement until you start +hitting actual gaps. --- ## 13. Workflow recap -For each section, in order: +**Phase 0 — Prerequisites (§2.5).** Land P1–P6 first, in dependency +order: + +| | Slice | Depends on | +|---|---|---| +| P1 | Re-extract parquet with `inspection_date` | — | +| P2 | Delete cert-cal; correct `table_12.py` CO2 factors | — | +| P3 | Filter parity probe to Validation Cohort | P1 | +| P4 | Implement `PcdbLookup` | — (P2 helpful) | +| P5 | Populate `SapResult.intermediate` + transcribe BRE worked examples | — | +| P6 | Strict-type `EpcPropertyData` via codes.csv-derived enums | — | + +P1, P2, P4, P5, P6 can run in parallel. P3 needs P1. Capture a +Validation Cohort probe baseline once all six land — that is the new +MAE starting line. Repo-wide tests stay green throughout P6 (Site +Notes consumers, ML pipeline, recommendations, etc. all need the +mapper updates that accompany each typing change). + +**Phase 1 — Section sweep.** For each RdSAP 10 section, in document +order: 1. Read the spec section text + cited tables. 2. Identify code location(s). @@ -780,22 +1023,36 @@ For each section, in order: - Does our code implement it? - Does the implementation match? - Edge cases / fallback paths handled? -4. For each gap: AAA unit test → minimal implementation → commit. -5. After each commit: run golden fixtures (`pytest test_golden_fixtures.py`) - and the parity probe. Note both deltas in the commit message. -6. If a golden fixture breaks: investigate. Either fixture was a - compensating case (acceptable to break) or the new code is wrong - (revert). +4. For each gap: AAA unit test (preferring a BRE worked-example + assertion on `intermediate` values when possible) → minimal + implementation → commit. +5. **Apply the worksheet-faithful structure principle** (§5.5) as + part of this slice: name functions after worksheet lines, split + compound calculations, replace any remaining defensive + type-handling with typed-enum dispatch. +6. After each commit: run `test_bre_worked_examples.py` + Validation + Cohort probe. Note both deltas in the commit message. +7. If a BRE worked-example breaks: the new code is wrong (revert). + The worked examples are spec-derived and cannot be broken by + spec-correct changes. Stick to this. The prior session's mistake was jumping between -sections based on residual-size. Don't. +sections based on residual-size **on a dirty probe**. Clean probe +plus document-order discipline plus worksheet-faithful structure is +what makes the sweep converge. --- ## 14. Useful references +- **ADR-0010** `docs/adr/0010-sap10-calculator-spec-target-and-validation.md` + — the binding decisions reflected in this rewrite: SAP 10.2 target, + cert-cal deletion, Validation Cohort, PCDB-as-prerequisite, fixture + retirement. **Read first.** - **ADR-0009** `docs/adr/0009-deterministic-sap-calculator.md` — - decision rationale + Session A/B/C plan. + original calculator decision rationale + Session A/B/C plan. Read + for context; spec-version target / PCDB sequencing / cert-cal + rationale are superseded by ADR-0010. - **Spec coverage map** `docs/sap-spec/SPEC_COVERAGE.md` — pre-existing coverage tracker. Update as you go. @@ -817,19 +1074,19 @@ sections based on residual-size. Don't. ## 15. Final note -The prior session demonstrated that **moving SAP MAE down requires -either spec-correctness OR Elmhurst-perfect calibration, not both -simultaneously**. The cert-cal layer absorbs Elmhurst's spec -deviations; any spec-correct fix risks breaking it. +The prior session's framing — *"the cert-calibration layer absorbs +Elmhurst's spec deviations; we'll re-derive it at the end"* — was +load-bearing on a false diagnosis. The cert-cal layer is +pre-March-2025 SAP prices fit against a mixture distribution of two +spec-version regimes. Once you separate the regimes (Validation +Cohort) and use spec prices everywhere, the "tension" disappears. -The systematic pass clears this by separating the layers: -1. Build the spec-correct engine first. -2. Re-fit the cert-cal compatibility layer once at the end. +After P1–P5 land, the section sweep is straightforward: every +spec-correct fix is unambiguously the right answer, BRE +worked-example fixtures lock the result, and Validation Cohort probe +MAE moves monotonically downward. The fixes the prior session marked +as "spec-correct but probe-regressed" become trivially landable. -Don't be discouraged when SAP MAE rises temporarily during the spec -sweep. PE residual is the truer signal of engine correctness. SAP -MAE convergence will follow once cert-cal is re-derived against the -clean engine. - -**Welcome to the project. Read the spec, follow the order, commit one -section at a time. The deterministic answer is in there.** +**Welcome to the project. Read ADR-0010, land the five prerequisites, +then walk the spec in document order. The deterministic answer is in +there.** diff --git a/packages/domain/src/domain/sap/worksheet/dimensions.py b/packages/domain/src/domain/sap/worksheet/dimensions.py index 8294d272..e3cfdf6f 100644 --- a/packages/domain/src/domain/sap/worksheet/dimensions.py +++ b/packages/domain/src/domain/sap/worksheet/dimensions.py @@ -21,7 +21,6 @@ from typing import Final from datatypes.epc.domain.epc_property_data import EpcPropertyData, SapBuildingPart - _DEFAULT_STOREY_HEIGHT_M: Final[float] = 2.5 @@ -72,6 +71,16 @@ def dimensions_from_cert(epc: EpcPropertyData) -> Dimensions: """Build the `Dimensions` aggregate from an EpcPropertyData.""" parts = epc.sap_building_parts or [] + # Khalim Comments - this section seems to implement the + # worksheet section in page 132 and is unnecessarily + # complicated. The sap building parts are pre-ordered, form + # main building part to the extensions and the + # "identifier" field tells us if the part is the Main Dwelling + # of it's an extension. E.g. if it's an extension, identifier + # should be "Extension 1". + # We should strictly type the values on the EpcPropertyData + # domain model + ground_area = 0.0 ground_perim = 0.0 top_area = 0.0