Model/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
Khalim Conn-Kowlessar bb9c5ac017 docs: ADR-0010 retargets calculator to SAP 10.2; rewrite handover
Adds ADR-0010 superseding ADR-0009's spec-version target, PCDB
sequencing, and cert-calibration layer. Captures the conclusions
of a grill-with-docs session:

  1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3 — no
     SAP-10.3-lodged certs exist in the corpus to validate against.
  2. table_12_cert_calibration is deleted (not "re-derived at the
     end"). It was pre-March-2025 spec prices fit against a mixture
     distribution of two spec-version regimes, with downstream-
     component bugs absorbed into the fit — not Elmhurst deviation.
  3. Validation Cohort: filter the corpus to inspection_date ≥
     2025-07-01 so every cert in the probe was lodged on SAP 10.2
     (14-03-2025) prices. One spec, one signal.
  4. PCDB integration is promoted from "Session C deferred" to
     prerequisite P4 — dominates residual variance on heat pumps and
     the 78% of gas-boiler certs lodging main_heating_data_source=1.
  5. Trace mode (SapResult.intermediate) and BRE worked-example
     fixtures replace the 7 cert-based golden fixtures, which
     contained compensating errors.
  6. Strict-type EpcPropertyData via codes.csv-derived canonical
     enums (P6) — the in-source motivation lives at
     dimensions.py:74-82 (Khalim's comment, included in this commit).
  7. Worksheet-faithful structure is a sweep-time principle: each
     worksheet module mirrors SAP 10.2 worksheet line numbering.

CONTEXT.md additions:
  - Refined "Calculated SAP10 Performance" and "SAP10 Calculation"
    to reference SAP 10.2 + ADR-0010.
  - New term "SAP Spec Version" — domain-meaningful because the
    same EpcPropertyData yields different sap_score under different
    spec revisions.
  - New term "Validation Cohort" — the version-locked sub-corpus.

HANDOVER_SYSTEMATIC_REVIEW.md is rewritten section-by-section to
reflect ADR-0010: §1 framing, §2 status pointer, new §2.5 with the
six prerequisites P1–P6 in dependency order, §3 diagnosis (cert-cal
was stale prices, not Elmhurst deviation), §4 scope (PCDB IN,
SAP 10.3 stays OUT), §5 approach (worksheet-faithful principle as
§5.5), §7 tension dissolved, §7b findings re-framed, §8 dead-ends
re-classified as conditional, §9 cohort filter, §10 fixture
strategy, §11 trace mode as prerequisite, §12 prereqs-first,
§13 Phase 0/Phase 1 workflow, §14 ADR-0010 reference, §15 final
note.

P2.1 (commit ac1aa56a) already lands the first ADR-0010 slice
(probe swap to spec prices).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 09:54:24 +00:00


Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort

Status: Accepted. Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of ADR-0009. Adds strict typing of EpcPropertyData (P6) and a worksheet-faithful structural principle for the domain/sap/worksheet/* modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection).

Why this ADR exists

ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans multiple SAP spec versions (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's domain.sap.tables.table_12_cert_calibration layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec, closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component.

This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation.

Decisions

1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3

ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is therefore SAP 10.2 (14-03-2025 amendment): it is both the document that RdSAP 10 (10-06-2025) cross-references for heating-system identification and the amendment that current assessor software is on.

packages/domain/src/domain/sap/tables/table_12.py is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2.

A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3).

2. table_12_cert_calibration is deleted

The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak.

This includes the cert_calibration_e7_codes extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates; Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is in §8 of the handover as a documented dead-end.
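To make the Table 12a point concrete, here is a minimal sketch of fractional blending on a two-rate tariff. Only the two code ranges and the 100 % high-rate fraction come from this ADR; the function names, the example rates, and the assumption that both ranges sit under the "other direct-acting electric heating" row are illustrative.

```python
# Heating codes this ADR names as wrongly routed to off-peak rates.
# Assumption: both ranges fall under Table 12a's "other direct-acting
# electric heating" row, as the paragraph above implies.
DIRECT_ACTING_CODES = frozenset(range(191, 197)) | frozenset(range(691, 697))


def high_rate_fraction_7_hour(heating_code: int) -> float:
    """Fraction of consumption billed at the high rate on a 7-hour tariff."""
    if heating_code in DIRECT_ACTING_CODES:
        return 1.0  # Table 12a: 100 % at the high rate
    # Other rows of Table 12a are deliberately not reproduced in this sketch.
    raise NotImplementedError(f"no Table 12a fraction sketched for code {heating_code}")


def blended_price_p_per_kwh(high_rate_p: float, low_rate_p: float, high_fraction: float) -> float:
    """Blend the two tariff rates by the high-rate fraction."""
    return high_fraction * high_rate_p + (1.0 - high_fraction) * low_rate_p


# Placeholder rates for illustration only, not Table 12 values.
price = blended_price_p_per_kwh(20.0, 10.0, high_rate_fraction_7_hour(191))
```

With a high-rate fraction of 1.0 the low rate drops out of the blend entirely, which is why routing these codes to off-peak rates understated their fuel cost.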

domain.sap.tables.table_12.unit_price_p_per_kwh becomes the only price API. Parity probes are updated to use it.
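As a rough sketch of what "the only price API" could look like: the function name unit_price_p_per_kwh and the two prices and two CO2 factors below are quoted from this ADR, while the Fuel enum, the dict layout, and the co2_kg_per_kwh helper are assumptions made for illustration.

```python
from enum import Enum


class Fuel(Enum):
    """Hypothetical fuel enum; the real module may key Table 12 differently."""

    MAINS_GAS = "mains_gas"
    STANDARD_ELECTRICITY = "standard_electricity"


# SAP 10.2 (14-03-2025) values quoted in this ADR; other fuels omitted.
_UNIT_PRICE_P_PER_KWH = {
    Fuel.MAINS_GAS: 3.64,
    Fuel.STANDARD_ELECTRICITY: 16.49,
}
_CO2_KG_PER_KWH = {
    Fuel.MAINS_GAS: 0.210,
    Fuel.STANDARD_ELECTRICITY: 0.136,
}


def unit_price_p_per_kwh(fuel: Fuel) -> float:
    """Return the spec unit price; fail loudly, never fall back to a fitted value."""
    try:
        return _UNIT_PRICE_P_PER_KWH[fuel]
    except KeyError:
        raise KeyError(f"no SAP 10.2 Table 12 price for {fuel!r}") from None


def co2_kg_per_kwh(fuel: Fuel) -> float:
    """Return the spec CO2 emission factor for the fuel."""
    return _CO2_KG_PER_KWH[fuel]
```

Raising on a missing fuel, instead of returning a default, keeps a gap in the table visible at the call site, in the same spirit as deleting the calibration layer.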

3. Validation Cohort is filtered to a single spec-version window

Probe MAE against the full 250k-cert corpus measures both calculator correctness and the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy.

The Validation Cohort is the subset of corpus certs with inspection_date ≥ 2025-07-01 — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go.

This requires re-extracting the training parquet to include inspection_date (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice.
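The cohort filter and its metric can be sketched as follows. The cut-off date and the inspection_date column name come from this ADR; the ProbeRow shape and the function names are hypothetical, and the real probe presumably works on the parquet instead of an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Cut-off from decision 3: certs inspected on or after this date.
COHORT_START = date(2025, 7, 1)


@dataclass(frozen=True)
class ProbeRow:
    """Hypothetical probe row: one cert with its lodged and calculated scores."""

    inspection_date: date
    lodged_sap: float
    calculated_sap: float


def validation_cohort(rows: List[ProbeRow]) -> List[ProbeRow]:
    """Keep only certs lodged inside the single spec-version window."""
    return [row for row in rows if row.inspection_date >= COHORT_START]


def mean_absolute_error(rows: List[ProbeRow]) -> float:
    """MAE in SAP points between calculated and lodged scores."""
    if not rows:
        raise ValueError("cannot compute MAE on an empty cohort")
    return sum(abs(row.calculated_sap - row.lodged_sap) for row in rows) / len(rows)


# Illustrative rows only; the scores are made up.
rows = [
    ProbeRow(date(2024, 11, 3), lodged_sap=65.0, calculated_sap=58.0),  # excluded
    ProbeRow(date(2025, 7, 1), lodged_sap=70.0, calculated_sap=72.0),
    ProbeRow(date(2025, 9, 15), lodged_sap=60.0, calculated_sap=59.0),
]
cohort_mae = mean_absolute_error(validation_cohort(rows))
```

The pre-cut-off row has the largest error in this toy data, yet it never reaches the metric, which is the point of separating spec-version drift from calculator error.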

4. PCDB integration is promoted from Session C to a prerequisite

ADR-0009 deferred PCDB to Session C and shipped a NoOpPcdbLookup stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging main_heating_data_source=1 (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2).

PCDB lookup against main_heating_index_number is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The NoOpPcdbLookup seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no main_heating_index_number or the PCDB has no matching record.
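The fallback rule described above can be sketched like this. NoOpPcdbLookup and main_heating_index_number are names from this ADR, and 0.80 is the category default it quotes; the PcdbLookup protocol, its find method, PcdbRecord, InMemoryPcdbLookup, and the example index number are hypothetical and stand in for whatever the real seam looks like.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol

TABLE_4A_GAS_BOILER_DEFAULT = 0.80  # category default quoted in this ADR


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical per-product record; only one field is sketched."""

    seasonal_efficiency: float


class PcdbLookup(Protocol):
    def find(self, index_number: int) -> Optional[PcdbRecord]: ...


class NoOpPcdbLookup:
    """Stub seam: never finds a product."""

    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return None


class InMemoryPcdbLookup:
    """Lookup backed by a dict, e.g. one loaded from the PCDB CSV exports."""

    def __init__(self, records: Dict[int, PcdbRecord]) -> None:
        self._records = dict(records)

    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return self._records.get(index_number)


def main_heating_efficiency(
    main_heating_index_number: Optional[int],
    pcdb: PcdbLookup,
    table_4a_default: float,
) -> float:
    """Prefer the PCDB record; fall back to Table 4a only when none is available."""
    if main_heating_index_number is None:
        return table_4a_default  # cert lodged no index number
    record = pcdb.find(main_heating_index_number)
    if record is None:
        return table_4a_default  # PCDB has no matching product
    return record.seasonal_efficiency


# Made-up index number and efficiency, for illustration only.
pcdb = InMemoryPcdbLookup({12345: PcdbRecord(seasonal_efficiency=0.90)})
efficiency = main_heating_efficiency(12345, pcdb, TABLE_4A_GAS_BOILER_DEFAULT)
```

Because the calculator depends only on the protocol, swapping the stub for a real lookup needs no change at the call sites.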

Verification infrastructure (also prerequisites)

Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal:

  1. Trace mode populated. ADR-0009 specced SapResult.intermediate: dict[str, float] and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on intermediate so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace.
  2. BRE worked-example unit tests. SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired.
  3. Strict typing of EpcPropertyData via canonical domain enums. Bare str and Union[int, str] fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's dimensions.py:74-82 is Khalim's documented example. The domain holds one canonical enum per field, derived from datatypes/epc/domain/epc_codes.csv (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of EpcPropertyData (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright strict mode stays clean.

These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5.
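A per-intermediate fixture check might look roughly like the sketch below. SapResult.intermediate: dict[str, float] is the shape ADR-0009 specced; the diff_trace helper, its tolerance, the variable names used as keys, and every number are placeholders, not values transcribed from a BRE worked example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SapResult:
    """Minimal stand-in for the real result type; other fields omitted."""

    sap_score: float
    intermediate: Dict[str, float] = field(default_factory=dict)


def diff_trace(
    result: SapResult,
    expected: Dict[str, float],
    rel_tol: float = 1e-4,
) -> Dict[str, Tuple[Optional[float], float]]:
    """Return {name: (actual, expected)} for every missing or mismatched variable."""
    mismatches: Dict[str, Tuple[Optional[float], float]] = {}
    for name, want in expected.items():
        got = result.intermediate.get(name)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            mismatches[name] = (got, want)
    return mismatches


# Placeholder numbers, not taken from any worked example.
result = SapResult(
    sap_score=68.0,
    intermediate={"heat_transfer_coefficient": 250.0, "utilisation_factor": 0.95},
)
fixture = {"heat_transfer_coefficient": 250.0, "utilisation_factor": 0.90, "ecf": 1.8}
mismatches = diff_trace(result, fixture)
```

Reporting the mismatched variable by name is what lets a failing fixture point at one worksheet line, where an aggregate SAP score comparison can hide compensating errors.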

Worksheet-faithful structure (sweep-time principle)

Each domain/sap/worksheet/*.py module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. heat_transfer_coefficient aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values).
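The typed-enum dispatch mentioned here relies on the mapper split from the strict-typing prerequisite: each source adapts its raw value once, and consumers only ever see the canonical enum. A minimal sketch, assuming a hypothetical BuiltForm field whose codes and labels are illustrative and not read from epc_codes.csv:

```python
from enum import Enum


class BuiltForm(Enum):
    """Hypothetical canonical enum; codes are illustrative."""

    DETACHED = 1
    SEMI_DETACHED = 2


def built_form_from_api(raw: int) -> BuiltForm:
    """API mapper: the gov API supplies integer codes."""
    return BuiltForm(raw)  # raises ValueError on an unknown code


# Site Notes supply free-text labels; normalise before matching.
_SITE_NOTES_LABELS = {
    "detached": BuiltForm.DETACHED,
    "semi-detached": BuiltForm.SEMI_DETACHED,
}


def built_form_from_site_notes(raw: str) -> BuiltForm:
    """Site Notes mapper: adapt the string label to the canonical enum."""
    try:
        return _SITE_NOTES_LABELS[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown built form label: {raw!r}") from None


# Both sources converge on the same member, so no consumer needs
# isinstance checks or int/str coercion.
from_api = built_form_from_api(2)
from_notes = built_form_from_site_notes(" Semi-Detached ")
```

Unknown values fail at the mapper boundary, so a worksheet function that takes a BuiltForm can dispatch on it without defensive handling.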

Consequences

  • ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter.
  • The training parquet schema bumps when inspection_date is added — a non-breaking MINOR addition under ADR-0008's Feature Schema Version discipline.
  • The handover document docs/sap-spec/HANDOVER_SYSTEMATIC_REVIEW.md is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the six prerequisites (P1–P6).
  • Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available.

Considered alternatives

  • Build a versioned Table 12 (pre/post 14-March-2025) keyed on inspection_date and validate across the full corpus. Rejected as more work for no signal benefit during the spec sweep: the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly.
  • Keep cert-cal during the sweep and re-derive at the end (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest.
  • Pay for an Elmhurst license, lock fixtures to its output. Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite.