Model/docs/adr/0010-sap10-calculator-spec-target-and-validation.md
Khalim Conn-Kowlessar a7b08a4e8f refactor: move docs/sap-spec/ contents into domain/sap10_calculator/
Locality of reference — SAP-specific docs, specs, and runtime data
now live alongside the calculator that consumes them, mirroring the
prior packages→domain layout moves.

Move targets:

- Narrative MDs → domain/sap10_calculator/docs/
    NEXT_AGENT_PROMPT.md, HANDOVER_NEXT.md, SAP_CALCULATOR.md
- Spec PDFs → domain/sap10_calculator/docs/specs/
    RdSAP 10 Specification 10-06-2025.pdf
    PCDF_Spec_Rev-06b_12_May_2021.pdf
    sap-10-2-full-specification-2025-03-14.pdf
    sap-10-3-full-specification-2026-01-13.pdf
- PCDB runtime data → domain/sap10_calculator/tables/pcdb/data/
    pcdb10.dat (8.3MB) + 7× pcdb_table_*.jsonl (18MB total)

Path code rewrites (load-bearing):

- tables/pcdb/__init__.py: replaced parents[4]/'docs'/'sap-spec' with
  Path(__file__).resolve().parent/'data' for Table 105 JSONL loading.
- tables/pcdb/postcode_weather.py: same rebase for the pcdb10.dat path
  read by _postcode_climate_table().
- tables/pcdb/etl.py __main__: same rebase for the manual ETL invocation
  (source + output_dir both now point inside the package).
- tests/test_pcdb_etl.py: _PCDB_DAT_PATH now derives from
  parents[1]/'tables'/'pcdb'/'data' (was parents[3]/'docs'/'sap-spec').

Citation rewrites:

- 12 .py docstrings and 4 .md docs (ADRs + READMEs + narrative docs)
  had `docs/sap-spec/<file>` strings rewritten to their new locations.
- Two cases where the catch-all sed misfired (an ADR-0009 line about a
  PCDB extract; the pcdb __init__.py docstring about ETL output) were
  hand-corrected to point at tables/pcdb/data/ rather than docs/specs/.

docs/sap-spec/ is now empty (will be removed in a follow-up sweep or
left as a vestigial empty dir for future repurposing). ADRs 0009 and
0010 remain at docs/adr/ — they're part of the chronological
cross-cutting decision log, not calculator-specific narrative.

Verified:

- Calculator's 1e-4 production gate
  (test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly) GREEN.
- Wider sweep (domain/sap10_calculator/ + domain/sap10_ml/): 1654
  passed / 20 failed — exact pre-move baseline. All 20 failures
  pre-existing (10 hand-built skeleton + 4 cohort chain + 6 cohort
  diff).
- Pyright net-zero on the 4 touched runtime/test files (0 errors)
  and unchanged on heat_transmission.py (13) / cert_to_inputs.py (35) /
  mapper.py (33).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:17:18 +00:00


Retarget Sap10Calculator to SAP 10.2 (14-03-2025); delete cert-calibration; validate on a spec-version-locked cohort

Status: Accepted. Supersedes the spec-version target, the PCDB sequencing, and the cert-calibration layer of ADR-0009. Adds strict typing of EpcPropertyData (P6) and a worksheet-faithful structural principle for the domain/sap/worksheet/* modules — both new concerns ADR-0009 didn't address. All other ADR-0009 decisions stand (Calculated SAP10 Performance as a glossary term, MeasureApplicator/Sap10Calculator chain, MCS boolean default-false, global thermal-bridging y factor, Table 27 living-area fraction, Table 11 secondary-heating allocation, MeasureOverrides rejection).

Why this ADR exists

ADR-0009 was written before a second-order problem in the validation corpus was visible: the 250k-cert training parquet spans multiple SAP spec versions (SAP 10.1 from 2019, SAP 10.2 pre- and post-14-March-2025 amendment), each of which was the active table when its certs were lodged. The prior session's domain.sap10_calculator.tables.table_12_cert_calibration layer was implicitly absorbing this version mixture into a single "best fit" price set ~10–25 % lower than the SAP 10.2 (14-03-2025) spec, closer to the SAP 10.1 era prices. Every spec-correctness slice that touched a downstream component (HW cylinder zero-loss, gas standing charges, Table 12a fractional blending) registered as a regression on the parity probe because the cert-cal layer had been numerically calibrated against the buggy state of every other component.

This ADR resolves four entangled decisions at once. They are coupled — none of them is the right call in isolation.

Decisions

1. Active spec target is SAP 10.2 (14-03-2025), not SAP 10.3

ADR-0009 named SAP 10.3 (13-01-2026) as the calculator's target. No SAP-10.3-lodged certs exist in the corpus; assessor software has not migrated. Targeting SAP 10.3 produces a calculator whose output is verifiable against no cert. The active target is therefore SAP 10.2 (14-03-2025 amendment): it is both the document that RdSAP 10 (10-06-2025) cross-references for heating-system identification and the amendment that current assessor software runs on.

domain/sap10_calculator/tables/table_12.py is re-labelled as SAP 10.2 (14-03-2025). Its CO2 factors are corrected to spec (0.210 kg/kWh mains gas, 0.136 kg/kWh standard electricity — the file currently has SAP 10.3 values 0.214 and 0.086). Prices already match SAP 10.2 (3.64 p mains gas, 16.49 p standard electricity, etc.) — the misleading "+25 % shift from SAP 10.2 to 10.3" comment is removed; the 13.19 p figure is from SAP 10.1, not SAP 10.2.

A future ADR retargets to SAP 10.3 once the cert corpus migrates (expected late 2026 or 2027 once BRE updates RdSAP to reference SAP 10.3).

2. table_12_cert_calibration is deleted

The cert-calibration table is bug-masking. Its prices are pre-March-2025 SAP values fit against the average cert in a mixed-version corpus, with downstream-component bugs absorbed into the fit. Removing it forces upstream errors to surface where they live, in the component that owns them, instead of being silently compensated for by a price tweak.

This includes the cert_calibration_e7_codes extension that routes codes 191–196 (direct-electric) and 691–696 (room heaters) to off-peak rates. Table 12a is explicit that "other direct-acting electric heating" bills 100 % at the high rate on a 7-hour tariff. The S-B14 finding that motivated this hack is recorded in §8 of the handover as a documented dead-end.

domain.sap10_calculator.tables.table_12.unit_price_p_per_kwh becomes the only price API. Parity probes are updated to use it.

3. Validation Cohort is filtered to a single spec-version window

Probe MAE against the full 250k-cert corpus measures both calculator correctness and the spec-version drift across certs lodged at different times. Without separating them, every spec-correctness improvement is noisy.

The Validation Cohort is the subset of corpus certs with inspection_date ≥ 2025-07-01 — chosen to allow ~4 months past the 14-March-2025 SAP 10.2 amendment for commercial assessor software to roll out the new tables. Filtering to this cohort yields a probe where every cert was lodged on the same spec version the calculator targets. MAE on the Validation Cohort is the only metric used for spec-sweep go/no-go.

This requires re-extracting the training parquet to include inspection_date (currently dropped by the ETL — 202 columns, none of them dates). That extraction is a prerequisite slice.
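
The cohort filter and its go/no-go metric can be sketched as follows. This is a minimal illustration, not the probe's actual code: `CertRow`, `validation_cohort`, and `mean_absolute_error` are hypothetical names, and the rows are invented. Only the `inspection_date ≥ 2025-07-01` cut-off comes from this decision.

```python
from dataclasses import dataclass
from datetime import date

# Cut-off from decision 3: roughly four months after the 14-March-2025 amendment.
VALIDATION_COHORT_START = date(2025, 7, 1)


@dataclass(frozen=True)
class CertRow:
    """Hypothetical minimal view of one corpus cert (not the real parquet schema)."""
    cert_id: str
    inspection_date: date
    lodged_sap: int
    calculated_sap: float


def validation_cohort(rows: list[CertRow]) -> list[CertRow]:
    """Keep only certs inspected on or after the cohort start date."""
    return [r for r in rows if r.inspection_date >= VALIDATION_COHORT_START]


def mean_absolute_error(rows: list[CertRow]) -> float:
    """MAE of calculated vs lodged SAP over the given rows."""
    if not rows:
        raise ValueError("cannot compute MAE over an empty cohort")
    return sum(abs(r.calculated_sap - r.lodged_sap) for r in rows) / len(rows)


# Invented rows, for illustration only.
rows = [
    CertRow("a", date(2024, 11, 3), 60, 66.0),  # pre-cut-off: excluded
    CertRow("b", date(2025, 7, 1), 70, 71.0),   # boundary date: included
    CertRow("c", date(2025, 9, 15), 55, 54.0),  # included
]
cohort = validation_cohort(rows)
cohort_mae = mean_absolute_error(cohort)
```

Computing the MAE only over `cohort` keeps the pre-amendment cert "a" (with its larger residual) out of the metric, which is the separation of calculator error from spec-version drift that this decision asks for.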

4. PCDB integration is promoted from Session C to a prerequisite

ADR-0009 deferred PCDB to Session C and shipped a NoOpPcdbLookup stub. The handover's own measurements show PCDB absence accounts for ~19 SAP points of MAE on heat-pump certs (Table 4a fallback SCOP 2.30 vs typical PCDB 2.80–3.50) and most per-cert variance on the 78 % of gas-boiler certs lodging main_heating_data_source=1 (category-default 0.80 vs typical PCDB 0.88–0.94). The handover's rationale for deferral ("cert-cal absorbs PCDB gaps") collapses with decision (2).

PCDB lookup against main_heating_index_number is built before the section-by-section sweep starts. Data source: https://www.ncm-pcdb.org.uk — CSV exports of boilers and heat pumps. Per-product fields needed: seasonal efficiency, secondary efficiency, output kW, flow-temperature curve (heat pumps). The NoOpPcdbLookup seam from ADR-0009 grill outcome #1 is the integration point; the stub returns None and the calculator falls back to Table 4a only when the cert lodges no main_heating_index_number or the PCDB has no matching record.
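
A minimal sketch of the lookup seam and its fallback rule, under stated assumptions: the `PcdbLookup` protocol, its `find` method, `PcdbRecord`, `InMemoryPcdbLookup`, and `main_heating_efficiency` are hypothetical names chosen for illustration. Only `NoOpPcdbLookup` returning None and the fallback conditions are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class PcdbRecord:
    """Hypothetical subset of the per-product fields listed above."""
    index_number: int
    seasonal_efficiency: float


class PcdbLookup(Protocol):
    """Assumed shape of the seam; the real interface may differ."""
    def find(self, index_number: int) -> Optional[PcdbRecord]: ...


class NoOpPcdbLookup:
    """Stub: never finds a record, so callers always fall back."""
    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return None


class InMemoryPcdbLookup:
    """Illustrative lookup backed by a dict keyed on PCDB index number."""
    def __init__(self, records: dict[int, PcdbRecord]) -> None:
        self._records = records

    def find(self, index_number: int) -> Optional[PcdbRecord]:
        return self._records.get(index_number)


def main_heating_efficiency(
    index_number: Optional[int],
    lookup: PcdbLookup,
    table_4a_default: float,
) -> float:
    """Use the PCDB value when available, else the Table 4a default."""
    if index_number is None:  # cert lodges no main_heating_index_number
        return table_4a_default
    record = lookup.find(index_number)
    if record is None:  # PCDB has no matching record
        return table_4a_default
    return record.seasonal_efficiency


# Index number 12345 and its efficiency are invented for the example.
pcdb = InMemoryPcdbLookup({12345: PcdbRecord(12345, 0.90)})
```

With this shape the calculator code never branches on which lookup it was given; swapping `NoOpPcdbLookup()` for a populated lookup changes results without changing call sites.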

Verification infrastructure (also prerequisites)

Three pieces of infrastructure are built before the section sweep so per-section verification has unambiguous signal:

  1. Trace mode populated. ADR-0009 specced SapResult.intermediate: dict[str, float] and it was never built. Every named SAP 10.2 worksheet variable (heat transfer coefficient, mean internal temperature, monthly solar gains, utilisation factor, ECF, etc.) is exposed on intermediate so any single cert can be diffed against a hand-computed value, a BRE worked example, or a future Elmhurst reference trace.
  2. BRE worked-example unit tests. SAP 10.2 spec appendices and RdSAP 10 worked examples are transcribed as fixtures keyed on per-intermediate expected values, not aggregate SAP score. These replace the 7 cert-based golden fixtures (which contained compensating errors per the handover §10). The cert fixtures are retired.
  3. Strict typing of EpcPropertyData via canonical domain enums. Bare str and Union[int, str] fields (the latter because the gov API gives ints and Site Notes give strings) cascade defensive type-handling into every consumer — the calculator's dimensions.py:74-82 is Khalim's documented example. The domain holds one canonical enum per field, derived from datatypes/epc/domain/epc_codes.csv (union of keys across schema versions, hand-authored). The API mapper and Site Notes mapper each adapt their raw input to the canonical enum. Repo-wide test compatibility is a hard constraint — every consumer of EpcPropertyData (calculator, ML pipeline, recommendations, ETL) continues working after the typing pass. Pyright strict mode stays clean.

These map to prerequisites P5 (trace mode + BRE fixtures) and P6 (strict typing) in the handover §2.5.
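
As a rough illustration of how a populated trace could be checked against a worked-example fixture, the sketch below diffs two `dict[str, float]` mappings. `diff_intermediates`, the key names, and every number are hypothetical. Only the `dict[str, float]` shape of `SapResult.intermediate` comes from this ADR.

```python
from typing import Optional


def diff_intermediates(
    actual: dict[str, float],
    expected: dict[str, float],
    abs_tol: float = 1e-4,
) -> dict[str, tuple[Optional[float], float]]:
    """Return {name: (actual, expected)} for every expected value that is
    missing from the trace or differs by more than abs_tol."""
    mismatches: dict[str, tuple[Optional[float], float]] = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or abs(got - want) > abs_tol:
            mismatches[name] = (got, want)
    return mismatches


# Invented trace and fixture values, for illustration only.
intermediate = {
    "heat_transfer_coefficient_w_per_k": 215.4301,
    "mean_internal_temperature_c": 18.9,
}
worked_example = {
    "heat_transfer_coefficient_w_per_k": 215.43012,
    "mean_internal_temperature_c": 19.2,
    "utilisation_factor": 0.97,
}
mismatches = diff_intermediates(intermediate, worked_example)
```

Keying fixtures on per-intermediate values means a failure names the worksheet variable that drifted, rather than only reporting that the aggregate SAP score moved.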

Worksheet-faithful structure (sweep-time principle)

Each domain/sap/worksheet/*.py module must mirror the SAP 10.2 worksheet structure for its section — function names reference their worksheet-line origin (e.g. heat_transfer_coefficient aligns with worksheet line (40)), compound calculations split into one function per line where possible, defensive type-handling replaced by typed-enum dispatch. This is not a prerequisite slice; the refactor lands as part of each section's sweep slice, verified by the BRE worked examples (which assert per-intermediate values).
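
The typed-enum dispatch mentioned above might look like the sketch below. Everything in it is an assumption made for illustration: `BuiltForm`, its codes and labels, the two adapter functions, and `party_wall_count` are not taken from epc_codes.csv or from the calculator.

```python
from enum import Enum


class BuiltForm(Enum):
    """Hypothetical canonical enum; codes and members are invented."""
    DETACHED = 1
    SEMI_DETACHED = 2
    MID_TERRACE = 3


# Site Notes supply strings; this label table is illustrative only.
_SITE_NOTES_LABELS = {
    "detached": BuiltForm.DETACHED,
    "semi-detached": BuiltForm.SEMI_DETACHED,
    "mid-terrace": BuiltForm.MID_TERRACE,
}


def built_form_from_api(raw: int) -> BuiltForm:
    """API mapper: integer code to canonical enum (ValueError if unknown)."""
    return BuiltForm(raw)


def built_form_from_site_notes(raw: str) -> BuiltForm:
    """Site Notes mapper: free-text label to canonical enum."""
    key = raw.strip().lower()
    if key not in _SITE_NOTES_LABELS:
        raise ValueError(f"unknown built form label: {raw!r}")
    return _SITE_NOTES_LABELS[key]


_PARTY_WALLS = {
    BuiltForm.DETACHED: 0,
    BuiltForm.SEMI_DETACHED: 1,
    BuiltForm.MID_TERRACE: 2,
}


def party_wall_count(form: BuiltForm) -> int:
    """Illustrative consumer: dispatches on the enum, with no isinstance
    checks or string normalisation inside the calculation."""
    return _PARTY_WALLS[form]
```

Both mappers converge on the same enum member, so a consumer such as `party_wall_count` gives the same answer whether the value arrived as `2` from the API or as `"Semi-Detached"` from Site Notes.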

Consequences

  • ADR-0009's "MAE ≤ 1.0 SAP-point on typical subset" success criterion is restated against the Validation Cohort (not the full corpus). The "typical subset" exclusions in ADR-0009 (sap_score ≤ 5, ≥ 100, multi-heating, conservatory, RIR) still apply on top of the cohort filter.
  • The training parquet schema bumps when inspection_date is added — a non-breaking MINOR addition under ADR-0008's Feature Schema Version discipline.
  • The handover document domain/sap10_calculator/docs/HANDOVER_SYSTEMATIC_REVIEW.md is rewritten in lockstep: §3 (diagnosis), §4 (scope), §7 (state-A-vs-state-B framing deleted), §7b (findings re-framed), §10 (fixture strategy), and a new §2.5 listing the five prerequisites.
  • Sessions A/B/C from ADR-0009 collapse into a single sequence: prerequisites land, then the section sweep runs against a clean probe with PCDB available.

Considered alternatives

  • Build versioned Table 12 (pre/post 14-March-2025) keyed on inspection_date and validate across the full corpus. Rejected as more work for no signal benefit during the spec sweep — the filtered cohort gets us to a clean probe faster. A versioned table is still future work if Calculated SAP10 Performance ever needs to reproduce historical cert SAP for products that compare against Lodged Performance directly.
  • Keep cert-cal during the sweep and re-derive at the end (the handover's prescription). Rejected for the reasons in decision (2): the cert-cal layer corrupts the signal during the sweep, which is precisely when the signal needs to be cleanest.
  • Pay for an Elmhurst license, lock fixtures to its output. Held in reserve. BRE worked examples are free and spec-derived; an Elmhurst trace would add value as a per-component reference but is not a prerequisite.

Amendment — §10a Fuel costs (2026-05-21)

Decision 1's "active spec target is SAP 10.2 (14-03-2025)" is narrowed for the §10a Fuel-costs block: cost prices for §10a and §10b are sourced from RdSAP10 Table 32 (PDF page 95), not SAP 10.2 Table 12. RdSAP10 §19.1 is explicit: "The SAP rating for RdSAP 10 is to be calculated using Table 32 prices (not Table 12) for section 10a and 10b."

CO2 emission factors and primary-energy factors remain SAP 10.2 Table 12 per RdSAP10 §19.2 (the values are identical across the two tables; the columns are duplicated in Table 32 for completeness but Table 12 is the canonical authoritative source the calculator continues to import).

Why the amendment exists

The §10a slice 1+2 rewrite (commits 0f255165, adfa7f60 on branch ara-backend-design-prd) surfaced two structural bugs that the pre-amendment Table-12-only path was masking:

  1. Wrong table. Table 12 unit prices were 5–55% off Table 32 per carrier (mains gas 3.64 vs 3.48, heating oil 4.94 vs 7.64, std electricity 16.49 vs 13.19, off-peak 9.40 vs 5.50, PV export 5.59 vs 13.19). Table 32 is what cert assessor software computes against; comparing our Table-12-driven SAP scores against PDF references was an apples-to-oranges check.
  2. Missing (251) standing charges. Table 12 note (a) (and the identical Table 32 note (a)) gates additional standing charges into the SAP-rating ECF: gas standing added when gas is used for space/water heating; off-peak electricity standing added when an off-peak meter is in use; standard-electricity standing always omitted. Pre-amendment the calculator applied zero standing charges, equivalent to ignoring £92–£120/yr per gas-heated dwelling.
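
The note (a) gating described in item 2 can be sketched as below. `StandingCharges` and `additional_standing_charges_gbp` are hypothetical names, and the charge amounts are placeholders rather than Table 32 values. Only the three gating rules come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingCharges:
    """Annual standing charges in GBP (placeholder container)."""
    gas_gbp: float
    off_peak_electricity_gbp: float
    standard_electricity_gbp: float


def additional_standing_charges_gbp(
    charges: StandingCharges,
    *,
    gas_used_for_heating: bool,
    off_peak_meter: bool,
) -> float:
    """Sum the standing charges that note (a) admits into the rating ECF."""
    total = 0.0
    if gas_used_for_heating:  # gas used for space or water heating
        total += charges.gas_gbp
    if off_peak_meter:  # an off-peak meter is in use
        total += charges.off_peak_electricity_gbp
    # The standard-electricity standing charge is never added.
    return total


# Placeholder amounts, NOT the Table 32 figures.
charges = StandingCharges(
    gas_gbp=100.0,
    off_peak_electricity_gbp=20.0,
    standard_electricity_gbp=60.0,
)
gas_only = additional_standing_charges_gbp(
    charges, gas_used_for_heating=True, off_peak_meter=False
)
```

An all-electric dwelling on a standard meter gets zero from this function, which is why the pre-amendment omission was invisible on such certs and showed up mainly on gas-heated ones.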

The 000490 Elmhurst fixture had a recorded -12.5% cost gap (£706 vs £807 PDF) that ADR-0010 §3 Validation Cohort framing attributed to "pre-amendment spec-version drift". The §10a rewrite shows the gap was wrong-table + missing-standing-charges — a real calculator regression, not corpus drift. Post-§10a 000490 closes to within ~4% of PDF cost and SAP rating ceiling tightens 6 → 2.

Consequences

  • domain/sap10_calculator/tables/table_32.py ships the RdSAP10 unit prices + standing charges + Table 12 note (a) gating function. Table 12 keeps the CO2 + PEF columns.
  • domain/sap10_calculator/tables/table_12a.py ships the high-rate-fraction lookups for off-peak split (Table 12a in SAP 10.2 PDF page 191 — RdSAP10 §19.1 cross-references this table directly). Tariff.TEN_HOUR carried for spec completeness even though RdSAP cert meter_type enum (1..5) has no 10-hour code.
  • domain/sap10_calculator/worksheet/fuel_cost.py ships the §10a orchestrator producing FuelCostResult (32 fields, line refs (240)..(255)). cert_to_inputs._fuel_cost precompute wires it from cert state.
  • The 000474 Elmhurst fixture cost residual widened from -0.6% to +10.7% (SAP rating ceiling loosened 2 → 4) because the pre-amendment wrong-table-but-cancels-kWh accidentally compensated for upstream §4 HW kWh + Appendix L lighting overestimates. §4 HW worksheet tightening is the next ticket — see project memory project_section_4_hw_next_ticket. Ceiling drops back to 2 (or below) when that lands.
  • Golden corpus SAP tolerance widened ±7 → ±11 per the Validation Cohort discipline (oil unit price +55% from Table 12 → Table 32 moves oil-heated golden certs whose lodged SAP scores pre-date Table 32).

Deferred work (named in §10a slice 3)

  • §4 HW worksheet tightening + Appendix L lighting predictor — next ticket.
  • Table 12a high-rate-fraction wiring for off-peak electric mains (Table12aSystem cert→row mapping). Currently the cert→precompute path returns a zero FuelCostResult sentinel for off-peak certs, deferring to the legacy scalar _*_fuel_cost_gbp_per_kwh heuristic.
  • Table 13 immersion / HP-DHW WH high-rate fractions.
  • Off-peak per-row (230a)..(230g) Table 12a split for pumps/fans (spec line 8076).
  • (247a) Instant electric shower kWh routing.
  • (252) per-row Appendix M/N split (PV / wind / hydro / micro-CHP) — currently single pv_credit_gbp scalar.
  • (253)/(254) Appendix Q routes.
  • Drop the legacy scalar space_heating_fuel_cost_gbp_per_kwh / hot_water_fuel_cost_gbp_per_kwh / other_fuel_cost_gbp_per_kwh / secondary_heating_fuel_cost_gbp_per_kwh / pv_export_credit_gbp_per_kwh fields from CalculatorInputs once the ~33-occurrence synthetic-test corpus migrates to fuel_cost=....

Amendment — Appendix L lighting (2026-05-22)

The cost-side inputs.lighting_kwh_per_yr is sourced from the spec-faithful Appendix L L1-L11 cascade (via InternalGainsResult.lighting_kwh_per_yr), not from the legacy predicted_lighting_kwh heuristic. This replaces the 9.3 × TFA × (1 − bulb-share-reduction) linear approximation with the same cascade that drives §5 (67) gains, so the cost side and the gains side share one source of truth.

Why the amendment exists

The Appendix L cascade was already implemented spec-faithfully for the §5 internal-gains side (validated across all 6 Elmhurst fixtures at ≤0.6% on LINE_67 monthly W tuples), but cert_to_inputs populated the cost-side inputs.lighting_kwh_per_yr from a separate heuristic that over-counted ~3× on the Elmhurst cohort (528 vs 140 kWh on 000474). The +9.2% total fuel cost residual on 000474 was dominated by this single component.

Two engine bugs surfaced during the wire-up:

  1. Cosine modulation integral. The L1-L9 formula yields a "continuous" annual E_L. The SAP10.2 worksheet at line (232) lodges Σ(L11 monthly distribution), which differs from the continuous formula by the discrete integration factor Σ(n_m × [1 + 0.5 × cos(2π(m − 0.2)/12)]) / 365 = 0.998539, where the sum runs over months m = 1..12 and n_m is the number of days in month m. Pre-fix annual_lighting_kwh returned the continuous value → uniform +0.146% bias across all 6 fixtures. Post-fix sums the monthly distribution directly.
  2. Cert EPC under-lodgement. _w000474.build_epc() + _w000490.build_epc() did not pass low_energy_fixed_lighting_bulbs_count or sap_windows to make_minimal_sap10_epc. The §5 LINE_67 fixture conformance tests poke these at the test level, but the e2e Sap10Calculator().calculate(epc) path bypasses that. Without them, the cascade fell through to L5b (185 × TFA lm) + L8c (21.3 lm/W) + C_daylight = 1.433 no-bonus — producing ~317 kWh on 000474 instead of 139.9452. Fixed by passing the existing fixture constants (SECTION_5_BULB_COUNT_LEL + SECTION_6_VERTICAL_WINDOWS) through.
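
The discrete integration factor in item 1 can be reproduced with a few lines of arithmetic. This is a standalone check, not the engine's code. The function names are hypothetical and a 365-day year with a 28-day February is assumed.

```python
import math

# Days per month, January to December (non-leap year assumed).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def monthly_weights() -> list[float]:
    """Fraction of the continuous annual E_L assigned to each month:
    n_m * [1 + 0.5 * cos(2*pi*(m - 0.2)/12)] / 365 for m = 1..12."""
    return [
        n * (1 + 0.5 * math.cos(2 * math.pi * (m - 0.2) / 12)) / 365
        for m, n in enumerate(DAYS_IN_MONTH, start=1)
    ]


def discrete_integration_factor() -> float:
    """Sum of the monthly weights; about 0.998539 rather than exactly 1."""
    return sum(monthly_weights())


def monthly_lighting_kwh(continuous_annual_kwh: float) -> list[float]:
    """Distribute a continuous annual figure across the twelve months."""
    return [continuous_annual_kwh * w for w in monthly_weights()]


factor = discrete_integration_factor()
# Relative bias from reporting the continuous value instead of the monthly sum.
bias_percent = (1 / factor - 1) * 100
```

The factor falls short of 1 because the months are unequal in length: the cosine terms would cancel exactly over twelve equal months, but the 28-day February sits on a high-weight winter month. `bias_percent` comes out at about 0.146, matching the uniform bias reported above.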

Consequences

  • 000474 e2e SAP integer closes to delta=0 (62 = PDF 62; continuous 62.1664 vs 62.2584, Δ 0.09). First Elmhurst fixture to hit the rdsap engine integration gate. Test ceilings tightened 3 → 0 (integer) and 3.5 → 0.5 (continuous).
  • 000490 SAP integer + fuel cost tests xfail (strict). Appendix L closure is spec-faithful (lighting kWh 614 → 171 matches U985 (232)=171.4217 to abs=1e-4), but the cost residual widens from -4.7% to -12.9% and SAP delta widens 3 → 6. The remaining residual is from other broken components on this fixture — primary suspects: fuel pricing for the pre-2025-07-01 cohort (Table 32 lodge-date snapshot semantics), main heating fuel +2.5% overshoot, Table D1/D2/D3 Ecodesign corrections, Appendix N heat-pump cascade. Per feedback-e2e-validation-philosophy memory: don't widen, hunt. Tests re-enable when each next component closes.
  • Golden fixture _PE_TOLERANCE_KWH_PER_M2 widened 30 → 35 to absorb the elec-PEF × lighting-Δ contribution (~4 kWh/m²) on the non-Elmhurst cohort. Pre-Appendix-L baseline residuals already sat near -28 kWh/m² from unrelated components on those certs. Tightens back when the dominant remaining components close.
  • Per-component worksheet-level pins land: result.lighting_kwh_per_yr == U985 (232) at abs=1e-4 for the 2 e2e fixtures, and InternalGainsResult.lighting_kwh_per_yr == U985 (232) at abs=1e-4 for all 6 §5 fixtures. New per-fixture constant LINE_232_LIGHTING_KWH_PER_YR pins each lodged value.
  • predicted_lighting_kwh kept in domain/ml/demand.py with a deprecation note. Still used by domain.sap10_ml.ecf.energy_cost_factor and domain.sap10_ml.transform.transform_to_predictions — both legacy ML pre-SAP-rewrite call sites; rip when those migrate.

Deferred work (named in Appendix L slice 3)

  • 000490 / cohort SAP-integer closure (residual hunt). Next ticket. Suspects above. Driven by user's next batch of test fixtures (battle-testing the engine) → emergent residual identification.
  • predicted_lighting_kwh deletion. Future cleanup ticket once domain.sap10_ml.ecf + domain.sap10_ml.transform are off the legacy heuristic.
  • RdSAP10 → API integration test. End-state e2e harness: RdSAP API response → cert_to_inputs → calculate_sap_from_inputs → SAP integer = lodged integer. Once enough cohort fixtures pass delta=0 on isolated components.

Amendment — Cohort residual hunt + SAP 10.2 rating constants (2026-05-22)

The post-Appendix-L 000490 residual (SAP delta +6, cost -£104) closed in four micro-cycles after a per-component diagnostic walk down the spec cascade. Five engine pieces landed end-to-end:

  1. Secondary heating cascade (607e52a3): cert lodges SAP code 691 (Electricity Electric Panel, 100% efficiency); build_epc wasn't passing it through. Closes -£104 on 000490.
  2. Ventilation cert lodgement (af6fcfb1): SapVentilation schema gains 4 new fields (sheltered_sides, has_suspended_timber_floor, suspended_timber_floor_sealed, has_draught_lobby). cert_to_inputs now reads them. Removes a long-standing sheltered_sides=2 hardcode + 4 TODOs. All 6 fixtures' (25)m monthly effective ACH closes to U985 PDF at abs=1e-3 (72 assertions).
  3. Table 4f gas-combi pumps_fans (b536b46a): keyed by main_heating_category. Category 2 (gas boilers) → 115 kWh pump + 45 kWh flue fan = 160 kWh/yr. Other categories still on the legacy 130 sentinel.
  4. SAP 10.2 rating constants (a41ac6bd): worksheet/rating.py was using SAP 10.3 constants (deflator 0.36, slope 16.21/120.5). Per ADR-0010 §1 active spec target IS SAP 10.2 (14-03-2025). Restored SAP 10.2 values: deflator 0.42, linear branch slope 13.95, log branch intercept 117, log slope 121. The two errors were near-cancelling for the Elmhurst combi-gas cohort (low-cost dwellings on the linear branch).
  5. 000477 build_epc lodgement (partial — Table 3c blocker) (960419a9): mirrors the Appendix L slice 2 fix on 000477 (lodge windows + bulbs + PCDB index + secondary 691 + number_baths=0). Closes 000477 SAP delta from +6 to +1. Remaining +1 blocked by Table 3c (next ticket).
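
The restored constants in item 4 fit together roughly as sketched below. The four constants (0.42, 13.95, 117, 121) are the ones named above. The branch threshold of 3.5, the linear intercept of 100, and the (TFA + 45) denominator are assumptions recalled from the SAP worksheet, not stated in this ADR, and should be checked against the spec. The function names are hypothetical and do not describe worksheet/rating.py.

```python
import math

# SAP 10.2 values restored in item 4.
DEFLATOR = 0.42
LINEAR_SLOPE = 13.95
LOG_INTERCEPT = 117.0
LOG_SLOPE = 121.0

# Assumptions (not stated in this ADR; verify against the spec).
ECF_BRANCH_THRESHOLD = 3.5
LINEAR_INTERCEPT = 100.0
TFA_OFFSET_M2 = 45.0


def energy_cost_factor(total_cost_gbp: float, tfa_m2: float) -> float:
    """ECF = deflator * total cost / (TFA + 45)."""
    return DEFLATOR * total_cost_gbp / (tfa_m2 + TFA_OFFSET_M2)


def sap_rating(ecf: float) -> float:
    """Continuous (unrounded) rating from the energy cost factor."""
    if ecf <= 0:
        raise ValueError("this sketch only handles a positive ECF")
    if ecf >= ECF_BRANCH_THRESHOLD:
        return LOG_INTERCEPT - LOG_SLOPE * math.log10(ecf)  # log branch
    return LINEAR_INTERCEPT - LINEAR_SLOPE * ecf  # linear branch


# Invented dwelling: GBP 500/yr total cost, 55 m2 floor area.
ecf = energy_cost_factor(500.0, 55.0)
rating = sap_rating(ecf)
```

On the linear branch the rating is proportional to deflator × slope. A lower deflator paired with a higher slope therefore pulls in opposite directions, which is consistent with the near-cancellation noted in item 4 for low-cost dwellings.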

Consequences

  • 000474 + 000490 both hit SAP integer delta=0. First two Elmhurst fixtures across the rdsap engine integration gate. 685 tests pass + 1 xfail (000477 pending Table 3c).
  • Per-component pins now landed: lighting kWh, monthly infiltration ACH, secondary heating fuel, pumps_fans, plus the pre-existing §4 HW + §5 + §6 + §7 + §8 + §10a sections.
  • 000477 cost residual -3.5% remaining is the Table 3c 600-kWh-overshoot on combi-loss.
  • 000480/000487/000516 still at SAP delta +11/+12 because their build_epc lodgement is also incomplete (mirror the 000477 fix). Their PCDB records (16839/18119/18118) also have separate_dhw_tests=2 for sustain models → Table 3c blocker.

Deferred work (named in cohort slice 5)

  • Table 3c two-profile combi-loss override — Next ticket. SAP10.2 Appendix J §J3. Blocks 000477/000480/000487/000516 closure.
  • Build_epc lodgement on 000480/000487/000516 — Same pattern as 000477 (windows + bulbs + PCDB index + secondary 691 + number_baths). Lands with the Table 3c ticket since SAP closure requires both.
  • RdSAP API integration test — End-state validation gate. User generating exotic fixtures to pressure-test first.
  • §12a CO2 + §13a PE per-component pins — Engine produces result.co2_kg_per_yr and result.primary_energy_kwh_per_m2. Not yet validated against U985 (272) + (282) for any fixture.
  • PCDF field-position audit: parser reads F2 from fields[55]. PCDB 18118 raw row has 13.729 at index 52 — unclear which field that maps to per BRE PCDF Spec §7.11. Verify before assuming F2=0 is the lodged value.