Slice S0380.134: pin corpus PE against cascade demand-mode (apples-to-apples)

The SAP 10.2 worksheet computes each existing-dwelling metric in two
distinct blocks:

  1. "ENERGY RATING" block: uses Table 12 regulated prices +
     UK-average climate. Produces SAP score (Block 11a), total fuel
     cost (255), total CO2 (272).
  2. "EPC COSTS, EMISSIONS AND PRIMARY ENERGY" block: uses Table 32
     prices + postcode-specific climate. Produces total CO2 (272)
     again, with a different value, plus total PE (286).

The two blocks use different space-heating demand figures (kWh), per
SAP 10.2 §13 (e.g. solid fuel 8: 21097 kWh in the rating block vs
16813 kWh in the EPC block for London W6).

The corpus regression test was extracting all four pins and asserting
against the cascade's rating-mode result (`cert_to_inputs`). That was
apples-to-apples for SAP/cost/CO2 (the first `(255)` and `(272)`
matches the regex finds ARE in the rating block) but apples-to-
oranges for PE: the `(286)` Total PE only exists in the EPC block,
so every PE pin was comparing rating-mode cascade output against
EPC-block worksheet output. The mismatch inflated every PE residual
by 10-15% of total PE.
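
A minimal sketch of the first-match behaviour described above. The worksheet
text, its labels and values, and the helper names `first_value` and
`block_of` are all hypothetical; the real P960 layout and the test's actual
regex are not shown in this commit.

```python
import re

# Hypothetical, heavily simplified worksheet text. Labels and values are
# illustrative only; they are not taken from a real P960 worksheet.
WORKSHEET_TEXT = """\
ENERGY RATING
Total energy cost (255) 1500.00
Total CO2, kg/year (272) 3200.00
EPC COSTS, EMISSIONS AND PRIMARY ENERGY
Total CO2, kg/year (272) 2900.00
Total Primary energy kWh/year (286) 21000.00
"""


def first_value(ref: str, text: str) -> float:
    """Return the number that follows the first occurrence of a ref like '(272)'."""
    match = re.search(re.escape(ref) + r"\s+(-?\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"worksheet ref {ref} not found")
    return float(match.group(1))


def block_of(ref: str, text: str) -> str:
    """Report which block the first occurrence of ref falls in."""
    epc_start = text.index("EPC COSTS")
    return "EPC" if text.index(ref) > epc_start else "rating"


for ref in ("(255)", "(272)", "(286)"):
    print(ref, first_value(ref, WORKSHEET_TEXT), block_of(ref, WORKSHEET_TEXT))
```

In this toy text the first `(255)` and `(272)` hits land in the rating block,
while the only `(286)` hit lands in the EPC block, which is the asymmetry the
paragraph above describes.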

The fix runs both cascade modes in the Act phase and assigns:

  - rating-mode result → SAP / cost / CO2 residuals
  - demand-mode result (`cert_to_demand_inputs`) → PE residual
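
The assignment above can be sketched in isolation. `Result` is a hypothetical
stand-in for the calculator's return type (only the four attribute names are
taken from the diff), and every number below is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for the calculator output; field names follow the diff."""
    sap_score_continuous: float
    total_fuel_cost_gbp: float
    co2_kg_per_yr: float
    primary_energy_kwh_per_yr: float


def residuals(rating: Result, demand: Result, worksheet: dict[str, float]) -> dict[str, float]:
    """Compare rating-block pins with rating mode and the PE pin with demand mode."""
    return {
        "sap": rating.sap_score_continuous - worksheet["sap_c"],
        "cost": rating.total_fuel_cost_gbp - worksheet["cost"],
        "co2": rating.co2_kg_per_yr - worksheet["co2"],
        "pe": demand.primary_energy_kwh_per_yr - worksheet["pe"],
    }


# Illustrative values only.
rating = Result(70.0, 1000.0, 3000.0, 22000.0)
demand = Result(68.0, 950.0, 2900.0, 20000.0)
worksheet = {"sap_c": 65.0, "cost": 1100.0, "co2": 3050.0, "pe": 20010.0}

print(residuals(rating, demand, worksheet))
# The pre-fix PE residual used the rating-mode result instead:
print(rating.primary_energy_kwh_per_yr - worksheet["pe"])
```

With these invented numbers the demand-mode PE residual is small while the
rating-mode one is large, mirroring the shape of the bug rather than any real
corpus value.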

25 corpus _CorpusExpectation entries re-pinned. Some closed
dramatically (apples-to-apples reveals the cascade was actually
correct):

  ashp         +1467.90 → -11.80  ← effectively closed
  oil pcdb 1/2 +2086.75 → -83.82
  oil pcdb 3   +1897.43 → -271.44
  electric 1   +2837.14 → +164.91
  electric 8   +2113.83 → -224.46
  solid fuel 5 +2359.85 → -330.84

Others surfaced larger demand-mode gaps that the block mismatch had
been hiding — these are real cascade gaps the next slices will
address:

  electric 3       -850.93 → -3189.22
  electric 5/6     +540.33 / +568.45 → -1797.96 / -1769.84
  pcdb 1           -171.70 → -3135.30
  solid fuel 2/3   +440.75 / +1451.79 → -2292.47 / -2496.20
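
One way to read both lists together: the worksheet (286) value is the same
before and after the re-pin, so old residual minus new residual equals the
cascade's rating-mode PE minus its demand-mode PE for that variant. A small
sketch using a few of the rounded figures quoted above (the dictionary and
variable names are illustrative):

```python
# (old PE residual, new PE residual) in kWh, copied from the lists above.
OLD_NEW = {
    "ashp": (1467.90, -11.80),
    "oil pcdb 1/2": (2086.75, -83.82),
    "electric 3": (-850.93, -3189.22),
    "pcdb 1": (-171.70, -3135.30),
    "solid fuel 2": (440.75, -2292.47),
}

# old - new = (rating PE - worksheet PE) - (demand PE - worksheet PE)
#           = rating PE - demand PE
shifts = {name: round(old - new, 2) for name, (old, new) in OLD_NEW.items()}

for name, shift in shifts.items():
    print(f"{name:14s} rating-mode PE exceeds demand-mode PE by {shift:8.2f} kWh")
```

Every shift in this sample is positive, so the "closed" and "widened" variants
moved in the same direction; they differ only in where the old residual
started.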

The corpus test docstring + per-block-attribution comment now make
the rating-vs-EPC block distinction explicit so future reviewers
don't repeat the same conflation.

Extended handover suite at HEAD post-slice: 876 pass / 0 fail
(unchanged — no test count change, just per-pin value updates).

Pyright net-zero on touched file (0 → 0).

No cascade behaviour change. No golden / unit-test impact (the bug
was specific to the corpus test's pin-extraction logic).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-31 10:41:47 +00:00
parent 0d2d41abbb
commit 7530ed3f4a

@@ -10,9 +10,23 @@ controlled-variable signal this corpus was built to exercise.
 Per variant we extract Block 11a (individual heating) or Block 11b
 (community heating) pins from the P960 worksheet PDF, route the Summary
 PDF through `ElmhurstSiteNotesExtractor` `from_elmhurst_site_notes`
-`cert_to_inputs` `calculate_sap_from_inputs`, and assert each of the
-four published outputs (continuous SAP, total fuel cost, CO2, PE)
-matches its pinned residual within a tight absolute tolerance.
+`cert_to_inputs` / `cert_to_demand_inputs` `calculate_sap_from_inputs`,
+and assert each of the four published outputs matches its pinned
+residual within a tight absolute tolerance.
+The SAP 10.2 worksheet computes each existing-dwelling metric in two
+distinct blocks: the "ENERGY RATING" block (uses Table 12 regulated
+prices + UK-average climate; produces SAP score, total fuel cost,
+CO2) and the "EPC COSTS, EMISSIONS AND PRIMARY ENERGY" block (uses
+Table 32 prices + postcode-specific climate; produces Primary Energy).
+The two blocks operate on different space-heating demand kWh values.
+To compare apples-to-apples the corpus pins the worksheet's rating-
+block (SAP / cost / CO2) against the cascade's rating-mode result
+(`cert_to_inputs`) and the worksheet's EPC-block (PE) against the
+cascade's demand-mode result (`cert_to_demand_inputs`). Pre-S0380.134
+all four pins compared against rating-mode, which inflated every PE
+residual by ~10-15% of total PE because the worksheet (286) Total PE
+only appears in the EPC block.
 Residuals are non-zero today: the cascade overshoots most variants by
 +1..+30 SAP points (with `community heating 6` undershooting at 6.87,
@@ -41,6 +55,7 @@ from domain.sap10_calculator.calculator import calculate_sap_from_inputs
 from domain.sap10_calculator.exceptions import MissingMainFuelType
 from domain.sap10_calculator.rdsap.cert_to_inputs import (
     SAP_10_2_SPEC_PRICES,
+    cert_to_demand_inputs,
     cert_to_inputs,
 )
@@ -105,22 +120,34 @@ class _CorpusExpectation:
 # logs). All 10 close to ΔSAP ±7.4; solid fuel 5 +2.71 is the
 # smallest open. 16 variants remain blocked (community heating,
 # 4 electric storage codes, no system, oil non-Heating-oil, Bulk LPG).
+#
+# Slice S0380.134 fixed a measurement bug in the PE pin: the
+# worksheet (286) Total PE only exists in the EPC block (uses
+# postcode-specific climate + demand-mode space heating kWh), so
+# comparing it against the cascade's rating-mode PE inflated every
+# PE residual by 10-15% of total PE. The pin now compares the
+# worksheet (286) against the cascade's demand-mode PE
+# (`cert_to_demand_inputs`). Multiple variants closed dramatically
+# (ashp +1468 → -12; oil pcdb 1/2 +2087 → -84; electric 1 +2837 →
+# +165; electric 8 +2114 → -224); others surfaced larger demand-
+# mode residuals that were hidden by the block mismatch (electric
+# 3/5/6/7/9, pcdb 1, solid fuel 2-11).
 _EXPECTATIONS: tuple[_CorpusExpectation, ...] = (
-    _CorpusExpectation(variant='ashp', block='11a', expected_sap_resid=+5.6680, expected_cost_resid_gbp=-130.5995, expected_co2_resid_kg=-1.4283, expected_pe_resid_kwh=+1467.8983),
-    _CorpusExpectation(variant='electric 1', block='11a', expected_sap_resid=+9.6439, expected_cost_resid_gbp=-222.2109, expected_co2_resid_kg=+14.3441, expected_pe_resid_kwh=+2837.1414),
-    _CorpusExpectation(variant='electric 2', block='11a', expected_sap_resid=+5.8523, expected_cost_resid_gbp=-134.8455, expected_co2_resid_kg=+94.4364, expected_pe_resid_kwh=+2420.9013),
-    _CorpusExpectation(variant='electric 3', block='11a', expected_sap_resid=+14.6973, expected_cost_resid_gbp=-338.6485, expected_co2_resid_kg=-379.1296, expected_pe_resid_kwh=-850.9293),
-    _CorpusExpectation(variant='electric 5', block='11a', expected_sap_resid=+10.9720, expected_cost_resid_gbp=-252.8131, expected_co2_resid_kg=-218.5642, expected_pe_resid_kwh=+540.3309),
-    _CorpusExpectation(variant='electric 6', block='11a', expected_sap_resid=+10.9720, expected_cost_resid_gbp=-252.8131, expected_co2_resid_kg=-209.8689, expected_pe_resid_kwh=+568.4500),
-    _CorpusExpectation(variant='electric 7', block='11a', expected_sap_resid=+9.6834, expected_cost_resid_gbp=-223.1212, expected_co2_resid_kg=-137.9832, expected_pe_resid_kwh=+1061.3307),
-    _CorpusExpectation(variant='electric 8', block='11a', expected_sap_resid=+6.8875, expected_cost_resid_gbp=-158.6999, expected_co2_resid_kg=-34.9564, expected_pe_resid_kwh=+2113.8303),
-    _CorpusExpectation(variant='electric 9', block='11a', expected_sap_resid=+12.0340, expected_cost_resid_gbp=-277.2813, expected_co2_resid_kg=-255.6076, expected_pe_resid_kwh=+362.4518),
-    _CorpusExpectation(variant='gshp', block='11a', expected_sap_resid=+5.1598, expected_cost_resid_gbp=-118.8901, expected_co2_resid_kg=-41.4461, expected_pe_resid_kwh=+639.1890),
-    _CorpusExpectation(variant='oil 1', block='11a', expected_sap_resid=+2.6578, expected_cost_resid_gbp=-61.2402, expected_co2_resid_kg=-242.2677, expected_pe_resid_kwh=+1259.6587),
-    _CorpusExpectation(variant='oil pcdb 1', block='11a', expected_sap_resid=+0.4239, expected_cost_resid_gbp=-9.7668, expected_co2_resid_kg=-35.9551, expected_pe_resid_kwh=+2086.7505),
-    _CorpusExpectation(variant='oil pcdb 2', block='11a', expected_sap_resid=+0.4239, expected_cost_resid_gbp=-9.7668, expected_co2_resid_kg=-35.9551, expected_pe_resid_kwh=+2086.7505),
-    _CorpusExpectation(variant='oil pcdb 3', block='11a', expected_sap_resid=+1.1597, expected_cost_resid_gbp=-26.7204, expected_co2_resid_kg=-53.1709, expected_pe_resid_kwh=+1897.4341),
-    _CorpusExpectation(variant='pcdb 1', block='11a', expected_sap_resid=+6.9521, expected_cost_resid_gbp=-157.6055, expected_co2_resid_kg=-845.8065, expected_pe_resid_kwh=-171.6971),
+    _CorpusExpectation(variant='ashp', block='11a', expected_sap_resid=+5.6680, expected_cost_resid_gbp=-130.5995, expected_co2_resid_kg=-1.4283, expected_pe_resid_kwh=-11.8017),
+    _CorpusExpectation(variant='electric 1', block='11a', expected_sap_resid=+9.6439, expected_cost_resid_gbp=-222.2109, expected_co2_resid_kg=+14.3441, expected_pe_resid_kwh=+164.9052),
+    _CorpusExpectation(variant='electric 2', block='11a', expected_sap_resid=+5.8523, expected_cost_resid_gbp=-134.8455, expected_co2_resid_kg=+94.4364, expected_pe_resid_kwh=+970.7570),
+    _CorpusExpectation(variant='electric 3', block='11a', expected_sap_resid=+14.6973, expected_cost_resid_gbp=-338.6485, expected_co2_resid_kg=-379.1296, expected_pe_resid_kwh=-3189.2203),
+    _CorpusExpectation(variant='electric 5', block='11a', expected_sap_resid=+10.9720, expected_cost_resid_gbp=-252.8131, expected_co2_resid_kg=-218.5642, expected_pe_resid_kwh=-1797.9601),
+    _CorpusExpectation(variant='electric 6', block='11a', expected_sap_resid=+10.9720, expected_cost_resid_gbp=-252.8131, expected_co2_resid_kg=-209.8689, expected_pe_resid_kwh=-1769.8410),
+    _CorpusExpectation(variant='electric 7', block='11a', expected_sap_resid=+9.6834, expected_cost_resid_gbp=-223.1212, expected_co2_resid_kg=-137.9832, expected_pe_resid_kwh=-1276.9603),
+    _CorpusExpectation(variant='electric 8', block='11a', expected_sap_resid=+6.8875, expected_cost_resid_gbp=-158.6999, expected_co2_resid_kg=-34.9564, expected_pe_resid_kwh=-224.4607),
+    _CorpusExpectation(variant='electric 9', block='11a', expected_sap_resid=+12.0340, expected_cost_resid_gbp=-277.2813, expected_co2_resid_kg=-255.6076, expected_pe_resid_kwh=-1975.8392),
+    _CorpusExpectation(variant='gshp', block='11a', expected_sap_resid=+5.1598, expected_cost_resid_gbp=-118.8901, expected_co2_resid_kg=-41.4461, expected_pe_resid_kwh=-454.5023),
+    _CorpusExpectation(variant='oil 1', block='11a', expected_sap_resid=+2.6578, expected_cost_resid_gbp=-61.2402, expected_co2_resid_kg=-242.2677, expected_pe_resid_kwh=-1050.4919),
+    _CorpusExpectation(variant='oil pcdb 1', block='11a', expected_sap_resid=+0.4239, expected_cost_resid_gbp=-9.7668, expected_co2_resid_kg=-35.9551, expected_pe_resid_kwh=-83.8239),
+    _CorpusExpectation(variant='oil pcdb 2', block='11a', expected_sap_resid=+0.4239, expected_cost_resid_gbp=-9.7668, expected_co2_resid_kg=-35.9551, expected_pe_resid_kwh=-83.8239),
+    _CorpusExpectation(variant='oil pcdb 3', block='11a', expected_sap_resid=+1.1597, expected_cost_resid_gbp=-26.7204, expected_co2_resid_kg=-53.1709, expected_pe_resid_kwh=-271.4351),
+    _CorpusExpectation(variant='pcdb 1', block='11a', expected_sap_resid=+6.9521, expected_cost_resid_gbp=-157.6055, expected_co2_resid_kg=-845.8065, expected_pe_resid_kwh=-3135.2991),
     # Slice S0380.133 unblocked 10 solid-fuel variants by routing the
     # Elmhurst §14.0 "Main Heating EES Code" through the new
     # `_ELMHURST_MAIN_HEATING_EES_TO_FUEL_CODE` dict. Pre-slice the
@@ -128,16 +155,16 @@ _EXPECTATIONS: tuple[_CorpusExpectation, ...] = (
     # cost / CO2 / PE all route via the correct Table 32 fuel code.
     # Remaining residuals are likely heating-system efficiency or
     # control-type gaps — separate slices.
-    _CorpusExpectation(variant='solid fuel 2', block='11a', expected_sap_resid=+4.7910, expected_cost_resid_gbp=-110.3933, expected_co2_resid_kg=-484.3578, expected_pe_resid_kwh=+440.7506),
-    _CorpusExpectation(variant='solid fuel 3', block='11a', expected_sap_resid=+4.4310, expected_cost_resid_gbp=-102.0983, expected_co2_resid_kg=-1206.1483, expected_pe_resid_kwh=+1451.7872),
-    _CorpusExpectation(variant='solid fuel 4', block='11a', expected_sap_resid=+4.1283, expected_cost_resid_gbp=-95.1230, expected_co2_resid_kg=-714.4446, expected_pe_resid_kwh=+1655.3360),
-    _CorpusExpectation(variant='solid fuel 5', block='11a', expected_sap_resid=+2.7081, expected_cost_resid_gbp=-62.3977, expected_co2_resid_kg=-301.4166, expected_pe_resid_kwh=+2359.8540),
-    _CorpusExpectation(variant='solid fuel 6', block='11a', expected_sap_resid=-7.3846, expected_cost_resid_gbp=+168.2332, expected_co2_resid_kg=-153.6470, expected_pe_resid_kwh=+2519.2301),
-    _CorpusExpectation(variant='solid fuel 7', block='11a', expected_sap_resid=+5.8242, expected_cost_resid_gbp=-131.0462, expected_co2_resid_kg=-758.2093, expected_pe_resid_kwh=+2967.9919),
-    _CorpusExpectation(variant='solid fuel 8', block='11a', expected_sap_resid=+4.2391, expected_cost_resid_gbp=-97.6761, expected_co2_resid_kg=-14.9661, expected_pe_resid_kwh=+2512.8796),
-    _CorpusExpectation(variant='solid fuel 9', block='11a', expected_sap_resid=+3.4416, expected_cost_resid_gbp=-79.3010, expected_co2_resid_kg=-8.4751, expected_pe_resid_kwh=+2427.8078),
-    _CorpusExpectation(variant='solid fuel 10', block='11a', expected_sap_resid=+5.1366, expected_cost_resid_gbp=-118.3539, expected_co2_resid_kg=-52.9522, expected_pe_resid_kwh=+1848.8905),
-    _CorpusExpectation(variant='solid fuel 11', block='11a', expected_sap_resid=+4.3479, expected_cost_resid_gbp=-100.1809, expected_co2_resid_kg=-8.8428, expected_pe_resid_kwh=+1535.5344),
+    _CorpusExpectation(variant='solid fuel 2', block='11a', expected_sap_resid=+4.7910, expected_cost_resid_gbp=-110.3933, expected_co2_resid_kg=-484.3578, expected_pe_resid_kwh=-2292.4679),
+    _CorpusExpectation(variant='solid fuel 3', block='11a', expected_sap_resid=+4.4310, expected_cost_resid_gbp=-102.0983, expected_co2_resid_kg=-1206.1483, expected_pe_resid_kwh=-2496.1951),
+    _CorpusExpectation(variant='solid fuel 4', block='11a', expected_sap_resid=+4.1283, expected_cost_resid_gbp=-95.1230, expected_co2_resid_kg=-714.4446, expected_pe_resid_kwh=-1097.3549),
+    _CorpusExpectation(variant='solid fuel 5', block='11a', expected_sap_resid=+2.7081, expected_cost_resid_gbp=-62.3977, expected_co2_resid_kg=-301.4166, expected_pe_resid_kwh=-330.8371),
+    _CorpusExpectation(variant='solid fuel 6', block='11a', expected_sap_resid=-7.3846, expected_cost_resid_gbp=+168.2332, expected_co2_resid_kg=-153.6470, expected_pe_resid_kwh=-1312.5322),
+    _CorpusExpectation(variant='solid fuel 7', block='11a', expected_sap_resid=+5.8242, expected_cost_resid_gbp=-131.0462, expected_co2_resid_kg=-758.2093, expected_pe_resid_kwh=-1638.1589),
+    _CorpusExpectation(variant='solid fuel 8', block='11a', expected_sap_resid=+4.2391, expected_cost_resid_gbp=-97.6761, expected_co2_resid_kg=-14.9661, expected_pe_resid_kwh=-1307.9243),
+    _CorpusExpectation(variant='solid fuel 9', block='11a', expected_sap_resid=+3.4416, expected_cost_resid_gbp=-79.3010, expected_co2_resid_kg=-8.4751, expected_pe_resid_kwh=-510.4162),
+    _CorpusExpectation(variant='solid fuel 10', block='11a', expected_sap_resid=+5.1366, expected_cost_resid_gbp=-118.3539, expected_co2_resid_kg=-52.9522, expected_pe_resid_kwh=-1315.3508),
+    _CorpusExpectation(variant='solid fuel 11', block='11a', expected_sap_resid=+4.3479, expected_cost_resid_gbp=-100.1809, expected_co2_resid_kg=-8.8428, expected_pe_resid_kwh=-962.4251),
 )
@@ -303,15 +330,21 @@ def test_heating_systems_corpus_residual_matches_pin(
     site_notes = ElmhurstSiteNotesExtractor(pages).extract()
     epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
-    # Act
-    result = calculate_sap_from_inputs(
+    # Act — run both cascade modes so the comparison against the
+    # worksheet pins is apples-to-apples per block (see module
+    # docstring: rating block carries SAP / cost / CO2, EPC block
+    # carries PE).
+    rating = calculate_sap_from_inputs(
         cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES),
     )
+    demand = calculate_sap_from_inputs(
+        cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES),
+    )
-    sap_resid = result.sap_score_continuous - worksheet['sap_c']
-    cost_resid = result.total_fuel_cost_gbp - worksheet['cost']
-    co2_resid = result.co2_kg_per_yr - worksheet['co2']
-    pe_resid = result.primary_energy_kwh_per_yr - worksheet['pe']
+    sap_resid = rating.sap_score_continuous - worksheet['sap_c']
+    cost_resid = rating.total_fuel_cost_gbp - worksheet['cost']
+    co2_resid = rating.co2_kg_per_yr - worksheet['co2']
+    pe_resid = demand.primary_energy_kwh_per_yr - worksheet['pe']
     # Assert — each residual sits within its absolute tolerance of the
     # pinned value. Drift beyond tolerance fires loudly; closures land