User clarified end-of-session: mapper is a thin enum-and-shape translation; when residuals remain after closing mapper coverage gaps, the gap is in the **calculator cascade**. This unlocks an Elmhurst-only fixture path that doesn't need API JSON at all.

The fixture shape mirrors the 6 historical Elmhurst U985 fixtures (000474, 000477, 000480, 000487, 000490, 000516) at `domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py` + `test_e2e_elmhurst_sap_score.py`:

    build_epc() → cert_to_inputs → calculate_sap_from_inputs
      ↳ every SapResult field pinned at abs=1e-4 against U985 line refs

Any failing pin is definitionally a calculator bug. The user generates certs in Elmhurst SAP and exports Summary + worksheet ZIPs; no gov.uk EPB lodgement required.

Extended test case (000565) ready at `sap worksheets/extended test case/`:
- Summary_000565.pdf (input)
- U985-0001-000565.pdf (worksheet ground-truth)

Cert 000565 is a wacky stress-test that exercises 3-4 zero-coverage cascade paths in one cert: Main + 4 extensions, age mix A through J, RR on every part with mixed ages, conservatory with fixed heaters, curtain-wall Ext2 post-2023, mixed wall types (solid brick + stone + curtain wall), mixed party walls (CU + CF + Unable to determine).

After this cert lands, the user has agreed to generate single-feature certs (oil only, LPG only, solid fuel only, electric direct only, multi-main-heating, basement) to surface single-cause calculator gaps.

Handover doc now has implementation outline (mirror _elmhurst_worksheet_000474.py shape) and a coverage-paths table showing which targets each fuel-type/config exercises.
Handover — golden coverage + next slice
Branch feature/per-cert-mapper-validation. HEAD: b7fbbcca (Slice
S0380.51 strict-raise UnmappedApiCode on API integer enums).
Test baseline: 769 pass + 0 fail. Pyright net-zero on every
touched file.
Recent session slices (S0380.47 → S0380.51)
| Slice | Commit | What |
|---|---|---|
| S0380.47 | 42ed38f7 | β-split wired into cost cascade per Appendix M1 §6; zero cohort impact because Table 32 collapses code 30 = code 60 = 13.19 p/kWh |
| S0380.48 | bf99b1c7 | Schema gap closure: real-API `pv_batteries[]` lodges `battery_capacity` flat-shape (`[{"battery_capacity": 5}]`), schema expected nested `{"pv_battery": {"battery_capacity": 5}}` → 5-kWh batteries silently dropped → β too low. Cohort PE +2.7..+8.1 → −3.5..−4.5 |
| S0380.49 | e75198ce | Effective-monthly Table 12e PE factors for the PV split per Appendix M1 §8. Cohort PE −3.5..−4.5 → −2.8..−3.7 |
| S0380.50 | 3d1e6f10 | §4 seasonal monthly HW fuel for PV β cascade: replaced days-prorated hot-water demand with §4 (62)m seasonal output scaled to annual fuel. Cohort PE −2.8..−3.7 → −2.7..−3.5 |
| S0380.51 | b7fbbcca | Strict-raise UnmappedApiCode on five API mapper helpers (floor_construction, floor_heat_loss, roof_construction, party_wall_construction, built_form). Surfaced two coverage gaps immediately (floor_heat_loss codes 2/3/6) and added explicit mappings. 6 new tests as the forcing function. |
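
The S0380.48 schema gap can be illustrated with a minimal sketch. The helper name `battery_capacities_kwh` is hypothetical and is not the mapper's actual code; it only shows how accepting both the flat real-API shape and the nested schema shape, and raising on anything else, avoids silently dropping a battery.

```python
from __future__ import annotations

from typing import Any


def battery_capacities_kwh(pv_batteries: list[dict[str, Any]]) -> list[float]:
    """Return capacities from flat or nested pv_batteries[] entries.

    Hypothetical helper for illustration only. It raises on an
    unrecognised entry instead of dropping it silently.
    """
    capacities: list[float] = []
    for entry in pv_batteries:
        if "battery_capacity" in entry:
            # Flat shape, as lodged by the real API.
            capacities.append(float(entry["battery_capacity"]))
        elif "pv_battery" in entry:
            # Nested shape, as the schema originally expected.
            capacities.append(float(entry["pv_battery"]["battery_capacity"]))
        else:
            raise ValueError(f"unrecognised pv_batteries entry: {entry!r}")
    return capacities
```

Both `[{"battery_capacity": 5}]` and `[{"pv_battery": {"battery_capacity": 5}}]` yield a single 5 kWh capacity under this sketch, so the β calculation would see the battery either way.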
Test-coverage matrix (current state)
| Test file | Certs | What's pinned |
|---|---|---|
| `test_summary_pdf_mapper_chain.py` | 38 cohort-2 + 8 ASHP + per-cert chain tests | SAP at 1e-4 vs worksheet |
| `test_golden_fixtures.py` | 15 certs | SAP int + PE + CO2 residuals vs API-lodged |
| `test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise` | All JSON in `fixtures/golden/` | No UnmappedApiCode raised at extraction |
Cohort overlap
- Golden ∩ Cohort-2 = 0/38 — cohort-2 certs are NOT in golden fixtures
- Golden ∩ ASHP = 7/8 — cert 9501 lives in chain tests only
- Golden open-front = 8 certs (oil + gas + RR) — no worksheets, API-only
Cohort-2 SAP closure (chain tests)
All 38 at max |Δ| = 5e-5 vs worksheet — closed.
Cohort-2 PE / CO2 (probed but NOT pinned anywhere)
- 24/38 closed (|PE| < 1, |CO2| < 0.05)
- 14/38 open. Top offender: cert 2102 at +20.4 PE, −0.79 CO2 — completely undetected by any current test
- Other 13 cluster around −3 PE (same PV (233a/b) mystery pattern as the ASHP golden certs)
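
The closed/open split above can be written as a small predicate. This is a sketch under the thresholds quoted in the list (|PE| < 1, |CO2| < 0.05); the constant and function names are illustrative, not taken from the codebase.

```python
PE_CLOSED_ABS = 1.0    # |PE residual| threshold quoted above
CO2_CLOSED_ABS = 0.05  # |CO2 residual| threshold quoted above


def is_closed(pe_resid: float, co2_resid: float) -> bool:
    """True when both residuals fall inside the 'closed' thresholds."""
    return abs(pe_resid) < PE_CLOSED_ABS and abs(co2_resid) < CO2_CLOSED_ABS


# Cert 2102's residuals as quoted above classify as open.
print(is_closed(20.4, -0.79))
```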
★ Next slice — add cohort-2 to test_golden_fixtures.py
This is the agreed-upon next slice (one-slice change, high-value):
- Run cohort-2 against `cert_to_demand_inputs` and capture current PE/CO2 residuals
- Add `_GoldenExpectation` entries to `test_golden_fixtures.py` for all 38 certs
- The pin tolerance stays at the existing `_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01` / `_CO2_ABS_TOLERANCE_TONNES = 0.001`
- The 14 "open" certs get pinned at their CURRENT non-zero residuals (regression-guard, not closure)
- Cert 2102 (+20.4 PE / −0.79 CO2) becomes immediately visible as the next closure target with worksheet support
Why this is high-leverage: cohort-2 chain tests only pin SAP at 1e-4 (which catches cost-cascade drift but not PE/CO2 cascade drift). Cert 2102's +20.4 PE is invisible to any current test. Adding cohort-2 to golden creates regression guards across all three SAP/PE/CO2 cascades for 38 worksheet-backed certs.
Concrete implementation outline
# In test_golden_fixtures.py: add an entry per cohort-2 cert.
_GoldenExpectation(
    cert_number="2102-3018-0205-7886-5204",
    actual_sap=64,  # from doc['energy_rating_current']
    expected_sap_resid=+0,  # cohort-2 closure at 1e-4 → rounds to 0
    expected_pe_resid_kwh_per_m2=+20.3640,  # current residual, pin here
    expected_co2_resid_tonnes_per_yr=-0.7895,
    notes=(
        "Cohort-2 cert. SAP closed at 1e-4 via chain test. PE +20.4 / "
        "CO2 -0.79 residuals are the open closure target; worksheet "
        "exists (Summary + dr87) under `sap worksheets/`. Likely a "
        "specific cascade gap to probe with the worksheet."
    ),
),
Use the probe in this session's last diagnostic to capture exact residuals:
PYTHONPATH=/workspaces/model python -c "
import json, pathlib
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, cert_to_demand_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
# COHORT_2_LIST must be defined first: the 38 cohort-2 cert numbers.
for cert in COHORT_2_LIST:
    doc = json.loads(pathlib.Path(f'.../{cert}.json').read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    # Rating cascade (SAP) and demand cascade (PE/CO2) run on the same EPC.
    rating = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    demand = calculate_sap_from_inputs(cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    # ... print _GoldenExpectation tuple in the right format
"
The 38 fixture entries should land in one PR. After landing, cert 2102 becomes the obvious next closure target.
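
To make the "regression-guard, not closure" idea concrete, here is a minimal sketch of how a pinned non-zero residual detects drift. `PinnedResiduals` and `drifted_fields` are hypothetical stand-ins, not the real `_GoldenExpectation` machinery; only the two tolerance values and the cert 2102 residuals are taken from this handover.

```python
from __future__ import annotations

from dataclasses import dataclass

# Tolerances quoted in this handover.
_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01
_CO2_ABS_TOLERANCE_TONNES = 0.001


@dataclass(frozen=True)
class PinnedResiduals:
    """Hypothetical stand-in for the PE/CO2 part of _GoldenExpectation."""

    expected_pe_resid_kwh_per_m2: float
    expected_co2_resid_tonnes_per_yr: float


def drifted_fields(pin: PinnedResiduals, pe_resid: float, co2_resid: float) -> list[str]:
    """Return which residuals moved away from their pinned values."""
    drifted: list[str] = []
    if abs(pe_resid - pin.expected_pe_resid_kwh_per_m2) > _PE_ABS_TOLERANCE_KWH_PER_M2:
        drifted.append("pe")
    if abs(co2_resid - pin.expected_co2_resid_tonnes_per_yr) > _CO2_ABS_TOLERANCE_TONNES:
        drifted.append("co2")
    return drifted


# Cert 2102 pinned at its current residuals: unchanged output passes,
# any cascade change that moves PE by more than 0.01 is flagged.
pin_2102 = PinnedResiduals(20.3640, -0.7895)
print(drifted_fields(pin_2102, 20.3640, -0.7895))
print(drifted_fields(pin_2102, 20.4000, -0.7895))
```

The pin fails in both directions, so a fix that closes the residual also trips it; the expected value is then updated in the same slice as the fix.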
Open threads (after the cohort-2 add)
Tractable with worksheets we already have
- Cert 2102 +20.4 PE / −0.79 CO2: cohort-2 cert, worksheet exists under `sap worksheets/Additional data with api/` for the cohort-2 batch. Surfaced by cohort-2 → golden migration. Best next closure target.
- PV (233a)+(233b) monthly mystery: documented at `project_pv_233_split_mystery.md`. Cascade β = 0.7511 vs worksheet 0.7392 for cert 0380. Closes ~0.5 kWh/m² across the ASHP cohort. The 13 cohort-2 ASHP-pattern PE residuals at −3 kWh/m² likely share this root cause.
- `_api_glazing_transmission` strict-raise extension: the helper's existing comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them." Same pattern as S0380.51. Mechanical; low risk; coverage-hardening.
Open without worksheets (low payoff)
Golden fixtures with large residuals but no worksheets to triangulate:
| Cert | PE Δ | Heating | Notes |
|---|---|---|---|
| 6035 | +46.76 | Gas combi age A (mid-terrace) | RR with "limited insulation (assumed)" → cascade roof = 130.72 W/K, possibly wrong cascade routing — needs worksheet |
| 0390-2954 | −26.01 | Oil combi Firebird PCDF 9005 | Oil tariff cascade + fabric heat loss — needs worksheet |
| 0240 | +12.49 | Oil boiler + PV + RR (detached) | Subsystem heat-loss diff in notes (roof 76.93 W/K) — needs worksheet |
| 0300 | +8.28 | Gas combi large semi TFA 526 | Shower outlet schema work was recent — needs worksheet |
| 2130 | −8.22 (chain) / −8.22 (golden) | Gas combi + PV | "gas combi PE under-count + secondary heating credit" — needs worksheet |
| 7536 | −7.08 | Gas combi multi-age (D/L/F) | "multi-age geometry probably surfaces per-bp U the spec table doesn't capture" — needs worksheet |
| 0535 | (in golden) | — | open-front — needs worksheet |
| 8135 | −0.07 | Gas | already closed — keep as regression guard |
The user's observation that oil is under-represented is correct: there are 2 oil-boiler certs in golden, both at high residuals, both without worksheets. Solid fuel, LPG, and electric direct-acting are completely absent.
Heating-system distribution across golden fixtures
| Heating | Count | Worksheets | Status |
|---|---|---|---|
| Boiler + radiators, mains gas | 34 | Most (cohort-2 + 9501) | Mostly closed at 1e-4 SAP |
| Air source heat pump | 20 | All 8 ASHP cohort have worksheets | β-split phase complete; ~−3 PE structural residual open |
| Boiler + radiators, oil | 2 | None | Both at high residuals; closure blocked on worksheets |
| Community scheme | 1 | None | Retired |
| Solid fuel | 0 | — | Completely absent |
| LPG | 0 | — | Completely absent |
| Electric direct / storage heater | 0 | — | Completely absent |
How to grow fixture diversity (answer to "what to download")
For the gov.uk EPB downloads UI, you only get API JSON — that's enough for SAP-closure verification IF the cert's lodged SAP value can be trusted (it's the assessor's calculator output). But:
- The dr87-0001-NNNNNN.pdf worksheet, needed to debug structural cascade gaps line-by-line, is generated by the assessor's calculator (typically Elmhurst SAP tool) and bundled in their export ZIP. Not available via the gov.uk UI.
- The cohort-2 + ASHP worksheets in `sap worksheets/Additional data with api/` came from an Elmhurst data dump.
Recommended fixture targets to unlock open work:
- Oil worksheets: for certs 0240 and 0390-2954 in our golden set. These would close ~38 kWh/m² of PE residual immediately.
- A solid-fuel cert with worksheet: anthracite / wood pellets / biomass. Currently zero coverage. The fuel-cost cascade through Table 32 + heat-emitter cascade has paths we've never exercised.
- An LPG cert with worksheet: Table 32 code different from gas/oil; the cost cascade has an LPG-specific branch that has never run in tests.
- An electric direct-acting cert with worksheet: storage heater (codes 401-409) or panel heater (codes 191-196). The off-peak tariff path (`_RDSAP_DEFINITELY_OFF_PEAK = {1, 4, 5}` in `cert_to_inputs.py`) currently raises rather than computes; the first off-peak cert with worksheet would force that path.
- A community/district heating cert with worksheet: currently the retired 9390 is the only such cert and it has no worksheet.
When grabbing certs from the data dump, filter by main_heating[0].description to ensure fuel-type coverage:
- `Boiler and radiators, oil` (target: 5-10 worksheets)
- `Boiler and radiators, anthracite` / `wood pellets` / `wood logs`
- `Boiler and radiators, LPG`
- `Electric storage heaters` / `Direct-acting electric heaters`
- `Community scheme`
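
The filter above could be sketched as follows. It assumes each dump document is a dict with a `main_heating` list whose first entry carries a `description` string, and that the target strings are matched as case-insensitive prefixes; the exact description wording in the dump, the expansion of the solid-fuel entry into three strings, and the function name are all assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

# Target descriptions from the list above (exact dump wording is assumed).
TARGET_DESCRIPTIONS = (
    "Boiler and radiators, oil",
    "Boiler and radiators, anthracite",
    "Boiler and radiators, wood pellets",
    "Boiler and radiators, wood logs",
    "Boiler and radiators, LPG",
    "Electric storage heaters",
    "Direct-acting electric heaters",
    "Community scheme",
)


def group_by_heating(docs: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each target description to the certs whose main_heating[0] matches."""
    groups: dict[str, list[str]] = defaultdict(list)
    for cert, doc in docs.items():
        heating = doc.get("main_heating") or []
        if not heating:
            continue  # no main heating lodged; nothing to classify
        description = str(heating[0].get("description", "")).lower()
        for target in TARGET_DESCRIPTIONS:
            if description.startswith(target.lower()):
                groups[target].append(cert)
                break
    return dict(groups)
```

Counting the entries per key then shows at a glance which fuel types still have zero worksheets in a candidate batch.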
★★ Elmhurst-only path (calculator gap closure WITHOUT API JSON)
User insight from end of session: the mapper is a thin pass-through; when residuals remain after closing mapper gaps (cohort-2 → golden), the gap is in the calculator cascade, not the mapper. For calculator gaps, the API JSON is not load-bearing — only the Elmhurst Summary PDF (input) and the worksheet PDF (ground-truth line refs) are needed.
This is a different fixture shape from the cohort-2 + ASHP path. It
mirrors the 6 original Elmhurst U985 fixtures (000474, 000477,
000480, 000487, 000490, 000516): the historical worksheet-pinned test
vectors at `domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py` +
`test_e2e_elmhurst_sap_score.py`. No API JSON in the loop.
Concrete next-target: the extended test case at sap worksheets/extended test case/
sap worksheets/extended test case/
  Summary_000565.pdf      ← input lodgement (Elmhurst RdSAP10 PDF)
  U985-0001-000565.pdf    ← worksheet output (line refs ground-truth)
Cert 000565 is a wacky stress-test cert (user-supplied) that exercises many cascade paths absent from the cohort-2 + ASHP corpus:
- 5 building parts: Main + 4 extensions (vs cohort max 2 extensions)
- Age mix: Main A (pre-1900), Ext1 E (1967-75), Ext2 H (1991-95), Ext3 I (1996-2002), Ext4 J (2003-06) — spans 100+ years of construction
- Room-in-roof on every part at different ages (H, I, J, I, M)
- Conservatory thermally separated WITH fixed heaters (zero coverage elsewhere)
- Wall variety:
  - Main: Solid Brick + 75mm External insulation + Alt Wall Stone granite (23 m² with 120mm As Built + dry-lining)
  - Ext1: Stone granite, U Unknown, Cavity filled party wall
  - Ext2: Curtain Wall Post 2023 (zero coverage)
  - …
- Party walls: CU Cavity unfilled (Main), CF Cavity filled (Ext1), U Unable to determine (Ext2)
- Multi-storey extensions with floor 0/1 having varying room heights (1.0 m to 4.0 m) and party_wall_length (0 to 23 m)
Every uncommon cascade path the cohort-2 + ASHP fixtures don't exercise will light up against this cert.
Implementation outline (mirror the existing pattern)
- Hand-build a `_elmhurst_worksheet_000565.py` module under `domain/sap10_calculator/worksheet/tests/`. Pattern is exactly the shape of `_elmhurst_worksheet_000474.py`:
  - `build_epc() -> EpcPropertyData`: hand-construct the EpcPropertyData from the Summary_000565.pdf §1-19 lodgings. Use the existing `make_minimal_sap10_epc`, `SapBuildingPart`, `SapFloorDimension` etc. constructors.
  - Module-level `LINE_NN_FOO: type = value` constants for every U985 line ref the test pins. Extract values from U985-0001-000565.pdf.
- Register the fixture in `test_e2e_elmhurst_sap_score.py`:
  - Add `from . import _elmhurst_worksheet_000565 as _w000565` import.
  - Add `"000565": FixtureCascadePins(sap_score=..., sap_score_continuous=..., ...)` entry to `_FIXTURE_PINS`.
  - Add `"000565": _w000565` entry to `_FIXTURE_MODULES`.
  - The parametrized `test_sap_result_pin[000565-FIELD]` test cases fire automatically.
- Per feedback-e2e-validation-philosophy + feedback-zero-error-strict:
  - Tolerances are `abs=1e-4` on every field. No widening, no xfail.
  - Failing pins are named calculator bugs to fix, not tolerances to relax. Each failing pin is its own slice.
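
The pin-per-field idea in the outline above can be sketched without the real test harness. `ExamplePins` is a hypothetical stand-in for `FixtureCascadePins` (only `sap_score` and `sap_score_continuous` are named in this handover; the real class has more fields), and the numbers are illustrative, not worksheet values.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

PIN_ABS_TOLERANCE = 1e-4  # the across-the-board tolerance


@dataclass(frozen=True)
class ExamplePins:
    """Hypothetical stand-in for FixtureCascadePins (two fields only)."""

    sap_score: float
    sap_score_continuous: float


def failing_pins(result: object, pins: ExamplePins, tol: float = PIN_ABS_TOLERANCE) -> list[str]:
    """Return names of pinned fields where |actual - expected| exceeds tol."""
    failures: list[str] = []
    for field in fields(pins):
        expected = getattr(pins, field.name)
        actual = getattr(result, field.name)
        if abs(actual - expected) > tol:
            failures.append(field.name)
    return failures


# Illustrative numbers only. A 4e-5 difference passes; 1e-3 does not.
pins = ExamplePins(sap_score=50.0, sap_score_continuous=50.25)
print(failing_pins(ExamplePins(50.0, 50.25004), pins))
print(failing_pins(ExamplePins(50.0, 50.251), pins))
```

Each name returned corresponds to one parametrized test case in the real suite, which is why every failing pin can be treated as its own slice.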
Why this path is more powerful than API-route closure for calculator gaps
| API-route closure | Elmhurst-only path |
|---|---|
| Cert needs both API JSON AND worksheet | Cert needs only Summary + worksheet PDFs |
| Tests run via `from_api_response → cert_to_inputs → calculator`; failure could be mapper OR calculator | Tests run via `build_epc() → cert_to_inputs → calculator`; failure is definitionally a calculator bug |
| Cohort acquisition: gov.uk EPB JSON + assessor's worksheet ZIP | Cohort acquisition: assessor's tool export only (Elmhurst SAP) |
| Cross-mapper parity is a 2nd-order check on top of cascade correctness | Direct cascade correctness check |
For diverse fuel-type / property-type calculator coverage, the user can generate test certs in Elmhurst SAP without needing to lodge them at gov.uk first. Targets to generate for closure on currently-zero-coverage paths:
| Fuel / config | Why critical | Cascade paths exercised |
|---|---|---|
| Oil boiler PCDB-listed (Firebird etc.) | Closes cert 0240 + 0390 oil residuals; no current oil worksheet | Table 105 oil + oil-tariff fuel cost + oil CO2/PE factors |
| Solid fuel (anthracite, wood pellets, biomass) | Zero coverage | Table 32 solid-fuel branch + solid-fuel CO2/PE factors |
| LPG | Zero coverage | Table 32 LPG branch + LPG-specific tariff lookup |
| Electric direct-acting / storage heaters | Zero coverage; off-peak meter path raises in cert_to_inputs | _RDSAP_DEFINITELY_OFF_PEAK dispatch (codes 1/4/5) + Table 12a high/low-rate split |
| Multi-main-heating (main 1 + main 2) | Currently un-exercised; `main_2_fuel_kwh_per_yr` cascade is dormant | Per-main efficiency + per-main fuel cost + summed PE |
| Basement | Minimal coverage | u_basement_wall + u_basement_floor Table 23 dispatch |
| Conservatory with fixed heaters | Zero coverage | Conservatory exclusion / inclusion rule + heated-conservatory fuel routing |
The wacky 000565 cert exercises 3-4 of these in one shot (multi-extension + multi-age + conservatory + curtain wall). After it lands, the user can generate single-feature certs (one oil cert, one LPG cert, etc.) to isolate single-cause calculator gaps.
Strict-raise pattern (S0380.51) — extension queue
The UnmappedApiCode strict-raise pattern is established in
datatypes/epc/domain/mapper.py. Currently five helpers raise:
- `_api_party_wall_construction_int`
- `_api_floor_construction_str`
- `_api_floor_type_str`
- `_api_roof_construction_str`
- `_api_sheltered_sides`
Pending extensions (mechanical; each its own slice):
- `_api_glazing_transmission`: comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them"
- `_api_cascade_glazing_type`: uses pass-through fallback `dict.get(code, code)`, which is intentional but worth auditing to surface deliberate decisions
The forcing function test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise will catch any unmapped enum across the whole golden corpus at extraction time. Each new fixture added increases the gate's coverage automatically.
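
For readers new to the pattern, a minimal sketch of strict-raise versus pass-through lookup follows. The `UnmappedApiCode` class here is a local stand-in (its real base class and constructor are assumptions), and `_EXAMPLE_CODES` is a made-up table, not one of the mapper's real enums.

```python
from __future__ import annotations


class UnmappedApiCode(ValueError):
    """Local stand-in for the project's exception (real signature assumed)."""


# Made-up code table for illustration; real tables live in mapper.py.
_EXAMPLE_CODES: dict[int, str] = {1: "example_a", 2: "example_b"}


def strict_lookup(code: int) -> str:
    """Strict-raise: an unknown code stops extraction loudly."""
    try:
        return _EXAMPLE_CODES[code]
    except KeyError:
        raise UnmappedApiCode(f"unmapped API code {code!r}") from None


def passthrough_lookup(code: int) -> int | str:
    """Pass-through: an unknown code is returned unchanged and may go unnoticed."""
    return _EXAMPLE_CODES.get(code, code)
```

With the strict variant, a fixture carrying an unmapped code fails at extraction time, which is the behaviour the golden-corpus gate relies on; the pass-through variant lets the raw integer flow downstream.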
Test baseline at HEAD
PYTHONPATH=/workspaces/model python -m pytest \
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
backend/documents_parser/tests/test_elmhurst_extractor.py \
backend/documents_parser/tests/test_elmhurst_end_to_end.py \
domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
domain/sap10_calculator/worksheet/tests/test_water_heating.py \
domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
domain/sap10_ml/tests/test_rdsap_uvalues.py \
datatypes/epc/schema/tests/test_schema_loading.py \
domain/sap10_calculator/worksheet/tests/test_photovoltaic.py \
--no-cov -q
Expected: 769 pass + 0 fail.
Conventions preserved
- 1e-4 across the board (feedback-one-e-minus-4-across-the-board)
- Worksheet, not API, is the target (feedback-worksheet-not-api-reference)
- Verify worksheet PDF before accepting handover claims (feedback-verify-handover-claims)
- Spec-floor skepticism (feedback-spec-floor-skepticism)
- Golden residuals → ~0 (feedback-golden-residuals-near-zero)
- AAA test convention (feedback-aaa-test-convention)
- `abs(diff) <= tol` not `pytest.approx` (feedback-abs-diff-over-pytest-approx)
- Spec citation in commit messages (feedback-spec-citation-in-commits)
- One slice = one commit; stage by name (feedback-commit-per-slice)
- Pyright net-zero per touched file (feedback-zero-error-strict)
- Cross-mapper parity via cascade (feedback-cross-mapper-parity-via-cascade)
- Bigger slices OK for uniform-cohort work (feedback-bigger-slices-for-uniform-work)