Model/domain/sap10_calculator/docs/HANDOVER_GOLDEN_COVERAGE.md
Khalim Conn-Kowlessar fc84c6d49a docs: extend handover with Elmhurst-only path + 000565 extended test case
User clarified end-of-session: mapper is a thin enum-and-shape
translation; when residuals remain after closing mapper coverage
gaps, the gap is in the **calculator cascade**. This unlocks an
Elmhurst-only fixture path that doesn't need API JSON at all.

The fixture shape mirrors the 6 historical Elmhurst U985 fixtures
(000474, 000477, 000480, 000487, 000490, 000516) at
`domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py`
+ `test_e2e_elmhurst_sap_score.py`:

  build_epc() → cert_to_inputs → calculate_sap_from_inputs
  ↳ every SapResult field pinned at abs=1e-4 against U985 line refs

Any failing pin is definitionally a calculator bug. The user generates
certs in Elmhurst SAP and exports Summary + worksheet ZIPs — no gov.uk
EPB lodgement required.

Extended test case (000565) ready at `sap worksheets/extended test case/`:
- Summary_000565.pdf (input)
- U985-0001-000565.pdf (worksheet ground-truth)

Cert 000565 is a wacky stress-test that exercises 3-4 zero-coverage
cascade paths in one cert: Main + 4 extensions, age mix A through J,
RR on every part with mixed ages, conservatory with fixed heaters,
curtain-wall Ext2 post-2023, mixed wall types (solid brick + stone +
curtain wall), mixed party walls (CU + CF + Unable to determine).

After this cert lands, the user has agreed to generate single-feature
certs (oil only, LPG only, solid fuel only, electric direct only,
multi-main-heating, basement) to surface single-cause calculator gaps.

Handover doc now has implementation outline (mirror
_elmhurst_worksheet_000474.py shape) and a coverage-paths table
showing which targets each fuel-type/config exercises.
2026-05-28 21:03:23 +00:00


Handover — golden coverage + next slice

Branch feature/per-cert-mapper-validation. HEAD: b7fbbcca (Slice S0380.51 strict-raise UnmappedApiCode on API integer enums). Test baseline: 769 pass + 0 fail. Pyright net-zero on every touched file.

Recent session slices (S0380.47 → S0380.51)

| Slice | Commit | What |
| --- | --- | --- |
| S0380.47 | 42ed38f7 | β-split wired into cost cascade per Appendix M1 §6; zero cohort impact because Table 32 collapses code 30 = code 60 = 13.19 p/kWh |
| S0380.48 | bf99b1c7 | Schema gap closure: real-API pv_batteries[] lodges battery_capacity flat-shape ([{"battery_capacity": 5}]), schema expected nested {"pv_battery": {"battery_capacity": 5}} → 5-kWh batteries silently dropped → β too low. Cohort PE +2.7..+8.1 → 3.5..4.5 |
| S0380.49 | e75198ce | Effective-monthly Table 12e PE factors for the PV split per Appendix M1 §8. Cohort PE 3.5..4.5 → 2.8..3.7 |
| S0380.50 | 3d1e6f10 | §4 seasonal monthly HW fuel for PV β cascade: replaced days-prorated hot-water demand with §4 (62)m seasonal output scaled to annual fuel. Cohort PE 2.8..3.7 → 2.7..3.5 |
| S0380.51 | b7fbbcca | Strict-raise UnmappedApiCode on five API mapper helpers (floor_construction, floor_heat_loss, roof_construction, party_wall_construction, built_form). Surfaced two coverage gaps immediately (floor_heat_loss codes 2/3/6) and added explicit mappings. 6 new tests as the forcing function. |
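
The S0380.48 gap can be illustrated with a small normaliser that accepts both lodged shapes. This is a minimal sketch, not the schema code: `battery_capacities_kwh` is a hypothetical helper name, and only the two JSON shapes come from the slice notes.

```python
from typing import Any


def battery_capacities_kwh(pv_batteries: list[dict[str, Any]]) -> list[float]:
    """Return battery capacities from either lodged shape (hypothetical helper).

    Accepts the flat real-API shape {"battery_capacity": 5} and the nested
    shape {"pv_battery": {"battery_capacity": 5}}.
    """
    capacities: list[float] = []
    for entry in pv_batteries:
        # Use the nested wrapper when present, otherwise treat the entry as flat.
        inner = entry.get("pv_battery", entry)
        if "battery_capacity" not in inner:
            # Fail loudly instead of silently dropping the battery.
            raise ValueError(f"unrecognised pv_batteries entry: {entry!r}")
        capacities.append(float(inner["battery_capacity"]))
    return capacities


flat = [{"battery_capacity": 5}]
nested = [{"pv_battery": {"battery_capacity": 5}}]
```

Raising on an unrecognised entry is a design assumption here; it follows the same strict spirit as the S0380.51 slice, where a silent drop is treated as worse than a loud failure.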

Test-coverage matrix (current state)

| Test file | Certs | What's pinned |
| --- | --- | --- |
| test_summary_pdf_mapper_chain.py | 38 cohort-2 + 8 ASHP + per-cert chain tests | SAP at 1e-4 vs worksheet |
| test_golden_fixtures.py | 15 certs | SAP int + PE + CO2 residuals vs API-lodged |
| test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise | All JSON in fixtures/golden/ | No UnmappedApiCode raised at extraction |

Cohort overlap

  • Golden ∩ Cohort-2 = 0/38 — cohort-2 certs are NOT in golden fixtures
  • Golden ∩ ASHP = 7/8 — cert 9501 lives in chain tests only
  • Golden open-front = 8 certs (oil + gas + RR) — no worksheets, API-only

Cohort-2 SAP closure (chain tests)

All 38 at max |Δ| = 5e-5 vs worksheet — closed.

Cohort-2 PE / CO2 (probed but NOT pinned anywhere)

  • 24/38 closed (|PE| < 1, |CO2| < 0.05)
  • 14/38 open. Top offender: cert 2102 at +20.4 PE, -0.79 CO2, completely undetected by any current test
  • Other 13 cluster around 3 PE (same PV (233a/b) mystery pattern as the ASHP golden certs)
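
The closed/open split above can be expressed as a small predicate. This is a sketch under the thresholds quoted in the list; the function and constant names are hypothetical and do not exist in the repo.

```python
PE_CLOSED_ABS_KWH_PER_M2 = 1.0   # |PE| bound quoted above
CO2_CLOSED_ABS_TONNES = 0.05     # |CO2| bound quoted above


def is_closed(pe_resid: float, co2_resid: float) -> bool:
    """A cert counts as closed only when both residuals sit inside their bounds."""
    return (
        abs(pe_resid) < PE_CLOSED_ABS_KWH_PER_M2
        and abs(co2_resid) < CO2_CLOSED_ABS_TONNES
    )
```

With cert 2102's residuals from the implementation outline (+20.3640 PE, -0.7895 CO2), `is_closed` returns `False` on both counts.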

★ Next slice — add cohort-2 to test_golden_fixtures.py

This is the agreed-upon next slice (one-slice change, high-value):

  1. Run cohort-2 against cert_to_demand_inputs and capture current PE/CO2 residuals
  2. Add _GoldenExpectation entries to test_golden_fixtures.py for all 38 certs
  3. The pin tolerance stays at the existing _PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01 / _CO2_ABS_TOLERANCE_TONNES = 0.001
  4. The 14 "open" certs get pinned at their CURRENT non-zero residuals (regression-guard, not closure)
  5. Cert 2102 (+20.4 PE / -0.79 CO2) becomes immediately visible as the next closure target with worksheet support

Why this is high-leverage: cohort-2 chain tests only pin SAP at 1e-4 (which catches cost-cascade drift but not PE/CO2 cascade drift). Cert 2102's +20.4 PE is invisible to any current test. Adding cohort-2 to golden creates regression guards across all three SAP/PE/CO2 cascades for 38 worksheet-backed certs.
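
To make the regression-guard idea concrete, here is a minimal sketch of how a pin at a non-zero residual behaves. `Pin` and `pin_holds` are hypothetical stand-ins for `_GoldenExpectation` and the real assertion; only the two tolerance values and cert 2102's residuals come from this document.

```python
import math
from dataclasses import dataclass

_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01   # value quoted in step 3
_CO2_ABS_TOLERANCE_TONNES = 0.001     # value quoted in step 3


@dataclass(frozen=True)
class Pin:  # hypothetical stand-in for _GoldenExpectation
    expected_pe_resid_kwh_per_m2: float
    expected_co2_resid_tonnes_per_yr: float


def pin_holds(pin: Pin, pe_resid: float, co2_resid: float) -> bool:
    """True when the current residuals still match the pinned residuals."""
    pe_ok = math.isclose(
        pe_resid, pin.expected_pe_resid_kwh_per_m2,
        rel_tol=0.0, abs_tol=_PE_ABS_TOLERANCE_KWH_PER_M2,
    )
    co2_ok = math.isclose(
        co2_resid, pin.expected_co2_resid_tonnes_per_yr,
        rel_tol=0.0, abs_tol=_CO2_ABS_TOLERANCE_TONNES,
    )
    return pe_ok and co2_ok


cert_2102 = Pin(20.3640, -0.7895)
```

Note that the pin trips in both directions: a fix that moves cert 2102's PE residual toward zero also fails the check, so the expectation has to be updated in the same slice as the fix.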

Concrete implementation outline

# In test_golden_fixtures.py — add an entry per cohort-2 cert:
_GoldenExpectation(
    cert_number="2102-3018-0205-7886-5204",
    actual_sap=64,                              # from doc['energy_rating_current']
    expected_sap_resid=+0,                      # cohort-2 closure at 1e-4 → rounds to 0
    expected_pe_resid_kwh_per_m2=+20.3640,      # current residual, pin here
    expected_co2_resid_tonnes_per_yr=-0.7895,
    notes=(
        "Cohort-2 cert. SAP closed at 1e-4 via chain test. PE +20.4 / "
        "CO2 -0.79 residuals are the open closure target — worksheet "
        "exists (Summary + dr87) under `sap worksheets/`. Likely a "
        "specific cascade gap to probe with the worksheet."
    ),
),

Use the probe in this session's last diagnostic to capture exact residuals:

PYTHONPATH=/workspaces/model python -c "
import json, pathlib
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, cert_to_demand_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs

for cert in COHORT_2_LIST:
    doc = json.loads(pathlib.Path(f'.../{cert}.json').read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    rating = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    demand = calculate_sap_from_inputs(cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    # ... print _GoldenExpectation tuple in the right format
"

The 38 fixture entries should land in one PR. After landing, cert 2102 becomes the obvious next closure target.
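
The final step of the probe (printing each entry in pasteable form) could look like the sketch below. `format_expectation` is a hypothetical helper; the field names are taken from the entry shown in the implementation outline, and the `notes` field is left to be written by hand.

```python
def format_expectation(
    cert_number: str,
    actual_sap: int,
    sap_resid: int,
    pe_resid: float,
    co2_resid: float,
) -> str:
    """Render one _GoldenExpectation(...) entry as pasteable source text."""
    return (
        "_GoldenExpectation(\n"
        f'    cert_number="{cert_number}",\n'
        f"    actual_sap={actual_sap},\n"
        f"    expected_sap_resid={sap_resid:+d},\n"
        f"    expected_pe_resid_kwh_per_m2={pe_resid:+.4f},\n"
        f"    expected_co2_resid_tonnes_per_yr={co2_resid:+.4f},\n"
        "),"
    )


print(format_expectation("2102-3018-0205-7886-5204", 64, 0, 20.3640, -0.7895))
```

Formatting residuals with an explicit sign and four decimals keeps the generated entries visually consistent with the hand-written example.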

Open threads (after the cohort-2 add)

Tractable with worksheets we already have

  1. Cert 2102 (+20.4 PE / -0.79 CO2): cohort-2 cert; the worksheet exists under sap worksheets/Additional data with api/ for the cohort-2 batch. Surfaced by the cohort-2 → golden migration. Best next closure target.

  2. PV (233a)+(233b) monthly mystery: documented at project_pv_233_split_mystery.md. Cascade β = 0.7511 vs worksheet 0.7392 for cert 0380. Closes ~0.5 kWh/m² across the ASHP cohort. The other 13 open cohort-2 certs, whose PE residuals cluster around 3 kWh/m², likely share this root cause.

  3. _api_glazing_transmission strict-raise extension — the helper's existing comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them." Same pattern as S0380.51. Mechanical; low risk; coverage-hardening.

Open without worksheets (low payoff)

Golden fixtures with large residuals but no worksheets to triangulate:

| Cert | PE Δ | Heating | Notes |
| --- | --- | --- | --- |
| 6035 | +46.76 | Gas combi age A (mid-terrace) | RR with "limited insulation (assumed)" → cascade roof = 130.72 W/K, possibly wrong cascade routing; needs worksheet |
| 0390-2954 | 26.01 | Oil combi Firebird PCDF 9005 | Oil tariff cascade + fabric heat loss; needs worksheet |
| 0240 | +12.49 | Oil boiler + PV + RR (detached) | Subsystem heat-loss diff in notes (roof 76.93 W/K); needs worksheet |
| 0300 | +8.28 | Gas combi large semi TFA 526 | Shower outlet schema work was recent; needs worksheet |
| 2130 | 8.22 (chain) / 8.22 (golden) | Gas combi + PV | "gas combi PE under-count + secondary heating credit"; needs worksheet |
| 7536 | 7.08 | Gas combi multi-age (D/L/F) | "multi-age geometry probably surfaces per-bp U the spec table doesn't capture"; needs worksheet |
| 0535 | (in golden) | | open-front; needs worksheet |
| 8135 | 0.07 | Gas | already closed; keep as regression guard |

The user's observation that oil is under-represented is correct: there are 2 oil-boiler certs in golden, both at high residuals, both without worksheets. Solid fuel, LPG, and electric direct-acting are completely absent.

Heating-system distribution across golden fixtures

| Heating | Count | Worksheets | Status |
| --- | --- | --- | --- |
| Boiler + radiators, mains gas | 34 | Most (cohort-2 + 9501) | Mostly closed at 1e-4 SAP |
| Air source heat pump | 20 | All 8 ASHP cohort have worksheets | β-split phase complete; ~3 PE structural residual open |
| Boiler + radiators, oil | 2 | None | Both at high residuals; closure blocked on worksheets |
| Community scheme | 1 | None | Retired |
| Solid fuel | 0 | | Completely absent |
| LPG | 0 | | Completely absent |
| Electric direct / storage heater | 0 | | Completely absent |

How to grow fixture diversity (answer to "what to download")

For the gov.uk EPB downloads UI, you only get API JSON — that's enough for SAP-closure verification IF the cert's lodged SAP value can be trusted (it's the assessor's calculator output). But:

  • The dr87-0001-NNNNNN.pdf worksheet — needed to debug structural cascade gaps line-by-line — is generated by the assessor's calculator (typically Elmhurst SAP tool) and bundled in their export ZIP. Not available via the gov.uk UI.

  • The cohort-2 + ASHP worksheets in sap worksheets/Additional data with api/ came from an Elmhurst data dump.

Recommended fixture targets to unlock open work:

  1. Oil worksheets for certs 0240 and 0390-2954 in our golden set. These would close ~38 PE kWh/m² of residual immediately (12.49 + 26.01 from the residuals table above).
  2. A solid-fuel cert with worksheet — anthracite / wood pellets / biomass. Currently zero coverage. The fuel-cost cascade through Table 32 + heat-emitter cascade has paths we've never exercised.
  3. An LPG cert with worksheet — Table 32 code different from gas/oil; the cost cascade has an LPG-specific branch that has never run in tests.
  4. An electric direct-acting cert with worksheet — storage heater (codes 401-409) or panel heater (codes 191-196). The off-peak tariff path (_RDSAP_DEFINITELY_OFF_PEAK = {1, 4, 5} in cert_to_inputs.py) currently raises rather than computes — first off-peak cert with worksheet would force that path.
  5. A community/district heating cert with worksheet — currently the retired 9390 is the only such cert and it has no worksheet.

When grabbing certs from the data dump, filter by main_heating[0].description to ensure fuel-type coverage:

  • Boiler and radiators, oil (target: 5-10 worksheets)
  • Boiler and radiators, anthracite / wood pellets / wood logs
  • Boiler and radiators, LPG
  • Electric storage heaters / Direct-acting electric heaters
  • Community scheme
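
A filter over the dump could be sketched as follows. This assumes each document is a parsed JSON dict with a `main_heating` list whose first item carries a `description` string; the exact lodged wording of the descriptions is an assumption, and both function names are hypothetical.

```python
from typing import Any

# Assumed description strings, lower-cased; the real lodged wording may differ.
TARGET_DESCRIPTIONS = frozenset({
    "boiler and radiators, oil",
    "boiler and radiators, anthracite",
    "boiler and radiators, wood pellets",
    "boiler and radiators, wood logs",
    "boiler and radiators, lpg",
    "electric storage heaters",
    "direct-acting electric heaters",
    "community scheme",
})


def main_heating_description(doc: dict[str, Any]) -> str:
    """Return the normalised main_heating[0].description, or '' when absent."""
    heating = doc.get("main_heating") or []
    if not heating:
        return ""
    return str(heating[0].get("description", "")).strip().lower()


def is_target(doc: dict[str, Any]) -> bool:
    """True when the cert's main heating is one of the under-covered types."""
    # Exact match on purpose: a substring test for "oil" would also match "boiler".
    return main_heating_description(doc) in TARGET_DESCRIPTIONS
```

Exact matching is deliberately conservative; if the dump uses free-text variants, the target set would need extending rather than loosening to substring checks.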

★★ Elmhurst-only path (calculator gap closure WITHOUT API JSON)

User insight from end of session: the mapper is a thin pass-through; when residuals remain after closing mapper gaps (cohort-2 → golden), the gap is in the calculator cascade, not the mapper. For calculator gaps, the API JSON is not load-bearing — only the Elmhurst Summary PDF (input) and the worksheet PDF (ground-truth line refs) are needed.

This is a different fixture shape from the cohort-2 + ASHP path. It mirrors the 6 original Elmhurst U985 fixtures (000474, 000477, 000480, 000487, 000490, 000516), the historical worksheet-pinned test vectors at domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py + test_e2e_elmhurst_sap_score.py. No API JSON in the loop.

Concrete next-target: the extended test case at sap worksheets/extended test case/

sap worksheets/extended test case/
  Summary_000565.pdf              ← input lodgement (Elmhurst RdSAP10 PDF)
  U985-0001-000565.pdf            ← worksheet output (line refs ground-truth)

Cert 000565 is a wacky stress-test cert (user-supplied) that exercises many cascade paths absent from the cohort-2 + ASHP corpus:

  • 5 building parts: Main + 4 extensions (vs cohort max 2 extensions)
  • Age mix: Main A (pre-1900), Ext1 E (1967-75), Ext2 H (1991-95), Ext3 I (1996-2002), Ext4 J (2003-06) — spans 100+ years of construction
  • Room-in-roof on every part at different ages (H, I, J, I, M)
  • Conservatory thermally separated WITH fixed heaters (zero coverage elsewhere)
  • Wall variety:
    • Main: Solid Brick + 75mm External insulation + Alt Wall Stone granite (23 m² with 120mm As Built + dry-lining)
    • Ext1: Stone granite, U Unknown, Cavity filled party wall
    • Ext2: Curtain Wall Post 2023 (zero coverage)
  • Party walls: CU Cavity unfilled (Main), CF Cavity filled (Ext1), U Unable to determine (Ext2)
  • Multi-storey extensions with floor 0/1 having varying room heights (1.0 m to 4.0 m) and party_wall_length (0 to 23 m)

Every uncommon cascade path the cohort-2 + ASHP fixtures don't exercise will light up against this cert.

Implementation outline (mirror the existing pattern)

  1. Hand-build a _elmhurst_worksheet_000565.py module under domain/sap10_calculator/worksheet/tests/. Pattern is exactly the shape of _elmhurst_worksheet_000474.py:

    • build_epc() -> EpcPropertyData — hand-construct the EpcPropertyData from the Summary_000565.pdf §1-19 lodgings. Use the existing make_minimal_sap10_epc, SapBuildingPart, SapFloorDimension etc. constructors.
    • Module-level LINE_NN_FOO: type = value constants for every U985 line ref the test pins. Extract values from U985-0001-000565.pdf.
  2. Register the fixture in test_e2e_elmhurst_sap_score.py:

    • Add from . import _elmhurst_worksheet_000565 as _w000565 import.
    • Add "000565": FixtureCascadePins(sap_score=..., sap_score_continuous=..., ...) entry to _FIXTURE_PINS.
    • Add "000565": _w000565 entry to _FIXTURE_MODULES.
    • The parametrized test_sap_result_pin[000565-FIELD] test cases fire automatically.
  3. Per feedback-e2e-validation-philosophy + feedback-zero-error-strict:

    • Tolerances are abs=1e-4 on every field. No widening, no xfail.
    • Failing pins are named calculator bugs to fix, not tolerances to relax. Each failing pin is its own slice.
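
The per-field pinning in steps 2 and 3 can be pictured with the sketch below. It is not the code in test_e2e_elmhurst_sap_score.py: `CascadePins` is a hypothetical two-field stand-in for `FixtureCascadePins`, `failing_pins` is an invented helper, and the numeric values are made up for illustration rather than read from U985-0001-000565.pdf.

```python
import math
from dataclasses import dataclass, fields

PIN_ABS_TOLERANCE = 1e-4  # tolerance stated above; never widened


@dataclass(frozen=True)
class CascadePins:  # hypothetical, far smaller than FixtureCascadePins
    sap_score: float
    sap_score_continuous: float


def failing_pins(pins: CascadePins, result: object) -> list[str]:
    """Return the pinned field names whose computed value is off by more than 1e-4."""
    failures: list[str] = []
    for field in fields(pins):
        expected = getattr(pins, field.name)
        actual = getattr(result, field.name)
        if not math.isclose(actual, expected, rel_tol=0.0, abs_tol=PIN_ABS_TOLERANCE):
            failures.append(field.name)
    return failures


# Illustrative values only, not worksheet figures.
pins = CascadePins(sap_score=64.0, sap_score_continuous=63.5)
drifted = CascadePins(sap_score=64.0, sap_score_continuous=63.5004)
print(failing_pins(pins, drifted))
```

Each name returned by a check like this corresponds to one failing parametrized case, which matches the rule that every failing pin becomes its own slice.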

Why this path is more powerful than API-route closure for calculator gaps

| API-route closure | Elmhurst-only path |
| --- | --- |
| Cert needs both API JSON AND worksheet | Cert needs only Summary + worksheet PDFs |
| Tests run via from_api_response → cert_to_inputs → calculator; failure could be mapper OR calculator | Tests run via build_epc() → cert_to_inputs → calculator; failure is definitionally a calculator bug |
| Cohort acquisition: gov.uk EPB JSON + assessor's worksheet ZIP | Cohort acquisition: assessor's tool export only (Elmhurst SAP) |
| Cross-mapper parity is a 2nd-order check on top of cascade correctness | Direct cascade correctness check |

For diverse fuel-type / property-type calculator coverage, the user can generate test certs in Elmhurst SAP without needing to lodge them at gov.uk first. Targets to generate for closure on currently-zero-coverage paths:

| Fuel / config | Why critical | Cascade paths exercised |
| --- | --- | --- |
| Oil boiler PCDB-listed (Firebird etc.) | Closes cert 0240 + 0390 oil residuals; no current oil worksheet | Table 105 oil + oil-tariff fuel cost + oil CO2/PE factors |
| Solid fuel (anthracite, wood pellets, biomass) | Zero coverage | Table 32 solid-fuel branch + solid-fuel CO2/PE factors |
| LPG | Zero coverage | Table 32 LPG branch + LPG-specific tariff lookup |
| Electric direct-acting / storage heaters | Zero coverage; off-peak meter path raises in cert_to_inputs | _RDSAP_DEFINITELY_OFF_PEAK dispatch (codes 1/4/5) + Table 12a high/low-rate split |
| Multi-main-heating (main 1 + main 2) | Currently un-exercised; main_2_fuel_kwh_per_yr cascade is dormant | Per-main efficiency + per-main fuel cost + summed PE |
| Basement | Minimal coverage | u_basement_wall + u_basement_floor Table 23 dispatch |
| Conservatory with fixed heaters | Zero coverage | Conservatory exclusion / inclusion rule + heated-conservatory fuel routing |

The wacky 000565 cert exercises 3-4 of these in one shot (multi-extension + multi-age + conservatory + curtain wall). After it lands, the user can generate single-feature certs (one oil cert, one LPG cert, etc.) to isolate single-cause calculator gaps.

Strict-raise pattern (S0380.51) — extension queue

The UnmappedApiCode strict-raise pattern is established in datatypes/epc/domain/mapper.py. Currently five helpers raise:

  • _api_party_wall_construction_int
  • _api_floor_construction_str
  • _api_floor_type_str
  • _api_roof_construction_str
  • _api_sheltered_sides

Pending extensions (mechanical; each its own slice):

  • _api_glazing_transmission — comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them"
  • _api_cascade_glazing_type — uses pass-through fallback dict.get(code, code) which is intentional but worth auditing to surface deliberate decisions

The forcing function test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise will catch any unmapped enum across the whole golden corpus at extraction time. Each new fixture added increases the gate's coverage automatically.

Test baseline at HEAD

PYTHONPATH=/workspaces/model python -m pytest \
    backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
    backend/documents_parser/tests/test_elmhurst_extractor.py \
    backend/documents_parser/tests/test_elmhurst_end_to_end.py \
    domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
    domain/sap10_calculator/worksheet/tests/test_water_heating.py \
    domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
    domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
    domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
    domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
    domain/sap10_ml/tests/test_rdsap_uvalues.py \
    datatypes/epc/schema/tests/test_schema_loading.py \
    domain/sap10_calculator/worksheet/tests/test_photovoltaic.py \
    --no-cov -q

Expected: 769 pass + 0 fail.

Conventions preserved