mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Khalim Conn-Kowlessar fc84c6d49a docs: extend handover with Elmhurst-only path + 000565 extended test case

User clarified end-of-session: mapper is a thin enum-and-shape
translation; when residuals remain after closing mapper coverage
gaps, the gap is in the **calculator cascade**. This unlocks an
Elmhurst-only fixture path that doesn't need API JSON at all.

The fixture shape mirrors the 6 historical Elmhurst U985 fixtures
(000474, 000477, 000480, 000487, 000490, 000516) at
`domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py`
+ `test_e2e_elmhurst_sap_score.py`:

  build_epc() → cert_to_inputs → calculate_sap_from_inputs
  ↳ every SapResult field pinned at abs=1e-4 against U985 line refs

Any failing pin is definitionally a calculator bug. The user generates
certs in Elmhurst SAP and exports Summary + worksheet ZIPs — no gov.uk
EPB lodgement required.

Extended test case (000565) ready at `sap worksheets/extended test case/`:
- Summary_000565.pdf (input)
- U985-0001-000565.pdf (worksheet ground-truth)

Cert 000565 is a wacky stress-test that exercises 3-4 zero-coverage
cascade paths in one cert: Main + 4 extensions, age mix A through J,
RR on every part with mixed ages, conservatory with fixed heaters,
curtain-wall Ext2 post-2023, mixed wall types (solid brick + stone +
curtain wall), mixed party walls (CU + CF + Unable to determine).

After this cert lands, the user has agreed to generate single-feature
certs (oil only, LPG only, solid fuel only, electric direct only,
multi-main-heating, basement) to surface single-cause calculator gaps.

Handover doc now has implementation outline (mirror
_elmhurst_worksheet_000474.py shape) and a coverage-paths table
showing which targets each fuel-type/config exercises.

2026-05-28 21:03:23 +00:00

18 KiB


Handover — golden coverage + next slice

Branch feature/per-cert-mapper-validation. HEAD: b7fbbcca (Slice S0380.51 strict-raise UnmappedApiCode on API integer enums). Test baseline: 769 pass + 0 fail. Pyright net-zero on every touched file.

Recent session slices (S0380.47 → S0380.51)

Slice	Commit	What
S0380.47	`42ed38f7`	β-split wired into cost cascade per Appendix M1 §6 — zero cohort impact because Table 32 collapses code 30 = code 60 = 13.19 p/kWh
S0380.48	`bf99b1c7`	Schema gap closure: real-API `pv_batteries[]` lodges `battery_capacity` flat-shape (`[{"battery_capacity": 5}]`), schema expected nested `{"pv_battery": {"battery_capacity": 5}}` → 5-kWh batteries silently dropped → β too low. Cohort PE +2.7..+8.1 → −3.5..−4.5
S0380.49	`e75198ce`	Effective-monthly Table 12e PE factors for the PV split per Appendix M1 §8. Cohort PE −3.5..−4.5 → −2.8..−3.7
S0380.50	`3d1e6f10`	§4 seasonal monthly HW fuel for PV β cascade — replaced days-prorated hot-water demand with §4 (62)m seasonal output scaled to annual fuel. Cohort PE −2.8..−3.7 → −2.7..−3.5
S0380.51	`b7fbbcca`	Strict-raise `UnmappedApiCode` on five API mapper helpers (`floor_construction`, `floor_heat_loss`, `roof_construction`, `party_wall_construction`, `built_form`). Surfaced two coverage gaps immediately (`floor_heat_loss` codes 2/3/6) and added explicit mappings. 6 new tests as the forcing function.
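
The S0380.48 row describes two lodged shapes for `pv_batteries[]`. A minimal sketch of a reader that tolerates both is shown here; `battery_capacities_kwh` is a hypothetical helper written for illustration, not the repository's schema code, and it assumes each list entry is a plain dict.

```python
def battery_capacities_kwh(pv_batteries: list) -> list:
    """Accept the flat and the nested entry shape; never drop one silently."""
    capacities = []
    for entry in pv_batteries:
        # Nested shape: {"pv_battery": {"battery_capacity": 5}}
        # Flat shape:   {"battery_capacity": 5}
        inner = entry.get("pv_battery", entry)
        if "battery_capacity" not in inner:
            # Fail loudly: a silently dropped battery lowers beta unnoticed.
            raise ValueError(f"unrecognised pv_batteries entry: {entry!r}")
        capacities.append(float(inner["battery_capacity"]))
    return capacities


flat = [{"battery_capacity": 5}]
nested = [{"pv_battery": {"battery_capacity": 5}}]
print(battery_capacities_kwh(flat), battery_capacities_kwh(nested))
```

Raising on an unrecognised entry follows the same reasoning as the strict-raise slice: a shape mismatch should surface at extraction rather than as a downstream PE residual.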

Test-coverage matrix (current state)

Test file	Certs	What's pinned
`test_summary_pdf_mapper_chain.py`	38 cohort-2 + 8 ASHP + per-cert chain tests	SAP at 1e-4 vs worksheet
`test_golden_fixtures.py`	15 certs	SAP int + PE + CO2 residuals vs API-lodged
`test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise`	All JSON in `fixtures/golden/`	No `UnmappedApiCode` raised at extraction

Cohort overlap

Golden ∩ Cohort-2 = 0/38 — cohort-2 certs are NOT in golden fixtures
Golden ∩ ASHP = 7/8 — cert 9501 lives in chain tests only
Golden open-front = 8 certs (oil + gas + RR) — no worksheets, API-only

Cohort-2 SAP closure (chain tests)

All 38 at max |Δ| = 5e-5 vs worksheet — closed.

Cohort-2 PE / CO2 (probed but NOT pinned anywhere)

24/38 closed (|PE| < 1, |CO2| < 0.05)
14/38 open. Top offender: cert 2102 at +20.4 PE, −0.79 CO2 — completely undetected by any current test
Other 13 cluster around −3 PE (same PV (233a/b) mystery pattern as the ASHP golden certs)
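
The closed/open split above can be expressed as a small predicate. This is a sketch only: `Residual`, `is_closed` and `split_open_closed` are hypothetical names, and every sample row except cert 2102 uses illustrative values rather than measured residuals.

```python
from dataclasses import dataclass

# Thresholds quoted above: |PE| < 1 and |CO2| < 0.05.
PE_CLOSED_ABS = 1.0
CO2_CLOSED_ABS = 0.05


@dataclass(frozen=True)
class Residual:
    """Hypothetical container for one cert's cascade-minus-lodged residuals."""
    cert: str
    pe: float
    co2: float


def is_closed(r: Residual) -> bool:
    """A cert counts as closed only when BOTH residuals are inside threshold."""
    return abs(r.pe) < PE_CLOSED_ABS and abs(r.co2) < CO2_CLOSED_ABS


def split_open_closed(residuals):
    """Return (closed, open), with open certs sorted worst |PE| first."""
    closed = [r for r in residuals if is_closed(r)]
    open_ = sorted(
        (r for r in residuals if not is_closed(r)),
        key=lambda r: abs(r.pe),
        reverse=True,
    )
    return closed, open_


sample = [
    Residual("2102", pe=20.4, co2=-0.79),          # values from the text
    Residual("example-pv", pe=-3.0, co2=-0.01),    # illustrative values
    Residual("example-closed", pe=0.2, co2=0.01),  # illustrative values
]
closed, open_ = split_open_closed(sample)
print([r.cert for r in open_])
```

Sorting the open list by absolute PE puts the largest offender first, which is how cert 2102 ends up at the top of the queue.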

★ Next slice — add cohort-2 to `test_golden_fixtures.py`

This is the agreed-upon next slice (one-slice change, high-value):

Run cohort-2 against cert_to_demand_inputs and capture current PE/CO2 residuals
Add _GoldenExpectation entries to test_golden_fixtures.py for all 38 certs
The pin tolerance stays at the existing _PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01 / _CO2_ABS_TOLERANCE_TONNES = 0.001
The 14 "open" certs get pinned at their CURRENT non-zero residuals (regression-guard, not closure)
Cert 2102 (+20.4 PE / −0.79 CO2) becomes immediately visible as the next closure target with worksheet support

Why this is high-leverage: cohort-2 chain tests only pin SAP at 1e-4 (which catches cost-cascade drift but not PE/CO2 cascade drift). Cert 2102's +20.4 PE is invisible to any current test. Adding cohort-2 to golden creates regression guards across all three SAP/PE/CO2 cascades for 38 worksheet-backed certs.
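
What "pinned at the current residual" means in practice can be sketched as follows. The two tolerance values are the ones quoted above; `residual_drifted` and the lodged/computed numbers are hypothetical and only illustrate the comparison.

```python
_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01   # value quoted above
_CO2_ABS_TOLERANCE_TONNES = 0.001     # value quoted above


def residual_drifted(computed: float, lodged: float,
                     expected_resid: float, tol: float) -> bool:
    """True when (computed - lodged) has moved away from the pinned residual.

    Uses abs(diff) <= tol rather than pytest.approx, per the conventions list.
    """
    actual_resid = computed - lodged
    return abs(actual_resid - expected_resid) > tol


# Illustrative numbers: a lodged PE of 150.0 with the +20.3640 pin.
tol = _PE_ABS_TOLERANCE_KWH_PER_M2
print(residual_drifted(170.364, 150.0, 20.3640, tol))  # unchanged cascade
print(residual_drifted(170.0, 150.0, 20.3640, tol))    # cascade moved
print(residual_drifted(150.0, 150.0, 20.3640, tol))    # gap fully closed
```

The guard is deliberately two-sided: a fix that closes the gap also trips the pin, so the expected residual has to be updated in the same slice as the fix.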

Concrete implementation outline

# In test_golden_fixtures.py — add an entry per cohort-2 cert:
_GoldenExpectation(
    cert_number="2102-3018-0205-7886-5204",
    actual_sap=64,                              # from doc['energy_rating_current']
    expected_sap_resid=+0,                      # cohort-2 closure at 1e-4 → rounds to 0
    expected_pe_resid_kwh_per_m2=+20.3640,      # current residual, pin here
    expected_co2_resid_tonnes_per_yr=-0.7895,
    notes=(
        "Cohort-2 cert. SAP closed at 1e-4 via chain test. PE +20.4 / "
        "CO2 -0.79 residuals are the open closure target — worksheet "
        "exists (Summary + dr87) under `sap worksheets/`. Likely a "
        "specific cascade gap to probe with the worksheet."
    ),
),

Use the probe in this session's last diagnostic to capture exact residuals:

PYTHONPATH=/workspaces/model python -c "
import json, pathlib
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, cert_to_demand_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs

for cert in COHORT_2_LIST:
    doc = json.loads(pathlib.Path(f'.../{cert}.json').read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    rating = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    demand = calculate_sap_from_inputs(cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    # ... print _GoldenExpectation tuple in the right format
"

The 38 fixture entries should land in one PR. After landing, cert 2102 becomes the obvious next closure target.
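
The "print in the right format" step of the probe could look like the sketch below. `format_expectation` is a hypothetical helper; the field names are taken from the `_GoldenExpectation` outline above, and the notes string is a placeholder to be edited per cert.

```python
def format_expectation(cert_number: str, actual_sap: int, sap_resid: int,
                       pe_resid: float, co2_resid: float) -> str:
    """Render one paste-ready entry; residuals keep an explicit sign."""
    return (
        "_GoldenExpectation(\n"
        f'    cert_number="{cert_number}",\n'
        f"    actual_sap={actual_sap},\n"
        f"    expected_sap_resid={sap_resid:+d},\n"
        f"    expected_pe_resid_kwh_per_m2={pe_resid:+.4f},\n"
        f"    expected_co2_resid_tonnes_per_yr={co2_resid:+.4f},\n"
        '    notes="Cohort-2 cert. Pinned at current residual.",\n'
        "),"
    )


entry = format_expectation("2102-3018-0205-7886-5204", 64, 0, 20.3640, -0.7895)
print(entry)
```

Four decimal places match the precision used in the outline, which keeps the pinned value well inside the 0.01 and 0.001 tolerances.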

Open threads (after the cohort-2 add)

Tractable with worksheets we already have

Cert 2102 +20.4 PE / −0.79 CO2 — cohort-2 cert, worksheet exists under sap worksheets/Additional data with api/ for the cohort-2 batch. Surfaced by cohort-2 → golden migration. Best next closure target.
PV (233a)+(233b) monthly mystery: documented at project_pv_233_split_mystery.md. Cascade β = 0.7511 vs worksheet 0.7392 for cert 0380. Closes ~0.5 kWh/m² across the ASHP cohort. The 13 open cohort-2 certs clustered around −3 kWh/m² PE (every open cert except 2102) likely share this root cause.
_api_glazing_transmission strict-raise extension — the helper's existing comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them." Same pattern as S0380.51. Mechanical; low risk; coverage-hardening.

Open without worksheets (low payoff)

Golden fixtures with large residuals but no worksheets to triangulate:

Cert	PE Δ	Heating	Notes
6035	+46.76	Gas combi age A (mid-terrace)	RR with "limited insulation (assumed)" → cascade roof = 130.72 W/K, possibly wrong cascade routing — needs worksheet
0390-2954	−26.01	Oil combi Firebird PCDF 9005	Oil tariff cascade + fabric heat loss — needs worksheet
0240	+12.49	Oil boiler + PV + RR (detached)	Subsystem heat-loss diff in notes (roof 76.93 W/K) — needs worksheet
0300	+8.28	Gas combi large semi TFA 526	Shower outlet schema work was recent — needs worksheet
2130	−8.22 (chain) / −8.22 (golden)	Gas combi + PV	"gas combi PE under-count + secondary heating credit" — needs worksheet
7536	−7.08	Gas combi multi-age (D/L/F)	"multi-age geometry probably surfaces per-bp U the spec table doesn't capture" — needs worksheet
0535	(in golden)	—	open-front — needs worksheet
8135	−0.07	Gas	already closed — keep as regression guard

The user observation that oil is under-represented is correct: 2 oil-boiler certs in golden, both at high residuals, both without worksheets. Solid fuel, LPG, electric direct-acting are completely absent.

Heating-system distribution across golden fixtures

Heating	Count	Worksheets	Status
Boiler + radiators, mains gas	34	Most (cohort-2 + 9501)	Mostly closed at 1e-4 SAP
Air source heat pump	20	All 8 ASHP cohort have worksheets	β-split phase complete; ~−3 PE structural residual open
Boiler + radiators, oil	2	None	Both at high residuals; closure blocked on worksheets
Community scheme	1	None	Retired
Solid fuel	0	—	Completely absent
LPG	0	—	Completely absent
Electric direct / storage heater	0	—	Completely absent

How to grow fixture diversity (answer to "what to download")

For the gov.uk EPB downloads UI, you only get API JSON — that's enough for SAP-closure verification IF the cert's lodged SAP value can be trusted (it's the assessor's calculator output). But:

The dr87-0001-NNNNNN.pdf worksheet — needed to debug structural cascade gaps line-by-line — is generated by the assessor's calculator (typically Elmhurst SAP tool) and bundled in their export ZIP. Not available via the gov.uk UI.
The cohort-2 + ASHP worksheets in sap worksheets/Additional data with api/ came from an Elmhurst data dump.

Recommended fixture targets to unlock open work:

Oil worksheets: for certs 0240 and 0390-2954 in our golden set (the only two oil certs there). These would unblock ~38 kWh/m² of PE residual (|+12.49| + |−26.01| ≈ 38.5) immediately.
A solid-fuel cert with worksheet — anthracite / wood pellets / biomass. Currently zero coverage. The fuel-cost cascade through Table 32 + heat-emitter cascade has paths we've never exercised.
An LPG cert with worksheet — Table 32 code different from gas/oil; the cost cascade has an LPG-specific branch that has never run in tests.
An electric direct-acting cert with worksheet — storage heater (codes 401-409) or panel heater (codes 191-196). The off-peak tariff path (_RDSAP_DEFINITELY_OFF_PEAK = {1, 4, 5} in cert_to_inputs.py) currently raises rather than computes — first off-peak cert with worksheet would force that path.
A community/district heating cert with worksheet — currently the retired 9390 is the only such cert and it has no worksheet.

When grabbing certs from the data dump, filter by main_heating[0].description to ensure fuel-type coverage:

Boiler and radiators, oil (target: 5-10 worksheets)
Boiler and radiators, anthracite / wood pellets / wood logs
Boiler and radiators, LPG
Electric storage heaters / Direct-acting electric heaters
Community scheme
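
A filter along these lines could do the bucketing. It is a sketch under two assumptions: that `main_heating[0].description` is a plain string in each JSON document, and that prefix matching on the descriptions listed above is sufficient. `bucket_by_target` and the sample cert ids are hypothetical.

```python
from collections import defaultdict

# Target description prefixes copied from the list above.
TARGET_PREFIXES = (
    "Boiler and radiators, oil",
    "Boiler and radiators, anthracite",
    "Boiler and radiators, wood pellets",
    "Boiler and radiators, wood logs",
    "Boiler and radiators, LPG",
    "Electric storage heaters",
    "Direct-acting electric heaters",
    "Community scheme",
)


def main_heating_description(doc: dict) -> str:
    """Return main_heating[0].description, or '' when it is missing."""
    heating = doc.get("main_heating") or [{}]
    return (heating[0].get("description") or "").strip()


def bucket_by_target(docs: dict) -> dict:
    """Group cert ids under the first target prefix their description matches."""
    buckets = defaultdict(list)
    for cert_id, doc in docs.items():
        desc = main_heating_description(doc).lower()
        for prefix in TARGET_PREFIXES:
            if desc.startswith(prefix.lower()):
                buckets[prefix].append(cert_id)
                break
    return dict(buckets)


docs = {
    "cert-a": {"main_heating": [{"description": "Boiler and radiators, oil"}]},
    "cert-b": {"main_heating": [{"description": "Boiler and radiators, mains gas"}]},
    "cert-c": {"main_heating": []},
}
print(bucket_by_target(docs))
```

Certs that match no target (mains gas here) are simply left out, so the result doubles as a count of how many worksheets each target bucket has so far.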

★★ Elmhurst-only path (calculator gap closure WITHOUT API JSON)

User insight from end of session: the mapper is a thin pass-through; when residuals remain after closing mapper gaps (cohort-2 → golden), the gap is in the calculator cascade, not the mapper. For calculator gaps, the API JSON is not load-bearing — only the Elmhurst Summary PDF (input) and the worksheet PDF (ground-truth line refs) are needed.

This is a different fixture shape from the cohort-2 + ASHP path. It mirrors the 6 original Elmhurst U985 fixtures (000474, 000477, 000480, 000487, 000490, 000516), the historical worksheet-pinned test vectors at domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py + test_e2e_elmhurst_sap_score.py. No API JSON in the loop.

Concrete next-target: the extended test case at `sap worksheets/extended test case/`

sap worksheets/extended test case/
  Summary_000565.pdf              ← input lodgement (Elmhurst RdSAP10 PDF)
  U985-0001-000565.pdf            ← worksheet output (line refs ground-truth)

Cert 000565 is a wacky stress-test cert (user-supplied) that exercises many cascade paths absent from the cohort-2 + ASHP corpus:

5 building parts: Main + 4 extensions (vs cohort max 2 extensions)
Age mix: Main A (pre-1900), Ext1 E (1967-75), Ext2 H (1991-95), Ext3 I (1996-2002), Ext4 J (2003-06) — spans 100+ years of construction
Room-in-roof on every part at different ages (H, I, J, I, M)
Conservatory thermally separated WITH fixed heaters (zero coverage elsewhere)
Wall variety:
- Main: Solid Brick + 75mm External insulation + Alt Wall Stone granite (23 m² with 120mm As Built + dry-lining)
- Ext1: Stone granite, U Unknown, Cavity filled party wall
- Ext2: Curtain Wall Post 2023 (zero coverage)
- …
Party walls: CU Cavity unfilled (Main), CF Cavity filled (Ext1), U Unable to determine (Ext2)
Multi-storey extensions with floor 0/1 having varying room heights (1.0 m to 4.0 m) and party_wall_length (0 to 23 m)

Every uncommon cascade path the cohort-2 + ASHP fixtures don't exercise will light up against this cert.

Implementation outline (mirror the existing pattern)

Hand-build a _elmhurst_worksheet_000565.py module under domain/sap10_calculator/worksheet/tests/. Pattern is exactly the shape of _elmhurst_worksheet_000474.py:
- build_epc() -> EpcPropertyData — hand-construct the EpcPropertyData from the Summary_000565.pdf §1-19 lodgings. Use the existing make_minimal_sap10_epc, SapBuildingPart, SapFloorDimension etc. constructors.
- Module-level LINE_NN_FOO: type = value constants for every U985 line ref the test pins. Extract values from U985-0001-000565.pdf.
Register the fixture in test_e2e_elmhurst_sap_score.py:
- Add from . import _elmhurst_worksheet_000565 as _w000565 import.
- Add "000565": FixtureCascadePins(sap_score=..., sap_score_continuous=..., ...) entry to _FIXTURE_PINS.
- Add "000565": _w000565 entry to _FIXTURE_MODULES.
- The parametrized test_sap_result_pin[000565-FIELD] test cases fire automatically.
Per feedback-e2e-validation-philosophy + feedback-zero-error-strict:
- Tolerances are abs=1e-4 on every field. No widening, no xfail.
- Failing pins are named calculator bugs to fix, not tolerances to relax. Each failing pin is its own slice.
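
The pin check itself can be sketched in a few lines. `PinsSketch` is a hypothetical stand-in for `FixtureCascadePins` with only the two fields named above, `failing_pins` is not a repository function, and the numbers are illustrative rather than values from U985-0001-000565.pdf.

```python
from dataclasses import dataclass, fields

ABS_TOL = 1e-4  # tolerance stated above; never widened


@dataclass(frozen=True)
class PinsSketch:
    """Stand-in for FixtureCascadePins; the real class pins more fields."""
    sap_score: int
    sap_score_continuous: float


def failing_pins(result, pins) -> list:
    """Return names of pinned fields where |result - pin| exceeds ABS_TOL."""
    failures = []
    for f in fields(pins):
        diff = getattr(result, f.name) - getattr(pins, f.name)
        if abs(diff) > ABS_TOL:
            failures.append(f.name)
    return failures


# Illustrative numbers only; real pins come from the worksheet PDF.
pins = PinsSketch(sap_score=58, sap_score_continuous=58.2317)
good = PinsSketch(sap_score=58, sap_score_continuous=58.23172)
bad = PinsSketch(sap_score=58, sap_score_continuous=58.2400)
print(failing_pins(good, pins))
print(failing_pins(bad, pins))
```

Reporting failures by field name is what lets each failing pin become its own named slice instead of one aggregate test failure.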

Why this path is more powerful than API-route closure for calculator gaps

API-route closure	Elmhurst-only path
Cert needs both API JSON AND worksheet	Cert needs only Summary + worksheet PDFs
Tests run via `from_api_response → cert_to_inputs → calculator` — failure could be mapper OR calculator	Tests run via `build_epc() → cert_to_inputs → calculator` — failure is definitionally a calculator bug
Cohort acquisition: gov.uk EPB JSON + assessor's worksheet ZIP	Cohort acquisition: assessor's tool export only (Elmhurst SAP)
Cross-mapper parity is a 2nd-order check on top of cascade correctness	Direct cascade correctness check

For diverse fuel-type / property-type calculator coverage, the user can generate test certs in Elmhurst SAP without needing to lodge them at gov.uk first. Targets to generate for closure on currently-zero-coverage paths:

Fuel / config	Why critical	Cascade paths exercised
Oil boiler PCDB-listed (Firebird etc.)	Closes cert 0240 + 0390 oil residuals; no current oil worksheet	Table 105 oil + oil-tariff fuel cost + oil CO2/PE factors
Solid fuel (anthracite, wood pellets, biomass)	Zero coverage	Table 32 solid-fuel branch + solid-fuel CO2/PE factors
LPG	Zero coverage	Table 32 LPG branch + LPG-specific tariff lookup
Electric direct-acting / storage heaters	Zero coverage; off-peak meter path raises in cert_to_inputs	`_RDSAP_DEFINITELY_OFF_PEAK` dispatch (codes 1/4/5) + Table 12a high/low-rate split
Multi-main-heating (main 1 + main 2)	Currently un-exercised — `main_2_fuel_kwh_per_yr` cascade is dormant	Per-main efficiency + per-main fuel cost + summed PE
Basement	Minimal coverage	`u_basement_wall` + `u_basement_floor` Table 23 dispatch
Conservatory with fixed heaters	Zero coverage	Conservatory exclusion / inclusion rule + heated-conservatory fuel routing

The wacky 000565 cert exercises 3-4 of these in one shot (multi-extension + multi-age + conservatory + curtain wall). After it lands, the user can generate single-feature certs (one oil cert, one LPG cert, etc.) to isolate single-cause calculator gaps.

Strict-raise pattern (S0380.51) — extension queue

The UnmappedApiCode strict-raise pattern is established in datatypes/epc/domain/mapper.py. Currently five helpers raise:

_api_party_wall_construction_int
_api_floor_construction_str
_api_floor_type_str
_api_roof_construction_str
_api_sheltered_sides

Pending extensions (mechanical; each its own slice):

_api_glazing_transmission — comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them"
_api_cascade_glazing_type — uses pass-through fallback dict.get(code, code) which is intentional but worth auditing to surface deliberate decisions

The forcing function test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise will catch any unmapped enum across the whole golden corpus at extraction time. Each new fixture added increases the gate's coverage automatically.

Test baseline at HEAD

PYTHONPATH=/workspaces/model python -m pytest \
    backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
    backend/documents_parser/tests/test_elmhurst_extractor.py \
    backend/documents_parser/tests/test_elmhurst_end_to_end.py \
    domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
    domain/sap10_calculator/worksheet/tests/test_water_heating.py \
    domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
    domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
    domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
    domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
    domain/sap10_ml/tests/test_rdsap_uvalues.py \
    datatypes/epc/schema/tests/test_schema_loading.py \
    domain/sap10_calculator/worksheet/tests/test_photovoltaic.py \
    --no-cov -q

Expected: 769 pass + 0 fail.

Conventions preserved

1e-4 across the board (feedback-one-e-minus-4-across-the-board)
Worksheet, not API, is the target (feedback-worksheet-not-api-reference)
Verify worksheet PDF before accepting handover claims (feedback-verify-handover-claims)
Spec-floor skepticism (feedback-spec-floor-skepticism)
Golden residuals → ~0 (feedback-golden-residuals-near-zero)
AAA test convention (feedback-aaa-test-convention)
abs(diff) <= tol not pytest.approx (feedback-abs-diff-over-pytest-approx)
Spec citation in commit messages (feedback-spec-citation-in-commits)
One slice = one commit; stage by name (feedback-commit-per-slice)
Pyright net-zero per touched file (feedback-zero-error-strict)
Cross-mapper parity via cascade (feedback-cross-mapper-parity-via-cascade)
Bigger slices OK for uniform-cohort work (feedback-bigger-slices-for-uniform-work)
