From 33293b094290ed652aa896c51f06303c7bf0f6e6 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Thu, 28 May 2026 20:55:17 +0000
Subject: [PATCH] docs: handover after S0380.47..S0380.51 — golden coverage state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures the per-cert validation state at HEAD b7fbbcca:

- 5 slices shipped this session: cost cascade β-split (.47), schema
  gap closure for real-API battery_capacity (.48), Table 12e
  effective-monthly PE factor for PV (.49), §4 seasonal HW for PV β
  cascade (.50), UnmappedApiCode strict-raise pattern on API mapper
  (.51).
- 769 pass + 0 fail across the full baseline; pyright net-zero on
  every touched file.

Crucial finding for the next agent: cohort-2 (38 certs) is
chain-tested at 1e-4 SAP vs worksheet but NOT in
test_golden_fixtures.py — PE/CO2 cascades have NO regression guard.
Probed at HEAD: 14/38 cohort-2 certs have non-trivial PE residuals
invisible to any current test, including cert 2102 at +20.4 PE /
-0.79 CO2 (single worst undetected residual in the cohort).

Agreed next slice: add all 38 cohort-2 certs to
test_golden_fixtures.py with current PE/CO2 pinned. Surfaces cert
2102 as the next closure target (worksheet exists under
`sap worksheets/`) and creates PE/CO2 regression guards across the
worksheet-backed cohort.

Open threads ranked by tractability:

- Cert 2102 +20.4 PE — worksheet exists, well-scoped
- PV (233a)+(233b) monthly mystery — documented memory entry;
  ~0.5 kWh/m² across ASHP cohort
- _api_glazing_transmission strict-raise extension — mechanical
- 8 open-front golden certs (oil + RR) at high residuals — blocked
  on worksheets

Fuel-type diversity guidance: heating system breakdown across all
60+ fixtures shows 34 gas, 20 ASHP, 2 oil (both open-front, no
worksheets), 0 solid fuel, 0 LPG, 0 electric direct.
Closure on oil + solid fuel + LPG + electric is blocked on worksheet
availability — the gov.uk EPB downloads UI returns API JSON only; dr87
worksheets come from the export ZIP of the assessor's tool (typically
Elmhurst SAP).

Handover doc at docs/HANDOVER_GOLDEN_COVERAGE.md.
---
 .../docs/HANDOVER_GOLDEN_COVERAGE.md | 205 ++++++++++++++++++
 1 file changed, 205 insertions(+)
 create mode 100644 domain/sap10_calculator/docs/HANDOVER_GOLDEN_COVERAGE.md

diff --git a/domain/sap10_calculator/docs/HANDOVER_GOLDEN_COVERAGE.md b/domain/sap10_calculator/docs/HANDOVER_GOLDEN_COVERAGE.md
new file mode 100644
index 00000000..7dc0290a
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_GOLDEN_COVERAGE.md
@@ -0,0 +1,205 @@
+# Handover — golden coverage + next slice
+
+Branch `feature/per-cert-mapper-validation`. **HEAD: `b7fbbcca`** (Slice
+S0380.51 strict-raise UnmappedApiCode on API integer enums).
+**Test baseline: 769 pass + 0 fail.** Pyright net-zero on every
+touched file.
+
+## Recent session slices (S0380.47 → S0380.51)
+
+| Slice | Commit | What |
+|---|---|---|
+| **S0380.47** | `42ed38f7` | β-split wired into cost cascade per Appendix M1 §6 — zero cohort impact because Table 32 collapses code 30 = code 60 = 13.19 p/kWh |
+| **S0380.48** | `bf99b1c7` | Schema gap closure: real-API `pv_batteries[]` lodges `battery_capacity` in flat shape (`[{"battery_capacity": 5}]`), but the schema expected nested `{"pv_battery": {"battery_capacity": 5}}` → 5-kWh batteries silently dropped → β too low. Cohort PE +2.7..+8.1 → −3.5..−4.5 |
+| **S0380.49** | `e75198ce` | Effective-monthly Table 12e PE factors for the PV split per Appendix M1 §8. Cohort PE −3.5..−4.5 → −2.8..−3.7 |
+| **S0380.50** | `3d1e6f10` | §4 seasonal monthly HW fuel for PV β cascade — replaced days-prorated hot-water demand with §4 (62)m seasonal output scaled to annual fuel. Cohort PE −2.8..−3.7 → −2.7..−3.5 |
+| **S0380.51** | `b7fbbcca` | Strict-raise `UnmappedApiCode` on five API mapper helpers (`floor_construction`, `floor_heat_loss`, `roof_construction`, `party_wall_construction`, `built_form`). Surfaced two coverage gaps immediately (`floor_heat_loss` codes 2/3/6) and added explicit mappings. 6 new tests as the forcing function. |
+
+## Test-coverage matrix (current state)
+
+| Test file | Certs | What's pinned |
+|---|---:|---|
+| `test_summary_pdf_mapper_chain.py` | 38 cohort-2 + 8 ASHP + per-cert chain tests | **SAP at 1e-4 vs worksheet** |
+| `test_golden_fixtures.py` | 15 certs | **SAP int + PE + CO2 residuals** vs API-lodged |
+| **`test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise`** | All JSON in `fixtures/golden/` | **No `UnmappedApiCode` raised** at extraction |
+
+### Cohort overlap
+
+- **Golden ∩ Cohort-2 = 0/38** — cohort-2 certs are NOT in golden fixtures
+- **Golden ∩ ASHP = 7/8** — cert 9501 lives in chain tests only
+- **Golden open-front** = 8 certs (oil + gas + RR) — **no worksheets**, API-only
+
+### Cohort-2 SAP closure (chain tests)
+All 38 at max |Δ| = 5e-5 vs worksheet — closed.
+
+### Cohort-2 PE / CO2 (probed but NOT pinned anywhere)
+- 24/38 closed (|PE| < 1, |CO2| < 0.05)
+- 14/38 open. **Top offender: cert 2102 at +20.4 PE, −0.79 CO2** — completely undetected by any current test
+- Other 13 cluster around −3 PE (same PV (233a/b) mystery pattern as the ASHP golden certs)
+
+## ★ Next slice — add cohort-2 to `test_golden_fixtures.py`
+
+**This is the agreed-upon next slice** (one-slice change, high-value):
+
+1. Run cohort-2 against `cert_to_demand_inputs` and capture current PE/CO2 residuals
+2. Add `_GoldenExpectation` entries to `test_golden_fixtures.py` for all 38 certs
+3. The pin tolerance stays at the existing `_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01` / `_CO2_ABS_TOLERANCE_TONNES = 0.001`
+4. The 14 "open" certs get pinned at their CURRENT non-zero residuals (regression-guard, not closure)
+5. Cert 2102 (+20.4 PE / −0.79 CO2) becomes immediately visible as the next closure target with worksheet support
+
+**Why this is high-leverage:** cohort-2 chain tests only pin SAP at 1e-4 (which catches cost-cascade drift but not PE/CO2 cascade drift). Cert 2102's +20.4 PE is invisible to any current test. Adding cohort-2 to golden creates regression guards across all three SAP/PE/CO2 cascades for 38 worksheet-backed certs.
+
+### Concrete implementation outline
+
+```python
+# In test_golden_fixtures.py — add an entry per cohort-2 cert:
+_GoldenExpectation(
+    cert_number="2102-3018-0205-7886-5204",
+    actual_sap=64,  # from doc['energy_rating_current']
+    expected_sap_resid=+0,  # cohort-2 closure at 1e-4 → rounds to 0
+    expected_pe_resid_kwh_per_m2=+20.3640,  # current residual, pin here
+    expected_co2_resid_tonnes_per_yr=-0.7895,
+    notes=(
+        "Cohort-2 cert. SAP closed at 1e-4 via chain test. PE +20.4 / "
+        "CO2 -0.79 residuals are the open closure target — worksheet "
+        "exists (Summary + dr87) under `sap worksheets/`. Likely a "
+        "specific cascade gap to probe with the worksheet."
+    ),
+),
+```
+
+Use the probe from this session's last diagnostic to capture the exact residuals:
+
+```bash
+PYTHONPATH=/workspaces/model python -c "
+import json, pathlib
+from datatypes.epc.domain.mapper import EpcPropertyDataMapper
+from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, cert_to_demand_inputs, SAP_10_2_SPEC_PRICES
+from domain.sap10_calculator.calculator import calculate_sap_from_inputs
+
+# COHORT_2_LIST (the 38 cohort-2 cert numbers) and the elided fixture path must be filled in first.
+for cert in COHORT_2_LIST:
+    doc = json.loads(pathlib.Path(f'.../{cert}.json').read_text())
+    epc = EpcPropertyDataMapper.from_api_response(doc)
+    rating = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
+    demand = calculate_sap_from_inputs(cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
+    # ... print a _GoldenExpectation entry in the format shown above
+"
+```
+
+The 38 fixture entries should land in one PR. After landing, cert 2102 becomes the obvious next closure target.
+
+## Open threads (after the cohort-2 add)
+
+### Tractable with worksheets we already have
+
+1. **Cert 2102 +20.4 PE / −0.79 CO2** — cohort-2 cert, worksheet exists under `sap worksheets/Additional data with api/` for the cohort-2 batch. Surfaced by the cohort-2 → golden migration. Best next closure target.
+
+2. **PV (233a)+(233b) monthly mystery** — documented at [`project_pv_233_split_mystery.md`](~/.claude/projects/-workspaces-model/memory/project_pv_233_split_mystery.md). Cascade β = 0.7511 vs worksheet 0.7392 for cert 0380. Closing it would recover ~0.5 kWh/m² across the ASHP cohort. The 13 cohort-2 PE residuals clustered around −3 kWh/m² likely share this root cause.
+
+3. **`_api_glazing_transmission` strict-raise extension** — the helper's existing comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them." Same pattern as S0380.51. Mechanical; low risk; coverage-hardening.
+
+### Open without worksheets (low payoff)
+
+Golden fixtures with large residuals but no worksheets to triangulate:
+
+| Cert | PE Δ (kWh/m²) | Heating | Notes |
+|---|---:|---|---|
+| **6035** | +46.76 | Gas combi age A (mid-terrace) | RR with "limited insulation (assumed)" → cascade roof = 130.72 W/K, possibly wrong cascade routing — needs worksheet |
+| **0390-2954** | −26.01 | **Oil combi** Firebird PCDF 9005 | Oil tariff cascade + fabric heat loss — needs worksheet |
+| **0240** | +12.49 | **Oil boiler** + PV + RR (detached) | Subsystem heat-loss diff in notes (roof 76.93 W/K) — needs worksheet |
+| **0300** | +8.28 | Gas combi large semi TFA 526 | Shower outlet schema work was recent — needs worksheet |
+| **2130** | −8.22 (chain) / −8.22 (golden) | Gas combi + PV | "gas combi PE under-count + secondary heating credit" — needs worksheet |
+| **7536** | −7.08 | Gas combi multi-age (D/L/F) | "multi-age geometry probably surfaces per-bp U the spec table doesn't capture" — needs worksheet |
+| **0535** | (in golden) | — | open-front — needs worksheet |
+| **8135** | −0.07 | Gas | already closed — keep as regression guard |
+
+**The user observation that oil is under-represented is correct**: 2 oil-boiler certs in golden, both at high residuals, both without worksheets. Solid fuel, LPG, electric direct-acting are completely absent.
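Tallying `main_heating[0].description` across the golden JSON is one way to keep these fuel-type gaps visible as fixtures are added. The sketch below is a minimal illustration. It assumes each fixture document is a dict whose `main_heating` list carries a `description` string. The helper names are hypothetical. The target strings are derived from the data-dump filter list in this handover, and the toy documents are invented for the example.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from typing import Any

# Target descriptions derived from the data-dump filter list in this handover.
# Assumption: a fixture covers a target when its description starts with the
# target text (case-insensitive); real lodged descriptions may need looser matching.
TARGET_DESCRIPTIONS = (
    "Boiler and radiators, oil",
    "Boiler and radiators, anthracite",
    "Boiler and radiators, wood pellets",
    "Boiler and radiators, wood logs",
    "Boiler and radiators, LPG",
    "Electric storage heaters",
    "Direct-acting electric heaters",
    "Community scheme",
)


def heating_description(doc: Mapping[str, Any]) -> str:
    """Return main_heating[0].description, or '' when it is missing."""
    heating = doc.get("main_heating") or []
    if not heating:
        return ""
    return str(heating[0].get("description", ""))


def tally_heating(docs: Iterable[Mapping[str, Any]]) -> Counter[str]:
    """Count fixtures per main-heating description."""
    return Counter(heating_description(doc) for doc in docs)


def uncovered_targets(counts: Counter[str]) -> list[str]:
    """Return the target descriptions that no fixture matches yet."""
    seen = [description.lower() for description in counts]
    return [
        target
        for target in TARGET_DESCRIPTIONS
        if not any(description.startswith(target.lower()) for description in seen)
    ]


# Toy documents for illustration only (not real fixtures).
toy_docs = [
    {"main_heating": [{"description": "Boiler and radiators, mains gas"}]},
    {"main_heating": [{"description": "Boiler and radiators, mains gas"}]},
    {"main_heating": [{"description": "Boiler and radiators, oil"}]},
]
counts = tally_heating(toy_docs)
print(counts.most_common(1))  # [('Boiler and radiators, mains gas', 2)]
print(uncovered_targets(counts)[:2])  # ['Boiler and radiators, anthracite', 'Boiler and radiators, wood pellets']
```

Prefix matching is deliberately naive. Lodged descriptions may carry extra wording, so a real coverage report would probably normalize them first.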
+
+## Heating-system distribution across golden fixtures
+
+| Heating | Count | Worksheets | Status |
+|---|---:|---|---|
+| Boiler + radiators, mains gas | 34 | Most (cohort-2 + 9501) | Mostly closed at 1e-4 SAP |
+| Air source heat pump | 20 | All 8 ASHP-cohort certs have worksheets | β-split phase complete; ~−3 PE structural residual open |
+| Boiler + radiators, oil | 2 | None | Both at high residuals; **closure blocked on worksheets** |
+| Community scheme | 1 | None | Retired |
+| Solid fuel | 0 | — | Completely absent |
+| LPG | 0 | — | Completely absent |
+| Electric direct / storage heater | 0 | — | Completely absent |
+
+## How to grow fixture diversity (answer to "what to download")
+
+From the gov.uk EPB downloads UI you only get API JSON. That is enough for SAP-closure verification IF the cert's lodged SAP value can be trusted (it is the assessor's calculator output). But:
+
+- The **dr87-0001-NNNNNN.pdf** worksheet — needed to debug structural cascade gaps line-by-line — is generated by the assessor's calculator (typically the Elmhurst SAP tool) and bundled in their export ZIP. It is not available via the gov.uk UI.
+
+- The cohort-2 + ASHP worksheets in `sap worksheets/Additional data with api/` came from an Elmhurst data dump.
+
+**Recommended fixture targets** to unlock open work:
+
+1. **Oil worksheets** — for certs 0240 and 0390-2954 in our golden set. Together they carry ~38.5 kWh/m² of absolute PE residual (+12.49 and −26.01), which worksheets could close immediately.
+2. **A solid-fuel cert with worksheet** — anthracite / wood pellets / biomass. Currently zero coverage. The fuel-cost cascade through Table 32 and the heat-emitter cascade have paths we've never exercised.
+3. **An LPG cert with worksheet** — its Table 32 code differs from gas/oil; the cost cascade has an LPG-specific branch that has never run in tests.
+4. **An electric direct-acting cert with worksheet** — storage heater (codes 401-409) or panel heater (codes 191-196). The off-peak tariff path (`_RDSAP_DEFINITELY_OFF_PEAK = {1, 4, 5}` in `cert_to_inputs.py`) currently raises rather than computes; the first off-peak cert with a worksheet would force that path.
+5. **A community/district heating cert with worksheet** — the retired 9390 is currently the only such cert, and it has no worksheet.
+
+When pulling certs from the data dump, filter on `main_heating[0].description` to ensure fuel-type coverage:
+- `Boiler and radiators, oil` (target: 5-10 worksheets)
+- `Boiler and radiators, anthracite` / `wood pellets` / `wood logs`
+- `Boiler and radiators, LPG`
+- `Electric storage heaters` / `Direct-acting electric heaters`
+- `Community scheme`
+
+## Strict-raise pattern (S0380.51) — extension queue
+
+The `UnmappedApiCode` strict-raise pattern is established in
+`datatypes/epc/domain/mapper.py`. Currently five helpers raise:
+
+- `_api_party_wall_construction_int`
+- `_api_floor_construction_str`
+- `_api_floor_type_str`
+- `_api_roof_construction_str`
+- `_api_sheltered_sides`
+
+**Pending extensions (mechanical; each its own slice):**
+
+- `_api_glazing_transmission` — comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them"
+- `_api_cascade_glazing_type` — uses pass-through fallback `dict.get(code, code)`, which is intentional but worth auditing to surface deliberate decisions
+
+The forcing function `test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise` will catch any unmapped enum across the whole golden corpus at extraction time. Each new fixture added increases the gate's coverage automatically.
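The strict-raise pattern and its corpus-wide forcing function can be sketched together in a few lines. This is a self-contained illustration, not the code in `datatypes/epc/domain/mapper.py`. `UnmappedApiCode` is redefined locally here. `strict_lookup`, `extract_floor_heat_loss`, `unmapped_codes`, the `FLOOR_HEAT_LOSS` table (placeholder labels, not SAP values) and the flat `floor_heat_loss` field are all hypothetical.

```python
from collections.abc import Mapping
from typing import Any


class UnmappedApiCode(ValueError):
    """Local stand-in for the project's exception of the same name.

    The real class lives in datatypes/epc/domain/mapper.py and may differ.
    """

    def __init__(self, field: str, code: int) -> None:
        super().__init__(f"unmapped API code {code!r} for field {field!r}")
        self.field = field
        self.code = code


def strict_lookup(field: str, table: Mapping[int, str], code: int) -> str:
    """Map an API integer enum, raising instead of silently falling back."""
    try:
        return table[code]
    except KeyError:
        raise UnmappedApiCode(field, code) from None


# Hypothetical table: codes and labels are placeholders, not SAP/RdSAP values.
FLOOR_HEAT_LOSS = {1: "label-1", 2: "label-2", 3: "label-3", 6: "label-6"}


def extract_floor_heat_loss(doc: Mapping[str, Any]) -> str:
    """Hypothetical extraction step reading a flat 'floor_heat_loss' field."""
    return strict_lookup("floor_heat_loss", FLOOR_HEAT_LOSS, int(doc["floor_heat_loss"]))


def unmapped_codes(corpus: Mapping[str, Mapping[str, Any]]) -> dict[str, str]:
    """Forcing-function gate: extract every fixture and collect the failures."""
    failures: dict[str, str] = {}
    for name, doc in corpus.items():
        try:
            extract_floor_heat_loss(doc)
        except UnmappedApiCode as exc:
            failures[name] = str(exc)
    return failures


# Toy corpus for illustration: b.json carries a code the table does not map.
toy_corpus = {"a.json": {"floor_heat_loss": 2}, "b.json": {"floor_heat_loss": 9}}
print(unmapped_codes(toy_corpus))
# {'b.json': "unmapped API code 9 for field 'floor_heat_loss'"}
```

In a pytest gate, the last step would become an assertion that `unmapped_codes(...)` returns an empty dict over every JSON file in `fixtures/golden/`. A newly added fixture with an unseen code then fails at extraction instead of passing through silently, which is what a `dict.get(code, code)` fallback cannot detect.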
+
+## Test baseline at HEAD
+
+```bash
+PYTHONPATH=/workspaces/model python -m pytest \
+    backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
+    backend/documents_parser/tests/test_elmhurst_extractor.py \
+    backend/documents_parser/tests/test_elmhurst_end_to_end.py \
+    domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
+    domain/sap10_calculator/worksheet/tests/test_water_heating.py \
+    domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
+    domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
+    domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
+    domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
+    domain/sap10_ml/tests/test_rdsap_uvalues.py \
+    datatypes/epc/schema/tests/test_schema_loading.py \
+    domain/sap10_calculator/worksheet/tests/test_photovoltaic.py \
+    --no-cov -q
+```
+
+Expected: **769 pass + 0 fail**.
+
+## Conventions preserved
+
+- **1e-4 across the board** ([[feedback-one-e-minus-4-across-the-board]])
+- **Worksheet, not API, is the target** ([[feedback-worksheet-not-api-reference]])
+- **Verify worksheet PDF before accepting handover claims** ([[feedback-verify-handover-claims]])
+- **Spec-floor skepticism** ([[feedback-spec-floor-skepticism]])
+- **Golden residuals → ~0** ([[feedback-golden-residuals-near-zero]])
+- **AAA test convention** ([[feedback-aaa-test-convention]])
+- **`abs(diff) <= tol`** not `pytest.approx` ([[feedback-abs-diff-over-pytest-approx]])
+- **Spec citation in commit messages** ([[feedback-spec-citation-in-commits]])
+- **One slice = one commit; stage by name** ([[feedback-commit-per-slice]])
+- **Pyright net-zero per touched file** ([[feedback-zero-error-strict]])
+- **Cross-mapper parity via cascade** ([[feedback-cross-mapper-parity-via-cascade]])
+- **Bigger slices OK for uniform-cohort work** ([[feedback-bigger-slices-for-uniform-work]])
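The pin-at-current-residual plan and the `abs(diff) <= tol` convention combine into a small regression guard. The sketch below assumes residuals have already been computed by the calculator cascade. `PinnedResidual` is a hypothetical, trimmed-down stand-in for `_GoldenExpectation`, and `check_pinned` is a hypothetical helper. Only the two tolerance names and values, and the cert 2102 pin values, come from the next-slice plan.

```python
from dataclasses import dataclass

# Tolerances named in the handover's next-slice plan.
_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01
_CO2_ABS_TOLERANCE_TONNES = 0.001


@dataclass(frozen=True)
class PinnedResidual:
    """Hypothetical, trimmed-down stand-in for _GoldenExpectation."""

    cert_number: str
    expected_pe_resid_kwh_per_m2: float
    expected_co2_resid_tonnes_per_yr: float


def check_pinned(pin: PinnedResidual, pe_resid: float, co2_resid: float) -> list[str]:
    """Return drift messages; an empty list means the pin still holds."""
    problems: list[str] = []
    pe_diff = pe_resid - pin.expected_pe_resid_kwh_per_m2
    if not abs(pe_diff) <= _PE_ABS_TOLERANCE_KWH_PER_M2:
        problems.append(f"{pin.cert_number}: PE drifted by {pe_diff:+.4f} kWh/m2")
    co2_diff = co2_resid - pin.expected_co2_resid_tonnes_per_yr
    if not abs(co2_diff) <= _CO2_ABS_TOLERANCE_TONNES:
        problems.append(f"{pin.cert_number}: CO2 drifted by {co2_diff:+.4f} t/yr")
    return problems


def test_cert_2102_residuals_stay_pinned() -> None:
    # Arrange: pin values from the implementation outline; "current" values are illustrative.
    pin = PinnedResidual("2102-3018-0205-7886-5204", +20.3640, -0.7895)
    current_pe, current_co2 = 20.3640, -0.7895
    # Act
    problems = check_pinned(pin, current_pe, current_co2)
    # Assert
    assert problems == []
```

Writing the check as `not abs(diff) <= tol` also flags a NaN residual. `abs(diff) > tol` would let a NaN through, because every comparison with NaN is false.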