docs: handover after S0380.47..S0380.51 — golden coverage state

Captures the per-cert validation state at HEAD b7fbbcca:

- 5 slices shipped this session: cost cascade β-split (.47), schema
  gap closure for real-API battery_capacity (.48), Table 12e
  effective-monthly PE factor for PV (.49), §4 seasonal HW for PV β
  cascade (.50), UnmappedApiCode strict-raise pattern on API mapper
  (.51).
- 769 pass + 0 fail across the full baseline; pyright net-zero on
  every touched file.

Crucial finding for the next agent: cohort-2 (38 certs) is chain-
tested at 1e-4 SAP vs worksheet but NOT in test_golden_fixtures.py
— PE/CO2 cascades have NO regression guard. Probed at HEAD:
14/38 cohort-2 certs have non-trivial PE residuals invisible to any
current test, including cert 2102 at +20.4 PE / -0.79 CO2 (single
worst undetected residual in the cohort).

Agreed next slice: add all 38 cohort-2 certs to
test_golden_fixtures.py with current PE/CO2 pinned. Surfaces cert
2102 as the next closure target (worksheet exists under
`sap worksheets/`) and creates PE/CO2 regression guards across the
worksheet-backed cohort.

Open threads ranked by tractability:
- Cert 2102 +20.4 PE — worksheet exists, well-scoped
- PV (233a)+(233b) monthly mystery — documented memory entry; ~0.5
  kWh/m² across ASHP cohort
- _api_glazing_transmission strict-raise extension — mechanical
- 8 open-front golden certs (oil + RR) at high residuals — blocked
  on worksheets

Fuel-type diversity guidance: heating system breakdown across all
60+ fixtures shows 34 gas, 20 ASHP, 2 oil (both open-front no
worksheets), 0 solid fuel, 0 LPG, 0 electric direct. Closure on
oil + solid fuel + LPG + electric blocked on worksheet availability
— the gov.uk EPB downloads UI returns API JSON only; dr87 worksheets
come from the assessor's tool (typically Elmhurst SAP) export ZIP.

Handover doc at docs/HANDOVER_GOLDEN_COVERAGE.md.
Khalim Conn-Kowlessar 2026-05-28 20:55:17 +00:00 committed by Jun-te Kim
parent 730050d72a
commit 33293b0942

@ -0,0 +1,205 @@
# Handover — golden coverage + next slice
Branch `feature/per-cert-mapper-validation`. **HEAD: `b7fbbcca`** (Slice
S0380.51 strict-raise UnmappedApiCode on API integer enums).
**Test baseline: 769 pass + 0 fail.** Pyright net-zero on every
touched file.
## Recent session slices (S0380.47 → S0380.51)
| Slice | Commit | What |
|---|---|---|
| **S0380.47** | `42ed38f7` | β-split wired into cost cascade per Appendix M1 §6 — zero cohort impact because Table 32 collapses code 30 = code 60 = 13.19 p/kWh |
| **S0380.48** | `bf99b1c7` | Schema gap closure: real-API `pv_batteries[]` lodges `battery_capacity` flat-shape (`[{"battery_capacity": 5}]`), schema expected nested `{"pv_battery": {"battery_capacity": 5}}` → 5-kWh batteries silently dropped → β too low. Cohort PE +2.7..+8.1 → 3.5..4.5 |
| **S0380.49** | `e75198ce` | Effective-monthly Table 12e PE factors for the PV split per Appendix M1 §8. Cohort PE 3.5..4.5 → 2.8..3.7 |
| **S0380.50** | `3d1e6f10` | §4 seasonal monthly HW fuel for PV β cascade — replaced days-prorated hot-water demand with §4 (62)m seasonal output scaled to annual fuel. Cohort PE 2.8..3.7 → 2.7..3.5 |
| **S0380.51** | `b7fbbcca` | Strict-raise `UnmappedApiCode` on five API mapper helpers (`floor_construction`, `floor_heat_loss`, `roof_construction`, `party_wall_construction`, `built_form`). Surfaced two coverage gaps immediately (`floor_heat_loss` codes 2/3/6) and added explicit mappings. 6 new tests as the forcing function. |
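
The shape mismatch behind S0380.48 can be illustrated with a minimal sketch. The helper name `battery_capacities_kwh` is hypothetical and is not the repository's schema code; it only shows one way to accept both the flat shape the real API lodges and the nested shape the schema expected, while failing loudly instead of dropping a battery.

```python
from typing import Any


def battery_capacities_kwh(pv_batteries: list[dict[str, Any]]) -> list[float]:
    """Read battery_capacity from either the flat or the nested entry shape.

    Hypothetical helper for illustration only.
    """
    capacities: list[float] = []
    for entry in pv_batteries:
        # Nested shape: {"pv_battery": {"battery_capacity": 5}}
        # Flat shape:   {"battery_capacity": 5}
        inner = entry.get("pv_battery", entry)
        if "battery_capacity" not in inner:
            # Raise rather than silently dropping the battery.
            raise ValueError(f"pv_batteries entry has no battery_capacity: {entry!r}")
        capacities.append(float(inner["battery_capacity"]))
    return capacities


flat = [{"battery_capacity": 5}]
nested = [{"pv_battery": {"battery_capacity": 5}}]
print(battery_capacities_kwh(flat), battery_capacities_kwh(nested))
```

Both calls yield the same capacity list, so a 5 kWh battery is counted whichever shape is lodged.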
## Test-coverage matrix (current state)
| Test file | Certs | What's pinned |
|---|---:|---|
| `test_summary_pdf_mapper_chain.py` | 38 cohort-2 + 8 ASHP + per-cert chain tests | **SAP at 1e-4 vs worksheet** |
| `test_golden_fixtures.py` | 15 certs | **SAP int + PE + CO2 residuals** vs API-lodged |
| **`test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise`** | All JSON in `fixtures/golden/` | **No `UnmappedApiCode` raised** at extraction |
### Cohort overlap
- **Golden ∩ Cohort-2 = 0/38** — cohort-2 certs are NOT in golden fixtures
- **Golden ∩ ASHP = 7/8** — cert 9501 lives in chain tests only
- **Golden open-front** = 8 certs (oil + gas + RR) — **no worksheets**, API-only
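
The overlap figures above are plain set intersections over cert numbers. A minimal sketch follows, assuming the cert numbers are available as Python sets; the function name and the IDs in the example are placeholders, not real certs.

```python
def overlap(golden: set[str], cohort: set[str]) -> tuple[str, list[str]]:
    """Return the 'shared/total' ratio and the certs covered by chain tests only."""
    shared = golden & cohort
    chain_only = sorted(cohort - golden)
    return f"{len(shared)}/{len(cohort)}", chain_only


# Placeholder IDs for illustration only.
golden_ids = {"g-1", "g-2", "a-1"}
cohort_ids = {"a-1", "a-2"}
ratio, chain_only = overlap(golden_ids, cohort_ids)
print(ratio, chain_only)
```

Run against the real lists, the second element names the certs that have no PE/CO2 pin, which is the gap the next slice closes.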
### Cohort-2 SAP closure (chain tests)
All 38 at max |Δ| = 5e-5 vs worksheet — closed.
### Cohort-2 PE / CO2 (probed but NOT pinned anywhere)
- 24/38 closed (|PE| < 1, |CO2| < 0.05)
- 14/38 open. **Top offender: cert 2102 at +20.4 PE, -0.79 CO2**, undetected by any current test
- Other 13 cluster around 3 PE (same PV (233a/b) mystery pattern as the ASHP golden certs)
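
The closed/open split above uses the thresholds |PE| < 1 and |CO2| < 0.05. As a minimal sketch (the constant and function names are hypothetical, and the second example's values are made up for illustration):

```python
# Closure thresholds quoted in this section.
PE_CLOSED_ABS_KWH_PER_M2 = 1.0
CO2_CLOSED_ABS_TONNES = 0.05


def is_closed(pe_resid: float, co2_resid: float) -> bool:
    """A cert counts as closed only if both residuals are inside their thresholds."""
    return (
        abs(pe_resid) < PE_CLOSED_ABS_KWH_PER_M2
        and abs(co2_resid) < CO2_CLOSED_ABS_TONNES
    )


print(is_closed(20.3640, -0.7895))  # cert 2102 residuals from this document
print(is_closed(0.3, 0.01))         # illustrative values only
```

Cert 2102 fails both thresholds, which is why it falls in the open group.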
## ★ Next slice — add cohort-2 to `test_golden_fixtures.py`
**This is the agreed-upon next slice** (one-slice change, high-value):
1. Run cohort-2 against `cert_to_demand_inputs` and capture current PE/CO2 residuals
2. Add `_GoldenExpectation` entries to `test_golden_fixtures.py` for all 38 certs
3. The pin tolerance stays at the existing `_PE_ABS_TOLERANCE_KWH_PER_M2 = 0.01` / `_CO2_ABS_TOLERANCE_TONNES = 0.001`
4. The 14 "open" certs get pinned at their CURRENT non-zero residuals (regression-guard, not closure)
5. Cert 2102 (+20.4 PE / -0.79 CO2) becomes immediately visible as the next closure target with worksheet support
**Why this is high-leverage:** cohort-2 chain tests only pin SAP at 1e-4 (which catches cost-cascade drift but not PE/CO2 cascade drift). Cert 2102's +20.4 PE is invisible to any current test. Adding cohort-2 to golden creates regression guards across all three SAP/PE/CO2 cascades for 38 worksheet-backed certs.
### Concrete implementation outline
```python
# In test_golden_fixtures.py, add one entry per cohort-2 cert:
_GoldenExpectation(
    cert_number="2102-3018-0205-7886-5204",
    actual_sap=64,  # from doc['energy_rating_current']
    expected_sap_resid=+0,  # cohort-2 closure at 1e-4 → rounds to 0
    expected_pe_resid_kwh_per_m2=+20.3640,  # current residual, pin here
    expected_co2_resid_tonnes_per_yr=-0.7895,
    notes=(
        "Cohort-2 cert. SAP closed at 1e-4 via chain test. PE +20.4 / "
        "CO2 -0.79 residuals are the open closure target; worksheet "
        "exists (Summary + dr87) under `sap worksheets/`. Likely a "
        "specific cascade gap to probe with the worksheet."
    ),
),
```
Use the probe in this session's last diagnostic to capture exact residuals:
```bash
PYTHONPATH=/workspaces/model python -c "
import json, pathlib
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import cert_to_inputs, cert_to_demand_inputs, SAP_10_2_SPEC_PRICES
from domain.sap10_calculator.calculator import calculate_sap_from_inputs
# COHORT_2_LIST must be defined first: the 38 cohort-2 cert numbers.
for cert in COHORT_2_LIST:
    doc = json.loads(pathlib.Path(f'.../{cert}.json').read_text())
    epc = EpcPropertyDataMapper.from_api_response(doc)
    # Rating cascade (SAP) and demand cascade (PE/CO2) are computed separately.
    rating = calculate_sap_from_inputs(cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    demand = calculate_sap_from_inputs(cert_to_demand_inputs(epc, prices=SAP_10_2_SPEC_PRICES))
    # ... print _GoldenExpectation tuple in the right format
"
```
The 38 fixture entries should land in one PR. After landing, cert 2102 becomes the obvious next closure target.
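
For the "print in the right format" step of the probe, a minimal formatting sketch is shown below. `render_expectation` is a hypothetical helper; the field names are copied from the outline entry above, and the `notes` text is left as a placeholder to be written per cert.

```python
def render_expectation(
    cert_number: str,
    actual_sap: int,
    sap_resid: int,
    pe_resid: float,
    co2_resid: float,
) -> str:
    """Format one entry as source text ready to paste into test_golden_fixtures.py."""
    return (
        "_GoldenExpectation(\n"
        f'    cert_number="{cert_number}",\n'
        f"    actual_sap={actual_sap},\n"
        f"    expected_sap_resid={sap_resid:+d},\n"
        f"    expected_pe_resid_kwh_per_m2={pe_resid:+.4f},\n"
        f"    expected_co2_resid_tonnes_per_yr={co2_resid:+.4f},\n"
        '    notes="TODO",\n'
        "),"
    )


print(render_expectation("2102-3018-0205-7886-5204", 64, 0, 20.364, -0.7895))
```

The `+` format flag keeps an explicit sign on every residual, so a positive and a negative pin cannot be confused when the entries are reviewed.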
## Open threads (after the cohort-2 add)
### Tractable with worksheets we already have
1. **Cert 2102 +20.4 PE / -0.79 CO2**: cohort-2 cert; its worksheet is in the cohort-2 batch under `sap worksheets/Additional data with api/`. Surfaced by the cohort-2 → golden migration. Best next closure target.
2. **PV (233a)+(233b) monthly mystery**: documented at [`project_pv_233_split_mystery.md`](~/.claude/projects/-workspaces-model/memory/project_pv_233_split_mystery.md). Cascade β = 0.7511 vs worksheet 0.7392 for cert 0380. Closes ~0.5 kWh/m² across the ASHP cohort. The 13 other open cohort-2 certs, whose PE residuals cluster around 3 kWh/m², likely share this root cause.
3. **`_api_glazing_transmission` strict-raise extension** — the helper's existing comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them." Same pattern as S0380.51. Mechanical; low risk; coverage-hardening.
### Open without worksheets (low payoff)
Golden fixtures with large residuals but no worksheets to triangulate:
| Cert | PE Δ | Heating | Notes |
|---|---:|---|---|
| **6035** | +46.76 | Gas combi age A (mid-terrace) | RR with "limited insulation (assumed)" → cascade roof = 130.72 W/K, possibly wrong cascade routing — needs worksheet |
| **0390-2954** | 26.01 | **Oil combi** Firebird PCDF 9005 | Oil tariff cascade + fabric heat loss — needs worksheet |
| **0240** | +12.49 | **Oil boiler** + PV + RR (detached) | Subsystem heat-loss diff in notes (roof 76.93 W/K) — needs worksheet |
| **0300** | +8.28 | Gas combi large semi TFA 526 | Shower outlet schema work was recent — needs worksheet |
| **2130** | 8.22 (chain) / 8.22 (golden) | Gas combi + PV | "gas combi PE under-count + secondary heating credit" — needs worksheet |
| **7536** | 7.08 | Gas combi multi-age (D/L/F) | "multi-age geometry probably surfaces per-bp U the spec table doesn't capture" — needs worksheet |
| **0535** | (in golden) | — | open-front — needs worksheet |
| **8135** | 0.07 | Gas | already closed — keep as regression guard |
**The user's observation that oil is under-represented is correct**: the golden set has 2 oil-boiler certs, both at high residuals and both without worksheets. Solid fuel, LPG, and electric direct-acting heating are completely absent.
## Heating-system distribution across golden fixtures
| Heating | Count | Worksheets | Status |
|---|---:|---|---|
| Boiler + radiators, mains gas | 34 | Most (cohort-2 + 9501) | Mostly closed at 1e-4 SAP |
| Air source heat pump | 20 | All 8 ASHP cohort have worksheets | β-split phase complete; ~3 PE structural residual open |
| Boiler + radiators, oil | 2 | None | Both at high residuals; **closure blocked on worksheets** |
| Community scheme | 1 | None | Retired |
| Solid fuel | 0 | — | Completely absent |
| LPG | 0 | — | Completely absent |
| Electric direct / storage heater | 0 | — | Completely absent |
## How to grow fixture diversity (answer to "what to download")
The gov.uk EPB downloads UI provides API JSON only. That is enough for SAP-closure verification if the cert's lodged SAP value can be trusted (it is the assessor's calculator output), but it is not enough to debug cascade gaps:
- The **dr87-0001-NNNNNN.pdf** worksheet — needed to debug structural cascade gaps line-by-line — is generated by the assessor's calculator (typically Elmhurst SAP tool) and bundled in their export ZIP. Not available via the gov.uk UI.
- The cohort-2 + ASHP worksheets in `sap worksheets/Additional data with api/` came from an Elmhurst data dump.
**Recommended fixture targets** to unlock open work:
1. **Oil worksheets** for certs 0240 and 0390-2954 in our golden set. These would unlock closure of ~38.5 kWh/m² of PE residual (12.49 + 26.01).
2. **A solid-fuel cert with worksheet** — anthracite / wood pellets / biomass. Currently zero coverage. The fuel-cost cascade through Table 32 + heat-emitter cascade has paths we've never exercised.
3. **An LPG cert with worksheet** — Table 32 code different from gas/oil; the cost cascade has an LPG-specific branch that has never run in tests.
4. **An electric direct-acting cert with worksheet** — storage heater (codes 401-409) or panel heater (codes 191-196). The off-peak tariff path (`_RDSAP_DEFINITELY_OFF_PEAK = {1, 4, 5}` in `cert_to_inputs.py`) currently raises rather than computes — first off-peak cert with worksheet would force that path.
5. **A community/district heating cert with worksheet** — currently the retired 9390 is the only such cert and it has no worksheet.
When grabbing certs from the data dump, filter by `main_heating[0].description` to ensure fuel-type coverage:
- `Boiler and radiators, oil` (target: 5-10 worksheets)
- `Boiler and radiators, anthracite` / `wood pellets` / `wood logs`
- `Boiler and radiators, LPG`
- `Electric storage heaters` / `Direct-acting electric heaters`
- `Community scheme`
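
The filter above can be sketched as below, assuming each cert document is a parsed JSON dict with a `main_heating` list whose first entry carries `description`. The helper names are hypothetical. The exact description strings are an assumption built from the list above (the doc abbreviates the wood-fuel variants); real lodged text may differ in wording or case. Matching is on the whole normalized description because a bare substring test for "oil" would also match "Boiler".

```python
from typing import Any

# Assumed full description strings, lowercased for comparison.
TARGET_DESCRIPTIONS = {
    "boiler and radiators, oil",
    "boiler and radiators, anthracite",
    "boiler and radiators, wood pellets",
    "boiler and radiators, wood logs",
    "boiler and radiators, lpg",
    "electric storage heaters",
    "direct-acting electric heaters",
    "community scheme",
}


def heating_description(doc: dict[str, Any]) -> str:
    """Return main_heating[0].description, or '' when it is missing."""
    systems = doc.get("main_heating") or []
    return str(systems[0].get("description", "")) if systems else ""


def is_target(doc: dict[str, Any]) -> bool:
    """True when the cert's main heating is one of the under-covered types."""
    return heating_description(doc).strip().lower() in TARGET_DESCRIPTIONS


docs = [
    {"main_heating": [{"description": "Boiler and radiators, mains gas"}]},
    {"main_heating": [{"description": "Boiler and radiators, oil"}]},
    {"main_heating": []},
]
print([is_target(d) for d in docs])
```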
## Strict-raise pattern (S0380.51) — extension queue
The `UnmappedApiCode` strict-raise pattern is established in
`datatypes/epc/domain/mapper.py`. Currently five helpers raise:
- `_api_party_wall_construction_int`
- `_api_floor_construction_str`
- `_api_floor_type_str`
- `_api_roof_construction_str`
- `_api_sheltered_sides`
**Pending extensions (mechanical; each its own slice):**
- `_api_glazing_transmission` — comment says "Codes 4-12, 15+ not yet mapped — incremental coverage as new fixtures surface them"
- `_api_cascade_glazing_type` — uses pass-through fallback `dict.get(code, code)` which is intentional but worth auditing to surface deliberate decisions
The forcing function `test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise` will catch any unmapped enum across the whole golden corpus at extraction time. Each new fixture added increases the gate's coverage automatically.
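
The difference between strict-raise and the pass-through fallback can be shown with a minimal sketch. The `UnmappedApiCode` class here is a stand-in defined for the example, not the one in `datatypes/epc/domain/mapper.py`, and the mapping contents and function names are invented.

```python
class UnmappedApiCode(ValueError):
    """Stand-in for the repository's exception, defined here for illustration."""


# Invented mapping; the real tables live in the mapper module.
_EXAMPLE_BY_CODE = {1: "example-a", 2: "example-b"}


def strict_lookup(field: str, code: int) -> str:
    """Strict-raise: an unknown code is an error, never a silent default."""
    try:
        return _EXAMPLE_BY_CODE[code]
    except KeyError:
        raise UnmappedApiCode(f"{field}: unmapped API code {code!r}") from None


def passthrough_lookup(code: int) -> object:
    """Pass-through: an unknown code is returned unchanged, so nothing fails."""
    return _EXAMPLE_BY_CODE.get(code, code)


print(strict_lookup("example_field", 1))
print(passthrough_lookup(99))
```

With the strict form, a new fixture carrying an unknown code fails at extraction time, which is what lets the forcing-function test act as a coverage gate. The pass-through form needs a manual audit to tell deliberate fallbacks from gaps.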
## Test baseline at HEAD
```bash
PYTHONPATH=/workspaces/model python -m pytest \
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
  backend/documents_parser/tests/test_elmhurst_extractor.py \
  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
  domain/sap10_calculator/worksheet/tests/test_water_heating.py \
  domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
  domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
  domain/sap10_ml/tests/test_rdsap_uvalues.py \
  datatypes/epc/schema/tests/test_schema_loading.py \
  domain/sap10_calculator/worksheet/tests/test_photovoltaic.py \
  --no-cov -q
```
Expected: **769 pass + 0 fail**.
## Conventions preserved
- **1e-4 across the board** ([[feedback-one-e-minus-4-across-the-board]])
- **Worksheet, not API, is the target** ([[feedback-worksheet-not-api-reference]])
- **Verify worksheet PDF before accepting handover claims** ([[feedback-verify-handover-claims]])
- **Spec-floor skepticism** ([[feedback-spec-floor-skepticism]])
- **Golden residuals → ~0** ([[feedback-golden-residuals-near-zero]])
- **AAA test convention** ([[feedback-aaa-test-convention]])
- **`abs(diff) <= tol`** not `pytest.approx` ([[feedback-abs-diff-over-pytest-approx]])
- **Spec citation in commit messages** ([[feedback-spec-citation-in-commits]])
- **One slice = one commit; stage by name** ([[feedback-commit-per-slice]])
- **Pyright net-zero per touched file** ([[feedback-zero-error-strict]])
- **Cross-mapper parity via cascade** ([[feedback-cross-mapper-parity-via-cascade]])
- **Bigger slices OK for uniform-cohort work** ([[feedback-bigger-slices-for-uniform-work]])