diff --git a/domain/sap10_calculator/docs/HANDOVER_GOLDEN_RESIDUALS.md b/domain/sap10_calculator/docs/HANDOVER_GOLDEN_RESIDUALS.md
new file mode 100644
index 00000000..b1e3ac17
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_GOLDEN_RESIDUALS.md
@@ -0,0 +1,304 @@
+# Handover: Cohort-2 API path 38/38 closed; golden-residuals front next
+
+Branch `feature/per-cert-mapper-validation`. This session shipped
+**5 slices** (S0380.39 → S0380.43) that closed the **entire cohort-2
+API-path cluster**. The branch is now at **750 pass + 0 fail**: the
+3-cert +0.42..+0.44 cluster (0300/9380/1536) closed via two spec
+citations + the Decimal HALF_UP pattern, and cert 2102's -6.30
+residual closed via the SAP 4a heating-type → spec fuel dispatch.
+
+**HEAD at handover start:** `6dccb15b` (Slice S0380.43).
+
+## User's stated goal carried forward (from prior handover)
+
+> Tackle Thread 4: API-path closure for cohort-2. … Tolerance: 1e-4
+> vs each cert's worksheet SAP value. … Bigger slices are appropriate
+> here. … Drive golden-fixture residuals to ~0.
+
+Thread 4 (cohort-2 API path closure) is **DONE**. The next thread,
+**golden-fixture residuals → ~0**, is now the open front.
+
+## Slices shipped this session (handover-doc → HEAD)
+
+| Slice | Commit | Closes | Spec citation |
+|---|---|---|---|
+| **S0380.39** | `22ae6f4d` | Bulk-fetched 38 cohort-2 API JSONs via `scripts/fetch_cohort2_api_jsons.py` | (infra) |
+| **S0380.40** | `ff25746f` | Parametrized API-path chain test mirroring Summary sweep; 34/38 immediate | (test infra) |
+| **S0380.41** | `a96e6765` | Closed 0300/9380 (+0.43/+0.42 → <1e-4); 1536 partial close | RdSAP-Schema-21.0.0 glazed_type=1 = "DG installed before 2002 EAW" → SAP 10.2 Table 6b cascade code 2 (DG pre-2002, g_L=0.80, NOT single 0.90). RdSAP 10 Table 24 row 2 (PVC/wooden, 16+) → U=2.7 |
+| **S0380.42** | `e1b7b30c` | Cert 1536 +0.0015 → -1e-6 | RdSAP 10 §15 p.66: Decimal HALF_UP per-window area at the 0.005 boundary (0.65 × 0.70 = 0.4550 exact / 0.45499... float drops to 0.45) |
+| **S0380.43** | `6dccb15b` | Cert 2102 -6.30 → +5e-5 | SAP 10.2 Appendix M Table 4a code 631 ("Open fire in grate") + BS EN 13229:2001 inset-appliance class (solid fuel); Elmhurst Summary maps to Table 32 code 11 (House coal) |
+
+All on branch `feature/per-cert-mapper-validation`. Each includes a
+spec citation in the commit message, unit-level diff probes, AAA test
+convention, pyright net-zero per touched file.
+
+## Cohort distributions at HEAD `6dccb15b`
+
+### Cohort-2 (38-cert dataset, API path)
+
+| Bucket (\|Δ\|) | Session start | Now | Δ |
+|---|---|---|---|
+| exact (<1e-4) | 34 | **38** | **+4** |
+| 1e-4..0.07 | 0 | **0** | = |
+| 0.07..0.5 | 3 | **0** | -3 |
+| 0.5..1 | 0 | **0** | = |
+| 1..5 | 0 | **0** | = |
+| >5 | 1 | **0** | -1 |
+| RAISES | 0 | **0** | = |
+
+### Cohort-2 Summary path (unchanged)
+
+38/38 < 1e-4, closed in prior session's S0380.31..38.
+
+### Cohort-1 ASHP (9 certs, both paths)
+
+9/9 < 1e-4 on both paths. Worst residual: cert 2225 −4.8e-5 (binding
+constraint on `_ASHP_COHORT_CHAIN_TOLERANCE` tightening; see below).
+
+## Cross-mapper parity at the cascade: established
+
+[[feedback-cross-mapper-parity-via-cascade]] now holds for all 38
+cohort-2 certs: API and Summary paths both produce SAP within 1e-4
+of each other AND of the worksheet, at the cascade output. The
+underlying EpcPropertyData may differ structurally between mappers
+(noise on cosmetic fields, schema-version int/str encoding), but
+the cascade output is the load-bearing equivalence check, and the
+two paths fully agree there.
+
+## Tolerance tightening: deferred
+
+The prior handover proposed tightening `_ASHP_COHORT_CHAIN_TOLERANCE`
+from 1e-4 to ~1e-5. **Not viable at HEAD.** The cohort-wide worst
+residuals are:
+
+  - Cohort-1 ASHP API path: cert 2225 -4.8e-5
+  - Cohort-2 Summary path: cert 2102 -4.9e-5 (matches API in magnitude)
+  - Cohort-2 API path: cert 2102 +4.9e-5
+
+So 1e-5 has no headroom. Realistic next floor is ~5e-5 (binding on
+cert 2225's -4.8e-5). Tightening to 5e-5 gives ~4% headroom, too
+thin to be robust to unrelated cascade drift. Tightening to ~6e-5
+gives ~25% headroom but is an awkward number.
+
+**Decision:** leave `_ASHP_COHORT_CHAIN_TOLERANCE = 1e-4` and the
+cohort-2 strict tests at inline `1e-4`. Tightening below 1e-4 requires
+closing cert 2225 specifically (per-cert investigation).
+
+## ★ Open front: golden-residuals → ~0
+
+[`test_golden_cert_residual_matches_pin`](../rdsap/tests/test_golden_fixtures.py)
+pins **PE Δ and CO2 Δ** vs the gov.uk-lodged values (NOT the worksheet;
+this is a different reference point from the chain tests). Pins
+currently sit at:
+
+| Cert | actual_sap | sap_resid | pe_resid (kWh/m²) | co2_resid (t/yr) | Notes |
+|---|---:|---:|---:|---:|---|
+| 0240 | 73 | -14 | +12.49 | +0.70 | RR extraction, multi-subsystem gaps |
+| 0300 | 78 | 0 | +8.28 | -0.25 | DSP showers + flue (closed at HEAD) |
+| 0390 | 60 | -7 | -26.01 | -2.52 | Firebird oil combi PCDF 9005 |
+| 0535 | ... | ... | ... | ... | cert 001479 fixture |
+| 2130 | ... | ... | -38.63 | +0.30 | Largest pre-existing residual |
+| 6035 | ... | ... | +46.76 | +1.07 | Largest pre-existing residual |
+| **ASHP cohort (the highest-value cluster)** | | | | | |
+| 0350 | 88 | 0 | -7.78 | +0.17 | Mitsubishi PUZ-WM50VHA |
+| 0380 | 88 | 0 | -14.60 | +0.28 | Mitsubishi PUZ-WM50VHA |
+| 2225 | 89 | 0 | -11.77 | +0.26 | Mitsubishi PUZ-WM50VHA |
+| 2636 | 86 | 0 | -9.65 | +0.22 | Mitsubishi PUZ-WM50VHA |
+| 3800 | 86 | 0 | -9.61 | +0.26 | Mitsubishi PUZ-WM50VHA |
+| 9285 | 84 | 0 | -7.96 | +0.16 | Mitsubishi PUZ-WM50VHA |
+| 9418 | 84 | 0 | -7.30 | +0.16 | Daikin EDLQ05CAV3 |
+
+The ASHP cluster shape:
+- All 7 certs hit `sap_resid=0` (chain-test work closed this).
+- PE residual: -7..-15 kWh/m² UNDER-count (cascade < lodged).
+- CO2 residual: +0.16..+0.28 t/yr OVER-count (cascade > lodged).
+- Similar magnitudes across 7 certs (6 on the same PCDB heat pump)
+  strongly suggest a single shared cascade gap in the PE/CO2 factor
+  cascade for ASHP electricity.
+
+### Diagnostic probe for cert 0380 at HEAD
+
+```
+Cert 0380 (60.43 m² TFA):
+  Lodged PE: 56 kWh/m²   CO2: 0.3 t/yr
+  Calc demand: PE=41.40 kWh/m²   CO2=0.578 t/yr
+  PE residual: -14.60   CO2 residual: +0.28
+  Main fuel: 29 (Electricity, mains)
+  Main heating category: 4 (Heat pump)
+  Secondary fuel: 29 (Electricity)
+  Secondary heating: 691 (Portable electric heater default)
+```
+
+### Hypotheses
+
+The user's prior diagnosis (from earlier handover):
+
+> This smells like a single cascade gap in either the SAP 10.2
+> Appendix L1 primary-energy lookup for electricity (likely a missing
+> distribution-loss factor or wrong tariff routing) or in the §12
+> Table 12d monthly electricity factor cascade for heat pumps.
+
+Additional shape evidence:
+- PE under-count + CO2 over-count for the same fuel is structurally
+  unusual. If both were PE-factor-driven, they'd move in the same
+  direction. The split direction suggests the lodged values are using
+  **different factors** than the cascade (possibly an older SAP factor
+  vs current SAP 10.2).
+- 14.6 kWh/m² × 60.43 m² = **882 kWh/yr** PE shortfall on cert 0380.
+- 0.28 t/yr × 1000 = **280 kg/yr** CO2 over-count.
+
+### Slice plan for the ASHP PE cluster
+
+**Probe 1: Inspect the SAP 10.2 Table 12 PE factor lookup.** Find
+where the cascade resolves PE-factor-for-electricity (likely in
+`internal_gains.py` or `cert_to_inputs.py`
+`_effective_monthly_pe_factor` or similar). Verify the factor used
+matches the lodged EPC's expected value (1.501 standard / 1.500 SAP 2012 / etc).
+
+**Probe 2: Diff cert 0380 calc vs PCDB-listed heat-pump efficiency.**
+The heat pump (Mitsubishi PUZ-WM50VHA PCDB 104568) has a documented
+SPF (seasonal performance factor). Check whether the cascade applies
+the correct SPF and whether the lodged-vs-cascade electricity-consumption
+delta accounts for the PE shortfall.
+
+**Probe 3: Worksheet PE check.** The cert 0380 worksheet PDF (likely
+`dr87-0001-000899.pdf` in the cohort-2 dir) reports the worksheet's
+PE value at the bottom. Compare cascade PE to worksheet PE: if they
+agree, the lodgement is wrong (gov.uk computed differently); if they
+disagree, the cascade has a real gap.
+
+### Pre-existing large residuals (lower priority)
+
+- Cert 6035 PE +46.76: handover claim of multi-subsystem gaps; not
+  the same cluster cause as ASHP.
+- Cert 2130 PE -38.63: also pre-existing; likely RR + PV + electricity.
+
+These should be closed AFTER the ASHP cluster (which likely has a
+single clean root cause).
+
+## Conventions preserved (carry forward)
+
+- **1e-4 across the board** ([[feedback-one-e-minus-4-across-the-board]])
+- **Worksheet, not API, is the target** for chain tests
+  ([[feedback-worksheet-not-api-reference]]), except for the golden
+  fixtures, which pin against gov.uk-lodged PE/CO2.
+- **Cross-mapper parity via cascade equivalence**
+  ([[feedback-cross-mapper-parity-via-cascade]]). Now fully established
+  for cohort-2.
+- **Spec-floor skepticism** ([[feedback-spec-floor-skepticism]]).
+- **Bigger slices OK for uniform-cohort work**
+  ([[feedback-bigger-slices-for-uniform-work]]).
+- **Golden residuals → ~0**
+  ([[feedback-golden-residuals-near-zero]]). The 0.01 PE / 0.001 CO2
+  absolute tolerances stay; what changes is the **expected residual
+  itself** (pinning at the actual delta vs zero).
+- **AAA test convention** with literal `# Arrange / # Act / # Assert`
+  ([[feedback-aaa-test-convention]]).
+- **`abs(diff) <= tol`** not `pytest.approx`
+  ([[feedback-abs-diff-over-pytest-approx]]).
+- **Spec citation in commit messages**
+  ([[feedback-spec-citation-in-commits]]).
+- **One slice = one commit; stage by name**
+  ([[feedback-commit-per-slice]]).
+- **Strict-enum raises** on unmapped labels / unresolved dispatch.
+- **Pyright net-zero per touched file**.
+
+## Lesson learned: GOV.UK RdSAP 21 enum ≠ cascade enum
+
+The cascade's `_G_LIGHT_BY_GLAZING_CODE` table in
+`internal_gains.py` is keyed on the SAP 10.2 Table 6b enum that the
+**Elmhurst extractor** produces (`_ELMHURST_GLAZING_LABEL_TO_SAP10`).
+By default the API mapper passes the raw GOV.UK RdSAP 21 enum
+straight through. For codes 2/3/13/14 this coincidentally works
+(both enums agree on g_L for those codes); for code 1 it doesn't
+(GOV.UK 1 = DG pre-2002, SAP 10.2 1 = single).
+
+Slice S0380.41 added `_API_TO_SAP10_CASCADE_GLAZING_CODE` to remap
+RdSAP 21 codes to SAP 10.2 codes for the SapWindow.glazing_type
+field that drives daylight g_L. Currently only code 1 remaps; other
+codes pass through. **Future cert lodgements may surface analogous
+divergences** (e.g. RdSAP 21 code 5 = single, but cascade code 5
+gets 0.80, a similar mismatch waiting to happen). Add remap entries
+as those codes appear in fixtures.
+
+## Lesson learned: Decimal HALF_UP extends to per-window areas
+
+S0380.34/35 closed the Σ-then-round Decimal pattern (gross wall,
+party wall, kWp, living area). S0380.42 closed the round-per-then-Σ
+pattern for per-window areas: `_decimal_round_half_up_product` was
+added at three cascade sites (heat_transmission's windows_w_per_k +
+per-bp window-area accumulation; internal_gains' daylight g_L;
+solar_gains' window solar). Any future +0.0007-scale residual in
+per-window areas, or analogous Decimal boundary cases for OTHER
+elements (doors, alt-walls, RR sub-areas), is the same class of
+bug, fixed the same way.
+
+## Lesson learned: SAP heating-type → spec fuel dispatch
+
+S0380.43 added `_API_SECONDARY_HEATING_SPEC_FUEL` for SAP 631
+("Open fire in grate"). The pattern is incremental: a per-code
+dispatch dict that overrides the lodged fuel ONLY when (a) the
+heating type implies a specific fuel category, AND (b) the lodged
+fuel is incompatible (electric for a solid-fuel heater). Future
+cohort certs surfacing other inconsistencies (e.g. SAP 632 "Open
+fire" with electric fuel) can extend the dispatch without
+touching the routing logic.
+
+## Test baseline at HEAD
+
+```bash
+PYTHONPATH=/workspaces/model python -m pytest \
+  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
+  backend/documents_parser/tests/test_elmhurst_extractor.py \
+  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
+  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
+  domain/sap10_calculator/worksheet/tests/test_water_heating.py \
+  domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
+  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
+  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
+  domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
+  domain/sap10_ml/tests/test_rdsap_uvalues.py \
+  datatypes/epc/schema/tests/test_schema_loading.py \
+  --no-cov -q
+```
+
+Expected: **750 pass + 0 fail**.
+
+## First concrete actions for the next agent
+
+1. **Re-run the diagnostic probe** to confirm baseline reproduces
+   (38/38 cohort-2 both paths < 1e-4; 9/9 ASHP cohort-1 < 1e-4;
+   750 pass + 0 fail).
+
+2. **Probe 1 (PE factor lookup).** Find the cascade's PE-factor
+   resolution for electricity heat pumps. The most likely entry
+   points: search `cert_to_inputs.py` for `primary_energy`,
+   `pe_factor`, `effective_monthly_pe_factor`. Compare the resolved
+   factor against SAP 10.2 Table 12 "Standard electricity"
+   (PE = 1.501) and ASHP-specific entries.
+
+3. **Probe 3 (worksheet vs cascade PE).** Extract the PE value from
+   cert 0380's worksheet PDF (`dr87-0001-000899.pdf` under
+   `sap worksheets/additional with api 2/0380-2530-6150-2326-4161/`).
+   Compare against cascade output 41.40 kWh/m² and lodged 56 kWh/m².
+   This isolates "cascade vs spec" from "lodgement vs spec".
+
+4. **Probe 4 (CO2 factor).** Run a similar probe for the CO2 factor
+   cascade. The cluster's +0.16..+0.28 t/yr over-count has the same
+   uniform shape as the PE under-count, suggesting both come from the
+   same factor lookup.
+
+5. **If the cluster has a single root cause** (likely per the
+   uniform shape), close it in ONE slice. Re-pin all 7 ASHP fixture
+   `expected_pe_resid_kwh_per_m2` and `expected_co2_resid_tonnes_per_yr`
+   values to the new residuals (which should drop to ~0.01).
+
+6. **Then move to the pre-existing residual cluster** (certs 6035,
+   2130, 0240). These have multi-subsystem gaps that need per-cert
+   investigation. Less uniform than the ASHP cluster.
+
+Good luck. The cohort-2 API closure is COMPLETE; the chain-test
+infrastructure is robust and battle-tested across 38 + 9 certs
+spanning gas/oil/heat-pump main heating, all RdSAP 21 schema
+variants, and multiple lodgement-source quirks. The golden-residuals
+front is the next high-value workstream, and the ASHP cluster is
+the cleanest single thread.
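
Reviewer note on the Decimal HALF_UP lesson (S0380.42): the boundary case cited in the slice table (0.65 × 0.70 at the 0.005 boundary) can be illustrated with a minimal sketch. The function name `round_half_up_product` and its string-based signature are assumptions for illustration only; they are not the signature of the repo's `_decimal_round_half_up_product`, which is not shown in this diff.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up_product(width: str, height: str, places: str = "0.01") -> Decimal:
    """Multiply two decimal strings exactly, then round half-up.

    Hypothetical helper: inputs are strings so that no binary-float
    representation error enters before the rounding step.
    """
    exact = Decimal(width) * Decimal(height)  # 0.65 * 0.70 -> Decimal("0.4550")
    return exact.quantize(Decimal(places), rounding=ROUND_HALF_UP)


# Per-window area for the boundary case named in the slice table.
area = round_half_up_product("0.65", "0.70")
print(area)  # 0.46

# The float path is shown for comparison only; per the handover, the
# product lands just below 0.455 and rounds down.
print(round(0.65 * 0.70, 2))
```

The same pattern would presumably apply to the other elements the handover lists (doors, alt-walls, RR sub-areas): keep the dimensions as `Decimal` until after the per-element rounding, then sum.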