Model/domain/sap10_calculator/docs/HANDOVER_GOLDEN_RESIDUALS.md
Khalim Conn-Kowlessar 29c4b029e3 docs: handover after S0380.39..S0380.43 — cohort-2 API path 38/38 closed
Session shipped 5 slices that closed the entire cohort-2 API-path
cluster (S0380.39 bulk-fetch, S0380.40 parametrized test, S0380.41
RdSAP 21 → SAP 10.2 glazing alias, S0380.42 Decimal HALF_UP per-window
areas, S0380.43 SAP 631 → spec fuel).

Documents:
  - Cross-mapper parity at cascade established for all 38 cohort-2
    certs (and 9 cohort-1 ASHP); both paths < 1e-4 vs worksheet.
  - Tolerance tightening deferred — 1e-4 is the realistic floor at
    HEAD (worst residual 4.91e-5 on cert 2102).
  - Lessons learned: GOV.UK RdSAP 21 enum != cascade enum (codes
    needing remap are incremental as fixtures surface them);
    Decimal HALF_UP per-window areas extends the S0380.34/35
    pattern; SAP heating-type → spec fuel dispatch is the new
    forcing-function pattern for cert-lodgement inconsistencies.
  - Open front: golden-residuals → ~0 on PE/CO2. ASHP cluster
    (-7..-15 kWh/m² PE / +0.16..+0.28 t/yr CO2 across 7 certs with
    the same PCDB heat pump) is the highest-value single thread —
    likely SAP 10.2 Appendix L1 / Table 12 PE-factor or CO2-factor
    cascade gap. Three concrete diagnostic probes proposed.

Test baseline at HEAD: 750 pass + 0 fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 17:22:55 +00:00


Handover — Cohort-2 API path 38/38 closed; golden-residuals front next

Branch feature/per-cert-mapper-validation. This session shipped 5 slices (S0380.39 → S0380.43) that closed the entire cohort-2 API-path cluster. The branch is now at 750 pass + 0 fail: the 3-cert +0.42..+0.44 cluster (0300/9380/1536) closed via two spec citations plus the Decimal HALF_UP pattern, and cert 2102's -6.30 residual closed via the SAP Table 4a heating-type → spec fuel dispatch.

HEAD at handover start: 6dccb15b (Slice S0380.43).

User's stated goal carried forward (from prior handover)

Tackle Thread 4 — API-path closure for cohort-2. … Tolerance: 1e-4 vs each cert's worksheet SAP value. … Bigger slices are appropriate here. … Drive golden-fixture residuals to ~0.

Thread 4 (cohort-2 API path closure) is DONE. The next thread, golden-fixture residuals → ~0, is now the open front.

Slices shipped this session (handover-doc → HEAD)

| Slice | Commit | Closes | Spec citation |
|---|---|---|---|
| S0380.39 | 22ae6f4d | Bulk-fetched 38 cohort-2 API JSONs via scripts/fetch_cohort2_api_jsons.py | (infra) |
| S0380.40 | ff25746f | Parametrized API-path chain test mirroring Summary sweep; 34/38 immediate | (test infra) |
| S0380.41 | a96e6765 | Closed 0300/9380 (+0.43/+0.42 → <1e-4); 1536 partial close | RdSAP-Schema-21.0.0 glazed_type=1 = "DG installed before 2002 EAW" → SAP 10.2 Table 6b cascade code 2 (DG pre-2002, g_L=0.80, NOT single 0.90). RdSAP 10 Table 24 row 2 (PVC/wooden, 16+) → U=2.7 |
| S0380.42 | e1b7b30c | Cert 1536 +0.0015 → -1e-6 | RdSAP 10 §15 p.66: Decimal HALF_UP per-window area at the 0.005 boundary (0.65 × 0.70 = 0.4550 exact / 0.45499... float drops to 0.45) |
| S0380.43 | 6dccb15b | Cert 2102 -6.30 → +5e-5 | SAP 10.2 Appendix M Table 4a code 631 ("Open fire in grate") + BS EN 13229:2001 inset-appliance class: solid fuel; Elmhurst Summary maps to Table 32 code 11 (House coal) |

All on branch feature/per-cert-mapper-validation. Each includes spec citation in commit message, unit-level diff probes, AAA test convention, pyright net-zero per touched file.

Cohort distributions at HEAD 6dccb15b

Cohort-2 (38-cert dataset, API path)

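| Bucket (\|Δ\|) | Session start | Now | Δ |
|---|---|---|---|
| exact (<1e-4) | 34 | 38 | +4 |
| 1e-4..0.07 | 0 | 0 | = |
| 0.07..0.5 | 3 | 0 | -3 |
| 0.5..1 | 0 | 0 | = |
| 1..5 | 0 | 0 | = |
| >5 | 1 | 0 | -1 |
| RAISES | 0 | 0 | = |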
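
As a rough illustration of how the bucket counts above can be derived, the sketch below bins absolute residuals using the table's edges. The helper name `bucket_for` and the treatment of values sitting exactly on an edge are assumptions, not the test suite's code. The sample residuals are the session-start values quoted in this handover; cert 1536's exact start value is not stated, so it is left out.

```python
from bisect import bisect_right
from collections import Counter

# Upper edge of every bucket except the last, taken from the table above.
_EDGES = [1e-4, 0.07, 0.5, 1.0, 5.0]
_LABELS = ["exact (<1e-4)", "1e-4..0.07", "0.07..0.5", "0.5..1", "1..5", ">5"]


def bucket_for(delta: float) -> str:
    """Return the table bucket for a signed residual (hypothetical helper)."""
    return _LABELS[bisect_right(_EDGES, abs(delta))]


# Session-start residuals quoted in this handover (1536 omitted, value not stated).
session_start = {"0300": 0.43, "9380": 0.42, "2102": -6.30}
counts = Counter(bucket_for(d) for d in session_start.values())
print(counts)
```

Keeping the edges in one sorted list means a new bucket only needs one extra edge and one extra label.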

Cohort-2 Summary path (unchanged)

38/38 < 1e-4 — closed in prior session's S0380.31..38.

Cohort-1 ASHP (9 certs, both paths)

9/9 < 1e-4 on both paths. Worst residual: cert 2225 4.8e-5 (binding constraint on _ASHP_COHORT_CHAIN_TOLERANCE tightening — see below).

Cross-mapper parity at the cascade — established

feedback-cross-mapper-parity-via-cascade now holds for all 38 cohort-2 certs: API and Summary paths both produce SAP within 1e-4 of each other AND of the worksheet, at the cascade output. The underlying EpcPropertyData may differ structurally between mappers (noise on cosmetic fields, schema-version int/str encoding), but the cascade output is the load-bearing equivalence check, and it's fully agreed.

Tolerance tightening — deferred

The prior handover proposed tightening _ASHP_COHORT_CHAIN_TOLERANCE from 1e-4 to ~1e-5. Not viable at HEAD. The cohort-wide worst residuals are:

  • Cohort-1 ASHP API path: cert 2225 -4.8e-5
  • Cohort-2 Summary path: cert 2102 -4.9e-5 (matches API)
  • Cohort-2 API path: cert 2102 +4.9e-5

So 1e-5 has no headroom. Realistic next floor is ~5e-5 (binding on cert 2225's -4.8e-5). Tightening to 5e-5 gives ~4% headroom — too thin to be robust to unrelated cascade drift. Tightening to ~6e-5 gives ~25% headroom but is an awkward number.

Decision: leave _ASHP_COHORT_CHAIN_TOLERANCE = 1e-4 and the cohort-2 strict tests at inline 1e-4. Tightening below 1e-4 requires closing cert 2225 specifically (per-cert investigation).
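
The headroom figures quoted above can be reproduced with a few lines. This is a minimal sketch that assumes headroom is measured relative to the worst residual (that basis matches the ~4% and ~25% figures); the function name `headroom` is illustrative.

```python
def headroom(tolerance: float, worst_abs_residual: float) -> float:
    """Fractional margin between a tolerance and the worst observed residual."""
    return (tolerance - worst_abs_residual) / worst_abs_residual


WORST_ASHP_RESIDUAL = 4.8e-5  # |residual| of cert 2225, cohort-1 ASHP API path

for candidate in (1e-5, 5e-5, 6e-5, 1e-4):
    margin = headroom(candidate, WORST_ASHP_RESIDUAL)
    print(f"tolerance {candidate:.0e}: headroom {margin:+.1%}")
```

A negative margin means the candidate tolerance would already fail at HEAD.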

★ Open front: golden-residuals → ~0

test_golden_cert_residual_matches_pin pins PE Δ and CO2 Δ vs the gov.uk-lodged values (NOT the worksheet — this is a different reference point from the chain tests). Pins currently sit at:

| Cert | actual_sap | sap_resid | pe_resid (kWh/m²) | co2_resid (t/yr) | Notes |
|---|---|---|---|---|---|
| 0240 | 73 | -14 | +12.49 | +0.70 | RR extraction, multi-subsystem gaps |
| 0300 | 78 | 0 | +8.28 | -0.25 | DSP showers + flue (closed at HEAD) |
| 0390 | 60 | -7 | -26.01 | -2.52 | Firebird oil combi PCDF 9005 |
| 0535 | ... | ... | ... | ... | cert 001479 fixture |
| 2130 | ... | ... | -38.63 | +0.30 | Largest pre-existing residual |
| 6035 | ... | ... | +46.76 | +1.07 | Largest pre-existing residual |
| ASHP cohort (the highest-value cluster) | | | | | |
| 0350 | 88 | 0 | -7.78 | +0.17 | Mitsubishi PUZ-WM50VHA |
| 0380 | 88 | 0 | -14.60 | +0.28 | Mitsubishi PUZ-WM50VHA |
| 2225 | 89 | 0 | -11.77 | +0.26 | Mitsubishi PUZ-WM50VHA |
| 2636 | 86 | 0 | -9.65 | +0.22 | Mitsubishi PUZ-WM50VHA |
| 3800 | 86 | 0 | -9.61 | +0.26 | Mitsubishi PUZ-WM50VHA |
| 9285 | 84 | 0 | -7.96 | +0.16 | Mitsubishi PUZ-WM50VHA |
| 9418 | 84 | 0 | -7.30 | +0.16 | Daikin EDLQ05CAV3 |

The ASHP cluster shape:

  • All 7 certs hit sap_resid=0 (chain-test work closed this).
  • PE residual: -7..-15 kWh/m² UNDER-count (cascade < lodged).
  • CO2 residual: +0.16..+0.28 t/yr OVER-count (cascade > lodged).
  • Similar magnitudes across all 7 certs, 6 on the same PCDB heat pump (Mitsubishi PUZ-WM50VHA) and 1 on a Daikin EDLQ05CAV3, strongly suggest a single shared gap in the PE/CO2 factor cascade for ASHP electricity.

Diagnostic probe for cert 0380 at HEAD

Cert 0380 (60.43 m² TFA):
  Lodged PE: 56 kWh/m²  CO2: 0.3 t/yr
  Calc demand: PE=41.40 kWh/m²  CO2=0.578 t/yr
  PE residual: -14.60   CO2 residual: +0.28
  Main fuel: 29 (Electricity, mains)
  Main heating category: 4 (Heat pump)
  Secondary fuel: 29 (Electricity)
  Secondary heating: 691 (Portable electric heater default)

Hypotheses

The user's prior diagnosis (from earlier handover):

This smells like a single cascade gap in either the SAP 10.2 Appendix L1 primary-energy lookup for electricity (likely a missing distribution-loss factor or wrong tariff routing) or in the §12 Table 12d monthly electricity factor cascade for heat pumps.

Additional shape evidence:

  • PE under-count + CO2 over-count for the same fuel is structurally unusual. If both were PE-factor-driven, they'd move in the same direction. The split direction suggests the lodged values are using different factors than the cascade (possibly an older SAP factor vs current SAP 10.2).
  • 14.6 kWh/m² × 60.43 m² = 882 kWh/yr PE shortfall on cert 0380.
  • 0.28 t/yr × 1000 = 280 kg/yr CO2 over-count.
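
The residual arithmetic for cert 0380 can be checked directly from the probe output. A minimal sketch, assuming the sign convention residual = cascade minus lodged (consistent with the under-count being negative); the variable names are illustrative.

```python
TFA_M2 = 60.43                       # total floor area of cert 0380
LODGED_PE, CALC_PE = 56.0, 41.40     # kWh/m²
LODGED_CO2, CALC_CO2 = 0.3, 0.578    # t/yr

# Assumed convention: residual = cascade value - lodged value.
pe_resid = CALC_PE - LODGED_PE
co2_resid = CALC_CO2 - LODGED_CO2

# Scale the per-m² PE gap to the whole dwelling, and tonnes to kilograms.
pe_shortfall_kwh = -pe_resid * TFA_M2
co2_overcount_kg = co2_resid * 1000

print(f"PE residual {pe_resid:+.2f} kWh/m², shortfall {pe_shortfall_kwh:.0f} kWh/yr")
print(f"CO2 residual {co2_resid:+.2f} t/yr, over-count {co2_overcount_kg:.0f} kg/yr")
```

The unrounded CO2 over-count comes out slightly below the 280 kg/yr bullet because the bullet starts from the rounded +0.28 pin.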

Slice plan for the ASHP PE cluster

Probe 1: Inspect the SAP 10.2 Table 12 PE factor lookup. Find where the cascade resolves PE-factor-for-electricity (likely in internal_gains.py, or in cert_to_inputs.py as _effective_monthly_pe_factor or similar). Verify the factor used matches the lodged EPC's expected value (1.501 standard / 1.500 SAP 2012 / etc).

Probe 2: Diff cert 0380 calc vs PCDB-listed heat-pump efficiency. The heat pump (Mitsubishi PUZ-WM50VHA PCDB 104568) has a documented SPF (seasonal performance factor). Check whether the cascade applies the correct SPF and whether the lodged-vs-cascade electricity-consumption delta accounts for the PE shortfall.

Probe 3: Worksheet PE check. The cert 0380 worksheet PDF (likely dr87-0001-000899.pdf in the cohort-2 dir) records the worksheet's PE value at the bottom. Compare cascade PE to worksheet PE: if they agree, the lodgement is wrong (gov.uk computed differently); if they disagree, the cascade has a real gap.
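
The worksheet-check decision rule can be written down as a small helper. This is a sketch only: the function name, the return labels, and the 0.01 kWh/m² agreement tolerance are assumptions, and the worksheet PE for cert 0380 still has to be read from the PDF.

```python
def classify_pe_gap(cascade_pe: float, worksheet_pe: float, tol: float = 0.01) -> str:
    """Decide where a PE residual originates (hypothetical helper).

    If the cascade agrees with the worksheet, the gap against the lodged
    value sits in the lodgement; otherwise the cascade itself is off.
    """
    if abs(cascade_pe - worksheet_pe) <= tol:
        return "lodgement differs from worksheet"
    return "cascade gap"
```

Call it with the cascade output (41.40 kWh/m² for cert 0380) and the value extracted from the worksheet PDF.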

Pre-existing large residuals (lower priority)

  • Cert 6035 PE +46.76 — handover claim of multi-subsystem gaps; not the same cluster cause as ASHP.
  • Cert 2130 PE -38.63 — also pre-existing; likely RR + PV + electricity.

These should be closed AFTER the ASHP cluster (which appears to have a single clean root cause).

Conventions preserved (carry forward)

Lesson learned: GOV.UK RdSAP 21 enum ≠ cascade enum

The cascade's _G_LIGHT_BY_GLAZING_CODE table in internal_gains.py is keyed on the SAP 10.2 Table 6b enum that the Elmhurst extractor produces (_ELMHURST_GLAZING_LABEL_TO_SAP10). The API mapper currently passes the raw GOV.UK RdSAP 21 enum straight through. For codes 2/3/13/14 this coincidentally works (both enums agree on g_L for those codes); for code 1 it doesn't (GOV.UK 1 = DG pre-2002, SAP 10.2 1 = single).

Slice S0380.41 added _API_TO_SAP10_CASCADE_GLAZING_CODE to remap RdSAP 21 codes to SAP 10.2 codes for the SapWindow.glazing_type field that drives daylight g_L. Currently only code 1 remaps; other codes pass through. Future cert lodgements may surface analogous divergences (e.g. RdSAP 21 code 5 = single, but cascade code 5 gets 0.80 — a similar mismatch waiting to happen). Add remap entries as those codes appear in fixtures.
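
The remap pattern is small enough to sketch. The dict name comes from S0380.41 and the 1 → 2 entry from its spec citation; the wrapper function name and its pass-through behaviour for unlisted codes are assumptions about how the mapper applies the table.

```python
# RdSAP 21 (GOV.UK API) glazing code -> SAP 10.2 Table 6b cascade code.
# Only the code 1 entry is confirmed by this handover (S0380.41).
_API_TO_SAP10_CASCADE_GLAZING_CODE: dict[int, int] = {
    1: 2,  # "DG installed before 2002" -> cascade DG pre-2002 (g_L=0.80)
}


def to_cascade_glazing_code(api_code: int) -> int:
    """Remap an API glazing code; unlisted codes pass through unchanged."""
    return _API_TO_SAP10_CASCADE_GLAZING_CODE.get(api_code, api_code)
```

New divergences then become one-line dict entries, each backed by its own spec citation.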

Lesson learned: Decimal HALF_UP extends to per-window areas

S0380.34/35 closed the Σ-then-round Decimal pattern (gross wall, party wall, kWp, living area). S0380.42 closed the round-per-then-Σ pattern for per-window areas: _decimal_round_half_up_product was added at three cascade sites (heat_transmission's windows_w_per_k + per-bp window-area accumulation; internal_gains' daylight g_L; solar_gains' window solar). Any future +0.0007-scale residual in per-window areas — or analogous Decimal boundary cases for OTHER elements (doors, alt-walls, RR sub-areas) — is the same class of bug, fixed the same way.
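
The boundary case from S0380.42 (0.65 × 0.70) shows why the Decimal route matters. A minimal sketch: the real `_decimal_round_half_up_product` may take different arguments, so the string-based signature here is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def decimal_round_half_up_product(a: str, b: str, places: str = "0.01") -> Decimal:
    """Multiply two decimal strings exactly, then round HALF_UP to `places`."""
    return (Decimal(a) * Decimal(b)).quantize(Decimal(places), rounding=ROUND_HALF_UP)


# Binary floats: the product lands just below 0.455, so it rounds down.
float_area = round(0.65 * 0.70, 2)

# Decimal: the product is exactly 0.4550, so HALF_UP rounds it up.
decimal_area = decimal_round_half_up_product("0.65", "0.70")

print(float_area, decimal_area)
```

Passing dimensions as strings (or as Decimal built from strings) avoids carrying the float representation error into the product.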

Lesson learned: SAP heating-type → spec fuel dispatch

S0380.43 added _API_SECONDARY_HEATING_SPEC_FUEL for SAP 631 ("Open fire in grate"). The pattern is incremental: a per-code dispatch dict that overrides the lodged fuel ONLY when (a) the heating type implies a specific fuel category, AND (b) the lodged fuel is incompatible (electric for a solid-fuel heater). Future cohort certs surfacing other inconsistencies (e.g. SAP 632 "Open fire" with electric fuel) can extend the dispatch without touching the routing logic.
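
A sketch of the two-condition override follows. The dict name is from S0380.43; the tuple layout, the function name, and the fuel codes (29 for mains electricity, 11 for house coal as on the Elmhurst Summary path) are assumptions for illustration and may not match what the API mapper stores.

```python
ELECTRICITY = 29  # "Electricity, mains" fuel code, as seen on cert 0380
HOUSE_COAL = 11   # Table 32 code the Elmhurst Summary path uses for SAP 631

# heating code -> (spec fuel, lodged fuels treated as incompatible)
_API_SECONDARY_HEATING_SPEC_FUEL = {
    631: (HOUSE_COAL, frozenset({ELECTRICITY})),  # "Open fire in grate"
}


def resolve_secondary_fuel(heating_code: int, lodged_fuel: int) -> int:
    """Override the lodged fuel only when both conditions (a) and (b) hold."""
    entry = _API_SECONDARY_HEATING_SPEC_FUEL.get(heating_code)
    if entry is None:
        return lodged_fuel  # (a) fails: heating type implies no specific fuel
    spec_fuel, incompatible = entry
    # (b): replace the lodged fuel only if it contradicts the heating type.
    return spec_fuel if lodged_fuel in incompatible else lodged_fuel
```

A compatible lodged fuel is left alone, so the dispatch never overrides a lodgement that is already self-consistent.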

Test baseline at HEAD

PYTHONPATH=/workspaces/model python -m pytest \
    backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
    backend/documents_parser/tests/test_elmhurst_extractor.py \
    backend/documents_parser/tests/test_elmhurst_end_to_end.py \
    domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
    domain/sap10_calculator/worksheet/tests/test_water_heating.py \
    domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
    domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
    domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
    domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
    domain/sap10_ml/tests/test_rdsap_uvalues.py \
    datatypes/epc/schema/tests/test_schema_loading.py \
    --no-cov -q

Expected: 750 pass + 0 fails.

First concrete actions for the next agent

  1. Re-run the diagnostic probe to confirm baseline reproduces (38/38 cohort-2 both paths < 1e-4; 9/9 ASHP cohort-1 < 1e-4; 750 pass + 0 fails).

  2. Probe 1 (PE factor lookup) — find the cascade's PE-factor resolution for electricity heat pumps. The most likely entry points: search cert_to_inputs.py for primary_energy, pe_factor, effective_monthly_pe_factor. Compare the resolved factor against SAP 10.2 Table 12 "Standard electricity" (PE = 1.501) and ASHP-specific entries.

  3. Probe 3 (worksheet vs cascade PE): extract the PE value from cert 0380's worksheet PDF (dr87-0001-000899.pdf under sap worksheets/additional with api 2/0380-2530-6150-2326-4161/). Compare against cascade output 41.40 kWh/m² and lodged 56 kWh/m². This isolates "cascade vs spec" from "lodgement vs spec".

  4. CO2 factor probe (not numbered in the slice plan above): run the same kind of probe on the CO2 factor cascade. The cluster's +0.16..+0.28 t/yr over-count is as uniform across certs as the PE under-count, suggesting both come from the same factor lookup.

  5. If the cluster has a single root cause (likely per the uniform shape), close it in ONE slice. Re-pin all 7 ASHP fixture expected_pe_resid_kwh_per_m2 and expected_co2_resid_tonnes_per_yr values to the new residuals (which should drop to ~0.01).

  6. Then move to the pre-existing residual cluster (certs 6035, 2130, 0240) — these have multi-subsystem gaps that need per-cert investigation. Less uniform than the ASHP cluster.

Good luck. The cohort-2 API closure is COMPLETE; the chain-test infrastructure is robust and battle-tested across 38 + 9 certs spanning gas/oil/heat-pump main heating, all RdSAP 21 schema variants, and multiple lodgement-source quirks. The golden-residuals front is the next high-value workstream, and the ASHP cluster is the cleanest single thread.