Session shipped 5 slices that closed the entire cohort-2 API-path
cluster (S0380.39 bulk-fetch, S0380.40 parametrized test, S0380.41
RdSAP 21 → SAP 10.2 glazing alias, S0380.42 Decimal HALF_UP per-window
areas, S0380.43 SAP 631 → spec fuel).
Documents:
- Cross-mapper parity at cascade established for all 38 cohort-2
certs (and 9 cohort-1 ASHP); both paths < 1e-4 vs worksheet.
- Tolerance tightening deferred — 1e-4 is the realistic floor at
HEAD (worst residual 4.91e-5 on cert 2102).
- Lessons learned: GOV.UK RdSAP 21 enum != cascade enum (codes
needing remap are incremental as fixtures surface them);
Decimal HALF_UP per-window areas extends the S0380.34/35
pattern; SAP heating-type → spec fuel dispatch is the new
forcing-function pattern for cert-lodgement inconsistencies.
- Open front: golden-residuals → ~0 on PE/CO2. ASHP cluster
(-7..-15 kWh/m² PE / +0.16..+0.28 t/yr CO2 across 7 certs with
the same PCDB heat pump) is the highest-value single thread —
likely SAP 10.2 Appendix L1 / Table 12 PE-factor or CO2-factor
cascade gap. Three concrete diagnostic probes proposed.
Test baseline at HEAD: 750 pass + 0 fail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Handover — Cohort-2 API path 38/38 closed; golden-residuals front next
Branch feature/per-cert-mapper-validation. This session shipped
5 slices (S0380.39 → S0380.43) that closed the entire cohort-2
API-path cluster. The branch is now at 750 pass + 0 fail — the
3-cert +0.42..+0.44 cluster (0300/9380/1536) closed via two spec
citations + the Decimal HALF_UP pattern, and cert 2102's -6.30
residual closed via the SAP 4a heating-type → spec fuel dispatch.
HEAD at handover start: 6dccb15b (Slice S0380.43).
User's stated goal carried forward (from prior handover)
Tackle Thread 4 — API-path closure for cohort-2. … Tolerance: 1e-4 vs each cert's worksheet SAP value. … Bigger slices are appropriate here. … Drive golden-fixture residuals to ~0.
Thread 4 (cohort-2 API path closure) is DONE. The next thread, driving golden-fixture residuals to ~0, is now the open front.
Slices shipped this session (handover-doc → HEAD)
| Slice | Commit | Closes | Spec citation |
|---|---|---|---|
| S0380.39 | 22ae6f4d | Bulk-fetched 38 cohort-2 API JSONs via scripts/fetch_cohort2_api_jsons.py | (infra) |
| S0380.40 | ff25746f | Parametrized API-path chain test mirroring Summary sweep; 34/38 immediate | (test infra) |
| S0380.41 | a96e6765 | Closed 0300/9380 (+0.43/+0.42 → <1e-4); 1536 partial close | RdSAP-Schema-21.0.0 glazed_type=1 = "DG installed before 2002 EAW" → SAP 10.2 Table 6b cascade code 2 (DG pre-2002, g_L=0.80, NOT single 0.90). RdSAP 10 Table 24 row 2 (PVC/wooden, 16+) → U=2.7 |
| S0380.42 | e1b7b30c | Cert 1536 +0.0015 → -1e-6 | RdSAP 10 §15 p.66: Decimal HALF_UP per-window area at the 0.005 boundary (0.65 × 0.70 = 0.4550 exact / 0.45499... float drops to 0.45) |
| S0380.43 | 6dccb15b | Cert 2102 -6.30 → +5e-5 | SAP 10.2 Appendix M Table 4a code 631 ("Open fire in grate") + BS EN 13229:2001 inset-appliance class: solid fuel; Elmhurst Summary maps to Table 32 code 11 (House coal) |
All on branch feature/per-cert-mapper-validation. Each includes
spec citation in commit message, unit-level diff probes, AAA test
convention, pyright net-zero per touched file.
Cohort distributions at HEAD 6dccb15b
Cohort-2 (38-cert dataset, API path)
| Bucket (\|Δ\|) | Session start | Now | Δ |
|---|---|---|---|
| exact (<1e-4) | 34 | 38 | +4 |
| 1e-4..0.07 | 0 | 0 | = |
| 0.07..0.5 | 3 | 0 | -3 |
| 0.5..1 | 0 | 0 | = |
| 1..5 | 0 | 0 | = |
| >5 | 1 | 0 | -1 |
| RAISES | 0 | 0 | = |
Cohort-2 Summary path (unchanged)
38/38 < 1e-4 — closed in prior session's S0380.31..38.
Cohort-1 ASHP (9 certs, both paths)
9/9 < 1e-4 on both paths. Worst residual: cert 2225 −4.8e-5 (binding
constraint on _ASHP_COHORT_CHAIN_TOLERANCE tightening — see below).
Cross-mapper parity at the cascade — established
feedback-cross-mapper-parity-via-cascade now holds for all 38 cohort-2 certs: API and Summary paths both produce SAP within 1e-4 of each other AND of the worksheet, at the cascade output. The underlying EpcPropertyData may differ structurally between mappers (noise on cosmetic fields, schema-version int/str encoding), but the cascade output is the load-bearing equivalence check, and it's fully agreed.
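The equivalence check described above can be sketched as a three-way comparison at the cascade output. This is a minimal illustration, not the repo's test: `parity_holds` and its parameter names are hypothetical, and only the 1e-4 tolerance comes from this handover.

```python
TOLERANCE = 1e-4  # chain-test tolerance used throughout this handover


def parity_holds(
    api_sap: float,
    summary_sap: float,
    worksheet_sap: float,
    tol: float = TOLERANCE,
) -> bool:
    """True when both mapper paths agree with each other AND with the worksheet."""
    pairs = (
        (api_sap, summary_sap),        # cross-mapper agreement
        (api_sap, worksheet_sap),      # API path vs worksheet
        (summary_sap, worksheet_sap),  # Summary path vs worksheet
    )
    # Plain abs(diff) <= tol, in line with the conventions listed in this doc.
    return all(abs(a - b) <= tol for a, b in pairs)
```

Comparing only the final SAP values is deliberate here: structural differences in the intermediate EpcPropertyData do not fail the check as long as the cascade output agrees.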
Tolerance tightening — deferred
The prior handover proposed tightening _ASHP_COHORT_CHAIN_TOLERANCE
from 1e-4 to ~1e-5. Not viable at HEAD. The cohort-wide worst
residuals are:
- Cohort-1 ASHP API path: cert 2225 -4.8e-5
- Cohort-2 Summary path: cert 2102 -4.9e-5 (matches API)
- Cohort-2 API path: cert 2102 +4.9e-5
So 1e-5 has no headroom. Realistic next floor is ~5e-5 (binding on cert 2225's -4.8e-5). Tightening to 5e-5 gives ~4% headroom — too thin to be robust to unrelated cascade drift. Tightening to ~6e-5 gives ~25% headroom but is an awkward number.
Decision: leave _ASHP_COHORT_CHAIN_TOLERANCE = 1e-4 and the
cohort-2 strict tests at inline 1e-4. Tightening below 1e-4 requires
closing cert 2225 specifically (per-cert investigation).
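The headroom figures quoted above follow from a one-line calculation, sketched here under the assumption that headroom is measured relative to the worst absolute residual. The function name `headroom` is hypothetical.

```python
def headroom(tolerance: float, worst_abs_residual: float) -> float:
    """Fractional margin between a tolerance and the worst |residual| it must admit."""
    return (tolerance - worst_abs_residual) / worst_abs_residual


WORST_CERT_2225 = 4.8e-5  # binding residual for _ASHP_COHORT_CHAIN_TOLERANCE

# A negative result means the candidate tolerance would fail cert 2225 outright.
for candidate in (1e-5, 5e-5, 6e-5, 1e-4):
    print(f"tol={candidate:g}  headroom={headroom(candidate, WORST_CERT_2225):+.1%}")
```

With these inputs, 1e-5 comes out negative, 5e-5 gives roughly 4%, 6e-5 gives 25%, and the current 1e-4 leaves a little over 100%.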
★ Open front: golden-residuals → ~0
test_golden_cert_residual_matches_pin
pins PE Δ and CO2 Δ vs the gov.uk-lodged values (NOT the worksheet
— this is a different reference point from the chain tests). Pins
currently sit at:
| Cert | actual_sap | sap_resid | pe_resid (kWh/m²) | co2_resid (t/yr) | Notes |
|---|---|---|---|---|---|
| 0240 | 73 | -14 | +12.49 | +0.70 | RR extraction, multi-subsystem gaps |
| 0300 | 78 | 0 | +8.28 | -0.25 | DSP showers + flue (closed at HEAD) |
| 0390 | 60 | -7 | -26.01 | -2.52 | Firebird oil combi PCDF 9005 |
| 0535 | ... | ... | ... | ... | cert 001479 fixture |
| 2130 | ... | ... | -38.63 | +0.30 | Largest pre-existing residual |
| 6035 | ... | ... | +46.76 | +1.07 | Largest pre-existing residual |
| ASHP cohort (the highest-value cluster) | | | | | |
| 0350 | 88 | 0 | -7.78 | +0.17 | Mitsubishi PUZ-WM50VHA |
| 0380 | 88 | 0 | -14.60 | +0.28 | Mitsubishi PUZ-WM50VHA |
| 2225 | 89 | 0 | -11.77 | +0.26 | Mitsubishi PUZ-WM50VHA |
| 2636 | 86 | 0 | -9.65 | +0.22 | Mitsubishi PUZ-WM50VHA |
| 3800 | 86 | 0 | -9.61 | +0.26 | Mitsubishi PUZ-WM50VHA |
| 9285 | 84 | 0 | -7.96 | +0.16 | Mitsubishi PUZ-WM50VHA |
| 9418 | 84 | 0 | -7.30 | +0.16 | Daikin EDLQ05CAV3 |
The ASHP cluster shape:
- All 7 certs hit sap_resid=0 (chain-test work closed this).
- PE residual: -7..-15 kWh/m² UNDER-count (cascade < lodged).
- CO2 residual: +0.16..+0.28 t/yr OVER-count (cascade > lodged).
- Similar magnitudes across 7 certs, 6 of which share the same PCDB heat pump (Mitsubishi PUZ-WM50VHA; cert 9418 is a Daikin EDLQ05CAV3), strongly suggest a single shared gap in the PE/CO2 factor cascade for ASHP electricity.
Diagnostic probe for cert 0380 at HEAD
Cert 0380 (60.43 m² TFA):
Lodged PE: 56 kWh/m² CO2: 0.3 t/yr
Calc demand: PE=41.40 kWh/m² CO2=0.578 t/yr
PE residual: -14.60 CO2 residual: +0.28
Main fuel: 29 (Electricity, mains)
Main heating category: 4 (Heat pump)
Secondary fuel: 29 (Electricity)
Secondary heating: 691 (Portable electric heater default)
Hypotheses
The user's prior diagnosis (from earlier handover):
This smells like a single cascade gap in either the SAP 10.2 Appendix L1 primary-energy lookup for electricity (likely a missing distribution-loss factor or wrong tariff routing) or in the §12 Table 12d monthly electricity factor cascade for heat pumps.
Additional shape evidence:
- PE under-count + CO2 over-count for the same fuel is structurally unusual. If both were PE-factor-driven, they'd move in the same direction. The split direction suggests the lodged values are using different factors than the cascade (possibly an older SAP factor vs current SAP 10.2).
- 14.6 kWh/m² × 60.43 m² = 882 kWh/yr PE shortfall on cert 0380.
- 0.28 t/yr × 1000 = 280 kg/yr CO2 over-count.
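The shape argument above can be checked numerically. The sketch below converts the cert 0380 pins to annual totals, then shows why an error in delivered electricity alone cannot produce opposite-sign residuals: with positive factors, PE and CO2 both scale with the same kWh delta. The function names are hypothetical, the 500 kWh error is arbitrary, and the CO2 factor is a placeholder rather than a Table 12 value.

```python
def annual_totals(
    pe_resid_kwh_per_m2: float, tfa_m2: float, co2_resid_t_per_yr: float
) -> tuple[float, float]:
    """Convert pinned residuals to annual kWh of PE and kg of CO2."""
    return pe_resid_kwh_per_m2 * tfa_m2, co2_resid_t_per_yr * 1000.0


def residuals_from_energy_error(
    delta_kwh: float, pe_factor: float, co2_factor: float
) -> tuple[float, float]:
    """PE and CO2 residuals caused purely by an error in delivered electricity."""
    return delta_kwh * pe_factor, delta_kwh * co2_factor


# Cert 0380 pins from this handover: about -882 kWh/yr PE and +280 kg/yr CO2.
pe_kwh, co2_kg = annual_totals(-14.60, 60.43, 0.28)

# Arbitrary under-count of delivered electricity; 0.1 is a placeholder factor.
pe_err, co2_err = residuals_from_energy_error(-500.0, pe_factor=1.501, co2_factor=0.1)
same_sign = (pe_err < 0) == (co2_err < 0)  # holds for any positive factors
```

Because `same_sign` holds whenever both factors are positive, the observed split (PE under, CO2 over) is more consistent with a factor mismatch than with a consumption error, which is the reading the bullets above arrive at.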
Slice plan for the ASHP PE cluster
Probe 1: Inspect the SAP 10.2 Table 12 PE factor lookup. Find
where the cascade resolves PE-factor-for-electricity (likely in
internal_gains.py or cert_to_inputs.py, in _effective_monthly_pe_factor or similar). Verify the factor used matches the lodged
EPC's expected value (1.501 standard / 1.500 SAP 2012 / etc).
Probe 2: Diff cert 0380 calc vs PCDB-listed heat-pump efficiency. The heat pump (Mitsubishi PUZ-WM50VHA PCDB 104568) has a documented SPF (seasonal performance factor). Check whether the cascade applies the correct SPF, and whether the lodged-vs-cascade electricity-consumption delta accounts for the PE shortfall.
Probe 3: Worksheet PE check. The cert 0380 worksheet PDF (likely
dr87-0001-000899.pdf in the cohort-2 dir) reports the worksheet's
PE value at the bottom. Compare cascade PE to worksheet PE: if they
agree, the lodgement is wrong (gov.uk computed differently); if they
disagree, the cascade has a real gap.
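The decision rule in the worksheet PE check can be written down as a small classifier. This is a sketch only: `Verdict` and `classify_pe` are hypothetical names, the 0.01 kWh/m² default is borrowed from the golden-fixture PE tolerance mentioned in this handover, and the worksheet PE for cert 0380 is not known here, so no real value is plugged in.

```python
from enum import Enum


class Verdict(Enum):
    LODGEMENT_DIFFERS = "cascade matches worksheet; gov.uk computed differently"
    CASCADE_GAP = "cascade disagrees with worksheet; real cascade gap"


def classify_pe(cascade_pe: float, worksheet_pe: float, tol: float = 0.01) -> Verdict:
    """Apply the worksheet-check rule: agreement blames the lodgement, not the cascade."""
    if abs(cascade_pe - worksheet_pe) <= tol:
        return Verdict.LODGEMENT_DIFFERS
    return Verdict.CASCADE_GAP
```

Feeding it the cascade value (41.40 kWh/m² for cert 0380) and whatever the worksheet PDF reports gives a one-word answer on where to dig next.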
Pre-existing large residuals (lower priority)
- Cert 6035 PE +46.76 — handover claim of multi-subsystem gaps; not the same cluster cause as ASHP.
- Cert 2130 PE -38.63 — also pre-existing; likely RR + PV + electricity.
These should be closed AFTER the ASHP cluster (which has a single clean root cause).
Conventions preserved (carry forward)
- 1e-4 across the board (feedback-one-e-minus-4-across-the-board)
- Worksheet, not API, is the target for chain tests (feedback-worksheet-not-api-reference) — except for the golden fixtures, which pin against gov.uk-lodged PE/CO2.
- Cross-mapper parity via cascade equivalence (feedback-cross-mapper-parity-via-cascade). Now fully established for cohort-2.
- Spec-floor skepticism (feedback-spec-floor-skepticism).
- Bigger slices OK for uniform-cohort work (feedback-bigger-slices-for-uniform-work).
- Golden residuals → ~0 (feedback-golden-residuals-near-zero). The 0.01 PE / 0.001 CO2 absolute tolerances stay; what changes is the expected residual itself (pinning at the actual delta vs zero).
- AAA test convention with literal # Arrange / # Act / # Assert (feedback-aaa-test-convention).
- abs(diff) <= tol not pytest.approx (feedback-abs-diff-over-pytest-approx).
- Spec citation in commit messages (feedback-spec-citation-in-commits).
- One slice = one commit; stage by name (feedback-commit-per-slice).
- Strict-enum raises on unmapped labels / unresolved dispatch.
- Pyright net-zero per touched file.
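Several of the conventions above combine in a single golden-residual test. The sketch below is illustrative rather than a copy of test_golden_cert_residual_matches_pin: the helper `pe_residual` and the test name are hypothetical, while the numbers are the cert 0380 values quoted in this handover.

```python
PE_PIN_TOLERANCE = 0.01  # kWh/m², absolute tolerance named in the conventions


def pe_residual(cascade_pe: float, lodged_pe: float) -> float:
    """Hypothetical helper: cascade minus lodged, so under-counts are negative."""
    return cascade_pe - lodged_pe


def test_cert_0380_pe_residual_matches_pin() -> None:
    # Arrange
    cascade_pe = 41.40       # kWh/m², cascade output from the cert 0380 probe
    lodged_pe = 56.0         # kWh/m², gov.uk-lodged value
    expected_resid = -14.60  # pinned at the actual delta, not at zero

    # Act
    resid = pe_residual(cascade_pe, lodged_pe)

    # Assert
    assert abs(resid - expected_resid) <= PE_PIN_TOLERANCE
```

When the ASHP gap closes, only `expected_resid` should need to move toward zero; the tolerance and the test body stay as they are.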
Lesson learned: GOV.UK RdSAP 21 enum ≠ cascade enum
The cascade's _G_LIGHT_BY_GLAZING_CODE table in
internal_gains.py is keyed on the SAP 10.2 Table 6b enum that the
Elmhurst extractor produces (_ELMHURST_GLAZING_LABEL_TO_SAP10).
The API mapper currently passes the raw GOV.UK RdSAP 21 enum
straight through. For codes 2/3/13/14 this coincidentally works
(both enums agree on g_L for those codes); for code 1 it doesn't
(GOV.UK 1 = DG pre-2002, SAP 10.2 1 = single).
Slice S0380.41 added _API_TO_SAP10_CASCADE_GLAZING_CODE to remap
RdSAP 21 codes to SAP 10.2 codes for the SapWindow.glazing_type
field that drives daylight g_L. Currently only code 1 remaps; other
codes pass through. Future cert lodgements may surface analogous
divergences (e.g. RdSAP 21 code 5 = single, but cascade code 5
gets 0.80 — a similar mismatch waiting to happen). Add remap entries
as those codes appear in fixtures.
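A minimal sketch of the remap-then-lookup pattern follows. The dict and function names are hypothetical stand-ins for _API_TO_SAP10_CASCADE_GLAZING_CODE and _G_LIGHT_BY_GLAZING_CODE, and only the entries stated in this handover are filled in; the real tables hold more codes.

```python
# RdSAP 21 (GOV.UK API) code -> SAP 10.2 cascade code. Only divergent codes
# are listed; everything else passes through unchanged.
API_TO_CASCADE_GLAZING_CODE: dict[int, int] = {
    1: 2,  # "DG installed before 2002" -> cascade code 2 (DG pre-2002)
}

# Cascade code -> daylight factor g_L (partial, from the S0380.41 citation).
G_LIGHT_BY_CASCADE_CODE: dict[int, float] = {
    1: 0.90,  # single glazing
    2: 0.80,  # double glazing, pre-2002
}


def cascade_glazing_code(api_code: int) -> int:
    """Remap known divergent codes; all other codes pass through."""
    return API_TO_CASCADE_GLAZING_CODE.get(api_code, api_code)


def g_light_for_api_code(api_code: int) -> float:
    code = cascade_glazing_code(api_code)
    try:
        return G_LIGHT_BY_CASCADE_CODE[code]
    except KeyError:
        # Strict-enum convention: fail loudly rather than guess a g_L value.
        raise ValueError(f"no g_L for cascade glazing code {code}") from None
```

Keeping the remap separate from the g_L table means a newly surfaced divergence costs one dict entry and leaves the cascade table keyed on a single enum.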
Lesson learned: Decimal HALF_UP extends to per-window areas
S0380.34/35 closed the Σ-then-round Decimal pattern (gross wall,
party wall, kWp, living area). S0380.42 closed the round-per-then-Σ
pattern for per-window areas: _decimal_round_half_up_product was
added at three cascade sites (heat_transmission's windows_w_per_k +
per-bp window-area accumulation; internal_gains' daylight g_L;
solar_gains' window solar). Any future +0.0007-scale residual in
per-window areas — or analogous Decimal boundary cases for OTHER
elements (doors, alt-walls, RR sub-areas) — is the same class of
bug, fixed the same way.
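The boundary case can be reproduced with the standard library's decimal module. This is a sketch in the spirit of _decimal_round_half_up_product, not its actual signature: the names below are hypothetical, and taking the lodged dimensions as strings is an assumption made to keep binary floats out of the product.

```python
from decimal import Decimal, ROUND_HALF_UP

CENTI = Decimal("0.01")  # per-window areas are rounded to 2 dp


def round_half_up_product(width_m: str, height_m: str) -> Decimal:
    """Multiply two lodged dimensions exactly, then round to 2 dp HALF_UP."""
    return (Decimal(width_m) * Decimal(height_m)).quantize(CENTI, rounding=ROUND_HALF_UP)


def total_window_area(windows: list[tuple[str, str]]) -> Decimal:
    """Round each window area first, then sum (round-per-then-sum)."""
    return sum((round_half_up_product(w, h) for w, h in windows), Decimal("0"))


# Decimal path: 0.65 x 0.70 is exactly 0.4550, which HALF_UP takes to 0.46.
exact = round_half_up_product("0.65", "0.70")

# Float path for comparison: per the S0380.42 citation, the binary product
# lands just under 0.455, so rounding to 2 dp drops it to 0.45.
naive = round(0.65 * 0.70, 2)
```

The 0.01 m² difference per window is small, but it is exactly the kind of offset that surfaces as a sub-0.002 SAP residual once it feeds heat transmission, daylight and solar gains.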
Lesson learned: SAP heating-type → spec fuel dispatch
S0380.43 added _API_SECONDARY_HEATING_SPEC_FUEL for SAP 631
("Open fire in grate"). The pattern is incremental: a per-code
dispatch dict that overrides the lodged fuel ONLY when (a) the
heating type implies a specific fuel category, AND (b) the lodged
fuel is incompatible (electric for a solid-fuel heater). Future
cohort certs surfacing other inconsistencies (e.g. SAP 632 "Open
fire" with electric fuel) can extend the dispatch without
touching the routing logic.
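The two-condition override can be sketched as a lookup plus a compatibility check. The names are hypothetical stand-ins for _API_SECONDARY_HEATING_SPEC_FUEL, and the sketch assumes fuel codes 29 (electricity) and 11 (house coal) live in one enumeration, which should be confirmed against the mapper.

```python
ELECTRICITY = 29  # fuel code shown for cert 0380 in this handover
HOUSE_COAL = 11   # Table 32 code the Elmhurst Summary uses for SAP 631

# heating code -> (spec fuel, lodged fuels treated as incompatible)
SECONDARY_HEATING_SPEC_FUEL: dict[int, tuple[int, frozenset[int]]] = {
    631: (HOUSE_COAL, frozenset({ELECTRICITY})),  # "Open fire in grate"
}


def effective_secondary_fuel(heating_code: int, lodged_fuel: int) -> int:
    """Override the lodged fuel only when the heating type contradicts it."""
    entry = SECONDARY_HEATING_SPEC_FUEL.get(heating_code)
    if entry is None:
        return lodged_fuel  # (a) fails: heating type implies no specific fuel
    spec_fuel, incompatible = entry
    if lodged_fuel in incompatible:
        return spec_fuel    # (a) and (b) both hold: apply the spec fuel
    return lodged_fuel      # (b) fails: lodged fuel is plausible, keep it
```

Extending the pattern to another inconsistent heating code is then a one-line dict entry, with no change to `effective_secondary_fuel` itself.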
Test baseline at HEAD
PYTHONPATH=/workspaces/model python -m pytest \
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
backend/documents_parser/tests/test_elmhurst_extractor.py \
backend/documents_parser/tests/test_elmhurst_end_to_end.py \
domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
domain/sap10_calculator/worksheet/tests/test_water_heating.py \
domain/sap10_calculator/worksheet/tests/test_mean_internal_temperature.py \
domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
domain/sap10_calculator/tests/test_pcdb_table_362_lookup.py \
domain/sap10_ml/tests/test_rdsap_uvalues.py \
datatypes/epc/schema/tests/test_schema_loading.py \
--no-cov -q
Expected: 750 pass + 0 fails.
First concrete actions for the next agent
- Re-run the diagnostic probe to confirm baseline reproduces (38/38 cohort-2 both paths < 1e-4; 9/9 ASHP cohort-1 < 1e-4; 750 pass + 0 fails).
- Probe 1 (PE factor lookup): find the cascade's PE-factor resolution for electricity heat pumps. The most likely entry points: search cert_to_inputs.py for primary_energy, pe_factor, effective_monthly_pe_factor. Compare the resolved factor against SAP 10.2 Table 12 "Standard electricity" (PE = 1.501) and ASHP-specific entries.
- Probe 2 (worksheet vs cascade PE): extract the PE value from cert 0380's worksheet PDF (dr87-0001-000899.pdf under sap worksheets/additional with api 2/0380-2530-6150-2326-4161/). Compare against cascade output 41.40 kWh/m² and lodged 56 kWh/m². This isolates "cascade vs spec" from "lodgement vs spec".
- Probe 3 (CO2 factor): similar probe for the CO2 factor cascade. The cluster's +0.16..+0.28 t/yr over-count is the same shape as PE under-count, suggesting both come from the same factor lookup.
- If the cluster has a single root cause (likely per the uniform shape), close it in ONE slice. Re-pin all 7 ASHP fixture expected_pe_resid_kwh_per_m2 and expected_co2_resid_tonnes_per_yr values to the new residuals (which should drop to ~0.01).
- Then move to the pre-existing residual cluster (certs 6035, 2130, 0240); these have multi-subsystem gaps that need per-cert investigation. Less uniform than the ASHP cluster.
Good luck. The cohort-2 API closure is COMPLETE; the chain-test infrastructure is robust and battle-tested across 38 + 9 certs spanning gas/oil/heat-pump main heating, all RdSAP 21 schema variants, and multiple lodgement-source quirks. The golden-residuals front is the next high-value workstream, and the ASHP cluster is the cleanest single thread.