10 spec-cited slices closed this session:
- .115: fixture ECF pin typo
- .116: RdSAP 10 §15 A_RR_shell rounding (cert 000565 truly exact)
- .117: re-pin golden PE residuals for 0240 + 6035
- .118: cohort LINE_xx pins → 1e-4 + §15-aware RR test expecteds
- .119: §5 test EPC builder propagates sap_roof_windows
- .120: RdSAP 10 §5.11.4 NI vs explicit-0 roof discriminator
- .121: floor_construction code 4 → "Solid" (basement cert 0712)
- .122: tighten test_ventilation tolerances
- .123: pin Table U5 share-column solar fluxes at exact equality
- .124: tighten dimensions + rating arithmetic pins

Extended handover suite at HEAD `1e69bd39`: 775 pass, 0 fail.

Handover documents:
- HANDOVER_POST_S0380_124.md: full state + cert 0240 hypothesis ranking
- NEXT_AGENT_PROMPT_POST_S0380_124.md: two-task brief (0240 cost-cascade diagnosis + golden-corpus audit awaiting the user's same-property heating-variant Elmhurst fixtures).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Handover — post Slices S0380.115..124
Branch: feature/per-cert-mapper-validation. HEAD 1e69bd39.
Predecessor: HANDOVER_POST_S0380_114.md.
TL;DR
10 spec-cited slices landed on top of cc70e559:
| Slice | Commit | Scope |
|---|---|---|
| S0380.115 | d0268a5b | Fixture ECF pin transcription typo 5.3866 → 5.3868 (PDF line 593) |
| S0380.116 | f2e8b657 | `A_RR_shell` rounded to 2 d.p. per RdSAP 10 §15 (p.66); cert 000565 truly exact |
| S0380.117 | 854b8884 | Re-pin golden PE residuals for 0240 + 6035 |
| S0380.118 | 55a29f5a | Cohort `LINE_xx` pins 0.01/0.1 → 1e-4 + §15-rounded RR test expecteds |
| S0380.119 | a77f1a28 | Propagate `sap_roof_windows` in §5 test EPC builder (closed 000516 lighting) |
| S0380.120 | f0305d54 | Distinguish "NI" string from explicit `int(0)` `roof_insulation_thickness` per RdSAP 10 §5.11.4 |
| S0380.121 | e698fabc | Map `floor_construction` code 4 → "Solid" (basement cert 0712) |
| S0380.122 | 9f0dd645 | Tighten `test_ventilation` tolerances (17 hand-crafted + 10 cohort pins) |
| S0380.123 | 49f87160 | Pin Table U5 share-column solar fluxes at exact equality |
| S0380.124 | 1e69bd39 | Tighten dimensions + rating arithmetic pins |
Extended handover suite at HEAD 1e69bd39: 775 pass, 0 fail.
Cert 000565 is now TRULY EXACT: every SAP-result pin is within 5e-5 of the U985 PDF display value.
Two-task handover for the next agent
Task 1: Close cert 0240's remaining residual
Cert 0240's mapper gap was largely closed by the §5.11.4 fix (Slice 120), but a SAP-rating residual of −10 persists alongside near-zero PE/CO2:
| Pin | Before Slice 120 | After Slice 120 (now) |
|---|---|---|
| `expected_sap_resid` | −14 | −10 |
| `expected_pe_resid_kwh_per_m2` | +12.4933 | +0.0542 |
| `expected_co2_resid_tonnes_per_yr` | +0.6957 | +0.0626 |
PE and CO2 are essentially closed (both residuals below 0.1 in magnitude). A SAP residual of −10 while energy demand and CO2 match means the cascade's total fuel cost exceeds the lodged cost, so the driver is most likely in the fuel-cost / ECF path, not the heat-loss path.
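To see why a cost over-count moves the SAP rating while leaving PE and CO2 untouched, the cost → ECF → SAP step can be sketched as below. This is a minimal sketch, not the cascade's code: it assumes the SAP 10.2 §13 form ECF = deflator × total cost / (TFA + 45), an energy cost deflator of 0.36, and the two-branch rating formula. The helper names are hypothetical, and the constants should be checked against §13 / Table 12 (p.191) before relying on them.

```python
import math

# Assumed SAP 10.2 §13 constants; verify against the spec PDF (p.191) before use.
ENERGY_COST_DEFLATOR = 0.36
ECF_BRANCH_POINT = 3.5


def sap_rating_from_cost(total_cost_gbp: float, tfa_m2: float,
                         deflator: float = ENERGY_COST_DEFLATOR) -> float:
    """Unrounded SAP rating from total annual fuel cost (hypothetical helper)."""
    ecf = deflator * total_cost_gbp / (tfa_m2 + 45.0)
    if ecf >= ECF_BRANCH_POINT:
        return 117.0 - 121.0 * math.log10(ecf)  # logarithmic (high-ECF) branch
    return 100.0 - 13.95 * ecf                  # linear (low-ECF) branch


def cost_for_sap_rating(sap: float, tfa_m2: float,
                        deflator: float = ENERGY_COST_DEFLATOR) -> float:
    """Inverse of sap_rating_from_cost: the total cost that yields a given SAP."""
    ecf = (100.0 - sap) / 13.95              # try the linear branch first
    if ecf >= ECF_BRANCH_POINT:              # out of range: use the log branch
        ecf = 10.0 ** ((117.0 - sap) / 121.0)
    return ecf * (tfa_m2 + 45.0) / deflator
```

For cert 0240 (TFA 202 m²), `cost_for_sap_rating(s - 10, 202.0) - cost_for_sap_rating(s, 202.0)`, with `s` the lodged SAP rating, gives the cost over-count implied by the −10 residual. Lodged ratings are integers, so treat the result as approximate (±0.5 SAP).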
Cert 0240 shape
- Detached house (`property_type=0`, `built_form=1`), TFA 202 m², stone walls
- `walls`: "Sandstone, as built, insulated (assumed)": solid stone
- `roofs`: "Pitched, 400+ mm loft insulation": Table 16 row 400+ → U≈0.11
- `floors`: "Solid, insulated (assumed)": §5.11.4 fired here too
- `main_heating`: "Boiler and radiators, oil": Table 4a oil boiler
- `secondary_heating`: None
- `solar_water_heating`: N
- `photovoltaic_supply`: `none_or_no_details` (no PV)
- `mains_gas`: N (off-grid oil)
- SAP version 10.2
Hypothesis ranking
- Oil tariff routing. The SAP 10.2 Table 12 / RdSAP 10 Table 32 oil price is 7.64 p/kWh. The cascade may be defaulting to a different tariff (e.g. electricity at 13.19 p/kWh) for either main or secondary cost. The cost Δ suggests a ~1.3× over-count, which is consistent with a mis-routed tariff.
- Hot water fuel routing. The same oil boiler supplies hot water. If HW cost is routed via the electricity tariff rather than oil, cost over-counts.
- Off-peak / 7-hour tariff (`meter_type=3`). The cert lodges `meter_type=3` (10-hour off-peak). For an oil-heated dwelling this means oil for heating plus electricity for everything else on a 10-hour off-peak tariff. The cascade may be applying the electricity tariff to oil energy.
- Standing-charge mishandling. Oil has no standing charge; if the cascade adds a gas/electricity standing charge, that is £120/yr, which could account for some of the £420 cost residual.
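The ~1.3× figure in the first hypothesis can be sanity-checked with a simple mixing model. Assuming cost is linear in kWh, ignoring standing charges, and reading ~1.3× as the cascade/lodged total-cost ratio: if a fraction f of the correctly oil-priced cost is instead billed at r = 13.19 / 7.64 times the price, the total scales by 1 + f(r − 1). The helper name is hypothetical.

```python
OIL_P_PER_KWH = 7.64           # oil price quoted in the first hypothesis
ELECTRICITY_P_PER_KWH = 13.19  # electricity price quoted in the first hypothesis


def misrouted_fraction(total_cost_ratio: float,
                       wrong_p_per_kwh: float = ELECTRICITY_P_PER_KWH,
                       right_p_per_kwh: float = OIL_P_PER_KWH) -> float:
    """Fraction f of the correctly priced cost that, billed at the wrong tariff,
    reproduces the observed cascade/lodged total-cost ratio.

    Solves total_cost_ratio = (1 - f) + f * r = 1 + f * (r - 1),
    where r = wrong_p_per_kwh / right_p_per_kwh.
    """
    r = wrong_p_per_kwh / right_p_per_kwh
    return (total_cost_ratio - 1.0) / (r - 1.0)
```

With these prices r ≈ 1.73, so a 1.3× over-count corresponds to roughly 41% of the oil-priced cost being billed at the electricity rate. Comparing f with the hot-water and space-heating shares of the lodged cost helps choose between the first two hypotheses.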
Approach
- Probe the cascade's fuel-cost breakdown for 0240 (`result.intermediate`'s `main_heating_cost_gbp`, `hot_water_cost_gbp`, `pumps_fans_cost_gbp`, `lighting_cost_gbp`, `standing_charges_gbp`).
- Back-solve: compare the cascade's total cost with the lodged cost and identify which sub-component is over-counting.
- Check which oil tariff lookup the cascade uses for this cert. Trace via `cert_to_inputs` → `_cost_per_kwh_for_fuel`.
- Once the gap is localised, write an AAA test, fix per spec, and re-pin `expected_sap_resid` to the new (smaller-magnitude) value.
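The back-solve step can be sketched as a ranking over candidate corrections: apply each hypothesis to the cascade's cost breakdown and see which corrected total lands closest to the lodged cost. This sketch assumes the breakdown is available as a plain dict keyed like the `result.intermediate` fields listed above, and that each component's cost scales linearly with its unit price. `Hypothesis`, `reprice`, `drop` and `rank_hypotheses` are hypothetical names, and the numbers are illustrative, not cert 0240's values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Tuple

Breakdown = Mapping[str, float]
Correction = Callable[[Breakdown], Dict[str, float]]


@dataclass(frozen=True)
class Hypothesis:
    name: str
    correct: Correction  # returns a corrected copy of the breakdown


def reprice(key: str, wrong_p_per_kwh: float, right_p_per_kwh: float) -> Correction:
    """Correction: re-price one component from the wrong tariff to the right one."""
    def _fix(b: Breakdown) -> Dict[str, float]:
        out = dict(b)
        out[key] = b[key] * right_p_per_kwh / wrong_p_per_kwh
        return out
    return _fix


def drop(key: str) -> Correction:
    """Correction: remove a component the cascade should not have charged."""
    def _fix(b: Breakdown) -> Dict[str, float]:
        out = dict(b)
        out[key] = 0.0
        return out
    return _fix


def rank_hypotheses(breakdown: Breakdown, lodged_total_gbp: float,
                    hypotheses: List[Hypothesis]) -> List[Tuple[str, float]]:
    """Rank hypotheses by |corrected total - lodged total|, smallest gap first."""
    scored = [(h.name, abs(sum(h.correct(breakdown).values()) - lodged_total_gbp))
              for h in hypotheses]
    return sorted(scored, key=lambda item: item[1])


# Illustrative numbers only (NOT cert 0240's values): hot water wrongly billed
# at 13.19 p/kWh instead of oil's 7.64 p/kWh.
example = {
    "main_heating_cost_gbp": 1000.0,
    "hot_water_cost_gbp": 400.0,
    "pumps_fans_cost_gbp": 50.0,
    "lighting_cost_gbp": 100.0,
    "standing_charges_gbp": 120.0,
}
lodged_total = 1000.0 + 400.0 * 7.64 / 13.19 + 50.0 + 100.0 + 120.0
ranked = rank_hypotheses(example, lodged_total, [
    Hypothesis("main heating at electricity tariff",
               reprice("main_heating_cost_gbp", 13.19, 7.64)),
    Hypothesis("hot water at electricity tariff",
               reprice("hot_water_cost_gbp", 13.19, 7.64)),
    Hypothesis("spurious standing charge", drop("standing_charges_gbp")),
])
print(ranked[0][0])  # hot water at electricity tariff
```

A near-zero gap for a single hypothesis localises the over-count; if no single correction closes the gap, combinations (e.g. hot-water re-pricing plus a dropped standing charge) are the next thing to score.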
Task 2: Audit golden corpus for fixture-coverage gaps
The user has supplied additional Elmhurst Summary + worksheet PDFs for the same property with multiple different heating systems. These will help cover shape gaps the current cohort doesn't exercise.
Why the residuals matter
Top remaining golden-corpus residuals (post-Slice 120):
| Cert | SAP res | PE res (kWh/m²) | CO2 res (t/yr) | Shape |
|---|---|---|---|---|
| 0240-0200-5706-2365-8010 | −10 | +0.054 | +0.063 | Detached stone, oil boiler, TFA 202 — task 1 above |
| 0390-2954-3640-2196-4175 | −6 | −26.4 | −2.55 | TFA 360, oil + (?) PV cert |
| 6035-7729-2309-0879-2296 | −6 | +46.1 | +1.05 | TFA 128 mid-terrace age A, gas combi |
| 7536-3827-0600-0600-0276 | +1 | −7.08 | −0.19 | Gas combi |
| 2130-1033-4050-5007-8395 | +1 | −7.50 | −0.05 | Gas combi + PV |
All other cohort-2 certs sit at a SAP residual of 0, with PE/CO2 residuals below 1.
The biggest residuals (6035 at +46 PE, 0390 at −26 PE) are documented mapper gaps in each cert's `notes:` field. Each is a real cascade-vs-API divergence that needs a PDF reference (Summary + worksheet) to diagnose.
Why deterministic-cohort fixtures help
The 6 cohort fixtures (000474..000516) + 000565 are the only certs pinned at PDF-exact precision (abs=1e-4 against U985 PDF line refs). The golden corpus is pinned at the calc-vs-API-lodged residual, which means we accept whatever residual the cascade produces and pin against it. Closing those residuals requires:
- Source-of-truth worksheet PDF for the cert (currently we don't have one for 0390, 6035, etc.)
- Identify per-section cascade drift line-by-line
- Implement the missing spec rule
- Re-pin the smaller residual
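The two pin styles can be contrasted in a pair of tests that use literal `# Arrange` / `# Act` / `# Assert` headers and plain `abs(x - y) <= tol` comparisons. The calculator calls are replaced by stubs, every constant is illustrative, and the sign convention (residual = calculated − lodged) is an assumption.

```python
# Hypothetical stand-ins: in the repo these values would come from the
# extractor -> mapper -> calculator chain and from the U985 worksheet PDF.
PDF_LINE_VALUE = 123.4567   # value printed on a worksheet line (illustrative)
LODGED_API_SAP = 64         # SAP rating lodged with the API (illustrative)
EXPECTED_SAP_RESID = -2     # golden-corpus residual pin (illustrative)
ABS_TOL = 1e-4


def run_cascade_line() -> float:
    return 123.45672  # stub for one calculator line result


def run_cascade_sap() -> int:
    return 62  # stub for the cascade's rounded SAP rating


def test_cohort_line_matches_worksheet_pdf() -> None:
    # Arrange
    expected = PDF_LINE_VALUE
    # Act
    actual = run_cascade_line()
    # Assert: PDF-exact pin, absolute difference only (no rel tolerance, no xfail)
    assert abs(actual - expected) <= ABS_TOL


def test_golden_cert_residual_is_pinned() -> None:
    # Arrange
    lodged = LODGED_API_SAP
    # Act
    residual = run_cascade_sap() - lodged  # assumed convention: calc - lodged
    # Assert: residual pin; a fix should move EXPECTED_SAP_RESID toward zero
    assert residual == EXPECTED_SAP_RESID
```

Only the first style can reach PDF-exact precision; the second merely records the current residual, which is why closing it needs the worksheet PDF.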
The user's incoming Elmhurst worksheets (same property, multiple heating systems) will fill specific shape gaps. Specifically: same envelope but different heating → isolates the heating-cascade impact on SAP / PE / CO2 per fuel type. This is exactly the controlled-variable test we need to pin oil / heat-pump / electric / heat-network cascades against PDF precision rather than API residual.
Approach
- Wait for the user's new fixtures. Drop them into `backend/documents_parser/tests/fixtures/` (Summary PDFs) and `sap worksheets/` (U985 worksheet PDFs).
- For each variant (same property × different heating), run extractor → mapper → calculator and pin against the worksheet PDF.
- The first cert is the e2e baseline; subsequent certs share the envelope, so cascade differences localise to the heating subsystem only.
- Each variant becomes a new mapper-driven fixture (mirror of the `_elmhurst_worksheet_000565.py` pattern).
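Because the envelope is shared, each variant can be compared with the baseline by simple differencing, and the cascade's deltas can then be checked against the worksheets' deltas. A minimal sketch, assuming each run has been reduced to a dict of headline metrics; `heating_deltas`, `drift` and the metric keys are hypothetical.

```python
from typing import Dict, Mapping

# Hypothetical metric keys, e.g. {"sap": ..., "pe": ..., "co2": ...}
Metrics = Mapping[str, float]
Deltas = Dict[str, Dict[str, float]]


def heating_deltas(baseline: Metrics, variants: Mapping[str, Metrics]) -> Deltas:
    """Per-variant metric deltas vs the baseline variant.

    With a shared envelope, each delta is attributable to the heating subsystem.
    """
    return {
        name: {key: metrics[key] - baseline[key] for key in baseline}
        for name, metrics in variants.items()
    }


def drift(cascade: Deltas, worksheet: Deltas) -> Deltas:
    """Cascade-minus-worksheet difference of the heating deltas, per variant.

    A non-zero entry points at the heating cascade for that variant's fuel type.
    """
    return {
        name: {key: cascade[name][key] - worksheet[name][key] for key in worksheet[name]}
        for name in worksheet
    }
```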
Test baseline at HEAD 1e69bd39
```bash
PYTHONPATH=/workspaces/model python -m pytest \
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
backend/documents_parser/tests/test_elmhurst_extractor.py \
backend/documents_parser/tests/test_elmhurst_end_to_end.py \
domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
domain/sap10_calculator/worksheet/tests/test_heat_transmission.py \
domain/sap10_calculator/worksheet/tests/test_internal_gains.py \
domain/sap10_calculator/worksheet/tests/test_solar_gains.py \
domain/sap10_calculator/worksheet/tests/test_dimensions.py \
domain/sap10_calculator/worksheet/tests/test_rating.py \
domain/sap10_calculator/worksheet/tests/test_ventilation.py \
domain/sap10_calculator/worksheet/tests/test_appendix_h_solar.py \
domain/sap10_calculator/worksheet/tests/test_mev.py \
domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
domain/sap10_calculator/tests/test_pcdb_table_322_lookup.py \
domain/sap10_calculator/tests/test_pcdb_table_329_lookup.py \
--no-cov -q
```
Expected: 775 pass, 0 fail.
Memories to load (in order)
- `project_cert_000565_recovery_state`: full per-slice history at HEAD `1e69bd39`
- `feedback_sap_10_2_only_never_10_3`: CRITICAL, never reference SAP 10.3
- `feedback_spec_citation_in_commits`: quote spec text + page in commits
- `feedback_verify_handover_claims`: verify numeric claims against PDFs; apply the same skepticism to handover narratives
- `feedback_zero_error_strict`: pyright net-zero per touched file
- `feedback_commit_per_slice`: one slice = one commit
- `feedback_aaa_test_convention`: literal `# Arrange` / `# Act` / `# Assert` headers
- `feedback_e2e_validation_philosophy`: abs=1e-4 pins, no rel/xfail
- `feedback_abs_diff_over_pytest_approx`: use `abs(x-y) <= tol` for new tests
- `feedback_spec_floor_skepticism`: verify "precision floor" claims against PDFs
- `feedback_golden_residuals_near_zero`: pins should shrink toward zero
- `feedback_worksheet_not_api_reference`: the worksheet PDF is the source of truth, not the API EPC
- `reference_unmapped_sap_code`: calculator strict-raise pattern
- `reference_unmapped_api_code`: mapper strict-raise pattern
- `project_sap10_ml_deprecation`: `domain/sap10_ml/` is retiring
Spec source quick-reference
All under `domain/sap10_calculator/docs/specs/`:
- SAP 10.2 full spec: `sap-10-2-full-specification-2025-03-14.pdf`
  - §13 + Table 12 (p.191): fuel cost / ECF / SAP rating
  - Appendix N (p.101-107): heat pumps
- RdSAP 10 spec: `RdSAP 10 Specification 10-06-2025.pdf`
  - §5.11.4 (p.44): retrofit roof insulation (closed in Slice 120)
  - §15 (p.66): rounding rules (closed in Slice 116)
  - §19 Table 32 (p.95): RdSAP10 fuel prices / CO2 / PE factors
- SAP 10.3 at `sap-10-3-full-specification-2026-01-13.pdf`: DO NOT reference (`feedback_sap_10_2_only_never_10_3`)
Standard workflow per slice
- Read spec page + identify rule
- Probe cascade vs lodged values; back-solve hypothesis
- Write failing AAA test
- Implement helper / cascade change
- Verify test passes
- Run handover suite (above command)
- Check pyright on touched files: net-zero from baseline (`git stash` + re-run pyright)
- Commit with spec citation + verbatim quote + `Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>`
- Update `project_cert_000565_recovery_state` (rename if pivoting away) + `MEMORY.md` index
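The pyright net-zero check can be scripted by comparing two JSON reports, one captured on the stashed baseline and one on the working tree. This sketch assumes pyright's `--outputjson` report with per-diagnostic `file` and `severity` fields under `generalDiagnostics`; the helper names are hypothetical, the capture commands appear only as comments, and `touched_files` must use the same path form that pyright reports.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterable, List

# Capturing the two reports (shell), for example:
#   git stash       # then: pyright --outputjson <touched files> > baseline.json
#   git stash pop   # then: pyright --outputjson <touched files> > current.json


def load_report(path: Path) -> Dict[str, Any]:
    """Read a pyright JSON report from disk."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def errors_per_file(report: Dict[str, Any]) -> "Counter[str]":
    """Count error-severity diagnostics per file."""
    return Counter(
        diag["file"]
        for diag in report.get("generalDiagnostics", [])
        if diag.get("severity") == "error"
    )


def regressions(baseline: Dict[str, Any], current: Dict[str, Any],
                touched_files: Iterable[str]) -> List[str]:
    """Touched files whose error count rose above the baseline (should be empty)."""
    before, after = errors_per_file(baseline), errors_per_file(current)
    return [f for f in touched_files if after[f] > before[f]]
```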
What NOT to do
- Don't reference SAP 10.3 — track 10.2 deliberately
- Don't widen pin tolerances to make pins pass — find the bug
- Don't re-investigate any closed work (Slices .91..124) — all settled
- Don't add new helpers to `domain/sap10_ml/`; it is on the deprecation path
- Don't trust handover numeric claims without verifying against source PDF
- Don't accept "spec-precision floor" framing without spec-citation work
Where to put new Elmhurst fixtures
When the user supplies the new worksheets:
- Summary PDFs → `backend/documents_parser/tests/fixtures/Summary_<refno>.pdf`
- U985 worksheet PDFs → `sap worksheets/<source-folder>/U985-0001-<refno>.pdf`
- Per-cert fixture module → `domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_<refno>.py` (mirror the `_elmhurst_worksheet_000565.py` shape: mapper-driven `build_epc()`)
- Add to `_FIXTURE_PINS` + `_FIXTURE_MODULES` in `domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py`
- AAA tests for any new mapper gaps go in `backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`
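A small helper can derive the expected paths for a new fixture set from its reference number, which makes it easy to check that nothing was missed before committing. The function names and the `repo_root` parameter are hypothetical; the path templates are the ones listed above, with `<source-folder>` left as a caller-supplied argument.

```python
from pathlib import Path
from typing import Dict


def fixture_paths(refno: str, source_folder: str,
                  repo_root: Path = Path(".")) -> Dict[str, Path]:
    """Expected locations for one new Elmhurst fixture set (handover layout)."""
    return {
        "summary_pdf": repo_root / "backend/documents_parser/tests/fixtures"
        / f"Summary_{refno}.pdf",
        "worksheet_pdf": repo_root / "sap worksheets" / source_folder
        / f"U985-0001-{refno}.pdf",
        "fixture_module": repo_root / "domain/sap10_calculator/worksheet/tests"
        / f"_elmhurst_worksheet_{refno}.py",
    }


def missing(paths: Dict[str, Path]) -> Dict[str, Path]:
    """Subset of the expected paths that do not exist yet."""
    return {key: path for key, path in paths.items() if not path.exists()}
```

Registering the new module in `_FIXTURE_PINS` / `_FIXTURE_MODULES` still has to be done by hand in `test_e2e_elmhurst_sap_score.py`.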
The user's "same property, multiple heating systems" pattern is ideal: the envelope stays constant across variants, so any SAP/PE/CO2 difference is fully attributable to the heating cascade. That's the cleanest possible test vector for heating-section diagnostics.
Good luck.