Model/domain/sap10_calculator/docs/HANDOVER_POST_S0380_124.md
Khalim Conn-Kowlessar 8904ec090b docs: handover + next-agent prompt post S0380.115..124
10 spec-cited slices closed this session:
  .115 — fixture ECF pin typo
  .116 — RdSAP 10 §15 A_RR_shell rounding (cert 000565 truly exact)
  .117 — re-pin golden PE residuals for 0240 + 6035
  .118 — cohort LINE_xx pins → 1e-4 + §15-aware RR test expecteds
  .119 — §5 test EPC builder propagates sap_roof_windows
  .120 — RdSAP 10 §5.11.4 NI vs explicit-0 roof discriminator
  .121 — floor_construction code 4 → "Solid" (basement cert 0712)
  .122 — tighten test_ventilation tolerances
  .123 — pin Table U5 share-column solar fluxes at exact equality
  .124 — tighten dimensions + rating arithmetic pins

Extended handover suite at HEAD `1e69bd39`: 775 pass, 0 fail.

Handover documents:
- HANDOVER_POST_S0380_124.md — full state + cert 0240 hypothesis ranking
- NEXT_AGENT_PROMPT_POST_S0380_124.md — two-task brief (0240 cost-cascade
  diagnosis + golden-corpus audit awaiting user's same-property
  heating-variant Elmhurst fixtures).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 22:15:22 +00:00


Handover — post Slices S0380.115..124

Branch: feature/per-cert-mapper-validation. HEAD 1e69bd39. Predecessor: HANDOVER_POST_S0380_114.md.

TL;DR

10 spec-cited slices landed on top of cc70e559:

| Slice | Commit | Scope |
|---|---|---|
| S0380.115 | d0268a5b | Fixture ECF pin transcription typo 5.3866 → 5.3868 (PDF line 593) |
| S0380.116 | f2e8b657 | A_RR_shell rounded to 2 d.p. per RdSAP 10 §15 (p.66); cert 000565 truly exact |
| S0380.117 | 854b8884 | Re-pin golden PE residuals for 0240 + 6035 |
| S0380.118 | 55a29f5a | Cohort LINE_xx pins 0.01/0.1 → 1e-4 + §15-rounded RR test expecteds |
| S0380.119 | a77f1a28 | Propagate sap_roof_windows in §5 test EPC builder (closed 000516 lighting) |
| S0380.120 | f0305d54 | Distinguish "NI" string from explicit int(0) roof_insulation_thickness per RdSAP 10 §5.11.4 |
| S0380.121 | e698fabc | Map floor_construction code 4 → "Solid" (basement cert 0712) |
| S0380.122 | 9f0dd645 | Tighten test_ventilation tolerances (17 hand-crafted + 10 cohort pins) |
| S0380.123 | 49f87160 | Pin Table U5 share-column solar fluxes at exact equality |
| S0380.124 | 1e69bd39 | Tighten dimensions + rating arithmetic pins |

Extended handover suite at HEAD 1e69bd39: 775 pass, 0 fail.

Cert 000565 is now truly exact: every SAP-result pin is within 5e-5 of the U985 PDF display.

Two-task handover for the next agent

Task 1: Close cert 0240's remaining residual

Cert 0240's mapper gap was largely closed by the §5.11.4 fix (Slice 120), but a SAP-rating residual of 10 persists alongside near-zero PE/CO2:

| Pin | Before Slice 120 | After Slice 120 (now) |
|---|---|---|
| expected_sap_resid | 14 | 10 |
| expected_pe_resid_kwh_per_m2 | +12.4933 | +0.0542 |
| expected_co2_resid_tonnes_per_yr | +0.6957 | +0.0626 |

PE and CO2 are essentially closed (both below 0.1 in magnitude). A SAP residual of 10 with matching energy demand and CO2 means the cascade's total cost exceeds the lodged cost, so the driver is in the fuel-cost / ECF path, not the heat-loss path.

Cert 0240 shape

  • Detached house (property_type=0, built_form=1), TFA 202 m², stone walls
  • walls: "Sandstone, as built, insulated (assumed)" — solid stone
  • roofs: "Pitched, 400+ mm loft insulation" — Table 16 row 400+ → U≈0.11
  • floors: "Solid, insulated (assumed)" — §5.11.4 fired here too
  • main_heating: "Boiler and radiators, oil" — Table 4a oil boiler
  • secondary_heating: None
  • solar_water_heating: N
  • photovoltaic_supply: none_or_no_details (no PV)
  • mains_gas: N (off-grid oil)
  • SAP version 10.2

Hypothesis ranking

  1. Oil tariff routing. SAP 10.2 Table 12 / RdSAP10 Table 32 oil price is 7.64 p/kWh. The cascade may be defaulting to a different tariff (e.g. electricity at 13.19 p/kWh) for either main or secondary cost. The cost Δ suggests a ~1.3× over-count, which is consistent with a mis-routed tariff. The full electricity-to-oil price ratio is 13.19 / 7.64 ≈ 1.73, so ~1.3× would point to only part of the energy being mis-priced.
  2. Hot water fuel routing. The same oil boiler provides hot water. If hot-water cost is routed via the electricity tariff rather than oil, the cost is over-counted.
  3. Off-peak tariff (meter_type=3). The cert lodges meter_type=3 (10-hour off-peak). For an oil-heated dwelling this means oil for heating plus electricity for everything else on a 10-hour off-peak tariff. The cascade may be applying the electricity tariff to oil energy.
  4. Standing-charge mishandling. Oil has no standing charge; if the cascade adds a gas or electricity standing charge, that's £120/yr, which could account for part of the £420 cost residual.
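
The £420 cost residual quoted in hypothesis 4 and the SAP residual of 10 can be cross-checked with the ECF arithmetic. The sketch below is not the cascade's code: the function names are hypothetical, and the constants (deflator 0.36, linear branch 100 − 16.21 × ECF below an ECF of 3.5, log branch 108.8 − 120.5 × log10(ECF) otherwise) are written from memory of SAP 10.2 §13 and must be verified against the spec PDF. It also assumes cert 0240 sits on the linear branch; on the log branch the conversion differs.

```python
import math

# Constants as recalled from SAP 10.2 section 13 (ASSUMPTION: verify in the PDF).
ENERGY_COST_DEFLATOR = 0.36
ECF_BRANCH_POINT = 3.5


def energy_cost_factor(total_cost_gbp: float, tfa_m2: float) -> float:
    """ECF = deflator x total annual cost / (TFA + 45)."""
    return ENERGY_COST_DEFLATOR * total_cost_gbp / (tfa_m2 + 45.0)


def sap_rating(ecf: float) -> float:
    """Unrounded SAP rating from the ECF (log branch at or above 3.5)."""
    if ecf >= ECF_BRANCH_POINT:
        return 108.8 - 120.5 * math.log10(ecf)
    return 100.0 - 16.21 * ecf


def sap_points_lost(cost_overcount_gbp: float, tfa_m2: float) -> float:
    """SAP points lost to a cost over-count; valid on the linear branch only."""
    return 16.21 * energy_cost_factor(cost_overcount_gbp, tfa_m2)


# Cert 0240: TFA 202 m2, cost residual of 420 GBP (both from this handover).
print(round(sap_points_lost(420.0, 202.0), 2))  # 9.92
```

Under those assumptions a £420 over-count at TFA 202 m² is worth about 9.9 SAP points, which lines up with the residual of 10 and supports treating the whole gap as a cost-path problem.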

Approach

  1. Probe cascade's fuel-cost breakdown for 0240 (result.intermediate's main_heating_cost_gbp, hot_water_cost_gbp, pumps_fans_cost_gbp, lighting_cost_gbp, standing_charges_gbp).
  2. Back-solve: with cascade total cost vs lodged cost, identify which sub-component is over-counting.
  3. Check what oil tariff lookup the cascade uses for this cert. Trace via cert_to_inputs_cost_per_kwh_for_fuel.
  4. Once the gap is localised, write an AAA test, fix per spec, re-pin expected_sap_resid to the new (smaller-magnitude) value.
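
Step 2 can be sketched as a re-pricing loop: take each cost component in turn, re-price it as if it had been charged at the electricity rate instead of the oil rate, and rank components by how much of the gap to the lodged total remains. This is a minimal sketch, not the cascade's API. `rank_misrouted_components` is a hypothetical helper, the dictionary keys are the field names listed in step 1, the two prices are the ones quoted in hypothesis 1, and the numbers in the usage lines are placeholders, not cert 0240 values.

```python
OIL_P_PER_KWH = 7.64           # quoted in hypothesis 1
ELECTRICITY_P_PER_KWH = 13.19  # quoted in hypothesis 1


def rank_misrouted_components(
    breakdown_gbp: dict[str, float],
    lodged_total_gbp: float,
    charged_rate: float = ELECTRICITY_P_PER_KWH,
    correct_rate: float = OIL_P_PER_KWH,
) -> list[tuple[str, float]]:
    """Re-price one component at a time; rank by remaining |gap| to lodged."""
    cascade_total = sum(breakdown_gbp.values())
    ranked = []
    for name, cost in breakdown_gbp.items():
        # Swap this component from the charged rate to the correct rate.
        repriced_total = cascade_total - cost + cost * correct_rate / charged_rate
        ranked.append((name, abs(repriced_total - lodged_total_gbp)))
    return sorted(ranked, key=lambda item: item[1])


# Placeholder inputs only (NOT cert 0240 data).
breakdown = {
    "main_heating_cost_gbp": 1000.0,
    "hot_water_cost_gbp": 400.0,
    "lighting_cost_gbp": 100.0,
}
lodged_total = 1000.0 + 100.0 + 400.0 * 7.64 / 13.19
for name, gap in rank_misrouted_components(breakdown, lodged_total):
    print(f"{name}: remaining gap {gap:.2f} GBP")
```

The component whose re-pricing leaves the smallest gap is the first candidate for the tariff trace in step 3. A single-component swap will not close the gap if two components are mis-routed at once, so a poor best fit is itself a signal.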

Task 2: Audit golden corpus for fixture-coverage gaps

The user has supplied additional Elmhurst Summary + worksheet PDFs for the same property with multiple different heating systems. These will help cover shape gaps the current cohort doesn't exercise.

Why the residuals matter

Top remaining golden-corpus residuals (post-Slice 120):

| Cert | SAP res | PE res (kWh/m²) | CO2 res (t/yr) | Shape |
|---|---|---|---|---|
| 0240-0200-5706-2365-8010 | 10 | +0.054 | +0.063 | Detached stone, oil boiler, TFA 202 (Task 1 above) |
| 0390-2954-3640-2196-4175 | 6 | 26.4 | 2.55 | TFA 360, oil + (?) PV cert |
| 6035-7729-2309-0879-2296 | 6 | +46.1 | +1.05 | TFA 128 mid-terrace age A, gas combi |
| 7536-3827-0600-0600-0276 | +1 | 7.08 | 0.19 | Gas combi |
| 2130-1033-4050-5007-8395 | +1 | 7.50 | 0.05 | Gas combi + PV |

All other cohort-2 certs sit at a SAP residual of 0 with PE/CO2 residuals below 1.

The biggest residuals (6035 at +46 PE, 0390 at 26 PE) are documented as mapper gaps in each cert's notes: field. Each is a real cascade-vs-API divergence that needs a PDF reference (Summary + worksheet) to diagnose.

Why deterministic-cohort fixtures help

The 6 cohort fixtures (000474..000516) + 000565 are the only certs pinned at PDF-exact precision (abs=1e-4 against U985 PDF line refs). The golden corpus is pinned at the calc-vs-API-lodged residual, which means we accept whatever residual the cascade produces and pin against it. Closing those residuals requires:

  1. Source-of-truth worksheet PDF for the cert (currently we don't have one for 0390, 6035, etc.)
  2. Identify per-section cascade drift line-by-line
  3. Implement the missing spec rule
  4. Re-pin the smaller residual

The user's incoming Elmhurst worksheets (same property, multiple heating systems) will fill specific shape gaps. Specifically: same envelope but different heating → isolates the heating-cascade impact on SAP / PE / CO2 per fuel type. This is exactly the controlled-variable test we need to pin oil / heat-pump / electric / heat-network cascades against PDF precision rather than API residual.

Approach

  1. Wait for the user's new fixtures. Drop them into backend/documents_parser/tests/fixtures/ (Summary PDFs) and sap worksheets/ (U985 worksheet PDFs).
  2. For each variant (same property × different heating), run extractor → mapper → calculator and pin against the worksheet PDF.
  3. The first cert is the e2e baseline; subsequent certs share the envelope so cascade differences localise to the heating subsystem only.
  4. Each variant becomes a new mapper-driven fixture (mirror of _elmhurst_worksheet_000565.py pattern).
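
The controlled-variable idea in steps 2 and 3 can be sketched as two small checks: pin each variant against its own worksheet at abs=1e-4, then difference each variant against the baseline so that only heating-driven changes remain. This is a minimal sketch under stated assumptions: `Outputs`, `failing_pins` and `heating_deltas` are hypothetical names, not existing helpers in the repo, and the three metrics are simply the SAP / PE / CO2 trio used throughout this handover.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Outputs:
    """Headline results for one heating variant (hypothetical container)."""
    sap: float
    pe_kwh_per_m2: float
    co2_tonnes_per_yr: float


def failing_pins(calc: Outputs, worksheet: Outputs, tol: float = 1e-4) -> list[str]:
    """Names of metrics where |calc - worksheet| exceeds the pin tolerance."""
    expected = asdict(worksheet)
    return [
        name
        for name, value in asdict(calc).items()
        if abs(value - expected[name]) > tol
    ]


def heating_deltas(baseline: Outputs, variant: Outputs) -> dict[str, float]:
    """Per-metric change from baseline; the envelope is shared, so this is heating only."""
    base = asdict(baseline)
    return {name: value - base[name] for name, value in asdict(variant).items()}
```

Running `heating_deltas` on both the calculated and the worksheet outputs and comparing the two delta sets shows whether a drift belongs to the heating subsystem or was already present in the baseline envelope.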

Test baseline at HEAD 1e69bd39

PYTHONPATH=/workspaces/model python -m pytest \
    backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
    backend/documents_parser/tests/test_elmhurst_extractor.py \
    backend/documents_parser/tests/test_elmhurst_end_to_end.py \
    domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
    domain/sap10_calculator/worksheet/tests/test_heat_transmission.py \
    domain/sap10_calculator/worksheet/tests/test_internal_gains.py \
    domain/sap10_calculator/worksheet/tests/test_solar_gains.py \
    domain/sap10_calculator/worksheet/tests/test_dimensions.py \
    domain/sap10_calculator/worksheet/tests/test_rating.py \
    domain/sap10_calculator/worksheet/tests/test_ventilation.py \
    domain/sap10_calculator/worksheet/tests/test_appendix_h_solar.py \
    domain/sap10_calculator/worksheet/tests/test_mev.py \
    domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
    domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
    domain/sap10_calculator/tests/test_pcdb_table_322_lookup.py \
    domain/sap10_calculator/tests/test_pcdb_table_329_lookup.py \
    --no-cov -q

Expected: 775 pass, 0 fail.

Memories to load (in order)

  1. project_cert_000565_recovery_state — full per-slice history at HEAD 1e69bd39
  2. feedback_sap_10_2_only_never_10_3 — CRITICAL: never reference SAP 10.3
  3. feedback_spec_citation_in_commits — quote spec text + page in commits
  4. feedback_verify_handover_claims — verify numeric claims against PDFs
  5. feedback_zero_error_strict — pyright net-zero per touched file
  6. feedback_commit_per_slice — one slice = one commit
  7. feedback_aaa_test_convention — literal # Arrange / # Act / # Assert headers
  8. feedback_e2e_validation_philosophy — abs=1e-4 pins, no rel/xfail
  9. feedback_abs_diff_over_pytest_approx — use abs(x-y) <= tol for new tests
  10. feedback_spec_floor_skepticism — verify "precision floor" claims against PDFs
  11. feedback_verify_handover_claims — same skepticism for handover narratives
  12. feedback_golden_residuals_near_zero — pins should shrink toward zero
  13. feedback_worksheet_not_api_reference — worksheet PDF is source of truth, not API EPC
  14. reference_unmapped_sap_code — calculator strict-raise pattern
  15. reference_unmapped_api_code — mapper strict-raise pattern
  16. project_sap10_ml_deprecation — domain/sap10_ml/ is retiring

Spec source quick-reference

All under domain/sap10_calculator/docs/specs/:

  • SAP 10.2 full spec: sap-10-2-full-specification-2025-03-14.pdf
    • §13 + Table 12 (p.191) — fuel cost / ECF / SAP rating
    • Appendix N (p.101-107) — heat pumps
  • RdSAP 10 spec: RdSAP 10 Specification 10-06-2025.pdf
    • §5.11.4 (p.44) — retrofit roof insulation (closed in Slice 120)
    • §15 (p.66) — rounding rules (closed in Slice 116)
    • §19 Table 32 (p.95) — RdSAP10 fuel prices / CO2 / PE factors
  • SAP 10.3 at sap-10-3-full-specification-2026-01-13.pdf: DO NOT reference (feedback_sap_10_2_only_never_10_3)

Standard workflow per slice

  1. Read spec page + identify rule
  2. Probe cascade vs lodged values; back-solve hypothesis
  3. Write failing AAA test
  4. Implement helper / cascade change
  5. Verify test passes
  6. Run handover suite (above command)
  7. Check pyright on touched files — net-zero from baseline (git stash + re-run pyright)
  8. Commit with spec citation + verbatim quote + Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  9. Update project_cert_000565_recovery_state (rename if pivoting away) + MEMORY.md index
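
Steps 3 to 5, combined with the AAA, abs-diff and strict-raise conventions from the memory list, could look like the sketch below. Everything in it is illustrative: `price_for_fuel`, `UnmappedFuelError` and the price table are hypothetical stand-ins rather than the repo's helpers, the two prices are the ones quoted under Task 1, and the 10,000 kWh demand is a placeholder, not a cert value. The test functions are plain functions so they run under pytest or by direct call.

```python
# Prices quoted under Task 1 (p/kWh); the table itself is a hypothetical stand-in.
FUEL_PRICE_P_PER_KWH = {"oil": 7.64, "electricity": 13.19}


class UnmappedFuelError(KeyError):
    """Hypothetical strict-raise error for fuels with no price entry."""


def price_for_fuel(fuel: str) -> float:
    """Return the price for a fuel, raising instead of silently defaulting."""
    try:
        return FUEL_PRICE_P_PER_KWH[fuel]
    except KeyError:
        raise UnmappedFuelError(fuel) from None


def test_oil_main_heating_is_priced_at_oil_rate() -> None:
    # Arrange
    annual_kwh = 10_000.0  # placeholder demand, not a cert value
    expected_cost_gbp = 764.0

    # Act
    cost_gbp = annual_kwh * price_for_fuel("oil") / 100.0

    # Assert
    assert abs(cost_gbp - expected_cost_gbp) <= 1e-4


def test_unmapped_fuel_raises() -> None:
    # Arrange
    fuel = "peat"

    # Act
    try:
        price_for_fuel(fuel)
        raised = False
    except UnmappedFuelError:
        raised = True

    # Assert
    assert raised
```

Raising on an unmapped fuel rather than falling back to a default is what makes a mis-routed tariff show up as a test failure instead of a silent cost residual.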

What NOT to do

  • Don't reference SAP 10.3 — track 10.2 deliberately
  • Don't widen pin tolerances to make pins pass — find the bug
  • Don't re-investigate any closed work (Slices .91..124) — all settled
  • Don't add new helpers to domain/sap10_ml/ — on the deprecation path
  • Don't trust handover numeric claims without verifying against source PDF
  • Don't accept "spec-precision floor" framing without spec-citation work

Where to put new Elmhurst fixtures

When the user supplies the new worksheets:

  • Summary PDFs → backend/documents_parser/tests/fixtures/Summary_<refno>.pdf
  • U985 worksheet PDFs → sap worksheets/<source-folder>/U985-0001-<refno>.pdf
  • Per-cert fixture module → domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_<refno>.py (mirror _elmhurst_worksheet_000565.py shape — mapper-driven build_epc())
  • Add to _FIXTURE_PINS + _FIXTURE_MODULES in domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py
  • AAA tests for any new mapper gaps go in backend/documents_parser/tests/test_summary_pdf_mapper_chain.py

The user's "same property, multiple heating systems" pattern is ideal: the envelope stays constant across variants, so any SAP/PE/CO2 difference is fully attributable to the heating cascade. That's the cleanest possible test vector for heating-section diagnostics.

Good luck.