From 38e6d18a13853f237928671949a244ffbdfcbeb8 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Sun, 31 May 2026 09:05:24 +0000
Subject: [PATCH] docs: handover + next-agent prompt post S0380.125..130

Captures the heating-systems corpus closure work, the new permanent
residual-pin regression test, and the queued S0380.131 candidate
(heating-oil unit price spec-vs-worksheet divergence).

Co-Authored-By: Claude Opus 4.7
---
 .../docs/HANDOVER_POST_S0380_130.md           | 253 ++++++++++++++++++
 .../docs/NEXT_AGENT_PROMPT_POST_S0380_130.md  | 197 ++++++++++++++
 2 files changed, 450 insertions(+)
 create mode 100644 domain/sap10_calculator/docs/HANDOVER_POST_S0380_130.md
 create mode 100644 domain/sap10_calculator/docs/NEXT_AGENT_PROMPT_POST_S0380_130.md

diff --git a/domain/sap10_calculator/docs/HANDOVER_POST_S0380_130.md b/domain/sap10_calculator/docs/HANDOVER_POST_S0380_130.md
new file mode 100644
index 00000000..7f3e3674
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_POST_S0380_130.md
@@ -0,0 +1,253 @@
+# Handover — post Slices S0380.125..130
+
+Branch: `feature/per-cert-mapper-validation`. **HEAD `c8486077`**.
+Predecessor: [`HANDOVER_POST_S0380_124.md`](HANDOVER_POST_S0380_124.md).
+
+## TL;DR
+
+Six slices landed on top of `8904ec09`. The user pivoted away from
+cert 0240's residual closure and into a new controlled-variable
+heating-systems corpus (1 property × 41 heating variants). All 41
+now cascade-execute; permanent residual-pin regression test landed;
+investigation surfaced a heating-oil unit-price discrepancy between
+the published RdSAP 10 spec PDF (7.64 p/kWh) and the
+operationally-canonical Elmhurst worksheet + gov.uk register values
+(5.44 p/kWh).
+
+| Slice | Commit | Scope |
+|---|---|---|
+| **S0380.125** | `d8cdee4e` | meter_type "18 Hour" alias per RdSAP 10 §17 + §12 |
+| **S0380.126** | `e25aa021` | bare "Underfloor Heating" → §10.11 Table 29 subtype derivation |
+| **S0380.127** | `11ecac94` | "No Access" cylinder → Table 28 derivation (oil HW + off-peak meter) |
+| **S0380.128** | `729ee29c` | extractor §14.0 closure falls back to "14.1 Community Heating" |
+| **S0380.129** | `82b8a16b` | permanent residual-pin regression guard (41 parametrised) |
+| **S0380.130** | `c8486077` | Elmhurst oil-mains routed via §15.0 Water Heating Fuel Type fallback |
+
+Extended handover suite at HEAD: **874 pass, 0 fail**.
+
+## What changed
+
+### The corpus
+
+User provided `sap worksheets/heating systems examples/` — 47 folders,
+**41 populated** (6 empty: `community heating 5`, `electric 4`,
+`electric 10`, `gshp 2`, `pcdb 2`, `solid fuel 1`). Every variant is
+the same dwelling (Reference 001431, semi-detached, TFA 90 m², age G
+1983-1990, W6 9BF) under a different heating system. Each carries an
+Elmhurst Summary PDF + an Elmhurst P960 worksheet PDF. Controlled-
+variable test set — cascade-vs-worksheet residuals are fully
+attributable to the heating subsystem.
+
+### Permanent regression test
+
+[`backend/documents_parser/tests/test_heating_systems_corpus.py`](backend/documents_parser/tests/test_heating_systems_corpus.py)
+(S0380.129) — single parametrised test
+`test_heating_systems_corpus_residual_matches_pin` driven by 41
+`_CorpusExpectation` entries. Per variant:
+
+1. Block 11a (individual) or 11b (community) pins extracted from P960:
+   continuous SAP (`SAP value`), total fuel cost (255)/(355), CO2
+   (272/372/382/383), PE (286/386/486/483).
+2. Summary PDF → extractor → mapper → cascade.
+3. Each cascade output pinned against the residual at tight tolerance
+   (SAP ±0.001, cost ±£0.01, CO2 ±0.1 kg/yr, PE ±0.1 kWh/yr).
+
+Tolerances stay tight; **expected residuals move toward 0** as
+heating-cascade gaps close. Per [[feedback-zero-error-strict]] +
+[[feedback-golden-residuals-near-zero]] — re-pin smaller, never
+widen the tolerance.
+
+### Current residual cluster (post-S0380.130)
+
+Cascade SAP_c minus worksheet SAP_c per variant, ordered roughly by
+absolute value (smallest first):
+
+| Variant | ΔSAP_c | Notes |
+|---|---:|---|
+| solid fuel 8 | +0.87 | closest to closure |
+| community heating 2/4 | +1.16 | gas-fired heat network (envelope-identical pairs) |
+| solid fuel 5 | +3.79 | |
+| community heating 1/3 | +4.18 | gas-fired heat network (1↔3 + 2↔4 pairs) |
+| solid fuel 4 | +5.07 | |
+| gshp | +5.16 | |
+| ashp | +5.67 | |
+| **community heating 6** | **−6.87** | **only negative ΔSAP — heat-pump heat network** |
+| oil 1 | **−9.70** | **after S0380.130 — over-counts at 7.64 p/kWh** |
+| pcdb 1 | −9.41 | **after S0380.130** |
+| oil pcdb 3 | −10.87 | **after S0380.130** |
+| oil pcdb 1/2 | −11.63 | **after S0380.130** |
+| oil 3 | +30.95 | bio-FAME boiler (worksheet uses 7.64, spec says 5.44) |
+| no system | +21.94 | SAP code 699 |
+| oil 5 (pathological) | +120.75 | bioethanol; worksheet clamps SAP int to 1 |
+
+## The S0380.131 candidate — heating-oil unit price
+
+**Status: queued, decision pending.** Two slices were agreed; S0380.130
+landed the mapper half. S0380.131 is the cascade-price half.
+
+### Evidence
+
+| Source | Heating oil p/kWh | Heating oil CO2 kg/kWh |
+|---|---:|---:|
+| SAP 10.2 spec PDF Table 12 p.191 | 4.94 | 0.298 |
+| **RdSAP 10 spec PDF** Table 32 p.95 | **7.64** | 0.298 |
+| `domain/sap10_calculator/tables/table_32.py` (verbatim from RdSAP 10) | 7.64 | 0.298 |
+| **Elmhurst P960 worksheet** for oil 1 + oil pcdb 1/3 | **5.44** | 0.298 |
+| **Cert 0240** (gov.uk register lodged SAP 73) back-solved | **~5.48** | matches oil |
+
+Two independent implementations (Elmhurst worksheet + gov.uk register's
+lodging software) agree on **5.44** for heating oil; the published
+RdSAP 10 spec PDF (7.64) is the outlier. Per
+[[feedback-worksheet-not-api-reference]] the worksheet is the source
+of truth.
+
+### Two distinct gaps were investigated
+
+The S0380.130 mapper fix and S0380.131 price fix are **independent**:
+
+- **S0380.130** (landed) fixes the Elmhurst mapper for oil mains. It
+  affects the heating-systems corpus (oil 1, oil pcdb 1/2/3, pcdb 1).
+  It does NOT touch cert 0240 (which already uses the API mapper with
+  correct fuel routing).
+- **S0380.131** (queued) would switch the cascade's heating-oil tariff
+  to 5.44. It affects ANY oil cert whose cost passes through the
+  cascade — including the heating-systems corpus AND cert 0240 AND
+  cert 0390 in the golden corpus.
+
+Closing S0380.131 is what would move cert 0240's golden residual from
+−10 toward 0; S0380.130 alone leaves cert 0240 unchanged.
+
+### Projected impact of switching cascade to 5.44
+
+| Cert | Current ΔSAP | After 7.64 → 5.44 |
+|---|---:|---:|
+| oil 1 corpus | −9.70 | ~+0.6 (closes) |
+| oil pcdb 1/2 corpus | −11.63 | ~−1 |
+| oil pcdb 3 corpus | −10.87 | ~−1 |
+| pcdb 1 corpus | −9.41 | ~+1 |
+| **cert 0240 golden** | **−10** | **~0 (closes exactly to lodged 73)** |
+| cert 0390 golden | −6 | improves significantly |
+
+### Open questions before implementing
+
+1. Is there a more authoritative spec source for 5.44? Check the BRE
+   technical papers in `domain/sap10_calculator/docs/specs/sap10
+   technical papers/` for any RdSAP 10 errata or fuel-price update.
+2. Should bio-FAME price also flip (worksheet uses 7.64 for FAME but
+   spec says 5.44 — possible spec PDF row swap)?
+3. Should standing charges, CO2, or PE factors change too? Per the
+   evidence above only the unit-price column is divergent.
+
+The user explicitly agreed to the two-slice split so any spec-target
+change in S0380.131 is isolated and reviewable on its own.
+
+## Test baseline at HEAD `c8486077`
+
+```bash
+PYTHONPATH=/workspaces/model python -m pytest \
+  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
+  backend/documents_parser/tests/test_heating_systems_corpus.py \
+  backend/documents_parser/tests/test_elmhurst_extractor.py \
+  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
+  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
+  domain/sap10_calculator/worksheet/tests/test_heat_transmission.py \
+  domain/sap10_calculator/worksheet/tests/test_internal_gains.py \
+  domain/sap10_calculator/worksheet/tests/test_solar_gains.py \
+  domain/sap10_calculator/worksheet/tests/test_dimensions.py \
+  domain/sap10_calculator/worksheet/tests/test_rating.py \
+  domain/sap10_calculator/worksheet/tests/test_ventilation.py \
+  domain/sap10_calculator/worksheet/tests/test_appendix_h_solar.py \
+  domain/sap10_calculator/worksheet/tests/test_mev.py \
+  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
+  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
+  domain/sap10_calculator/tests/test_pcdb_table_322_lookup.py \
+  domain/sap10_calculator/tests/test_pcdb_table_329_lookup.py \
+  domain/sap10_calculator/tests/test_table_12a.py \
+  --no-cov -q
+```
+
+Expected: **874 pass, 0 fail**.
+
+## Memories to load (in order)
+
+1. `project-heating-systems-corpus` — full corpus state at HEAD `c8486077`
+2. `project-oil-price-spec-divergence` — S0380.131 plan + evidence
+3. `project-cert-000565-recovery-state` — per-slice history (legacy log)
+4. `feedback-sap-10-2-only-never-10-3` — **CRITICAL** — never reference SAP 10.3
+5. `feedback-worksheet-not-api-reference` — worksheet PDF is source of truth
+6. `feedback-spec-citation-in-commits` — quote spec + page in commits
+7. `feedback-verify-handover-claims` — verify numeric claims against PDFs
+8. `feedback-zero-error-strict` — never widen tolerances; re-pin smaller
+9. `feedback-commit-per-slice` — one slice = one commit
+10. `feedback-aaa-test-convention` — literal `# Arrange / # Act / # Assert`
+11. `feedback-e2e-validation-philosophy` — abs=1e-4 pins
+12. `feedback-abs-diff-over-pytest-approx` — `abs(x-y) <= tol`
+13. `feedback-spec-floor-skepticism` — verify "precision floor" against PDFs
+14. `feedback-golden-residuals-near-zero` — pins shrink toward zero
+15. `feedback-one-e-minus-4-across-the-board` — 1e-4 bar for HP certs too
+16. `reference-unmapped-sap-code` — calculator strict-raise pattern
+17. `reference-unmapped-api-code` — mapper strict-raise pattern
+18. `project-sap10-ml-deprecation` — `domain/sap10_ml/` is retiring
+
+## Spec source quick-reference
+
+All under `domain/sap10_calculator/docs/specs/`:
+
+- **SAP 10.2 full spec**: `sap-10-2-full-specification-2025-03-14.pdf`
+  - §13 + Table 12 (p.191) — fuel cost / ECF / SAP rating
+  - Table 4a-d (p.163-170) — heating systems + responsiveness
+  - Appendix N (p.101-107) — heat pumps
+- **RdSAP 10 spec**: `RdSAP 10 Specification 10-06-2025.pdf`
+  - §5 (p.29) — fabric defaults
+  - §10.11 Table 29 (p.56) — heating/HW parameters (closed in S0380.126)
+  - Table 28 (p.55) — cylinder size (closed in S0380.127)
+  - §12 (p.62) — electricity tariff dispatch
+  - §17 (p.85) — data collection (meter_type lodging form)
+  - §19 Table 32 (p.95) — RdSAP10 fuel prices / CO2 / PE factors
+- **BRE technical papers** at `sap10 technical papers/` — check for any
+  RdSAP 10 errata / fuel-price update relevant to S0380.131
+- **SAP 10.3** at `sap-10-3-full-specification-2026-01-13.pdf`:
+  **DO NOT reference** ([[feedback-sap-10-2-only-never-10-3]])
+
+## Standard workflow per slice
+
+1. Read spec page + identify rule
+2. Probe cascade vs worksheet/PDF; back-solve hypothesis
+3. Write failing AAA test
+4. Implement helper / cascade change
+5. Verify test passes
+6. Run extended handover suite (above command)
+7. Check pyright on touched files — net-zero from baseline
+   (`git stash` → pyright → `git stash pop` → pyright)
+8. Commit with spec citation + verbatim quote +
+   `Co-Authored-By: Claude Opus 4.7 `
+9. Update `project-heating-systems-corpus` + `MEMORY.md` index
+
+## What NOT to do
+
+- **Don't reference SAP 10.3** — track 10.2 deliberately
+- **Don't widen pin tolerances** to make pins pass — re-pin smaller or
+  find the spec gap
+- **Don't re-investigate closed work** (Slices .91..130) — all settled
+- **Don't add new helpers to `domain/sap10_ml/`** — on the deprecation path
+- **Don't conflate the mapper fix (S0380.130) with the price fix
+  (S0380.131)** — they're distinct. The mapper fix doesn't close cert
+  0240; only the price fix does
+- **Don't accept "spec-precision floor" framing** without spec-citation
+  work — verify against worksheet PDF + cross-cert empirical evidence
+
+## Where new heating-systems-corpus fixtures live
+
+- Summary PDF: `sap worksheets/heating systems examples//Summary_001431.pdf`
+- P960 worksheet PDF: `sap worksheets/heating systems examples//P960-0001-001431 - .pdf`
+- Pin entries: `backend/documents_parser/tests/test_heating_systems_corpus.py`'s
+  `_EXPECTATIONS` tuple
+
+## User direction
+
+Two-slice plan (S0380.130 + S0380.131) was agreed in the conversation.
+S0380.130 landed first. The user explicitly noted that the mapper fix
+and the golden-bug fix are distinct — the next agent should preserve
+that distinction in any future communication.
+
+Good luck.
diff --git a/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT_POST_S0380_130.md b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT_POST_S0380_130.md
new file mode 100644
index 00000000..4aedb023
--- /dev/null
+++ b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT_POST_S0380_130.md
@@ -0,0 +1,197 @@
+# Next-agent prompt — post S0380.130
+
+You are picking up on branch `feature/per-cert-mapper-validation` at
+**HEAD `c8486077`**. The previous session built a controlled-variable
+heating-systems corpus (1 property × 41 heating variants), unblocked
+all 41 to cascade-execute through 4 spec-cited closures, landed a
+permanent residual-pin regression test, and routed the Elmhurst
+mapper for oil mains via §15.0 Water Heating Fuel Type. Extended
+handover suite: **874 pass, 0 fail**.
+
+## Read these first
+
+In order, before any tool call:
+
+1. [`HANDOVER_POST_S0380_130.md`](HANDOVER_POST_S0380_130.md) — full
+   state at HEAD `c8486077`, S0380.131 plan + evidence, all open
+   residuals.
+2. [`HANDOVER_POST_S0380_124.md`](HANDOVER_POST_S0380_124.md) — prior
+   state at HEAD `1e69bd39` (cert 0240 deferred + handover hypotheses
+   ranking — note the prior hypothesis ranking was disproved during
+   the S0380.130 investigation).
+
+## Load these memories before starting
+
+```
+project-heating-systems-corpus           # full corpus state + 41 residual pins
+project-oil-price-spec-divergence        # S0380.131 plan + evidence
+project-cert-000565-recovery-state       # per-slice history (legacy log)
+feedback-sap-10-2-only-never-10-3        # CRITICAL — never reference SAP 10.3
+feedback-worksheet-not-api-reference     # worksheet PDF is source of truth
+feedback-spec-citation-in-commits        # quote spec + page in commits
+feedback-verify-handover-claims          # verify numeric claims against PDFs
+feedback-zero-error-strict               # never widen tolerances; re-pin smaller
+feedback-commit-per-slice                # one slice = one commit
+feedback-aaa-test-convention             # literal # Arrange / # Act / # Assert
+feedback-e2e-validation-philosophy       # abs=1e-4 pins
+feedback-abs-diff-over-pytest-approx     # abs(x-y) <= tol
+feedback-spec-floor-skepticism           # verify "precision floor" against PDFs
+feedback-golden-residuals-near-zero      # pins shrink toward zero
+feedback-one-e-minus-4-across-the-board  # 1e-4 bar for HP certs too
+reference-unmapped-sap-code              # calculator strict-raise pattern
+reference-unmapped-api-code              # mapper strict-raise pattern
+project-sap10-ml-deprecation             # domain/sap10_ml/ is retiring
+```
+
+## Verify baseline first
+
+```bash
+PYTHONPATH=/workspaces/model python -m pytest \
+  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
+  backend/documents_parser/tests/test_heating_systems_corpus.py \
+  backend/documents_parser/tests/test_elmhurst_extractor.py \
+  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
+  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
+  domain/sap10_calculator/worksheet/tests/test_heat_transmission.py \
+  domain/sap10_calculator/worksheet/tests/test_internal_gains.py \
+  domain/sap10_calculator/worksheet/tests/test_solar_gains.py \
+  domain/sap10_calculator/worksheet/tests/test_dimensions.py \
+  domain/sap10_calculator/worksheet/tests/test_rating.py \
+  domain/sap10_calculator/worksheet/tests/test_ventilation.py \
+  domain/sap10_calculator/worksheet/tests/test_appendix_h_solar.py \
+  domain/sap10_calculator/worksheet/tests/test_mev.py \
+  domain/sap10_calculator/rdsap/tests/test_cert_to_inputs.py \
+  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
+  domain/sap10_calculator/tests/test_pcdb_table_322_lookup.py \
+  domain/sap10_calculator/tests/test_pcdb_table_329_lookup.py \
+  domain/sap10_calculator/tests/test_table_12a.py \
+  --no-cov -q
+```
+
+Expected: **874 pass, 0 fail**.
+
+## The queued task — S0380.131 (heating-oil unit price)
+
+**The user agreed to a two-slice plan to investigate oil 1's residual.
+S0380.130 (mapper) landed first. S0380.131 (cascade price) is up
+next, but the user wants it presented as a DISTINCT task — not a
+follow-on to S0380.130.**
+
+### Evidence (verbatim from S0380.130 investigation)
+
+| Source | Heating oil p/kWh | Heating oil CO2 |
+|---|---:|---:|
+| SAP 10.2 spec PDF Table 12 p.191 | 4.94 | 0.298 |
+| **RdSAP 10 spec PDF** Table 32 p.95 | **7.64** | 0.298 |
+| `domain/sap10_calculator/tables/table_32.py` | 7.64 | 0.298 |
+| **Elmhurst P960 worksheet** for oil 1 / oil pcdb 1/3 | **5.44** | 0.298 |
+| **Cert 0240** gov.uk register, back-solved from SAP 73 | **~5.48** | matches |
+
+Two independent implementations (Elmhurst worksheet + the gov.uk
+register's lodging software) agree on **5.44 p/kWh** for heating
+oil. The published RdSAP 10 spec PDF (7.64) is the outlier.
+
+Per [[feedback-worksheet-not-api-reference]] the worksheet PDF is
+the source of truth. Per [[feedback-spec-floor-skepticism]] don't
+accept the spec-vs-worksheet gap without verification.
+
+### Before implementing — investigate further
+
+1. Read the BRE technical papers at
+   `domain/sap10_calculator/docs/specs/sap10 technical papers/`
+   for any RdSAP 10 errata or fuel-price update relevant to the 5.44
+   vs 7.64 discrepancy. Specifically look for STPs touching Table 32
+   or fuel prices.
+2. Check if RdSAP 10 has a newer spec revision than `10-06-2025` in
+   `domain/sap10_calculator/docs/specs/`.
+3. Verify the Elmhurst worksheet's heating-oil price across more
+   variants: oil 2 (HVO) uses 7.64; oil 3/4 (FAME) use 7.64; only
+   oil 1 + oil pcdb 1/3 use 5.44. So Elmhurst clearly distinguishes
+   them — it's the heating-oil row specifically that uses 5.44.
+
+### Implementation plan (after investigation)
+
+If the worksheet value 5.44 is empirically canonical:
+
+1. **Failing test**: pin an oil-cert cascade SAP_c at the worksheet
+   value — e.g. oil 1 to ~+0.6 ΔSAP_c (instead of −9.70).
+2. **Implement**: change
+   `domain/sap10_calculator/tables/table_32.py` `UNIT_PRICE_P_PER_KWH`
+   entry for code 4 (heating oil): 7.64 → 5.44.
+3. **Consider**: should bio-FAME (code 73) also flip from 5.44 → 7.64
+   (matching worksheet's FAME treatment for oil 3/4)? Empirically
+   yes; if so add as part of the same slice.
+4. **Re-pin** the affected corpus oil variants (oil 1, oil pcdb 1/2/3,
+   pcdb 1) in `test_heating_systems_corpus.py` to the new
+   (smaller-magnitude) residuals.
+5. **Re-pin** cert 0240 + cert 0390 in
+   `test_golden_fixtures.py` to the new residuals.
+6. **Verify** cohort fixtures (000474..000516, 000565, ASHP cohort)
+   are all gas/HP — none oil-fired, so unaffected. Run extended
+   handover suite to confirm.
+7. **Commit** S0380.131 with verbatim worksheet PDF evidence + cert
+   0240 back-solve as the citation. The spec PDF doesn't support
+   the value, so the empirical citation is what carries the slice.
+
+### Projected impact
+
+| Cert | Current ΔSAP_c | After 7.64 → 5.44 |
+|---|---:|---:|
+| oil 1 corpus | −9.70 | ~+0.6 (closes) |
+| oil pcdb 1/2 corpus | −11.63 | ~−1 |
+| oil pcdb 3 corpus | −10.87 | ~−1 |
+| pcdb 1 corpus | −9.41 | ~+1 |
+| **cert 0240 golden** | **−10 SAP int** | **~0 (closes exactly to lodged 73)** |
+| cert 0390 golden | −6 | improves significantly |
+
+### Important: don't conflate S0380.130 and S0380.131
+
+The user noted explicitly: **the mapper fix (S0380.130) and the
+price fix (S0380.131) are distinct**. S0380.130 closed an Elmhurst
+mapper coverage gap; it doesn't affect cert 0240 (which uses the
+API mapper). S0380.131 changes the cascade tariff; it affects every
+oil-heated cert whose cost passes through the cascade.
+
+Don't present them as a chain ("we fixed the mapper, now let's fix
+the price"). They're independent bugs that happen to both involve
+oil.
+
+## After S0380.131 — what's next
+
+The corpus residual cluster still has work after the oil price
+closes:
+
+| ΔSAP_c | Variant | Likely cause |
+|---:|---|---|
+| +0.87 | solid fuel 8 | smallest residual — diagnose first |
+| +1.16 | community heating 2/4 | gas-fired heat network |
+| +3.79 | solid fuel 5 | solid-fuel cluster |
+| −6.87 | community heating 6 | only negative — heat-pump heat network |
+| +21.94 | no system | SAP code 699 |
+| +120.75 | oil 5 (pathological) | bioethanol; worksheet clamps SAP int to 1 |
+
+User direction at end of last session: investigate the smallest
+residual first (`solid fuel 8` +0.87), the community-heating cluster
+(envelope-identical pairs 1↔3 and 2↔4 — clean comparison), or the
+lone negative outlier (`community heating 6`).
+
+## What NOT to do
+
+- **Don't reference SAP 10.3** ([[feedback-sap-10-2-only-never-10-3]])
+- **Don't widen pin tolerances** to make pins pass — re-pin smaller
+- **Don't re-investigate closed work** — Slices .91..130 all settled
+- **Don't add new helpers to `domain/sap10_ml/`** — on the deprecation path
+- **Don't conflate the mapper fix with the price fix** — they're distinct
+- **Don't accept "spec-precision floor" framing** without verification
+
+## Memory hygiene
+
+After each slice:
+
+1. Update `project-heating-systems-corpus` (per-variant residual table).
+2. Update `MEMORY.md` — keep the HEAD pointer current.
+3. If S0380.131 lands and cert 0240 closes, update
+   `project-cert-000565-recovery-state` to reflect the new golden
+   residuals.
+
+Good luck.
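
Editorial sketch (not part of the patch): the pin discipline described in both documents (fixed tolerances, `abs(x-y) <= tol` comparisons, and re-pinning only toward zero) could look roughly like the following. This is a minimal illustration under stated assumptions, not the code in `test_heating_systems_corpus.py`: `ResidualPin`, `pin_violations`, and `repin_is_allowed` are hypothetical names, requiring every metric to shrink on a re-pin is one possible reading of "re-pin smaller", and every figure except the oil 1 ΔSAP_c of −9.70 and the ~+0.6 projection is made up.

```python
from dataclasses import dataclass

# Tolerances quoted in the handover; fixed, never widened.
TOLERANCES = {"sap": 0.001, "cost": 0.01, "co2": 0.1, "pe": 0.1}


@dataclass(frozen=True)
class ResidualPin:
    """Hypothetical stand-in for a _CorpusExpectation entry."""
    variant: str
    sap: float   # expected cascade-minus-worksheet SAP_c
    cost: float  # expected residual in GBP
    co2: float   # expected residual in kg/yr
    pe: float    # expected residual in kWh/yr


def pin_violations(pin, cascade, worksheet):
    """Return the metrics whose residual (cascade - worksheet) misses the pin."""
    failed = []
    for metric, tol in TOLERANCES.items():
        residual = cascade[metric] - worksheet[metric]
        # Plain abs-diff comparison, per the abs(x-y) <= tol convention.
        if abs(residual - getattr(pin, metric)) > tol:
            failed.append(metric)
    return failed


def repin_is_allowed(old, new):
    """Assumed rule: a re-pin must not grow any residual's magnitude."""
    return all(abs(getattr(new, m)) <= abs(getattr(old, m)) for m in TOLERANCES)


# Illustrative figures only; the cost, CO2 and PE values are invented.
oil_1 = ResidualPin("oil 1", sap=-9.70, cost=150.0, co2=0.0, pe=0.0)
worksheet = {"sap": 70.00, "cost": 800.0, "co2": 3000.0, "pe": 12000.0}
cascade = {"sap": 60.30, "cost": 950.0, "co2": 3000.0, "pe": 12000.0}

print(pin_violations(oil_1, cascade, worksheet))
closed = ResidualPin("oil 1", sap=0.6, cost=5.0, co2=0.0, pe=0.0)
print(repin_is_allowed(oil_1, closed))
```

When a cascade fix moves a residual, the pin fails until it is re-pinned, and a guard like `repin_is_allowed` would reject a re-pin that drifts away from zero instead of toward it.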