diff --git a/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md index f885b9cf..d6e7f7fa 100644 --- a/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md +++ b/domain/sap10_calculator/docs/NEXT_AGENT_PROMPT.md @@ -1,365 +1,57 @@ -# Handover — Per-cert validation workflow, 9 new triples staged +# Next-agent prompt — PV β-split slices 4-6 -You are picking up branch `feature/per-cert-mapper-validation` -(off main at `7fba27a7`, where the prior `ara-backend-design-prd` -work was merged via PR #1123). The user has shifted focus from -"close cert 001479 to 1e-4" (done — Slice 95) to "validate the -API mapper against more cert pairs to surface remaining mapping -gaps". 9 new (Summary + worksheet + API) triples have been -provided. The mapping is acknowledged-incomplete; expect many -mapper-completion slices. +Branch: `feature/per-cert-mapper-validation`. HEAD: `beb0db95` (docs commit on top of S0380.46 `5b269f23`). -## The user's stated workflow (verbatim) +Read `domain/sap10_calculator/docs/HANDOVER_PV_BETA_SPLIT.md` end-to-end before any tool call. It has the full state, the 3 slices shipped (S0380.44 → S0380.46), the residual table showing where each cert sits, and the concrete plans for the 3 remaining slices. -> we pick one [cert], we then pass the Elmhurst summary document to -> `EpcPropertyDataMapper` to map the site notes data to -> `EpcPropertyData`, we then pass to the SAP calculator. If the -> output of the SAP calculator matches the SAP worksheet correctly, -> we know we have correctly mapped the EpcPropertyData. We then get -> the API response, map to `EpcPropertyData` using -> `EpcPropertyDataMapper`, then check if we have the same -> `EpcPropertyData` as the summary report (or same for the fields we -> care about). We also check we get the same result. -> -> The `EpcPropertyData` objects matching is our signal that we've -> done things correctly. So this validates our mapping. 
+## My directives -Translation: Summary path proves itself against the worksheet → -becomes the canonical reference for the API path. This is Layer 2 + -Layer 3 + Layer 4 of the validation stack. +The PV β-split work is a 6-slice plan; 3/6 are shipped. Continue: -## State at session start (this handover's baseline) +1. **Slice 4 (S0380.47) — cost cascade β wiring.** [fuel_cost.py:182](../worksheet/fuel_cost.py) currently does `pv_credit = -pv_generation × pv_export_credit_gbp_per_kwh` — treats ALL PV as exported at 13.19 p/kWh. Per Appendix M1 §6, onsite-consumed PV should bill at the IMPORT price (Table 12a standard tariff ~18 p, or weighted high/low if off-peak meter). The β infrastructure is already in place (`pv_dwelling_kwh_per_yr` + `pv_exported_kwh_per_yr` on CalculatorInputs from Slice 2). Add a new `pv_dwelling_import_price_gbp_per_kwh` field, wire it in `cert_to_inputs` using the same off-peak meter logic as `_space_heating_fuel_cost_gbp_per_kwh`, and split the credit in fuel_cost.py. -Most recent commits (`sap10_calculator` + `sap10_ml` are now at the -repo root; `packages/domain/src/domain/` was removed): + This is the riskiest slice because every PV cert's SAP rating shifts. The cohort-1 + cohort-2 chain-test 1e-4 pins will need re-pinning — expect small Δ (~0.02-0.05 SAP per cert) so the new pins will still be tight against worksheets. Re-pin AS PART OF SLICE 4 so the suite stays green between commits. -``` -6dc11e4d fix: resolve 10 remaining test_summary_pdf_mapper_chain failures -09fb6f1b fix: address 22 project-wide test failures from previous sweep -a7b08a4e refactor: move docs/sap-spec/ contents into domain/sap10_calculator/ -960130b0 deleted redundant packages folder -68401c51 refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml -29ac35cc refactor: lift-and-shift packages/domain/src/domain/sap → domain/sap10_calculator -... 
(87b6045c "fixed merge conflicts from main", 168e7f18, 94975f3b deletions) -a75052dc chore: commit cert 001479 fixture + RdSAP/PCDF spec PDFs -f502db8c Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding -``` +2. **Slice 5 (S0380.48) — E_PV magnitude audit.** The 7-cert ASHP+5kWh-battery cohort (0350/0380/2225/2636/3800/9285/9418) overshoots PE by +2.7..+8.1 because the cascade computes E_PV ≈ 3× the worksheet's value. For cert 0380: cascade thinks 2570 kWh/yr, worksheet uses 831 kWh/yr. Either `peak_power=3` in the API JSON is in non-kWp units, the cascade's S lookup is wrong, or ZPV is mis-mapped. -Folder structure post-migration: + Concrete probes (in handover §"Slice 5 plan"): + - Compare cert 0380's API `peak_power=3` against the Elmhurst Summary PDF Section 19 for the same cert + - Compute cascade S for orientation=South, pitch=45°, overshading=1=None — compare to SAP Appendix U3.3 spec value (expected ~1100 kWh/m²/yr UK avg) + - Verify Table M1 ZPV[1] = 1.0 against spec + - Empirical test: set cert 0380 `peak_power = 1.0` and check if residuals close -``` -domain/ (PEP 420 namespace; no __init__.py) -├── addresses/, postcode.py, tasks/ -├── sap10_calculator/ ← was packages/domain/src/domain/sap/ -│ ├── calculator.py, climate/, rdsap/, tables/, validation/, worksheet/ -│ ├── docs/ ← was docs/sap-spec/ -│ │ ├── HANDOVER_NEXT.md, SAP_CALCULATOR.md -│ │ ├── NEXT_AGENT_PROMPT.md ← this file -│ │ └── specs/ ← RdSAP 10, SAP 10.2 + 10.3, PCDF spec PDFs -│ └── tables/pcdb/data/ ← pcdb10.dat + 7× pcdb_table_*.jsonl -└── sap10_ml/ ← was packages/domain/src/domain/ml/ -``` + If it's a kWp interpretation bug, surface via the schema or API mapper. -`Path(__file__).parents[N]` indices were rebased through the move -(delta of 3); see `Dockerfile.test` (poppler-utils now installed for -test_summary_pdf_mapper_chain.py). +3. 
**Slice 6 (S0380.49) — final fixture re-pin + tolerance tightening.** Once Slices 4 + 5 ship, the ASHP cohort residuals should land near zero. Re-pin all affected golden fixtures; if the cluster lands tightly (~0.01 PE / ~0.001 CO2), tighten `_PE_ABS_TOLERANCE_KWH_PER_M2` / `_CO2_ABS_TOLERANCE_TONNES` accordingly per [[feedback-golden-residuals-near-zero]]. -## Test baselines you should see at HEAD `6dc11e4d` +## Conventions preserved (carry forward) -```bash -PYTHONPATH=/workspaces/model python -m pytest \ - backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ - domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ - domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ - --no-cov -q -# Expect: 17/0 in mapper-chain + Layer 1 baseline + golden residual baseline -``` +- 1e-4 across the board ([[feedback-one-e-minus-4-across-the-board]]) +- Worksheet, not API, is the chain-test target ([[feedback-worksheet-not-api-reference]]) +- Cross-mapper parity via cascade ([[feedback-cross-mapper-parity-via-cascade]]) +- Spec-floor skepticism ([[feedback-spec-floor-skepticism]]) +- Bigger slices OK for uniform work ([[feedback-bigger-slices-for-uniform-work]]) +- Golden residuals → ~0 ([[feedback-golden-residuals-near-zero]]) +- AAA test convention + `abs(diff) <= tol` ([[feedback-aaa-test-convention]], [[feedback-abs-diff-over-pytest-approx]]) +- Spec citation in commit messages ([[feedback-spec-citation-in-commits]]) +- One slice = one commit; stage by name; re-pin shifted fixtures IN SAME SLICE so suite stays green ([[feedback-commit-per-slice]]) +- Pyright net-zero per touched file +- Strict-enum raises on unmapped labels -Wider domain sweep (1654 / 20 baseline): 9 hand-built 001479 -skeleton + 10 cohort Layer 1 pins + 1 heat_transmission edge case -= 20 RED, all pre-existing and orthogonal to mapper work. 
+## First concrete actions -**Layer 4 production gate**: -`test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` — -**GREEN at < 1e-4**. Keep it green. +1. Re-run the diagnostic baseline at the bottom of `HANDOVER_PV_BETA_SPLIT.md` to confirm **763 pass + 0 fail** at HEAD. -## The new test data +2. Start Slice 4 by reading [fuel_cost.py:182](../worksheet/fuel_cost.py) and the existing `_space_heating_fuel_cost_gbp_per_kwh` in cert_to_inputs.py to understand the off-peak meter price-resolution logic. Mirror that pattern for the dwelling IMPORT price. -Location: `sap worksheets/Additional data with api//` +3. After Slice 4 lands and chain tests are re-pinned: Slice 5's first probe is comparing cert 0380's API `peak_power` against the Summary PDF lodgement. The golden-fixture cert 0380 is `0380-2471-3250-2596-8761`; its Summary PDF + dr87 worksheet live in `backend/documents_parser/tests/fixtures/` — Section 19 of the Summary carries the PV array lodgement. -Each folder is named by the GOV.UK EPB certificate number. Contains: +4. Slice 6 wraps up — re-pin, verify, document. -- `Summary_NNNNNN.pdf` — Elmhurst-format site notes -- `dr87-0001-NNNNNN.pdf` — worksheet (`dr87-` prefix is a Domna-tool - variant; same shape as the `P960-` worksheet for cert 001479) +## Architecture lessons that landed this session (load-bearing) -The API JSON is **not** in the folder — fetch from GOV.UK EPB using -the cert-ref: +- **β-split shape is uniform across PE / CO2 / Cost.** Each cascade had the same bug — credit ALL PV at one rate (IMPORT for PE; missing for CO2; EXPORT for cost). The spec-correct fix is uniformly onsite-at-IMPORT + exported-at-EXPORT. `CalculatorInputs.pv_dwelling_kwh_per_yr` + `pv_exported_kwh_per_yr` are shared cross-cascade state; each cascade adds its own factor-pair fields. `None` falls back to legacy single-rate for synthetic test constructions. 
+- **The EXPORT factor is Table 12 code 60** ("electricity sold to grid, PV") at all three cascades — already in `domain/sap10_calculator/tables/table_12.py` for PE (0.501) and CO2 (monthly Table 12d). For Slice 4 cost, you'll reference the same code 60 export-price from Table 12a (typically 13.19 p/kWh for the spec price set). +- **Cert 9501 is the validation pin.** It has PV but no battery, and its PE + CO2 residuals both closed to ~0 after Slices 2-3. Any future cascade refactor must keep cert 9501 closed. -```python -from backend.epc_client.epc_client_service import EpcClientService -from dotenv import load_dotenv -import os -load_dotenv('/workspaces/model/backend/.env') # OPEN_EPC_API_TOKEN -svc = EpcClientService(auth_token=os.environ['OPEN_EPC_API_TOKEN']) -raw = svc._fetch_certificate('') # raw JSON dict -``` - -Note: use `OPEN_EPC_API_TOKEN` not `EPC_AUTH_TOKEN` (the latter is -for a different/legacy API). - -### 9 cert references + heating type + worksheet SAP - -| Cert ref | Worksheet | Heating | PCDB idx | Worksheet SAP | TFA | bps | Dwelling | -|---|---|---|---|---|---|---|---| -| `0330-2249-8150-2326-4121` | 000897 | **Mains gas boiler** | 10241 | 61.5993 | 69.14 | 2 | Mid-terrace house | -| `0350-2968-2650-2796-5255` | 000903 | ASHP | 104568 | 84.1367 | 90.54 | 2 | Mid-terrace house | -| `0380-2471-3250-2596-8761` | 000899 | ASHP | 104568 | 88.5104 | 60.43 | 1 | Semi-detached bungalow | -| `2225-3062-8205-2856-7204` | 000900 | ASHP | 104568 | 88.7921 | 82.49 | 1 | End-terrace house | -| `2636-0525-2600-0401-2296` | 000901 | ASHP | 104568 | 86.2641 | 82.10 | 1 | Mid-terrace house | -| `3800-8515-0922-3398-3563` | 000898 | ASHP | 104568 | 86.1458 | 81.34 | 2 | Mid-terrace house | -| `9285-3062-0205-7766-7200` | 000902 | ASHP | 104568 | 84.1369 | 85.90 | 1 | End-terrace house | -| `9418-3062-8205-3566-7200` | 000896 | ASHP | 102421 | 84.6305 | 74.37 | 3 | End-terrace house | -| `9501-3059-8202-7356-0204` | (RR cert — newest, added late in session) | 
**Mains gas boiler** | 19007 | (not measured) | — | — | Top-floor flat | - -**Heating-type split**: -- 2 mains gas boilers: 0330, 9501 (validated mapper territory) -- 7 ASHPs: 0350, 0380, 2225, 2636, 3800, 9285, 9418 (**brand-new - mapper territory — never validated**) - -One earlier mismatch — cert 0330's folder originally held the wrong -property's Summary/worksheet (17 vs 21 Summerfield Road); the user -fixed mid-session and Summary_000897/dr87-0001-000897 now match -cert ref 0330 correctly. The other 8 were audited and match. - -## Major scope discovery — Heat Pumps - -7 of the 9 new certs are Air Source Heat Pumps (predominantly PCDB -index 104568, one model 102421). The mapper has never been -validated against a heat-pump cert — cohort certs + cert 001479 are -all mains-gas boilers. - -**Cert 0380 (initial pilot attempt) showed catastrophic failures**: - -| Path | Cascade SAP | Δ vs worksheet 88.5104 | -|---|---|---| -| Summary mapper | 18.08 | **-70.43** | -| API mapper | 70.14 | **-18.37** | - -Diff: Summary identified the heat pump as an 80%-efficient boiler -(catastrophic); API correctly identified it as a heat pump with -COP=2.3 but cascade output still −18 SAP below worksheet (fabric -HLC 104 vs probably ~50 needed). The Summary mapper is -fundamentally broken on heat pumps; the API mapper is -partially-broken. - -**Recommendation**: defer the heat-pump certs until the boiler -workflow is proven. Closing 7 ASHP certs is plausibly a 15-30 slice -workstream (new mapper plumbing for PCDB COP, electric tariff -costing for HW + space heating, Appendix N heat-pump efficiency -adjustments, etc.). Cert 0380 (smallest TFA bungalow, single bp) -is the pilot HP cert once boiler workflow is proven. - -## Pilot status — cert 0330 (mains-gas mid-terrace boiler) - -Same shape as cert 001479 (proven). API JSON staged at -`domain/sap10_calculator/rdsap/tests/fixtures/golden/ -0330-2249-8150-2326-4121.json` (**uncommitted**). 
Summary PDF -copied to -`backend/documents_parser/tests/fixtures/Summary_000897.pdf` -(**uncommitted**). - -### Cascade SAP comparison - -| Path | Cascade SAP | Δ vs worksheet 61.5993 | -|---|---|---| -| Summary mapper | 62.0660 | **+0.4667** (just over 0.5) | -| API mapper | 63.7446 | **+2.1453** (≥2 SAP off) | -| Δ API↔Summary | +1.6786 | (mapper paths disagree) | - -### Cascade-component diff (API vs Summary) - -``` -TFA: 90.56 = 90.56 ✓ -storeys: 2 = 2 ✓ -HLC walls: 113.535 ≈ 113.520 (Δ +0.015 — negligible) -HLC roof: 7.323 = 7.323 ✓ -HLC floor: 30.705 = 30.705 ✓ -HLC windows: 36.455 vs 29.741 (Δ +6.71 ← BIG) -HLC doors: 11.100 = 11.100 ✓ -HLC party: 11.357 = 11.357 ✓ -HLC bridge: 28.347 = 28.347 ✓ -HLC total: 238.822 vs 232.093 (Δ +6.73 — all from windows) -Inf ACH: 0.7382 = 0.7382 ✓ -HW kWh: 3172.65 vs 2112.00 (Δ +1060 ← BIG) -Lighting kWh: 207.92 = 207.92 ✓ -Main eff: 0.8850 = 0.8850 ✓ -``` - -Two specific gaps to investigate as separate slices: - -1. **Windows HLC +6.71 W/K** — likely `glazing_type=14` (cert 0330) - not in Slice 93's `_API_GLAZING_TYPE_TO_TRANSMISSION` (only codes - 3 and 13 are mapped). Same shape as cert 001479's - `glazing_type=2` issue; extending the dict should close this. - Affects multiple certs that use code 14. - -2. **HW kWh +1060 (API 3172 vs Summary 2112)** — substantial - divergence in §4 hot water cascade. Needs probe of which - subsystem (occupancy N, shower outlets, electric_shower_count, - cylinder, etc.) the API mapper is reading wrong. Cert 0330 - doesn't have the +0.5m upper-storey adjustment quirk cert 001479 - needed (Slice 92), so different root cause likely. - -(The user observed: "the mapping is very much incomplete (hence we -have some non 0 matches to elmhurst summary matches)" — non-1e-4 -matches are expected and tractable.) 
- -### 116 field-level divergences (API vs Summary) - -Most are cascade-equivalent surfacing differences (Slice 91-era -descriptive strings + int/None vs explicit-bool patterns) — the -same shape `_is_excluded_path` already handles for the cohort -certs. New specific concrete diffs that DO affect the cascade: - -- `sap_windows[*].window_transmission_details` — Summary has - explicit U/g/data_source; API has None for `glazing_type=14` - (cascade falls back to default U → too high) -- `sap_windows[*].frame_factor` — Summary 0.7, API None -- `sap_windows[*].window_width / window_height` — same w*h area - rounding pattern as cert 001479 (handled in Slice 95) - -## Workflow recommendation for next slice queue - -For each new cert (after cert 0330 pilot lands): - -1. **Stage**: fetch API JSON, copy Summary PDF into fixtures -2. **Probe**: run the cascade-component diff (recreate the inline - pattern; the probe takes both `summary_epc` and `api_epc`, lowers - via `cert_to_inputs`, diffs each subsystem) -3. **Localise** the biggest cascade-component delta -4. **Fix** the mapper to close it; one fix = one slice -5. **Add Layer 4 1e-4 test** when both Summary and API paths hit - worksheet at 1e-4 (cert may pass Summary path first, then - iterate API mapper to catch up) -6. **Commit**: stage by name (`git add `), cite spec page - when implementing a spec rule - -### Cohort-style fixture pattern - -If a cert benefits from a hand-built fixture (Layer 1), mirror the -cohort pattern at -`domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_NNNNNN.py` -— with prefix `_dr87_worksheet_NNNNNN.py` for the new Domna-tool -worksheet variant. - -**WARNING (lesson from previous session)**: the cohort hand-builts -encode non-spec quirks (e.g. `has_suspended_timber_floor=False` to -mirror the worksheet's non-spec §(12) behaviour for 4 certs). Don't -blindly trust the hand-builts as spec-correct; cross-check against -the mapper's spec-inference output before committing. 
- -## Conventions (preserved from previous handover) - -- **One slice = one commit** — stage by name. -- **AAA test convention** — literal `# Arrange / # Act / # Assert` - headers in every new test. -- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright clean). -- **1e-4 worksheet tolerance** when worksheet is available; ±0.5 - fallback only for API-only goldens. -- **Spec citation** in commit messages when a slice implements a - spec rule (quote RdSAP 10 / SAP 10.2/10.3 page reference). -- **Pyright net-zero per file**. Baselines (re-verify at session - start): - - `datatypes/epc/domain/mapper.py`: 33 - - `domain/sap10_calculator/worksheet/heat_transmission.py`: 13 - - `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35 - - `datatypes/epc/domain/epc_property_data.py`: 0 - -## First actions for the next agent - -1. Confirm HEAD: `git log --oneline -1` → `6dc11e4d`. -2. Re-baseline: - ```bash - PYTHONPATH=/workspaces/model python -m pytest \ - backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \ - domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \ - domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \ - --no-cov -q - ``` -3. Pick up cert 0330 pilot. Either continue from where I left off - (fixtures staged uncommitted, 2 specific gaps identified above) - OR pivot to a different boiler cert if 0330 turns out - problematic (cert 9501 is the other boiler — top-floor flat with - PCDB idx 19007). -4. Commit cert 0330's fixtures (API JSON + Summary PDF) as the - foundation slice before working any mapper fixes: - ```bash - git add domain/sap10_calculator/rdsap/tests/fixtures/golden/0330-2249-8150-2326-4121.json - git add backend/documents_parser/tests/fixtures/Summary_000897.pdf - git commit -m "chore: stage cert 0330 fixtures (boiler pilot, worksheet SAP 61.5993)" - ``` -5. Add a RED Layer 2 test (Summary mapper cascade SAP at 1e-4 - vs 61.5993) — establishes the failing target. 
Then fix the - Summary path mapper bugs slice-by-slice. -6. Once Summary path is GREEN, do the same for the API path (Layer - 4). The API mapper may need additional fixes Summary doesn't - need — they're independent paths into the same `EpcPropertyData` - shape. -7. After cert 0330 lands as a clean Layer 4 1e-4 pin, repeat for - cert 9501 (the other boiler). 2 boiler certs proven is much - stronger evidence than 1. -8. Then plan the heat-pump workstream. The 7 ASHP certs share a - PCDB index (104568) so much of the fix is likely shared. Write - a follow-up handover for that workstream specifically. - -## Heat-pump workstream sketch (deferred) - -When the user gives the go-ahead, work order: - -1. **API mapper**: surface `main_heating_index_number`, set - `main_heating_category` for HPs, `main_fuel_type=29` (electric - heat pump). -2. **Cascade**: ensure `cert_to_inputs._main_heating_efficiency` - reads PCDB HP COP correctly. Investigate Table 4a/4b vs PCDB - precedence for HPs. -3. **Fuel cost**: HW + space heating on electricity tariffs - (Table 12) — check if the cascade has electric-tariff fuel-cost - plumbing wired up. -4. **Appendix N**: HP-specific efficiency adjustments (climate + - flow temperature). Likely the biggest cascade-side gap. -5. **Summary mapper**: separate slice — needs to identify HPs from - the Summary PDF's heating section. - -## Open items / known gaps not yet addressed - -- 8 API-only golden cert residuals still range from 0 to -15 SAP - delta (cert 0240 is the outlier — see prior handover §4 and - `test_golden_fixtures.py` notes). The user's stated end goal is - <0.5 SAP error on all goldens; cert 0240 needs RR-description - parsing (or Room-in-Roof mapping investigation) + glazing_type=2 - surfacing. -- Layer 3 field-parity test - (`test_from_api_response_matches_from_elmhurst_site_notes_001479`) - still not written. Lower priority since cascade-output Layer 4 - already gates parity. 
-- The 4 cohort chain tests for non-spec §(12) certs were deleted - this session; if the user later sources spec-compliant - worksheets for 000474/000480/000487/000490, those tests can be - restored (with the spec-correct hand-builts). - -## Tooling shortcuts - -- **EPC fetch**: `OPEN_EPC_API_TOKEN` (NOT `EPC_AUTH_TOKEN`) in - `backend/.env`. `EpcClientService._fetch_certificate(cert_ref)` - returns the raw JSON dict. -- **Worksheet SAP extract**: `pdftotext -layout -` - then `grep -E "SAP value\s+[0-9]+\.[0-9]+"`. Works for all - `dr87-`, `P960-`, and `U985-` worksheet variants. -- **Cascade-component probe template**: see the cert-0330 probe - inline above; same shape as the cert-001479 probe. - -Good luck. The methodology is proven on cert 001479 and partially -on cert 0330 (boiler pilot 95% closed). Each new cert pair should -land in 1-5 mapper slices. Stage by name; one slice = one commit; -cite spec when implementing a spec rule. +Good luck. The β-implementation is spec-correct (cert 9501 proves it). Slices 4-5 surface the remaining bugs as forcing functions; Slice 6 finalises the closure.
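
Reviewer sketch for Slice 4 (not part of the commit above): the cost-side β-split described in the directives could look roughly like the following. The field names `pv_dwelling_kwh_per_yr`, `pv_exported_kwh_per_yr`, `pv_dwelling_import_price_gbp_per_kwh` and `pv_export_credit_gbp_per_kwh` come from the prompt; the function name `pv_credit_gbp` and its signature are hypothetical and do not reflect the real `fuel_cost.py`. The prices in the usage note are the approximate figures quoted in the prompt (about 18 p import, 13.19 p export), used purely for illustration.

```python
from typing import Optional


def pv_credit_gbp(
    pv_generation_kwh_per_yr: float,
    pv_export_credit_gbp_per_kwh: float,
    pv_dwelling_kwh_per_yr: Optional[float] = None,
    pv_exported_kwh_per_yr: Optional[float] = None,
    pv_dwelling_import_price_gbp_per_kwh: Optional[float] = None,
) -> float:
    """Hypothetical helper: annual PV credit in GBP (negative = reduces cost).

    Assumption: when any β-split field is None, fall back to the legacy
    single-rate behaviour (all generation credited at the export price),
    as the prompt describes for synthetic test constructions.
    """
    split_missing = (
        pv_dwelling_kwh_per_yr is None
        or pv_exported_kwh_per_yr is None
        or pv_dwelling_import_price_gbp_per_kwh is None
    )
    if split_missing:
        # Legacy path: treat ALL PV as exported.
        return -pv_generation_kwh_per_yr * pv_export_credit_gbp_per_kwh

    # β-split path: onsite-consumed PV offsets imports at the IMPORT price,
    # the remainder is sold at the EXPORT price.
    onsite_credit = pv_dwelling_kwh_per_yr * pv_dwelling_import_price_gbp_per_kwh
    export_credit = pv_exported_kwh_per_yr * pv_export_credit_gbp_per_kwh
    return -(onsite_credit + export_credit)


# Illustrative numbers only: 1000 kWh/yr generated, 400 used onsite, 600 exported.
legacy = pv_credit_gbp(1000.0, 0.1319)
split = pv_credit_gbp(1000.0, 0.1319, 400.0, 600.0, 0.18)
```

With these illustrative inputs the split credit is larger in magnitude than the legacy one, because the onsite share is valued at the higher import price. That is the direction of the per-cert SAP shift the prompt warns about when re-pinning the chain tests.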