diff --git a/domain/sap10_calculator/docs/HANDOVER_FRESH_API_DEBUG.md b/domain/sap10_calculator/docs/HANDOVER_FRESH_API_DEBUG.md
new file mode 100644
index 00000000..d6859499
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_FRESH_API_DEBUG.md
@@ -0,0 +1,171 @@
+# Handover — fresh-API cross-comparison + flagged-cert debugging
+
+Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the
+1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
+
+- **Branch:** `feature/per-cert-mapper-validation`
+- **HEAD:** `f895dd3a` (S0380.217). Confirm with `git rev-parse HEAD`.
+- **Baseline (AGENT_GUIDE §4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
+  → green (2388 passed, 1 skipped at HEAD; the golden + worksheet pins all pass).
+- **Next slice number:** **S0380.218**.
+- **Pre-existing failures (NOT yours, out of scope):**
+  - `domain/sap10_ml/tests/test_rdsap_uvalues.py` — 2 stone-§5.6 thin-wall failures
+    (granite + sandstone band A, 3.7408 vs Table-6 1.7 cap). Run this suite when you touch
+    `rdsap_uvalues.py`.
+  - `datatypes/epc/domain/tests/test_from_rdsap_schema.py::TestFromRdSapSchema21_0_1::test_total_floor_area`
+    (145.82 vs 45.82) — fails at original HEAD `ec64c39d` too. This file is NOT in the §4
+    suite command.
+
+---
+
+## ★ THE TASK — fetch fresh from the EPC API and debug, with worksheet cross-comparison
+
+The previous session drove the **golden-fixtures cascade** (`cert_to_inputs` →
+`calculate_sap_from_inputs`) and concluded that the three then-flagged certs (7536, 2130,
+0240) are "0240-like" — API-only residuals not reproducible from the register JSON. The
+user pushed back ("going around in circles"), and the right next move is **fresh raw-API
+data + worksheet triples**, not more simulated worksheets.
+
+### Part 1 — two NEW certs with API + Summary + worksheet (cross-comparison)
+
+The user has **two certs that have all three artifacts**: the GOV.UK API JSON, the Elmhurst
+**Summary** PDF (site notes / input), and the Elmhurst **worksheet** PDF (the `(1)..(286)`
+ground truth). These are gold — they let you run BOTH front-ends (`from_api_response` and
+`from_elmhurst_site_notes`) through the same cascade and pin **both** against the worksheet
+at 1e-4. The user will provide the cert numbers + drop the PDFs. For each:
+
+1. Fetch the API JSON (see **Fetching** below).
+2. Run API path → cascade; run Summary path → cascade; pin **both** vs the worksheet line
+   refs (`pdftotext -layout` the worksheet; compare `(27)/(28a)/(29a)/(30)/(33)/(36)/(45)m/
+   (62)/(233a)/(233b)/(258)…`). Cross-mapper parity: the two paths must agree to 1e-4 AND
+   match the worksheet (memory `feedback_cross_mapper_parity_via_cascade`).
+3. The **first diverging line ref localises the bug** (AGENT_GUIDE §3): value present in
+   worksheet but cascade 0/wrong → calculator; input field absent in `epc` → mapper or
+   extractor. Fix one cause = one slice.
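
The Part 1 comparison loop above can be sketched as follows. This is a minimal illustration, not the repository's helper: it assumes the worksheet values and both cascade outputs have already been collected into plain dicts keyed by line ref, and the names `first_divergence`, `parity_report`, and `LINE_REFS` are hypothetical. The numbers are made up.

```python
TOL = 1e-4  # the 1e-4 bar

# Hypothetical subset of worksheet line refs, in worksheet order.
LINE_REFS = ["(27)", "(28a)", "(29a)", "(30)", "(33)", "(36)", "(62)", "(258)"]


def first_divergence(expected, actual, refs=LINE_REFS, tol=TOL):
    """Return (ref, expected_value, actual_value) for the first ref that differs, else None."""
    for ref in refs:
        want = expected.get(ref)
        if want is None:
            continue  # ref missing on the reference side; nothing to compare
        got = actual.get(ref)
        if got is None or abs(got - want) > tol:
            return ref, want, got
    return None


def parity_report(worksheet, api_values, summary_values):
    """Pin both front-ends against the worksheet, then against each other."""
    return {
        "api_vs_worksheet": first_divergence(worksheet, api_values),
        "summary_vs_worksheet": first_divergence(worksheet, summary_values),
        "api_vs_summary": first_divergence(api_values, summary_values),
    }


# Made-up values, for illustration only.
worksheet = {"(27)": 12.5, "(30)": 40.0, "(33)": 100.0}
api_values = {"(27)": 12.50001, "(30)": 40.5, "(33)": 0.0}
summary_values = {"(27)": 12.5, "(30)": 40.0, "(33)": 100.0}
report = parity_report(worksheet, api_values, summary_values)
```

Here the API path first diverges at `(30)` while the Summary path matches the worksheet, which would point at the API mapper rather than the calculator. Walking the refs in worksheet order matters: later lines depend on earlier ones, so only the first mismatch is informative.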
+
+### Part 2 — six flagged certs to fetch fresh and debug
+
+The user's dashboard flags these (their numbers, **sign = lodged − our**):
+
+| cert | lodged | their "our" | their Δ |
+|---|---|---|---|
+| 0240-0200-5706-2365-8010 | 73 | 71.73 | +1.27 |
+| 0390-2954-3640-2196-4175 | 60 | 66.85 | −6.85 |
+| 2130-1033-4050-5007-8395 | 82 | 83.35 | −1.35 |
+| 6035-7729-2309-0879-2296 | 70 | 67.81 | +2.19 |
+| 7536-3827-0600-0600-0276 | 68 | 69.07 | −1.07 |
+| 9390-2722-3520-2105-8715 | 67 | 71.24 | −4.24 |
+
+### ⚠ CRITICAL — reconcile the numbers FIRST, before debugging
+
+**The user's flagged numbers DO NOT match the golden-fixtures cascade.** All six certs are
+already golden fixtures (`tests/domain/sap10_calculator/rdsap/fixtures/golden/.json`),
+and the cascade gives different values:
+
+- **0390-2954-3640 is pinned at resid +0** (our cascade = 60, EXACTLY lodged) — but the user
+  flags it at **66.85 (−6.85)**. A 6.85 SAP gap can't be staleness.
+- 7536 (their 69.07) and 2130 (their 83.35) are **pre-this-session** values — the S0380.214
+  roof fix moved 7536 → 68.924, and the S0380.215 wall fix moved 2130 → 83.78.
+
+So the user's numbers come from a **different computation** than the golden cascade. Two
+hypotheses, test both before assuming the cascade is wrong:
+
+1. **Fresh API JSON ≠ curated fixture.** The golden fixtures were bulk-fetched once
+   (`scripts/fetch_cohort2_api_jsons.py`, which *skips certs whose JSON already exists*) and
+   some may have been hand-corrected since. **Fetch each cert fresh and `diff` the raw JSON
+   against the committed fixture.** If they differ, the fixture was curated and the fresh raw
+   data is what the user's pipeline sees — debug the FRESH data. This is the most likely cause
+   and exactly why the user wants a fresh fetch.
+2. **A different SAP engine.** The production stack (`backend/SearchEpc.py` →
+   `etl/epc_clean/epc_attributes/*` → `backend/engine/engine.py`) is a SEPARATE mapping +
+   scorer from `cert_to_inputs`. If the user's dashboard is produced there, that's a different
+   code path than the golden cascade. Ask the user which pipeline the table came from.
+
+Do NOT start "fixing" the cascade to hit the user's numbers until you know which pipeline
+produced them. The golden cascade is worksheet-validated for 47 certs; chasing a dashboard
+number from a different stack would regress it.
+
+---
+
+## Fetching from the EPC API
+
+Token lives in `backend/.env` as `OPEN_EPC_API_TOKEN` (also `EPC_AUTH_TOKEN`). The exact
+mechanism (from `scripts/fetch_cohort2_api_jsons.py`):
+
+```python
+import httpx, os
+from dotenv import load_dotenv
+from infrastructure.epc_client.epc_client_service import EpcClientService
+load_dotenv("backend/.env")
+token = os.environ["OPEN_EPC_API_TOKEN"]
+resp = httpx.get(
+    f"{EpcClientService.BASE_URL}/api/certificate",
+    params={"certificate_number": ""},
+    headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
+    timeout=EpcClientService.REQUEST_TIMEOUT,
+)
+payload = resp.json()["data"]  # <- this is the schema-21 JSON the mapper consumes
+```
+
+`EpcPropertyDataMapper.from_api_response(payload)` only supports `schema_type`
+`RdSAP-Schema-21.0.0` / `21.0.1`; it raises for others. The persisted golden fixture IS this
+`data` payload. So `diff <(fresh)` vs the committed fixture is apples-to-apples.
+
+---
+
+## Per-cert notes carried from the previous session (verify against FRESH data)
+
+- **7536 (+1)** — roof bug fixed (S0380.214: as-built sloping ceiling → Table 18 col 3).
+  Every per-element U matches Elmhurst (cases 15-17 worksheets). Concluded 0240-like; cont
+  68.924.
+- **2130 (+2)** — dropped measured wall insulation captured (S0380.215 → Table 8 U=0.32),
+  which **exposed** the true residual (the +1 was two offsetting bugs). PV β-split **proven
+  exact** vs simulated case 18 worksheet (onsite 970.77 / export 1713.40 to the decimal).
+  Gas PE factor exact (1.13). Concluded 0240-like; cont 83.78.
+- **0240 (−1)** — export-dropped 2013+ circulation-pump age (115 vs 41 kWh); WWHRS confirmed
+  inert (`shower_wwhrs=1` is the universal default across all 47 certs). User previously
+  decided NOT to re-pin. Concluded 0240-like.
+- **0390-2954-3640** — pinned +0 (oil combi, Table 3a row 1). The user's −6.85 flag is the
+  reconciliation mystery above — START HERE; it's the clearest signal of a fresh-vs-fixture
+  or different-engine gap.
+- **6035** — see memory `project_golden_coverage_state`: a user-simulated 6035 worksheet
+  closed to 1e-4, but "6035 remaining +19 PE needs its own worksheet"; flagged +2.19 SAP.
+- **9390** — community heat-network (S0380.212/.213 fixed the fuel-code collision + standing
+  charge); left at SAP −2 with a documented ~7% demand over-count (heat-source-eff default?).
+  Unpinned/retired. The user's −4.24 may be the same demand over-count on fresh data.
+
+---
+
+## What this session shipped (commits `ec64c39d..f895dd3a`)
+
+| slice | what |
+|---|---|
+| **S0380.214** | As-built "Pitched, sloping ceiling" (code 8) roof → RdSAP 10 Table 18 col (3) (band F 0.40→0.68, L 0.16→0.18) per §5.11 item 5-5 + note (b). Code-5 vaulted stays col (1) (cohort). Worksheet-validated (sim case 15). Re-pinned 7536. |
+| **S0380.215** | Captured dropped `wall_insulation_thickness_measured` (schema 21 didn't declare it → `from_dict` dropped it). 2130 Ext1 "measured"/100 mm → RdSAP Table 8 U=0.32 (was 0.55 default). Exposed 2130's true +2 residual. |
+| **S0380.216** | Extractor: handle pdftotext wrapping the §11 glazing-GAP column onto the glazing-TYPE token ("…16 mm or [1st]"). Fallback strip AFTER the direct lookup (preserves explicit interleaved keys). Unblocked running the cascade on hand-entered worksheet Summaries. |
+| **S0380.217** | Captured dropped `wall_insulation_thermal_conductivity` (schema → domain → mapper) and wired it into `u_wall`'s §5.8 λ resolver. Code 1 = default 0.04; unmapped codes raise. Zero cascade effect today (2130's §5.8 path doesn't fire). |
+| 3× docs | finalised 7536 / 2130 as 0240-like; corrected diagnoses. |
+
+**Audit method that found the dropped fields** (reuse it on the fresh certs): recursively
+compare raw JSON keys against the parsed schema dataclass fields — anything in the JSON but
+not a declared field is silently dropped by `from_dict`. The two real drops (2130's measured
+wall insulation + thermal conductivity) came from this. Re-run it on the fresh fetches; new
+certs may surface new dropped fields.
+
+---
+
+## Conventions (unchanged)
+
+One cause = one slice = one commit; spec citation (page + line) in the message; AAA tests
+(`# Arrange / # Act / # Assert`); assert with `abs(x - y) <= tol` (not `pytest.approx`);
+SAP 10.2 only; no tolerance widening / xfail / rel-tol. New code passes pyright strict with
+ZERO NEW errors — baseline-compare with `git stash` + `PYRIGHT_PYTHON_FORCE_VERSION=latest`
+(mapper.py / cert_to_inputs.py / heat_transmission.py / rdsap_uvalues.py carry pre-existing
+errors; compare counts). Stage files by name — the working tree has pre-existing unrelated
+changes to `pytest.ini` / `scripts/` that must NOT be staged.
+`Co-Authored-By: Claude Opus 4.8 `.
+
+When you re-pin a golden cert, update `expected_sap_resid` (±0), `expected_pe_resid_kwh_per_m2`
+(±0.01) and `expected_co2_resid_tonnes_per_yr` (±0.001) to the exact post-fix values and
+append a slice note to the cert's `notes:` explaining the cause + spec/worksheet citation.
+Run the full §4 suite as the blast-radius check after any fabric/factor change.