docs: handover — fresh-API cross-comparison + flagged-cert debugging

Next-agent brief: fetch certs fresh from the EPC API (two new API+Summary+
worksheet triples for cross-mapper parity, plus six dashboard-flagged certs).
Flags the critical reconciliation: the user's flagged numbers don't match the
golden-fixtures cascade (0390-2954-3640 pinned +0 but flagged -6.85; 7536/2130
flags are pre-this-session), so fresh-raw-JSON-vs-curated-fixture or a
different engine must be reconciled before debugging. Documents the EPC API
fetch mechanism, the dropped-field audit method, this session's 4 fixes, and
the conventions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-04 12:08:28 +00:00
parent f895dd3ab7
commit 8bd8ff8e5c


@@ -0,0 +1,171 @@
# Handover — fresh-API cross-comparison + flagged-cert debugging
Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `f895dd3a` (S0380.217). Confirm with `git rev-parse HEAD`.
- **Baseline (AGENT_GUIDE §4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
→ green (2388 passed, 1 skipped at HEAD; the golden + worksheet pins all pass).
- **Next slice number:** **S0380.218**.
- **Pre-existing failures (NOT yours, out of scope):**
  - `domain/sap10_ml/tests/test_rdsap_uvalues.py`: 2 stone-§5.6 thin-wall failures
    (granite + sandstone band A, 3.7408 vs Table-6 1.7 cap). Run this suite when you touch
    `rdsap_uvalues.py`.
  - `datatypes/epc/domain/tests/test_from_rdsap_schema.py::TestFromRdSapSchema21_0_1::test_total_floor_area`
    (145.82 vs 45.82); fails at original HEAD `ec64c39d` too. This file is NOT in the §4
    suite command.
---
## ★ THE TASK — fetch fresh from the EPC API and debug, with worksheet cross-comparison
The previous session drove the **golden-fixtures cascade** (`cert_to_inputs` →
`calculate_sap_from_inputs`) and concluded that the three then-flagged certs (7536, 2130,
0240) are "0240-like" — API-only residuals not reproducible from the register JSON. The
user pushed back ("going around in circles"), and the right next move is **fresh raw-API
data + worksheet triples**, not more simulated worksheets.
### Part 1 — two NEW certs with API + Summary + worksheet (cross-comparison)
The user has **two certs that have all three artifacts**: the GOV.UK API JSON, the Elmhurst
**Summary** PDF (site notes / input), and the Elmhurst **worksheet** PDF (the `(1)..(286)`
ground truth). These are gold — they let you run BOTH front-ends (`from_api_response` and
`from_elmhurst_site_notes`) through the same cascade and pin **both** against the worksheet
at 1e-4. The user will provide the cert numbers + drop the PDFs. For each:
1. Fetch the API JSON (see **Fetching** below).
2. Run API path → cascade; run Summary path → cascade; pin **both** vs the worksheet line
refs (`pdftotext -layout` the worksheet; compare `(27)/(28a)/(29a)/(30)/(33)/(36)/(45)m/
(62)/(233a)/(233b)/(258)…`). Cross-mapper parity: the two paths must agree to 1e-4 AND
match the worksheet (memory `feedback_cross_mapper_parity_via_cascade`).
3. The **first diverging line ref localises the bug** (AGENT_GUIDE §3): value present in
worksheet but cascade 0/wrong → calculator; input field absent in `epc` → mapper or
extractor. Fix one cause = one slice.
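The parity check in steps 2 and 3 can be sketched as a walk over the worksheet line refs in order, stopping at the first one where a cascade output misses the 1e-4 bar. This is a minimal sketch only: `first_divergence`, the dict shapes, and all numbers below are hypothetical illustrations, not the repo's section helpers or real worksheet values.

```python
from typing import Mapping, Optional, Sequence, Tuple

TOL = 1e-4  # the pin bar named in AGENT_GUIDE

Divergence = Tuple[str, float, Optional[float]]


def first_divergence(
    worksheet: Mapping[str, float],
    cascade: Mapping[str, float],
    order: Sequence[str],
    tol: float = TOL,
) -> Optional[Divergence]:
    """Return (line_ref, worksheet_value, cascade_value) for the first line ref
    whose cascade value is missing or differs by more than tol, else None."""
    for ref in order:
        expected = worksheet[ref]
        actual = cascade.get(ref)
        if actual is None or abs(actual - expected) > tol:
            return ref, expected, actual
    return None


# Illustrative values only (NOT taken from any real worksheet).
order = ["(27)", "(33)", "(36)"]
worksheet = {"(27)": 12.3456, "(33)": 98.7654, "(36)": 10.0}
api_path = {"(27)": 12.3456, "(33)": 98.7654, "(36)": 10.0}
summary_path = {"(27)": 12.3456, "(33)": 97.5, "(36)": 10.0}

api_hit = first_divergence(worksheet, api_path, order)
summary_hit = first_divergence(worksheet, summary_path, order)
print("API path:", api_hit)
print("Summary path:", summary_hit)
```

In this toy run the API path pins cleanly while the Summary path first diverges at `(33)`, which is where the per-line debugging loop would start for that front-end.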
### Part 2 — six flagged certs to fetch fresh and debug
The user's dashboard flags these (their numbers, **sign = lodged - our**):
| cert | lodged | their "our" | their Δ |
|---|---|---|---|
| 0240-0200-5706-2365-8010 | 73 | 71.73 | +1.27 |
| 0390-2954-3640-2196-4175 | 60 | 66.85 | -6.85 |
| 2130-1033-4050-5007-8395 | 82 | 83.35 | -1.35 |
| 6035-7729-2309-0879-2296 | 70 | 67.81 | +2.19 |
| 7536-3827-0600-0600-0276 | 68 | 69.07 | -1.07 |
| 9390-2722-3520-2105-8715 | 67 | 71.24 | -4.24 |
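As a quick sanity check on the sign convention, the Δ column can be recomputed as lodged minus "our" from the two value columns of the table. The sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# (cert, lodged SAP, dashboard "our" SAP), copied from the table above.
rows = [
    ("0240-0200-5706-2365-8010", 73, 71.73),
    ("0390-2954-3640-2196-4175", 60, 66.85),
    ("2130-1033-4050-5007-8395", 82, 83.35),
    ("6035-7729-2309-0879-2296", 70, 67.81),
    ("7536-3827-0600-0600-0276", 68, 69.07),
    ("9390-2722-3520-2105-8715", 67, 71.24),
]

# Delta = lodged - our, rounded to the two decimals the dashboard shows.
deltas = {cert: round(lodged - ours, 2) for cert, lodged, ours in rows}

for cert, delta in deltas.items():
    print(f"{cert}: {delta:+.2f}")
```

Under this convention a positive Δ means the dashboard scores the cert below its lodged value, and a negative Δ means it scores above.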
### ⚠ CRITICAL — reconcile the numbers FIRST, before debugging
**The user's flagged numbers DO NOT match the golden-fixtures cascade.** All six certs are
already golden fixtures (`tests/domain/sap10_calculator/rdsap/fixtures/golden/<cert>.json`),
and the cascade gives different values:
- **0390-2954-3640 is pinned at resid +0** (our cascade = 60, EXACTLY lodged), but the user
flags it at **66.85 (-6.85)**. A 6.85 SAP gap can't be staleness.
- 7536 (their 69.07) and 2130 (their 83.35) are **pre-this-session** values — the S0380.214
roof fix moved 7536 → 68.924, and the S0380.215 wall fix moved 2130 → 83.78.
So the user's numbers come from a **different computation** than the golden cascade. Two
hypotheses, test both before assuming the cascade is wrong:
1. **Fresh API JSON ≠ curated fixture.** The golden fixtures were bulk-fetched once
(`scripts/fetch_cohort2_api_jsons.py`, which *skips certs whose JSON already exists*) and
some may have been hand-corrected since. **Fetch each cert fresh and `diff` the raw JSON
against the committed fixture.** If they differ, the fixture was curated and the fresh raw
data is what the user's pipeline sees — debug the FRESH data. This is the most likely cause
and exactly why the user wants a fresh fetch.
2. **A different SAP engine.** The production stack (`backend/SearchEpc.py` →
`etl/epc_clean/epc_attributes/*` → `backend/engine/engine.py`) is a SEPARATE mapping +
scorer from `cert_to_inputs`. If the user's dashboard is produced there, that's a different
code path than the golden cascade. Ask the user which pipeline the table came from.
Do NOT start "fixing" the cascade to hit the user's numbers until you know which pipeline
produced them. The golden cascade is worksheet-validated for 47 certs; chasing a dashboard
number from a different stack would regress it.
---
## Fetching from the EPC API
Token lives in `backend/.env` as `OPEN_EPC_API_TOKEN` (also `EPC_AUTH_TOKEN`). The exact
mechanism (from `scripts/fetch_cohort2_api_jsons.py`):
```python
import os

import httpx
from dotenv import load_dotenv

from infrastructure.epc_client.epc_client_service import EpcClientService

load_dotenv("backend/.env")
token = os.environ["OPEN_EPC_API_TOKEN"]
resp = httpx.get(
    f"{EpcClientService.BASE_URL}/api/certificate",
    params={"certificate_number": "<CERT>"},
    headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    timeout=EpcClientService.REQUEST_TIMEOUT,
)
payload = resp.json()["data"]  # <- this is the schema-21 JSON the mapper consumes
```
`EpcPropertyDataMapper.from_api_response(payload)` only supports `schema_type`
`RdSAP-Schema-21.0.0` / `21.0.1`; it raises for others. The persisted golden fixture IS this
`data` payload. So `diff <(fresh)` vs the committed fixture is apples-to-apples.
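One way to run that fresh-vs-fixture comparison, sketched here under assumptions: a small recursive walk that reports every path where the two JSON documents differ. `json_diff` is a hypothetical helper, not something in the repo, and the inline `fresh` / `fixture` dicts (including the `walls` and `curated` keys) are made-up stand-ins for a fetched payload and a committed fixture loaded with `json.load`.

```python
from typing import Any, Iterator


def json_diff(fresh: Any, fixture: Any, path: str = "") -> Iterator[str]:
    """Yield one human-readable line per path where fresh and fixture differ."""
    if isinstance(fresh, dict) and isinstance(fixture, dict):
        for key in sorted(set(fresh) | set(fixture)):
            child = f"{path}.{key}" if path else key
            if key not in fixture:
                yield f"{child}: only in fresh"
            elif key not in fresh:
                yield f"{child}: only in fixture"
            else:
                yield from json_diff(fresh[key], fixture[key], child)
    elif isinstance(fresh, list) and isinstance(fixture, list):
        if len(fresh) != len(fixture):
            yield f"{path}: list length {len(fresh)} != {len(fixture)}"
        # Compare the overlapping prefix element by element.
        for i, (a, b) in enumerate(zip(fresh, fixture)):
            yield from json_diff(a, b, f"{path}[{i}]")
    elif fresh != fixture:
        yield f"{path}: fresh={fresh!r} fixture={fixture!r}"


# Made-up stand-ins; real inputs would be the fetched payload and the fixture file.
fresh = {
    "schema_type": "RdSAP-Schema-21.0.1",
    "walls": [{"wall_insulation_thickness_measured": 100}],
}
fixture = {"schema_type": "RdSAP-Schema-21.0.1", "walls": [{}], "curated": True}

diffs = list(json_diff(fresh, fixture))
for line in diffs:
    print(line)
```

An empty result would suggest the fixture is still the raw payload; any output is a candidate curation or upstream change to investigate before touching the cascade.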
---
## Per-cert notes carried from the previous session (verify against FRESH data)
- **7536 (+1)**: roof bug fixed (S0380.214: as-built sloping ceiling → Table 18 col 3).
Every per-element U matches Elmhurst (cases 15-17 worksheets). Concluded 0240-like;
continuous SAP 68.924.
- **2130 (+2)**: dropped measured wall insulation captured (S0380.215 → Table 8 U=0.32),
which **exposed** the true residual (the +1 was two offsetting bugs). PV β-split **proven
exact** vs simulated case 18 worksheet (onsite 970.77 / export 1713.40 to the decimal).
Gas PE factor exact (1.13). Concluded 0240-like; continuous SAP 83.78.
- **0240 (-1)**: export-dropped 2013+ circulation-pump age (115 vs 41 kWh); WWHRS confirmed
inert (`shower_wwhrs=1` is the universal default across all 47 certs). User previously
decided NOT to re-pin. Concluded 0240-like.
- **0390-2954-3640**: pinned +0 (oil combi, Table 3a row 1). The user's -6.85 flag is the
reconciliation mystery above. START HERE; it's the clearest signal of a fresh-vs-fixture
or different-engine gap.
- **6035**: see memory `project_golden_coverage_state`: a user-simulated 6035 worksheet
closed to 1e-4, but "6035 remaining +19 PE needs its own worksheet"; flagged +2.19 SAP.
- **9390**: community heat-network (S0380.212/.213 fixed the fuel-code collision + standing
charge); left at SAP 2 with a documented ~7% demand over-count (heat-source-eff default?).
Unpinned/retired. The user's -4.24 may be the same demand over-count on fresh data.
---
## What this session shipped (commits `ec64c39d..f895dd3a`)
| slice | what |
|---|---|
| **S0380.214** | As-built "Pitched, sloping ceiling" (code 8) roof → RdSAP 10 Table 18 col (3) (band F 0.40→0.68, L 0.16→0.18) per §5.11 item 5-5 + note (b). Code-5 vaulted stays col (1) (cohort). Worksheet-validated (sim case 15). Re-pinned 7536. |
| **S0380.215** | Captured dropped `wall_insulation_thickness_measured` (schema 21 didn't declare it → `from_dict` dropped it). 2130 Ext1 "measured"/100 mm → RdSAP Table 8 U=0.32 (was 0.55 default). Exposed 2130's true +2 residual. |
| **S0380.216** | Extractor: handle pdftotext wrapping the §11 glazing-GAP column onto the glazing-TYPE token ("…16 mm or [1st]"). Fallback strip AFTER the direct lookup (preserves explicit interleaved keys). Unblocked running the cascade on hand-entered worksheet Summaries. |
| **S0380.217** | Captured dropped `wall_insulation_thermal_conductivity` (schema → domain → mapper) and wired it into `u_wall`'s §5.8 λ resolver. Code 1 = default 0.04; unmapped codes raise. Zero cascade effect today (2130's §5.8 path doesn't fire). |
| 3× docs | finalised 7536 / 2130 as 0240-like; corrected diagnoses. |
**Audit method that found the dropped fields** (reuse it on the fresh certs): recursively
compare raw JSON keys against the parsed schema dataclass fields — anything in the JSON but
not a declared field is silently dropped by `from_dict`. The two real drops (2130's measured
wall insulation + thermal conductivity) came from this. Re-run it on the fresh fetches; new
certs may surface new dropped fields.
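The audit described above can be sketched roughly as follows, assuming the parsed schema is a tree of plain dataclasses. The `Wall` and `Cert` classes and the `dropped_keys` helper are hypothetical stand-ins for illustration; the real schema classes and their `from_dict` are not reproduced here.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Iterator, Optional, get_args, get_type_hints


@dataclass
class Wall:  # hypothetical stand-in for a schema dataclass
    wall_construction: int


@dataclass
class Cert:  # hypothetical stand-in for the top-level schema dataclass
    schema_type: str
    walls: list[Wall]


def _dataclass_in(tp: Any) -> Optional[type]:
    """Return the dataclass found in tp (itself or a generic argument), if any."""
    if isinstance(tp, type) and is_dataclass(tp):
        return tp
    for arg in get_args(tp):
        found = _dataclass_in(arg)
        if found is not None:
            return found
    return None


def dropped_keys(raw: dict, schema: type, path: str = "") -> Iterator[str]:
    """Yield dotted paths of raw JSON keys that the schema does not declare."""
    hints = get_type_hints(schema)
    declared = {f.name for f in fields(schema)}
    for key, value in raw.items():
        child = f"{path}.{key}" if path else key
        if key not in declared:
            yield child  # present in the JSON, absent from the dataclass
            continue
        nested = _dataclass_in(hints[key])
        if nested is None:
            continue
        # Recurse into nested objects, whether single or inside a list.
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                yield from dropped_keys(item, nested, child)


raw = {
    "schema_type": "RdSAP-Schema-21.0.1",
    "walls": [{"wall_construction": 4, "wall_insulation_thickness_measured": 100}],
    "unexpected_top_level": 1,
}
dropped = sorted(dropped_keys(raw, Cert))
print(dropped)
```

Each reported path is only a candidate: a key may be undeclared on purpose, so every hit still needs a check against the spec before it becomes a slice.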
---
## Conventions (unchanged)
One cause = one slice = one commit; spec citation (page + line) in the message; AAA tests
(`# Arrange / # Act / # Assert`); assert with `abs(x - y) <= tol` (not `pytest.approx`);
SAP 10.2 only; no tolerance widening / xfail / rel-tol. New code passes pyright strict with
ZERO NEW errors — baseline-compare with `git stash` + `PYRIGHT_PYTHON_FORCE_VERSION=latest`
(mapper.py / cert_to_inputs.py / heat_transmission.py / rdsap_uvalues.py carry pre-existing
errors; compare counts). Stage files by name — the working tree has pre-existing unrelated
changes to `pytest.ini` / `scripts/` that must NOT be staged.
`Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.
When you re-pin a golden cert, update `expected_sap_resid` (±0), `expected_pe_resid_kwh_per_m2`
(±0.01) and `expected_co2_resid_tonnes_per_yr` (±0.001) to the exact post-fix values and
append a slice note to the cert's `notes:` explaining the cause + spec/worksheet citation.
Run the full §4 suite as the blast-radius check after any fabric/factor change.
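As an illustration of the test conventions and re-pin tolerances above, here is a minimal AAA-style sketch that asserts with an absolute difference instead of `pytest.approx`. The fixture keys follow the field names quoted above; the `within` helper, the `computed` dict, and all numbers are hypothetical, and a real test would obtain the computed values from the cascade.

```python
SAP_TOL = 0.0    # expected_sap_resid: exact
PE_TOL = 0.01    # expected_pe_resid_kwh_per_m2
CO2_TOL = 0.001  # expected_co2_resid_tonnes_per_yr


def within(actual: float, expected: float, tol: float) -> bool:
    """Absolute-tolerance comparison, per the abs(x - y) <= tol convention."""
    return abs(actual - expected) <= tol


def test_repinned_residuals_match_fixture() -> None:
    # Arrange (illustrative numbers, not a real cert)
    fixture = {
        "expected_sap_resid": 0,
        "expected_pe_resid_kwh_per_m2": 1.23,
        "expected_co2_resid_tonnes_per_yr": 0.045,
    }

    # Act (stand-in for running the cascade on the cert)
    computed = {"sap": 0, "pe": 1.234, "co2": 0.0452}

    # Assert
    assert within(computed["sap"], fixture["expected_sap_resid"], SAP_TOL)
    assert within(computed["pe"], fixture["expected_pe_resid_kwh_per_m2"], PE_TOL)
    assert within(computed["co2"], fixture["expected_co2_resid_tonnes_per_yr"], CO2_TOL)


test_repinned_residuals_match_fixture()
```

Keeping the tolerances as named constants makes any attempt to widen them visible in review.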