Handover — fresh-API cross-comparison + flagged-cert debugging
Point-in-time note. Start from AGENT_GUIDE.md for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.
- Branch: feature/per-cert-mapper-validation
- HEAD: 6d9ef114 (S0380.218). Confirm with git rev-parse HEAD.
- Baseline (AGENT_GUIDE §4 suite): tests/domain/sap10_calculator/ backend/documents_parser/tests/ → green (2392 passed, 1 skipped at HEAD; the golden + worksheet pins all pass).
- Next slice number: S0380.219.
S0380.218 (DONE): Part 1 closed for the "with api 3" pair. The two certs the user dropped under sap worksheets/with api 3/, 0340-2467-9260-2006-6521 (Summary_000922 / dr87 000922) and 5500-5070-0822-0201-3663 (Summary_000920 / dr87 000920), are clean. Fetched fresh and run through BOTH front-ends, both paths agree to <1e-4 on SAP/cost/CO2/PE AND reproduce the worksheet (255)/(272)/(286)/(33)/(37) exactly. SAP integer = lodged (resid +0) on both. No mapper/calculator bug surfaced. Dropped-field audit clean (only created_at + _normalize_shower_outlets-handled shower keys). Locked in as golden fixtures: 2 JSONs under fixtures/golden/ + entries in _EXPECTATIONS and _WORKSHEET_PE_CO2 (test_golden_fixtures.py). The Summary path was validated manually but is NOT pinned in a committed test (would need the Summary PDFs copied into backend/documents_parser/tests/fixtures/ + a textract-preprocessed chain test); that is a cheap follow-up if cross-mapper parity wants a standing regression guard beyond the API-path golden pin.
- Pre-existing failures (NOT yours, out of scope):
  - domain/sap10_ml/tests/test_rdsap_uvalues.py: 2 stone-§5.6 thin-wall failures (granite + sandstone band A, 3.7408 vs Table-6 1.7 cap). Run this suite when you touch rdsap_uvalues.py.
  - datatypes/epc/domain/tests/test_from_rdsap_schema.py::TestFromRdSapSchema21_0_1::test_total_floor_area (145.82 vs 45.82): fails at original HEAD ec64c39d too. This file is NOT in the §4 suite command.
★ THE TASK — fetch fresh from the EPC API and debug, with worksheet cross-comparison
The previous session drove the golden-fixtures cascade (cert_to_inputs →
calculate_sap_from_inputs) and concluded that the three then-flagged certs (7536, 2130,
0240) are "0240-like" — API-only residuals not reproducible from the register JSON. The
user pushed back ("going around in circles"), and the right next move is fresh raw-API
data + worksheet triples, not more simulated worksheets.
Part 1 — two NEW certs with API + Summary + worksheet (cross-comparison)
The user has two certs that have all three artifacts: the GOV.UK API JSON, the Elmhurst
Summary PDF (site notes / input), and the Elmhurst worksheet PDF (the (1)..(286)
ground truth). These are gold — they let you run BOTH front-ends (from_api_response and
from_elmhurst_site_notes) through the same cascade and pin both against the worksheet
at 1e-4. The user will provide the cert numbers + drop the PDFs. For each:
- Fetch the API JSON (see Fetching below).
- Run API path → cascade; run Summary path → cascade; pin both vs the worksheet line refs (pdftotext -layout the worksheet; compare (27)/(28a)/(29a)/(30)/(33)/(36)/(45)m/(62)/(233a)/(233b)/(258)…). Cross-mapper parity: the two paths must agree to 1e-4 AND match the worksheet (memory feedback_cross_mapper_parity_via_cascade).
- The first diverging line ref localises the bug (AGENT_GUIDE §3): value present in worksheet but cascade 0/wrong → calculator; input field absent in epc → mapper or extractor. Fix one cause = one slice.
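The "first diverging line ref" rule in the list above can be sketched as a small helper. This is only an illustration: the function name, the dict-of-line-refs shape, and all numeric values are assumptions, not the repo's section helpers (see AGENT_GUIDE.md for those).

```python
from typing import Mapping, Optional, Sequence, Tuple

TOL = 1e-4  # the parity bar used throughout this note


def first_divergence(
    worksheet: Mapping[str, float],
    cascade: Mapping[str, float],
    order: Sequence[str],
    tol: float = TOL,
) -> Optional[Tuple[str, float, Optional[float]]]:
    """Return (line_ref, worksheet_value, cascade_value) for the first line ref
    in `order` where the cascade value is missing or differs by more than `tol`.
    Return None when every listed line ref agrees."""
    for ref in order:
        expected = worksheet[ref]
        actual = cascade.get(ref)
        if actual is None or abs(actual - expected) > tol:
            return ref, expected, actual
    return None


# Hypothetical values, for illustration only.
order = ["(27)", "(33)", "(36)"]
worksheet = {"(27)": 12.50, "(33)": 148.2031, "(36)": 15.75}
api_path = {"(27)": 12.50, "(33)": 148.2031, "(36)": 15.75}
summary_path = {"(27)": 12.50, "(33)": 147.9000, "(36)": 15.75}

print(first_divergence(worksheet, api_path, order))
print(first_divergence(worksheet, summary_path, order))
```

Walking the line refs in worksheet order matters: later lines depend on earlier ones, so only the first mismatch points at a cause.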
Part 2 — (secondary) re-check the previously-flagged certs on THIS branch
A dashboard once flagged six certs (0240, 0390-2954-3640, 2130, 6035, 7536, 9390). Those numbers are STALE — they came from a branch WITHOUT this branch's fixes (the user confirmed this). Do not chase them. On THIS branch the picture is different and mostly settled:
- 7536 (68.924, +1), 2130 (83.78, +2), 0240 (−1) — concluded 0240-like (API-only residuals; see per-cert notes below). 0390-2954-3640 pins at +0 (exact).
- 6035 (+2.19) and 9390 (community, −2) carry documented open residuals (see notes) but are lower-priority and not worksheet-backed.
So Part 2 is only worth touching if a fresh fetch differs from the committed fixture
(curated/hand-corrected fixtures can mask raw-API mapper behaviour) — diff fresh vs fixture
and debug the delta. Otherwise these are done; the real new work is Part 1.
Fetching from the EPC API
Token lives in backend/.env as OPEN_EPC_API_TOKEN (also EPC_AUTH_TOKEN). The exact
mechanism (from scripts/fetch_cohort2_api_jsons.py):
import os
import httpx
from dotenv import load_dotenv
from infrastructure.epc_client.epc_client_service import EpcClientService
# Load the API token from the backend env file.
load_dotenv("backend/.env")
token = os.environ["OPEN_EPC_API_TOKEN"]
# Request a single certificate; replace <CERT> with the certificate number.
resp = httpx.get(
    f"{EpcClientService.BASE_URL}/api/certificate",
    params={"certificate_number": "<CERT>"},
    headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    timeout=EpcClientService.REQUEST_TIMEOUT,
)
payload = resp.json()["data"]  # <- this is the schema-21 JSON the mapper consumes
EpcPropertyDataMapper.from_api_response(payload) only supports schema_type
RdSAP-Schema-21.0.0 / 21.0.1; it raises for others. The persisted golden fixture IS this
data payload. So diff <(fresh) vs the committed fixture is apples-to-apples.
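One way to do the fresh-vs-fixture diff is a recursive walk over the two JSON values. This is a minimal sketch; the function name and the sample keys are hypothetical, and it assumes both sides are already loaded with json.load.

```python
from typing import Any, Iterator


def json_diff(fresh: Any, fixture: Any, path: str = "") -> Iterator[str]:
    """Yield one readable line per difference between two JSON values."""
    if isinstance(fresh, dict) and isinstance(fixture, dict):
        for key in sorted(set(fresh) | set(fixture)):
            child = f"{path}.{key}" if path else key
            if key not in fixture:
                yield f"+ {child} (only in fresh): {fresh[key]!r}"
            elif key not in fresh:
                yield f"- {child} (only in fixture): {fixture[key]!r}"
            else:
                yield from json_diff(fresh[key], fixture[key], child)
    elif isinstance(fresh, list) and isinstance(fixture, list):
        if len(fresh) != len(fixture):
            yield f"~ {path}: list length {len(fresh)} vs {len(fixture)}"
        # Compare the overlapping prefix element by element.
        for i, (a, b) in enumerate(zip(fresh, fixture)):
            yield from json_diff(a, b, f"{path}[{i}]")
    elif fresh != fixture:
        yield f"~ {path}: {fresh!r} vs {fixture!r}"


# Hypothetical payloads, for illustration only.
fresh = {
    "schema_type": "RdSAP-Schema-21.0.1",
    "created_at": "2024-01-01",
    "walls": [{"thickness": 100}],
}
fixture = {"schema_type": "RdSAP-Schema-21.0.1", "walls": [{"thickness": 90}]}

for line in json_diff(fresh, fixture):
    print(line)
```

An empty diff means the committed fixture is still the raw API payload; any line it prints is a candidate hand-correction to investigate.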
Per-cert notes carried from the previous session (verify against FRESH data)
- 7536 (+1): roof bug fixed (S0380.214: as-built sloping ceiling → Table 18 col 3). Every per-element U matches Elmhurst (cases 15-17 worksheets). Concluded 0240-like; cont 68.924.
- 2130 (+2): dropped measured wall insulation captured (S0380.215 → Table 8 U=0.32), which exposed the true residual (the +1 was two offsetting bugs). PV β-split proven exact vs simulated case 18 worksheet (onsite 970.77 / export 1713.40 to the decimal). Gas PE factor exact (1.13). Concluded 0240-like; cont 83.78.
- 0240 (−1): export-dropped 2013+ circulation-pump age (115 vs 41 kWh); WWHRS confirmed inert (shower_wwhrs=1 is the universal default across all 47 certs). User previously decided NOT to re-pin. Concluded 0240-like.
- 0390-2954-3640: pinned +0 (oil combi, Table 3a row 1). The user's −6.85 flag is the reconciliation mystery above. START HERE; it's the clearest signal of a fresh-vs-fixture or different-engine gap.
- 6035: see memory project_golden_coverage_state: a user-simulated 6035 worksheet closed to 1e-4, but "6035 remaining +19 PE needs its own worksheet"; flagged +2.19 SAP.
- 9390: community heat-network (S0380.212/.213 fixed the fuel-code collision + standing charge); left at SAP −2 with a documented ~7% demand over-count (heat-source-eff default?). Unpinned/retired. The user's −4.24 may be the same demand over-count on fresh data.
What this session shipped (commits ec64c39d..f895dd3a)
| slice | what |
|---|---|
| S0380.214 | As-built "Pitched, sloping ceiling" (code 8) roof → RdSAP 10 Table 18 col (3) (band F 0.40→0.68, L 0.16→0.18) per §5.11 item 5-5 + note (b). Code-5 vaulted stays col (1) (cohort). Worksheet-validated (sim case 15). Re-pinned 7536. |
| S0380.215 | Captured dropped wall_insulation_thickness_measured (schema 21 didn't declare it → from_dict dropped it). 2130 Ext1 "measured"/100 mm → RdSAP Table 8 U=0.32 (was 0.55 default). Exposed 2130's true +2 residual. |
| S0380.216 | Extractor: handle pdftotext wrapping the §11 glazing-GAP column onto the glazing-TYPE token ("…16 mm or [1st]"). Fallback strip AFTER the direct lookup (preserves explicit interleaved keys). Unblocked running the cascade on hand-entered worksheet Summaries. |
| S0380.217 | Captured dropped wall_insulation_thermal_conductivity (schema → domain → mapper) and wired it into u_wall's §5.8 λ resolver. Code 1 = default 0.04; unmapped codes raise. Zero cascade effect today (2130's §5.8 path doesn't fire). |
| 3× docs | finalised 7536 / 2130 as 0240-like; corrected diagnoses. |
Audit method that found the dropped fields (reuse it on the fresh certs): recursively
compare raw JSON keys against the parsed schema dataclass fields — anything in the JSON but
not a declared field is silently dropped by from_dict. The two real drops (2130's measured
wall insulation + thermal conductivity) came from this. Re-run it on the fresh fetches; new
certs may surface new dropped fields.
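The audit described above can be sketched as a recursive walk that follows dataclass type hints. The Wall and Cert classes below are hypothetical stand-ins for the real schema dataclasses, and the sketch assumes JSON key names equal field names; adapt it if the real from_dict renames keys.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, get_args, get_type_hints


def _dataclass_in(tp: Any) -> Any:
    """Return the dataclass found in `tp` (itself, or inside List[X] / Optional[X])."""
    if is_dataclass(tp):
        return tp
    for arg in get_args(tp):
        found = _dataclass_in(arg)
        if found is not None:
            return found
    return None


def dropped_keys(raw: Any, schema: Any, path: str = "") -> List[str]:
    """List JSON key paths present in `raw` but not declared on `schema`."""
    out: List[str] = []
    if isinstance(raw, list):
        for i, item in enumerate(raw):
            out += dropped_keys(item, schema, f"{path}[{i}]")
        return out
    if not isinstance(raw, dict) or schema is None:
        return out
    hints = get_type_hints(schema)
    declared = {f.name for f in fields(schema)}
    for key, value in raw.items():
        child = f"{path}.{key}" if path else key
        if key not in declared:
            out.append(child)  # silently dropped by a from_dict-style parser
        else:
            out += dropped_keys(value, _dataclass_in(hints[key]), child)
    return out


@dataclass
class Wall:  # hypothetical schema fragment
    construction: int


@dataclass
class Cert:  # hypothetical schema root
    schema_type: str
    walls: List[Wall] = field(default_factory=list)


raw = {
    "schema_type": "RdSAP-Schema-21.0.1",
    "created_at": "2024-01-01",
    "walls": [{"construction": 3, "wall_insulation_thickness_measured": 100}],
}
print(dropped_keys(raw, Cert))
```

Reporting full key paths rather than bare names makes it clear which nested element lost the field.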
Conventions (unchanged)
One cause = one slice = one commit; spec citation (page + line) in the message; AAA tests
(# Arrange / # Act / # Assert); assert with abs(x - y) <= tol (not pytest.approx);
SAP 10.2 only; no tolerance widening / xfail / rel-tol. New code passes pyright strict with
ZERO NEW errors — baseline-compare with git stash + PYRIGHT_PYTHON_FORCE_VERSION=latest
(mapper.py / cert_to_inputs.py / heat_transmission.py / rdsap_uvalues.py carry pre-existing
errors; compare counts). Stage files by name — the working tree has pre-existing unrelated
changes to pytest.ini / scripts/ that must NOT be staged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>.
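As a reminder of the test shape these conventions ask for, here is a minimal sketch of an AAA test that uses the absolute-tolerance assertion. The stub function and every number are hypothetical; a real test would call the cascade instead.

```python
def pe_per_m2_stub(energy_kwh: float, floor_area_m2: float) -> float:
    """Hypothetical stand-in for the real cascade output under test."""
    return energy_kwh / floor_area_m2


def test_pe_per_m2_matches_worksheet() -> None:
    # Arrange
    energy_kwh = 12_345.6
    floor_area_m2 = 80.0
    expected = 154.32  # hypothetical value read off a worksheet
    tol = 1e-4

    # Act
    actual = pe_per_m2_stub(energy_kwh, floor_area_m2)

    # Assert
    assert abs(actual - expected) <= tol  # absolute tolerance, not pytest.approx
```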
When you re-pin a golden cert, update expected_sap_resid (±0), expected_pe_resid_kwh_per_m2
(±0.01) and expected_co2_resid_tonnes_per_yr (±0.001) to the exact post-fix values and
append a slice note to the cert's notes: explaining the cause + spec/worksheet citation.
Run the full §4 suite as the blast-radius check after any fabric/factor change.
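The re-pin fields can be pictured as a small record plus a check. This is a sketch under assumptions: the Expectation class and pin_failures helper are hypothetical (the real entries live in _EXPECTATIONS in test_golden_fixtures.py), and reading the ± figures as pin tolerances is an interpretation of the note above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Expectation:  # hypothetical shape; field names taken from the note above
    expected_sap_resid: int
    expected_pe_resid_kwh_per_m2: float
    expected_co2_resid_tonnes_per_yr: float
    notes: str = ""


def pin_failures(
    exp: Expectation, sap_resid: int, pe_resid: float, co2_resid: float
) -> List[str]:
    """Return the names of the pinned residuals that fall outside tolerance."""
    failures: List[str] = []
    if sap_resid != exp.expected_sap_resid:  # SAP integer residual is exact
        failures.append("sap")
    if abs(pe_resid - exp.expected_pe_resid_kwh_per_m2) > 0.01:
        failures.append("pe")
    if abs(co2_resid - exp.expected_co2_resid_tonnes_per_yr) > 0.001:
        failures.append("co2")
    return failures


# Hypothetical pin and residuals, for illustration only.
exp = Expectation(0, 0.0, 0.0, notes="S0380.xxx: cause + spec/worksheet citation")
print(pin_failures(exp, 0, 0.004, 0.0002))
print(pin_failures(exp, 1, 0.02, 0.0))
```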