mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
docs: handover — fresh-API cross-comparison + flagged-cert debugging
Next-agent brief: fetch certs fresh from the EPC API (two new API + Summary + worksheet triples for cross-mapper parity, plus six dashboard-flagged certs). Flags the critical reconciliation: the user's flagged numbers don't match the golden-fixtures cascade (0390-2954-3640 pinned +0 but flagged -6.85; 7536/2130 flags are pre-this-session), so fresh-raw-JSON-vs-curated-fixture or a different engine must be reconciled before debugging. Documents the EPC API fetch mechanism, the dropped-field audit method, this session's 4 fixes, and the conventions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent
f895dd3ab7
commit
8bd8ff8e5c
1 changed files with 171 additions and 0 deletions
171
domain/sap10_calculator/docs/HANDOVER_FRESH_API_DEBUG.md
Normal file
@ -0,0 +1,171 @@
# Handover — fresh-API cross-comparison + flagged-cert debugging

Point-in-time note. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for methodology, the
1e-4 bar, the per-line debugging loop, the section helpers, and the suite command.

- **Branch:** `feature/per-cert-mapper-validation`
- **HEAD:** `f895dd3a` (S0380.217). Confirm with `git rev-parse HEAD`.
- **Baseline (AGENT_GUIDE §4 suite):** `tests/domain/sap10_calculator/ backend/documents_parser/tests/`
  → green (2388 passed, 1 skipped at HEAD; the golden + worksheet pins all pass).
- **Next slice number:** **S0380.218**.
- **Pre-existing failures (NOT yours, out of scope):**
  - `domain/sap10_ml/tests/test_rdsap_uvalues.py` — 2 stone-§5.6 thin-wall failures
    (granite + sandstone band A, 3.7408 vs Table-6 1.7 cap). Run this suite when you touch
    `rdsap_uvalues.py`.
  - `datatypes/epc/domain/tests/test_from_rdsap_schema.py::TestFromRdSapSchema21_0_1::test_total_floor_area`
    (145.82 vs 45.82) — fails at original HEAD `ec64c39d` too. This file is NOT in the §4
    suite command.

---

## ★ THE TASK — fetch fresh from the EPC API and debug, with worksheet cross-comparison

The previous session drove the **golden-fixtures cascade** (`cert_to_inputs` →
`calculate_sap_from_inputs`) and concluded that the three then-flagged certs (7536, 2130,
0240) are "0240-like" — API-only residuals not reproducible from the register JSON. The
user pushed back ("going around in circles"), and the right next move is **fresh raw-API
data + worksheet triples**, not more simulated worksheets.

### Part 1 — two NEW certs with API + Summary + worksheet (cross-comparison)

The user has **two certs that have all three artifacts**: the GOV.UK API JSON, the Elmhurst
**Summary** PDF (site notes / input), and the Elmhurst **worksheet** PDF (the `(1)..(286)`
ground truth). These are gold — they let you run BOTH front-ends (`from_api_response` and
`from_elmhurst_site_notes`) through the same cascade and pin **both** against the worksheet
at 1e-4. The user will provide the cert numbers + drop the PDFs. For each:

1. Fetch the API JSON (see **Fetching** below).
2. Run API path → cascade; run Summary path → cascade; pin **both** vs the worksheet line
   refs (`pdftotext -layout` the worksheet; compare `(27)/(28a)/(29a)/(30)/(33)/(36)/(45)m/
   (62)/(233a)/(233b)/(258)…`). Cross-mapper parity: the two paths must agree to 1e-4 AND
   match the worksheet (memory `feedback_cross_mapper_parity_via_cascade`).
3. The **first diverging line ref localises the bug** (AGENT_GUIDE §3): value present in
   worksheet but cascade 0/wrong → calculator; input field absent in `epc` → mapper or
   extractor. Fix one cause = one slice.

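Step 3 can be sketched as a small comparison helper. This is a minimal illustration, not the repo's tooling: `first_divergence` and all sample values are hypothetical, and it assumes each path's results have already been collected into a dict keyed by worksheet line ref, in worksheet order.

```python
from typing import Mapping, Optional, Tuple

TOL = 1e-4  # the 1e-4 bar this note refers to


def first_divergence(
    worksheet: Mapping[str, float],
    cascade: Mapping[str, float],
    tol: float = TOL,
) -> Optional[Tuple[str, float, Optional[float]]]:
    """Return (line_ref, expected, actual) for the first worksheet line ref
    that the cascade is missing or differs on by more than tol, else None."""
    for ref, expected in worksheet.items():  # dicts keep insertion (worksheet) order
        actual = cascade.get(ref)
        if actual is None or abs(actual - expected) > tol:
            return ref, expected, actual
    return None


# Made-up numbers, for illustration only.
worksheet = {"(27)": 12.5, "(33)": 181.2034, "(36)": 14.9}
api_path = {"(27)": 12.5, "(33)": 181.2034, "(36)": 14.9}
summary_path = {"(27)": 12.5, "(33)": 180.9, "(36)": 14.9}

print(first_divergence(worksheet, api_path))      # no divergence
print(first_divergence(worksheet, summary_path))  # diverges at (33)
```

Running the same helper on the API path and on the Summary path against one worksheet gives the cross-mapper check: both calls should report no divergence.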
### Part 2 — six flagged certs to fetch fresh and debug

The user's dashboard flags these (their numbers, **sign = lodged − our**):

| cert | lodged | their "our" | their Δ |
|---|---|---|---|
| 0240-0200-5706-2365-8010 | 73 | 71.73 | +1.27 |
| 0390-2954-3640-2196-4175 | 60 | 66.85 | −6.85 |
| 2130-1033-4050-5007-8395 | 82 | 83.35 | −1.35 |
| 6035-7729-2309-0879-2296 | 70 | 67.81 | +2.19 |
| 7536-3827-0600-0600-0276 | 68 | 69.07 | −1.07 |
| 9390-2722-3520-2105-8715 | 67 | 71.24 | −4.24 |

### ⚠ CRITICAL — reconcile the numbers FIRST, before debugging

**The user's flagged numbers DO NOT match the golden-fixtures cascade.** All six certs are
already golden fixtures (`tests/domain/sap10_calculator/rdsap/fixtures/golden/<cert>.json`),
and the cascade gives different values:

- **0390-2954-3640 is pinned at resid +0** (our cascade = 60, EXACTLY lodged) — but the user
  flags it at **66.85 (−6.85)**. A 6.85 SAP gap can't be staleness.
- 7536 (their 69.07) and 2130 (their 83.35) are **pre-this-session** values — the S0380.214
  roof fix moved 7536 → 68.924, and the S0380.215 wall fix moved 2130 → 83.78.

So the user's numbers come from a **different computation** than the golden cascade. Two
hypotheses, test both before assuming the cascade is wrong:

1. **Fresh API JSON ≠ curated fixture.** The golden fixtures were bulk-fetched once
   (`scripts/fetch_cohort2_api_jsons.py`, which *skips certs whose JSON already exists*) and
   some may have been hand-corrected since. **Fetch each cert fresh and `diff` the raw JSON
   against the committed fixture.** If they differ, the fixture was curated and the fresh raw
   data is what the user's pipeline sees — debug the FRESH data. This is the most likely cause
   and exactly why the user wants a fresh fetch.
2. **A different SAP engine.** The production stack (`backend/SearchEpc.py` →
   `etl/epc_clean/epc_attributes/*` → `backend/engine/engine.py`) is a SEPARATE mapping +
   scorer from `cert_to_inputs`. If the user's dashboard is produced there, that's a different
   code path than the golden cascade. Ask the user which pipeline the table came from.

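One way to run the fresh-vs-fixture comparison from hypothesis 1 is a recursive JSON diff that reports the path of every difference. The sketch below is an illustration only: `json_diff` is a hypothetical helper, and the two sample dicts (including their field names and values) are invented stand-ins for a fresh `data` payload and a committed fixture. In practice both sides would be loaded with `json.load`.

```python
from typing import Any, Iterator


def json_diff(fresh: Any, fixture: Any, path: str = "$") -> Iterator[str]:
    """Yield one human-readable line per difference between two JSON values."""
    if isinstance(fresh, dict) and isinstance(fixture, dict):
        for key in sorted(fresh.keys() | fixture.keys()):
            child = f"{path}.{key}"
            if key not in fixture:
                yield f"{child}: only in fresh ({fresh[key]!r})"
            elif key not in fresh:
                yield f"{child}: only in fixture ({fixture[key]!r})"
            else:
                yield from json_diff(fresh[key], fixture[key], child)
    elif isinstance(fresh, list) and isinstance(fixture, list):
        if len(fresh) != len(fixture):
            yield f"{path}: list length {len(fresh)} != {len(fixture)}"
        # Compare element-wise over the shared prefix.
        for i, (a, b) in enumerate(zip(fresh, fixture)):
            yield from json_diff(a, b, f"{path}[{i}]")
    elif fresh != fixture:
        yield f"{path}: fresh={fresh!r} fixture={fixture!r}"


# Invented sample data, for illustration only.
fresh = {"schema_type": "RdSAP-Schema-21.0.1", "floor_area": 80.0,
         "walls": [{"u_value": 0.55}]}
fixture = {"schema_type": "RdSAP-Schema-21.0.1", "floor_area": 82.0,
           "walls": [{"u_value": 0.55, "note": "hand-corrected"}]}

for line in json_diff(fresh, fixture):
    print(line)
```

An empty result means the fixture is byte-for-byte equivalent in content to the fresh fetch, which would point towards hypothesis 2 instead.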
Do NOT start "fixing" the cascade to hit the user's numbers until you know which pipeline
produced them. The golden cascade is worksheet-validated for 47 certs; chasing a dashboard
number from a different stack would regress it.

---

## Fetching from the EPC API

Token lives in `backend/.env` as `OPEN_EPC_API_TOKEN` (also `EPC_AUTH_TOKEN`). The exact
mechanism (from `scripts/fetch_cohort2_api_jsons.py`):

```python
import os

import httpx
from dotenv import load_dotenv

from infrastructure.epc_client.epc_client_service import EpcClientService

load_dotenv("backend/.env")  # read the token from the repo's env file
token = os.environ["OPEN_EPC_API_TOKEN"]

resp = httpx.get(
    f"{EpcClientService.BASE_URL}/api/certificate",
    params={"certificate_number": "<CERT>"},  # substitute the full certificate number
    headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    timeout=EpcClientService.REQUEST_TIMEOUT,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx rather than parsing an error body
payload = resp.json()["data"]  # <- this is the schema-21 JSON the mapper consumes
```

`EpcPropertyDataMapper.from_api_response(payload)` only supports `schema_type`
`RdSAP-Schema-21.0.0` / `21.0.1`; it raises for others. The persisted golden fixture IS this
`data` payload. So `diff <(fresh)` vs the committed fixture is apples-to-apples.

---

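## Per-cert notes carried from the previous session (verify against FRESH data)

The bracketed residuals on 7536, 2130 and 0240 appear to be rounded ours − lodged (e.g. 7536:
69 − 68 = +1), which is the opposite sign to the dashboard table above.
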
- **7536 (+1)** — roof bug fixed (S0380.214: as-built sloping ceiling → Table 18 col 3).
  Every per-element U matches Elmhurst (cases 15-17 worksheets). Concluded 0240-like; cont
  68.924.
- **2130 (+2)** — dropped measured wall insulation captured (S0380.215 → Table 8 U=0.32),
  which **exposed** the true residual (the +1 was two offsetting bugs). PV β-split **proven
  exact** vs simulated case 18 worksheet (onsite 970.77 / export 1713.40 to the decimal).
  Gas PE factor exact (1.13). Concluded 0240-like; cont 83.78.
- **0240 (−1)** — export-dropped 2013+ circulation-pump age (115 vs 41 kWh); WWHRS confirmed
  inert (`shower_wwhrs=1` is the universal default across all 47 certs). User previously
  decided NOT to re-pin. Concluded 0240-like.
- **0390-2954-3640** — pinned +0 (oil combi, Table 3a row 1). The user's −6.85 flag is the
  reconciliation mystery above — START HERE; it's the clearest signal of a fresh-vs-fixture
  or different-engine gap.
- **6035** — see memory `project_golden_coverage_state`: a user-simulated 6035 worksheet
  closed to 1e-4, but "6035 remaining +19 PE needs its own worksheet"; flagged +2.19 SAP.
- **9390** — community heat-network (S0380.212/.213 fixed the fuel-code collision + standing
  charge); left at SAP −2 with a documented ~7% demand over-count (heat-source-eff default?).
  Unpinned/retired. The user's −4.24 may be the same demand over-count on fresh data.

---

## What this session shipped (commits `ec64c39d..f895dd3a`)

| slice | what |
|---|---|
| **S0380.214** | As-built "Pitched, sloping ceiling" (code 8) roof → RdSAP 10 Table 18 col (3) (band F 0.40→0.68, L 0.16→0.18) per §5.11 item 5-5 + note (b). Code-5 vaulted stays col (1) (cohort). Worksheet-validated (sim case 15). Re-pinned 7536. |
| **S0380.215** | Captured dropped `wall_insulation_thickness_measured` (schema 21 didn't declare it → `from_dict` dropped it). 2130 Ext1 "measured"/100 mm → RdSAP Table 8 U=0.32 (was 0.55 default). Exposed 2130's true +2 residual. |
| **S0380.216** | Extractor: handle pdftotext wrapping the §11 glazing-GAP column onto the glazing-TYPE token ("…16 mm or [1st]"). Fallback strip AFTER the direct lookup (preserves explicit interleaved keys). Unblocked running the cascade on hand-entered worksheet Summaries. |
| **S0380.217** | Captured dropped `wall_insulation_thermal_conductivity` (schema → domain → mapper) and wired it into `u_wall`'s §5.8 λ resolver. Code 1 = default 0.04; unmapped codes raise. Zero cascade effect today (2130's §5.8 path doesn't fire). |
| 3× docs | finalised 7536 / 2130 as 0240-like; corrected diagnoses. |

**Audit method that found the dropped fields** (reuse it on the fresh certs): recursively
compare raw JSON keys against the parsed schema dataclass fields — anything in the JSON but
not a declared field is silently dropped by `from_dict`. The two real drops (2130's measured
wall insulation + thermal conductivity) came from this. Re-run it on the fresh fetches; new
certs may surface new dropped fields.

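The audit method can be sketched with the standard-library `dataclasses` module. This is an illustrative reconstruction, not the script used in the session: `dropped_keys`, `Cert`, `Wall` and the `construction` field are hypothetical, and it assumes JSON keys map one-to-one onto dataclass field names and that annotations are real types rather than strings.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Iterator, List, Optional, get_args


def _nested_dataclass(tp: Any) -> Optional[type]:
    """Find a dataclass inside a hint such as Wall, List[Wall] or Optional[Wall]."""
    if isinstance(tp, type) and is_dataclass(tp):
        return tp
    for arg in get_args(tp):
        found = _nested_dataclass(arg)
        if found is not None:
            return found
    return None


def dropped_keys(raw: Dict[str, Any], schema: type, path: str = "$") -> Iterator[str]:
    """Yield the path of every raw JSON key with no matching declared field."""
    declared = {f.name: f.type for f in fields(schema)}
    for key, value in raw.items():
        child = f"{path}.{key}"
        if key not in declared:
            yield child  # present in the JSON, absent from the schema
            continue
        nested = _nested_dataclass(declared[key])
        if nested is None:
            continue
        is_list = isinstance(value, list)
        for i, item in enumerate(value if is_list else [value]):
            if isinstance(item, dict):
                yield from dropped_keys(item, nested, child + (f"[{i}]" if is_list else ""))


@dataclass
class Wall:  # hypothetical stand-in for a schema dataclass
    construction: int


@dataclass
class Cert:  # hypothetical stand-in for the top-level schema
    schema_type: str
    walls: List[Wall]


raw = {
    "schema_type": "RdSAP-Schema-21.0.1",
    "walls": [{"construction": 4, "wall_insulation_thickness_measured": 100}],
    "extra": 1,
}
print(list(dropped_keys(raw, Cert)))
```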
---

## Conventions (unchanged)

One cause = one slice = one commit; spec citation (page + line) in the message; AAA tests
(`# Arrange / # Act / # Assert`); assert with `abs(x - y) <= tol` (not `pytest.approx`);
SAP 10.2 only; no tolerance widening / xfail / rel-tol. New code passes pyright strict with
ZERO NEW errors — baseline-compare with `git stash` + `PYRIGHT_PYTHON_FORCE_VERSION=latest`
(mapper.py / cert_to_inputs.py / heat_transmission.py / rdsap_uvalues.py carry pre-existing
errors; compare counts). Stage files by name — the working tree has pre-existing unrelated
changes to `pytest.ini` / `scripts/` that must NOT be staged.
Commit trailer: `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`.

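As a sketch of the test shape these conventions describe, the example below uses the AAA layout and an absolute-tolerance assertion. The function under test, `fabric_heat_loss`, is a hypothetical stand-in (heat loss as U-value times area), not a function from this repo, and the input values are illustrative.

```python
def fabric_heat_loss(u_value: float, area_m2: float) -> float:
    """Hypothetical function under test: element heat loss in W/K (U x A)."""
    return u_value * area_m2


def test_fabric_heat_loss_matches_hand_calculation() -> None:
    # Arrange
    u_value = 0.32
    area_m2 = 25.0
    expected = 8.0
    tol = 1e-4

    # Act
    actual = fabric_heat_loss(u_value, area_m2)

    # Assert (absolute tolerance, no pytest.approx, no relative tolerance)
    assert abs(actual - expected) <= tol
```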
When you re-pin a golden cert, update `expected_sap_resid` (±0), `expected_pe_resid_kwh_per_m2`
(±0.01) and `expected_co2_resid_tonnes_per_yr` (±0.001) to the exact post-fix values and
append a slice note to the cert's `notes:` explaining the cause + spec/worksheet citation.
Run the full §4 suite as the blast-radius check after any fabric/factor change.
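A quick way to see which pins a fix has invalidated is to compare recomputed residuals against the pinned ones field by field. The sketch below is an assumption-heavy illustration: it reads the parenthesised ± values as per-field assertion tolerances, assumes the three keys sit at the top level of the fixture, and uses a hypothetical helper name (`stale_pins`) with invented numbers.

```python
from typing import Dict, List

# Tolerances as read from the re-pin note above (an interpretation, not confirmed).
TOLERANCES: Dict[str, float] = {
    "expected_sap_resid": 0.0,
    "expected_pe_resid_kwh_per_m2": 0.01,
    "expected_co2_resid_tonnes_per_yr": 0.001,
}


def stale_pins(pinned: Dict[str, float], recomputed: Dict[str, float]) -> List[str]:
    """Return the pin fields whose recomputed value falls outside tolerance."""
    return [
        key
        for key, tol in TOLERANCES.items()
        if abs(recomputed[key] - pinned[key]) > tol
    ]


# Invented values, for illustration only.
pinned = {
    "expected_sap_resid": 0,
    "expected_pe_resid_kwh_per_m2": 1.25,
    "expected_co2_resid_tonnes_per_yr": 0.010,
}
recomputed = {
    "expected_sap_resid": 0,
    "expected_pe_resid_kwh_per_m2": 1.262,
    "expected_co2_resid_tonnes_per_yr": 0.0104,
}
print(stale_pins(pinned, recomputed))
```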