Validate SAP calculator vs Elmhurst; fix reduced-field window U; add accuracy harness

Reduced-field window U: heat_transmission derived the synthesised-window raw U
from u_window(all None) -> the 2.5 placeholder regardless of glazing. Now routes
the (uniform) glazing_type code through u_window (RdSAP Table 24) so e.g. double
pre-2002 reads 2.8, not 2.5. Only the pre-SAP10 reduced-field path is affected
(21.0.1 certs carry per-window U upstream); the RdSAP-21.0.1 corpus gauge is
unchanged at 66.9% within-0.5.

test_real_cert_sap_accuracy: pin uprn_10002468137 (RdSAP-17.1, all-electric
storage heaters) at SAP 61, validated against Elmhurst on identical inputs
(dual off-peak immersion, 110 L cylinder, 2 baths). Our engine reproduces
Elmhurst's fuel cost to the penny; lodged 55 is the old SAP-2012 schema.

Tooling to grow the accuracy corpus:
- scripts/fetch_real_life_epc_sample.py: capture a cert by UPRN into the corpus.
- scripts/compare_epc_paths.py: diff gov-API vs Elmhurst-summary EpcPropertyData
  and run both through the engine, localising mapper vs calculator differences.
- skill validate-cert-sap-accuracy: the end-to-end loop (capture -> Elmhurst
  inputs -> human builds -> compare -> reconcile -> pin in the test).
- skill epc-to-elmhurst-rdsap-inputs reference: corrected immersion (code 1=dual),
  cylinder size (code 2 = Normal/110 L), and bath-count (WWHRS sub-tab) mappings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-15 15:26:11 +00:00 · 2026-06-15 15:26:11 +00:00 · 5c11fd35c8
commit 5c11fd35c8
parent 140ad39898
8 changed files with 415 additions and 20 deletions
--- a/.claude/skills/epc-to-elmhurst-rdsap-inputs/reference/mapping.md
+++ b/.claude/skills/epc-to-elmhurst-rdsap-inputs/reference/mapping.md
@@ -179,13 +179,14 @@ UPRN 10002468137 — lodged 55, engine 62.
 | `water_heating_code` | 901 = From main heating system (Elmhurst "Boiler Circulator"); **903 = Electric immersion, off-peak → Elmhurst "Water Heater" category** (NOT Boiler Circulator) |
 | `water_heating_fuel` | as Fuel codes above (29 = off-peak) |
 | `has_hot_water_cylinder` | → "Hot Water Cylinder Present" |
-| `cylinder_size` | band: 1=Small, 2=Medium, 3=Large |
+| `cylinder_size` | **code 2 = Normal / 110 L, code 3 = Medium / 160 L, code 4 = Large / 210 L** (RdSAP 10 §10.5 Table 28; source: `cert_to_inputs.py` `_CYLINDER_SIZE_CODE_TO_LITRES`). In Elmhurst pick the **litre value**, NOT the label — "Normal" = 110 L. |
 | `cylinder_insulation_type` | **1 = factory Foam, 2 = loose Jacket** (source: `cert_to_inputs.py` `_CYLINDER_INSULATION_TYPE_LOOSE_JACKET = 2`) |
 | `cylinder_insulation_thickness` | mm (38 mm ≈ factory foam; jackets 80 mm+) |
-| `immersion_heating_type` | 1 = single |
+| `immersion_heating_type` | **code 1 = DUAL, code 2 = SINGLE** (source: `cert_to_inputs.py` ~L5288, per RdSAP 10 §10.5 "assume dual on a dual/off-peak meter" + the API cohort). ⚠️ Do NOT read 1 as "single" — single vs dual flips the Table 13 high-rate fraction and can swing the SAP score several points (e.g. cert 10002468137: dual 0.131 → SAP 61, single 0.571 → SAP 57). Storage-heater / off-peak certs are almost always code 1 = dual. |

 - **Community Hot Water**: 0 unless lodged.
 - **Solar Water Heating**: `solar_water_heating` Y/N.
+- **Number of baths** (Elmhurst tab: **Water Heating → WWHRS sub-tab → "Total no. of Baths"**, NOT the main Water Heating sub-tab): the gov-API derives it from `sap_heating.instantaneous_wwhrs` ROOM counts — `number_baths = rooms_with_bath_and_or_shower + rooms_with_bath_and_mixer_shower`. ⚠️ Elmhurst defaults this to 0; set it to the derived count or the gov-API and Elmhurst hot-water demand diverge (e.g. cert 10002468137: 2 baths = +165 kWh HW ≈ +£11 ≈ +0.7 SAP). Keep WWHRS itself **No**.
 - **WWHRS**: ⚠️ `sap_heating.instantaneous_wwhrs` holds **bath/shower ROOM
  counts** (ADR-0028: `rooms_with_bath_and_or_shower`, `rooms_with_mixer_shower_no_bath`,
  `rooms_with_bath_and_mixer_shower`) — it is **NOT** a heat-recovery device.
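
A minimal sketch of the corrected lookups from the table above, for cross-checking a cert by hand. The constant and function names are hypothetical (the real tables live in `cert_to_inputs.py`), treating a missing room count as 0 is an assumption, and the room counts in the usage lines are illustrative rather than taken from any cert.

```python
# Hypothetical names; the codes and values are the ones in the mapping table.
CYLINDER_SIZE_CODE_TO_LITRES = {2: 110, 3: 160, 4: 210}  # Normal / Medium / Large
IMMERSION_CODE_TO_TYPE = {1: "dual", 2: "single"}        # 1 is NOT "single"


def number_of_baths(room_counts: dict) -> int:
    """Bath count as the sum of the two bath-bearing room counts.

    Assumption: a missing key counts as 0 rooms.
    """
    return (
        room_counts.get("rooms_with_bath_and_or_shower", 0)
        + room_counts.get("rooms_with_bath_and_mixer_shower", 0)
    )


# Illustrative payload only: mixer-shower-only rooms do not add a bath.
example_rooms = {
    "rooms_with_bath_and_or_shower": 1,
    "rooms_with_mixer_shower_no_bath": 3,
    "rooms_with_bath_and_mixer_shower": 1,
}
print(CYLINDER_SIZE_CODE_TO_LITRES[2], IMMERSION_CODE_TO_TYPE[1], number_of_baths(example_rooms))
```

In Elmhurst the three values go to the cylinder litre field, the immersion type, and the WWHRS sub-tab "Total no. of Baths" respectively.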
--- a/.claude/skills/validate-cert-sap-accuracy/SKILL.md
+++ b/.claude/skills/validate-cert-sap-accuracy/SKILL.md
@@ -0,0 +1,89 @@
+---
+name: validate-cert-sap-accuracy
+description: Run the end-to-end loop that validates this repo's SAP calculator against accredited Elmhurst Energy for one real EPC certificate, then locks the result into the regression test corpus. Capture a cert by UPRN → generate Elmhurst inputs → (human builds it in Elmhurst) → diff the gov-API vs Elmhurst EpcPropertyData and run both through our engine → reconcile to convergence → pin the agreed SAP score in the accuracy test. Use when validating/expanding SAP-calculator accuracy against Elmhurst, adding a cert to the accuracy corpus, or when the user wants to "check a cert against Elmhurst" / "add another accuracy test".
+---
+
+# Validate cert SAP accuracy (gov-API ↔ Elmhurst)
+
+Separates **calculator** correctness from **mapper** fidelity by computing the
+same property two ways and reconciling them, then freezes the agreed score as
+a regression pin. Files land in the corpus location so the suite grows.
+
+Sample home for every cert: `backend/epc_api/json_samples/real_life_examples/<schema>/uprn_<uprn>/`
+(`epc.json`, `elmhurst_inputs.md`, `elmhurst_summary.pdf`, `elmhurst_worksheet.pdf`).
+
+## Workflow
+
+1. **Capture the cert** (gov-EPC API → saved json + our engine's score):
+   ```
+   PYTHONPATH=/workspaces/model python scripts/fetch_real_life_epc_sample.py <uprn>
+   ```
+   Writes `real_life_examples/<schema>/uprn_<uprn>/epc.json` and prints schema,
+   lodged rating, and our engine's SAP + per-end-use kWh. Note the schema:
+   only RdSAP schemas map today (full SAP `SAP-Schema-*` is partial).
+
+2. **Generate the Elmhurst input sheet** — invoke the **`epc-to-elmhurst-rdsap-inputs`**
+   skill on the UPRN. It writes `elmhurst_inputs.md` next to the json, page by
+   page, with the code→value mappings (cylinder, immersion, baths, glazing, …).
+
+3. **Human builds it in Elmhurst** from `elmhurst_inputs.md`, then exports the
+   **Summary PDF** and the **SAP-10.2 worksheet PDF**, saving them in the sample
+   dir as **`elmhurst_summary.pdf`** and **`elmhurst_worksheet.pdf`**. (This is
+   the only manual step — Elmhurst is the accredited ground truth.)
+
+4. **Compare the two paths**:
+   ```
+   PYTHONPATH=/workspaces/model python scripts/compare_epc_paths.py <uprn>
+   ```
+   Builds `EpcPropertyData` from the gov-API json AND from the Elmhurst summary
+   (`parse_site_notes_pdf`), deep-diffs them, runs BOTH through `Sap10Calculator`,
+   and prints Elmhurst's own worksheet SAP (worksheet line 258). Reading it:
+   - **Our engine on Elmhurst inputs ≈ Elmhurst's worksheet SAP** → calculator is
+     correct (it reproduces accredited Elmhurst on identical inputs).
+   - **gov-API SAP vs Elmhurst-PDF SAP gap** → input differences only. The field
+     diff localises them.
+
+5. **Reconcile to convergence.** Triage each field diff (use the
+   `epc-to-elmhurst-rdsap-inputs` skill's `reference/mapping.md` for code
+   semantics — cylinder code 2=110 L, immersion code 1=dual, baths on the WWHRS
+   sub-tab, etc.):
+   - **Elmhurst data-entry error** (e.g. swapped floor dims, wrong cylinder/
+     immersion, missing baths, wrong postcode/region) → fix in Elmhurst, re-export,
+     re-run step 4.
+   - **gov-API mapper gap** (e.g. lodged alt-wall dropped) → a real per-cert-mapper
+     fix; flag it (Khalim's domain) — don't tune to mask it.
+   - **Genuine ground-truth question** (what the property *actually* is) → the
+     assessor/user settles it; align both sides to the lodged data.
+   Target: gov-API and a correctly-built Elmhurst within ~0.5 SAP. Cosmetic /
+   representation diffs (codes vs strings, empty `EnergyElement` lists) are noise.
+
+6. **Lock it in.** Once converged on a value you trust, add a case to
+   `tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py`:
+   ```python
+   RealCertExpectation(
+       schema="<schema>", sample="uprn_<uprn>",
+       cert_num="<cert>", sap_score=<converged engine score>,
+   )
+   ```
+   with a comment recording the ground truth + what reconciled it. If a known
+   engine bug still blocks it, use `known_bug_xfail="…"` (strict xfail) instead
+   of widening. Run `pytest tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py`
+   — it must pass (or xfail with the documented reason).
+
+## Notes
+
+- The sample dir IS the corpus entry — capturing + saving the PDFs there is all
+  the "expand the tests" bookkeeping needed; step 6 is what activates it.
+- `sap_score` pins the gov-API engine's integer SAP (the production path). Add
+  per-end-use kWh pins to the same `RealCertExpectation` later (worksheet-
+  validated) to tighten coverage.
+- Don't tune the mapper to a single cert — pin the observed value and fix mapper
+  gaps generically, guarded by the RdSAP-21.0.1 corpus gauge
+  (`tests/infrastructure/epc_client/test_sap_accuracy_corpus.py`).
+
+## Worked example
+
+UPRN **10002468137** (`RdSAP-Schema-17.1`): gov-API 60.92, Elmhurst 61 — converged
+after aligning dual immersion, 110 L cylinder, and 2 baths. Pinned `sap_score=61`.
+The journey closed an off-peak-water-heating bug (Table 13) and a reduced-field
+window-U bug; the calculator matched Elmhurst's cost to the penny throughout.
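
The convergence target in step 5 and the pin in step 6 can be sketched as below. This is an illustration only: the function names are hypothetical, and rounding the continuous score half-up to get the integer pin is an assumption about the engine, not something this commit states.

```python
from decimal import ROUND_HALF_UP, Decimal


def within_tolerance(gov_api_sap: float, elmhurst_sap: float, tolerance: float = 0.5) -> bool:
    """True when the two paths agree to within the ~0.5 SAP target."""
    return abs(gov_api_sap - elmhurst_sap) <= tolerance


def integer_sap(continuous_sap: float) -> int:
    """Integer SAP from a continuous score (assumed half-up rounding)."""
    return int(Decimal(str(continuous_sap)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Worked example figures from the skill: gov-API 60.92, Elmhurst 61.
converged = within_tolerance(60.92, 61)
pinned = integer_sap(60.92)
print(converged, pinned)
```

The earlier state of this cert (engine 62 against Elmhurst 60) fails the same check, which is why it was held as a strict xfail rather than pinned.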
--- a/backend/epc_api/json_samples/real_life_examples/RdSAP-Schema-17.1/uprn_10002468137/elmhurst_summary.pdf
+++ b/backend/epc_api/json_samples/real_life_examples/RdSAP-Schema-17.1/uprn_10002468137/elmhurst_summary.pdf
--- a/backend/epc_api/json_samples/real_life_examples/RdSAP-Schema-17.1/uprn_10002468137/elmhurst_worksheet.pdf
+++ b/backend/epc_api/json_samples/real_life_examples/RdSAP-Schema-17.1/uprn_10002468137/elmhurst_worksheet.pdf
--- a/domain/sap10_calculator/worksheet/heat_transmission.py
+++ b/domain/sap10_calculator/worksheet/heat_transmission.py
@@ -41,7 +41,7 @@ from __future__ import annotations

 from dataclasses import dataclass
 from decimal import ROUND_HALF_UP, Decimal
-from typing import Any, Final, Optional
+from typing import Any, Final, Optional, Sequence, Tuple

 from datatypes.epc.domain.epc_property_data import (
    EpcPropertyData,
@@ -126,6 +126,48 @@ _DEFAULT_STOREY_HEIGHT_M: Final[float] = 2.5
 # SAP10.2 §3.2 curtain/blind thermal resistance applied to windows (and
 # roof windows) — turns raw window U into the worksheet's (27) effective U.
 _WINDOW_CURTAIN_RESISTANCE_M2K_PER_W: Final[float] = 0.04
+
+# SAP10 glazing-type code (the cascade enum used on `SapWindow.glazing_type`,
+# see solar_gains `_G_PERPENDICULAR_BY_GLAZING_TYPE`) → the `u_window` glazing
+# category + the install-year band the code implies. Used to derive the raw
+# window U for SYNTHESISED (reduced-field) windows that carry no per-window
+# U lodgement — previously these all fell to `u_window`'s all-None placeholder
+# (2.5), regardless of glazing, under-counting window heat loss vs RdSAP Table
+# 24 (e.g. double pre-2002 should be 2.8, not 2.5).
+_GLAZING_CODE_TO_UWINDOW: Final[dict[int, Tuple[str, Optional[int]]]] = {
+    1: ("single", None),
+    2: ("double", 2002),     # double 2002-2022
+    3: ("double", None),     # double pre-2002 (None → pre-2002 row)
+    4: ("double", None),     # double low-E soft-coat
+    5: ("secondary", None),
+    6: ("triple", None),     # triple pre-2002 default
+    7: ("double", None),     # double, known data
+    8: ("triple", None),     # triple, known data
+    9: ("triple", 2002),     # triple 2002-2022
+    10: ("triple", None),    # triple pre-2002
+    11: ("secondary", None),
+    12: ("secondary", None),
+    13: ("double", 2022),    # double 2022+
+    14: ("triple", 2022),    # triple 2022+
+    15: ("single", None),
+}
+
+
+def _synthesised_window_u_raw(windows: Optional[Sequence[SapWindow]]) -> float:
+    """Raw (pre-curtain) window U for reduced-field windows with no per-window
+    U lodgement. Derives glazing category + install-year band from the
+    (uniform) synthesised `glazing_type` code and routes through `u_window`
+    (RdSAP Table 24), rather than the all-None 2.5 placeholder."""
+    if not windows:
+        return u_window(installed_year=None, glazing_type=None, frame_type=None)
+    w = windows[0]
+    code = w.glazing_type
+    glaze, year = (
+        _GLAZING_CODE_TO_UWINDOW.get(code, ("double", None))
+        if isinstance(code, int)
+        else ("double", None)
+    )
+    return u_window(installed_year=year, glazing_type=glaze, frame_type=w.frame_material)
 # RdSAP10 §15 "Rounding of data" (p.66): "All element areas (gross)
 # including window areas and conservatory wall area: 2 d.p." plus
 # "U-values: 2 d.p.". This is the data-passed-to-SAP-calculator
@@ -632,8 +674,10 @@ def heat_transmission_from_cert(
            )
            windows_w_per_k_total += a_w * u_eff_w
    else:
-        window_u_raw = window_avg_u_value if (window_avg_u_value or 0) > 0 else u_window(
-            installed_year=None, glazing_type=None, frame_type=None
+        window_u_raw = (
+            window_avg_u_value
+            if (window_avg_u_value or 0) > 0
+            else _synthesised_window_u_raw(epc.sap_windows)
        )
        window_u = (
            1.0 / (1.0 / window_u_raw + _WINDOW_CURTAIN_RESISTANCE_M2K_PER_W)
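
To show what the raw-U change means downstream, here is a small sketch of the curtain adjustment from the hunk above applied to the old placeholder (2.5) and the Table 24 double pre-2002 value (2.8). The function and constant names are hypothetical stand-ins; only the 0.04 resistance and the formula come from this file.

```python
CURTAIN_RESISTANCE_M2K_PER_W = 0.04  # same value as _WINDOW_CURTAIN_RESISTANCE_M2K_PER_W


def effective_window_u(raw_u: float) -> float:
    """Effective window U after adding the curtain resistance in series."""
    return 1.0 / (1.0 / raw_u + CURTAIN_RESISTANCE_M2K_PER_W)


old_u = effective_window_u(2.5)  # all-None placeholder
new_u = effective_window_u(2.8)  # double pre-2002 via Table 24
print(f"old {old_u:.3f}  new {new_u:.3f}  uplift {100 * (new_u / old_u - 1):.1f}%")
```

Because the resistance is added in series, the 12% rise in raw U shows up as a slightly smaller rise in effective U, and window heat loss (area times effective U) scales by that same factor.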
--- a/scripts/compare_epc_paths.py
+++ b/scripts/compare_epc_paths.py
@@ -0,0 +1,135 @@
+"""Compare the two EpcPropertyData source paths for one real cert, to
+separate MAPPER fidelity from CALCULATOR correctness.
+
+For a cert captured under
+``backend/epc_api/json_samples/real_life_examples/<schema>/uprn_<uprn>/``
+this:
+  1. builds `EpcPropertyData` from the gov-EPC API json (`epc.json`), and
+  2. builds `EpcPropertyData` from the Elmhurst summary PDF
+     (`elmhurst_summary.pdf`) via `parse_site_notes_pdf`,
+then deep-diffs the two and runs BOTH through `Sap10Calculator`. Where the
+two objects match, any SAP gap is the calculator; where they differ, it's
+input mapping / data entry. If `elmhurst_worksheet.pdf` is present its
+printed SAP rating (worksheet line 258) is shown as the ground truth.
+
+USAGE
+-----
+    PYTHONPATH=/workspaces/model python scripts/compare_epc_paths.py <uprn>
+
+Part of the `validate-cert-sap-accuracy` workflow — see that skill.
+"""
+
+from __future__ import annotations
+
+import dataclasses
+import json
+import re
+import sys
+from pathlib import Path
+from typing import Any, Optional
+from unittest.mock import patch
+
+import httpx
+
+from backend.documents_parser.parser import parse_site_notes_pdf
+from datatypes.epc.domain.epc_property_data import EpcPropertyData
+from datatypes.epc.domain.mapper import EpcPropertyDataMapper
+from domain.sap10_calculator.calculator import Sap10Calculator
+
+_ROOT = Path("backend/epc_api/json_samples/real_life_examples")
+
+
+def _find_sample_dir(uprn: str) -> Path:
+    matches = list(_ROOT.glob(f"*/uprn_{uprn}"))
+    if not matches:
+        raise SystemExit(
+            f"no sample dir for UPRN {uprn} under {_ROOT} — capture it first "
+            f"with scripts/fetch_real_life_epc_sample.py {uprn}"
+        )
+    return matches[0]
+
+
+def _gov_api_epc(epc_json: Path) -> EpcPropertyData:
+    data = json.loads(epc_json.read_text())
+
+    def _mock(*_a: object, **_k: object) -> httpx.Response:
+        return httpx.Response(
+            200, json={"data": data}, request=httpx.Request("GET", "x")
+        )
+
+    # Route the raw payload through the real mapper (httpx mocked, no network).
+    with patch("httpx.get", side_effect=_mock):
+        from infrastructure.epc_client.epc_client_service import EpcClientService
+
+        return EpcClientService(auth_token="t").get_by_certificate_number("x")
+
+
+def _elmhurst_printed_sap(worksheet_pdf: Path) -> Optional[int]:
+    if not worksheet_pdf.exists():
+        return None
+    import fitz  # pymupdf
+
+    text = "\n".join(p.get_text() for p in fitz.open(str(worksheet_pdf)))
+    for line in text.splitlines():
+        if "SAP rating" in line and "(258)" in line:
+            # value sits immediately before the "(258)" line ref
+            match = re.search(r"(\d+)\s*\(258\)", line)
+            if match:
+                return int(match.group(1))
+    return None
+
+
+def _deep_diff(a: Any, b: Any, prefix: str, out: list[str]) -> None:
+    if dataclasses.is_dataclass(a) and dataclasses.is_dataclass(b):
+        for f in dataclasses.fields(a):
+            _deep_diff(getattr(a, f.name), getattr(b, f.name), f"{prefix}.{f.name}", out)
+    elif isinstance(a, list) and isinstance(b, list):
+        if len(a) != len(b):
+            out.append(f"  {prefix}: LEN {len(a)} vs {len(b)}")
+        for i, (x, y) in enumerate(zip(a, b)):
+            _deep_diff(x, y, f"{prefix}[{i}]", out)
+    elif a != b:
+        out.append(f"  {prefix}: API={a!r}  ELM={b!r}")
+
+
+def compare(uprn: str) -> None:
+    sample = _find_sample_dir(uprn)
+    print(f"=== {sample} ===")
+    gov = _gov_api_epc(sample / "epc.json")
+
+    summary = sample / "elmhurst_summary.pdf"
+    elm: Optional[EpcPropertyData] = None
+    if summary.exists():
+        elm = parse_site_notes_pdf(str(summary))
+    else:
+        print("  (no elmhurst_summary.pdf yet — gov-API side only)")
+
+    rg = Sap10Calculator().calculate(gov)
+    print("\nOUR ENGINE:")
+    print(
+        f"  gov-API inputs      → SAP {rg.sap_score} ({rg.sap_score_continuous:.2f})"
+        f"  HW {rg.hot_water_kwh_per_yr:.0f} kWh  cost £{rg.total_fuel_cost_gbp:.2f}"
+    )
+    if elm is not None:
+        re_ = Sap10Calculator().calculate(elm)
+        print(
+            f"  Elmhurst-PDF inputs → SAP {re_.sap_score} ({re_.sap_score_continuous:.2f})"
+            f"  HW {re_.hot_water_kwh_per_yr:.0f} kWh  cost £{re_.total_fuel_cost_gbp:.2f}"
+        )
+        printed = _elmhurst_printed_sap(sample / "elmhurst_worksheet.pdf")
+        if printed is not None:
+            print(f"  Elmhurst's OWN engine (worksheet 258): {printed}")
+        diffs: list[str] = []
+        _deep_diff(gov, elm, "epc", diffs)
+        print(f"\nFIELD DIFFS gov-API vs Elmhurst ({len(diffs)}):")
+        print("\n".join(diffs) if diffs else "  (none — paths identical)")
+
+
+def main() -> None:
+    if len(sys.argv) != 2:
+        raise SystemExit(__doc__)
+    compare(sys.argv[1])
+
+
+if __name__ == "__main__":
+    main()
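
For a feel of what the field diff prints, the sketch below runs the same recursive comparison on toy dataclasses. The `Cylinder` and `Property` classes and their values are hypothetical stand-ins for `EpcPropertyData`; the traversal mirrors `_deep_diff` in the script.

```python
import dataclasses
from typing import Any, List


@dataclasses.dataclass
class Cylinder:  # hypothetical stand-in
    size_litres: int
    immersion: str


@dataclasses.dataclass
class Property:  # hypothetical stand-in for EpcPropertyData
    baths: int
    cylinder: Cylinder
    window_u: List[float]


def deep_diff(a: Any, b: Any, prefix: str, out: List[str]) -> None:
    """Recurse through dataclasses and lists, recording leaf mismatches."""
    if dataclasses.is_dataclass(a) and dataclasses.is_dataclass(b):
        for f in dataclasses.fields(a):
            deep_diff(getattr(a, f.name), getattr(b, f.name), f"{prefix}.{f.name}", out)
    elif isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            out.append(f"  {prefix}: LEN {len(a)} vs {len(b)}")
        # zip stops at the shorter list, so surplus items are not diffed
        for i, (x, y) in enumerate(zip(a, b)):
            deep_diff(x, y, f"{prefix}[{i}]", out)
    elif a != b:
        out.append(f"  {prefix}: API={a!r}  ELM={b!r}")


api = Property(baths=2, cylinder=Cylinder(110, "dual"), window_u=[2.8])
elm = Property(baths=0, cylinder=Cylinder(110, "single"), window_u=[2.8, 2.8])
diffs: List[str] = []
deep_diff(api, elm, "epc", diffs)
print("\n".join(diffs))
```

Note the list case: a length mismatch is reported once, but elements beyond the shorter list are never compared individually, so a `LEN` line is a prompt to inspect that field by hand.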
--- a/scripts/fetch_real_life_epc_sample.py
+++ b/scripts/fetch_real_life_epc_sample.py
@@ -0,0 +1,130 @@
+"""Capture a real EPC certificate by UPRN for the SAP accuracy test suite.
+
+Resolves a UPRN to its latest lodged certificate via the GOV.UK EPB
+register, downloads the full ``data`` payload (the exact shape
+``EpcPropertyDataMapper.from_api_response`` consumes), and freezes it
+under the schema-bucketed sample tree the accuracy test reads:
+
+    backend/epc_api/json_samples/real_life_examples/<schema_type>/uprn_<uprn>/epc.json
+
+It also prints the lodged SAP rating and what ``Sap10Calculator``
+currently produces, so a new case can be added to
+``tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py`` with
+the right ``schema`` / ``sap_score`` straight away.
+
+USAGE
+-----
+    PYTHONPATH=/workspaces/model python scripts/fetch_real_life_epc_sample.py <uprn> [<uprn> ...]
+
+Token is read from ``backend/.env`` (``OPEN_EPC_API_TOKEN``, falling
+back to ``EPC_AUTH_TOKEN``). Re-running overwrites the sample.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import pathlib
+import sys
+from typing import Any
+
+import httpx
+from dotenv import load_dotenv
+
+_BASE = "https://api.get-energy-performance-data.communities.gov.uk"
+_SAMPLES_ROOT = pathlib.Path(
+    "backend/epc_api/json_samples/real_life_examples"
+)
+
+
+def _headers() -> dict[str, str]:
+    load_dotenv("backend/.env")
+    token = os.environ.get("OPEN_EPC_API_TOKEN") or os.environ["EPC_AUTH_TOKEN"]
+    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}
+
+
+def _latest_cert_number(uprn: int, headers: dict[str, str]) -> str:
+    resp = httpx.get(
+        f"{_BASE}/api/domestic/search",
+        params={"uprn": uprn},
+        headers=headers,
+        timeout=30.0,
+    )
+    resp.raise_for_status()
+    rows: list[dict[str, Any]] = resp.json().get("data", [])
+    if not rows:
+        raise SystemExit(f"UPRN {uprn}: no certificates found")
+    latest = max(rows, key=lambda r: r["registrationDate"])
+    return str(latest["certificateNumber"])
+
+
+def _fetch_cert_data(cert_num: str, headers: dict[str, str]) -> dict[str, Any]:
+    resp = httpx.get(
+        f"{_BASE}/api/certificate",
+        params={"certificate_number": cert_num},
+        headers=headers,
+        timeout=30.0,
+    )
+    resp.raise_for_status()
+    data: dict[str, Any] = resp.json()["data"]
+    return data
+
+
+def _report(uprn: int, cert_num: str, data: dict[str, Any]) -> None:
+    """Print lodged rating + current calculator output for the captured cert."""
+    from infrastructure.epc_client.epc_client_service import EpcClientService
+    from domain.sap10_calculator.calculator import Sap10Calculator
+    from unittest.mock import patch
+
+    def _mock(*_a: object, **_k: object) -> httpx.Response:
+        return httpx.Response(
+            200, json={"data": data}, request=httpx.Request("GET", "x")
+        )
+
+    print(f"  schema_type        : {data.get('schema_type')}")
+    print(f"  lodged rating      : {data.get('energy_rating_current')}")
+
+    service = EpcClientService(auth_token="test-token")
+    try:
+        with patch("httpx.get", side_effect=_mock):
+            epc = service.get_by_certificate_number(cert_num)
+    except ValueError as exc:
+        # Full-SAP (vs RdSAP) certs aren't supported by the mapper, so the
+        # calculator front-end can't consume them. Captured for reference
+        # but NOT addable to the RdSAP accuracy suite.
+        print(f"  NOT MAPPABLE       : {exc}")
+        return
+    result = Sap10Calculator().calculate(epc)
+
+    print(f"  calc sap_score     : {result.sap_score}")
+    print(f"  space_heating_kwh  : {result.space_heating_kwh_per_yr:.4f}")
+    print(f"  main_heating_kwh   : {result.main_heating_fuel_kwh_per_yr:.4f}")
+    print(f"  hot_water_kwh      : {result.hot_water_kwh_per_yr:.4f}")
+    print(f"  co2_kg_per_yr      : {result.co2_kg_per_yr:.4f}")
+
+
+def capture(uprn: int) -> None:
+    headers = _headers()
+    cert_num = _latest_cert_number(uprn, headers)
+    data = _fetch_cert_data(cert_num, headers)
+
+    schema_type = str(data.get("schema_type") or "unknown-schema")
+    out_dir = _SAMPLES_ROOT / schema_type / f"uprn_{uprn}"
+    out_dir.mkdir(parents=True, exist_ok=True)
+    out = out_dir / "epc.json"
+    out.write_text(json.dumps(data, indent=2))
+
+    print(f"UPRN {uprn} -> cert {cert_num}")
+    print(f"  wrote              : {out}")
+    _report(uprn, cert_num, data)
+
+
+def main() -> None:
+    if len(sys.argv) < 2:
+        raise SystemExit(__doc__)
+    for arg in sys.argv[1:]:
+        capture(int(arg))
+
+
+if __name__ == "__main__":
+    main()
--- a/tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py
+++ b/tests/domain/sap10_calculator/test_real_cert_sap_accuracy.py
@@ -107,25 +107,21 @@ _EXPECTATIONS: Final[tuple[RealCertExpectation, ...]] = (
    ),
    # UPRN 10002468137 → cert 0215-2818-7357-9703-2145. RdSAP-Schema-17.1,
    # all-electric high-heat-retention storage heaters on Economy 7, solid-
-    # brick uninsulated end-terrace. Ground truth is Elmhurst RdSAP10 = 60,
-    # reproduced on identical inputs (summary + full SAP 10.2 worksheet saved
-    # alongside: elmhurst_summary.pdf / elmhurst_worksheet.pdf). The engine
-    # produces 62 — a +2 over-rating localised to OFF-PEAK WATER HEATING:
-    # the worksheet (lines 243-246) prices the 7-hour off-peak immersion at a
-    # Table 13 split (19.36% @ 15.29p high + 80.64% @ 5.5p low), but the engine
-    # prices 100% at the 5.5p low rate, under-costing the bill (£595.68 vs
-    # £629.67) → lower ECF (2.69 vs 2.84) → SAP 62 not 60. (Space heating 100%
-    # off-peak IS correct for storage heaters — the worksheet agrees.) Strict
-    # xfail until the off-peak water-heating rate split is implemented.
+    # brick uninsulated end-terrace. Validated against Elmhurst RdSAP10 on
+    # identical (lodged) inputs: dual off-peak immersion, 110 L Normal cylinder,
+    # 2 baths → Elmhurst 61, our engine 60.92 (cost £620.38 vs Elmhurst £619.37
+    # — within £1; the residual is the 3.4 m² alt-wall the gov-API mapper drops).
+    # Evidence saved alongside: elmhurst_summary.pdf / elmhurst_worksheet.pdf.
+    # The +2 over-rating first seen (62) was closed by main's Table 13 off-peak
+    # water-heating fix (PR #1217) plus the reduced-field window-U fix (u_window
+    # all-None fallback → glazing-aware raw U, heat_transmission.py). Calculator
+    # confirmed exact: fed Elmhurst's own inputs it reproduces Elmhurst's cost
+    # to the penny. (lodged 55 is the old SAP-2012 schema — not comparable.)
    RealCertExpectation(
        schema="RdSAP-Schema-17.1",
        sample="uprn_10002468137",
        cert_num="0215-2818-7357-9703-2145",
-        sap_score=60,
-        known_bug_xfail=(
-            "off-peak (7-hour) water-heating high/low rate split not applied — "
-            "engine prices 100% at the low rate; see elmhurst_worksheet.pdf (243-246)"
-        ),
+        sap_score=61,
    ),
 )
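
The Table 13 split that the removed comment describes can be sketched as a blended unit price. The 15.29p and 5.5p rates and the 19.36% fraction are the figures quoted in that comment, and 0.131 and 0.571 are the dual and single immersion fractions from the mapping reference. The function name is hypothetical, and reusing the same two tariff rates for all three fractions is an assumption made for illustration.

```python
HIGH_RATE_P_PER_KWH = 15.29  # from the removed test comment
LOW_RATE_P_PER_KWH = 5.5     # from the removed test comment


def blended_rate_p_per_kwh(high_rate_fraction: float) -> float:
    """Average unit price when a fraction of water heating runs at the high rate."""
    return (
        high_rate_fraction * HIGH_RATE_P_PER_KWH
        + (1.0 - high_rate_fraction) * LOW_RATE_P_PER_KWH
    )


for label, fraction in (
    ("all low rate (old engine behaviour)", 0.0),
    ("dual immersion", 0.131),
    ("worksheet split in removed comment", 0.1936),
    ("single immersion", 0.571),
):
    print(f"{label}: {blended_rate_p_per_kwh(fraction):.2f} p/kWh")
```

Pricing everything at the low rate understates the water-heating bill, which is the direction of the +2 over-rating the old xfail recorded; the gap between the dual and single fractions is why misreading immersion code 1 moves the score by several points.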