From 77e29ac2f8ad5cd1eee48efe36503cb243caf1a9 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Thu, 4 Jun 2026 10:46:31 +0000
Subject: [PATCH] =?UTF-8?q?docs(modelling):=20handover=20=E2=80=94=20EPC?=
 =?UTF-8?q?=20API=20fetch=20+=20property=20inspection=20report?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Next-phase handover: fetch live EPCs via EpcClientService, run the offline
Modelling harness, and save a per-property report covering (1)
lodged-vs-calculated SAP divergence (>0.5), (2) plans + costings, (3)
recommended measures + the EPC attributes that triggered them. Maps the EPC
API client (the user's blocker), the calculator-error ingredients
(parity_report scaffolding), and each generator's exact trigger fields.

Co-Authored-By: Claude Opus 4.8
---
 docs/HANDOVER_API_FETCH_AND_REPORT.md | 84 +++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 docs/HANDOVER_API_FETCH_AND_REPORT.md

diff --git a/docs/HANDOVER_API_FETCH_AND_REPORT.md b/docs/HANDOVER_API_FETCH_AND_REPORT.md
new file mode 100644
index 00000000..14f78655
--- /dev/null
+++ b/docs/HANDOVER_API_FETCH_AND_REPORT.md
@@ -0,0 +1,84 @@
+# HANDOVER — EPC API fetch + property inspection report
+
+**Branch:** `feature/bill-derivation` (worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`). **HEAD:** `7be4d83f`.
+**Prior phase (DONE this session):** DB-less offline Modelling harness + `material_id` + Valuation Uplift + fuel-rate proxies. See "What already exists" below.
+
+## The goal (this phase)
+
+Fetch real EPCs **from the live EPC API**, run them through the offline Modelling harness, and **save a per-property report** covering three things:
+
+1. **Calculator error** — for each property, compare the **lodged SAP** on the API response against **our calculator's** SAP; flag where `|lodged − calculated| > 0.5`.
+2. **Plans + costings** — the optimised Plan: measures, cost of works + contingency, SAP/band transition, bill & CO₂ savings, valuation uplift.
+3. **Individual recommended measures + the property attributes that triggered them** — for each fired measure, show the EPC field(s) and value(s) that caused the generator to recommend it (the "why").
+
+## FIRST: read these
+
+1. This file (the API client + the three report ingredients are mapped below — load-bearing).
+2. `docs/HANDOVER_MODELLING.md` + auto-memory `project_modelling_stage_state` — full Modelling state.
+3. `CONTEXT.md` — glossary, esp. **Calculated SAP10 Performance**, **Validation Cohort**, **Lodged Performance** (the calculator-divergence concept behind report #1), and Plan / Plan Measure / Recommendation.
+4. ADR-0010/0013 (calculator shadow-validation), ADR-0014 (bills), ADR-0016 (scoring), ADR-0018 (valuation).
+
+## What already exists (build ON this, don't rebuild)
+
+- **Offline harness (no DB, no network for modelling):**
+  - `harness/console.py::run_modelling(epc, goal_band="C", current_market_value=None, print_table=True) -> Plan` — runs ONLY the Modelling stage (no Ingestion/Baseline), so it needs no lodged-performance/RHI and works on any calculator-scorable EPC. (`run_one` is the full pipeline; use `run_modelling` for inspection.)
+  - `harness/cohort.py::run_cohort(paths) -> list[CertResult]` + `format_cohort_summary` + `format_cohort_csv`. `CertResult` carries the `Plan` (+ flat `measures`/`baseline_sap`/`post_sap`). Errors are captured per-cert, never abort the sweep.
+  - `scripts/run_modelling_cohort.py` — CLI over a directory of API JSONs (prints tables + summary, writes `modelling_cohort.csv`, gitignored).
+  - `harness/plan_table.py::format_plan_table(plan)` — the sense-check table.
+  - `harness/sample_catalogue.json` — prices all 5 generator measure types (cavity/loft/solid-floor/suspended-floor/ventilation).
+  - In-memory `FakeUnitOfWork` etc. in `tests/orchestration/fakes.py`.
+- **Proven offline:** the 57 golden API certs (`tests/domain/sap10_calculator/rdsap/fixtures/golden/*.json`, schema 21.0.1, API-shaped) run **57/57, 0 errors** after the fuel-rate proxies landed.
+
+## Report ingredient #1 — EPC API client (the user's "can't find the file")
+
+- **Client:** `infrastructure/epc_client/epc_client_service.py::EpcClientService`.
+  - Base URL `https://api.get-energy-performance-data.communities.gov.uk`; **Bearer token** in the constructor.
+  - **Env var:** the bulk-fetch script reads `OPEN_EPC_API_TOKEN` (`scripts/fetch_cohort2_api_jsons.py:49`); CONTEXT.md's glossary names the New-EPC-API token `EPC_AUTH_TOKEN`. **Confirm which is set in `backend/.env` before relying on either.**
+  - Methods: `get_by_uprn(uprn) -> Optional[EpcPropertyData]`, `get_by_certificate_number(cert) -> EpcPropertyData`, `search_by_postcode(postcode) -> list[EpcSearchResult]`. Internally hits `/api/certificate` + `/api/domestic/search`, unwraps `data`, maps via `EpcPropertyDataMapper.from_api_response`. Handles 404/429 + retry.
+- **Working example to copy:** `scripts/fetch_cohort2_api_jsons.py` bulk-fetches raw API JSON and writes one file per cert (it calls the client's certificate fetch via a retry wrapper). Mirror it to fetch the user's target set (by UPRN list / postcode) into a dump dir, then feed that dir to `run_cohort`.
+- **NOTE:** the API returns the cert as raw JSON identical to the committed golden fixtures, so the **same `from_api_response` path** the harness already uses applies. The raw JSON (not just the mapped EPC) is what report #1 needs — keep both (raw for the lodged SAP, mapped for the calculator + generators).
+
+## Report ingredient #2 — lodged vs calculated SAP (calculator error > 0.5)
+
+- **Calculated:** `domain/sap10_calculator/calculator.py::Sap10Calculator().calculate(epc) -> SapResult`; use `SapResult.sap_score_continuous` (un-rounded) — `sap_score` is the rounded int.
+- **Lodged:** `EpcPropertyData.energy_rating_current` (mapped from the API response; SAP points 0–100). (Confirm it is populated for live certs — some samples leave it blank; the API response itself carries `current-energy-efficiency`.)
+- **Divergence:** `error = epc.energy_rating_current − calculate(epc).sap_score_continuous`; flag `abs(error) > 0.5`. This is exactly the **Validation Cohort / shadow-validation** idea (ADR-0010/0013) — the calculator runs alongside the lodged figure and logs divergence.
+- **Existing scaffolding:** `domain/sap10_calculator/validation/parity_report.py` — `ParityCase(certificate_number, actual_sap, predicted_sap, is_typical)` + `build_parity_report(...) -> ParityReport` (MAE / RMSE / bias / worst-N). The 0.5 is a **design target, not a hardcoded filter** — you implement the per-property flag. Consider reusing `ParityCase`/`build_parity_report` for the cohort-level stats in the report.
+- **Gotcha:** the calculator can **raise** on an un-mapped cert (UnmappedSapCode / UnmappedApiCode) — catch per-cert (like `run_cohort` does) so one bad cert doesn't abort the report; record the raise as the "error" for that property.
+
+## Report ingredient #3 — measures + the attributes that triggered them
+
+Each generator reads `epc.sap_building_parts` filtered to `BuildingPartIdentifier.MAIN` (ventilation is whole-dwelling). The exact trigger fields (so the report can say "fired because X = Y"):
+
+| Measure | Trigger fields (on `SapBuildingPart` unless noted) | Fires when |
+|---|---|---|
+| **cavity_wall_insulation** | `wall_construction`, `wall_insulation_type` | `wall_construction == 4` (cavity) AND `wall_insulation_type == 4` (as-built/uninsulated) — `wall_recommendation.py:42` |
+| **loft_insulation** | `roof_insulation_thickness` | `== 0` (uninsulated loft) — `roof_recommendation.py:41` |
+| **{suspended,solid}_floor_insulation** | `floor_insulation_thickness`, `floor_construction_type` | thickness None/blank/"0" AND construction contains "suspended"/"solid" — `floor_recommendation.py:64` |
+| **mechanical_ventilation** | `epc.sap_ventilation.mechanical_ventilation_kind` (whole-dwelling) | `sap_ventilation is None` OR `mechanical_ventilation_kind is None` (not already mechanically ventilated); only injected when a wall is selected (Measure Dependency) — `ventilation_recommendation.py:41` |
+
+To produce report #3, run each generator on the EPC (or read the Plan's `PlanMeasure.measure_type`) and, for each fired measure, surface the above field values from `epc.sap_building_parts[MAIN]` (and `sap_ventilation`). The generators currently only return the Recommendation — you may add a small "explain" helper that returns the trigger fields, or read them directly off the EPC in the report builder.
+
+## Suggested shape (grill the owner first if unsure)
+
+Extend `harness/cohort.py` / a new `harness/report.py`:
+- Enrich `CertResult` with `lodged_sap`, `calculated_sap`, `sap_error`, `sap_error_exceeds_0_5` (report #1), and a per-measure `[(measure_type, {trigger_field: value})]` list (report #3). Plan/costings (report #2) already on `CertResult.plan`.
+- A `format_report` (Markdown and/or CSV) with the three sections; the script writes it to a file (gitignore the artifact).
+- A live-fetch entrypoint: a script that takes a UPRN list / postcode, fetches via `EpcClientService` into a dump dir (raw JSON), then runs the report. Keep the raw JSON so #1 has the lodged figure.
+
+## Critical gotchas (carry these)
+
+- **Worktree import trap** — run via `pytest` / `python -m` from the worktree root, NOT `python /tmp/foo.py` (imports `/workspaces/model`).
+- **`mip`/CBC broken on aarch64; `moto` not installed** — `--ignore tests/orchestration/test_postcode_splitter_orchestrator.py` + `tests/repositories/unstandardised_address/` when sweeping. Run tests `python -m pytest -q` (NOT `-p no:cov`).
+- **Don't edit `heat_transmission.py`** (another agent owns it). Per-element U-values still aren't surfaced in `SapResult` (deferred — a request to that owner).
+- **Live API calls hit the network + rate limits (429)** — the client retries; for a big fetch, throttle and cache raw JSON to disk (mirror `fetch_cohort2_api_jsons.py`), then run the report offline against the cached dump.
+- **Fuel proxies:** COAL + HEAT_NETWORK are documented **estimates** (see `repositories/fuel_rates/data/fuel_rates_2026_q2.json` `_note`/`_gaps`); coal/heat-network bills are indicative.
+- **Many certs yield 0 measures** — they're already efficient; that's correct, not a bug. Report #1 (calculator error) is independent of whether measures fire.
+
+## Conventions
+
+Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit ending `Co-Authored-By: Claude Opus 4.8 `; AAA test headers; assert `abs(x - y) <= tol` (not `pytest.approx`); pyright strict zero errors; annotate call-return locals.
+
+## How to start
+
+Confirm the API token env var + that you can fetch one cert (`EpcClientService(...).get_by_uprn()`). Then decide with the owner: report format (Markdown report + CSV?), the property set (UPRN list / postcode / the user's dump), and whether the calculator-error section is per-property flags + a cohort ParityReport. Then TDD the report builder on the committed golden certs (offline) before pointing it at the live API.
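
The divergence rule under "Report ingredient #2" (lodged minus calculated, flagged when the absolute error exceeds 0.5, with calculator raises captured per cert) could be sketched roughly as below. This is an illustration under stated assumptions, not the harness's code: `SapDivergence`, `sap_divergence`, and `DIVERGENCE_THRESHOLD` are hypothetical names, and the calculator is passed in as a plain callable so the snippet runs without the project's modules.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: 0.5 SAP points is the design target quoted in the handover;
# it is not known to be enforced anywhere in the codebase.
DIVERGENCE_THRESHOLD = 0.5


@dataclass(frozen=True)
class SapDivergence:
    """Hypothetical per-property row for the calculator-error section."""

    certificate_number: str
    lodged_sap: Optional[float]
    calculated_sap: Optional[float]
    sap_error: Optional[float]  # lodged - calculated; None when unavailable
    exceeds_threshold: bool
    failure: Optional[str]  # description of a calculator raise or missing input


def sap_divergence(
    certificate_number: str,
    lodged_sap: Optional[float],
    calculate: Callable[[], float],
    threshold: float = DIVERGENCE_THRESHOLD,
) -> SapDivergence:
    """Compare a lodged SAP score with a calculated one without raising.

    `calculate` stands in for something like
    `lambda: Sap10Calculator().calculate(epc).sap_score_continuous`.
    """
    try:
        calculated = calculate()
    except Exception as exc:  # e.g. an unmapped-code error: record it, keep going
        return SapDivergence(
            certificate_number, lodged_sap, None, None, False,
            f"{type(exc).__name__}: {exc}",
        )
    if lodged_sap is None:
        # Some certs may carry no lodged figure; report that instead of guessing.
        return SapDivergence(
            certificate_number, None, calculated, None, False, "lodged SAP missing",
        )
    error = lodged_sap - calculated
    return SapDivergence(
        certificate_number, lodged_sap, calculated, error,
        abs(error) > threshold, None,
    )
```

Passing the calculator as a callable keeps the per-cert `try`/`except` in one place, so a single unmapped cert yields a row with `failure` set rather than aborting the sweep. The comparison is strict (`>`), so an error of exactly 0.5 is not flagged; whether that boundary is the intended one is a question for the owner.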