Model/docs/HANDOVER_API_FETCH_AND_REPORT.md
Khalim Conn-Kowlessar 77e29ac2f8 docs(modelling): handover — EPC API fetch + property inspection report
Next-phase handover: fetch live EPCs via EpcClientService, run the
offline Modelling harness, and save a per-property report covering
(1) lodged-vs-calculated SAP divergence (>0.5), (2) plans + costings,
(3) recommended measures + the EPC attributes that triggered them. Maps
the EPC API client (the user's blocker), the calculator-error ingredients
(parity_report scaffolding), and each generator's exact trigger fields.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 10:46:31 +00:00

84 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# HANDOVER — EPC API fetch + property inspection report
**Branch:** `feature/bill-derivation` (worktree `/workspaces/home/hestia-worktrees/model-assemble-new-backend`). **HEAD:** `7be4d83f`.
**Prior phase (DONE this session):** DB-less offline Modelling harness + `material_id` + Valuation Uplift + fuel-rate proxies. See "What already exists" below.
## The goal (this phase)
Fetch real EPCs **from the live EPC API**, run them through the offline Modelling harness, and **save a per-property report** covering three things:
1. **Calculator error** — for each property, compare the **lodged SAP** on the API response against **our calculator's** SAP; flag where `|lodged calculated| > 0.5`.
2. **Plans + costings** — the optimised Plan: measures, cost of works + contingency, SAP/band transition, bill & CO₂ savings, valuation uplift.
3. **Individual recommended measures + the property attributes that triggered them** — for each fired measure, show the EPC field(s) and value(s) that caused the generator to recommend it (the "why").
## FIRST: read these
1. This file (the API client + the three report ingredients are mapped below — load-bearing).
2. `docs/HANDOVER_MODELLING.md` + auto-memory `project_modelling_stage_state` — full Modelling state.
3. `CONTEXT.md` — glossary, esp. **Calculated SAP10 Performance**, **Validation Cohort**, **Lodged Performance** (the calculator-divergence concept behind report #1), and Plan / Plan Measure / Recommendation.
4. ADR-0010/0013 (calculator shadow-validation), ADR-0014 (bills), ADR-0016 (scoring), ADR-0018 (valuation).
## What already exists (build ON this, don't rebuild)
- **Offline harness (no DB, no network for modelling):**
- `harness/console.py::run_modelling(epc, goal_band="C", current_market_value=None, print_table=True) -> Plan` — runs ONLY the Modelling stage (no Ingestion/Baseline), so it needs no lodged-performance/RHI and works on any calculator-scorable EPC. (`run_one` is the full pipeline; use `run_modelling` for inspection.)
- `harness/cohort.py::run_cohort(paths) -> list[CertResult]` + `format_cohort_summary` + `format_cohort_csv`. `CertResult` carries the `Plan` (+ flat `measures`/`baseline_sap`/`post_sap`). Errors are captured per-cert, never abort the sweep.
- `scripts/run_modelling_cohort.py` — CLI over a directory of API JSONs (prints tables + summary, writes `modelling_cohort.csv`, gitignored).
- `harness/plan_table.py::format_plan_table(plan)` — the sense-check table.
- `harness/sample_catalogue.json` — prices all 5 generator measure types (cavity/loft/solid-floor/suspended-floor/ventilation).
- In-memory `FakeUnitOfWork` etc. in `tests/orchestration/fakes.py`.
- **Proven offline:** the 57 golden API certs (`tests/domain/sap10_calculator/rdsap/fixtures/golden/*.json`, schema 21.0.1, API-shaped) run **57/57, 0 errors** after the fuel-rate proxies landed.
## Report ingredient #1 — EPC API client (the user's "can't find the file")
- **Client:** `infrastructure/epc_client/epc_client_service.py::EpcClientService`.
- Base URL `https://api.get-energy-performance-data.communities.gov.uk`; **Bearer token** in the constructor.
- **Env var:** the bulk-fetch script reads `OPEN_EPC_API_TOKEN` (`scripts/fetch_cohort2_api_jsons.py:49`); CONTEXT.md's glossary names the New-EPC-API token `EPC_AUTH_TOKEN`. **Confirm which is set in `backend/.env` before relying on either.**
- Methods: `get_by_uprn(uprn) -> Optional[EpcPropertyData]`, `get_by_certificate_number(cert) -> EpcPropertyData`, `search_by_postcode(postcode) -> list[EpcSearchResult]`. Internally hits `/api/certificate` + `/api/domestic/search`, unwraps `data`, maps via `EpcPropertyDataMapper.from_api_response`. Handles 404/429 + retry.
- **Working example to copy:** `scripts/fetch_cohort2_api_jsons.py` bulk-fetches raw API JSON and writes one file per cert (it calls the client's certificate fetch via a retry wrapper). Mirror it to fetch the user's target set (by UPRN list / postcode) into a dump dir, then feed that dir to `run_cohort`.
- **NOTE:** the API returns the cert as raw JSON identical to the committed golden fixtures, so the **same `from_api_response` path** the harness already uses applies. The raw JSON (not just the mapped EPC) is what report #1 needs — keep both (raw for the lodged SAP, mapped for the calculator + generators).
## Report ingredient #2 — lodged vs calculated SAP (calculator error > 0.5)
- **Calculated:** `domain/sap10_calculator/calculator.py::Sap10Calculator().calculate(epc) -> SapResult`; use `SapResult.sap_score_continuous` (un-rounded) — `sap_score` is the rounded int.
- **Lodged:** `EpcPropertyData.energy_rating_current` (mapped from the API response; SAP points 0100). (Confirm it is populated for live certs — some samples leave it blank; the API response itself carries `current-energy-efficiency`.)
- **Divergence:** `error = epc.energy_rating_current calculate(epc).sap_score_continuous`; flag `abs(error) > 0.5`. This is exactly the **Validation Cohort / shadow-validation** idea (ADR-0010/0013) — the calculator runs alongside the lodged figure and logs divergence.
- **Existing scaffolding:** `domain/sap10_calculator/validation/parity_report.py``ParityCase(certificate_number, actual_sap, predicted_sap, is_typical)` + `build_parity_report(...) -> ParityReport` (MAE / RMSE / bias / worst-N). The 0.5 is a **design target, not a hardcoded filter** — you implement the per-property flag. Consider reusing `ParityCase`/`build_parity_report` for the cohort-level stats in the report.
- **Gotcha:** the calculator can **raise** on an un-mapped cert (UnmappedSapCode / UnmappedApiCode) — catch per-cert (like `run_cohort` does) so one bad cert doesn't abort the report; record the raise as the "error" for that property.
## Report ingredient #3 — measures + the attributes that triggered them
Each generator reads `epc.sap_building_parts` filtered to `BuildingPartIdentifier.MAIN` (ventilation is whole-dwelling). The exact trigger fields (so the report can say "fired because X = Y"):
| Measure | Trigger fields (on `SapBuildingPart` unless noted) | Fires when |
|---|---|---|
| **cavity_wall_insulation** | `wall_construction`, `wall_insulation_type` | `wall_construction == 4` (cavity) AND `wall_insulation_type == 4` (as-built/uninsulated) — `wall_recommendation.py:42` |
| **loft_insulation** | `roof_insulation_thickness` | `== 0` (uninsulated loft) — `roof_recommendation.py:41` |
| **{suspended,solid}_floor_insulation** | `floor_insulation_thickness`, `floor_construction_type` | thickness None/blank/"0" AND construction contains "suspended"/"solid" — `floor_recommendation.py:64` |
| **mechanical_ventilation** | `epc.sap_ventilation.mechanical_ventilation_kind` (whole-dwelling) | `sap_ventilation is None` OR `mechanical_ventilation_kind is None` (not already mechanically ventilated); only injected when a wall is selected (Measure Dependency) — `ventilation_recommendation.py:41` |
To produce report #3, run each generator on the EPC (or read the Plan's `PlanMeasure.measure_type`) and, for each fired measure, surface the above field values from `epc.sap_building_parts[MAIN]` (and `sap_ventilation`). The generators currently only return the Recommendation — you may add a small "explain" helper that returns the trigger fields, or read them directly off the EPC in the report builder.
## Suggested shape (grill the owner first if unsure)
Extend `harness/cohort.py` / a new `harness/report.py`:
- Enrich `CertResult` with `lodged_sap`, `calculated_sap`, `sap_error`, `sap_error_exceeds_0_5` (report #1), and a per-measure `[(measure_type, {trigger_field: value})]` list (report #3). Plan/costings (report #2) already on `CertResult.plan`.
- A `format_report` (Markdown and/or CSV) with the three sections; the script writes it to a file (gitignore the artifact).
- A live-fetch entrypoint: a script that takes a UPRN list / postcode, fetches via `EpcClientService` into a dump dir (raw JSON), then runs the report. Keep the raw JSON so #1 has the lodged figure.
## Critical gotchas (carry these)
- **Worktree import trap** — run via `pytest` / `python -m` from the worktree root, NOT `python /tmp/foo.py` (imports `/workspaces/model`).
- **`mip`/CBC broken on aarch64; `moto` not installed** — `--ignore tests/orchestration/test_postcode_splitter_orchestrator.py` + `tests/repositories/unstandardised_address/` when sweeping. Run tests `python -m pytest <path> -q` (NOT `-p no:cov`).
- **Don't edit `heat_transmission.py`** (another agent owns it). Per-element U-values still aren't surfaced in `SapResult` (deferred — a request to that owner).
- **Live API calls hit the network + rate limits (429)** — the client retries; for a big fetch, throttle and cache raw JSON to disk (mirror `fetch_cohort2_api_jsons.py`), then run the report offline against the cached dump.
- **Fuel proxies:** COAL + HEAT_NETWORK are documented **estimates** (see `repositories/fuel_rates/data/fuel_rates_2026_q2.json` `_note`/`_gaps`); coal/heat-network bills are indicative.
- **Many certs yield 0 measures** — they're already efficient; that's correct, not a bug. Report #1 (calculator error) is independent of whether measures fire.
## Conventions
Stay on `feature/bill-derivation`; one TDD slice = one commit; conventional-commit ending `Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>`; AAA test headers; assert `abs(x - y) <= tol` (not `pytest.approx`); pyright strict zero errors; annotate call-return locals.
## How to start
Confirm the API token env var + that you can fetch one cert (`EpcClientService(...).get_by_uprn(<uprn>)`). Then decide with the owner: report format (Markdown report + CSV?), the property set (UPRN list / postcode / the user's dump), and whether the calculator-error section is per-property flags + a cohort ParityReport. Then TDD the report builder on the committed golden certs (offline) before pointing it at the live API.