Commit graph

17 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
8323d9cf07 Merge branch 'feature/per-cert-mapper-validation' of https://github.com/Hestia-Homes/Model into feature/bill-derivation 2026-06-05 09:38:40 +00:00
Khalim Conn-Kowlessar
3b442f9606 scripts: promote the API SAP-accuracy toolkit from /tmp
Three reusable scripts (each with a purpose/usage docstring) for wide-scale
testing of the calculator's API front-end against the GOV.UK EPB register —
the toolkit behind the 1000-cert study (docs/HANDOVER_API_SAMPLE_ACCURACY.md):

  fetch_2026_epc_sample.py    — sample cert numbers across a date window
                                (random pages) + download full schema-21 JSON
                                to a cache; resumable, 429/5xx backoff.
  eval_api_sap_accuracy.py    — % within 0.5 SAP, error histogram, worst-40,
                                and the mapper/calculator raise breakdown.
  analyse_api_sap_clusters.py — error grouped by property + heating type to
                                locate clusters (electric heating, flats, PV).

Cache dir defaults to /tmp/epc_2026_sample, overridable via EPC_SAMPLE_CACHE.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:52:09 +00:00
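
The headline metrics that 3b442f9606 attributes to eval_api_sap_accuracy.py (share of certs within 0.5 SAP, plus an error histogram) can be sketched as follows. This is a minimal illustration, not the script's code: the function names, the (lodged, calculated) pair shape, the inclusive `<=` comparison, and the 1-point bin width are all assumptions.

```python
import math
from collections import Counter


def within_tolerance_pct(pairs, tol=0.5):
    """Percentage of (lodged, calculated) SAP pairs with |error| <= tol.

    Whether the real script treats the boundary as inclusive is an assumption.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for lodged, calc in pairs if abs(calc - lodged) <= tol)
    return 100.0 * hits / len(pairs)


def error_histogram(pairs, bin_width=1.0):
    """Count signed errors (calculated - lodged) per bin, keyed by lower edge."""
    counts = Counter()
    for lodged, calc in pairs:
        lower_edge = math.floor((calc - lodged) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))
```

Keeping the error signed in the histogram shows whether the calculator tends to over- or under-score, which the percentage alone hides.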
Khalim Conn-Kowlessar
afabfa0147 feat(modelling): sample a year from the EPC bulk export, offline-ready
fetch_epc_bulk_sample streams certificates-<year>.json out of the bulk ZIP via
range requests, keeps the first N SAP-version matches, and writes each cert's
inner document to <out>/<cert>.json for run_property_report. Stops after N, so
only the member prefix transfers, not the 15.7 GB archive (RangeFile.bytes_read
reports the true transfer vs the absolute ZIP offset). Verified on 2026: 100
SAP-10.2 certs -> report ran 81 scorable (MAE 2.03), 46 flagged, 19 raises
(11 full-SAP schema 19.1.0, 7 unmapped floor_construction 0/3, 1 missing
post_town) — real shadow-validation signal vs the curated golden 57.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 12:20:57 +00:00
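
The range-request idea in afabfa0147 can be sketched with a small seekable file object that `zipfile` reads through. Only the names `RangeFile` and `bytes_read` come from the commit message; the constructor, the `fetch(start, end)` callable (standing in for an HTTP Range request) and the `first_lines` helper are hypothetical, and the real implementation may differ.

```python
import io
import zipfile


class RangeFile(io.RawIOBase):
    """Read-only seekable view over a remote object, fetched by byte range."""

    def __init__(self, size, fetch):
        self._size = size
        self._fetch = fetch  # hypothetical: fetch(start, end_inclusive) -> bytes
        self._pos = 0
        self.bytes_read = 0  # bytes actually transferred, not the file offset

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def readinto(self, buf):
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0  # at or past end of file
        data = self._fetch(self._pos, self._pos + n - 1)
        buf[: len(data)] = data
        self._pos += len(data)
        self.bytes_read += len(data)
        return len(data)


def first_lines(range_file, member, n):
    """Stream the first n lines of one ZIP member, then stop reading."""
    with zipfile.ZipFile(range_file) as zf, zf.open(member) as fh:
        return [next(fh) for _ in range(n)]
```

`zipfile` reads the central directory at the end of the archive, seeks to the member, and then pulls data in small chunks, so stopping early leaves the rest of the member untransferred. Over a real network each small read would be a separate request, so wrapping the object in `io.BufferedReader` is probably worthwhile.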
Khalim Conn-Kowlessar
ea3af8d2f4 feat(modelling): CLI to fetch an EPC dump + build the inspection report
run_property_report builds the three-section Markdown+CSV report over a dir of
API-shaped EPC JSON, offline (defaults to the golden 57: 57/57 scorable, MAE
0.54, 6 flagged |Δ|>0.5). fetch_epc_dump pulls raw cert JSON from the live API
by --uprn/--postcode (picking the latest cert per match, skipping existing
files), mirroring fetch_cohort2's proven HTTP shape and reading
OPEN_EPC_API_TOKEN. Report artifacts + epc_dump/ are gitignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 11:26:17 +00:00
Khalim Conn-Kowlessar
8b5ab1c59e feat(modelling): turnkey offline cohort script (tables + CSV)
CertResult now carries its Plan (with flat baseline/post-SAP/measures
properties), and `format_cohort_csv` renders one browsable row per cert
(SAP transition, band, measures, cost, bill saving, valuation %, error).
`scripts/run_modelling_cohort.py` is turnkey: no args runs the committed
golden cohort, prints a sense-check table for the first measure-bearing
certs (a capped preview so a large dump doesn't flood the terminal), the
summary, and writes modelling_cohort.csv (gitignored). Point it at the
EPC dump when it lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 09:30:53 +00:00
Khalim Conn-Kowlessar
d8ef40c745 feat(modelling): offline cohort runner over an EPC-JSON dump
`harness.cohort.run_cohort(paths)` parses each API-shaped EPC JSON with
from_api_response and models it via run_modelling — no database, no
network — capturing per-cert errors instead of aborting the sweep, plus
`format_cohort_summary`. A thin `scripts/run_modelling_cohort.py` CLI
points it at a directory. Proven over the 57 golden API certs: 56 ran
offline, 15 produced measures, 1 errored (COAL has no Fuel Rates entry —
a BillDerivation coverage gap, not a harness one). Ready for the EPC dump.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 09:23:32 +00:00
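
The "capture per-cert errors instead of aborting the sweep" behaviour described in d8ef40c745 follows a common pattern, sketched below. The names `SweepResult` and `sweep`, and the `model` callable, are hypothetical stand-ins; the real `run_cohort` takes paths and returns `CertResult` objects whose exact shape is not shown in the log.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple


@dataclass
class SweepResult:
    cert: str
    plan: Optional[Any] = None   # set when modelling succeeded
    error: Optional[str] = None  # set when modelling raised


def sweep(
    items: Iterable[Tuple[str, Any]],
    model: Callable[[Any], Any],
) -> List[SweepResult]:
    """Model every cert; record a failure as data rather than re-raising."""
    results = []
    for cert, payload in items:
        try:
            results.append(SweepResult(cert, plan=model(payload)))
        except Exception as exc:  # broad on purpose: one bad cert must not stop the run
            results.append(SweepResult(cert, error=f"{type(exc).__name__}: {exc}"))
    return results
```

Recording the exception type alongside the message makes it straightforward to group failures afterwards, as the commit does when it separates the COAL fuel-rate gap from harness problems.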
Khalim Conn-Kowlessar
d7d5084f90 Move sap10_calculator tests to tests/domain/sap10_calculator/ for CI
The calculator tests lived under
domain/sap10_calculator/{tests,worksheet/tests,rdsap/tests,climate/tests,validation/tests},
none of which are in pytest.ini testpaths, so CI (which collects tests/) never
ran them. Relocate all five dirs to
tests/domain/sap10_calculator/{,worksheet,rdsap,climate,validation}, mirroring
the tests/domain/property_baseline/ convention, so the cascade-pin / golden /
e2e conformance suites run in CI.

Mechanics:
- git mv preserves history (110 files).
- Flattening the trailing /tests keeps each file's depth-to-repo-root
  identical, so all 16 repo-root parents[4] fixture refs stay valid. Only
  test_pcdb_etl.py's parents[1] (→ pcdb data) and one hardcoded absolute
  golden-fixture path in test_cert_to_inputs.py needed rebasing.
- Cross-imports rewritten domain.sap10_calculator.worksheet.tests →
  tests.domain.sap10_calculator.worksheet (21 files incl. the external
  importer backend/documents_parser/tests/test_summary_pdf_mapper_chain.py).
- Golden-fixture path strings in test_summary_pdf_mapper_chain.py +
  scripts/fetch_cohort2_api_jsons.py updated to the new location (the JSONs
  moved with the rdsap tests).

load_cells / gitignored worksheet xlsx: the xlsx-pinned tests (test_dimensions
/ ventilation / water_heating) read 2026-05-19-17-18 RdSap10Worksheet.xlsx,
which is gitignored (.gitignore `*.xlsx`) and so absent in CI.
_xlsx_loader.load_cells now calls pytest.skip() when the file is absent, so
those tests run locally and skip cleanly in CI instead of erroring: no new CI
failures from the move, and the gitignore policy is respected.

Verified: tests/domain/sap10_calculator + backend/documents_parser +
tests/domain/property_baseline = 2248 pass, 1 skipped; pyright resolves the
new import paths with zero import-resolution errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 16:58:00 +00:00
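
The depth argument in d7d5084f90 (dropping the trailing /tests while adding a leading tests/ keeps `parents[4]` pointing at the repo root) can be checked with `pathlib`. The `/repo` root and the file name `test_example.py` below are hypothetical; only the directory layout comes from the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical repo root and test file name, used only to illustrate the depth.
ROOT = PurePosixPath("/repo")

old_path = ROOT / "domain/sap10_calculator/worksheet/tests/test_example.py"
new_path = ROOT / "tests/domain/sap10_calculator/worksheet/test_example.py"

# parents[0] is the containing directory, so parents[4] is five levels above the file.
old_root = old_path.parents[4]
new_root = new_path.parents[4]

# Shallower indexes do change, which is why a parents[1] reference needs rebasing.
old_parent1 = old_path.parents[1]
new_parent1 = new_path.parents[1]
```

Both layouts place the file five path components below the root, so any fixture reference built from `parents[4]` resolves to the same directory before and after the move.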
Khalim Conn-Kowlessar
caee4de2f4 feat(ingestion): relocate EpcClientService to infrastructure + SolarRepo (#1133)
Move the EpcClientService package (client + _retry + exceptions + tests) from
the dying backend/ tree to infrastructure/epc_client/ as the New-EPC-API Fetcher;
update the two callers (address2UPRN, a script). All 14 client tests pass.

Add SolarRepository port + SolarPostgresRepository persisting Google Solar
building insights as JSONB (solar_building_insights table), one row per Property.
The EPC repo half of this slice already landed in #1129. pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:45:26 +00:00
Khalim Conn-Kowlessar
22ae6f4d77 Slice S0380.39: bulk-fetch 38 cohort-2 EPC API JSONs for cross-mapper parity
Adds scripts/fetch_cohort2_api_jsons.py (throwaway one-off) plus 38
golden fixtures under domain/sap10_calculator/rdsap/tests/fixtures/golden/
covering every cert in "sap worksheets/additional with api 2/".

Each JSON is the inner `data` payload from the gov.uk EPB
/api/certificate endpoint — the same shape EpcPropertyDataMapper
.from_api_response consumes today.

Required prerequisite for Slice B (parametrized API-path chain test
that mirrors the cohort-2 Summary-path sweep at 1e-4 vs worksheet).
Per the cross-mapper-parity primitive: API EPC and Elmhurst EPC must
produce SAP within 1e-4 of each other and of the worksheet — the SAP
cascade is the load-bearing equivalence check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:40:58 +00:00
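
The cross-mapper parity rule stated in 22ae6f4d77 (API-path SAP, Summary-path SAP and worksheet SAP all within 1e-4 of one another) reduces to a spread check. The function name and argument names below are hypothetical, and the inclusive comparison is an assumption.

```python
def parity_ok(api_sap, summary_sap, worksheet_sap, tol=1e-4):
    """True when all three SAP values agree pairwise to within tol.

    The largest pairwise difference among the values equals max - min,
    so a single spread comparison covers every pair.
    """
    values = (api_sap, summary_sap, worksheet_sap)
    return max(values) - min(values) <= tol
```

For example, `parity_ok(71.23456, 71.23460, 71.23458)` passes because the spread is about 4e-5, while a spread of 2e-4 fails.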
Daniel Roth
9f7c16ccbd add address list 2026-05-21 15:30:03 +00:00
Daniel Roth
4e21dda328 rename files in sharepoint to desired structure 2026-05-20 16:26:07 +00:00
Jun-te Kim
c22528299c added type hinting to uprn 2026-05-12 09:40:12 +00:00
Jun-te Kim
c9c43f178c demo generated for use in address2uprn 2026-05-08 14:48:15 +00:00
Jun-te Kim
c498dc1951 init db 2026-03-31 11:45:59 +00:00
Daniel Roth
609468cff9 new methods for downloading all core files for pashub URL. Download currently not being authorised 2026-03-24 08:47:59 +00:00
Daniel Roth
6617d9e614 improved typing 2026-03-23 16:16:20 +00:00
Jun-te Kim
f102aa6a7c move location 2026-03-11 15:27:31 +00:00