Commit graph

14 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
7e9231b36b fix(debug-tool): read the domain field names, not the schema ones
The first cut of elmhurst_input_sheet.py introspected the `schema`
dataclasses (rdsap_schema_*.py) but the mapper emits the `epc_property_data`
domain types, whose fields differ (wall_thickness_mm not wall_thickness;
total_floor_area_m2 not total_floor_area; frame_material not pvc_frame;
cylinder_insulation_thickness_mm; SapRoomInRoof has gable_*_length_m not
insulation/roof_room_connected). Worse, the getattr-with-None-default helper
printed None over real data, nearly sending a debug session chasing a
non-existent "dimensions dropped" mapper bug on cert 2100 (the dims map
fine; that cert's error is elsewhere). Switched to direct attribute access
so a future rename fails loudly, fixed every field name against the live
domain objects, and added roof_construction_type / floor_type for context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 19:48:02 +00:00
Khalim Conn-Kowlessar
2c126b2a62 tooling(debug): add scripts/elmhurst_input_sheet.py worksheet-input dumper
Reconstructs the per-cert "Elmhurst SAP input sheet" generator that the
API-accuracy debugging loop relied on (the worked example survives at
'sap worksheets/golden fixture debugging/6035_elmhurst_input_sheet.md'); the
original was a throwaway and never committed. Companion to
eval_api_sap_accuracy.py: once that names a worst-offender cert, this dumps
the codes the mapper hands the calculator (from_api_response → EpcPropertyData)
in the 6035 layout — header, lodged element descriptions, building parts +
dimensions, windows, doors/heating/water/vent — plus the lodged reference
outputs and OUR continuous SAP next to the lodged value, to read side-by-side
with the Elmhurst Summary / P960 worksheet PDF.

Reads the fetch_2026_epc_sample.py cache (EPC_SAMPLE_CACHE, default
/tmp/epc_2026_sample). `--out-dir` writes <cert>_elmhurst_input_sheet.md.
Pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 19:39:31 +00:00
Khalim Conn-Kowlessar
3b442f9606 scripts: promote the API SAP-accuracy toolkit from /tmp
Three reusable scripts (each with a purpose/usage docstring) for wide-scale
testing of the calculator's API front-end against the GOV.UK EPB register —
the toolkit behind the 1000-cert study (docs/HANDOVER_API_SAMPLE_ACCURACY.md):

  fetch_2026_epc_sample.py    — sample cert numbers across a date window
                                (random pages) + download full schema-21 JSON
                                to a cache; resumable, 429/5xx backoff.
  eval_api_sap_accuracy.py    — % within 0.5 SAP, error histogram, worst-40,
                                and the mapper/calculator raise breakdown.
  analyse_api_sap_clusters.py — error grouped by property + heating type to
                                locate clusters (electric heating, flats, PV).

Cache dir defaults to /tmp/epc_2026_sample, overridable via EPC_SAMPLE_CACHE.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:52:09 +00:00
Khalim Conn-Kowlessar
d7d5084f90 Move sap10_calculator tests to tests/domain/sap10_calculator/ for CI
The calculator tests lived under domain/sap10_calculator/{tests,worksheet/
tests,rdsap/tests,climate/tests,validation/tests}, none of which are in
pytest.ini testpaths — so CI (which collects tests/) never ran them. Relocate
all five dirs to tests/domain/sap10_calculator/{,worksheet,rdsap,climate,
validation}, mirroring the tests/domain/property_baseline/ convention, so the
cascade-pin / golden / e2e conformance suites run in CI.

Mechanics:
- git mv preserves history (110 files).
- Flattening the trailing /tests keeps each file's depth-to-repo-root
  identical, so all 16 repo-root parents[4] fixture refs stay valid. Only
  test_pcdb_etl.py's parents[1] (→ pcdb data) and one hardcoded absolute
  golden-fixture path in test_cert_to_inputs.py needed rebasing.
- Cross-imports rewritten domain.sap10_calculator.worksheet.tests →
  tests.domain.sap10_calculator.worksheet (21 files incl. the external
  importer backend/documents_parser/tests/test_summary_pdf_mapper_chain.py).
- Golden-fixture path strings in test_summary_pdf_mapper_chain.py +
  scripts/fetch_cohort2_api_jsons.py updated to the new location (the JSONs
  moved with the rdsap tests).

load_cells / gitignored worksheet xlsx: the xlsx-pinned tests (test_dimensions
/ ventilation / water_heating) read 2026-05-19-17-18 RdSap10Worksheet.xlsx,
which is gitignored (.gitignore `*.xlsx`) and so absent in CI. _xlsx_loader.
load_cells now pytest.skip()s when the file is absent, so those tests run
locally and skip cleanly in CI instead of erroring — no new CI failures from
the move, and the gitignore policy is respected.

Verified: tests/domain/sap10_calculator + backend/documents_parser +
tests/domain/property_baseline = 2248 pass, 1 skipped; pyright resolves the
new import paths with zero import-resolution errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 16:58:00 +00:00
Khalim Conn-Kowlessar
caee4de2f4 feat(ingestion): relocate EpcClientService to infrastructure + SolarRepo (#1133)
Move the EpcClientService package (client + _retry + exceptions + tests) from
the dying backend/ tree to infrastructure/epc_client/ as the New-EPC-API Fetcher;
update the two callers (address2UPRN, a script). All 14 client tests pass.

Add SolarRepository port + SolarPostgresRepository persisting Google Solar
building insights as JSONB (solar_building_insights table), one row per Property.
The EPC repo half of this slice already landed in #1129. pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:45:26 +00:00
Khalim Conn-Kowlessar
22ae6f4d77 Slice S0380.39: bulk-fetch 38 cohort-2 EPC API JSONs for cross-mapper parity
Adds scripts/fetch_cohort2_api_jsons.py (throwaway one-off) plus 38
golden fixtures under domain/sap10_calculator/rdsap/tests/fixtures/golden/
covering every cert in "sap worksheets/additional with api 2/".

Each JSON is the inner `data` payload from the gov.uk EPB
/api/certificate endpoint — the same shape EpcPropertyDataMapper
.from_api_response consumes today.

Required prerequisite for Slice B (parametrized API-path chain test
that mirrors the cohort-2 Summary-path sweep at 1e-4 vs worksheet).
Per the cross-mapper-parity primitive: API EPC and Elmhurst EPC must
produce SAP within 1e-4 of each other and of the worksheet — the SAP
cascade is the load-bearing equivalence check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:40:58 +00:00
Daniel Roth
9f7c16ccbd add address list 2026-05-21 15:30:03 +00:00
Daniel Roth
4e21dda328 rename files in sharepoint to desired structure 2026-05-20 16:26:07 +00:00
Jun-te Kim
c22528299c added type hinting to uprn 2026-05-12 09:40:12 +00:00
Jun-te Kim
c9c43f178c demo generated for use in address2uprn 2026-05-08 14:48:15 +00:00
Jun-te Kim
c498dc1951 init db 2026-03-31 11:45:59 +00:00
Daniel Roth
609468cff9 new methods for downloading all core files for pashub URL. Download currently not being authorised 2026-03-24 08:47:59 +00:00
Daniel Roth
6617d9e614 improved typing 2026-03-23 16:16:20 +00:00
Jun-te Kim
f102aa6a7c move location 2026-03-11 15:27:31 +00:00