Model/scripts/fetch_cohort2_api_jsons.py
Khalim Conn-Kowlessar a2bcc2c8af Move sap10_calculator tests to tests/domain/sap10_calculator/ for CI
The calculator tests lived under domain/sap10_calculator/{tests,worksheet/
tests,rdsap/tests,climate/tests,validation/tests}, none of which are in
pytest.ini testpaths — so CI (which collects tests/) never ran them. Relocate
all five dirs to tests/domain/sap10_calculator/{,worksheet,rdsap,climate,
validation}, mirroring the tests/domain/property_baseline/ convention, so the
cascade-pin / golden / e2e conformance suites run in CI.

Mechanics:
- git mv preserves history (110 files).
- Flattening the trailing /tests keeps each file's depth-to-repo-root
  identical, so all 16 repo-root parents[4] fixture refs stay valid. Only
  test_pcdb_etl.py's parents[1] (→ pcdb data) and one hardcoded absolute
  golden-fixture path in test_cert_to_inputs.py needed rebasing.
- Cross-imports rewritten domain.sap10_calculator.worksheet.tests →
  tests.domain.sap10_calculator.worksheet (21 files incl. the external
  importer backend/documents_parser/tests/test_summary_pdf_mapper_chain.py).
- Golden-fixture path strings in test_summary_pdf_mapper_chain.py +
  scripts/fetch_cohort2_api_jsons.py updated to the new location (the JSONs
  moved with the rdsap tests).
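
As an illustration of the depth argument above, here is a minimal sketch using pathlib. The /repo prefix and the test_example.py file name are hypothetical; only the directory shapes are taken from this commit.

```python
from pathlib import PurePosixPath

# Hypothetical repo prefix and file name; only the directory layout is from the commit.
OLD = PurePosixPath("/repo/domain/sap10_calculator/worksheet/tests/test_example.py")
NEW = PurePosixPath("/repo/tests/domain/sap10_calculator/worksheet/test_example.py")


def repo_root(test_file: PurePosixPath) -> PurePosixPath:
    """Return the ancestor four levels above the file's own directory."""
    return test_file.parents[4]


# Both layouts sit five directories below the root, so parents[4] is unchanged.
print(repo_root(OLD) == repo_root(NEW))
# A shallower index is relative to the package directory, so it moves with the file.
print(OLD.parents[1], NEW.parents[1])
```

This is why references that count up to the repo root survive the move, while a reference such as parents[1] that targets a sibling of the old tests directory has to be rebased.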

load_cells / gitignored worksheet xlsx: the xlsx-pinned tests (test_dimensions
/ ventilation / water_heating) read 2026-05-19-17-18 RdSap10Worksheet.xlsx,
which is gitignored (.gitignore `*.xlsx`) and so absent in CI. _xlsx_loader.
load_cells now pytest.skip()s when the file is absent, so those tests run
locally and skip cleanly in CI instead of erroring — no new CI failures from
the move, and the gitignore policy is respected.
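
A minimal sketch of the skip-when-absent pattern described above. The helper name require_local_file is hypothetical and the real _xlsx_loader.load_cells signature is not shown here; the sketch only assumes that the loader checks for the file before reading it.

```python
from pathlib import Path

import pytest


def require_local_file(path: Path) -> Path:
    """Return path if it exists, otherwise skip the calling test.

    Hypothetical helper, not the project's load_cells implementation.
    """
    if not path.exists():
        # pytest.skip raises a skip outcome, so the test is reported as
        # skipped rather than errored.
        pytest.skip(f"{path.name} not present (gitignored); skipping")
    return path
```

Putting the check inside the loader, rather than in each test, means every xlsx-pinned test inherits the skip without needing its own marker.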

Verified: tests/domain/sap10_calculator + backend/documents_parser +
tests/domain/property_baseline = 2248 pass, 1 skipped; pyright resolves the
new import paths with zero import-resolution errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 15:40:26 +00:00


"""Throwaway one-off: bulk-fetch cohort-2 EPC API JSONs from gov.uk EPB.
Persists the inner `data` payload (as returned by EpcClientService._fetch_certificate)
to tests/domain/sap10_calculator/rdsap/fixtures/golden/<cert>.json. Skips certs
whose JSON already exists.
"""
from __future__ import annotations

import json
import os
import sys
from pathlib import Path
from typing import Any

import httpx
from dotenv import load_dotenv

REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO_ROOT))

from infrastructure.epc_client._retry import call_with_retry
from infrastructure.epc_client.epc_client_service import EpcClientService
from infrastructure.epc_client.exceptions import (
    EpcApiError,
    EpcNotFoundError,
    EpcRateLimitError,
)


def _fetch_raw(token: str, cert_num: str) -> dict[str, Any]:
    resp = httpx.get(
        f"{EpcClientService.BASE_URL}/api/certificate",
        params={"certificate_number": cert_num},
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        timeout=EpcClientService.REQUEST_TIMEOUT,
    )
    if resp.status_code == 404:
        raise EpcNotFoundError(cert_num)
    if resp.status_code == 429:
        raise EpcRateLimitError("Rate limited by EPC API")
    if not resp.is_success:
        raise EpcApiError(f"EPC API error {resp.status_code}: {resp.text}")
    payload: dict[str, Any] = resp.json()["data"]
    return payload


def main() -> int:
    load_dotenv(REPO_ROOT / "backend" / ".env")
    token = os.environ["OPEN_EPC_API_TOKEN"]
    src = REPO_ROOT / "sap worksheets" / "additional with api 2"
    # Golden fixtures live under tests/ (see the module docstring).
    dst = REPO_ROOT / "tests" / "domain" / "sap10_calculator" / "rdsap" / "fixtures" / "golden"
    fetched = 0
    skipped = 0
    missing: list[str] = []
    # Each subdirectory of src is named after a certificate number.
    for cd in sorted(src.iterdir()):
        if not cd.is_dir():
            continue
        out_path = dst / f"{cd.name}.json"
        if out_path.exists():
            print(f"skip {cd.name}")
            skipped += 1
            continue
        cert_num = cd.name
        try:
            raw = call_with_retry(lambda: _fetch_raw(token, cert_num))
        except EpcNotFoundError:
            # Record certs the API does not know about and keep going.
            print(f"404 {cd.name}")
            missing.append(cd.name)
            continue
        out_path.write_text(json.dumps(raw, indent=2))
        print(f"fetch {cd.name}")
        fetched += 1
    print(f"\nfetched={fetched} skipped={skipped} missing={len(missing)}")
    if missing:
        print("missing:")
        for c in missing:
            print(f" {c}")
    return 0


if __name__ == "__main__":
    sys.exit(main())