Full SAP assessments (~15% of corpus, 4 403 of 30 000 scanned bulk-zip certs) lodge a measured/calculated wall U-value per BS EN ISO 6946 in walls[i].description, e.g. "Average thermal transmittance 0.18 W/m²K". These certs typically have wall_construction, wall_insulation_type and construction_age_band all None, which the cascade defaults previously resolved to U = 1.5 (uninsulated cavity at band E). RdSAP 10 §5.3: "U values are obtained from … the construction type, date of construction and, where applicable, thickness of additional insulation" — but a measured value supersedes the cascade. Corpus U-value distribution among parsed: median 0.21, mean 0.225, range 0.06-1.84 80% at U ≈ 0.2 (Part L-compliant new-builds) 10% at U ≈ 0.1 (passivhaus / very low) 7% at U ≈ 0.3 (older retrofitted full-SAP) 3% in the tail (conversions, edge cases) Per affected cert (100 m² new-build at U 1.5 → 0.21): walls_w_per_k drops 129 → 21 W/K PEUI drops ≈ 120 kWh/m² Implementation: - _measured_u_from_description() regex-parses the phrase from the wall description; returns None on no-match or non-numeric so the cascade fall-through is preserved. - u_wall checks the measured value FIRST, before any cascade logic. - No range cap — calculator mirrors what the assessor lodged, per the "deterministic except for input errors" principle. Parse failure falls through cleanly. Parity probe at 300 certs, seed=7: headlines unchanged. Direct check on the sample: 0/300 certs carry an "Average thermal transmittance" description. The v18a parquet filters full-SAP certs out somewhere upstream, so this slice is invisible in the parquet-based probe. The slice's correctness is proved by: - 4 unit tests in test_rdsap_uvalues.py (tracer + regression on ordinary descriptions + parse-failure fallback + filled-cavity description still routes correctly) - 1 end-to-end test in test_heat_transmission.py exercising a synthetic full-SAP cert through heat_transmission_from_cert - All 274 domain tests passing, no regressions Follow-up tooling: a bulk-zip-based parity probe that doesn't filter to the parquet's subset is needed to measure this slice's corpus impact. Separate dig. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|---|---|---|
| .devcontainer | ||
| .github/workflows | ||
| .idea | ||
| .vscode | ||
| asset_list | ||
| backend | ||
| backlog | ||
| datatypes | ||
| docs | ||
| epr_data_exports | ||
| etl | ||
| infrastructure/terraform | ||
| model_data/requirements | ||
| packages | ||
| recommendations | ||
| scripts | ||
| services | ||
| sfr/principal_pitch | ||
| survey_report | ||
| utils | ||
| .coveragerc | ||
| .dockerignore | ||
| .gitignore | ||
| __init__.py | ||
| AGENTS.md | ||
| ara_backend_design.md | ||
| BaseUtility.py | ||
| CLAUDE.md | ||
| conftest.py | ||
| CONTEXT.md | ||
| devcontainer.sh | ||
| Dockerfile.test | ||
| Dockerfile.test.dockerignore | ||
| Makefile | ||
| MEMORY.md | ||
| package-lock.json | ||
| package.json | ||
| pyproject.toml | ||
| pyrightconfig.json | ||
| pytest.ini | ||
| README.md | ||
| run_backlog.sh | ||
| run_lambda_local.sh | ||
| serverless.yml | ||
| test.requirements.txt | ||
| tox.ini | ||
| UBIQUITOUS_LANGUAGE.md | ||
Model Repository
This repository contains the code pertaining to the development of the data science and machine learning products being utilised by Hestia.
The different folders in this repository relate to services that can be used independently, or can be imported and used as part of a larger application
Getting Started
Prerequisites
Dev Container Setup
This repo uses a Docker Compose-based dev container. The model-backend service joins a shared-dev Docker network so it can communicate with other local services (e.g. a frontend container) running on your machine.
VS Code users: The initializeCommand in devcontainer.json creates the shared-dev network automatically before the container starts. No manual step required — just open the repo and select Reopen in Container.
Non-VS Code / CI workflows: Run the following once before starting the container:
make dev-setup
This is idempotent and safe to re-run if the network already exists.
Folders
backend/
This folder contains the code for the fastapi backend service, which provides an interface to much of the functionality in this repository, for the frontend
model_data/
This folder contains related to the reading and preparation of assessment model data, including pulling out epc attributes
Testing
All tests can be run, against the configuration in pytest.ini running
pytest
This will run the complete panel of tests and report on coverage in the locations specified by the pytest.ini file.
To run tests in a specific service, e.g. inside of model_data, simply run
pytest --cov-config=model_data/.coveragerc --cov=model_data
This will produce the test results and coverage reports