User-driven pivot to the cohort-first validation strategy: the 6
existing hand-built `_elmhurst_worksheet_NNNNNN.build_epc()` fixtures
already cascade to their worksheet PDFs at 1e-4 — they ARE the
100%-correct calculator-input ground truth. Adding diff tests that
assert `from_elmhurst_site_notes(pdf) == hand_built()` surfaces every
silent divergence the existing chain tests miss (because chain tests
only check cascade output, not field-level EpcPropertyData equality).
Adds `test_from_elmhurst_site_notes_matches_hand_built_000474` as the
tracer-bullet first cohort case. The test:
1. Maps Summary_000474.pdf through the Elmhurst extractor + mapper.
2. Builds the hand-built EpcPropertyData via
`_elmhurst_worksheet_000474.build_epc()`.
3. Recursively diffs the two across a `_LOAD_BEARING_FIELDS`
allow-list (40 top-level fields driving the SAP cascade or
cross-mapper semantic equivalence; explicitly excludes cert
metadata, EnergyElement descriptive lists, registration dates,
and other fields that vary by mapper pathway without semantic
disagreement — these are noise per user decision).
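The diff in step 3 can be sketched roughly as follows. This is a minimal illustration, not the test's actual code: `Epc`, `SapHeating`, `LOAD_BEARING_FIELDS`, `diff_values` and `diff_load_bearing` are hypothetical stand-ins, and the real `EpcPropertyData` and `_LOAD_BEARING_FIELDS` (40 fields) are assumed to be richer.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Optional


# Hypothetical stand-ins for EpcPropertyData and one nested type.
@dataclass
class SapHeating:
    water_heating_fuel: Optional[int] = None
    number_baths: Optional[int] = None


@dataclass
class Epc:
    country_code: Optional[str] = None
    registration_date: Optional[str] = None  # cert metadata, not load-bearing
    sap_heating: SapHeating = field(default_factory=SapHeating)
    sap_windows: list = field(default_factory=list)


# Hypothetical allow-list; only these top-level fields are compared.
LOAD_BEARING_FIELDS = ("country_code", "sap_heating", "sap_windows")


def diff_values(path: str, left: Any, right: Any) -> list[str]:
    """Recursively compare two values, returning one line per divergence."""
    if is_dataclass(left) and is_dataclass(right) and type(left) is type(right):
        out: list[str] = []
        for f in fields(left):
            out += diff_values(
                f"{path}.{f.name}", getattr(left, f.name), getattr(right, f.name)
            )
        return out
    if isinstance(left, (list, tuple)) and isinstance(right, (list, tuple)):
        if len(left) != len(right):
            # Shape mismatch: report lengths rather than element-wise noise.
            return [f"{path}: LEN {len(left)} vs {len(right)}"]
        out = []
        for i, (l_item, r_item) in enumerate(zip(left, right)):
            out += diff_values(f"{path}[{i}]", l_item, r_item)
        return out
    if left != right:
        return [f"{path}: {left!r} vs {right!r}"]
    return []


def diff_load_bearing(mapped: Any, hand_built: Any,
                      allow_list: tuple = LOAD_BEARING_FIELDS) -> list[str]:
    """Diff only the allow-listed top-level fields of two objects."""
    out: list[str] = []
    for name in allow_list:
        out += diff_values(name, getattr(mapped, name), getattr(hand_built, name))
    return out
```

A test built on this shape would assert that `diff_load_bearing(mapped, hand_built)` is empty and print the returned lines on failure, so every divergence is listed in a single run.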
RED status committed as the load-bearing TDD forcing function:
50 load-bearing divergences across 4 categories:
Cat A — encoding-only / cascade-equivalent (~30 diffs):
* Ventilation flue counts `0 vs None` (cascade defaults None to 0)
* Dual-encoded sub-fields (`floor_construction_type` str-side,
`roof_insulation_location` str-side, etc.)
* Mapper-surfaces-descriptive-only fields (`floor_type`,
`floor_u_value_known`)
Cat B — real cascade-affecting gaps (~10 diffs):
* `sap_heating.water_heating_fuel`: None vs 26 (mains gas)
* `sap_heating.shower_outlets`: extracted vs None
* `sap_heating.number_baths`: 1 vs None
* `country_code`: None vs 'ENG'
* `built_form`: 'Mid-Terrace' vs None
* `boiler_flue_type`, `central_heating_pump_age` dual-encoding
* `dwelling_type` casing 'Mid-Terrace house' vs 'Mid-terrace house'
* `wall_thickness_measured`: True vs False
Cat C — structural shape divergences (1 diff):
* `sap_windows: LEN 7 vs 5` — mapper extracts 1:1 with §11 table;
cohort hand-built collapsed entries by glazing-type group
(preserving total area, cascade-equivalent but not field-equal).
Cat D — Slice-54-style hand-built staleness (~5 diffs):
* `extensions_count: 2 vs 0` — Slice 54 fix landed on mapper;
hand-built still uses old hardcoded 0
* `party_wall_construction: None vs 0` — cohort convention sentinel
* Hand-built ages prior to current mapper conventions
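To illustrate the Cat C case, here is a minimal sketch of how grouping windows by glazing type keeps the total area while changing the list length. The `Window` shape and `collapse_by_glazing_type` are hypothetical, and the areas are made up; the real `sap_windows` entries are assumed to carry more fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:  # hypothetical shape, not the real sap_windows entry
    glazing_type: str
    area_m2: float


def collapse_by_glazing_type(windows: list[Window]) -> list[Window]:
    """Merge windows that share a glazing type, summing their areas."""
    totals: dict[str, float] = defaultdict(float)
    for w in windows:
        totals[w.glazing_type] += w.area_m2
    return [Window(g, a) for g, a in totals.items()]


# Illustrative 1:1 extraction versus a collapsed hand-built list.
extracted = [Window("double", 1.0), Window("double", 2.0), Window("single", 0.5)]
collapsed = collapse_by_glazing_type(extracted)
```

Here `extracted` has three entries and `collapsed` has two, so a field-level equality check fails even though the summed area is unchanged.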
Two RED forcing functions on the branch now:
- test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly
(delta 1.19 SAP vs 69.0094)
- test_from_elmhurst_site_notes_matches_hand_built_000474
(50 load-bearing field divergences)
Strict-pyright net-zero on the chain test file (0 errors); cohort
chain tests all still pass (13 green / 2 RED).
Next slices will chip away at the diff list — bulk-update cohort
hand-builts for Cat A/D (mechanical) then attack Cat B/C with
per-field design decisions. Once 000474 closes, parametrize over
the 5 other cohort certs, then API-mapper diff test, then cross-
mapper parity falls out.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| Name | | |
|---|---|---|
| .devcontainer | ||
| .github/workflows | ||
| .idea | ||
| .vscode | ||
| asset_list | ||
| backend | ||
| backlog | ||
| datatypes | ||
| docs | ||
| epr_data_exports | ||
| etl | ||
| infrastructure/terraform | ||
| model_data/requirements | ||
| packages | ||
| recommendations | ||
| scripts | ||
| services | ||
| sfr/principal_pitch | ||
| survey_report | ||
| utils | ||
| .coveragerc | ||
| .dockerignore | ||
| .gitignore | ||
| __init__.py | ||
| AGENTS.md | ||
| ara_backend_design.md | ||
| BaseUtility.py | ||
| CLAUDE.md | ||
| conftest.py | ||
| CONTEXT.md | ||
| devcontainer.sh | ||
| Dockerfile.test | ||
| Dockerfile.test.dockerignore | ||
| Makefile | ||
| MEMORY.md | ||
| package-lock.json | ||
| package.json | ||
| pyproject.toml | ||
| pyrightconfig.json | ||
| pytest.ini | ||
| README.md | ||
| run_backlog.sh | ||
| run_lambda_local.sh | ||
| serverless.yml | ||
| test.requirements.txt | ||
| tox.ini | ||
| UBIQUITOUS_LANGUAGE.md | ||
Model Repository
This repository contains the code for the data science and machine learning products used by Hestia.
The folders in this repository correspond to services that can be used independently or imported as part of a larger application.
Getting Started
Prerequisites
Dev Container Setup
This repo uses a Docker Compose-based dev container. The model-backend service joins a shared-dev Docker network so it can communicate with other local services (e.g. a frontend container) running on your machine.
VS Code users: The initializeCommand in devcontainer.json creates the shared-dev network automatically before the container starts. No manual step required — just open the repo and select Reopen in Container.
Non-VS Code / CI workflows: Run the following once before starting the container:
make dev-setup
This is idempotent and safe to re-run if the network already exists.
Folders
backend/
This folder contains the code for the FastAPI backend service, which gives the frontend an interface to much of the functionality in this repository.
model_data/
This folder contains code related to reading and preparing assessment model data, including extracting EPC attributes.
Testing
All tests can be run against the configuration in pytest.ini by running
pytest
This runs the full test suite and reports coverage for the locations specified in pytest.ini.
To run the tests for a specific service, e.g. model_data, run
pytest --cov-config=model_data/.coveragerc --cov=model_data
This produces the test results and the coverage report for that service.