Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-07-27 23:35:01 +00:00

Khalim Conn-Kowlessar 01d234dd0b Slice 63: RED tracer-bullet mapper-vs-hand-built diff test for cohort 000474		2026-05-25 16:43:04 +00:00

User-driven pivot to the cohort-first validation strategy: the 6 existing hand-built `_elmhurst_worksheet_NNNNNN.build_epc()` fixtures already cascade to their worksheet PDFs at 1e-4: they ARE the 100%-correct calculator-input ground truth. Adding diff tests that assert `from_elmhurst_site_notes(pdf) == hand_built()` surfaces every silent divergence the existing chain tests miss (because chain tests only check cascade output, not field-level EpcPropertyData equality).

Adds `test_from_elmhurst_site_notes_matches_hand_built_000474` as the tracer-bullet first cohort case. The test:

1. Maps Summary_000474.pdf through the Elmhurst extractor + mapper.
2. Builds the hand-built EpcPropertyData via `_elmhurst_worksheet_000474.build_epc()`.
3. Recursively diffs the two across a `_LOAD_BEARING_FIELDS` allow-list (40 top-level fields driving the SAP cascade or cross-mapper semantic equivalence; explicitly excludes cert metadata, EnergyElement descriptive lists, registration dates, and other fields that vary by mapper pathway without semantic disagreement; these are noise per user decision).

RED status committed as the load-bearing TDD forcing function: 50 load-bearing divergences across 4 categories:

Cat A - encoding-only / cascade-equivalent (~30 diffs):
* Ventilation flue counts `0 vs None` (cascade defaults None to 0)
* Dual-encoded sub-fields (`floor_construction_type` str-side, `roof_insulation_location` str-side, etc.)
* Mapper-surfaces-descriptive-only fields (`floor_type`, `floor_u_value_known`)

Cat B - real cascade-affecting gaps (~10 diffs):
* `sap_heating.water_heating_fuel`: None vs 26 (mains gas)
* `sap_heating.shower_outlets`: extracted vs None
* `sap_heating.number_baths`: 1 vs None
* `country_code`: None vs 'ENG'
* `built_form`: 'Mid-Terrace' vs None
* `boiler_flue_type`, `central_heating_pump_age` dual-encoding
* `dwelling_type` casing 'Mid-Terrace house' vs 'Mid-terrace house'
* `wall_thickness_measured`: True vs False

Cat C - structural shape divergences (1 diff):
* `sap_windows: LEN 7 vs 5`: mapper extracts 1:1 with §11 table; cohort hand-built collapsed entries by glazing-type group (preserving total area, cascade-equivalent but not field-equal).

Cat D - Slice-54-style hand-built staleness (~5 diffs):
* `extensions_count: 2 vs 0`: Slice 54 fix landed on mapper; hand-built still uses old hardcoded 0
* `party_wall_construction: None vs 0`: cohort convention sentinel
* Hand-built ages prior to current mapper conventions

Two RED forcing functions on the branch now:
- test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly (delta 1.19 SAP vs 69.0094)
- test_from_elmhurst_site_notes_matches_hand_built_000474 (50 load-bearing field divergences)

Strict-pyright net-zero on the chain test file (0 errors); cohort chain tests all still pass (13 green / 2 RED).

Next slices will chip away at the diff list: bulk-update cohort hand-builts for Cat A/D (mechanical), then attack Cat B/C with per-field design decisions. Once 000474 closes, parametrize over the 5 other cohort certs, then API-mapper diff test, then cross-mapper parity falls out.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
handler	address JTK review comments	2026-04-20 15:11:17 +00:00
tests	Slice 63: RED tracer-bullet mapper-vs-hand-built diff test for cohort 000474	2026-05-25 16:43:04 +00:00
__init__.py	Map to RdSapSiteNotes from site notes JSON 🟥	2026-04-16 13:54:03 +00:00
db_writer.py	include updating epc_property_data to pashub to ara workflow	2026-04-29 09:55:14 +00:00
elmhurst_extractor.py	Slice 53: Summary_000487 chain pins SAP at 1e-4 — last cohort cert closed	2026-05-24 21:42:42 +00:00
extractor.py	Handle wall thickness "Unmeasurable" 🟩	2026-04-30 16:41:16 +00:00
local_runner.py	update local runner to work for elmhurst	2026-04-24 14:01:36 +00:00
parser.py	load ecmk site notes to db	2026-04-29 11:20:47 +00:00
pdf.py	update local runner to work for elmhurst	2026-04-24 14:01:36 +00:00
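
The allow-listed recursive diff described in the Slice 63 commit message could look roughly like the sketch below. It is illustrative only and is not the repository's test code: `Epc`, `SapHeating`, `LOAD_BEARING_FIELDS`, `diff` and `load_bearing_diff` are hypothetical stand-ins, and the real `EpcPropertyData` model and `_LOAD_BEARING_FIELDS` list (40 fields) are assumed to be much larger. The sample values mirror a few divergences quoted in the commit message.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, Optional


# Hypothetical, heavily reduced stand-ins for the real EpcPropertyData model.
@dataclass
class SapHeating:
    water_heating_fuel: Optional[int] = None
    number_baths: Optional[int] = None


@dataclass
class Epc:
    country_code: Optional[str] = None
    extensions_count: int = 0
    registration_date: Optional[str] = None  # cert metadata, not load-bearing
    sap_heating: SapHeating = field(default_factory=SapHeating)
    sap_windows: List[Any] = field(default_factory=list)


# Assumed allow-list: only these top-level fields are compared.
LOAD_BEARING_FIELDS = ("country_code", "extensions_count", "sap_heating", "sap_windows")


def diff(a: Any, b: Any, path: str) -> List[str]:
    """Recursively compare two values and return one message per divergence."""
    if is_dataclass(a) and is_dataclass(b) and type(a) is type(b):
        out: List[str] = []
        for f in fields(a):
            out += diff(getattr(a, f.name), getattr(b, f.name), f"{path}.{f.name}")
        return out
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            # A shape mismatch is reported once instead of per element.
            return [f"{path}: LEN {len(a)} vs {len(b)}"]
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out += diff(x, y, f"{path}[{i}]")
        return out
    return [] if a == b else [f"{path}: {a!r} vs {b!r}"]


def load_bearing_diff(mapped: Epc, hand_built: Epc) -> List[str]:
    """Diff only the allow-listed top-level fields (mapper value first)."""
    out: List[str] = []
    for name in LOAD_BEARING_FIELDS:
        out += diff(getattr(mapped, name), getattr(hand_built, name), name)
    return out


if __name__ == "__main__":
    mapped = Epc(None, 2, "2026-01-01", SapHeating(None, 1), [0] * 7)
    hand_built = Epc("ENG", 0, None, SapHeating(26, None), [0] * 5)
    for line in load_bearing_diff(mapped, hand_built):
        print(line)
```

A test built on this shape would assert that the returned list is empty; while divergences remain, the list doubles as the failure message, which is what makes a RED test of this kind usable as a work queue. Fields outside the allow-list (here `registration_date`) never appear in the output, whatever their values.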