Model/backend/documents_parser/tests
Khalim Conn-Kowlessar 256a5afee5 Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF
The full Summary→ElmhurstSiteNotes→EpcPropertyData→cascade→SAP chain now produces unrounded SAP 62.52 for cert U985-0001-000474 vs the worksheet PDF's 62.2584 — inside the 0.5 tolerance the user accepts on the API-cert residual cohort. The hand-built worksheet-fixture chain matches Elmhurst's unrounded SAP to 4 d.p. (62.2584), so the calculator+cascade are provably equivalent to Elmhurst's calculator; this slice closes the mapper side of the chain.

Mapper changes drop the string-versus-int impedance mismatch that prevented the cascade from consuming Elmhurst-coded values:
- construction_age_band: `_strip_code('B 1900-1929')` → 'B' (was '1900-1929')
- wall_construction: `_elmhurst_wall_construction_int('CA Cavity')` → 4 (was string 'Cavity')
- wall_insulation_type: `'A As Built'` → 4 (was string 'As Built')
- party_wall_construction: same int-mapping treatment
- main_fuel_type: `_elmhurst_main_fuel_int('Mains gas')` → 26 (the Table 12 fuel code; was string)
- heat_emitter_type: `'Radiators'` → 1 (was string)
- main_heating_control: `_elmhurst_sap_control_code('SAP code 2106, ...')` → 2106 (the SAP code int; was the trailing description)
- main_heating_index_number: parsed leading int from `pcdf_boiler_reference` ('16839 Vaillant…' → 16839) + `main_heating_data_source=1` so the PCDB cascade fires
- window orientation: `_elmhurst_orientation_int('North-West')` → 8 (the SAP10 octant; was string — solar gains were dropping to 0 W/m² as a result)

Floor handling also re-aligned with the SAP convention: floors sorted with the lowest as floor=0 (Elmhurst lodges 1st-floor entries first in the PDF); zero-area entries filtered out (single-storey extensions); non-ground room heights get the +0.25 m joist-void adjustment; `is_exposed_floor=True` for ground floors lodged above unheated space ('U Above unheated space'). `total_floor_area_m2` now sums across main + extensions.

Three regression pins on the new path:
- sap_building_parts == 3 (multi-bp)
- sap_windows == 7 (layout-style window parser)
- unrounded SAP within 0.5 of 62.2584 (worksheet PDF line 257)

Existing end-to-end test assertions updated to reflect the spec-correct int codes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 18:32:20 +00:00
..
fixtures Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail) 2026-05-24 17:40:06 +00:00
__init__.py Map to RdSapSiteNotes from site notes JSON 🟥 2026-04-16 13:54:03 +00:00
test_elmhurst_end_to_end.py Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF 2026-05-24 18:32:20 +00:00
test_elmhurst_extractor.py extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
test_end_to_end.py P6.1 follow-on: unbox BuildingPartIdentifier at backend boundaries 2026-05-20 09:58:23 +00:00
test_extractor.py Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
test_pdf.py rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
test_summary_pdf_mapper_chain.py Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF 2026-05-24 18:32:20 +00:00