Two diffs closed against the hand-built `_elmhurst_worksheet_000474`
target (SAP 62.2584):
1. `pumps_fans_kwh_per_yr` (130 → 160). The cascade keys §4f pumps+fans
electricity on `MainHeatingDetail.main_heating_category` (gas-fired
boilers = cat 2 → 160 kWh/yr). `from_elmhurst_site_notes` wasn't
populating the field, so it fell through to the default 130. Added
`_elmhurst_main_heating_category` deriving cat 2 for the gas/LPG-
PCDB-boiler branch; other categories deferred until a fixture
exercises them (consistent with the cascade lookup).
2. Window [4] orientation `East-South` → `East` and window [5]
orientation `''` → `South-East`. The layout-style parser's
`before_start = prev_manuf + 7` / `after_end = next_data` rule was
over-grabbing prefix tokens of W_{k+1} as suffix tokens of W_k
('South' from W_5's prefix bled into W_4's suffix). Replaced with
a symmetric partition on the first glazing-type-start token
(`Single`/`Double`/`Triple`/`Secondary`) within the cross-window
gap, used as the upper bound of W_k's suffix and the lower bound
of W_{k+1}'s prefix. Same boundary on both sides — prefix tokens
of the next window can no longer be attributed as suffix of the
current one.
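
The symmetric partition in item 2 can be sketched as below. This is a minimal illustration, not the parser's actual code: `split_gap` and the sample tokens are hypothetical, and sending the whole gap to the current window's suffix when no glazing token is present is an assumption.

```python
from typing import List, Tuple

# Tokens that start a glazing-type string, as listed in the commit message.
GLAZING_STARTS = {"Single", "Double", "Triple", "Secondary"}


def split_gap(gap_tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split the tokens between W_k's data block and W_{k+1}'s data block.

    Returns (suffix of W_k, prefix of W_{k+1}). The first glazing-type-start
    token is the single boundary used for both sides.
    """
    for i, tok in enumerate(gap_tokens):
        if tok in GLAZING_STARTS:
            return gap_tokens[:i], gap_tokens[i:]
    # Assumption: with no glazing token in the gap, nothing belongs to W_{k+1}.
    return gap_tokens, []


# Hypothetical gap between W_4 and W_5: 'South' stays with W_5's prefix.
suffix_w4, prefix_w5 = split_gap(["East", "Double", "Glazed", "South"])
```

Because one index bounds both slices, a token can land in exactly one of the two lists.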
After both fixes, Summary_000474 → ElmhurstSiteNotes → EpcPropertyData
→ cascade → SAP matches the worksheet PDF's unrounded line 257 value
to 1e-4 tolerance. All 754 datatypes/epc/ + backend/documents_parser/
tests green; pyright net-zero on touched files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The legacy `_extract_windows` regex anchors on the literal "Permanent Shutters\n", which the pdftotext-layout preprocessor splits across lines, so the regex never matches layout-style text. The new fallback `_extract_windows_from_layout` anchors instead on the two stable per-window markers (a "W H Area" data line and the "Manufacturer <U_value>" line a few lines further down) and tolerates the variable-order optional fields (glazing_gap, inline building_part, inline orientation) between them. Prefix and suffix tokens around the data block are re-joined into the glazing_type / building_part / orientation strings.
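
The two-anchor pairing can be sketched as follows. The regexes, the function name `find_window_anchors`, and the sample lines are all hypothetical; the real layout text and the patterns in `_extract_windows_from_layout` may differ.

```python
import re
from typing import List, Tuple

# Assumed shape of a "W H Area" data line: exactly three numbers on the line.
DATA_RE = re.compile(r"^\s*\d+(?:\.\d+)?\s+\d+(?:\.\d+)?\s+\d+(?:\.\d+)?\s*$")
# Assumed shape of the "Manufacturer <U_value>" line.
MANUF_RE = re.compile(r"\bManufacturer\s+\d+(?:\.\d+)?")


def find_window_anchors(lines: List[str]) -> List[Tuple[int, int]]:
    """Pair each data line with the next Manufacturer line after it.

    Lines between the two anchors (optional fields in any order) are skipped.
    """
    anchors: List[Tuple[int, int]] = []
    pending = None  # index of a data line still waiting for its Manufacturer line
    for i, line in enumerate(lines):
        if DATA_RE.match(line):
            pending = i
        elif pending is not None and MANUF_RE.search(line):
            anchors.append((pending, i))
            pending = None
    return anchors


# Made-up layout lines for two windows.
sample = [
    "Double glazed",
    "1.20 1.00 1.20",
    "16 mm",
    "Main",
    "Manufacturer 2.8",
    "East",
    "Double glazed South",
    "0.90 1.10 0.99",
    "Manufacturer 2.8",
]
anchors = find_window_anchors(sample)
```

Everything outside an anchor pair is what the prefix/suffix re-joining step would then consume.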
Cert U985-0001-000474's 7 windows across Main + 2 extensions now flow through the mapper to EpcPropertyData.sap_windows (was 0). Textract-style extraction (existing fixture) is unchanged — the legacy path runs first and only falls through when its regex misses.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ElmhurstSiteNotes had no representation for extensions: the singular dimensions / walls / roof / floor fields could only describe the main building part (bp). Summary PDFs lodge "1st Extension" / "2nd Extension" subsections in §4, §7, §8, §9 with optional "As Main: Yes" inheritance. This slice:
- Adds `ExtensionPart` dataclass and `ElmhurstSiteNotes.extensions: List[ExtensionPart]`.
- Adds `_split_section_by_bp` helper + per-bp parsing of dimensions / walls / roof / floor in the extractor; "As Main" inherits from the main bp.
- Refactors `_map_elmhurst_building_part` into a parameterised builder; adds `_map_elmhurst_building_parts` that yields Main + one SapBuildingPart per extension (capped at 4 per RdSAP10 §1.2).
- Scaffold test `test_summary_000474_mapper_produces_three_building_parts` flips from strict-xfail to passing.
Single-bp behaviour is unchanged (the extensions list defaults to empty). 752 existing tests stay green.
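
A rough sketch of the "As Main" inheritance and the per-extension yield. The field names (`name`, `as_main`, `wall_type`), the stand-in `SiteNotes` class, and `building_parts` are hypothetical; the real `ExtensionPart` carries dimensions, walls, roof and floor. The sketch also assumes the cap of 4 applies to the number of extensions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_EXTENSIONS = 4  # assumption: the cap counts extensions, not Main


@dataclass
class ExtensionPart:
    name: str
    as_main: bool = False            # "As Main: Yes" in the Summary PDF
    wall_type: Optional[str] = None  # hypothetical stand-in for per-bp fields


@dataclass
class SiteNotes:
    wall_type: Optional[str] = None
    # Empty by default, so single-bp inputs behave as before.
    extensions: List[ExtensionPart] = field(default_factory=list)


def building_parts(notes: SiteNotes) -> List[Tuple[str, Optional[str]]]:
    """Yield Main plus one entry per extension, inheriting from Main if flagged."""
    parts = [("Main", notes.wall_type)]
    for ext in notes.extensions[:MAX_EXTENSIONS]:
        wall = notes.wall_type if ext.as_main else ext.wall_type
        parts.append((ext.name, wall))
    return parts


notes = SiteNotes(
    wall_type="Cavity",
    extensions=[
        ExtensionPart("1st Extension", as_main=True),
        ExtensionPart("2nd Extension", wall_type="Solid brick"),
    ],
)
parts = building_parts(notes)
```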
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>