Model/backend/documents_parser/tests
Khalim Conn-Kowlessar a9023e9650 Slice S0380.128: extractor §14.0 closure falls back to "14.1 Community Heating"
Elmhurst Summary §14.0 Main Heating1 normally closes at "14.1 Main
Heating2", but community-heated dwellings and "no system" certs lodge
§14.0 followed directly by "14.1 Community Heating/Heat Network" (no
second main system exists on a community-heated dwelling). Pre-slice
the extractor's `_between("14.0 Main Heating1", "14.1 Main Heating2")`
returned an empty string for these shapes — every §14.0 field
(including `Main Heating SAP Code`) came back None, then the mapper
strict-raised `UnmappedElmhurstLabel` with "§14.0 Main Heating1 has
neither PCDF boiler reference (None) nor SAP code (None)".

The fix adds a `_section_lines_first_end(start, ends)` helper that
accepts a tuple of end-marker candidates and uses whichever appears
first after `start`. `_extract_main_heating` now closes §14.0 at
either "14.1 Main Heating2" or "14.1 Community Heating" — whichever
Summary lodges.

Impact on heating-systems corpus 001431 at `sap worksheets/heating
systems examples/`:

  Variant                  Pre-S0380.128 -> Post-S0380.128
  ------------------------ ------------------ -----------------
  community heating 1      mapper-raise   ->  SAP code 301 OK
  community heating 2      mapper-raise   ->  SAP code 302 OK
  community heating 3      mapper-raise   ->  SAP code 304 OK
  community heating 4      mapper-raise   ->  SAP code 302 OK
  community heating 6      mapper-raise   ->  SAP code 302 OK
  no system                mapper-raise   ->  SAP code 699 OK

Corpus tally: **35/41 -> 41/41 cascade-OK**. With all populated
variants now executing, the cascade-vs-worksheet residual cluster is
fully visible for the first time. Notably community heating 6 surfaces
the FIRST negative ΔSAP in the corpus (-6.87 — cascade undershooting
the worksheet rather than overshooting), a distinct diagnostic shape
worth investigating next.

The fix is structural (extractor section bracketing) — no spec rule
to cite. RdSAP 10 §17 page 85 row 1.0 ("Main Heating") + §17 row
10-1a ("Community Heat Source") confirm that community-heated certs
have only one main heating system (no Main 2 block).

Extended handover suite at HEAD post-slice: **832 pass, 0 fail**
(was 831 + 1 new AAA test).

Pyright net-zero on touched files (13 → 13 — pre-existing errors
unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:48 +00:00
..
fixtures Slice S0380.52: cert 000565 Elmhurst-only mapper-driven cascade pin + glazing-label coverage 2026-06-01 16:28:47 +00:00
__init__.py Map to RdSapSiteNotes from site notes JSON 🟥 2026-04-16 13:54:03 +00:00
test_elmhurst_end_to_end.py Slice S0380.17: map Elmhurst §11 glazing-type labels to SAP10 codes 2026-06-01 16:28:46 +00:00
test_elmhurst_extractor.py extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
test_end_to_end.py P6.1 follow-on: unbox BuildingPartIdentifier at backend boundaries 2026-05-20 09:58:23 +00:00
test_extractor.py Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
test_pdf.py rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
test_summary_pdf_mapper_chain.py Slice S0380.128: extractor §14.0 closure falls back to "14.1 Community Heating" 2026-06-01 16:28:48 +00:00