Model/backend/documents_parser
Khalim Conn-Kowlessar 729ee29c84 Slice S0380.128: extractor §14.0 closure falls back to "14.1 Community Heating"
Elmhurst Summary §14.0 Main Heating1 normally closes at "14.1 Main
Heating2", but community-heated dwellings and "no system" certs lodge
§14.0 followed directly by "14.1 Community Heating/Heat Network" (no
second main system exists on a community-heated dwelling). Pre-slice,
the extractor's `_between("14.0 Main Heating1", "14.1 Main Heating2")`
returned an empty string for these shapes, so every §14.0 field
(including `Main Heating SAP Code`) came back None and the mapper
strict-raised `UnmappedElmhurstLabel` with "§14.0 Main Heating1 has
neither PCDF boiler reference (None) nor SAP code (None)".

The fix adds a `_section_lines_first_end(start, ends)` helper that
accepts a tuple of end-marker candidates and uses whichever appears
first after `start`. `_extract_main_heating` now closes §14.0 at
either "14.1 Main Heating2" or "14.1 Community Heating", whichever the
Summary lodges.

Impact on heating-systems corpus 001431 at
`sap worksheets/heating systems examples/`:

  Variant                  Pre-S0380.128 -> Post-S0380.128
  ------------------------ ------------------ -----------------
  community heating 1      mapper-raise   ->  SAP code 301 OK
  community heating 2      mapper-raise   ->  SAP code 302 OK
  community heating 3      mapper-raise   ->  SAP code 304 OK
  community heating 4      mapper-raise   ->  SAP code 302 OK
  community heating 6      mapper-raise   ->  SAP code 302 OK
  no system                mapper-raise   ->  SAP code 699 OK

Corpus tally: **35/41 -> 41/41 cascade-OK**. With all populated
variants now executing, the cascade-vs-worksheet residual cluster is
fully visible for the first time. Notably, community heating 6 surfaces
the FIRST negative ΔSAP in the corpus (-6.87: the cascade undershoots
the worksheet rather than overshooting it), a distinct diagnostic shape
worth investigating next.
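
The sign reading above implies ΔSAP is taken as cascade minus worksheet, so a negative value means the cascade undershoots. A minimal sketch of that convention follows, assuming it holds. The helper names and the tolerance are hypothetical, not the repository's code.

```python
def delta_sap(cascade_sap: float, worksheet_sap: float) -> float:
    """ΔSAP under the assumed sign convention: cascade minus worksheet."""
    return cascade_sap - worksheet_sap


def residual_shape(delta: float, tol: float = 0.005) -> str:
    """Label a residual; `tol` is a hypothetical rounding tolerance."""
    if abs(delta) <= tol:
        return "match"
    return "overshoot" if delta > 0 else "undershoot"
```

Under this convention, `residual_shape(-6.87)` returns "undershoot", the shape reported for community heating 6.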

The fix is structural (extractor section bracketing), so there is no
spec rule to cite for the change itself. RdSAP 10 §17 page 85 row 1.0
("Main Heating") and §17 row 10-1a ("Community Heat Source") confirm
that community-heated certs have only one main heating system (no
Main 2 block).

Extended handover suite at HEAD post-slice: **832 pass, 0 fail**
(831 before this slice, plus 1 new AAA test).

Pyright net-zero on touched files (13 → 13 errors; the pre-existing
errors are unrelated to this slice).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 08:26:24 +00:00
handler address JTK review comments 2026-04-20 15:11:17 +00:00
tests Slice S0380.128: extractor §14.0 closure falls back to "14.1 Community Heating" 2026-05-31 08:26:24 +00:00
__init__.py Map to RdSapSiteNotes from site notes JSON 🟥 2026-04-16 13:54:03 +00:00
db_writer.py include updating epc_property_data to pashub to ara workflow 2026-04-29 09:55:14 +00:00
elmhurst_extractor.py Slice S0380.128: extractor §14.0 closure falls back to "14.1 Community Heating" 2026-05-31 08:26:24 +00:00
extractor.py Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
local_runner.py update local runner to work for elmhurst 2026-04-24 14:01:36 +00:00
parser.py load ecmk site notes to db 2026-04-29 11:20:47 +00:00
pdf.py update local runner to work for elmhurst 2026-04-24 14:01:36 +00:00