Model/backend/documents_parser/tests
Khalim Conn-Kowlessar 795d36b732 fix(extractor): re-join §11 windows whose Area cell split onto its own line
Sim case 20's §11 lodges 5 windows but only 1 surfaced. The "W H Area"
cells tokenize inconsistently: a narrow Area column keeps all three on one
line ("1.80 2.10 3.78" — matches _WIDTH_HEIGHT_AREA_RE), but a wider Area
column triggers pdftotext's 2+-space split, dropping the Area onto its own
line ("5.79 2.00" then "11.58"). The 3-decimal data anchor never matched
those four rows, so they were lost — gutting §6 solar gains (5 windows →
1) and dropping continuous SAP 43.05 → 38.32 vs the worksheet's 43.6322.

Pre-merge a "W H" line + a following lone-decimal Area into the canonical
"W H Area" line, gated on Area ≈ W × H (the §11 Area is always the product)
so a frame factor / g-value / U-value below a dimension line is never
absorbed. One-line layouts (3 decimals) are untouched.

Pins via test_summary_001431_case20_extracts_all_five_section11_windows
(Summary_001431_case20.pdf mirrors sap worksheets/golden fixture debugging/
simulated case 20/). 573 documents_parser tests pass; pyright strict net-zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:35:21 +00:00
..
fixtures fix(extractor): re-join §11 windows whose Area cell split onto its own line 2026-06-06 10:35:21 +00:00
__init__.py Map to RdSapSiteNotes from site notes JSON 🟥 2026-04-16 13:54:03 +00:00
test_elmhurst_end_to_end.py Slice S0380.17: map Elmhurst §11 glazing-type labels to SAP10 codes 2026-05-27 23:05:52 +00:00
test_elmhurst_extractor.py fix(extractor): drop windows-table header remnant from first window glazing type 2026-06-02 22:54:49 +00:00
test_end_to_end.py P6.1 follow-on: unbox BuildingPartIdentifier at backend boundaries 2026-05-20 09:58:23 +00:00
test_extractor.py Handle wall thickness "Unmeasurable" 🟩 2026-04-30 16:41:16 +00:00
test_heating_systems_corpus.py S0380.185: record CH6 pin-forever proof — distribution-loss is a Summary-export gap 2026-06-02 19:21:28 +00:00
test_pdf.py rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
test_room_heater_worksheet_001431.py S0380.239: system-build walls take masonry structural infiltration (0.35) 2026-06-05 19:00:02 +00:00
test_summary_pdf_mapper_chain.py fix(extractor): re-join §11 windows whose Area cell split onto its own line 2026-06-06 10:35:21 +00:00