mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Sim case 20's §11 lodges 5 windows but only 1 surfaced. The "W H Area"
cells tokenize inconsistently: a narrow Area column keeps all three on one
line ("1.80 2.10 3.78" — matches _WIDTH_HEIGHT_AREA_RE), but a wider Area
column triggers pdftotext's 2+-space split, dropping the Area onto its own
line ("5.79 2.00" then "11.58"). The 3-decimal data anchor never matched
those four rows, so they were lost — gutting §6 solar gains (5 windows →
1) and dropping continuous SAP 43.05 → 38.32 vs the worksheet's 43.6322.
Pre-merge a "W H" line + a following lone-decimal Area into the canonical
"W H Area" line, gated on Area ≈ W × H (the §11 Area is always the product)
so a frame factor / g-value / U-value below a dimension line is never
absorbed. One-line layouts (3 decimals) are untouched.
Pins via test_summary_001431_case20_extracts_all_five_section11_windows
(Summary_001431_case20.pdf mirrors sap worksheets/golden fixture debugging/
simulated case 20/). 573 documents_parser tests pass; pyright strict net-zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| fixtures | ||
| __init__.py | ||
| test_elmhurst_end_to_end.py | ||
| test_elmhurst_extractor.py | ||
| test_end_to_end.py | ||
| test_extractor.py | ||
| test_heating_systems_corpus.py | ||
| test_pdf.py | ||
| test_room_heater_worksheet_001431.py | ||
| test_summary_pdf_mapper_chain.py | ||