Model/backend/documents_parser/tests/fixtures
Khalim Conn-Kowlessar 00a27efd87 Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added
The §11 Windows table in the Summary PDF is not laid out identically
across the cohort. The layout-style parser now handles three more
quirks, so the remaining 5 certs can be debugged with their windows
actually extracted:

1. `Wood 0.70` combined frame_type+frame_factor line — previously the
   parser expected them on separate lines (data+1 / data+2) and
   rejected the window when the joined form appeared.
2. Trailing glazing-type on the data line — `1.22 1.76 2.15 Double
   pre 2002` is the joined-cell variant in 000516; the W/H/Area
   anchor now captures the trailing phrase as an optional 4th group
   and feeds it through as `inline_glazing_type`, bypassing the
   separate-line glazing-prefix scan.
3. Cross-window gap with no glazing marker — `_partition_after_manuf`
   now falls back to "second orientation token in gap" when no
   glazing-type-prefix word appears. Covers the 000516 layout where
   each window has prefix+suffix orient tokens (no inline orient)
   and the glazing-type is joined-to-data.
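
The three quirks above can be sketched roughly as follows. This is a
minimal illustration, not the extractor's actual code: the regexes, the
orientation and glazing-prefix vocabularies, and the helper names
(`parse_frame`, `parse_data_line`, `partition_gap`) are all assumptions
made for the example. In particular, which side of the split a
glazing-prefix line belongs to is a guess.

```python
import re

# All names and patterns below are illustrative assumptions.

# Quirk 1: frame type and frame factor joined on one line, e.g. "Wood 0.70".
FRAME_COMBINED = re.compile(
    r"^(?P<frame_type>[A-Za-z][A-Za-z /-]*?)\s+(?P<frame_factor>\d\.\d+)$"
)

# Quirk 2: width/height/area anchor with an optional trailing glazing phrase.
DATA_LINE = re.compile(
    r"^(?P<w>\d+\.\d+)\s+(?P<h>\d+\.\d+)\s+(?P<area>\d+\.\d+)"
    r"(?:\s+(?P<glazing>\D.*))?$"
)

# Assumed vocabularies; the real PDFs may use other tokens.
ORIENTATIONS = {"North", "South", "East", "West",
                "North East", "North West", "South East", "South West"}
GLAZING_PREFIXES = ("Single", "Double", "Triple", "Secondary")


def parse_frame(line):
    """Return (frame_type, frame_factor) for a joined line, else None."""
    m = FRAME_COMBINED.match(line.strip())
    if m is None:
        return None
    return m.group("frame_type"), float(m.group("frame_factor"))


def parse_data_line(line):
    """Return the W/H/Area values plus any glazing phrase joined to the line."""
    m = DATA_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "width": float(m.group("w")),
        "height": float(m.group("h")),
        "area": float(m.group("area")),
        # None when the glazing type sits on its own line instead.
        "inline_glazing_type": m.group("glazing"),
    }


def partition_gap(gap_lines):
    """Split the lines between two windows into (previous tail, next head).

    Quirk 3: prefer a glazing-prefix line as the boundary; if none is
    present, fall back to the second orientation token in the gap.
    """
    for i, line in enumerate(gap_lines):
        if line.split(" ", 1)[0] in GLAZING_PREFIXES:
            return gap_lines[:i], gap_lines[i:]
    orient_idx = [i for i, line in enumerate(gap_lines) if line in ORIENTATIONS]
    if len(orient_idx) >= 2:
        cut = orient_idx[1]
        return gap_lines[:cut], gap_lines[cut:]
    # No boundary found: leave everything with the previous window.
    return gap_lines, []
```

With these definitions, `parse_data_line("1.22 1.76 2.15 Double pre 2002")`
yields an `inline_glazing_type` of `"Double pre 2002"`, while the
three-number form leaves it as `None`.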

The 5 remaining Summary PDFs are copied into
`backend/documents_parser/tests/fixtures/` ready for per-cert mapper
work. Mirror pin tests are deferred because each cert still has its own
diff to close (the handover in NEXT_AGENT_PROMPT.md documents the
per-cert state, e.g. 000477 needs secondary-heating extraction and
000516 needs roof-window separation).

Current cohort SAP deltas vs the U985 worksheet PDFs (target 1e-4):

  000474   0.0000  ✓
  000477  +6.3655     secondary heating + lighting
  000480  +8.2695     diagnosis pending
  000487  +8.1433     extractor still drops windows
  000490  +5.6551     diagnosis pending
  000516  +5.9812     roof-window separation
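
A small sketch of how the table above could be checked against the
1e-4 target. The `failing` helper and the absolute-difference
comparison are assumptions for illustration; the delta values are
copied from the table.

```python
TOLERANCE = 1e-4  # target stated in the commit message

# SAP deltas vs the U985 worksheet PDFs, copied from the table above.
DELTAS = {
    "000474": 0.0000,
    "000477": 6.3655,
    "000480": 8.2695,
    "000487": 8.1433,
    "000490": 5.6551,
    "000516": 5.9812,
}


def failing(deltas, tol=TOLERANCE):
    """Return the cert ids whose absolute delta exceeds the tolerance."""
    return sorted(cert for cert, delta in deltas.items() if abs(delta) > tol)


if __name__ == "__main__":
    for cert in failing(DELTAS):
        print(f"{cert}: delta {DELTAS[cert]:+.4f} exceeds {TOLERANCE}")
```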

Wider regression stays green (754 pass). Pyright net-zero on
touched files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 19:17:59 +00:00
elmhurst_site_notes_1_text.json elmhurst site notes fixture 2026-04-24 13:09:30 +00:00
elmhurst_site_notes_2_text.json extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
ElmhurstSiteNotes.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
ElmhurstSiteNotes_2.pdf extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
pashub_site_notes_1_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_2_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_3_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_4_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_5_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_6_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_7_text.json Extract address when Property photo element is missing from PDF 🟩 2026-04-30 16:25:41 +00:00
PasHubSiteNotes_1.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_2.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_3.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_4.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_5.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_6.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_7.pdf Extract address when Property photo element is missing from PDF 🟩 2026-04-30 16:25:41 +00:00
Summary_000474.pdf Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail) 2026-05-24 17:40:06 +00:00
Summary_000477.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000480.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000487.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000490.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000516.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00