Model/backend/documents_parser/tests/fixtures
Khalim Conn-Kowlessar 795d36b732 fix(extractor): re-join §11 windows whose Area cell split onto its own line
Sim case 20's §11 lodges 5 windows but only 1 surfaced. The "W H Area"
cells tokenize inconsistently: a narrow Area column keeps all three on one
line ("1.80 2.10 3.78" — matches _WIDTH_HEIGHT_AREA_RE), but a wider Area
column triggers pdftotext's 2+-space split, dropping the Area onto its own
line ("5.79 2.00" then "11.58"). The 3-decimal data anchor never matched
those four rows, so they were lost — gutting §6 solar gains (5 windows →
1) and dropping continuous SAP 43.05 → 38.32 vs the worksheet's 43.6322.
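The failure mode can be reproduced with a small sketch. It assumes the data anchor is a "three decimals on one line" regex. The real `_WIDTH_HEIGHT_AREA_RE` pattern is not shown in this commit, so the regex and the `anchored_rows` helper below are hypothetical stand-ins.

```python
import re

# Hypothetical stand-in for the extractor's _WIDTH_HEIGHT_AREA_RE:
# width, height and area as three decimals on a single line.
WIDTH_HEIGHT_AREA_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def anchored_rows(lines: list[str]) -> list[tuple[float, float, float]]:
    """Return (W, H, Area) for every line the 3-decimal anchor matches."""
    rows: list[tuple[float, float, float]] = []
    for line in lines:
        m = WIDTH_HEIGHT_AREA_RE.match(line)
        if m:
            w, h, a = m.groups()
            rows.append((float(w), float(h), float(a)))
    return rows


# Narrow Area column: all three cells stay on one line and match.
one_line = ["1.80 2.10 3.78"]
# Wider Area column: the Area cell lands on its own line, so nothing matches.
split = ["5.79 2.00", "11.58"]

print(anchored_rows(one_line))  # [(1.8, 2.1, 3.78)]
print(anchored_rows(split))     # []
```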

Pre-merge a "W H" line + a following lone-decimal Area into the canonical
"W H Area" line, gated on Area ≈ W × H (the §11 Area is always the product)
so a frame factor / g-value / U-value below a dimension line is never
absorbed. One-line layouts (3 decimals) are untouched.
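The pre-merge step might look roughly like the sketch below. It assumes the extractor sees pdftotext output as a list of lines. The helper name `rejoin_split_area_lines`, the regexes, and the tolerance (1% relative or 0.01 absolute, to allow for 2-decimal rounding) are illustrative assumptions. The commit itself only states that the gate is Area ≈ W × H.

```python
import math
import re

# Hypothetical patterns; the extractor's real regexes are not shown here.
WIDTH_HEIGHT_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s*$")
LONE_DECIMAL_RE = re.compile(r"^\s*(\d+\.\d+)\s*$")


def rejoin_split_area_lines(
    lines: list[str], rel_tol: float = 0.01, abs_tol: float = 0.01
) -> list[str]:
    """Merge a "W H" line with a following lone Area line when Area ~= W * H."""
    out: list[str] = []
    i = 0
    while i < len(lines):
        wh = WIDTH_HEIGHT_RE.match(lines[i])
        if wh and i + 1 < len(lines):
            area = LONE_DECIMAL_RE.match(lines[i + 1])
            if area:
                w, h = float(wh.group(1)), float(wh.group(2))
                a = float(area.group(1))
                # Absorb the next value only if it is the product, so a frame
                # factor / g-value / U-value under a dimension line is left alone.
                if math.isclose(a, w * h, rel_tol=rel_tol, abs_tol=abs_tol):
                    out.append(f"{wh.group(1)} {wh.group(2)} {area.group(1)}")
                    i += 2
                    continue
        # One-line "W H Area" rows and anything else pass through untouched.
        out.append(lines[i])
        i += 1
    return out


raw = ["1.80 2.10 3.78", "5.79 2.00", "11.58", "1.20 1.00", "0.70"]
print(rejoin_split_area_lines(raw))
# ['1.80 2.10 3.78', '5.79 2.00 11.58', '1.20 1.00', '0.70']
```

In this sketch, `0.70` under `1.20 1.00` stays separate because 0.70 is nowhere near 1.20 × 1.00. That is the product gate doing its job.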

Pins via test_summary_001431_case20_extracts_all_five_section11_windows
(Summary_001431_case20.pdf mirrors
sap worksheets/golden fixture debugging/simulated case 20/).
573 documents_parser tests pass; pyright strict net-zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:35:21 +00:00
elmhurst_site_notes_1_text.json elmhurst site notes fixture 2026-04-24 13:09:30 +00:00
elmhurst_site_notes_2_text.json extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
ElmhurstSiteNotes.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
ElmhurstSiteNotes_2.pdf extract window frame details from elmhurst site notes 🟥 2026-04-27 15:50:25 +00:00
pashub_site_notes_1_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_2_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_3_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_4_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_5_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_6_text.json rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
pashub_site_notes_7_text.json Extract address when Property photo element is missing from PDF 🟩 2026-04-30 16:25:41 +00:00
PasHubSiteNotes_1.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_2.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_3.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_4.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_5.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_6.pdf rename example site notes to PasHub_ and add Elmhurst example 2026-04-24 13:01:51 +00:00
PasHubSiteNotes_7.pdf Extract address when Property photo element is missing from PDF 🟩 2026-04-30 16:25:41 +00:00
Summary_000474.pdf Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail) 2026-05-24 17:40:06 +00:00
Summary_000477.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000480.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000487.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000490.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000516.pdf Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added 2026-05-24 19:17:59 +00:00
Summary_000565.pdf Slice S0380.52: cert 000565 Elmhurst-only mapper-driven cascade pin + glazing-label coverage 2026-05-28 22:03:52 +00:00
Summary_000784.pdf chore: stage cert 9501 fixtures (second boiler validation cert) 2026-05-26 18:53:08 +00:00
Summary_000884.pdf Slice S0380.16: add 'Normal' → cylinder_size=2 (110 L) for cohort 2 2026-05-27 22:44:02 +00:00
Summary_000888.pdf Slice S0380.17: map Elmhurst §11 glazing-type labels to SAP10 codes 2026-05-27 23:05:52 +00:00
Summary_000889.pdf Slice S0380.16: add 'Normal' → cylinder_size=2 (110 L) for cohort 2 2026-05-27 22:44:02 +00:00
Summary_000890.pdf Slice S0380.19: count Elmhurst shower outlets by type (no more hardcoded 1) 2026-05-28 07:16:32 +00:00
Summary_000897.pdf chore: stage cert 0330 fixtures (boiler pilot) 2026-05-26 17:37:14 +00:00
Summary_000898.pdf Slice S0380.10: pin certs 3800 + 9285 Summary chain tests — first-try closure 2026-05-27 20:47:51 +00:00
Summary_000899.pdf chore: stage cert 0380 fixtures (HP pilot — deferred workstream) 2026-05-26 17:37:34 +00:00
Summary_000900.pdf Slice S0380.10: pin certs 3800 + 9285 Summary chain tests — first-try closure 2026-05-27 20:47:51 +00:00
Summary_000901.pdf Slice S0380.10: pin certs 3800 + 9285 Summary chain tests — first-try closure 2026-05-27 20:47:51 +00:00
Summary_000902.pdf Slice S0380.10: pin certs 3800 + 9285 Summary chain tests — first-try closure 2026-05-27 20:47:51 +00:00
Summary_000903.pdf Slice S0380.9: multi-array PV support + close cert 0350 to ASHP spec floor 2026-05-27 20:44:13 +00:00
Summary_000904.pdf Slice S0380.10: pin certs 3800 + 9285 Summary chain tests — first-try closure 2026-05-27 20:47:51 +00:00
Summary_000910.pdf Slice S0380.18: u_party_wall flat default per RdSAP10 Table 15 footnote* 2026-05-27 23:24:58 +00:00
Summary_001431_6035.pdf S0380.195: pin sim case 4 (6035 floor geometry) e2e at 1e-4 — 6035 +19 PE is lodged divergence 2026-06-03 09:56:39 +00:00
Summary_001431_case5.pdf S0380.197: simulated case 5 e2e fixture — detached sandstone RR validates S0380.196 (RdSAP 10 §3.9.1 + Table 4 p.22) 2026-06-03 11:41:16 +00:00
Summary_001431_case6.pdf S0380.199: site-notes "Roof of Room" windows → roof windows (cross-mapper parity with S0380.198) 2026-06-03 12:46:18 +00:00
Summary_001431_case7.pdf S0380.208: case 7 combi e2e fixture — condensing-oil-combi path validated exact 2026-06-03 17:57:22 +00:00
Summary_001431_case20.pdf fix(extractor): re-join §11 windows whose Area cell split onto its own line 2026-06-06 10:35:21 +00:00
Summary_001431_double_glazing.pdf S0380.236: extension party-wall type read independently of "As Main Wall" 2026-06-05 09:19:43 +00:00
Summary_001431_gas_combi.pdf S0380.191: pin simulated 001431 gas-combi end-to-end at 1e-4 (e2e harness) 2026-06-02 22:44:32 +00:00
Summary_001431_rr8w.pdf S0380.194: pin sim case 3 (near-exact 6035 replica) e2e at 1e-4 2026-06-03 09:46:56 +00:00
Summary_001431_rr_ext.pdf S0380.193: suspended-floor (12) sealed rule fires only on a SUPPLIED U-value 2026-06-03 09:16:25 +00:00
Summary_001479.pdf Slice 54: Elmhurst mapper sets extensions_count from len(survey.extensions) 2026-05-24 22:15:47 +00:00