The §11 Windows table in the Summary PDF doesn't lay out identically
across the cohort. Three new quirks added to the layout-style parser
so the remaining 5 certs can be debugged with windows actually
extracted:
1. `Wood 0.70` combined frame_type+frame_factor line — previously the
parser expected them on separate lines (data+1 / data+2) and
rejected the window when the joined form appeared.
2. Trailing glazing-type on the data line — `1.22 1.76 2.15 Double
pre 2002` is the joined-cell variant in 000516; the W/H/Area
anchor now captures the trailing phrase as an optional 4th group
and feeds it through as `inline_glazing_type`, bypassing the
separate-line glazing-prefix scan.
3. Cross-window gap with no glazing marker — `_partition_after_manuf`
now falls back to "second orientation token in gap" when no
glazing-type-prefix word appears. Covers the 000516 layout where
each window has prefix+suffix orient tokens (no inline orient)
and the glazing-type is joined-to-data.
The 5 remaining Summary PDFs are copied into
`backend/documents_parser/tests/fixtures/` ready for per-cert mapper
work. Mirror pin tests deferred — each cert still has its own diff
to close (handover in NEXT_AGENT_PROMPT.md documents the per-cert
state, e.g. 000477 needs secondary-heating extraction, 000516 needs
roof-window separation).
Current cohort SAP deltas vs the U985 worksheet PDFs (target 1e-4):
000474 0.0000 ✓
000477 +6.3655 secondary heating + lighting
000480 +8.2695 diagnosis pending
000487 +8.1433 extractor still drops windows
000490 +5.6551 diagnosis pending
000516 +5.9812 roof-window separation
Wider regression stays green (754 pass). Pyright net-zero on
touched files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>