The Elmhurst extractor crashed parsing simulated-case-6's room-in-roof
window rows: the §11 "Location" cell "Roof of Room in Roof" wraps across
the layout prefix/suffix blocks and leaked into the glazing-type phrase
("Double between 2002 Roof of Room and 2021 in Roof" →
UnmappedElmhurstLabel). Fix (`_parse_window_from_anchors`): detect the
roof-of-room location tokens, strip them from the before/after blocks so
the glazing phrase reconstructs cleanly, and set location="Roof of Room".
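A minimal sketch of the token-stripping idea, not the actual
`_parse_window_from_anchors` code: the function name, its signature and
the two regexes are hypothetical, and it assumes the wrapped cell leaves
"Roof of Room" in the before block and "in Roof" in the after block, as
in the failing phrase above.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the two halves of the wrapped Location cell.
_ROOF_OF_ROOM_HEAD = re.compile(r"\bRoof of Room\b")
_ROOF_OF_ROOM_TAIL = re.compile(r"\bin Roof\b")


def split_roof_of_room(before: str, after: str) -> Tuple[str, Optional[str]]:
    """Return (glazing_phrase, location) for one window row.

    If both halves of the wrapped location are present, remove them and
    report location "Roof of Room"; otherwise leave the text untouched.
    """
    head_found = _ROOF_OF_ROOM_HEAD.search(before) is not None
    tail_found = _ROOF_OF_ROOM_TAIL.search(after) is not None
    location = None
    if head_found and tail_found:
        before = _ROOF_OF_ROOM_HEAD.sub("", before)
        after = _ROOF_OF_ROOM_TAIL.sub("", after)
        location = "Roof of Room"
    # Collapse whitespace left behind by the removed tokens.
    glazing = " ".join((before + " " + after).split())
    return glazing, location


glazing, location = split_roof_of_room(
    "Double between 2002 Roof of Room", "and 2021 in Roof"
)
```

Requiring both halves before stripping keeps ordinary rows (no wrapped
location) byte-identical to what the parser produced before.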
Mapper: `_is_elmhurst_roof_window` gains a "Roof of Room" location branch
(highest-confidence rooflight signal, above the BP-roof-type / U>3.0
gates); `_ELMHURST_ROOF_WINDOW_U_BY_GLAZING` gains "Double between 2002
and 2021" → 2.30 (case 6 lodges the already-inclined roof-window U, so
the +0.30 inclination adjustment must not double-apply).
This is the site-notes mirror of S0380.198 (API window_wall_type=4):
both paths now route room-in-roof rooflights to (27a) at the inclined U.
Validated against the case-6 P960 worksheet at abs=1e-4:
  (27)  Windows      = 22.7408  (cascade 22.7407)
  (27a) Roof Windows = 13.0375  (cascade 13.0375, EXACT)
  (31)  ext area     = 336.13
Case 6 is pinned only on the §3 window line refs (new standalone test,
not added to the section-pin `_FIXTURES`) because its DUAL main heating
(51% rads + 49% underfloor, oil) makes the §10/§12 per-system lines
non-comparable to SapResult's aggregated fields — documented in the
fixture module. Summary mirrored to Summary_001431_case6.pdf.
Suite: 2355 passed, 1 skipped. New code: 0 pyright errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Promotes user-simulated "case 5" (detached, sandstone-walled, room-in-roof
cousin of golden cert 0240) to an e2e worksheet fixture pinning the WHOLE
extractor → mapper → calculator pipeline at abs=1e-4 on all 11 Block-1
line refs. Its worksheet prints the exact RR-gable routing S0380.196
implements, validating that fix against ground truth:
  Roof room Main Gable Wall 1    15.68  U=0.35  (29a)  Exposed → walls @ main-wall U
  Roof room Main remaining area  61.73  U=0.30  (30)   A_RR shell − Σ gables
  External roof Main             14.52  U=0.11  (30)   loft residual
  Roof room Main Gable Wall 2    15.68  U=0.25  (32)   Party → party @ 0.25
gable area = 6.40 × 2.45 (§3.9.1 default RR storey height); A_RR remaining
= 12.5√(83.2/1.5) − 2×15.68 = 93.09 − 31.36 = 61.73 (RdSAP 10 §3.9.1(e)).
Confirms a DETACHED dwelling can lodge a Party RR gable (Table 4 p.22
row 2), so the S0380.196 mapping (gable_wall_type 0=Party, 1=Exposed) is
correct; do not flip it.
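The gable and remaining-area arithmetic above can be re-checked with a
few lines. This only replays the numbers quoted in this message; the
variable names are illustrative, and reading 83.2 as the room-in-roof
floor area is an assumption.

```python
import math

# Figures copied from the worksheet lines quoted above.
gable_width_m = 6.40
rr_storey_height_m = 2.45   # §3.9.1 default RR storey height
rr_area_term = 83.2         # assumed to be the room-in-roof floor area

gable_area = gable_width_m * rr_storey_height_m
rr_shell_area = 12.5 * math.sqrt(rr_area_term / 1.5)
rr_remaining_area = rr_shell_area - 2 * gable_area

print(f"gable     = {gable_area:.2f}")
print(f"shell     = {rr_shell_area:.2f}")
print(f"remaining = {rr_remaining_area:.2f}")
```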
Two extractor/mapper gaps surfaced and fixed (case 5 is the forcing test):
- Sandstone wall label "SS Stone: sandstone or limestone" had no
`_ELMHURST_WALL_CODE_TO_SAP10` entry (raised UnmappedElmhurstLabel).
Added "SS" → 2 (WALL_STONE_SANDSTONE), matching 0240's API
wall_construction=2 (cross-mapper parity).
- Roof "Insulation Thickness 400+ mm" was silently dropped: the four
thickness parsers used `.split()[0].isdigit()`, which rejects the
trailing "+" → None → u_roof fell back to the age-J default 0.16
instead of 0.11 (+1.09 W/K roof, the whole 0.12 SAP gap). Added
`_parse_thickness_mm` (strips to leading digits) and applied it at all
four sites (walls / alt-wall / roof / floor). The only existing fixture
with "400+ mm" (000565 Stud Wall) routes via the RIR regex, unaffected.
Result: case 5 cascade ≡ worksheet at 1e-4 on SAP/ECF/cost/CO2 + every
energy stream. Neither gap affects 0240 (its API path captures both the
sandstone code and "400mm+"); 0240's residual is therefore non-fabric.
Suite: 2353 passed, 1 skipped. New code: 0 pyright errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the user-simulated case-4 worksheet as e2e fixture `001431_6035` —
reproduces golden cert 6035's full floor geometry (Main ground-floor HLP
15.99 + first-floor HLP 8.32, the asymmetric upper storey) and 8 windows.
All 11 Block-1 line refs pin at abs=1e-4 against the worksheet (SAP 68,
ECF 2.2802, cost 937.2341, CO2 4682.3494, space 15745.3260, main fuel
18744.4357).
This is the 4th independent 1e-4 confirmation across the 6035 archetype
(sim cases 1-4). Case 4 matches 6035 on floors + window areas; the
residual ~50 kWh / £11 cascade delta vs 6035 comes from two lodged inputs
only (largest window orientation N vs S; meter type "Dual" vs API 2), not
calculator behaviour.
Conclusion: the cascade reproduces the spec engine exactly for 6035's
geometry, so 6035's +19 PE vs the lodged register is lodged-register
divergence (the gov.uk register's rounded value vs the spec-exact
worksheet), NOT a calculator gap. 6035 is a "pin-forever" lodged-only
cert. Bugs surfaced + fixed along the way: S0380.192 (Simplified-RR
remaining area) and S0380.193 (suspended-floor sealed rule).
2341 passed (+11), 0 failed; pyright net-zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the user-simulated case-3 worksheet as e2e fixture `001431_rr8` —
Main + Extension + Simplified room-in-roof with 8 windows (≈14.15 m²,
reproducing golden cert 6035's glazing) and Main ground-floor HLP 15.99.
All 11 Block-1 line refs pin at abs=1e-4 against the worksheet (SAP 68,
cost 951.3425, CO2 4767.4862, space 16086.3557, main fuel 19150.4235,
HW 3307.2639, lighting 262.0885).
This is the third independent 1e-4 confirmation that the cascade
reproduces the spec engine for the 6035 archetype (after S0380.192
Simplified-RR + S0380.193 suspended-floor). It differs from 6035 in one
input only — the Main first-floor HLP (15.99 here vs 6035's 8.32) — so
6035's +19 PE vs the lodged register is lodged-register divergence, not
a calculator gap. A byte-identical 6035 replica (first-floor HLP 8.32)
would let 6035 itself be pinned directly to close that out.
2330 passed (+11), 0 failed; pyright net-zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RdSAP 10 §5 (PDF p.29) "Floor infiltration (suspended timber ground
floor only)", age band A-E, splits on whether a floor U-value is
supplied:
a) [U-value supplied] if floor U-value < 0.5 → "sealed", (12) = 0.1
b) [no U-value supplied] retro-fitted insulation → "sealed" 0.1;
otherwise "unsealed", (12) = 0.2
`_has_suspended_timber_floor_per_spec` fed the cascade's COMPUTED default
U into rule (a), so an as-built/uninsulated suspended-timber floor whose
default U happens to be < 0.5 was marked "sealed" (0.1) where Elmhurst
uses "unsealed" (0.2). That lowered (18) infiltration from 0.85 to 0.75,
and with it (25) effective ACH and the HTC, understating space heating by
~450 kWh.
Fix: gate rule (a) on `floor_u_value_known` — a computed default U is not
a supplied value, so it falls through to (b). Verified against the
cert 001431 sim-case-2 worksheet: floor "As built", U=0.43 (matches the
worksheet's (28a) 0.4300 exactly), (12)=0.2 unsealed. Golden cert 6035
(also a suspended uninsulated floor) is unaffected — its U=0.63 ≥ 0.5
already routed to unsealed.
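The gated rule can be sketched as below. This is not the body of
`_has_suspended_timber_floor_per_spec`: the function name and parameters
are hypothetical, the age-band A-E check is left out, and treating a
supplied U ≥ 0.5 as "unsealed" is an assumption the quoted spec text does
not state.

```python
from typing import Optional

SEALED = 0.1
UNSEALED = 0.2


def suspended_timber_floor_infiltration(
    floor_u_value: Optional[float],
    floor_u_value_known: bool,
    retrofit_insulated: bool,
) -> float:
    """Return the (12) floor infiltration term for a suspended timber floor."""
    if floor_u_value_known and floor_u_value is not None:
        # Rule (a): only a SUPPLIED U-value may mark the floor sealed.
        # Assumption: a supplied U >= 0.5 is treated as unsealed.
        return SEALED if floor_u_value < 0.5 else UNSEALED
    # Rule (b): no supplied U, so a computed default is ignored here.
    return SEALED if retrofit_insulated else UNSEALED


# Sim case 2: "As built" floor, computed default U = 0.43, nothing supplied.
sim_case_2 = suspended_timber_floor_infiltration(0.43, False, False)
```

With the gate in place the computed 0.43 no longer reaches rule (a), so
sim case 2 resolves to 0.2, while a genuinely supplied 0.43 would still
resolve to 0.1.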
Promotes sim case 2 to the e2e harness as `001431_rr` (Main + Extension
+ Simplified room-in-roof — the 6035 archetype). All 11 Block-1 line
refs pin at abs=1e-4, locking BOTH this fix and S0380.192 (Simplified-RR
remaining area) end-to-end: SAP 69, cost 920.5046, CO2 4566.7090, space
15269.8593, main fuel 18178.4039. 2319 passed (+11), 0 failed; pyright
net-zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the user-simulated 001431 case (the cert that drove S0380.189/.190)
as an Elmhurst-only e2e fixture: Summary PDF → extractor → mapper →
calculator, every Block-1 SapResult field pinned against the
P960-0001-001431 worksheet at abs=1e-4. All 11 pins pass with zero
residual — the case is clean, confirming the S0380.190 gas-combi fuel
derivation closes the Summary path natively.
Verified the handover's flagged "+0.0007 SAP" was a target artifact, not
a cascade gap: the worksheet displays ECF (257) rounded to 1.6047 and
integer SAP (258)=78; the cascade's continuous SAP is computed from the
UNROUNDED ECF = (255)*(256)/((4)+45) = 660.9750*0.4200/173.0, giving
77.6147 — which matches the worksheet's own unrounded value. Pinning the
continuous SAP from the display-rounded ECF (→ 77.6144) was the wrong
target. Block-1 line refs all match exactly: (211) 10699.7225, (219)
3327.1592, (231) 86.0, (232) 283.2229, (255) 660.9750, (272) 3000.1664,
Σ(98) 8987.7669.
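The effect of pinning against the display-rounded ECF can be replayed
with the figures above. A sketch under stated assumptions: the linear
rating equation SAP = 100 − 13.95 × ECF (for ECF < 3.5) is assumed rather
than quoted in this message, (4) = 128.0 is inferred from (4)+45 = 173.0,
and the function names are illustrative. Because the inputs are the
four-decimal displayed values, the last digit of the unrounded result may
differ from the worksheet's.

```python
def ecf(total_cost_255: float, deflator_256: float, tfa_4: float) -> float:
    """ECF = (255) * (256) / ((4) + 45)."""
    return total_cost_255 * deflator_256 / (tfa_4 + 45.0)


def continuous_sap(ecf_value: float) -> float:
    """Assumed SAP rating equation for ECF < 3.5."""
    return 100.0 - 13.95 * ecf_value


ecf_unrounded = ecf(660.9750, 0.4200, 128.0)
ecf_displayed = round(ecf_unrounded, 4)   # what the worksheet prints at (257)

sap_from_unrounded = continuous_sap(ecf_unrounded)
sap_from_displayed = continuous_sap(ecf_displayed)

print(f"ECF displayed         = {ecf_displayed:.4f}")
print(f"SAP from displayed    = {sap_from_displayed:.4f}")
print(f"SAP from unrounded    = {sap_from_unrounded:.4f}")
print(f"integer SAP           = {round(sap_from_unrounded)}")
```

The two continuous values differ in the fourth decimal place, which is
already outside an abs=1e-4 pin, so the target has to come from the
unrounded ECF.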
Summary mirrored into the tracked fixtures dir as
Summary_001431_gas_combi.pdf (distinct name — the corpus reuses cert
001431 across every heating variant); source Summary + worksheet tracked
under sap worksheets/golden fixture debugging/ as the pin ground truth.
2302 passed (+11), 0 failed; pyright net-zero on new/changed files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
test_heating_systems_corpus.py (and one pcdb-1 cross-check in
test_cert_to_inputs.py) read the 001431 controlled-variable corpus PDFs
directly at runtime from `sap worksheets/heating systems examples/`, but
that directory was never committed — it was supplied locally on
2026-05-30 and only ever existed on dev machines. CI therefore errored
with "no Summary PDF in …" for all 57 corpus variants.
Commit the 82 corpus PDFs (41 populated variant folders × Summary +
P960, 4.7 MB) in place so the cascade-vs-worksheet residual pins run in
CI, matching the existing convention where the U985 / 000565
conformance fixtures are committed under
backend/documents_parser/tests/fixtures/ (31 PDFs already tracked).
Only the .pdf fixtures are added; the stray .DS_Store and a P960 .txt
dump in pcdb 1/ are left untracked.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>