Captures the diagnosis so the next agent doesn't re-derive it: what's done (S0380.235-237), what's confirmed correct (calculator U-adjustment, party wall, glazing labels), the worksheet pin targets, and the two open causes. Crucially, it records the 000516 trap: byte-identical Summary data is classified as a roof window there but a wall window here, so flipping the U>3 rule regresses 000516. Includes a rebuildable tracer recipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Handover — double_glazing fixture: glazing done, window-extraction open
Point-in-time note for the agent owning the Elmhurst window extractor /
mapper. Start from AGENT_GUIDE.md for the 1e-4
worksheet-pin methodology and the cascade pipeline.
- Branch: feature/per-cert-mapper-validation. HEAD 8133521c.
- Fixture: the double_glazing "before" recommendation pair: sap worksheets/Recommendations Elmhurst Files/double_glazing/before/ (Summary_001431 (1).pdf + P960-0001-001431 - 2026-06-02T115533.961.pdf). The Summary is also committed as a test fixture: backend/documents_parser/tests/fixtures/Summary_001431_double_glazing.pdf.
- Worksheet block to pin (rating = block 1, region 0; demand = block 2, postcode): SAP cont 57.2415, cost (255) 1423.0955, fabric (33) 158.4548 / (37) 197.8463; demand CO2 (272) 3486.0799, PE (286) 16796.5617. (Blocks 3+ add PV/diverter; ignore them for the "before" pin.)
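A minimal sketch of how the pin targets could be held in one place and compared at the 1e-4 tolerance. The dictionary keys and the helper name `pin_mismatches` are hypothetical, not names from the repo; only the numeric values come from the worksheet figures listed above.

```python
# Worksheet pin targets for the "before" fixture (values copied from the note).
# Key names are illustrative only; the real tests may name them differently.
PIN_TARGETS = {
    "sap_cont": 57.2415,
    "cost_255": 1423.0955,
    "fabric_33": 158.4548,
    "fabric_37": 197.8463,
    "demand_co2_272": 3486.0799,
    "demand_pe_286": 16796.5617,
}
TOLERANCE = 1e-4


def pin_mismatches(computed, targets=PIN_TARGETS, tol=TOLERANCE):
    """Return {key: (computed, target)} for every value missing or outside tol."""
    return {
        key: (computed.get(key), target)
        for key, target in targets.items()
        if computed.get(key) is None or abs(computed[key] - target) > tol
    }


# Example using the current residual: ours 58.37 against the worksheet's 57.2415.
current = dict(PIN_TARGETS, sap_cont=58.37)
print(pin_mismatches(current))
```

An empty result means every pinned line agrees with the worksheet to 1e-4.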
Done this session (S0380.235-237) — DON'T redo
| slice | what |
|---|---|
| S0380.235 3e45b7fa | 5 missing Elmhurst §11 glazing labels → SAP10 Table 6b (Secondary glazing→7, …- Normal emissivity→11, Triple pre 2002→10, Triple with unknown install date→6, Single glazing, known data→15). |
| S0380.236 ea35bed2 | Extension party-wall type read independently of "As Main Wall" (_extract_extensions): Main CU→0.5, Ext U Unable to determine→0.25. Worksheet (32) party heat loss now exact (32.573 vs 32.5725). |
| S0380.237 8133521c | Secondary glazing - Low emissivity→12. Double {1,2,3,7,13} + secondary {4,11,12} families now fully mapped. |
Confirmed already correct — do not touch:
- The calculator's window U-adjustment 1/(1/U + 0.04) (SAP §3.2 curtain correction) is exact: lodged 4.80→4.0268, 3.10→2.7580, 1.40→1.3258, all match the worksheet to 1e-4. Our 14 extracted windows sum to exactly 56.090. The 1e-4 gap is NOT in the calculator.
- Glazing label→code mapping (g_L is the only cascade effect; lodged U/g drive §3/§6 via _g_perpendicular preferring the lodged value).
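For reference, the curtain correction can be re-checked in isolation. This is a standalone sketch, not the calculator's own code; the function name `adjusted_u` and its parameter names are made up for illustration.

```python
def adjusted_u(u_lodged, curtain_resistance=0.04):
    """Effective window U-value after adding the curtain resistance in series."""
    return 1.0 / (1.0 / u_lodged + curtain_resistance)


# Lodged U -> adjusted U as printed on the worksheet (4 decimal places).
WORKSHEET_PAIRS = [(4.80, 4.0268), (3.10, 2.7580), (1.40, 1.3258)]

for lodged, worksheet in WORKSHEET_PAIRS:
    delta = abs(adjusted_u(lodged) - worksheet)
    print(f"U={lodged:.2f} adjusted={adjusted_u(lodged):.4f} delta={delta:.1e}")
```

Each delta stays below 1e-4, which is consistent with the statement that the residual does not come from the calculator.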
Open — current residual SAP +1.13 (ours 58.37 vs ws 57.24), all in WINDOWS
The Summary §11 lodges 17 physical window rows; we end up with 14
sap_windows. Three windows are lost, in two distinct ways:
Cause 1 (HARD — read before touching _is_elmhurst_roof_window)
The mapper routes the two Double pre 2002 windows (lodged U 3.1 / 3.4) to
roof windows via the U > 3.0 backstop in
_is_elmhurst_roof_window (datatypes/epc/domain/mapper.py, the final
return w.u_value > _ELMHURST_ROOF_WINDOW_U_THRESHOLD). This fixture's
worksheet bills them as wall windows (27).
The trap: cohort cert 000516 has a window that is byte-identical
in every extractable Summary field — Double pre 2002, U=3.1,
location="External wall", bp Main, orient North-East — and its
worksheet bills it as a roof window (27a). Verified: gating the U>3
rule on location == "External wall" makes this fixture pass but
breaks both 000516 pins (test_summary_000516_full_chain_… and
test_from_elmhurst_site_notes_matches_hand_built_000516).
So identical Summary inputs are classified oppositely by the two worksheets. No rule keyed on the fields we currently extract can satisfy both. Resolving this needs a NEW disambiguating signal — likely a roof/wall or rooflight field Elmhurst lodges in §11 (or the BP roof structure) that the extractor doesn't yet capture. Do NOT flip the U>3 heuristic to fix this fixture; it silently regresses 000516.
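The trap can be expressed as a conflict check over labelled observations: if two certs share the same extracted key but their worksheets bill different lines, no rule on that key can pass both. The sketch below is illustrative only; `WindowKey`, its field names, and `conflicting_keys` are hypothetical and do not exist in the repo.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowKey:
    """Illustrative subset of the Summary fields the extractor captures today."""
    glazing: str
    u_value: float
    location: str
    building_part: str
    orientation: str


def conflicting_keys(observations):
    """Return keys whose worksheets bill them on more than one line.

    observations: iterable of (cert, WindowKey, worksheet_line) triples.
    """
    seen = defaultdict(set)
    for cert, key, line in observations:
        seen[key].add((cert, line))
    return {
        key: sorted(pairs)
        for key, pairs in seen.items()
        if len({line for _, line in pairs}) > 1
    }


shared = WindowKey("Double pre 2002", 3.1, "External wall", "Main", "North-East")
observations = [
    ("001431", shared, "27"),   # this fixture: billed as a wall window
    ("000516", shared, "27a"),  # cohort cert: billed as a roof window
]
print(conflicting_keys(observations))
```

A candidate disambiguating signal can be tested by adding it as a field on the key: it is only useful if the conflict set becomes empty across the whole cohort.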
Cause 2 (tractable — a plain parsing miss)
The extractor produces 16 windows from 17 §11 rows — it drops the
Double glazing, known data row (BFRC, lodged U=1.00 → adjusted 0.9615,
1st Extension, area 1.00; worksheet "Windows 12" on Ext1). The label maps
fine (→3); the physical row just isn't extracted. Fixing this alone won't
pin the fixture (Cause 1 still blocks) but it's a real, isolatable
extractor bug.
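One way to isolate a dropped row is a multiset difference between the rows read off the Summary and the rows the extractor returns. This is a sketch under assumptions: the tuple shape and the helper name `missing_rows` are hypothetical, and the lists are toy data, not the real §11 table.

```python
from collections import Counter


def missing_rows(lodged_rows, extracted_rows):
    """Rows present in the Summary but absent from the extractor output.

    Uses a multiset difference so that duplicate rows are counted correctly.
    """
    return list((Counter(lodged_rows) - Counter(extracted_rows)).elements())


# Toy data: (glazing label, building part, lodged U).
lodged = [
    ("Double pre 2002", "Main", 3.1),
    ("Double glazing, known data", "1st Extension", 1.0),
    ("Double pre 2002", "Main", 3.1),
]
extracted = [
    ("Double pre 2002", "Main", 3.1),
    ("Double pre 2002", "Main", 3.1),
]
print(missing_rows(lodged, extracted))
```

A plain set difference would hide a dropped duplicate, which matters here because several §11 rows can share the same label and U-value.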
Tracer recipe (rebuild — the throwaway lived in /tmp)
# Run from the repo root with PYTHONPATH=/workspaces/model.
# Requires poppler's pdfinfo and pdftotext on PATH.
import re
import subprocess
from pathlib import Path

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import (
    SAP_10_2_SPEC_PRICES, cert_to_inputs, cert_to_demand_inputs,
    heat_transmission_section_from_cert)
from domain.sap10_calculator.calculator import calculate_sap_from_inputs


def pages(pdf):
    """Return one string per PDF page, with one layout token per line."""
    info = subprocess.run(["pdfinfo", str(pdf)],
                          capture_output=True, text=True).stdout
    n = int(re.search(r"Pages:\s+(\d+)", info).group(1))
    out = []
    for i in range(1, n + 1):
        layout = subprocess.run(
            ["pdftotext", "-layout", "-f", str(i), "-l", str(i), str(pdf), "-"],
            capture_output=True, text=True).stdout
        # Split each layout line on runs of 2+ spaces; drop empty tokens.
        out.append("\n".join(tok for ln in layout.splitlines()
                             for tok in re.split(r"\s{2,}", ln.strip()) if tok))
    return out


D = Path("sap worksheets/Recommendations Elmhurst Files/double_glazing/before")
sn = ElmhurstSiteNotesExtractor(pages(D / "Summary_001431 (1).pdf")).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
# len(sn.windows)==16 (should be 17); len(epc.sap_windows)==14 (2 → roof, 1 dropped)
Per-window A×U on the worksheet uses the ADJUSTED U, 1/(1/U_lodged + 0.04).
The worksheet's §3 (27) lines sum to 60.5577; we get 56.090 from 14 windows.
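The (27) shortfall can be split between the two causes with the figures already in this note. This is back-of-envelope arithmetic, not output from the tracer; it assumes the 14 surviving windows match the worksheet exactly, and the variable names are illustrative.

```python
def adjusted_u(u_lodged):
    """Lodged U with the 0.04 curtain resistance added in series."""
    return 1.0 / (1.0 / u_lodged + 0.04)


WORKSHEET_27_SUM = 60.5577  # sum of the worksheet's (27) A×U lines
OURS_27_SUM = 56.090        # sum over our 14 sap_windows

gap = WORKSHEET_27_SUM - OURS_27_SUM
# Cause 2: the dropped "Double glazing, known data" row, area 1.00, lodged U 1.00.
dropped = 1.00 * adjusted_u(1.00)
# Whatever is left should be the A×U of the two windows routed to roof (Cause 1).
remainder = gap - dropped

print(f"gap={gap:.4f} dropped={dropped:.4f} remainder={remainder:.4f}")
# prints: gap=4.4677 dropped=0.9615 remainder=3.5062
```

If the remainder does not equal the combined A×U of the two Double pre 2002 windows, something other than the three lost windows is also contributing.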