Model/domain/sap10_calculator/docs/HANDOVER_GLAZING_WINDOW_EXTRACTION.md
Khalim Conn-Kowlessar 218840db98 docs: handover for the open window-extraction work on the double_glazing fixture
Captures the diagnosis so the next agent doesn't re-derive it: what's done
(S0380.235-237), what's confirmed correct (calculator U-adjustment, party
wall, glazing labels), the worksheet pin targets, and the two open causes —
crucially the 000516 trap (byte-identical Summary data classified as a roof
window there but a wall window here, so flipping the U>3 rule regresses
000516). Includes a rebuildable tracer recipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:47:29 +00:00


Handover — double_glazing fixture: glazing done, window-extraction open

Point-in-time note for the agent owning the Elmhurst window extractor / mapper. Start from AGENT_GUIDE.md for the 1e-4 worksheet-pin methodology and the cascade pipeline.

  • Branch: feature/per-cert-mapper-validation. HEAD 8133521c.
  • Fixture: the double_glazing "before" recommendation pair — sap worksheets/Recommendations Elmhurst Files/double_glazing/before/ (Summary_001431 (1).pdf + P960-0001-001431 - 2026-06-02T115533.961.pdf). The Summary is also committed as a test fixture: backend/documents_parser/tests/fixtures/Summary_001431_double_glazing.pdf.
  • Worksheet block to pin (rating = block 1, region 0; demand = block 2, postcode): SAP cont 57.2415, cost (255) 1423.0955, fabric (33) 158.4548 / (37) 197.8463; demand CO2 (272) 3486.0799, PE (286) 16796.5617. (Blocks 3+ add PV/diverter — ignore for the "before" pin.)
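As a rough illustration of checking those targets, the sketch below compares a dict of computed values against the pins. The key names, the `pin_failures` helper, and the reading of 1e-4 as an absolute tolerance are all assumptions made for this example; AGENT_GUIDE.md remains the authority on the actual methodology.

```python
# Pin targets copied from the worksheet block above; the keys are illustrative labels.
PINS = {
    "sap_cont": 57.2415,
    "cost_255": 1423.0955,
    "fabric_33": 158.4548,
    "fabric_37": 197.8463,
    "demand_co2_272": 3486.0799,
    "demand_pe_286": 16796.5617,
}


def pin_failures(computed, pins=PINS, tol=1e-4):
    """Return {name: (computed, target)} for every value off by more than tol."""
    return {
        name: (computed[name], target)
        for name, target in pins.items()
        if abs(computed[name] - target) > tol
    }


# Example: every value on target except the SAP score, which uses the
# current "ours" figure quoted later in this note.
computed = dict(PINS, sap_cont=58.37)
failures = pin_failures(computed)
print(failures)
```

An empty dict from `pin_failures` would mean every listed value is within tolerance.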

Done this session (S0380.235-237) — DON'T redo

| slice | what |
| --- | --- |
| S0380.235 3e45b7fa | 5 missing Elmhurst §11 glazing labels → SAP10 Table 6b (Secondary glazing→7, …- Normal emissivity→11, Triple pre 2002→10, Triple with unknown install date→6, Single glazing, known data→15). |
| S0380.236 ea35bed2 | Extension party-wall type read independently of "As Main Wall" (_extract_extensions): Main CU→0.5, Ext U Unable to determine→0.25. Worksheet (32) party heat loss now exact (32.573 vs 32.5725). |
| S0380.237 8133521c | Secondary glazing - Low emissivity→12. Double {1,2,3,7,13} + secondary {4,11,12} families now fully mapped. |

Confirmed already correct — do not touch:

  • The calculator's window U-adjustment 1/(1/U + 0.04) (SAP §3.2 curtain correction) is exact: lodged 4.80→4.0268, 3.10→2.7580, 1.40→1.3258, all matching the worksheet to 1e-4. The A×U of our 14 extracted windows sums to exactly 56.090, so the residual that blocks the 1e-4 pin is NOT in the calculator.
  • Glazing label→code mapping (g_L is the only cascade effect; lodged U/g drive §3/§6 via _g_perpendicular preferring the lodged value).
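The curtain correction quoted above is easy to re-check by hand. This is a standalone sketch of the formula only; `adjusted_u` is a name chosen for this example, not the calculator's own function.

```python
def adjusted_u(u_lodged: float) -> float:
    """Apply the curtain correction quoted above: 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_lodged + 0.04)


# Lodged U-values quoted in this note.
for u in (4.80, 3.10, 1.40):
    print(f"{u:.2f} -> {adjusted_u(u):.4f}")
```

The three printed values should agree with the adjusted figures listed above (4.0268, 2.7580, 1.3258).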

Open — current residual SAP +1.13 (ours 58.37 vs ws 57.24), all in WINDOWS

The Summary §11 lodges 17 physical window rows; we end up with 14 sap_windows. Three windows are lost, in two distinct ways:

Cause 1 (HARD — read before touching _is_elmhurst_roof_window)

The mapper routes the two Double pre 2002 windows (lodged U 3.1 / 3.4) to roof windows via the U > 3.0 backstop in _is_elmhurst_roof_window (datatypes/epc/domain/mapper.py, the final return w.u_value > _ELMHURST_ROOF_WINDOW_U_THRESHOLD). This fixture's worksheet bills them as wall windows (27).

The trap: cohort cert 000516 has a window that is byte-identical in every extractable Summary field — Double pre 2002, U=3.1, location="External wall", bp Main, orient North-East — and its worksheet bills it as a roof window (27a). Verified: gating the U>3 rule on location == "External wall" makes this fixture pass but breaks both 000516 pins (test_summary_000516_full_chain_… and test_from_elmhurst_site_notes_matches_hand_built_000516).

So identical Summary inputs are classified oppositely by the two worksheets. No rule keyed on the fields we currently extract can satisfy both. Resolving this needs a NEW disambiguating signal — likely a roof/wall or rooflight field Elmhurst lodges in §11 (or the BP roof structure) that the extractor doesn't yet capture. Do NOT flip the U>3 heuristic to fix this fixture; it silently regresses 000516.
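The contradiction can be made concrete with a small consistency check: group observed windows by their extracted fields and flag any group whose worksheets disagree. The dict keys and the `conflicting_inputs` helper are illustrative and do not reflect the extractor's real schema; the two records are the ones described above.

```python
from collections import defaultdict

# The extractable fields named in the text; key names are illustrative only.
FIELDS = {
    "glazing": "Double pre 2002",
    "u": 3.1,
    "location": "External wall",
    "bp": "Main",
    "orient": "North-East",
}

# (extracted fields, cert, worksheet line that bills the window)
observations = [
    (FIELDS, "000516", "roof window (27a)"),
    (dict(FIELDS), "001431", "wall window (27)"),
]


def conflicting_inputs(observations):
    """Return field tuples that map to more than one worksheet classification."""
    seen = defaultdict(set)
    for fields, _cert, billed_as in observations:
        seen[tuple(sorted(fields.items()))].add(billed_as)
    return {key: labels for key, labels in seen.items() if len(labels) > 1}


conflicts = conflicting_inputs(observations)
print(len(conflicts))
```

Any non-empty result means no deterministic rule over these fields alone can reproduce both worksheets. A check like this could also be re-run once a new candidate signal is added to the field set, to see whether the conflict disappears.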

Cause 2 (tractable — a plain parsing miss)

The extractor produces 16 windows from 17 §11 rows — it drops the Double glazing, known data row (BFRC, lodged U=1.00 → adjusted 0.9615, 1st Extension, area 1.00; worksheet "Windows 12" on Ext1). The label maps fine (→3); the physical row just isn't extracted. Fixing this alone won't pin the fixture (Cause 1 still blocks) but it's a real, isolatable extractor bug.
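One way to isolate a dropped row is a multiset difference between the lodged §11 rows and the extractor output. The sketch below is a toy subset keyed on (glazing label, lodged U), using only rows named in this note; how the lodged rows would be transcribed, and the `dropped` helper itself, are assumptions for illustration.

```python
from collections import Counter

# Toy subset keyed on (glazing label, lodged U); not the full 17-row table.
lodged_rows = [
    ("Double pre 2002", 3.1),
    ("Double pre 2002", 3.4),
    ("Double glazing, known data", 1.0),
]
extracted_rows = [
    ("Double pre 2002", 3.1),
    ("Double pre 2002", 3.4),
]


def dropped(lodged, extracted):
    """Rows present in the lodged list but missing from the extracted list."""
    return list((Counter(lodged) - Counter(extracted)).elements())


missing = dropped(lodged_rows, extracted_rows)
print(missing)
```

Using `Counter` rather than a set keeps duplicate rows distinct, which matters when several windows share the same label and U-value.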

Tracer recipe (rebuild — the throwaway lived in /tmp)

# Run from the repo root with PYTHONPATH=/workspaces/model
import re
import subprocess
from pathlib import Path

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
from domain.sap10_calculator.rdsap.cert_to_inputs import (
    SAP_10_2_SPEC_PRICES, cert_to_inputs, cert_to_demand_inputs,
    heat_transmission_section_from_cert)
from domain.sap10_calculator.calculator import calculate_sap_from_inputs


def pages(pdf):
    """Return one string per PDF page, with one layout token per line."""
    info = subprocess.run(["pdfinfo", str(pdf)],
                          capture_output=True, text=True).stdout
    n = int(re.search(r"Pages:\s+(\d+)", info).group(1))
    out = []
    for i in range(1, n + 1):
        text = subprocess.run(
            ["pdftotext", "-layout", "-f", str(i), "-l", str(i), str(pdf), "-"],
            capture_output=True, text=True).stdout
        # Split each line on runs of 2+ spaces so every column cell is its own token.
        out.append("\n".join(tok for ln in text.splitlines()
                             for tok in re.split(r"\s{2,}", ln.strip()) if tok))
    return out


D = Path("sap worksheets/Recommendations Elmhurst Files/double_glazing/before")
sn = ElmhurstSiteNotesExtractor(pages(D / "Summary_001431 (1).pdf")).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
# len(sn.windows) == 16 (should be 17); len(epc.sap_windows) == 14 (2 → roof, 1 dropped)

Per-window A×U on the worksheet uses the ADJUSTED U, 1/(1/U_lodged + 0.04). The worksheet's §3 (27) lines sum to 60.5577, whereas our 14 windows sum to 56.090.
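The shortfall can be split arithmetically using only figures quoted in this note. Attributing the remainder to the two Double pre 2002 windows is an inference, not a verified result: it assumes the 14 windows we do extract match their worksheet lines exactly and that all three missing windows sit on (27) lines.

```python
def adjusted_u(u_lodged: float) -> float:
    """Curtain-corrected U: 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_lodged + 0.04)


worksheet_sum = 60.5577   # sum of the worksheet (27) lines
ours_sum = 56.090         # A×U from our 14 sap_windows

gap = worksheet_sum - ours_sum

# Cause 2: the dropped "Double glazing, known data" row (area 1.00, lodged U 1.00).
dropped_row = 1.00 * adjusted_u(1.00)

# Whatever is left would have to come from the two windows routed to roof
# (Cause 1), under the assumptions stated above.
remainder = gap - dropped_row

print(f"gap={gap:.4f} dropped_row={dropped_row:.4f} remainder={remainder:.4f}")
```

If Cause 2 is fixed in isolation, the A×U sum should therefore rise by about 0.9615 and leave the rest of the gap open.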