From 218840db987f65b14e776f2cd4edd08c06845149 Mon Sep 17 00:00:00 2001
From: Khalim Conn-Kowlessar
Date: Fri, 5 Jun 2026 09:47:29 +0000
Subject: [PATCH] docs: handover for the open window-extraction work on the double_glazing fixture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures the diagnosis so the next agent doesn't re-derive it: what's
done (S0380.235-237), what's confirmed correct (calculator U-adjustment,
party wall, glazing labels), the worksheet pin targets, and the two open
causes — crucially the 000516 trap (byte-identical Summary data
classified as a roof window there but a wall window here, so flipping
the U>3 rule regresses 000516). Includes a rebuildable tracer recipe.

Co-Authored-By: Claude Opus 4.8
---
 .../HANDOVER_GLAZING_WINDOW_EXTRACTION.md | 94 +++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 domain/sap10_calculator/docs/HANDOVER_GLAZING_WINDOW_EXTRACTION.md

diff --git a/domain/sap10_calculator/docs/HANDOVER_GLAZING_WINDOW_EXTRACTION.md b/domain/sap10_calculator/docs/HANDOVER_GLAZING_WINDOW_EXTRACTION.md
new file mode 100644
index 00000000..8f652381
--- /dev/null
+++ b/domain/sap10_calculator/docs/HANDOVER_GLAZING_WINDOW_EXTRACTION.md
@@ -0,0 +1,94 @@
+# Handover — double_glazing fixture: glazing done, window-extraction open
+
+Point-in-time note for the agent owning the Elmhurst window extractor /
+mapper. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for the 1e-4
+worksheet-pin methodology and the cascade pipeline.
+
+- **Branch:** `feature/per-cert-mapper-validation`. HEAD `8133521c`.
+- **Fixture:** the double_glazing "before" recommendation pair —
+  `sap worksheets/Recommendations Elmhurst Files/double_glazing/before/`
+  (`Summary_001431 (1).pdf` + `P960-0001-001431 - 2026-06-02T115533.961.pdf`).
+  The Summary is also committed as a test fixture:
+  `backend/documents_parser/tests/fixtures/Summary_001431_double_glazing.pdf`.
+- **Worksheet block to pin** (rating = block 1, region 0; demand = block 2,
+  postcode): SAP cont **57.2415**, cost (255) **1423.0955**, fabric (33)
+  **158.4548** / (37) **197.8463**; demand CO2 (272) **3486.0799**, PE (286)
+  **16796.5617**. (Blocks 3+ add PV/diverter — ignore for the "before" pin.)
+
+## Done this session (S0380.235-237) — DON'T redo
+
+| slice | what |
+|---|---|
+| **S0380.235** `3e45b7fa` | 5 missing Elmhurst §11 glazing labels → SAP10 Table 6b (`Secondary glazing`→7, `…- Normal emissivity`→11, `Triple pre 2002`→10, `Triple with unknown install date`→6, `Single glazing, known data`→15). |
+| **S0380.236** `ea35bed2` | Extension party-wall type read independently of "As Main Wall" (`_extract_extensions`): Main `CU`→0.5, Ext `U Unable to determine`→0.25. Worksheet **(32) party heat loss now exact** (32.573 vs 32.5725). |
+| **S0380.237** `8133521c` | `Secondary glazing - Low emissivity`→12. Double {1,2,3,7,13} + secondary {4,11,12} families now fully mapped. |
+
+**Confirmed already correct — do not touch:**
+- The calculator's window **U-adjustment `1/(1/U + 0.04)`** (SAP §3.2 curtain
+  correction) is exact: lodged 4.80→4.0268, 3.10→2.7580, 1.40→1.3258, all
+  match the worksheet to 1e-4. Our 14 extracted windows sum to **exactly
+  56.090**. The 1e-4 gap is NOT in the calculator.
+- Glazing label→code mapping (g_L is the only cascade effect; lodged U/g
+  drive §3/§6 via `_g_perpendicular` preferring the lodged value).
+
+## Open — current residual **SAP +1.13** (ours 58.37 vs ws 57.24), all in WINDOWS
+
+The Summary §11 lodges **17 physical window rows**; we end up with **14**
+`sap_windows`. Three windows are lost, in two distinct ways:
+
+### Cause 1 (HARD — read before touching `_is_elmhurst_roof_window`)
+The mapper routes the two `Double pre 2002` windows (lodged U 3.1 / 3.4) to
+**roof** windows via the `U > 3.0` backstop in
+`_is_elmhurst_roof_window` (`datatypes/epc/domain/mapper.py`, the final
+`return w.u_value > _ELMHURST_ROOF_WINDOW_U_THRESHOLD`). This fixture's
+worksheet bills them as **wall** windows (27).
+
+**The trap:** cohort cert **000516** has a window that is *byte-identical*
+in every extractable Summary field — `Double pre 2002`, U=3.1,
+`location="External wall"`, bp `Main`, orient `North-East` — and *its*
+worksheet bills it as a **roof** window (27a). Verified: gating the U>3
+rule on `location == "External wall"` makes this fixture pass but
+**breaks both 000516 pins** (`test_summary_000516_full_chain_…` and
+`test_from_elmhurst_site_notes_matches_hand_built_000516`).
+
+So identical Summary inputs are classified oppositely by the two
+worksheets. **No rule keyed on the fields we currently extract can satisfy
+both.** Resolving this needs a NEW disambiguating signal — likely a
+roof/wall or rooflight field Elmhurst lodges in §11 (or the BP roof
+structure) that the extractor doesn't yet capture. Do NOT flip the U>3
+heuristic to fix this fixture; it silently regresses 000516.
+
+### Cause 2 (tractable — a plain parsing miss)
+The extractor produces **16 windows from 17 §11 rows** — it drops the
+`Double glazing, known data` row (BFRC, lodged U=1.00 → adjusted 0.9615,
+1st Extension, area 1.00; worksheet "Windows 12" on Ext1). The label maps
+fine (→3); the physical row just isn't extracted. Fixing this alone won't
+pin the fixture (Cause 1 still blocks) but it's a real, isolatable
+extractor bug.
+
+## Tracer recipe (rebuild — the throwaway lived in /tmp)
+```python
+# from repo root, PYTHONPATH=/workspaces/model
+import re, subprocess; from pathlib import Path
+from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
+from datatypes.epc.domain.mapper import EpcPropertyDataMapper
+from domain.sap10_calculator.rdsap.cert_to_inputs import (
+    SAP_10_2_SPEC_PRICES, cert_to_inputs, cert_to_demand_inputs,
+    heat_transmission_section_from_cert)
+from domain.sap10_calculator.calculator import calculate_sap_from_inputs
+def pages(pdf):
+    n=int(re.search(r"Pages:\s+(\d+)",subprocess.run(["pdfinfo",str(pdf)],
+        capture_output=True,text=True).stdout).group(1)); out=[]
+    for i in range(1,n+1):
+        L=subprocess.run(["pdftotext","-layout","-f",str(i),"-l",str(i),str(pdf),"-"],
+            capture_output=True,text=True).stdout
+        out.append("\n".join(tok for ln in L.splitlines()
+            for tok in re.split(r"\s{2,}",ln.strip()) if tok))
+    return out
+D=Path("sap worksheets/Recommendations Elmhurst Files/double_glazing/before")
+sn=ElmhurstSiteNotesExtractor(pages(D/"Summary_001431 (1).pdf")).extract()
+epc=EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
+# len(sn.windows)==16 (should be 17); len(epc.sap_windows)==14 (2 → roof, 1 dropped)
+```
+Per-window A×U on the worksheet uses the ADJUSTED U `1/(1/U_lodged+0.04)`;
+sum the §3 `(27)` lines to 60.5577 (we get 56.090 from 14 windows).
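
Reviewer aside, outside the hunk: the adjusted-U arithmetic quoted in the handover can be re-checked in isolation with a minimal sketch like the one below. It assumes only the formula `1/(1/U + 0.04)` and the lodged/adjusted pairs stated in the note. The names `curtain_adjusted_u`, `window_heat_loss` and `QUOTED_PAIRS` are hypothetical and are not taken from the calculator's code.

```python
# Standalone check of the adjusted-U formula quoted in the handover.
# All names here are illustrative; none come from the repo.

def curtain_adjusted_u(u_lodged: float) -> float:
    """Return the adjusted window U-value, 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_lodged + 0.04)


def window_heat_loss(area_m2: float, u_lodged: float) -> float:
    """Per-window A x U using the adjusted U, as the worksheet does."""
    return area_m2 * curtain_adjusted_u(u_lodged)


# Lodged -> adjusted pairs exactly as quoted in the handover note.
QUOTED_PAIRS = {4.80: 4.0268, 3.10: 2.7580, 1.40: 1.3258, 1.00: 0.9615}

if __name__ == "__main__":
    for lodged, quoted in QUOTED_PAIRS.items():
        computed = curtain_adjusted_u(lodged)
        print(f"lodged {lodged:.2f}: computed {computed:.4f}, quoted {quoted:.4f}")
    # The dropped Cause 2 row: area 1.00 at lodged U = 1.00.
    print(f"dropped row A x U: {window_heat_loss(1.00, 1.00):.4f}")
```

A per-row table of `window_heat_loss` values for all 17 §11 rows would be the natural next step for locating the gap between 56.090 and 60.5577, but the row areas are not in this note, so the sketch stops at the formula.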