docs: handover for the open window-extraction work on the double_glazing fixture

Captures the diagnosis so the next agent doesn't re-derive it: what's done
(S0380.235-237), what's confirmed correct (calculator U-adjustment, party
wall, glazing labels), the worksheet pin targets, and the two open causes —
crucially the 000516 trap (byte-identical Summary data classified as a roof
window there but a wall window here, so flipping the U>3 rule regresses
000516). Includes a rebuildable tracer recipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-06-05 09:47:29 +00:00
parent 14a27c7a61
commit 218840db98


@ -0,0 +1,94 @@
# Handover — double_glazing fixture: glazing done, window-extraction open
Point-in-time note for the agent owning the Elmhurst window extractor /
mapper. Start from [`AGENT_GUIDE.md`](AGENT_GUIDE.md) for the 1e-4
worksheet-pin methodology and the cascade pipeline.
- **Branch:** `feature/per-cert-mapper-validation`. HEAD `8133521c`.
- **Fixture:** the double_glazing "before" recommendation pair —
`sap worksheets/Recommendations Elmhurst Files/double_glazing/before/`
(`Summary_001431 (1).pdf` + `P960-0001-001431 - 2026-06-02T115533.961.pdf`).
The Summary is also committed as a test fixture:
`backend/documents_parser/tests/fixtures/Summary_001431_double_glazing.pdf`.
- **Worksheet block to pin** (rating = block 1, region 0; demand = block 2,
postcode): SAP cont **57.2415**, cost (255) **1423.0955**, fabric (33)
**158.4548** / (37) **197.8463**; demand CO2 (272) **3486.0799**, PE (286)
**16796.5617**. (Blocks 3+ add PV/diverter — ignore for the "before" pin.)
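Here is a minimal sketch of what "pinning" this block at 1e-4 amounts to. It assumes the cascade outputs have already been collected into a flat dict. `WORKSHEET_PINS`, its keys and `pin_failures` are hypothetical names for illustration, not the repo's test helpers (AGENT_GUIDE.md describes the real methodology).

```python
# Hypothetical pin table: values copied from the double_glazing "before" worksheet.
# Keys are illustrative labels, not identifiers from the codebase.
WORKSHEET_PINS = {
    "sap_continuous": 57.2415,  # rating block: SAP cont
    "cost_255": 1423.0955,      # rating block: line (255)
    "fabric_33": 158.4548,      # line (33)
    "fabric_37": 197.8463,      # line (37)
    "co2_272": 3486.0799,       # demand block: line (272)
    "pe_286": 16796.5617,       # demand block: line (286)
}
TOLERANCE = 1e-4


def pin_failures(actual: dict[str, float],
                 pins: dict[str, float] = WORKSHEET_PINS,
                 tol: float = TOLERANCE) -> dict[str, float]:
    """Return {key: actual - expected} for every pin missed by more than tol."""
    failures = {}
    for key, expected in pins.items():
        delta = actual[key] - expected
        if abs(delta) > tol:
            failures[key] = delta
    return failures


# Illustrative input: only the SAP value is moved to our current 58.37.
current = dict(WORKSHEET_PINS, sap_continuous=58.37)
print(pin_failures(current))
```

The function returns the signed deltas instead of stopping at the first miss. That way every failing line is visible at once while a residual is still being chased.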
## Done this session (S0380.235-237) — DON'T redo
| Slice (commit) | Change |
|---|---|
| **S0380.235** `3e45b7fa` | 5 missing Elmhurst §11 glazing labels → SAP10 Table 6b (`Secondary glazing`→7, `…- Normal emissivity`→11, `Triple pre 2002`→10, `Triple with unknown install date`→6, `Single glazing, known data`→15). |
| **S0380.236** `ea35bed2` | Extension party-wall type read independently of "As Main Wall" (`_extract_extensions`): Main `CU`→0.5, Ext `U Unable to determine`→0.25. Worksheet **(32) party heat loss now exact** (32.573 vs 32.5725). |
| **S0380.237** `8133521c` | `Secondary glazing - Low emissivity`→12. Double {1,2,3,7,13} + secondary {4,11,12} families now fully mapped. |
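For orientation, the labels this note spells out in full can be read as a lookup table. This is a minimal sketch that assumes an exact string match after trimming whitespace. `ELMHURST_GLAZING_TO_TABLE_6B` and `glazing_code` are hypothetical names, and this is not the mapper's actual table. The table is deliberately partial: the `…- Normal emissivity` label is left out because its prefix is elided above.

```python
# Hypothetical, partial lookup built only from labels quoted in full in this note.
ELMHURST_GLAZING_TO_TABLE_6B: dict[str, int] = {
    "Secondary glazing": 7,
    "Secondary glazing - Low emissivity": 12,
    "Triple pre 2002": 10,
    "Triple with unknown install date": 6,
    "Single glazing, known data": 15,
    "Double glazing, known data": 3,
}


def glazing_code(label: str) -> int:
    """Map an Elmhurst §11 glazing label to a SAP10 Table 6b code, failing loudly."""
    try:
        return ELMHURST_GLAZING_TO_TABLE_6B[label.strip()]
    except KeyError:
        # Raise instead of defaulting, so an unmapped label surfaces as an error.
        raise ValueError(f"unmapped Elmhurst glazing label: {label!r}") from None


print(glazing_code("Triple pre 2002"))  # prints 10
```

Raising on an unknown label, instead of falling back to a default code, is a choice made for this sketch. With it, an unmapped label shows up as an error rather than as a quiet g_L difference.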

**Confirmed already correct (do not touch):**
- The calculator's window **U-adjustment `1/(1/U + 0.04)`** (SAP §3.2 curtain
correction) is exact: lodged 4.80→4.0268, 3.10→2.7580 and 1.40→1.3258 all
match the worksheet to 1e-4, and Σ A×U over our 14 `sap_windows` comes to
exactly **56.090**. The gap to the 1e-4 pin is NOT in the calculator.
- Glazing label→code mapping (g_L is the code's only cascade effect; the lodged
U and g drive §3 and §6, with `_g_perpendicular` preferring the lodged value).
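The U-adjustment is small enough to check by hand. In this minimal sketch, `curtain_adjusted_u` is a hypothetical name, not the calculator's function. It reproduces the lodged→adjusted pairs quoted in this note, including the 1.00→0.9615 window from Cause 2.

```python
def curtain_adjusted_u(u_lodged: float) -> float:
    """Window U-value with an extra 0.04 m²K/W resistance: 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_lodged + 0.04)


for u in (4.80, 3.10, 1.40, 1.00):
    print(f"{u:.2f} -> {curtain_adjusted_u(u):.4f}")
# 4.80 -> 4.0268
# 3.10 -> 2.7580
# 1.40 -> 1.3258
# 1.00 -> 0.9615
```

Algebraically the adjustment equals `U / (1 + 0.04·U)`. It therefore always lowers U, and it bites hardest on the worst windows: about 16% at 4.80, but under 4% at 1.00.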
## Open: current residual **SAP +1.13** (ours 58.37 vs worksheet 57.24), all in WINDOWS
The Summary §11 lodges **17 physical window rows**; we end up with **14**
`sap_windows`. Three windows are lost, in two distinct ways:
### Cause 1 (HARD — read before touching `_is_elmhurst_roof_window`)
The mapper routes the two `Double pre 2002` windows (lodged U 3.1 / 3.4) to
**roof** windows via the `U > 3.0` backstop in
`_is_elmhurst_roof_window` (`datatypes/epc/domain/mapper.py`, the final
`return w.u_value > _ELMHURST_ROOF_WINDOW_U_THRESHOLD`). This fixture's
worksheet bills them as **wall** windows (27).

**The trap:** cohort cert **000516** has a window that is *byte-identical* to
this fixture's U=3.1 window in every extractable Summary field (`Double pre 2002`,
U=3.1, `location="External wall"`, building part `Main`, orientation
`North-East`), yet *its* worksheet bills it as a **roof** window (27a).
Verified: gating the U>3 rule so it does not fire for `location == "External wall"`
makes this fixture pass but **breaks both 000516 pins**
(`test_summary_000516_full_chain_…` and
`test_from_elmhurst_site_notes_matches_hand_built_000516`).

So identical Summary inputs are classified oppositely by the two
worksheets. **No rule keyed on the fields we currently extract can satisfy
both.** Resolving this needs a NEW disambiguating signal, likely a
roof/wall or rooflight field Elmhurst lodges in §11 (or the BP roof
structure) that the extractor doesn't yet capture. Do NOT flip the U>3
heuristic to fix this fixture; it silently regresses 000516.
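The "no rule can satisfy both" argument can be made mechanical. Key each observed window on the fields the extractor currently captures, then look for keys that different worksheets bill differently. Any classifier that is a pure function of those fields returns one answer per key, so a single conflicting key rules out every such classifier at once.

This is a minimal sketch. `WindowKey`, `conflicting_keys` and both predicates are hypothetical. The predicates model only the final U>3 backstop of `_is_elmhurst_roof_window` (threshold 3.0, per this note), not its earlier checks.

```python
from collections import defaultdict
from typing import NamedTuple


class WindowKey(NamedTuple):
    """Hypothetical key: the §11 fields the extractor currently captures."""
    glazing: str
    u_value: float
    location: str
    building_part: str
    orientation: str


def conflicting_keys(observations):
    """Return {key: classes} for keys billed as more than one window class."""
    classes = defaultdict(set)
    for key, billed_as in observations:
        classes[key].add(billed_as)
    return {key: seen for key, seen in classes.items() if len(seen) > 1}


def roof_by_u(k: WindowKey) -> bool:
    # Simplified current backstop: any U above 3.0 becomes a roof window.
    return k.u_value > 3.0


def roof_by_u_gated(k: WindowKey) -> bool:
    # The rejected variant: never fire the backstop for External wall windows.
    return k.u_value > 3.0 and k.location != "External wall"


# The same extracted fields, as quoted above for both certificates.
shared = WindowKey("Double pre 2002", 3.1, "External wall", "Main", "North-East")
observations = [(shared, "wall"), (shared, "roof")]  # 001431 (27) vs 000516 (27a)

print(conflicting_keys(observations))
for rule in (roof_by_u, roof_by_u_gated):
    # One boolean per key, so each rule can match only one of the two worksheets.
    print(rule.__name__, "-> roof" if rule(shared) else "-> wall")
```

Adding a genuinely new field to `WindowKey` is the only way to make the conflict disappear; re-tuning either predicate cannot. One example would be a roof/rooflight flag, if Elmhurst lodges one.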
### Cause 2 (tractable — a plain parsing miss)
The extractor produces **16 windows from 17 §11 rows**: it drops the
`Double glazing, known data` row (BFRC, lodged U=1.00 → adjusted 0.9615,
1st Extension, area 1.00 m²; worksheet "Windows 12" on Ext1). The label maps
fine (→3); the physical row just isn't extracted. Fixing this alone won't
pin the fixture (Cause 1 still blocks) but it's a real, isolatable
extractor bug.
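A cheap guard against this kind of miss is a multiset diff between the rows lodged in §11 and the windows the extractor returns. This is a minimal sketch. `missing_rows` is a hypothetical helper, and the data is a two-row excerpt keyed on (glazing label, building part), taken from rows this note describes, not the fixture's full 17 rows.

```python
from collections import Counter


def missing_rows(lodged, extracted):
    """Rows lodged in §11 with no extracted counterpart (multiset difference)."""
    return Counter(lodged) - Counter(extracted)


# Two-row excerpt; a real check would use all 17 §11 rows and whatever key
# the extractor can reproduce reliably.
lodged = [("Double pre 2002", "Main"), ("Double glazing, known data", "1st Extension")]
extracted = [("Double pre 2002", "Main")]
print(missing_rows(lodged, extracted))
# Counter({('Double glazing, known data', '1st Extension'): 1})
```

`Counter` subtraction discards zero and negative counts, so this direction reports only dropped rows. Calling `missing_rows(extracted, lodged)` would catch duplicated ones.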
## Tracer recipe (rebuild — the throwaway lived in /tmp)
```python
# From the repo root, with PYTHONPATH=/workspaces/model.
# Needs pdfinfo and pdftotext (poppler-utils) on PATH.
import re
import subprocess
from pathlib import Path

from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
# Cascade entry points; not exercised by the extraction check below.
from domain.sap10_calculator.rdsap.cert_to_inputs import (
    SAP_10_2_SPEC_PRICES, cert_to_inputs, cert_to_demand_inputs,
    heat_transmission_section_from_cert)
from domain.sap10_calculator.calculator import calculate_sap_from_inputs


def pages(pdf):
    """Return one string per PDF page, one layout-column token per line."""
    info = subprocess.run(["pdfinfo", str(pdf)],
                          capture_output=True, text=True, check=True).stdout
    n = int(re.search(r"Pages:\s+(\d+)", info).group(1))
    out = []
    for i in range(1, n + 1):
        # Dump page i alone in layout mode to stdout ("-").
        text = subprocess.run(
            ["pdftotext", "-layout", "-f", str(i), "-l", str(i), str(pdf), "-"],
            capture_output=True, text=True, check=True).stdout
        # Split each line on runs of 2+ spaces (column gaps); drop empty tokens.
        out.append("\n".join(tok for ln in text.splitlines()
                             for tok in re.split(r"\s{2,}", ln.strip()) if tok))
    return out


D = Path("sap worksheets/Recommendations Elmhurst Files/double_glazing/before")
sn = ElmhurstSiteNotesExtractor(pages(D / "Summary_001431 (1).pdf")).extract()
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
# len(sn.windows)==16 (should be 17); len(epc.sap_windows)==14 (2 → roof, 1 dropped)
```
Per-window A×U on the worksheet uses the ADJUSTED U, `1/(1/U_lodged + 0.04)`;
the §3 `(27)` lines sum to 60.5577, against our 56.090 from 14 windows.
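The two totals also bound what Cause 1 is worth. This minimal sketch assumes the worksheet's `(27)` lines are exactly our 14 windows plus the three missing ones, each at its lodged area and adjusted U. Under that assumption, whatever remains after the Cause 2 window (A = 1.00, U 1.00 → 0.9615) is the combined A×U the two `Double pre 2002` windows should contribute. The helper `adjusted_u` is hypothetical.

```python
def adjusted_u(u_lodged: float) -> float:
    """Curtain-adjusted window U-value, 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_lodged + 0.04)


WS_27_SUM = 60.5577   # worksheet: sum of the §3 (27) A×U lines
OURS_14_SUM = 56.090  # ours: sum of A×U over the 14 sap_windows

gap = WS_27_SUM - OURS_14_SUM        # A×U missing from our (27) total
cause2 = 1.00 * adjusted_u(1.00)     # dropped "known data" row, area 1.00
cause1 = gap - cause2                # left for the two Double pre 2002 rows

print(f"gap={gap:.4f} cause2={cause2:.4f} cause1={cause1:.4f}")
# gap=4.4677 cause2=0.9615 cause1=3.5062
```

Once the two windows' areas are read off the Summary, their summed A × adjusted U (lodged U 3.1 and 3.4) should land close to 3.5062. A clear miss would mean something other than the three lost windows differs in `(27)`.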