Handover: 4 cert-001479 slices in (54-57); gap at +7.62 SAP; non-fabric next

Update NEXT_AGENT_PROMPT.md with current branch state for cert 001479
work. Slices 54-57 closed Elmhurst-side mapper gaps surfaced by the
cross-mapper diff against the new GOV.UK API counterpart:

  54: extensions_count from len(survey.extensions)
  55: party-wall code "CU" → cavity unfilled U=0.5
  56: floor "E To external air" → u_exposed_floor (Table 20)
  57: PS sloping-ceiling + As Built + pre-1950 → thickness=0 → U=2.30

Per-bp fabric U-values all match worksheet exactly now. Cascade SAP
went 63.17 → 61.39 (gap widened to +7.62) as each fix exposed
previously-masked over-counting elsewhere; per-data-correct moves.

Remaining ~15 W/K HLC gap (HLP cascade 2.235 vs worksheet 3.127)
lives in non-fabric: living_area_fraction TFA convention, internal
gains, secondary heating SAP-code wiring, possibly thermal bridging
and ventilation HLC.

Documents one source-data caveat: Summary §3 says Ext1 age "M 2023
onwards", worksheet header says "Ext1: L" — assessor inconsistency;
trust Summary per session policy.

758 cohort tests + cert-001479 structural pins green; pyright net-zero
on touched files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-24 22:41:24 +00:00
parent 7a9a8b7ebe
commit a0d9d09410


@ -1,62 +1,198 @@
# Handover — Elmhurst Summary→SAP cohort all closed at 1e-4
# Handover — wire up API↔Elmhurst↔Calculator parity test for cert 0535-9020-6509-0821-6222
You are picking up branch `ara-backend-design-prd` after the 6-fixture
Elmhurst Summary→SAP validation chain landed end-to-end.
Elmhurst Summary→SAP validation chain landed end-to-end at 1e-4. The
**next workstream** is the project's actual end goal: prove the API
mapper produces the same result as the Elmhurst-site-notes mapper and
both run cleanly through the calculator.
## State
## The end goal (per the user)
All 6 cohort fixtures pass at the **1e-4** unrounded-SAP tolerance.
> Data from the API → `EpcPropertyData` → SAP10 calculator, matching the
> API-published SAP rating to within ±0.5 (the API publishes rounded
> integer SAPs).
```
Cert Mapped SAP Target SAP Δ State
000474 62.2584 62.2584 0.0000 ✓ Slice 47
000477 65.0057 65.0057 0.0000 ✓ Slice 52
000480 61.2986 61.2986 0.0000 ✓ Slice 50
000487 61.6431 61.6431 0.0000 ✓ Slice 53
000490 57.3979 57.3979 0.0000 ✓ Slice 49
000516 62.7937 62.7937 0.0000 ✓ Slice 51
```
The Elmhurst Summary→SAP chain is now closed at 1e-4 across 6 fixtures
(`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`,
all 8 tests green; Slices 47-53). That gives us a **calibrated
alternate route** into `EpcPropertyData` for the same physical
dwelling, which means we can validate the API mapper by **comparing
its `EpcPropertyData` output against the Elmhurst mapper's output for
the same cert**.
Forcing functions in
`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`:
8 tests (6 chain pins + 2 structural pins) all green. Wider regression:
**758 tests pass**, pyright net-zero (35 errors baseline, no change).
## The new resource (cert 001479)
## What landed (Slices 47-53)
A single dwelling now has **all three** artefacts:
| Slice | Commit | What |
|---|---|---|
| 47 | `29ab80b0` | `main_heating_category=2` → pumps_fans 130→160; window-gap partition on glazing-type marker (W4/W5 orient). **Closed 000474.** |
| 48 | `00a27efd` | Extractor: combined `Wood 0.70` frame line; data anchor with trailing glazing-type; partition fallback on second orient-token. 5 fixture PDFs copied to `backend/documents_parser/tests/fixtures/`. |
| 49 | `7f17de84` | `MainHeating.secondary_heating_sap_code` extracted from §14.1 sub-section; RdSAP §S5 sheltered-sides from built_form. **Closed 000490.** |
| 50 | `598f0408` | `ElmhurstSiteNotes.RoomInRoof` + `RoomInRoofSurface` with §3.10 detailed-surface support; `_map_elmhurst_room_in_roof` builds `SapRoomInRoof.detailed_surfaces`. Party-wall construction code mapping. Roof "N None" → 0mm. `number_baths` lodgement. **Closed 000480.** |
| 51 | `cb4e31a1` | Roof-window separation by U-value threshold (>3.0); Table 24 lookup for raw U (Double pre 2002 → 3.40). `SapWindow.window_width × window_height = lodged Area` convention. **Closed 000516.** |
| 52 | `4ccf9c97` | RR detailed-surface area rounded half-up via Decimal (6.585 → 6.59); suspended-timber-floor heuristic (RIR < storey area → True); electric-vs-mixer shower from outlet_type. **Closed 000477.** |
| 53 | `58088c10` | `WallDetails.alternative_walls` extraction + `SapAlternativeWall` mapper plumbing; "TI Timber Frame" code mapping; `Thickness Unknown: Yes` → cascade thickness=None (TF age B uninsulated → U=1.9 default). **Closed 000487.** |
| Path | What |
|---|---|
| `sap worksheets/lodged example/Summary_001479.pdf` | Elmhurst Summary site-notes PDF |
| `sap worksheets/lodged example/P960-0001-001479.pdf` | Elmhurst Calculator worksheet output |
| GOV.UK EPB API certificate `0535-9020-6509-0821-6222` | The published cert |
## Architecture summary
- Worksheet PDF lodges unrounded SAP **69.0094** (line "SAP value") →
rating **C 69** (rounded integer published in §11a + the API).
- Summary PDF current SAP rating: **C 69**, Potential **C 76**, Fuel
Bill £1056, Emissions 2.509 tonnes.
- Surveyor P960-0001 (Richard Matthew Ratcliff); Inspection 29/10/2025;
processed 31/10/2025; postcode PR1 0LX; UPRN A005608690 (note: starts
with `A`, may be a placeholder); 67 Howick Park Drive, Penwortham,
Preston.
- `Lodgement Required: Yes` — distinguishes this cert from the other
6 cohort certs (U985 surveyor) where `Lodgement Required: No`. This
one was actually pushed to the GOV.UK EPB API, hence the cert
reference.
**Two-path validation**:
- Path A — hand-built `_elmhurst_worksheet_NNNNNN.py` fixtures build
`EpcPropertyData` directly. Pins the cascade against the
worksheet PDF's line refs (1a..486).
- Path B — `Summary_NNNNNN.pdf → ElmhurstSiteNotes →
EpcPropertyData → cascade → SAP`. Pins the mapper + extractor
against the worksheet PDF's unrounded SAP rating (line 257).
There's a separate folder `sap worksheets/extended test case/` with
`Summary_000565.pdf` and `U985-0001-000565.pdf` — those are
**not** the right pair for this workstream (no API counterpart). The
user clarified the source mid-handover; the correct location is
`sap worksheets/lodged example/`.
Both paths run the same `calculate_sap_from_inputs` cascade and now
both close to 1e-4 across the 6 cohort fixtures.
## The 5-step plan
The user is explicit on the workflow:
1. **Fetch the API response** for cert `0535-9020-6509-0821-6222`.
The existing client is at `backend/epc_client/epc_client_service.py`:
```python
from backend.epc_client.epc_client_service import EpcClientService
service = EpcClientService(auth_token=os.environ["OPEN_EPC_API_TOKEN"])
epc_from_api = service.get_by_certificate_number("0535-9020-6509-0821-6222")
```
Cache the raw JSON to
`packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json`
so tests run reproducibly without a token — that's the pattern other
golden fixtures already use (`0240-0200-…`, `0300-2747-…`, etc.).
2. **Map the API response to `EpcPropertyData`** via the existing
`EpcPropertyDataMapper.from_api_response(raw_json)`. Only RdSAP-
Schema-21.0.0 / 21.0.1 are supported today; this cert (Elmhurst
RdSAP10, processed Oct 2025) is almost certainly 21.0.1 — verify.
3. **Map the Summary PDF to `EpcPropertyData`** via the new chain:
```python
from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
from datatypes.epc.domain.mapper import EpcPropertyDataMapper
# use _summary_pdf_to_textract_style_pages helper from
# backend/documents_parser/tests/test_summary_pdf_mapper_chain.py
pages = _summary_pdf_to_textract_style_pages(summary_pdf_path)
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
epc_from_site_notes = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
```
4. **Compare the two `EpcPropertyData` objects field-by-field.** Any
difference is either (a) a mapper-coverage gap on one side or (b)
data the API doesn't publish (which would be a nightmare — the user
flagged this explicitly). Surface every diff; classify and fix.
5. **Pass both through the calculator** and assert:
- `calculate_sap_from_inputs(cert_to_inputs(epc_from_api))`'s
unrounded SAP is within **±0.5** of the API-published rounded
SAP (69). The 0.5 tolerance is the API-cert convention — the
published integer is rounded, so half a SAP point is just
rounding noise.
- `calculate_sap_from_inputs(cert_to_inputs(epc_from_site_notes))`
matches the worksheet PDF's unrounded SAP **69.0094** to **1e-4**
(extending the existing
`test_summary_pdf_mapper_chain.py` cohort pattern to this 7th
fixture).
- **The two cascade outputs match each other to ≤ 1e-4** when the
mappers are fully aligned — this is the load-bearing parity
proof the user is after.
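The three assertions in step 5 reduce to plain tolerance checks. The sketch below is illustrative only: `within_tolerance` and `parity_checks` are hypothetical names, not existing helpers, and the real tests would take their SAP values from `calculate_sap_from_inputs` rather than from literals.

```python
# Hypothetical helpers for illustration; none of these names exist in the repo.
API_PUBLISHED_SAP = 69       # rounded integer published by the API
WORKSHEET_SAP = 69.0094      # unrounded SAP lodged in the worksheet PDF
API_TOL = 0.5                # API path: rounding noise on the published integer
CHAIN_TOL = 1e-4             # Elmhurst path and cross-mapper parity


def within_tolerance(actual: float, expected: float, tol: float) -> bool:
    """Project convention: abs(diff) <= tol rather than pytest.approx."""
    return abs(actual - expected) <= tol


def parity_checks(sap_from_api: float, sap_from_site_notes: float) -> dict[str, bool]:
    """Evaluate the three step-5 conditions for one pair of cascade outputs."""
    return {
        "api_vs_published": within_tolerance(
            sap_from_api, API_PUBLISHED_SAP, API_TOL
        ),
        "site_notes_vs_worksheet": within_tolerance(
            sap_from_site_notes, WORKSHEET_SAP, CHAIN_TOL
        ),
        "api_vs_site_notes": within_tolerance(
            sap_from_api, sap_from_site_notes, CHAIN_TOL
        ),
    }
```

Keeping the three conditions separate makes it obvious which one fails: two mappers can agree with each other to 1e-4 while both still miss the worksheet.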
## Existing infra you should lean on
- **`packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py`**
is the canonical API→SAP residual test pattern. It loads
`fixtures/golden/<cert_number>.json`, runs
`from_api_response → cert_to_inputs → calculate_sap_from_inputs`,
and pins the residual `(calc_sap - lodged_sap)`. The new cert
belongs in this file's `_EXPECTATIONS` tuple.
- **`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`**
is the Elmhurst Summary→SAP pinning pattern. All 6 cohort certs use
this. The new cert (001479) needs a 7th `test_summary_001479_full_
chain_sap_matches_worksheet_pdf_exactly` here pinned at 1e-4 vs
69.0094.
- **The cross-mapper diff** is genuinely new — there's no existing
test that asserts `from_api_response(json) == from_elmhurst_site_
notes(pdf)` for the same cert. You'll be writing it from scratch.
Consider a dedicated test file
`backend/documents_parser/tests/test_api_vs_elmhurst_parity.py`
asserting field-level equivalence (and cascade-output equivalence)
for the cert `001479 / 0535-9020-6509-0821-6222`.
## Conventions you must honour (project memory)
- AAA test convention: every new test uses literal `# Arrange / # Act
/ # Assert` headers.
- `abs(diff) <= tol` not `pytest.approx` (strict pyright).
- One slice = one commit; stage by name.
- 1e-4 tolerance, no widening, no xfail.
- Strict pyright net-zero on every commit (35-error baseline on these files).
- `abs(diff) <= tol` not `pytest.approx` (strict pyright; pytest.approx
is partially-unknown and costs a pyright error).
- One slice = one commit; stage by name. Don't stage `?? non_intrusive_
photos/`, `?? kwh_client_for_deletion.pkl`, etc. — pre-existing
untracked junk.
- **1e-4 tolerance for the Elmhurst path; 0.5 tolerance for the API
path.** Memory entry `feedback_e2e_validation_philosophy`: component
pins at <1e-3; SAP integer must hit delta=0 vs PDF; **adaptive
ceilings forbidden**.
- Strict pyright net-zero on every commit (35-error baseline across the
Elmhurst+mapper files).
- The user has firmly rejected widening/xfail in the past. If the
mappers disagree, fix the underlying gap — don't loosen the test.
## Probe scripts (in `/tmp`, regenerated as needed)
## What's already done (Slices 47-53)
The Elmhurst extractor + mapper now handle:
- Multi-bp dwellings (Main + N extensions); per-bp dimensions, walls,
roofs, floors.
- Room-in-Roof (`SapRoomInRoof.detailed_surfaces`) with §3.10 detailed
Flat Ceiling / Stud Wall / Slope / Gable Wall / Gable-Wall-External
surfaces, Decimal-based round-half-up area rounding.
- Window parser handling 3 §11 layout variants (separate frame_type/
factor; combined `Wood 0.70`; trailing glazing-type on data line;
unprefixed frame_factor-only line).
- Roof-window separation by U > 3.0 with Table 24 raw-U lookup.
- `window_width × window_height = lodged Area` convention to avoid
W×H reconstruction drift.
- Alternative-wall extraction with "Thickness Unknown" → cascade-
default U routing (TF age B uninsulated → U=1.9 for thin timber).
- Secondary heating SAP code from §14.1 Main Heating2 sub-section;
RdSAP §S5 sheltered-sides from built-form; party-wall construction
codes ("U", "S"); suspended-timber-floor heuristic; electric-vs-
mixer shower from outlet_type; `number_baths` lodgement; `main_
heating_category=2` for pumps_fans; roof "N None" → 0mm thickness.
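The Decimal-based round-half-up mentioned for Room-in-Roof areas can be illustrated with the standard library. This is a minimal sketch, not the mapper's code: `round_half_up` is a hypothetical name, and it assumes the lodged area is available as a string so that no binary-float error creeps in before rounding.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: str = "0.01") -> Decimal:
    """Round a decimal string half-up to the precision given by `places`."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


# The Slice 52 example: 6.585 rounds up to 6.59 at 2 d.p.
area = round_half_up("6.585")
```

Constructing the `Decimal` from a string rather than from a float is the point of the technique: the value being rounded is then exactly 6.585.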
If the diff in step 4 surfaces a gap on the **API mapper** side, the
fix may need to mirror one of the above — the API schema fields are
already in `EpcPropertyData` (most paths feed through it), but the
`from_rdsap_schema_21_0_1` mapper may not be wiring everything.
If the diff surfaces a gap on the **Elmhurst mapper** side, the
recently-landed work probably already covers the analogous field for
one of the 6 cohort fixtures — extend, don't reinvent.
## Likely outcomes / risks
- **Best case**: both mappers produce equivalent `EpcPropertyData` for
cert 001479; both cascade to ≈ 69.0094 SAP; the API target (69) is
hit to within 0.5; you write the parity test and ship a clean slice.
- **Likely case**: there are a handful of small mapping divergences
(e.g. one mapper sets a default that the other extracts; one
rounds a 2-d.p. value differently). Each is a slice; close them
  systematically using the cohort patterns from Slices 47-53.
- **Worst case (the nightmare the user flagged)**: the API simply
doesn't publish a field that the Elmhurst Summary PDF does (e.g.
measured alt-wall U-values, certain Room-in-Roof gable-type
flags). In that case, document the gap clearly and either accept
the resulting SAP drift (within 0.5) or escalate to the user —
don't paper over with widened tolerances.
## Probe scripts (regenerate in `/tmp` as needed)
The Elmhurst session used these heavily; you'll want analogues:
```bash
# Cohort SAP delta — verify nothing has regressed
@ -66,34 +202,156 @@ python /tmp/probe_all.py
python /tmp/diff_objects.py 000487
```
## What's NEXT for future work
For the new workflow, you'll want a probe that:
1. Loads the API JSON + Summary PDF for the same cert.
2. Maps both → `EpcPropertyData`.
3. Diffs them field-by-field.
4. Cascades both and prints both unrounded SAPs alongside the
worksheet PDF's lodged value (69.0094).
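Step 3 of that probe (the field-by-field diff) might look like the sketch below. It assumes `EpcPropertyData` and its nested types are dataclasses, which this handover does not confirm; `diff_fields`, `Part` and `Dwelling` are hypothetical stand-ins used only to keep the example self-contained.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def diff_fields(left: Any, right: Any, path: str = "") -> list[tuple[str, Any, Any]]:
    """Return (path, left_value, right_value) for every leaf that differs."""
    # Recurse into matching dataclass instances, field by field.
    if is_dataclass(left) and is_dataclass(right) and type(left) is type(right):
        diffs: list[tuple[str, Any, Any]] = []
        for f in fields(left):
            child = f"{path}.{f.name}" if path else f.name
            diffs.extend(
                diff_fields(getattr(left, f.name), getattr(right, f.name), child)
            )
        return diffs
    # Recurse into equal-length sequences, element by element.
    if (
        isinstance(left, (list, tuple))
        and isinstance(right, (list, tuple))
        and len(left) == len(right)
    ):
        diffs = []
        for i, (lhs, rhs) in enumerate(zip(left, right)):
            diffs.extend(diff_fields(lhs, rhs, f"{path}[{i}]"))
        return diffs
    # Leaves, mismatched types, and sequences of different length.
    return [] if left == right else [(path, left, right)]


# Stand-in types for illustration only.
@dataclass
class Part:
    wall_u: float
    roof_u: float


@dataclass
class Dwelling:
    extensions_count: int
    parts: list[Part]


from_api = Dwelling(extensions_count=2, parts=[Part(0.70, 2.30)])
from_pdf = Dwelling(extensions_count=0, parts=[Part(0.70, 0.14)])
for field_path, api_value, pdf_value in diff_fields(from_api, from_pdf):
    print(f"{field_path}: api={api_value!r} pdf={pdf_value!r}")
```

Reporting a path per differing leaf, rather than a single boolean, lets each divergence be triaged as its own slice.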
The Elmhurst Summary→SAP chain is closed. Likely future directions:
## First actions
- **Pin against more Elmhurst worksheets** beyond the 6-cohort. Each
new cert may surface a §11 layout variant, a new wall_construction
code, or a new gable-type lodgement that the cohort didn't exercise.
- **Apply the same chain to other surveyor tools** (e.g. Pashub site
notes). The cascade is reusable; the mapper-per-surveyor pattern
established here generalises.
- **The API-cert residual cohort** still uses 0.5 tolerance (the API
publishes rounded SAP integers). Tighten that as a separate
workstream — different forcing function, different conventions.
1. Read `backend/epc_client/epc_client_service.py` end-to-end. The
`get_by_certificate_number` entry point is the one you want.
2. Fetch cert `0535-9020-6509-0821-6222`. Save the raw JSON to
`packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/`.
3. Inspect the schema_type and confirm `from_api_response` accepts it.
4. Write the probe script described above; capture the cross-mapper
diff.
5. Triage the diff. Each divergence is a slice. Close them in order.
6. Land the three pin tests as forcing functions:
- Summary_001479 → ≤ 1e-4 vs 69.0094 (new entry in `test_summary_
pdf_mapper_chain.py`).
- API cert 0535-9020-6509-0821-6222 → within 0.5 of 69 (new entry
in `test_golden_fixtures.py`).
- Cross-mapper parity: `from_api_response` and
`from_elmhurst_site_notes` produce equivalent
`EpcPropertyData` for the same cert; cascade outputs match to
≤ 1e-4 (new file `test_api_vs_elmhurst_parity.py`).
## Branch state at handover
```
$ git log --oneline -10
$ git log --oneline -8
7a9a8b7e Slice 57: Pre-1950 Elmhurst sloping-ceiling roofs map to thickness=0
07ed871f Slice 56: Elmhurst floor exposed to external air routes through u_exposed_floor
c89206fc Slice 55: Elmhurst party-wall code "CU" maps to cavity unfilled
4427b58a Slice 54: Elmhurst mapper sets extensions_count from len(survey.extensions)
a756114a Handover: all 6 Elmhurst Summary→SAP chains closed at 1e-4
58088c10 Slice 53: Summary_000487 chain pins SAP at 1e-4 — last cohort cert closed
4ccf9c97 Slice 52: Summary_000477 chain pins SAP at 1e-4; electric shower + decimal RIR rounding
cb4e31a1 Slice 51: Summary_000516 chain pins SAP at 1e-4; roof-window separation
598f0408 Slice 50: Summary_000480 chain pins SAP at 1e-4; Room-in-Roof + baths + party-wall + roof-none
7f17de84 Slice 49: Summary_000490 chain pins SAP at 1e-4; secondary heating + RdSAP sheltered-sides
00a27efd Slice 48: Elmhurst extractor handles 3 new layout quirks; 5 fixture PDFs added
29ab80b0 Slice 47: Summary_000474 chain pins SAP at 1e-4 vs worksheet PDF
ec4916b5 Handover: 2/6 Elmhurst chains closed at 1e-4; per-cert diagnoses for remaining 4
b6544e1c Handover: tighten Summary→SAP chain pin to 1e-4 + brief next agent
256a5afe Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF
```
Good luck on whatever comes next.
758 cohort + cert-001479 structural tests green; pyright net-zero
(35 baseline) on touched files.
## Resumption notes for cert 001479 (Slices 54-57 partial progress)
### What landed
Four mapper slices closed real Elmhurst-side gaps that surfaced when
the cross-mapper diff against the new API counterpart was run for
the first time:
- **Slice 54** – `extensions_count` now reads `len(survey.extensions)`
instead of the hard-coded `0`. No SAP impact (the cascade iterates
`sap_building_parts`), but a real correctness fix the cross-mapper
parity assertion needs.
- **Slice 55** — Elmhurst party-wall code `"CU"` (Cavity masonry
unfilled) now maps to SAP10 `WALL_CAVITY=4`; `u_party_wall` returns
0.5 W/m²K matching the worksheet's lodged `Party walls Main … 0.50`.
- **Slice 56** — Floor location `"E To external air"` now routes
through `u_exposed_floor` (Table 20), matching cert 001479 Ext2's
cantilevered exposed timber floor at U=1.20.
- **Slice 57** — PS (Pitched, sloping ceiling) roofs with no lodged
thickness ("As Built") and age band A-D map to `thickness=0`,
giving Table 16 row-0 U=2.30 — matches cert 001479 Ext2's worksheet
`External roof Ext2 … 2.30`. Ext1 (age M) keeps thickness=None
→ cascade default 0.15.
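As a reading aid, the Slice 57 rule can be written as a small decision function. This is a hypothetical sketch of the rule as described above, not the mapper's actual code: the function name, the parameter names and the representation of "As Built" as `None` are all assumptions.

```python
from typing import Optional

# Age bands A-D, i.e. the pre-1950 bands named in the Slice 57 description.
PRE_1950_AGE_BANDS = {"A", "B", "C", "D"}


def sloping_ceiling_thickness_mm(
    roof_code: str,
    lodged_thickness_mm: Optional[int],
    age_band: str,
) -> Optional[int]:
    """Insulation thickness to hand to the cascade for one roof.

    None means "no thickness known; let the cascade pick its default U".
    """
    if lodged_thickness_mm is not None:
        # A lodged thickness always wins.
        return lodged_thickness_mm
    if roof_code == "PS" and age_band in PRE_1950_AGE_BANDS:
        # As Built, pitched sloping ceiling, pre-1950: treat as uninsulated.
        return 0
    return None


ext2 = sloping_ceiling_thickness_mm("PS", None, "C")  # Ext2 case
ext1 = sloping_ceiling_thickness_mm("PS", None, "M")  # Ext1 case
```

The distinction between `0` and `None` carries the fix: `0` selects the uninsulated row (U=2.30), while `None` leaves the cascade default (0.15 for Ext1) in place.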
### Where the chain stands
Mapped Elmhurst cascade for cert 001479:
- Pre-Slice 54: SAP 63.17 vs worksheet 69.0094 (gap +5.84)
- Post-Slice 57: SAP **61.39** vs worksheet 69.0094 (gap **+7.62**)
Gap widened because each fix was per-data correct; the previous
mapper state was under-counting fabric heat loss in multiple places
that were collectively offsetting some over-counting elsewhere.
Per-bp fabric U-values now all match the worksheet exactly:
| BP | Wall | Roof | Party | Floor |
|---|---|---|---|---|
| Main | 0.70 ✓ | 0.14 ✓ | 0.50 ✓ | 0.65 (ground cascade) |
| Ext1 | 0.26 ✓ | 0.15 ✓ | n/a (pwl=0) | 0.20 |
| Ext2 | 0.70 ✓ | **2.30 ✓** | n/a (pwl=0) | **1.20 exposed ✓** |
Fabric is essentially complete. The remaining ~7.6 SAP gap lives in
non-fabric inputs.
### What likely drives the remaining 7.6 SAP
- **HLP check**: worksheet `HLP (average) 3.1269`; cascade
  total_w_per_k / TFA = 153.15 / 68.51 = **2.235**. The worksheet HLP
  implies ~214 W/K (3.1269 × 68.51), so the cascade figure sits ~61 W/K
  short. Adding the cascade ventilation HLC (~46 W/K) brings the cascade
  total to ~199 W/K, leaving a gap of ~15 W/K in non-fabric.
- **Living area fraction**: cascade `0.25`, worksheet `0.28`. The
worksheet computes 17.13/61.18 (Main TFA only?) vs cascade's
17.13/68.51 (all bp TFA). SAP convention question — may need
cascade-level fix.
- **Internal gains**: cascade `lighting_kwh_per_yr=163` looks low for
23 fittings; cascade `pumps_fans_kwh_per_yr=160` may differ from
worksheet (which lodges main_heating_category=1).
- **Secondary heating**: cascade has fraction=0.1, η=0.4 (matches
worksheet). SAP code 605 (gas fire flush, sealed-flue) is not wired
through; the cohort 000490 sets `secondary_heating_type` to a SAP
code int — verify cert 001479 needs the same.
- **PCDB boiler index 17507 (Worcester Greenstar 30i)** — cascade
reads `main_heating_efficiency=0.89` (matches worksheet's 89%
winter) so likely already resolved. Confirm.
- **Per-window U vs avg-U routing**: cascade takes per-window U path
(every window has `window_transmission_details`). Worksheet's
windows use 2.80 U (default) — verify cascade matches.
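The HLP and living-area-fraction figures quoted in the list above can be reproduced with a few lines of arithmetic. All inputs are the numbers stated in this section; the ventilation HLC is only given as "~46 W/K", so the resulting gap is approximate, and the variable names are illustrative.

```python
TFA_ALL_BP = 68.51        # m², all building parts
TFA_MAIN = 61.18          # m², denominator the worksheet appears to use
LIVING_AREA = 17.13       # m²
WORKSHEET_HLP = 3.1269    # W/m²K, worksheet "HLP (average)"
CASCADE_W_PER_K = 153.15  # W/K, cascade total_w_per_k
CASCADE_VENT_HLC = 46.0   # W/K, approximate

# Cascade HLP as quoted in the bullet (about 2.235).
cascade_hlp = CASCADE_W_PER_K / TFA_ALL_BP

# Total heat loss implied by the worksheet HLP (about 214 W/K).
worksheet_total = WORKSHEET_HLP * TFA_ALL_BP

# Remaining gap once cascade ventilation is added (about 15 W/K).
hlc_gap = worksheet_total - (CASCADE_W_PER_K + CASCADE_VENT_HLC)

# The two candidate living-area-fraction conventions.
laf_main_only = LIVING_AREA / TFA_MAIN   # rounds to 0.28 (worksheet)
laf_all_bp = LIVING_AREA / TFA_ALL_BP    # rounds to 0.25 (cascade)
```

The worksheet's 0.28 matches the Main-only denominator exactly at 2 d.p., which supports treating the living-area fraction as a convention difference rather than a data difference.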
### Source-data caveats found this session
- **Summary PDF vs worksheet age band on Ext1**: Summary §3 says
`M 2023 onwards`; worksheet header says `Property Age Band C, Ext1: L,
Ext2: C`. Likely assessor data-entry inconsistency. User decision:
trust the Summary PDF (M); accept whatever residual the worksheet's
L-based calc leaves. Document the caveat in the chain pin docstring
when it lands.
- **Worksheet "0.0" Type column on External Walls**: looks like an
unused column header in Elmhurst's tabular output, not a
shelter-factor input. Cascade ignores it correctly.
### Probe scripts in /tmp (regenerable)
- `/tmp/probe_001479.py` — cross-mapper diff + cascade for both
mappers; baseline for comparing API vs Elmhurst EpcPropertyData.
- `/tmp/sensitivity_001479.py` — single-field patch SAP impact probe;
useful for sequencing slices but stale after each commit (re-run).
- `/tmp/perbp_001479.py` — per-bp cascade U-value dump vs worksheet
expected values; the cleanest "is fabric matching?" check.
Cached cert JSON is at `packages/domain/src/domain/sap/rdsap/tests/
fixtures/golden/0535-9020-6509-0821-6222.json` (token-fetched once,
no further API calls needed). Summary PDF copied into the chain-test
fixtures dir.
### Suggested next steps
1. Probe the cascade's section-by-section line refs (§3 walls, §3
roofs, §3 windows, §3 thermal bridging, §2 ventilation HLC) against
the worksheet text to find the ~15 W/K HLC gap.
2. Check `living_area_fraction` SAP convention — Main-only vs whole-
dwelling TFA. Cohort certs may have been single-bp so this
convention difference didn't surface.
3. Wire secondary heating SAP code through if a §14 worksheet line ref
shows a different secondary contribution than the cascade.
4. When the chain SAP is within ~0.1 of 69.0094, land the
`test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly`
pin at 1e-4 — that's the forcing function for the workstream.
Good luck.