diff --git a/backend/documents_parser/tests/test_summary_pdf_mapper_chain.py b/backend/documents_parser/tests/test_summary_pdf_mapper_chain.py index 17b588cf..07047241 100644 --- a/backend/documents_parser/tests/test_summary_pdf_mapper_chain.py +++ b/backend/documents_parser/tests/test_summary_pdf_mapper_chain.py @@ -110,14 +110,16 @@ def test_summary_000474_mapper_extracts_seven_windows() -> None: assert len(epc.sap_windows) == 7 -def test_summary_000474_full_chain_sap_within_half_point_of_worksheet_pdf() -> None: +def test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly() -> None: # Arrange — the full Summary→ElmhurstSiteNotes→EpcPropertyData→cascade # →SAP path against the U985-0001-000474 worksheet PDF's unrounded # SAP rating (line 257: SAP value 62.2584, rating (258) = 62). - # The cascade itself matches Elmhurst exactly on hand-built inputs; - # this test pins the mapper end-to-end at the SAP-rating layer so - # any future mapper regression (extractor field drop, code-mapping - # break) surfaces here rather than at the residual-pin layer. + # Because the Summary PDF carries the same source-of-truth data that + # the hand-built worksheet fixture encodes by hand, and because the + # cascade matches Elmhurst's calculator to 4 d.p. on those hand- + # built inputs, this end-to-end path MUST produce the same unrounded + # SAP value. Any non-trivial drift = a real mapper bug dropping + # information from the Summary PDF. pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF) site_notes = ElmhurstSiteNotesExtractor(pages).extract() epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes) @@ -127,8 +129,10 @@ def test_summary_000474_full_chain_sap_within_half_point_of_worksheet_pdf() -> N cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES) ) - # Assert — unrounded SAP within 0.5 of the worksheet's 62.2584 - # (the same tolerance the user accepted on the API-cert residual - # cohort given the API publishes rounded SAP integers). + # Assert — within the same 1e-4 tolerance the other Elmhurst worksheet + # tests pin against. 0.5 is the API-cert residual tolerance (the API + # publishes rounded SAP integers, so up to half a SAP point is just + # rounding); for Elmhurst worksheet inputs the cascade reproduces + # Elmhurst exactly and we expect identical outputs. worksheet_unrounded_sap = 62.2584 - assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 0.5 + assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4 diff --git a/docs/sap-spec/NEXT_AGENT_PROMPT.md b/docs/sap-spec/NEXT_AGENT_PROMPT.md new file mode 100644 index 00000000..ebca5762 --- /dev/null +++ b/docs/sap-spec/NEXT_AGENT_PROMPT.md @@ -0,0 +1,245 @@ +# Handover — close the Elmhurst Summary→SAP chain to 1e-4 + +You are picking up branch `ara-backend-design-prd` mid-stream. The +previous agent left a near-complete but **not actually complete** +validation chain. This handover is honest about what's done, what's +still wrong, and why a fresh approach may help. + +## The 30-second picture + +There are two paths into the calculator: + +``` +Path A: hand-built EpcPropertyData → cascade → SAP 62.2584 ← matches Elmhurst worksheet PDF to 4 d.p. ✓ +Path B: Summary_NNNNNN.pdf → extractor → ElmhurstSiteNotes + → from_elmhurst_site_notes → EpcPropertyData + → cascade → SAP 62.5195 ← off by 0.2611 unrounded SAP points ✗ +``` + +Both paths feed the same calculator (`calculate_sap_from_inputs`). Path +A proves the cascade is provably equivalent to Elmhurst's calculator +(`sap_score_continuous` = 62.2584 matches Elmhurst worksheet PDF line +257 exactly). Path B uses `from_elmhurst_site_notes` instead of the +hand-built fixture; **it should produce identical output**, because +the Summary PDF and the hand-built fixture encode the same source-of- +truth data. It doesn't. The 0.26 SAP gap means information is being +dropped in the extractor or mapper. + +The user explicitly rejected "within 0.5 is good enough" — the chain +must reproduce Elmhurst to `1e-4` like every other Elmhurst worksheet +test. + +The end goal is: `api → EpcPropertyData → Sap10 calculator` with +`< 0.5` SAP error (the API publishes rounded SAP integers, so half a +point is rounding noise). But for the Elmhurst-site-notes path, +because Elmhurst's worksheet PDFs publish unrounded line-ref values, +the target is **zero error to 1e-4**. + +## Forcing function + +There is one failing test pinning this: + +``` +backend/documents_parser/tests/test_summary_pdf_mapper_chain.py:: + test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly +``` + +It asserts `abs(mapped_sap - 62.2584) < 1e-4` and currently fails with +`Δ = 0.2611`. Your job is to drive it to GREEN. + +## Definition of done + +- The failing test above passes at `1e-4` tolerance. +- The two other `test_summary_pdf_mapper_chain.py` tests stay green + (`sap_building_parts == 3`, `sap_windows == 7`). +- The wider `datatypes/epc/` + `backend/documents_parser/tests/` + regression stays green (the 9 pre-existing `test_appendix_u.py` + failures and 1 pre-existing 1e-9 FEE-precision failure are + unrelated — leave them). +- Once 000474 is at 1e-4, replicate the test for the other 5 Summary + PDFs (000477, 000480, 000487, 000490, 000516) — files are under + `sap worksheets/` at the repo root, **untracked**, copy each into + `backend/documents_parser/tests/fixtures/` as you go. +- For each cert, pin against the unrounded SAP value lodged in line + 257 of the corresponding `U985-0001-NNNNNN.pdf` worksheet. + +## What the previous agent did right + +12 commits across the session. The architecturally-load-bearing ones: + +| Slice | Commit | Effect | +|---|---|---| +| 44 | `ea6d4263` | flat_roof_insulation_thickness mapper passthrough | +| 45a/b/c | `f08252dc` / `24f35f8b` / `5acbecc5` | PV cascade per Appendix M + U (orientation × pitch × Table-M1 ZPV × rating-vs-demand climate) | +| audit pins | `15789f5a` / `acc6331d` / `8ac548ca` | u_wall / u_roof / u_floor description-cascade pins against Tables 6 / 16 / §5.12 — proves U-value cascade is spec-correct on the cohort | +| 46 scaffold | `ccf7aa21` | First scaffold test for Summary→EpcPropertyData chain (strict-xfail) | +| 46a | `36f2c7bb` | Multi-bp support: schema adds `ExtensionPart`; extractor parses Main + 1st Extension + 2nd Extension subsections in §4/§7/§8/§9 with "As Main: Yes" inheritance; mapper produces a `SapBuildingPart` per bp | +| 46b | `066dce19` | Layout-style window parser anchored on `W H Area` data line + `Manufacturer ` line — extracts 7 windows from the Summary table layout | +| **46c** | **`256a5afe`** | **String→int code translations for every Elmhurst-coded field the cascade reads (age band, wall_construction, wall_insulation_type, main_fuel_type, heat_emitter_type, main_heating_control, orientation); PCDB index parsed from `pcdf_boiler_reference`; floor ordering + 0.25 m upper-storey adjustment + `is_exposed_floor` flag for "above unheated space"** | + +After 46c, mapped SAP is 62.52 vs target 62.26 (`Δ = 0.26`). + +## What the previous agent got wrong + +**Overclaimed completion at the 0.5-tolerance milestone.** The +original test was written with a 0.5 tolerance (mirroring the +API-cert residual cohort, where the API publishes rounded SAP +integers so half a point is just rounding). That's the wrong +tolerance for the Elmhurst path: Elmhurst lodges full PDF lines with +4-d.p. unrounded values, and our cascade matches them exactly on +hand-built inputs. The bar is `1e-4`, not 0.5. The previous agent +committed Slice 46c with a 0.5-tolerance pin; this handover has +since tightened it to `1e-4` (the failing test above). + +## The remaining diffs (Slice 46c → 1e-4 SAP) + +After Slice 46c, the only differing `cert_to_inputs(epc)` scalar +fields between mapped and hand-built are: + +``` +hot_water_kwh_per_yr: mapped=2291.7821223353485 handbuilt=2291.7784230242883 (Δ < 0.01 — float drift) +pumps_fans_kwh_per_yr: mapped=130.0 handbuilt=160.0 ← 30 kWh, real bug +lighting_kwh_per_yr: mapped=139.94522455112704 handbuilt=139.94522455112707 (Δ < 1e-10 — float noise) +pumps_fans_primary_factor: mapped=1.5128000000000001 handbuilt=1.5128 (float repr — harmless) +fabric_energy_efficiency_kwh_per_m2_yr: mapped=186.62 handbuilt=186.88 (output, not input — driven by solar gains) +``` + +Plus a window-orientation mis-classification (see below). + +### Diff #1: `pumps_fans_kwh_per_yr` = 130 vs 160 (30 kWh) + +This is the dominant residual contributor. Search `cert_to_inputs.py` +for `pumps_fans_kwh_per_yr` to find what drives it. Likely candidates: +- `central_heating_pump_age` (the Summary PDF lodges "Heat pump age: + Unknown" but that's the HEAT pump, not the central heating pump — + may need a separate field on `ElmhurstSiteNotes.MainHeating`) +- Boiler type / FGHRs / weather compensator flags +- Some specific Table 4d/4e cascade input we're dropping + +The cascade reads `MainHeatingDetail.central_heating_pump_age: Optional[int]` +which the Elmhurst mapper doesn't currently populate. + +### Diff #2: Window [4] orientation mis-classified as SE (4) — should be E (3) + +Mapped windows: +``` + [4] orient=4 (SE) W=1.1 H=1.6 U=2.0 +``` + +Hand-built has TWO East U=2.0 windows totalling 3.74 m² area. Mapped's +window [2] (East, 1.98 m²) + window [4] (mis-labelled SE, 1.76 m²) = +3.74 m² ✓ — exactly matches. So the layout-style window parser is +producing `orientation='East-South'` for window [4] when it should be +just `'East'`. Look at `_compose_window_descriptors` in +`backend/documents_parser/elmhurst_extractor.py` — the suffix +token "South" is being joined with the inline "East" prefix when it +shouldn't be (probably the "South" belongs to a different window). + +The window count itself also differs: mapped extracts 7 individual +windows, hand-built consolidates to 5 by `(orientation, U)` group with +width = total area / 1.0 m. Both should be functionally equivalent +to the cascade IF orientations + U-values + total areas all match. So +fixing the orientation should close this gap. + +### Diff #3: float-precision noise + +`hot_water_kwh_per_yr` (sub-0.01 kWh) and `lighting_kwh_per_yr` +(sub-1e-10 kWh) are downstream of accumulation order in the cascade. +`pumps_fans_primary_factor` 1.5128000000000001 vs 1.5128 is a Python +float repr quirk. These won't close the SAP gap; ignore them unless +the other fixes leave the test slightly red and one of these turns +out to be the last decimal. + +## Why a fresh approach may help + +The previous agent's pattern was "fix one bug at a time, ship when +test passes loosely." The right pattern for `1e-4` is the opposite: +**systematically diff every input field between mapped and hand-built, +fix every diff, then run the test once.** That's two more slices +(pumps_fans + window orientation) — not enough for a session of +incremental shipping, so consider doing both in one slice with the +pin as the forcing function. + +You may also want to look harder at the *architectural* question: +why does Elmhurst's "site notes" (the surveyor's input form) need to +go through THREE schemas — `ElmhurstSiteNotes`, then `EpcPropertyData`, +then `CalculatorInputs` — when the hand-built fixture skips straight +to the middle one? The string→int translations in `from_elmhurst_site_notes` +are essentially doing what the extractor should do (or what an +`elmhurst_codes.py` codes-module could express). If `ElmhurstSiteNotes` +stored integer codes alongside the human-readable strings, the mapper +would be a pure projection. + +## Quick-orient commands + +```bash +# Failing test (Δ = 0.26 SAP → target < 1e-4) +python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py::test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly --no-cov --no-header --tb=short + +# All Summary→chain tests (2/3 green, 1 failing — the one above) +python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py --no-cov --no-header -v + +# Wider regression to confirm no fresh breakage +python -m pytest datatypes/epc/ backend/documents_parser/tests/ --no-cov --no-header -q +``` + +The 9 `test_appendix_u.py` failures and 1 `test_no_ac_cert_round_trips_fee_equals_space_heating_per_m2` +failure are **pre-existing** from before this session — don't try to +fix them as part of this work. + +## Reference materials + +- **`docs/sap-spec/HANDOVER_NEXT.md`** — original calculator-closure + handover; still useful as the canonical reference for cascade + conventions (AAA tests, 1e-4 tolerance, etc.). +- **`docs/sap-spec/SAP_CALCULATOR.md`** — public API + two-cascade + architecture (rating vs demand). +- **`sap worksheets/Summary_000474.pdf`** (untracked) — the source-of- + truth input for fixture 000474. Mirror tracked at + `backend/documents_parser/tests/fixtures/Summary_000474.pdf`. +- **`sap worksheets/U985-0001-000474.pdf`** (untracked) — the + Elmhurst-computed worksheet with line refs the test pins against + (line 257 for unrounded SAP). +- **`packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000474.py`** + — the hand-built `EpcPropertyData` for the same fixture. The CALCULATOR- + EQUIVALENT target the mapper must reproduce. + +## File map for the work ahead + +| File | Role | +|---|---| +| `backend/documents_parser/elmhurst_extractor.py` | PDF → `ElmhurstSiteNotes` extractor; layout-style window parser at `_compose_window_descriptors` is where diff #2 lives | +| `datatypes/epc/surveys/elmhurst_site_notes.py` | The schema (recently extended with `ExtensionPart`); `MainHeating` may need a `central_heating_pump_age` field for diff #1 | +| `datatypes/epc/domain/mapper.py:254-326` | `from_elmhurst_site_notes` — the mapper itself | +| `datatypes/epc/domain/mapper.py:1772-1830` | Code-translation helpers (`_leading_code`, `_elmhurst_wall_construction_int`, etc.) | +| `datatypes/epc/domain/mapper.py:2010-2080` | `_map_elmhurst_building_part` + extension iteration | +| `datatypes/epc/domain/mapper.py:2180+` | `_map_elmhurst_sap_heating` — likely where diff #1 (pumps_fans) is fixed | +| `packages/domain/src/domain/sap/rdsap/cert_to_inputs.py` | Cascade — search `pumps_fans_kwh_per_yr` for diff #1's root cause | + +## Conventions you must honour (from project memory) + +- AAA test convention: every new test uses literal `# Arrange / # Act + / # Assert` headers +- `abs(diff) <= tol` not `pytest.approx` (strict pyright) +- One slice = one commit; stage by name (`?? non_invasive_photos/` and + similar untracked junk must not be staged) +- 1e-4 tolerance, no widening, no xfail (`feedback_zero_error_strict`) +- Strict pyright net-zero on every commit + +## Branch state at handover + +``` +$ git log --oneline -5 +256a5afe Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF +066dce19 Slice 46b: Elmhurst extractor parses windows from layout-style Summary PDFs +36f2c7bb Slice 46a: Elmhurst mapper handles multi-bp Summary PDFs — Summary_000474 chain test flips green +ccf7aa21 Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail) +8ac548ca Audit: pin u_floor §5.12 formula cascade for cert 0240 cohort geometry +``` + +The 0.5 tolerance in commit 46c's message is stale — this handover +tightened it to 1e-4 after the commit. The first thing you commit +should fix one of the two diffs and explicitly mention closing the +last bit of the gap toward 1e-4. + +Good luck.