mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Handover: tighten Summary→SAP chain pin to 1e-4 + brief next agent
Slice 46c left the chain at SAP Δ=0.26 vs the Elmhurst worksheet PDF's 62.2584. The user rejected the 0.5 tolerance: because the cascade reproduces Elmhurst exactly on hand-built inputs and the Summary PDF carries the same source-of-truth data, the mapped path must hit 1e-4 like every other Elmhurst worksheet pin. This commit: - Tightens `test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly` from 0.5 to 1e-4. Currently fails with Δ=0.2611 — the forcing function for the next slice. - Replaces the stale `docs/sap-spec/NEXT_AGENT_PROMPT.md` with a fresh handover identifying the two remaining diffs: * pumps_fans_kwh_per_yr 130 vs 160 (30 kWh; likely `central_heating_pump_age` not plumbed) * Window [4] mis-classified as SE (4) instead of E (3); `_compose_window_descriptors` over-joins suffix tokens - Documents the architectural smell (3-schema chain ElmhurstSiteNotes → EpcPropertyData → CalculatorInputs may be over-engineered). - Lists end-goal: API-path < 0.5 SAP (rounded integers), Elmhurst-path < 1e-4 SAP (unrounded worksheet pins), then replicate for the other 5 Summary PDFs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
256a5afee5
commit
b6544e1cd1
2 changed files with 258 additions and 9 deletions
|
|
@ -110,14 +110,16 @@ def test_summary_000474_mapper_extracts_seven_windows() -> None:
|
|||
assert len(epc.sap_windows) == 7
|
||||
|
||||
|
||||
def test_summary_000474_full_chain_sap_within_half_point_of_worksheet_pdf() -> None:
|
||||
def test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
|
||||
# Arrange — the full Summary→ElmhurstSiteNotes→EpcPropertyData→cascade
|
||||
# →SAP path against the U985-0001-000474 worksheet PDF's unrounded
|
||||
# SAP rating (line 257: SAP value 62.2584, rating (258) = 62).
|
||||
# The cascade itself matches Elmhurst exactly on hand-built inputs;
|
||||
# this test pins the mapper end-to-end at the SAP-rating layer so
|
||||
# any future mapper regression (extractor field drop, code-mapping
|
||||
# break) surfaces here rather than at the residual-pin layer.
|
||||
# Because the Summary PDF carries the same source-of-truth data that
|
||||
# the hand-built worksheet fixture encodes by hand, and because the
|
||||
# cascade matches Elmhurst's calculator to 4 d.p. on those hand-
|
||||
# built inputs, this end-to-end path MUST produce the same unrounded
|
||||
# SAP value. Any non-trivial drift = a real mapper bug dropping
|
||||
# information from the Summary PDF.
|
||||
pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000474_PDF)
|
||||
site_notes = ElmhurstSiteNotesExtractor(pages).extract()
|
||||
epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
|
||||
|
|
@ -127,8 +129,10 @@ def test_summary_000474_full_chain_sap_within_half_point_of_worksheet_pdf() -> N
|
|||
cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
|
||||
)
|
||||
|
||||
# Assert — unrounded SAP within 0.5 of the worksheet's 62.2584
|
||||
# (the same tolerance the user accepted on the API-cert residual
|
||||
# cohort given the API publishes rounded SAP integers).
|
||||
# Assert — within the same 1e-4 tolerance the other Elmhurst worksheet
|
||||
# tests pin against. 0.5 is the API-cert residual tolerance (the API
|
||||
# publishes rounded SAP integers, so up to half a SAP point is just
|
||||
# rounding); for Elmhurst worksheet inputs the cascade reproduces
|
||||
# Elmhurst exactly and we expect identical outputs.
|
||||
worksheet_unrounded_sap = 62.2584
|
||||
assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 0.5
|
||||
assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4
|
||||
|
|
|
|||
245
docs/sap-spec/NEXT_AGENT_PROMPT.md
Normal file
245
docs/sap-spec/NEXT_AGENT_PROMPT.md
Normal file
|
|
@ -0,0 +1,245 @@
|
|||
# Handover — close the Elmhurst Summary→SAP chain to 1e-4
|
||||
|
||||
You are picking up branch `ara-backend-design-prd` mid-stream. The
|
||||
previous agent left a near-complete but **not actually complete**
|
||||
validation chain. This handover is honest about what's done, what's
|
||||
still wrong, and why a fresh approach may help.
|
||||
|
||||
## The 30-second picture
|
||||
|
||||
There are two paths into the calculator:
|
||||
|
||||
```
|
||||
Path A: hand-built EpcPropertyData → cascade → SAP 62.2584 ← matches Elmhurst worksheet PDF to 4 d.p. ✓
|
||||
Path B: Summary_NNNNNN.pdf → extractor → ElmhurstSiteNotes
|
||||
→ from_elmhurst_site_notes → EpcPropertyData
|
||||
→ cascade → SAP 62.5195 ← off by 0.2611 unrounded SAP points ✗
|
||||
```
|
||||
|
||||
Both paths feed the same calculator (`calculate_sap_from_inputs`). Path
|
||||
A proves the cascade is provably equivalent to Elmhurst's calculator
|
||||
(`sap_score_continuous` = 62.2584 matches Elmhurst worksheet PDF line
|
||||
257 exactly). Path B uses `from_elmhurst_site_notes` instead of the
|
||||
hand-built fixture; **it should produce identical output**, because
|
||||
the Summary PDF and the hand-built fixture encode the same source-of-
|
||||
truth data. It doesn't. The 0.26 SAP gap means information is being
|
||||
dropped in the extractor or mapper.
|
||||
|
||||
The user explicitly rejected "within 0.5 is good enough" — the chain
|
||||
must reproduce Elmhurst to `1e-4` like every other Elmhurst worksheet
|
||||
test.
|
||||
|
||||
The end goal is: `api → EpcPropertyData → Sap10 calculator` with
|
||||
`< 0.5` SAP error (the API publishes rounded SAP integers, so half a
|
||||
point is rounding noise). But for the Elmhurst-site-notes path,
|
||||
because Elmhurst's worksheet PDFs publish unrounded line-ref values,
|
||||
the target is **zero error to 1e-4**.
|
||||
|
||||
## Forcing function
|
||||
|
||||
There is one failing test pinning this:
|
||||
|
||||
```
|
||||
backend/documents_parser/tests/test_summary_pdf_mapper_chain.py::
|
||||
test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly
|
||||
```
|
||||
|
||||
It asserts `abs(mapped_sap - 62.2584) < 1e-4` and currently fails with
|
||||
`Δ = 0.2611`. Your job is to drive it to GREEN.
|
||||
|
||||
## Definition of done
|
||||
|
||||
- The failing test above passes at `1e-4` tolerance.
|
||||
- The two other `test_summary_pdf_mapper_chain.py` tests stay green
|
||||
(`sap_building_parts == 3`, `sap_windows == 7`).
|
||||
- The wider `datatypes/epc/` + `backend/documents_parser/tests/`
|
||||
regression stays green (the 9 pre-existing `test_appendix_u.py`
|
||||
failures and 1 pre-existing 1e-9 FEE-precision failure are
|
||||
unrelated — leave them).
|
||||
- Once 000474 is at 1e-4, replicate the test for the other 5 Summary
|
||||
PDFs (000477, 000480, 000487, 000490, 000516) — files are under
|
||||
`sap worksheets/` at the repo root, **untracked**, copy each into
|
||||
`backend/documents_parser/tests/fixtures/` as you go.
|
||||
- For each cert, pin against the unrounded SAP value lodged in line
|
||||
257 of the corresponding `U985-0001-NNNNNN.pdf` worksheet.
|
||||
|
||||
## What the previous agent did right
|
||||
|
||||
12 commits across the session. The architecturally-load-bearing ones:
|
||||
|
||||
| Slice | Commit | Effect |
|
||||
|---|---|---|
|
||||
| 44 | `ea6d4263` | flat_roof_insulation_thickness mapper passthrough |
|
||||
| 45a/b/c | `f08252dc` / `24f35f8b` / `5acbecc5` | PV cascade per Appendix M + U (orientation × pitch × Table-M1 ZPV × rating-vs-demand climate) |
|
||||
| audit pins | `15789f5a` / `acc6331d` / `8ac548ca` | u_wall / u_roof / u_floor description-cascade pins against Tables 6 / 16 / §5.12 — proves U-value cascade is spec-correct on the cohort |
|
||||
| 46 scaffold | `ccf7aa21` | First scaffold test for Summary→EpcPropertyData chain (strict-xfail) |
|
||||
| 46a | `36f2c7bb` | Multi-bp support: schema adds `ExtensionPart`; extractor parses Main + 1st Extension + 2nd Extension subsections in §4/§7/§8/§9 with "As Main: Yes" inheritance; mapper produces a `SapBuildingPart` per bp |
|
||||
| 46b | `066dce19` | Layout-style window parser anchored on `W H Area` data line + `Manufacturer <U>` line — extracts 7 windows from the Summary table layout |
|
||||
| **46c** | **`256a5afe`** | **String→int code translations for every Elmhurst-coded field the cascade reads (age band, wall_construction, wall_insulation_type, main_fuel_type, heat_emitter_type, main_heating_control, orientation); PCDB index parsed from `pcdf_boiler_reference`; floor ordering + 0.25 m upper-storey adjustment + `is_exposed_floor` flag for "above unheated space"** |
|
||||
|
||||
After 46c, mapped SAP is 62.52 vs target 62.26 (`Δ = 0.26`).
|
||||
|
||||
## What the previous agent got wrong
|
||||
|
||||
**Overclaimed completion at the 0.5-tolerance milestone.** The
|
||||
original test was written with a 0.5 tolerance (mirroring the
|
||||
API-cert residual cohort, where the API publishes rounded SAP
|
||||
integers so half a point is just rounding). That's the wrong
|
||||
tolerance for the Elmhurst path: Elmhurst lodges full PDF lines with
|
||||
4-d.p. unrounded values, and our cascade matches them exactly on
|
||||
hand-built inputs. The bar is `1e-4`, not 0.5. The previous agent
|
||||
committed Slice 46c with a 0.5-tolerance pin; this handover has
|
||||
since tightened it to `1e-4` (the failing test above).
|
||||
|
||||
## The remaining diffs (Slice 46c → 1e-4 SAP)
|
||||
|
||||
After Slice 46c, the only differing `cert_to_inputs(epc)` scalar
|
||||
fields between mapped and hand-built are:
|
||||
|
||||
```
|
||||
hot_water_kwh_per_yr: mapped=2291.7821223353485 handbuilt=2291.7784230242883 (Δ < 0.01 — float drift)
|
||||
pumps_fans_kwh_per_yr: mapped=130.0 handbuilt=160.0 ← 30 kWh, real bug
|
||||
lighting_kwh_per_yr: mapped=139.94522455112704 handbuilt=139.94522455112707 (Δ < 1e-10 — float noise)
|
||||
pumps_fans_primary_factor: mapped=1.5128000000000001 handbuilt=1.5128 (float repr — harmless)
|
||||
fabric_energy_efficiency_kwh_per_m2_yr: mapped=186.62 handbuilt=186.88 (output, not input — driven by solar gains)
|
||||
```
|
||||
|
||||
Plus a window-orientation mis-classification (see below).
|
||||
|
||||
### Diff #1: `pumps_fans_kwh_per_yr` = 130 vs 160 (30 kWh)
|
||||
|
||||
This is the dominant residual contributor. Search `cert_to_inputs.py`
|
||||
for `pumps_fans_kwh_per_yr` to find what drives it. Likely candidates:
|
||||
- `central_heating_pump_age` (the Summary PDF lodges "Heat pump age:
|
||||
Unknown" but that's the HEAT pump, not the central heating pump —
|
||||
may need a separate field on `ElmhurstSiteNotes.MainHeating`)
|
||||
- Boiler type / FGHRs / weather compensator flags
|
||||
- Some specific Table 4d/4e cascade input we're dropping
|
||||
|
||||
The cascade reads `MainHeatingDetail.central_heating_pump_age: Optional[int]`
|
||||
which the Elmhurst mapper doesn't currently populate.
|
||||
|
||||
### Diff #2: Window [4] orientation mis-classified as SE (4) — should be E (3)
|
||||
|
||||
Mapped windows:
|
||||
```
|
||||
[4] orient=4 (SE) W=1.1 H=1.6 U=2.0
|
||||
```
|
||||
|
||||
Hand-built has TWO East U=2.0 windows totalling 3.74 m² area. Mapped's
|
||||
window [2] (East, 1.98 m²) + window [4] (mis-labelled SE, 1.76 m²) =
|
||||
3.74 m² ✓ — exactly matches. So the layout-style window parser is
|
||||
producing `orientation='East-South'` for window [4] when it should be
|
||||
just `'East'`. Look at `_compose_window_descriptors` in
|
||||
`backend/documents_parser/elmhurst_extractor.py` — the suffix
|
||||
token "South" is being joined with the inline "East" prefix when it
|
||||
shouldn't be (probably the "South" belongs to a different window).
|
||||
|
||||
The window count itself also differs: mapped extracts 7 individual
|
||||
windows, hand-built consolidates to 5 by `(orientation, U)` group with
|
||||
width = total area / 1.0 m. Both should be functionally equivalent
|
||||
to the cascade IF orientations + U-values + total areas all match. So
|
||||
fixing the orientation should close this gap.
|
||||
|
||||
### Diff #3: float-precision noise
|
||||
|
||||
`hot_water_kwh_per_yr` (sub-0.01 kWh) and `lighting_kwh_per_yr`
|
||||
(sub-1e-10 kWh) are downstream of accumulation order in the cascade.
|
||||
`pumps_fans_primary_factor` 1.5128000000000001 vs 1.5128 is a Python
|
||||
float repr quirk. These won't close the SAP gap; ignore them unless
|
||||
the other fixes leave the test slightly red and one of these turns
|
||||
out to be the last decimal.
|
||||
|
||||
## Why a fresh approach may help
|
||||
|
||||
The previous agent's pattern was "fix one bug at a time, ship when
|
||||
test passes loosely." The right pattern for `1e-4` is the opposite:
|
||||
**systematically diff every input field between mapped and hand-built,
|
||||
fix every diff, then run the test once.** That's two more slices
|
||||
(pumps_fans + window orientation) — not enough for a session of
|
||||
incremental shipping, so consider doing both in one slice with the
|
||||
pin as the forcing function.
|
||||
|
||||
You may also want to look harder at the *architectural* question:
|
||||
why does Elmhurst's "site notes" (the surveyor's input form) need to
|
||||
go through THREE schemas — `ElmhurstSiteNotes`, then `EpcPropertyData`,
|
||||
then `CalculatorInputs` — when the hand-built fixture skips straight
|
||||
to the middle one? The string→int translations in `from_elmhurst_site_notes`
|
||||
are essentially doing what the extractor should do (or what an
|
||||
`elmhurst_codes.py` codes-module could express). If `ElmhurstSiteNotes`
|
||||
stored integer codes alongside the human-readable strings, the mapper
|
||||
would be a pure projection.
|
||||
|
||||
## Quick-orient commands
|
||||
|
||||
```bash
|
||||
# Failing test (Δ = 0.26 SAP → target < 1e-4)
|
||||
python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py::test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly --no-cov --no-header --tb=short
|
||||
|
||||
# All Summary→chain tests (2/3 green, 1 failing — the one above)
|
||||
python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py --no-cov --no-header -v
|
||||
|
||||
# Wider regression to confirm no fresh breakage
|
||||
python -m pytest datatypes/epc/ backend/documents_parser/tests/ --no-cov --no-header -q
|
||||
```
|
||||
|
||||
The 9 `test_appendix_u.py` failures and 1 `test_no_ac_cert_round_trips_fee_equals_space_heating_per_m2`
|
||||
failure are **pre-existing** from before this session — don't try to
|
||||
fix them as part of this work.
|
||||
|
||||
## Reference materials
|
||||
|
||||
- **`docs/sap-spec/HANDOVER_NEXT.md`** — original calculator-closure
|
||||
handover; still useful as the canonical reference for cascade
|
||||
conventions (AAA tests, 1e-4 tolerance, etc.).
|
||||
- **`docs/sap-spec/SAP_CALCULATOR.md`** — public API + two-cascade
|
||||
architecture (rating vs demand).
|
||||
- **`sap worksheets/Summary_000474.pdf`** (untracked) — the source-of-
|
||||
truth input for fixture 000474. Mirror tracked at
|
||||
`backend/documents_parser/tests/fixtures/Summary_000474.pdf`.
|
||||
- **`sap worksheets/U985-0001-000474.pdf`** (untracked) — the
|
||||
Elmhurst-computed worksheet with line refs the test pins against
|
||||
(line 257 for unrounded SAP).
|
||||
- **`packages/domain/src/domain/sap/worksheet/tests/_elmhurst_worksheet_000474.py`**
|
||||
— the hand-built `EpcPropertyData` for the same fixture. The CALCULATOR-
|
||||
EQUIVALENT target the mapper must reproduce.
|
||||
|
||||
## File map for the work ahead
|
||||
|
||||
| File | Role |
|
||||
|---|---|
|
||||
| `backend/documents_parser/elmhurst_extractor.py` | PDF → `ElmhurstSiteNotes` extractor; layout-style window parser at `_compose_window_descriptors` is where diff #2 lives |
|
||||
| `datatypes/epc/surveys/elmhurst_site_notes.py` | The schema (recently extended with `ExtensionPart`); `MainHeating` may need a `central_heating_pump_age` field for diff #1 |
|
||||
| `datatypes/epc/domain/mapper.py:254-326` | `from_elmhurst_site_notes` — the mapper itself |
|
||||
| `datatypes/epc/domain/mapper.py:1772-1830` | Code-translation helpers (`_leading_code`, `_elmhurst_wall_construction_int`, etc.) |
|
||||
| `datatypes/epc/domain/mapper.py:2010-2080` | `_map_elmhurst_building_part` + extension iteration |
|
||||
| `datatypes/epc/domain/mapper.py:2180+` | `_map_elmhurst_sap_heating` — likely where diff #1 (pumps_fans) is fixed |
|
||||
| `packages/domain/src/domain/sap/rdsap/cert_to_inputs.py` | Cascade — search `pumps_fans_kwh_per_yr` for diff #1's root cause |
|
||||
|
||||
## Conventions you must honour (from project memory)
|
||||
|
||||
- AAA test convention: every new test uses literal `# Arrange / # Act
|
||||
/ # Assert` headers
|
||||
- `abs(diff) <= tol` not `pytest.approx` (strict pyright)
|
||||
- One slice = one commit; stage by name (`?? non_invasive_photos/` and
|
||||
similar untracked junk must not be staged)
|
||||
- 1e-4 tolerance, no widening, no xfail (`feedback_zero_error_strict`)
|
||||
- Strict pyright net-zero on every commit
|
||||
|
||||
## Branch state at handover
|
||||
|
||||
```
|
||||
$ git log --oneline -5
|
||||
256a5afe Slice 46c: Elmhurst mapper produces calculator-equivalent EpcPropertyData — Summary_000474 SAP within 0.5 of worksheet PDF
|
||||
066dce19 Slice 46b: Elmhurst extractor parses windows from layout-style Summary PDFs
|
||||
36f2c7bb Slice 46a: Elmhurst mapper handles multi-bp Summary PDFs — Summary_000474 chain test flips green
|
||||
ccf7aa21 Scaffold: end-to-end Summary→EpcPropertyData chain test for 000474 (xfail)
|
||||
8ac548ca Audit: pin u_floor §5.12 formula cascade for cert 0240 cohort geometry
|
||||
```
|
||||
|
||||
The 0.5 tolerance in commit 46c's message is stale — this handover
|
||||
tightened it to 1e-4 after the commit. The first thing you commit
|
||||
should fix one of the two diffs and explicitly mention closing the
|
||||
last bit of the gap toward 1e-4.
|
||||
|
||||
Good luck.
|
||||
Loading…
Add table
Reference in a new issue