Mirror of https://github.com/Hestia-Homes/Model.git, synced 2026-06-08 11:17:27 +00:00
docs: rewrite NEXT_AGENT_PROMPT for Slice 87-94 state

Cert 001479 API path closed from +3.08 → +0.0006 SAP delta vs worksheet 69.0094 in Slices 87-94. Fabric heat loss is now EXACT across all 6 components.

Replaced the prior handover (which assumed the Elmhurst path was still RED with a 0.26 SAP gap on cohort 000474) with the current state:

- Acceptance criterion corrected: 1e-4 against worksheet continuous SAP (not ±0.5 against API integer) when a worksheet is available.
- Validation layer status table reflects current GREEN/RED state.
- Slice 87-94 progression captured with each fix's SAP delta impact.
- Diagnostic probe + queue documented for next agent: close 001479's residual +0.0006 (HW + gains), write Layer 3 diff test, then process new cert pairs as user sources them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Parent: 0320341837
Commit: 985a59e1f9

1 changed file with 210 additions and 179 deletions

````diff
@@ -1,238 +1,269 @@
-# Handover — API mapper validation via Elmhurst cross-check
+# Handover — API mapper at 1e-3 on cert 001479, closing to 1e-4
 
-You are picking up branch `ara-backend-design-prd`. The end goal of
-this workstream is clear and worth re-stating before anything else.
+You are picking up branch `ara-backend-design-prd`. The cert 001479 API
+path is at SAP delta **+0.0006** (was +3.08); fabric heat loss is
+EXACT. The remaining work is closing the sub-1e-3 gap and validating
+against more cert pairs.
 
 ## The end goal (re-confirmed by the user)
 
-> **Production goal: `API JSON → EpcPropertyDataMapper.from_api_response
-> → SAP10 calculator → SAP rating` must match the API-published SAP
-> rating to within ±0.5 (the API publishes rounded integer SAPs).**
+> **Production goal: `API JSON → EpcPropertyDataMapper.from_api_
+> response → SAP10 calculator → SAP rating` must match the SAP value
+> the calculator emitted at lodge time to within 1e-4.**
 >
-> The work in progress facilitates that by giving us an *independent*
-> route to the same dwelling's `EpcPropertyData` — `Summary PDF →
-> ElmhurstSiteNotesExtractor → EpcPropertyDataMapper.from_elmhurst_
-> site_notes → SAP`. Once both routes produce the same
-> `EpcPropertyData` (or a documented superset) for the same cert,
-> the API mapper is validated by transitivity.
+> The acceptance tolerance is **1e-4 against the worksheet's
+> continuous SAP value**, not ±0.5 against the published integer.
+> ±0.5 only applies when no worksheet is available (the 8 cohort
+> golden certs we have as API-only); when we have both API + worksheet
+> (cert 001479), the 1e-4 bar is the bar.
 
-The validation cohort is the 6 U985-surveyor certs (000474, 000477,
-000480, 000487, 000490, 000516) — each has a hand-built
-`EpcPropertyData` fixture that cascades to the worksheet PDF's lodged
-SAP at 1e-4. The 7th cert (001479 / API ref `0535-9020-6509-0821-6222`)
-is the first with **both** an Elmhurst site-notes lodgement AND a real
-GOV.UK API counterpart — making it the load-bearing cross-mapper
-parity-test fixture.
+The earlier handover stated ±0.5 — that was wrong. The user
+emphasised this twice: the calc is mechanical, identical inputs must
+produce identical outputs, so when we have the continuous worksheet
+value we should hit it exactly. See the conversation thread that led
+to Slice 87.
 
-Once both mappers produce equivalent `EpcPropertyData` for cert
-001479, running each through the calculator and comparing the SAP
-rating against the API-published `69` is the final acceptance test
-for the production flow.
-
````
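The corrected acceptance rule in the hunk above uses 1e-4 against the worksheet's continuous SAP when a worksheet exists, and ±0.5 against the published integer only when one does not. That reduces to a small decision, sketched below. This is a hypothetical helper, not code from the repository. The name `sap_within_tolerance`, its signature and the constant names are invented; it only borrows the handover's `abs(diff) <= tol` convention.

```python
from typing import Optional

# Tolerances stated in the handover; the constant names are hypothetical.
WORKSHEET_TOL = 1e-4  # vs the worksheet's continuous SAP
PUBLISHED_TOL = 0.5   # vs the API-published integer SAP, no worksheet only


def sap_within_tolerance(
    cascade_sap: float,
    published_sap: int,
    worksheet_sap: Optional[float] = None,
) -> bool:
    """True if the cascade SAP meets the acceptance bar for this cert.

    A worksheet value, when present, takes precedence: the 1e-4 bar
    applies and the published integer is not consulted.
    """
    if worksheet_sap is not None:
        return abs(cascade_sap - worksheet_sap) <= WORKSHEET_TOL
    return abs(cascade_sap - published_sap) <= PUBLISHED_TOL


# Cert 001479: worksheet 69.0094, published 69, cascade currently +0.0006 high.
cascade = 69.0094 + 0.0006
print(sap_within_tolerance(cascade, 69, worksheet_sap=69.0094))  # False
print(sap_within_tolerance(cascade, 69))                         # True
```

Cert 001479 currently has a +0.0006 residual. The fallback ±0.5 check passes it, but the worksheet check still fails, which is why the rewrite treats ±0.5 as too loose whenever a worksheet is available.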
````diff
-## The workstream layers (current state of each)
-
-The work is structured as four nested validation layers — each
-validates the layer below. Closing the inner-most one first means the
-upper layers can rely on it as a reference.
+## Validation layers (current state)
 
 ```
-Layer 4: API mapper validated end-to-end (production goal)
+Layer 4: API mapper cascade SAP = worksheet SAP at 1e-4 (production goal)
   └── Layer 3: API mapper EpcPropertyData ≡ Elmhurst mapper EpcPropertyData
-        └── Layer 2: Elmhurst mapper EpcPropertyData ≡ hand-built fixture
-              └── Layer 1: hand-built fixture → cascade SAP at 1e-4 vs worksheet
+        └── Layer 2: Elmhurst-mapped EpcPropertyData → cascade SAP = worksheet SAP at 1e-4
+              └── Layer 1: hand-built EpcPropertyData → cascade SAP = worksheet SAP at 1e-4
 ```
 
-| Layer | Status | Where |
+| Layer | Status |
+|---|---|
+| **1 — hand-built cascade pin** | ✅ 6 cohort certs (000474, 000477, 000480, 000487, 000490, 000516) GREEN at 1e-4; cert 001479 hand-built skeleton (Slice 62) still RED (2 of 11 pins green, hand-built has its own bugs — orthogonal to the production path) |
+| **2 — Elmhurst-mapped path** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 89); cohort: 2 GREEN (000477, 000516), 4 RED (000474, 000480, 000487, 000490 — Elmhurst U985 worksheets violate the RdSAP 10 §5 (12) spec; orthogonal to the production goal) |
+| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match within 1e-3 SAP; field-level diff test not yet written |
+| **4 — API path cascade SAP** | 🟡 **Cert 001479 at +0.0006 SAP delta from worksheet** (was +3.08); 9 other golden certs pinned at residual-from-integer at tolerance 0 |
+
+## Cumulative API SAP delta progression (cert 001479)
+
+The big breakthrough: implementing the RdSAP 10 §5 (12) spec rule
+(`Floor infiltration (suspended timber ground floor only)` — page 29
+of `docs/sap-spec/RdSAP 10 Specification 10-06-2025.pdf`) revealed a
+series of API-mapper coverage gaps that all needed fixing for the
+spec rule's premise to be met. Each slice closed one gap:
+
+| Slice | Fix | API SAP delta |
 |---|---|---|
-| **1 — hand-built cascade pin** | ✅ 6 cohort certs GREEN at 1e-4; cert 001479 hand-built skeleton at 2/11 pins green (Slice 62 unfinished) | `test_e2e_elmhurst_sap_score.py::test_sap_result_pin` |
-| **2 — Elmhurst-mapped ≡ hand-built** | ✅ Cohort 000474 fully GREEN (Slice 70); 5 other cohort certs PENDING; cert 001479 PENDING | `test_summary_pdf_mapper_chain.py::test_from_elmhurst_site_notes_matches_hand_built_NNNNNN` |
-| **3 — API-mapped ≡ Elmhurst-mapped** | PENDING — no test exists yet | New file `test_api_vs_elmhurst_parity.py` (or extension of the chain test) |
-| **4 — API mapper cascade ±0.5 SAP** | RED — cascade SAP 72.08 vs published 69 (delta +3.08, was +9.7 before slices 58-60); golden-fixtures residual pins green | `test_golden_fixtures.py` for cohort + new entry for `0535-9020-6509-0821-6222` |
+| baseline | broken party wall enum, no descriptive strings | **+3.0752** |
+| 87 | RdSAP 10 §5 (12) spec rule + Elmhurst-mapper switch to None | — |
+| 88 | thread `bp.floor_construction_type` into `u_floor` cascade | — |
+| 89 | PS pitched-sloping-ceiling roof area `÷ cos(30°)` (added `roof_construction_type` field on `SapBuildingPart`) | — |
+| 90 | API `party_wall_construction` enum → SAP10 `u_party_wall` codes (1→3 Solid, 2→4 Cavity, etc.) | +1.5298 |
+| 91 | descriptive strings via int→str lookups (`floor_construction_type`, `roof_construction_type`) + pre-1950 PS sloping → thickness=0 + per-bp roof description fix | +1.0970 |
+| 92 | upper-floor `room_height_m += 0.25` + `is_exposed_floor` from `floor_heat_loss==1` + `floor_insulation_thickness="NI"→None` | +1.0022 |
+| 93 | `window_transmission_details` from `glazing_type` int (code 3 → U=2.8/g=0.76, code 13 → U=1.4/g=0.72) | +1.1846 |
+| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | **+0.0006** |
 
````
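Several rows of the slice table above describe small, mechanical mapper rules. The sketch below restates four of them (Slices 89, 90, 92 and 93) as plain Python. It rests on these assumptions:

- All function and constant names are hypothetical.
- Only the code pairs the table actually lists are included. The party-wall row ends in "etc.", and the cert-pair queue flags the current mapping of code 3 as needing a cascade extension, so only codes 1 and 2 appear.
- The 30° pitch comes from the `÷ cos(30°)` in the Slice 89 row.
- How a storey is classed as "upper floor" is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Slice 90: API party_wall_construction -> SAP10 u_party_wall code.
# Only the two pairs stated in the table; other codes are not guessed.
PARTY_WALL_CODE: dict[int, int] = {
    1: 3,  # Solid
    2: 4,  # Cavity
}


@dataclass(frozen=True)
class WindowTransmission:
    u_value: float  # W/m²K
    g_value: float  # total solar transmittance


# Slice 93: API glazing_type -> window transmission details (listed codes only).
GLAZING: dict[int, WindowTransmission] = {
    3: WindowTransmission(u_value=2.8, g_value=0.76),
    13: WindowTransmission(u_value=1.4, g_value=0.72),
}


def party_wall_code(api_code: int) -> Optional[int]:
    """SAP10 party-wall code, or None if this sketch doesn't know the code."""
    return PARTY_WALL_CODE.get(api_code)


def window_transmission(glazing_type: int) -> Optional[WindowTransmission]:
    """U/g pair for a glazing_type, or None if the code isn't listed."""
    return GLAZING.get(glazing_type)


def inclined_roof_area(plan_area_m2: float, pitch_deg: float = 30.0) -> float:
    """Slice 89: area of a sloping ceiling = plan area / cos(pitch)."""
    return plan_area_m2 / math.cos(math.radians(pitch_deg))


def sap_room_height(lodged_height_m: float, is_upper_floor: bool) -> float:
    """Slice 92: add 0.25 m to the lodged room height on upper floors."""
    return lodged_height_m + 0.25 if is_upper_floor else lodged_height_m


print(party_wall_code(2))                          # 4
print(window_transmission(13))                     # WindowTransmission(u_value=1.4, g_value=0.72)
print(round(inclined_roof_area(10.0), 4))          # 11.547
print(sap_room_height(2.4, is_upper_floor=True))   # 2.65
```

Returning `None` for unlisted codes keeps unknown inputs visible to a caller instead of silently defaulting them.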
````diff
-## What's done (slices 54–70 in this branch)
+Fabric breakdown for cert 001479 API path is now COMPLETELY EXACT
+(all 6 components match worksheet to 4 d.p.):
 
-Cascade-level fixes (help both mappers):
-- Slice 58 `e3dc0b28` — secondary fuel cost routes through lodged `secondary_fuel_type` (was hard-coded to electric tariff); closed a 9-SAP-point ECF distortion on gas-secondary certs.
-- Slice 59 `175873b4` — `heat_transmission_from_cert` apportions windows per `window_location` per bp (not all-on-Main); load-bearing for multi-bp dwellings with non-uniform wall U.
-- Slice 60 `31c01a7e` — thermal bridging `y` is dwelling-wide (primary bp's age band), not per-bp.
+| Component | Cascade | Worksheet target |
+|---|---|---|
+| walls | 39.7652 | 39.7652 ✓ |
+| party walls | 17.0700 | 17.0700 ✓ |
+| roof | 10.3438 | 10.3438 ✓ |
+| floor | 23.1705 | 23.1705 ✓ |
+| windows | 43.5962 | 43.5962 ✓ |
+| doors | 5.5500 | 5.5500 ✓ |
+| **fabric total** | **139.4957** | **139.4957 ✓** |
 
````
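The fabric table added above can be restated as a pin in the branch's own test style: literal `# Arrange / # Act / # Assert` headers, and `abs(diff) <= tol` rather than `pytest.approx`. This is a self-contained sketch. The test name is hypothetical and the values are copied from the table, not read from the cascade. Units are assumed to be W/K, since the table gives none and the removed text speaks of a "~3 W/K cascade gap".

```python
# Hypothetical pin; values copied from the cert 001479 fabric breakdown table.
TOL = 1e-4

# component -> (cascade value, worksheet target), assumed W/K
FABRIC_001479: dict[str, tuple[float, float]] = {
    "walls": (39.7652, 39.7652),
    "party walls": (17.0700, 17.0700),
    "roof": (10.3438, 10.3438),
    "floor": (23.1705, 23.1705),
    "windows": (43.5962, 43.5962),
    "doors": (5.5500, 5.5500),
}
WORKSHEET_FABRIC_TOTAL = 139.4957


def test_fabric_breakdown_001479_matches_worksheet() -> None:
    # Arrange
    components = FABRIC_001479

    # Act
    mismatched = [
        name
        for name, (cascade, target) in components.items()
        if abs(cascade - target) > TOL
    ]
    cascade_total = sum(cascade for cascade, _ in components.values())

    # Assert
    assert mismatched == []
    assert abs(cascade_total - WORKSHEET_FABRIC_TOTAL) <= TOL


test_fabric_breakdown_001479_matches_worksheet()
```

The six rows do add up to the tabulated total: 39.7652 + 17.0700 + 10.3438 + 23.1705 + 43.5962 + 5.5500 = 139.4957.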
````diff
-Elmhurst-mapper fixes (Slice 2 layer):
-- Slice 54 `4427b58a` — `extensions_count` from `len(survey.extensions)`.
-- Slice 55 `c89206fc` — party-wall code `"CU"` → 4 (cavity unfilled U=0.5).
-- Slice 56 `07ed871f` — floor `"E To external air"` → `u_exposed_floor` Table 20.
-- Slice 57 `7a9a8b7e` — PS sloping-ceiling + As-Built + pre-1950 age → `thickness=0` → U=2.30.
-- Slice 66+67 `ca39d072` — `country_code="ENG"`, `has_draught_lobby` gate, plus 5 heating-detail int surfacings (`boiler_flue_type`, `emitter_temperature`, `central_heating_pump_age`, `main_heating_number`, `water_heating_fuel`).
-- Slice 68 `6baf66cd` — Elmhurst party-wall `"U"` → 0 sentinel; cohort hand-built `central_heating_pump_age_str="Unknown"`.
+## What's left (queue, in priority order)
 
-Hand-built fixture work (Slice 1 layer + parity setup):
-- Slice 62 `ee98dbe0` — created `_elmhurst_worksheet_001479.py` skeleton; 2/11 cascade pins green (the rest need iteration; `sap_score_continuous=65.99 vs 69.0094`, gap −3.02 SAP).
-- Slice 64 `b5cbfe83` — bulk-update cohort 000474 hand-built with Cat A fields (descriptive strings, ventilation zero counts, top-level booleans); 50 → 14 mapper-vs-hand-built diffs.
-- Slice 65 `4997039f` — added `shower_outlets` + `number_baths` to cohort 000474 hand-built.
-- Slice 69 `d8a37029` — expanded cohort 000474 windows 5 → 7 (1:1 with §11 table).
-- Slice 70 `035d916d` — added window-subfield exclusion to diff helper + `frame_factor=0.7` default in `make_window`. **Cohort 000474 diff GREEN**.
+### 1. Close cert 001479's residual 0.0006 SAP gap (1-3 slices)
 
-Diff test infrastructure (Slice 63 `01d234dd`):
-- `_LOAD_BEARING_FIELDS` allow-list in `test_summary_pdf_mapper_chain.py` (~40 top-level fields driving cascade or cross-mapper semantics).
-- `_NON_LOAD_BEARING_WINDOW_SUBFIELDS` deny-list (descriptive int/str encodings that don't affect cascade).
-- `_diff_load_bearing` recursive helper, strict-pyright-clean (`mapped/hand_built: object`, narrowed via isinstance).
-- `test_from_elmhurst_site_notes_matches_hand_built_000474` is the tracer-bullet test.
-
-## What's RED right now
+The remaining gap is non-fabric. Diff against the Summary path's
+intermediate cascade values (which lands at 1e-4 GREEN):
 
 ```
-$ git log --oneline -1 backed | head -1
-035d916d Slice 70: cohort 000474 mapper-vs-hand-built diff is GREEN
+Σ internal_gains_monthly_w: API 5339.27 Sum 5313.55 delta +25.72
+Σ solar_gains_monthly_w: API 5510.10 Sum 5508.60 delta +1.50
+Σ mean_internal_temp_monthly_c: API 214.87 Sum 213.51 delta +1.35
+Σ monthly_infiltration_ach: API 8.95 Sum 10.91 delta -1.96
+hot_water_kwh_per_yr: API 2365.00 Sum 2358.31 delta +6.69
 ```
 
-Two RED forcing functions on the branch:
+Specifically:
+- **Infiltration is still under by ~2 ACH/year**. The (12) spec rule
+  applies on both paths now (after Slice 87), so it's something else
+  — possibly `has_draught_lobby` (API=None, Summary=False; cascade
+  treats both as False so it shouldn't matter; verify) or `(13)
+  draught_lobby_ach`. Or storey count. Probe with
+  `ventilation_from_cert(api_mapped)` vs `ventilation_from_cert(sum_
+  mapped)`.
+- **HW kWh +6.7** suggests a small Appendix J §1a occupancy
+  difference, or a different Tcold series, or shower outlets.
+- **Internal gains +25.7 W·months** — probably a pumps_fans count or
+  lighting bulb count mismatch.
 
````
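The residual probe output in this part of the diff reports absolute deltas in mixed units (W·months, °C·months, ACH, kWh/yr), so the raw numbers cannot be compared with one another. A crude triage heuristic, assumed here and not taken from the handover, is to rank each quantity by its delta relative to the Summary-path value. The sketch recomputes deltas from the rounded figures shown, so a last digit can differ from the printed `delta` column (mean internal temperature gives 1.36 rather than 1.35). A relative change is also not a measure of SAP impact.

```python
# Σ values from the API-vs-Summary probe output for cert 001479 (as printed).
PROBE: dict[str, tuple[float, float]] = {
    "internal_gains_monthly_w": (5339.27, 5313.55),
    "solar_gains_monthly_w": (5510.10, 5508.60),
    "mean_internal_temp_monthly_c": (214.87, 213.51),
    "monthly_infiltration_ach": (8.95, 10.91),
    "hot_water_kwh_per_yr": (2365.00, 2358.31),
}


def relative_deltas(probe: dict[str, tuple[float, float]]) -> dict[str, float]:
    """(API - Summary) / Summary for each probed quantity."""
    return {name: (api - summ) / summ for name, (api, summ) in probe.items()}


def rank_by_relative_size(probe: dict[str, tuple[float, float]]) -> list[str]:
    """Quantity names, largest relative divergence first."""
    rel = relative_deltas(probe)
    return sorted(rel, key=lambda name: abs(rel[name]), reverse=True)


rel = relative_deltas(PROBE)
for name in rank_by_relative_size(PROBE):
    print(f"{name:30s} {rel[name]:+.3%}")
```

On these figures infiltration leads by more than an order of magnitude (about -18%, against well under 1% for the other four). That matches the handover's own choice to list infiltration first among the candidates.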
````diff
-1. `test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly` — chain pin for cert 001479; cascade SAP `70.20` vs worksheet `69.0094` (delta `1.19`). 9 of 11 `test_sap_result_pin[001479-*]` fail in the same RED state. Closing requires either:
-   - Completing the 001479 hand-built (`_elmhurst_worksheet_001479.py` is the Slice 62 skeleton) — encode every worksheet input until 11/11 pins hit 1e-4.
-   - Or finding the remaining `~3 W/K` cascade gap (likely `u_floor` Table 19 for age C + PS sloping-ceiling roof area inclination factor — see prior handover at commit `0e4f4c05`).
-
-## What's GREEN right now
-
-- All 66 cohort `test_sap_result_pin[NNNNNN-*]` pins (6 certs × 11 fields) at 1e-4.
-- 8 golden-fixture residual pins in `test_golden_fixtures.py` (cohort API certs).
-- `test_from_elmhurst_site_notes_matches_hand_built_000474` — first parity validation.
-- Pyright net-zero on every touched file's baseline.
-
-## Suggested next moves (in priority order)
-
-### 1. Parametrize the diff test over the 5 other cohort certs
-
-The toolchain is in place. For each cert 000477, 000480, 000487, 000490, 000516:
-
-```python
-def test_from_elmhurst_site_notes_matches_hand_built_NNNNNN() -> None:
-    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_NNNNNN_PDF)
-    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
-    mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
-    hand_built = _wNNNNNN.build_epc()
-    diffs: list[str] = []
-    for field_name in _LOAD_BEARING_FIELDS:
-        diffs.extend(_diff_load_bearing(
-            getattr(mapped, field_name, None),
-            getattr(hand_built, field_name, None),
-            field_name,
-        ))
-    assert not diffs, (
-        f"{len(diffs)} load-bearing divergence(s) ...\n " +
-        "\n ".join(diffs)
-    )
-```
-
-Each will RED initially with a similar diff pattern to 000474. Most diffs should close mechanically by the same bulk-update pattern as Slice 64 (descriptive fields, ventilation zeros, top-level booleans, `wall_thickness_measured`, etc.). The unique-to-cert wrinkles need slice-by-slice attention. Could be parametrize-then-bulk-fix-then-iterate, or one cert at a time.
-
-Run diff probe (substitute `NNNNNN`):
+Run the diff probe (the one from the conversation) to localise:
 ```bash
 PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src python -c "
 import sys; sys.path.insert(0, '/workspaces/model')
 from backend.documents_parser.tests.test_summary_pdf_mapper_chain import _diff_load_bearing, _LOAD_BEARING_FIELDS, _summary_pdf_to_textract_style_pages
 from backend.documents_parser.elmhurst_extractor import ElmhurstSiteNotesExtractor
 from datatypes.epc.domain.mapper import EpcPropertyDataMapper
-from domain.sap.worksheet.tests import _elmhurst_worksheet_NNNNNN as wHB
 import json, dataclasses
 from pathlib import Path
-pages = _summary_pdf_to_textract_style_pages(Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_NNNNNN.pdf'))
 
+api = json.loads(Path('/workspaces/model/packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json').read_text())
+api_mapped = EpcPropertyDataMapper.from_api_response(api)
+pages = _summary_pdf_to_textract_style_pages(Path('/workspaces/model/backend/documents_parser/tests/fixtures/Summary_001479.pdf'))
 sn = ElmhurstSiteNotesExtractor(pages).extract()
-mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
-hb = wHB.build_epc()
+sum_mapped = EpcPropertyDataMapper.from_elmhurst_site_notes(sn)
 diffs = []
 for f in _LOAD_BEARING_FIELDS:
-    diffs.extend(_diff_load_bearing(getattr(mapped, f, None), getattr(hb, f, None), f))
-print(f'diff count: {len(diffs)}')
-for d in diffs: print(f' {d}')
+    diffs.extend(_diff_load_bearing(getattr(api_mapped, f, None), getattr(sum_mapped, f, None), f))
+print(f'{len(diffs)} load-bearing divergences')
+for d in diffs[:40]: print(f' {d}')
 "
 ```
 
-### 2. Complete cert 001479's hand-built (`_elmhurst_worksheet_001479.py`)
+(NB: the original `_diff_load_bearing` was written for cohort
+diff tests; the helper signature is `mapped, hand_built, path` — pass
+api_mapped as `mapped` and sum_mapped as `hand_built` to surface API
+gaps.)
 
````
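The NB note above reuses `_diff_load_bearing(mapped, hand_built, path)` to compare two mapper outputs rather than a mapper output against a hand-built fixture. That helper's body is not part of this commit, so the following is only a hypothetical stand-in. It shows the general shape of a recursive, path-reporting field diff. `diff_fields`, `FLOAT_TOL` and the toy dataclasses are invented for illustration. The real helper's `_LOAD_BEARING_FIELDS` allow-list and window-subfield deny-list are not modelled.

```python
import dataclasses
from dataclasses import dataclass

FLOAT_TOL = 1e-4  # assumed numeric tolerance, mirroring the 1e-4 bar


def diff_fields(mapped: object, reference: object, path: str) -> list[str]:
    """Hypothetical recursive field diff: one message per divergence."""
    # Same-type dataclass instances: recurse into each field.
    if (
        dataclasses.is_dataclass(mapped)
        and not isinstance(mapped, type)
        and type(mapped) is type(reference)
    ):
        field_diffs: list[str] = []
        for field in dataclasses.fields(mapped):
            field_diffs.extend(
                diff_fields(
                    getattr(mapped, field.name),
                    getattr(reference, field.name),
                    f"{path}.{field.name}",
                )
            )
        return field_diffs
    # Lists/tuples: report a length mismatch once, else compare pairwise.
    if isinstance(mapped, (list, tuple)) and isinstance(reference, (list, tuple)):
        if len(mapped) != len(reference):
            return [f"{path}: length {len(mapped)} != {len(reference)}"]
        item_diffs: list[str] = []
        for i, (a, b) in enumerate(zip(mapped, reference)):
            item_diffs.extend(diff_fields(a, b, f"{path}[{i}]"))
        return item_diffs
    # Non-bool numbers: compare with abs(diff) <= tol.
    if (
        isinstance(mapped, (int, float))
        and isinstance(reference, (int, float))
        and not isinstance(mapped, bool)
        and not isinstance(reference, bool)
    ):
        if abs(mapped - reference) <= FLOAT_TOL:
            return []
        return [f"{path}: {mapped!r} != {reference!r}"]
    # Anything else (str, None, bool, mismatched types): exact equality.
    if mapped != reference:
        return [f"{path}: {mapped!r} != {reference!r}"]
    return []


@dataclass
class Window:  # toy stand-in, not the real EpcPropertyData types
    u_value: float
    g_value: float


@dataclass
class ToyEpc:
    sheltered_sides: int
    windows: list[Window]


api_mapped = ToyEpc(sheltered_sides=2, windows=[Window(2.8, 0.76), Window(1.4, 0.72)])
sum_mapped = ToyEpc(sheltered_sides=1, windows=[Window(2.8, 0.76), Window(1.4, 0.72)])
print(diff_fields(api_mapped, sum_mapped, "epc"))  # ['epc.sheltered_sides: 2 != 1']
```

As in the NB note, the API-side object goes in the first argument and the Summary-side object in the second. Each message then reads as "API value != Summary value".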
````diff
-Currently 2/11 cascade pins green. Worksheet target `69.0094`. Cascade output `65.99`. Likely missing inputs (compare against cohort 000490 which has a similar gas-combi+secondary config):
-- Hot-water demand routing (Tcold model, occupancy)
-- Thermal mass parameter
-- Internal gains (appliance + cooking allowance)
-- `multiple_glazed_proportion`
-- §2 ventilation tuning
+### 2. Layer 3 — write the API ≡ Elmhurst diff test (1 slice)
 
-Diagnostic: `python -m pytest packages/domain/src/domain/sap/worksheet/tests/test_e2e_elmhurst_sap_score.py::test_sap_result_pin -k 001479 -v --no-cov` shows each pin's `actual vs expected`.
+Add `test_from_api_response_matches_from_elmhurst_site_notes_001479`
+in `backend/documents_parser/tests/test_summary_pdf_mapper_chain.py`,
+mirroring the cohort `test_from_elmhurst_site_notes_matches_hand_
+built_NNNNNN` pattern. Use `_diff_load_bearing` with `_LOAD_BEARING_
+FIELDS`. This formalises Layer 3 as a 1e-4 gate (zero load-bearing
+divergences between the two mapper outputs).
 
-### 3. Add cert 001479 to the diff test (after 001479 hand-built lands 1e-4)
+This test will start RED with the residual diffs from step 1; closing
+those slices brings it to GREEN.
 
-```python
-def test_from_elmhurst_site_notes_matches_hand_built_001479() -> None:
-    ...
-```
+### 3. More cert pairs (user is sourcing — pause for new data)
 
-Likely RED initially. Close diffs the same way as 000474.
+The user has agreed to source 2-3 more (Elmhurst worksheet + GOV.UK
+API JSON) pairs to validate the mapper isn't 001479-overfit.
+Suggested diversity:
 
-### 4. API mapper → hand-built diff test (Layer 3)
+- **Detached + RR** (would fix cert 0240's -14 residual which has a
+  Type-1 RR the mapper doesn't extract).
+- **Mid-terrace with cavity-filled party walls** (API party_wall_
+  construction=3 → spec U=0.2; currently mapped to SAP10 code 4
+  which gives U=0.5; needs cascade extension at
+  `u_party_wall`).
+- **Flat / maisonette** (party wall U=0 path; cert 9390 is one but
+  no worksheet).
+- **Different age band** (E, J, K, L) to exercise the (12) spec
+  rule's age boundaries.
 
-```python
-def test_from_api_response_matches_hand_built_001479() -> None:
-    raw = json.loads(Path("packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json").read_text())
-    mapped = EpcPropertyDataMapper.from_api_response(raw)
-    hand_built = _w001479.build_epc()
-    # same _diff_load_bearing pattern
-```
+Each new pair lands as a 1e-4 cascade-pin test. Pattern: ~3-5 new
+mapper bugs per cert pair (similar to Slice 87-94 on 001479). Each
+becomes its own slice. Stage by name; one slice = one commit.
 
-The API JSON is already cached at `packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json` (Slice 54 era).
+### 4. Investigate goldenz with shifted residuals after Slices 87-94
 
-Diffs here will surface API-mapper coverage gaps. Each one is a slice; the API mapper at `from_api_response` / `from_rdsap_schema_21_0_1` paths needs corresponding extraction.
+The Slice 87-94 fixes shifted residuals on 7 of 10 API-only golden
+certs. The new residuals are pinned. Outliers that need attention:
 
-### 5. The production acceptance test
+- **0240** (-14): documented RR mapper gap (`'Roof room(s),
+  insulated (assumed)'` description not parsed; Type-1 RR
+  gable_wall_lengths not extracted)
+- **0390-2954** (-6): large detached, age F, oil — likely a heating
+  efficiency cascade gap
+- **6035** (-6): mid-terrace age A — possibly party wall config or
+  ventilation issue
 
-Once Layer 3 is green for cert 001479:
-- `test_golden_fixtures.py::test_golden_cert_residual_matches_pin[0535-9020-6509-0821-6222]` — add entry. API-mapped EPC cascades to within ±0.5 of API-published `69`.
-- And `test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly` is GREEN at 1e-4.
+These are tractable once you have a worksheet for any of them.
 
-That's the production-flow acceptance: API → EpcPropertyData → SAP score within tolerance.
+### 5. (deferred) Cohort chain test RED triage
 
-## Conventions you must honour (project memory)
+4 cohort chain tests (000474, 000480, 000487, 000490) are RED
+because the Elmhurst U985 worksheets emit (12) values that don't
+follow RdSAP 10 §5 — see the conversation re: identical Summary §9
+lodgements producing different worksheet (12) for cohort 000477 vs
+000480. The cascade is now spec-correct; the Elmhurst tool isn't.
+Options: (a) mark as known-Elmhurst-non-spec, (b) add per-cert
+override field, (c) wait for more cert pairs to confirm pattern.
+**Not blocking the production goal.**
 
-- AAA test convention: every new test uses literal `# Arrange / # Act / # Assert` headers.
-- `abs(diff) <= tol` not `pytest.approx` (strict-pyright partially-unknown).
-- One slice = one commit; stage by name.
-- 1e-4 tolerance for the Elmhurst path; 0.5 for the API path. No widening, no xfail (`feedback_zero_error_strict`).
-- Strict pyright net-zero on every commit (per-file baselines: mapper.py 35, heat_transmission.py 13, cert_to_inputs.py 35).
-- The 6 cohort cert hand-builts MUST keep cascading to 1e-4. If a mapper change breaks one, fix the mapper or update the hand-built to match — don't widen.
+## Key conventions (project memory)
 
-## Source-data caveats
+- **AAA test convention** — every new test uses literal `# Arrange /
+  # Act / # Assert` headers.
+- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright partial-
+  unknown).
+- **One slice = one commit** — stage by name (`git add <path>`).
+- **1e-4 tolerance** for the worksheet-comparable paths (Elmhurst
+  Summary + API both have worksheets for cert 001479). No widening,
+  no xfail.
+- **Strict pyright net-zero** per file. Baselines: `mapper.py` 33,
+  `heat_transmission.py` 13, `cert_to_inputs.py` 35,
+  `epc_property_data.py` 0.
+- **Spec citation in commit messages** — when a slice implements a
+  spec rule, quote the spec text (RdSAP 10 page reference). User
+  asked us to confirm against docs.
 
-- **Cert 001479 age band**: Summary §3 says `Ext1: M 2023 onwards`; worksheet header says `Ext1: L`. Assessor data-entry inconsistency. The 001479 hand-built uses `L` (to mirror the worksheet calc inputs); the Elmhurst mapper trusts the Summary `M`. This will surface as a 1-field diff in the eventual `001479` diff test — document and accept (or override per-cert in the hand-built).
+## Cached artefacts
 
-## Branch state
+- `packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-
+  9020-6509-0821-6222.json` — API JSON for cert 001479 (RdSAP-Schema-
+  21.0.1).
+- `backend/documents_parser/tests/fixtures/Summary_001479.pdf` —
+  Elmhurst site-notes PDF for cert 001479.
+- `sap worksheets/lodged example/P960-0001-001479.pdf` — Domna's
+  worksheet output for cert 001479 (Continuous SAP 69.0094).
+- `sap worksheets/U985-0001-NNNNNN.pdf` × 6 — cohort Elmhurst
+  worksheets (000474, 000477, 000480, 000487, 000490, 000516).
+- `sap worksheets/U985-0001-NNNNNN.txt` × 6 — text exports of above.
+
+## Recent slice history (Slices 87-94, current branch)
 
 ```
 $ git log --oneline -15
-035d916d Slice 70: cohort 000474 mapper-vs-hand-built diff is GREEN
-d8a37029 Slice 69: 1:1 windows expansion in cohort 000474 (5 → 7)
-6baf66cd Slice 68: party-wall "U Unable" + central_heating_pump_age_str → 1 diff left
-ca39d072 Slices 66+67: Elmhurst mapper surfaces country_code + heating ints + has_draught_lobby
-4997039f Slice 65: add shower_outlets + number_baths to cohort 000474 hand-built
-b5cbfe83 Slice 64: bulk-update cohort 000474 hand-built for Cat A diff parity
-01d234dd Slice 63: RED tracer-bullet mapper-vs-hand-built diff test for cohort 000474
-7e1269fc Handover: hand-built fixture skeleton landed (Slice 62); 2/11 pins green
-ee98dbe0 Slice 62: hand-built _elmhurst_worksheet_001479.py — skeleton + 11 RED pins
-0e4f4c05 Handover: TDD red-green session — 4 more slices (58-60) + RED chain pin
-31c01a7e Slice 60: thermal bridging y is dwelling-wide, not per-bp
-175873b4 Slice 59: heat_transmission apportions window area per bp via window_location
-e3dc0b28 Slice 58: secondary fuel cost routes through lodged secondary_fuel_type
-a0d9d094 Handover: 4 cert-001479 slices in (54-57); gap at +7.62 SAP; non-fabric next
-7a9a8b7e Slice 57: Pre-1950 Elmhurst sloping-ceiling roofs map to thickness=0
+03203418 Slice 94: API mapper sheltered_sides + floor_type — cert 001479 to 1e-3
+7281b7b3 Slice 93: API mapper window_transmission_details from glazing_type
+8e752e57 Slice 92: API mapper floor dimensions (SAP +0.25m + exposed-floor + NI→None)
+2cebba28 Slice 91: API mapper descriptive strings + roof description per-bp fix
+fbbdca49 Slice 90: API mapper translates party_wall_construction → SAP10 enum
+006e9842 Slice 89: PS pitched-sloping-ceiling roof area uses inclined surface
+c40679d1 Slice 88: thread bp.floor_construction_type into u_floor cascade
+aff331ff Slice 87: implement RdSAP 10 §5 (12) spec rule for suspended timber floor
+2d3355ee Slice 86: 1:1 windows expansion in cohort 000516 (2 → 5 entries)
+f863598d Slice 85: bulk-update cohort 000516 hand-built for Cat A diff parity
 ```
 
-## Cached artefacts (don't re-fetch)
+Earlier slice context (71-86 closed cohort Layer 2) is in the prior
+handover at commit `86eff23f` (`docs/sap-spec/NEXT_AGENT_PROMPT.md`
+before this rewrite).
 
-- `packages/domain/src/domain/sap/rdsap/tests/fixtures/golden/0535-9020-6509-0821-6222.json` — API JSON for cert 001479 (Slice 54 era, fetched via `OPEN_EPC_API_TOKEN` from `backend/.env`).
-- `backend/documents_parser/tests/fixtures/Summary_001479.pdf` — site-notes PDF.
-- `sap worksheets/lodged example/P960-0001-001479.pdf` — Elmhurst worksheet output for cert 001479.
+## First action
 
-## Probe scripts (regenerable in `/tmp`)
+1. Confirm branch state matches `git log --oneline -1` →
+   `03203418` Slice 94.
+2. Run the full sweep:
+   ```bash
+   PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src \
+     python -m pytest backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
+       packages/domain/src/domain/sap/worksheet/tests/test_e2e_elmhurst_sap_score.py \
+       packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py \
+       --no-cov -q
+   ```
+   Expect ~75 passed / ~16 failed. The 9 failures on
+   `test_sap_result_pin[001479-*]` (cohort cascade for the hand-built
+   skeleton) and 4 cohort chain RED + 3 cohort diff RED are
+   pre-existing.
+3. Run the API → Summary diff probe (script in §1 above) to surface
+   the remaining sub-1e-3 SAP gap. Likely candidates ranked by impact:
+   - Infiltration (-2 ACH/yr) → check `ventilation_from_cert()`
+     intermediate outputs for both paths
+   - HW kWh (+6.7) → check shower outlet count + Appendix J §1a path
+   - Internal gains (+25.7 W·months) → check pumps_fans + bulb counts
+4. Don't lose sight of Layer 4: **API → SAP within 1e-4 of worksheet
+   continuous on cert 001479** is the production goal. Currently
+   delta +0.0006.
 
-- `/tmp/probe_000474_handbuilt_diff.py` — diff cohort 000474 mapped vs hand-built (un-filtered).
-- `/tmp/probe_000474_load_bearing.py` — diff cohort 000474 mapped vs hand-built (load-bearing scope, pre-filter).
-- `/tmp/probe_001479.py` — cross-mapper diff + cascade for cert 001479.
-- `/tmp/sensitivity_001479.py` — single-field patch SAP impact probe.
-- `/tmp/perbp_001479.py` — per-bp cascade U-value dump.
-
-Good luck. Keep the end goal at the front of the work: **API → SAP within ±0.5 of published 69 on cert 001479** is the acceptance test. The cohort + Elmhurst diff layers are the trail of breadcrumbs that will get us there with high confidence.
+Good luck. The user is sourcing more cert pairs in parallel; when
+they arrive, each one will surface 3-5 mapper bugs along the same
+pattern as Slices 87-94. The diagnostic methodology that worked here
+(diff Summary-mapper vs API-mapper; localise by cascade component;
+fix the API mapper to mirror the Summary's surfacing) will work
+again.
````