mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
docs: update handover — cert 9501 closed, HP workstream still next
Cert 9501 (top-floor flat + RR + measured PV) is now CLOSED on both Summary and API paths at 1e-4 vs worksheet 68.5252 (Slices 99a-99e on Summary + 100a-100c on API). Three boiler certs in total now have Layer 4 production gates.

Updated handover lists the 7-cert ASHP workstream (still deferred), the 8 cohort certs without worksheets (residuals tightened by Slice 100c's gap-aware DG-pre-2002 glazing lookup), and captures the 7 key learnings from cert 9501 closure as guidance for the HP workstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 7992154ffd
commit 7fed541efa
1 changed file with 127 additions and 162 deletions
@@ -1,168 +1,74 @@
# Handover — Cert 9501 flat-exposure + heat-pump workstream
|
||||
# Handover — Heat-pump workstream + remaining boiler audits
|
||||
|
||||
You're picking up branch `feature/per-cert-mapper-validation` after
|
||||
the cert 0330 boiler workflow landed (Layer 4 1e-4 gate GREEN on both
|
||||
Summary and API paths, mirroring cert 001479). Two boiler certs are
|
||||
now validated end-to-end against worksheets at 1e-4. The third boiler
|
||||
cert (9501) is staged but RED at Δ -5.25 SAP because it surfaces a
|
||||
new class of mapper gap: **flat-specific exposure**.
|
||||
three boiler certs (001479, 0330, 9501) landed Layer 4 1e-4 chain
|
||||
gates on BOTH the Summary and API paths. The boiler workflow is now
|
||||
proven on three independent shapes — house mid-terrace (001479),
|
||||
house mid-terrace with single extension (0330), top-floor flat with
|
||||
RR + measured PV (9501). The next pieces are the 7 ASHP certs and
|
||||
the 8 cohort golden certs that don't yet have worksheets.
|
||||
|
||||
## State at session start
|
||||
|
||||
Recent commits:
|
||||
Most recent commits (cert 9501 closure):
|
||||
|
||||
```
7992154f Slice 100c: API path — surface PV arrays + gap-aware glazing lookup
814ae798 Slice 100b: API TFA — include per-bp RR floor area in continuous TFA
7d460183 Slice 100a: API path — surface Detailed-RR per-surface areas
0735c7e8 Slice 99e: PV pitch enum-not-degrees + cert 9501 Layer 2 chain test
4264e0ad Slice 99d: surface PV array from Elmhurst Summary §19.0
e9575b52 Slice 99c: Elmhurst mapper — RR gables external for flats + SO wall code
2cdaefcd Slice 99b: Elmhurst mapper — flat floor-position from floor.location
a76af2ec Slice 99a: Elmhurst extractor — no attachment line for flats
158c08f1 docs: handover for cert 9501 (flat exposure) + HP workstream
5d1778ac chore: stage cert 9501 fixtures
8443c770 Slice 98: API path shower-counts + window-rounding → cert 0330 1e-4
aa6645e3 Slice 97: API glazing_type=2 → RdSAP 10 Table 24 (DG 2002-2021)
aa6645e3 Slice 97: API glazing_type=2 → RdSAP 10 Table 24
da5e7196 Slice 96: flat-roof U-value defaults — RdSAP 10 §5.11 Table 18 col (3)
5d1778ac chore: stage cert 9501 fixtures (second boiler validation cert)
17646c8a chore: stage cert 0380 fixtures (HP pilot — deferred workstream)
460f1735 chore: stage cert 0330 fixtures (boiler pilot)
```
|
||||
|
||||
Test baselines you should see (197 pass + 9 pre-existing 001479
|
||||
Test baselines you should see (429 pass + 9 pre-existing 001479
|
||||
Layer 1 fails):
|
||||
|
||||
```bash
PYTHONPATH=/workspaces/model python -m pytest \
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py \
  domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py \
  backend/documents_parser/tests/test_elmhurst_extractor.py \
  backend/documents_parser/tests/test_elmhurst_end_to_end.py \
  domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py \
  domain/sap10_ml/tests/test_rdsap_uvalues.py \
  datatypes/epc/schema/tests/test_schema_loading.py \
  --no-cov -q
```
|
||||
|
||||
Layer 4 1e-4 gates passing:
|
||||
**Layer 4 1e-4 production gates passing (3 boiler certs, dual-path):**
|
||||
|
||||
- `test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly`
|
||||
- `test_api_0330_full_chain_sap_matches_worksheet_pdf_exactly` ← landed this session
|
||||
- `test_summary_001479_full_chain_sap_matches_worksheet_pdf_exactly`
|
||||
- `test_summary_0330_full_chain_sap_matches_worksheet_pdf_exactly` ← landed this session
|
||||
- 000477 / 000516 cohort chain tests
|
||||
| Cert | Heating | Dwelling | Worksheet SAP | Summary | API |
|---|---|---|---|---|---|
| 001479 (0535-...-6222) | Mains gas boiler | Mid-terrace house | 69.0094 | ✓ | ✓ |
| 0330-2249-... | Mains gas boiler | Mid-terrace house + ext | 61.5993 | ✓ | ✓ |
| 9501-3059-... | Mains gas boiler | Top-floor flat + RR + PV | 68.5252 | ✓ | ✓ |
|
||||
|
||||
## Cert 9501 — staged but RED (Δ -5.25 SAP)
|
||||
## Outstanding workstreams (in priority order)
|
||||
|
||||
Fixtures committed in `5d1778ac`:
|
||||
- API JSON: `domain/sap10_calculator/rdsap/tests/fixtures/golden/9501-3059-8202-7356-0204.json`
|
||||
- Summary PDF: `backend/documents_parser/tests/fixtures/Summary_000784.pdf`
|
||||
- Worksheet (reference): `sap worksheets/Additional data with api/9501-3059-8202-7356-0204/dr87-0001-000784.pdf`
|
||||
### 1. Heat-pump workstream — 7 ASHP certs (DEFERRED until go-ahead)
|
||||
|
||||
Cert shape (per worksheet header):
|
||||
- Property type: **Flat, Mid-Terrace** (mid-floor — `not` top-floor as
|
||||
the prior handover claimed)
|
||||
- Storeys (building): 4
|
||||
- Age band: B
|
||||
- TFA: 113.08 m²
|
||||
- Heating: mains-gas boiler, PCDB idx 19007 (Vaillant)
|
||||
- Worksheet target unrounded SAP: **68.5252**
|
||||
Cert refs (per the prior handover): 0350, 0380, 2225, 2636, 3800,
|
||||
9285, 9418. Predominantly PCDB index 104568 (one model 102421). The
|
||||
mapper has never been validated against a heat-pump cert.
|
||||
|
||||
### Cascade-component diff (Summary path vs worksheet)
|
||||
Cert 0380 fixtures are already staged (commit `17646c8a`). Original
|
||||
probe showed:
|
||||
|
||||
```
TFA:      113.08 =  113.08  ✓
walls:    148.89 vs 218.81  (Δ -69.92 ← BIG — missing RR gables)
roof:      18.10 =   18.10  ✓ (Table 18 age B col-(3) + col-(1) compound — fine)
floor:      9.25 vs   0.00  (Δ +9.25 ← FLAT GROUND-FLOOR PARTY)
windows:   25.83 =   25.83  ✓
doors:      5.55 =    5.55  ✓
party:      7.36 vs   0.00  (Δ +7.36 ← worksheet U_party=0 for flat)
bridges:   25.00 vs  28.39  (Δ -3.39 ← downstream of (31) shrink)
(37) tot: 239.98 vs 296.68  (Δ -56.70 ← composite)

ECF:      2.6326 vs 2.2563  (too high; SAP too low by 5.25)
```
|
||||
|
||||
### Worksheet element decomposition (lines 187-205)
|
||||
|
||||
```
Element                              Net Area     U      A x U
(26)  Doors uninsulated 1                1.85   3.00      5.55
(27)  Windows 1                         10.60   2.44     25.83
(28a) Ground floor Main                 67.58   0.00      0.00   ← PARTY
(29a) External walls Main               99.26   1.70    168.74
(29a) Roof room Main Gable Wall 1       13.50   1.70     22.95   ← RR
(29a) Roof room Main Gable Wall 2       15.95   1.70     27.12   ← RR
(30)  Roof room Main Flat Ceiling 1      5.50   0.19     1.045   ← RR
(30)  External roof Main                42.63   0.40     17.05
(31)  Total net area = 189.29 m²
(33)  Fabric heat loss = 268.28
(32)  Party walls Main                  52.54   0.00      0.00   ← PARTY
(32d) Dwelling below Main                6.85      —         —   ← PARTY
(35)  TMP = 250
(36)  Bridges (0.150 × 189.29)                           28.39
(37)  Total fabric heat loss                            296.68
```
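
As a sanity check on the decomposition above, the A x U column can be re-added by hand. The sketch below is illustrative only: the numbers are copied from the worksheet rows, the variable names are invented for this example, and the party rows are left out because they carry U = 0.00.

```python
# (net area in m², A x U in W/K) per exposed element, copied from the worksheet.
# Party rows (28a), (32) and (32d) are omitted: they contribute no heat loss.
exposed_elements = {
    "(26) doors": (1.85, 5.55),
    "(27) windows": (10.60, 25.83),
    "(29a) external walls": (99.26, 168.74),
    "(29a) RR gable wall 1": (13.50, 22.95),
    "(29a) RR gable wall 2": (15.95, 27.12),
    "(30) RR flat ceiling": (5.50, 1.045),
    "(30) external roof": (42.63, 17.05),
}

total_net_area = sum(area for area, _ in exposed_elements.values())  # line (31)
fabric_heat_loss = sum(au for _, au in exposed_elements.values())    # line (33)
bridges = 0.150 * total_net_area                                     # line (36)
total_fabric_heat_loss = fabric_heat_loss + bridges                  # line (37)

print(f"(31) {total_net_area:.2f}  (33) {fabric_heat_loss:.2f}  "
      f"(36) {bridges:.2f}  (37) {total_fabric_heat_loss:.2f}")
```

The four totals agree with worksheet lines (31), (33), (36) and (37) to within 0.01, which also shows that (31) excludes the 67.58 m² party floor.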
|
||||
|
||||
### Localised mapper gaps
|
||||
|
||||
The Summary path's `EpcPropertyData` has these load-bearing wrong
|
||||
or missing fields:
|
||||
|
||||
| Field | Currently | Should be |
|
||||
| Path | Cascade SAP | Δ vs worksheet 88.5104 |
|
||||
|---|---|---|
|
||||
| `dwelling_type` | `"Number of Storeys: flat"` (mangled by extractor) | `"Flat"` |
|
||||
| `built_form` | `"Number of Storeys:"` (mangled) | `"Mid-Terrace"` (or similar) |
|
||||
| `sap_flat_details` | `None` | populated with the cert's flat position |
|
||||
| `sap_building_parts[0].sap_room_in_roof` | likely None | populated with the RR's gable walls + flat ceiling areas |
|
||||
| Summary mapper | 18.08 | **-70.43** (catastrophic — Summary identifies HP as 80% boiler) |
|
||||
| API mapper | 70.14 | **-18.37** |
|
||||
|
||||
**Order of attack** (each is a slice candidate):
|
||||
|
||||
1. **Fix the Elmhurst extractor's `dwelling_type` / `built_form`
|
||||
parsing** for this Summary PDF format. Some other section of the
|
||||
PDF is bleeding into the parsed value (the "Number of Storeys:"
|
||||
prefix). The extractor's anchor for `built_form` is likely
|
||||
matching too eagerly; check `ElmhurstSiteNotesExtractor`. Don't
|
||||
guess — read the Summary_000784.pdf header section + compare to
|
||||
what `ElmhurstSiteNotesExtractor` returns.
|
||||
|
||||
2. **Populate `sap_flat_details`** in
`EpcPropertyDataMapper.from_elmhurst_site_notes`. The cascade's `_dwelling_exposure`
|
||||
reads from this field (see
|
||||
`domain/sap10_calculator/rdsap/cert_to_inputs.py`) to gate
|
||||
floor/roof contributions per RdSAP 10 §5. For cert 9501 (mid-
|
||||
floor flat), both floor (party with dwelling below) and roof
|
||||
(party with dwelling above) should be excluded — but the cert
|
||||
does have an RR with gable walls and flat ceiling exposed
|
||||
externally, so the dwelling has SOME exposed roof.
|
||||
|
||||
3. **Populate `sap_room_in_roof`** with the RR-specific geometry:
|
||||
gable walls 13.50 + 15.95 m², flat ceiling 5.50 m². Worksheet
|
||||
lodges these as part of the Main bp's (29a) walls + (30) roof.
|
||||
Cascade reads from `sap_room_in_roof.detailed_surfaces` —
|
||||
check `worksheet/heat_transmission.py` for the surfacing
|
||||
convention.
|
||||
|
||||
4. **Re-pin or remove cert 9501 from Layer 4 tracking** once
|
||||
Summary path lands at 1e-4. The RED test was NOT committed this
|
||||
session (working-tree-only) — add the equivalent of
|
||||
`test_summary_0330_full_chain_sap_matches_worksheet_pdf_exactly`
|
||||
for cert 9501 once the gap closes.
|
||||
|
||||
### API path expected gaps (after Summary lands)
|
||||
|
||||
The API JSON for cert 9501 lodges `property_type=2` (Flat) and
|
||||
`built_form=NR`. The API mapper needs to populate `sap_flat_details`
|
||||
from `floors[]` + `roofs[]` + the GOV.UK schema's flat-specific
|
||||
fields. Probable additional gaps (same pattern as Summary):
|
||||
- `sap_flat_details` mid-floor exposure routing
|
||||
- RR detection from cert's `roofs[].description` if the cert lodges
|
||||
an attic-style roof
|
||||
|
||||
## Heat-pump workstream (cert 0380 + 6 sibling ASHPs) — DEFERRED
|
||||
|
||||
Per the user's direction, the 7 ASHP certs are deferred until the
|
||||
boiler workflow is proven. Status:
|
||||
|
||||
- Cert 0380 fixtures staged in commit `17646c8a` (worksheet target
|
||||
SAP 88.5104). Original probe showed catastrophic Δ -70 SAP on
|
||||
Summary path and Δ -18 SAP on API path — the Summary mapper
|
||||
identified the HP as an 80%-efficient boiler.
|
||||
- 6 other ASHPs share PCDB index 104568 (one uses 102421) — work
|
||||
is likely shared across them.
|
||||
|
||||
Work sketch (from the prior handover):
|
||||
The Summary mapper is fundamentally broken on heat pumps; the API
mapper is partially broken. This is likely a 15-30 slice workstream.
Sketch (from the prior handover, unchanged):
|
||||
|
||||
1. **API mapper**: surface `main_heating_index_number`, set
|
||||
`main_heating_category` for HPs, `main_fuel_type=29` (electric
|
||||
@@ -178,45 +84,104 @@ Work sketch (from the prior handover):
5. **Summary mapper**: separate slice — needs to identify HPs from
|
||||
the Summary PDF's heating section.
|
||||
|
||||
Do NOT start HP slices without an explicit go-ahead from the user.
|
||||
**Do NOT start HP slices without an explicit go-ahead from the user.**
|
||||
|
||||
## Conventions (preserved)
|
||||
### 2. 8 cohort golden certs without worksheets
|
||||
|
||||
The 8 cert refs currently in `test_golden_fixtures.py` (0240, 0300,
|
||||
0390-2954, 6035, 7536, 8135, 2130, 0390-2254) are API-only with
|
||||
integer SAP residual pins. Some have non-trivial residuals
|
||||
(0240=-14, 0390-2954=-6, 6035=-6) that suggest mapper coverage gaps.
|
||||
|
||||
If worksheets become available for any of them, migrate to Layer 4
|
||||
1e-4 chain pins (cleanest forcing function). Until then, the
|
||||
residual pins are the only gate.
|
||||
|
||||
The recent gap-aware DG-pre-2002 glazing lookup (Slice 100c) tightened
|
||||
PE / CO2 residuals on 5 of these 8 certs by surfacing the correct
|
||||
spec-table U per `glazing_gap`. Other coverage gaps probably surface
|
||||
similarly — gap-aware lookups for glazing_type=2 (DG 2002+) and 13
|
||||
(DG argon post-2022) are candidates the next time a residual drifts.
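
To make "gap-aware" concrete, here is a minimal sketch of such a lookup for `glazing_type=3` only, using the three U-values this handover quotes for DG pre-2002 (see key learning 5). The function name, the string keys for the gap, and the fallback behaviour are assumptions for illustration; the real lookup in the repo and the encoding of `glazing_gap` in the API JSON may differ.

```python
from typing import Optional

# U-values (W/m²K) quoted in this handover for glazing_type=3 (DG pre-2002).
# The gap keys are hypothetical labels, not the API's actual encoding.
DG_PRE_2002_U_BY_GAP = {"6mm": 3.1, "12mm": 2.8, "16mm+": 2.7}

# Type-only default that the flat (gap-unaware) lookup used.
DG_PRE_2002_DEFAULT_U = 2.8


def dg_pre_2002_u_value(glazing_gap: Optional[str]) -> float:
    """Return the gap-specific U-value, falling back to the type-only default."""
    if glazing_gap is None:
        return DG_PRE_2002_DEFAULT_U
    return DG_PRE_2002_U_BY_GAP.get(glazing_gap, DG_PRE_2002_DEFAULT_U)
```

With this shape, a cert lodging a 16+ gap resolves to 2.7 rather than the 2.8 default, which is the size of correction described above.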
|
||||
|
||||
### 3. Solar battery storage (user-flagged)
|
||||
|
||||
User question this session: "do we handle solar battery?" — Partial
|
||||
coverage: the data model has `SapEnergySource.pv_battery_count` +
|
||||
`SapEnergySource.pv_batteries: Optional[PvBatteries]`, the API mapper
|
||||
extracts both, but the Elmhurst Summary mapper hardcodes
|
||||
`pv_battery_count=0` and doesn't parse battery details from the PDF.
|
||||
The cascade's Appendix M battery-storage adjustment (PV self-
|
||||
consumption fraction with battery) hasn't been audited. None of the
|
||||
three closed boiler certs lodge a battery so it's not blocking — but
|
||||
it's a known gap.
|
||||
|
||||
## Key learnings from cert 9501 closure (replicate for HP workstream)
|
||||
|
||||
1. **Two RR JSON shapes coexist**: `room_in_roof_type_1` (Simplified
|
||||
Type 1, cohort certs) and `room_in_roof_details` (Detailed RR,
|
||||
newer certs). The schema must model both; the API mapper picks
|
||||
whichever block is populated. Slice 100a added the new dataclass
|
||||
alongside the legacy one.
|
||||
|
||||
2. **Two PV JSON shapes coexist**: `photovoltaic_supply` as a nested
|
||||
list (cohort cert 2130) vs `{"pv_arrays": [{...}]}` dict wrapper
|
||||
(cert 9501). Schema needs the `pv_arrays` field, mapper dispatcher
|
||||
handles both shapes. Slice 100c.
|
||||
|
||||
3. **RR floor area lives under `sap_room_in_roof.floor_area`, NOT
|
||||
`sap_floor_dimensions`**: the per-bp TFA helper must add it
|
||||
explicitly. Cohort certs (e.g. 0240 with 83.2 m² RR floor) were
|
||||
silently dropping the RR area from TFA — Slice 100b fixed this
|
||||
and tightened cohort 0240 SAP residual -15 → -14, 6035 PE
|
||||
+49.51 → +47.85.
|
||||
|
||||
4. **API `PhotovoltaicArray.pitch` is the RdSAP enum (1-5), NOT
|
||||
degrees**: codes 1=0°, 2=30°, 3=45°, 4=60°, 5=90°. Summary mapper
|
||||
needs `_elmhurst_pv_pitch_code` to snap-to-nearest. The wrong-by-
|
||||
one-unit shift inflates PV generation ~2.5% (Slice 99e).
|
||||
|
||||
5. **Glazing U-value is type+gap-aware in RdSAP 10 Table 24**:
|
||||
`glazing_type=3` (DG pre-2002) has U=3.1 (6mm), 2.8 (12mm), 2.7
(16+). 5 of the 8 cohort certs use 16+, so a flat lookup at the
type-only default U=2.8 was wrong for those 5. Slice 100c.
|
||||
|
||||
6. **Flats with RR have external gable walls, not party walls**:
|
||||
Top-floor flats sit at the building's end (no neighbour above);
|
||||
the gables are exposed external (U = main-wall U) not party
|
||||
(U=0.25). Threading `is_flat=True` through the RR surface
|
||||
mapper picks `gable_wall_external` for un-typed gables. Slice 99c
|
||||
(Summary) + Slice 100a (API).
|
||||
|
||||
7. **`dwelling_type` floor-position prefix gates exposure routing**:
|
||||
For flats, `_dwelling_exposure` in cert_to_inputs.py prefix-
|
||||
matches "top-floor" / "mid-floor" / "ground-floor". The Elmhurst
|
||||
mapper composes the position from `floor.location` ("dwelling
|
||||
below" → not ground) + RR presence (→ top vs mid). Slice 99b.
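
The snap-to-nearest step from learning 4 can be sketched as below. This is not the repo's `_elmhurst_pv_pitch_code`; `snap_pitch_to_code` is a hypothetical stand-in that uses only the code-to-degrees mapping listed above. How ties are broken (for example 15°, which is equidistant from 0° and 30°) is an assumption: this version keeps the lower code.

```python
# RdSAP pitch enum as listed in learning 4: code -> nominal pitch in degrees.
PITCH_CODE_DEGREES = {1: 0.0, 2: 30.0, 3: 45.0, 4: 60.0, 5: 90.0}


def snap_pitch_to_code(pitch_degrees: float) -> int:
    """Return the enum code whose nominal pitch is closest to pitch_degrees.

    min() returns the first minimum it meets, so an exact tie resolves to
    the lower code (an assumption, not a documented rule).
    """
    return min(
        PITCH_CODE_DEGREES,
        key=lambda code: abs(PITCH_CODE_DEGREES[code] - pitch_degrees),
    )
```

For example, a Summary PDF pitch of 35° maps to code 2 (30°) and 40° maps to code 3 (45°). The point of the learning is that the enum code, not the raw degree value, must be passed downstream.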
|
||||
|
||||
## Conventions (preserved — unchanged this session)
|
||||
|
||||
- **One slice = one commit** — stage by name.
|
||||
- **AAA test convention** — literal `# Arrange / # Act / # Assert`.
|
||||
- **`abs(diff) <= tol`** not `pytest.approx` (strict-pyright clean).
|
||||
- **`abs(diff) <= tol`** not `pytest.approx`.
|
||||
- **1e-4 worksheet tolerance** when worksheet is available.
|
||||
- **Spec citation** in commit messages when a slice implements a
|
||||
spec rule (quote RdSAP 10 / SAP 10.2/10.3 page reference).
|
||||
- **Pyright net-zero per file**. Updated baselines:
|
||||
- `datatypes/epc/domain/mapper.py`: 33
|
||||
- **Spec citation** in commit messages when implementing a spec rule.
|
||||
- **Pyright net-zero per file**. Updated baselines (Slice 100c
|
||||
improved mapper.py by 1):
|
||||
- `datatypes/epc/domain/mapper.py`: **32** (was 33; extracting
|
||||
`_api_sap_window` resolved one)
|
||||
- `domain/sap10_calculator/worksheet/heat_transmission.py`: 13
|
||||
- `domain/sap10_calculator/rdsap/cert_to_inputs.py`: 35
|
||||
- `datatypes/epc/domain/epc_property_data.py`: 1 (pre-existing)
|
||||
- `domain/sap10_ml/rdsap_uvalues.py`: 1 (pre-existing)
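
A minimal sketch of a test that follows the AAA, `abs(diff) <= tol`, and 1e-4 conventions together. The test name and the hard-coded `cascade_sap` value are placeholders; a real chain test would compute the SAP score from the fixture instead.

```python
WORKSHEET_TOLERANCE = 1e-4


def within_worksheet_tolerance(
    actual: float, expected: float, tol: float = WORKSHEET_TOLERANCE
) -> bool:
    """Plain absolute-difference check, used instead of pytest.approx."""
    return abs(actual - expected) <= tol


def test_example_chain_sap_matches_worksheet() -> None:
    # Arrange
    worksheet_sap = 68.5252  # unrounded worksheet target for cert 9501

    # Act
    cascade_sap = 68.52523  # placeholder for the value the chain would return

    # Assert
    assert abs(cascade_sap - worksheet_sap) <= WORKSHEET_TOLERANCE
```

Keeping the comparison as a plain `abs(...) <= tol` expression means the assertion is an ordinary `bool`, with no dependency on `pytest.approx`.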
|
||||
|
||||
## Tooling shortcuts (unchanged)
|
||||
|
||||
- EPC fetch: `OPEN_EPC_API_TOKEN` (NOT `EPC_AUTH_TOKEN`) in
|
||||
`backend/.env`.
|
||||
- Worksheet SAP: `pdftotext -layout <worksheet.pdf> -` then grep.
|
||||
- Cascade-component probe: reuse the inline pattern from this
|
||||
handover's "Cascade-component diff" section above.
|
||||
|
||||
## Open items / known gaps from prior session
|
||||
## Open items / known gaps (carried forward)
|
||||
|
||||
- Pre-existing
`test_roof_insulated_assumed_with_ni_thickness_uses_50mm_per_section_5_11_4`
in `test_heat_transmission.py` fails
|
||||
with `229.99 vs 68.0 ± 2` — verified pre-existing (stash test
|
||||
showed same failure without my changes). Not addressed this
|
||||
session; address separately when the §5.11.4 50mm-rule cascade
|
||||
path is touched.
|
||||
- 8 cohort golden certs (0240, 0300, 0390-2954, 6035, 7536, 8135,
|
||||
2130, 0390-2254) are API-only with integer SAP residual pins —
|
||||
if worksheets become available for any of them, migrate to
|
||||
Layer 4 1e-4 chain pins (cleanest forcing function).
|
||||
showed same failure without cert 9501 changes). The §5.11.4
|
||||
50mm-rule cascade path needs a separate audit.
|
||||
|
||||
Good luck. The diagnostic methodology (Summary path → worksheet 1e-4
|
||||
first, then API path catches up) is now proven on 2 boiler certs;
|
||||
cert 9501 should land in ~3-5 slices once the flat-exposure plumbing
|
||||
is in place.
|
||||
Good luck with the HP workstream when the user gives the go-ahead.
|
||||
Each cert pair has been closing in 3-5 slices using the methodology
proven over 8 slices (99a-100c) on cert 9501.
|
||||