docs: refresh handover + cert 0240 notes after Slice 95

Status: Slice 95 closed Layer 4 (API → cascade SAP) on cert 001479 at
< 1e-4 vs worksheet 69.0094. Production goal MET; the
`test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` test
formalises this gate. Updates to keep the next agent honest:

- NEXT_AGENT_PROMPT: header + status table + cumulative SAP delta table
  + "First action" + epilogue all reflect Slice 95's close-out.
- NEXT_AGENT_PROMPT §4 (Outlier golden cert investigations): rewrote
  the cert 0240 entry. The earlier "Type-1 RR gable_wall_lengths not
  extracted" claim is stale — mapper.py:1349-1369 already extracts
  them (Slices 71-86). The -15 SAP residual is a mix, dominated by
  the windows subsystem: 11 windows totalling 18.28 m² fall back to
  the default U≈2.27 because Slice 93's
  `_API_GLAZING_TYPE_TO_TRANSMISSION` only covers glazing codes 3
  and 13, while cert 0240 lodges code 2. Surfacing glazing_type=2
  (and likely other unmapped codes) is the biggest single-slice
  leverage point, and would touch 6035 too.
- test_golden_fixtures.py cert 0240 `notes:` field: replaced the
  stale RR hypothesis with the actual cascade subsystem breakdown
  and the glazing_type-2 surfacing recommendation.

No production code changed; docs and a `_GoldenExpectation.notes`
string only. test_golden_fixtures.py stays GREEN (14 passed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-26 10:32:18 +00:00
parent f502db8c74
commit b2c6a57247
2 changed files with 87 additions and 51 deletions


@ -1,9 +1,10 @@
# Handover — API mapper at 1e-3 on cert 001479, closing to 1e-4
# Handover — API mapper at 1e-4 on cert 001479; investigating goldens
You are picking up branch `ara-backend-design-prd`. The cert 001479 API
path is at SAP delta **+0.0006** (was +3.08); fabric heat loss is
EXACT. The remaining work is closing the sub-1e-3 gap and validating
against more cert pairs.
path now hits the worksheet's continuous SAP 69.0094 **at < 1e-4**
(Slice 95). Layer 4 production goal is MET. Remaining work: investigate
golden cert residual outliers (especially cert 0240's -15 SAP) and
process any new (Summary + API) cert pairs the user sources.
## The end goal (re-confirmed by the user)
@ -36,8 +37,8 @@ Layer 4: API mapper cascade SAP = worksheet SAP at 1e-4 (production goal)
|---|---|
| **1 — hand-built cascade pin** | ✅ 6 cohort certs (000474, 000477, 000480, 000487, 000490, 000516) GREEN at 1e-4; cert 001479 hand-built skeleton (Slice 62) still RED (2 of 11 pins green, hand-built has its own bugs — orthogonal to the production path) |
| **2 — Elmhurst-mapped path** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 89); cohort: 2 GREEN (000477, 000516), 4 RED (000474, 000480, 000487, 000490 — Elmhurst U985 worksheets violate the RdSAP 10 §5 (12) spec; orthogonal to the production goal) |
| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match within 1e-3 SAP; field-level diff test not yet written |
| **4 — API path cascade SAP** | 🟡 **Cert 001479 at +0.0006 SAP delta from worksheet** (was +3.08); 9 other golden certs pinned at residual-from-integer at tolerance 0 |
| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match at 1e-4 (Slice 95); field-level diff test not yet written but lower priority since cascade-output gate exists |
| **4 — API path cascade SAP** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 95). `test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` formalises the gate. 8 other golden certs pinned at residual-from-integer at tolerance 0 |
## Cumulative API SAP delta progression (cert 001479)
@ -57,7 +58,8 @@ spec rule's premise to be met. Each slice closed one gap:
| 91 | descriptive strings via int→str lookups (`floor_construction_type`, `roof_construction_type`) + pre-1950 PS sloping → thickness=0 + per-bp roof description fix | +1.0970 |
| 92 | upper-floor `room_height_m += 0.25` + `is_exposed_floor` from `floor_heat_loss==1` + `floor_insulation_thickness="NI"→None` | +1.0022 |
| 93 | `window_transmission_details` from `glazing_type` int (code 3 → U=2.8/g=0.76, code 13 → U=1.4/g=0.72) | +1.1846 |
| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | **+0.0006** |
| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | +0.0006 |
| 95 | API mapper `total_floor_area_m2` = Σ per-bp dims (worksheet-precise 68.51 not lodged-rounded 69) + RdSAP 10 §15 p.66 window 2dp area rounding in solar_gains/internal_gains | **< 1e-4** |
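
The Slice 93 row can be read as a plain code-to-transmission lookup. The sketch below is an illustration only: `Transmission`, `lookup_transmission` and `unmapped_codes` are hypothetical names, and the real `_API_GLAZING_TYPE_TO_TRANSMISSION` may be shaped differently. Only the two code → U/g pairs come from the table above.

```python
from typing import Iterable, NamedTuple, Optional


class Transmission(NamedTuple):
    u_value: float  # W/m²K
    g_value: float  # solar transmittance


# Only the two codes quoted in the Slice 93 row; every other code is absent.
GLAZING_TYPE_TO_TRANSMISSION = {
    3: Transmission(u_value=2.8, g_value=0.76),
    13: Transmission(u_value=1.4, g_value=0.72),
}


def lookup_transmission(glazing_type: int) -> Optional[Transmission]:
    """Return the mapped transmission, or None so the caller uses its default."""
    return GLAZING_TYPE_TO_TRANSMISSION.get(glazing_type)


def unmapped_codes(lodged_codes: Iterable[int]) -> list[int]:
    """Sorted, de-duplicated glazing codes that have no mapping yet."""
    return sorted({c for c in lodged_codes if c not in GLAZING_TYPE_TO_TRANSMISSION})
```

Running `unmapped_codes` over the lodged `glazing_type` values of the golden certs would be one way to list which codes still fall back to the default U-value.
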
Fabric breakdown for cert 001479 API path is now COMPLETELY EXACT
(all 6 components match worksheet to 4 d.p.):
@ -160,20 +162,47 @@ Each new pair lands as a 1e-4 cascade-pin test. Pattern: ~3-5 new
mapper bugs per cert pair (similar to Slice 87-94 on 001479). Each
becomes its own slice. Stage by name; one slice = one commit.
### 4. Investigate goldens with shifted residuals after Slices 87-94
### 4. Investigate goldens with shifted residuals after Slices 87-95
The Slice 87-94 fixes shifted residuals on 7 of 10 API-only golden
certs. The new residuals are pinned. Outliers that need attention:
Slices 87-94 shifted residuals on 7 of 10 API-only golden certs;
Slice 95 (precise TFA + window 2dp area rounding) shifted 5 more
(0240, 6035, 8135, 2130, 0390-2954). All residuals are re-pinned.
Current outliers and what we now know:
- **0240** (-14): documented RR mapper gap (`'Roof room(s),
insulated (assumed)'` description not parsed; Type-1 RR
gable_wall_lengths not extracted)
- **0390-2954** (-6): large detached, age F, oil — likely a heating
efficiency cascade gap
- **6035** (-6): mid-terrace age A — possibly party wall config or
ventilation issue
These are tractable once you have a worksheet for any of them.
- **0240** (-15 SAP, +17.8 PE): Detached age J + RR + 11 windows. The
earlier handover claim of "RR mapper gap" is **partly stale**:
- `room_in_roof_type_1.gable_wall_length_1/2` ARE extracted by the
21.0.1 mapper (see mapper.py:1349-1369 — must have landed in
Slices 71-86). Cert 0240's RR cascades through with floor_area=
83.2, gables 6.4 + 6.4, age J → U_RR = 0.30 W/m²K.
- `'Roof room(s), insulated (assumed)'` description NOT parsed —
but the spec basis for parsing it is unclear: age J's Table 18
col(4) default already models insulation (U=0.30), and unlike
the regular-roof "insulated (assumed)" → 50 mm bucket rule
(RdSAP §5.11.4), no equivalent rule for RR has been identified.
- The -15 SAP residual is a mix, not a single RR gap. Subsystem
breakdown for cert 0240 (via cert_to_inputs cascade):
- walls 22.95, party 0, roof 76.93 (incl RR ~18.5), floor 29.43,
windows 41.55, doors 11.10, bridging 39.64; total HLC 221.6 W/K
- **windows_w_per_k = 41.55 is the most leverageable**: 11
windows totalling 18.28 m² at U_default ≈ 2.27 W/m²K. The cert lodges
`glazing_type=2` for all windows, but Slice 93's
`_API_GLAZING_TYPE_TO_TRANSMISSION` only covers codes 3 and 13;
surfacing code 2 would land a measurable U (likely ~1.8-2.0)
and close several W/K of fabric loss.
- Other potential gains: BP[0] non-RR ceiling lodges "Pitched,
400+ mm loft insulation" (U should be ~0.10); verify that the
cascade actually assigns that value.
- **Net**: cert 0240 is not a single-slice fix; it's 3-5
progressive mapper improvements (glazing_type 2 surfacing,
possibly more glazing codes, possibly RR description nuance).
- **0390-2954** (-6 SAP, -26.5 PE): large detached, age F (TFA 360),
oil, PCDB-listed. Cause undocumented. PE going more negative than SAP
suggests the cost cascade is hitting harder than energy, possibly an
oil price/efficiency interaction.
- **6035** (-6 SAP, +49.5 PE): mid-terrace age A + RR. Probably has
the same glazing_type-default-U issue as 0240 plus an age-A-
specific gap.
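
The cert 0240 figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming the window term is simply area × U (any curtain or frame adjustments in the real cascade are ignored) and treating the ~1.8-2.0 range as the guess it is in the text. The variable and function names are illustrative, not taken from the codebase.

```python
# Subsystem heat-loss figures for cert 0240 as quoted above (W/K).
components_w_per_k = {
    "walls": 22.95,
    "party": 0.0,
    "roof": 76.93,
    "floor": 29.43,
    "windows": 41.55,
    "doors": 11.10,
    "bridging": 39.64,
}
total_hlc = sum(components_w_per_k.values())

# Total window area quoted above (m²) and the U-value it implies.
window_area_m2 = 18.28
implied_u = components_w_per_k["windows"] / window_area_m2


def saving_at_u(u_value: float) -> float:
    """W/K removed from the windows term if the windows had this U-value."""
    return components_w_per_k["windows"] - window_area_m2 * u_value


print(f"total HLC = {total_hlc:.2f} W/K, implied window U = {implied_u:.2f}")
for u in (2.0, 1.8):
    print(f"U = {u}: saves {saving_at_u(u):.2f} W/K")
```

Under those assumptions the components sum to the quoted 221.6 W/K, and a mapped U of 1.8-2.0 would remove roughly 5 to 8.6 W/K, a small share of the total. That is consistent with the note that cert 0240 needs several progressive fixes, not one.
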
### 5. (deferred) Cohort chain test RED triage
@ -216,9 +245,10 @@ override field, (c) wait for more cert pairs to confirm pattern.
worksheets (000474, 000477, 000480, 000487, 000490, 000516).
- `sap worksheets/U985-0001-NNNNNN.txt` × 6 — text exports of above.
## Recent slice history (Slices 87-94, current branch)
## Recent slice history (Slices 87-95, current branch)
```
f502db8c Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding — cert 001479 to 1e-4
03203418 Slice 94: API mapper sheltered_sides + floor_type — cert 001479 to 1e-3
7281b7b3 Slice 93: API mapper window_transmission_details from glazing_type
8e752e57 Slice 92: API mapper floor dimensions (SAP +0.25m + exposed-floor + NI→None)
@ -237,8 +267,8 @@ before this rewrite).
## First action
1. Confirm branch state matches `git log --oneline -1`
`03203418` Slice 94.
1. Confirm branch state — Slice 95 (`f502db8c`) closed cert 001479 to
< 1e-4 (was +0.0006 after Slice 94). Layer 4 is GREEN.
2. Run the full sweep:
```bash
PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src \
@ -247,23 +277,25 @@ before this rewrite).
packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py \
--no-cov -q
```
Expect ~75 passed / ~16 failed. The 9 failures on
`test_sap_result_pin[001479-*]` (cohort cascade for the hand-built
skeleton) and 4 cohort chain RED + 3 cohort diff RED are
pre-existing.
3. Run the API → Summary diff probe (script in §1 above) to surface
the remaining sub-1e-3 SAP gap. Likely candidates ranked by impact:
- Infiltration (-2 ACH/yr) → check `ventilation_from_cert()`
intermediate outputs for both paths
- HW kWh (+6.7) → check shower outlet count + Appendix J §1a path
- Internal gains (+25.7 W·months) → check pumps_fans + bulb counts
Expect **99 passed / 19 failed**. All 19 failures pre-existing:
9× hand-built 001479 skeleton (`test_sap_result_pin[001479-*]`),
6× cohort diff (`test_from_elmhurst_site_notes_matches_hand_built_*`),
4× cohort chain (000474/000480/000487/000490 — Elmhurst non-spec).
3. Production goal is met for cert 001479. Next work focuses on the
golden cert residual outliers (§4 above) and new (Summary + API)
cert pairs from the user. The diff-probe methodology from Slice 95
(cascade-component diff API vs Summary path; localise; fix mapper)
works for any new (Summary + API) pair — worksheet not required
when Summary path is established as canonical.
4. Don't lose sight of Layer 4: **API → SAP within 1e-4 of worksheet
continuous on cert 001479** is the production goal. Currently
delta +0.0006.
continuous on cert 001479** is the production goal. **MET as of
Slice 95**;
`test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly`
formalises this gate.
Good luck. The user is sourcing more cert pairs in parallel; when
they arrive, each one will surface 3-5 mapper bugs along the same
pattern as Slices 87-94. The diagnostic methodology that worked here
(diff Summary-mapper vs API-mapper; localise by cascade component;
fix the API mapper to mirror the Summary's surfacing) will work
again.
The user is sourcing more cert pairs in parallel; when they arrive,
each one will surface ~3-5 mapper bugs along the same pattern as
Slices 87-95. The diagnostic methodology (diff Summary-mapper vs
API-mapper; localise by cascade component; fix the API mapper to
mirror the Summary's surfacing) works for any new (Summary + API)
pair — worksheet not required when Summary path is canonical (cert
001479 proves it is).
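
The "localise by cascade component" step of that methodology could look roughly like the sketch below. It is not the repository's diff probe: `rank_component_deltas` is a hypothetical helper, and the dictionaries are made-up placeholder values used only to show the shape of the comparison.

```python
def rank_component_deltas(api, summary, tol=1e-4):
    """Return (component, api - summary) pairs above tol, largest gap first."""
    keys = sorted(set(api) | set(summary))
    deltas = [(k, api.get(k, 0.0) - summary.get(k, 0.0)) for k in keys]
    significant = [(k, d) for k, d in deltas if abs(d) > tol]
    return sorted(significant, key=lambda kd: abs(kd[1]), reverse=True)


# Placeholder numbers, NOT real cert data.
api_path = {"walls": 10.0, "windows": 12.5, "roof": 5.0}
summary_path = {"walls": 10.0, "windows": 11.0, "roof": 5.2}

for name, delta in rank_component_deltas(api_path, summary_path):
    print(f"{name}: {delta:+.4f} W/K")
```

The component at the top of the ranking is where to look first in the API mapper. Matching components drop out, so the list shrinks as each slice lands.
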


@ -78,17 +78,21 @@ _EXPECTATIONS: tuple[_GoldenExpectation, ...] = (
expected_pe_resid_kwh_per_m2=+17.8450,
expected_co2_resid_tonnes_per_yr=+1.0097,
notes=(
"Detached house, TFA 202, age J, oil boiler, Table 4b code 130. "
"API response lodges sap_room_in_roof.room_in_roof_type_1 with "
"gable_wall_length_1/2 + 'Roof room(s), insulated (assumed)' "
"description; our mapper doesn't yet extract these. Until it "
"does, the Simplified Type 1 RR fallback at U_RR_default ages "
"J = 0.30 W/m²K + ΣA_RR_gable/other = 0 over-counts the RR's "
"real heat loss (the cert has retrofit insulation). Pre-RR-fix "
"(commits b01164a2..1928e5a2) this cert coincidentally landed "
"at Δ=0 because RR contribution was missing entirely. Returns "
"to Δ≈0 once the mapper extracts gable lengths + parses the "
"description's '50mm retrofit' signal (handover ticket)."
"Detached house, TFA 118, age J, oil boiler PCDB-listed + PV + "
"RR on BP[0]. Mapper DOES extract sap_room_in_roof.room_in_roof_"
"type_1.gable_wall_length_1/2 (mapper.py:1349) and applies "
"U_RR_J=0.30 via u_rr_default_all_elements — the earlier "
"handover claim of 'gable_wall_lengths not extracted' is stale. "
"Subsystem diff against the cascade: walls 22.95 / roof 76.93 / "
"floor 29.43 / windows 41.55 / doors 11.10 / bridging 39.64 "
"(total HLC 221.6 W/K). Biggest leverage is windows: 11 windows "
"totalling 18.28 m² at U_default≈2.27 because cert lodges glazing_type=2 "
"and Slice 93's _API_GLAZING_TYPE_TO_TRANSMISSION only covers "
"codes 3 and 13. Surfacing code 2 → measurable U≈1.8-2.0 would "
"close several W/K. Other candidates: BP[0] non-RR ceiling lodges "
"'Pitched, 400+ mm loft insulation' — verify cascade U; possibly "
"RR description-implied insulation nuance (spec basis unclear "
"for RR — unlike regular roofs which have the §5.11.4 50mm rule)."
),
),
_GoldenExpectation(