diff --git a/docs/sap-spec/NEXT_AGENT_PROMPT.md b/docs/sap-spec/NEXT_AGENT_PROMPT.md index 3e0bfad8..d24f46ca 100644 --- a/docs/sap-spec/NEXT_AGENT_PROMPT.md +++ b/docs/sap-spec/NEXT_AGENT_PROMPT.md @@ -1,9 +1,10 @@ -# Handover — API mapper at 1e-3 on cert 001479, closing to 1e-4 +# Handover — API mapper at 1e-4 on cert 001479; investigating goldens You are picking up branch `ara-backend-design-prd`. The cert 001479 API -path is at SAP delta **+0.0006** (was +3.08); fabric heat loss is -EXACT. The remaining work is closing the sub-1e-3 gap and validating -against more cert pairs. +path now hits the worksheet's continuous SAP 69.0094 **at < 1e-4** +(Slice 95). Layer 4 production goal is MET. Remaining work: investigate +golden cert residual outliers (especially cert 0240's -15 SAP) and +process any new (Summary + API) cert pairs the user sources. ## The end goal (re-confirmed by the user) @@ -36,8 +37,8 @@ Layer 4: API mapper cascade SAP = worksheet SAP at 1e-4 (production goal) |---|---| | **1 — hand-built cascade pin** | ✅ 6 cohort certs (000474, 000477, 000480, 000487, 000490, 000516) GREEN at 1e-4; cert 001479 hand-built skeleton (Slice 62) still RED (2 of 11 pins green, hand-built has its own bugs — orthogonal to the production path) | | **2 — Elmhurst-mapped path** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 89); cohort: 2 GREEN (000477, 000516), 4 RED (000474, 000480, 000487, 000490 — Elmhurst U985 worksheets violate the RdSAP 10 §5 (12) spec; orthogonal to the production goal) | -| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match within 1e-3 SAP; field-level diff test not yet written | -| **4 — API path cascade SAP** | 🟡 **Cert 001479 at +0.0006 SAP delta from worksheet** (was +3.08); 9 other golden certs pinned at residual-from-integer at tolerance 0 | +| **3 — API-mapped ≡ Elmhurst-mapped (field-level)** | 🟡 Cascade outputs match at 1e-4 (Slice 95); field-level diff test not yet written but lower priority since cascade-output gate 
exists | +| **4 — API path cascade SAP** | ✅ **Cert 001479 GREEN at 1e-4** (Slice 95). `test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly` formalises the gate. 8 other golden certs pinned at residual-from-integer at tolerance 0 | ## Cumulative API SAP delta progression (cert 001479) @@ -57,7 +58,8 @@ spec rule's premise to be met. Each slice closed one gap: | 91 | descriptive strings via int→str lookups (`floor_construction_type`, `roof_construction_type`) + pre-1950 PS sloping → thickness=0 + per-bp roof description fix | +1.0970 | | 92 | upper-floor `room_height_m += 0.25` + `is_exposed_floor` from `floor_heat_loss==1` + `floor_insulation_thickness="NI"→None` | +1.0022 | | 93 | `window_transmission_details` from `glazing_type` int (code 3 → U=2.8/g=0.76, code 13 → U=1.4/g=0.72) | +1.1846 | -| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | **+0.0006** | +| 94 | `sheltered_sides` from API `built_form` + `floor_type` from `floor_heat_loss==7` | +0.0006 | +| 95 | API mapper `total_floor_area_m2` = Σ per-bp dims (worksheet-precise 68.51 not lodged-rounded 69) + RdSAP 10 §15 p.66 window 2dp area rounding in solar_gains/internal_gains | **< 1e-4** | Fabric breakdown for cert 001479 API path is now COMPLETELY EXACT (all 6 components match worksheet to 4 d.p.): @@ -160,20 +162,47 @@ Each new pair lands as a 1e-4 cascade-pin test. Pattern: ~3-5 new mapper bugs per cert pair (similar to Slice 87-94 on 001479). Each becomes its own slice. Stage by name; one slice = one commit. -### 4. Investigate goldens with shifted residuals after Slices 87-94 +### 4. Investigate goldens with shifted residuals after Slices 87-95 -The Slice 87-94 fixes shifted residuals on 7 of 10 API-only golden -certs. The new residuals are pinned. Outliers that need attention: +Slices 87-94 shifted residuals on 7 of 10 API-only golden certs; +Slice 95 (precise TFA + window 2dp area rounding) shifted 5 more +(0240, 6035, 8135, 2130, 0390-2954). 
All residuals are re-pinned. +Current outliers and what we now know: -- **0240** (-14): documented RR mapper gap (`'Roof room(s), - insulated (assumed)'` description not parsed; Type-1 RR - gable_wall_lengths not extracted) -- **0390-2954** (-6): large detached, age F, oil — likely a heating - efficiency cascade gap -- **6035** (-6): mid-terrace age A — possibly party wall config or - ventilation issue - -These are tractable once you have a worksheet for any of them. +- **0240** (-15 SAP, +17.8 PE): Detached age J + RR + 11 windows. The + earlier handover claim of "RR mapper gap" is **partly stale**: + - `room_in_roof_type_1.gable_wall_length_1/2` ARE extracted by the + 21.0.1 mapper (see mapper.py:1349-1369 — must have landed in + Slices 71-86). Cert 0240's RR cascades through with floor_area= + 83.2, gables 6.4 + 6.4, age J → U_RR = 0.30 W/m²K. + - `'Roof room(s), insulated (assumed)'` description NOT parsed — + but the spec basis for parsing it is unclear: age J's Table 18 + col(4) default already models insulation (U=0.30), and unlike + the regular-roof "insulated (assumed)" → 50 mm bucket rule + (RdSAP §5.11.4), no equivalent rule for RR has been identified. + - The -15 SAP residual is a mix, not a single RR gap. Subsystem + breakdown for cert 0240 (via cert_to_inputs cascade): + - walls 22.95, party 0, roof 76.93 (incl RR ~18.5), floor 29.43, + windows 41.55, doors 11.10, bridging 39.64; total HLC 221.6 W/K + - **windows_w_per_k = 41.55 is the most leverageable**: 11 + windows × 18.28 m² × U_default ≈ 2.27 W/m²K. Cert lodges + `glazing_type=2` for all windows but Slice 93's + `_API_GLAZING_TYPE_TO_TRANSMISSION` only covers codes 3 and 13; + surfacing code 2 would land a measurable U (likely ~1.8-2.0) + and close several W/K of fabric loss. + - Other potential gains: BP[0] non-RR ceiling lodges "Pitched, + 400+ mm loft insulation" (should U ~0.10); verify cascade + gives it that. 
+ - **Net**: cert 0240 is not a single-slice fix; it's 3-5 + progressive mapper improvements (glazing_type 2 surfacing, + possibly more glazing codes, possibly RR description nuance). +- **0390-2954** (-6 SAP, -26.5 PE): large detached F (TFA 360), oil + PCDB-listed. Undocumented. PE going more negative than SAP suggests + the cost cascade is hitting harder than energy — possibly oil + price/efficiency interaction. +- **6035** (-6 SAP, +49.5 PE): mid-terrace age A + RR. Probably has + the same glazing_type-default-U issue as 0240 plus an age-A- + specific gap. ### 5. (deferred) Cohort chain test RED triage @@ -216,9 +245,10 @@ override field, (c) wait for more cert pairs to confirm pattern. worksheets (000474, 000477, 000480, 000487, 000490, 000516). - `sap worksheets/U985-0001-NNNNNN.txt` × 6 — text exports of above. -## Recent slice history (Slices 87-94, current branch) +## Recent slice history (Slices 87-95, current branch) ``` +f502db8c Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding — cert 001479 to 1e-4 03203418 Slice 94: API mapper sheltered_sides + floor_type — cert 001479 to 1e-3 7281b7b3 Slice 93: API mapper window_transmission_details from glazing_type 8e752e57 Slice 92: API mapper floor dimensions (SAP +0.25m + exposed-floor + NI→None) @@ -237,8 +267,8 @@ before this rewrite). ## First action -1. Confirm branch state matches `git log --oneline -1` → - `03203418` Slice 94. +1. Confirm branch state — Slice 95 (`f502db8c`) closed cert 001479 to + < 1e-4 (was +0.0006 after Slice 94). Layer 4 is GREEN. 2. Run the full sweep: ```bash PYTHONPATH=/workspaces/model:/workspaces/model/packages/domain/src \ @@ -247,23 +277,25 @@ before this rewrite). packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py \ --no-cov -q ``` - Expect ~75 passed / ~16 failed. The 9 failures on - `test_sap_result_pin[001479-*]` (cohort cascade for the hand-built - skeleton) and 4 cohort chain RED + 3 cohort diff RED are - pre-existing. -3. 
Run the API → Summary diff probe (script in §1 above) to surface - the remaining sub-1e-3 SAP gap. Likely candidates ranked by impact: - - Infiltration (-2 ACH/yr) → check `ventilation_from_cert()` - intermediate outputs for both paths - - HW kWh (+6.7) → check shower outlet count + Appendix J §1a path - - Internal gains (+25.7 W·months) → check pumps_fans + bulb counts + Expect **99 passed / 19 failed**. All 19 failures pre-existing: + 9× hand-built 001479 skeleton (`test_sap_result_pin[001479-*]`), + 6× cohort diff (`test_from_elmhurst_site_notes_matches_hand_built_*`), + 4× cohort chain (000474/000480/000487/000490 — Elmhurst non-spec). +3. Production goal is met for cert 001479. Next work focuses on the + golden cert residual outliers (§4 above) and new (Summary + API) + cert pairs from the user. The diff-probe methodology from Slice 95 + (cascade-component diff API vs Summary path; localise; fix mapper) + works for any new (Summary + API) pair — worksheet not required + when Summary path is established as canonical. 4. Don't lose sight of Layer 4: **API → SAP within 1e-4 of worksheet - continuous on cert 001479** is the production goal. Currently - delta +0.0006. + continuous on cert 001479** is the production goal. **MET as of + Slice 95** — `test_api_001479_full_chain_sap_matches_worksheet_pdf_ + exactly` formalises this gate. -Good luck. The user is sourcing more cert pairs in parallel; when -they arrive, each one will surface 3-5 mapper bugs along the same -pattern as Slices 87-94. The diagnostic methodology that worked here -(diff Summary-mapper vs API-mapper; localise by cascade component; -fix the API mapper to mirror the Summary's surfacing) will work -again. +The user is sourcing more cert pairs in parallel; when they arrive, +each one will surface ~3-5 mapper bugs along the same pattern as +Slices 87-95. 
The diagnostic methodology (diff Summary-mapper vs +API-mapper; localise by cascade component; fix the API mapper to +mirror the Summary's surfacing) works for any new (Summary + API) +pair — worksheet not required when Summary path is canonical (cert +001479 proves it is). diff --git a/packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py b/packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py index 97a654ca..9ee7d722 100644 --- a/packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py +++ b/packages/domain/src/domain/sap/rdsap/tests/test_golden_fixtures.py @@ -78,17 +78,21 @@ _EXPECTATIONS: tuple[_GoldenExpectation, ...] = ( expected_pe_resid_kwh_per_m2=+17.8450, expected_co2_resid_tonnes_per_yr=+1.0097, notes=( - "Detached house, TFA 202, age J, oil boiler, Table 4b code 130. " - "API response lodges sap_room_in_roof.room_in_roof_type_1 with " - "gable_wall_length_1/2 + 'Roof room(s), insulated (assumed)' " - "description; our mapper doesn't yet extract these. Until it " - "does, the Simplified Type 1 RR fallback at U_RR_default ages " - "J = 0.30 W/m²K + ΣA_RR_gable/other = 0 over-counts the RR's " - "real heat loss (the cert has retrofit insulation). Pre-RR-fix " - "(commits b01164a2..1928e5a2) this cert coincidentally landed " - "at Δ=0 because RR contribution was missing entirely. Returns " - "to Δ≈0 once the mapper extracts gable lengths + parses the " - "description's '50mm retrofit' signal (handover ticket)." + "Detached house, TFA 118, age J, oil boiler PCDB-listed + PV + " + "RR on BP[0]. Mapper DOES extract sap_room_in_roof.room_in_roof_" + "type_1.gable_wall_length_1/2 (mapper.py:1349) and applies " + "U_RR_J=0.30 via u_rr_default_all_elements — the earlier " + "handover claim of 'gable_wall_lengths not extracted' is stale. " + "Subsystem diff against the cascade: walls 22.95 / roof 76.93 / " + "floor 29.43 / windows 41.55 / doors 11.10 / bridging 39.64 " + "(total HLC 221.6 W/K). 
Biggest leverage is windows: 11 windows " + "× 18.28 m² × U_default≈2.27 because cert lodges glazing_type=2 " + "and Slice 93's _API_GLAZING_TYPE_TO_TRANSMISSION only covers " + "codes 3 and 13. Surfacing code 2 → measurable U≈1.8-2.0 would " + "close several W/K. Other candidates: BP[0] non-RR ceiling lodges " + "'Pitched, 400+ mm loft insulation' — verify cascade U; possibly " + "RR description-implied insulation nuance (spec basis unclear " + "for RR — unlike regular roofs which have the §5.11.4 50mm rule)." ), ), _GoldenExpectation(