Two coupled bugs surfaced by cert 001479's mains-gas-fire secondary
heating (Summary §14.1 lodges "SAP code 605, Flush fitting live effect
gas fire" → fuel 26 mains gas):
1. **Mapper**: `_map_elmhurst_sap_heating` only set
`secondary_heating_type` (the SAP code int) — `secondary_fuel_type`
stayed None. The Summary PDF doesn't lodge the fuel int separately;
it has to be derived from the SAP code range. Add
`_elmhurst_secondary_fuel_from_sap_code`: codes 601-630 → 26
(mains gas); other codes return None (the cascade defaults to
electric, matching cohort 000490 SAP code 691 electric panel).
2. **Cascade**: `_fuel_cost` in cert_to_inputs hardcoded
`secondary_high_rate_gbp_per_kwh = other_uses_gbp_per_kwh` (the
standard-electricity tariff) regardless of `secondary_fuel_type`.
For gas secondaries this charged 1846 kWh/yr at electric rate
(£0.132/kWh = £243) instead of gas rate (£0.0348/kWh = £64) —
a ~£175/yr ECF distortion ≈ 9 SAP points on cert 001479. Route
the cost through `table_32_unit_price_p_per_kwh(secondary_fuel)`
when lodged.
Worksheet line (242) confirms the gas pricing:
`Space heating - secondary 2025.93 3.4800 70.5022`
Cert 001479 chain pin delta narrows: SAP_continuous 61.39 → 70.64
(was −7.62 vs 69.0094, now +1.63 — overshooting target by 1.63 SAP).
The remaining overshoot maps to the cascade's ~16 W/K HLC undercount
(cascade HLP 2.89 vs worksheet 3.13 × TFA) — work for follow-up
slices.
Cohort 6 chain certs still green at 1e-4 (all-electric or no-
secondary). Golden cohort: cert 0300-2747 (mains-gas secondary)
SAP residual tightens −7 → +2 — biggest single SAP improvement on
the golden cohort to date; pin updated and notes annotated. Other
7 golden certs unchanged (None or electric secondary fuel). Pyright
net-zero (35 baseline each on mapper.py + cert_to_inputs.py).
Chain pin `test_summary_001479_full_chain_sap_matches_worksheet_pdf_
exactly` is the load-bearing RED — committed failing per TDD; closes
to GREEN once the HLC undercount lands.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cert 001479 Ext2 §8 lodges:
Type: PS Pitched, sloping ceiling
Insulation: S Sloping ceiling insulation
Insulation Thickness: As Built
age C (1930-49)
The Summary's "As Built" thickness encodes "the dwelling as originally
constructed" — for pre-1950 sloping-ceiling roofs that's uninsulated
(no roof insulation in original 1930s construction). The worksheet's
§3 row pins U=2.30 (Table 16 row 0, uninsulated).
Pre-slice the mapper passed thickness=None through, routing to
`u_roof`'s Table 18 col 1 default (0.40 W/m²K for age C). That table
assumes joist insulation accessible from the loft — wrong geometry for
PS (Pitched, sloping ceiling) which has no loft access for retrofit.
Add `_resolve_sloping_ceiling_thickness`: when roof_type starts with
"PS" + lodged thickness is None + age ∈ {A,B,C,D} → thickness=0.
Other ages leave None (cascade default), matching Ext1's worksheet
U=0.15 at age M.
Cascade SAP 61.93 → 61.39 (−0.54, expected — uninsulated roof adds
heat loss); cohort 6 certs all green at 1e-4 (none have PS+age≤D);
pyright net-zero baseline preserved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`_is_floor_exposed_to_unheated_space` previously only matched
"U Above unheated space" (semi-exposed floor over a porch / car-park).
Cert 001479 Ext2 §9 lodges "Location: E To external air" — a 1.92 m²
cantilevered exposed timber floor (the upper-storey extension hanging
out over the garden). The worksheet's §3 `Exposed floor Ext2 … 1.92,
1.20, 1.20` pins this surface as U=1.20 via Table 20.
Pre-slice the mapper missed the "external air" lodgement entirely;
`is_exposed_floor=False` routed Ext2's ground SapFloorDimension
through the BS EN ISO 13370 ground-floor cascade (default U≈0.5),
mis-modelling a fully-exposed cantilever as a slab on soil.
Both lodgement strings ("above unheated", "external air") now
trigger the Table 20 path. Function docstring updated; name kept
to minimise the diff (refactor candidate for a future slice).
Cohort 6 certs all still green at 1e-4 (none lodge external-air
floors); cert 001479 cascade SAP 61.90 → 61.93 (+0.03), modest
upward move toward the 69.0094 target.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`_ELMHURST_PARTY_WALL_CODE_TO_SAP10` only recognised the bare "C" and
"S" leading codes. Cert 001479 Main §7 lodges "Party Wall Type: CU
Cavity masonry unfilled" — the leading token is "CU", which fell
through to None and made `u_party_wall` apply the unknown-default
U=0.25 instead of the worksheet's lodged U=0.50.
Add "CU" → 4 (SAP10 WALL_CAVITY); `u_party_wall(4) = 0.5 W/m²K`
matches the worksheet's §3 `Party walls Main … 0.50` row exactly.
This widens the chain residual on cert 001479 (cascade SAP 63.17 →
61.90 vs target 69.0094) — not a regression: pre-slice the cascade
was UNDER-counting party-wall heat loss (U=0.25 vs the lodged 0.50),
which masked over-counting elsewhere. The party-wall U-value is now
worksheet-accurate; remaining 7.1 SAP gap will narrow as the other
mapper gaps (Ext2 exposed floor, roof insulation thickness, secondary
heating SAP code, etc.) land in follow-up slices.
All 10 chain tests green (6 cohort + 2 cert-001479 structural pins).
Pyright net-zero (35-error baseline preserved on mapper.py).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`from_elmhurst_site_notes` hard-coded `extensions_count=0` regardless of
how many extensions the survey lodged. The 6 cohort certs from Slices
47-53 all happened to have 0-2 extensions whose count nothing
load-bearing read, so this latent bug was invisible. Cert 001479
(Summary_001479.pdf, GOV.UK EPB cert 0535-9020-6509-0821-6222) has Main
+ Extension 1 + Extension 2 and is the first cohort cert with a real
API counterpart — accurate `extensions_count` becomes load-bearing the
moment the cross-mapper parity assertion compares API vs Elmhurst
EpcPropertyData side by side.
No SAP-cascade impact (the cascade iterates `sap_building_parts`, not
`extensions_count`) — but a real data-integrity bug surfaced by the
cross-mapper diff. Adds Summary_001479.pdf as a new chain-test fixture
and `_SUMMARY_001479_PDF` constant for follow-up slices that will
land per-bp ages, exposed floors, secondary-heating SAP codes, etc.
All 9 chain tests green; 321 mapper/site-notes/rdsap tests green;
pyright net-zero (35-error baseline preserved on mapper.py).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three extensions closing the last 0.05 SAP residual on 000487 — and
with it, all 6 Elmhurst Summary PDFs match their U985 worksheets to
1e-4 unrounded SAP.
1. Alternative-wall extraction. `WallDetails` gains an
`alternative_walls: List[AlternativeWall]` field; the extractor
parses §7's "Alternative Wall N Area / Type / Insulation /
Thickness / Thickness Unknown / U-value Known" prefixed labels.
Even when an extension lodges "As Main Wall: Yes" we still pull
alt walls from the extension's own subsection (they don't
inherit) — the main wall fields are merged with the extension's
alt-wall list.
2. Alt-wall mapper plumbing. `_map_elmhurst_alternative_wall` builds
a `SapAlternativeWall` per lodged Elmhurst entry; the building-
part mapper attaches up to two via `sap_alternative_wall_1/_2`
per `SapBuildingPart`. When the surveyor flags `Thickness
Unknown: Yes` (cohort's only example — 000487 Ext1's
"TimberWallOneLayer" entry) we route the cascade with
thickness=None so `u_wall` falls through to the age-band-and-
construction default — Timber Frame age B uninsulated → U=1.9,
matching the full-cert-text U=1.90 the handbuilt fixture lodges
for the same 9-mm thin timber wall.
3. "TI" wall-construction code mapping. The §7 "Alternative Wall 1
Type: TI Timber Frame" uses leading code "TI" rather than the
"TF" code seen on the primary wall types — both alias to SAP10
wall_construction=5 (Timber Frame).
Final cohort state — all 6 closed at 1e-4:
000474 0.0000 ✓ Slice 47
000477 0.0000 ✓ Slice 52
000480 0.0000 ✓ Slice 50
000487 0.0000 ✓ THIS SLICE
000490 0.0000 ✓ Slice 49
000516 0.0000 ✓ Slice 51
758 tests pass; pyright net-zero (35 baseline).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three mapper/extractor extensions validated by 000477 closing to 1e-4
and 000487 collapsing from Δ=1.18 SAP to Δ=0.05 (alt-wall residual).
1. RR detailed-surface area rounded half-up to 2 d.p. via Decimal.
The Elmhurst worksheet rounds 4.39 × 1.50 = 6.585 to 6.59; Python's
builtin `round` (banker's) returns 6.58 and a naïve floor+0.5 trips
on FP precision (the product is 6.5849999… in float64). Compute
the product in `Decimal` first (both operands are exact 2-d.p.
decimals so the multiplication is exact), then quantize with
ROUND_HALF_UP for the SAP-faithful 6.59. Closes the 0.01 m² stud-
wall-area drift that left 000477 at Δ=0.0004 SAP after RR support.
2. Suspended-timber-floor heuristic. The §2(12) wooden-floor ACH (0.2
unsealed / 0.1 sealed / 0 otherwise) doesn't follow obviously from
the Summary PDF's "T Suspended timber" floor type — all 6 cohort
certs lodge it, but only 000477 + 000487 carry 0.2 ACH in their
U985 worksheets. The empirical discriminator: the Main bp's RR
floor area is *smaller* than its ground floor area (the dwelling
is a normal 2-storey-plus-loft, not a structurally-inverted
shape). 000480 trips the inverse (RR 19.83 > ground 15.28 →
False) and 000516 trips on the non-ground floor location.
3. Electric vs mixer shower from outlet_type. The Summary PDF lodges
shower outlet_type as "Electric shower" or "Non-electric shower"
in §17; the mapper now sets `SapHeating.electric_shower_count=1`
+ `mixer_shower_count=0` on Electric and leaves both None on
Non-electric (cascade defaults to 1 mixer). Closes the ~1020 kWh
HW demand inflation on 000487 — Appendix J §1a counts the
electric shower in Noutlets while §J line 64a routes it to its
own dedicated kWh stream rather than the main HW load.
Cohort state after this slice:
000474 0.0000 ✓ Slice 47
000477 0.0000 ✓ THIS SLICE
000480 0.0000 ✓ Slice 50
000487 +0.0519 extension's alternative wall 1 (1.43 m² Timber
Frame, U=1.90 lodged but only via full-cert text
— not exposed in Summary PDF)
000490 0.0000 ✓ Slice 49
000516 0.0000 ✓ Slice 51
5/6 closed at 1e-4. 757 tests pass; pyright net-zero (35 baseline).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three mapper extensions, validated by 000516 closing to 1e-4:
1. Roof-window separation by U-value threshold. Elmhurst Summary PDFs
pool roof windows into the §11 vertical-window table with no type
marker. The U-value is the only reliable signal — vertical glazing
in the cohort tops out at 2.80 W/m²K, while Table 24 roof windows
start at 3.0+. `_is_elmhurst_roof_window` filters U > 3.0 into
`sap_roof_windows`; the rest flow through the `sap_windows` path.
2. Table-24 roof-window U-value lookup. The cohort lodges Manufacturer
U=3.10 for the 000516 roof window, but the worksheet's (27a) line
(U_eff=2.99) reverse-engineers to a raw U=3.40 — the RdSAP10
Table 24 "Double pre 2002" roof-window default. `_elmhurst_roof_
window_u_value` keyed on glazing-type captures the +0.3 W/m²K step;
falls back to the lodged U for glazing types not yet in the table.
3. `SapWindow.window_width × window_height = lodged Area` convention.
The Elmhurst Summary PDF carries lodged W (2 d.p.) × lodged H
(2 d.p.) AND a precomputed Area (2 d.p., not always equal to
product after rounding). The cascade reads only the W×H product
across §3 / §5 / §6, so flattening to `(area, 1.0)` keeps the
downstream area aligned with the worksheet's rounded value rather
than reconstructing W×H with its own rounding drift (e.g. 1.22 ×
1.76 = 2.1472 m² vs lodged 2.15 m²). The existing
`test_first_window_*` tests pinning literal W/H were updated to
pin the area product (the cascade-relevant invariant).
Cohort state after this slice:
000474 0.0000 ✓ Slice 47
000477 +1.1161 Elmhurst floor_ach quirk
000480 0.0000 ✓ Slice 50
000487 +1.1844 extractor still drops most §11 windows
000490 0.0000 ✓ Slice 49
000516 0.0000 ✓ THIS SLICE
4/6 closed at 1e-4. 756 tests pass; pyright net-zero (35 baseline).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four mapper extensions, validated by 000480 closing to 1e-4 and large
gap reductions across 000477/000487/000516.
1. Room-in-Roof support. `ElmhurstSiteNotes` gains `RoomInRoof` +
`RoomInRoofSurface` dataclasses; extractor parses §8.1 (Flat
Ceiling / Stud Wall / Slope / Gable Wall / Common Wall) with
Length × Height + insulation + gable-type + measured-U cells.
Mapper produces a `SapRoomInRoof` with `detailed_surfaces`
attached to the Main bp: Stud Walls / Slopes / Flat Ceilings
route through Table 17 insulation thickness; Gable Walls split
between `gable_wall` (Party → Table 4 U=0.25) and
`gable_wall_external` (Sheltered → assessor-lodged U-value
override, e.g. 000487 Gable Wall 2 at U=0.86). Empty surfaces
(0×0 — the cohort lodges a full 5-pair table) and Common Walls
(handled by cascade's Simplified Type 2 geometry) are dropped.
`total_floor_area_m2` now includes the RR floor area.
2. Party-wall construction mapping. 000516 lodges "S Solid masonry /
timber / system build" which routes to SAP10 wall_construction=3
(Solid Brick → U=0.0 via Table 4). The previous mapper used the
same wall-type table as `wall_construction`, which lacked the
"S" code and fell through to None (cascade default 0.25). Split
into a dedicated `_elmhurst_party_wall_construction_int` keyed
on the party-wall category codes.
3. Roof "None" insulation. When the §8.0 Roofs subsection lodges
"Insulation N None" without a separate "Insulation Thickness"
line, treat thickness as 0 mm so the cascade picks Table 16
row 0 (U=2.30) rather than the age-band default. Closes the
29 W/K roof-loss gap on 000516.
4. `number_baths` lodgement. `SapHeating.number_baths` now reads
`survey.baths_and_showers.number_of_baths`. The cascade defaults
`None → has-bath` for the modal UK case, but explicit `0` lodged
on 000477/000480 (bathless dwellings, rare) drops the bath HW
demand line per Table 1b. Closes 000480's last ~0.3 SAP gap.
Cohort state after this slice (target 1e-4):
000474 0.0000 ✓ Slice 47
000477 +1.1161 Elmhurst floor_ach quirk (true vs false despite
"T Suspended timber" lodged on all certs)
000480 0.0000 ✓ THIS SLICE
000487 +1.1844 extractor still drops most §11 windows on this
layout variant
000490 0.0000 ✓ Slice 49
000516 +0.1774 roof-window separation by U-value heuristic
3/6 certs now closed at 1e-4. Pyright net-zero (35 baseline). Tests
756 pass (added `test_summary_000480_full_chain_sap_matches_worksheet_
pdf_exactly`).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two mapper extensions, both validated by 000490 closing to 1e-4:
1. Secondary heating extraction. Elmhurst Summary PDFs lodge the
secondary heating SAP code in the §14.1 Main Heating2 sub-section
(between "14.1 Main Heating2" and "14.1 Community Heating") — not
in the §14.0 Main Heating1 block where the main system lives.
`ElmhurstMainHeating` gains a `secondary_heating_sap_code` field;
the extractor reads it from the right section; the mapper threads
it through to `SapHeating.secondary_heating_type`. The cascade
then applies Table 11's 10% secondary fraction.
2. Sheltered-sides derivation per RdSAP §S5. The Summary PDF doesn't
lodge per-dwelling sheltered-sides; the value is derived from
built-form (Detached=0, Semi-Detached=1, End-Terrace=1, Mid-
Terrace=2, Enclosed Mid-Terrace=3, Enclosed End-Terrace=2).
`_map_elmhurst_ventilation` now takes built_form and populates
`SapVentilation.sheltered_sides`. The table is cross-checked
against U985-0001-NNNNNN.pdf line (19) across the 6 worksheet
fixtures.
Cohort SAP deltas after this slice (target 1e-4):
000474 0.0000 ✓ Slice 47
000477 +2.6555 diagnosis pending (lighting bulb count diff)
000480 +4.1955 diagnosis pending
000487 +4.4553 extractor still drops most windows
000490 0.0000 ✓ THIS SLICE
000516 +1.5162 roof-window separation
Pyright net-zero on touched files (35 errors, same baseline). 755
tests pass (up from 754 — new `test_summary_000490_full_chain_sap_
matches_worksheet_pdf_exactly`).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two diffs closed against the hand-built `_elmhurst_worksheet_000474`
target (SAP 62.2584):
1. `pumps_fans_kwh_per_yr` (130 → 160). The cascade keys §4f pumps+fans
electricity on `MainHeatingDetail.main_heating_category` (gas-fired
boilers = cat 2 → 160 kWh/yr). `from_elmhurst_site_notes` wasn't
populating the field, so it fell through to the default 130. Added
`_elmhurst_main_heating_category` deriving cat 2 for the gas/LPG-
PCDB-boiler branch; other categories deferred until a fixture
exercises them (consistent with the cascade lookup).
2. Window [4] orientation `East-South` → `East` and window [5]
orientation `''` → `South-East`. The layout-style parser's
`before_start = prev_manuf + 7` / `after_end = next_data` rule was
over-grabbing prefix tokens of W_{k+1} as suffix tokens of W_k
('South' from W_5's prefix bled into W_4's suffix). Replaced with
a symmetric partition on the first glazing-type-start token
(`Single`/`Double`/`Triple`/`Secondary`) within the cross-window
gap, used as the upper bound of W_k's suffix and the lower bound
of W_{k+1}'s prefix. Same boundary on both sides — prefix tokens
of the next window can no longer be attributed as suffix of the
current one.
After both fixes, Summary_000474 → ElmhurstSiteNotes → EpcPropertyData
→ cascade → SAP matches the worksheet PDF's unrounded line 257 value
to 1e-4 tolerance. All 754 datatypes/epc/ + backend/documents_parser/
tests green; pyright net-zero on touched files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The full Summary→ElmhurstSiteNotes→EpcPropertyData→cascade→SAP chain now produces unrounded SAP 62.52 for cert U985-0001-000474 vs the worksheet PDF's 62.2584 — inside the 0.5 tolerance the user accepts on the API-cert residual cohort. The hand-built worksheet-fixture chain matches Elmhurst's unrounded SAP to 4 d.p. (62.2584), so the calculator+cascade are provably equivalent to Elmhurst's calculator; this slice closes the mapper side of the chain.
Mapper changes drop the string-versus-int impedance mismatch that prevented the cascade from consuming Elmhurst-coded values:
- construction_age_band: `_strip_code('B 1900-1929')` → 'B' (was '1900-1929')
- wall_construction: `_elmhurst_wall_construction_int('CA Cavity')` → 4 (was string 'Cavity')
- wall_insulation_type: `'A As Built'` → 4 (was string 'As Built')
- party_wall_construction: same int-mapping treatment
- main_fuel_type: `_elmhurst_main_fuel_int('Mains gas')` → 26 (the Table 12 fuel code; was string)
- heat_emitter_type: `'Radiators'` → 1 (was string)
- main_heating_control: `_elmhurst_sap_control_code('SAP code 2106, ...')` → 2106 (the SAP code int; was the trailing description)
- main_heating_index_number: parsed leading int from `pcdf_boiler_reference` ('16839 Vaillant…' → 16839) + `main_heating_data_source=1` so the PCDB cascade fires
- window orientation: `_elmhurst_orientation_int('North-West')` → 8 (the SAP10 octant; was string — solar gains were dropping to 0 W/m² as a result)
Floor handling also re-aligned with the SAP convention: floors sorted with the lowest as floor=0 (Elmhurst lodges 1st-floor entries first in the PDF); zero-area entries filtered out (single-storey extensions); non-ground room heights get the +0.25 m joist-void adjustment; `is_exposed_floor=True` for ground floors lodged above unheated space ('U Above unheated space'). `total_floor_area_m2` now sums across main + extensions.
Three regression pins on the new path:
- sap_building_parts == 3 (multi-bp)
- sap_windows == 7 (layout-style window parser)
- unrounded SAP within 0.5 of 62.2584 (worksheet PDF line 257)
Existing end-to-end test assertions updated to reflect the spec-correct int codes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ElmhurstSiteNotes had no representation for extensions: singular dimensions / walls / roof / floor fields could only describe the main bp. Summary PDFs lodge "1st Extension" / "2nd Extension" subsections in §4, §7, §8, §9 with optional "As Main: Yes" inheritance. This slice:
- Adds `ExtensionPart` dataclass and `ElmhurstSiteNotes.extensions: List[ExtensionPart]`.
- Adds `_split_section_by_bp` helper + per-bp parsing of dimensions / walls / roof / floor in the extractor; "As Main" inherits from the main bp.
- Refactors `_map_elmhurst_building_part` into a parameterised builder; adds `_map_elmhurst_building_parts` that yields Main + one SapBuildingPart per extension (capped at 4 per RdSAP10 §1.2).
- Scaffold test `test_summary_000474_mapper_produces_three_building_parts` flips from strict-xfail to passing.
Single-bp behaviour is unchanged (empty extensions list defaults). 752 existing tests stay green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mapper-drop audit across the 9-fixture cohort: `percent_draughtproofed`
is lodged on 9/9 certs (raw values 85-100) but the schema-21.0.1
mapper never set it on EpcPropertyData. The site-notes mappers always
have (line 312 of mapper.py); only the API path was missing.
cert_to_inputs reads `epc.percent_draughtproofed` for the §2
ventilation cascade (window draught loss); with None → 0 default, the
calc was treating every API-routed cert as fully draughty —
over-counting draught infiltration on every fixture in the cohort.
Fix: `percent_draughtproofed=schema.percent_draughtproofed` in
`from_rdsap_schema_21_0_1`.
Cohort SAP / PE / CO2 shifts (all 9 fixtures move; many shift one
SAP point because the continuous SAP was near a rounding boundary):
cert old SAP new SAP PE shift CO2 shift
0240-0200-5706-2365-8010 -12 -10 -7.63 -0.39
0300-2747-7640-2526-2135 -9 -7 -6.36 -0.55
0390-2254-6420-2126-5561 (LN12) 0 +1 -9.10 -0.13
0390-2954-3640-2196-4175 -7 -4 -4.87 -0.44
2130-1033-4050-5007-8395 (DE22) +8 +9 -3.67 -0.04
6035-7729-2309-0879-2296 -6 -5 -8.90 -0.21
7536-3827-0600-0600-0276 +3 +4 -9.19 -0.24
8135-1728-8500-0511-3296 +1 +1 (cont -7.48 -0.14
72.7→73.5)
9390-2722-3520-2105-8715 +2 +3 -7.32 -0.01
LN12 lost its exact-SAP-match (0 → +1, continuous 65.47 → 66.28); the
other fixtures' rounded SAP residuals tightened or worsened by 1
depending on which side of the rounding boundary they sit. This is
spec-correctness over residual-tightness: the lodged value is correct,
our calc now reads it.
930/930 Elmhurst cascade green. 78/78 mapper tests + 14/14 golden
cohort + PCDB chain green. Pyright net-zero.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit of raw-JSON keys vs RdSapSchema21_0_1 across the 9-fixture
golden cohort surfaced 7 vent / draught fields silently dropped at
deserialization: blocked_chimneys_count, open_flues_count,
closed_flues_count, boilers_flues_count, other_flues_count, psv_count,
has_draught_lobby. cert_to_inputs reads all of them for the §2
infiltration cascade; without them the calc treats every dwelling as
flue-free / vent-free / no draught lobby and under-counts ACH.
Fix: declare the 7 fields on RdSapSchema21_0_1; extend the mapper to
surface blocked_chimneys_count on EpcPropertyData top-level (already
declared) and the other 6 on SapVentilation (extends the slice 37
extract_fans_count work). has_draught_lobby coerces "true"/"false"
strings to bool to match the SapVentilation type.
Cohort residual shifts after re-pinning:
- LN12 (0390-2254) — SAP +1 → 0 (FIRST CERT TO HIT LODGED SAP EXACTLY).
blocked_chimneys=2 reduces infiltration, tightens both SAP and PE
(PE −10.62 → −3.14, CO2 −0.11 → +0.04).
- 0300 — PE +18.92 → +17.34, CO2 −0.43 → −0.54 (open_flues=1 +
has_draught_lobby=true cross-cancel near-zero).
- 0390-2954 — PE −25.62 → −27.64, CO2 −2.45 → −2.58 (has_draught_lobby=true).
- 8135 — PE −17.58 → −14.37, CO2 −0.22 → −0.15 (blocked_chimneys=1).
- Other 5 fixtures (0240, DE22, 6035, 7536, plus retired 9390): no shift
— their certs lodge zeros or no vent fields beyond what Slice 37 plumbed.
Rounded-SAP cohort distribution post-slice:
0 (LN12), +1 (8135), +2 (9390), +3 (7536), +8 (DE22, spec-drift),
-6 (6035), -7 (0390-2954), -9 (0300), -12 (0240, RR-driven).
Schema scope: 21.0.1 only. 21.0.0 schema's SapBuildingPart shares the
same mapper code but no 21.0.0 fixtures live in the cohort to anchor
against; defer to a future slice if needed.
930/930 Elmhurst cascade green. 14/14 golden cohort green at new
pinned residuals. 77/77 mapper tests green. Pyright net-zero (34
errors before and after).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Schema-21.0.0/0.1's SapRoomInRoof dataclass declared only floor_area
and construction_age_band. Real certs lodge gable wall lengths under
sap_room_in_roof.room_in_roof_type_1 (RdSAP §3.9.1 Simplified Type 1).
from_dict silently dropped the whole block at deserialization, so the
mapper never had a chance to surface the lengths on EpcPropertyData.
Fix: add RoomInRoofType1 dataclass to both schema-21 variants;
extend SapRoomInRoof with `room_in_roof_type_1: Optional[...]`;
update the mapper to populate EpcPropertyData.SapRoomInRoof
gable_1_length_m / gable_2_length_m from the new field.
Calculator behaviour unchanged this slice: heat_transmission.py:243
requires BOTH length AND height to contribute gable area, and the
cert lodges length only (RdSAP §3.9.1 uses a default 2.45 m storey
height — not yet plumbed). Cert 0240's −12 SAP residual unchanged.
Schema scope: both 21.0.0 and 21.0.1 schemas (identical SapBuildingPart
mapper code, kept consistent). Older schemas (17/18/19/20) don't carry
this RR shape on their dataclasses and are out of scope per the prior
cohort scope decision.
Unblocks the follow-up slices that close the RR cascade: default
H_gable in calculator or mapper, parse "Roof room(s), insulated
(assumed)" description for the U-value override, etc.
930/930 Elmhurst cascade green. 14/14 golden cohort green at pinned
residuals (no shift, as expected). 76/76 mapper tests green.
Pyright net-zero (32 errors before and after).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 21.0.1 mapper produced EpcPropertyData with sap_ventilation=None,
so the cert→inputs cascade defaulted every ventilation count to zero
even when the cert lodged extract fans (most schema-21 certs do).
extract_fans_count was double-mapped — surfaced as a top-level field
the calculator never reads, but missing from the SapVentilation slice
the cascade does read.
Fix: populate sap_ventilation in from_rdsap_schema_21_0_1 with
extract_fans_count. Drives ~⅓ of the rating-cohort drift on a clean
no-PV no-secondary gas-combi cert.
Refactored test_golden_fixtures.py from global tolerance ceilings
(±13 SAP / ±35 PE) to per-cert pinned residuals at abs SAP=0,
PE=0.01 kWh/m², CO2=0.001 t/yr. Each cert's _GoldenExpectation now
records the actual current residual (SAP/PE/CO2 — CO2 newly pinned
via the postcode-cascade environmental section). Drift in either
direction fires the test: tighten the pin on improvement, document
on regression.
Recorded residuals reflect known remaining mapper gaps (RR room-in-
roof extraction on cert 0240, oil cascade on 0390, etc.) — tracked
in each cert's notes: field, not acceptance bounds.
930/930 Elmhurst cascade pins unchanged (site-notes EPCs already
populate sap_ventilation). 257/257 mapper tests green. 10/10 golden
cohort green under the new pins. Pyright net-zero (34 errors before
and after).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes 000516's §3 LINE_33 0.8215 W/K rooflight gap. Adds SapRoofWindow to
EpcPropertyData (area + raw U from RdSAP10 Table 24 "Roof window" column,
p.50/113) and iterates them in heat_transmission_from_cert alongside vertical
windows — same SAP10.2 §3.2 curtain transform R=0.04. Rooflight area is
subtracted from the main part's roof gross so net (30) + (27a) = original
gross, leaving (31) area aggregate invariant.
000516 LINE_33 residual: 0.8215 W/K → 0.0038 W/K. Remaining 0.0038 is the
same pre-existing wall-perimeter + per-window curtain precision drift biting
000474/477/480/490 (slice 27).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds `SapRoomInRoofSurface` dataclass (kind + area + insulation thickness
+ insulation type) and an optional `detailed_surfaces` list on
`SapRoomInRoof`. When `detailed_surfaces` is present, the Simplified
A_RR formula is bypassed and the calculator iterates each surface,
applying the appropriate Table 17 / Table 4 U-value:
slope → roof_w_per_k via u_rr_slope (Table 17 col 1)
flat_ceiling → roof_w_per_k via u_rr_flat_ceiling (Table 17 col 2)
stud_wall → roof_w_per_k via u_rr_stud_wall (Table 17 col 3)
gable_wall → party_walls_w_per_k at U=0.25 (Table 4 "as
common wall")
This mapping mirrors the U985 worksheet for 000477 where RR stud walls
+ slope + flat-ceiling lines sit under (30) and RR gable walls sit
under (32). The §3.9 deduction of `A_RR_floor` from the storey-below
roof area still applies.
Synthetic test pins a 1-storey + RR dwelling with 4 detailed surfaces
(slope/stud_wall/flat_ceiling/gable_wall) at hand-computed U-values
from Table 17 and Table 4, abs=0.001 tolerance.
Reference: RdSAP 10 (10-06-2025) §3.10 page 24-25; Figure 4; Table 17
page 44; Table 4 page 22.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends `SapRoomInRoof` with six optional fields capturing the RdSAP10
§3.9.2 Simplified Type 2 lodgement: common_wall_length_m / height_m
plus two gable length/height pairs.
Type 2 fires when `common_wall_height_m` is set and < 1.8 m (otherwise
the space is a separate storey). Geometry per spec page 23:
A_common_wall = L × (0.25 + H)
A_gable = L × (0.25 + H_gable)
− Σ ((H_gable − H_common_wall_i)² / 2)
A_RR_final = A_RR − Σ A_common_wall − Σ A_gable
(− party / sheltered / connected when lodged, future
slice when a fixture exercises them)
Common walls and gables route to walls_w_per_k at U_main_wall (per spec:
"Common wall U-value is inferred from the U-value of the main wall in
the building part below"). A_RR_final routes to roof_w_per_k at
u_rr_default_all_elements (Table 18 col 4).
Synthetic test: 1-storey cavity-uninsulated dwelling at age B + RR
(floor 10 m², common_wall_length 5 m × 1 m height). Pins
walls_w_per_k = 60 × 1.5 + 6.25 × 1.5 = 99.375 W/K and
roof_w_per_k = 30 × 0.40 + 26.025 × 2.30 = 71.857 W/K at abs=0.001.
No production fixture exercises Type 2 yet — synthetic test is the
unit-level guard until a Type 2 cert lands in the corpus.
Reference: RdSAP 10 (10-06-2025) §3.9.2 page 22-23.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surfaces four cert lodgements that the §2 ventilation cascade was
missing on the cert→inputs path. Without them, `cert_to_inputs` was
defaulting:
- extract_fans_count → 0 (PDF: 1-2 fans per fixture)
- percent_draughtproofed → 0 (PDF: 75-100% per fixture)
- sheltered_sides → 2 (PDF: 1-3 per fixture — hardcoded TODO)
- has_suspended_timber_floor → False (PDF: True on 000477/000487)
Net effect on (25)m monthly effective ACH ranged from -19% (000477)
to +5% (000490) → propagated 1:1 through HLC × ΔT → useful space heat
→ main + secondary fuel kWh → cost / SAP integer.
Schema:
- `SapVentilation` gains 4 new optional fields: `sheltered_sides`,
`has_suspended_timber_floor`, `suspended_timber_floor_sealed`,
`has_draught_lobby`. RdSAP cert lodges these but the type didn't
surface them.
- `cert_to_inputs.cert_to_inputs` reads them when set; falls back to
the SAP10.2 §2 worst-case defaults (sheltered=2, no timber floor,
no draught lobby) when the cert hasn't lodged. Removes the long-
standing `sheltered_sides=2` hardcode + 4 TODOs.
- `make_minimal_sap10_epc` accepts a `sap_ventilation` kwarg.
Per-fixture build_epc() updates lodge the U985 PDF values verbatim.
E2E pin: new parametrized test
`test_elmhurst_cert_to_inputs_monthly_infiltration_ach_matches_u985_
worksheet` asserts `inputs.monthly_infiltration_ach[m] == LINE_25_
EFFECTIVE_ACH[m]` at abs=1e-3 across all 6 fixtures + 12 months
(72 assertions). All pass.
Useful space heating drift:
000474: useful 10821.69 → 10765.85 (Δ -55.8 kWh vs PDF 10612.86 → +1.4% over, was +2.0%)
000490: useful 11262.05 → 11184.06 (Δ -78.0 kWh vs PDF 11183.28 → +0.007% — essentially exact)
SAP integer status:
000474: 62 = PDF 62 (delta 0) ✓
000490: 58 vs PDF 57 (delta 1; continuous 57.77 vs 57.40)
— remaining residual is pumps_fans hardcoded at 130 kWh
vs PDF 160 (Table 4f cascade not yet implemented → -£4 cost
+ 0.3 continuous SAP). Next slice.
Tightens `result.secondary_heating_fuel_kwh_per_yr` pin abs=10 → abs=0.1
(was loose to absorb the +0.7% useful overshoot which has now closed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SapFloorDimension gains an is_exposed_floor flag (default False) signalling
that the floor sits over outside air or unheated space rather than soil —
typical for an extension that hangs off the main from the first storey
upward (Elmhurst 000490 Extension 1 is exactly this shape).
heat_transmission_from_cert now consults the flag on the part's ground
SapFloorDimension and dispatches to u_exposed_floor (Table 20) instead
of the BS EN ISO 13370 / Table 19 cascade. Basement floor still wins
priority (Table 23 § 5.17 overrides everything else for that part).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lands the production code that the just-committed Elmhurst conformance
fixtures (6455d48b) exercise: the SAP10.3 calculator orchestrator
(domain.sap.calculator.Sap10Calculator), the RdSAP-driven cert→inputs
mapper (domain.sap.rdsap.cert_to_inputs), and the EpcPropertyData
strict-type pass that P6.1 starts.
calculator.py is the entry point. Two surfaces depending on the caller's
shape:
- Sap10Calculator().calculate(epc) — full RdSAP mapper + worksheet loop
- calculate_sap_from_inputs(inputs) — pure physics over typed inputs
P6.1 introduces BuildingPartIdentifier as a strictly-typed replacement
for bare-string matching on SapBuildingPart.identifier (motivated by
the pain point at worksheet/dimensions.py:74-82). Two boundary factories
canonicalise raw inputs: from_api_string for the gov-EPC API, and
extension(n) for site-notes / construction id flows.
Also catches up two transitive deps that 6455d48b implicitly required
but I missed:
- ml/rdsap_uvalues.py — party-wall U-value rows that heat_transmission
resolves; the U=0.0 branch the 000516 fixture exercises lands here.
- ml/tests/_fixtures.py — make_minimal_sap10_epc that every Elmhurst
fixture imports. Without this catch-up, checking out 6455d48b in
isolation would ImportError.
Out of scope (will commit separately): ml/transform.py legacy envelope
drift; backend/ FastAPI + documents_parser layer; etl/ scratch.
824 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
15 new features wired through schema -> domain -> mapper -> transform:
Main Dwelling fabric (11):
- wall_insulation_type, wall_insulation_thickness_mm, wall_dry_lined,
wall_thickness_mm, party_wall_construction
- roof_insulation_location, roof_insulation_thickness_mm
- floor_construction, floor_insulation, floor_insulation_thickness_mm,
floor_heat_loss
Dwelling-level scalars (4):
- multiple_glazed_proportion, number_baths, number_baths_wwhrs,
extract_fans_count
Thickness strings like '50mm'/'NI'/'ND' parsed via _parse_thickness_mm; NI
(no insulation) lands as 0mm so the model sees the physical zero rather than
a missing value. Categorical sentinels ('NA'/'NI'/'ND') become None.
Also fixed long-standing typo `multiple_glazed_propertion` -> `_proportion`
in domain dataclass + its lone DB-model usage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two production fixes surfaced by the live run:
- mapper.from_rdsap_schema_21_0_1 now sets the three ML target scalars
(energy_rating_current, co2_emissions_current, energy_consumption_current).
They were silently None for every cert before, leaving the only labels as
the kWh fields from renewable_heat_incentive.
- train_baseline coerces object-dtype columns to numeric (None -> NaN) and
drops rows with null target per fit, so LightGBM accepts the frame.
E2E on 500 real certs (~1s):
sap_score R^2=0.604 MAPE=0.084
co2_emissions R^2=0.813 MAPE=0.130
peui_raw R^2=0.979 MAPE=0.026
space_heating_kwh R^2=0.823 MAPE=0.213
hot_water_kwh R^2=0.519 MAPE=0.115
peui_ucl excluded: UCL correction still needs wiring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Currently fails on SapWindow.glazing_gap (first of ~30 fields the dataclass
incorrectly treats as required). Will go GREEN once 14j sweeps Optional.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SAP10 EPCs with measured PV carry photovoltaic_supply as a nested
list of arrays (peak_power, pitch, orientation, overshading) rather
than the legacy unmeasured wrapper {none_or_no_details:
{percent_roof_area: N}}. The schema-21 dataclasses now accept both
shapes via Union[PhotovoltaicSupply, List[List[PhotovoltaicArray]]],
and from_dict._coerce now dispatches list values onto list type
variants of multi-type Unions.
EpcPropertyData.SapEnergySource gains
photovoltaic_arrays: Optional[List[PhotovoltaicArray]] — populated
when the measured shape is present, otherwise None. The legacy
photovoltaic_supply field is preserved for the fallback case.
Both schema-21.0.0 and 21.0.1 mappers dispatch via the new
_map_schema_21_pv helper.
Unblocks Slice 11 (PV feature aggregation in EpcMlTransform).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thirteen window-aggregate features land on the transform: count,
total area, eight SAP-octant area columns (N/NE/E/SE/S/SW/W/NW),
area-weighted draught-proofing pct, and area-weighted u_value +
solar transmittance (nullable, populated only when windows carry
transmission_details). Windows with orientation outside 1-8 (0,
NR) contribute to count and total area but no octant.
Also: epc codes CSV (gov api /api/codes export, RdSAP-Schema-21.x +
older versions) moved next to EpcPropertyData as epc_codes.csv —
canonical SAP enum source for upcoming categorical-share slices.
.gitignore exception added so the reference CSV is tracked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>