Commit graph

202 commits

Author SHA1 Message Date
Khalim Conn-Kowlessar
2adff08210 Slice S0380.75: Wire Appendix H orchestrator into cascade; cert 000565 HW +272 → −69
Per SAP 10.2 §4 line (64)m: `(64)m = max(0, (62)m + (63a)m + (63b)m
+ (63c)m + (63d)m)` where (63c)m is the solar HW credit lodged as a
negative quantity. The cascade hardcoded (63c)m = 0 since S0380.66
when the Appendix H orchestrator landed without integration, pending
the 1.81× over-count resolution (closed in S0380.74).
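
A minimal sketch of the (64)m clamp quoted above, assuming each line is
held as a 12-element monthly list. The function and argument names are
illustrative only, not the cascade's actual API:

```python
from typing import Sequence


def line_64_monthly(
    l62: Sequence[float],
    l63a: Sequence[float],
    l63b: Sequence[float],
    l63c: Sequence[float],
    l63d: Sequence[float],
) -> list[float]:
    """(64)m = max(0, (62)m + (63a)m + (63b)m + (63c)m + (63d)m)."""
    # (63c)m is the solar HW credit, lodged as a negative quantity,
    # so it reduces the monthly total; the clamp stops it going below 0.
    return [max(0.0, sum(terms)) for terms in zip(l62, l63a, l63b, l63c, l63d)]
```

With (63c)m hardcoded to zero the solar credit never reduces (64)m,
which is the behaviour this slice replaces.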

This slice plumbs the orchestrator into `water_heating_from_cert`
via a new `solar_water_heating_monthly_kwh_override` parameter, and
adds `_solar_hw_monthly_override` in cert_to_inputs.py that drives
the orchestrator from RdSAP 10 §10.11 Table 29 defaults +
cert-lodged collector geometry on Elmhurst Summary §16.0.

RdSAP 10 §10.11 Table 29 row "Solar panel" (p.58, verbatim):
  "If solar panel present, the parameters for the calculation not
   provided in the RdSAP data set are:
   - panel aperture area 3 m²
   - flat panel, η₀ = 0.80, a₁ = 4.0, a₂ = 0.01
   - facing South, pitch 30°, modest overshading
   - …
   - pump for solar-heated water is electric (75 kWh/year)
   - showers are both electric and non-electric"

Lodged collector orientation / pitch / overshading on the Summary
§16.0 ("Are details known? Yes" branch) override South / 30° /
Modest. Aperture, η₀, a₁, a₂, IAM stay at Table 29 defaults — the
deeper thermal parameter lodgement (P960 worksheet) isn't yet in
the Summary extractor surface.

For (H17)m to include storage + primary + combi losses, the cascade
runs a `demand_pass` call without solar (gets (62)m) before sizing
the solar credit. The final call then uses all overrides.

Files:
- datatypes/epc/surveys/elmhurst_site_notes.py: Renewables gains
  `solar_hw_collector_orientation` / `_pitch_deg` / `_overshading`
  optional fields.
- datatypes/epc/domain/epc_property_data.py: same three fields
  added at the end of the dataclass.
- datatypes/epc/domain/mapper.py: from_elmhurst_site_notes
  propagates the three new fields.
- backend/documents_parser/elmhurst_extractor.py: §16.0 section
  parsing reads "Collector orientation" / "Collector elevation" /
  "Overshading" rows; `_parse_solar_pitch_deg` strips the degree
  glyph.
- domain/sap10_calculator/worksheet/water_heating.py: new
  `solar_water_heating_monthly_kwh_override` param on
  `water_heating_from_cert`; threaded into
  `output_from_water_heater_monthly_kwh(solar_monthly_kwh=...)`.
- domain/sap10_calculator/rdsap/cert_to_inputs.py: Table 29
  constants + `_solar_hw_monthly_override` helper +
  `_orientation_from_summary_string` mapper. Added the demand_pass
  intermediate call so (H17)m sees the full (62)m. Negates the
  orchestrator output at the boundary (spec convention: heat
  displaced from boiler is negative on line (63c)m).

Cert 000565 cascade pin shifts:
- hot_water_kwh_per_yr: +271.84 → −68.96 (4× closer)
- sap_score_continuous: +0.6334 → +0.7732 (drift downstream of HW)
- ecf: −0.0643 → −0.0784 (drift)
- total_fuel_cost: −56.08 → −68.36 (drift)
- co2: −19.77 → −22.66 (drift)
- sap_score (int): 29 EXACT (unchanged)
- space_heating / main_heating_fuel / lighting / pumps_fans:
  unchanged

The remaining −69 kWh HW residual is the gap between Table 29
defaults (H12 = 75 L separate tank) and cert 000565's lodged H12 =
53 L + combined cylinder 160 L. Closing this requires extracting
solar storage volume + combined-cylinder routing from the cert (P960
worksheet block lodges these explicitly; Summary doesn't). That's
the follow-on slice.

Test baseline: 547 pass + 9 expected `test_sap_result_pin[000565-*]`
fails preserved. Cohort-2 + ASHP cohort + all golden fixtures
untouched (no certs other than 000565 lodge `solar_water_heating =
True`).

Pyright net-zero on touched files (68 errors at baseline = 68 errors
post-change).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
2725ff505b Slice S0380.64: Elmhurst per-extension wall_construction mappings + strict-raise
Pre-S0380.64 the mapper silently fell through to wall_construction=None
on three Elmhurst code lodgements that the cohort PDFs use:

  - "SG Stone: granite or whinstone" (cert 000565 Ext1)
  - "B Basement wall" (cert 000565 Ext3 + Ext4)
  - "CF Cavity masonry filled" party wall (cert 000565 Ext1)

Cascade impact on cert 000565 (vs U985-0001-000565.pdf worksheet):
  - sap_score                30 → 29 EXACT (was Δ +1)
  - sap_score_continuous     30.23 → 29.14 (Δ +1.72 → +0.63)
  - space_heating_kwh_per_yr 57909 → 59274 (Δ −1100 → +266)
  - HTC                      1281 → 1321 W/K (was 234 W/K short
    of worksheet line 39 monthly avg 1515.38)

Spec basis:
  - SG → 1 (WALL_STONE_GRANITE per domain.sap10_ml.rdsap_uvalues)
    is the granite-specific Elmhurst variant of "ST Stone"; same
    SAP10 enum, no cascade behaviour change for stone walls.
  - B → 6 (BASEMENT_WALL_CONSTRUCTION_CODE per
    datatypes/epc/domain/epc_property_data.py:361) routes the
    cascade through `part.main_wall_is_basement` →
    `u_basement_wall(age_band)` per RdSAP 10 §5.17 / Table 23
    (heat_transmission.py:640). Empirically established from a
    2026 50k-bulk GOV.UK API sweep (88% co-occurrence with
    walls[].description = "Basement wall").
  - CF → 4 (Cavity, RdSAP 10 Table 15 row 3 spec U=0.20). The
    cascade's `u_party_wall` returns 0.0 / 0.5 / 0.25 for code 4
    today, so CF conservatively rounds up to the cavity-unfilled
    U=0.5 — matches the pre-existing
    `_API_PARTY_WALL_CONSTRUCTION_TO_SAP10[3]` approximation
    until `u_party_wall` gains a filled-cavity branch (TODO).

Strict-coverage gate per [[reference-unmapped-api-code]] mirror:
`_elmhurst_wall_construction_int` and
`_elmhurst_party_wall_construction_int` now raise
`UnmappedElmhurstLabel` on a non-empty Elmhurst code that isn't in
the lookup dict, rather than silently returning None. Empty
lodgings (absent fields) continue to return None — the cascade's
own defaults apply. The silent-None failure mode is what hid cert
000565's ~300 W/K cascade fabric-loss gap from the audit chain
until the S0380.64 space-heating residual probe surfaced it.
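
A hedged sketch of the strict-coverage gate described above. The
exception's constructor, its ValueError base, the partial lookup table,
and the "leading token is the code" parse are all assumptions made for
illustration; only the SG → 1 and B → 6 entries come from this message:

```python
from typing import Optional


class UnmappedElmhurstLabel(ValueError):
    """Sketch: raised when a non-empty Elmhurst code has no mapping."""

    def __init__(self, field: str, value: str) -> None:
        super().__init__(f"unmapped Elmhurst {field} label: {value!r}")
        self.field = field
        self.value = value


# Partial table: only the two wall codes this slice names.
_WALL_CONSTRUCTION = {"SG": 1, "B": 6}


def wall_construction_int(label: Optional[str]) -> Optional[int]:
    if label is None or not label.strip():
        return None  # absent lodging: the cascade's own defaults apply
    # Assumption: the code is the first whitespace-separated token,
    # e.g. "SG Stone: granite or whinstone" -> "SG".
    code = label.split()[0]
    if code not in _WALL_CONSTRUCTION:
        raise UnmappedElmhurstLabel("wall_construction", label)
    return _WALL_CONSTRUCTION[code]
```

The point of the raise is that an unknown code fails at mapping time
rather than surfacing later as an unexplained fabric-loss residual.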

Cohort coverage swept: every Summary PDF in the test fixtures
folder lodges only {SO, CA, CW, SG, B} wall types and
{'', S, U, CU, CF} party-wall types — the new dict entries cover
all observed codes, so strict-raise does not regress any cohort
fixture (478 pass, 9 expected 000565 cascade-gap fails; was 427
pass + 10 fails per HANDOVER_CERT_000565_COST_CASCADE.md).

Pyright net-zero on touched files (mapper.py 32 → 32 errors;
test_summary_pdf_mapper_chain.py 13 → 13 errors — all pre-existing
in unrelated sections).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
71d9738749 docs: flag deferred HP-on-E7 Table 12a + Table 4f pumps_fans cascade gap
Cert 000565 reveals a coupling between two SAP 10.2 cascade gaps
that prevents an isolated fix to either:

1. `_space_heating_fuel_cost_gbp_per_kwh` applies the E7 low-rate
   override to any electric main on a Dual meter. Per Table 12a,
   heat pumps on E7 use a ~33% high / 67% low split (cert 000565
   empirically) — NOT 100% low. The current binary all-low/all-high
   biases space-heating cost £-1.1k / £+1.3k respectively.

2. `_PUMPS_FANS_KWH_BY_MAIN_CATEGORY[4] = 0` for HPs (Table 4f says
   the circulation pump is in the COP). But certs with MEV / flue
   fans / solar HW pumps have those components added on top — cert
   000565's worksheet pin = 127.5 MEV + 45 flue + 80 solar = 252.5
   kWh, none of which the cascade currently sums.
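
Item 1's high/low split can be pictured as a weighted unit price. This
is a sketch only: the function and parameter names are hypothetical,
and the ~33% high-rate fraction is the empirical cert 000565 figure
quoted above, not a value read from Table 12a here:

```python
def blended_unit_price(high_rate: float, low_rate: float, high_fraction: float) -> float:
    """Weight the high and low tariff rates by the high-rate fraction."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be within [0, 1]")
    return high_fraction * high_rate + (1.0 - high_fraction) * low_rate
```

The current binary behaviour corresponds to `high_fraction=0.0`
(all-low) or `high_fraction=1.0` (all-high); a heat pump on E7 would
sit in between.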

Probed a fix that derives `main_heating_category=4` from
`sap_main_heating_code in {211-227, 521-527}` (the Table 4a HP
rows) and exempts category=4 from the off-peak override. The
mapper change is architecturally correct but coupling to (1) +
(2) leaves residuals worse at HEAD than at the prior commit — so
both edits are reverted and the spec rationale is folded into
TODO docstrings on the two helpers:

- `_elmhurst_main_heating_category` (mapper) — flags the deferred
  HP SAP code route + the two cascade prerequisites
- `_space_heating_fuel_cost_gbp_per_kwh` (cascade) — flags the
  Table 12a high/low split as a future cascade slice

Cohort regression check: 192 pass + 10 expected 000565 fails —
identical baseline to S0380.59. Docs-only, pyright net-zero.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
10437143c4 Slice S0380.58: Elmhurst per-extension Room(s) in Roof extraction + TFA fix
Cert 000565 surfaced a per-extension Room(s) in Roof coverage gap.
§4 Dimensions lodges an RR floor area for every BP (Main + each
extension) and §8.1 lodges full construction details per BP. The
old extractor parsed RR from §4 + §8.1 for Main only — the 4
extensions' RR areas (34 + 5 + 32 + 2 = 73 m²) were silently
dropped, leaving TFA at 246.91 m² vs the worksheet's 319.91 m²
(23% deficit).

Schema:
- `ExtensionPart.room_in_roof: Optional[RoomInRoof] = None` field.
  None for single-storey extensions (no RR lodged); populated for
  every extension that lodges a §4 RR floor area > 0.

Extractor:
- `_room_in_roof_from_bodies(dim_body, rir_body, age_band)`
  parameterises the previously Main-only `_extract_room_in_roof`
  so the same parsing applies to each extension.
- `_extract_extensions` now slices §8.1 by BP (alongside the
  existing §4/§7/§8/§9 slicing) and reads each extension's RR age
  band from §3's "<N>th Ext. Room(s) in Roof <band>" line via a
  new regex.
- A new defensive "§4 lodges RR area but §8.1 has no construction
  details" branch returns a partial `RoomInRoof` with empty surfaces
  so the cascade still attributes the floor area to TFA. (Not
  triggered on 000565 — all 5 BPs lodge construction details — but
  needed for older Elmhurst variants per the existing extractor
  comment style.)

Mapper:
- `_map_elmhurst_building_parts` now passes each extension's
  `room_in_roof` through `_map_elmhurst_room_in_roof` to the
  extension's `SapBuildingPart.sap_room_in_roof`. Previously the
  loop hardcoded the field as None.
- `total_floor_area_m2` derivation now also sums each extension's
  `room_in_roof.floor_area_m2`. Without this, the per-BP RR floor
  area is lodged on the BP but the cert's top-level TFA stays at
  the pre-fix value.
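
A sketch of the TFA derivation change, using slim stand-in dataclasses
(hypothetical shapes, not the real schema) and cert 000565's extension
RR areas from this message:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoomInRoof:  # hypothetical slim stand-in
    floor_area_m2: float


@dataclass
class ExtensionPart:  # hypothetical slim stand-in
    room_in_roof: Optional[RoomInRoof] = None


def total_floor_area_m2(base_m2: float, extensions: list[ExtensionPart]) -> float:
    """Add each extension's RR floor area to the pre-fix TFA."""
    rr_m2 = sum(
        ext.room_in_roof.floor_area_m2
        for ext in extensions
        if ext.room_in_roof is not None  # single-storey extensions lodge no RR
    )
    return base_m2 + rr_m2
```

For cert 000565 this adds 34 + 5 + 32 + 2 = 73 m² to the pre-fix
246.91 m², giving the worksheet's 319.91 m².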

Cert 000565 cascade impact:
- TFA: 246.91 → 319.91 ✓ (matches U985-0001-000565.pdf Block 1)
- space_heating_kwh_per_yr:  Δ −9,107.71 → −1,099.50  (88% reduction)
- main_heating_fuel_kwh_per_yr: Δ −5,357.47 → −646.76  (88% reduction;
    space_heating × 1/HP COP — main_heating tracks space_heating)
- lighting_kwh_per_yr:       Δ −236.19 → +2.18  (essentially closed —
    RdSAP §12-1 lighting is TFA-proportional)
- hot_water_kwh_per_yr:      Δ +214.50 → +271.84
- co2_kg_per_yr:             Δ −1,438.16 → −751.06
- total_fuel_cost_gbp:       Δ −1,055.62 → −564.05
- sap_score_continuous:      Δ +1.70 → +6.75  (cost/TFA dropped because
    cost rose ~14% but TFA rose ~30% — the remaining −564 cost gap
    has to close before SAP catches up)

Single-storey-extension certs: `room_in_roof=None` for each extension
(no §4 RR lodgement), no behavioural change. Cohort regression check:
415 pass + 10 expected 000565 fails — no regression on the 14 Summary
fixtures + JSON fixtures that don't carry per-extension RR.

Pyright net-zero on all 3 touched files (32 / 0 / 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
358b4dcd01 Slice S0380.57: Elmhurst mapper infers electricity fuel for electric SAP main heating codes
Elmhurst §14.0 leaves "Fuel Type" empty for electric main heating
systems (heat pumps, electric boilers, storage heaters, electric
underfloor, warm-air HPs) — the SAP code identifies the carrier
directly. The mapper was reading the empty string via
`_elmhurst_main_fuel_int(mh.fuel_type)` → None, and downstream
`_main_fuel_code` returned None, so Table 32 unit-price lookups
defaulted to mains gas. Cert 000565 (HP Main 1, SAP code 224) was
being charged 29,353 kWh/yr of electricity at the gas tariff —
£0.0364/kWh instead of £0.165/kWh.

New `_ELECTRIC_SAP_MAIN_HEATING_CODES` frozen set covers the Table
4a electric carrier rows:
  191-196  Electric boilers
  211-217, 221-227  Heat pumps (224 = ASHP 2013+, 1.70 COP)
  401-409  Electric storage heaters
  421-425  Electric underfloor heating
  521-527  Warm-air heat pumps

Inference fires in both the Main 1 (`_map_elmhurst_sap_heating`) and
Main 2 (`_map_elmhurst_main_heating_2`) construction paths: when
`_elmhurst_main_fuel_int(fuel_type)` returns None AND the SAP code
is in the electric set, fall back to
`_STANDARD_ELECTRICITY_FUEL_CODE = 30` (Table 12 row "Electricity,
standard tariff").
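
A minimal sketch of the inference, built from the code ranges listed
above. The helper name `resolve_main_fuel` and its signature are
assumptions for illustration, not the mapper's actual functions:

```python
from typing import Optional

# Inclusive SAP code ranges listed in this commit message (Table 4a).
_ELECTRIC_RANGES = [
    (191, 196),  # electric boilers
    (211, 217),  # heat pumps
    (221, 227),  # heat pumps
    (401, 409),  # electric storage heaters
    (421, 425),  # electric underfloor heating
    (521, 527),  # warm-air heat pumps
]
ELECTRIC_SAP_MAIN_HEATING_CODES = frozenset(
    code for lo, hi in _ELECTRIC_RANGES for code in range(lo, hi + 1)
)
STANDARD_ELECTRICITY_FUEL_CODE = 30


def resolve_main_fuel(lodged_fuel: Optional[int], sap_code: Optional[int]) -> Optional[int]:
    """Prefer the lodged fuel; infer electricity only for electric SAP codes."""
    if lodged_fuel is not None:
        return lodged_fuel
    if sap_code in ELECTRIC_SAP_MAIN_HEATING_CODES:
        return STANDARD_ELECTRICITY_FUEL_CODE
    return None
```

An explicitly lodged fuel always wins, which is why gas and oil boiler
certs are unaffected.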

Cert 000565 cascade impact (compounding with S0380.56):
- sap_score:                 71  → 30  (target 29 → Δ +1;  was Δ +42)
- sap_score_continuous:      71.42 → 30.21  (target 28.51 → Δ +1.70; was Δ +42.91)
- ecf:                        2.05 → 5.22  (target 5.39  → Δ −0.17; was Δ −3.34)
- total_fuel_cost_gbp:    1,423.80 → 3,624.64 (target 4,680.26 → Δ −1,055.62; was Δ −3,256.46)
- co2_kg_per_yr:          7,181.62 → 5,009.47 (target 6,447.63 → Δ −1,438.16; was Δ +733.99)
                          (now undershooting — independent cascade gap
                           around Table 12d monthly electric CO2 factor
                           interpolation; separate slice)

Single-main non-HP certs: no behavioural change (`fuel_type` lodged
explicitly for gas/oil boilers → `_elmhurst_main_fuel_int` returns
non-None → inference branch not entered). Cohort regression check:
472 pass + 10 expected 000565 fails — no regression.

Spec source: SAP 10.2 Table 4a main heating SAP codes + Table 12 fuel
codes (electricity, standard tariff = 30). Heat-pump cohort efficiency
values cross-referenced in `domain/sap10_ml/sap_efficiencies.py:42-44`.

Pyright net-zero on mapper.py (32 / 32).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
6d82f8842d Slice S0380.54: Elmhurst §14.1 Main Heating2 extraction + 2nd MainHeatingDetail
Cert 000565 lodges §14.1 Main Heating2 as PCDB 15100 (Vaillant Ecotec
plus 415, 88%, mains gas, 0% space heat) — this is the system that
services DHW via `Water Heating SapCode 914` ("from second main
system"). The previous extractor / mapper shape supported only ONE
main heating system, dropping Main 2 entirely.

New shape:
- `MainHeating2` dataclass (slim §14.1-shaped: PCDB ref, fuel type,
  flue type, fan_assisted_flue, percentage_of_heat, SAP code)
- `MainHeating.main_heating_2: Optional[MainHeating2]` — None when
  §14.1 is absent OR lodges only placeholder zeros (the PCDB-only
  convention; the two JSON fixtures + 14 existing Summary fixtures
  all lodge "0 / 0" for an absent Main 2)
- `_extract_main_heating_2` parses §14.1; returns None when neither
  PCDB ref nor SAP code identifies Main 2
- `_map_elmhurst_main_heating_2` builds `MainHeatingDetail` from the
  Main 2 lodgement with `main_heating_number=2` and
  `main_heating_fraction=percentage_of_heat`; strict-raises
  `UnmappedElmhurstLabel` (mirroring Slice S0380.53's Main 1 raise)
  when Main 2 has neither identifier, so coverage gaps surface at
  extraction time

Per RdSAP convention "0%" is lodged without a space (vs Main 1's
"100 %" with a space). The percentage parse uses `rstrip("%")` so
both forms parse to the same numeric type.
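
A sketch of a parse that tolerates both lodged forms. The function
name is hypothetical; only the `rstrip("%")` step comes from this
message:

```python
def parse_percentage(text: str) -> float:
    """Parse "0%" and "100 %" alike into a float."""
    # Strip outer whitespace, drop the trailing percent sign, then strip
    # the space that "100 %" leaves behind before converting.
    return float(text.strip().rstrip("%").strip())
```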

Cohort impact:
- 14 existing Summary PDF fixtures + 2 JSON fixtures: Main 2 returns
  None (placeholder zeros) → no 2nd MainHeatingDetail produced → no
  cascade behaviour change (regression-tested: 415 pass + 10 expected
  000565 fails, identical to S0380.53 baseline)
- Cert 000565: 2nd MainHeatingDetail now lodged with sap_code=None,
  pcdb=15100 (Table 105 gas-boiler 88% efficiency), category=2,
  fuel=26 (mains gas), fraction=0

Cascade still uses Main 1 for water-heating efficiency in the WHC
914 branch — that routing fix is the next slice. This commit is
the plumbing-only half; the SAP-result pin residuals are unchanged
at HEAD because the cascade hasn't been wired to read Main 2 yet.

Pyright net-zero on all 3 touched files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
043620802f Slice S0380.53: Elmhurst §14.0 "Main Heating SAP Code" extraction + strict-raise
Cert 000565 surfaced an Elmhurst extractor schema gap. §14.0 lodges
"Main Heating SAP Code 224" identifying Main 1 as an Air Source Heat
Pump (SAP 10.2 Table 4a row 224: "Air source heat pump, 2013 or
later") — but the extractor was dropping the line. The mapper
therefore produced a `MainHeatingDetail` with `sap_main_heating_code
= None` AND `main_heating_index_number = None` (because `PCDF boiler
Reference = 0` for HP certs), leaving the cascade to fall back to
the 0.80 gas-boiler default efficiency.

Cascade impact on cert 000565 main_heating_fuel_kwh_per_yr pin:
- Before: actual 62,375.80 kWh/yr (= 49,900.64 / 0.80 wrong default)
  Δ +27,665.01 vs U985-0001-000565.pdf expected 34,710.79
- After:  actual 29,353.32 kWh/yr (= 49,900.64 / 1.70 HP COP via §A4.1)
  Δ −5,357.47 (remaining gap is on the space_heating side, not
  heating efficiency)

The strict-raise mirrors [[unmapped-api-code]] (Slice S0380.51) and
[[unmapped-elmhurst-label]] (cylinder size / glazing type) — when
neither the §14.0 SAP code nor the PCDB boiler reference identifies
Main 1, the mapper raises `UnmappedElmhurstLabel("main_heating",
...)` so the coverage gap surfaces at extraction time instead of as
an opaque downstream SAP delta. Per user end-of-S0380.52 directive:
"if we're missing mapping on EpcPropertyDataMapper - let's raise an
exception".

Spec source: SAP 10.2 §A4 Appendix A "Heat pump cascade", Table 4a
row 224 (Air source heat pump, 2013 or later) — `seasonal_efficiency`
reads the SAP code when no PCDB Table 105/362 record overrides.

Touched:
- datatypes/epc/surveys/elmhurst_site_notes.py: `MainHeating.
  main_heating_sap_code: Optional[int]` field added (treat 0 as None
  per Elmhurst convention — PCDB-listed boilers lodge §14.0 SAP code
  as 0 and identify themselves via the PCDB index instead)
- backend/documents_parser/elmhurst_extractor.py:
  `_extract_main_heating` reads §14.0 "Main Heating SAP Code" via the
  existing `_local_val` slice helper; 0/absent → None
- datatypes/epc/domain/mapper.py: `_map_elmhurst_sap_heating` passes
  `sap_main_heating_code=mh.main_heating_sap_code` to
  `MainHeatingDetail`, and raises `UnmappedElmhurstLabel` when
  neither identifier resolves

Cohort regression check: 415 pass + 10 expected 000565 failures
(unchanged from S0380.52 — same pins, different residuals). Pyright
net-zero on all 3 touched files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
c52e750bb2 Slice S0380.52: cert 000565 Elmhurst-only mapper-driven cascade pin + glazing-label coverage
User pivot at end of prior session: don't hand-build EpcPropertyData
fixtures. Route Summary PDFs through
`EpcPropertyDataMapper.from_elmhurst_site_notes` so the pin grid
exercises extractor + mapper + calculator, and each new Elmhurst doc
grows mapper coverage instead of bespoke fixture code.

New fixture cert 000565 is a stress-test cert (5 building parts, age
mix A→J, conservatory with heaters, curtain wall, basement walls,
mixed party-wall constructions) that surfaces many uncommon cascade
paths absent from the cohort-2 + ASHP corpus.

Mapper coverage extended for 3 Elmhurst §11 glazing labels surfaced
on this cert (per RdSAP-Schema-21.0.1,
`datatypes/epc/domain/epc_codes.csv` glazed_type rows):

  "Triple between 2002 and 2021": 9  (RdSAP-21 schema row 9: triple
       glazing, installed 2002-2022 in EAW;
       `_G_PERPENDICULAR_BY_GLAZING_TYPE[9] = 0.68`,
       `_G_LIGHT_BY_GLAZING_CODE[9] = 0.70`)
  "Single glazing": 1                (alias of bare "Single"; cascade
       g_L = 0.90, g⊥ = 0.85 per SAP 10.2 Table 6b)
  "Double glazing, known data": 3    (Elmhurst lodgement of RdSAP-21
       schema row 7 "double, known data"; manufacturer U-value and
       g-value lodged via WindowTransmissionDetails override the
       cascade's defaults — grouped under code 3 with other unknown-
       date DG variants for cascade-equivalence on g_L/g⊥)

Per [[feedback-e2e-validation-philosophy]] +
[[feedback-zero-error-strict]]: pin tolerances are abs=1e-4 against
U985-0001-000565.pdf
Block 1 line refs (pinned: SAP int + SAP continuous + ECF + total
fuel cost + CO2 + space heating + main 1 fuel + secondary fuel +
hot water + lighting + pumps/fans).

Outcome: 1/11 pin green (`secondary_heating_fuel_kwh_per_yr = 0`);
10 pins are now named calculator-gap residuals to fix in subsequent
slices:

  main_heating_fuel_kwh_per_yr  +27,665.01 kWh/yr  (heat-pump SAP code
      224 + gas combi via WHC 914 "from second main"; cascade probably
      runs ASHP for DHW instead of routing through gas combi)
  hot_water_kwh_per_yr             +164.88 kWh/yr  (FGHRS / solar HW /
      Table 3a no-keep-hot for the gas combi DHW path)
  lighting_kwh_per_yr              -236.19 kWh/yr  (RdSAP §12-1 bulb-
      count cascade; 27 total / 7 low-energy / 20 incandescent lodged)
  pumps_fans_kwh_per_yr            -122.52 kWh/yr  (cascade defaults
      to 130; expected 252.52 = MEV PCDF 500755 + flue + solar pump)

Cohort regression check: 472 pass + 10 expected 000565 failures.
Pyright net-zero (32 errors before, 32 after).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
730050d72a Slice S0380.51: strict-raise UnmappedApiCode on API integer enums
Mirrors the Elmhurst `UnmappedElmhurstLabel` coverage gate on the
GOV.UK API path. The same failure mode (silently routing an unknown
enum to a default / None hides cascade gaps until a downstream SAP-
delta investigation surfaces them) was hitting the API mapper:
existing helpers like `_api_floor_construction_str` returned None on
unrecognised codes per the comment "Only the values observed across
the 10 golden fixtures (1, 2) are mapped; unrecognised codes fall
through to None."

Adds `UnmappedApiCode(ValueError)` at the API mapper boundary and
threads it through five strict helpers:

- `_api_party_wall_construction_int`     (RdSAP10 Table 15)
- `_api_floor_construction_str`          (Slice 88 floor signal)
- `_api_floor_type_str`                  (RdSAP10 §5 rule (12))
- `_api_roof_construction_str`           (Slice 89 cos(30°) factor)
- `_api_sheltered_sides`                 (SAP10.2 §S5)

Each helper distinguishes:
- "lodging absent" → return None (unchanged behaviour)
- "lodging present and mapped" → translate (unchanged behaviour)
- "lodging present but unrecognised" → raise UnmappedApiCode (NEW)
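
A sketch of the three-way contract, using the `floor_heat_loss` codes
this commit message describes. The exception constructor, helper name,
and the choice to map code 3 to the same string as code 2 are
assumptions for illustration:

```python
from typing import Optional


class UnmappedApiCode(ValueError):
    def __init__(self, field: str, value: int) -> None:
        super().__init__(f"unmapped API code {field}={value!r}")
        self.field = field
        self.value = value


_FLOOR_HEAT_LOSS = {
    2: "To unheated space",
    3: "To unheated space",  # assumption: same cascade signal as code 2
    6: None,  # explicit None: "decided no string", not "unknown"
}


def floor_heat_loss_str(code: Optional[int]) -> Optional[str]:
    if code is None:
        return None  # lodging absent
    # Membership test rather than .get(): a mapped None must not be
    # confused with a missing key.
    if code not in _FLOOR_HEAT_LOSS:
        raise UnmappedApiCode("floor_heat_loss", code)
    return _FLOOR_HEAT_LOSS[code]
```

Using `in` instead of `.get()` is what lets an explicit `None` entry
coexist with the strict raise.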

Three coverage gaps surfaced immediately at strict-run, all fixed
in the same slice with the worksheet-backed lodged-floor descriptions:

1. `floor_heat_loss=2` — cert 7536 Main lodges this (floors[]
   description "To unheated space, insulated"); also lodged on cert
   2031 / etc. Added mapping → "To unheated space".
2. `floor_heat_loss=3` — cert 7536 Ext2 lodges this with the same
   floors[] description as Main code 2 — same cascade signal.
3. `floor_heat_loss=6` — cert 9501 + cert 9390 (top-floor flats)
   lodge this with floors[] description "(another dwelling below)".
   The cascade routes party-floor handling via property_type=Flat +
   cert.floors[] description independently of this string, so the
   explicit None entry preserves the cascade match (cert 9501 stays
   at exact 1e-4 SAP vs worksheet 68.5252) while distinguishing
   "decided no string" from "unknown".

Six new tests document the contract:
- Five unit tests inject an out-of-range integer (99) into a real
  cohort cert JSON and assert UnmappedApiCode raises with the right
  `field` and `value`.
- One coverage forcing function
  (`test_all_golden_fixtures_extract_via_api_without_unmapped_code_raise`)
  loops every JSON under `fixtures/golden/` through
  `from_api_response` and asserts no raise; future fixtures with
  unmapped enums fail this test until a dict entry is added.

763 → 769 pass + 0 fail (5 unit + 1 cohort-coverage test added).
Pyright net-zero (32 → 32 baseline preserved).

The pattern is ready to extend to other silently-falling-through
helpers — e.g., `_api_glazing_transmission` (codes 4-12, 15+ noted
in the existing comment as "not yet mapped — incremental coverage
as new fixtures surface them"), `_api_cascade_glazing_type` (pass-
through is intentional, so probably leave alone). Each addition
is its own slice.
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
2805e13d4d Slice S0380.48: surface real-API pv_batteries[].battery_capacity (5 kWh)
The 7-cert ASHP+battery PE cluster was overshooting by +2.7..+8.1 kWh/m²
after the PE β-split landed in S0380.45. The handover hypothesised an
E_PV magnitude bug ("cascade thinks 2570 kWh/yr vs worksheet 831"). The
worksheet PDF for cert 0380 (dr87-0001-000899.pdf line 233) was
verified to show **-2563.3692** kWh/yr — matching our cascade. The
real bug was different: the **5-kWh battery wasn't reaching the
cascade**, so β-coefficients used the no-battery branch (C1=1.61,
β≈0.36) instead of the 5-kWh branch (C1=1.12, β≈0.75).

Per SAP 10.2 Appendix M1 §3c-d (p.94): "C_bat is the usable capacity
of the battery in kWh, limited to a maximum value of 15 kWh. C_bat=0
if no battery present." Cert 0380 lodges `pv_battery_count: 1` and
`pv_batteries: [{"battery_capacity": 5}]` — but the schema's
`PvBatteries` dataclass had only `pv_battery: Optional[PvBattery]`,
matching the older synthetic fixture shape (nested
`{"pv_battery": {"battery_capacity": 5}}`). The real-API payload's
flat `battery_capacity: 5` was silently dropped during `from_dict`.

Two surgical changes:
- `datatypes/epc/schema/rdsap_schema_21_0_1.py`: add
  `battery_capacity: Optional[float] = None` as a sibling to
  `pv_battery` on `PvBatteries`. Synthetic-shape certs continue to
  populate the nested form; real-API certs now populate the flat form.
- `datatypes/epc/domain/mapper.py:_first_pv_battery`: prefer nested
  when present, fall back to the flat lifted field. Domain still
  exposes a single uniform `PvBatteries(pv_battery=PvBattery(...))`
  shape downstream.
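
A sketch of the nested-first, flat-fallback selection. The dataclasses
are slim stand-ins and the function signature is assumed; the real
`_first_pv_battery` may differ:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PvBattery:
    battery_capacity: float


@dataclass
class PvBatteries:
    pv_battery: Optional[PvBattery] = None    # nested synthetic shape
    battery_capacity: Optional[float] = None  # flat real-API shape


def first_pv_battery(entries: list[PvBatteries]) -> Optional[PvBattery]:
    """Return the first battery in one uniform shape, or None."""
    if not entries:
        return None
    first = entries[0]
    if first.pv_battery is not None:
        return first.pv_battery  # nested form wins when present
    if first.battery_capacity is not None:
        return PvBattery(battery_capacity=first.battery_capacity)
    return None
```

Downstream code then only ever sees a `PvBattery`, regardless of which
payload shape the cert lodged.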

Cohort impact (PE residual kWh/m² vs worksheet):

| Cert | Pre-S0380.48 | Post-S0380.48 |
|---|---:|---:|
| 0350 | +2.73 | -3.58 |
| 0380 | +8.09 | -4.01 |
| 2225 | +4.48 | -4.50 |
| 2636 | +3.42 | -4.14 |
| 3800 | +3.58 | -4.01 |
| 9285 | +3.20 | -3.46 |
| 9418 | +4.67 | -3.76 |

The cluster residual moved from +2.7..+8.1 to -3.5..-4.5: the cascade
now over-credits PV by ~4 kWh/m² (vs previously under-crediting by
~5 kWh/m²). The residual flipped sign because cascade β=0.75-0.81
slightly exceeds worksheet β=0.74 (read from page-3 line 233a/233b
ratio 1903.39/2563.37 = 0.7426). The remaining ~4 kWh/m² under-shoot
traces to two structural factors deferred until a fresh closure
slice ships:

1. The synthetic-default `pv_export_primary_factor = 0.501` is the
   annual Table 12 code-60 value. The worksheet uses the effective
   monthly Table 12e factor weighted by E_PV,ex,m (cert 0380: 0.4268
   = -0.074 differential). The cascade's
   `_effective_monthly_pe_factor` already computes the same weighting
   for PV, but the calculator's PV PE credit reads
   `inputs.other_primary_factor` (=1.501) and
   `inputs.pv_export_primary_factor` (=0.501) directly, bypassing the
   per-end-use effective-monthly cascade.
2. Cascade β slightly higher than worksheet (0.751 vs 0.7426 on
   cert 0380) — likely a monthly-distribution detail in D_PV.

SAP scores remain exact across the cohort (residual +0 every cert).
CO2 residuals all <0.11 t/yr (well within the 0.001-tolerance pin
range after re-pin). 9501 (PV no battery) preserved at +0.255 PE /
-0.047 CO2 — no regression. Re-pins all 7 golden fixtures in the
same slice per [[feedback-commit-per-slice]].

Pyright net-zero on touched files (32 errors before, 32 after).
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
276e435e6c Slice S0380.43: SAP 631 open-fire → House coal spec fuel — closes cert 2102
Cert 2102 lodges `secondary_heating_type=631` ("Open fire in grate"
per SAP 10.2 Appendix M Table 4a, BS EN 13229:2001 inset-appliance
class — solid fuel) but `secondary_fuel_type=33` (electricity, Table 32
off-peak 7hr) — physically incompatible (an open fire grate doesn't
run on electricity). The Elmhurst Summary path independently resolves
to Coal (Table 32 code 11) via the §15 "Secondary Fuel: Coal" lodgement
(see `test_summary_2102_secondary_heating_routes_house_coal_for_open_fire`).

API mapper now applies the same spec-derived default via the new
`_api_secondary_fuel_type` helper:

  - When `secondary_heating_type` is in the
    `_API_SECONDARY_HEATING_SPEC_FUEL` dispatch (currently {631: 11}),
    AND the lodged `secondary_fuel_type` is electric (codes 30-40),
    substitute the spec default (House coal).
  - Legitimate non-default solid-fuel lodgement (e.g. SAP 631 with
    lodged fuel_type=15 Wood logs) passes through unchanged.

The override is keyed on the heating-type → spec-fuel dispatch dict
(extend as new fixtures surface analogous inconsistencies), not a
blanket per-code rewrite — keeps the lodged data trusted by default
while spec-correcting the narrow class of inconsistent lodgements.
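
A sketch of the narrow override described above. The function name and
signature are illustrative; the dispatch entry {631: 11} and the
electric code range 30-40 are taken from this message:

```python
from typing import Optional

# heating type -> spec-default fuel (631 open fire -> 11 House coal)
_SECONDARY_HEATING_SPEC_FUEL = {631: 11}
_ELECTRIC_FUEL_CODES = range(30, 41)  # codes 30-40 inclusive


def secondary_fuel_type(heating_type: Optional[int], lodged_fuel: Optional[int]) -> Optional[int]:
    """Substitute the spec fuel only for an inconsistent electric lodgement."""
    spec_fuel = _SECONDARY_HEATING_SPEC_FUEL.get(heating_type)
    if spec_fuel is not None and lodged_fuel in _ELECTRIC_FUEL_CODES:
        return spec_fuel
    return lodged_fuel  # lodged data is trusted by default
```

A lodged solid fuel such as 15 (Wood logs) on a 631 appliance passes
through untouched because it is outside the electric range.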

Applied at all 6 API schema-version mapping sites in `from_api_response`
via replace_all (lines 637/767/922/1080/1278/1544). Worksheet target
for cert 2102: line (242) "Space heating - secondary 3585.24 × 3.6700
= 131.58" confirms 3.67 p/kWh = Table 32 fuel code 11 (House coal).

Test impact:
  - Cohort-2 cert 2102 API path: -6.30 → +4.9e-5 (<1e-4 ✓).
    Moves from `_COHORT_2_API_OPEN` to `_COHORT_2_API_CLOSED`.
  - `_COHORT_2_API_OPEN` is now empty — the residual-pin test
    `test_api_cohort_2_open_cert_residual_matches_current_pin` is
    deleted (cohort fully closed; re-add if future cert surfaces).
  - Cohort-2 API path: **38/38 < 1e-4** matching Summary path 38/38.
    Cross-mapper parity at the cascade is fully established for
    cohort-2 per [[feedback-cross-mapper-parity-via-cascade]].
  - Cohort-1 ASHP 9/9 unchanged.

Test suite: 750 pass + 0 fail. Pyright net-zero on touched files
(mapper.py 32/32 baseline; chain test 0/0).

Spec citations:
  - SAP 10.2 Appendix M Table 4a code 631 "Open fire in grate"
    (Category C, Room heaters, eff 37/32%, solid fuel via BS EN
    13229:2001 inset-appliance class — see spec p.156).
  - SAP 10.2 Table 32 code 11 "House coal" 3.67 p/kWh.
  - Cert 2102 worksheet line (242) reproduces 131.58 = 3585.24 × 3.67
    / 100, confirming house-coal pricing for the secondary cascade.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
d7cecf45f5 Slice S0380.41: GOV.UK RdSAP 21 glazing-type code 1 → DG pre-2002 cascade
Closes the cohort-2 API-path +0.42..+0.44 cluster (certs 0300/9380
closed to <1e-4; cert 1536 partially closed +0.4445 → +0.0015 — a
sub-2e-3 secondary tail remains for Slice S0380.42).

Root cause: per `datatypes/epc/domain/epc_codes.csv` the GOV.UK API
schema RdSAP-Schema-21.0.0 defines `glazed_type=1` as "double glazing
installed before 2002 in EAW, 2003 in SCT, 2006 NI". Three cohort-2
certs (0300/1536/9380) lodge this code with `glazing_gap=16+` and
description "Fully double glazed" — but the API mapper passed the
raw code straight through to SapWindow.glazing_type, and:

  1. `_api_glazing_transmission` had no (1, "16+") entry, so the
     U-value lookup returned None and the cascade defaulted to U=2.5
     instead of the spec-correct U=2.7 (RdSAP 10 Table 24 row 2,
     PVC/wooden frame, 16+ gap = 2.7).
  2. The cascade's `_G_LIGHT_BY_GLAZING_CODE` table is keyed on the
     SAP 10.2 Table 6b enum (the Elmhurst extractor produces this
     enum via `_ELMHURST_GLAZING_LABEL_TO_SAP10`), where code 1 means
     "single glazed" (g_L=0.90). Passing RdSAP 21 code 1 straight
     through gave the cascade the wrong g_L for the daylight factor
     calculation: 0.90 instead of the spec 0.80.

Both gaps closed in one slice because they're the same misinterpretation:

- `_API_GLAZING_TYPE_TO_TRANSMISSION` +
  `_API_GLAZING_TYPE_GAP_TO_TRANSMISSION` now alias code 1 as a schema
  sibling of code 3; both
  resolve to RdSAP 10 Table 24 row 2 ("DG pre-2002 / unknown install
  date"). Per-gap entries cover the full 6mm=3.1 / 12mm=2.8 / 16+=2.7
  row; type-only fallback uses the 12mm default U=2.8.

- New `_API_TO_SAP10_CASCADE_GLAZING_CODE = {1: 2}` remap is applied
  in `_api_sap_window` AFTER the U-value lookup, so
  SapWindow.glazing_type carries the SAP 10.2 cascade enum (code 2 =
  DG pre-2002 air-filled, g_L=0.80) while the U lookup stays keyed on
  the raw GOV.UK
  API code. The cohort-1 codes 2/3/13/14 already coincide with the
  cascade table's intended SAP 10.2 g_L values, so no remap entry
  required for them; only divergent codes get a remap.
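
The ordering described above (U-value lookup on the raw API code first, cascade remap second) can be sketched as follows. This is a minimal illustration, not the mapper's code: `resolve_glazing` and the `_SKETCH_*` tables are hypothetical names, the gap key strings "6" and "12" are assumed, and only the code-1 values quoted in this message are filled in.

```python
from typing import Optional, Tuple

# U-values for raw GOV.UK code 1 by gap, as quoted above
# (RdSAP 10 Table 24 row 2). Key strings other than "16+" are assumed.
_SKETCH_U_BY_CODE_AND_GAP = {(1, "6"): 3.1, (1, "12"): 2.8, (1, "16+"): 2.7}
_SKETCH_U_BY_CODE = {1: 2.8}  # type-only fallback: 12 mm default
# Only codes whose meaning diverges between the two enums get an entry.
_SKETCH_API_TO_CASCADE_CODE = {1: 2}


def resolve_glazing(api_code: int, gap: Optional[str]) -> Tuple[Optional[float], int]:
    """Return (U-value, cascade glazing code) for a raw API glazing code."""
    # Step 1: the U lookup stays keyed on the raw API code.
    u_value = _SKETCH_U_BY_CODE_AND_GAP.get(
        (api_code, gap), _SKETCH_U_BY_CODE.get(api_code)
    )
    # Step 2: remap AFTER the lookup so the cascade sees its own enum.
    cascade_code = _SKETCH_API_TO_CASCADE_CODE.get(api_code, api_code)
    return u_value, cascade_code
```

Remapping before the lookup would make code 1 hit the code-2 rows of the U table, which is why the order matters in this sketch.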

Test impact:
  - Cohort-2 API path: 34/38 → 36/38 at 1e-4 (0300 +4.8e-5; 9380 -5e-6
    both move from _COHORT_2_API_OPEN to _COHORT_2_API_CLOSED).
  - Cert 1536 pin updated from 66.337334 to 65.894324; ws Δ now +0.0015
    (was +0.4445) — same root-cause fix dominated, residual tail is
    distinct-cause work for the next slice.
  - Cert 2102 unchanged (-6.30 residual, secondary-heating routing gap).
  - Cohort-1 (9 ASHP certs) unaffected: 9/9 still < 1e-4 on both paths.

Test suite: 750 pass + 0 fail. Pyright net-zero per touched file.

Spec citations:
  - RdSAP-Schema-21.0.0 glazed_type=1 → datatypes/epc/domain/epc_codes.csv
  - RdSAP 10 Specification §8.2 Table 24 (p.49) row 2 "Double glazed:
    Installed England/Wales before 2002 / Scotland before 2003 /
    N. Ireland before 2006" — U=2.7 (PVC/wooden, 16+ gap).
  - SAP 10.2 Table 6b: DG air-filled g_L=0.80 (vs single 0.90).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:47 +00:00
Khalim Conn-Kowlessar
9fbbad9068 Slice S0380.26: RdSAP10 §5.8 dry-lining adjustment on alt walls — closes cert 7700 -0.44 → +5e-5
Per RdSAP10 §5.8 final note + Table 14 page 41:

  "For drylining including laths and plaster use Rinsulation = 0.17 m²K/W."

Applied as an extra series resistance on top of the base U-value of an
otherwise-uninsulated wall:

  U_adjusted = 1 / (1/U_base + 0.17)  — rounded to 2 d.p. half-up.

Closed form for the cohort fixture (cavity-as-built age C, U_base=1.5):

  1 / (1/1.5 + 0.17) = 1.19522... → 1.20 ✓ matches worksheet
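
The same arithmetic as a small runnable check. This is a sketch only: `dry_lined_u` is a hypothetical name, not the `u_wall` kwarg path itself, and `Decimal` is used here because Python's built-in `round` is not half-up.

```python
from decimal import Decimal, ROUND_HALF_UP

R_DRY_LINING = 0.17  # m²K/W, RdSAP 10 §5.8 dry-lining resistance


def dry_lined_u(u_base: float) -> float:
    """Add the dry-lining resistance in series, then round half-up to 2 d.p."""
    u_adjusted = 1.0 / (1.0 / u_base + R_DRY_LINING)
    rounded = Decimal(str(u_adjusted)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return float(rounded)


print(dry_lined_u(1.5))  # cavity as-built, age band C -> 1.2
```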

Cert 7700-3362-0922-7022-3563 (Summary_000905.pdf / dr87-0001-000905.pdf)
is an End-Terrace house age C lodging:
  - Main wall: CavityWallDensePlasterDenseBlock, Filled Cavity, U=0.70
  - Alt wall 1: 14.44 m² Cavity As-Built, Dry-lining: Yes (worksheet
    `CavityWallPlasterOnDabsDenseBlock`, U=1.20)

Pre-slice the Elmhurst alt-wall mapper hard-coded `wall_dry_lined="N"`
and the cascade ignored the field everywhere — alt-wall U routed to the
cavity-as-built default (1.50), giving fabric (33) 148.72 W/K vs
worksheet 144.38 (Δ +4.33 W/K = ~+0.44 SAP). Worksheet "SAP value" line
lodges unrounded SAP 63.4425.

Implementation:
  1. `AlternativeWall.dry_lined: bool = False` on the Elmhurst surveys
     dataclass.
  2. Elmhurst extractor reads "Alternative Wall N Dry-lining: Yes/No"
     into the new field.
  3. `_map_elmhurst_alternative_wall` propagates `wall_dry_lined="Y"`
     instead of the hard-coded "N".
  4. `u_wall` gains a `dry_lined: bool = False` kwarg and a single
     §5.8 adjustment site at the as-built bucket (bucket=0). Insulated
     buckets already absorb the dry-lining R via Table 14.
  5. `_alt_wall_w_per_k` passes `dry_lined=alt_wall.wall_dry_lined == "Y"`.

Scope is the alt-wall path only — main BPs in the corpus all lodge
`wall_dry_lined="N"` (or the Summary PDF omits the field for the main
wall), so the main-wall call site is untouched. Conservative regression
posture per the user's strict cohort-pin convention.

Cohort-2 outcome (38 certs, Summary path):
  exact (<1e-4): 22 → **23**  (+1: cert 7700 -0.44 → +4.87e-05)
  0.07..0.5:      1 → **0**   (-1: cert 7700 closes out)
  0.5..1:         1 → 1       (cert 9796 unchanged — MIT precision floor)
  RAISES:         0 → 0

Cohort-1 ASHP cohort untouched: all certs lodge wall_dry_lined="N", so
the alt-wall call site short-circuits to the original cascade. Verified
no regressions across the 22 previously-exact cohort-2 certs either.

Pyright net-zero on all 8 touched files (183 → 183).

Tests: 704 → 708 pass (+4 new: u_wall §5.8 adjustment fires
correctly; cavity-as-built unchanged without flag; insulated bucket
unaffected by flag; heat_transmission alt-wall delta = 14.44 × 0.30
W/K; cert 7700 full chain hits worksheet 63.4425 at < 1e-4),
10 expected fails unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
5402dd17e1 Slice S0380.24: SAP code 631 → house coal secondary fuel — closes cert 2102 -15.81 → +5e-5
Per SAP 10.2 spec page 165 Table 4a Category 10 (Room heaters), the
600-range secondary-heating SAP codes split by fuel:
  601-613: Gas (mains gas / LPG / biogas) — column A is mains gas.
  621-625: Liquid fuel room heaters (oil / bioethanol).
  631-634: Solid fuel room heaters (open fire, closed room heater
           with/without boiler) — house coal is the modal default.
  691-699: Electric room heaters.

`_elmhurst_secondary_fuel_from_sap_code` previously mapped the entire
601-630 range to mains gas (API code 26). Two bugs:
  1. Codes 621-625 are oil heaters, not gas. (Cohort hasn't surfaced
     an oil-secondary cert yet — deferred until a fixture exercises.)
  2. Codes 631-634 are solid fuel, not gas, and weren't in the range
     at all. Cascade fell through to the secondary-fuel-None default
     (standard electricity at 13.19 p/kWh), over-charging cert 2102's
     "Open fire in grate" secondary by ~£340/yr.

Narrow the gas range to 601-613 (per the spec) and add 631-634 → API
fuel code 11 (Coal in `_ELMHURST_MAIN_FUEL_TO_SAP10`) → Table 32
direct lookup returns 3.67 p/kWh (house coal), matching worksheet
(242) "Space heating - secondary 3585.2401 × 3.6700 = 131.58".
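
A minimal sketch of the narrowed routing, assuming a plain range check. The function and constant names are hypothetical stand-ins for `_elmhurst_secondary_fuel_from_sap_code`; the codes, ranges and prices are the ones quoted in this message.

```python
from typing import Optional

API_FUEL_MAINS_GAS = 26
API_FUEL_HOUSE_COAL = 11


def secondary_fuel_from_sap_code(sap_code: int) -> Optional[int]:
    """Map a Table 4a room-heater SAP code to an API fuel code."""
    if 601 <= sap_code <= 613:
        return API_FUEL_MAINS_GAS
    if 631 <= sap_code <= 634:
        return API_FUEL_HOUSE_COAL
    # 621-625 (liquid fuel) stay unmapped in this sketch, mirroring the
    # deferral above; None lets the cascade apply its own default.
    return None


# Worksheet line (242): kWh × p/kWh, converted from pence to GBP.
cost_gbp = round(3585.2401 * 3.67 / 100, 2)
```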

Cohort-2 outcome (38 certs, Summary path):
  exact (<1e-4): 20 → **21**  (+1: cert 2102 -15.81 → +5e-5)
  ±5+:           1 → **0**    (last big-gap closed)

Cert 2102 verified end-to-end:
  - secondary_heating_type=631 → secondary_fuel_type=11 → 3.67 p/kWh
  - Cascade SAP 63.8732 vs worksheet 63.8732 (delta +5e-5)
  - Cascade total fuel cost £787.03 = worksheet £787.03 exactly

Pyright net-zero on both touched files (mapper.py 32→32, test 0→0).

Tests: 703 → 704 pass (+1 new SAP-code-631 secondary-fuel routing
test), 10 expected fails unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
9a091234cf Slice S0380.23: RdSAP §11.1 b) PV %-of-roof-area synthesis — closes cert 6835 -13.37 → +0.72
RdSAP 10 specification page 60 §11.1 b) (Photovoltaics): "If the kWp
(or DNC) is not known use the following: PV area is roof area for
heat loss (before amendment for any room-in-roof), times percent of
roof area covered by PVs, and if pitched roof divided by cos(35°).
If there is an extension, the roof area is adjusted by the cosine
factor only for those parts having a pitched roof. kWp is 0.12 ×
PV area. If not provided in the RdSAP data set then facing South,
pitch 30°, modest overshading."

Wire-through:
  1. `Renewables.pv_percent_roof_area: Optional[int]` — new field on
     the Elmhurst site-notes dataclass.
  2. Elmhurst extractor `_extract_renewables` parses Summary §19.0
     row "Proportion of roof area" (cert 6835: "40").
  3. Elmhurst mapper `from_elmhurst_site_notes` surfaces it through
     `epc.sap_energy_source.photovoltaic_supply.none_or_no_details
     .percent_roof_area` — mirrors the API mapper's lodgement shape.
  4. `cert_to_inputs._synthesize_pv_arrays_from_percent_roof_area`
     synthesizes a single PV array via the spec formula when
     `photovoltaic_arrays` is empty AND a `percent_roof_area > 0`
     lodgement is present. Fires inside
     `_pv_generation_kwh_per_yr`, so both rating + demand cascades
     pick it up.
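
The §11.1 b) formula itself is short enough to sketch directly. `synthesized_kwp` is a hypothetical helper, not the body of `_synthesize_pv_arrays_from_percent_roof_area`; the inputs are cert 6835's roof area and lodged percentage as quoted in this message.

```python
import math


def synthesized_kwp(
    roof_area_m2: float, percent_roof_area: float, pitched: bool = True
) -> float:
    """kWp per RdSAP 10 §11.1 b): 0.12 × PV area, cos(35°) for pitched roofs."""
    pv_area = roof_area_m2 * percent_roof_area / 100.0
    if pitched:
        pv_area /= math.cos(math.radians(35.0))
    return 0.12 * pv_area


kwp = synthesized_kwp(36.9, 40)  # cert 6835: 36.9 m² roof, 40 % covered
```

Rounded to four places this gives 2.1622, against the worksheet's "Cells Peak = 2.16".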

Cohort-2 outcome (38 certs, Summary path):
  exact (<1e-4): 20 → 20
  ±0.07..0.5:   1 → 1
  ±0.5..1:      1 → **2**  (cert 6835 closes -13.37 → +0.72)
  ±1..5:        1 → 1
  ±5+:          2 → **1**  (-1: cert 6835 moves out of big-gap band)

Cert 6835 verified end-to-end:
  - kWp = 0.12 × 36.9 × 0.40 / cos(35°) = 2.1622
    (worksheet "Cells Peak = 2.16, Orientation = South, Elevation =
    30°, Overshading = Modest")
  - Cascade PV generation = 1493.88 kWh/yr vs worksheet 1492.33
    (<0.1% delta — kWp-rounding artefact).
  - Cascade SAP 80.92 vs worksheet 80.20 (+0.72, in the ±0.5..1 band).

The residual +0.72 likely traces to the PV-cost cascade's
used-in-dwelling / exported split rather than the synthesis — the
kWh figure is within rounding of the worksheet.

Pyright per-file: net-zero
  - cert_to_inputs.py 35 → 35
  - test_cert_to_inputs.py 13 → 13
  - mapper.py 32 → 32
  - elmhurst_site_notes.py 0 → 0
  - elmhurst_extractor.py 0 → 0

Tests: 702 → 703 pass (+1 new RdSAP §11.1 b synthesis test), 10
expected fails unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
15b3df1778 Slice S0380.19: count Elmhurst shower outlets by type (no more hardcoded 1)
Surfaces the lodged shower multiplicity from the Elmhurst Summary §16
on the EPC. Previously `_map_elmhurst_sap_heating` hardcoded:

  electric_shower_count = 1 if has_electric_shower else None
  mixer_shower_count    = 0 if has_electric_shower else None

losing the count for any cert with ≥ 2 outlets. Cert
7800-1501-0922-7127-3563 lodges TWO instantaneous electric showers
("Shower 01" + "Shower 11") but the mapper produced
`electric_shower_count=1`. After this slice:

  electric_shower_count = Σ(1 for s in showers if s.outlet_type
                              == "Electric shower")
  mixer_shower_count    = Σ(1 for s in showers if s.outlet_type
                              != "Electric shower")

**Cascade SAP effect:** None on cert 7800. Appendix J's eq J16
(`N_ES,per_outlet = N_shower / N_outlets`) and eq J18 (Σ_j E_ES,j)
are symmetric in N_electric_showers when there are no mixer outlets,
so the lodged (64a) kWh and (247a) cost are unchanged. The fix is
correctness-by-construction, not a delta-closer for the negative-band
certs (their +0.69 GBP total-cost gap traces to the gas hot-water
kWh path — separate slice).

**Hand-built fixture updates (5):** the cohort-1 hand-builts at
`domain/sap10_calculator/worksheet/tests/_elmhurst_worksheet_*.py`
previously omitted `electric_shower_count` / `mixer_shower_count`
(implicitly None), which matched the mapper's pre-slice None
sentinel. Updated each to the lodged counts the mapper now surfaces:
  000474: 1 mixer  → (0, 1)
  000477: 1 mixer  → (0, 1)
  000480: 1 mixer  → (0, 1)
  000490: 1 mixer  → (0, 1)
  000516: 1 mixer  → (0, 1)
000487 (already at (1, 0) for an electric-shower lodging) unchanged.

Tests:
- `test_summary_7800_two_electric_showers_count_as_two_not_one` —
  pins the multi-shower mapping for cert 7800 (Summary_000890.pdf).
- 5 hand-built field-parity tests
  (`test_from_elmhurst_site_notes_matches_hand_built_*`) now pass at
  the new integer counts instead of None.

Pyright net-zero per file:
- datatypes/epc/domain/mapper.py: 32 (baseline 32)
- backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression baseline: 699 pass + 10 fail (= prior 698 + 10 + 1 new).

Spec refs:
- SAP 10.2 Appendix J §1a — outlet counting drives `N_outlets` used
  in eq J6/J7 (mixer shower water draw) and eq J16/J17/J18 (electric
  shower energy).
- Cert 7800-1501-0922-7127-3563 Summary §16 "Showers" lodgement.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
4cfec00f22 Slice S0380.17: map Elmhurst §11 glazing-type labels to SAP10 codes
Closes a systematic +0.02..+0.07 SAP over-prediction on every triple-
glazed cert in cohort 2 (13 of 38) and removes a silent-default
failure mode flagged via cert 3336-2825-9400-0512-8292 (+0.0674 Δ).

Root cause: `_map_elmhurst_window` (datatypes/epc/domain/mapper.py)
was passing the Elmhurst-lodged glazing-type string verbatim into
`SapWindow.glazing_type` (declared `Union[int, str]`). The §5 (66)..
(67) daylight-factor cascade at
`domain/sap10_calculator/worksheet/internal_gains.py:512` requires
`isinstance(w.glazing_type, int)` to look up Table 6b col light g_L —
string lodgings silently fell through to the `_G_LIGHT_DEFAULT = 0.80`
(double-glazed) branch. Cert 3336 (Triple glazed, worksheet "Window,
Triple glazed") got g_L = 0.80 instead of the correct 0.70, lowering
C_daylight from 1.072 to 1.041 → lighting kWh under-predicted by
−4.53 kWh/yr → total fuel cost under by −1.17 GBP → ECF Δ −0.0049 →
SAP continuous over by +0.0674.

Fix: `_ELMHURST_GLAZING_LABEL_TO_SAP10` dict +
`_elmhurst_glazing_type_code` helper translate the Elmhurst Summary §11
lodged strings to the SAP 10.2 Table U2 integer codes the cascade keys
on:

  "Single"                                          → 1
  "Double pre 2002"                                 → 2
  "Double between 2002 and 2021"                    → 3
  "Double with unknown install date"                → 3
  "Double with unknown 16 mm or install date more"  → 3
  "Double post or during 2022"                      → 5
  "Triple post or during 2022"                      → 6
  "Triple post or during"                           → 6  (year-trunc.)
  "Secondary"                                       → 7

Two regex passes strip the layout noise the extractor sometimes folds
into the glazing-type token: a `(?:Part )?value value Proofed Shutters`
prefix (from adjacent column headers) and a ` Summary Information` /
` Alternative wall…` suffix. Verified against the union of cohort-1
(7 certs) + cohort-2 (38 certs) + test-fixture (9 PDFs) glazing
labels: 18 distinct surface forms, all closed by the dict + noise
patterns; one window in cert 2636's Summary_000898.pdf lodged the
year-truncated "Triple post or during" — added as an alias for code 6
per worksheet "Triple glazed" lodging.
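
The two noise-stripping passes might look roughly like the sketch below. The exact patterns in the mapper are not reproduced in this message, so both regexes and the name `strip_glazing_label_noise` are assumptions built from the prefix and suffix described above.

```python
import re

# Assumed patterns: column-header residue before the label, and
# section-title residue after it.
_PREFIX_NOISE = re.compile(r"^(?:Part )?value value Proofed Shutters\s*")
_SUFFIX_NOISE = re.compile(r"\s+(?:Summary Information|Alternative wall.*)$")


def strip_glazing_label_noise(raw: str) -> str:
    """Remove layout residue so the label can be looked up in the dict."""
    cleaned = _PREFIX_NOISE.sub("", raw)
    cleaned = _SUFFIX_NOISE.sub("", cleaned)
    return cleaned.strip()
```

Stripping before the dict lookup keeps the label table limited to genuine surface forms rather than every layout variant the extractor can produce.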

Strict-enum gate: `_elmhurst_glazing_type_code` raises
`UnmappedElmhurstLabel("glazing_type", label)` (Slice S0380.15
pattern, extended to the new helper) when the label is None or not
in the dict — surfaces mapper-coverage gaps at extraction time rather
than masking them as a SAP precision floor.

Cohort-2 Summary-path delta progression (38 certs):
  bucket          before slice 2    after slice 2
  exact (<1e-4)   11                11
  <0.005          0                 5     ← 9421 +0.0012, 2536 +0.0016, 9370 +0.0017, 0100 +0.0028, 2800 +0.0044
  0.005-0.07      15                10    ← all triple-glazed
  0.07-0.5        5                 5
  0.5-1           4                 4
  1-5             1                 1
  5+              2                 2
  RAISES          0                 0

3336 (user's flag) closes from +0.0674 → +0.0400 — the residual is
the remaining systematic offset the next slice will investigate.

Tests added (3):
- `test_summary_3336_triple_glazed_windows_route_to_code_6` — pins
  the mapper output for the user's flagged cert.
- `test_summary_000474_double_glazed_windows_route_to_code_3` —
  exercises the DG branch + the year-unknown alias mapping.
- `test_summary_mapper_raises_on_unmapped_glazing_type_label` —
  strict-enum coverage gate via mutated site notes.

Tests updated (1):
- `test_first_window_glazing_type` (test_elmhurst_end_to_end.py):
  asserts int code 5 (DG low-E argon — "Double post or during 2022")
  not the string verbatim. The string-passthrough behaviour was
  always a latent bug; this test was the only direct pin on it.

Pyright net-zero per file:
  - datatypes/epc/domain/mapper.py: 32 (baseline 32)
  - backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0
  - backend/documents_parser/tests/test_elmhurst_end_to_end.py: 0

Regression baseline: 694 pass + 10 fail (= prior 691 + 10 + 3 new).
Triple-glazed original-cohort certs are now closer to worksheet too;
the ±0.07 chain tests on the original cohort still hold, and a future
slice tightens them once the next-largest residual is closed.

Spec refs:
- SAP 10.2 Table U2 — glazing-type integer enum.
- SAP 10.2 Table 6b col light — light-transmission g_L by glazing
  type (triple 0.70, double-glazed variants 0.80, single 0.90).
- RdSAP 10 §11 Windows — Summary lodging of glazing type as a
  type+install-date phrase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
69668ec634 Slice S0380.16: add 'Normal' → cylinder_size=2 (110 L) for cohort 2
Unblocks two 38-cert-cohort certs that previously raised
`UnmappedElmhurstLabel("cylinder_size", 'Normal')` at extraction:
  cert 2536-2525-0600-0788-2292  ws SAP=79.7264
  cert 9421-3045-3205-1646-6200  ws SAP=87.4495

Both Summary §15.1 lodgements read "Cylinder Size: Normal"; both dr87
worksheets lodge line ref (47) "Store volume = 110.0000" L (extracted
from `Hot Water Cylinder → Cylinder Volume 110.00`). RdSAP 10 §10.5
Table 28 documents the "Normal (90-130 litres)" descriptor whose
midpoint is 110 L — the canonical Elmhurst label string in
`datatypes/epc/surveys/elmhurst_site_notes.py` is "Normal (90-130
litres)", and the worksheet's exact 110 L matches the midpoint.

Two-line fix:
  +    "Normal": 2,           in `_ELMHURST_CYLINDER_SIZE_LABEL_TO_SAP10`
  +    2: 110.0,              in `_CYLINDER_SIZE_CODE_TO_LITRES`

The cascade enum 2 is consistent with the existing
`cert_to_inputs.py` docstring's documented (but not-yet-observed)
code 2 → Normal slot, alongside code 3 (Medium / 160 L) and code 4
(Large / 210 L) added in earlier slices.

The slice stays tight: two mapping unit tests pinning `cylinder_size == 2`
for both certs at extraction. Post-fix the first-attempt cascade
deltas vs worksheet are:
  cert 2536  Δ +0.0244   (was: RAISES)
  cert 9421  Δ +0.0296   (was: RAISES)

Both deltas now sit in the same systematic +0.02..+0.07 small-gap
band as ~12 other first-attempt certs in cohort 2. Adding a chain test
with a ±0.07 pin would just paper over a known systematic residual that
the user has explicitly asked to drive towards 1e-4, not toward ±0.07.
The following slice will investigate the shared systematic offset and
close cert 2536 / 9421 along with the rest of the +0.04 band on
the chain.

Pyright net-zero per file:
  - datatypes/epc/domain/mapper.py: 32 (baseline 32)
  - domain/sap10_calculator/rdsap/cert_to_inputs.py: 35 (baseline 35)
  - backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression baseline: 691 pass + 10 fail (= prior 689 + 10 + 2 new GREEN).

Spec refs:
- RdSAP 10 §10.5 Table 28 — "Cylinder Volume" Normal band 90-130 L,
  midpoint 110 L (also the canonical Elmhurst label suffix).
- Cert 2536 worksheet `dr87-0001-000889.pdf` line ref (47) = 110.0000.
- Cert 9421 worksheet `dr87-0001-000884.pdf` line ref (47) = 110.0000.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
a5665cfda5 Slice S0380.15: strict-enum raising on unmapped cylinder labels
Establishes the strict-enum pattern for Elmhurst label-to-cascade-enum
helpers: lodged-but-unrecognised labels raise `UnmappedElmhurstLabel`
instead of silently returning None and letting the cascade default to
a wrong-but-not-obviously-wrong value downstream.

Triggered by the user's observation following Slice S0380.14 ("In a
case like that, where the mapper maps to the wrong thing, is it
better to raise an exception?"). The cert 9418 "Large" cylinder miss
hid for an entire diagnostic cycle because
`_elmhurst_cylinder_size_code('Large', True)` silently returned None
→ cascade routed off the HW-with-cylinder path → 466 kWh/yr HW
under-count → Δ +2.60 SAP. Strict raising would have surfaced the
gap at the first cohort probe.

Scope-limited first pass — converts only the two cylinder helpers
(`_elmhurst_cylinder_size_code`, `_elmhurst_cylinder_insulation_code`)
to establish the pattern. Follow-up slices can extend to the other
label→enum helpers (wall_construction, wall_insulation, main_fuel,
pv_overshading, party_wall_construction, emitter_temperature,
flue_type, pump_age, …) where the source vocabulary is finite and we
control it.

Behavioural contract:
  - `(label = None)` → return None (lodging genuinely absent; cert
    has no cylinder, no §15.1 block, or the field is optional).
  - `(label in dict)` → return mapped code (existing behaviour).
  - `(label = "anything-else")` → raise UnmappedElmhurstLabel(field,
    value) with a message pointing the next reader at the corresponding
    mapper lookup dict.
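
The three-way contract above can be sketched as below. The names are hypothetical: `UnmappedLabelSketch` stands in for `UnmappedElmhurstLabel`, the one-argument `cylinder_size_code` simplifies the real helper's signature, and the dict holds only the "Medium" and "Large" entries mentioned elsewhere in this log.

```python
from typing import Optional


class UnmappedLabelSketch(ValueError):
    """Raised when a lodged label has no entry in the lookup dict."""

    def __init__(self, field: str, value: str) -> None:
        super().__init__(
            f"unmapped {field} label {value!r}; extend the mapper lookup dict"
        )
        self.field = field
        self.value = value


_SIZE_CODES = {"Medium": 3, "Large": 4}


def cylinder_size_code(label: Optional[str]) -> Optional[int]:
    if label is None:
        return None  # lodging genuinely absent
    try:
        return _SIZE_CODES[label]
    except KeyError:
        # Lodged but unrecognised: fail loudly instead of returning None.
        raise UnmappedLabelSketch("cylinder_size", label) from None
```

Keeping `None` as the only soft path means a missing lodgement and an unrecognised lodgement can no longer be confused downstream.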

Tests:
  - `test_summary_mapper_raises_on_unmapped_cylinder_size_label` —
    injects "Tiny" via dataclass mutation, asserts the public
    `from_elmhurst_site_notes` propagates the raise with the right
    field + value attributes.
  - `test_summary_mapper_raises_on_unmapped_cylinder_insulation_label`
    — mirror for the "Insulated" label dict.
  - `test_all_seven_ashp_cohort_certs_extract_without_unmapped_label_raise`
    — coverage forcing function: every cohort cert must extract
    cleanly. New cohort certs fall under the same gate. Any future
    Elmhurst-PDF variant with an unmapped cylinder label fails this
    test until the dict is extended.

Tests deliberately go through `from_elmhurst_site_notes` rather than
importing the private helpers (`reportPrivateUsage` clean).

Pyright net-zero across both edited files (mapper.py 32 baseline,
test 0).

Regression suite: 689 pass + 10 fail (= handover baseline 669 + 10 +
20 new GREEN tests across S0380.2..S0380.15).

Trade-off documented in the exception's docstring: strict raising
trades graceful degradation for early detection. For the cohort-
validation workflow (this branch's purpose) early detection is the
right default. Production extraction code that needs to soft-fail on
novel Elmhurst variants can either catch `UnmappedElmhurstLabel` at
the boundary or (in a future slice) the helpers can grow a
`strict: bool = True` parameter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
b6454d27e6 Slice S0380.14: add 'Large' → cylinder_size=4 (closes cert 9418 Daikin)
🎯 Closes the 7th and final ASHP cohort cert. Summary path now
mirrors the API path's complete cohort closure at the ±0.07 spec
precision floor.

Cert 9418-3062-8205-3566-7200 (Summary_000902.pdf): Daikin Altherma
EDLQ05CAV3 (PCDB 102421 — distinct from the rest of the cohort's
Mitsubishi 104568), end-terrace house, TWO 1.64 kWp PV arrays (N+S),
210 L cylinder, `heating_duration_code='24'` (continuous heating).
Worksheet "SAP value" lodges 84.6305.

Single-line fix to
`_ELMHURST_CYLINDER_SIZE_LABEL_TO_SAP10`:
  +    "Large": 4,
extending Slice S0380.6's "Medium" → 3 mapping to also cover the
"Large" cylinder. Without it `_elmhurst_cylinder_size_code('Large',
True)` returned None → cascade routed off the HP-with-cylinder HW
path → HW kWh under by 466 (Summary 1404 vs API 1871 vs
worksheet-implied 1871 via (64)/(216) divide).

Forcing function: cert 9418 first-attempt Summary SAP closes from
Δ +2.5973 (lookup miss) to Δ **+0.0296** — within ±0.07. The PV
multi-array Slice S0380.9 work was already sufficient for cert
9418's two-array PV layout (1.64 kWp N + 1.64 kWp S surfaced
correctly first-try).

ASHP cohort closure: 7/7 at spec floor:
  cert  Δ vs worksheet
  0380  +0.0594
  0350  +0.0458
  2225  +0.0441
  2636  +0.0323
  3800  +0.0442
  9285  +0.0502
  9418  +0.0296  ← this slice
  ───────────────
  mean  +0.0437

Identical disposition to the API path's cohort closure at slice
102f (commit c0086660). Both paths now sit at the documented
Appendix N3.6 PSR-interpolation precision floor.

Added two tests:
- `test_summary_9418_large_cylinder_routes_to_code_4` — unit-level
  pin on the new mapping.
- `test_summary_9418_full_chain_sap_within_spec_floor_of_worksheet`
  — chain test at ±0.07.

Pyright net-zero on both edited files (mapper.py 32 baseline).

Regression suite: 686 pass + 10 fail (= handover baseline 669 + 10
+ 17 new GREEN tests across Slices S0380.2..S0380.14).

Spec refs:
- SAP 10.2 Table 2a — cylinder volume factor (52) keyed on volume_l;
  210 L = 0.8x range factor (vs 160 L = 0.9086).
- BRE PCDB Table 362 — Daikin EDLQ05CAV3 (id 102421) is the cohort's
  second HP record alongside Mitsubishi PUZ-WM50VHA (id 104568).
- Cert 9418 worksheet `dr87-0001-000902.pdf` "Cylinder Volume 210.00".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
29cfdf6461 Slice S0380.11: resolve zero-shower lodgings to count=0 (closes cert 2225)
Cert 2225-3062-8205-2856-7204 lodges **zero showers** in its Summary
§1x Baths and Showers block. The Summary mapper at
`mapper.py:3536-3537` predicated the shower-count assignment on
`has_electric_shower`: for cohort certs with no electric shower the
counts collapsed to None — but cert 2225 has no showers at all, and
the cascade's None-handling defaults to 1 mixer shower (over-counting
HW kWh by ~66 against the worksheet (64)/(216) target).

Same disposition the API path received in slice 102f-prep.8 (commit
1d5183c6, "API mapper resolves shower_outlets=None → 0 mixers") —
extending it to the Summary mapper.

Scope-limited fix: zero-shower lodgings resolve to **explicit 0**
counts (not None) so the cascade does not default-assume a mixer.
Non-zero shower lodgings keep their existing convention (None for
non-electric → cascade derives count from `shower_outlets`) so the 5
boiler-cohort hand-built parity tests
(`test_from_elmhurst_site_notes_matches_hand_built_*`) stay GREEN.
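
A minimal sketch of the resulting three-way resolution, assuming the mapper knows the total shower count and whether an electric shower is lodged. `resolve_shower_counts` and its parameters are hypothetical names, not the code at `mapper.py:3536-3537`.

```python
from typing import Optional, Tuple


def resolve_shower_counts(
    total_showers: int, has_electric_shower: bool
) -> Tuple[Optional[int], Optional[int]]:
    """Return (electric_shower_count, mixer_shower_count)."""
    if total_showers == 0:
        # Explicit zeros: the cascade must not default-assume one mixer.
        return 0, 0
    if has_electric_shower:
        return 1, 0
    # Non-electric lodging: stay None so the cascade derives the count
    # from shower_outlets, exactly as before this slice.
    return None, None
```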

Forcing function: cert 2225 first-attempt Summary SAP closes from
Δ -0.3079 to Δ **+0.0441** — within the ±0.07 ASHP-cohort spec floor.

Cohort closure status (5 of 7 ASHP certs now at spec floor):
  cert  Δ vs worksheet  spec floor?
  0380  +0.0594         ✓
  0350  +0.0458         ✓
  2225  +0.0441         ✓  ← this slice
  2636  +0.4873         ✗  (cantilever + alt-wall; next slice)
  3800  +0.0442         ✓
  9285  +0.0502         ✓
  9418  +2.5973         ✗  (Daikin EDLQ05CAV3, distinct PCDB)

Added two tests:
- `test_summary_2225_no_showers_lodged_resolves_to_zero_counts` —
  unit-level pin that no-shower lodgings produce explicit 0 counts.
- `test_summary_2225_full_chain_sap_within_spec_floor_of_worksheet`
  — Layer-4 chain test at ±0.07.

Pyright net-zero on both edited files (mapper.py 32 baseline).

Regression suite: 682 pass + 10 fail (handover baseline 669 + 10 +
13 new GREEN tests across S0380.2..S0380.11). The 5 boiler hand-
built parity tests confirmed still GREEN — the refinement
deliberately preserves their convention by only flipping the zero-
shower case.

Spec refs:
- Slice 102f-prep.8 (commit 1d5183c6) — API-path precedent.
- SAP 10.2 Appendix J — shower energy accounting (electric vs mixer
  routing); mixer showers draw from the HW system and contribute to
  HW kWh; electric showers are §J line 64a (separate energy stream).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
8e6560d744 Slice S0380.9: multi-array PV support + close cert 0350 to ASHP spec floor
Refactors Elmhurst `Renewables` PV detail from four scalar fields
(pv_peak_power_kw / pv_orientation / pv_elevation_deg / pv_overshading
— single-array shape) to `pv_arrays: List[ElmhurstPvArray]`, then
walks the §19.0 PV Panel block in 4-tuples so dwellings with multiple
PV arrays surface every array.

Forced by cert 0350-2968-2650-2796-5255 (Summary_000903.pdf), the
second ASHP cohort cert through the Summary path and first to lodge
multiple PV arrays — the dr87 worksheet pins 2 arrays at 1.50 kWp
each (one SE at 45°, one NW at 45°). Pre-slice the extractor's
hardcoded "break at len(values) == 4" capped output at one array
regardless of how many the PDF lodged.

Three-layer end-to-end change:

1. `datatypes/epc/surveys/elmhurst_site_notes.py` — add
   `ElmhurstPvArray` dataclass (kw, orientation, elevation_deg,
   overshading); replace four `Renewables.pv_*` scalars with
   `pv_arrays: List[ElmhurstPvArray] = field(default_factory=list)`.
2. `backend/documents_parser/elmhurst_extractor.py` — rename
   `_extract_pv_array_detail` → `_extract_pv_arrays`; walk values
   after the "Photovoltaic panel details" anchor in 4-tuples until a
   stop token ("batteries"/"export"/etc.) or a §-header closes the
   block. §-header regex tightened to `\d{1,2}\.\d\s+\w` so kWp
   values like "1.50" don't trip the close (without the `\s+\w` the
   regex matched both "20.0 Wind Turbine" AND "1.50").
3. `datatypes/epc/domain/mapper.py` — `_elmhurst_pv_arrays` iterates
   the list and emits one `PhotovoltaicArray` per row; collapses
   empty list → None so the cascade keeps its no-PV fallback.
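
The §-header regex tightening in step 2 can be checked in isolation. This sketch assumes the extractor anchors the pattern at the start of each token (`re.match`); the constant names are illustrative only.

```python
import re

LOOSE_HEADER = re.compile(r"\d{1,2}\.\d")        # pre-slice behaviour
TIGHT_HEADER = re.compile(r"\d{1,2}\.\d\s+\w")   # requires a title word

for token in ("20.0 Wind Turbine", "1.50"):
    loose = bool(LOOSE_HEADER.match(token))
    tight = bool(TIGHT_HEADER.match(token))
    print(f"{token!r}: loose={loose} tight={tight}")
```

The loose pattern accepts both tokens, while the tight one accepts only the section header, because "1.5" in "1.50" is followed by a digit rather than whitespace.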

Forcing function: cert 0350 first-attempt Summary SAP closes from
Δ -4.5829 (Slice 8 baseline) to Δ **+0.0458** — within the ±0.07
ASHP-cohort spec-precision floor. PV export credit GBP moves from
158.91 (one array surfaced) to 265.99 (both arrays surfaced) — the
extra ~107 GBP of avoided cost lifts cert 0350's SAP by ~4.6 points.

This validates the structural-debt-amortizes hypothesis: cert 0350
needed only TWO new slices (S0380.8 inheritance + S0380.9 multi-PV)
beyond the cert 0380 closure work, vs cert 0380's 6 slices from
scratch. Subsequent cohort certs should converge similarly fast as
fixture-specific gaps are paid down.

Added two tests:
- `test_summary_0350_surfaces_two_pv_arrays` — unit test pinning
  the multi-array contract on the mapper boundary.
- `test_summary_0350_full_chain_sap_within_spec_floor_of_worksheet`
  — chain test pinning Δ < ±0.07 (matches cert 0380's chain test).

Cert 0380 (single-array, 3 kWp) continues to pass its chain test +
all 6 unit-level pins — the refactor preserves single-array behaviour.

Pyright net-zero across all four edited files:
  datatypes/epc/domain/mapper.py:                32 (baseline)
  datatypes/epc/surveys/elmhurst_site_notes.py:   0
  backend/documents_parser/elmhurst_extractor.py: 0
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression suite: 677 pass + 10 fail (= handover baseline 669 + 10
+ 8 new GREEN unit+chain tests across Slices S0380.2..S0380.9).

Fixtures added: `backend/documents_parser/tests/fixtures/Summary_
000903.pdf` (copied from `sap worksheets/Additional data with api/
0350-2968-2650-2796-5255/`).

Spec refs:
- SAP 10.2 Appendix M (PDF p.103) — multiple PV arrays sum to total
  electricity generation per Equation M-1 (each array's surface flux
  computed independently per Appendix U3.3).
- SAP 10.2 Appendix U3.3 (PDF p.124) — per-array surface flux keyed
  on orientation + tilt + overshading.
- Cert 0350 worksheet `dr87-0001-000903.pdf` (29a Main 19.4575 W/K
  + Ext1 1.3025 W/K = 20.7600 ≡ Summary cascade walls_w_per_k; (39)
  avg HTC 173.4202 ≡ Summary cascade; (64) HW 2084.66 ÷ (216) HW eff
  1.7285 = 1206.04 ≡ Summary cascade hot_water_kwh_per_yr).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
c30b4fcdc8 Slice S0380.6: surface full §15.1 Hot Water Cylinder block — Summary HW exact
Closes the entire §15.1 Hot Water Cylinder lodging end-to-end and
collapses cert 0380's Summary path to the API path at the documented
HP-cohort spec-precision floor: SAP **88.5698 (Δ +0.0594)** — exactly
matching the API path's spec-floor closure. `hot_water_kwh_per_yr`
hits **878.0519** vs worksheet (64) 1502.16 ÷ (216) HW eff 1.7107 =
**878.05** — exact match at 1e-4.

Four §15.1 fields surfaced together (the cascade requires all four in
combination to compute the worksheet-correct HP HW path):

1. `cylinder_size_label` (Summary "Medium" → SAP10 cascade enum 3 =
   160 L per `_CYLINDER_SIZE_CODE_TO_LITRES`)
2. `cylinder_insulation_label` (Summary "Foam" → cascade enum 1 =
   factory, per SAP 10.2 Table 2 Note 2)
3. `cylinder_insulation_thickness_mm` (Summary "50 mm" → 50)
4. `cylinder_thermostat` (Summary "Yes" → bool True → mapper emits 'Y'
   for the cascade's `sh.cylinder_thermostat == "Y"` string compare)

Why all four were required:

- `_cylinder_storage_loss_override` in `cert_to_inputs.py:2238-2253`
  gates on `cylinder_size`, `cylinder_insulation_type ==
  _CYLINDER_INSULATION_TYPE_FACTORY (1)`, AND
  `cylinder_insulation_thickness_mm`. If any one is missing there is
  no override, the storage loss is zero, and (62)m is miscalculated.
- `cylinder_thermostat` keys the SAP 10.2 Table 2b temperature factor
  (53): with-stat 0.5400 vs no-stat ~0.9 → without 'Y' storage loss
  over-counts by ~300 kWh/yr (the precise diff between the bundled-
  fields-only attempt at SAP 86.5 vs the fully-bundled attempt at
  SAP 88.57).
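
Illustrative sketch of the three-field gate described in the first
bullet. It is not the code in `cert_to_inputs.py`; the function name
`storage_override_applies` is hypothetical and only the enum value 1
(factory insulation) is taken from this message.

```python
from typing import Optional

CYLINDER_INSULATION_TYPE_FACTORY = 1  # enum value quoted in this message


def storage_override_applies(
    cylinder_size: Optional[int],
    cylinder_insulation_type: Optional[int],
    cylinder_insulation_thickness_mm: Optional[int],
) -> bool:
    """True only when all three gated fields are usable (hypothetical helper)."""
    return (
        cylinder_size is not None
        and cylinder_insulation_type == CYLINDER_INSULATION_TYPE_FACTORY
        and cylinder_insulation_thickness_mm is not None
    )
```

With the pre-slice `cylinder_size=None` the gate is False regardless of
the other two fields, which is the silent fall-through this slice fixes.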

Three-layer end-to-end change:

1. `datatypes/epc/surveys/elmhurst_site_notes.py` — add four
   defaulted `WaterHeating` fields (placed in the defaulted block;
   existing fixtures that omit §15.1 still construct unchanged).
2. `backend/documents_parser/elmhurst_extractor.py` — extend
   `_extract_water_heating` to read the §15.1 block via
   `_section_lines("15.1 Hot Water Cylinder", "15.2 Community Hot
   Water")` + `_local_val`. Section-scoping is required because the
   "Insulation Thickness" label collides with §7 Walls / §8 Roofs /
   §9 Floors lodgings on the same Summary PDF (cert 0380 has §7
   "Insulation Thickness 100 mm" for the FE wall — the global
   `_next_val` would return the wrong value).
3. `datatypes/epc/domain/mapper.py` — add
   `_elmhurst_cylinder_size_code` + `_elmhurst_cylinder_insulation_code`
   label-to-enum helpers; replace the broken
   `cylinder_size = water_heating.water_heating_code` (which was
   passing the §15 "Water Heating Code" string "HWP" into the
   numeric `cylinder_size` field, defeating the cascade) with the
   real `cylinder_size_label`-derived enum.
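
Illustrative sketch of why the section scoping in step 2 matters. It
assumes the Summary text is a flat list of lines where each label is
followed by its value on the next line; `section_lines` and
`local_val` are hypothetical stand-ins, not the extractor's own
`_section_lines` / `_local_val`, and the sample text is heavily trimmed.

```python
from typing import List, Optional


def section_lines(lines: List[str], start: str, end: str) -> List[str]:
    """Return the lines strictly between the `start` and `end` headings."""
    begin = lines.index(start) + 1
    stop = lines.index(end, begin)
    return lines[begin:stop]


def local_val(lines: List[str], label: str) -> Optional[str]:
    """Return the line that follows the first occurrence of `label`."""
    for i, line in enumerate(lines[:-1]):
        if line == label:
            return lines[i + 1]
    return None


# Hypothetical, trimmed Summary text: the same label appears twice.
summary = [
    "7.0 Walls",
    "Insulation Thickness", "100 mm",
    "15.1 Hot Water Cylinder",
    "Insulation Thickness", "50 mm",
    "15.2 Community Hot Water",
]

# An unscoped lookup finds the §7 wall value first.
global_hit = local_val(summary, "Insulation Thickness")

# Scoping to §15.1 returns the cylinder value.
scoped_hit = local_val(
    section_lines(summary, "15.1 Hot Water Cylinder", "15.2 Community Hot Water"),
    "Insulation Thickness",
)
print(global_hit, "|", scoped_hit)
```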

Pre-Slice 6, the Summary path was producing `cylinder_size='HWP'`
which `_int_or_none` reduced to None, silently routing the cascade
off the HP-with-cylinder HW path entirely. Surfacing the §15.1
block in full lets `_heat_pump_apm_efficiencies` use the spec-
correct HW efficiency (1.7107) and `_cylinder_storage_loss_override`
contribute the spec-correct (56) 435 kWh/yr storage loss.

Pyright net-zero across all four edited files:
  datatypes/epc/domain/mapper.py:                32 (baseline)
  datatypes/epc/surveys/elmhurst_site_notes.py:   0
  backend/documents_parser/elmhurst_extractor.py: 0
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression suite: 674 pass + 11 fail (vs handover baseline 669 + 10
— net +5 pass for the new GREEN unit tests S0380.2..S0380.6; the +1
fail vs baseline is still S0380.1's chain test which pins at 1e-4 vs
worksheet 88.5104 and now lands at Δ +0.0594, the same Appendix N3.6
PSR-interpolation precision floor that the API path closes to and
that the cohort's 7 ASHP fixtures already track at ±0.07).

Tolerance disposition: the +0.0594 residual is identical to the
cohort's documented HP-path precision floor. Closing further requires
work on the calculator's Appendix N3.6 PSR interpolation step
(boilers already match worksheet at 1e-4 via the same cascade —
ground-truthed in closed-boiler precedents 001479, 0330), not on
the Summary mapper. The S0380.1 chain test should be re-pinned to
the ±0.07 ASHP-cohort tolerance in the next slice — same disposition
the API-path cohort received in slice 102f (commit c0086660).

Spec refs:
- SAP 10.2 §4 Table 2 (PDF p.135) — cylinder storage loss factor
  for foam-insulated cylinders (51) keyed on insulation thickness.
- SAP 10.2 §4 Table 2a (PDF p.135) — cylinder volume factor (52).
- SAP 10.2 §4 Table 2b (PDF p.135) — cylinder temperature factor
  (53) keyed on cylinder thermostat + separately-timed DHW.
- SAP 10.2 Appendix N3.7(a) (PDF p.6097) — HP HW in-use factor
  cylinder-criteria, footnote 53 (cert HX area unknown for Open EPC
  schema → criteria fail → 0.60 in-use factor; the worksheet's
  closed HW path uses this same factor).
- Cert 0380 worksheet `dr87-0001-000899.pdf` lodgings:
  (47) Cylinder Volume 160.00 L; "Cylinder Insulation Type Foam";
  "Cylinder Insulation Thickness 50 mm"; "Cylinder Stat Yes";
  (51)..(56) cylinder storage loss chain; (64) HW output 1502.16;
  (216) HW efficiency 171.0746%.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
9faff3e122 Slice S0380.5: surface insulated_door_u_value from Summary §10 'Average U-value'
Closes the three-layer gap that left the Summary mapper producing
`insulated_door_u_value=None` even though Summary §10 lodges
"Average U-value" / "1.20" explicitly on cert 0380:

1. `datatypes/epc/surveys/elmhurst_site_notes.py` — add
   `ElmhurstSiteNotes.insulated_door_u_value: Optional[float] = None`,
   placed in the defaulted-field block so existing fixtures that
   omit the field still construct without changes.
2. `backend/documents_parser/elmhurst_extractor.py` — add
   `_extract_door_u_value` that section-scopes the lookup to
   `_section_lines("10.0 Doors:", "11.0 Windows:")` so the bare
   "Average U-value" label cannot be shadowed by global U-value
   lookups in §7 Walls / §8 Roofs / §9 Floors.
3. `datatypes/epc/domain/mapper.py` — surface
   `insulated_door_u_value=survey.insulated_door_u_value` on the
   `from_elmhurst_site_notes` path. The comment in
   `epc_property_data.py:585` ("Not available in site notes") is now
   outdated for Elmhurst Summary PDFs that lodge the explicit value.

Worksheet anchor (dr87-0001-000899.pdf line ref (26)):

  Doors insulated 1   NetArea 3.7000   U-value 1.2000   A×U 4.4400 W/K

Forcing function (Slice S0380.1): cert 0380 Summary cascade
`doors_w_per_k` moves from 5.1800 to **4.4400 W/K — exact match
against worksheet line ref (26)**. The +0.74 W/K mis-attribution
was the default door-U fall-through that the lodged 1.20 value
silences. SAP moves 88.1981 (Δ -0.3123) → 88.2746 (Δ -0.2358).
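
Worked check of the door arithmetic above. The area and lodged U come
from worksheet line ref (26); the implied fall-through U is an
inference back-calculated from the pre-slice 5.1800 W/K, not a value
stated in the spec refs.

```python
import math

door_area_m2 = 3.70   # NetArea, worksheet line ref (26)
lodged_u = 1.20       # Summary §10 "Average U-value"

# A x U with the lodged value.
lodged_w_per_k = door_area_m2 * lodged_u

# Assumption: the default door U implied by the pre-slice 5.1800 W/K.
implied_default_u = 5.1800 / door_area_m2

# Mis-attribution removed by this slice.
delta_w_per_k = 5.1800 - lodged_w_per_k

print(round(lodged_w_per_k, 4), round(implied_default_u, 4), round(delta_w_per_k, 4))
```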

Added focused unit test
`test_summary_0380_surfaces_insulated_door_u_value_1_2` that pins
the mapper boundary directly to the worksheet's lodged U-value 1.2,
so future debuggers can localise regressions in the new extractor /
field / mapper path before walking the full chain.

Pyright net-zero across all four edited files:
  datatypes/epc/domain/mapper.py:                32 (baseline)
  datatypes/epc/surveys/elmhurst_site_notes.py:   0
  backend/documents_parser/elmhurst_extractor.py: 0
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression suite: 673 pass + 11 fail (vs handover baseline 669 + 10
— net +4 pass for the four GREEN unit tests across Slices S0380.2-5;
the +1 fail vs baseline is the S0380.1 chain test which this slice
moves to Δ -0.2358 but does not yet fully close).

Spec refs:
- SAP 10.2 Table 14 (door U-values: composite-construction default
  cascade is silenced when the assessor lodges an explicit measured
  U on the cert; routed via `insulated_door_u_value`).
- Cert 0380 worksheet dr87-0001-000899.pdf line ref (26) — the
  A×U=4.4400 W/K spec value that this slice closes the Summary
  cascade to exactly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
5fcb594f0a Slice S0380.4: surface wall_insulation_thickness from Summary §7.0
Closes the three-layer gap that left the Summary mapper producing
`wall_insulation_thickness=None` even though Summary §7.0 lodges
"Insulation Thickness" / "100 mm" explicitly on cert 0380. Three
small co-ordinated edits ship the field end-to-end:

1. `datatypes/epc/surveys/elmhurst_site_notes.py` — add
   `WallDetails.insulation_thickness_mm: Optional[int] = None`,
   mirroring the existing `RoofDetails.insulation_thickness_mm`.
2. `backend/documents_parser/elmhurst_extractor.py` — extend
   `_wall_details_from_lines` to read the `_local_val(lines,
   "Insulation Thickness")` label inside the §7 Walls block (the
   "Insulation Thickness" label is local-scoped per block, so it
   does not collide with §8 Roofs / §9 Floors).
3. `datatypes/epc/domain/mapper.py` — surface
   `wall_insulation_thickness=f"{walls.insulation_thickness_mm}mm"`
   on `SapBuildingPart`. Mirrors the API mapper's string-with-unit
   shape (`'100mm'`) so cert-to-cert parity tests (Summary EPC ≡
   API EPC) compare equal; the cascade's `_parse_thickness_mm`
   accepts either form.
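
Illustrative sketch of a tolerant thickness parser, assuming "either
form" means a value with or without the `mm` unit. `parse_thickness_mm`
is a hypothetical stand-in; the cascade's `_parse_thickness_mm` may
differ in detail.

```python
import re
from typing import Optional, Union


def parse_thickness_mm(value: Union[str, int, None]) -> Optional[int]:
    """Accept 100, '100', '100mm' or '100 mm'; return None otherwise."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+)\s*(?:mm)?\s*", value, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# The mapper's string-with-unit shape round-trips through the parser.
api_shape = f"{100}mm"
print(parse_thickness_mm(api_shape))
```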

Forcing function (Slice S0380.1): cert 0380 Summary cascade SAP
moves from 86.8671 (Δ -1.6433 — i.e. after Slice S0380.3 only) to
88.1981 (Δ -0.3123) — closes ~81% of the remaining gap. Critically,
`walls_w_per_k` now hits API parity exactly (Summary 11.6150 ≡ API
11.6150) — the composite filled-cavity-plus-external U-value calc
is now keyed off the lodged 100 mm thickness rather than its
internal default.

Residual -0.31 SAP vs worksheet is comparable to the documented HP
cohort's API-path residual of +0.06 (cert 0380 API path closes at
+0.0594). Summary path is now within ±0.37 of API path. Remaining
diffs to investigate (per the next-step diagnostic): hot-water
cascade (Summary 1002.74 kWh vs API 878.05 kWh, +124.69 kWh), HLC
parameters (heat_transfer_coefficient still differs slightly through
secondary terms), and possibly secondary-heating routing. The
worksheet vs API +0.06 residual is the documented Appendix N3.6
PSR-interpolation precision floor and out of scope for Summary-path
closure.

Added focused unit test
`test_summary_0380_surfaces_wall_insulation_thickness_100mm` that
pins the mapper boundary directly (Summary "100 mm" line pair →
EPC `wall_insulation_thickness="100mm"`), so future debuggers can
localise regressions in the new extractor / field / mapper path
before walking the full chain.

Pyright net-zero across all four edited files:
  datatypes/epc/domain/mapper.py:                32 (baseline)
  datatypes/epc/surveys/elmhurst_site_notes.py:   0
  backend/documents_parser/elmhurst_extractor.py: 0
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0

Regression suite: 672 pass + 11 fail (vs handover baseline 669 + 10
— net +3 pass for the three Slices S0380.2-4 GREEN unit tests; the
+1 fail vs baseline is still the S0380.1 chain test which this slice
moves from Δ -1.6433 to Δ -0.3123 but does not yet fully close).

Spec refs:
- SAP 10.2 §3.7 / Appendix S Table S5 (composite filled-cavity-plus-
  external U-value calc — series-resistance form keyed off lodged
  insulation thickness)
- Cert 0380 Summary PDF §7.0 lines 121-122 ("Insulation Thickness"
  / "100 mm" — the missing extractor read this slice adds)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
74c4b5ebc1 Slice S0380.3: surface wall_insulation_type=6 for 'FE Filled Cavity + External'
Extends `_ELMHURST_INSULATION_CODE_TO_SAP10` in
`datatypes/epc/domain/mapper.py` with the two-letter dual codes
documented on Elmhurst Summary PDFs:

  "FE" → 6  (Filled cavity + External insulation; cohort fixture)
  "FI" → 7  (Filled cavity + Internal insulation; mirror, no fixture)

The cascade `wall_insulation_type` enum (per
`domain/sap10_ml/rdsap_uvalues.py` lines 120-131) treats codes 6 and
7 as composite-resistance walls (filled cavity in series with an
external/internal insulation layer), routing through a different
U-value calc than the plain filled-cavity default. Cert 0380's
Summary lodges `walls.insulation = "FE Filled Cavity + External"`
which until this slice fell through `_leading_code` to a missing
dict entry and the mapper produced `wall_insulation_type=None`,
defaulting the cascade to the as-built path and overstating walls
heat loss by +58 W/K.
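
Illustrative sketch of the dual-code lookup. Only the two entries
added by this slice are shown; `leading_code` is a hypothetical
stand-in that assumes `_leading_code` takes the first
whitespace-delimited token of the Summary label.

```python
from typing import Dict, Optional

# Only the two entries added by this slice; the real map has more rows.
DUAL_INSULATION_CODES: Dict[str, int] = {"FE": 6, "FI": 7}


def leading_code(label: str) -> str:
    """Return the first whitespace-delimited token of a Summary label."""
    parts = label.split()
    return parts[0] if parts else ""


def wall_insulation_type(label: str) -> Optional[int]:
    """Map a Summary insulation label to the cascade enum, or None."""
    return DUAL_INSULATION_CODES.get(leading_code(label))


print(wall_insulation_type("FE Filled Cavity + External"))
```

A label whose code is missing from the dict returns None, which is the
fall-through behaviour that previously hit cert 0380.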

Forcing function (Slice S0380.1): cert 0380 Summary cascade SAP
moves from 81.7528 (Δ -6.7576 — i.e. after Slice S0380.2 only) to
86.8671 (Δ -1.6433) — closes ~76% of the remaining gap. `walls_w_per_k`
drops from 69.6900 to 24.6238. Residual ~13 W/K wall gap vs API's
11.6150 is the next workstream: `wall_insulation_thickness` is still
None on the Summary EPC (API lodges '100mm'). Without the thickness
the cascade applies the composite U-value at the dual-code's default
thickness rather than the lodged 100 mm.

Added focused unit test
`test_summary_0380_filled_cavity_plus_external_insulation_routes_to_code_6`
that pins both `wall_construction == 4` and `wall_insulation_type == 6`
on the mapper boundary, so future debuggers can localise regressions
in the dual-code lookup before walking the full chain.

Pyright baseline preserved:
  datatypes/epc/domain/mapper.py: 32 errors (no new errors introduced)
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0 errors

Regression suite: 671 pass + 11 fail (vs handover baseline 669 + 10 —
net +2 pass for the two new GREEN unit tests across Slices S0380.2-3,
+1 fail still being the S0380.1 chain test that this slice continues
to close but does not yet fully resolve).

Spec refs:
- SAP 10.2 §3.7 / Table S5 (U-values for masonry walls — composite
  filled-cavity-plus-insulation calc)
- `domain/sap10_ml/rdsap_uvalues.py:120` (RdSAP schema
  `wall_insulation_type` enum: 6 = filled cavity + external)
- Cert 0380 worksheet `dr87-0001-000899.pdf` (lodges Mitsubishi
  PUZ-WM50VHA ASHP on a cavity wall with subsequent external
  insulation — the composite-wall fixture)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
19e23d0c31 Slice S0380.2: surface main_heating_category=4 for PCDB heat-pump indices
Extends `_elmhurst_main_heating_category` in
`datatypes/epc/domain/mapper.py` so a PCDB index that resolves to a
Table 362 record (heat pumps only) yields category 4 — the SAP 10.2
Table 4a code that gates the Appendix N3.6/N3.7 heat-pump cascade
(`cert_to_inputs.py` lines 1896, 2005, 2057, 2104 all branch on
`main_heating_category == 4`).

Authoritative signal: PCDB Table 362 is heat-pumps-only, so
membership IS the heat-pump answer. `heat_pump_record(pcdb_id)`
(introduced for the API path's cohort closure) returns the typed
record or None; a non-None return is sufficient. No fuel-type
belt-and-braces is needed — Table 362 membership is unambiguous,
unlike the gas-boiler branch which uses fuel type to disambiguate
PCDB Table 105 records.
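
Illustrative sketch of the membership test. The one-row table and the
`HeatPumpRecord` type are stand-ins for the real PCDB loader, and
`main_heating_category` shows only the heat-pump branch; the other
branches of `_elmhurst_main_heating_category` are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HeatPumpRecord:
    pcdb_id: int
    model: str


# Stand-in for PCDB Table 362; the single row is taken from this message.
_TABLE_362: Dict[int, HeatPumpRecord] = {
    104568: HeatPumpRecord(104568, "Mitsubishi Ecodan PUZ-WM50VHA"),
}

HEAT_PUMP_CATEGORY = 4  # SAP 10.2 Table 4a code for heat pumps


def heat_pump_record(pcdb_id: int) -> Optional[HeatPumpRecord]:
    """Return the typed record, or None when the id is not in Table 362."""
    return _TABLE_362.get(pcdb_id)


def main_heating_category(pcdb_id: Optional[int]) -> Optional[int]:
    """Return 4 when the index is a Table 362 member, else None."""
    if pcdb_id is not None and heat_pump_record(pcdb_id) is not None:
        return HEAT_PUMP_CATEGORY
    return None
```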

Forcing function (Slice S0380.1): cert 0380 Summary cascade SAP
moves from 33.7920 (Δ -54.7184) to 81.7528 (Δ -6.7576) — closes
~88% of the gap. Remaining -6.76 SAP is the next workstream:
cylinder / HW cascade, PV array surfacing, secondary-heating routing
(per HANDOVER_CERT_0380_SUMMARY_PATH.md debug order steps 3–4).

Added focused unit test
`test_summary_0380_main_heating_category_is_heat_pump` that pins the
contract at the mapper boundary (idx 104568 → category 4), so future
debuggers can localise regressions before walking the full chain.

Architectural note: introduces the first
`datatypes/epc/domain/mapper.py → domain/sap10_calculator/tables/pcdb`
import. PCDB is BRE reference data shared by both layers; treating it
as importable shared reference is the lighter alternative to either
(a) duplicating an HP-PCDB-IDs frozenset in the mapper or (b) hoisting
PCDB into a new shared package.

Pyright baseline preserved:
  datatypes/epc/domain/mapper.py: 32 errors (no new errors introduced)
  backend/documents_parser/tests/test_summary_pdf_mapper_chain.py: 0 errors

Regression suite: 670 pass + 11 fail (vs handover baseline 669 + 10 —
net +1 pass for the new GREEN unit test, +1 fail still being the
Slice 1 chain test that this slice does not yet fully close).

Spec refs:
- SAP 10.2 Table 4a (main heating category codes — code 4 = heat pump)
- SAP 10.2 Appendix N3.6/N3.7 (heat-pump space-heating efficiency
  with PSR interpolation, routed via the category-4 gate)
- BRE PCDB Table 362 (heat-pump records — pcdb_id 104568 = Mitsubishi
  Ecodan PUZ-WM50VHA, the cert 0380 main heating appliance)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
dfe2f2ce6e Slice 102f-prep.8: API mapper resolves shower_outlets=None → 0 mixers
Cert 2225 (Mitsubishi PUZ-WM50VHA, semi-detached 2-bp, TFA 82.49)
lodges `sap_heating.shower_outlets = None` in the Open EPC API
JSON. The worksheet (42a) "Hot water usage for mixer showers" reads
0 every month — Elmhurst's convention is "absent ⇒ no shower".

Pre-fix, the API mapper returned `mixer_shower_count = None`,
deferring to the cert→inputs cascade's "RdSAP modal lodging"
default of 1 vented mixer. That default added ~7 L/day to (44) daily
HW use and ~113 kWh/yr to (62) HW demand. With the mapper returning
0, cert 2225's SAP residual moves from -0.31 to +0.04, in line with
the cohort's +0.03..+0.06 cluster.

`_count_shower_outlets_by_type` now treats None as 0 (the API
mapper-only path). The cert→inputs cascade's
`_mixer_shower_flow_rates_from_cert` keeps the None→1 default for
the Elmhurst hand-built fixture path that doesn't route through
this helper.
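
Illustrative sketch of the None handling. `count_shower_outlets` is a
hypothetical simplification of `_count_shower_outlets_by_type` that
takes bare outlet dicts; the code 1 = mixer / code 2 = electric split
is the empirically derived one from Slice 98.

```python
from typing import Dict, List, Optional, Tuple

MIXER_CODE = 1
ELECTRIC_CODE = 2


def count_shower_outlets(
    outlets: Optional[List[Dict[str, int]]],
) -> Tuple[int, int]:
    """Return (mixer_count, electric_count); None is read as no showers."""
    if outlets is None:
        return (0, 0)
    codes = [outlet.get("shower_outlet_type") for outlet in outlets]
    return (codes.count(MIXER_CODE), codes.count(ELECTRIC_CODE))
```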

Cohort impact: 6 of 7 ASHP certs now cluster at SAP Δ +0.03 to
+0.06 (vs worksheet); only cert 2636 remains an outlier (+0.49).
Golden cert PE/CO2 pins re-pinned for 6035, 8135, 0390 (the three
certs that previously lodged shower_outlets=None and consumed the
spurious 1-mixer default).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:46 +00:00
Khalim Conn-Kowlessar
7874374bcf Slice 101a: API glazing_type=14 → DG/TG 2022+ (RdSAP 10 Table 24)
Cert 0380 (ASHP semi-detached bungalow, worksheet SAP 88.5104)
lodges glazing_type=14 on all windows. The worksheet uses U=1.3258
(post-curtain) for line (27), which back-calculates to a raw U=1.40:
the RdSAP 10 Table 24 row for "Double or triple glazed, 2022 or
later" (England/Wales 2022+ / Scotland 2023+ / NI 2022+). Without
code 14 in `_API_GLAZING_TYPE_TO_TRANSMISSION` the cascade falls
back to `u_window`'s default (~U=2.50 post-curtain), inflating
windows HLC by 5 W/K on cert 0380 (6.80 → 11.68).
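
Worked check of the back-calculation, assuming the usual SAP curtain
adjustment of an extra 0.04 m²K/W in series with the window. The
function name is hypothetical.

```python
CURTAIN_RESISTANCE = 0.04  # m²K/W, assumed SAP curtain allowance


def u_post_curtain(u_raw: float) -> float:
    """Effective window U after adding the curtain resistance in series."""
    return 1.0 / (1.0 / u_raw + CURTAIN_RESISTANCE)


# Raw U=1.40 should land on the worksheet's line (27) value.
print(round(u_post_curtain(1.40), 4))
```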

Added `14: (1.4, 0.72, 0.70)` — same U/g/frame as code 13. Codes
13 and 14 are schema siblings within the post-2022 product family
(the cert lodgement integer differentiates between DG and TG
sealed-unit variants but Table 24 collapses them to the same row).

Effect on cert 0380 API path:
- windows HLC 11.68 → 6.80 (= worksheet 6.80 exact)
- (37) total HLC 104.22 → 99.34 (worksheet 96.09; Δ +3.25 left
  on walls — next slice closes it)
- sap_continuous 86.82 → 87.62 (Δ -1.69 → -0.89; closer to
  worksheet 88.51)

No golden cert residuals shifted (cohort + 9501 don't lodge
glazing_type=14).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
16845604e2 Slice 100c: API path — surface PV arrays + gap-aware glazing lookup
Two final API gaps to close cert 9501 at 1e-4:

(a) PV array surfacing — third shape variant:
    Schema-21 EPCs carry `photovoltaic_supply` as one of three shapes:
    - legacy `{"none_or_no_details": {...}}` (PV absent / roof-only)
    - nested list `[[{...}], ...]` (cohort cert 2130)
    - dict wrapper `{"pv_arrays": [{...}]}` (cert 9501)
    The schema's `PhotovoltaicSupply` modelled only `none_or_no_details`
    — cert 9501's measured arrays under `pv_arrays` were silently
    dropped (Δ -£250 PV credit → -9.32 SAP). Added
    `SchemaPhotovoltaicArray` dataclass + `pv_arrays:
    Optional[List[...]]` sibling field on `PhotovoltaicSupply`; updated
    `_map_schema_21_pv` to dispatch on the new shape.

(b) Gap-aware glazing lookup (RdSAP 10 Table 24 row 2):
    DG pre-2002 spec U varies by gap: 6mm=3.1 / 12mm=2.8 / 16+=2.7.
    The mapper's flat `_API_GLAZING_TYPE_TO_TRANSMISSION[3]` returned
    U=2.8 unconditionally, but cert 9501 lodges `glazing_gap="16+"`, so
    the worksheet uses 2.7. Added
    `_API_GLAZING_TYPE_GAP_TO_TRANSMISSION` keyed by (type, gap) with
    the spec-table values for code 3; `_api_glazing_transmission`
    consults the per-gap dict first, falling back to type-only when no
    gap entry exists. Refactored the inline `SapWindow(...)` build into
    an `_api_sap_window` helper, which also removes one pyright error
    (mapper.py improves 33 → 32, better than net-zero).
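
Illustrative sketch of the per-gap-then-type fallback in (b). It
carries U-values only (the real tuples also hold g-value and frame
factor), and the gap key spellings "6" / "12" / "16+" are assumptions
about how the gap is lodged.

```python
from typing import Dict, Optional, Tuple

# U-values only; g-value and frame factor are left out of this sketch.
U_BY_TYPE: Dict[int, float] = {3: 2.8, 13: 1.4, 14: 1.4}
U_BY_TYPE_AND_GAP: Dict[Tuple[int, str], float] = {
    (3, "6"): 3.1,
    (3, "12"): 2.8,
    (3, "16+"): 2.7,
}


def glazing_u(glazing_type: int, gap: Optional[str]) -> Optional[float]:
    """Prefer the (type, gap) entry; fall back to the type-only entry."""
    if gap is not None and (glazing_type, gap) in U_BY_TYPE_AND_GAP:
        return U_BY_TYPE_AND_GAP[(glazing_type, gap)]
    return U_BY_TYPE.get(glazing_type)
```

Types without per-gap rows, such as 13 and 14, ignore the gap and
return their flat entry.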

Effect on cert 9501 API path:
- sap_continuous 59.20 → **68.525161** (= worksheet 68.5252 exact;
  Δ -0.000039 — well within 1e-4)
- total_fuel_cost £1101 → £849.21 (= worksheet 849.21 exact)
- pv_export_credit £0 → £250.02 (= worksheet 250.02 exact)

Re-pinned residuals (5 cohort certs with glazing_gap="16+" or 6 now
pick up the spec-correct DG-pre-2002 U):
- 0300: PE +8.44 → +8.28, CO2 -0.23 → -0.25
- 6035: PE +48.30 → +47.85, CO2 +1.10 → +1.09
- 7536: PE -6.51 → -7.08, CO2 -0.17 → -0.19
- 8135: PE -5.31 → -3.66 (gap=6 spec U=3.1), CO2 -0.07 → -0.04
- 2130: PE -38.18 → -38.63, CO2 +0.30 → +0.30

Layer 4 chain test
`test_api_9501_full_chain_sap_matches_worksheet_pdf_exactly` added:
the third production gate after cert 001479 + cert 0330, and the
first flat-shaped cert in the production gate set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
d529f91a8e Slice 100b: API TFA — include per-bp RR floor area in continuous TFA
`_total_floor_area_from_building_parts` previously summed only
`sap_floor_dimensions[*].total_floor_area`; the RR floor area lives
under `sap_room_in_roof.floor_area` per RdSAP §3.9 convention and
was dropped from the per-bp TFA sum. Cert 9501 (113.08 m² real
TFA, of which 31.8 m² is RR) showed TFA 81.28 on the API path —
the cascade then under-computed occupancy N (Appendix J), HW kWh
(Appendix J), lighting kWh (Appendix L), and internal gains.

Add the RR contribution to the sum. The top-level
`schema.total_floor_area` scalar (integer-rounded for cert 9501:
113 vs raw 113.08) is still the fallback when no per-bp dims are
lodged.
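
Illustrative sketch of the corrected sum, simplified to a single
building part. `total_floor_area` and its parameters are hypothetical;
the real helper walks the schema's building-part objects.

```python
from typing import List, Optional


def total_floor_area(
    storey_areas_m2: List[float],
    room_in_roof_area_m2: Optional[float],
    top_level_tfa_m2: Optional[float],
) -> Optional[float]:
    """Sum per-storey areas plus any RR floor area; else use the scalar."""
    if not storey_areas_m2:
        return top_level_tfa_m2
    return sum(storey_areas_m2) + (room_in_roof_area_m2 or 0.0)


# Cert 9501 figures from this message: 81.28 m² of storeys + 31.8 m² RR.
print(round(total_floor_area([81.28], 31.8, 113.0), 2))
```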

Re-pinned residuals (improvements — TFA now includes the previously-
dropped RR storey):
- 0240: SAP -15 → -14, PE +15.69 → +12.49, CO2 +0.90 → +0.70
- 6035: PE +49.51 → +48.30, CO2 +1.14 → +1.10

Effect on cert 9501 API path: TFA 81.28 → 113.08 (= worksheet
113.08 exact). SAP delta still -9.32 vs worksheet — the remaining
gap is dominated by the missing PV credit (£250 — next slice).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
8e74b6b8b8 Slice 100a: API path — surface Detailed-RR per-surface areas
Two RR shapes coexist in real-API JSON: cohort certs (6035, 0240,
schema test 21_0_1.json) lodge `room_in_roof_type_1` (RdSAP §3.9.1
Simplified Type 1 — gable lengths only, cascade applies the 2.45 m
default storey height); cert 9501 lodges `room_in_roof_details`
(RdSAP §3.9 Detailed RR — per-surface lengths + heights + flat-
ceiling detail). The schema only modelled the Simplified-Type-1
wrapper, so `from_dict` parsed cert 9501's Detailed-RR block as
None and the API mapper built `SapRoomInRoof` with
`detailed_surfaces=None`. The cascade then defaulted to Simplified
Type 2 "all elements" (RR floor area × Table 18 col(4) age-B U=2.30)
for the whole RR → roof HLC 149.43 W/K vs worksheet 18.10 (Δ +131.32).

Changes:
- Add `RoomInRoofDetails` dataclass to both schema 21.0.0 and 21.0.1
  with the 10 fields the JSON lodges: gable_wall_type_{1,2} +
  gable_wall_length_{1,2} + gable_wall_height_{1,2} +
  flat_ceiling_length_1 + flat_ceiling_height_1 +
  flat_ceiling_insulation_type_1 +
  flat_ceiling_insulation_thickness_1. `SapRoomInRoof` gains a
  sibling `room_in_roof_details` field next to the legacy
  `room_in_roof_type_1`; both shapes are now lossless.
- Extract `_api_build_room_in_roof` mapper helper that reads from
  whichever block is present and populates
  `SapRoomInRoof.detailed_surfaces` from the Detailed-RR block.
  Gables route to `gable_wall_external` for flats (top-floor flats
  with RR sit at the end of the building, no neighbour above) and
  to `gable_wall` (party at U=0.25) otherwise — mirrors the Summary
  mapper's `_map_elmhurst_rir_surface` heuristic.
- Replace both inline `SapRoomInRoof(...)` builds in
  `from_rdsap_schema_21_0_0` and `from_rdsap_schema_21_0_1` with
  the helper.

Effect on cert 9501 API path:
- roof HLC 149.43 → 18.10 (= worksheet 18.10 exact)
- walls HLC 168.74 → 218.81 (= worksheet 218.81 exact)
- (37) total HLC 382.19 → 297.54 (worksheet 296.68; Δ +0.86)
- sap_continuous still -9.27 vs worksheet because TFA on the API
  path is still 81.28 (missing the 31.8 m² RR floor area) — next
  slice closes that.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
965718d78e Slice 99e: PV pitch enum-not-degrees + cert 9501 Layer 2 chain test
`EpcPropertyData.PhotovoltaicArray.pitch` is the RdSAP 10 §11.1
integer code (1=0°, 2=30°, 3=45°, 4=60°, 5=90°) — NOT degrees. The
cascade's `cert_to_inputs._PV_PITCH_DEG_BY_CODE` reads the code, not
the value. Slice 99d's mapper passed the raw degrees (45) directly,
which fell through to the default 30° lookup (Appendix U3.3 S(SW,
30°) ≈ 1029 kWh/m²/yr vs S(SW, 45°) ≈ 1004 — 2.5% over-credit on
the PV generation, manifesting as -£6.27 over-credit on total cost
→ +0.23 SAP delta).

Added `_elmhurst_pv_pitch_code` helper that maps the lodged degrees
to the nearest tabulated code (snap-to-nearest fallback for non-
tabulated tilts; defaults to code 2 / 30° per the cascade's own
`_PV_PITCH_DEG_DEFAULT`).
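
Illustrative sketch of the snap-to-nearest mapping, using the code
table quoted above. `pv_pitch_code` is a hypothetical stand-in for
`_elmhurst_pv_pitch_code`; how exact ties (e.g. 15°) are broken is an
assumption of this sketch (lowest code wins).

```python
from typing import Optional

# RdSAP 10 §11.1 pitch codes as listed in this message.
PITCH_DEG_BY_CODE = {1: 0, 2: 30, 3: 45, 4: 60, 5: 90}
DEFAULT_PITCH_CODE = 2  # 30°


def pv_pitch_code(pitch_deg: Optional[float]) -> int:
    """Map lodged degrees to the nearest tabulated pitch code."""
    if pitch_deg is None:
        return DEFAULT_PITCH_CODE
    return min(
        PITCH_DEG_BY_CODE,
        key=lambda code: abs(PITCH_DEG_BY_CODE[code] - pitch_deg),
    )
```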

Effect on cert 9501 Summary path:
- pv_export_credit £256.30 → £250.02 (= worksheet 250.02 exact)
- total_fuel_cost £842.94 → £849.21 (= worksheet 849.21 exact)
- sap_continuous 68.7577 → **68.5252** (= worksheet 68.5252 exact;
  Δ -0.0000 at 1e-4)

`test_summary_9501_full_chain_sap_matches_worksheet_pdf_exactly`
added — the second flat-shaped cert pinned to worksheet SAP at 1e-4
after the cert 0330 / 001479 boiler-house chain tests. Third boiler
validation cert closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
a3a30957de Slice 99d: surface PV array from Elmhurst Summary §19.0
Cert 9501 lodges measured PV: 2.36 kWp South-West, 45° pitch, "None
Or Little" overshading. The worksheet's §10a credit (-£250.02 =
PV used in dwelling -£129.49 + PV exported -£120.53) depends on the
Appendix M / Appendix U3.3 cascade reading these from
`SapEnergySource.photovoltaic_arrays`. The prior extractor only
captured the `photovoltaic_panel: "Panel details"` label — the
actual kW / orientation / elevation / overshading were silently
dropped, so the cascade computed total cost ~£250 too high → ECF
2.92 vs worksheet 2.26 → SAP 59.26 vs 68.53 (Δ -9.27).

Changes:
- Extend `surveys.elmhurst_site_notes.Renewables` with 4 new
  optional fields: pv_peak_power_kw / pv_orientation /
  pv_elevation_deg / pv_overshading.
- Add `ElmhurstSiteNotesExtractor._extract_pv_array_detail` —
  anchors on "Photovoltaic panel details" then reads the 4
  consecutive value lines (kWp, orientation, elevation, overshading).
- Add `_elmhurst_pv_arrays` mapper helper to build the
  `[PhotovoltaicArray(...)]` list when all 4 values are present;
  return None for the "PV absent" path the cascade already handles.
- Add `_ELMHURST_PV_OVERSHADING_TO_RDSAP` map: "None Or Little" → 1
  (ZPV=1.0 per cert_to_inputs._PV_OVERSHADING_FACTOR), "Modest" →
  2, "Significant" → 3, "Heavy" → 4. RdSAP omits SAP10.2 Table M1's
  5th "Severe" bucket.
- Wire `photovoltaic_arrays=_elmhurst_pv_arrays(survey.renewables)`
  into `from_elmhurst_site_notes`'s `SapEnergySource(...)` call.
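
Illustrative sketch of the all-four-or-nothing rule. `PvArray` is a
hypothetical stand-in for `PhotovoltaicArray`, and returning None for
an unrecognised overshading label is an assumption of this sketch. The
overshading map is the one listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

OVERSHADING_TO_RDSAP = {
    "None Or Little": 1,
    "Modest": 2,
    "Significant": 3,
    "Heavy": 4,
}


@dataclass
class PvArray:  # hypothetical stand-in for PhotovoltaicArray
    peak_power_kw: float
    orientation: str
    elevation_deg: float
    overshading: int


def pv_arrays(
    peak_power_kw: Optional[float],
    orientation: Optional[str],
    elevation_deg: Optional[float],
    overshading: Optional[str],
) -> Optional[List[PvArray]]:
    """Build a one-array list only when all four values were extracted."""
    if None in (peak_power_kw, orientation, elevation_deg, overshading):
        return None
    code = OVERSHADING_TO_RDSAP.get(overshading)
    if code is None:  # assumption: unknown label is treated as "PV absent"
        return None
    return [PvArray(peak_power_kw, orientation, elevation_deg, code)]


# Cert 9501 values from this message.
cert_9501 = pv_arrays(2.36, "South-West", 45, "None Or Little")
```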

Effect on cert 9501 Summary path:
- sap_continuous 59.2585 → 68.7577 (target 68.5252; Δ +0.23)
- total_fuel_cost £1099 → £843 (worksheet £849; -£6 over-credit)
- ECF 2.92 → 2.24 (worksheet 2.26; -0.02 over-credit)

The remaining +0.23 SAP / -£6 cost drift is a precision gap in the
Appendix M cost-offset cascade for measured PV (not a missing-data
gap); next slice closes it to 1e-4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
ccef01bf27 Slice 99c: Elmhurst mapper — RR gables external for flats + SO wall code
Cert 9501 worksheet line (29a) lodges both RR gable walls (13.50 +
15.95 m²) as EXTERNAL walls at U=1.7 (the main-wall U for age B
Solid Brick), contributing +50.07 W/K on top of the 168.74 W/K main-
wall HLC for a (29a) total of 218.81 W/K. Two mapper gaps blocked
this:

1. The Summary mapper defaulted un-typed RR gable walls
   (`surface.gable_type=None`) to `gable_wall` (party, U=0.25 per
   RdSAP Table 4 row 2). For flats with RR — top-floor dwellings
   that sit at the end of a building block with no neighbour above
   — the gable walls are exposed external, not party. Threading
   `is_flat=property_type.lower()=='flat'` through
   `_map_elmhurst_building_parts` → `_map_elmhurst_room_in_roof` →
   `_map_elmhurst_rir_surface` switches the default for un-typed
   gables on flats to `gable_wall_external` (cascade falls through
   to main-wall U `uw`).

2. The Elmhurst wall-construction code map was missing "SO Solid
   Brick" (newer Elmhurst PDF variant; the cohort certs lodge "SB
   Solid Brick"). Cert 9501's main wall fell through to
   wall_construction=None → cascade uw=1.5 (Table-18 unknown-cons
   age-B default) instead of 1.7 (Table-18 solid-brick age-B).
   Added "SO": 3 alongside "SB": 3 — same SAP10 mapping.
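
Illustrative sketch of the default switch in item 1. `rir_gable_kind`
is hypothetical, and passing an explicitly typed gable through
unchanged is an assumption; only the un-typed defaults are taken from
this message.

```python
from typing import Optional


def rir_gable_kind(gable_type: Optional[str], property_type: str) -> str:
    """Pick the surface kind for a room-in-roof gable wall."""
    if gable_type is not None:
        return gable_type  # assumption: a typed gable is left alone
    is_flat = property_type.lower() == "flat"
    return "gable_wall_external" if is_flat else "gable_wall"
```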

Joint effect on cert 9501 Summary path:
- walls HLC 148.89 → 218.81 (exact worksheet match)
- party_walls HLC 7.36 → 0.00 (gables no longer route to party)
- (37) total HLC 229.71 → 296.68 (exact worksheet match)

Cohort regression check: 259 pass / 0 fail across the mapper-chain,
extractor and golden tests. Houses keep the historical
un-typed-gable → party default. Houses lodging "SO" instead of "SB"
now also pick up the correct solid-brick U-value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
e1348c424b Slice 99b: Elmhurst mapper — flat floor-position from floor.location
For flats, `EpcPropertyData.dwelling_type` needs a "Top-floor" /
"Mid-floor" / "Ground-floor" prefix so the cascade's
`_dwelling_exposure` (cert_to_inputs.py) gates floor + roof party-
surface routing correctly per RdSAP 10 §5. Before Slice 99a, the
broken `built_form` ("2.0 Number of Storeys:") meant cert 9501's
`dwelling_type` was "2.0 Number of Storeys: flat" — never matched
any flat-prefix in the cascade, so the cert was treated as a fully-
exposed dwelling (worksheet had floor U=0 / party-ceiling-down, but
cascade routed both as exposed → Δ +9.25 W/K on floor alone). After
99a's empty-attachment fix the prefix was just " flat" — still no
match.

Slice 99b composes the position prefix from the Summary's lodged
floor location + RR presence:
- floor.location lodges "dwelling below" → floor is party
  - + RR present → Top-floor (roof exposed)
  - + no RR → Mid-floor (roof party)
- floor.location doesn't lodge dwelling below → Ground-floor

For cert 9501: floor.location="A Another dwelling below" + RR
present (cert lodges Room-in-Roof with gable walls + flat ceiling).
Resulting `dwelling_type` = "Top-floor flat" — matches the cascade's
`_dwelling_exposure` "top-floor" prefix → has_exposed_floor=False,
has_exposed_roof=True, the worksheet's exposure shape.
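
Illustrative sketch of the prefix composition. Function names are
hypothetical, and the substring match on "dwelling below" is an
assumption about how the lodged floor location is tested.

```python
def flat_position_prefix(floor_location: str, has_room_in_roof: bool) -> str:
    """Derive the floor-position prefix for a flat's dwelling_type."""
    if "dwelling below" in floor_location.lower():
        # Floor is party; the roof is exposed only when an RR is present.
        return "Top-floor" if has_room_in_roof else "Mid-floor"
    return "Ground-floor"


def dwelling_type(
    property_type: str,
    built_form: str,
    floor_location: str,
    has_room_in_roof: bool,
) -> str:
    """Flats get a position prefix; houses keep the historical contract."""
    if property_type.lower() == "flat":
        prefix = flat_position_prefix(floor_location, has_room_in_roof)
        return f"{prefix} flat"
    return f"{built_form} {property_type.lower()}"


# Cert 9501 inputs from this message.
print(dwelling_type("Flat", "", "A Another dwelling below", True))
```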

Houses keep the historical contract: `f"{built_form}
{property_type.lower()}"` — cohort hand-builts and the 2 boiler
chain tests (001479 + 0330) unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
94262e5f6c Slice 98: API path shower-counts + window-rounding → cert 0330 1e-4
Closes the cert 0330 API path Layer 4 gate (Δ -0.000011 vs worksheet
SAP 61.5993) by surfacing two previously-broken inputs to the HW
cascade plus aligning the wall-net-deduction with the worksheet's
2-d.p.-per-window rounding convention.

(a) RdSAP schema 21.0.x `shower_outlets` shape mismatch:
    real-API certs lodge `[{"shower_outlet_type": N, "shower_wwhrs":
    M}, ...]` (a list of bare ShowerOutlet dicts), but the schema
    modelled it as `[ShowerOutlets]` with nested
    `{"shower_outlet": {...}}` wrappers. `from_dict` silently dropped
    every bare element's payload (left `shower_outlet=None`),
    blanking the cascade's mixer/electric counts on cert 0330 (and 4
    other golden fixtures). Normalisation in `from_api_response`
    rewrites the bare list shape to the wrapped form before
    `from_dict` parses, so the schema's `ShowerOutlets` dataclass
    sees the data it expects — no schema-class breakage downstream.

    New helper `_count_shower_outlets_by_type` walks the normalised
    list and counts outlets by integer code:
    - code 1 → mixer (drives `mixer_shower_count`)
    - code 2 → electric (drives `electric_shower_count`)
    Empirically derived from the golden cohort + Summary mapper
    cross-check (cert 0330 lodges code 2 + Summary surfaces "Electric
    shower"; cert 0240 lodges multiple code-1 outlets on a
    conventional oil-boiler + cylinder dwelling). No spec page
    reference found.

    Wired into both `from_rdsap_schema_21_0_0` and
    `from_rdsap_schema_21_0_1`. Effect on cert 0330 API path:
    `mixer_shower_count` 1 (cascade default) → 0;
    `electric_shower_count` None (= 0) → 1; HW kWh 3172.65 → 2111.93.
    SAP Δ +2.1155
    → -0.0012.
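
    A hedged sketch of the normalise-then-count step from (a). The
    function names and signatures here are illustrative stand-ins, not
    the real `from_api_response` or `_count_shower_outlets_by_type`
    code; only the two list shapes and the code 1 / code 2 meaning come
    from the text above.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def normalise_shower_outlets(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap bare outlet dicts as {"shower_outlet": {...}}; keep wrapped ones."""
    return [item if "shower_outlet" in item else {"shower_outlet": item} for item in raw]


def count_shower_outlets_by_type(outlets: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (mixer_count, electric_count) from the wrapped list.

    Code 1 is counted as mixer and code 2 as electric, per the empirical
    mapping described in the commit message.
    """
    counts = Counter(
        wrapper["shower_outlet"]["shower_outlet_type"]
        for wrapper in outlets
        if wrapper.get("shower_outlet")
    )
    return counts[1], counts[2]
```

    A bare list holding a single code-2 outlet then counts as
    (mixer, electric) = (0, 1).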

(b) Per-window 2-d.p. area rounding in wall-net deduction:
    RdSAP 10 §15 rounds per-window area at 2 d.p. before any sum.
    The cascade's `windows_w_per_k_total` branch already rounds
    per-window for the curtain transform; the wall-net deduction
    branch (computing `gross_wall - windows - door` for the (29a)
    line) was rounding the SUM once, which for cert 0330's 9 Main
    windows yields 12.22 m² vs the worksheet's per-window-rounded
    12.23 m² — Δ +0.01 m² × U=1.5 = +0.015 W/K on (29a). Aligned
    both branches to round per-window, matching worksheet line (27).
    SAP Δ -0.0012 → -0.000011.
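
    To illustrate why the two rounding orders in (b) disagree, here is a
    small sketch. The window dimensions are made up for the example
    (they are not cert 0330's), and both function names are
    hypothetical.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (width_m, height_m)


def area_rounded_per_window(windows: List[Window]) -> float:
    """Round each window area to 2 d.p. first, then sum."""
    return round(sum(round(w * h, 2) for w, h in windows), 2)


def area_rounded_once(windows: List[Window]) -> float:
    """Sum the raw areas, then round the total to 2 d.p. once."""
    return round(sum(w * h for w, h in windows), 2)


# Illustrative data: three windows of 1.234 m² each.
example = [(1.234, 1.0)] * 3
```

    For `example`, per-window rounding gives 3 × 1.23 = 3.69 m² while
    rounding the sum gives 3.70 m²: the same 0.01 m² class of
    discrepancy as the 12.22 vs 12.23 m² case above.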

Layer 4 chain test added:
- `test_api_0330_full_chain_sap_matches_worksheet_pdf_exactly` pins
  cert 0330 API path SAP at 1e-4 vs worksheet 61.5993. This is the
  second boiler validation cert with a Layer 4 1e-4 gate (cert
  001479 is the first).

Re-pinned golden cert residuals (shifted by changes (a) and (b)):
- 0300: PE +7.52 → +8.44, CO2 -0.27 → -0.23 (Slice 98a — electric
  shower count surfaced; cert has 1 electric + 1 mixer outlets)
- 2130: PE -38.17 → -38.18, CO2 +0.305 → +0.304 (Slice 98b —
  window rounding edge)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
f57e359f38 Slice 97: API glazing_type=2 → RdSAP 10 Table 24 (DG 2002-2021)
Cert 0330 API path was at Δ +1.68 SAP after Slice 96 because all 11
windows (`sap_windows[*].glazing_type = 2`) fell through
`_API_GLAZING_TYPE_TO_TRANSMISSION` (which only covered codes 3 +
13) to the cascade's `u_window` default (~U=2.5). The cert's actual
glazing is "Double, England/Wales 2002 or later (before 2022)" per
RdSAP 10 Table 24 page 79 → U=2.0, g=0.72 (PVC/wooden frame).

RdSAP 10 Table 24 verbatim:
  Glazing       Installed                       Gap       U-value   g
  Double or     England/Wales: 2002 or later                2.0    0.72
  triple        Scotland: 2003 or later         any
  glazed        N. Ireland: 2006 or later

The cascade's curtain-transform path (`U_eff = 1/(1/U + 0.04)`)
takes U_raw=2.0 to U_eff=1.8519 — matching the worksheet's per-
window (27) U value column to 4 d.p. across all 11 windows.
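
The curtain transform quoted above can be checked in a couple of lines.
This is only a sketch: `effective_u` is a hypothetical name, and the
0.04 m²K/W term is taken directly from the formula in this message.

```python
CURTAIN_RESISTANCE_M2K_PER_W = 0.04  # additive resistance from the formula above


def effective_u(u_raw: float) -> float:
    """Apply U_eff = 1 / (1/U + 0.04) to a raw window U-value."""
    return 1.0 / (1.0 / u_raw + CURTAIN_RESISTANCE_M2K_PER_W)
```

For U_raw = 2.0 this gives 1 / 0.54 ≈ 1.8519, the figure quoted above.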

Effect on cert 0330 API path:
- Windows HLC 36.4545 → 29.7407 (= worksheet exact)
- (37) total fabric heat loss 244.48 → 237.77 (≈ worksheet 237.75)
- SAP Δ +1.68 → +2.12 (windows fix unmasks the standalone HW gap,
  which the next slice closes)

Re-pinned residuals (5 affected golden certs):
- 0240: PE +17.85 → +15.69; CO2 +1.01 → +0.90; SAP unchanged at -15
- 0300: PE +7.76 → +7.52; CO2 -0.25 → -0.27; SAP unchanged at +0
- 0390-2954: PE -26.46 → -28.68; CO2 -2.56 → -2.76; SAP unchanged
- 7536: SAP +0 → +1; PE -3.45 → -6.51; CO2 -0.09 → -0.17
- 8135: PE -2.41 → -5.31; CO2 -0.02 → -0.07; SAP unchanged at +0

The PE/CO2 widening on some certs (vs lodged GOV.UK values) reflects
the cascade now using the spec table U=2.0 where those certs may have
lodged a higher project-specific U. The spec table is the right
baseline for the API path; per-window measured U overrides would belong
on the cert's window_transmission_details.u_value field, which the
API JSON doesn't surface uniformly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:28:45 +00:00
Khalim Conn-Kowlessar
09fb6f1b73 fix: address 22 project-wide test failures from previous sweep
Three orthogonal issues surfaced by the full project test sweep:

1. Dockerfile.test: install poppler-utils alongside postgresql.
   The 20× `pdfinfo: No such file or directory` failures in
   test_summary_pdf_mapper_chain.py traced to the CI test image
   missing the poppler-utils system package (pdfinfo + pdftotext).
   `_summary_pdf_to_textract_style_pages` shells out to these for
   layout-preserving PDF text extraction. Pure-Python alternatives
   (pymupdf, pypdf) don't reproduce pdftotext -layout's row-major
   table cell ordering, which the Elmhurst Summary extractor depends
   on. So system poppler is the right fix; added to apt-get install
   with an explanatory comment.

2. test_from_rdsap_schema.py::test_total_floor_area: expected 55.0,
   got 45.82. Slice 95 (commit f502db8c) changed the API mapper to
   compute total_floor_area_m2 from the precise sum of per-bp
   sap_floor_dimensions[*].total_floor_area rather than the lodged
   scalar. The synthetic 21_0_1.json fixture has lodged
   total_floor_area=55 + a single fd of 45.82 (per-bp sum doesn't match
   lodged).
   Updated the expected to 45.82 with a comment explaining the
   Slice 95 per-bp-sum precedence.

3. test_elmhurst_end_to_end.py::test_emitter_temperature: expected
   "Unknown", got int 1. Pre-existing failure (confirmed by checking
   out commit 985a59e1 and reproducing).
   `_elmhurst_emitter_temperature_int` in datatypes/epc/domain/mapper.py
   converts the Elmhurst Summary §14 "Design flow temperature: Unknown"
   to SAP10.2 Table 4d code 1 (high-temp / ≥45 °C, worst-case for
   unmeasured boilers). The int encoding mirrors the API mapper's
   MainHeatingDetail.emitter_temperature for cross-mapper field parity.
   Test updated to expect 1 (with comment) since the conversion is the
   correct production behaviour.

Verified:
- Layer 4 1e-4 gate
  (test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly) still
  GREEN.
- Wider domain sweep (domain/sap10_calculator + domain/sap10_ml):
  1654 passed / 20 failed, exact pre-fix baseline.
- All three originally-failing tests now PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:34:51 +00:00
Khalim Conn-Kowlessar
68401c517a refactor: lift-and-shift packages/domain/src/domain/ml → domain/sap10_ml
Sibling migration to the sap10_calculator move — `domain.ml` now lives
at the root-level layout (`domain/sap10_ml/`) matching the pattern
already used by `domain.addresses`, `domain.tasks`, `domain.postcode`,
and `domain.sap10_calculator`.

Changes:

- `git mv packages/domain/src/domain/ml → domain/sap10_ml` (19 files;
  history preserved).
- Subpackage rename: `domain.ml` → `domain.sap10_ml`. 32 references
  rewritten across .py and .md files: 11 internal + 21 external
  (datatypes/epc/domain/mapper.py, 14 files in domain/sap10_calculator,
  2 backend tests, 2 ADRs, 1 README, 1 design doc).
- Path-string updates: `pytest.ini` testpath
  `packages/domain/src/domain/ml/tests` → `domain/sap10_ml/tests` so
  ML tests stay in the default auto-discovered sweep. `CONTEXT.md`
  also updated.

`packages/domain/src/domain/` is now empty — the workspace `domain/`
tree has been fully migrated. Together with the `domain/__init__.py`
deletions from the sap10_calculator commit (29ac35cc), `domain` is
now a single root-level namespace package with subpackages
{addresses, sap10_calculator, sap10_ml, tasks} + the standalone
`postcode.py` module.

Verified:

- Focused sweep (backend mapper-chain + sap10_calculator worksheet
  e2e + golden fixtures): 99 passed / 19 failed — identical baseline.
- Wider sweep (all sap10_calculator + sap10_ml): 1654 passed / 20
  failed (same pre-existing failures).
- domain/sap10_ml/tests: 210/210 PASSED at new path.
- Pyright net-zero: heat_transmission.py 13, cert_to_inputs.py 35,
  mapper.py 33, rdsap_uvalues.py 1 (all unchanged from baseline).

Note: `packages/domain/pyproject.toml` still declares
`packages = ["src/domain"]` for the hatchling wheel — that target
directory is now empty and the wheel build is effectively a no-op.
Retiring the workspace package or repointing the wheel is a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 13:01:35 +00:00
Khalim Conn-Kowlessar
29ac35ccbe refactor: lift-and-shift packages/domain/src/domain/sap → domain/sap10_calculator
Migration of the SAP 10.2 calculator package from the uv-workspace
src-layout (`packages/domain/src/domain/sap`) to the root-level layout
(`domain/sap10_calculator`), matching the pattern already used by
`domain.addresses` / `domain.tasks` / `domain.postcode`.

Changes:

- `git mv packages/domain/src/domain/sap → domain/sap10_calculator`
  (92 files; git auto-detected all as renames so blame/history is
  preserved).
- Subpackage rename: `domain.sap` → `domain.sap10_calculator`. 48
  Python files rewritten (`from domain.sap.X` →
  `from domain.sap10_calculator.X`); zero remaining `domain.sap` refs
  after the sed pass.
- Path-string updates: 3 .py files (test fixtures + xlsx loader) +
  6 markdown docs (CONTEXT.md, 2 ADRs, 3 sap-spec docs,
  sap10_calculator/README.md) had hard-coded
  `packages/domain/src/domain/sap/...` paths rewritten to
  `domain/sap10_calculator/...`.
- `Path(__file__).parents[N]` rebasing: the old tree was 3 levels
  deeper than the new one (`packages/domain/src/`), so 4× `parents[7]`
  became `parents[4]` and 1× `parents[6]` became `parents[3]` across
  `tables/pcdb/{__init__.py, postcode_weather.py, etl.py}`,
  `worksheet/tests/_xlsx_loader.py`, and `tests/test_pcdb_etl.py`.
- PEP 420 namespace package: deleted both `domain/__init__.py` files
  (root + workspace; each held nothing beyond an empty body or a
  docstring) so
  Python combines `domain.sap10_calculator` (root) and `domain.ml`
  (workspace) into one namespace package. Confirmed via
  `domain.__path__ == ['/workspaces/model/domain',
  '/workspaces/model/packages/domain/src/domain']`. Without this,
  the root `domain/__init__.py` shadowed the workspace one and
  `domain.ml` was unreachable.
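
The `parents[N]` rebasing can be illustrated with `pathlib`. The `/repo`
checkout root and the choice of `etl.py` as the example file are
assumptions made for this sketch; the point is only that removing the
three `packages/domain/src/` levels turns `parents[7]` into `parents[4]`.

```python
from pathlib import PurePosixPath

# Hypothetical checkout root and file locations, for illustration only.
OLD = PurePosixPath("/repo/packages/domain/src/domain/sap/tables/pcdb/etl.py")
NEW = PurePosixPath("/repo/domain/sap10_calculator/tables/pcdb/etl.py")


def levels_removed(old: PurePosixPath, new: PurePosixPath) -> int:
    """Number of directory levels dropped between the old and new layout."""
    return len(old.parents) - len(new.parents)
```

Under these assumed paths, `OLD.parents[7]` and `NEW.parents[4]` both
resolve to `/repo`, and `levels_removed(OLD, NEW)` is 3.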

Verified:

- Focused sweep
  (`backend/documents_parser/tests/test_summary_pdf_mapper_chain.py` +
  `domain/sap10_calculator/worksheet/tests/test_e2e_elmhurst_sap_score.py` +
  `domain/sap10_calculator/rdsap/tests/test_golden_fixtures.py`):
  99 passed / 19 failed, the same counts as pre-refactor. All 19
  failures pre-existing (9 hand-built 001479 + 6 cohort diff + 4 cohort
  chain non-spec).
- Wider sweep (all sap10_calculator + domain.ml): 1654 passed /
  20 failed (the +1 vs the focused sweep is the pre-existing
  `test_roof_insulated_assumed_with_ni_thickness_uses_50mm_per_section_5_11_4`
  which was already failing on the previous baseline).
- Pyright net-zero on the three load-bearing baselines:
  `heat_transmission.py` 13, `cert_to_inputs.py` 35, `mapper.py` 33.

Lift-and-shift only — no semantic renames (`Sap10Calculator` stays
`Sap10Calculator`), no testpaths edits in pytest.ini (sap tests
continue to be invoked by explicit pytest paths).

Note: `domain.ml` still lives at `packages/domain/src/domain/ml/`.
Migrating it would close out the dual-`domain/` layout but is
out of scope for this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 12:22:37 +00:00
Khalim Conn-Kowlessar
f502db8c74 Slice 95: API mapper TFA from per-bp dims + window area 2dp rounding — cert 001479 to 1e-4
The end-to-end production cascade `from_api_response → cert_to_inputs →
calculate_sap_from_inputs` now hits cert 001479's worksheet continuous
SAP 69.0094 at abs < 1e-4 (was +0.000584). Two fixes:

1. API mapper: `from_rdsap_schema_21_0_{0,1}` computes
   `total_floor_area_m2` as Σ per-bp
   `sap_floor_dimensions[*].total_floor_area.value` (cert 001479:
   30.45+30.77+5.37+1.92 = 68.51), not the lodged scalar (rounded
   integer 69). `water_heating_from_cert` reads
   `epc.total_floor_area_m2` directly for occupancy N (Appendix J),
   which propagates to HW kWh (+6.31 → ~0), Appendix L lighting
   (+0.98 → 0), and internal
   gains (+25.72 W·months → 0).
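
   A minimal sketch of the per-bp sum, using the four cert 001479 areas
   quoted above. `total_floor_area_from_parts` is a hypothetical name;
   whether the real mapper rounds the sum is not stated here, so the
   sketch leaves it unrounded.

```python
import math
from typing import Iterable


def total_floor_area_from_parts(per_bp_areas_m2: Iterable[float]) -> float:
    """Sum per-building-part floor areas without rounding to an integer."""
    return math.fsum(per_bp_areas_m2)


# Per-bp areas quoted for cert 001479.
CERT_001479_AREAS_M2 = [30.45, 30.77, 5.37, 1.92]
```

   The sum is 68.51 m², whereas rounding it to an integer reproduces the
   lodged scalar of 69.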

2. Cascade window area rounding per RdSAP 10 §15 "Rounding of data"
   (p.66): "All element areas (gross) including window areas: 2 d.p."
   `solar_gains.py` and `internal_gains.py` now round `w * h` to 2 d.p.
   to match the existing `heat_transmission.py` pattern (line 344).
   Closes the residual solar gains delta (+1.50 W·months → 0) that
   became dominant once TFA was fixed.

Re-pinned 5 golden cert residuals where TFA + area rounding shifted
output: 0240 (SAP -14→-15, PE +14.6650→+17.8450, CO2 +0.8060→+1.0097),
6035 (PE +48.2971→+49.5139, CO2 +1.1016→+1.1423), 8135 (PE -2.4194→
-2.4072, CO2 -0.0198→-0.0195), 2130 (PE -38.1521→-38.1666), 0390
(PE +1.6837→+1.6962, CO2 +0.0637→+0.0639).

New test:
`test_api_001479_full_chain_sap_matches_worksheet_pdf_exactly`
formalises Layer 4 of the validation stack as a 1e-4 gate.

Pyright net-zero (mapper.py 33).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 09:30:41 +00:00
Khalim Conn-Kowlessar
0320341837 Slice 94: API mapper sheltered_sides + floor_type — cert 001479 to 1e-3
Two API mapper gaps account for the +1.18 SAP gap left on cert 001479
after Slice 93:

(1) `SapVentilation.sheltered_sides` from API `built_form`

The API schema doesn't lodge sheltered_sides as a discrete field —
it's derived per RdSAP §S5 from the dwelling's built_form. The
cascade defaults to 2 when missing (right for Mid-Terrace) but wrong
for detached/semi/end-terrace. Cert 001479 (built_form=2 Semi-
Detached) needs 1 sheltered side; default 2 over-counted shelter
factor → line (21) under by 0.185 → ventilation under by ~2 ACH/yr.

New `_api_sheltered_sides` translator +
`_API_BUILT_FORM_TO_SHELTERED_SIDES` table (1=Detached/0, 2=Semi/1,
3=End-T/1, 4=Mid-T/2, 5=Encl-End/2, 6=Encl-Mid/3), which mirrors the
cohort Elmhurst `_ELMHURST_SHELTERED_SIDES_BY_BUILT_FORM` but is keyed
by the API integer
enum.
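
The table above can be sketched as a plain dict lookup. The names below
drop the leading underscore to mark them as illustrative stand-ins, and
returning None for an unknown code (so the cascade default applies) is
an assumption rather than confirmed behaviour.

```python
from typing import Optional

# API built_form integer -> sheltered sides, as listed in the table above.
API_BUILT_FORM_TO_SHELTERED_SIDES = {
    1: 0,  # Detached
    2: 1,  # Semi-Detached
    3: 1,  # End-Terrace
    4: 2,  # Mid-Terrace
    5: 2,  # Enclosed End-Terrace
    6: 3,  # Enclosed Mid-Terrace
}


def api_sheltered_sides(built_form: Optional[int]) -> Optional[int]:
    """Translate the API built_form code; None leaves the cascade default."""
    if built_form is None:
        return None
    return API_BUILT_FORM_TO_SHELTERED_SIDES.get(built_form)
```

For cert 001479 (built_form=2) this returns 1 rather than the default 2.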

(2) `SapBuildingPart.floor_type` from API `floor_heat_loss`

The Slice 87 spec rule for §2(12) suspended-timber-floor infiltration
(`_has_suspended_timber_floor_per_spec` in cert_to_inputs) requires
the Main bp's lowest floor to have `floor_type == "Ground floor"` to
apply the (12)=0.2/0.1 rule. The API mapper wasn't surfacing this
string (only floor_construction_type), so the spec rule short-
circuited to False even for genuine ground floors and the cascade's
line (12) was 0.0 instead of 0.2.

New `_api_floor_type_str` translator +
`_API_FLOOR_HEAT_LOSS_TO_FLOOR_TYPE` table (1="To external air" for
cantilevered exposed floors, 7="Ground floor"). Routes correctly for
cert 001479: Main +
Ext1 carry floor_heat_loss=7 → both Ground floor; Ext2 carries
floor_heat_loss=1 → exposed (its is_exposed_floor=True already lifts
the floor U cascade to Table 20).

**Result on cert 001479 API path:**
  SAP delta: +1.18 → +0.0006 (essentially exact match at integer SAP)
  Cascade SAP=69.0100 vs worksheet 69.0094 — within 1e-3 of target.

The remaining ~0.001 SAP gap is dominated by:
  - hot_water_kwh_per_yr: +6.7 (API 2365.0 vs target 2358.3)
  - internal_gains Σ: +25.7 W·months (subtle gain-cascade differences)
  - solar_gains Σ: +1.5 W·months
Sub-1e-3 SAP impact each; would need slice-by-slice diagnosis to
close to the strict 1e-4 bar.

Layer 3 API-mapper-vs-Summary-mapper EpcPropertyData equivalence:
the API path now produces SAP within 0.001 of the Summary path
(Summary Layer 2 = 69.0094 EXACT). API integer SAP = 69 = worksheet
integer SAP = 69 ✓, matching the API's published
energy_rating_current=69 (zero residual on the production goal metric).

Golden cert residuals: 8 of 10 expectations shifted by Slices 90-94
cascade improvements. Spec-compliance shifts; new residuals pinned.

Pyright: mapper.py 33 → 33.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 08:27:10 +00:00
Khalim Conn-Kowlessar
7281b7b300 Slice 93: API mapper window_transmission_details from glazing_type
The API schema lodges `glazing_type` (int code) per window but
`window_transmission_details=None` and `frame_factor=None`. Without
per-window U lodgement the cascade falls back to a single global
`u_window(None,None,None)=2.5` × total area, which over-shot cert
001479's window W/K by +2.63 (cascade 46.23 vs worksheet 43.60).

Fix: `_API_GLAZING_TYPE_TO_TRANSMISSION` lookup translates
`glazing_type` → (u_value, solar_transmittance, frame_factor) and
the mapper populates `WindowTransmissionDetails` + `frame_factor`
per window so the cascade uses its per-window U fast path (each
window contributes A × U_eff_individual rather than total_area ×
U_eff_global). Two codes mapped now:

  3  → DG pre-2002        U=2.8  g=0.76  FF=0.70
  13 → DG post-2022 Argon U=1.4  g=0.72  FF=0.70

Cert 001479 lodges 8 Main windows at glazing_type=3 + 1 Ext1 window
at glazing_type=13 — exactly the manufacturer-lodged worksheet
values. The cascade now matches the worksheet's
`Windows 1: 13.96 × 2.518 = 35.15 W/K` and
`Windows 2: 6.37 × 1.3258 = 8.45 W/K` → **windows W/K EXACT 43.5962**.
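
A hedged sketch of the per-window path, combining the two lookup rows
above with the `1/(1/U + 0.04)` curtain transform. The names are
illustrative, and the two entries fed in are the worksheet's aggregated
areas (13.96 m² and 6.37 m²) rather than the nine individual windows.

```python
from typing import Dict, List, Tuple

# glazing_type -> (u_value, solar_transmittance, frame_factor), per the rows above.
API_GLAZING_TYPE_TO_TRANSMISSION: Dict[int, Tuple[float, float, float]] = {
    3: (2.8, 0.76, 0.70),   # DG pre-2002
    13: (1.4, 0.72, 0.70),  # DG post-2022 Argon
}


def effective_u(u_raw: float) -> float:
    """Curtain transform: U_eff = 1 / (1/U + 0.04)."""
    return 1.0 / (1.0 / u_raw + 0.04)


def windows_w_per_k(windows: List[Tuple[float, int]]) -> float:
    """Sum area x U_eff over (area_m2, glazing_type) pairs."""
    total = 0.0
    for area_m2, glazing_type in windows:
        u_raw, _g, _frame_factor = API_GLAZING_TYPE_TO_TRANSMISSION[glazing_type]
        total += area_m2 * effective_u(u_raw)
    return total
```

Calling `windows_w_per_k([(13.96, 3), (6.37, 13)])` lands within rounding
of the 43.5962 W/K worksheet total.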

**Cert 001479 API path: fabric heat loss is now COMPLETELY EXACT
across all 6 components** (walls/party/roof/floor/windows/doors all
match worksheet at the worksheet's 4 d.p. precision).

Total fabric:           139.4957 W/K  ✓ (was 122.6130 before Slice 87)
  walls:                 39.7652 ✓
  party walls:           17.0700 ✓
  roof:                  10.3438 ✓
  floor:                 23.1705 ✓
  windows:               43.5962 ✓
  doors:                  5.5500 ✓

API SAP delta progression through Slices 87-93:
  Slice 87 baseline:     +3.0752
  After Slice 90:        +1.5298  (party walls)
  After Slice 91:        +1.0970  (descriptive strings + roof desc)
  After Slice 92:        +1.0022  (floor dims)
  After Slice 93:        +1.1846  (windows — fabric now EXACT)

The +1.18 SAP gap is now PURELY non-fabric: candidates are internal
gains, solar gains, ventilation, MIT, or hot water cascade — to
diagnose in the next slice.

Golden cert residuals updated for the cascade improvements. Pyright
net-zero on mapper.py (33 → 33).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 08:18:33 +00:00
Khalim Conn-Kowlessar
8e752e5720 Slice 92: API mapper floor dimensions (SAP +0.25m + exposed-floor + NI→None)
Three coupled API-mapper fixes that close the cert 001479 floor-W/K
gap from +4.39 to EXACT 0.

(1) Upper-floor room_height_m += 0.25 m

SAP 10.2 convention: every storey above the lowest adds 0.25 m to the
lodged room_height for the joist/floor-void contribution (cohort
Elmhurst mapper already applies this via `_UPPER_FLOOR_HEIGHT_ADD_M`
at line 2338). The API schema lodges the raw internal height; the
cascade volume computation needs the +0.25 m before computing
party-wall area and ventilation ACH. For cert 001479 Main floor=1, the
raw lodged height is 2.28 m vs the worksheet's 2.53 m; without the fix,
party W/K was
short by 0.87 (party_wall_length × delta_height × U).
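
A minimal sketch of the height adjustment. `sap_room_height_m` is a
hypothetical name; the constant mirrors the 0.25 m figure above, and
treating storey index 0 as the lowest storey is an assumption.

```python
UPPER_FLOOR_HEIGHT_ADD_M = 0.25  # joist/floor-void allowance for upper storeys


def sap_room_height_m(lodged_height_m: float, storey_index: int) -> float:
    """Add 0.25 m to every storey above the lowest (index 0)."""
    if storey_index > 0:
        return lodged_height_m + UPPER_FLOOR_HEIGHT_ADD_M
    return lodged_height_m
```

For the cert 001479 Main upper floor, 2.28 m lodged becomes 2.53 m; the
lowest storey is left unchanged.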

(2) `is_exposed_floor=True` when `bp.floor_heat_loss == 1`

API integer code 1 on `floor_heat_loss` signals an exposed floor (a
bp's lowest storey hanging over an unheated space or external air).
Mirrors the cohort Elmhurst mapper's
`_is_floor_exposed_to_unheated_space` for the API path. Applied only to
the lowest storey (floor==0)
per the cohort 000490/000487 fixture convention. For cert 001479
Ext2 (cantilevered upper-storey extension over external air), this
routes the cascade through Table 20's `u_exposed_floor` (U=1.20)
rather than the BS EN ISO 13370 ground-floor formula.

(3) `floor_insulation_thickness="NI" → None` for cascade default

API certs commonly lodge "NI" (no measured thickness) on floors that
aren't actually uninsulated — for newer age bands (I-M with non-zero
Table 19 defaults: 25/75/100/100/140 mm) the cascade should use the
age-band default insulation rather than treating "NI" as explicit
zero. Translate "NI" → None at the mapper boundary so `u_floor`
reaches the Table 19 fallback. For cert 001479 Ext1 (age M, suspended
timber, NI lodged) the cascade now returns U=0.20 via the age-M
140 mm default — previously gave U=1.05 from treating thickness as 0.
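
The mapper-boundary translation in (3) amounts to a one-line guard. The
function name below is hypothetical, and the case-insensitive,
whitespace-tolerant match is an assumption of this sketch.

```python
from typing import Optional


def api_floor_insulation_thickness(lodged: Optional[str]) -> Optional[str]:
    """Map a lodged "NI" to None so the age-band default can apply."""
    if lodged is None:
        return None
    return None if lodged.strip().upper() == "NI" else lodged
```

Any other lodged value passes through untouched, so explicit
thicknesses still take effect.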

**Floor W/K is now EXACT for cert 001479** (23.1705 ✓).

Impact on cert 001479 API path:
  Before Slice 87: +3.0752 SAP delta
  After  Slice 90: +1.5298
  After  Slice 91: +1.0970
  After  Slice 92: +1.0022 (floor W/K exact; remaining gap is in
                            windows / gains — Slice 93)

Golden cert residual updates: 7 of 10 expectations shifted from the
floor cascade improvements (NI→None changed many certs with age I-M
extensions). Spec-compliance shifts; new residuals committed.

Pyright: mapper.py 33 → 33.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 08:09:28 +00:00
Khalim Conn-Kowlessar
2cebba28dc Slice 91: API mapper descriptive strings + roof description per-bp fix
Three tightly-coupled fixes that close another big chunk of cert
001479's API-path SAP gap.

(1) Surface human-readable strings on SapBuildingPart from API ints

The API mapper sets `bp.floor_construction_type` and
`bp.roof_construction_type` strings via int→string lookups so the cascade
fixes from Slices 88 + 89 also apply to the API path:
  - `_API_FLOOR_CONSTRUCTION_TO_STR`: 1=Solid, 2=Suspended timber
    (drives `u_floor`'s suspended-branch selection)
  - `_API_ROOF_CONSTRUCTION_TO_STR`: 1=Flat, 3=Pitched no-loft,
    4=Pitched-access-to-loft, 5=Vaulted, 8=Pitched-sloping-ceiling
    (drives the cos(30°) inclined-surface factor)

(2) Pre-1950 PS sloping ceiling → thickness=0 (port Slice 57)

`_api_resolve_sloping_ceiling_thickness` mirrors Slice 57's Elmhurst-
mapper logic: when a PS pitched-sloping-ceiling roof (API code 8)
carries no insulation thickness on a pre-1950 dwelling (age bands
A-D), set thickness=0 so the cascade returns the uninsulated U=2.30
rather than the age-band-default (e.g. U=0.40 for age C).

(3) Cascade: per-bp `roof_thickness=0` overrides global "insulated"
description

For cert 001479 the API's `epc.roofs` carries two descriptions
(Main's "Pitched, 300mm loft insulation" + Ext1's "Pitched,
insulated") which the cascade joined into a global
`roof_description`. `u_roof`'s Table 18 footnote (2) ("assumed
insulation if described as insulated") then incorrectly upgraded
Ext2's explicitly-uninsulated thickness=0 to ins_mm=50 → U=0.68
instead of 2.30. Fix: in `heat_transmission.py` per-bp roof loop,
drop `roof_description` when the per-bp `roof_thickness` is
explicitly 0. The per-bp thickness lodgement is the authoritative
signal; the global description is for cases where no thickness was
lodged at all.

Impact on cert 001479 API path (cumulative through Slice 91):

  Before Slice 87: +3.0752 SAP delta
  After  Slice 90: +1.5298 (party wall enum fix)
  After  Slice 91: +1.0970 (descriptive strings + roof desc fix)

Roof W/K is now EXACT for cert 001479 (10.3438 = worksheet target).

Golden cert residual updates: 8 of 10 expectations shifted by
Slices 87-91 cascade improvements:
  0240: SAP -10→-13, PE -2.05→+10.45, CO2 -0.04→+0.59
  6035: SAP  -4→ -5, PE +34.02→+34.50, CO2 +0.76→+0.77
  7536: SAP  +3→ +2, PE -22.53→-15.83, CO2 -0.60→-0.42
  8135: SAP unchanged, PE -16.51→-16.37, CO2 unchanged
  2130: SAP unchanged, PE -51.90→-51.10, CO2 +0.14→+0.15
  0240/6035/7536: spec-compliance shifts (more accurate U-values
    move further from the assessor's lodged SAP, because the
    assessor's SAP was itself produced with the same incorrect
    paths the cascade previously matched).

Pyright: mapper.py 33 → 33; heat_transmission.py 13 → 13;
test_golden_fixtures.py 0 → 0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:41:34 +00:00
Khalim Conn-Kowlessar
fbbdca49ca Slice 90: API mapper translates party_wall_construction → SAP10 enum
The GOV.UK API `party_wall_construction` field uses a different enum
from the regular `wall_construction` field. RdSAP 10 Table 15 (p.31
"U-values of party walls") defines 5 categories, which the API encodes
as integer codes 1..5, with 0 for "no party wall" and an "NA" string
for extensions without a party wall. The cascade's `u_party_wall`
consumes the SAP10 `wall_construction` enum directly, so passing the
raw API code gave wildly wrong U-values: API code 2 ("Cavity masonry
unfilled") should produce U=0.5, but the cascade interpreted code 2 as
SAP10 WALL_STONE_SANDSTONE → 0.0 W/m²K.

Impact on cert 001479 (the only golden fixture with party=2 lodged):

  Before: party_walls = 0.00 W/K (cascade applied U=0.0)
  After:  party_walls = 16.21 W/K (cascade applies U=0.5)

  API mapper → cascade SAP delta:
  Before Slice 90: +3.0752
  After  Slice 90: +1.5298

The remaining party-wall shortfall (16.21 vs target 17.07 W/K, -0.87
W/K) is the room_height_m +0.25 SAP convention not yet applied to
the API path — Slice 92 will close that.

Translation table (per `_API_PARTY_WALL_CONSTRUCTION_TO_SAP10`):
  0 → None (no party wall present; party_wall_length=0 anyway)
  1 → SAP10 code 3 (Solid Brick) → u_party_wall = 0.0
  2 → SAP10 code 4 (Cavity)      → u_party_wall = 0.5
  3 → SAP10 code 4 (Cavity)      → cascade emits 0.5 (TODO: 0.2 for
                                    cavity filled needs cascade extension)
  4 → None (Unable, house)       → u_party_wall default 0.25
  5 → None (Unable, flat)        → TODO: spec says 0.0 for flats
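
The translation table can be sketched as two small lookups. Everything
named here is an illustrative stand-in for the real
`_API_PARTY_WALL_CONSTRUCTION_TO_SAP10` and `u_party_wall`; the U-values
are the ones listed in the table above, and the two TODO cases are
deliberately left unresolved.

```python
from typing import Optional, Union

# API party_wall_construction code -> SAP10 wall_construction code (or None).
API_PARTY_WALL_TO_SAP10 = {0: None, 1: 3, 2: 4, 3: 4, 4: None, 5: None}

# SAP10 code -> party-wall U-value in W/m²K, per the table above.
SAP10_PARTY_WALL_U = {3: 0.0, 4: 0.5}
DEFAULT_PARTY_WALL_U = 0.25


def translate_party_wall(api_code: Union[int, str, None]) -> Optional[int]:
    """Translate an API code; "NA" strings and unknown codes become None."""
    if api_code is None or isinstance(api_code, str):
        return None
    return API_PARTY_WALL_TO_SAP10.get(api_code)


def party_wall_u(sap10_code: Optional[int]) -> float:
    """Look up the party-wall U-value, falling back to the 0.25 default."""
    if sap10_code is None:
        return DEFAULT_PARTY_WALL_U
    return SAP10_PARTY_WALL_U[sap10_code]
```

For cert 001479 (API code 2) the chain gives U=0.5 instead of the 0.0
produced by passing the raw code through.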

Schema change: `SapBuildingPart.party_wall_construction` is now
`Optional[Union[int, str]]` (was `Union[int, str]`) — the "0 sentinel
for Unable" convention was already in cohort hand-builts but the type
forbade the cleaner `None` representation. To preserve the dataclass
"no-default after default" rule, `sap_floor_dimensions` gets a
`field(default_factory=list)`.

Translation applied across all 6 from_rdsap_schema_* mappers + the
flagship `from_rdsap_schema_21_0_1` used by 001479.

Pyright: mapper.py 35 → 33 (cleared 7 cohort party_wall type errors
that were pre-existing, balanced against the schema change). Cohort
cascade pins remain GREEN (66 of 66); no new test regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:21:52 +00:00
Khalim Conn-Kowlessar
006e9842c9 Slice 89: PS pitched-sloping-ceiling roof area uses inclined surface
RdSAP 10 §3.8 "Roof area" spec:
  "Roof area is the greatest of the floor areas on each level...
   In the case of a pitched roof with a sloping ceiling, divide the
   area so obtained by cos(30°)."

The cascade previously used `top_floor_area_m2` (horizontal projection)
verbatim for the roof area calculation — correct for flat roofs and
pitched-with-loft (where assessors measure on the horizontal), but
~15% under-area for PS pitched-sloping-ceiling roofs (1/cos(30°) =
1.1547). For cert 001479 Ext1 + Ext2 (both PS sloping ceiling):

  Ext1: cascade 5.37 m² × 0.15 = 0.81 W/K
        worksheet 6.20 m² × 0.15 = 0.93 W/K  (delta -0.12)
  Ext2: cascade 1.92 m² × 2.30 = 4.42 W/K
        worksheet 2.22 m² × 2.30 = 5.11 W/K  (delta -0.69)
  Total roof W/K shortfall: -0.81

Fix: detect PS pitched-sloping-ceiling roofs via
`bp.roof_construction_type` (string lodgement from the Summary §8
"Roof Type" line) and
apply the 1/cos(30°) inclination factor before rounding the gross
roof area.
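
A hedged sketch of the inclination factor. `gross_roof_area_m2` is a
hypothetical name, and detecting the roof type via a "sloping ceiling"
substring is an assumption about how the string lodgement is matched.

```python
import math
from typing import Optional

INCLINATION_FACTOR = 1.0 / math.cos(math.radians(30))  # about 1.1547


def gross_roof_area_m2(top_floor_area_m2: float, roof_construction_type: Optional[str]) -> float:
    """Scale the horizontal area by 1/cos(30°) for sloping-ceiling roofs,
    then round the gross area to 2 d.p."""
    is_sloping = (
        roof_construction_type is not None
        and "sloping ceiling" in roof_construction_type.lower()
    )
    area = top_floor_area_m2 * INCLINATION_FACTOR if is_sloping else top_floor_area_m2
    return round(area, 2)
```

With the cert 001479 figures, 5.37 m² becomes 6.20 m² (Ext1) and 1.92 m²
becomes 2.22 m² (Ext2); a roof with no lodged type keeps its horizontal
area.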

Schema addition:
`SapBuildingPart.roof_construction_type: Optional[str] = None` mirrors
the existing `floor_construction_type`. Mapper
populates it via `_strip_code(roof.roof_type)` for both Main and
Extension bps — the Elmhurst Summary lodges the roof type
explicitly (e.g. "PS Pitched, sloping ceiling" / "PA Pitched (slates
/tiles), access to loft" / "Flat").

**Result: cert 001479 Summary → mapper → cascade now lands at SAP
69.0094 EXACT (delta -0.0000) — Layer 2 GREEN at 1e-4.** Full fabric
breakdown matches the worksheet exactly:
  fabric_heat_loss = 139.4957 W/K  ✓
    walls   = 39.7652 ✓  party   = 17.0700 ✓
    roof    = 10.3438 ✓  floor   = 23.1705 ✓
    windows = 43.5962 ✓  doors   =  5.5500 ✓

Layer 2 status across the 7 cert chain tests:
  000477  GREEN (was GREEN)
  000516  GREEN (was GREEN)
  001479  GREEN (new — was +1.19 before Slice 87)
  000474  RED   -0.7524 (Elmhurst (12) non-spec — orthogonal)
  000480  RED   -1.0273 (Elmhurst (12) non-spec — orthogonal)
  000487  RED   +0.4834 (Elmhurst (12) non-spec — orthogonal)
  000490  RED   -1.1042 (Elmhurst (12) non-spec — orthogonal)

Cohort cascade pins remain GREEN (66 of 66) — hand-built fixtures
have roof_construction_type=None (default) so the new code path is
inert for them; their roofs use RR detailed_surfaces with explicit
areas already.

Pyright net-zero on every touched file (heat_transmission 13 → 13,
mapper 35 → 35, epc_property_data 0 → 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:00:34 +00:00
Khalim Conn-Kowlessar
aff331ff34 Slice 87: implement RdSAP 10 §5 (12) spec rule for suspended timber floor
Replace the empirical `_elmhurst_has_suspended_timber_floor` heuristic
(which keyed on Room-in-Roof < Main ground area) with the mechanical
RdSAP 10 Specification §5 rule (page 29):

  - Age band A-E: U-value < 0.5 → sealed (0.1); retro insulation + no
    U → sealed (0.1); otherwise unsealed (0.2)
  - Age band F-M: sealed (0.1)
  - Park home: unsealed (0.2)
  - Only applies when Main bp's lowest floor is a "Ground floor" with
    "Suspended timber" construction

The spec rule is derived in `_has_suspended_timber_floor_per_spec`
(cert_to_inputs.py) and applied in `ventilation_from_cert` whenever
the lodged `epc.sap_ventilation.has_suspended_timber_floor` is None.
Explicit lodged values (cohort hand-built fixtures) take precedence.
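
One possible reading of the rule list above, as a sketch only. The
function name, parameter names, and the decision to apply the
ground-floor / suspended-timber gate before the park-home branch are all
assumptions; precedence of explicitly lodged values is not modelled.

```python
from typing import Optional

EARLY_AGE_BANDS = {"A", "B", "C", "D", "E"}


def suspended_timber_infiltration(
    age_band: str,
    floor_type: str,
    floor_construction: str,
    floor_u_value: Optional[float] = None,
    retro_insulated: bool = False,
    is_park_home: bool = False,
) -> float:
    """Return the line (12) value: 0.2 unsealed, 0.1 sealed, 0.0 if not applicable."""
    if floor_type != "Ground floor" or floor_construction != "Suspended timber":
        return 0.0
    if is_park_home:
        return 0.2
    if age_band in EARLY_AGE_BANDS:
        if floor_u_value is not None and floor_u_value < 0.5:
            return 0.1
        if retro_insulated and floor_u_value is None:
            return 0.1
        return 0.2
    return 0.1  # age bands F-M
```

An early-age-band dwelling with no U-value and no retrofit insulation
falls through to the unsealed 0.2 value.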

Impact on cert 001479 (the load-bearing API↔Elmhurst parity-test
fixture; previously the RR-based heuristic returned False for this
no-RR semi-detached, dropping (12) entirely):

  Mapper → cascade → SAP delta vs worksheet 69.0094:
    BEFORE: +1.1903 (mapper extracted False; cascade applied (12)=0)
    AFTER : +0.2290 (mapper extracts None; spec derives True/unsealed;
                     cascade applies (12)=0.2 → matches worksheet)

  Cohort cascade pins remain GREEN (66 of 66) — cohort hand-built
  fixtures retain their explicit `has_suspended_timber_floor` values
  which override the spec derivation.

Expected cohort regressions to triage in the next slice:
  - 4 cohort chain tests RED (000474, 000480, 000487, 000490) — their
    Elmhurst worksheets enter non-spec (12) values (0.0 or 0.2 when
    spec predicts the opposite) so the mapper-path cascade now
    diverges from the worksheet PDF at 1e-4.
  - 6 cohort diff tests RED — mapper now produces
    has_suspended_timber_floor=None while the cohort hand-builts
    retain explicit True/False overrides, producing a 1-field
    divergence per cohort cert.

Pyright net-zero (mapper 35→35; cert_to_inputs 35→35) — dead
`_elmhurst_has_suspended_timber_floor` removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 20:29:54 +00:00
Khalim Conn-Kowlessar
6baf66cdde Slice 68: party-wall "U Unable" + central_heating_pump_age_str → 1 diff left
Closes 4 of 5 remaining cohort 000474 diffs (5 → 1):

**Mapper:** Add "U" → 0 to `_ELMHURST_PARTY_WALL_CODE_TO_SAP10`. The
modal cohort lodgement Summary §7 "Party Wall Type: U Unable to
determine" was previously falling through to None; the cohort hand-
built convention uses 0 as the explicit "unknown" sentinel. The
cascade resolves both 0 and None to the same `u_party_wall` default
(0.25), so cascade output is unchanged. Closes 3 diffs (one per bp).

**Hand-built:** Set `central_heating_pump_age_str="Unknown"` on cohort
000474 Main heating detail (post-construction since the helper
doesn't expose the kwarg). Matches the Elmhurst mapper's surfaced
value from Summary §14 "Heat pump age: Unknown" — the str dual-
encoding internal_gains.py reads. Closes 1 diff.

All 66 cohort cascade pins remain GREEN at 1e-4. Pyright 35-error
baseline preserved on mapper.py; 0 errors on the hand-built file.

Remaining 1 diff on cohort 000474:
- `sap_windows: LEN 7 vs 5` — the cohort hand-built collapsed §11
  by glazing-type × orientation × bp group (preserving total area,
  cascade-equivalent but not field-equal); the mapper extracts 1:1
  with the worksheet's 7 §11 table rows. Next slice will expand the
  hand-built to 7 individual SapWindow entries matching the mapper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 17:02:04 +00:00