Slice 50: Summary_000480 chain pins SAP at 1e-4; Room-in-Roof + baths + party-wall + roof-none

Four extractor/mapper extensions, validated by 000480 closing to 1e-4
and by large gap reductions on 000477/000487/000516.

1. Room-in-Roof support. `ElmhurstSiteNotes` gains `RoomInRoof` and
   `RoomInRoofSurface` dataclasses. The extractor parses §8.1 (Flat
   Ceiling / Stud Wall / Slope / Gable Wall / Common Wall) rows with
   their Length × Height, insulation, gable-type and measured-U cells.
   The mapper produces a `SapRoomInRoof` with `detailed_surfaces`
   attached to the Main bp. Stud Walls, Slopes and Flat Ceilings
   route through Table 17 insulation thickness. Gable Walls split
   between `gable_wall` (Party → Table 4 U=0.25) and
   `gable_wall_external` (Sheltered → the surface's lodged Default
   U-value as an override, e.g. 000487 Gable Wall 2 at U=0.86). Empty
   surfaces are dropped: they are 0×0 because the cohort lodges the
   full 5-pair table. Common Walls are also dropped, because the
   cascade's Simplified Type 2 geometry handles them.
   `total_floor_area_m2` now includes the RR floor area.

2. Party-wall construction mapping. 000516 lodges "S Solid masonry /
   timber / system build", which routes to SAP10 wall_construction=3
   (Solid Brick → U=0.0 via Table 4). The previous mapper reused the
   wall-type table from `wall_construction`. That table lacks the "S"
   code, so the lookup fell through to None (cascade default U=0.25).
   Split into a dedicated `_elmhurst_party_wall_construction_int`
   keyed on the party-wall category codes.

3. Roof "None" insulation. When the §8.0 Roofs subsection lodges
   "Insulation N None" without a separate "Insulation Thickness"
   line, treat thickness as 0 mm so the cascade picks Table 16
   row 0 (U=2.30) rather than the age-band default. Closes the
   29 W/K roof-loss gap on 000516.

4. `number_baths` lodgement. `SapHeating.number_baths` now reads
   `survey.baths_and_showers.number_of_baths`. The cascade maps
   `None → has-bath` for the modal UK case. An explicit `0` (lodged
   on 000477/000480, which are rare bathless dwellings) drops the bath
   HW demand line per Table 1b. Closes 000480's last ~0.3 SAP gap.
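Items 3 and 4 follow the same rule. A missing lodgement defers to the cascade default, while an explicit "None" or `0` is honoured. Below is a minimal sketch of both rules, assuming plain strings and ints as inputs. `roof_insulation_thickness_mm` mirrors the extractor change in the diff. `includes_bath_demand` is a hypothetical stand-in for the cascade's Table 1b bath switch, whose real code is not part of this commit.

```python
from typing import Optional


def roof_insulation_thickness_mm(
    thickness_raw: Optional[str], insulation: str
) -> Optional[int]:
    """Mirror of the §8.0 Roofs rule added to the extractor."""
    parts = thickness_raw.split() if thickness_raw else []
    if parts and parts[0].isdigit():
        return int(parts[0])  # an explicit thickness cell always wins
    if insulation.split(" ", 1)[0] == "N":
        return 0  # "N None": surveyor recorded no insulation -> Table 16 row 0
    return None  # nothing lodged: let the cascade's age-band default apply


def includes_bath_demand(number_baths: Optional[int]) -> bool:
    """Hypothetical stand-in for the cascade's Table 1b bath switch."""
    if number_baths is None:
        return True  # not lodged: assume the modal UK dwelling has a bath
    return number_baths > 0  # an explicit 0 drops the bath HW demand line
```

Keeping `None` distinct from `0` end to end lets the cascade tell "not recorded" apart from "recorded as absent".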

Cohort state after this slice (target 1e-4):

  000474   0.0000  ✓ Slice 47
  000477  +1.1161     Elmhurst floor_ach quirk (true vs false despite
                      "T Suspended timber" lodged on all certs)
  000480   0.0000  ✓ THIS SLICE
  000487  +1.1844     extractor still drops most §11 windows on this
                      layout variant
  000490   0.0000  ✓ Slice 49
  000516  +0.1774     roof-window separation by U-value heuristic

3/6 certs now close at 1e-4. Pyright net-zero (35 baseline). 756 tests
pass (added `test_summary_000480_full_chain_sap_matches_worksheet_pdf_exactly`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Khalim Conn-Kowlessar 2026-05-24 21:09:22 +00:00
parent ec4916b5a7
commit 598f04084a
4 changed files with 344 additions and 3 deletions


@@ -15,6 +15,8 @@ from datatypes.epc.surveys.elmhurst_site_notes import (
     PropertyDetails,
     Renewables,
     RoofDetails,
+    RoomInRoof,
+    RoomInRoofSurface,
     Shower,
     SurveyorInfo,
     VentilationAndCooling,
@@ -237,9 +239,17 @@ class ElmhurstSiteNotesExtractor:
         thickness_mm = (
             int(thickness_raw.split()[0]) if thickness_raw and thickness_raw.split()[0].isdigit() else None
         )
+        insulation = self._local_str(lines, "Insulation")
+        # The Summary PDF omits the "Insulation Thickness" line entirely
+        # when no retrofit insulation is lodged (e.g. "Insulation: N None"
+        # on 000516). Treat that case as 0 mm so the cascade picks Table
+        # 16 row 0 (U=2.30) rather than the age-band default; the
+        # surveyor explicitly recorded "None".
+        if thickness_mm is None and insulation.split(" ", 1)[0] == "N":
+            thickness_mm = 0
         return RoofDetails(
             roof_type=self._local_str(lines, "Type"),
-            insulation=self._local_str(lines, "Insulation"),
+            insulation=insulation,
             u_value_known=self._local_bool(lines, "U-value Known"),
             insulation_thickness_mm=thickness_mm,
         )
@@ -269,6 +279,148 @@ class ElmhurstSiteNotesExtractor:
         lines = [l.strip() for l in main_body.splitlines() if l.strip()]
         return self._floor_details_from_lines(lines)
+    # RIR surface row: `<name> <length> <height> [<insulation>] [<ins_type>]
+    # [<gable_type>] <default_u> <known> <u>`. The optional middle cells
+    # vary by surface kind, so we anchor on the head (length, height) and
+    # the tail (default_u, known, u_value) and classify the textual cells
+    # in between by vocabulary. The layout preprocessor splits
+    # multi-space-separated cells onto separate lines, so each row in the
+    # dump spans several lines, one per cell.
+    _RIR_SURFACE_NAMES: tuple[str, ...] = (
+        "Flat Ceiling 1", "Flat Ceiling 2",
+        "Stud Wall 1", "Stud Wall 2",
+        "Slope 1", "Slope 2",
+        "Gable Wall 1", "Gable Wall 2",
+        "Common Wall 1", "Common Wall 2",
+    )
+    def _extract_room_in_roof(
+        self, main_dim_body: str, age_band_text: str
+    ) -> Optional[RoomInRoof]:
+        """Parse the §8.1 Rooms in Roof section for the Main bp. Returns
+        None when no RR is lodged (single-storey or simple loft houses).
+        `main_dim_body` is the Main-property §4 chunk used to pull the
+        RR floor area; `age_band_text` is the §3 raw text holding the
+        "Main Prop. Room(s) in Roof <band>" line."""
+        # RR floor area lives in §4 Dimensions immediately above the
+        # storey floor entries: "Room(s) in Roof: 15.06".
+        m = re.search(r"Room\(s\) in Roof:\s+(\d+(?:\.\d+)?)", main_dim_body)
+        if m is None:
+            return None
+        floor_area = float(m.group(1))
+        if floor_area <= 0:
+            return None
+        section = self._between("8.1 Rooms in Roof:", "9.0 Floors:")
+        if not section.strip() or "Room in roof type" not in section:
+            return None
+        bp_chunks = self._split_section_by_bp(section)
+        main_body = bp_chunks[0][1] if bp_chunks else section
+        lines = [l.strip() for l in main_body.splitlines() if l.strip()]
+        assessment_idx = next(
+            (i for i, l in enumerate(lines) if l == "Assessment"), None
+        )
+        assessment = (
+            lines[assessment_idx + 1] if assessment_idx is not None and assessment_idx + 1 < len(lines) else ""
+        )
+        surfaces: List[RoomInRoofSurface] = []
+        for name in self._RIR_SURFACE_NAMES:
+            try:
+                idx = lines.index(name)
+            except ValueError:
+                continue
+            surfaces.append(self._parse_rir_surface_row(name, lines, idx))
+        # Age band from §3: "Main Prop. Room(s) in Roof B 1900-1929"
+        age_m = re.search(
+            r"Main Prop\. Room\(s\) in Roof\s+([A-M] [^\n]+)", age_band_text
+        )
+        age_band = age_m.group(1).strip() if age_m else None
+        return RoomInRoof(
+            floor_area_m2=floor_area,
+            construction_age_band=age_band,
+            assessment=assessment,
+            surfaces=surfaces,
+        )
+    _RIR_NUMERIC_RE = re.compile(r"^-?\d+(?:\.\d+)?$")
+    _RIR_INSULATION_THICKNESS_RE = re.compile(r"^\d+\s*mm$")
+    def _parse_rir_surface_row(
+        self, name: str, lines: List[str], idx: int
+    ) -> RoomInRoofSurface:
+        """One RR surface row spans the name line followed by ~6-9 tokens
+        depending on which optional cells the surveyor filled. The token
+        order is stable: length, height, [insulation], [ins_type],
+        [gable_type], default_u, u_known, u_value. Numeric cells (length,
+        height, default_u, u_value) are the anchor; everything else is
+        slotted into the appropriate textual field."""
+        # Walk forward until either we exhaust the cell budget or hit
+        # the next RIR row's name marker: the layout dump puts each
+        # numeric / textual cell on its own line and we can't tell
+        # the LAST cell of THIS row from the FIRST cell of the next
+        # without that signal.
+        tokens: List[str] = []
+        scan_end = min(idx + 10, len(lines))
+        for j in range(idx + 1, scan_end):
+            if self._is_next_rir_row(lines[j]):
+                break
+            tokens.append(lines[j])
+        # First two numerics = length, height
+        length = float(tokens[0]) if tokens and self._RIR_NUMERIC_RE.match(tokens[0]) else 0.0
+        height = float(tokens[1]) if len(tokens) > 1 and self._RIR_NUMERIC_RE.match(tokens[1]) else 0.0
+        # Last numeric is u_value; preceding "Yes"/"No" is u_value_known;
+        # the numeric before that is default_u.
+        # Walk from the end backwards looking for the u_value, then known
+        # flag, then default_u.
+        u_value = 0.0
+        u_value_known = False
+        default_u: Optional[float] = None
+        # The known/default_u tail is fairly stable; collect the trailing
+        # tokens and slot by position. The "known" token is "No" or "Yes".
+        rev = list(reversed(tokens[2:]))
+        # rev[0] = u_value, rev[1] = u_value_known, rev[2] = default_u
+        if len(rev) >= 1 and self._RIR_NUMERIC_RE.match(rev[0]):
+            u_value = float(rev[0])
+        if len(rev) >= 2 and rev[1] in ("Yes", "No"):
+            u_value_known = rev[1] == "Yes"
+        if len(rev) >= 3 and self._RIR_NUMERIC_RE.match(rev[2]):
+            default_u = float(rev[2])
+        # Middle textual cells: insulation, insulation_type, gable_type.
+        # Drop the leading length/height (already consumed) and the
+        # trailing 3 tokens (default_u, known, u_value).
+        middle = tokens[2:-3] if len(tokens) >= 5 else []
+        insulation = ""
+        insulation_type: Optional[str] = None
+        gable_type: Optional[str] = None
+        for t in middle:
+            if self._RIR_INSULATION_THICKNESS_RE.match(t) or t in ("As Built", "None"):
+                if not insulation:
+                    insulation = t
+            elif t in ("Mineral or EPS", "PUR", "PIR"):
+                insulation_type = t
+            elif t in ("Party", "Sheltered", "Connected to heated space"):
+                gable_type = t
+        return RoomInRoofSurface(
+            name=name,
+            length_m=length,
+            height_m=height,
+            insulation=insulation,
+            insulation_type=insulation_type,
+            gable_type=gable_type,
+            default_u_value=default_u,
+            u_value_known=u_value_known,
+            u_value=u_value,
+        )
+    def _is_next_rir_row(self, line: str) -> bool:
+        return line in self._RIR_SURFACE_NAMES
     def _extract_extensions(self) -> List[ExtensionPart]:
         """Collect non-Main building parts. Cross-references the §4, §7,
         §8, §9 per-bp subsections by extension name. "As Main: Yes"
@@ -902,4 +1054,14 @@ class ElmhurstSiteNotesExtractor:
             baths_and_showers=self._extract_baths_and_showers(),
             renewables=self._extract_renewables(),
             extensions=self._extract_extensions(),
+            room_in_roof=self._extract_room_in_roof_from_text(),
         )
+    def _extract_room_in_roof_from_text(self) -> Optional[RoomInRoof]:
+        """Convenience wrapper: pulls the Main §4 body + the §3 age-band
+        text once so `_extract_room_in_roof` doesn't need to re-slice
+        the document."""
+        dim_section = self._between("4.0 Dimensions:", "5.0 Conservatory:")
+        bp_chunks = self._split_section_by_bp(dim_section)
+        main_body = bp_chunks[0][1] if bp_chunks else dim_section
+        return self._extract_room_in_roof(main_body, self._text)
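The slotting strategy in `_parse_rir_surface_row` can be exercised in isolation. It reads two numeric cells at the head and three cells at the tail, then classifies whatever sits between them by vocabulary. Below is a minimal standalone sketch. It assumes the cells are already split one per line and that the surface-name line is excluded. `slot_rir_cells` and `SurfaceCells` are hypothetical names and are not part of the commit. The example rows are illustrative, not taken from a fixture.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

NUMERIC_RE = re.compile(r"^-?\d+(?:\.\d+)?$")
THICKNESS_RE = re.compile(r"^\d+\s*mm$")
INSULATION_TYPES = ("Mineral or EPS", "PUR", "PIR")
GABLE_TYPES = ("Party", "Sheltered", "Connected to heated space")


@dataclass
class SurfaceCells:
    length_m: float
    height_m: float
    insulation: str
    insulation_type: Optional[str]
    gable_type: Optional[str]
    default_u_value: Optional[float]
    u_value_known: bool
    u_value: float


def _num(token: str) -> Optional[float]:
    """Parse a numeric cell, or return None for a textual one."""
    return float(token) if NUMERIC_RE.match(token) else None


def slot_rir_cells(tokens: List[str]) -> SurfaceCells:
    """Slot the cells that follow a surface-name line (name excluded)."""
    # Head anchors: the first two cells are length and height.
    length = _num(tokens[0]) if len(tokens) > 0 else None
    height = _num(tokens[1]) if len(tokens) > 1 else None
    # Tail anchors, read backwards: u_value, Yes/No flag, default_u.
    tail = list(reversed(tokens[2:]))
    u_value = _num(tail[0]) if len(tail) > 0 else None
    known = len(tail) > 1 and tail[1] == "Yes"
    default_u = _num(tail[2]) if len(tail) > 2 else None
    # Cells between the head and the 3-cell tail are optional, so they are
    # classified by vocabulary rather than by position.
    middle = tokens[2:-3] if len(tokens) >= 5 else []
    insulation = ""
    insulation_type: Optional[str] = None
    gable_type: Optional[str] = None
    for cell in middle:
        if THICKNESS_RE.match(cell) or cell in ("As Built", "None"):
            insulation = insulation or cell  # first insulation cell wins
        elif cell in INSULATION_TYPES:
            insulation_type = cell
        elif cell in GABLE_TYPES:
            gable_type = cell
    return SurfaceCells(
        length_m=length if length is not None else 0.0,
        height_m=height if height is not None else 0.0,
        insulation=insulation,
        insulation_type=insulation_type,
        gable_type=gable_type,
        default_u_value=default_u,
        u_value_known=known,
        u_value=u_value if u_value is not None else 0.0,
    )


row = slot_rir_cells(["2.50", "1.20", "100 mm", "Mineral or EPS", "0.35", "No", "0.00"])
print(row.insulation, row.insulation_type, row.default_u_value)
```

Anchoring on both ends means a missing optional cell (no insulation type, no gable type) shrinks only the middle slice and never shifts the numeric fields.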


@@ -39,6 +39,7 @@ from domain.sap.rdsap.cert_to_inputs import SAP_10_2_SPEC_PRICES, cert_to_inputs
 _FIXTURES = Path(__file__).parent / "fixtures"
 _SUMMARY_000474_PDF = _FIXTURES / "Summary_000474.pdf"
+_SUMMARY_000480_PDF = _FIXTURES / "Summary_000480.pdf"
 _SUMMARY_000490_PDF = _FIXTURES / "Summary_000490.pdf"
@@ -139,6 +140,26 @@ def test_summary_000474_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
     assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4
+def test_summary_000480_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
+    # Arrange: cert U985-0001-000480 is a mid-terrace with main + one
+    # extension and a 19.83 m² room-in-roof storey. Worksheet PDF lodges
+    # unrounded SAP 61.2986 on line "SAP value". The Detailed §3.10 RR
+    # surfaces (2 stud walls @ 0mm + 2 slopes @ 0mm + 1 flat ceiling @
+    # 0mm + 2 party gables) plus zero baths drive the chain to 1e-4.
+    pages = _summary_pdf_to_textract_style_pages(_SUMMARY_000480_PDF)
+    site_notes = ElmhurstSiteNotesExtractor(pages).extract()
+    epc = EpcPropertyDataMapper.from_elmhurst_site_notes(site_notes)
+    # Act
+    result = calculate_sap_from_inputs(
+        cert_to_inputs(epc, prices=SAP_10_2_SPEC_PRICES)
+    )
+    # Assert
+    worksheet_unrounded_sap = 61.2986
+    assert abs(result.sap_score_continuous - worksheet_unrounded_sap) < 1e-4
 def test_summary_000490_full_chain_sap_matches_worksheet_pdf_exactly() -> None:
     # Arrange: cert U985-0001-000490 is an end-terrace with main +
     # 1st extension. The worksheet PDF lodges unrounded SAP 57.3979.


@@ -23,6 +23,7 @@ from datatypes.epc.domain.epc_property_data import (
     SapFloorDimension,
     SapHeating,
     SapRoomInRoof,
+    SapRoomInRoofSurface,
     SapVentilation,
     SapWindow,
     ShowerOutlet,
@@ -64,6 +65,8 @@ from datatypes.epc.surveys.elmhurst_site_notes import (
     FloorDetails as ElmhurstFloorDetails,
     MainHeating as ElmhurstMainHeating,
     RoofDetails as ElmhurstRoofDetails,
+    RoomInRoof as ElmhurstRoomInRoof,
+    RoomInRoofSurface as ElmhurstRoomInRoofSurface,
     VentilationAndCooling as ElmhurstVentilation,
     WallDetails as ElmhurstWallDetails,
     Window as ElmhurstWindow,
@@ -310,7 +313,8 @@ class EpcPropertyDataMapper:
                         f.area_m2
                         for ext in survey.extensions
                         for f in ext.dimensions.floors
                     ),
                 )
+                + (survey.room_in_roof.floor_area_m2 if survey.room_in_roof else 0.0),
                 2,
             ),
             built_form=built_form,
@@ -1825,6 +1829,25 @@ def _elmhurst_wall_construction_int(coded: str) -> Optional[int]:
     return _ELMHURST_WALL_CODE_TO_SAP10.get(_leading_code(coded))
+# Elmhurst Party Wall Type codes: a category set distinct from the Wall
+# Type field; the codes describe construction class for `u_party_wall`
+# (Table 4 / RdSAP §S.3.2) rather than a specific SAP10 wall-type. Maps
+# to the same SAP10 wall_construction integers since `u_party_wall`
+# resolves via that domain (3 Solid Brick / 5 Timber / 6 System Build
+# all → U=0.0; unknown → U=0.25).
+_ELMHURST_PARTY_WALL_CODE_TO_SAP10: Dict[str, int] = {
+    "S": 3,  # Solid masonry / timber / system build → U=0.0
+    "C": 4,  # Cavity (unfilled) → U=0.5; observed in API path
+}
+def _elmhurst_party_wall_construction_int(coded: str) -> Optional[int]:
+    """Map an Elmhurst party-wall-type string to a SAP10 wall_construction
+    integer. Returns None for 'U Unable to determine' (cascade default
+    U=0.25 then applies) and for unrecognised codes."""
+    return _ELMHURST_PARTY_WALL_CODE_TO_SAP10.get(_leading_code(coded))
 def _elmhurst_wall_insulation_int(coded: str) -> Optional[int]:
     """Map an Elmhurst wall-insulation-type string ('A As Built') to
     the SAP10 integer enum (4 = as-built). Returns None on unknown
@@ -2028,6 +2051,7 @@ def _map_elmhurst_building_part(
     walls: ElmhurstWallDetails,
     roof: ElmhurstRoofDetails,
     floor: ElmhurstFloorDetails,
+    room_in_roof: Optional[SapRoomInRoof] = None,
 ) -> SapBuildingPart:
     """Build a `SapBuildingPart` from one bp's worth of Elmhurst site-
     notes data. `identifier` distinguishes Main from each extension."""
@@ -2071,7 +2095,7 @@ def _map_elmhurst_building_part(
         wall_construction=_elmhurst_wall_construction_int(walls.wall_type),
         wall_insulation_type=_elmhurst_wall_insulation_int(walls.insulation),
         wall_thickness_measured=not walls.thickness_unknown,
-        party_wall_construction=_elmhurst_wall_construction_int(walls.party_wall_type),
+        party_wall_construction=_elmhurst_party_wall_construction_int(walls.party_wall_type),
         sap_floor_dimensions=floor_dims,
         wall_thickness_mm=walls.thickness_mm,
         roof_insulation_location=_strip_code(roof.insulation),
@@ -2080,6 +2104,7 @@ def _map_elmhurst_building_part(
         floor_construction_type=_strip_code(floor.floor_type),
         floor_insulation_type_str=_strip_code(floor.insulation),
         floor_u_value_known=floor.u_value_known,
+        sap_room_in_roof=room_in_roof,
     )
@@ -2106,6 +2131,7 @@ def _map_elmhurst_building_parts(survey: ElmhurstSiteNotes) -> List[SapBuildingP
             walls=survey.walls,
             roof=survey.roof,
             floor=survey.floor,
+            room_in_roof=_map_elmhurst_room_in_roof(survey.room_in_roof),
         )
     ]
     for ext, identifier in zip(survey.extensions, _EXTENSION_IDENTIFIERS):
@@ -2122,6 +2148,97 @@ def _map_elmhurst_building_parts(survey: ElmhurstSiteNotes) -> List[SapBuildingP
     return parts
+# RR detailed-surface naming → canonical kind. Each name in the Summary
+# §8.1 table is "<kind> <ordinal>"; only the kind portion is meaningful.
+_RIR_KIND_FROM_NAME_PREFIX: Dict[str, str] = {
+    "Flat Ceiling": "flat_ceiling",
+    "Stud Wall": "stud_wall",
+    "Slope": "slope",
+    "Gable Wall": "gable_wall",
+}
+# Elmhurst insulation-type strings → canonical SAP10 codes used by
+# `SapRoomInRoofSurface.insulation_type`. Empty / unrecognised → None.
+_RIR_INSULATION_TYPE_TO_SAP10: Dict[str, str] = {
+    "Mineral or EPS": "mineral_wool",
+}
+def _elmhurst_rir_insulation_thickness_mm(insulation_text: str) -> int:
+    """Translate the Insulation cell ("100 mm", "None", "As Built", "")
+    into a thickness integer. The Elmhurst cohort uses "As Built" only
+    on surfaces whose Default U-value is the uninsulated 2.30 row, so
+    treating it as 0 mm is consistent with the Table 17 'none' column."""
+    if not insulation_text or insulation_text in ("None", "As Built"):
+        return 0
+    m = re.match(r"^(\d+)\s*mm$", insulation_text)
+    return int(m.group(1)) if m else 0
+def _map_elmhurst_rir_surface(
+    surface: ElmhurstRoomInRoofSurface,
+) -> Optional[SapRoomInRoofSurface]:
+    """Translate one Elmhurst surface row into a `SapRoomInRoofSurface`.
+    Returns None when the surface is absent (0×0; the cohort lodges a
+    full 5-pair table even when only some surfaces exist) or is a
+    Common Wall (those are handled by the cascade's Simplified Type 2
+    geometry, not by Detailed enumeration)."""
+    if surface.length_m <= 0 or surface.height_m <= 0:
+        return None
+    if surface.name.startswith("Common Wall"):
+        return None
+    prefix = next(
+        (p for p in _RIR_KIND_FROM_NAME_PREFIX if surface.name.startswith(p)),
+        None,
+    )
+    if prefix is None:
+        return None
+    kind = _RIR_KIND_FROM_NAME_PREFIX[prefix]
+    # RdSAP Table 4 Gable Wall variant: "Party" → "gable_wall" (default
+    # U=0.25 per Table 4 row 2); "Sheltered" → "gable_wall_external"
+    # with the assessor-lodged U-value (line 29 of the U985 worksheet
+    # carries the lodged measurement) overriding the cascade.
+    u_value_override: Optional[float] = None
+    if kind == "gable_wall" and surface.gable_type == "Sheltered":
+        kind = "gable_wall_external"
+        u_value_override = surface.default_u_value
+    area_m2 = round(surface.length_m * surface.height_m, 2)
+    if kind in ("gable_wall", "gable_wall_external"):
+        # Gable walls aren't insulated through Table 17; they use Table
+        # 4 / measured U. Don't lodge an insulation thickness on them.
+        return SapRoomInRoofSurface(
+            kind=kind,
+            area_m2=area_m2,
+            u_value=u_value_override,
+        )
+    return SapRoomInRoofSurface(
+        kind=kind,
+        area_m2=area_m2,
+        insulation_thickness_mm=_elmhurst_rir_insulation_thickness_mm(surface.insulation),
+        insulation_type=_RIR_INSULATION_TYPE_TO_SAP10.get(surface.insulation_type or ""),
+    )
+def _map_elmhurst_room_in_roof(
+    rir: Optional[ElmhurstRoomInRoof],
+) -> Optional[SapRoomInRoof]:
+    """Build a `SapRoomInRoof` from the Elmhurst §8.1 detail. Returns
+    None when no RR is lodged (the dwelling has no room-in-roof storey:
+    the Summary PDF lacks the `Room(s) in Roof:` row or its area is 0)."""
+    if rir is None or rir.floor_area_m2 <= 0:
+        return None
+    detailed = [
+        s for s in (_map_elmhurst_rir_surface(s) for s in rir.surfaces)
+        if s is not None
+    ]
+    return SapRoomInRoof(
+        floor_area=rir.floor_area_m2,
+        construction_age_band=_leading_code(rir.construction_age_band or ""),
+        detailed_surfaces=detailed or None,
+    )
 # Elmhurst orientation strings → SAP10 octant integer (1=N..8=NW).
 # Covers the orderings the layout-style window parser produces, both
 # single-direction ("East") and combined ("North-West") forms.
@@ -2303,6 +2420,7 @@ def _map_elmhurst_sap_heating(survey: ElmhurstSiteNotes) -> SapHeating:
         ),
         water_heating_code=survey.water_heating.water_heating_sap_code,
         secondary_heating_type=mh.secondary_heating_sap_code,
+        number_baths=survey.baths_and_showers.number_of_baths,
     )
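The party-wall fix comes down to which lookup table the leading code hits. Below is a minimal sketch of that path. The code table is copied from `_ELMHURST_PARTY_WALL_CODE_TO_SAP10`. `leading_code` is a hypothetical stand-in for `_leading_code`, whose body is not in this diff. The U-value table is also a hypothetical cascade stand-in, built only from the values quoted in the diff comments. Treating unrecognised construction integers as U=0.25 is an assumption.

```python
from typing import Dict, Optional

# Copied from _ELMHURST_PARTY_WALL_CODE_TO_SAP10 in the diff.
PARTY_WALL_CODE_TO_SAP10: Dict[str, int] = {"S": 3, "C": 4}

# Hypothetical cascade side, using only the U-values quoted in the diff
# comments: 3/5/6 -> 0.0, 4 -> 0.5; None (and, by assumption, any other
# construction) -> 0.25.
PARTY_WALL_U_BY_CONSTRUCTION: Dict[int, float] = {3: 0.0, 4: 0.5, 5: 0.0, 6: 0.0}
DEFAULT_PARTY_WALL_U = 0.25


def leading_code(coded: str) -> str:
    """Hypothetical stand-in for `_leading_code`: the first token of
    strings like "S Solid masonry / timber / system build"."""
    parts = coded.split(maxsplit=1)
    return parts[0] if parts else ""


def party_wall_construction(coded: str) -> Optional[int]:
    return PARTY_WALL_CODE_TO_SAP10.get(leading_code(coded))


def party_wall_u(construction: Optional[int]) -> float:
    if construction is None:
        return DEFAULT_PARTY_WALL_U
    return PARTY_WALL_U_BY_CONSTRUCTION.get(construction, DEFAULT_PARTY_WALL_U)


for lodged in ("S Solid masonry / timber / system build", "U Unable to determine"):
    code = party_wall_construction(lodged)
    print(f"{lodged!r}: construction={code}, U={party_wall_u(code)}")
```

Before the split, the "S" code missed the wall-type table and landed on the `None` branch (U=0.25) instead of construction 3 (U=0.0). That is the difference this change removes for 000516.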


@@ -78,6 +78,40 @@ class FloorDetails:
     default_u_value: Optional[float] = None
+@dataclass
+class RoomInRoofSurface:
+    """One sub-element of a §3.10 Detailed Room-in-Roof assessment:
+    Flat Ceiling / Stud Wall / Slope / Gable Wall / Common Wall.
+    Each is lodged with a Length × Height pair plus insulation /
+    insulation-type / gable-type / measured-U fields. Absent surfaces
+    are still lodged at 0×0 (e.g. a Flat Ceiling with no flat-roof
+    portion) and filtered out in the mapper."""
+    name: str  # e.g. "Flat Ceiling 1", "Stud Wall 2", "Gable Wall 1"
+    length_m: float
+    height_m: float
+    insulation: str  # "As Built" | "None" | "100 mm" | ""
+    insulation_type: Optional[str]  # e.g. "Mineral or EPS"
+    gable_type: Optional[str]  # "Party" | "Sheltered" | "Connected to heated space"
+    default_u_value: Optional[float]
+    u_value_known: bool
+    u_value: float  # assessor-measured U-value (0.00 when not known)
+@dataclass
+class RoomInRoof:
+    """§8.1 Rooms in Roof: Main-property entry only (extensions never
+    carry RR in the observed corpus). `surfaces` lists all 5 RdSAP §3.10
+    detailed-assessment kinds in document order; 0×0 entries are kept so
+    the mapper sees the complete table shape."""
+    floor_area_m2: float
+    construction_age_band: Optional[str]
+    assessment: str  # "Detailed" | "Simplified Type 1" | "Simplified Type 2"
+    surfaces: List[RoomInRoofSurface]
 @dataclass
 class Window:
     width_m: float
@@ -273,3 +307,9 @@ class ElmhurstSiteNotes:
     # dimensions, and fabric details. Empty list = single-bp cert
     # (preserves backward compatibility with the existing fixture).
     extensions: List[ExtensionPart] = field(default_factory=lambda: [])  # type: ignore[reportUnknownLambdaType]
+    # §8.1 Rooms in Roof: Main property only in the observed corpus.
+    # When None the dwelling has no RR storey (a 2-storey house with a
+    # cold loft instead of a room-in-roof). The mapper translates the
+    # surface table into a `SapRoomInRoof` attached to the Main bp.
+    room_in_roof: Optional[RoomInRoof] = None
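These dataclasses feed the mapper, and its rules reduce to a small pure function. The mapper drops 0×0 placeholders and Common Walls and computes area as Length × Height rounded to 0.01 m². It also splits gables into `gable_wall` and `gable_wall_external`. Below is a hedged standalone sketch. `Surface` and `MappedSurface` are hypothetical, trimmed stand-ins for `RoomInRoofSurface` and `SapRoomInRoofSurface`. Insulation type is omitted for brevity, and the rows in the usage example are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Surface:
    """Trimmed stand-in for RoomInRoofSurface (only the fields used here)."""
    name: str
    length_m: float
    height_m: float
    insulation: str
    gable_type: Optional[str]
    default_u_value: Optional[float]


@dataclass
class MappedSurface:
    """Hypothetical stand-in for SapRoomInRoofSurface."""
    kind: str
    area_m2: float
    insulation_thickness_mm: Optional[int] = None
    u_value: Optional[float] = None


KIND_BY_PREFIX = {
    "Flat Ceiling": "flat_ceiling",
    "Stud Wall": "stud_wall",
    "Slope": "slope",
    "Gable Wall": "gable_wall",
}


def thickness_mm(text: str) -> int:
    # "100 mm" -> 100; "None", "As Built" and "" all map to 0 mm.
    m = re.match(r"^(\d+)\s*mm$", text)
    return int(m.group(1)) if m else 0


def map_surface(s: Surface) -> Optional[MappedSurface]:
    if s.length_m <= 0 or s.height_m <= 0:
        return None  # 0×0 placeholder row from the full 10-row table
    kind = next((k for p, k in KIND_BY_PREFIX.items() if s.name.startswith(p)), None)
    if kind is None:
        return None  # Common Walls are left to the Simplified Type 2 geometry
    area = round(s.length_m * s.height_m, 2)
    if kind == "gable_wall":
        # Gables carry no Table 17 thickness; Sheltered ones take the lodged U.
        if s.gable_type == "Sheltered":
            return MappedSurface("gable_wall_external", area, u_value=s.default_u_value)
        return MappedSurface("gable_wall", area)
    return MappedSurface(kind, area, insulation_thickness_mm=thickness_mm(s.insulation))


def map_surfaces(surfaces: List[Surface]) -> List[MappedSurface]:
    mapped = (map_surface(s) for s in surfaces)
    return [m for m in mapped if m is not None]


rows = [
    Surface("Flat Ceiling 1", 0.0, 0.0, "", None, None),
    Surface("Stud Wall 1", 2.5, 1.2, "100 mm", None, 0.35),
    Surface("Gable Wall 2", 4.0, 2.0, "", "Sheltered", 0.86),
    Surface("Common Wall 1", 3.0, 2.0, "", None, None),
]
for m in map_surfaces(rows):
    print(m)
```

The mapper keeps the extractor output lossless, with all 10 rows still present. All filtering happens in one place, so the dataclass always mirrors the Summary table shape.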