Merge pull request #1263 from Hestia-Homes/feature/hyde_make_it_more_accurate_with_tests

data types
2026-06-30 13:10:47 +00:00 · 2026-06-23 13:50:45 +01:00 · 2026-06-23 13:50:45 +01:00 · 22b92f968e
commit 22b92f968e
parent 1a080254e7 00af7b5a54
12 changed files with 715 additions and 13 deletions
--- a/.claude/skills/expand-sap-accuracy-corpus/worklist.md
+++ b/.claude/skills/expand-sap-accuracy-corpus/worklist.md
@ -11,12 +11,102 @@ large, or a new/complex build pattern needed; NEEDS INVESTIGATION.
 2020 new-build flat) — full loop proven: eng 77 / elm 78, engine-on-Elmhurst-
 inputs 79 (calculator faithful within ~1). Use it to sanity-check the pipeline.

+# Files saving instructions
+I've noticed we arn't saving the epc.json (from epc api), input summary and worksheets from elmhurst. We shuold be doing that so we can reproduce behaviours quicker and verify. This is important as other memebers of team can help improve the verification. Files should be saved on backend/epc_api/json_samples/real_life_examples/<schema>/uprn_<uprn>/<files>
+
+
 ## Do next — new schemas (need a mapper)

 These two were flagged NOT MAPPABLE in the UI (red ✗). Mapper coverage now ADDED
 (dedicated per-schema methods in `EpcPropertyDataMapper`; corpus gauge green, 0
 new pyright errors). Elmhurst build + pin still pending — run the normal loop.

+### 🔧🔍 FOR KHALIM — 100061905751 (cert 2628-3046-6233-4584-2970, SAP-16.3) — mapper fix landed, Elmhurst +5 to review
+Surfaced as a PRODUCTION crash (NOT in The 100): `modelling_e2e` property 730259
+`predict_epc` → `EpcComparablePropertiesRepository.candidates_for(postcode)` maps
+EVERY cohort cert in one list comprehension, and THIS cert raised
+`ValueError: RdSapSchema17_1: missing required field 'insulated_door_count'`,
+aborting the whole prediction. 🔧 FIXED generically in `_normalize_sap_schema_16_x`
+(`mapper.py`): `setdefault("insulated_door_count", 0)` — door refinement, absent →
+"no insulated doors recorded" → 0 (conservative RdSAP). +regression test
+`test_16_x_missing_insulated_door_count_defaults_to_zero` (test_from_sap_schema.py);
+0 new pyright errors; real-cert accuracy + 16.x suites green.
+Cert: END-TERRACE HOUSE 2-storey, band A pre-1900, cavity wall AS-BUILT **no
+insulation** (U0.70), pitched 100mm loft, suspended uninsulated floor, mains-gas
+COMBI (PCDB 16366 Glow-worm Ultracom 2 35cxi), control 2106 CBE, double glazed,
+**2 OPEN FIREPLACES** (chimneys), lodged **secondary** open-fire smokeless fuel
+(SAP 631 = Elmhurst RKJ), 3.84m² band-B extension, natural vent, 17% LED, TFA 139.
+**eng 57 = lodged 57 EXACTLY** / elm worksheet 52 (+5). Built in Elmhurst end-to-end
+(boiler 16366, CBE/2106, 2 open chimneys, secondary RKJ open-fire-in-grate, window
+20.57m², party wall geometry GF 6.31/1F 5.14 CF). engine-on-Elmhurst-inputs 53 ≈
+worksheet 52 → **calculator faithful**. The +5 decomposes as: ~2 SAP = the documented
+16.x reduced-field **party-wall gap** (lodges no party_wall_length → engine models
+NONE; Elmhurst forces `Party walls Main 31.02m² × U0.250 = 7.76 W/K`), SCALED UP by
+2-storey geometry (party wall area ~31m² vs ~16m² on the single-storey 16.2 pins
+100021985993/100090182288 which showed only +2); remainder ~2-3 = reduced-field
+build choices (3.84m² extension folded into GF, wall thickness 250 vs lodged 300,
+draughtproofing default). build_100061905751.py complete. **NOT yet pinned** — to
+review with Khalim (his call per CONTEXT.md): pin engine=lodged 57 like other 16.x,
+and/or whether 16.x end-terrace should model a party wall to close the reduced-field
+gap. Sample saved under SAP-Schema-16.3/uprn_100061905751 (epc.json + both PDFs).
+
+### 🔧🔍 modelling_e2e production failure sweep (2026-06-23) — cohort resilience + mapper gaps
+Investigated the 45 FAILED `modelling_e2e` subtasks (DevAssessmentModelDB, portfolio
+796) — 36 property rows, ~25 unique UPRNs. Root cause for most: `predict_epc` →
+`EpcComparablePropertiesRepository.candidates_for(postcode)` mapped EVERY cohort cert
+in one comprehension, so ONE unmappable cohort cert aborted the whole prediction.
+
+**✅ DONE — cohort resilience (skip + report)** so a single bad cert can't sink a
+prediction, and the skipped certs surface for follow-up:
+- `candidates_for` now skips certs that raise `ValueError` (mapper failures:
+  missing-field + `UnmappedApiCode`/`UnmappedElmhurstLabel`, both `ValueError`),
+  recording `SkippedCohortCert(certificate_number, error)` on `repo.skipped` + a
+  `logger.warning`. Transient API errors (not `ValueError`) still propagate.
+- handler surfaces them in the subtask **outputs** (success → `outputs.result.
+  skipped_unmappable_cohort_certs`; failure → appended to the error message), with
+  cert numbers, so the gaps can be closed deliberately. +3 tests; 0 new pyright.
+  Live-verified on BN11 4EP: cohort now builds 35 + records the 1 skip.
+
+**✅ DONE — 2 safe generic mapper fixes (+regression tests, 0 new pyright):**
+- `_normalize_sap_schema_16_x`: `setdefault("insulated_door_count", 0)` (the
+  original prod crash) AND `setdefault("multiple_glazed_proportion", 100)` (16.3
+  cert 0418-3986-7250-2884-7970; ML-only field the SAP calc ignores; modal 100).
+- `_with_recorded_performance` co2/consumption/rating: `float(co2)` → crashed on
+  certs lodging `co2_emissions_current` as a Measurement dict `{'value':3.5,
+  'quantity':'tonnes per year'}` (16.x cert 2308-4997-7262-0137-9930, surfaced as
+  `TypeError: float() argument must be ... not 'dict'`). Now coerced via
+  `_measurement_value` (handles Measurement/dict/number). Verified maps → 3.5.
+
+**⚖️ FOR KHALIM — uncovered mapper gaps (mapper is your domain per CONTEXT.md):**
+- **`built_form` missing (SAP-16.0)** — cert 8742-6624-9300-2780-4926 (uprn
+  10070004512); has `property_type` but no `built_form`. Defaulting it drives
+  party-wall/exposed-perimeter modelling → a real modelling decision, not a neutral
+  default. Blocks props 710339, 723589.
+- **`window` missing + genuinely sparse (SAP-16.2)** — cert 8257-7539-1649-0633-4992
+  (uprn 10070009758); `windows` is a DICT not list, AND it omits `door_count`,
+  `habitable_room_count`, `glazed_area` — fail-loud is likely correct (data
+  insufficient). Blocks props 733315, 730259.
+- **🔍 RdSAP-21.0.0 systematically lacks the ADR-0028 Optional widenings 21.0.1 got.**
+  Cert 3135-3223-5500-0919-2206 (uprn 100021919725) omits 13 top-level fields
+  (`wet_rooms_count`, `open_chimneys_count`, `mechanical_ventilation_index_number`,
+  `led/cfl_fixed_lighting_bulbs_count`, the `mechanical_vent_duct_*`,
+  `insulated_door_u_value`, `pressure_test_certificate_number`,
+  `windows_transmission_details`) + `SapWindow` (`pvc_frame`/`glazing_gap`/
+  `frame_factor`/`window_transmission_details`) — ALL required in 21.0.0 but
+  `Optional[...] = None` in 21.0.1. NOT applied: widening risks pushing `None` into
+  `from_rdsap_schema_21_0_0` which may assume non-None (needs mapper-coalescing
+  review). Recommend aligning RdSapSchema21_0_0 to 21.0.1 (incl. the same Optionals
+  on insulated_door_count etc.) + reviewing the 21.0.0 mapper. Blocks prop 719897.
+  (The resilience change already stops all of these from crashing prod.)
+
+Caveat: ~13 other failed props showed no mapper gap on replay — genuinely
+not-predictable (empty/insufficient cohort) or already fixed by the
+`insulated_door_count` default; cohort replay only checked the first 60 certs/postcode
+and skipped rate-limited fetches, so a deeper gap could be missed.
+
+**Also done:** `deployment/terraform/lambda/modelling_e2e/variables.tf`
+`maximum_concurrency` default 2 → 4 (per request).
+
 - [x] 🔧 10096028301 — SAP-Schema-19.1.0 (full-SAP g/f FLAT, band M, combi PCDB 17929, MEV, AP50 3.5) · eng 82 / elm 82 (lodged 85) · PINNED engine 82. 🔧 mapper added: `from_sap_schema_19_1_0`. Built in Elmhurst (boiler 17929 via search, control CBE/2106, water from primary, MEV on, AP50 Blower Door 3.5+cert). Engine EXACTLY matches Elmhurst worksheet (82.11 vs 82); engine-on-Elmhurst-inputs 82.16 ≈ 82 → calculator faithful. −3 vs lodged = measured-U-vs-RdSAP-default + MEV extract-not-recovery (documented). No mapper change beyond coverage.
 - [x] 🔧 100021943298 — SAP-Schema-16.1 (g/f FLAT, band B, solid-brick internal, combi PCDB 10328) · eng 76 / elm 75 (lodged 72) · PINNED engine 76. 🔧 mapper added: `from_sap_schema_16_1`. Built in Elmhurst (boiler 10328 via search, control CBE/2106, water from primary, wall insulation thickness Unknown); worksheet 75 → engine within ~1 (tightest agreement, reduced-field). Boiler-select + water-heating + control dialogs all driven via automation (two-step row→Select / cascade + coordinate-OK). No mapper change beyond coverage.

--- a/applications/modelling_e2e/handler.py
+++ b/applications/modelling_e2e/handler.py
@ -61,6 +61,7 @@ from applications.modelling_e2e.modelling_e2e_trigger_body import (
 )
 from repositories.comparable_properties.epc_comparable_properties_repository import (
    EpcComparablePropertiesRepository,
+    SkippedCohortCert,
 )
 from repositories.geospatial.geospatial_s3_repository import (
    GeospatialS3Repository,
@ -131,6 +132,20 @@ def _solar_insights_for(
        return None


+def _dedupe_skipped(
+    skipped: list[SkippedCohortCert],
+) -> list[SkippedCohortCert]:
+    """First occurrence of each skipped cert number (the same cert can appear in
+    more than one postcode cohort across a batch)."""
+    seen: set[str] = set()
+    unique: list[SkippedCohortCert] = []
+    for cert in skipped:
+        if cert.certificate_number not in seen:
+            seen.add(cert.certificate_number)
+            unique.append(cert)
+    return unique
+
+
 def _predict_epc(
    *,
    property_id: int,
@ -168,7 +183,7 @@ def _predict_epc(


@task_handler(task_source="modelling_e2e", source=Source.PROPERTY)
-def handler(body: dict[str, Any], context: Any) -> None:
+def handler(body: dict[str, Any], context: Any) -> Optional[dict[str, Any]]:
    trigger = ModellingE2ETriggerBody.model_validate(body)
    property_ids = trigger.property_ids
    portfolio_id = trigger.portfolio_id
@ -349,7 +364,25 @@ def handler(body: dict[str, Any], context: Any) -> None:
                )
                errors.append(property_id)

+        # Cohort certs the mapper could not consume were skipped (not aborted on)
+        # so prediction could proceed; surface them — with cert numbers — in the
+        # subtask outputs so the mapper gaps can be closed later.
+        skipped_certs: list[dict[str, str]] = [
+            {"certificate_number": s.certificate_number, "error": s.error}
+            for s in _dedupe_skipped(comparables_repo.skipped)
+        ]
+        if skipped_certs:
+            logger.info(
+                f"skipped {len(skipped_certs)} unmappable cohort cert(s): "
+                f"{[s['certificate_number'] for s in skipped_certs]}"
+            )
+
        if errors:
-            raise RuntimeError(f"failed property_ids: {errors}")
+            message = f"failed property_ids: {errors}"
+            if skipped_certs:
+                message += f"; skipped_unmappable_cohort_certs: {skipped_certs}"
+            raise RuntimeError(message)
+
+        return {"skipped_unmappable_cohort_certs": skipped_certs} if skipped_certs else None
    finally:
        read_session.close()
--- a/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/elmhurst_summary.pdf
+++ b/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/elmhurst_summary.pdf
--- a/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/elmhurst_worksheet.pdf
+++ b/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/elmhurst_worksheet.pdf
--- a/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/epc.json
+++ b/backend/epc_api/json_samples/real_life_examples/SAP-Schema-16.3/uprn_100061905751/epc.json
@ -0,0 +1,335 @@
+{
+  "uprn": 100061905751,
+  "roofs": [
+    {
+      "description": "Pitched, 100 mm loft insulation",
+      "energy_efficiency_rating": 3,
+      "environmental_efficiency_rating": 3
+    }
+  ],
+  "walls": [
+    {
+      "description": "Cavity wall, as built, no insulation (assumed)",
+      "energy_efficiency_rating": 1,
+      "environmental_efficiency_rating": 1
+    }
+  ],
+  "floors": [
+    {
+      "description": "Suspended, no insulation (assumed)",
+      "energy_efficiency_rating": 0,
+      "environmental_efficiency_rating": 0
+    }
+  ],
+  "status": "entered",
+  "tenure": 1,
+  "windows": [
+    {
+      "description": "Fully double glazed",
+      "energy_efficiency_rating": 4,
+      "environmental_efficiency_rating": 4
+    }
+  ],
+  "addendum": {
+    "cavity_fill_recommended": "true"
+  },
+  "lighting": {
+    "description": "Low energy lighting in 17% of fixed outlets",
+    "energy_efficiency_rating": 2,
+    "environmental_efficiency_rating": 2
+  },
+  "postcode": "BN11 4EP",
+  "hot_water": {
+    "description": "From main system",
+    "energy_efficiency_rating": 4,
+    "environmental_efficiency_rating": 4
+  },
+  "post_town": "WORTHING",
+  "built_form": 3,
+  "created_at": "2014-07-24 12:49:23",
+  "door_count": 2,
+  "glazed_area": 1,
+  "region_code": 14,
+  "report_type": 2,
+  "sap_heating": {
+    "wwhrs": {
+      "rooms_with_bath_and_or_shower": 1,
+      "rooms_with_mixer_shower_no_bath": 0,
+      "rooms_with_bath_and_mixer_shower": 1
+    },
+    "cylinder_size": 1,
+    "water_heating_code": 901,
+    "water_heating_fuel": 26,
+    "secondary_fuel_type": 15,
+    "main_heating_details": [
+      {
+        "has_fghrs": "N",
+        "main_fuel_type": 26,
+        "boiler_flue_type": 2,
+        "fan_flue_present": "Y",
+        "heat_emitter_type": 1,
+        "boiler_index_number": 16366,
+        "main_heating_number": 1,
+        "main_heating_control": 2106,
+        "main_heating_category": 2,
+        "main_heating_fraction": 1,
+        "main_heating_data_source": 1
+      }
+    ],
+    "secondary_heating_type": 631,
+    "has_fixed_air_conditioning": "false"
+  },
+  "sap_version": 9.91,
+  "schema_type": "SAP-Schema-16.3",
+  "uprn_source": "Energy Assessor",
+  "country_code": "EAW",
+  "main_heating": [
+    {
+      "description": "Boiler and radiators, mains gas",
+      "energy_efficiency_rating": 4,
+      "environmental_efficiency_rating": 4
+    }
+  ],
+  "dwelling_type": "End-terrace house",
+  "language_code": 1,
+  "property_type": 0,
+  "address_line_1": "77, Tarring Road",
+  "schema_version": "LIG-16.1",
+  "assessment_type": "RdSAP",
+  "completion_date": "2014-07-24",
+  "inspection_date": "2014-07-22",
+  "extensions_count": 1,
+  "measurement_type": 1,
+  "total_floor_area": 139,
+  "transaction_type": 5,
+  "conservatory_type": 1,
+  "heated_room_count": 5,
+  "registration_date": "2014-04-25",
+  "restricted_access": 0,
+  "sap_energy_source": {
+    "main_gas": "Y",
+    "meter_type": 2,
+    "photovoltaic_supply": [
+      [
+        {
+          "pitch": 3,
+          "peak_power": 1.25,
+          "orientation": 5,
+          "overshading": 1
+        }
+      ],
+      [
+        {
+          "pitch": 3,
+          "peak_power": 0.5,
+          "orientation": 3,
+          "overshading": 1
+        }
+      ],
+      [
+        {
+          "pitch": 3,
+          "peak_power": 0.5,
+          "orientation": 7,
+          "overshading": 2
+        }
+      ]
+    ],
+    "wind_turbines_count": 0,
+    "wind_turbines_terrain_type": 2
+  },
+  "secondary_heating": {
+    "description": "Room heaters, smokeless fuel",
+    "energy_efficiency_rating": 0,
+    "environmental_efficiency_rating": 0
+  },
+  "lzc_energy_sources": [
+    11
+  ],
+  "sap_building_parts": [
+    {
+      "identifier": "Main Dwelling",
+      "wall_dry_lined": "N",
+      "wall_thickness": 300,
+      "floor_heat_loss": 7,
+      "roof_construction": 4,
+      "wall_construction": 4,
+      "building_part_number": 1,
+      "sap_floor_dimensions": [
+        {
+          "floor": 0,
+          "room_height": 2.7,
+          "floor_insulation": 1,
+          "total_floor_area": 67.49,
+          "floor_construction": 2,
+          "heat_loss_perimeter": 27.69
+        },
+        {
+          "floor": 1,
+          "room_height": 2.47,
+          "total_floor_area": 67.49,
+          "heat_loss_perimeter": 31.38
+        }
+      ],
+      "wall_insulation_type": 4,
+      "construction_age_band": "A",
+      "wall_thickness_measured": "Y",
+      "roof_insulation_location": 2,
+      "roof_insulation_thickness": "100mm"
+    },
+    {
+      "identifier": "Extension",
+      "wall_dry_lined": "N",
+      "wall_thickness": 280,
+      "floor_heat_loss": 7,
+      "roof_construction": 5,
+      "wall_construction": 4,
+      "building_part_number": 2,
+      "sap_floor_dimensions": [
+        {
+          "floor": 0,
+          "room_height": 2.45,
+          "floor_insulation": 1,
+          "total_floor_area": 3.84,
+          "floor_construction": 1,
+          "heat_loss_perimeter": 5.77
+        }
+      ],
+      "wall_insulation_type": 4,
+      "construction_age_band": "B",
+      "wall_thickness_measured": "Y",
+      "roof_insulation_location": 4,
+      "roof_insulation_thickness": "NI"
+    }
+  ],
+  "low_energy_lighting": 17,
+  "solar_water_heating": "N",
+  "bedf_revision_number": 361,
+  "habitable_room_count": 5,
+  "heating_cost_current": {
+    "value": 1430,
+    "currency": "GBP"
+  },
+  "co2_emissions_current": 7.2,
+  "energy_rating_average": 60,
+  "energy_rating_current": 57,
+  "lighting_cost_current": {
+    "value": 132,
+    "currency": "GBP"
+  },
+  "main_heating_controls": [
+    {
+      "description": "Programmer, room thermostat and TRVs",
+      "energy_efficiency_rating": 4,
+      "environmental_efficiency_rating": 4
+    }
+  ],
+  "multiple_glazing_type": 2,
+  "open_fireplaces_count": 2,
+  "has_hot_water_cylinder": "false",
+  "heating_cost_potential": {
+    "value": 881,
+    "currency": "GBP"
+  },
+  "hot_water_cost_current": {
+    "value": 105,
+    "currency": "GBP"
+  },
+  "mechanical_ventilation": 0,
+  "percent_draughtproofed": 100,
+  "suggested_improvements": [
+    {
+      "sequence": 1,
+      "typical_saving": {
+        "value": 308,
+        "currency": "GBP"
+      },
+      "indicative_cost": "\u00c2\u00a3500 - \u00c2\u00a31,500",
+      "improvement_type": "B",
+      "improvement_details": {
+        "improvement_number": 6
+      },
+      "improvement_category": 5,
+      "energy_performance_rating": 71,
+      "environmental_impact_rating": 63
+    },
+    {
+      "sequence": 2,
+      "typical_saving": {
+        "value": 73,
+        "currency": "GBP"
+      },
+      "indicative_cost": "\u00c2\u00a3800 - \u00c2\u00a31,200",
+      "improvement_type": "W",
+      "improvement_details": {
+        "improvement_number": 47
+      },
+      "improvement_category": 5,
+      "energy_performance_rating": 74,
+      "environmental_impact_rating": 66
+    },
+    {
+      "sequence": 3,
+      "typical_saving": {
+        "value": 49,
+        "currency": "GBP"
+      },
+      "indicative_cost": "\u00c2\u00a375",
+      "improvement_type": "E",
+      "improvement_details": {
+        "improvement_number": 35
+      },
+      "improvement_category": 5,
+      "energy_performance_rating": 75,
+      "environmental_impact_rating": 67
+    }
+  ],
+  "co2_emissions_potential": 4.0,
+  "energy_rating_potential": 75,
+  "lighting_cost_potential": {
+    "value": 72,
+    "currency": "GBP"
+  },
+  "alternative_improvements": [
+    {
+      "improvement": {
+        "sequence": 1,
+        "typical_saving": {
+          "value": 67,
+          "currency": "GBP"
+        },
+        "improvement_type": "Q2",
+        "improvement_details": {
+          "improvement_number": 55
+        },
+        "improvement_category": 6,
+        "energy_performance_rating": 73,
+        "environmental_impact_rating": 65
+      }
+    }
+  ],
+  "hot_water_cost_potential": {
+    "value": 105,
+    "currency": "GBP"
+  },
+  "renewable_heat_incentive": {
+    "water_heating": 2315,
+    "impact_of_loft_insulation": -869,
+    "impact_of_cavity_insulation": -8658,
+    "space_heating_existing_dwelling": 23216
+  },
+  "seller_commission_report": "Y",
+  "energy_consumption_current": 230,
+  "has_fixed_air_conditioning": "false",
+  "multiple_glazed_proportion": 100,
+  "calculation_software_version": "9.1.2",
+  "energy_consumption_potential": 125,
+  "environmental_impact_current": 45,
+  "fixed_lighting_outlets_count": 18,
+  "current_energy_efficiency_band": "D",
+  "environmental_impact_potential": 67,
+  "has_heated_separate_conservatory": "false",
+  "potential_energy_efficiency_band": "C",
+  "co2_emissions_current_per_floor_area": 52,
+  "low_energy_fixed_lighting_outlets_count": 3
+}
--- a/datatypes/epc/domain/mapper.py
+++ b/datatypes/epc/domain/mapper.py
@ -2817,16 +2817,23 @@ def _with_recorded_performance(
        current_energy_efficiency_band=(
            Epc(band) if band is not None else epc.current_energy_efficiency_band
        ),
+        # These scalars are normally plain numbers but some certs (e.g. 16.x cert
+        # 2308-4997-7262-0137-9930) lodge them as a Measurement dict
+        # {'value': 3.5, 'quantity': 'tonnes per year'} — `_measurement_value`
+        # coerces Measurement / dict / plain-number alike, so `float(dict)` no
+        # longer crashes the whole prediction cohort.
        co2_emissions_current=(
-            float(co2) if co2 is not None else epc.co2_emissions_current
+            _measurement_value(co2) if co2 is not None else epc.co2_emissions_current
        ),
        energy_consumption_current=(
-            int(consumption)
+            int(_measurement_value(consumption))
            if consumption is not None
            else epc.energy_consumption_current
        ),
        energy_rating_current=(
-            int(rating) if rating is not None else epc.energy_rating_current
+            int(_measurement_value(rating))
+            if rating is not None
+            else epc.energy_rating_current
        ),
    )

@ -3306,6 +3313,14 @@ def _normalize_sap_schema_16_x(data: Dict[str, Any]) -> Dict[str, Any]:
    # still fail loud — that is correct, the fabric data is insufficient.
    d.setdefault("insulated_door_count", 0)

+    # Some 16.x certs (e.g. 16.3 cert 0418-3986-7250-2884-7970) omit
+    # `multiple_glazed_proportion` while still lodging `multiple_glazing_type` —
+    # RdSapSchema17_1 requires it. It is an ML-feature field the SAP-10 calculator
+    # never reads (only `sap10_ml/transform`), so the value is non-load-bearing for
+    # the score; default the modal 100 ("fully" the lodged glazing type) to keep the
+    # cert mappable. Mirrors the `insulated_door_count` default above.
+    d.setdefault("multiple_glazed_proportion", 100)
+
    # 16.2 lodges glazing in BOTH `multiple_glazing_type` (frequently the "ND"
    # not-defined sentinel) AND the windows[].description. When the numeric field
    # is undefined, honour an explicit "Single glazed" description so it is not
--- a/datatypes/epc/domain/tests/test_from_sap_schema.py
+++ b/datatypes/epc/domain/tests/test_from_sap_schema.py
@ -538,6 +538,36 @@ class TestFromSapSchema16_2:
        assert isinstance(epc, EpcPropertyData)
        assert epc.insulated_door_count == 0

+    def test_16_x_missing_multiple_glazed_proportion_still_maps(self) -> None:
+        # Some 16.x certs (e.g. 16.3 cert 0418-3986-7250-2884-7970) lodge
+        # `multiple_glazing_type` but omit `multiple_glazed_proportion`, which
+        # RdSapSchema17_1 requires — previously raised "missing required field
+        # 'multiple_glazed_proportion'", aborting the prediction cohort. The
+        # normaliser defaults the modal 100 so the cert maps. (The field is an
+        # ML-only feature the SAP calculator never reads, and from_rdsap_schema_17_1
+        # does not carry it onto EpcPropertyData — the point here is mappability.)
+        data = load("sap_16_3.json")
+        del data["multiple_glazed_proportion"]
+        assert "multiple_glazed_proportion" not in data
+
+        epc = EpcPropertyDataMapper.from_api_response(data)
+
+        assert isinstance(epc, EpcPropertyData)
+
+    def test_recorded_co2_as_measurement_dict_is_coerced_not_crashed(self) -> None:
+        # Some certs (e.g. 16.x cert 2308-4997-7262-0137-9930) lodge
+        # `co2_emissions_current` as a Measurement dict {'value': 3.5, 'quantity':
+        # 'tonnes per year'} rather than a plain number. `_with_recorded_performance`
+        # previously did `float(co2)` → "float() argument must be ... not 'dict'",
+        # crashing the whole prediction cohort. It now coerces via _measurement_value.
+        data = load("sap_16_3.json")
+        data["co2_emissions_current"] = {"value": 3.5, "quantity": "tonnes per year"}
+
+        epc = EpcPropertyDataMapper.from_api_response(data)
+
+        assert isinstance(epc, EpcPropertyData)
+        assert epc.co2_emissions_current == 3.5
+
    def test_16_2_normalizer_does_not_mutate_caller_dict(self) -> None:
        # Mirror _normalize_shower_outlets' contract: the caller's dict is
        # untouched (deep copy), so a re-dispatch sees the original shape.
--- a/deployment/terraform/lambda/modelling_e2e/variables.tf
+++ b/deployment/terraform/lambda/modelling_e2e/variables.tf
@ -26,7 +26,7 @@ variable "reserved_concurrent_executions" {

 variable "maximum_concurrency" {
  type        = number
-  default     = 2
+  default     = 4
  description = "Maximum concurrent Lambda invocations from the SQS trigger."
 }

--- a/repositories/comparable_properties/epc_comparable_properties_repository.py
+++ b/repositories/comparable_properties/epc_comparable_properties_repository.py
@ -9,6 +9,8 @@ UPRNs share a partition). Register metadata the cert itself doesn't carry

 from __future__ import annotations

+import logging
+from dataclasses import dataclass
 from datetime import date
 from typing import Optional, Protocol

@ -38,12 +40,30 @@ class CohortGeospatial(Protocol):
    ) -> dict[int, Coordinates]: ...


+logger = logging.getLogger(__name__)
+
+
+@dataclass(frozen=True)
+class SkippedCohortCert:
+    """A postcode-cohort cert the mapper could not consume, so it was excluded
+    from the cohort rather than sinking the whole prediction. The certificate
+    number + error are captured so the gap surfaces (subtask outputs + logs) and
+    can be closed by extending the mapper later (ADR-0031)."""
+
+    certificate_number: str
+    error: str
+
+
 class EpcComparablePropertiesRepository(ComparablePropertiesRepository):
    def __init__(
        self, epc_client: CohortEpcClient, geospatial: CohortGeospatial
    ) -> None:
        self._epc_client = epc_client
        self._geospatial = geospatial
+        # Cohort certs skipped because they are not yet mappable. Accumulates
+        # across every postcode the instance serves; the caller reads it after
+        # the run to report the mapper gaps (see modelling_e2e handler).
+        self.skipped: list[SkippedCohortCert] = []

    def candidates_for(self, postcode: str) -> list[ComparableProperty]:
        results: list[EpcSearchResult] = self._epc_client.search_by_postcode(
@ -53,14 +73,37 @@ class EpcComparablePropertiesRepository(ComparablePropertiesRepository):
        coordinates: dict[int, Coordinates] = self._geospatial.coordinates_for_uprns(
            uprns
        )
-        return [self._comparable(result, coordinates) for result in results]
+        cohort: list[ComparableProperty] = []
+        for result in results:
+            comparable = self._comparable_or_skip(result, coordinates)
+            if comparable is not None:
+                cohort.append(comparable)
+        return cohort

-    def _comparable(
+    def _comparable_or_skip(
        self, result: EpcSearchResult, coordinates: dict[int, Coordinates]
-    ) -> ComparableProperty:
-        epc: EpcPropertyData = self._epc_client.get_by_certificate_number(
-            result.certificate_number
-        )
+    ) -> Optional[ComparableProperty]:
+        """Map one cohort cert, or record + skip it when the mapper raises a
+        ``ValueError`` (missing required field / unmapped code — see
+        ``UnmappedApiCode``). One unmappable cert must NOT abort the whole cohort;
+        transient API errors are NOT ``ValueError`` and still propagate."""
+        try:
+            epc: EpcPropertyData = self._epc_client.get_by_certificate_number(
+                result.certificate_number
+            )
+        except ValueError as exc:
+            self.skipped.append(
+                SkippedCohortCert(
+                    certificate_number=result.certificate_number,
+                    error=f"{type(exc).__name__}: {exc}",
+                )
+            )
+            logger.warning(
+                "skipping unmappable cohort cert %s: %s",
+                result.certificate_number,
+                exc,
+            )
+            return None
        resolved: Optional[Coordinates] = (
            coordinates.get(result.uprn) if result.uprn is not None else None
        )
--- a/scripts/hyde/elmhurst_download.py
+++ b/scripts/hyde/elmhurst_download.py
@ -28,7 +28,7 @@ SESSION_DIR = HERE / ".elmhurst-session"
 SAMPLE_DIR = (
    HERE.parent.parent
    / "backend/epc_api/json_samples/real_life_examples"
-    / "RdSAP-Schema-17.1/uprn_10010215568"
+    / "SAP-Schema-16.3/uprn_100061905751"
 )

 ASSESSMENT_GUID = "B44A0DB4-4C08-4241-B818-86F060172105"
--- a/tests/applications/modelling_e2e/test_handler.py
+++ b/tests/applications/modelling_e2e/test_handler.py
@ -198,6 +198,99 @@ def test_lodged_epc_path_saves_epc_plan_and_marks_modelled() -> None:
    mock_uow.commit.assert_called_once()


+def test_skipped_cohort_certs_are_surfaced_in_the_outputs() -> None:
+    """Cohort certs the mapper can't consume are skipped (so prediction is not
+    aborted) and surfaced — with cert numbers — in the subtask outputs, so the
+    mapper gaps can be reported and closed."""
+    from repositories.comparable_properties.epc_comparable_properties_repository import (
+        SkippedCohortCert,
+    )
+
+    mock_engine = _engine_mock([PROPERTY_ID], [UPRN], [POSTCODE])
+    mock_plan = _plan_mock()
+    skipped = [
+        SkippedCohortCert(
+            certificate_number="8257-7539-1649-0633-4992",
+            error="ValueError: RdSapSchema17_1: missing required field 'window'",
+        )
+    ]
+
+    with ExitStack() as stack:
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.os.environ", _ENV)
+        )
+        stack.enter_context(
+            patch(
+                "applications.modelling_e2e.handler._get_engine",
+                return_value=mock_engine,
+            )
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.EpcClientService")
+        ).return_value.get_by_uprn.return_value = MagicMock()
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.GeospatialS3Repository")
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.GoogleSolarApiClient")
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler._spatial_for", return_value=None)
+        )
+        stack.enter_context(
+            patch(
+                "applications.modelling_e2e.handler._solar_insights_for",
+                return_value=None,
+            )
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.overlays_from", return_value=[])
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.PropertyOverridesPostgresReader")
+        )
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.ScenarioPostgresRepository")
+        ).return_value.get_many.return_value = [MagicMock()]
+        stack.enter_context(
+            patch(
+                "applications.modelling_e2e.handler.catalogue_with_off_catalogue_overrides"
+            )
+        )
+        stack.enter_context(patch("applications.modelling_e2e.handler.Session"))
+        stack.enter_context(
+            patch(
+                "applications.modelling_e2e.handler.run_modelling",
+                return_value=mock_plan,
+            )
+        )
+        # The repo accumulated a skipped (unmappable) cohort cert during the run.
+        stack.enter_context(
+            patch("applications.modelling_e2e.handler.EpcComparablePropertiesRepository")
+        ).return_value.skipped = skipped
+        MockUoW = stack.enter_context(
+            patch("applications.modelling_e2e.handler.PostgresUnitOfWork")
+        )
+        MockUoW.return_value.__enter__.return_value = MagicMock()
+        MockUoW.return_value.__exit__.return_value = False
+
+        # Act
+        result = _call_handler(_BODY)
+
+    # Assert — the handler's return (→ subtask outputs.result) carries the cert
+    # numbers + errors of every skipped cohort cert.
+    assert result == {
+        "skipped_unmappable_cohort_certs": [
+            {
+                "certificate_number": "8257-7539-1649-0633-4992",
+                "error": (
+                    "ValueError: RdSapSchema17_1: missing required field 'window'"
+                ),
+            }
+        ]
+    }
+
+
 # ---------------------------------------------------------------------------
 # EPC Prediction path
 # ---------------------------------------------------------------------------
--- a/tests/repositories/comparable_properties/test_epc_comparable_properties_repository.py
+++ b/tests/repositories/comparable_properties/test_epc_comparable_properties_repository.py
@ -9,6 +9,8 @@ from datetime import date
 from pathlib import Path
 from typing import Any, Optional

+import pytest
+
 from datatypes.epc.domain.epc_property_data import EpcPropertyData
 from datatypes.epc.domain.mapper import EpcPropertyDataMapper
 from datatypes.epc.search.epc_search_result import EpcSearchResult
@ -126,3 +128,64 @@ def test_no_certs_in_the_postcode_yields_no_candidates() -> None:
    # Assert — no candidates, and the postcode was searched (normalisation/IO ran).
    assert candidates == []
    assert client.searched_postcode == "LS6 1AA"
+
+
+class _FakeEpcClientRaising:
+    """Serves the cohort, but raises the given exception for specific certs."""
+
+    def __init__(
+        self, results: list[EpcSearchResult], failures: dict[str, Exception]
+    ) -> None:
+        self._results = results
+        self._failures = failures
+
+    def search_by_postcode(self, postcode: str) -> list[EpcSearchResult]:
+        return self._results
+
+    def get_by_certificate_number(self, cert_num: str) -> EpcPropertyData:
+        if cert_num in self._failures:
+            raise self._failures[cert_num]
+        return _epc()
+
+
+def test_unmappable_cohort_cert_is_skipped_and_recorded() -> None:
+    # Arrange — two certs share the postcode; one can't be mapped (the mapper
+    # raises ValueError for a missing required field, as the gov-API 16.x / sparse
+    # certs do).
+    here = Coordinates(longitude=-1.55, latitude=53.81)
+    client = _FakeEpcClientRaising(
+        [_result("CERT-OK", uprn=12345), _result("CERT-BAD", uprn=67890)],
+        failures={
+            "CERT-BAD": ValueError(
+                "RdSapSchema17_1: missing required field 'window'"
+            )
+        },
+    )
+    geospatial = _FakeGeospatial({12345: here})
+    repo = EpcComparablePropertiesRepository(client, geospatial)
+
+    # Act
+    candidates = repo.candidates_for("LS6 1AA")
+
+    # Assert — the unmappable cert is excluded (does NOT sink the cohort) and the
+    # good one survives; the skip is recorded with the cert number + error so the
+    # mapper gap can be reported and closed later.
+    assert [c.certificate_number for c in candidates] == ["CERT-OK"]
+    assert len(repo.skipped) == 1
+    assert repo.skipped[0].certificate_number == "CERT-BAD"
+    assert "missing required field 'window'" in repo.skipped[0].error
+
+
+def test_transient_fetch_error_is_not_swallowed_as_unmappable() -> None:
+    # Arrange — a non-ValueError (e.g. a transient API failure) must NOT be
+    # silently skipped as "unmappable"; it propagates so the run fails loudly.
+    client = _FakeEpcClientRaising(
+        [_result("CERT-1", uprn=12345)],
+        failures={"CERT-1": RuntimeError("EPC API error 503")},
+    )
+    repo = EpcComparablePropertiesRepository(client, _FakeGeospatial({}))
+
+    # Act / Assert
+    with pytest.raises(RuntimeError, match="503"):
+        repo.candidates_for("LS6 1AA")
+    assert repo.skipped == []