diff --git a/docs/grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md b/docs/grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md
new file mode 100644
index 00000000..5602e77b
--- /dev/null
+++ b/docs/grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md
@@ -0,0 +1,181 @@
+# Grill spec — generalise Reduced-Field Synthesis to the rest of the pre-SAP10 RdSAP family
+
+**Date:** 2026-06-10 · **Branch:** `feature/junte+khalim` · **Status:** SPEC — READY TO GRILL.
+
+Grill this by running `/grill-me` and feeding it this file. Start at **Q1 (ROOT)**.
+
+---
+
+## Why this exists
+
+The RdSAP **20.0.0** mapper now works end-to-end: all 1000 corpus certs parse, map via
+**Reduced-Field Synthesis** (ADR-0027), and score through `Sap10Calculator` without crashing.
+`scripts/eon/find_epc_data.py` shows lodged-vs-our-calculated SAP side by side and the deltas are
+sane (mostly ±7, same band). The pattern is proven.
+
+The goal now: **apply the same playbook to the other pre-SAP10 RdSAP specs** so historical EPC data
+across more lodgement years can be Rebaselined. This is pure leverage — the hard design thinking
+(synthesis coefficients, Validation-Cohort rule, schema-fix mechanism) is already done; what remains
+is per-spec drift and a decision about how much to share vs copy.
+
+## What we already hold (the repeatable 20.0.0 playbook)
+
+Each step below is *proven* for 20.0.0. The grill is about which steps change per spec.
+
+1. **Harvest a corpus** — `scripts/eon/harvest_certs.py` streams a local bulk dump
+   (`downloads/certificates-YYYY.json`) for the year that spec dominates, caps at 1000, writes
+   `backend/epc_api/json_samples//corpus.jsonl`. No API token needed.
+2. **Fix the placeholder schema** — every `rdsap_schema_*.py` was generated from ONE example so it
+   over-constrains. Make it `@dataclass(kw_only=True)` + data-driven required→optional (any field
+   present in <100% of the corpus gets a default; `[]` for lists, `None` otherwise) → all certs parse.
+3. **Synthesise the measured fields** the reduced schema only records categorically:
+   windows (`glazed_area` band × floor area, 4-way N/E/S/W split), lighting LEL, hot-water bath/mixer
+   counts, ventilation/chimneys/sheltered-sides, glazing cascade.
+4. **Leave calculator defaults to the calculator** — `cert_to_inputs` is the RdSAP Table-5 expansion
+   engine; the mapper supplies raw reduced data only.
+5. **Wire dispatch + flip a strict guard** — add the `schema_type` branch to
+   `from_api_response`, promote the corpus into the strict parse+map bucket in
+   `infrastructure/epc_client/tests/test_mapper_corpus.py`.
+6. **Record every synthesis assumption in code comments + test names** (Validation-Cohort rule: no
+   same-spec ground truth).
+
+## Ground truth about the targets (verified 2026-06-10)
+
+| Spec | Schema module | Mapper method | Dispatched? | Corpus? | Notes |
+|------|---------------|---------------|-------------|---------|-------|
+| 21.0.1 | ✅ | `from_rdsap_schema_21_0_1` | ✅ | ✅ 1000 | reference (rich, measured) |
+| 21.0.0 | ✅ | `from_rdsap_schema_21_0_0` | ✅ | ❌ | dispatched but unguarded |
+| **20.0.0** | ✅ | `from_rdsap_schema_20_0_0` | ✅ | ✅ 1000 | **DONE — the template** |
+| **19.0** | ✅ | `from_rdsap_schema_19_0` | ❌ | ❌ | orphaned; `sap_windows=[]` hardcoded |
+| **18.0** | ✅ | `from_rdsap_schema_18_0` | ❌ | ❌ | orphaned |
+| **17.1** | ✅ | `from_rdsap_schema_17_1` | ❌ | ❌ | orphaned |
+| **17.0** | ✅ | `from_rdsap_schema_17_0` | ❌ | ❌ | orphaned |
+
+- 19.0 confirmed same reduced-field shape as 20.0.0: `glazed_area: int` band + `multiple_glazing_type:
+  int`, and the mapper currently hardcodes `sap_windows=[]` — i.e. the exact windowless-corruption bug
+  that 20.0.0's synthesis fixed. 18.0/17.1/17.0 are almost certainly the same family.
+- The 17–19 mapper methods **exist** but are unreachable: `from_api_response` only branches 21.0.1 /
+  21.0.0 / 20.0.0; everything else hits `raise ValueError(f"Unsupported EPC schema")`.
+- **Corpora are harvestable.** `downloads/README.txt` schema-by-year:
+  `2020 → RdSAP-Schema-19.0 (1632)`, `2021–2024 → 20.0.0`, `2025–2026 → 21.0.1`. Older RdSAP (17.x/18.0)
+  live in the 2012–2019 dumps (all present locally). `SAP-Schema-1x` (full/design SAP) and `CEPC-*`
+  (commercial) are different families with no RdSAP mapper.
+
+---
+
+## Decision tree to grill (each has a recommended answer)
+
+### Q1 (ROOT) — Target set and order. What are we generalising to, and in what order?
+**Recommend:** the **pre-SAP10 RdSAP family only**, one spec at a time, **19.0 first** (dominant in the
+2020 dump, closest sibling to 20.0.0, mapper already stubbed), then 18.0 → 17.1 → 17.0 as their dumps
+confirm volume. **Exclude** `SAP-Schema-1x` (full/design SAP — new-build, not reduced; a separate
+mapper family and ADR) and `CEPC-*` (non-domestic). **Carve out** 21.0.0 as a quick win: it's already
+dispatched, it just needs a harvested corpus to join the strict guard.
+*Sub-question:* do we batch all four 17–19 in one branch sweep, or land 19.0 fully (corpus → schema →
+synthesis → dispatch → guard) before starting 18.0? Recommend: **land 19.0 end-to-end first** — it
+either confirms the playbook transfers cleanly (then 18.0/17.x are fast) or surfaces drift early.
+
+### Q2 — Coefficient reuse vs re-fit (the load-bearing, ADR-worthy one).
+20.0.0's glazing synthesis uses `0.148 × TFA × band_multiplier`, fit from the **21.0.1** corpus's
+glazing/floor ratio. For 19.0/18.0/17.x: reuse the same coefficients, or re-fit per spec?
+**DIRECTION (user, 2026-06-10): re-work the coefficients from each new corpus's own data — do not
+inherit the 21.0.1 fit by default.** Treat `0.148` + the band multipliers as a *starting hypothesis* to
+confirm or replace against what the new corpus actually shows, per spec. The empirical numbers lead; we
+only keep the 20.0.0 values if the new corpus reproduces them.
+
+**The constraint this hits (must resolve while grilling):** a reduced schema does **not** measure
+per-window area (that's the whole reason synthesis exists), so a 19.0/18.0/17.x corpus *cannot
+self-fit the glazing/floor ratio* — there's no measured glazing column in it to regress on. So
+"work it out from the new corpus" splits into two parts:
+- **What the reduced corpus *can* give us directly** → re-derive per spec: the `glazed_area` band
+  *distribution* (how many Normal/More/Less), `total_floor_area` distribution, and whether the band
+  codes/semantics match 20.0.0. This validates (or breaks) the band-multiplier assumption empirically.
+- **The base ratio itself (the `0.148`)** → needs a *measured* reference from the same stock/era.
+  Options to grill: (a) use the contemporaneous measured corpus if one exists for that year (e.g. a
+  rich-window spec lodged alongside), (b) fit from the handful of rich certs the reduced corpus *does*
+  carry (20.0.0 had 7/1000 with lodged `sap_windows` — check the count per spec), or (c) fall back to
+  the 21.0.1 fit *only* if (a)/(b) yield too little signal, and say so explicitly.
+
+This moves every rebaselined score for the spec, so the per-spec fit + its evidence wants an ADR
+(extends ADR-0027). Record the derivation (corpus, sample size, quartiles) the same way 20.0.0 did.
+
+### Q3 — Code-space drift across versions.
+Do 17–19 use the same integer code spaces as 20.0.0 (glazing_type, built_form, orientation, fuel,
+heat-emitter, party-wall, roof/floor construction)? 20.0.0's codes turned out **identical** to 21.0.1's,
+so we routed through the existing cascades verbatim. **Recommend:** assume identical within the RdSAP
+family; cross-check each version against `epc_codes.csv` during grilling and add a per-spec cascade
+override *only* where the corpus proves a code diverged. Don't pre-build translation layers.
+
+### Q4 — Schema-fix mechanism. Same `kw_only` + data-driven required→optional?
+**Recommend: yes, unchanged.** Each placeholder schema over-constrains identically (single-example
+generation). Run the one-pass corpus scan to enumerate all missing-required fields at once (not
+whack-a-mole), then default them. Mechanical, low-risk, proven.
+
+### Q5 — Shared synthesis helper vs per-mapper copy (the architecture fork).
+20.0.0's synthesis lives in `_synthesise_20_0_0_sap_windows` + inline mapper blocks. With 19.0 we'll
+have a **second instance** — the classic extract trigger. **Recommend:** once 19.0 is green, extract a
+single spec-parameterised `_synthesise_reduced_field_windows(glazed_area, tfa, glazing_type)` (and
+shared lighting/hot-water/ventilation helpers) so 18.0/17.x are near-free and the coefficients live in
+exactly one place. Defer the extraction until 19.0 confirms the shape (avoid abstracting from one
+example). This is the `/improve-codebase-architecture` hook — a deep module behind a small interface.
+
+### Q6 — Per-spec field availability.
+Do 17–19 actually lodge the synthesis *inputs* 20.0.0 relies on — `instantaneous_wwhrs` (bath room
+counts), `low_energy_fixed_lighting_outlets_count`, `percent_draughtproofed`, `open_fireplaces_count`,
+`multiple_glazing_type`? Older specs may omit or rename some. **Recommend:** profile each corpus up
+front (one-pass field-presence scan); where a 20.0.0 input is absent, degrade gracefully to the
+calculator's own default rather than fabricating — and record the gap in a test name.
+
+### Q7 — Dispatch wiring + acceptance bar.
+**Recommend:** per spec, add the `schema_type` branch to `from_api_response` (wrapped in
+`_clear_basement_flag_when_system_built` like the others) and promote its corpus into the strict
+parse+map bucket in `test_mapper_corpus.py`. Smoke-check with `scripts/eon/find_epc_data.py` (extend
+the UPRN list with that spec's certs) — our re-score should track the lodged figure within a sane band.
+The formal SAP-score *value* test stays deferred (same as 20.0.0) until we choose to land it.
+
+### Q8 — Validation-Cohort / is there ANY cross-check?
+Same rule as 20.0.0: a pre-SAP10 cert has no same-spec lodged figure to validate against, so synthesis
+assumptions go in code/test names. **But probe one opportunistic anchor:** a single UPRN re-lodged
+across spec versions (e.g. a dwelling with both a 19.0 and a 20.0.0 cert, unchanged between) — our
+re-score of both should roughly agree. **Recommend:** if dual-lodged UPRNs surface during harvest, keep
+a handful as a cross-spec regression anchor; don't block on it.
+
+---
+
+## How to reproduce / kick off (19.0 first)
+
+```bash
+# 1. Confirm 19.0 volume + reduced-field shape in the 2020 dump
+python - <<'EOF'
+import json
+from collections import Counter
+# stream the first N lines of certificates-2020.json, bucket by schema_type,
+# and dump one RdSAP-Schema-19.0 document to inspect glazed_area / sap_windows
+EOF
+
+# 2. Add a harvest source row and run it
+# scripts/eon/harvest_certs.py SOURCES += ("certificates-2020.json","RdSAP-Schema-19.0",1000)
+
+# 3. Drive the (orphaned) 19.0 mapper against the new corpus to bucket parse failures
+python - <<'EOF'
+import json, collections
+from pathlib import Path
+from datatypes.epc.schema.rdsap_schema_19_0 import RdSapSchema19_0
+from datatypes.epc.schema.helpers import from_dict
+certs=[json.loads(l) for l in Path("backend/epc_api/json_samples/RdSAP-Schema-19.0/corpus.jsonl").read_text().splitlines() if l.strip()]
+b=collections.Counter()
+for c in certs:
+    try: from_dict(RdSapSchema19_0, c)
+    except Exception as e: b[f"{type(e).__name__}: {str(e)[:70]}"]+=1
+for k,v in b.most_common(): print(v,k)
+EOF
+```
+
+## References
+
+- **ADR-0027** (`docs/adr/0027-rdsap-20-0-0-reduced-field-synthesis.md`) — the synthesis decision,
+  coefficients, rejected alternatives. Extend (not replace) for the family-wide coefficient choice (Q2).
+- **ADR-0015** (mappers own cert normalization), **ADR-0004** (lodged-vs-effective pair).
+- **CONTEXT.md** — _Reduced-Field Synthesis_, _Rebaselining_, _Lodged / Effective Performance_,
+  _Validation Cohort_, _pre-SAP10_.
+- **20.0.0 resume doc** — `docs/grill-sessions/2026-06-09-rdsap-20-0-0-remapper.md` (the worked example).
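The one-pass corpus scan that Q4 and Q6 of the spec recommend could look roughly like the sketch below. It is not part of this patch. `optional_field_defaults` and `load_corpus` are hypothetical names, and the sketch assumes certs are flat JSON objects and only inspects top-level keys, whereas the real `rdsap_schema_*.py` dataclasses are presumably nested. The defaulting rule is the one the spec states: any field present in fewer than 100% of certs gets a default, `[]` for lists and `None` otherwise.

```python
import json
from collections import Counter
from typing import Any, Iterable


def optional_field_defaults(certs: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Map every top-level field missing from at least one cert to its default.

    Hypothetical helper: list-valued fields default to [], everything else to None.
    """
    total = 0
    present: Counter[str] = Counter()
    list_valued: set[str] = set()
    for cert in certs:
        total += 1
        for name, value in cert.items():
            present[name] += 1
            if isinstance(value, list):
                list_valued.add(name)
    # Fields seen in every cert stay required; the rest become optional.
    return {
        name: ([] if name in list_valued else None)
        for name, count in sorted(present.items())
        if count < total
    }


def load_corpus(path: str) -> list[dict[str, Any]]:
    """Read a corpus.jsonl file (one JSON cert per line), skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Tiny in-memory stand-in for a harvested corpus (illustrative values only).
sample = [
    {"total_floor_area": 80.0, "glazed_area": 1, "sap_windows": []},
    {"total_floor_area": 65.5, "glazed_area": 2},
    {"total_floor_area": 102.0, "glazed_area": 1, "open_fireplaces_count": 0},
]
defaults = optional_field_defaults(sample)
print(defaults)
```

Against a real corpus the call would be `optional_field_defaults(load_corpus(path))`. The result lists every field to relax in one pass, and the same presence counts show which 20.0.0 synthesis inputs an older spec does not lodge.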
diff --git a/next_claude_prompt.txt b/next_claude_prompt.txt
new file mode 100644
index 00000000..309926b1
--- /dev/null
+++ b/next_claude_prompt.txt
@@ -0,0 +1 @@
+/grill-me docs/grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md
diff --git a/scripts/run_modelling_e2e.py b/scripts/run_modelling_e2e.py
index e38fdb63..7019361d 100644
--- a/scripts/run_modelling_e2e.py
+++ b/scripts/run_modelling_e2e.py
@@ -124,16 +124,16 @@ def _s3_parquet_reader(bucket: str) -> ParquetReader:
     return read
 
 
-def _spatial_for(
-    repo: GeospatialS3Repository, uprn: int
-) -> Optional[SpatialReference]:
+def _spatial_for(repo: GeospatialS3Repository, uprn: int) -> Optional[SpatialReference]:
     """The UPRN's spatial reference (coordinates + planning protections), or None
     when S3 doesn't cover it — a missing reference must not abort the run, so a
     lookup error degrades to None (unrestricted, no solar)."""
     try:
         return repo.spatial_for(uprn)
     except Exception as error:  # noqa: BLE001 — S3/parquet hiccup is non-fatal
-        print(f" spatial lookup failed for uprn {uprn}: {type(error).__name__}: {error}")
+        print(
+            f" spatial lookup failed for uprn {uprn}: {type(error).__name__}: {error}"
+        )
         return None
 
 
@@ -186,7 +186,9 @@ def _parse_measures(raw: Optional[str]) -> Optional[frozenset[MeasureType]]:
     (consider every modelled measure) when unset. Raises on an unknown type."""
     if raw is None:
         return None
-    return frozenset(MeasureType(token.strip()) for token in raw.split(",") if token.strip())
+    return frozenset(
+        MeasureType(token.strip()) for token in raw.split(",") if token.strip()
+    )
 
 
 def _context_summary(
@@ -252,8 +254,12 @@ def _persist(
 
 def main() -> None:
     parser = argparse.ArgumentParser(description=__doc__)
-    parser.add_argument("property_ids", type=int, nargs="+", help="Property ids to model")
-    parser.add_argument("--goal", default="C", help="target band when no --scenario-id (default C)")
+    parser.add_argument(
+        "property_ids", type=int, nargs="+", help="Property ids to model"
+    )
+    parser.add_argument(
+        "--goal", default="C", help="target band when no --scenario-id (default C)"
+    )
     parser.add_argument(
         "--scenario-id", type=int, default=None, help="model against this DB Scenario"
     )
@@ -263,12 +269,16 @@ def main() -> None:
         help="comma-separated measure types to consider (default: all)",
     )
     parser.add_argument(
-        "--portfolio-id", type=int, default=None, help="portfolio id (required for --persist)"
+        "--portfolio-id",
+        type=int,
+        default=None,
+        help="portfolio id (required for --persist)",
     )
     parser.add_argument(
         "--persist",
         action="store_true",
         help="WRITE the inputs + Plan to the DB (default: inspect only, no writes)",
+        default=False,
     )
     parser.add_argument(
         "--no-solar",
@@ -355,7 +365,9 @@ def main() -> None:
                 solar_insights=solar_insights,
                 plan=plan,
             )
-        except Exception as error:  # noqa: BLE001 — one bad property must not stop the run
+        except (
+            Exception
+        ) as error:  # noqa: BLE001 — one bad property must not stop the run
             line = f"property {property_id} (uprn {uprn}): ERROR — {type(error).__name__}: {error}"
             print(line + "\n")
             md_lines.append(f"## Property {property_id}\n\n`{line}`\n")