Document RdSAP 20.0.0 Reduced-Field Synthesis (CONTEXT.md term + ADR-0027)

Sharpen the glossary to decouple deterministic old-schema re-mapping from neighbour-prediction gap-fill (a separate, unimplemented ML path), and add the Reduced-Field Synthesis term. ADR-0027 records the pre-SAP10 20.0.0 mapper's best-attempt synthesis (corpus-fit glazing area = 0.148 × TFA × band multiplier, 4-way orientation split) and its trade-offs. The grill resume doc captures every resolved branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:10:47 +00:00 · 2026-06-10 14:14:18 +00:00 · 2026-06-10 14:14:18 +00:00 · 5589a66e7c
commit 5589a66e7c
parent 14dc4efeed
3 changed files with 153 additions and 4 deletions
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -78,13 +78,17 @@ _Avoid_: patches (deprecated), corrections, manual EPC, edits
 ### Modelling

 **Effective EPC**:
-The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current and gaps filled from neighbour predictions; or — when there is **no EPC** — components **estimated from surrounding properties**. Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
+The assembled `EpcPropertyData` picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with **Landlord Overrides** applied; or — when the EPC is **old** — its schema re-mapped to current via **Reduced-Field Synthesis** (deterministic, from the cert plus calibrated coefficients — no neighbour data); or — when there is **no EPC** — components **estimated from surrounding properties** (a separate neighbour-prediction ML mechanism, not yet implemented). Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in **Baseline Performance**.
 _Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC

 **Rebaselining**:
-Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; estimate components from surrounding properties when there is no EPC; re-map an old-schema EPC to current and gap-fill from neighbour predictions. The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
+Establishing a Property's **Effective Performance** (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by **assembling the Effective EPC picture and scoring it** through **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The *assembly* is the substance: apply **Landlord Overrides** (e.g. boiler → ASHP, wall insulated) as a simulation on the `EpcPropertyData`; re-map an old-schema EPC to current via **Reduced-Field Synthesis** (deterministic, cert-only); estimate components from surrounding properties when there is no EPC (neighbour-prediction gap-fill — a separate ML mechanism, not yet implemented). The calculator is the **scoring engine at the tail**, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that **Bill Derivation** prices — one scoring, two products. kWh is an ML target per ADR-0007 — see [[epc-ml-transform]].
 _Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)

+**Reduced-Field Synthesis**:
+Deterministically translating an **old / reduced-data EPC schema** into the current `EpcPropertyData`, synthesising the *measured* fields the target expects from the source's *reduced or categorical* fields, using only the cert itself plus fixed calibrated coefficients — never neighbour data. Used when re-mapping a **pre-SAP10** cert (e.g. `RdSAP-Schema-20.0.0`) as part of assembling the **Effective EPC**: e.g. a glazing-area *band* + floor area → window m²; bath/shower *room counts* → bath and shower counts. A *best attempt* with no ground truth to validate against (per the **Validation Cohort** rule, a pre-SAP10 cert has no same-spec lodged figure to check), so each synthesis assumption is recorded explicitly in code and tests to keep it debuggable. Distinct from **neighbour-prediction gap-fill** (ML estimation of genuinely-absent fields from surrounding properties — the no-EPC path, a separate mechanism not yet implemented) and from the calculator's own RdSAP Table-5 defaulting in `cert_to_inputs` (which expands `EpcPropertyData` into the full SAP input set downstream).
+_Avoid_: gap-fill (means the neighbour-ML path), reduced-data expansion (overloaded with the calculator's Table-5 step), remapping (the schema-translation part only)
+
 **Baseline Performance**:
 A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, cooling) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
 _Avoid_: baseline predictions, predicted baseline, rebaselined values
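
The three Rebaselining triggers listed in the glossary entry above, (a) to (c), can be read as a single predicate. The following is a minimal sketch under stated assumptions: `AssembledPicture`, its field names, and `needs_rebaselining` are hypothetical illustrations, not names from the repository. Only the `sap_version < 10.2` threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the glossary: the calculator supersedes anything
# lodged under a methodology older than SAP 10.2.
CALCULATOR_SAP_VERSION = 10.2


@dataclass(frozen=True)
class AssembledPicture:
    """Hypothetical summary of how an Effective EPC picture was assembled."""

    sap_version: Optional[float]   # methodology of the lodged EPC; None when there is no EPC
    physical_state_changed: bool   # Overrides / Site Notes changed walls, heating, windows, etc.
    estimated_or_remapped: bool    # picture is estimated or remapped, not a real current EPC


def needs_rebaselining(picture: AssembledPicture) -> bool:
    """True when the assembled picture differs from the lodged record."""
    superseded = (
        picture.sap_version is not None
        and picture.sap_version < CALCULATOR_SAP_VERSION
    )
    return superseded or picture.physical_state_changed or picture.estimated_or_remapped
```

Under this reading, a remapped 20.0.0 cert trips both (a) and (c), so it is always rebaselined.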
--- a/docs/adr/0027-rdsap-20-0-0-reduced-field-synthesis.md
+++ b/docs/adr/0027-rdsap-20-0-0-reduced-field-synthesis.md
@@ -0,0 +1,101 @@
+---
+Status: accepted
+---
+
+# The pre-SAP10 (RdSAP 20.0.0) mapper does best-attempt Reduced-Field Synthesis
+
+Decided in a `/grill-me` → `/grill-with-docs` session (2026-06-10). Instantiates and extends
+[ADR-0015](0015-mappers-own-cert-normalization.md) (mappers own cert normalization); sits inside the
+**old-schema re-map** half of **Rebaselining** ([CONTEXT.md](../../CONTEXT.md): _Effective EPC_,
+_Rebaselining_, _Reduced-Field Synthesis_); relates to [ADR-0004](0004-baseline-performance-lodged-effective-pair.md)
+(lodged-vs-effective pair) and [ADR-0009](0009-deterministic-sap-calculator.md)/[ADR-0013](0013-calculator-produces-effective-performance-shadow-first.md)
+(the deterministic calculator that scores the result). Resume notes:
+[docs/grill-sessions/2026-06-09-rdsap-20-0-0-remapper.md](../grill-sessions/2026-06-09-rdsap-20-0-0-remapper.md).
+
+## Context
+
+`RdSAP-Schema-20.0.0` is a **pre-SAP10** schema (RdSAP 2012). Its certs are historical
+(2021–2024 bulk lodgements, harvested offline — the gov EPC API only returns 21.0.x), and we need
+them re-mapped to the current `EpcPropertyData` so they can be **Rebaselined**: assembled → scored by
+`Sap10Calculator` → **Effective Performance**. Per the **Validation Cohort** rule
+([CONTEXT.md](../../CONTEXT.md): _Spec Version_, _Calculated SAP10 Performance_), a pre-SAP10 cert
+has **no same-spec lodged figure to validate against** — the lodged 20.0.0 score is preserved as
+**Lodged Performance** but is *not* a 1:1 comparison target. So the calculator's output simply *is*
+the Effective Performance for these properties; there is no ground truth to check the mapping against.
+
+The problem: `EpcPropertyData` (and the calculator behind it) expects **measured** fields that
+20.0.0 records only **categorically**, or not at all:
+
+- **Windows** — 20.0.0 lodges a `glazed_area` *band* (Normal / More / Less, 945/1000 = Normal) and
+  dwelling-level aggregates, **not** per-window m² (only 7/1000 carry a `sap_windows` array). The
+  calculator needs `width × height` per window for heat-transmission and per-orientation solar gain.
+  An empty `sap_windows=[]` does **not** crash — it silently models a windowless dwelling (zero solar
+  gain, zero window heat loss), which is the worst outcome for a score that drives bills and packages.
+- **Hot water** — 20.0.0 lodges bath/shower *room counts*, not `number_baths` / `mixer_shower_count`.
+- **Lighting** — outlet counts + a low-energy count, not per-bulb-type counts.
+- **Ventilation / chimneys / sheltered sides** — partial or coded differently.
+
+The placeholder `RdSapSchema20_0_0` (generated from a single example) also over-constrains: 993/1000
+certs fail to even parse because fields the corpus routinely omits (`sap_windows`,
+`windows_transmission_details`, `lzc_energy_sources`, many `SapBuildingPart` fields) are declared
+required.
+
+Three ways to fill the measured fields were genuinely on the table:
+
+1. **Leave them empty/zero** — type-safe ingest only. Rejected: silently corrupts the score
+   (windowless dwellings; under-counted baths; under-stated lighting for 439/1000 certs).
+2. **Neighbour-prediction gap-fill** — ML-estimate from surrounding properties. This is a *separate*
+   mechanism, **not yet implemented**, reserved for the no-EPC case. Out of scope here.
+3. **Reduced-Field Synthesis** — deterministically synthesize the measured fields from the cert's own
+   reduced/categorical fields plus fixed coefficients. **Chosen.**
+
+## Decision
+
+The 20.0.0 mapper produces a *complete* `EpcPropertyData` by **Reduced-Field Synthesis** — using the
+cert alone plus fixed, corpus-calibrated coefficients, never neighbour data — so the existing
+`Sap10Calculator` runs unchanged. The calculator + its tests are the acceptance criterion; the mapper
+owns all synthesis (extending ADR-0015 from code-normalization to reduced-field synthesis).
+
+Load-bearing, surprising-without-context choices (the reason this is an ADR):
+
+- **Window area** = `0.148 × total_floor_area × band_multiplier`. The `0.148` is the median
+  **glazing-area ÷ floor-area ratio of all 1000 real 21.0.1 certs** (P25 0.12 / P50 0.148 / P75 0.185 /
+  P90 0.224). The band multipliers `{Normal 1.00, More 1.25, Less 0.81, MuchMore 1.51, MuchLess 0.62}`
+  are percentile ratios from the same fit: More = P75/P50, Less = P25/P50, MuchMore = P90/P50, MuchLess = P10/P50.
+  The fit was chosen over the published **RdSAP 2012** band→m² formula because that spec is retired and
+  not in our possession (RdSAP10 / SAP10.2 both *measure* windows and dropped it), whereas our own corpus
+  is data we hold and can validate against a held-out split. The 7 rich certs use their lodged `window_area` directly.
+- **Orientation** — 20.0.0 records none, so the synthesized area is **split 4-way across N/E/S/W**;
+  the unchanged `solar_gains.py` then averages them (the avg-orientation treatment). The spec's literal
+  default for unrecorded orientation is **E/W** (RdSAP10 §7 / §8.2); 4-way lands within ~3% of that and
+  was chosen for the "distribute it" intuition. (The prior calculator behaviour — *skip* unknown
+  orientation → zero solar gain — is a downward bias we are removing.)
+- **Window geometry representation** — each synthesized window is `width = area/4, height = 1.0`
+  (width×height is the only quantity the calculator reads; exact, matching existing Elmhurst precedent).
+- **Everything the calculator already defaults, the mapper leaves to it.** `cert_to_inputs` is the
+  RdSAP Table-5 expansion engine (extract-fans from age+rooms, suspended-timber sealing, draught-lobby,
+  modal hot-water defaults). The mapper supplies raw reduced data only and does **not** re-derive these.
+- **Schema parse fix** — data-driven required→optional: any field present in <100% of the corpus
+  becomes `Optional` (`[]` for the list fields, `None` otherwise), so all 1000 certs parse.
+
+Because there is no ground truth (per the Validation-Cohort rule), **every synthesis assumption is
+recorded explicitly in code comments and test names**, so a future debugger can see exactly which
+coefficient or default produced a surprising Effective Performance.
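
The window-area, orientation, and geometry choices in the Decision list can be sketched together as follows. This is an illustration only: `SynthWindow`, `synthesize_windows`, and the string band labels are hypothetical stand-ins (the notes suggest the real mapper builds `SapWindow` objects and keys on numeric band codes). The coefficient and multipliers are the ones recorded in this ADR.

```python
from dataclasses import dataclass

# Corpus median of glazing-area / floor-area, as recorded in ADR-0027.
MEDIAN_GLAZING_RATIO = 0.148

# Band multipliers from ADR-0027. The string keys are illustrative labels.
BAND_MULTIPLIER = {
    "Normal": 1.00,
    "More": 1.25,
    "Less": 0.81,
    "MuchMore": 1.51,
    "MuchLess": 0.62,
}

# 20.0.0 records no orientation, so the area is split evenly four ways.
ORIENTATIONS = ("N", "E", "S", "W")


@dataclass(frozen=True)
class SynthWindow:
    """Hypothetical stand-in for the domain window type."""

    orientation: str
    width: float
    height: float

    @property
    def area(self) -> float:
        # The calculator is described as reading only width x height.
        return self.width * self.height


def synthesize_windows(total_floor_area: float, band: str) -> list[SynthWindow]:
    """Synthesize four equal windows from floor area and a glazing band."""
    total_area = MEDIAN_GLAZING_RATIO * total_floor_area * BAND_MULTIPLIER[band]
    per_window = total_area / len(ORIENTATIONS)
    # Unit height means the width carries the whole per-window area.
    return [SynthWindow(o, width=per_window, height=1.0) for o in ORIENTATIONS]
```

For a 100 m² dwelling in the Normal band this gives 14.8 m² of glazing in total, 3.7 m² per orientation. Keeping the coefficients as named module-level constants mirrors the ADR's intent that they live in one place rather than as scattered magic numbers.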
+
+## Consequences
+
+- **Every 20.0.0 property's Effective Performance depends on these coefficients.** Changing `0.148`,
+  the band multipliers, or the 4-way split shifts every rebaselined 20.0.0 score — and any bill /
+  package built on it. That is the "hard to reverse" cost; it is why they live in one named place with
+  their derivation recorded, not scattered as magic numbers.
+- **Synthesis is a best attempt, not validated.** We cannot close the loop against a lodged SAP10
+  figure for these certs. Revisit if either (a) the **RdSAP 2012** band→m² formula is sourced (cross-check
+  / replace the corpus fit), or (b) a same-spec **Validation Cohort** becomes available.
+- **Fidelity ceilings are accepted and documented:** 20.0.0 cannot give per-orientation glazing
+  (single averaged treatment), cannot distinguish roof windows from wall windows (all treated as wall),
+  and approximates 1 lighting outlet ≈ 1 bulb. These are inherent to the reduced schema, not bugs.
+- **Neighbour-prediction gap-fill stays out.** If a future slice wants to improve a synthesized field
+  from surrounding properties, that is the separate (unimplemented) ML mechanism and a new ADR — not a
+  tweak to this deterministic path.
+- The corpus test flips from `xfail` to a strict 1000/1000 **parse + scores-without-crashing** guard;
+  it is a *mapper-correctness* vehicle, **not** a lodged-vs-effective accuracy check.
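
The data-driven required→optional rule from the Decision section (any field present in fewer than 100% of corpus certs becomes optional) can be sketched as a presence count. This is a minimal sketch assuming each cert is available as a plain `dict` of top-level fields; `fields_to_relax` and `default_for` are hypothetical helper names, and the real schema is nested, so an actual pass would need to recurse.

```python
from collections import Counter
from typing import Any


def fields_to_relax(corpus: list[dict[str, Any]]) -> set[str]:
    """Return field names missing from at least one cert in the corpus."""
    # Each cert contributes each of its keys exactly once.
    presence = Counter(name for cert in corpus for name in cert)
    return {name for name, count in presence.items() if count < len(corpus)}


def default_for(sample_value: Any) -> Any:
    """Default for a relaxed field: [] for list fields, None otherwise."""
    return [] if isinstance(sample_value, list) else None
```

With a toy corpus such as `[{"glazed_area": 1, "sap_windows": []}, {"glazed_area": 2}]`, only `sap_windows` is flagged, and its default is `[]`.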
--- a/docs/grill-sessions/2026-06-09-rdsap-20-0-0-remapper.md
+++ b/docs/grill-sessions/2026-06-09-rdsap-20-0-0-remapper.md
@@ -1,8 +1,52 @@
 # Grill session — RdSAP-Schema-20.0.0 → EpcPropertyData remapper

-**Date:** 2026-06-09  ·  **Branch:** `feature/junte+khalim`  ·  **Status:** paused at Q1 (awaiting answer)
+**Date:** 2026-06-09 (resumed 2026-06-10)  ·  **Branch:** `feature/junte+khalim`  ·  **Status:** GRILL COMPLETE — all branches resolved (windows, lighting, glazing, ventilation, hot-water, schema fix, trivia). Ready for ADR + TDD.

-Resume by re-running `/grill-me` and feeding it this file, or just answer **Q1** below and continue.
+Resume by re-running `/grill-me` and feeding it this file.
+
+---
+
+## RESOLVED in 2026-06-10 grill
+
+**Spec sources (authoritative):** RdSAP10 Specification (9 June 2025) + **SAP 10.2** (already in repo) for anything RdSAP10 doesn't cover. The band→m² glazing rule exists in *neither* (both measure all windows); it was a RdSAP-2012-only convention — hence Q2 below resolved by fitting from our own data.
+
+**Q1 — bar for "correct".** RESOLVED → **(a) full SAP cascade parity.** Use case is a *counterfactual spec-change predictor*: take 2021–24 lodged 20.0.0 data, hold the building constant, run our 21.0.1 `sap10_calculator` to estimate "what EPC today, no changes" (replaces a surveyor visit). ⇒ mapping-fidelity matters (empty/zero windows would silently corrupt solar+heat-loss and conflate "spec changed" with "we mangled the windows"). Empty `sap_windows` rules out (b)/(c).
+
+**Deliverable framing.** The job is the **mapper** (`from_rdsap_schema_20_0_0`), not the calculator. Make it produce a *complete* `EpcPropertyData` so the existing `sap10_calculator` runs unchanged. Calculator + its tests = the spec / acceptance criterion. Mapper owns ALL synthesis.
+
+**Windows (gap B) — fully specified:**
+- **Area:** `window_area = 0.148 × total_floor_area × band_multiplier`. The `0.148` = median glazing/floor ratio measured from all 1000 real 21.0.1 certs (mean 0.155; ~constant across dwelling sizes 0.141–0.156, so a flat proportional rule is sound). Band 1 ("Normal") = ×1.0 and covers 945/1000 certs. The 7 rich certs (have `sap_windows`) use their lodged `window_area` directly.
+- **Band multipliers:** RESOLVED → `{1:1.00, 2:1.25, 3:0.81, 4:1.51, 5:0.62, ND:1.00}`, derived from percentiles of the 21.0.1 ratio distribution (Normal=P50, More=P75/P50, Less=P25/P50, MuchMore=P90/P50, MuchLess=P10/P50), the same source as the 0.148. This is an interpretation (human band → population percentile) but defensible; non-Normal bands cover only 55/1000 certs, so impact is low. ADR-worthy.
+- **Orientation:** 20.0.0 records none → **split the synthesized area 4-way across N/E/S/W** (codes 1/3/5/7) so the *unchanged* `solar_gains.py` averages them (avg-orientation treatment; current code skips unknown-orientation → zero solar gain, which we're removing). Spec's literal default for unrecorded orientation is E/W (§7 + §8.2/Table 25 conservatory note); 4-way ≈ that average within ~3%, chosen for the "distribute it" intuition.
+- **Representation:** each of the 4 windows = `SapWindow(window_width = area/4, window_height = 1.0)`. width×height is the ONLY thing the calculator reads (verified: solar_gains/internal_gains/heat_transmission/ML all use the product); `width=area, height=1.0` is exact and matches existing precedent at `mapper.py:4448-4456` (Elmhurst path).
+- **U / g:** use `windows_transmission_details` (u_value, solar_transmittance) where present (687/1000); else default from **RdSAP10 Table 24** keyed by `multiple_glazing_type` + `glazing_gap` + age band (313/1000). Frame factor 0.7 PVC/wood, 0.8 metal.
+- **Glazing type:** RESOLVED → 20.0.0 `glazed_type` codes 1–8+ND are IDENTICAL to 21.0.1's (epc_codes.csv), so route through the existing `_api_cascade_glazing_type` verbatim: `glazing_type = _api_cascade_glazing_type(multiple_glazing_type)` for the 993, `_api_cascade_glazing_type(w.glazing_type)` for the 7. This (a) fixes a real bug — current mapper raw-passes the code, so the calculator mis-reads code 1 "double pre-2002" as single (62 certs); (b) needs no extension despite the cascade only remapping 1→2, because g⊥ comes from the per-window `window_transmission_details.solar_transmittance` (lodged for 687, Table-24-synthesized for 313) — so `glazing_type` only feeds g_L daylight, where the cascade is correct for every code present (code-5 single never appears; secondary g_L=0.80 ✓). `ND` is a string → calculator defaults g⊥ to 0.76 (double) naturally.
+
+**Non-window gaps — spec resolutions found (RdSAP10 tables), most likely already in `sap10_calculator`; verify which the calculator applies vs. the mapper must supply:**
+- Hot-water demand (bath/shower counts): RESOLVED → derive from `instantaneous_wwhrs` room counts. NAME TRAP: domain `InstantaneousWwhrs` = WWHR *device* index numbers (correctly empty for 20.0.0); the 20.0.0 *schema* `instantaneous_wwhrs` carries *room counts* (`rooms_with_bath_and_or_shower`, `rooms_with_mixer_shower_no_bath`, `rooms_with_bath_and_mixer_shower`). Calculator reads `sap_heating.number_baths`/`mixer_shower_count`/`electric_shower_count` (modal-default 1 bath/1 mixer/0 electric when None — `cert_to_inputs.py:4613-4660`). Present in 1000/1000; **496/1000 have ≠1 bath**, so modal default systematically under-counts HW demand for multi-bath homes. Map: `number_baths = rooms_with_bath_and_or_shower + rooms_with_bath_and_mixer_shower`; `mixer_shower_count = rooms_with_mixer_shower_no_bath + rooms_with_bath_and_mixer_shower`; `electric_shower_count=0`. IMPL CAVEAT: confirm `rooms_with_bath_and_or_shower` means "has a bath" (not shower-only) against RdSAP data dictionary before trusting the bath arithmetic — bounded risk.
+- `region_code`: NOT a gap — calculator discards it (`cert_to_inputs.py:1409 _region_index`) and uses **UK-average climate for the SAP rating** per RdSAP §14. No mapping needed for the score; only relevant for postcode-accurate *costs* (future, secondary).
+- `wet_rooms_count`: calculator reads it raw (defaults to 1 when 0; `cert_to_inputs.py:606`), does NOT derive it. Only feeds mechanical-extract fan counts → irrelevant for 995 natural certs, matters for 5 MEV. → mapper derives from habitable rooms (RdSAP Table 5). Currently hardcoded 0.
+- `sap_roof_windows`: calculator None-safe (`... or []`). 20.0.0 has no rooflight signal → set None; all synthesized glazing = wall (vertical) windows. Minor fidelity ceiling. ADR-note.
+- `blocked_chimneys_count`: domain `Optional[int]=None`, calc treats None→0. No 20.0.0 source → leave None.
+- Energy scores/costs (gap E): cert OUTPUTS, Optional on domain, calculator RECOMPUTES (not inputs). → map 1:1 as the **comparison baseline** (lodged old-20.0.0 vs our recomputed 21.0.1-spec score). Zero calc impact.
+- `percent_draughtproofed`: schema has it → populate. EASY WIN.
+- `wet_rooms_count`: RdSAP10 Table 5 → derived from habitable rooms (1–2→K+1, 3–4→K+2, …).
+- `sheltered_sides`: Table 5 → from built form (0 detached, 1 semi/end, 3 enclosed-mid, 2 else).
+- extract fans / draught lobby / infiltration: Table 5 → from age band + rooms + built form.
+- living area, cylinder size/insulation, door area (1.85 m²): Tables 27/28/29, §3.7 defaults.
+- Ventilation: RESOLVED. **Key finding — the CALCULATOR is the RdSAP-expansion engine**: it applies Table-5 defaults internally (extract-fans from age+rooms via `_rdsap_extract_fans_default`, suspended-timber sealing, draught-lobby default), so the mapper supplies raw reduced-data only and must NOT re-derive them. Fixes/construction:
+  - `open_chimneys_count = schema.open_fireplaces_count` (currently hardcoded 0 → drops 80 m³/h/chimney for 53 certs). BUG FIX.
+  - `percent_draughtproofed = schema.percent_draughtproofed` (1000 certs present, currently unset). EASY WIN.
+  - `sap_ventilation = SapVentilation(sheltered_sides=_api_sheltered_sides(schema.built_form), mechanical_ventilation_kind=<decode>)`. All flue/fan/vent counts left None → calculator defaults them. **sheltered_sides matters**: if sap_ventilation=None the calculator defaults it to 2 (mid-terrace) for ALL dwellings (wrong for detached=0, enclosed-mid=3).
+  - `_api_sheltered_sides` reuse VERIFIED safe: 20.0.0 `built_form: int` shares the identical 1–6 code space as 21.0.1 (epc_codes.csv); all corpus values 1–6 mapped; 'NR' (absent) → None → calculator default, no crash.
+  - `mechanical_ventilation` int decode (20.0.0 codes 0=natural, 1=mech supply+extract, 2=mech extract only): 0→NATURAL, 2→mech-extract kind (5 certs), 1→MV no-HR (0 certs; 20.0.0 has no HR flag → conservative). Calculator path: `cert_to_inputs.py:4522+ ventilation_from_cert`.
+- `sap_roof_windows`: 20.0.0 has no roof/wall distinction → treat all as wall windows.
+- Lighting (gap C): RESOLVED. 20.0.0 gives `fixed_lighting_outlets_count` (total) + `low_energy_fixed_lighting_outlets_count` (low-energy); `low_energy_lighting` is just the % (ignore). Current mapper hardcodes led/cfl/incandescent=0 → understates lighting for **439/1000** certs that have incandescent bulbs. Fix: `low_energy_fixed_lighting_bulbs_count = low_energy_outlets` (→ calculator LEL path, 15 W/80 Lm/W, per RdSAP10 §12-1 unknown-split), `incandescent_fixed_lighting_bulbs_count = total − low_energy`, led=cfl=0. Approximation: 1 outlet ≈ 1 bulb (20.0.0 has no bulb count; older-RdSAP behaviour). ADR-note both. Calculator path: `internal_gains.py:565 _lighting_capacity_and_efficacy_from_cert`.
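
The hot-water and lighting mappings resolved in the list above are simple arithmetic and can be sketched as below. The container classes and function names are hypothetical; the field arithmetic follows the notes. Two assumptions are flagged in comments: the bath sum depends on the still-unconfirmed meaning of `rooms_with_bath_and_or_shower`, and the clamp to zero for inconsistent lighting counts is an added guard that the notes do not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotWaterCounts:
    number_baths: int
    mixer_shower_count: int
    electric_shower_count: int


@dataclass(frozen=True)
class LightingCounts:
    low_energy: int
    incandescent: int
    led: int = 0
    cfl: int = 0


def hot_water_counts(
    rooms_with_bath_and_or_shower: int,
    rooms_with_mixer_shower_no_bath: int,
    rooms_with_bath_and_mixer_shower: int,
) -> HotWaterCounts:
    """Derive bath/shower counts from the 20.0.0 room counts."""
    return HotWaterCounts(
        # ASSUMPTION (open caveat in the notes): the first field means
        # "room has a bath", not "shower-only room".
        number_baths=rooms_with_bath_and_or_shower + rooms_with_bath_and_mixer_shower,
        mixer_shower_count=rooms_with_mixer_shower_no_bath + rooms_with_bath_and_mixer_shower,
        electric_shower_count=0,  # 20.0.0 gives no electric-shower signal
    )


def lighting_counts(total_outlets: int, low_energy_outlets: int) -> LightingCounts:
    """Split outlets into low-energy and incandescent, 1 outlet ~ 1 bulb."""
    # ASSUMPTION: clamp at zero if a cert lodges more low-energy outlets
    # than total outlets; this guard is not part of the resolved notes.
    incandescent = max(total_outlets - low_energy_outlets, 0)
    return LightingCounts(low_energy=low_energy_outlets, incandescent=incandescent)
```

For example, `hot_water_counts(1, 1, 1)` yields 2 baths and 2 mixer showers, and `lighting_counts(10, 4)` yields 4 low-energy and 6 incandescent bulbs with LED and CFL left at zero.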
+
+**Verified facts worth keeping:**
+- 20.0.0 is **offline-only** (gov API returns 21.0.1/21.0.0 only); corpus built from bulk dumps (`scripts/harvest_certs.py`). Only the xfail corpus test exercises it.
+- 993/1000 fail at the **schema-parse layer** (placeholder over-constrains required fields); fix = required→optional with defaults (`sap_windows`→`[]`, `lzc_energy_sources`→`[]`, `roof_insulation_thickness`→Optional).
+- Corpus window-field presence: `glazed_area` 1000 (band: 1=945,2=45,4=7,3=3), `multiple_glazing_type` 1000, `multiple_glazed_proportion` 1000, `windows_transmission_details` 687, `sap_windows` 7.

 ---