Record pre-SAP10 RdSAP family coefficient transfer (ADR-0028)

Documents the inherit-and-validate decision for 18.0: reuse 20.0.0's 0.148 +
band multipliers (the corpus can't self-fit — 958/1000 band-1 with no measured
band-1 windows), validated against 18.0's own band-4 rich certs (0.223 obs vs
0.148 x 1.51 pred). References ADR-0027 one-way (keeps the accepted ADR immutable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Jun-te Kim 2026-06-11 12:06:03 +00:00
parent 744aa5482d
commit cc0e875fd8

@ -0,0 +1,106 @@
---
Status: accepted
---
# Pre-SAP10 RdSAP coefficients transfer across the family: inherit-and-validate, starting with 18.0
Decided in a `/grill-me` session (2026-06-11). **Extends** [ADR-0027](0027-rdsap-20-0-0-reduced-field-synthesis.md)
(RdSAP 20.0.0 Reduced-Field Synthesis) from a single spec to the wider **pre-SAP10 RdSAP family**;
sits inside the **old-schema re-map** half of **Rebaselining** ([CONTEXT.md](../../CONTEXT.md):
_Effective EPC_, _Rebaselining_, _Reduced-Field Synthesis_, _Validation Cohort_, _Spec Version_).
Relates to [ADR-0015](0015-mappers-own-cert-normalization.md) (mappers own cert normalization) and
[ADR-0004](0004-baseline-performance-lodged-effective-pair.md) (lodged-vs-effective pair). Grill spec:
[docs/grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md](../grill-sessions/2026-06-10-pre-sap10-mapper-generalization.md).
## Context
ADR-0027 proved Reduced-Field Synthesis end-to-end for `RdSAP-Schema-20.0.0`. The pre-SAP10 RdSAP
family has more orphaned siblings (`19.0`, `18.0`, `17.1`, `17.0`) whose mapper methods exist but are
unreachable (`from_api_response` never dispatches to them) and whose placeholder schemas over-constrain
in the same way (fields declared required that lodged certs often omit). We want each re-mapped to the current `EpcPropertyData` so its historical certs can be
**Rebaselined**. This ADR records the *family-level* coefficient decision; `18.0` is the first instance
and the worked example. (Order set by direction 2026-06-11: **18.0 alone, end-to-end, first**; `17.1`
is a separate later effort.)
ADR-0027 left one question open for the rest of the family: do later pre-SAP10 specs **reuse** 0027's
fitted coefficients (`0.148 × total_floor_area × band_multiplier`, multipliers
`{Normal 1.00, More 1.25, Less 0.81, MuchMore 1.51, MuchLess 0.62}`), or **re-fit** per spec? The
initial direction (2026-06-10) was *re-fit from each new corpus's own data — do not inherit by default*.
Profiling the harvested `18.0` corpus (1000 certs from `certificates-2018.json`, ~82% of that dump)
showed why a literal re-fit is **not achievable**, and — more usefully — that it is **not necessary**:
- **The corpus cannot self-fit the glazing/floor ratio.** A reduced schema records `glazed_area` as a
*band*, not per-window m². `18.0`'s population is **958/1000 band-1 (Normal)**, and only **10/1000**
carry a lodged `sap_windows` array at all. So there is no measured glazing column to regress on for
the band that dominates the stock — the exact constraint ADR-0027 anticipated.
- **The 10 rich certs are systematically the outliers, not a representative sample.** They are
**9× band-4 ("Much More Than Typical") + 1× band-5 ("Much Less Than Typical")**, with **zero band-1**. The
dwellings that bother to lodge full per-window geometry are the unusually-glazed ones. A "fit" off
these would measure band-4 dwellings only; recovering a band-1 base ratio from it means dividing by the
band-4 multiplier (1.51), which merely reconstructs `0.148` (0.223 / 1.51 ≈ 0.148). The fit is circular.
- **Where the corpus *can* be measured, it reproduces ADR-0027's model almost exactly:**

| Band | 18.0 observed glazing/floor (n) | ADR-0027 predicts (`0.148 × mult`) |
|------|---------------------------------|------------------------------------|
| 4 (MuchMore, ×1.51) | **0.223** (n=9) | **0.223** |
| 5 (MuchLess, ×0.62) | **0.086** (n=1) | **0.092** |

So the new corpus's own data **validates** the inherited coefficients rather than contradicting them.
- **Integer code spaces are identical.** `built_form`, `glazed_area`, `glazed_type`, and
`mechanical_ventilation` were diffed against `datatypes/epc/domain/epc_codes.csv` for
`18.0` / `17.1` / `20.0.0` / `21.0.1`: byte-identical for every code the corpus uses (`glazed_type`
1-8 + ND; `built_form` 1-6 + NR; `glazed_area` 1-5 + ND). The cert-side codes never reach 21.0.1's
later extensions. So the verified 21.0.1 glazing/sheltered-sides cascades apply verbatim — no per-spec
override.
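
The band check in the table above is plain arithmetic. A minimal sketch of it follows, using only figures quoted in this ADR. The constant and function names are illustrative assumptions, not the names used in the codebase.

```python
# Illustrative names (assumptions); only the numeric values come from this ADR / ADR-0027.
BASE_GLAZING_RATIO = 0.148  # glazing m² per m² of total floor area for band 1 (Normal)
BAND_MULTIPLIERS = {
    "Normal": 1.00,
    "More": 1.25,
    "Less": 0.81,
    "MuchMore": 1.51,
    "MuchLess": 0.62,
}

# Observed glazing/floor ratios from the 18.0 rich certs, as (ratio, n).
OBSERVED_18_0 = {"MuchMore": (0.223, 9), "MuchLess": (0.086, 1)}


def predicted_ratio(band: str) -> float:
    """Inherited model: base ratio scaled by the band multiplier."""
    return BASE_GLAZING_RATIO * BAND_MULTIPLIERS[band]


def compare() -> dict[str, tuple[float, float, int]]:
    """Return {band: (predicted, observed, n)}, predictions rounded to 3 d.p."""
    return {
        band: (round(predicted_ratio(band), 3), observed, n)
        for band, (observed, n) in OBSERVED_18_0.items()
    }


if __name__ == "__main__":
    for band, (pred, obs, n) in compare().items():
        print(f"{band}: predicted {pred:.3f}, observed {obs:.3f} (n={n})")
```

With n=9 and n=1 this is a consistency check on the inherited model, not a fit.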
## Decision
For the pre-SAP10 RdSAP family, **inherit ADR-0027's coefficients and validate the transfer per spec —
do not re-fit by default.** Concretely, for `18.0` (and as the rule for `17.x`/`19.0`):
- **Reuse `0.148` and the band multipliers unchanged.** The corpus structurally cannot self-fit them
(958/1000 band-1, zero measured band-1 windows), and where it can be measured it reproduces the
inherited model: to three decimal places for band 4 (n=9) and within 0.006 for the single band-5 cert.
Re-fit a spec **only if** its own rich certs contradict the inherited model; `18.0` does not.
- **The rich certs are a per-spec Validation Cohort, not a fit set.** Their lodged `window_area` is used
**directly** as geometry (the accuracy-where-we-have-it rule from ADR-0027 — synthesise only over the
windowless majority, never over real measured data). For `18.0` that is 10 certs direct, 990
synthesised.
- **Route through the existing verified cascades verbatim** (glazing-type, sheltered-sides), per the
code-space diff above.
- **Schema parse fix = ADR-0027's mechanism plus one additive change.** (a) `@dataclass(kw_only=True)` +
data-driven required→optional: any field present in <100% of the corpus gets a default (`[]` for
lists, `None` otherwise) — for `18.0` that is `lzc_energy_sources`, `glazing_gap`
(`Optional[Union[int, str]]` — the corpus lodges str, int, **and** absent), `pvc_window_frames`, and
scattered `SapBuildingPart` / `AlternativeImprovement` / `PhotovoltaicSupply` fields; this takes the
parse rate from 14/1000 to 1000/1000. (b) **Add a `sap_windows` field** — the placeholder `18.0`
schema omits it entirely, so without this the 10 rich certs' lodged geometry is silently dropped at
parse time, defeating the direct-use rule.
Because there is still no same-spec ground truth (**Validation Cohort** rule), every synthesis
assumption is recorded in code comments + test names, exactly as ADR-0027 requires.
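
The direct-use rule and the inherited formula can be sketched together as follows. This is a simplified illustration under stated assumptions: the function name and return shape are hypothetical, and it leaves out the 4-way orientation split and the glazing-type cascade.

```python
from typing import Sequence

# Values inherited from ADR-0027; names are illustrative assumptions.
BASE_GLAZING_RATIO = 0.148
BAND_MULTIPLIERS = {
    "Normal": 1.00,
    "More": 1.25,
    "Less": 0.81,
    "MuchMore": 1.51,
    "MuchLess": 0.62,
}


def total_glazed_area(
    total_floor_area: float,
    band: str,
    lodged_window_areas: Sequence[float] = (),
) -> tuple[float, str]:
    """Return (glazed area in m², provenance).

    Measured geometry always wins; synthesis only fills the windowless majority.
    """
    if lodged_window_areas:
        # Rich cert: use lodged per-window areas directly, never synthesise over them.
        return sum(lodged_window_areas), "lodged"
    # SYNTHESIS ASSUMPTION (no same-spec ground truth): inherited base ratio x band.
    area = BASE_GLAZING_RATIO * total_floor_area * BAND_MULTIPLIERS[band]
    return area, "synthesised"
```

For example, `total_glazed_area(100.0, "Normal")` yields roughly 14.8 m² tagged `"synthesised"`, while passing `lodged_window_areas=[3.0, 4.5]` returns the lodged sum tagged `"lodged"` regardless of band.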
## Consequences
- **The coefficients are now shared across specs.** Changing `0.148`, a band multiplier, or the 4-way
orientation split moves **every** rebaselined 20.0.0 **and** 18.0 score (and any 17.x/19.0 that later
joins). The blast radius of ADR-0027's named-constant block grew; that is the cost of transfer and the
reason the constants stay in one place with their derivation recorded.
- **The transfer is validated, not the absolute fit.** The band-4 match (0.223 observed vs 0.223
predicted) confirms the *model shape* carries from 21.0.1-era stock to 2018-era stock; it does not
independently establish the base ratio for band-1, which remains inherited. Revisit if (a) the retired
**RdSAP 2012** band→m² formula is sourced, or (b) a same-spec Validation Cohort becomes available.
- **No cross-spec anchor exists in the current corpora.** A dual-lodged UPRN (same dwelling certified
under two specs) would let two re-scores cross-check, but the year-capped corpora have **zero** UPRN
overlap (18.0∩20.0.0 = 0). A true anchor would have to be *manufactured* via a targeted dual-lodged
harvest (scan the 2018 and 2022 dumps for shared UPRNs) — deferred, not part of landing 18.0.
- **Acceptance bar matches 20.0.0 (ADR-0027):** the corpus test promotes `RdSAP-Schema-18.0` into the
strict **parse + map** guard (1000/1000 return `EpcPropertyData`); it does **not** assert calculator
scores. Scoring is spot-checked manually via `scripts/eon/find_epc_data.py`; the formal score-value
test stays deferred. Expect wider lodged-vs-recalc deltas than 20.0.0 — the lodged 18.0 figure is on
an older SAP version, so it is Lodged Performance, not a target.
- **Synthesis stays copied for the first instance; the shared helper is deferred.** `18.0` adapts
ADR-0027's synthesis inline (one new instance). The shared, spec-parameterised
`_synthesise_reduced_field_windows` is extracted when `17.1` lands (the second instance), pulling
20.0.0 + 18.0 + 17.1 through one coefficient block — avoiding abstraction from a single example while
preventing three divergent copies.
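
The deferred dual-lodged harvest mentioned above amounts to a set intersection over UPRNs. A minimal sketch, assuming each cert is a dict with a `uprn` key (the key name and record shape are assumptions about the dump format, and the function names are hypothetical):

```python
from typing import Any, Iterable


def uprns(certs: Iterable[dict[str, Any]]) -> set[str]:
    """Collect non-empty UPRNs, normalised to strings so 123 and "123" match."""
    return {
        str(cert["uprn"])
        for cert in certs
        if cert.get("uprn") not in (None, "")
    }


def dual_lodged(
    certs_a: Iterable[dict[str, Any]],
    certs_b: Iterable[dict[str, Any]],
) -> set[str]:
    """UPRNs that appear in both collections, i.e. candidate cross-spec anchors."""
    return uprns(certs_a) & uprns(certs_b)
```

An empty result reproduces the "zero overlap" finding for a given pair of corpora; any non-empty result would still need a check that the two certs really were lodged under different specs.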