Model/CONTEXT.md
Khalim Conn-Kowlessar da69dc27fd docs(modelling): wall-insulation eligibility + conservation-status ADRs
Captures the grill-with-docs session for solid-wall insulation: CONTEXT.md
gains EWI/IWI Measure Types + the Wall Insulation Eligibility rule (+ a
flagged-ambiguity that the three planning flags stay distinct, never recollapsed
to legacy restricted_measures). ADR-0019 records the eligibility policy (cavity
-> cavity only; brick/system -> IWI+EWI; timber -> IWI only; cob/stone -> none;
conservation/flat block EWI, listed/heritage block both). ADR-0020 records
conservation/listed/heritage as three distinct Property attributes sourced by
extending the geospatial S3 repo (flags co-located with lat/long).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 15:11:40 +00:00

40 KiB
Raw Blame History

Ara

The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio.

Language

Product

Ara: The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it. Avoid: ARA (acronym style), v2 backend, the new backend

Domna: The company. Roman name; sibling to Ara in the same naming convention.

Energy Performance Certificates

EPC: An Energy Performance Certificate — a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst). Avoid: energy certificate, energy report

Certificate Number: The unique identifier assigned to an EPC by the government registry. Avoid: cert number, EPC ID

Registration Date: The date an EPC was lodged with the government register; used to identify the most recent certificate for a property. Avoid: assessment date, submission date

EPC Band: A single letter AG representing a property's current or potential energy efficiency rating. Avoid: energy rating, EPC grade, EPC score

Schema Type: The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. RdSAP-Schema-21.0.1). Avoid: schema version, EPC format

Domestic Certificate: An EPC issued for a residential dwelling, as opposed to a commercial one. Avoid: residential EPC, home EPC

Properties and addresses

Property: The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs. Avoid: dwelling, unit, home, asset

Properties: A first-class collection of Property objects; the unit of bulk operation in services. Avoid: property list, batch (used for SQS chunks)

UPRN: Unique Property Reference Number — the government-issued permanent identifier for a physical address in the UK. Avoid: property ID, address ID, code

Postcode: A UK postal code used to group nearby addresses; the primary search key for finding EPC records. Avoid: zip code, postal code

User Address: A structured dataclass (domain.addresses.user_address.UserAddress) capturing a customer-supplied address: a free-text user_address line, a canonical postcode (sanitised on construction), and an optional internal_reference. The bare string sense — the raw free-text address line as it arrives from upstream ingestion, before being wrapped — remains valid when discussing CSV columns, API payloads, or other upstream contexts; in domain code, prefer the dataclass. Avoid: user input, raw address, user_inputed_address

Comparable Properties: The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection. Avoid: neighbours, similar properties, peer set

Source data

Site Notes: The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property — except when the public EPC is newer. Avoid: energy assessment, site survey, field survey, Domna survey, Hestia survey

Landlord Overrides: Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present. Avoid: patches (deprecated), corrections, manual EPC, edits

Modelling

Effective EPC: The assembled EpcPropertyData picture the modelling pipeline scores for a single Property. Assembled from whichever source applies: Site Notes alone; or the public EPC with Landlord Overrides applied; or — when the EPC is old — its schema re-mapped to current and gaps filled from neighbour predictions; or — when there is no EPC — components estimated from surrounding properties. Carries source-derived physical fields and originally recorded performance values; the performance scored from this picture is held separately in Baseline Performance. Avoid: modelling EPC, working EPC, resolved EPC, derived EPC

Rebaselining: Establishing a Property's Effective Performance (SAP score, EPC Band, CO2, Primary Energy Intensity, space-heating & hot-water kWh) by assembling the Effective EPC picture and scoring it through SAP10 Calculation (the deterministic Sap10Calculator, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013). The assembly is the substance: apply Landlord Overrides (e.g. boiler → ASHP, wall insulated) as a simulation on the EpcPropertyData; estimate components from surrounding properties when there is no EPC; re-map an old-schema EPC to current and gap-fill from neighbour predictions. The calculator is the scoring engine at the tail, not the whole of Rebaselining — so its call lives inside the Rebaseliner, after assembly. Triggered whenever the assembled picture differs from the lodged record: (a) the EPC was lodged under a methodology the calculator supersedes (sap_version < 10.2), (b) Overrides / Site Notes changed the physical state (walls / heating / windows / etc.), or (c) the picture is estimated or remapped rather than a real current EPC. Produces Effective Performance; Lodged Performance is preserved unchanged. The same single scoring also yields the per-end-use kWh that Bill Derivation prices — one scoring, two products. kWh is an ML target per ADR-0007 — see epc-ml-transform. Avoid: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)

Baseline Performance: A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh per end use (heating, hot water, lighting, appliances, cooking, pumps/fans, cooling) and the annual bill composed into per-section costs plus a total, produced by Bill Derivation from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI. Avoid: baseline predictions, predicted baseline, rebaselined values

Lodged Performance: The SAP / EPC Band / carbon emissions / Primary Energy Intensity recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property". Avoid: original performance, raw EPC values, recorded baseline

Effective Performance: The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by SAP10 Calculation output (the deterministic Sap10Calculator, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) when triggered. The half of Baseline Performance that says "what we modelled". Avoid: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values

Calculated SAP10 Performance: The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by SAP10 Calculation from a Property's EpcPropertyData. It is not a separately-persisted third value-set beside Lodged and Effective: in every baselining scenario the calculator's output is the Effective Performance (real lodged SAP10 EPC with no overrides ⇒ Calculated = Lodged = Effective; overrides or an estimated / pre-SAP10 EPC ⇒ Calculated = Effective, there being no lodged SAP10 figure to compare against). The calculator is therefore the mechanism that produces Effective Performance, having superseded the old ML-API rebaseliner. The calculator is load-bearing: for sap_version < 10.2 (lodged under a superseded methodology) its output is the Effective Performance; for ≥ 10.2 the API's lodged figures are kept and the calculator runs alongside, logging any divergence (SAP > 0.5, PEUI/CO2 beyond tolerance) as a validation signal (see sap-spec-version). It is load-bearing for Bill Derivation regardless of version (the EPC lodges no per-end-use kWh), so a calculator strict-raise aborts the batch and the un-mapped cert is fixed immediately. ADR-0009 introduced the term, amended by ADR-0010, realized by ADR-0013 (whose shadow stepping-stone is superseded) and ADR-0014. Avoid: calculator output, computed performance, worksheet performance, SAP10 output, calculated value-set (it is not a stored third set)

SAP10 Calculation: The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits Calculated SAP10 Performance. Implemented by the Sap10Calculator service class in domain/sap10_calculator/ (calculator.py). Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the PCDB when the cert lodges a product index, and returns a SapResult carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from Rebaselining, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates. Avoid: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see sap-spec-version)

SAP Spec Version: The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different sap_score under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. Lodged Performance carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). Calculated SAP10 Performance is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a Validation Cohort of certs lodged on the same spec version. Avoid: SAP version (ambiguous with the sap_version field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision

Validation Cohort: The subset of corpus certs used to validate SAP10 Calculation against Lodged Performance, filtered to certs lodged after the calculator's target SAP Spec Version rolled out in commercial assessor software — currently inspection_date ≥ 2025-07-01 (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010. Avoid: parity cohort, validation set, corpus sample

Measure Application: The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that the Plan persists. Implemented by the MeasureApplicator service class in domain/sap/ (or a sibling package). Each Measure Type's translation rules (e.g. loft_insulationroof_insulation_thickness_mm = 270mm, ashpmain_heating_details[0] replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc). ADR-0009. Avoid: measure overrides (rejected during ADR-0009 grill — phantom mid-layer), package applier, retrofit simulator

Bill Derivation: The deterministic process that derives a Property's annual energy bill, composed into per-end-use sections (heating, hot water, lighting, appliances, cooking, pumps/fans, …) plus a total, by pricing SAP10 Calculation's delivered kWh per end use at current Fuel Rates — each end use billed at its fuel's rate, rolled up per fuel for standing charges (metered fuels only — gas/electricity; oil/LPG/solid have none) minus SEG export credit on PV. Implemented by BillDerivation in domain/billing/ (a cross-stage concern — the Baseline stage derives the current bill, the Modelling stage re-runs it on the post-package end-state for post-retrofit bills; deterministic, ADR-0006). Reads Fuel Rates from a committed static snapshot via FuelRatesRepository (no live ETL yet). Distinct from the calculator's total_fuel_cost_gbp, which is the SAP-rating notional cost at RdSAP Table 32 standardised prices (~half the real electricity price) — not what the household pays. Raises on a fuel it has no rate for (e.g. house coal, heat network). ADR-0014. Avoid: EPC Energy Derivation (renamed), EpcEnergyDerivationService (no "service" suffix), kWh prediction, baseline kWh, energy estimation

UCL Correction: The per-band linear correction (Few et al. 2023, Energy & Buildings 288 113024) that aligns EPC-modelled Primary Energy Intensity with metered consumption. Folded into ML training labels at fit time (per ADR-0007) rather than applied at runtime — the trained model emits metered-equivalent PEUI directly, avoiding the discontinuities at EPC band boundaries that arose when the per-band linear correction was applied post-prediction. Calibrated against gas-heated, non-PV homes in England and Wales rated under SAP 2012; the current implementation extrapolates it to all properties (open question §15.14). Avoid: UCL adjustment, energy correction, metered correction

EPC Anomaly Flag: A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling. Avoid: outlier, mismatch, divergence flag

Pipeline composition

The modelling backend is composed from three independently-invocable stage orchestrators, chained differently per use case. This composability — not a single end-to-end function — is the point: it is what lets the interactive single-property flow pause between stages where the batch flows do not. (Supersedes the monolithic model_engine.)

Ingestion: The first stage. Acquires a Property's external source data — the EPC certificate (New EPC API) and Google Solar insights — and resolves its coordinates, then writes everything to repos. Writes only; runs no modelling business logic. Per ADR-0003 nothing downstream reads across this seam by calling back to a source — downstream stages read the persisted data from repos. Avoid: fetching (a fetch is one source call; Ingestion is the whole write stage), data load

Baseline (stage): The second stage. Reads the persisted source data from repos, hydrates the Property aggregate, resolves its Effective EPC, and establishes its Baseline Performance. Re-scoring after a user override lives here. Distinct from Baseline Performance (the aggregate it produces). Avoid: rebaseline (that is a specific ML trigger — see Rebaselining), enrichment

Modelling (stage): The third stage. Takes the baselined Property plus a set of Scenarios and produces Recommendations → an Optimised PackagePlans, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play". Avoid: scoring (overloaded), recommendation engine

First Run: The use case where a Property has only a row in the property table (post address→UPRN matching) and no existing Plan: the pipeline runs Ingestion → Baseline → Modelling end-to-end over a batch. The first sibling lambda being built (ara_first_run). Avoid: initial run, cold run

ML training

EPC ML Transform: The versioned class at domain/sap10_ml/transform.py that maps an EpcPropertyData to a fixed-width row of features + targets. The single ML-data contract between this repo and the AutoGluon training repo. Owns the windows compression, building-parts compression, Top-N Code Taxonomy, and UCL folding decisions. Each version is tagged on the deployed scoring lambda; a mismatch is a deploy-time fail. Avoid: feature builder, ML mapper, EPC vectoriser

Feature Schema Version: The semver version of the EPC ML Transform (e.g. 0.1.0), included in the parquet output path and the deployed scoring lambda's tag. MAJOR bump when columns are removed or renamed; MINOR when optional columns are added; PATCH for non-behavioural fixes. Avoid: transform version, schema version (overloaded with the SAP RdSAP schema version on EPCs), model version

Primary Energy Intensity (PEUI): A Property's total annual primary energy use per square metre of floor area (kWh/m²/yr), the SAP10 quantity recorded as energy_consumption_current on the EPC. Covers all end uses (heating, hot water, lighting, appliances, cooking) weighted by SAP primary energy factors per fuel. The quantity the UCL Correction aligns to metered consumption. Avoid: heat demand (which colloquially means the building's space heating thermal requirement — a distinct concept), energy demand, total energy use, kWh per square metre

PV Capacity Source: A flag on the EPC ML Transform feature set indicating whether a Property's PV capacity is measured (from sap_energy_source.photovoltaic_supply[].peak_power), estimated_from_roof_area (the percent_roof_area fallback used when the surveyor could not confirm array configuration), or none (no PV present). Lets the model weight the correct capacity signal per property. Avoid: PV source, PV configuration type, solar source

Top-N Code Taxonomy: The empirical top-N SAP code list (covering ~95% of mass on the training sample) committed by the EPC ML Transform for each list-aggregated categorical field (wall_construction, glazing_type, frame_material, etc.). Rare codes go into a per-field _other bucket. The taxonomy is locked at each Feature Schema Version; changes warrant a MINOR bump (adding) or MAJOR bump (removing codes). Avoid: code list, code dictionary, vocab

Reference data

Fuel Rates: The current per-fuel rate (pence/kWh) and standing charge used to compute a Property's bills; time-versioned and regional. Sourced for now from a committed static snapshot (national, Ofgem-cap period for gas/electricity + DESNZ/NEP for off-gas fuels), read via FuelRatesRepository; an Ofgem-cap ETL automating the refresh is future, not a prerequisite. The Smart Export Guarantee rate sits in the same set as electricity_export. Consumed by Bill Derivation. Avoid: fuel prices (commodity prices, different concept), tariff, energy cost

Carbon Factors: The per-fuel CO2 emission factor (kgCO2e/kWh) used to compute a Property's carbon emissions; time-versioned, refreshed from Defra's annual publication. Consumed by Bill Derivation. Avoid: emission factors (ambiguous), CO2 rates

Outputs

Scenario: A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and the set of measure types it permits. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property. Avoid: project, batch, run-set

Scenario Snapshot: A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes. Avoid: scenario version, frozen scenario, pinned scenario

Plan: The per-Property output of one Scenario's modelling run; carries the Optimised Package selected for the Property (its Plan Measures) and the Property's post-retrofit figures (SAP / kWh / CO₂ / bills). A Property modelled against N Scenarios in one trigger ends up with N Plans. Avoid: recommendation set, output, result

Plan Measure: One selected Measure Option as persisted inside a Plan — the single Option the Optimiser kept for a given Recommendation, recorded with its installed Cost and its final-package (role-3) attributed impact (the SAP points and CO₂ / energy savings that telescope exactly to the Plan's package total, per ADR-0016). It is the output counterpart to a Recommendation's candidate Option: a Recommendation proposes mutually-exclusive Options carrying no stored impact, whereas a Plan Measure is the one that was chosen with its truthful attributed impact frozen in. The persisted set of a Plan's Plan Measures is its Optimised Package. Avoid: recommendation (that is the candidate — never persist an output as a Recommendation), installed measure, selected measure (that names the package, not the line), plan item, plan recommendation

Recommendation: The finding that a Property needs work on a given target surface — a building part (the MAIN wall, an extension roof…) or a system (heating + hot water + controls, treated as one). Carries one or more mutually-exclusive Measure Options; the Optimiser selects at most one. The target itself is encoded in each Option's Simulation Overlay (which addresses a building part, a specific window, or a system) — never as a typed key on the Recommendation, so the type stays stable as new surfaces land. Recommendations partition the modifiable surface of EpcPropertyData: no two Recommendations write the same field of the same target, so selected Options never collide. Exclusivity between competing treatments (cavity-fill vs EWI; a boiler bundle vs an ASHP) is captured within one Recommendation, never across them. Avoid: suggestion, recommendation engine, keying by measure type (a Recommendation can span measure types — e.g. a heating + hot-water bundle), the persisted selected-measure output line (that is a Plan Measure, which carries impact; a Recommendation never does)

Measure Option: One mutually-exclusive way to satisfy a Recommendation — possibly a bundle of sub-measures (e.g. "new condensing boiler + cylinder insulation"), possibly a single intervention at a chosen size/product (a 4 kWp PV array of product X). Carries its total cost and a Simulation Overlay for its combined effect on the target surface. Cost is intrinsic to the Option; SAP / kWh / carbon impact is not — impact is cascade-conditional (depends on what is already installed) and is produced by scoring, never stored on the Option. Two Options under one Recommendation may share an identical Simulation Overlay (differing only on cost/product) or differ (e.g. PV kWp), so scoring runs per distinct Overlay. Avoid: option (too generic), variant, SKU

Simulation Overlay (type EpcSimulation): The change a single Measure Option makes to a Property's EpcPropertyData, expressed as an all-optional partial mirror of EpcPropertyData and its nested types — covering only the retrofit-relevant surface (walls/roofs/floors, windows, heating + controls, hot water, ventilation, lighting, PV, draughtproofing), never identity/location fields. Targets a specific building part by BuildingPartIdentifier (MAIN, EXTENSION_1..4) so "insulate the cavity wall" addresses the exact SapBuildingPart; targets a specific window by its index in sap_windows (the PDF's W1/W2/W3) — glazing measures address windows directly by number, regardless of which wall they sit on; the window's building-part association is carried separately via window_location (resolved by _window_bp_index), not used for targeting; and targets whole-dwelling systems (e.g. sap_heating) directly. Carries no scores. It is not an EpcPropertyData (composition, not inheritance — an all-None overlay is not a valid EPC). A domain operation folds a baseline EpcPropertyData + an ordered set of Overlays into a throwaway EpcPropertyData handed to the calculator; only the score is kept, the EPD is discarded. Avoid: simulation config (the legacy EPC-API flag object), patch, delta, diff

Product: A catalogue entry a Measure Option installs — insulation, glazing units, heat pumps, boilers, cylinders, PV panels, inverters, batteries — carrying the data to price an Option and shape its Simulation Overlay. Named Product, not material: the catalogue is dominated by equipment and appliances, and a heat pump is not a building material. Read via ProductRepository, which for now combines two inputs — the Products in the database plus a committed costs file holding what the ETL does not yet supply. Single-source unification (ETL-supplied costs) is separate, queued work; legacy Costs.py is retained but queued for deletion. Avoid: material, building material (inaccurate for appliances), part (the per-Option installed line item), SKU

Cost (of a Measure Option): A single fully-loaded total — products + labour + preliminaries + VAT + margin rolled into one figure — plus a separately-carried Contingency. Only contingency is broken out; the rest is not decomposed, as that breakdown proved unhelpful.

Contingency: A per-Measure-Type percentage uplift on an Option's cost covering job-specific risk (e.g. cavity-wall 10%, internal/external wall 26%, ASHP 25% — cf. legacy Costs.CONTINGENCIES). The one cost component carried separately from the fully-loaded total, because the rate is measure-type-specific and meaningful to surface. Avoid: preliminaries (a different, rolled-in 10%), margin

Measure Dependency: A "selecting A requires B" edge between Recommendations, for couplings that are real but that the Optimiser would not choose on its own — e.g. wall (and possibly roof) insulation requires adequate ventilation. The required Option is excluded from the optimiser's candidate pool (it is mandatory-when-triggered, not a free choice) but is injected into the Optimised Package before the package re-score, so its real SAP contribution — which for ventilation is negative — is captured in the true package score and in the undershoot/repair loop. Trigger set is held as data (cf. legacy assumptions.measures_needing_ventilation), not control flow, so extending the triggers (e.g. to roof insulation) is a data edit. Distinct from the legacy post-optimisation best-practice add, which tacked cost on after scoring and so undershot. Avoid: best-practice measure (legacy term), forced measure

Optimised Package: The subset of a Property's Recommendations selected by the Optimiser Service for installation. For an Increasing EPC goal the objective is least-cost-to-target: the cheapest package that reaches the goal band — so it stops at the target and does not overshoot into a higher band, leaving surplus budget unspent. When the target is unreachable within budget, it falls back to the maximum improvement the budget buys (best effort, below target). With no budget it is simply the cheapest package that reaches the target. Reaching the target is judged on the true whole-package re-score (ADR-0016), not on summed per-measure scores. (Other goals — Energy Savings, Reducing CO₂ — don't yet set a target and currently maximise improvement within budget; future work.) Avoid: selected measures, default measures, optimal solution, recommended bundle

Measure Type: The catalogue classification of a retrofit measure (e.g. solar_pv, loft_insulation, ashp); one or more Recommendations reference the same Measure Type with property-specific cost and impact. Avoid: measure (ambiguous), category

External Wall Insulation (EWI) / Internal Wall Insulation (IWI): The two competing Measure Options for insulating a solid (non-cavity) main wall — insulation fixed to the outside face (wall_insulation_type = 1) or the room side (wall_insulation_type = 3), both 100 mm at λ = 0.04 W/m·K; the calculator derives the post-insulation U-value from the type + thickness (the lodged cert carries no U-value). IWI additionally lowers the wall's thermal-mass parameter (changing heating demand); EWI does not. Avoid: solid wall insulation (that names the pair, not one Option), cladding, drylining

Wall Insulation Eligibility: The rule fixing which wall Option(s) the main-wall Recommendation offers, by wall construction then planning status. By construction: cavity → cavity fill only (never solid insulation); solid brick / system-built → IWI + EWI; timber-frame → IWI only (EWI is not constructable); cob / granite or whinstone / sandstone or limestone → none (breathable fabric — standard insulation risks trapping moisture). Then, as planning gates: a conservation area or flat removes EWI (external-appearance / whole-block constraints), and a listed or heritage building removes both EWI and IWI (protected fabric). Avoid: restricted measures (legacy collapsed conservation/listed/heritage into one boolean — they now gate different Options, so keep them distinct)

Valuation

Property Valuation: The current open-market value of a Property — an externally-sourced Baseline attribute (customer upload or, later, an estimate), absent for most Properties and never derived from the EPC. Avoid: valuation (ambiguous with Valuation Uplift), market price, current value, house price

Valuation Uplift: The estimated increase in a Property's market value produced by a Plan's retrofit — plan-conditional (it depends on the Plan's target EPC Band) and percentage-primary: always expressible as a % from the Band jump (current → target), and as an absolute £ amount only when a Property Valuation is known. Capped so the £ uplift never exceeds twice the Plan's cost (the cap can only bite once a Property Valuation supplies the £ form — see ADR-0018). Avoid: valuation increase, value gain, financial uplift, property_valuation_increase (pick one — Valuation Uplift is canonical)

Address matching

Lexiscore: A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity. Avoid: score, match score, similarity

Lexirank: Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match. Avoid: rank, position

UPRN Candidate: An EPC Search Result that is a plausible match for a given User Address, before scoring decides the winner. Avoid: match candidate, result

Score Threshold: The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists. Avoid: minimum score, cutoff

Ambiguous Match: A matching outcome where two or more candidates share Lexirank 1, making it impossible to select a unique winner. Avoid: tie, draw, duplicate

Best Match: The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold. Avoid: winner, top result

API and integration

EPC Search Result: A lightweight record returned by the government domestic search endpoint — address lines, postcode, UPRN, band, and certificate number, but not full certificate data. Avoid: search row, EPC row, result

EPC Property Data: The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against. Avoid: EPC data, certificate data, parsed EPC

Old EPC API: The retired government API (epc.opendatacommunities.org) using HTTP Basic auth; decommissioned 30 May 2026. Avoid: legacy API

New EPC API: The replacement government API (api.get-energy-performance-data.communities.gov.uk) using Bearer Token auth. Avoid: new API, current API

Bearer Token: The auth credential required by the New EPC API; stored in the EPC_AUTH_TOKEN environment variable. Avoid: API key, auth token, secret

Relationships

  • A Property represents a single physical dwelling for modelling; identified by (portfolio_id, UPRN) or (portfolio_id, landlord_property_id).
  • A Property has zero or more EPCs across time, exactly one Effective EPC, zero or one set of Site Notes, and zero or one set of Landlord Overrides.
  • An EPC belongs to exactly one Property and has one Certificate Number.
  • An EPC carries an EPC Band and is identifiable by its Registration Date; the most recent one is the current.
  • A UPRN identifies a physical dwelling permanently; it does not change when the property changes owner — but each portfolio gets its own Property keyed against it.
  • When a Property has both Site Notes and a public EPC, the newer of the two derives the Effective EPC. Landlord Overrides apply only when the EPC is the source — never when Site Notes are.
  • A Property's Baseline Performance holds two halves: Lodged Performance (the gov register's SAP / band / carbon / heat) and Effective Performance (what the modelling pipeline scored against). The two are equal unless Rebaselining fires.
  • Rebaselining produces Effective Performance by ML re-prediction across SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh, when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. Lodged Performance is never overwritten.
  • Bill Derivation derives fuel split and bills from kWh values (sourced from the EPC's renewable_heat_incentive fields for baseline SAP10 properties, or from ML when Rebaselining fires), reading current Fuel Rates and Carbon Factors from their respective repos.
  • The EPC Prediction Service uses Comparable Properties for both gap-filling and producing EPC Anomaly Flags.
  • Triggering the model against N Scenarios produces N Plans per Property. Each Plan holds one Optimised Package — its selected Plan Measures — plus the Property's post-retrofit figures.
  • A Scenario Snapshot is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
  • A Recommendation references one Measure Type and carries property-specific cost and impact.
  • A Property Valuation (current market value) is a Baseline attribute and is mostly absent; a Valuation Uplift is a Plan output, always a percentage from the EPC Band jump and an absolute £ only when a Property Valuation exists.
  • Address Matching uses a User Address and Postcode to find a UPRN by scoring UPRN Candidates from an EPC search. A Lexirank of 1 with no Ambiguous Match and a Lexiscore ≥ the Score Threshold produces a Best Match.

Example dialogue

Dev: "A landlord uploads a corrected boiler for one of their properties. What happens?"

Domain expert: "That's a Landlord Override on the heating fields. Save it against the Property. The Effective EPC has changed, so Rebaselining runs to re-predict SAP / carbon / PEUI / space heating kWh / hot water kWh, and Bill Derivation re-runs to update the fuel split and bills based on the new kWh values and fuel deduction. With fresh Baseline Performance we regenerate Recommendations."

Dev: "What if the same Property also has Site Notes?"

Domain expert: "Site Notes supersede the public EPC, so Landlord Overrides don't apply. We model from the Site Notes version of the Effective EPC. If the public EPC is newer than the Site Notes, that's the one exception — we use the newer one."

Dev: "After modelling we end up with a list of measures. Which ones get installed?"

Domain expert: "The Optimiser Service picks the Optimised Package — a subset of Recommendations that hits the Scenario goal within budget. The rest stay in the Plan as alternatives the user can swap in."

Dev: "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?"

Domain expert: "That's an EPC Anomaly Flag. We compute it against the Comparable Properties for that postcode. It's advisory — the UI surfaces it and the landlord can apply a Landlord Override if it's wrong."

Dev: "The property card shows two SAP scores side by side. Why?"

Domain expert: "Those are Lodged Performance and Effective Performance. Lodged is what the gov register says — the EPC was rated under SAP 2012. Effective is what we scored against — we ran Rebaselining to predict the SAP10-equivalent rating because the methodology changed. Both stay on the Baseline Performance so users can see what's on record and what we're modelling against."

Flagged ambiguities

  • "property" was historically warned against in favour of "dwelling"; that has been inverted. Property is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias.
  • "energy assessment" in the existing codebase (energy_assessment_functions, energy_assessments_by_uprn) refers to what is now canonically called Site Notes. New code uses Site Notes.
  • "patch" / patch_epc in the existing codebase has been merged into Landlord Overrides; the original concept is deprecated.
  • "already_installed measures" in the existing codebase is likely subsumed by Landlord Overrides ("we have a heat pump now" → override the heating fields). Final call deferred to implementation.
  • "address" appears as both the raw User Address (free-text from customer data, or the structured UserAddress dataclass that wraps it) and a structured field on an EPC Search Result (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1". Within domain/, User Address specifically means the UserAddress dataclass; in upstream ingestion contexts (CSV columns, SQS payloads) it can still mean the raw string sense.
  • "score" is used for AddressMatch.score() output, the lexiscore column, and informally. Prefer Lexiscore in domain discussions; reserve "score" for method-level code comments.
  • "user_inputed_address" in backend/address2UPRN/main.py is a misspelling and a synonym for User Address — the canonical term. New code should use user_address.
  • "EPC" is overloaded as both the document and the rating band letter. Use EPC for the document, EPC Band for the letter.
  • "re-scoring" has two meanings in the codebase — Rebaselining (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer Rebaselining for the former; for the latter, the Optimiser Service step does its own scoring without a special name.
  • "phase" (sequencing measures into ordered steps within a Scenario/Plan) was a speculative, prospective-client feature and is deferred — out of scope (see ADR-0005). It is not a current domain term: a Scenario carries one set of measures, a Plan one Optimised Package. The only live use of "phase" is cut-over timeline language in the PRD ("Phase 0 — Status quo"), which is project-management vocabulary and does not enter code.
  • "valuation" was used for both a Property's current market value and the increase a retrofit produces — resolved into two distinct terms: Property Valuation (current value, a Baseline attribute) and Valuation Uplift (the plan-conditional, percentage-primary increase). The bare word "valuation" should be qualified to one of these.
  • "stale" appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch") — a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale — they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.
  • "restricted_measures" (legacy backend/Property.py) collapsed in_conservation_area, is_listed_building, and is_heritage_building into one boolean that blocked EWI only. Resolved: the rebuild keeps the three flags distinct, because they gate different Options — a conservation area blocks EWI but allows IWI, whereas listed/heritage block both (see Wall Insulation Eligibility). Don't reintroduce a single collapsed flag.