# Ara The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio. ## Language ### Product **Ara**: The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it. _Avoid_: ARA (acronym style), v2 backend, the new backend **Domna**: The company. Roman name; sibling to Ara in the same naming convention. ### Energy Performance Certificates **EPC**: An Energy Performance Certificate — a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst). _Avoid_: energy certificate, energy report **Certificate Number**: The unique identifier assigned to an EPC by the government registry. _Avoid_: cert number, EPC ID **Registration Date**: The date an EPC was lodged with the government register; used to identify the most recent certificate for a property. _Avoid_: assessment date, submission date **EPC Band**: A single letter A–G representing a property's current or potential energy efficiency rating. _Avoid_: energy rating, EPC grade, EPC score **Schema Type**: The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. `RdSAP-Schema-21.0.1`). _Avoid_: schema version, EPC format **Domestic Certificate**: An EPC issued for a residential dwelling, as opposed to a commercial one. _Avoid_: residential EPC, home EPC ### Properties and addresses **Property**: The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs. _Avoid_: dwelling, unit, home, asset **Properties**: A first-class collection of Property objects; the unit of bulk operation in services. _Avoid_: property list, batch (used for SQS chunks) **UPRN**: Unique Property Reference Number — the government-issued permanent identifier for a physical address in the UK. _Avoid_: property ID, address ID, code **Postcode**: A UK postal code used to group nearby addresses; the primary search key for finding EPC records. _Avoid_: zip code, postal code **User Address**: A structured dataclass (`domain.addresses.user_address.UserAddress`) capturing a customer-supplied address: a free-text `user_address` line, a canonical `postcode` (sanitised on construction), and an optional `internal_reference`. The bare string sense — the raw free-text address line as it arrives from upstream ingestion, before being wrapped — remains valid when discussing CSV columns, API payloads, or other upstream contexts; in domain code, prefer the dataclass. _Avoid_: user input, raw address, user_inputed_address **Comparable Properties**: The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection. _Avoid_: neighbours, similar properties, peer set ### Source data **Site Notes**: The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property — except when the public EPC is newer. _Avoid_: energy assessment, site survey, field survey, Domna survey, Hestia survey **Landlord Overrides**: Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present. _Avoid_: patches (deprecated), corrections, manual EPC, edits ### Modelling **Effective EPC**: The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance. _Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC **Rebaselining**: Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via **SAP10 Calculation** (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`, the calculator's target spec), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. kWh is included as ML targets per ADR-0007 — see [[epc-ml-transform]]. _Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness) **Baseline Performance**: A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, …) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI. _Avoid_: baseline predictions, predicted baseline, rebaselined values **Lodged Performance**: The SAP / EPC Band / carbon emissions / Primary Energy Intensity recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property". _Avoid_: original performance, raw EPC values, recorded baseline **Effective Performance**: The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by **SAP10 Calculation** output (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) when triggered. The half of Baseline Performance that says "what we modelled". _Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values **Calculated SAP10 Performance**: The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. It is **not** a separately-persisted third value-set beside Lodged and Effective: in every baselining scenario the calculator's output *is* the **Effective Performance** (real lodged SAP10 EPC with no overrides ⇒ Calculated = Lodged = Effective; overrides or an estimated / pre-SAP10 EPC ⇒ Calculated = Effective, there being no lodged SAP10 figure to compare against). The calculator is therefore the mechanism that produces Effective Performance, having superseded the old ML-API rebaseliner. The calculator is **load-bearing**: for `sap_version < 10.2` (lodged under a superseded methodology) its output *is* the Effective Performance; for `≥ 10.2` the API's lodged figures are kept and the calculator runs **alongside, logging any divergence** (SAP > 0.5, PEUI/CO2 beyond tolerance) as a validation signal (see [[sap-spec-version]]). It is load-bearing for **Bill Derivation regardless of version** (the EPC lodges no per-end-use kWh), so a calculator strict-raise **aborts the batch** and the un-mapped cert is fixed immediately. ADR-0009 introduced the term, amended by ADR-0010, realized by ADR-0013 (whose shadow stepping-stone is superseded) and ADR-0014. _Avoid_: calculator output, computed performance, worksheet performance, SAP10 output, calculated value-set (it is not a stored third set) **SAP10 Calculation**: The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap10_calculator/` (`calculator.py`). Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is ML-based. ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates. _Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]]) **SAP Spec Version**: The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version. _Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision **Validation Cohort**: The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010. _Avoid_: parity cohort, validation set, corpus sample **Measure Application**: The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009. _Avoid_: measure overrides (rejected during ADR-0009 grill — phantom mid-layer), package applier, retrofit simulator **Bill Derivation**: The deterministic process that derives a Property's annual energy **bill**, composed into per-end-use sections (heating, hot water, lighting, appliances, cooking, pumps/fans, …) plus a **total**, by pricing **SAP10 Calculation**'s delivered kWh per end use at **current Fuel Rates** — each end use billed at its fuel's rate, rolled up per fuel for **standing charges** (metered fuels only — gas/electricity; oil/LPG/solid have none) minus **SEG** export credit on PV. Implemented by `BillDerivation` in `domain/property_baseline/` (deterministic, ADR-0006). Reads Fuel Rates from a committed static snapshot via `FuelRatesRepository` (no live ETL yet). **Distinct from the calculator's `total_fuel_cost_gbp`**, which is the SAP-rating notional cost at RdSAP Table 32 standardised prices (~half the real electricity price) — not what the household pays. Raises on a fuel it has no rate for (e.g. house coal, heat network). ADR-0014. _Avoid_: EPC Energy Derivation (renamed), EpcEnergyDerivationService (no "service" suffix), kWh prediction, baseline kWh, energy estimation **UCL Correction**: The per-band linear correction (Few et al. 2023, _Energy & Buildings_ 288 113024) that aligns EPC-modelled Primary Energy Intensity with metered consumption. Folded into ML training labels at fit time (per ADR-0007) rather than applied at runtime — the trained model emits metered-equivalent PEUI directly, avoiding the discontinuities at EPC band boundaries that arose when the per-band linear correction was applied post-prediction. Calibrated against gas-heated, non-PV homes in England and Wales rated under SAP 2012; the current implementation extrapolates it to all properties (open question §15.14). _Avoid_: UCL adjustment, energy correction, metered correction **EPC Anomaly Flag**: A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling. _Avoid_: outlier, mismatch, divergence flag ### Pipeline composition The modelling backend is composed from three independently-invocable **stage orchestrators**, chained differently per use case. This composability — not a single end-to-end function — is the point: it is what lets the interactive single-property flow pause between stages where the batch flows do not. (Supersedes the monolithic `model_engine`.) **Ingestion**: The first stage. Acquires a Property's external source data — the EPC certificate (New EPC API) and Google Solar insights — and resolves its coordinates, then writes everything to repos. Writes only; runs no modelling business logic. Per ADR-0003 nothing downstream reads across this seam by calling back to a source — downstream stages read the persisted data from repos. _Avoid_: fetching (a fetch is one source call; Ingestion is the whole write stage), data load **Baseline** (stage): The second stage. Reads the persisted source data from repos, hydrates the **Property** aggregate, resolves its **Effective EPC**, and establishes its **Baseline Performance**. Re-scoring after a user override lives here. Distinct from **Baseline Performance** (the aggregate it produces). _Avoid_: rebaseline (that is a specific ML trigger — see Rebaselining), enrichment **Modelling** (stage): The third stage. Takes the baselined Property plus a set of **Scenarios** and produces **Recommendations** → an **Optimised Package** per **Scenario Phase** → **Plans**, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play". _Avoid_: scoring (overloaded), recommendation engine **First Run**: The use case where a Property has only a row in the property table (post address→UPRN matching) and no existing **Plan**: the pipeline runs Ingestion → Baseline → Modelling end-to-end over a batch. The first sibling lambda being built (`ara_first_run`). _Avoid_: initial run, cold run ### ML training **EPC ML Transform**: The versioned class at `domain/sap10_ml/transform.py` that maps an EpcPropertyData to a fixed-width row of features + targets. The single ML-data contract between this repo and the AutoGluon training repo. Owns the windows compression, building-parts compression, Top-N Code Taxonomy, and UCL folding decisions. Each version is tagged on the deployed scoring lambda; a mismatch is a deploy-time fail. _Avoid_: feature builder, ML mapper, EPC vectoriser **Feature Schema Version**: The semver version of the EPC ML Transform (e.g. `0.1.0`), included in the parquet output path and the deployed scoring lambda's tag. MAJOR bump when columns are removed or renamed; MINOR when optional columns are added; PATCH for non-behavioural fixes. _Avoid_: transform version, schema version (overloaded with the SAP RdSAP schema version on EPCs), model version **Primary Energy Intensity** (**PEUI**): A Property's total annual primary energy use per square metre of floor area (kWh/m²/yr), the SAP10 quantity recorded as `energy_consumption_current` on the EPC. Covers all end uses (heating, hot water, lighting, appliances, cooking) weighted by SAP primary energy factors per fuel. The quantity the UCL Correction aligns to metered consumption. _Avoid_: heat demand (which colloquially means the building's space heating thermal requirement — a distinct concept), energy demand, total energy use, kWh per square metre **PV Capacity Source**: A flag on the EPC ML Transform feature set indicating whether a Property's PV capacity is `measured` (from `sap_energy_source.photovoltaic_supply[].peak_power`), `estimated_from_roof_area` (the `percent_roof_area` fallback used when the surveyor could not confirm array configuration), or `none` (no PV present). Lets the model weight the correct capacity signal per property. _Avoid_: PV source, PV configuration type, solar source **Top-N Code Taxonomy**: The empirical top-N SAP code list (covering ~95% of mass on the training sample) committed by the EPC ML Transform for each list-aggregated categorical field (`wall_construction`, `glazing_type`, `frame_material`, etc.). Rare codes go into a per-field `_other` bucket. The taxonomy is locked at each Feature Schema Version; changes warrant a MINOR bump (adding) or MAJOR bump (removing codes). _Avoid_: code list, code dictionary, vocab ### Reference data **Fuel Rates**: The current per-fuel rate (pence/kWh) and standing charge used to compute a Property's bills; time-versioned and regional. Sourced for now from a **committed static snapshot** (national, Ofgem-cap period for gas/electricity + DESNZ/NEP for off-gas fuels), read via `FuelRatesRepository`; an Ofgem-cap ETL automating the refresh is future, not a prerequisite. The Smart Export Guarantee rate sits in the same set as `electricity_export`. Consumed by Bill Derivation. _Avoid_: fuel prices (commodity prices, different concept), tariff, energy cost **Carbon Factors**: The per-fuel CO2 emission factor (kgCO2e/kWh) used to compute a Property's carbon emissions; time-versioned, refreshed from Defra's annual publication. Consumed by Bill Derivation. _Avoid_: emission factors (ambiguous), CO2 rates ### Outputs **Scenario**: A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and an ordered list of Scenario Phases. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property. _Avoid_: project, batch, run-set **Scenario Phase**: One ordered step inside a Scenario, carrying a measure-type allowlist (e.g. "loft insulation and walls in phase 1; ASHP in phase 2"), an optional phase budget, and an optional phase target. A single-phase Scenario is one Scenario Phase with all measure types allowed and the full budget on it — there is no special-case path. _Avoid_: scenario stage, scenario step, tranche **Scenario Snapshot**: A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes. _Avoid_: scenario version, frozen scenario, pinned scenario **Plan**: The per-Property output of one Scenario's modelling run; carries an ordered list of Plan Phases matching the Scenario's Phase shape. A Property modelled against N Scenarios in one trigger ends up with N Plans. _Avoid_: recommendation set, output, result **Plan Phase**: The per-Property output of one Scenario Phase: the Optimised Package selected for that phase, the ending state snapshot (the Property's SAP / kWh / bills after the package is applied), and any Rolled-over Options that flow as candidates into the next Plan Phase. _Avoid_: plan stage, plan step **Rolled-over Options**: Recommendations generated but not selected by the Optimiser in a given Plan Phase, that remain eligible as candidates in subsequent Plan Phases. Exact roll-over rule (automatic vs user-marked) is under design. _Avoid_: deferred measures, leftover recommendations **Recommendation**: A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list. _Avoid_: suggestion, option **Optimised Package**: The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget. _Avoid_: selected measures, default measures, optimal solution, recommended bundle **Measure Type**: The catalogue classification of a retrofit measure (e.g. `solar_pv`, `loft_insulation`, `ashp`); one or more Recommendations reference the same Measure Type with property-specific cost and impact. _Avoid_: measure (ambiguous), category ### Address matching **Lexiscore**: A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity. _Avoid_: score, match score, similarity **Lexirank**: Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match. _Avoid_: rank, position **UPRN Candidate**: An EPC Search Result that is a plausible match for a given User Address, before scoring decides the winner. _Avoid_: match candidate, result **Score Threshold**: The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists. _Avoid_: minimum score, cutoff **Ambiguous Match**: A matching outcome where two or more candidates share Lexirank 1, making it impossible to select a unique winner. _Avoid_: tie, draw, duplicate **Best Match**: The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold. _Avoid_: winner, top result ### API and integration **EPC Search Result**: A lightweight record returned by the government domestic search endpoint — address lines, postcode, UPRN, band, and certificate number, but not full certificate data. _Avoid_: search row, EPC row, result **EPC Property Data**: The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against. _Avoid_: EPC data, certificate data, parsed EPC **Old EPC API**: The retired government API (`epc.opendatacommunities.org`) using HTTP Basic auth; decommissioned 30 May 2026. _Avoid_: legacy API **New EPC API**: The replacement government API (`api.get-energy-performance-data.communities.gov.uk`) using Bearer Token auth. _Avoid_: new API, current API **Bearer Token**: The auth credential required by the New EPC API; stored in the `EPC_AUTH_TOKEN` environment variable. _Avoid_: API key, auth token, secret ## Relationships - A **Property** represents a single physical dwelling for modelling; identified by `(portfolio_id, UPRN)` or `(portfolio_id, landlord_property_id)`. - A **Property** has zero or more **EPCs** across time, exactly one **Effective EPC**, zero or one set of **Site Notes**, and zero or one set of **Landlord Overrides**. - An **EPC** belongs to exactly one **Property** and has one **Certificate Number**. - An **EPC** carries an **EPC Band** and is identifiable by its **Registration Date**; the most recent one is the current. - A **UPRN** identifies a physical dwelling permanently; it does not change when the property changes owner — but each portfolio gets its own **Property** keyed against it. - When a **Property** has both **Site Notes** and a public **EPC**, the newer of the two derives the **Effective EPC**. **Landlord Overrides** apply only when the **EPC** is the source — never when **Site Notes** are. - A Property's **Baseline Performance** holds two halves: **Lodged Performance** (the gov register's SAP / band / carbon / heat) and **Effective Performance** (what the modelling pipeline scored against). The two are equal unless **Rebaselining** fires. - **Rebaselining** produces **Effective Performance** by ML re-prediction across SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh, when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten. - **Bill Derivation** derives **fuel split** and **bills** from kWh values (sourced from the EPC's `renewable_heat_incentive` fields for baseline SAP10 properties, or from ML when Rebaselining fires), reading current **Fuel Rates** and **Carbon Factors** from their respective repos. - The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**. - A **Scenario** carries one or more ordered **Scenario Phases**. Triggering the model against N Scenarios produces N **Plans** per Property; each Plan carries an ordered list of **Plan Phases** matching the Scenario's shape. - Each **Plan Phase** holds its **Optimised Package**, the ending state snapshot, and any **Rolled-over Options** that flow as candidates into the next Plan Phase. A single-phase Scenario is one Scenario Phase with all measure types allowed; the same machinery handles it. - A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job. - A **Recommendation** references one **Measure Type** and carries property-specific cost and impact. - **Address Matching** uses a **User Address** and **Postcode** to find a **UPRN** by scoring **UPRN Candidates** from an EPC search. A **Lexirank** of 1 with no **Ambiguous Match** and a **Lexiscore** ≥ the **Score Threshold** produces a **Best Match**. ## Example dialogue > **Dev:** "A landlord uploads a corrected boiler for one of their properties. What happens?" > > **Domain expert:** "That's a **Landlord Override** on the heating fields. Save it against the **Property**. The **Effective EPC** has changed, so **Rebaselining** runs to re-predict SAP / carbon / PEUI / space heating kWh / hot water kWh, and **Bill Derivation** re-runs to update the fuel split and bills based on the new kWh values and fuel deduction. With fresh **Baseline Performance** we regenerate **Recommendations**." > **Dev:** "What if the same Property also has Site Notes?" > > **Domain expert:** "**Site Notes** supersede the public **EPC**, so **Landlord Overrides** don't apply. We model from the **Site Notes** version of the **Effective EPC**. If the public **EPC** is newer than the **Site Notes**, that's the one exception — we use the newer one." > **Dev:** "After modelling we end up with a list of measures. Which ones get installed?" > > **Domain expert:** "The **Optimiser Service** picks the **Optimised Package** — a subset of **Recommendations** that hits the **Scenario** goal within budget. The rest stay in the **Plan** as alternatives the user can swap in." > **Dev:** "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?" > > **Domain expert:** "That's an **EPC Anomaly Flag**. We compute it against the **Comparable Properties** for that postcode. It's advisory — the UI surfaces it and the landlord can apply a **Landlord Override** if it's wrong." > **Dev:** "The property card shows two SAP scores side by side. Why?" > > **Domain expert:** "Those are **Lodged Performance** and **Effective Performance**. **Lodged** is what the gov register says — the EPC was rated under SAP 2012. **Effective** is what we scored against — we ran **Rebaselining** to predict the SAP10-equivalent rating because the methodology changed. Both stay on the **Baseline Performance** so users can see what's on record and what we're modelling against." > **Dev:** "A landlord wants a 3-year retrofit plan — fabric work this year, heat pump next, solar after. How do we model that?" > > **Domain expert:** "Three **Scenario Phases** in one **Scenario**. Phase 1 allows fabric measures with this year's budget, phase 2 allows the heat pump with next year's budget, phase 3 allows solar. When we model, the **Optimiser Service** runs per phase against the rolling state — the heat pump is scored against the post-insulation property, not the original one. Each **Plan Phase** captures the **Optimised Package** plus the ending SAP / bills, and any **Rolled-over Options** that didn't make this phase's budget become candidates next phase." ## Flagged ambiguities - **"property"** was historically warned against in favour of "dwelling"; that has been inverted. **Property** is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias. - **"energy assessment"** in the existing codebase (`energy_assessment_functions`, `energy_assessments_by_uprn`) refers to what is now canonically called **Site Notes**. New code uses **Site Notes**. - **"patch"** / `patch_epc` in the existing codebase has been merged into **Landlord Overrides**; the original concept is deprecated. - **"already_installed measures"** in the existing codebase is likely subsumed by **Landlord Overrides** ("we have a heat pump now" → override the heating fields). Final call deferred to implementation. - **"address"** appears as both the raw **User Address** (free-text from customer data, or the structured `UserAddress` dataclass that wraps it) and a structured field on an **EPC Search Result** (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1". Within `domain/`, **User Address** specifically means the `UserAddress` dataclass; in upstream ingestion contexts (CSV columns, SQS payloads) it can still mean the raw string sense. - **"score"** is used for `AddressMatch.score()` output, the `lexiscore` column, and informally. Prefer **Lexiscore** in domain discussions; reserve "score" for method-level code comments. - **"user_inputed_address"** in `backend/address2UPRN/main.py` is a misspelling and a synonym for **User Address** — the canonical term. New code should use `user_address`. - **"EPC"** is overloaded as both the document and the rating band letter. Use **EPC** for the document, **EPC Band** for the letter. - **"re-scoring"** has two meanings in the codebase — **Rebaselining** (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer **Rebaselining** for the former; for the latter, the **Optimiser Service** step does its own scoring without a special name. - **"phase"** appears in two unrelated contexts: as cut-over timeline language in the PRD ("Phase 0 — Status quo", "Phase 1 — Forced cut-over") and as a domain concept in **Scenario Phase** / **Plan Phase**. Only the latter is a glossary term; cut-over phases are project-management vocabulary that does not enter code. - **"stale"** appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch") — a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale — they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.