# Ara

The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio.

## Language

### Product

**Ara**: The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it.
_Avoid_: ARA (acronym style), v2 backend, the new backend

**Domna**: The company. Roman name; sibling to Ara in the same naming convention.

### Energy Performance Certificates

**EPC**: An Energy Performance Certificate — a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst).
_Avoid_: energy certificate, energy report

**Certificate Number**: The unique identifier assigned to an EPC by the government registry.
_Avoid_: cert number, EPC ID

**Registration Date**: The date an EPC was lodged with the government register; used to identify the most recent certificate for a property.
_Avoid_: assessment date, submission date

**EPC Band**: A single letter A–G representing a property's current or potential energy efficiency rating.
_Avoid_: energy rating, EPC grade, EPC score

**Schema Type**: The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. `RdSAP-Schema-21.0.1`).
_Avoid_: schema version, EPC format

**Domestic Certificate**: An EPC issued for a residential dwelling, as opposed to a commercial one.
_Avoid_: residential EPC, home EPC

### Properties and addresses

**Property**: The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs.
_Avoid_: dwelling, unit, home, asset

**Properties**: A first-class collection of Property objects; the unit of bulk operation in services.
_Avoid_: property list, batch (used for SQS chunks)

**UPRN**: Unique Property Reference Number — the government-issued permanent identifier for a physical address in the UK.
_Avoid_: property ID, address ID, code

**Postcode**: A UK postal code used to group nearby addresses; the primary search key for finding EPC records.
_Avoid_: zip code, postal code

**User Address**: A free-text address string provided by a user or imported from a customer dataset, before any normalisation or matching.
_Avoid_: user input, raw address, user_inputed_address

**Comparable Properties**: The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection.
_Avoid_: neighbours, similar properties, peer set

### Source data

**Site Notes**: The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property — except when the public EPC is newer.
_Avoid_: energy assessment, site survey, field survey, Domna survey, Hestia survey

**Landlord Overrides**: Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present.
_Avoid_: patches (deprecated), corrections, manual EPC, edits

### Modelling

**Effective EPC**: The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance.
_Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC

**Rebaselining**: Re-predicting a Property's SAP, carbon emissions, and heat demand via ML so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a pre-SAP10 schema (`sap_version < 10.0`), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. Does not include kWh — that is always derived deterministically by EPC Energy Derivation.
_Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)

**Baseline Performance**: A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus annual kWh / fuel split / bills derived from the Effective EPC. Persisted as one row; surfaced as one block in the UI.
_Avoid_: baseline predictions, predicted baseline, rebaselined values

**Lodged Performance**: The SAP / EPC Band / carbon emissions / heat demand recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property".
_Avoid_: original performance, raw EPC values, recorded baseline

**Effective Performance**: The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by ML output when triggered. The half of Baseline Performance that says "what we modelled".
_Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values

**EPC Energy Derivation**: The deterministic process that derives a Property's annual kWh, fuel split across heating, hot water, lighting, appliances and cooking, and bills from the Effective EPC — applying a UCL Correction for known EPC over/under-prediction and deducing fuel type from the SAP heating fields. No ML.
_Avoid_: kWh prediction, baseline kWh, energy estimation

**UCL Correction**: The per-band linear correction (Few et al. 2023, _Energy & Buildings_ 288 113024) applied to EPC-modelled total primary energy use intensity to align it with metered consumption. Calibrated against gas-heated, non-PV homes in England and Wales rated under SAP 2012; the current implementation extrapolates it to all properties (open question §15.14).
_Avoid_: UCL adjustment, energy correction, metered correction

**EPC Anomaly Flag**: A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling.
_Avoid_: outlier, mismatch, divergence flag

### Reference data

**Fuel Rates**: The current per-fuel rate (pence/kWh) and standing charge used to compute a Property's bills; time-versioned and regional, refreshed from Ofgem's published caps via an ETL. The Smart Export Guarantee rate sits in the same set as `electricity_export`. Consumed by EPC Energy Derivation.
_Avoid_: fuel prices (commodity prices, different concept), tariff, energy cost

**Carbon Factors**: The per-fuel CO2 emission factor (kgCO2e/kWh) used to compute a Property's carbon emissions; time-versioned, refreshed from Defra's annual publication. Consumed by EPC Energy Derivation.
_Avoid_: emission factors (ambiguous), CO2 rates

### Outputs

**Scenario**: A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and an ordered list of Scenario Phases. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property.
_Avoid_: project, batch, run-set

**Scenario Phase**: One ordered step inside a Scenario, carrying a measure-type allowlist (e.g. "loft insulation and walls in phase 1; ASHP in phase 2"), an optional phase budget, and an optional phase target. A single-phase Scenario is one Scenario Phase with all measure types allowed and the full budget on it — there is no special-case path.
_Avoid_: scenario stage, scenario step, tranche

**Scenario Snapshot**: A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes.
_Avoid_: scenario version, frozen scenario, pinned scenario

**Plan**: The per-Property output of one Scenario's modelling run; carries an ordered list of Plan Phases matching the Scenario's Phase shape. A Property modelled against N Scenarios in one trigger ends up with N Plans.
_Avoid_: recommendation set, output, result

**Plan Phase**: The per-Property output of one Scenario Phase: the Optimised Package selected for that phase, the ending state snapshot (the Property's SAP / kWh / bills after the package is applied), and any Rolled-over Options that flow as candidates into the next Plan Phase.
_Avoid_: plan stage, plan step

**Rolled-over Options**: Recommendations generated but not selected by the Optimiser in a given Plan Phase, that remain eligible as candidates in subsequent Plan Phases.
Exact roll-over rule (automatic vs user-marked) is under design.
_Avoid_: deferred measures, leftover recommendations

**Recommendation**: A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list.
_Avoid_: suggestion, option

**Optimised Package**: The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget.
_Avoid_: selected measures, default measures, optimal solution, recommended bundle

**Measure Type**: The catalogue classification of a retrofit measure (e.g. `solar_pv`, `loft_insulation`, `ashp`); one or more Recommendations reference the same Measure Type with property-specific cost and impact.
_Avoid_: measure (ambiguous), category

### Address matching

**Lexiscore**: A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity.
_Avoid_: score, match score, similarity

**Lexirank**: Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match.
_Avoid_: rank, position

**UPRN Candidate**: An EPC Search Result that is a plausible match for a given User Address, before scoring decides the winner.
_Avoid_: match candidate, result

**Score Threshold**: The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists.
_Avoid_: minimum score, cutoff

**Ambiguous Match**: A matching outcome where two or more candidates share Lexirank 1, making it impossible to select a unique winner.
_Avoid_: tie, draw, duplicate

**Best Match**: The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold.
_Avoid_: winner, top result

### API and integration

**EPC Search Result**: A lightweight record returned by the government domestic search endpoint — address lines, postcode, UPRN, band, and certificate number, but not full certificate data.
_Avoid_: search row, EPC row, result

**EPC Property Data**: The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against.
_Avoid_: EPC data, certificate data, parsed EPC

**Old EPC API**: The retired government API (`epc.opendatacommunities.org`) using HTTP Basic auth; decommissioned 30 May 2026.
_Avoid_: legacy API

**New EPC API**: The replacement government API (`api.get-energy-performance-data.communities.gov.uk`) using Bearer Token auth.
_Avoid_: new API, current API

**Bearer Token**: The auth credential required by the New EPC API; stored in the `EPC_AUTH_TOKEN` environment variable.
_Avoid_: API key, auth token, secret

## Relationships

- A **Property** represents a single physical dwelling for modelling; identified by `(portfolio_id, UPRN)` or `(portfolio_id, landlord_property_id)`.
- A **Property** has zero or more **EPCs** across time, exactly one **Effective EPC**, zero or one set of **Site Notes**, and zero or one set of **Landlord Overrides**.
- An **EPC** belongs to exactly one **Property** and has one **Certificate Number**.
- An **EPC** carries an **EPC Band** and is identifiable by its **Registration Date**; the most recent one is the current.
- A **UPRN** identifies a physical dwelling permanently; it does not change when the property changes owner — but each portfolio gets its own **Property** keyed against it.
- When a **Property** has both **Site Notes** and a public **EPC**, the newer of the two derives the **Effective EPC**. **Landlord Overrides** apply only when the **EPC** is the source — never when **Site Notes** are.
- A Property's **Baseline Performance** holds two halves: **Lodged Performance** (the gov register's SAP / band / carbon / heat) and **Effective Performance** (what the modelling pipeline scored against). The two are equal unless **Rebaselining** fires.
- **Rebaselining** produces **Effective Performance** by ML re-prediction when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten.
- **EPC Energy Derivation** contributes the annual kWh, fuel split, and bills on every Property unconditionally, reading current **Fuel Rates** and **Carbon Factors** from their respective repos.
- The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**.
- A **Scenario** carries one or more ordered **Scenario Phases**. Triggering the model against N Scenarios produces N **Plans** per Property; each Plan carries an ordered list of **Plan Phases** matching the Scenario's shape.
- Each **Plan Phase** holds its **Optimised Package**, the ending state snapshot, and any **Rolled-over Options** that flow as candidates into the next Plan Phase. A single-phase Scenario is one Scenario Phase with all measure types allowed; the same machinery handles it.
- A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
- A **Recommendation** references one **Measure Type** and carries property-specific cost and impact.
- **Address Matching** uses a **User Address** and **Postcode** to find a **UPRN** by scoring **UPRN Candidates** from an EPC search. A **Lexirank** of 1 with no **Ambiguous Match** and a **Lexiscore** ≥ the **Score Threshold** produces a **Best Match**.

## Example dialogue

> **Dev:** "A landlord uploads a corrected boiler for one of their properties. What happens?"
>
> **Domain expert:** "That's a **Landlord Override** on the heating fields. Save it against the **Property**.
The **Effective EPC** has changed, so **Rebaselining** runs to re-predict SAP / carbon / heat, and **EPC Energy Derivation** re-runs to update kWh / bills based on the new fuel deduction. With fresh **Baseline Performance** we regenerate **Recommendations**."

> **Dev:** "What if the same Property also has Site Notes?"
>
> **Domain expert:** "**Site Notes** supersede the public **EPC**, so **Landlord Overrides** don't apply. We model from the **Site Notes** version of the **Effective EPC**. If the public **EPC** is newer than the **Site Notes**, that's the one exception — we use the newer one."

> **Dev:** "After modelling we end up with a list of measures. Which ones get installed?"
>
> **Domain expert:** "The **Optimiser Service** picks the **Optimised Package** — a subset of **Recommendations** that hits the **Scenario** goal within budget. The rest stay in the **Plan** as alternatives the user can swap in."

> **Dev:** "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?"
>
> **Domain expert:** "That's an **EPC Anomaly Flag**. We compute it against the **Comparable Properties** for that postcode. It's advisory — the UI surfaces it and the landlord can apply a **Landlord Override** if it's wrong."

> **Dev:** "The property card shows two SAP scores side by side. Why?"
>
> **Domain expert:** "Those are **Lodged Performance** and **Effective Performance**. **Lodged** is what the gov register says — the EPC was rated under SAP 2012. **Effective** is what we scored against — we ran **Rebaselining** to predict the SAP10-equivalent rating because the methodology changed. Both stay on the **Baseline Performance** so users can see what's on record and what we're modelling against."

> **Dev:** "A landlord wants a 3-year retrofit plan — fabric work this year, heat pump next, solar after. How do we model that?"
>
> **Domain expert:** "Three **Scenario Phases** in one **Scenario**.
Phase 1 allows fabric measures with this year's budget, phase 2 allows the heat pump with next year's budget, phase 3 allows solar. When we model, the **Optimiser Service** runs per phase against the rolling state — the heat pump is scored against the post-insulation property, not the original one. Each **Plan Phase** captures the **Optimised Package** plus the ending SAP / bills, and any **Rolled-over Options** that didn't make this phase's budget become candidates next phase."

## Flagged ambiguities

- **"property"** was historically warned against in favour of "dwelling"; that has been inverted. **Property** is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias.
- **"energy assessment"** in the existing codebase (`energy_assessment_functions`, `energy_assessments_by_uprn`) refers to what is now canonically called **Site Notes**. New code uses **Site Notes**.
- **"patch"** / `patch_epc` in the existing codebase has been merged into **Landlord Overrides**; the original concept is deprecated.
- **"already_installed measures"** in the existing codebase is likely subsumed by **Landlord Overrides** ("we have a heat pump now" → override the heating fields). Final call deferred to implementation.
- **"address"** appears as both the raw **User Address** (free-text) and a structured field on an **EPC Search Result** (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1".
- **"score"** is used for `AddressMatch.score()` output, the `lexiscore` column, and informally. Prefer **Lexiscore** in domain discussions; reserve "score" for method-level code comments.
- **"user_inputed_address"** in `backend/address2UPRN/main.py` is a misspelling and a synonym for **User Address** — the canonical term. New code should use `user_address`.
- **"EPC"** is overloaded as both the document and the rating band letter. Use **EPC** for the document, **EPC Band** for the letter.
- **"re-scoring"** has two meanings in the codebase — **Rebaselining** (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer **Rebaselining** for the former; for the latter, the **Optimiser Service** step does its own scoring without a special name.
- **"phase"** appears in two unrelated contexts: as cut-over timeline language in the PRD ("Phase 0 — Status quo", "Phase 1 — Forced cut-over") and as a domain concept in **Scenario Phase** / **Plan Phase**. Only the latter is a glossary term; cut-over phases are project-management vocabulary that does not enter code.
- **"stale"** appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch") — a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale — they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.
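
## Illustrative sketch: address matching

The Address matching terms (**Lexiscore**, **Lexirank**, **Score Threshold**, **Ambiguous Match**, **Best Match**) compose into a small selection rule. What follows is a minimal Python sketch of that rule under stated assumptions, not the code in `backend/address2UPRN`. `UprnCandidate`, `MatchOutcome`, `lexirank`, and `select_match` are hypothetical names introduced for illustration. Ties are detected by exact float equality, and checking the Score Threshold before checking for ambiguity is an assumption: the glossary does not say which check comes first.

```python
from dataclasses import dataclass
from typing import List, Optional

SCORE_THRESHOLD = 0.6  # the Score Threshold quoted in the glossary


@dataclass(frozen=True)
class UprnCandidate:
    """Hypothetical shape: an EPC Search Result with its Lexiscore attached."""
    uprn: str
    lexiscore: float  # similarity in [0, 1]


@dataclass(frozen=True)
class MatchOutcome:
    """Hypothetical result type for the three outcomes the glossary names."""
    kind: str  # "best_match", "ambiguous_match", or "no_match"
    best: Optional[UprnCandidate] = None


def lexirank(candidates: List[UprnCandidate]) -> List[int]:
    """Dense rank by Lexiscore descending: equal scores share a rank, no gaps."""
    distinct_scores = sorted({c.lexiscore for c in candidates}, reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct_scores)}
    return [rank_of[c.lexiscore] for c in candidates]


def select_match(
    candidates: List[UprnCandidate],
    threshold: float = SCORE_THRESHOLD,
) -> MatchOutcome:
    """Return the Best Match, or say why there is none."""
    if not candidates:
        return MatchOutcome("no_match")
    ranks = lexirank(candidates)
    top = [c for c, rank in zip(candidates, ranks) if rank == 1]
    # Assumption: the threshold is checked before ambiguity, so a tie
    # below the threshold is reported as "no_match".
    if top[0].lexiscore < threshold:
        return MatchOutcome("no_match")
    if len(top) > 1:
        return MatchOutcome("ambiguous_match")
    return MatchOutcome("best_match", top[0])


if __name__ == "__main__":
    found = [
        UprnCandidate("100", 0.91),
        UprnCandidate("200", 0.74),
        UprnCandidate("300", 0.74),
    ]
    print(lexirank(found))           # [1, 2, 2]
    print(select_match(found).kind)  # best_match
```

Returning an explicit outcome rather than `None` keeps an **Ambiguous Match** distinguishable from a miss below the **Score Threshold**, which matters if the two cases are surfaced differently to the user.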