Model/CONTEXT.md
2026-05-13 21:52:02 +00:00

Ara

The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio.

Language

Product

Ara: The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it. Avoid: ARA (acronym style), v2 backend, the new backend

Domna: The company. Roman name; sibling to Ara in the same naming convention.

Energy Performance Certificates

EPC: An Energy Performance Certificate — a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst). Avoid: energy certificate, energy report

Certificate Number: The unique identifier assigned to an EPC by the government registry. Avoid: cert number, EPC ID

Registration Date: The date an EPC was lodged with the government register; used to identify the most recent certificate for a property. Avoid: assessment date, submission date

EPC Band: A single letter from A to G representing a property's current or potential energy efficiency rating. Avoid: energy rating, EPC grade, EPC score

Schema Type: The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. RdSAP-Schema-21.0.1). Avoid: schema version, EPC format

Domestic Certificate: An EPC issued for a residential dwelling, as opposed to a commercial one. Avoid: residential EPC, home EPC

Properties and addresses

Property: The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs. Avoid: dwelling, unit, home, asset

Properties: A first-class collection of Property objects; the unit of bulk operation in services. Avoid: property list, batch (used for SQS chunks)

UPRN: Unique Property Reference Number — the government-issued permanent identifier for a physical address in the UK. Avoid: property ID, address ID, code

Postcode: A UK postal code used to group nearby addresses; the primary search key for finding EPC records. Avoid: zip code, postal code

User Address: A free-text address string provided by a user or imported from a customer dataset, before any normalisation or matching. Avoid: user input, raw address, user_inputed_address

Comparable Properties: The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection. Avoid: neighbours, similar properties, peer set

Source data

Site Notes: The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property — except when the public EPC is newer. Avoid: energy assessment, site survey, field survey, Domna survey, Hestia survey

Landlord Overrides: Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present. Avoid: patches (deprecated), corrections, manual EPC, edits

Modelling

Effective EPC: The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance. Avoid: modelling EPC, working EPC, resolved EPC, derived EPC
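
The source-selection rule behind the Effective EPC can be sketched as below. This is a minimal illustration, not the pipeline's implementation: SourceRecord, resolve_effective_epc, and the field names are hypothetical, the Site Notes date is assumed to be comparable with the EPC's Registration Date, and a same-day tie is assumed to go to Site Notes. It also assumes Landlord Overrides are applied whenever the public EPC ends up as the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical stand-in for a dated source (Site Notes or a public EPC)."""
    recorded_on: date
    fields: dict


def resolve_effective_epc(
    public_epc: Optional[SourceRecord],
    site_notes: Optional[SourceRecord],
    landlord_overrides: Optional[dict] = None,
) -> tuple[str, dict]:
    """Return (source_name, fields) for the Effective EPC."""
    # Site Notes win unless the public EPC is strictly newer (tie-break is an assumption).
    use_site_notes = site_notes is not None and (
        public_epc is None or site_notes.recorded_on >= public_epc.recorded_on
    )
    if use_site_notes:
        # Landlord Overrides are never applied on top of Site Notes.
        return "site_notes", dict(site_notes.fields)
    if public_epc is None:
        raise ValueError("a Property needs Site Notes or a public EPC")
    # Public EPC is the source: overlay any Landlord Overrides field by field.
    fields = dict(public_epc.fields)
    fields.update(landlord_overrides or {})
    return "public_epc", fields
```

Returning the source name alongside the fields keeps it visible downstream whether overrides were eligible to apply.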

Rebaselining: Re-predicting a Property's SAP, carbon emissions, and heat demand via ML when its Effective EPC's physical state diverges from the originally lodged public EPC (because Site Notes or Landlord Overrides have changed walls / heating / windows / etc.). Does not include kWh — that is always derived deterministically. Avoid: re-scoring, re-prediction, performance recomputation

Baseline Performance: A Property's current performance values — SAP, carbon emissions, heat demand, annual kWh, fuel split, bills — held against the Effective EPC. SAP / carbon / heat come directly from the Effective EPC's recorded values when no override applies, or from Rebaselining when an override changes physical state. Annual kWh and the fuel split are always derived deterministically by the EPC Energy Derivation Service. Avoid: baseline predictions, predicted baseline, rebaselined values

EPC Energy Derivation: The deterministic process that derives a Property's annual kWh, fuel split (gas / electric / other), and bills from the Effective EPC's energy fields — applying a UCL-style correction for known EPC over/under-prediction and deducing fuel type for heating + hot water from the SAP heating fields. No ML. Avoid: kWh prediction, baseline kWh, energy estimation
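
How Baseline Performance is assembled from these two sources can be sketched as follows. The field names, the list of physical fields, and both callables are assumptions made for illustration: rebaseline stands in for the ML re-prediction and derive_energy for the EPC Energy Derivation Service, neither of which is specified here. The sketch also assumes a lodged public EPC exists to compare against.

```python
from typing import Callable, Mapping

# Hypothetical field names; the real physical field set is not listed in this document.
PHYSICAL_FIELDS = ("walls", "heating", "windows")
PERFORMANCE_FIELDS = ("sap", "carbon_emissions", "heat_demand")


def needs_rebaselining(effective: Mapping, lodged: Mapping) -> bool:
    """True when any physical field differs from the originally lodged EPC."""
    return any(effective.get(f) != lodged.get(f) for f in PHYSICAL_FIELDS)


def baseline_performance(
    effective: Mapping,
    lodged: Mapping,
    rebaseline: Callable[[Mapping], Mapping],
    derive_energy: Callable[[Mapping], Mapping],
) -> dict:
    """Combine SAP / carbon / heat with the deterministically derived energy values."""
    if needs_rebaselining(effective, lodged):
        # Physical state diverged: take SAP / carbon / heat from the re-prediction.
        performance = dict(rebaseline(effective))
    else:
        # No divergence: reuse the values recorded on the Effective EPC.
        performance = {f: effective[f] for f in PERFORMANCE_FIELDS}
    # kWh, fuel split, and bills are derived for every Property, never predicted.
    performance.update(derive_energy(effective))
    return performance
```

Passing the two services in as callables keeps the branch logic testable without a model or a tariff table.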

EPC Anomaly Flag: A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling. Avoid: outlier, mismatch, divergence flag
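
One way a per-field flag could be computed for categorical fields is sketched below. The rule used here (flag a value that fewer than min_share of the cohort shares) and both thresholds are assumptions chosen for illustration; this document does not define what "differs significantly" means, and the EPC Prediction Service may use a different test.

```python
from collections import Counter
from typing import Mapping, Sequence


def anomaly_flags(
    target: Mapping,
    comparables: Sequence[Mapping],
    fields: Sequence[str],
    min_share: float = 0.1,   # assumed cut-off, not taken from the source
    min_cohort: int = 5,      # assumed minimum cohort size before flagging
) -> dict[str, bool]:
    """Return {field: flagged} for each requested categorical EPC field."""
    flags: dict[str, bool] = {}
    for name in fields:
        observed = [c[name] for c in comparables if c.get(name) is not None]
        value = target.get(name)
        if value is None or len(observed) < min_cohort:
            # Too little evidence to compare: do not flag.
            flags[name] = False
            continue
        # Share of Comparable Properties that agree with the target's value.
        share = Counter(observed)[value] / len(observed)
        flags[name] = share < min_share
    return flags
```

The function only reports flags; consistent with the definition above, nothing in it blocks modelling.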

Outputs

Scenario: A named portfolio-level container for a single modelling run, capturing the goal (e.g. Increasing EPC), budget, exclusions, and housing type; holds many Plans. Avoid: project, batch, run-set

Plan: The per-Property output of a single modelling run; belongs to one Scenario and carries the Property's full Recommendation list, Optimised Package, and post-retrofit predictions. Avoid: recommendation set, output, result

Recommendation: A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list. Avoid: suggestion, option

Optimised Package: The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget. Avoid: selected measures, default measures, optimal solution, recommended bundle

Measure Type: The catalogue classification of a retrofit measure (e.g. solar_pv, loft_insulation, ashp); many Recommendations can reference the same Measure Type, each with its own property-specific cost and impact. Avoid: measure (ambiguous), category
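
The containment between these output terms can be sketched with plain dataclasses. All class and attribute names below are hypothetical, only a few of the attributes named in the definitions are shown (exclusions, housing type, parts lists, and post-retrofit predictions are omitted), and keying a Plan by UPRN is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Recommendation:
    measure_type: str   # e.g. "solar_pv", "loft_insulation", "ashp"
    cost: float
    sap_impact: float


@dataclass
class Plan:
    property_uprn: str
    recommendations: list[Recommendation]
    optimised_package: list[Recommendation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The Optimised Package must be a subset of the Plan's Recommendations.
        stray = [r for r in self.optimised_package if r not in self.recommendations]
        if stray:
            raise ValueError("Optimised Package contains unknown Recommendations")

    def alternatives(self) -> list[Recommendation]:
        """Recommendations that were not selected for installation."""
        return [r for r in self.recommendations if r not in self.optimised_package]


@dataclass
class Scenario:
    name: str
    goal: str
    budget: float
    plans: list[Plan] = field(default_factory=list)
```

Modelling the Optimised Package as a validated subset, rather than a separate list of measures, keeps every selected item traceable to a Recommendation in the same Plan.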

Address matching

Lexiscore: A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity. Avoid: score, match score, similarity

Lexirank: Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match. Avoid: rank, position

UPRN Candidate: An EPC Search Result that is a plausible match for a given User Address, before scoring selects the Best Match. Avoid: match candidate, result

Score Threshold: The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists. Avoid: minimum score, cutoff

Ambiguous Match: A matching outcome where two or more candidates share Lexirank 1, so no unique Best Match can be selected. Avoid: tie, draw, duplicate

Best Match: The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold. Avoid: winner, top result
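
The ranking and selection terms above can be sketched as follows, taking Lexiscores as given (how token overlap and character-level similarity are combined is not specified here). The function names and outcome labels are hypothetical, and checking for an Ambiguous Match before the Score Threshold is an assumed ordering. Ties are detected by exact float equality, which a real implementation might want to relax.

```python
from typing import Optional, Sequence

SCORE_THRESHOLD = 0.6  # current value per the Score Threshold definition


def lexirank(scores: Sequence[float]) -> list[int]:
    """Dense rank, descending: equal scores share a rank and ranks have no gaps."""
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct)}
    return [rank_of[score] for score in scores]


def select_best_match(
    candidates: Sequence[tuple[str, float]],
    threshold: float = SCORE_THRESHOLD,
) -> tuple[str, Optional[str]]:
    """candidates are (uprn, lexiscore) pairs; returns (outcome, uprn or None)."""
    if not candidates:
        return "no_candidates", None
    ranks = lexirank([score for _, score in candidates])
    top = [cand for cand, rank in zip(candidates, ranks) if rank == 1]
    if len(top) > 1:
        # Two or more candidates share Lexirank 1.
        return "ambiguous_match", None
    uprn, score = top[0]
    if score < threshold:
        # A candidate exists but falls below the Score Threshold.
        return "below_threshold", None
    return "best_match", uprn
```

Dense ranking matters here because shared top scores must all receive rank 1; an ordinal ranking would hide the ambiguity by arbitrarily placing one candidate first.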

API and integration

EPC Search Result: A lightweight record returned by the government domestic search endpoint — address lines, postcode, UPRN, band, and certificate number, but not full certificate data. Avoid: search row, EPC row, result

EPC Property Data: The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against. Avoid: EPC data, certificate data, parsed EPC

Old EPC API: The retired government API (epc.opendatacommunities.org) using HTTP Basic auth; decommissioned 30 May 2026. Avoid: legacy API

New EPC API: The replacement government API (api.get-energy-performance-data.communities.gov.uk) using Bearer Token auth. Avoid: new API, current API

Bearer Token: The auth credential required by the New EPC API; stored in the EPC_AUTH_TOKEN environment variable. Avoid: API key, auth token, secret
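
Reading the Bearer Token and building the request header might look like the sketch below. The function name is hypothetical and no endpoint paths are shown, since none are given in this document; only the EPC_AUTH_TOKEN variable name and the standard "Authorization: Bearer" header form are relied on.

```python
import os
from typing import Mapping


def epc_auth_headers(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Authorization header for the New EPC API from EPC_AUTH_TOKEN."""
    token = environ.get("EPC_AUTH_TOKEN", "").strip()
    if not token:
        # Fail early with a clear message rather than sending an empty credential.
        raise RuntimeError("EPC_AUTH_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

Accepting the environment as a parameter lets tests pass a plain dict instead of mutating the process environment.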

Relationships

  • A Property represents a single physical dwelling for modelling; identified by (portfolio_id, UPRN) or (portfolio_id, landlord_property_id).
  • A Property has zero or more EPCs across time, exactly one Effective EPC, zero or one set of Site Notes, and zero or one set of Landlord Overrides.
  • An EPC belongs to exactly one Property and has one Certificate Number.
  • An EPC carries an EPC Band and a Registration Date; the EPC with the most recent Registration Date is the Property's current one.
  • A UPRN identifies a physical dwelling permanently; it does not change when the property changes owner — but each portfolio gets its own Property keyed against it.
  • When a Property has both Site Notes and a public EPC, the Effective EPC is derived from the newer of the two. Landlord Overrides apply only when the public EPC is the source, never when Site Notes are.
  • Rebaselining contributes the SAP / carbon / heat parts of Baseline Performance when the Effective EPC physical state diverges from the originally lodged EPC. EPC Energy Derivation contributes the kWh / fuel split / bills parts unconditionally for every Property.
  • The EPC Prediction Service uses Comparable Properties for both gap-filling and producing EPC Anomaly Flags.
  • A Scenario contains many Plans (one per Property). A Plan carries many Recommendations; the Optimised Package is the subset selected for installation.
  • A Recommendation references one Measure Type and carries property-specific cost and impact.
  • Address Matching uses a User Address and Postcode to find a UPRN by scoring UPRN Candidates from an EPC search. A Lexirank of 1 with no Ambiguous Match and a Lexiscore ≥ the Score Threshold produces a Best Match.
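
The Property identity rule in the first relationship above can be sketched as a small key type. PropertyKey and its method are hypothetical names, and preferring the UPRN when both identifiers are present is an assumption rather than a stated rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyKey:
    portfolio_id: str
    uprn: Optional[str] = None
    landlord_property_id: Optional[str] = None

    def __post_init__(self) -> None:
        # At least one of the two identifiers is needed alongside the portfolio.
        if self.uprn is None and self.landlord_property_id is None:
            raise ValueError("need a UPRN or a landlord_property_id")

    def identity(self) -> tuple[str, str, str]:
        """(portfolio_id, identifier kind, identifier value)."""
        if self.uprn is not None:
            return (self.portfolio_id, "uprn", self.uprn)
        return (self.portfolio_id, "landlord_property_id", self.landlord_property_id)
```

Because the portfolio is part of the key, the same UPRN held in two portfolios yields two distinct Property identities, matching the relationship stated for UPRNs.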

Example dialogue

Dev: "A landlord uploads a corrected boiler for one of their properties. What happens?"

Domain expert: "That's a Landlord Override on the heating fields. Save it against the Property. The Effective EPC has changed, so Rebaselining runs to re-predict SAP / carbon / heat, and EPC Energy Derivation re-runs to update kWh / bills based on the new fuel deduction. With fresh Baseline Performance we regenerate Recommendations."

Dev: "What if the same Property also has Site Notes?"

Domain expert: "Site Notes supersede the public EPC, so Landlord Overrides don't apply. We model from the Site Notes version of the Effective EPC. If the public EPC is newer than the Site Notes, that's the one exception — we use the newer one."

Dev: "After modelling we end up with a list of measures. Which ones get installed?"

Domain expert: "The Optimiser Service picks the Optimised Package — a subset of Recommendations that hits the Scenario goal within budget. The rest stay in the Plan as alternatives the user can swap in."

Dev: "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?"

Domain expert: "That's an EPC Anomaly Flag. We compute it against the Comparable Properties for that postcode. It's advisory — the UI surfaces it and the landlord can apply a Landlord Override if it's wrong."

Flagged ambiguities

  • "property" was historically warned against in favour of "dwelling"; that has been inverted. Property is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias.
  • "energy assessment" in the existing codebase (energy_assessment_functions, energy_assessments_by_uprn) refers to what is now canonically called Site Notes. New code uses Site Notes.
  • "patch" / patch_epc in the existing codebase has been merged into Landlord Overrides; the original concept is deprecated.
  • "already_installed measures" in the existing codebase is likely subsumed by Landlord Overrides ("we have a heat pump now" → override the heating fields). Final call deferred to implementation.
  • "address" appears as both the raw User Address (free-text) and a structured field on an EPC Search Result (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1".
  • "score" is used for AddressMatch.score() output, the lexiscore column, and informally. Prefer Lexiscore in domain discussions; reserve "score" for method-level code comments.
  • "user_inputed_address" in backend/address2UPRN/main.py is a misspelled synonym for the canonical term User Address. New code should use user_address.
  • "EPC" is overloaded as both the document and the rating band letter. Use EPC for the document, EPC Band for the letter.
  • "re-scoring" has two meanings in the codebase — Rebaselining (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer Rebaselining for the former; for the latter, the Optimiser Service step does its own scoring without a special name.