# Ara

The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio.

## Language

### Product

**Ara**:
The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it.
_Avoid_: ARA (acronym style), v2 backend, the new backend

**Domna**:
The company. Roman name; sibling to Ara in the same naming convention.

### Energy Performance Certificates

**EPC**:
An Energy Performance Certificate: a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst).
_Avoid_: energy certificate, energy report

**Certificate Number**:
The unique identifier assigned to an EPC by the government registry.
_Avoid_: cert number, EPC ID

**Registration Date**:
The date an EPC was lodged with the government register; used to identify the most recent certificate for a property.
_Avoid_: assessment date, submission date
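
As an illustration of how Registration Date selects the current certificate, here is a minimal sketch. The `EpcSummary` type and `most_recent_epc` helper are hypothetical names introduced for this example, not part of the Ara codebase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EpcSummary:
    """Hypothetical minimal view of an EPC: just the two identifying fields."""
    certificate_number: str
    registration_date: date


def most_recent_epc(epcs):
    """Return the EPC with the latest Registration Date, or None if there are none."""
    return max(epcs, key=lambda epc: epc.registration_date, default=None)
```

Returning `None` for an empty list keeps the "Property has zero EPCs" case explicit for the caller.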

**EPC Band**:
A single letter A–G representing a property's current or potential energy efficiency rating.
_Avoid_: energy rating, EPC grade, EPC score

**Schema Type**:
The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. `RdSAP-Schema-21.0.1`).
_Avoid_: schema version, EPC format

**Domestic Certificate**:
An EPC issued for a residential dwelling, as opposed to a commercial one.
_Avoid_: residential EPC, home EPC

### Properties and addresses

**Property**:
The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs.
_Avoid_: dwelling, unit, home, asset

**Properties**:
A first-class collection of Property objects; the unit of bulk operation in services.
_Avoid_: property list, batch (used for SQS chunks)

**UPRN**:
Unique Property Reference Number: the government-issued permanent identifier for a physical address in the UK.
_Avoid_: property ID, address ID, code

**Postcode**:
A UK postal code used to group nearby addresses; the primary search key for finding EPC records.
_Avoid_: zip code, postal code

**User Address**:
A free-text address string provided by a user or imported from a customer dataset, before any normalisation or matching.
_Avoid_: user input, raw address, user_inputed_address

**Comparable Properties**:
The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection.
_Avoid_: neighbours, similar properties, peer set
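
The Comparable Properties rule can be sketched as a simple filter. This is an assumption-laden illustration: it treats "postcode prefix" as the outward code, requires exact equality on the three physical traits, and ignores the UPRN-range option. `PropertyTraits`, `outward_code` and `comparable_properties` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyTraits:
    """Hypothetical subset of Property fields used for cohort matching."""
    uprn: int
    postcode: str
    property_type: str
    built_form: str
    age_band: str


def outward_code(postcode: str) -> str:
    """Outward part of a UK postcode: everything except the final three characters."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def comparable_properties(target, pool):
    """Properties in the same outward code with identical physical traits."""
    def traits(p):
        return (p.property_type, p.built_form, p.age_band)

    return [
        p for p in pool
        if p.uprn != target.uprn
        and outward_code(p.postcode) == outward_code(target.postcode)
        and traits(p) == traits(target)
    ]
```

The target itself is excluded so it cannot vote on its own gap-filling or anomaly checks.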

### Source data

**Site Notes**:
The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property, except when the public EPC is newer.
_Avoid_: energy assessment, site survey, field survey, Domna survey, Hestia survey

**Landlord Overrides**:
Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present.
_Avoid_: patches (deprecated), corrections, manual EPC, edits
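
The precedence between Site Notes, the public EPC and Landlord Overrides can be sketched as a small decision function. The names here (`SourceChoice`, `choose_source`, `site_notes_date`) are hypothetical, and two points are assumptions of this sketch rather than documented behaviour: a tie on dates goes to Site Notes, and Landlord Overrides are applied whenever the public EPC ends up as the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceChoice:
    source: str                      # "site_notes" or "public_epc"
    apply_landlord_overrides: bool


def choose_source(
    site_notes_date: Optional[date],
    epc_registration_date: Optional[date],
    has_landlord_overrides: bool = False,
) -> Optional[SourceChoice]:
    """Pick the record that derives the Effective EPC, or None if neither exists."""
    if site_notes_date is None and epc_registration_date is None:
        return None
    epc_is_newer = epc_registration_date is not None and (
        site_notes_date is None or epc_registration_date > site_notes_date
    )
    if epc_is_newer:
        # Overrides only ever layer on top of the public EPC.
        return SourceChoice("public_epc", has_landlord_overrides)
    return SourceChoice("site_notes", False)
```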

### Modelling

**Effective EPC**:
The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance.
_Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC

**Rebaselining**:
Re-predicting a Property's SAP, carbon emissions, and heat demand via ML so the modelling pipeline scores it against the current SAP10 methodology. Triggered when either (a) the Effective EPC was lodged under a pre-SAP10 schema (`sap_version < 10.0`), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls / heating / windows / etc.) so the lodged scores no longer reflect what's installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. Does not include kWh, which is always derived deterministically by EPC Energy Derivation.
_Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)
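
The two Rebaselining triggers can be expressed as independent flags, which makes it easy to record why a re-prediction ran. This is a sketch only: `RebaselineDecision` and `rebaseline_decision` are hypothetical names, and comparing two plain dicts of physical fields is an assumed stand-in for however the pipeline detects a physical change.

```python
from dataclasses import dataclass

SAP10 = 10.0


@dataclass(frozen=True)
class RebaselineDecision:
    pre_sap10_schema: bool        # trigger (a)
    physical_state_changed: bool  # trigger (b)

    @property
    def should_rebaseline(self) -> bool:
        # Either trigger is sufficient; both may fire together.
        return self.pre_sap10_schema or self.physical_state_changed


def rebaseline_decision(sap_version: float, lodged_fields: dict, effective_fields: dict):
    """Evaluate both triggers for one Property."""
    return RebaselineDecision(
        pre_sap10_schema=sap_version < SAP10,
        physical_state_changed=lodged_fields != effective_fields,
    )
```

Keeping the flags separate, rather than returning one boolean, lets callers log which trigger fired.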

**Baseline Performance**:
A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus annual kWh / fuel split / bills derived from the Effective EPC. Persisted as one row; surfaced as one block in the UI.
_Avoid_: baseline predictions, predicted baseline, rebaselined values

**Lodged Performance**:
The SAP / EPC Band / carbon emissions / heat demand recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source), unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property".
_Avoid_: original performance, raw EPC values, recorded baseline

**Effective Performance**:
The SAP / EPC Band / carbon emissions / heat demand the modelling pipeline actually scored against: equal to Lodged Performance when no Rebaselining trigger fires, replaced by ML output when one does. The half of Baseline Performance that says "what we modelled".
_Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values
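
The relationship between the two halves of Baseline Performance might look like the following sketch. The class and field names are hypothetical, and the kWh / fuel split / bills part of the aggregate is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Performance:
    sap: int
    epc_band: str
    carbon_emissions: float
    heat_demand: float


@dataclass(frozen=True)
class BaselinePerformance:
    lodged: Performance      # what is on record, never overwritten
    effective: Performance   # what the pipeline scored against
    rebaselined: bool        # True only when ML output replaced the lodged values


def build_baseline(lodged: Performance, ml_output: Optional[Performance]) -> BaselinePerformance:
    """Effective equals Lodged unless Rebaselining produced ML output."""
    if ml_output is None:
        return BaselinePerformance(lodged=lodged, effective=lodged, rebaselined=False)
    return BaselinePerformance(lodged=lodged, effective=ml_output, rebaselined=True)
```

An explicit `rebaselined` flag avoids inferring it from value equality, which would be wrong if the ML output happened to match the lodged values.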

**EPC Energy Derivation**:
The deterministic process that derives a Property's annual kWh, fuel split across heating, hot water, lighting, appliances and cooking, and bills from the Effective EPC, applying a UCL Correction for known EPC over/under-prediction and deducing fuel type from the SAP heating fields. No ML.
_Avoid_: kWh prediction, baseline kWh, energy estimation

**UCL Correction**:
The per-band linear correction (Few et al. 2023, _Energy & Buildings_ 288 113024) applied to EPC-modelled total primary energy use intensity to align it with metered consumption. Calibrated against gas-heated, non-PV homes in England and Wales rated under SAP 2012; the current implementation extrapolates it to all properties (open question §15.14).
_Avoid_: UCL adjustment, energy correction, metered correction

**EPC Anomaly Flag**:
A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only: it surfaces in the UI to prompt user review and does not block modelling.
_Avoid_: outlier, mismatch, divergence flag
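
For a categorical field such as wall type, an EPC Anomaly Flag could be computed roughly as below. The glossary does not define "differs significantly", so the rule used here (flag when at most 10% of a cohort of at least 5 shares the Property's value) and the name `anomaly_flag` are purely illustrative assumptions.

```python
from collections import Counter


def anomaly_flag(value, cohort_values, min_cohort=5, max_share=0.1):
    """Advisory flag: True when the value is rare among Comparable Properties.

    min_cohort and max_share are placeholder thresholds, not Ara's real ones.
    """
    if len(cohort_values) < min_cohort:
        # Too few Comparable Properties to say anything meaningful.
        return False
    share = Counter(cohort_values)[value] / len(cohort_values)
    return share <= max_share
```

The function only returns a boolean for the UI to surface; consistent with the definition above, nothing in it blocks modelling.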

### Reference data

**Fuel Rates**:
The current per-fuel rate (pence/kWh) and standing charge used to compute a Property's bills; time-versioned and regional, refreshed from Ofgem's published caps via an ETL. The Smart Export Guarantee rate sits in the same set as `electricity_export`. Consumed by EPC Energy Derivation.
_Avoid_: fuel prices (commodity prices, different concept), tariff, energy cost

**Carbon Factors**:
The per-fuel CO2 emission factor (kgCO2e/kWh) used to compute a Property's carbon emissions; time-versioned, refreshed from Defra's annual publication. Consumed by EPC Energy Derivation.
_Avoid_: emission factors (ambiguous), CO2 rates
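
A time-versioned, regional Fuel Rates lookup and the resulting bill arithmetic might be sketched as follows. The `FuelRate` shape, the `valid_from` field and the pence-per-day standing charge unit are assumptions made for this example; the glossary only fixes the pence/kWh unit.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FuelRate:
    fuel: str
    region: str
    valid_from: date
    pence_per_kwh: float
    standing_charge_pence_per_day: float  # assumed unit


def rate_on(rates, fuel, region, on):
    """Latest rate for (fuel, region) whose valid_from is on or before the given date."""
    candidates = [
        r for r in rates
        if r.fuel == fuel and r.region == region and r.valid_from <= on
    ]
    if not candidates:
        raise LookupError(f"no {fuel} rate for {region} on {on}")
    return max(candidates, key=lambda r: r.valid_from)


def annual_bill_pounds(rate, annual_kwh, days=365):
    """Unit cost plus standing charge, converted from pence to pounds."""
    pence = annual_kwh * rate.pence_per_kwh + days * rate.standing_charge_pence_per_day
    return pence / 100
```

Carbon Factors could follow the same versioned-lookup pattern, with kgCO2e/kWh in place of the price fields.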

### Outputs

**Scenario**:
A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and an ordered list of Scenario Phases. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property.
_Avoid_: project, batch, run-set

**Scenario Phase**:
One ordered step inside a Scenario, carrying a measure-type allowlist (e.g. "loft insulation and walls in phase 1; ASHP in phase 2"), an optional phase budget, and an optional phase target. A single-phase Scenario is one Scenario Phase with all measure types allowed and the full budget on it; there is no special-case path.
_Avoid_: scenario stage, scenario step, tranche

**Scenario Snapshot**:
A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes.
_Avoid_: scenario version, frozen scenario, pinned scenario
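
The snapshot mechanism can be sketched with an in-memory store keyed by `(task_id, scenario_id)`. Everything here is hypothetical (the class names, the trimmed-down `Scenario` fields, the in-memory dict); a deep copy isolates the snapshot from later edits but does not by itself enforce read-only access.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScenarioPhase:
    allowed_measure_types: List[str]
    budget: Optional[float] = None


@dataclass
class Scenario:
    scenario_id: str
    name: str
    phases: List[ScenarioPhase] = field(default_factory=list)


class SnapshotStore:
    def __init__(self):
        self._snapshots = {}

    def pin(self, task_id, scenario):
        """Freeze a copy at trigger time; pinning the same key again is a no-op."""
        key = (task_id, scenario.scenario_id)
        if key not in self._snapshots:
            self._snapshots[key] = copy.deepcopy(scenario)
        return self._snapshots[key]

    def get(self, task_id, scenario_id):
        return self._snapshots.get((task_id, scenario_id))

    def collect(self, task_id):
        """Drop all snapshots for a completed task."""
        for key in [k for k in self._snapshots if k[0] == task_id]:
            del self._snapshots[key]
```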

**Plan**:
The per-Property output of one Scenario's modelling run; carries an ordered list of Plan Phases matching the Scenario's Phase shape. A Property modelled against N Scenarios in one trigger ends up with N Plans.
_Avoid_: recommendation set, output, result

**Plan Phase**:
The per-Property output of one Scenario Phase: the Optimised Package selected for that phase, the ending state snapshot (the Property's SAP / kWh / bills after the package is applied), and any Rolled-over Options that flow as candidates into the next Plan Phase.
_Avoid_: plan stage, plan step

**Rolled-over Options**:
Recommendations generated but not selected by the Optimiser in a given Plan Phase, that remain eligible as candidates in subsequent Plan Phases. Exact roll-over rule (automatic vs user-marked) is under design.
_Avoid_: deferred measures, leftover recommendations

**Recommendation**:
A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list.
_Avoid_: suggestion, option

**Optimised Package**:
The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget.
_Avoid_: selected measures, default measures, optimal solution, recommended bundle

**Measure Type**:
The catalogue classification of a retrofit measure (e.g. `solar_pv`, `loft_insulation`, `ashp`); one or more Recommendations reference the same Measure Type with property-specific cost and impact.
_Avoid_: measure (ambiguous), category
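
To show how Plan Phases, Optimised Packages and Rolled-over Options fit together, here is a toy phased loop. It is not the Optimiser Service's algorithm: a greedy SAP-gain-per-pound pick stands in for the real optimiser, SAP gains are simply added instead of re-scoring against the rolling state, and roll-over is assumed to be automatic and to bypass the next phase's allowlist (the actual rule is under design). All names are hypothetical and costs are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    measure_type: str
    cost: float
    sap_gain: float


@dataclass(frozen=True)
class PlanPhase:
    package: tuple       # stand-in for the Optimised Package
    ending_sap: float    # stand-in for the ending state snapshot
    rolled_over: tuple   # candidates carried into the next phase


def plan_phases(start_sap, recommendations, phases):
    """phases is an ordered list of (allowed_measure_types, budget) pairs."""
    sap = start_sap
    rolled, selected, plan = [], set(), []
    for allowed, budget in phases:
        fresh = [
            r for r in recommendations
            if r.measure_type in allowed and r not in selected and r not in rolled
        ]
        # Greedy stand-in: best SAP gain per pound first.
        candidates = sorted(rolled + fresh, key=lambda r: r.sap_gain / r.cost, reverse=True)
        package, leftover, spent = [], [], 0.0
        for rec in candidates:
            if spent + rec.cost <= budget:
                package.append(rec)
                spent += rec.cost
            else:
                leftover.append(rec)
        selected.update(package)
        sap += sum(r.sap_gain for r in package)
        plan.append(PlanPhase(tuple(package), sap, tuple(leftover)))
        rolled = leftover
    return plan
```

A single-phase Scenario is just a one-element `phases` list, so the same loop covers it without a special case.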

### Address matching

**Lexiscore**:
A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity.
_Avoid_: score, match score, similarity

**Lexirank**:
Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match.
_Avoid_: rank, position

**UPRN Candidate**:
An EPC Search Result that is a plausible match for a given User Address, before scoring decides the winner.
_Avoid_: match candidate, result

**Score Threshold**:
The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists.
_Avoid_: minimum score, cutoff

**Ambiguous Match**:
A matching outcome where two or more candidates share Lexirank 1, making it impossible to select a unique winner.
_Avoid_: tie, draw, duplicate

**Best Match**:
The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold.
_Avoid_: winner, top result
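
The address-matching terms above compose into a short pipeline: score, dense-rank, then apply the threshold and the ambiguity check. The sketch below assumes a particular Lexiscore formula (an equal-weight blend of token Jaccard overlap and `difflib.SequenceMatcher` ratio), which is an illustration and not necessarily what `AddressMatch.score()` computes; the function names and outcome labels are also hypothetical.

```python
from difflib import SequenceMatcher

SCORE_THRESHOLD = 0.6


def lexiscore(user_address, candidate_address):
    """Assumed blend of token overlap and character similarity, in [0, 1]."""
    a, b = user_address.lower(), candidate_address.lower()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    token_overlap = len(tokens_a & tokens_b) / len(union) if union else 0.0
    char_similarity = SequenceMatcher(None, a, b).ratio()
    return 0.5 * token_overlap + 0.5 * char_similarity


def lexirank(scores):
    """Dense rank, descending: equal scores share a rank and no ranks are skipped."""
    rank_of = {s: i + 1 for i, s in enumerate(sorted(set(scores), reverse=True))}
    return [rank_of[s] for s in scores]


def best_match(user_address, candidates):
    """candidates is a list of (uprn, epc_address). Returns (outcome, uprn or None)."""
    if not candidates:
        return ("no_match", None)
    scores = [lexiscore(user_address, address) for _, address in candidates]
    top = [i for i, rank in enumerate(lexirank(scores)) if rank == 1]
    if scores[top[0]] < SCORE_THRESHOLD:
        return ("no_match", None)
    if len(top) > 1:
        return ("ambiguous", None)
    return ("best_match", candidates[top[0]][0])
```

Checking the threshold before the tie means several equally poor candidates report as no match instead of as an Ambiguous Match; that ordering is a design choice of this sketch.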

### API and integration

**EPC Search Result**:
A lightweight record returned by the government domestic search endpoint: address lines, postcode, UPRN, band, and certificate number, but not full certificate data.
_Avoid_: search row, EPC row, result

**EPC Property Data**:
The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against.
_Avoid_: EPC data, certificate data, parsed EPC

**Old EPC API**:
The retired government API (`epc.opendatacommunities.org`) using HTTP Basic auth; decommissioned 30 May 2026.
_Avoid_: legacy API

**New EPC API**:
The replacement government API (`api.get-energy-performance-data.communities.gov.uk`) using Bearer Token auth.
_Avoid_: new API, current API

**Bearer Token**:
The auth credential required by the New EPC API; stored in the `EPC_AUTH_TOKEN` environment variable.
_Avoid_: API key, auth token, secret
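
A minimal sketch of turning the Bearer Token into request headers, using the standard `Authorization: Bearer` scheme. The helper name `epc_auth_headers` is hypothetical, and no endpoint paths or other headers are shown because the glossary does not specify them.

```python
import os


def epc_auth_headers(env=os.environ):
    """Build the Authorization header from EPC_AUTH_TOKEN; fail fast if it is missing."""
    token = env.get("EPC_AUTH_TOKEN")
    if not token:
        raise RuntimeError("EPC_AUTH_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

Taking the environment as a parameter keeps the function testable without mutating the process environment.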

## Relationships

- A **Property** represents a single physical dwelling for modelling; identified by `(portfolio_id, UPRN)` or `(portfolio_id, landlord_property_id)`.
- A **Property** has zero or more **EPCs** across time, exactly one **Effective EPC**, zero or one set of **Site Notes**, and zero or one set of **Landlord Overrides**.
- An **EPC** belongs to exactly one **Property** and has one **Certificate Number**.
- An **EPC** carries an **EPC Band** and is identifiable by its **Registration Date**; the most recent one is the current certificate.
- A **UPRN** identifies a physical dwelling permanently; it does not change when the property changes owner, but each portfolio gets its own **Property** keyed against it.
- When a **Property** has both **Site Notes** and a public **EPC**, the newer of the two derives the **Effective EPC**. **Landlord Overrides** apply only when the **EPC** is the source, never when **Site Notes** are.
- A Property's **Baseline Performance** holds two halves: **Lodged Performance** (the gov register's SAP / band / carbon / heat) and **Effective Performance** (what the modelling pipeline scored against). The two are equal unless **Rebaselining** fires.
- **Rebaselining** produces **Effective Performance** by ML re-prediction when either (a) the Effective EPC was lodged under a pre-SAP10 schema, or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten.
- **EPC Energy Derivation** contributes the annual kWh, fuel split, and bills on every Property unconditionally, reading current **Fuel Rates** and **Carbon Factors** from their respective repos.
- The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**.
- A **Scenario** carries one or more ordered **Scenario Phases**. Triggering the model against N Scenarios produces N **Plans** per Property; each Plan carries an ordered list of **Plan Phases** matching the Scenario's shape.
- Each **Plan Phase** holds its **Optimised Package**, the ending state snapshot, and any **Rolled-over Options** that flow as candidates into the next Plan Phase. A single-phase Scenario is one Scenario Phase with all measure types allowed; the same machinery handles it.
- A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
- A **Recommendation** references one **Measure Type** and carries property-specific cost and impact.
- **Address Matching** uses a **User Address** and **Postcode** to find a **UPRN** by scoring **UPRN Candidates** from an EPC search. A **Lexirank** of 1 with no **Ambiguous Match** and a **Lexiscore** ≥ the **Score Threshold** produces a **Best Match**.

## Example dialogue

> **Dev:** "A landlord uploads a corrected boiler for one of their properties. What happens?"
>
> **Domain expert:** "That's a **Landlord Override** on the heating fields. Save it against the **Property**. The **Effective EPC** has changed, so **Rebaselining** runs to re-predict SAP / carbon / heat, and **EPC Energy Derivation** re-runs to update kWh / bills based on the new fuel deduction. With fresh **Baseline Performance** we regenerate **Recommendations**."

> **Dev:** "What if the same Property also has Site Notes?"
>
> **Domain expert:** "**Site Notes** supersede the public **EPC**, so **Landlord Overrides** don't apply. We model from the **Site Notes** version of the **Effective EPC**. If the public **EPC** is newer than the **Site Notes**, that's the one exception: we use the newer one."

> **Dev:** "After modelling we end up with a list of measures. Which ones get installed?"
>
> **Domain expert:** "The **Optimiser Service** picks the **Optimised Package**: a subset of **Recommendations** that hits the **Scenario** goal within budget. The rest stay in the **Plan** as alternatives the user can swap in."

> **Dev:** "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?"
>
> **Domain expert:** "That's an **EPC Anomaly Flag**. We compute it against the **Comparable Properties** for that postcode. It's advisory: the UI surfaces it and the landlord can apply a **Landlord Override** if it's wrong."

> **Dev:** "The property card shows two SAP scores side by side. Why?"
>
> **Domain expert:** "Those are **Lodged Performance** and **Effective Performance**. **Lodged** is what the gov register says: the EPC was rated under SAP 2012. **Effective** is what we scored against: we ran **Rebaselining** to predict the SAP10-equivalent rating because the methodology changed. Both stay on the **Baseline Performance** so users can see what's on record and what we're modelling against."

> **Dev:** "A landlord wants a 3-year retrofit plan: fabric work this year, heat pump next, solar after. How do we model that?"
>
> **Domain expert:** "Three **Scenario Phases** in one **Scenario**. Phase 1 allows fabric measures with this year's budget, phase 2 allows the heat pump with next year's budget, phase 3 allows solar. When we model, the **Optimiser Service** runs per phase against the rolling state: the heat pump is scored against the post-insulation property, not the original one. Each **Plan Phase** captures the **Optimised Package** plus the ending SAP / bills, and any **Rolled-over Options** that didn't make this phase's budget become candidates next phase."

## Flagged ambiguities

- **"property"** was historically warned against in favour of "dwelling"; that has been inverted. **Property** is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places; treat it as an alias.
- **"energy assessment"** in the existing codebase (`energy_assessment_functions`, `energy_assessments_by_uprn`) refers to what is now canonically called **Site Notes**. New code uses **Site Notes**.
- **"patch"** / `patch_epc` in the existing codebase has been merged into **Landlord Overrides**; the original concept is deprecated.
- **"already_installed measures"** in the existing codebase is likely subsumed by **Landlord Overrides** ("we have a heat pump now" → override the heating fields). Final call deferred to implementation.
- **"address"** appears as both the raw **User Address** (free-text) and a structured field on an **EPC Search Result** (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1".
- **"score"** is used for `AddressMatch.score()` output, the `lexiscore` column, and informally. Prefer **Lexiscore** in domain discussions; reserve "score" for method-level code comments.
- **"user_inputed_address"** in `backend/address2UPRN/main.py` is a misspelling and a synonym for **User Address**, the canonical term. New code should use `user_address`.
- **"EPC"** is overloaded as both the document and the rating band letter. Use **EPC** for the document, **EPC Band** for the letter.
- **"re-scoring"** has two meanings in the codebase: **Rebaselining** (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer **Rebaselining** for the former; for the latter, the **Optimiser Service** step does its own scoring without a special name.
- **"phase"** appears in two unrelated contexts: as cut-over timeline language in the PRD ("Phase 0 — Status quo", "Phase 1 — Forced cut-over") and as a domain concept in **Scenario Phase** / **Plan Phase**. Only the latter is a glossary term; cut-over phases are project-management vocabulary that does not enter code.
- **"stale"** appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch"), a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale; they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.