mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Pin the bills design from a /grill-with-docs session: - ADR-0014: whole-home annual bill from SAP10 Calculation's delivered kWh per end use, re-priced at real Fuel Rates (NOT the calculator's SAP-notional total_fuel_cost_gbp, which is RdSAP Table 32 standardised prices ~half real electricity). Fuel enum + FuelRates + FuelRatesRepository static snapshot; per-section + total flat columns; raise on unpriced fuel (house coal / heat network are the named gaps). - ADR-0013 amendment: the shadow stepping-stone is collapsed — the calculator is load-bearing now. effective=calculated for sap_version<10.2 (StubRebaseliner floor 10.0->10.2); >=10.2 keeps lodged + logs divergence; a strict-raise aborts the batch (load-bearing for bills regardless of version). - CONTEXT: EPC Energy Derivation -> Bill Derivation (no "service" suffix); Baseline Performance energy block = per-end-use kWh + per-section bill + total; Fuel Rates = committed static snapshot; Rebaselining trigger threshold 10.2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
326 lines
31 KiB
Markdown
# Ara
The Domna product for domestic retrofit modelling: ingests open-source EPC data, lets users correct or supersede it with their own surveys, and produces optimised retrofit packages for each property in a portfolio.
## Language
### Product
**Ara**:
The Domna product. Latin for "the altar"; named under Domna's classical-naming convention. Covers both the modelling product and the backend that powers it.
_Avoid_: ARA (acronym style), v2 backend, the new backend
**Domna**:
The company. Roman name; sibling to Ara in the same naming convention.
### Energy Performance Certificates
**EPC**:
An Energy Performance Certificate — a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst).
_Avoid_: energy certificate, energy report
**Certificate Number**:
The unique identifier assigned to an EPC by the government registry.
_Avoid_: cert number, EPC ID
**Registration Date**:
The date an EPC was lodged with the government register; used to identify the most recent certificate for a property.
_Avoid_: assessment date, submission date
**EPC Band**:
A single letter A–G representing a property's current or potential energy efficiency rating.
_Avoid_: energy rating, EPC grade, EPC score
**Schema Type**:
The versioned RdSAP or SAP schema that describes the structure of an EPC's raw data (e.g. `RdSAP-Schema-21.0.1`).
_Avoid_: schema version, EPC format
**Domestic Certificate**:
An EPC issued for a residential dwelling, as opposed to a commercial one.
_Avoid_: residential EPC, home EPC
### Properties and addresses
**Property**:
The Ara domain aggregate representing a single dwelling under modelling: its identity, source data, enrichments, and modelling outputs.
_Avoid_: dwelling, unit, home, asset
**Properties**:
A first-class collection of Property objects; the unit of bulk operation in services.
_Avoid_: property list, batch (used for SQS chunks)
**UPRN**:
Unique Property Reference Number — the government-issued permanent identifier for a physical address in the UK.
_Avoid_: property ID, address ID, code
**Postcode**:
A UK postal code used to group nearby addresses; the primary search key for finding EPC records.
_Avoid_: zip code, postal code
**User Address**:
A structured dataclass (`domain.addresses.user_address.UserAddress`) capturing a customer-supplied address: a free-text `user_address` line, a canonical `postcode` (sanitised on construction), and an optional `internal_reference`. The bare string sense — the raw free-text address line as it arrives from upstream ingestion, before being wrapped — remains valid when discussing CSV columns, API payloads, or other upstream contexts; in domain code, prefer the dataclass.
_Avoid_: user input, raw address, user_inputed_address
**Comparable Properties**:
The reference cohort matched to a target Property by both geographic proximity (postcode prefix / UPRN range) and physical similarity (property type, built form, age band); used by the EPC Prediction Service for gap-filling and anomaly detection.
_Avoid_: neighbours, similar properties, peer set
### Source data
**Site Notes**:
The full-coverage record produced by a Domna survey of a single Property; carries every EPC field the modelling pipeline requires, and when present supersedes the public EPC for that Property — except when the public EPC is newer.
_Avoid_: energy assessment, site survey, field survey, Domna survey, Hestia survey
**Landlord Overrides**:
Property data supplied by a landlord that may correct or supplement the public EPC for a single Property; triggers Rebaselining when applied; not applicable when Site Notes are present.
_Avoid_: patches (deprecated), corrections, manual EPC, edits
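
The precedence between Site Notes, the public EPC, and Landlord Overrides described in the two entries above can be sketched as a small pure function. This is an illustration only: `SourceChoice` and `choose_effective_source` are hypothetical names, the tie-break on equal dates is an assumption, and the sketch takes the reading that Landlord Overrides apply whenever the public EPC ends up as the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceChoice:
    source: str                      # "site_notes" or "public_epc"
    apply_landlord_overrides: bool   # only ever True when the EPC is the source


def choose_effective_source(
    site_notes_date: Optional[date],
    epc_registration_date: Optional[date],
    has_landlord_overrides: bool,
) -> SourceChoice:
    """Pick the source that derives the Effective EPC (hypothetical helper)."""
    if site_notes_date is None and epc_registration_date is None:
        raise ValueError("Property has neither Site Notes nor a public EPC")

    # Site Notes supersede the public EPC unless the EPC is newer.
    # Assumption: on equal dates the Site Notes win.
    use_site_notes = site_notes_date is not None and (
        epc_registration_date is None or site_notes_date >= epc_registration_date
    )
    if use_site_notes:
        return SourceChoice("site_notes", apply_landlord_overrides=False)
    return SourceChoice("public_epc", apply_landlord_overrides=has_landlord_overrides)
```

Keeping the decision separate from the merge itself makes the "newer EPC" exception easy to unit-test without building a full Property.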
### Modelling
**Effective EPC**:
The EpcPropertyData scored by the modelling pipeline for a single Property, derived from either Site Notes alone or the public EPC with Landlord Overrides applied; carries source-derived physical fields and originally recorded performance values, with model-rebaselined performance held separately in Baseline Performance.
_Avoid_: modelling EPC, working EPC, resolved EPC, derived EPC
**Rebaselining**:
Re-predicting a Property's SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh via **SAP10 Calculation**, so the modelling pipeline scores it against the current SAP10 methodology. SAP10 Calculation is the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future work (ADR-0009/0013). Triggered when either (a) the Effective EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`, the calculator's target spec), so the recorded scores reflect a superseded methodology, or (b) Site Notes / Landlord Overrides changed the physical state of the Property (walls, heating, windows, etc.), so the lodged scores no longer reflect what is installed. Both triggers may fire together. Produces Effective Performance; Lodged Performance is preserved unchanged. The kWh quantities are included as ML targets per ADR-0007; see [[epc-ml-transform]].
_Avoid_: re-scoring, re-prediction, performance recomputation, refresh (for cache-freshness)
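
The two triggers can be expressed as a short predicate. The sketch below is an assumption-laden illustration, not the repo's code: `parse_sap_version` and `rebaselining_triggers` are hypothetical names, and `sap_version` is assumed to arrive as a dotted string such as `"10.2"`. Versions are compared as integer tuples so that `"9.94"` sorts below `"10.2"`, which a plain string comparison would get wrong.

```python
from typing import List, Tuple

# The calculator's target spec, per the definition above.
CALCULATOR_TARGET: Tuple[int, ...] = (10, 2)


def parse_sap_version(raw: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '10.2' into a comparable tuple."""
    return tuple(int(part) for part in raw.split("."))


def rebaselining_triggers(sap_version: str, physical_state_changed: bool) -> List[str]:
    """Return the triggers that fire; an empty list means no Rebaselining."""
    triggers: List[str] = []
    if parse_sap_version(sap_version) < CALCULATOR_TARGET:
        triggers.append("superseded_methodology")   # trigger (a)
    if physical_state_changed:
        triggers.append("physical_state_changed")   # trigger (b)
    return triggers
```

Returning the list of fired triggers, rather than a bare boolean, keeps the "both may fire together" case visible to callers and logs.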
**Baseline Performance**:
A Property's current performance aggregate, holding both Lodged Performance and Effective Performance plus the energy block: delivered kWh **per end use** (heating, hot water, lighting, appliances, cooking, pumps/fans, …) and the **annual bill** composed into per-section costs plus a total, produced by **Bill Derivation** from SAP10 Calculation's per-end-use kWh × current Fuel Rates. Persisted as one row (flat typed columns, per-section kWh + cost + total); surfaced as one block in the UI.
_Avoid_: baseline predictions, predicted baseline, rebaselined values
**Lodged Performance**:
The SAP / EPC Band / carbon emissions / Primary Energy Intensity recorded on the public EPC (or the Site Notes' as-surveyed values when Site Notes are the source) — unmodified by modelling. The half of Baseline Performance that says "what the government register says about this Property".
_Avoid_: original performance, raw EPC values, recorded baseline
**Effective Performance**:
The SAP / EPC Band / carbon emissions / Primary Energy Intensity the modelling pipeline actually scored against — equal to Lodged Performance when no Rebaselining trigger fires, replaced by **SAP10 Calculation** output (the deterministic `Sap10Calculator`, which superseded the old ML-API rebaseliner; an ML residual head over the calculator is future — ADR-0009/0013) when triggered. The half of Baseline Performance that says "what we modelled".
_Avoid_: modelled performance, rebaselined performance (only correct when rebaselining ran), scored values
**Calculated SAP10 Performance**:
The SAP score, EPC Band, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh produced by **SAP10 Calculation** from a Property's EpcPropertyData. It is **not** a separately-persisted third value-set beside Lodged and Effective: in every baselining scenario the calculator's output *is* the **Effective Performance** (real lodged SAP10 EPC with no overrides ⇒ Calculated = Lodged = Effective; overrides or an estimated / pre-SAP10 EPC ⇒ Calculated = Effective, there being no lodged SAP10 figure to compare against). The calculator is therefore the mechanism that produces Effective Performance, having superseded the old ML-API rebaseliner. The calculator is **load-bearing**: for `sap_version < 10.2` (lodged under a superseded methodology) its output *is* the Effective Performance; for `≥ 10.2` the API's lodged figures are kept and the calculator runs **alongside, logging any divergence** (SAP > 0.5, PEUI/CO2 beyond tolerance) as a validation signal (see [[sap-spec-version]]). It is load-bearing for **Bill Derivation regardless of version** (the EPC lodges no per-end-use kWh), so a calculator strict-raise **aborts the batch** and the un-mapped cert is fixed immediately. ADR-0009 introduced the term, amended by ADR-0010, realized by ADR-0013 (whose shadow stepping-stone is superseded) and ADR-0014.
_Avoid_: calculator output, computed performance, worksheet performance, SAP10 output, calculated value-set (it is not a stored third set)
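
A minimal sketch of the version branch described above, under stated assumptions: `Performance` and `effective_performance` are hypothetical names, the version is passed as an integer tuple, and only the SAP divergence check (threshold 0.5) is shown because the entry does not give the PEUI/CO2 tolerances. It also ignores the override case, where the calculator's output is used regardless of version.

```python
import logging
from dataclasses import dataclass
from typing import Tuple

logger = logging.getLogger("baseline")

SAP_DIVERGENCE_TOLERANCE = 0.5  # from the definition above


@dataclass(frozen=True)
class Performance:
    sap_score: float
    co2_emissions: float
    primary_energy_intensity: float


def effective_performance(
    sap_version: Tuple[int, ...],
    lodged: Performance,
    calculated: Performance,
) -> Performance:
    """Choose Effective Performance from the lodged and calculated figures."""
    if sap_version < (10, 2):
        # Lodged under a superseded methodology: the calculator's output wins.
        return calculated

    # >= 10.2: keep the lodged figures and log divergence as a validation signal.
    delta = abs(calculated.sap_score - lodged.sap_score)
    if delta > SAP_DIVERGENCE_TOLERANCE:
        logger.warning("SAP divergence %.2f exceeds tolerance", delta)
    return lodged
```

A strict-raise inside the calculator is deliberately not caught here; per the definition, it should abort the batch.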
**SAP10 Calculation**:
The process that runs the deterministic SAP 10.2 (14-03-2025 amendment) worksheet over a Property's EpcPropertyData and emits **Calculated SAP10 Performance**. Implemented by the `Sap10Calculator` service class in `domain/sap10_calculator/` (`calculator.py`). Reads cert fabric/heating/geometry fields, applies the RdSAP 10 (10-06-2025) cert→input mapping, executes the 12-month heat balance per SAP 10.2 §§1-14, looks up boiler/heat-pump performance in the **PCDB** when the cert lodges a product index, and returns a `SapResult` carrying the five Calculated SAP10 Performance quantities plus a monthly breakdown and worksheet-line audit trail. Distinct from **Rebaselining**, which is the triggered step that invokes this process to produce Effective Performance (it was ML-based before the calculator superseded the ML-API rebaseliner). ADR-0009 originally targeted SAP 10.3 (13-01-2026); ADR-0010 retargets to SAP 10.2 (14-03-2025) until the cert corpus migrates.
_Avoid_: SAP calculation (ambiguous with the gov calculator), SAP scoring, calculator run, SAP 10.3 calculation (active target is 10.2 — see [[sap-spec-version]])
**SAP Spec Version**:
The dated revision of the SAP specification that produced a given SAP/PEUI/CO2 value. Domain-meaningful because the same EpcPropertyData yields different `sap_score` under different spec versions — fuel-price tables, CO2 factors, PCDB references, and rating-equation deflators all change between revisions. **Lodged Performance** carries the version current when the cert was lodged (mostly SAP 10.1 / SAP 10.2 pre- and post-14-03-2025 amendment in the corpus). **Calculated SAP10 Performance** is locked to SAP 10.2 (14-03-2025). A 1-to-1 Lodged-vs-Calculated comparison therefore only makes sense within a **Validation Cohort** of certs lodged on the same spec version.
_Avoid_: SAP version (ambiguous with the `sap_version` field on the cert, which only carries the major version like 10.2 — not the amendment date), spec revision
**Validation Cohort**:
The subset of corpus certs used to validate **SAP10 Calculation** against **Lodged Performance**, filtered to certs lodged after the calculator's target **SAP Spec Version** rolled out in commercial assessor software — currently `inspection_date ≥ 2025-07-01` (a buffer past 14-03-2025 to allow vendor rollout). Smaller than the full corpus but each cert is comparable under the same spec, so probe MAE is a clean signal of calculator-vs-spec correctness rather than spec-version mixture noise. ADR-0010.
_Avoid_: parity cohort, validation set, corpus sample
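
The cohort filter and the probe MAE it feeds can be sketched as follows. The dictionary keys (`inspection_date`, `lodged_sap`, `calculated_sap`) and both function names are assumptions made for illustration; only the `2025-07-01` cut-off comes from the definition above.

```python
from datetime import date
from typing import Dict, Iterable, List

COHORT_START = date(2025, 7, 1)  # current cut-off from the definition above


def validation_cohort(certs: Iterable[Dict]) -> List[Dict]:
    """Keep only certs inspected on or after the cohort cut-off."""
    return [cert for cert in certs if cert["inspection_date"] >= COHORT_START]


def sap_mae(cohort: List[Dict]) -> float:
    """Mean absolute error between calculated and lodged SAP over the cohort."""
    if not cohort:
        raise ValueError("empty Validation Cohort")
    total = sum(abs(cert["calculated_sap"] - cert["lodged_sap"]) for cert in cohort)
    return total / len(cohort)
```

Filtering first and measuring second keeps spec-version mixture out of the error figure, which is the point of the cohort.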
**Measure Application**:
The process that translates an Optimised Package into cert-field changes and produces the "ending state snapshot" EpcPropertyData that Plan Phase persists. Implemented by the `MeasureApplicator` service class in `domain/sap/` (or a sibling package). Each Measure Type's translation rules (e.g. `loft_insulation` → `roof_insulation_thickness_mm = 270mm`, `ashp` → `main_heating_details[0]` replacement) live here. Pure function — does not run SAP10 Calculation itself; the caller chains `MeasureApplicator.apply(epc, package) → Sap10Calculator.calculate(post_epc)`. ADR-0009.
_Avoid_: measure overrides (rejected during ADR-0009 grill — phantom mid-layer), package applier, retrofit simulator
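
To illustrate the "pure function" property, here is a toy version of measure application over a two-field stand-in for EpcPropertyData. Everything in it is hypothetical (`Epc`, `RULES`, `apply_measures`, and the single `main_heating` string that stands in for `main_heating_details[0]`); the only detail taken from the entry is the 270 mm loft insulation rule.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Epc:
    """Two-field stand-in for EpcPropertyData (illustrative only)."""
    roof_insulation_thickness_mm: int
    main_heating: str


# One translation rule per Measure Type; each returns a new Epc.
RULES: Dict[str, Callable[[Epc], Epc]] = {
    "loft_insulation": lambda epc: replace(epc, roof_insulation_thickness_mm=270),
    "ashp": lambda epc: replace(epc, main_heating="air_source_heat_pump"),
}


def apply_measures(epc: Epc, package: Tuple[str, ...]) -> Epc:
    """Translate a package into cert-field changes without mutating the input."""
    post = epc
    for measure_type in package:
        if measure_type not in RULES:
            raise ValueError(f"no translation rule for {measure_type!r}")
        post = RULES[measure_type](post)
    return post
```

Because the input is frozen and never mutated, the caller can keep the pre-package state and pass the returned snapshot on to the calculator.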
**Bill Derivation**:
The deterministic process that derives a Property's annual energy **bill**, composed into per-end-use sections (heating, hot water, lighting, appliances, cooking, pumps/fans, …) plus a **total**, by pricing **SAP10 Calculation**'s delivered kWh per end use at **current Fuel Rates** — each end use billed at its fuel's rate, rolled up per fuel for **standing charges** (metered fuels only — gas/electricity; oil/LPG/solid have none) minus **SEG** export credit on PV. Implemented by `BillDerivation` in `domain/property_baseline/` (deterministic, ADR-0006). Reads Fuel Rates from a committed static snapshot via `FuelRatesRepository` (no live ETL yet). **Distinct from the calculator's `total_fuel_cost_gbp`**, which is the SAP-rating notional cost at RdSAP Table 32 standardised prices (~half the real electricity price) — not what the household pays. Raises on a fuel it has no rate for (e.g. house coal, heat network). ADR-0014.
_Avoid_: EPC Energy Derivation (renamed), EpcEnergyDerivationService (no "service" suffix), kWh prediction, baseline kWh, energy estimation
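
The arithmetic described above (per-end-use kWh priced at the fuel's rate, standing charges rolled up once per fuel, SEG credit subtracted, raise on an unpriced fuel) can be sketched like this. It is not the `BillDerivation` implementation: `FuelRate`, `UnpricedFuelError`, `derive_bill`, the fuel keys other than `electricity_export`, and the shape of the inputs are all assumptions, and any rates passed in are the caller's, not real tariffs.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FuelRate:
    pence_per_kwh: float
    standing_charge_gbp: float = 0.0  # annual; left at 0 for oil/LPG/solid fuels


class UnpricedFuelError(Exception):
    """Raised when an end use burns a fuel the snapshot has no rate for."""


def derive_bill(
    end_uses: Dict[str, Tuple[str, float]],  # end use -> (fuel, delivered kWh)
    rates: Dict[str, FuelRate],
    pv_export_kwh: float = 0.0,
) -> Dict[str, object]:
    sections: Dict[str, float] = {}
    fuels_used = set()
    for end_use, (fuel, kwh) in end_uses.items():
        if fuel not in rates:
            raise UnpricedFuelError(fuel)
        sections[end_use] = kwh * rates[fuel].pence_per_kwh / 100.0  # pence -> GBP
        fuels_used.add(fuel)

    # Standing charges are counted once per fuel, not once per end use.
    standing = sum(rates[fuel].standing_charge_gbp for fuel in fuels_used)

    export_credit = 0.0
    if pv_export_kwh > 0:
        if "electricity_export" not in rates:
            raise UnpricedFuelError("electricity_export")
        export_credit = pv_export_kwh * rates["electricity_export"].pence_per_kwh / 100.0

    total = sum(sections.values()) + standing - export_credit
    return {
        "sections": sections,
        "standing_charges": standing,
        "export_credit": export_credit,
        "total": total,
    }
```

Raising on a missing rate, rather than pricing the fuel at zero, keeps a gap such as house coal or a heat network from silently understating the bill.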
**UCL Correction**:
The per-band linear correction (Few et al. 2023, _Energy & Buildings_ 288 113024) that aligns EPC-modelled Primary Energy Intensity with metered consumption. Folded into ML training labels at fit time (per ADR-0007) rather than applied at runtime — the trained model emits metered-equivalent PEUI directly, avoiding the discontinuities at EPC band boundaries that arose when the per-band linear correction was applied post-prediction. Calibrated against gas-heated, non-PV homes in England and Wales rated under SAP 2012; the current implementation extrapolates it to all properties (open question §15.14).
_Avoid_: UCL adjustment, energy correction, metered correction
**EPC Anomaly Flag**:
A per-field indicator that a Property's value for an EPC field differs significantly from Comparable Properties; advisory only — surfaces in the UI to prompt user review, does not block modelling.
_Avoid_: outlier, mismatch, divergence flag
### Pipeline composition
The modelling backend is composed from three independently-invocable **stage orchestrators**, chained differently per use case. This composability — not a single end-to-end function — is the point: it is what lets the interactive single-property flow pause between stages where the batch flows do not. (Supersedes the monolithic `model_engine`.)
**Ingestion**:
The first stage. Acquires a Property's external source data — the EPC certificate (New EPC API) and Google Solar insights — and resolves its coordinates, then writes everything to repos. Writes only; runs no modelling business logic. Per ADR-0003 nothing downstream reads across this seam by calling back to a source — downstream stages read the persisted data from repos.
_Avoid_: fetching (a fetch is one source call; Ingestion is the whole write stage), data load
**Baseline** (stage):
The second stage. Reads the persisted source data from repos, hydrates the **Property** aggregate, resolves its **Effective EPC**, and establishes its **Baseline Performance**. **Rebaselining** after a user override lives here. Distinct from **Baseline Performance** (the aggregate it produces).
_Avoid_: rebaseline (that is a specific triggered step; see Rebaselining), enrichment
**Modelling** (stage):
The third stage. Takes the baselined Property plus a set of **Scenarios** and produces **Recommendations** → an **Optimised Package** per **Scenario Phase** → **Plans**, persisted to repos. A separate orchestrator from Baseline so the single-property flow can stop after Baseline and only run Modelling when the user hits "play".
_Avoid_: scoring (overloaded), recommendation engine
**First Run**:
The use case where a Property has only a row in the property table (post address→UPRN matching) and no existing **Plan**: the pipeline runs Ingestion → Baseline → Modelling end-to-end over a batch. The first sibling lambda being built (`ara_first_run`).
_Avoid_: initial run, cold run
### ML training
**EPC ML Transform**:
The versioned class at `domain/sap10_ml/transform.py` that maps an EpcPropertyData to a fixed-width row of features + targets. The single ML-data contract between this repo and the AutoGluon training repo. Owns the windows compression, building-parts compression, Top-N Code Taxonomy, and UCL folding decisions. Each version is tagged on the deployed scoring lambda; a mismatch is a deploy-time fail.
_Avoid_: feature builder, ML mapper, EPC vectoriser
**Feature Schema Version**:
The semver version of the EPC ML Transform (e.g. `0.1.0`), included in the parquet output path and the deployed scoring lambda's tag. MAJOR bump when columns are removed or renamed; MINOR when optional columns are added; PATCH for non-behavioural fixes.
_Avoid_: transform version, schema version (overloaded with the SAP RdSAP schema version on EPCs), model version
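
The bump rules above map onto a few lines of code. `next_feature_schema_version` is a hypothetical helper, shown only to make the precedence explicit (a MAJOR change outranks a MINOR one); resetting the lower components on a bump follows ordinary semantic-versioning practice.

```python
def next_feature_schema_version(
    current: str,
    *,
    columns_removed_or_renamed: bool = False,
    optional_columns_added: bool = False,
) -> str:
    """Compute the next Feature Schema Version from the kind of change made."""
    major, minor, patch = (int(part) for part in current.split("."))
    if columns_removed_or_renamed:
        return f"{major + 1}.0.0"          # MAJOR: breaking for consumers
    if optional_columns_added:
        return f"{major}.{minor + 1}.0"    # MINOR: additive only
    return f"{major}.{minor}.{patch + 1}"  # PATCH: non-behavioural fix
```

For example, adding an optional column to `0.1.0` yields `0.2.0`, while renaming a column in the same release yields `1.0.0`.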
**Primary Energy Intensity** (**PEUI**):
A Property's total annual primary energy use per square metre of floor area (kWh/m²/yr), the SAP10 quantity recorded as `energy_consumption_current` on the EPC. Covers all end uses (heating, hot water, lighting, appliances, cooking) weighted by SAP primary energy factors per fuel. The quantity the UCL Correction aligns to metered consumption.
_Avoid_: heat demand (which colloquially means the building's space heating thermal requirement — a distinct concept), energy demand, total energy use, kWh per square metre
**PV Capacity Source**:
A flag on the EPC ML Transform feature set indicating whether a Property's PV capacity is `measured` (from `sap_energy_source.photovoltaic_supply[].peak_power`), `estimated_from_roof_area` (the `percent_roof_area` fallback used when the surveyor could not confirm array configuration), or `none` (no PV present). Lets the model weight the correct capacity signal per property.
_Avoid_: PV source, PV configuration type, solar source
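
A small sketch of how the flag could be derived. The input shapes are assumptions: `photovoltaic_supply` is taken to be a list of mappings with an optional `peak_power`, and `percent_roof_area` a number or `None`. The function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence


def pv_capacity_source(
    photovoltaic_supply: Optional[Sequence[Mapping]],
    percent_roof_area: Optional[float],
) -> str:
    """Classify where a Property's PV capacity figure comes from."""
    arrays = photovoltaic_supply or []
    # Any array with a non-zero peak_power counts as a measured capacity.
    if any(array.get("peak_power") for array in arrays):
        return "measured"
    # Fallback used when the surveyor could not confirm array configuration.
    if percent_roof_area:
        return "estimated_from_roof_area"
    return "none"
```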
**Top-N Code Taxonomy**:
The empirical top-N SAP code list (covering ~95% of mass on the training sample) committed by the EPC ML Transform for each list-aggregated categorical field (`wall_construction`, `glazing_type`, `frame_material`, etc.). Rare codes go into a per-field `_other` bucket. The taxonomy is locked at each Feature Schema Version; changes warrant a MINOR bump (adding) or MAJOR bump (removing codes).
_Avoid_: code list, code dictionary, vocab
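
One way to build such a taxonomy is to take codes in descending frequency until the target share of the sample is covered. The sketch below assumes that approach; `build_taxonomy`, `bucket_code`, and the `<field>_other` naming are hypothetical, and the 0.95 default mirrors the "~95% of mass" figure above.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def build_taxonomy(codes: Iterable[Hashable], coverage: float = 0.95) -> List[Hashable]:
    """Return the most frequent codes that together reach the coverage target."""
    counts = Counter(codes)
    total = sum(counts.values())
    kept: List[Hashable] = []
    covered = 0
    for code, n in counts.most_common():
        if covered / total >= coverage:
            break  # target already reached; remaining codes are rare
        kept.append(code)
        covered += n
    return kept


def bucket_code(code: Hashable, taxonomy: List[Hashable], field: str) -> Hashable:
    """Map a rare code to the per-field other bucket."""
    return code if code in taxonomy else f"{field}_other"
```

Because the kept list depends on the training sample, it has to be committed alongside the Feature Schema Version rather than recomputed at scoring time.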
### Reference data
**Fuel Rates**:
The current per-fuel rate (pence/kWh) and standing charge used to compute a Property's bills; time-versioned and regional. Sourced for now from a **committed static snapshot** (national, Ofgem-cap period for gas/electricity + DESNZ/NEP for off-gas fuels), read via `FuelRatesRepository`; an Ofgem-cap ETL automating the refresh is future, not a prerequisite. The Smart Export Guarantee rate sits in the same set as `electricity_export`. Consumed by Bill Derivation.
_Avoid_: fuel prices (commodity prices, different concept), tariff, energy cost
**Carbon Factors**:
The per-fuel CO2 emission factor (kgCO2e/kWh) used to compute a Property's carbon emissions; time-versioned, refreshed from Defra's annual publication. Consumed by Bill Derivation.
_Avoid_: emission factors (ambiguous), CO2 rates
### Outputs
**Scenario**:
A named portfolio-level retrofit plan, built by a user in the scenario-builder UI and persisted before any modelling fires; carries the overall goal (e.g. Increasing EPC), budget, exclusions, housing type, and an ordered list of Scenario Phases. The model is triggered against one or more Scenarios at once; each Scenario yields one Plan per Property.
_Avoid_: project, batch, run-set
**Scenario Phase**:
One ordered step inside a Scenario, carrying a measure-type allowlist (e.g. "loft insulation and walls in phase 1; ASHP in phase 2"), an optional phase budget, and an optional phase target. A single-phase Scenario is one Scenario Phase with all measure types allowed and the full budget on it — there is no special-case path.
_Avoid_: scenario stage, scenario step, tranche
**Scenario Snapshot**:
A frozen copy of a Scenario pinned at trigger time, keyed by (task, scenario); used by the modelling pipeline so mid-run edits to the live Scenario do not affect an in-flight job. Snapshots are read-only and may be garbage-collected after the task completes.
_Avoid_: scenario version, frozen scenario, pinned scenario
**Plan**:
The per-Property output of one Scenario's modelling run; carries an ordered list of Plan Phases matching the Scenario's Phase shape. A Property modelled against N Scenarios in one trigger ends up with N Plans.
_Avoid_: recommendation set, output, result
**Plan Phase**:
The per-Property output of one Scenario Phase: the Optimised Package selected for that phase, the ending state snapshot (the Property's SAP / kWh / bills after the package is applied), and any Rolled-over Options that flow as candidates into the next Plan Phase.
_Avoid_: plan stage, plan step
**Rolled-over Options**:
Recommendations generated but not selected by the Optimiser in a given Plan Phase, that remain eligible as candidates in subsequent Plan Phases. Exact roll-over rule (automatic vs user-marked) is under design.
_Avoid_: deferred measures, leftover recommendations
**Recommendation**:
A single proposed retrofit measure for a Property, with its cost, SAP impact, kWh savings, carbon savings, and parts list.
_Avoid_: suggestion, option
**Optimised Package**:
The subset of a Property's Recommendations selected by the Optimiser Service for installation, chosen to satisfy the Scenario's goal subject to budget.
_Avoid_: selected measures, default measures, optimal solution, recommended bundle
**Measure Type**:
The catalogue classification of a retrofit measure (e.g. `solar_pv`, `loft_insulation`, `ashp`); one or more Recommendations reference the same Measure Type with property-specific cost and impact.
_Avoid_: measure (ambiguous), category
### Address matching
**Lexiscore**:
A similarity score in [0, 1] between a User Address and a candidate EPC address; combines token overlap and character-level similarity.
_Avoid_: score, match score, similarity
**Lexirank**:
Dense rank of candidates sorted by Lexiscore descending; rank 1 = best match.
_Avoid_: rank, position
**UPRN Candidate**:
An EPC Search Result that is a plausible match for a given User Address, before scoring decides the winner.
_Avoid_: match candidate, result
**Score Threshold**:
The minimum Lexiscore (currently 0.6) below which no match is returned even if a candidate exists.
_Avoid_: minimum score, cutoff
**Ambiguous Match**:
A matching outcome where two or more candidates share Lexirank 1, making it impossible to select a unique winner.
_Avoid_: tie, draw, duplicate
**Best Match**:
The single UPRN Candidate with Lexirank 1 that meets or exceeds the Score Threshold.
_Avoid_: winner, top result
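
The matching terms in this section fit together as below. This is a sketch under assumptions, not the `AddressMatch` code: candidates are assumed to arrive as a mapping of UPRN to Lexiscore, the function names and outcome labels are hypothetical, and checking the threshold before ambiguity is one possible ordering.

```python
from typing import Dict, Optional, Tuple

SCORE_THRESHOLD = 0.6  # current value from the Score Threshold entry


def lexirank(scores: Dict[str, float]) -> Dict[str, int]:
    """Dense rank by Lexiscore descending: ties share a rank, no gaps follow."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {score: position + 1 for position, score in enumerate(distinct)}
    return {uprn: rank_of[score] for uprn, score in scores.items()}


def best_match(scores: Dict[str, float]) -> Tuple[Optional[str], str]:
    """Return (uprn, outcome); uprn is None unless there is a unique Best Match."""
    if not scores:
        return None, "no_candidates"
    ranks = lexirank(scores)
    top = [uprn for uprn, rank in ranks.items() if rank == 1]
    if scores[top[0]] < SCORE_THRESHOLD:
        return None, "below_threshold"
    if len(top) > 1:
        return None, "ambiguous_match"
    return top[0], "best_match"
```

Dense ranking matters here: with scores 0.9, 0.7, 0.7, 0.4 the ranks are 1, 2, 2, 3, so a tie further down never leaves a gap in the ranking.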
### API and integration
**EPC Search Result**:
A lightweight record returned by the government domestic search endpoint — address lines, postcode, UPRN, band, and certificate number, but not full certificate data.
_Avoid_: search row, EPC row, result
**EPC Property Data**:
The fully mapped domain object produced after fetching and parsing a complete EPC certificate; the schema the modelling pipeline operates against.
_Avoid_: EPC data, certificate data, parsed EPC
**Old EPC API**:
The retired government API (`epc.opendatacommunities.org`) using HTTP Basic auth; decommissioned 30 May 2026.
_Avoid_: legacy API
**New EPC API**:
The replacement government API (`api.get-energy-performance-data.communities.gov.uk`) using Bearer Token auth.
_Avoid_: new API, current API
**Bearer Token**:
The auth credential required by the New EPC API; stored in the `EPC_AUTH_TOKEN` environment variable.
_Avoid_: API key, auth token, secret
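
For illustration, the header a client would send can be built from the environment like this. `epc_auth_headers` is a hypothetical helper; it only assembles the standard HTTP `Authorization: Bearer` header and makes no claim about the API's endpoints or other required headers.

```python
import os
from typing import Dict, Mapping


def epc_auth_headers(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the Authorization header from EPC_AUTH_TOKEN, failing fast if unset."""
    token = environ.get("EPC_AUTH_TOKEN")
    if not token:
        raise RuntimeError("EPC_AUTH_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

Passing the environment in as a parameter keeps the helper testable without touching the real process environment.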
## Relationships
- A **Property** represents a single physical dwelling for modelling; identified by `(portfolio_id, UPRN)` or `(portfolio_id, landlord_property_id)`.
- A **Property** has zero or more **EPCs** across time, exactly one **Effective EPC**, zero or one set of **Site Notes**, and zero or one set of **Landlord Overrides**.
- An **EPC** belongs to exactly one **Property** and has one **Certificate Number**.
- An **EPC** carries an **EPC Band** and is identifiable by its **Registration Date**; the most recent one is the current EPC.
- A **UPRN** identifies a physical dwelling permanently; it does not change when the property changes owner — but each portfolio gets its own **Property** keyed against it.
- When a **Property** has both **Site Notes** and a public **EPC**, the newer of the two derives the **Effective EPC**. **Landlord Overrides** apply only when the **EPC** is the source — never when **Site Notes** are.
- A Property's **Baseline Performance** holds two halves: **Lodged Performance** (the gov register's SAP / band / carbon / Primary Energy Intensity) and **Effective Performance** (what the modelling pipeline scored against). The two are equal unless **Rebaselining** fires.
- **Rebaselining** produces **Effective Performance** by running **SAP10 Calculation** across SAP score, CO2 emissions, Primary Energy Intensity, space heating kWh, and hot water kWh, when either (a) the Effective EPC was lodged under a methodology the calculator supersedes (`sap_version < 10.2`), or (b) the Effective EPC's physical state diverges from the lodged EPC. **Lodged Performance** is never overwritten.
- **Bill Derivation** derives **fuel split** and **bills** by pricing **SAP10 Calculation**'s delivered kWh per end use (the EPC lodges no per-end-use kWh), reading current **Fuel Rates** and **Carbon Factors** from their respective repos.
- The **EPC Prediction Service** uses **Comparable Properties** for both gap-filling and producing **EPC Anomaly Flags**.
- A **Scenario** carries one or more ordered **Scenario Phases**. Triggering the model against N Scenarios produces N **Plans** per Property; each Plan carries an ordered list of **Plan Phases** matching the Scenario's shape.
- Each **Plan Phase** holds its **Optimised Package**, the ending state snapshot, and any **Rolled-over Options** that flow as candidates into the next Plan Phase. A single-phase Scenario is one Scenario Phase with all measure types allowed; the same machinery handles it.
- A **Scenario Snapshot** is pinned at trigger time per (task, scenario) so mid-run edits to the live Scenario do not affect an in-flight modelling job.
- A **Recommendation** references one **Measure Type** and carries property-specific cost and impact.
- **Address Matching** uses a **User Address** and **Postcode** to find a **UPRN** by scoring **UPRN Candidates** from an EPC search. A **Lexirank** of 1 with no **Ambiguous Match** and a **Lexiscore** ≥ the **Score Threshold** produces a **Best Match**.
## Example dialogue
> **Dev:** "A landlord uploads a corrected boiler for one of their properties. What happens?"
>
> **Domain expert:** "That's a **Landlord Override** on the heating fields. Save it against the **Property**. The **Effective EPC** has changed, so **Rebaselining** runs to re-predict SAP / carbon / PEUI / space heating kWh / hot water kWh, and **Bill Derivation** re-runs to update the fuel split and bills based on the new kWh values and fuel deduction. With fresh **Baseline Performance** we regenerate **Recommendations**."
> **Dev:** "What if the same Property also has Site Notes?"
>
> **Domain expert:** "**Site Notes** supersede the public **EPC**, so **Landlord Overrides** don't apply. We model from the **Site Notes** version of the **Effective EPC**. If the public **EPC** is newer than the **Site Notes**, that's the one exception — we use the newer one."
> **Dev:** "After modelling we end up with a list of measures. Which ones get installed?"
>
> **Domain expert:** "The **Optimiser Service** picks the **Optimised Package** — a subset of **Recommendations** that hits the **Scenario** goal within budget. The rest stay in the **Plan** as alternatives the user can swap in."
> **Dev:** "I'm looking at a property where the EPC says cavity walls but every other house on the street has solid. Is that a bug?"
>
> **Domain expert:** "That's an **EPC Anomaly Flag**. We compute it against the **Comparable Properties** for that postcode. It's advisory — the UI surfaces it and the landlord can apply a **Landlord Override** if it's wrong."
> **Dev:** "The property card shows two SAP scores side by side. Why?"
>
> **Domain expert:** "Those are **Lodged Performance** and **Effective Performance**. **Lodged** is what the gov register says — the EPC was rated under SAP 2012. **Effective** is what we scored against — we ran **Rebaselining** to predict the SAP10-equivalent rating because the methodology changed. Both stay on the **Baseline Performance** so users can see what's on record and what we're modelling against."
> **Dev:** "A landlord wants a 3-year retrofit plan — fabric work this year, heat pump next, solar after. How do we model that?"
>
> **Domain expert:** "Three **Scenario Phases** in one **Scenario**. Phase 1 allows fabric measures with this year's budget, phase 2 allows the heat pump with next year's budget, phase 3 allows solar. When we model, the **Optimiser Service** runs per phase against the rolling state — the heat pump is scored against the post-insulation property, not the original one. Each **Plan Phase** captures the **Optimised Package** plus the ending SAP / bills, and any **Rolled-over Options** that didn't make this phase's budget become candidates next phase."
## Flagged ambiguities
- **"property"** was historically warned against in favour of "dwelling"; that has been inverted. **Property** is now canonical for the Ara domain aggregate. Legacy code still uses "dwelling" in places — treat as alias.
- **"energy assessment"** in the existing codebase (`energy_assessment_functions`, `energy_assessments_by_uprn`) refers to what is now canonically called **Site Notes**. New code uses **Site Notes**.
- **"patch"** / `patch_epc` in the existing codebase has been merged into **Landlord Overrides**; the original concept is deprecated.
- **"already_installed measures"** in the existing codebase is likely subsumed by **Landlord Overrides** ("we have a heat pump now" → override the heating fields). Final call deferred to implementation.
- **"address"** appears as both the raw **User Address** (free-text from customer data, or the structured `UserAddress` dataclass that wraps it) and a structured field on an **EPC Search Result** (normalised lines). Always qualify: "user address" vs "EPC address" or "address line 1". Within `domain/`, **User Address** specifically means the `UserAddress` dataclass; in upstream ingestion contexts (CSV columns, SQS payloads) it can still mean the raw string sense.
- **"score"** is used for `AddressMatch.score()` output, the `lexiscore` column, and informally. Prefer **Lexiscore** in domain discussions; reserve "score" for method-level code comments.
- **"user_inputed_address"** in `backend/address2UPRN/main.py` is a misspelling and a synonym for **User Address** — the canonical term. New code should use `user_address`.
- **"EPC"** is overloaded as both the document and the rating band letter. Use **EPC** for the document, **EPC Band** for the letter.
- **"re-scoring"** has two meanings in the codebase — **Rebaselining** (re-predicting baseline performance after an EPC change) and post-optimisation measure re-prediction. Prefer **Rebaselining** for the former; for the latter, the **Optimiser Service** step does its own scoring without a special name.
- **"phase"** appears in two unrelated contexts: as cut-over timeline language in the PRD ("Phase 0 — Status quo", "Phase 1 — Forced cut-over") and as a domain concept in **Scenario Phase** / **Plan Phase**. Only the latter is a glossary term; cut-over phases are project-management vocabulary that does not enter code.
- **"stale"** appears in two senses: cache-freshness ("a Repo record is stale and the orchestrator should refetch") — a legitimate operational concept; and as loose shorthand for the EPC's recorded cost fields being unusable. The cost fields are not stale — they are pinned to the inspection-date fuel rates by design. Use "pinned to inspection date" or "pre-SAP10 schema" (whichever applies) instead.