Model/UBIQUITOUS_LANGUAGE.md
2026-06-01 16:32:48 +00:00


# Ubiquitous Language

Domain terminology glossary for this project. Generated and maintained by the /ubiquitous-language Claude Code skill.

Invoke /ubiquitous-language in any session to extract new terms from the conversation, flag ambiguities, and update this file with canonical definitions.


## Energy Performance Certificates

| Term | Definition | Aliases to avoid |
|---|---|---|
| EPC | An Energy Performance Certificate: a government-issued document rating a dwelling's energy efficiency from A (best) to G (worst). | "energy certificate", "energy report" |
| Certificate Number | The unique identifier assigned to an EPC by the government register. | "cert number", "EPC ID" |
| Registration Date | The date an EPC was lodged with the government register; used to identify the most recent certificate for a property. | "assessment date", "submission date" |
| EPC Band | A single letter from A to G representing a property's current or potential energy efficiency rating. | "energy rating", "EPC grade", "EPC score" |
| Schema Type | The versioned RdSAP or SAP schema that describes the structure of a certificate's raw data (e.g. `RdSAP-Schema-21.0.1`). | "schema version", "EPC format" |
| Domestic Certificate | An EPC issued for a residential dwelling, as opposed to a commercial one. | "residential EPC", "home EPC" |
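
To make the Registration Date rule concrete, here is a minimal Python sketch of picking the current EPC for one dwelling. The `Epc` record, `EPC_BANDS`, and `current_epc` are hypothetical illustrations rather than this project's classes, and they carry only the fields named in the table above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

EPC_BANDS = frozenset("ABCDEFG")  # A is the best rating, G the worst


@dataclass(frozen=True)
class Epc:
    """Hypothetical minimal EPC record: only the fields named in this glossary."""
    certificate_number: str
    uprn: str
    registration_date: date
    current_band: str  # EPC Band: a single letter from A to G

    def __post_init__(self) -> None:
        # Reject anything that is not exactly one of the seven band letters.
        if self.current_band not in EPC_BANDS:
            raise ValueError(f"invalid EPC Band: {self.current_band!r}")


def current_epc(epcs: list[Epc]) -> Epc | None:
    """Given all EPCs for one dwelling, return the one with the latest Registration Date."""
    return max(epcs, key=lambda epc: epc.registration_date, default=None)
```

Validating the band on construction keeps values such as "H" or "C+" out of the domain. Two certificates registered on the same date resolve to whichever comes first in the list, which a real implementation may want to handle explicitly.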

## Properties and Addresses

| Term | Definition | Aliases to avoid |
|---|---|---|
| UPRN | Unique Property Reference Number: the government-issued permanent identifier for a physical address in the UK. | "property ID", "address ID", "code" |
| Postcode | A UK postal code used to group nearby addresses; the primary search key for finding EPC records. | "zip code", "postal code" |
| Unstandardised Address | A frozen dataclass (`domain.addresses.unstandardised_address.UnstandardisedAddress`) capturing a single address exactly as a customer supplied it, before any standardisation. It holds a free-text address line (intentionally NOT normalised), a canonical postcode (a `Postcode` value object, sanitised on construction), an optional `org_reference` (the customer's own identifier for the property), and `additional_info` (the full source row: every column of the customer's upload, preserved verbatim). | "user address", "asset list", "raw address", "landlord address", "Hyde address" |
| Address List | A nominal `NewType` over `list[UnstandardisedAddress]` (`domain.addresses.unstandardised_address.AddressList`): a batch of unstandardised addresses, such as one customer's bulk-onboarding upload or a postcode-grouped sub-batch produced for downstream processing. Because it is nominal, a type checker will not accept a plain list in its place, so it is constructed explicitly: `AddressList([...])`. It is the raw input to ingestion; the standardised output is a Standardised Asset List. | "asset list", "Hyde address list", "user addresses" |
| Standardised Asset List (SAL) | A customer's property portfolio after ingestion has cleaned and standardised it, with each property carrying a canonical field set (UPRN, standardised address, postcode, property type, built form, …). It is the standardised output of the pipeline whose raw input is an Address List of Unstandardised Addresses, and it is generated by the `SALOrchestrator`. (Legacy implementation: `asset_list.AssetList` via `load_standardised_asset_list`.) | "address list" (that is the raw input), "asset register", "portfolio list" |
| Dwelling | A single residential unit that can hold an EPC: a house, flat, or maisonette. | "property", "unit", "home" |
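
The Unstandardised Address and Address List definitions can be sketched in Python as follows. This is a minimal illustration, not the code in `domain.addresses.unstandardised_address`: the field name `address`, the `Postcode` sanitisation rule (strip whitespace, upper-case, re-insert one space before the last three characters), the length check, and the `group_by_postcode` helper are all assumptions.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, NewType


@dataclass(frozen=True)
class Postcode:
    """Hypothetical Postcode value object, sanitised once on construction."""
    value: str

    def __post_init__(self) -> None:
        # ASSUMED rule: drop all whitespace and upper-case, e.g. " sw1a1aa " -> "SW1A1AA".
        compact = re.sub(r"\s+", "", self.value).upper()
        if not 5 <= len(compact) <= 7:  # rough length check only, not full validation
            raise ValueError(f"not a UK postcode: {self.value!r}")
        # Frozen dataclass: bypass __setattr__ to store "SW1A 1AA" (space before inward code).
        object.__setattr__(self, "value", f"{compact[:-3]} {compact[-3:]}")


@dataclass(frozen=True)
class UnstandardisedAddress:
    address: str                      # free text, intentionally NOT normalised
    postcode: Postcode                # canonical postcode value object
    org_reference: str | None = None  # the customer's own identifier, if any
    additional_info: dict[str, Any] = field(default_factory=dict)  # full source row, verbatim


AddressList = NewType("AddressList", list[UnstandardisedAddress])


def group_by_postcode(addresses: AddressList) -> dict[Postcode, AddressList]:
    """Hypothetical helper: split one Address List into postcode-grouped sub-batches."""
    groups: dict[Postcode, list[UnstandardisedAddress]] = defaultdict(list)
    for item in addresses:
        groups[item.postcode].append(item)  # preserves the upload's order within each group
    return {pc: AddressList(items) for pc, items in groups.items()}
```

Because `AddressList` is a `NewType`, `AddressList([...])` simply returns the list at runtime; the nominal distinction exists only for the type checker.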

## Address Matching

| Term | Definition | Aliases to avoid |
|---|---|---|
| Lexiscore | A similarity score in [0, 1] between an unstandardised address and a candidate EPC address, combining token overlap and character-level similarity. | "score", "match score", "similarity" |
| Lexirank | The dense rank of candidates sorted by lexiscore, descending; rank 1 is the best match, and candidates with equal lexiscores share a rank. | "rank", "position" |
| UPRN Candidate | An EPC search result that is a plausible match for a given unstandardised address, before scoring decides the winner. | "match candidate", "result" |
| Score Threshold | The minimum lexiscore (currently 0.6) below which no match is returned, even if a candidate exists. | "minimum score", "cutoff" |
| Ambiguous Match | A matching outcome in which two or more candidates share lexirank 1, so no unique winner can be selected. | "tie", "draw", "duplicate" |
| Best Match | The single UPRN candidate at lexirank 1 whose lexiscore meets or exceeds the score threshold. | "winner", "top result" |
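
A sketch of Lexiscore and Lexirank in Python. The glossary says only that the lexiscore combines token overlap and character-level similarity; the equal-weighted mean of token Jaccard similarity and `difflib.SequenceMatcher.ratio()` used here is an assumed stand-in, not the project's formula. The ranking step follows the Lexirank definition directly.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher


def _tokens(text: str) -> set[str]:
    # Lower-case alphanumeric tokens; punctuation and spacing are ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexiscore(unstandardised: str, candidate: str) -> float:
    """Similarity in [0, 1]. ASSUMED formula: mean of token Jaccard and character ratio."""
    a, b = _tokens(unstandardised), _tokens(candidate)
    token_overlap = len(a & b) / len(a | b) if a | b else 0.0
    char_similarity = SequenceMatcher(None, unstandardised.lower(), candidate.lower()).ratio()
    return (token_overlap + char_similarity) / 2


def lexirank(scores: list[float]) -> list[int]:
    """Dense rank by lexiscore descending: equal scores share a rank and ranks have no gaps."""
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct)}
    return [rank_of[s] for s in scores]
```

When the scores live in a pandas DataFrame column, `df["lexiscore"].rank(method="dense", ascending=False)` produces the same dense ranking (as floats).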

## API and Integration

| Term | Definition | Aliases to avoid |
|---|---|---|
| EPC Search Result | A lightweight record returned by the government domestic search endpoint. It contains address lines, postcode, UPRN, band, and certificate number, but not the full certificate data. | "search row", "EPC row", "result" |
| EPC Property Data | The fully mapped domain object produced after fetching and parsing a complete EPC certificate. | "EPC data", "certificate data", "parsed EPC" |
| Old EPC API | The retired government API (`epc.opendatacommunities.org`), which used HTTP Basic auth; decommissioned in May 2026. | "legacy API" |
| New EPC API | The replacement government API (`api.get-energy-performance-data.communities.gov.uk`), which uses Bearer token auth. | "new API", "current API" |
| Bearer Token | The auth credential required by the New EPC API, stored in the `EPC_AUTH_TOKEN` environment variable. | "API key", "auth token", "secret" |
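
A minimal sketch of authenticating against the New EPC API with the Bearer Token, using only the Python standard library. The host comes from the table above; the helper names `auth_headers` and `build_request` are hypothetical, any request path passed in is a placeholder rather than a documented endpoint, and the request is built but never sent.

```python
from __future__ import annotations

import os
import urllib.request

# Host from this glossary; callers supply the path (hypothetical here, not a documented route).
BASE_URL = "https://api.get-energy-performance-data.communities.gov.uk"


def auth_headers() -> dict[str, str]:
    """Build the Bearer-token header from EPC_AUTH_TOKEN, failing loudly if it is unset."""
    token = os.environ.get("EPC_AUTH_TOKEN")
    if not token:
        raise RuntimeError("EPC_AUTH_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}


def build_request(path: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request to the New EPC API."""
    return urllib.request.Request(f"{BASE_URL}{path}", headers=auth_headers(), method="GET")
```

Reading the token from the environment at call time keeps the secret out of source control, and the header shape is the only place that differs from the Old EPC API's HTTP Basic auth.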

## Relationships

- An EPC belongs to exactly one Dwelling and has one Certificate Number.
- A Dwelling may have multiple EPCs over time; the one with the most recent Registration Date is the current one.
- A UPRN identifies a Dwelling permanently; it does not change when the property changes owner.
- An EPC Search Result is a summary; it points to a full EPC via its Certificate Number.
- An Address List is an ordered batch of Unstandardised Addresses; a customer's bulk-onboarding upload arrives as one.
- Ingestion turns an Address List (raw input) into a Standardised Asset List (standardised output); the `SALOrchestrator` drives this.
- Address Matching uses an Unstandardised Address and its Postcode to find a UPRN by scoring UPRN Candidates from an EPC search.
- A Lexirank of 1 with no Ambiguous Match and a Lexiscore ≥ the Score Threshold produces a Best Match.
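
The last relationship above is effectively a decision rule. A minimal Python sketch, assuming candidates arrive already scored and ranked; `UprnCandidate` and `best_match` are hypothetical names, and the 0.6 default mirrors the current Score Threshold.

```python
from __future__ import annotations

from typing import NamedTuple

SCORE_THRESHOLD = 0.6  # "currently 0.6", per the Score Threshold definition


class UprnCandidate(NamedTuple):
    uprn: str
    lexiscore: float
    lexirank: int


def best_match(candidates: list[UprnCandidate], threshold: float = SCORE_THRESHOLD) -> str | None:
    """Return the Best Match UPRN, or None when there is no unique winner above threshold."""
    top = [c for c in candidates if c.lexirank == 1]
    if len(top) != 1:
        return None  # no candidates at all, or an Ambiguous Match (tie at lexirank 1)
    winner = top[0]
    # "Meets or exceeds": a lexiscore exactly equal to the threshold still counts.
    return winner.uprn if winner.lexiscore >= threshold else None
```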

## Example dialogue

Dev: "We have an unstandardised address and postcode. How do we find the UPRN?"

Domain expert: "Search the New EPC API by Postcode: you get back a list of EPC Search Results for that area, each with an address and a UPRN. Score each one against the Unstandardised Address using the Lexiscore. If the top UPRN Candidate meets or exceeds the Score Threshold and there's no Ambiguous Match, that's your Best Match."

Dev: "What if two results share the same address line 1?"

Domain expert: "That's an Ambiguous Match — two candidates at Lexirank 1. Fall back to scoring on the full address using all address lines joined together. If that still ties, return nothing."

Dev: "Once we have the best match, do we use the UPRN or fetch the full EPC?"

Domain expert: "Depends on what you need. The EPC Search Result gives you the EPC Band and Certificate Number. If you need energy efficiency detail, use the Certificate Number to fetch the full EPC Property Data."
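
The fallback the domain expert describes for an Ambiguous Match can be sketched as a two-pass loop: score against address line 1 first and, only if the top score is tied, re-score against all address lines joined together. In this sketch `EpcSearchResult` is a hypothetical two-field stand-in for the real record, `char_ratio` is a plain character-similarity stand-in for the Lexiscore, and `match_with_fallback` is an illustrative name; all three are assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Callable, NamedTuple


class EpcSearchResult(NamedTuple):
    """Hypothetical stand-in carrying only the fields this sketch needs."""
    uprn: str
    address_lines: tuple[str, ...]


def char_ratio(a: str, b: str) -> float:
    # Stand-in for the real Lexiscore: character-level similarity only.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def line_1(result: EpcSearchResult) -> str:
    return result.address_lines[0] if result.address_lines else ""


def full_address(result: EpcSearchResult) -> str:
    return ", ".join(result.address_lines)


def match_with_fallback(
    address: str,
    results: list[EpcSearchResult],
    threshold: float = 0.6,
    score: Callable[[str, str], float] = char_ratio,
) -> str | None:
    """Return the Best Match UPRN, re-scoring on the full address if line 1 ties."""
    if not results:
        return None
    for key in (line_1, full_address):
        scored = [(r.uprn, score(address, key(r))) for r in results]
        top = max(s for _, s in scored)
        winners = [uprn for uprn, s in scored if s == top]
        if len(winners) == 1:
            # Unique lexirank 1: it must still clear the Score Threshold.
            return winners[0] if top >= threshold else None
        # Ambiguous Match on this pass: try the next, more specific key.
    return None  # still tied after the full-address pass, so return nothing
```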

## Flagged ambiguities

- "address" appears in several senses: the Unstandardised Address dataclass (one customer-supplied address before standardisation), its free-text address field, and the normalised address lines on an EPC Search Result. Always qualify: "unstandardised address" vs "EPC address" or "address line 1". Within `domain/addresses/`, the dataclass is Unstandardised Address; in upstream ingestion contexts (CSV columns, SQS payloads), "address" may still mean the bare free-text string.
- "score" is used for the output of `AddressMatch.score()`, the `lexiscore` DataFrame column, and informally in conversation. Prefer Lexiscore in domain discussions; reserve "score" for method-level code comments.
- `user_inputed_address` (and `user_address`) in `backend/address2UPRN/` is legacy naming: a misspelled synonym for what is now the Unstandardised Address. That address-matching code has not been renamed; new code should use Unstandardised Address.
- "Hyde address list": "Hyde" is the name of one customer, not a domain concept. A domain expert may say "the Hyde address list" because Hyde is the customer in front of them, but the generalised term is Address List (and Unstandardised Address for a single item). A customer's identity is data: it belongs in `org_reference` or `additional_info`, never in a type or module name.
- "address list" vs "asset list": these are opposite ends of the ingestion pipeline, so do not conflate them. An Address List is the raw input (unstandardised addresses as the customer supplied them); a Standardised Asset List is the standardised output. The historical `AssetList` dataclass (now Unstandardised Address) misnamed the input an "asset list"; the rename corrected that mistake.
- "EPC" is overloaded as both the document (an Energy Performance Certificate) and the rating band letter. Use EPC for the document and EPC Band for the letter.