Context
This document captures the domain language used in this project. Terms here are the canonical ones — when more than one word exists for a concept, we pick one and treat the others as aliases to avoid.
This file grows as terms are resolved during design conversations. Concepts that haven't been examined yet are not listed.
Language
Bulk upload
BulkUpload: A user-supplied spreadsheet of addresses for a Portfolio, transformed and matched to UPRNs before being inserted as Properties. Has an explicit lifecycle from upload through finalisation. Avoid: import, batch, file upload, ingest
ColumnMapping:
The user's declaration of which spreadsheet column means what (e.g. column "Property Address" means address_1). Stored as JSON on the BulkUpload row.
Avoid: schema, header map, field mapping
UPRN: Unique Property Reference Number — the UK national identifier for an address. Address matching attaches a UPRN to each row where possible.
Address matching: The pipeline stage that splits the source file by postcode, looks up UPRNs, and produces matched-address output. Triggered via FastAPI. Avoid: postcode lookup, address resolution, address lookup
Combiner: The pipeline stage that aggregates the per-postcode address-matching outputs into a single combined CSV in S3, ready for review. Avoid: aggregator, merger
Finalise: The terminal action that reads the combiner output, inserts rows as Properties on the Portfolio, and decides whether the BulkUpload needs further review. Avoid: import, commit, ingest
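As an illustration of the ColumnMapping defined above, here is a minimal sketch of how such a mapping might be serialised and applied to one spreadsheet row. The JSON shape is an assumption (this glossary only says it is stored as JSON on the BulkUpload row). Only the "Property Address" to address_1 pair comes from this document; the other headers, the field names, and the helper `apply_column_mapping` are hypothetical.

```python
import json

# Hypothetical shape: spreadsheet header -> canonical field name.
# Only "Property Address" -> "address_1" is taken from the glossary;
# the remaining entries are illustrative assumptions.
column_mapping = {
    "Property Address": "address_1",
    "Post Code": "postcode",
    "Walls": "wall_type",
}


def apply_column_mapping(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename the mapped columns of one row; unmapped columns are dropped."""
    return {field: row[header] for header, field in mapping.items() if header in row}


# Round-trip through JSON, as the mapping might be stored on the BulkUpload row.
serialised = json.dumps(column_mapping)
row = {"Property Address": "1 High St", "Post Code": "AB1 2CD", "Notes": "n/a"}
canonical = apply_column_mapping(row, json.loads(serialised))
```

In this sketch the "Notes" column is silently ignored because the user never mapped it, and "Walls" is skipped because the row has no such cell.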
Landlord overrides
Landlord: The housing association supplying a Portfolio's BulkUploads. A Landlord knows facts about their properties that EPC data doesn't (e.g. that a cavity has been filled), and those facts take precedence when computing an assessment. Avoid: customer, client, owner, organisation (Organisation is a separate, broader entity)
Landlord override: A landlord-supplied fact about a property that takes precedence over EPC-derived defaults when computing an assessment. The end-to-end Landlord override journey has two layers: a VocabularyMapping layer (see VocabularyMapping, below) and a per-Property fact layer (see Property override, below). Avoid: customer data, manual override, landlord data
Property override:
The per-Property fact layer — one resolved fact per (Property, Building part, component), where component is one of wall_type/roof_type/property_type/built_form_type. Holds a snapshot of the resolved enum value (a denormalised copy of the VocabularyMapping outcome at finalise time, so two Properties sharing a description can later diverge), plus the original spreadsheet text it resolved from. Materialised by the finaliser; see ADR-0005. (Table created; population is follow-up work.)
Avoid: per-property mapping, property fact, override row
VocabularyMapping:
The translation from a Landlord's free-text description in a BulkUpload column (e.g. "cavity: filledcavity") to a canonical domain enum value (e.g. WallType.CAVITY). Produced by a ColumnClassifier (today an LLM, tomorrow possibly a lookup table or rules engine) in the Model service. Stored per-Portfolio, one row per (category, description). A row carries provenance (classifier or user) so user overrides survive re-classification.
Avoid: column mapping (that's a separate concept — see ColumnMapping above), classification, dictionary
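The provenance rule for VocabularyMapping (user overrides survive re-classification) can be sketched as an upsert keyed on (category, description). This is a sketch under stated assumptions, not the project's implementation: the in-memory `store`, the `reclassify` helper, the `Provenance` enum, `WallType.SOLID`, and the "solid brick 9in" description are all hypothetical. Only `WallType.CAVITY` and the two provenance values (classifier, user) come from this glossary.

```python
from dataclasses import dataclass
from enum import Enum


class WallType(Enum):
    CAVITY = "cavity"  # named in the glossary
    SOLID = "solid"    # hypothetical extra member for the example


class Provenance(Enum):
    CLASSIFIER = "classifier"
    USER = "user"


@dataclass
class VocabularyMapping:
    category: str
    description: str
    value: WallType
    provenance: Provenance


def reclassify(store: dict[tuple[str, str], VocabularyMapping], fresh: VocabularyMapping) -> None:
    """Upsert a classifier result, leaving user-provenance rows untouched."""
    key = (fresh.category, fresh.description)
    existing = store.get(key)
    if existing is not None and existing.provenance is Provenance.USER:
        return  # a user override wins over any later classifier run
    store[key] = fresh


# One Portfolio's mappings, one row per (category, description).
store: dict[tuple[str, str], VocabularyMapping] = {}
key = ("wall_type", "solid brick 9in")
reclassify(store, VocabularyMapping(*key, WallType.CAVITY, Provenance.CLASSIFIER))
# The user corrects the classifier's answer.
store[key] = VocabularyMapping(*key, WallType.SOLID, Provenance.USER)
# A later classifier run repeats its original answer; the user's row is kept.
reclassify(store, VocabularyMapping(*key, WallType.CAVITY, Provenance.CLASSIFIER))
```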
Building parts
Building part:
One physically distinct part of a dwelling described by a single entry within a multi-valued cell. A dwelling is one Main building plus zero or more Extensions. Per-part descriptions appear as comma-separated entries in physical-element columns (e.g. Walls, Roofs); whole-dwelling columns (e.g. Property Type) carry a single entry and are not split per part.
Avoid: annexe, unit, section, dwelling part
Main building: The principal building part of a dwelling — exactly one per address. The others are Extensions.
Extension: A building part that is not the Main building, numbered Extension 1 … Extension N-1 for an N-entry address. Avoid: annexe, addition, outbuilding
Multi-entry:
The property of a BulkUpload row whose physical-element cells hold more than one comma-separated entry, one per Building part. Always intra-cell in our data — never multiple rows sharing one address/UPRN. Within a row, the multi-valued columns agree on entry-count, so position i is the same Building part across every multi-valued column.
Avoid: multi-row, multi-record, duplicate address
Building-part ordering (a.k.a. ordering):
The user's declaration, captured once per file, of which list-position maps to which Building part. It is needed because the entry order is consistent within a file but ambiguous from the data alone ("A, B" could be [Main, Extension 1] or [Extension 1, Main]). Stored per entry-count as a permutation. See ADR-0004.
Avoid: sort order, sequence, column mapping
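The Multi-entry and Building-part ordering definitions above combine into a small procedure: split each multi-valued cell, check that the columns agree on entry-count, then use the per-file permutation to label each position. The sketch below assumes a naive comma split (it would break on descriptions that themselves contain commas) and assumes the permutation is stored as a list where `ordering[n][position]` is a Building part index, with 0 meaning Main building and k meaning Extension k. That encoding, the function names, and the sample cell values are hypothetical; ADR-0004 holds the actual design.

```python
def split_entries(cell: str) -> list[str]:
    """Split one physical-element cell into its comma-separated entries."""
    return [entry.strip() for entry in cell.split(",") if entry.strip()]


def part_label(index: int) -> str:
    """Index 0 is the Main building; index k is Extension k."""
    return "Main building" if index == 0 else f"Extension {index}"


def parts_for_row(
    row: dict[str, str],
    multi_valued_columns: list[str],
    ordering: dict[int, list[int]],
) -> dict[str, dict[str, str]]:
    """Group a row's multi-valued cells by Building part."""
    split = {col: split_entries(row[col]) for col in multi_valued_columns}
    counts = {len(entries) for entries in split.values()}
    if len(counts) != 1:
        raise ValueError(f"columns disagree on entry-count: {sorted(counts)}")
    n = counts.pop()
    # A single-entry row has only a Main building; otherwise use the
    # permutation the user declared for this entry-count.
    permutation = [0] if n == 1 else ordering[n]
    if sorted(permutation) != list(range(n)):
        raise ValueError(f"ordering for entry-count {n} is not a permutation")
    return {
        part_label(permutation[pos]): {col: split[col][pos] for col in multi_valued_columns}
        for pos in range(n)
    }


# The user declared that, in 2-entry rows, the first entry is Extension 1.
ordering = {2: [1, 0]}
row = {"Walls": "Solid brick, Cavity", "Roofs": "Flat, Pitched"}
parts = parts_for_row(row, ["Walls", "Roofs"], ordering)
```

Because position i refers to the same Building part in every multi-valued column, one permutation per entry-count is enough to label the whole row.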
Lifecycle
A BulkUpload moves through these statuses:
ready_for_processing
→ mapping_complete (user submits ColumnMapping; Next.js writes)
→ processing (Address matching triggered; Next.js writes)
→ combining (Combiner stage running; FastAPI writes directly)
→ awaiting_review (Combiner output in S3; FastAPI writes directly)
→ finalising (Finalise dispatched; Next.js writes via compare-and-swap)
→ complete (Finaliser succeeded; FastAPI/Lambda writes directly)
→ failed (Finaliser failed; FastAPI/Lambda writes directly)
complete and failed are terminal. finalising is the in-flight state of the
async finaliser (mirrors combining); the UI renders it as "Uploading to ARA". See
ADR-0005.
Re-mapping (PATCHing columnMapping) is legal only in ready_for_processing and mapping_complete. Any later state rejects with 409.
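The statuses and the re-mapping rule above can be written down as a transition table. This is a sketch that encodes only the transitions listed in this section. The names `ALLOWED`, `REMAPPABLE`, `can_transition`, and `remap_status_code` are hypothetical, and the 200 success code is an assumption (only the 409 rejection is stated here).

```python
# Each status maps to the statuses it may move to next.
ALLOWED: dict[str, set[str]] = {
    "ready_for_processing": {"mapping_complete"},
    "mapping_complete": {"processing"},
    "processing": {"combining"},
    "combining": {"awaiting_review"},
    "awaiting_review": {"finalising"},
    "finalising": {"complete", "failed"},
    "complete": set(),  # terminal
    "failed": set(),    # terminal
}

# Statuses in which PATCHing columnMapping is legal.
REMAPPABLE = {"ready_for_processing", "mapping_complete"}


def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle lists current -> target as a legal move."""
    return target in ALLOWED.get(current, set())


def remap_status_code(current: str) -> int:
    """HTTP status for a re-mapping request: 409 once processing has begun."""
    return 200 if current in REMAPPABLE else 409
```

For example, `can_transition("awaiting_review", "finalising")` holds, while `remap_status_code("processing")` yields 409.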
Two writers: Next.js owns the transitions into mapping_complete and processing, plus the awaiting_review → finalising compare-and-swap at Finalise dispatch. FastAPI/Lambda owns combining, awaiting_review, and the terminal finalising → complete/failed transitions, writing them directly to the DB during the combiner and finaliser runs. The BulkUpload aggregate observes both. See ADR-0005.
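The compare-and-swap at Finalise dispatch can be illustrated with a conditional UPDATE that only succeeds if the row is still in awaiting_review. The sketch uses Python's standard-library sqlite3 purely so it is runnable; the table name `bulk_upload`, its columns, and the `dispatch_finalise` helper are hypothetical, and the real write is made from Next.js (see ADR-0005).

```python
import sqlite3


def dispatch_finalise(conn: sqlite3.Connection, bulk_upload_id: int) -> bool:
    """Move awaiting_review -> finalising atomically; False if another caller won."""
    cur = conn.execute(
        "UPDATE bulk_upload SET status = 'finalising' "
        "WHERE id = ? AND status = 'awaiting_review'",
        (bulk_upload_id,),
    )
    conn.commit()
    # rowcount is 1 only when the row was still in awaiting_review.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bulk_upload (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO bulk_upload (id, status) VALUES (1, 'awaiting_review')")

first = dispatch_finalise(conn, 1)   # wins the swap
second = dispatch_finalise(conn, 1)  # e.g. a double click; row is already finalising
```

Putting the expected status in the WHERE clause means a duplicate dispatch changes nothing and can be rejected, rather than starting a second finaliser run.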
At awaiting_review, Finalise is gated (not a new status — a precondition on the action): when classifier columns were mapped the user must acknowledge the classification-verification step, and when the file is Multi-entry they must confirm the Building-part ordering. See ADR-0004.
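Because the gate is a precondition on the action rather than a status, it can be modelled as a check that returns the reasons Finalise is currently blocked. A minimal sketch follows; the `FinaliseContext` fields and the `finalise_blockers` helper are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinaliseContext:
    status: str
    classifier_columns_mapped: bool
    classification_acknowledged: bool
    is_multi_entry: bool
    ordering_confirmed: bool


def finalise_blockers(ctx: FinaliseContext) -> list[str]:
    """Return the unmet preconditions; an empty list means Finalise may be dispatched."""
    blockers: list[str] = []
    if ctx.status != "awaiting_review":
        blockers.append("BulkUpload is not awaiting_review")
    if ctx.classifier_columns_mapped and not ctx.classification_acknowledged:
        blockers.append("classification verification not acknowledged")
    if ctx.is_multi_entry and not ctx.ordering_confirmed:
        blockers.append("Building-part ordering not confirmed")
    return blockers


ctx = FinaliseContext(
    status="awaiting_review",
    classifier_columns_mapped=True,
    classification_acknowledged=True,
    is_multi_entry=True,
    ordering_confirmed=False,
)
pending = finalise_blockers(ctx)
```

Here `pending` holds a single blocker, the unconfirmed Building-part ordering, while the BulkUpload itself stays in awaiting_review throughout.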
See ADR-0001 for the deliberate "not yet" decisions baked into this lifecycle.
Relationships
- A Portfolio has many BulkUploads.
- A BulkUpload produces zero or more Properties when finalised.
- A BulkUpload has at most one Task (the orchestration handle for the FastAPI pipeline run); a Task has many SubTasks (one per pipeline stage: address matching, combiner).
- A Portfolio has many VocabularyMappings: one row per (category, description) it has ever encountered across all its BulkUploads. See ADR-0002.
Baseline performance
Lodged performance: The SAP score, EPC band, CO₂ emissions, and primary energy intensity as submitted to the government EPC register. Ground truth from the register; never modified. Avoid: original performance, registered performance
Effective performance: The SAP score (and associated metrics) that the modelling engine actually uses as its baseline. Usually equals Lodged performance, but differs when a Landlord override or data-quality issue makes the lodged certificate unreliable — triggering a Rebaseline. Avoid: current performance, adjusted performance
Rebaseline:
The act of substituting a corrected set of performance metrics in place of the Lodged values. Recorded on property_baseline_performance with a rebaseline_reason enum value: none, pre_sap10, physical_state_changed, or both.
Avoid: override, adjustment, correction
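The relationship between Lodged performance, Effective performance, and the rebaseline_reason values can be sketched as follows. The enum values come from the definition above. Reading "both" as the combination of pre_sap10 and physical_state_changed is an assumption, as are the helper names and the idea that the two conditions arrive as booleans.

```python
from enum import Enum
from typing import Optional


class RebaselineReason(Enum):
    NONE = "none"
    PRE_SAP10 = "pre_sap10"
    PHYSICAL_STATE_CHANGED = "physical_state_changed"
    BOTH = "both"


def rebaseline_reason(pre_sap10: bool, physical_state_changed: bool) -> RebaselineReason:
    """Pick the enum value for the two (assumed) independent conditions."""
    if pre_sap10 and physical_state_changed:
        return RebaselineReason.BOTH
    if pre_sap10:
        return RebaselineReason.PRE_SAP10
    if physical_state_changed:
        return RebaselineReason.PHYSICAL_STATE_CHANGED
    return RebaselineReason.NONE


def effective_sap(lodged_sap: int, rebaselined_sap: Optional[int], reason: RebaselineReason) -> int:
    """Effective performance equals Lodged unless a Rebaseline supplied a value."""
    if reason is RebaselineReason.NONE or rebaselined_sap is None:
        return lodged_sap  # the lodged value itself is never modified
    return rebaselined_sap
```

For instance, a property with a lodged SAP score of 55 and no Rebaseline keeps 55 as its Effective performance; if a Landlord override triggers a Rebaseline to 62, the engine would use 62 while the lodged 55 stays on record.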
Example dialogue
Dev: "If the Combiner finishes but the user hasn't clicked Finalise, what does the user see?"
Domain expert: "The BulkUpload sits in awaiting_review. The frontend polls and shows a 'review and confirm' button. Nothing's been written to Properties yet."
Dev: "And if Finalise runs and 30% of rows have no UPRN?"
Domain expert: "Those still get inserted as Properties, just without a UPRN, and the BulkUpload moves to complete. Manual cleanup happens later in the property table."
Flagged ambiguities
- "Upload" is used in the codebase to mean both the file-on-S3 and the BulkUpload row. We standardise on BulkUpload for the row; the file is just "the source file."
- "Onboarding" appears in some route paths (bulk_onboarding_inputs/...) but isn't part of this glossary; we use BulkUpload end-to-end.