assessment-model/CONTEXT.md
2026-05-26 10:20:29 +00:00

Context

This document captures the domain language used in this project. Terms here are the canonical ones — when more than one word exists for a concept, we pick one and treat the others as aliases to avoid.

This file grows as terms are resolved during design conversations. Concepts that haven't been examined yet are not listed.

Language

Bulk upload

BulkUpload: The record of a user-supplied spreadsheet of addresses (the source file) for a Portfolio. The source file's rows are transformed and matched to UPRNs before being inserted as Properties. Has an explicit lifecycle from upload through finalisation. Avoid: import, batch, file upload, ingest

ColumnMapping: The user's declaration of which spreadsheet column means what (e.g. column "Property Address" means address_1). Stored as JSON on the BulkUpload row. Avoid: schema, header map, field mapping
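
As a minimal sketch, assuming the stored JSON is a flat object from spreadsheet header to canonical field name, a ColumnMapping could be applied to one row as shown. Only "Property Address" mapping to address_1 comes from this glossary; the other headers, the `postcode` and `wall_description` fields, the helper `apply_column_mapping`, and the choice to drop unmapped columns are hypothetical.

```python
import json

# Hypothetical ColumnMapping JSON as it might sit on the BulkUpload row:
# spreadsheet header -> canonical field. Only the first entry is from the glossary.
column_mapping_json = json.dumps({
    "Property Address": "address_1",
    "Post Code": "postcode",
    "Wall Description": "wall_description",
})


def apply_column_mapping(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename a row's headers to canonical fields; unmapped columns are dropped (assumption)."""
    return {mapping[header]: value for header, value in row.items() if header in mapping}


mapping = json.loads(column_mapping_json)
source_row = {"Property Address": "1 High Street", "Post Code": "AB1 2CD", "Notes": "ignore me"}
canonical = apply_column_mapping(source_row, mapping)
print(canonical)  # {'address_1': '1 High Street', 'postcode': 'AB1 2CD'}
```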

UPRN: Unique Property Reference Number — the UK national identifier for an address. Address matching attaches a UPRN to each row where possible.

Address matching: The pipeline stage that splits the source file by postcode, looks up UPRNs, and produces matched-address output. Triggered via FastAPI. Avoid: postcode lookup, address resolution, address lookup
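
A rough Python sketch of the split-and-match step, assuming rows have already been renamed via a ColumnMapping. The postcode normalisation rule, the helper names, and the dictionary standing in for the UPRN lookup are all hypothetical; the real lookup source is not described in this glossary.

```python
from collections import defaultdict
from typing import Callable, Optional


def normalise_postcode(postcode: str) -> str:
    """Uppercase and drop all whitespace, so "ab1 2cd" and "AB1 2CD" share a group (assumed rule)."""
    return "".join(postcode.split()).upper()


def split_by_postcode(rows: list[dict]) -> dict[str, list[dict]]:
    """Group canonical rows by normalised postcode, one group per postcode."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[normalise_postcode(row["postcode"])].append(row)
    return dict(groups)


def match_group(rows: list[dict], lookup_uprn: Callable[[dict], Optional[str]]) -> list[dict]:
    """Copy each row with a "uprn" key: the lookup's result, or None if nothing matched."""
    return [{**row, "uprn": lookup_uprn(row)} for row in rows]


rows = [
    {"address_1": "1 High Street", "postcode": "ab1 2cd"},
    {"address_1": "2 High Street", "postcode": "AB1 2CD"},
    {"address_1": "9 Low Road", "postcode": "EF3 4GH"},
]
known_uprns = {"1 High Street": "100000000001"}  # made-up stand-in for a real UPRN source

groups = split_by_postcode(rows)
matched = {
    postcode: match_group(group, lambda row: known_uprns.get(row["address_1"]))
    for postcode, group in groups.items()
}
```

Rows that find no UPRN keep `uprn=None` rather than being dropped, which matches "attaches a UPRN to each row where possible."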

Combiner: The pipeline stage that aggregates the per-postcode address-matching outputs into a single combined CSV in S3, ready for review. Avoid: aggregator, merger
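
A minimal sketch of the Combiner's aggregation, assuming each per-postcode output is a list of row dicts with a shared set of columns. It builds the combined CSV in memory with the standard `csv` module; the upload to S3 is left out, and sorting by postcode is an assumption made only to keep the output deterministic.

```python
import csv
import io


def combine(per_postcode_outputs: dict[str, list[dict]], fieldnames: list[str]) -> str:
    """Concatenate per-postcode matched rows into one CSV string (S3 write omitted)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for postcode in sorted(per_postcode_outputs):  # deterministic order (assumption)
        for row in per_postcode_outputs[postcode]:
            writer.writerow(row)
    return buffer.getvalue()


outputs = {
    "EF34GH": [{"address_1": "9 Low Road", "postcode": "EF3 4GH", "uprn": ""}],
    "AB12CD": [{"address_1": "1 High Street", "postcode": "AB1 2CD", "uprn": "100000000001"}],
}
combined_csv = combine(outputs, ["address_1", "postcode", "uprn"])
print(combined_csv)
```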

Finalise: The terminal action that reads the combiner output, inserts rows as Properties on the Portfolio, and decides whether the BulkUpload needs further review. Avoid: import, commit, ingest
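
A sketch of Finalise under stated assumptions: the combined CSV has `address_1`, `postcode`, and `uprn` columns, and `Property` is a hypothetical, trimmed-down shape. Following the example dialogue below, rows without a UPRN are still inserted. The glossary does not say how Finalise decides that further review is needed, so this sketch does not model that decision.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Property:
    # Hypothetical Property shape with only the fields this sketch needs.
    portfolio_id: int
    address_1: str
    postcode: str
    uprn: Optional[str]  # None when address matching found no UPRN


def finalise(portfolio_id: int, combined_csv: str) -> tuple[list[Property], str]:
    """Turn every combined row into a Property on the Portfolio and return the new status.

    The further-review decision is not modelled, so this always returns "complete".
    """
    properties = [
        Property(
            portfolio_id=portfolio_id,
            address_1=row["address_1"],
            postcode=row["postcode"],
            uprn=row["uprn"] or None,  # an empty CSV cell means no UPRN
        )
        for row in csv.DictReader(io.StringIO(combined_csv))
    ]
    return properties, "complete"


csv_text = "address_1,postcode,uprn\n1 High Street,AB1 2CD,100000000001\n9 Low Road,EF3 4GH,\n"
props, status = finalise(42, csv_text)
```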

Landlord overrides

Landlord: The housing association supplying a Portfolio's BulkUploads. A Landlord knows facts about their properties that EPC data doesn't (e.g. that a cavity has been filled), and those facts take precedence when computing an assessment. Avoid: customer, client, owner, organisation (Organisation is a separate, broader entity)

Landlord override: A landlord-supplied fact about a property that takes precedence over EPC-derived defaults when computing an assessment. The end-to-end Landlord override journey has two layers: a VocabularyMapping layer (see VocabularyMapping below) and a per-Property fact layer (not yet modelled). Avoid: customer data, manual override, landlord data

VocabularyMapping: The translation from a Landlord's free-text description in a BulkUpload column (e.g. "cavity: filledcavity") to a canonical domain enum value (e.g. WallType.CAVITY). Produced by a ColumnClassifier (today an LLM, tomorrow possibly a lookup table or rules engine) in the Model service. Stored per Portfolio, one row per (category, description). Each row records its provenance (classifier or user) so that user-set values survive re-classification. Avoid: column mapping (a separate concept; see ColumnMapping above), classification, dictionary
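
One way to picture the per-Portfolio table and its provenance rule, as a Python sketch. `WallType.CAVITY` and the example description come from the entry above; `WallType.SOLID`, the `wall_type` category name, the enum values, and `apply_classification` are hypothetical. The point it illustrates is that a re-run of the ColumnClassifier never overwrites a row whose provenance is `user`.

```python
from dataclasses import dataclass
from enum import Enum


class WallType(Enum):
    # CAVITY is from the glossary; SOLID and both string values are illustrative.
    CAVITY = "cavity"
    SOLID = "solid"


@dataclass(frozen=True)
class VocabularyMapping:
    category: str      # hypothetical category name, e.g. "wall_type"
    description: str   # the Landlord's free text, e.g. "cavity: filledcavity"
    value: WallType
    provenance: str    # "classifier" or "user"


Key = tuple[str, str]  # (category, description): one row per pair per Portfolio


def apply_classification(
    existing: dict[Key, VocabularyMapping],
    classified: list[VocabularyMapping],
) -> dict[Key, VocabularyMapping]:
    """Merge fresh classifier output into a Portfolio's mappings, never replacing user rows."""
    merged = dict(existing)
    for mapping in classified:
        key = (mapping.category, mapping.description)
        current = merged.get(key)
        if current is not None and current.provenance == "user":
            continue  # keep the user's choice
        merged[key] = mapping  # add a new row or replace a classifier row
    return merged


existing = {
    ("wall_type", "cavity: filledcavity"): VocabularyMapping(
        "wall_type", "cavity: filledcavity", WallType.CAVITY, "user"
    ),
}
reclassified = [
    # Suppose a re-run disagrees with the user; the user row should win.
    VocabularyMapping("wall_type", "cavity: filledcavity", WallType.SOLID, "classifier"),
    VocabularyMapping("wall_type", "solid brick", WallType.SOLID, "classifier"),
]
merged = apply_classification(existing, reclassified)
```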

Lifecycle

A BulkUpload moves through these statuses:

ready_for_processing
  → mapping_complete         (user submits ColumnMapping; Next.js writes)
    → processing             (Address matching triggered; Next.js writes)
      → combining            (Combiner stage running; FastAPI writes directly)
        → awaiting_review    (Combiner output in S3; FastAPI writes directly)
          → complete         (Finalise succeeded; Next.js writes)
          → failed           (FastAPI reports in-flight failure — schema only, not yet wired)

complete and failed are terminal.
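
The diagram can be read as a transition table. A minimal sketch, assuming statuses are plain strings: `failed` gets no incoming edge because the diagram marks it as schema only and not yet wired, so which states will lead to it is left open here. `TRANSITIONS` and `can_transition` are illustrative names.

```python
# Allowed transitions as drawn in the lifecycle diagram (failed not yet wired).
TRANSITIONS: dict[str, frozenset[str]] = {
    "ready_for_processing": frozenset({"mapping_complete"}),
    "mapping_complete": frozenset({"processing"}),
    "processing": frozenset({"combining"}),
    "combining": frozenset({"awaiting_review"}),
    "awaiting_review": frozenset({"complete"}),
    "complete": frozenset(),  # terminal
    "failed": frozenset(),    # terminal
}

# Terminal statuses are exactly those with no outgoing transition.
TERMINAL = frozenset(status for status, nxt in TRANSITIONS.items() if not nxt)


def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle allows moving from current to target."""
    return target in TRANSITIONS.get(current, frozenset())
```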

Re-mapping (PATCHing columnMapping) is legal only in ready_for_processing and mapping_complete. In any later state, the PATCH is rejected with 409 Conflict.
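
A sketch of the re-mapping guard, written as a framework-agnostic Python function rather than the actual Next.js route, whose shape isn't described here. `patch_column_mapping` and its (status code, body) return convention are hypothetical, and persisting the new mapping is omitted.

```python
from http import HTTPStatus

# Statuses in which columnMapping may still be PATCHed.
REMAPPABLE = frozenset({"ready_for_processing", "mapping_complete"})


def patch_column_mapping(status: str, new_mapping: dict[str, str]) -> tuple[HTTPStatus, dict]:
    """Hypothetical handler core: accept a re-mapping, or reject it with 409 Conflict."""
    if status not in REMAPPABLE:
        return HTTPStatus.CONFLICT, {"error": f"cannot re-map a BulkUpload in status {status!r}"}
    # Writing new_mapping onto the BulkUpload row is left out of this sketch.
    return HTTPStatus.OK, {"columnMapping": new_mapping}
```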

Two writers: Next.js owns the transitions into mapping_complete and processing, plus the terminal transition into complete when Finalise succeeds. FastAPI owns combining and awaiting_review, writing them directly to the DB during the combiner run. The BulkUpload aggregate observes both.
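
The ownership split can be enforced with a lookup from target status to writer. This is a hypothetical sketch: the `Writer` enum and `check_writer` are illustrative names, `ready_for_processing` is left out because its writer isn't stated, and attributing `failed` to FastAPI follows the diagram even though that path is not yet wired.

```python
from enum import Enum


class Writer(Enum):
    NEXTJS = "next.js"
    FASTAPI = "fastapi"


# Which service writes each target status, per the lifecycle diagram.
OWNER: dict[str, Writer] = {
    "mapping_complete": Writer.NEXTJS,
    "processing": Writer.NEXTJS,
    "combining": Writer.FASTAPI,
    "awaiting_review": Writer.FASTAPI,
    "complete": Writer.NEXTJS,
    "failed": Writer.FASTAPI,  # not yet wired
}


def check_writer(writer: Writer, target: str) -> None:
    """Raise PermissionError if a service tries to write a status it does not own."""
    owner = OWNER.get(target)
    if owner is not writer:
        raise PermissionError(f"{writer.value} may not write {target!r} (owner: {owner})")
```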

See ADR-0001 for the deliberate "not yet" decisions baked into this lifecycle.

Relationships

  • A Portfolio has many BulkUploads.
  • A BulkUpload produces zero or more Properties when finalised.
  • A BulkUpload has at most one Task (the orchestration handle for the FastAPI pipeline run); a Task has many SubTasks (one per pipeline stage: address matching, combiner).
  • A Portfolio has many VocabularyMappings — one row per (category, description) it has ever encountered across all its BulkUploads. See ADR-0002.
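
The cardinalities above, sketched as Python dataclasses. Field names not taken from the glossary (`id`, `bulk_upload_id`, `stage`) are hypothetical. Properties sit on the Portfolio, as Finalise describes, with a back-reference to the BulkUpload that produced them. VocabularyMapping values are reduced to enum-name strings to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubTask:
    stage: str  # one per pipeline stage, e.g. "address_matching" or "combiner"


@dataclass
class Task:
    # Orchestration handle for a single FastAPI pipeline run.
    subtasks: list[SubTask] = field(default_factory=list)


@dataclass
class Property:
    bulk_upload_id: int  # the BulkUpload whose Finalise produced this Property
    address_1: str
    uprn: Optional[str] = None


@dataclass
class BulkUpload:
    id: int
    status: str = "ready_for_processing"
    task: Optional[Task] = None  # at most one Task


@dataclass
class Portfolio:
    bulk_uploads: list[BulkUpload] = field(default_factory=list)
    properties: list[Property] = field(default_factory=list)
    # One entry per (category, description) ever seen across all BulkUploads.
    vocabulary_mappings: dict[tuple[str, str], str] = field(default_factory=dict)


upload = BulkUpload(id=1, task=Task([SubTask("address_matching"), SubTask("combiner")]))
portfolio = Portfolio(bulk_uploads=[upload])
portfolio.vocabulary_mappings[("wall_type", "cavity: filledcavity")] = "CAVITY"
```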

Example dialogue

Dev: "If the Combiner finishes but the user hasn't clicked Finalise, what does the user see?" Domain expert: "The BulkUpload sits in awaiting_review. The frontend polls and shows a 'review and confirm' button. Nothing's been written to Properties yet."

Dev: "And if Finalise runs and 30% of rows have no UPRN?" Domain expert: "Those are still inserted as Properties, just without a UPRN, and the BulkUpload moves to complete. Manual cleanup happens later in the property table."

Flagged ambiguities

  • "Upload" is used in the codebase to mean both the file-on-S3 and the BulkUpload row. We standardise on BulkUpload for the row; the file is just "the source file."
  • "Onboarding" appears in some route paths (bulk_onboarding_inputs/...) but isn't part of this glossary — we use BulkUpload end-to-end.