assessment-model/CONTEXT.md
2026-05-26 10:20:29 +00:00

88 lines
5.3 KiB
Markdown

# Context
This document captures the domain language used in this project. Terms here are the **canonical** ones — when more than one word exists for a concept, we pick one and treat the others as aliases to avoid.
This file grows as terms are resolved during design conversations. Concepts that haven't been examined yet are not listed.
## Language
### Bulk upload
**BulkUpload**:
A user-supplied spreadsheet of addresses for a Portfolio, transformed and matched to UPRNs before being inserted as Properties. Has an explicit lifecycle from upload through finalisation.
_Avoid_: import, batch, file upload, ingest
**ColumnMapping**:
The user's declaration of which spreadsheet column means what (e.g. column "Property Address" means `address_1`). Stored as JSON on the BulkUpload row.
_Avoid_: schema, header map, field mapping
**UPRN**:
Unique Property Reference Number — the UK national identifier for an address. Address matching attaches a UPRN to each row where possible.
**Address matching**:
The pipeline stage that splits the source file by postcode, looks up UPRNs, and produces matched-address output. Triggered via FastAPI.
_Avoid_: postcode lookup, address resolution, address lookup
**Combiner**:
The pipeline stage that aggregates the per-postcode address-matching outputs into a single combined CSV in S3, ready for review.
_Avoid_: aggregator, merger
**Finalise**:
The terminal action that reads the combiner output, inserts rows as Properties on the Portfolio, and decides whether the BulkUpload needs further review.
_Avoid_: import, commit, ingest
### Landlord overrides
**Landlord**:
The housing association supplying a Portfolio's BulkUploads. A Landlord knows facts about their properties that EPC data doesn't (e.g. that a cavity has been filled), and those facts take precedence when computing an assessment.
_Avoid_: customer, client, owner, organisation (Organisation is a separate, broader entity)
**Landlord override**:
A landlord-supplied fact about a property that takes precedence over EPC-derived defaults when computing an assessment. The end-to-end Landlord override journey has two layers — a **VocabularyMapping** layer (this glossary entry below) and a per-Property fact layer (not yet modelled).
_Avoid_: customer data, manual override, landlord data
**VocabularyMapping**:
The translation from a Landlord's free-text description in a BulkUpload column (e.g. `"cavity: filledcavity"`) to a canonical domain enum value (e.g. `WallType.CAVITY`). Produced by a `ColumnClassifier` (today an LLM, tomorrow possibly a lookup table or rules engine) in the Model service. Stored per-Portfolio, one row per `(category, description)`. A row carries provenance (`classifier` or `user`) so user overrides survive re-classification.
_Avoid_: column mapping (that's a separate concept — see `ColumnMapping` above), classification, dictionary
## Lifecycle
A **BulkUpload** moves through these statuses:
```
ready_for_processing
→ mapping_complete (user submits ColumnMapping; Next.js writes)
→ processing (Address matching triggered; Next.js writes)
→ combining (Combiner stage running; FastAPI writes directly)
→ awaiting_review (Combiner output in S3; FastAPI writes directly)
→ complete (Finalise succeeded; Next.js writes)
→ failed (FastAPI reports in-flight failure — schema only, not yet wired)
```
`complete` and `failed` are terminal.
Re-mapping (PATCHing `columnMapping`) is legal only in `ready_for_processing` and `mapping_complete`. Any later state rejects with 409.
**Two writers**: Next.js owns transitions out of `mapping_complete`, into `processing`, and the terminal Finalise outcomes. FastAPI owns `combining` and `awaiting_review` — writing them direct to the DB during the combiner run. The BulkUpload aggregate observes both.
See [ADR-0001](./docs/adr/0001-bulk-upload-state-machine.md) for the deliberate "not yet" decisions baked into this lifecycle.
## Relationships
- A **Portfolio** has many **BulkUploads**.
- A **BulkUpload** produces zero or more **Properties** when finalised.
- A **BulkUpload** has at most one **Task** (the orchestration handle for the FastAPI pipeline run); a Task has many **SubTasks** (one per pipeline stage: address matching, combiner).
- A **Portfolio** has many **VocabularyMappings** — one row per `(category, description)` it has ever encountered across all its BulkUploads. See [ADR-0002](./docs/adr/0002-landlord-override-vocabulary.md).
## Example dialogue
> **Dev:** "If the **Combiner** finishes but the user hasn't clicked Finalise, what does the user see?"
> **Domain expert:** "The BulkUpload sits in `awaiting_review`. The frontend polls and shows a 'review and confirm' button. Nothing's been written to **Properties** yet."
>
> **Dev:** "And if **Finalise** runs and 30% of rows have no **UPRN**?"
> **Domain expert:** "Those still get imported as **Properties** — just without a UPRN — and the BulkUpload moves to `complete`. Manual cleanup happens later in the property table."
## Flagged ambiguities
- "Upload" is used in the codebase to mean both the file-on-S3 and the BulkUpload row. We standardise on **BulkUpload** for the row; the file is just "the source file."
- "Onboarding" appears in some route paths (`bulk_onboarding_inputs/...`) but isn't part of this glossary — we use **BulkUpload** end-to-end.