Model/docs/adr/0013-bulk-upload-finaliser-writes-properties.md
2026-06-04 14:50:04 +00:00

125 lines
6.8 KiB
Markdown

# ADR-0013: The `bulk_upload_finaliser` Lambda writes properties and the terminal status
**Status:** Accepted
**Date:** 2026-06-04
> Companion to the `assessment-model` (frontend) repo's
> [ADR-0005](https://github.com/Hestia-Homes/assessment-model/blob/main/docs/adr/0005-async-bulk-upload-finaliser.md),
> which owns the state-machine change and the Drizzle schema. This ADR owns the
> **backend write path**. It applies the direct-write pattern, transaction-boundary
> rule, and port/adapter/row-mirror file layout established by
> [ADR-0003](0003-python-writes-landlord-overrides-directly.md).
## Context
Finalising a BulkUpload inserts one `property` row per source row — up to ~40,000.
Today a synchronous Next.js route does it; frontend ADR-0005 moves it to a dispatched
Lambda for the same reasons ADR-0003 moved the classifier write into Python: the work
is internal (a Lambda computes the rows, the same Lambda persists them), the Lambda
already sits next to a Postgres connection, and routing 40k inserts back through an
HTTP hop buys nothing.
`PropertyRow` in this repo
([`infrastructure/postgres/property_table.py`](../../infrastructure/postgres/property_table.py))
is currently a **defensive read-only view** — its docstring states the backend never
inserts properties. Finalise changes that.
## Decision
1. **New application `applications/bulk_upload_finaliser/`** wraps a handler in
`@subtask_handler` (auto-injected `TaskOrchestrator`; `run_subtask` owns the
subtask start/complete/fail and the Task cascade), exactly like
`applications/landlord_description_overrides/handler.py`. The trigger body:
```python
class BulkUploadFinaliserTriggerBody(SubtaskTriggerBody): # task_id, sub_task_id
s3_uri: str # combiner output (combined_output_s3_uri)
portfolio_id: int
bulk_upload_id: UUID
```
Dispatched via a new `POST /v1/bulk-uploads/trigger-finaliser` FastAPI endpoint
(auth `validate_token`) that enqueues to a new SQS queue — mirroring
`trigger-postcode-splitter` / `trigger-landlord-overrides`.
2. **`PropertyRow` drops its "backend never inserts" invariant** and gains the
insertable columns finalise needs — the **exact nine** the frontend route writes
today: `portfolio_id`, `creation_status` (`'READY'`), `uprn`,
`landlord_property_id` (← `Internal Reference`), `address` (matched ?? user
input), `postcode`, `user_inputted_address`, `user_inputted_postcode`,
`lexiscore`. Drizzle remains the schema source of truth (ADR-0003); this is a
mirror change only.
3. **Insert policy lives in SQL, idempotent:**
```sql
INSERT INTO property (portfolio_id, creation_status, uprn, landlord_property_id,
address, postcode, user_inputted_address,
user_inputted_postcode, lexiscore)
VALUES …
ON CONFLICT (portfolio_id, uprn) WHERE uprn IS NOT NULL
DO NOTHING;
```
This reproduces the frontend's `onConflictDoNothing` exactly — existing properties
are not churned on a re-run.
4. **The Lambda writes the terminal BulkUpload status directly — DDD, on its own
session.** On success it sets `status='complete'`; on failure `status='failed'`.
Rather than the combiner's `backend/app/db/functions` (which open the legacy
`backend/` connection on the separate `DB_*` config), the finaliser stays
**PostgresConfig-only like the landlord classifier Lambda**: it writes status
through a DDD `BulkUploadStatusWriter` port (`repositories/bulk_upload/`) backed
by a minimal `BulkAddressUploadRow` mirror (`infrastructure/postgres/`), on the
*same* session. So a single DB config (`POSTGRES_*`) drives the run and the
image needs no `backend/`. The `complete` flip happens **in the same transaction
as the property insert** (atomic finalise — either both land or neither);
`failed` is written on a fresh session in the error path. Next.js no longer
writes `complete`; it owns only the `awaiting_review → finalising`
compare-and-swap at dispatch (frontend ADR-0005).
5. **DDD layering (ADR-0003 + the landlord precedent).** The
`orchestration/bulk_upload_finaliser_orchestrator.py` owns the whole Finalise
domain flow via a single `finalise(rows, portfolio_id, task_id)` that depends
**only on the two repository ports** — it resolves the combiner rows, inserts
the properties, and marks the upload `complete`, with no engine/session/DB of
its own. So it is unit-testable with two in-memory fakes
(`tests/orchestration/test_bulk_upload_finaliser_orchestrator.py`). The Lambda
`applications/bulk_upload_finaliser/handler.py` is the composition root: parse
trigger, read S3, build the engine/session and the concrete adapters, open the
transaction around `finalise`, and handle the failure path. New seams:
- Property insert: `insert_all` (+ the `PropertyIdentityInsert` DTO) is added to
the existing `PropertyRepository` port and its `PropertyPostgresRepository`
adapter — **one repository per aggregate**, reads and writes together, writing
the amended `infrastructure/postgres/property_table.py` mirror. `epc_repo` is
optional: the write path creates identity rows and never hydrates, so the
finaliser constructs the repo with a session alone.
- Status: port `repositories/bulk_upload/bulk_upload_status_writer.py`, adapter
`…_postgres.py`, writing the `infrastructure/postgres/bulk_address_upload_table.py`
mirror (a separate table → a separate repository).
- Transaction boundary stays in the `infrastructure/postgres/engine` helper
(`commit_scope`); the handler opens the context (once, around insert+complete),
never calls `.commit()`; orchestrator and repositories never commit.
6. **`property_overrides` is out of scope (v2).** The table exists (frontend
migration 0221) but this Lambda does not populate it in v1. When v2 lands it adds
a `PropertyOverrideRow` mirror + `infrastructure/property/property_overrides_postgres_repository.py`
using `onConflictDoUpdate` per frontend ADR-0005.
## Consequences
**Positive.**
- 40k inserts run in a Lambda with a single `sub_task` row to observe, not behind a
synchronous HTTP request.
- Reuses the `@subtask_handler` / SQS / direct-Postgres machinery — reviewers have
the classifier handler and the combiner status-writers as precedents.
**Negative.**
- **`PropertyRow` is no longer read-only.** The defensive invariant that made
accidental backend property writes impossible is gone; correctness now rests on the
finaliser being the only inserting caller.
- **Terminal-status ownership crosses the repo boundary.** The backend now writes a
BulkUpload status (`complete`/`failed`) that Next.js used to own — coordinated only
through the status column (frontend ADR-0005's "two writers, extended").
- The Drizzle/SQLModel lockstep risk from ADR-0003 now extends to `property`'s
insertable columns.