mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
125 lines
6.8 KiB
Markdown
125 lines
6.8 KiB
Markdown
# ADR-0013: The `bulk_upload_finaliser` Lambda writes properties and the terminal status
|
|
|
|
**Status:** Accepted
|
|
**Date:** 2026-06-04
|
|
|
|
> Companion to the `assessment-model` (frontend) repo's
|
|
> [ADR-0005](https://github.com/Hestia-Homes/assessment-model/blob/main/docs/adr/0005-async-bulk-upload-finaliser.md),
|
|
> which owns the state-machine change and the Drizzle schema. This ADR owns the
|
|
> **backend write path**. It applies the direct-write pattern, transaction-boundary
|
|
> rule, and port/adapter/row-mirror file layout established by
|
|
> [ADR-0003](0003-python-writes-landlord-overrides-directly.md).
|
|
|
|
## Context
|
|
|
|
Finalising a BulkUpload inserts one `property` row per source row — up to ~40,000.
|
|
Today a synchronous Next.js route does it; frontend ADR-0005 moves it to a dispatched
|
|
Lambda for the same reasons ADR-0003 moved the classifier write into Python: the work
|
|
is internal (a Lambda computes the rows, the same Lambda persists them), the Lambda
|
|
already sits next to a Postgres connection, and routing 40k inserts back through an
|
|
HTTP hop buys nothing.
|
|
|
|
`PropertyRow` in this repo
|
|
([`infrastructure/postgres/property_table.py`](../../infrastructure/postgres/property_table.py))
|
|
is currently a **defensive read-only view** — its docstring states the backend never
|
|
inserts properties. Finalise changes that.
|
|
|
|
## Decision
|
|
|
|
1. **New application `applications/bulk_upload_finaliser/`** wraps a handler in
|
|
`@subtask_handler` (auto-injected `TaskOrchestrator`; `run_subtask` owns the
|
|
subtask start/complete/fail and the Task cascade), exactly like
|
|
`applications/landlord_description_overrides/handler.py`. The trigger body:
|
|
|
|
```python
|
|
class BulkUploadFinaliserTriggerBody(SubtaskTriggerBody): # task_id, sub_task_id
|
|
s3_uri: str # combiner output (combined_output_s3_uri)
|
|
portfolio_id: int
|
|
bulk_upload_id: UUID
|
|
```
|
|
|
|
Dispatched via a new `POST /v1/bulk-uploads/trigger-finaliser` FastAPI endpoint
|
|
(auth `validate_token`) that enqueues to a new SQS queue — mirroring
|
|
`trigger-postcode-splitter` / `trigger-landlord-overrides`.
|
|
|
|
2. **`PropertyRow` drops its "backend never inserts" invariant** and gains the
|
|
insertable columns finalise needs — the **exact nine** the frontend route writes
|
|
today: `portfolio_id`, `creation_status` (`'READY'`), `uprn`,
|
|
`landlord_property_id` (← `Internal Reference`), `address` (matched ?? user
|
|
input), `postcode`, `user_inputted_address`, `user_inputted_postcode`,
|
|
`lexiscore`. Drizzle remains the schema source of truth (ADR-0003); this is a
|
|
mirror change only.
|
|
|
|
3. **Insert policy lives in SQL, idempotent:**
|
|
|
|
```sql
|
|
INSERT INTO property (portfolio_id, creation_status, uprn, landlord_property_id,
|
|
address, postcode, user_inputted_address,
|
|
user_inputted_postcode, lexiscore)
|
|
VALUES …
|
|
ON CONFLICT (portfolio_id, uprn) WHERE uprn IS NOT NULL
|
|
DO NOTHING;
|
|
```
|
|
|
|
This reproduces the frontend's `onConflictDoNothing` exactly — existing properties
|
|
are not churned on a re-run.
|
|
|
|
4. **The Lambda writes the terminal BulkUpload status directly — DDD, on its own
|
|
session.** On success it sets `status='complete'`; on failure `status='failed'`.
|
|
Rather than the combiner's `backend/app/db/functions` (which open the legacy
|
|
`backend/` connection on the separate `DB_*` config), the finaliser stays
|
|
**PostgresConfig-only like the landlord classifier Lambda**: it writes status
|
|
through a DDD `BulkUploadStatusWriter` port (`repositories/bulk_upload/`) backed
|
|
by a minimal `BulkAddressUploadRow` mirror (`infrastructure/postgres/`), on the
|
|
*same* session. So a single DB config (`POSTGRES_*`) drives the run and the
|
|
image needs no `backend/`. The `complete` flip happens **in the same transaction
|
|
as the property insert** (atomic finalise — either both land or neither);
|
|
`failed` is written on a fresh session in the error path. Next.js no longer
|
|
writes `complete`; it owns only the `awaiting_review → finalising`
|
|
compare-and-swap at dispatch (frontend ADR-0005).
|
|
|
|
5. **DDD layering (ADR-0003 + the landlord precedent).** The
|
|
`orchestration/bulk_upload_finaliser_orchestrator.py` owns the whole Finalise
|
|
domain flow via a single `finalise(rows, portfolio_id, task_id)` that depends
|
|
**only on the two repository ports** — it resolves the combiner rows, inserts
|
|
the properties, and marks the upload `complete`, with no engine/session/DB of
|
|
its own. So it is unit-testable with two in-memory fakes
|
|
(`tests/orchestration/test_bulk_upload_finaliser_orchestrator.py`). The Lambda
|
|
`applications/bulk_upload_finaliser/handler.py` is the composition root: parse
|
|
trigger, read S3, build the engine/session and the concrete adapters, open the
|
|
transaction around `finalise`, and handle the failure path. New seams:
|
|
- Property insert: `insert_all` (+ the `PropertyIdentityInsert` DTO) is added to
|
|
the existing `PropertyRepository` port and its `PropertyPostgresRepository`
|
|
adapter — **one repository per aggregate**, reads and writes together, writing
|
|
the amended `infrastructure/postgres/property_table.py` mirror. `epc_repo` is
|
|
optional: the write path creates identity rows and never hydrates, so the
|
|
finaliser constructs the repo with a session alone.
|
|
- Status: port `repositories/bulk_upload/bulk_upload_status_writer.py`, adapter
|
|
`…_postgres.py`, writing the `infrastructure/postgres/bulk_address_upload_table.py`
|
|
mirror (a separate table → a separate repository).
|
|
- Transaction boundary stays in the `infrastructure/postgres/engine` helper
|
|
(`commit_scope`); the handler opens the context (once, around insert+complete),
|
|
never calls `.commit()`; orchestrator and repositories never commit.
|
|
|
|
6. **`property_overrides` is out of scope (v2).** The table exists (frontend
|
|
migration 0221) but this Lambda does not populate it in v1. When v2 lands it adds
|
|
a `PropertyOverrideRow` mirror + `infrastructure/property/property_overrides_postgres_repository.py`
|
|
using `onConflictDoUpdate` per frontend ADR-0005.
|
|
|
|
## Consequences
|
|
|
|
**Positive.**
|
|
- 40k inserts run in a Lambda with a single `sub_task` row to observe, not behind a
|
|
synchronous HTTP request.
|
|
- Reuses the `@subtask_handler` / SQS / direct-Postgres machinery — reviewers have
|
|
the classifier handler and the combiner status-writers as precedents.
|
|
|
|
**Negative.**
|
|
- **`PropertyRow` is no longer read-only.** The defensive invariant that made
|
|
accidental backend property writes impossible is gone; correctness now rests on the
|
|
finaliser being the only inserting caller.
|
|
- **Terminal-status ownership crosses the repo boundary.** The backend now writes a
|
|
BulkUpload status (`complete`/`failed`) that Next.js used to own — coordinated only
|
|
through the status column (frontend ADR-0005's "two writers, extended").
|
|
- The Drizzle/SQLModel lockstep risk from ADR-0003 now extends to `property`'s
|
|
insertable columns.
|