6.8 KiB
ADR-0013: The bulk_upload_finaliser Lambda writes properties and the terminal status
Status: Accepted Date: 2026-06-04
Companion to the
assessment-model(frontend) repo's ADR-0005, which owns the state-machine change and the Drizzle schema. This ADR owns the backend write path. It applies the direct-write pattern, transaction-boundary rule, and port/adapter/row-mirror file layout established by ADR-0003.
Context
Finalising a BulkUpload inserts one property row per source row — up to ~40,000.
Today a synchronous Next.js route does it; frontend ADR-0005 moves it to a dispatched
Lambda for the same reasons ADR-0003 moved the classifier write into Python: the work
is internal (a Lambda computes the rows, the same Lambda persists them), the Lambda
already sits next to a Postgres connection, and routing 40k inserts back through an
HTTP hop buys nothing.
PropertyRow in this repo
(infrastructure/postgres/property_table.py)
is currently a defensive read-only view — its docstring states the backend never
inserts properties. Finalise changes that.
Decision
-
New application
applications/bulk_upload_finaliser/wraps a handler in@subtask_handler(auto-injectedTaskOrchestrator;run_subtaskowns the subtask start/complete/fail and the Task cascade), exactly likeapplications/landlord_description_overrides/handler.py. The trigger body:class BulkUploadFinaliserTriggerBody(SubtaskTriggerBody): # task_id, sub_task_id s3_uri: str # combiner output (combined_output_s3_uri) portfolio_id: int bulk_upload_id: UUIDDispatched via a new
POST /v1/bulk-uploads/trigger-finaliserFastAPI endpoint (authvalidate_token) that enqueues to a new SQS queue — mirroringtrigger-postcode-splitter/trigger-landlord-overrides. -
PropertyRowdrops its "backend never inserts" invariant and gains the insertable columns finalise needs — the exact nine the frontend route writes today:portfolio_id,creation_status('READY'),uprn,landlord_property_id(←Internal Reference),address(matched ?? user input),postcode,user_inputted_address,user_inputted_postcode,lexiscore. Drizzle remains the schema source of truth (ADR-0003); this is a mirror change only. -
Insert policy lives in SQL, idempotent:
INSERT INTO property (portfolio_id, creation_status, uprn, landlord_property_id, address, postcode, user_inputted_address, user_inputted_postcode, lexiscore) VALUES … ON CONFLICT (portfolio_id, uprn) WHERE uprn IS NOT NULL DO NOTHING;This reproduces the frontend's
onConflictDoNothingexactly — existing properties are not churned on a re-run. -
The Lambda writes the terminal BulkUpload status directly — DDD, on its own session. On success it sets
status='complete'; on failurestatus='failed'. Rather than the combiner'sbackend/app/db/functions(which open the legacybackend/connection on the separateDB_*config), the finaliser stays PostgresConfig-only like the landlord classifier Lambda: it writes status through a DDDBulkUploadStatusWriterport (repositories/bulk_upload/) backed by a minimalBulkAddressUploadRowmirror (infrastructure/postgres/), on the same session. So a single DB config (POSTGRES_*) drives the run and the image needs nobackend/. Thecompleteflip happens in the same transaction as the property insert (atomic finalise — either both land or neither);failedis written on a fresh session in the error path. Next.js no longer writescomplete; it owns only theawaiting_review → finalisingcompare-and-swap at dispatch (frontend ADR-0005). -
DDD layering (ADR-0003 + the landlord precedent). The
orchestration/bulk_upload_finaliser_orchestrator.pyowns the whole Finalise domain flow via a singlefinalise(rows, portfolio_id, task_id)that depends only on the two repository ports — it resolves the combiner rows, inserts the properties, and marks the uploadcomplete, with no engine/session/DB of its own. So it is unit-testable with two in-memory fakes (tests/orchestration/test_bulk_upload_finaliser_orchestrator.py). The Lambdaapplications/bulk_upload_finaliser/handler.pyis the composition root: parse trigger, read S3, build the engine/session and the concrete adapters, open the transaction aroundfinalise, and handle the failure path. New seams:- Property insert:
insert_all(+ thePropertyIdentityInsertDTO) is added to the existingPropertyRepositoryport and itsPropertyPostgresRepositoryadapter — one repository per aggregate, reads and writes together, writing the amendedinfrastructure/postgres/property_table.pymirror.epc_repois optional: the write path creates identity rows and never hydrates, so the finaliser constructs the repo with a session alone. - Status: port
repositories/bulk_upload/bulk_upload_status_writer.py, adapter…_postgres.py, writing theinfrastructure/postgres/bulk_address_upload_table.pymirror (a separate table → a separate repository). - Transaction boundary stays in the
infrastructure/postgres/enginehelper (commit_scope); the handler opens the context (once, around insert+complete), never calls.commit(); orchestrator and repositories never commit.
- Property insert:
-
property_overridesis out of scope (v2). The table exists (frontend migration 0221) but this Lambda does not populate it in v1. When v2 lands it adds aPropertyOverrideRowmirror +infrastructure/property/property_overrides_postgres_repository.pyusingonConflictDoUpdateper frontend ADR-0005.
Consequences
Positive.
- 40k inserts run in a Lambda with a single
sub_taskrow to observe, not behind a synchronous HTTP request. - Reuses the
@subtask_handler/ SQS / direct-Postgres machinery — reviewers have the classifier handler and the combiner status-writers as precedents.
Negative.
PropertyRowis no longer read-only. The defensive invariant that made accidental backend property writes impossible is gone; correctness now rests on the finaliser being the only inserting caller.- Terminal-status ownership crosses the repo boundary. The backend now writes a
BulkUpload status (
complete/failed) that Next.js used to own — coordinated only through the status column (frontend ADR-0005's "two writers, extended"). - The Drizzle/SQLModel lockstep risk from ADR-0003 now extends to
property's insertable columns.