Model/docs/adr/0013-bulk-upload-finaliser-writes-properties.md
2026-06-04 14:50:04 +00:00

6.8 KiB

ADR-0013: The bulk_upload_finaliser Lambda writes properties and the terminal status

Status: Accepted Date: 2026-06-04

Companion to the assessment-model (frontend) repo's ADR-0005, which owns the state-machine change and the Drizzle schema. This ADR owns the backend write path. It applies the direct-write pattern, transaction-boundary rule, and port/adapter/row-mirror file layout established by ADR-0003.

Context

Finalising a BulkUpload inserts one property row per source row — up to ~40,000. Today a synchronous Next.js route does it; frontend ADR-0005 moves it to a dispatched Lambda for the same reasons ADR-0003 moved the classifier write into Python: the work is internal (a Lambda computes the rows, the same Lambda persists them), the Lambda already sits next to a Postgres connection, and routing 40k inserts back through an HTTP hop buys nothing.

PropertyRow in this repo (infrastructure/postgres/property_table.py) is currently a defensive read-only view — its docstring states the backend never inserts properties. Finalise changes that.

Decision

  1. New application applications/bulk_upload_finaliser/ wraps a handler in @subtask_handler (auto-injected TaskOrchestrator; run_subtask owns the subtask start/complete/fail and the Task cascade), exactly like applications/landlord_description_overrides/handler.py. The trigger body:

    class BulkUploadFinaliserTriggerBody(SubtaskTriggerBody):  # task_id, sub_task_id
        s3_uri: str        # combiner output (combined_output_s3_uri)
        portfolio_id: int
        bulk_upload_id: UUID
    

    Dispatched via a new POST /v1/bulk-uploads/trigger-finaliser FastAPI endpoint (auth validate_token) that enqueues to a new SQS queue — mirroring trigger-postcode-splitter / trigger-landlord-overrides.

  2. PropertyRow drops its "backend never inserts" invariant and gains the insertable columns finalise needs — the exact nine the frontend route writes today: portfolio_id, creation_status ('READY'), uprn, landlord_property_id (← Internal Reference), address (matched ?? user input), postcode, user_inputted_address, user_inputted_postcode, lexiscore. Drizzle remains the schema source of truth (ADR-0003); this is a mirror change only.

  3. Insert policy lives in SQL, idempotent:

    INSERT INTO property (portfolio_id, creation_status, uprn, landlord_property_id,
                          address, postcode, user_inputted_address,
                          user_inputted_postcode, lexiscore)
    VALUES 
    ON CONFLICT (portfolio_id, uprn) WHERE uprn IS NOT NULL
    DO NOTHING;
    

    This reproduces the frontend's onConflictDoNothing exactly — existing properties are not churned on a re-run.

  4. The Lambda writes the terminal BulkUpload status directly — DDD, on its own session. On success it sets status='complete'; on failure status='failed'. Rather than the combiner's backend/app/db/functions (which open the legacy backend/ connection on the separate DB_* config), the finaliser stays PostgresConfig-only like the landlord classifier Lambda: it writes status through a DDD BulkUploadStatusWriter port (repositories/bulk_upload/) backed by a minimal BulkAddressUploadRow mirror (infrastructure/postgres/), on the same session. So a single DB config (POSTGRES_*) drives the run and the image needs no backend/. The complete flip happens in the same transaction as the property insert (atomic finalise — either both land or neither); failed is written on a fresh session in the error path. Next.js no longer writes complete; it owns only the awaiting_review → finalising compare-and-swap at dispatch (frontend ADR-0005).

  5. DDD layering (ADR-0003 + the landlord precedent). The orchestration/bulk_upload_finaliser_orchestrator.py owns the whole Finalise domain flow via a single finalise(rows, portfolio_id, task_id) that depends only on the two repository ports — it resolves the combiner rows, inserts the properties, and marks the upload complete, with no engine/session/DB of its own. So it is unit-testable with two in-memory fakes (tests/orchestration/test_bulk_upload_finaliser_orchestrator.py). The Lambda applications/bulk_upload_finaliser/handler.py is the composition root: parse trigger, read S3, build the engine/session and the concrete adapters, open the transaction around finalise, and handle the failure path. New seams:

    • Property insert: insert_all (+ the PropertyIdentityInsert DTO) is added to the existing PropertyRepository port and its PropertyPostgresRepository adapter — one repository per aggregate, reads and writes together, writing the amended infrastructure/postgres/property_table.py mirror. epc_repo is optional: the write path creates identity rows and never hydrates, so the finaliser constructs the repo with a session alone.
    • Status: port repositories/bulk_upload/bulk_upload_status_writer.py, adapter …_postgres.py, writing the infrastructure/postgres/bulk_address_upload_table.py mirror (a separate table → a separate repository).
    • Transaction boundary stays in the infrastructure/postgres/engine helper (commit_scope); the handler opens the context (once, around insert+complete), never calls .commit(); orchestrator and repositories never commit.
  6. property_overrides is out of scope (v2). The table exists (frontend migration 0221) but this Lambda does not populate it in v1. When v2 lands it adds a PropertyOverrideRow mirror + infrastructure/property/property_overrides_postgres_repository.py using onConflictDoUpdate per frontend ADR-0005.

Consequences

Positive.

  • 40k inserts run in a Lambda with a single sub_task row to observe, not behind a synchronous HTTP request.
  • Reuses the @subtask_handler / SQS / direct-Postgres machinery — reviewers have the classifier handler and the combiner status-writers as precedents.

Negative.

  • PropertyRow is no longer read-only. The defensive invariant that made accidental backend property writes impossible is gone; correctness now rests on the finaliser being the only inserting caller.
  • Terminal-status ownership crosses the repo boundary. The backend now writes a BulkUpload status (complete/failed) that Next.js used to own — coordinated only through the status column (frontend ADR-0005's "two writers, extended").
  • The Drizzle/SQLModel lockstep risk from ADR-0003 now extends to property's insertable columns.