Model/docs/adr/0003-python-writes-landlord-overrides-directly.md
2026-06-01 15:46:46 +00:00

7.4 KiB

ADR-0003: Python writes landlord overrides directly to Postgres

Status: Accepted Date: 2026-05-26 Supersedes (in part): assessment-model/docs/adr/0002-landlord-override-vocabulary.md — specifically the clause beginning "Writes happen from Next.js …".

Context

ADR-0002 (in the assessment-model TS repo) defined the landlord_property_type_overrides and landlord_wall_type_overrides tables and noted that the Model service would POST classification results to a Next.js route handler, with Next.js performing the upsert. Drizzle remained the schema source of truth.

That extra hop has not been built and is now judged unnecessary for the present scope:

  • The classification result is internal — a Lambda computes it, the same Lambda persists it. No third party needs to participate in the write.
  • Drizzle remains the schema's source of truth either way: the Python adapter mirrors the schema in a SQLModel row, but the migrations stay with Drizzle. Adding a Next.js route would not change which side owns schema definition.
  • The Python lambda already lives next to a Postgres connection in the existing pipeline (subtask/tasks tables are written from Python today). Adding two more tables to that adapter surface is a small, well-understood change. Routing the same writes through Next.js would mean: lambda → JSON-over-HTTP → Next.js route → Drizzle → Postgres, instead of lambda → SQLAlchemy → Postgres. Three extra moving parts to ship, deploy, monitor, and authenticate for no behavioural gain.

Decision

The Model service (specifically applications/landlord_description_overrides/handler.py) writes directly to landlord_property_type_overrides and landlord_wall_type_overrides via a SQLAlchemy-backed LandlordOverrideRepository[E] adapter. No Next.js route handler is required.

Transaction boundaries live in infrastructure/postgres/engine.transactional_session — a context manager that commits on clean exit and rolls back on exception. The application layer (handler.py) never calls .commit() or .rollback() itself; it only opens the context. Orchestration and repository code likewise never commits — keeping transaction semantics confined to one infrastructure helper.

The conflict policy lives in SQL and is identical for every override category. A single generic adapter, LandlordOverridesRepository[E], implements it once; the target table is selected by the SQLModel …Row class passed at construction. Each category (property / built-form / wall / roof type) is that same adapter parameterised by its row class:

INSERT INTO landlord_property_type_overrides (portfolio_id, description, value, source)
VALUES 
ON CONFLICT (portfolio_id, description)
DO UPDATE SET value = EXCLUDED.value,
              source = EXCLUDED.source,
              updated_at = now()
WHERE landlord_property_type_overrides.source = 'classifier';

The WHERE existing.source = 'classifier' guard is load-bearing: it lets the classifier refresh its own past output while leaving source = 'user' rows untouched. This is the contract ADR-0002's source column was added for.

UNKNOWN values are persisted, not skipped — consistent with ADR-0002 §5. A future user override can upgrade them.

Consequences

Positive.

  • One fewer service to deploy, monitor, and authenticate.
  • The classifier and persistence live in the same process — failures surface against a single sub_task row, not split across two systems.
  • The Postgres adapter mirrors the existing subtask/tasks repositories, so reviewers have a precedent to compare against.

Negative.

  • The Python repo now holds two schemas — the schema-source-of-truth Drizzle definition lives in the TS repo, and the Python SQLModel row class shadows it. They must stay in lockstep. Mitigations: the TS schema header comment (landlord_overrides.ts:12) already names the Python source-of-truth file; a future ADR may add a CI check that diffs the two.
  • The boundary that ADR-0002 anticipated for pgEnum validation (a Next.js route validating incoming values before insert) is gone. Pydantic + the Python Enum type catch invalid values on the producing side, and Postgres's pgEnum will reject anything that slips through.

File layout

This ADR also fixes a placement convention for Postgres adapters going forward. The codebase currently has the ChatGPT classifier split cleanly along DDD lines — port in domain/, adapter in infrastructure/chatgpt/ — but the tasks Postgres adapter does not follow the same shape: its concrete class lives in repositories/tasks/, not infrastructure/postgres/.

The convention going forward separates the persistence behaviour (grouped by aggregate) from the schema mirrors (grouped by technology, since they share pgEnums and engine metadata):

  • Port (protocol / abstract base): repositories/<aggregate>/<thing>_repository.py
  • Postgres repository adapter (concrete): infrastructure/<aggregate>/<aggregate>_postgres_repository.py
  • SQLModel row class (table=True schema mirror): infrastructure/postgres/<thing>_table.py

The LandlordOverridesRepository adapter follows this convention: the concrete class — the aggregate's "talker" to Postgres — lives at infrastructure/landlord_overrides/landlord_overrides_postgres_repository.py, while the per-category …Row classes stay in infrastructure/postgres/. The …Row classes are one-per-table — each mirrors a genuinely distinct Drizzle table and value pgEnum, and they share the single override_source pgEnum instance, so they belong together in the Postgres technology bucket as schema mirrors, not duplicated logic.

(This refines the placement first sketched in this ADR, which put the adapter in infrastructure/postgres/ alongside the row classes. The adapter holds no schema — only the write path — so it groups by aggregate; only the table=True mirrors stay tech-bucketed.)

Existing outliers to relocate in a follow-up:

  • repositories/tasks/task_postgres_repository.pyinfrastructure/tasks/task_postgres_repository.py
  • repositories/tasks/subtask_postgres_repository.pyinfrastructure/tasks/subtask_postgres_repository.py

(Their task_table.py / subtask_table.py schema mirrors already sit correctly in infrastructure/postgres/.) Both moves are mechanical (import-path updates only). They are intentionally out of scope for the present PR.

Out of scope (deferred to follow-up work)

  • Relocating task_postgres_repository.py and subtask_postgres_repository.py into infrastructure/tasks/ per the convention above.
  • Extracting a shared upsert helper / base class once a third landlord_*_overrides column lands — until then the per-category adapters' 95%-identical bodies are kept side-by-side for direct comparison. Done. The per-category adapter bodies were byte-identical (varying only in their row class), so they were consolidated into one generic LandlordOverridesRepository[E] parameterised by row class rather than waiting for a third column.
  • Switching applications/landlord_description_overrides/handler.py to acquire its Session via a @subtask_handler()-style decorator instead of building its own engine.
  • A cross-repo PR amending ADR-0002 to point at this ADR.
  • A CI check (or codegen) that diffs the Drizzle pgEnum literals against the Python Enum.value strings.