save plans for landlord overrid

This commit is contained in:
Jun-te Kim 2026-05-27 10:13:59 +00:00
parent 899ac432ea
commit 4981b0ee12

View file

@ -1,13 +1,14 @@
# Landlord override frontend — in-flight design notes
**Status:** Paused mid-grilling (2026-05-27)
**Branch:** `feautre/additional_walltypes`
**Status:** Grilling complete (2026-05-28) — Q1Q7 resolved; ready to promote to ADR
**Branch:** `feature/frontend_landlord_overrides`
**Author:** Jun-te (with Claude, via `/grill-me`)
This is a *design-in-progress* document, not an ADR. It captures decisions made
so far on the landlord-override frontend plan so the conversation can resume
without re-litigating settled questions. Promote to an ADR once the trigger
mechanism (Q4) is resolved — that's the decision worth permanent recording.
This is a _design-in-progress_ document, not an ADR. It captures the decisions
reached during grilling so the conversation can resume without re-litigating
settled questions. All seven questions are now resolved; the trigger +
state-machine integration (Q4) is ready to promote to a frontend ADR-0003, and
the work is ready to break into issues.
## Goal
@ -23,14 +24,22 @@ Build the front-end e2e for `landlord_description_override`, starting from the
rationale in [ADR-0002](../adr/0002-landlord-override-vocabulary.md).
- Nothing in Next.js reads or writes them yet.
- The Python lambda at
`/workspaces/home/github/Model/applications/landlord_description_overrides/handler.py`
is **not deployed** and **not wired** into the BulkUpload pipeline. It
hardcodes its trigger params (`portfolio_id`, `s3_uri`) and its source column
names (`"Property Type"`, `"Walls"`, `"Roofs"`).
- Note: ADR-0002 says writes come from Next.js POST, but the current backend
writes direct to Postgres. This drift may need to be revisited under Q4.
`Model/applications/landlord_description_overrides/handler.py` (branch
`feature/landlord_data`) is **not deployed** and **not wired** into the
BulkUpload pipeline. It hardcodes the trigger fields (handler.py:55-60) and
builds a hardcoded `ClassifiableColumn` list with `source_column`s
`"Property Type"` (→ both `property_type` **and** `built_form_type`),
`"Walls"`, `"Roofs"`. It also caps the batch to 20 rows while under test.
- **Resolved drift (was a Q4 risk):** the backend's
[ADR-0003 (python-writes-landlord-overrides-directly)](https://github.com/Hestia-Homes/Model/blob/main/docs/adr/0003-python-writes-landlord-overrides-directly.md),
accepted 2026-05-26, **supersedes** ADR-0002's "writes happen from Next.js"
clause. The lambda computes **and** persists the classification directly to
Postgres via a SQLAlchemy `LandlordOverrideRepository[E]`, with an
`ON CONFLICT … WHERE source = 'classifier'` upsert. There is **no** Next.js
POST-back. Drizzle stays schema source-of-truth; the Python `SQLModel`
shadows it.
## Decided so far
## Decided
### Q1 — Scope
@ -41,8 +50,7 @@ to the lambda → lambda needs edits to work when deployed.
### Q2 — Categories
**All four classifier categories** get independent optional slots in
[`INTERNAL_FIELDS`](../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/map-columns/MapColumnsClient.tsx#L14-L21):
**All four classifier categories** are independent optional fields:
`property_type`, `built_form_type`, `wall_type`, `roof_type`.
Rejected alternatives: (a) start with only PT+BF — wasted plumbing churn for
@ -50,58 +58,118 @@ the same migration cost; (c) collapse PT+BF into one UI slot — bakes a
backend coincidence (they read the same CSV column today) into the
user-facing model.
### Q2.1 — No `autoDetect` for the new slots
### Q2.1 — No `autoDetect` for classifier fields
The four new slots default to `"skip"`. The user must explicitly map them.
`autoDetect()` regex patterns are for required address-ish fields only.
The four classifier fields default to unmapped ("Not provided"). The user must
explicitly map them. `autoDetect()` is for required address-ish fields only.
**Why:** Address headers are unambiguous and required, so guessing is safe and
useful. Landlord-description columns are ambiguous (a "type" column could be
PropertyType or BuiltFormType or something else) and they are optional —
useful. Landlord-description columns are ambiguous and optional —
auto-detecting them would silently opt the landlord into classifier runs they
didn't intend.
## Open — resume here
### Q2.2 — Mapping shape: unified `field → header` (refines Q2's mechanism)
### Q3 (in flight) — Uniqueness validation on the mapping
The mapping is **one unified map keyed by internal field, valued by source CSV
header** (`{ address_1: "Addr 1", property_type: "Property Type",
built_form_type: "Property Type", wall_type: "Walls", … }`), replacing the
current `header → field` shape
([MapColumnsClient.tsx:62-64](<../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/map-columns/MapColumnsClient.tsx#L62-L64>)).
Today validation only checks required fields exist
([MapColumnsClient.tsx:67-68](../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/map-columns/MapColumnsClient.tsx#L67-L68));
two CSV headers can both map to `address_1` silently.
**Why:** the backend's `ClassifiableColumn` is a `(name, source_column)` pair
where **one `source_column` feeds many `name`s**`"Property Type"` feeds both
`property_type` and `built_form_type`
([classifiable_column.py:28-31](https://github.com/Hestia-Homes/Model/blob/feature/landlord_data/orchestration/classifiable_column.py)).
A `header → field` map (one value per header) **cannot express** that; the
chosen mechanism in Q2 (extending the single `INTERNAL_FIELDS` dropdown) would
leave `built_form_type` unmappable. `field → header` matches the backend model
1:1 and lets multiple categories share a header.
- (a) Leave it alone — backend last-wins.
- (b) Enforce uniqueness only on the four new slots.
- (c) Enforce uniqueness everywhere except `skip`. **Recommended.**
**Consequences:** UI inverts to one row per internal field with a header-picker;
`autoDetect` inverts to "best header per field"; "skip" becomes a per-field
"Not provided"; rewrite
[`transformFile`](<../../src/app/api/portfolio/[portfolioId]/bulk-uploads/[uploadId]/start-address-matching/route.ts#L17-L54>)
(currently iterates `header → field`), the `z.record` route schema, and
**migrate existing persisted `columnMapping` rows** (semantics flip) — see Q5.
### Q4 (queued — biggest) — Trigger mechanism
### Q3 — Sharing / uniqueness rule (reframed)
How does Next.js invoke the lambda once mapping is complete? SQS message?
Direct lambda invoke? HTTP endpoint? And what's the state-machine integration —
new `BulkUpload` status, or runs orthogonally to address matching?
A source header may feed **at most one address/reference field**, but **any
number of classifier fields** (PT + BF → `"Property Type"` is required, so
classifier sharing must be allowed). Required-field validation stays:
`address_1` and `postcode` must each be assigned a header.
This drives both the deployment work and the lambda edits. Likely worth its
own ADR once decided.
This **replaces** the original "enforce uniqueness everywhere except skip"
framing, which was wrong once classifier header-sharing became a requirement.
### Q5 (queued) — Persistence of the extended mapping
### Q4 — Trigger mechanism + state-machine integration (the big one)
Current `bulkAddressUploads.columnMapping` is a `Record<string, string>` and
naturally accommodates the new slots. Confirm no separate table is needed.
- **Transport:** reuse
[`triggerFastApiPipeline`](../../src/lib/bulkUpload/server.ts) with a **new
FastAPI endpoint**; payload `{ task_id, sub_task_id, s3_uri, portfolio_id,
column_mapping }`. FastAPI turns the POST into the SQS subtask envelope the
handler TODO (handler.py:51-54) references. Not a direct lambda invoke, not
Next.js → SQS.
- **`s3_uri` is the ORIGINAL upload** (`upload.s3Bucket/s3Key`), **not** the
address-matching transformed CSV — the description columns and their original
header names only survive in the original (the address transform strips every
non-address column).
- **Writes:** lambda writes overrides directly to Postgres (ADR-0003); no
POST-back.
- **State machine:** the classifier runs as a **subtask under the same address
task** (not a separate task, not a new `BulkUpload` status). Both subtasks
fire together at the **"Start address matching"** action. Safe: the combiner
globs S3 `ara_raw_outputs/{task_id}/`, which the classifier never writes to,
so the combined address output is not affected; `subtask_handler` does no
sibling-completion gating.
- **Progress honesty:** add a nullable **`service`/`kind` discriminator to
`sub_task`** (existing rows = address/legacy) so the progress view shows
address batches vs classification separately and attributes failures
correctly. Update the 3 subtask-count sites + `OnboardingProgress`.
### Q6 (queued) — Lambda edits
### Q5 — Persistence + migration of the inverted mapping
Handler hardcodes `source_column="Property Type" / "Walls" / "Roofs"`; needs
to read the mapping from the trigger body.
`LandlordDescriptionOverridesTriggerBody` already exists — check what fields
it has vs needs.
Reuse `bulkAddressUploads.columnMapping` (jsonb `Record<string, string>`); **no
separate table**. The Q2.2 inversion (`header → field``field → header`) is
handled by a **one-shot data migration**: invert each non-skip entry; on a
legacy duplicate (two headers → one field, which the new Q3 rule forbids
anyway) last-write-wins; `skip` entries drop. The migration is a no-op on empty
tables, so it's safe regardless of data volume. `validateMapping`
([server.ts:97-102](../../src/lib/bulkUpload/server.ts#L97-L102)) and the route
schema must be rewritten for the inverted shape (they currently check the
**values** for `address_1`/`postcode`).
### Q7 (queued) — Review/Edit UI for classified mappings
### Q6 — Lambda edits
Is the per-row review/edit surface in scope for this iteration, or deferred?
User has not addressed yet. ADR-0002 calls this "the future override
frontend" and treats it as deferred work — but "front end e2e from
bulk_upload" could reasonably include it.
`LandlordDescriptionOverridesTriggerBody` (task_id, sub_task_id, s3_uri,
portfolio_id; `extra="allow"`) gains a **`column_mapping: dict[str, str]`**
field carrying **only the classifier subset** (`category → source header`); the
frontend extracts the four classifier keys from its unified map before sending.
The handler keeps a fixed registry of the four category builders (each owns its
enum, repo, and any hint such as the wall construction-date one) and supplies
`source_column` from `column_mapping`, **skipping categories the user didn't
map**. Both `property_type` and `built_form_type` carry `"Property Type"`, so
PT+BF sharing falls out for free. Drop the hardcoded trigger and the 20-row cap.
## Resuming
### Q7 — Results UI: read-only this iteration
Re-read this file, then ask Q3. Don't re-litigate Q1/Q2 unless the user
reopens them.
This iteration ships a **read-only** results surface: it reads the four
`landlord_*_overrides` tables and shows each `description → value` (per
portfolio) so the pipeline is observably e2e. **No editing / write-back** — the
user-override write path (correcting a row to `source='user'`, which the
classifier upsert won't overwrite) is the deferred "future override frontend"
ADR-0002 anticipates.
## Next steps
Grilling is complete (Q1Q7). Suggested follow-ups:
1. Promote Q4 (trigger + state-machine integration) to a **frontend ADR-0003**;
cross-link the backend's ADR-0003 (which already superseded ADR-0002's
Next.js-writes clause).
2. Break the work into issues (`/to-issues`): (a) invert the mapping shape + UI
+ migration + validation; (b) `sub_task` discriminator + progress view;
(c) classifier trigger (new FastAPI endpoint + payload, fire at "Start");
(d) lambda edits (trigger body + registry-from-mapping); (e) read-only
results view.
3. Confirm the working branch (see header) before implementation starts.