assessment-model/docs/wip/multi-entry-ordering-plan.md
Jun-te Kim 65a5f57321 Confirm building-part ordering on awaiting_review (#297)
Adds the multiEntryOrdering jsonb column + interactive order picker: the
largest-count multi-entry sample is shown with a building-part dropdown per
file position (one Main building + Extensions), validated as a permutation.
A PATCH route persists { count: permutation } + confirmed, and Finalise is
disabled until the ordering is confirmed when the upload is multi-entry.

Migration for the new column is intentionally not included here — generated
after origin/main is merged (its multi_entry_summary migration lands first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 09:38:48 +00:00

178 lines
9.3 KiB
Markdown

# Multi-entry building-part ordering — in-flight design notes
**Status:** Grilling complete (2026-06-02) — ready to break into issues
**Branch:** `feature/frontend_landlord_overrides`
**Author:** Jun-te (with Claude, via `/grill-me`)
A _design-in-progress_ document, not the ADR. It records the decisions reached
during grilling so the conversation can resume without re-litigating settled
questions. The flow + schema decision is promoted to
[ADR-0004](../adr/0004-multi-entry-building-part-ordering.md); new domain terms
are promoted to [CONTEXT.md](../../CONTEXT.md#building-parts).
## Goal
After address matching and classification finish, a single address row can carry
**comma-separated entries** in physical-element columns — e.g.
`Walls = "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity"`,
`Roofs = "Flat: As Built, PitchedNormalLoftAccess: 200mm"`. Each entry is a
**building part** (main building + extensions). The order is ambiguous and a
**consistent per-file mistake**, so we capture the correct ordering from the user
**once per file** and persist it on the BulkUpload for a later consumer.
## Backstory / ground truth (verified against the example file + code)
- In `ARA AddressProfiling_Download_28-04-2026_10501 (2).xlsx` (32,213 data
rows): **0 UPRNs appear in more than one row** — multi-entry is
comma-separated values **inside one cell**, never multiple rows per address.
- In a multi-entry row the multi-valued columns **agree on count** (Walls=2 ∧
Roofs=2) while whole-dwelling columns stay at 1 (`Property Type` = `"House:
EndTerrace"`). So position *i* is the **same building part across every
multi-valued column**.
- The classifier today **discards** this: [`get_col_to_description_mappings`](/workspaces/home/github/Model/orchestration/landlord_description_overrides_orchestrator.py)
does `value.split(",")` into a **`set`** — orderless, deduped. Correct for the
vocabulary layer (description→enum), but it drops exactly the
position/building-part association this feature needs.
- This is the **per-Property building-part fact** territory ADR-0002 deferred
("a per-Property fact layer (not yet modelled)"). We are **not** building that
layer here — only **capturing** the ordering it will need.
## Decided
### Q1 — Order semantics: full reorder, keyed by count
Position *i* = a building part. The user supplies a **permutation per distinct
entry-count**; persisted as `{ count: permutation }`. This iteration captures
only the **largest-count** sample (see Q5).
### Q1.1 — Order scope: one ordering across all columns
A single per-count permutation realigns **every** multi-valued column at once
(index-aligned — Walls[i] and Roofs[i] are the same part). Not per-column.
Matches the data (counts agree across columns).
### Q1.2 — Mixed counts: single-value columns are whole-property
A 1-entry column (e.g. `Property Type`) is a **whole-dwelling** fact attached to
the property; only columns with N>1 are sliced into building parts. No padding.
### Q2 — Scope: capture + persist ordering only
Detect multi-entry, show one sample address + our classification, capture the
per-count ordering, persist on the BulkUpload. **Not** in scope: the
per-Property fact table or writing main/extension facts at finalise. The
ordering is stored for a later consumer.
### Q2.1 — Editable verification IS in scope (expands Q2)
The "verify classification" step lets the user **correct** a classification,
written back as `source='user'`. This deliberately picks up ADR-0002 Q7's
deferred **vocabulary** user-override write path — distinct from the per-Property
fact layer, which stays deferred.
### Q3 — Placement: on the `awaiting_review` surface
Render the flow on the existing
[OnboardingProgress](<../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/OnboardingProgress.tsx>)
page when `status === "awaiting_review"`. Classification finishes *before* the
combiner (both subtasks must complete → combiner → `awaiting_review`), so by the
time Finalise is offered the classification output exists. No new route.
### Q3.1 — Flow: two-step stepper, steps appear independently
- **Step 1 — Verify classification** — shows whenever **≥1 classifier column**
was mapped.
- **Step 2 — Confirm order** — shows only when **multi-entry was detected**.
- A file with classifier columns but no multi-entry shows only Step 1; a file
with neither goes straight to Finalise.
### Q3.2 — Gate: both steps gate Finalise (where each applies)
`canFinalize = status==="awaiting_review" && (noClassifierCols || verifyAck) &&
(noMultiEntry || orderingConfirmed)`. Two flags persisted. Finalise is one
click but the button stays disabled until its applicable gates are satisfied.
### Q4 — Verify step lists the sample address's entries only
Step 1 lists just the descriptions in the **one sample address** (matches "one
address"). Because a correction is per-`(portfolio, description)`, editing one
changes the mapping **portfolio-wide** for that text — the UI must say so. A
spot-check, not full-vocabulary coverage.
### Q4.1 — Write-back: Next.js upsert, `source='user'`, single row (as built)
A Next.js route handler / server action upserts the `landlord_*_overrides` row
by `(portfolio_id, description)` setting `value` + `source='user'`, validating
against the pgEnum. **Schema unchanged** — we keep ADR-0002's `UNIQUE
(portfolio_id, description)` and flip the single row's source in place. The
Python classifier's existing `ON CONFLICT … WHERE source='classifier'`
([landlord_overrides_postgres_repository.py:84-91](/workspaces/home/github/Model/infrastructure/landlord_overrides/landlord_overrides_postgres_repository.py#L84))
then never re-clobbers it.
> Considered and **rejected**: two rows per description (classifier + user) with
> read-time `user > classifier` resolution. It buys "revert to our suggestion" +
> provenance, and is cheap now (no readers exist yet), but reopens ADR-0002's
> `UNIQUE` decision and migrates Drizzle + 4 Python tables + the conflict target.
> Not worth it for this iteration; the single-row flip already gives "user wins".
> This is the first Next.js writer of a `source='user'` row.
### Q5 — Which sample: the largest-count row
Show one sample address — the row with the **most** building parts — so ordering
it reveals the fullest convention. In the common case (only N=2) that is a
single 2-part address.
### Q5.1 — Reorder UI: label each position
Lay the file's entries out as rows (position 0, 1, …), each with a building-part
dropdown (**Main building** / **Extension 1** / …). Assigning labels yields the
permutation and validates (each part used once, exactly one Main building). All
multi-valued columns are shown together, each raw entry annotated with our
classified enum, so the user sanity-checks classification **and** alignment.
### Q6 — Detection: at start, persist a summary
Compute the multi-entry summary in the **start-address-matching POST**
([route.ts:106](<../../src/app/api/portfolio/[portfolioId]/bulk-uploads/[uploadId]/start-address-matching/route.ts#L106>))
where the full `rows` are already parsed in memory — which columns are
multi-valued, the distinct counts (with row-counts so we can pick the largest),
and the largest-count sample (address + per-column raw entries). Avoids
re-reading a 32k-row file at render. Classification enums are joined at render
from the override tables.
### Q7 — Persistence: two jsonb columns on `bulk_address_uploads`
- `multiEntrySummary jsonb` — written at start (detection).
- `multiEntryOrdering jsonb` — written at confirm: `{ count: permutation }` plus
`verifyAck` / `orderingConfirmed` flags (final shape TBD; may split flags into
their own columns).
No new table — mirrors how `columnMapping` lives on the upload row.
## Risks / load-bearing assumptions
1. **Consistent-mistake assumption.** All rows of a given count share one
ordering convention. The whole "ask once" design rests on this; if a file
mixes conventions within a count, a single per-count permutation is wrong.
2. **Largest-count-only capture.** Smaller counts stay unpopulated in the map.
A future consumer (or a later UI iteration) needs a derivation rule to apply
the convention to other counts.
3. **Normalization coupling — mitigated.** To join the sample's raw entries to
the override tables the frontend must match the backend's `split(",")`
`strip``lower`. **Resolution:** store the *normalized* description keys in
`multiEntrySummary` at start (the route already holds the rows), so the
render-time join is exact-match — no cross-repo string-normalization drift.
4. **Portfolio-wide blast radius.** A verify-step edit changes the mapping for
every row with that description, not just the sample address. Must be
messaged in the UI.
## Suggested issues (`/to-issues`)
1. Schema: two jsonb columns on `bulk_address_uploads` + migration.
2. Detection at start: compute + persist `multiEntrySummary` (with normalized
description keys).
3. Verify step: list sample descriptions → enum (join override tables),
editable; Next.js upsert route writing `source='user'`; `verifyAck` flag.
4. Order step: largest-count sample, position→part dropdowns → permutation;
persist `multiEntryOrdering`; `orderingConfirmed` flag.
5. Gate: wire `canFinalize` to the two flags; conditional stepper rendering.