assessment-model/docs/wip/multi-entry-ordering-plan.md

# Multi-entry building-part ordering — in-flight design notes

**Status:** Grilling complete (2026-06-02) — ready to break into issues
**Branch:** `feature/frontend_landlord_overrides`
**Author:** Jun-te (with Claude, via `/grill-me`)

A _design-in-progress_ document, not the ADR. It records the decisions reached
during grilling so the conversation can resume without re-litigating settled
questions. The flow + schema decision is promoted to
[ADR-0004](../adr/0004-multi-entry-building-part-ordering.md); new domain terms
are promoted to [CONTEXT.md](../../CONTEXT.md#building-parts).

## Goal

After address matching and classification finish, a single address row can carry
**comma-separated entries** in physical-element columns — e.g.
`Walls = "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity"`,
`Roofs = "Flat: As Built, PitchedNormalLoftAccess: 200mm"`. Each entry is a
**building part** (main building + extensions). The order of the entries is
ambiguous, and any mis-ordering is a **consistent per-file mistake**, so we
capture the correct ordering from the user **once per file** and persist it on
the BulkUpload for a later consumer.

## Backstory / ground truth (verified against the example file + code)

- In `ARA AddressProfiling_Download_28-04-2026_10501 (2).xlsx` (32,213 data
  rows): **0 UPRNs appear in more than one row** — multi-entry is
  comma-separated values **inside one cell**, never multiple rows per address.
- In a multi-entry row the multi-valued columns **agree on count** (Walls=2 ∧
  Roofs=2) while whole-dwelling columns stay at 1 (`Property Type` = `"House:
  EndTerrace"`). So position *i* is the **same building part across every
  multi-valued column**.
- The classifier today **discards** this: [`get_col_to_description_mappings`](/workspaces/home/github/Model/orchestration/landlord_description_overrides_orchestrator.py)
  does `value.split(",")` into a **`set`** — orderless, deduped. Correct for the
  vocabulary layer (description→enum), but it drops exactly the
  position/building-part association this feature needs.
- This is the **per-Property building-part fact** territory ADR-0002 deferred
  ("a per-Property fact layer (not yet modelled)"). We are **not** building that
  layer here — only **capturing** the ordering it will need.
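
The difference between the set-based split and an order-preserving split can be
sketched as follows. This is an illustrative TypeScript sketch, not the
classifier's Python code; `splitEntries` is a hypothetical helper name.

```typescript
// Hypothetical helper: split one cell into entries, keeping file order.
function splitEntries(cell: string): string[] {
  return cell
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

const walls = splitEntries("Cavity: AsBuilt (1976-1982), Cavity: FilledCavity");
const roofs = splitEntries("Flat: As Built, PitchedNormalLoftAccess: 200mm");

// Position i lines up across columns: walls[i] and roofs[i] are the same part.
const alignedParts = walls.map((wall, i) => ({ position: i, wall, roof: roofs[i] }));

// A set keeps the vocabulary but loses position and repeated entries.
const repeated = splitEntries("Cavity: FilledCavity, Cavity: FilledCavity");
const asSet = new Set(repeated);
```

The array form keeps both entries of `repeated` in order, while `asSet` holds a
single value, which is fine for vocabulary but not for building-part alignment.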

## Decided

### Q1 — Order semantics: full reorder, keyed by count

Position *i* = a building part. The user supplies a **permutation per distinct
entry-count**; persisted as `{ count: permutation }`. This iteration captures
only the **largest-count** sample (see Q5).

### Q1.1 — Order scope: one ordering across all columns

A single per-count permutation realigns **every** multi-valued column at once
(index-aligned — Walls[i] and Roofs[i] are the same part). Not per-column.
Matches the data (counts agree across columns).

### Q1.2 — Mixed counts: single-value columns are whole-property

A 1-entry column (e.g. `Property Type`) is a **whole-dwelling** fact attached to
the property; only columns with N>1 are sliced into building parts. No padding.
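
Taken together, Q1 through Q1.2 suggest how a later consumer might slice a row.
The sketch below is one possible reading, not a settled design: it assumes
`permutation[position]` gives the building-part index (0 = main building) of the
entry at that file position, and `sliceRow`, `Ordering`, and `SlicedRow` are
hypothetical names. The `[1, 0]` ordering in the usage is made up for
illustration.

```typescript
// Assumed shape: entry-count -> permutation, where permutation[position] is the
// building-part index (0 = main building) of the entry at that file position.
type Ordering = { [count: number]: number[] | undefined };

interface SlicedRow {
  wholeDwelling: Record<string, string>; // 1-entry columns, attached to the property
  parts: Record<string, string>[]; // one record per building part
}

function sliceRow(row: Record<string, string[]>, ordering: Ordering): SlicedRow {
  const wholeDwelling: Record<string, string> = {};
  const parts: Record<string, string>[] = [];
  for (const [column, entries] of Object.entries(row)) {
    if (entries.length <= 1) {
      // Whole-dwelling fact: no slicing, no padding.
      const [only] = entries;
      if (only !== undefined) wholeDwelling[column] = only;
      continue;
    }
    const permutation = ordering[entries.length];
    if (permutation === undefined) {
      throw new Error(`No ordering captured for entry count ${entries.length}`);
    }
    // The same per-count permutation realigns every multi-valued column.
    entries.forEach((entry, position) => {
      const part = permutation[position];
      if (part === undefined) throw new Error("Permutation shorter than entry list");
      parts[part] = { ...(parts[part] ?? {}), [column]: entry };
    });
  }
  return { wholeDwelling, parts };
}

const sliced = sliceRow(
  {
    Walls: ["Cavity: AsBuilt (1976-1982)", "Cavity: FilledCavity"],
    Roofs: ["Flat: As Built", "PitchedNormalLoftAccess: 200mm"],
    "Property Type": ["House: EndTerrace"],
  },
  { 2: [1, 0] }, // illustrative only: file position 0 is Extension 1, position 1 is Main
);
```

A count with no stored permutation throws here, which mirrors the
largest-count-only capture: other counts would need a derivation rule first.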

### Q2 — Scope: capture + persist ordering only

Detect multi-entry, show one sample address + our classification, capture the
per-count ordering, persist on the BulkUpload. **Not** in scope: the
per-Property fact table or writing main/extension facts at finalise. The
ordering is stored for a later consumer.

### Q2.1 — Editable verification IS in scope (expands Q2)

The "verify classification" step lets the user **correct** a classification,
written back as `source='user'`. This deliberately picks up ADR-0002 Q7's
deferred **vocabulary** user-override write path — distinct from the per-Property
fact layer, which stays deferred.

### Q3 — Placement: on the `awaiting_review` surface

Render the flow on the existing
[OnboardingProgress](<../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/OnboardingProgress.tsx>)
page when `status === "awaiting_review"`. Classification finishes *before* the
combiner (both subtasks must complete → combiner → `awaiting_review`), so by the
time Finalise is offered the classification output exists. No new route.

### Q3.1 — Flow: two-step stepper, steps appear independently

- **Step 1 — Verify classification** — shows whenever **≥1 classifier column**
  was mapped.
- **Step 2 — Confirm order** — shows only when **multi-entry was detected**.
- A file with classifier columns but no multi-entry shows only Step 1; a file
  with neither goes straight to Finalise.

### Q3.2 — Gate: both steps gate Finalise (where each applies)

`canFinalize = status==="awaiting_review" && (noClassifierCols || verifyAck) &&
(noMultiEntry || orderingConfirmed)`. Two flags persisted. Finalise is one
click but the button stays disabled until its applicable gates are satisfied.
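
The step visibility from Q3.1 and the gate from Q3.2 can be written as two small
pure functions. This is a minimal sketch; the `ReviewState` shape and its field
names (`classifierColumnCount`, `multiEntryDetected`) are assumptions, and only
`verifyAck`, `orderingConfirmed`, and `canFinalize` come from the notes above.

```typescript
interface ReviewState {
  status: string;
  classifierColumnCount: number; // assumed: how many classifier columns were mapped
  multiEntryDetected: boolean; // assumed: derived from the persisted summary
  verifyAck: boolean;
  orderingConfirmed: boolean;
}

type Step = "verify-classification" | "confirm-order";

// Each step appears independently of the other.
function visibleSteps(state: ReviewState): Step[] {
  const steps: Step[] = [];
  if (state.classifierColumnCount >= 1) steps.push("verify-classification");
  if (state.multiEntryDetected) steps.push("confirm-order");
  return steps;
}

// A step only gates Finalise when it applies to this file.
function canFinalize(state: ReviewState): boolean {
  const noClassifierCols = state.classifierColumnCount === 0;
  const noMultiEntry = !state.multiEntryDetected;
  return (
    state.status === "awaiting_review" &&
    (noClassifierCols || state.verifyAck) &&
    (noMultiEntry || state.orderingConfirmed)
  );
}
```

A file with neither classifier columns nor multi-entry rows yields no steps and
can be finalised as soon as it reaches `awaiting_review`.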

### Q4 — Verify step lists the sample address's entries only

Step 1 lists just the descriptions in the **one sample address** (the same
single address the flow shows per Q2). Because a correction is keyed
per-`(portfolio, description)`, editing one changes the mapping
**portfolio-wide** for that text, and the UI must say so. This is a spot-check,
not full-vocabulary coverage.

### Q4.1 — Write-back: Next.js upsert, `source='user'`, single row (as built)

A Next.js route handler / server action upserts the `landlord_*_overrides` row
by `(portfolio_id, description)` setting `value` + `source='user'`, validating
against the pgEnum. **Schema unchanged** — we keep ADR-0002's `UNIQUE
(portfolio_id, description)` and flip the single row's source in place. The
Python classifier's existing `ON CONFLICT … WHERE source='classifier'`
([landlord_overrides_postgres_repository.py:84-91](/workspaces/home/github/Model/infrastructure/landlord_overrides/landlord_overrides_postgres_repository.py#L84))
then never re-clobbers it.

> Considered and **rejected**: two rows per description (classifier + user) with
> read-time `user > classifier` resolution. It buys "revert to our suggestion" +
> provenance, and is cheap now (no readers exist yet), but it reopens ADR-0002's
> `UNIQUE` decision and would require migrating Drizzle, 4 Python tables, and the
> conflict target. Not worth it for this iteration; the single-row flip already
> gives "user wins". The Q4.1 upsert is the first Next.js writer of a
> `source='user'` row.
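
The "user wins" behaviour of the single-row flip can be modelled in memory. This
is a toy model of the semantics only, not the Drizzle or SQL code: the class
name, method names, and enum values are hypothetical, and a `Map` stands in for
the table with its `UNIQUE (portfolio_id, description)` key.

```typescript
type Source = "classifier" | "user";

interface OverrideRow {
  value: string;
  source: Source;
}

class OverrideTableModel {
  private readonly rows = new Map<string, OverrideRow>();
  private readonly allowedValues: ReadonlySet<string>;

  constructor(allowedValues: Iterable<string>) {
    this.allowedValues = new Set(allowedValues); // stands in for the pgEnum
  }

  private key(portfolioId: string, description: string): string {
    return JSON.stringify([portfolioId, description]);
  }

  // Models ON CONFLICT ... WHERE source='classifier': only classifier rows are updated.
  upsertFromClassifier(portfolioId: string, description: string, value: string): void {
    const key = this.key(portfolioId, description);
    const existing = this.rows.get(key);
    if (existing !== undefined && existing.source !== "classifier") return;
    this.rows.set(key, { value, source: "classifier" });
  }

  // Models the user write-back: validate, then flip the single row in place.
  upsertFromUser(portfolioId: string, description: string, value: string): void {
    if (!this.allowedValues.has(value)) throw new Error(`Value not in enum: ${value}`);
    this.rows.set(this.key(portfolioId, description), { value, source: "user" });
  }

  get(portfolioId: string, description: string): OverrideRow | undefined {
    return this.rows.get(this.key(portfolioId, description));
  }
}

const table = new OverrideTableModel(["hypothetical_enum_a", "hypothetical_enum_b"]);
table.upsertFromClassifier("p1", "cavity: filledcavity", "hypothetical_enum_a");
table.upsertFromUser("p1", "cavity: filledcavity", "hypothetical_enum_b");
// A later classifier run leaves the user's correction untouched.
table.upsertFromClassifier("p1", "cavity: filledcavity", "hypothetical_enum_a");
```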

### Q5 — Which sample: the largest-count row

Show one sample address — the row with the **most** building parts — so ordering
it reveals the fullest convention. In the common case (only N=2) that is a
single 2-part address.

### Q5.1 — Reorder UI: label each position

Lay the file's entries out as rows (position 0, 1, …), each with a building-part
dropdown (**Main building** / **Extension 1** / …). Assigning labels yields the
permutation and validates (each part used once, exactly one Main building). All
multi-valued columns are shown together, each raw entry annotated with our
classified enum, so the user sanity-checks classification **and** alignment.
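
Turning the dropdown labels into a validated permutation might look like the
sketch below. The function names and result type are hypothetical, and it
assumes part index 0 is the Main building and `Extension k` maps to index `k`.

```typescript
type PermutationResult =
  | { ok: true; permutation: number[] }
  | { ok: false; error: string };

// Assumed label scheme: "Main building" -> 0, "Extension k" -> k (k >= 1).
function partIndex(label: string): number | null {
  if (label === "Main building") return 0;
  const match = /^Extension (\d+)$/.exec(label);
  if (match === null) return null;
  const index = Number(match[1]);
  return index >= 1 ? index : null;
}

// labels[position] is the building part chosen for that file position.
function labelsToPermutation(labels: string[]): PermutationResult {
  const permutation: number[] = [];
  const seen = new Set<number>();
  for (const label of labels) {
    const index = partIndex(label);
    if (index === null || index >= labels.length) {
      return { ok: false, error: `Unknown or out-of-range label: ${label}` };
    }
    if (seen.has(index)) {
      return { ok: false, error: `${label} is assigned more than once` };
    }
    seen.add(index);
    permutation.push(index);
  }
  // Implied by the checks above for non-empty input; kept explicit for clarity.
  if (!seen.has(0)) {
    return { ok: false, error: "Exactly one position must be the Main building" };
  }
  return { ok: true, permutation };
}

const result = labelsToPermutation(["Extension 1", "Main building"]);
```

For a 2-part sample, labelling position 0 as `Extension 1` and position 1 as
`Main building` yields the permutation `[1, 0]`.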

### Q6 — Detection: at start, persist a summary

Compute the multi-entry summary in the **start-address-matching POST**
([route.ts:106](<../../src/app/api/portfolio/[portfolioId]/bulk-uploads/[uploadId]/start-address-matching/route.ts#L106>)),
where the full `rows` are already parsed in memory. The summary records which
columns are multi-valued, the distinct counts (with row-counts so we can pick
the largest), and the largest-count sample (address + per-column raw entries).
This avoids re-reading a 32k-row file at render. Classification enums are joined
at render from the override tables.
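
A single pass over the parsed rows is enough to build that summary. The sketch
below is one way to do it, under assumptions: the `MultiEntrySummary` shape, the
function name, and the `Address` column used in the usage are hypothetical, and
rows are taken to be plain column-to-cell-text records.

```typescript
interface MultiEntrySample {
  address: string;
  entries: Record<string, string[]>; // per-column raw entries, in file order
}

interface MultiEntrySummary {
  multiValuedColumns: string[];
  rowCountByEntryCount: Record<number, number>; // distinct count -> number of rows
  largestSample: MultiEntrySample | null;
}

function summariseMultiEntry(
  rows: Record<string, string>[],
  addressColumn: string,
  elementColumns: string[],
): MultiEntrySummary {
  const multiValued = new Set<string>();
  const rowCountByEntryCount: Record<number, number> = {};
  let largestSample: MultiEntrySample | null = null;
  let largestCount = 1;

  for (const row of rows) {
    const entries: Record<string, string[]> = {};
    let rowCount = 1;
    for (const column of elementColumns) {
      const parts = (row[column] ?? "")
        .split(",")
        .map((part) => part.trim())
        .filter((part) => part !== "");
      if (parts.length > 1) {
        multiValued.add(column);
        entries[column] = parts;
        rowCount = Math.max(rowCount, parts.length);
      }
    }
    if (rowCount === 1) continue; // not a multi-entry row
    rowCountByEntryCount[rowCount] = (rowCountByEntryCount[rowCount] ?? 0) + 1;
    // Keep the first row seen with the largest count as the sample.
    if (rowCount > largestCount) {
      largestCount = rowCount;
      largestSample = { address: row[addressColumn] ?? "", entries };
    }
  }
  return {
    multiValuedColumns: Array.from(multiValued),
    rowCountByEntryCount,
    largestSample,
  };
}

const summary = summariseMultiEntry(
  [
    { Address: "1 Example Street", Walls: "Cavity: FilledCavity", Roofs: "Flat: As Built" },
    {
      Address: "2 Example Street",
      Walls: "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity",
      Roofs: "Flat: As Built, PitchedNormalLoftAccess: 200mm",
    },
  ],
  "Address",
  ["Walls", "Roofs"],
);
```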

### Q7 — Persistence: two jsonb columns on `bulk_address_uploads`

- `multiEntrySummary jsonb` — written at start (detection).
- `multiEntryOrdering jsonb` — written at confirm: `{ count: permutation }` plus
  `verifyAck` / `orderingConfirmed` flags (final shape TBD; may split flags into
  their own columns).

No new table — mirrors how `columnMapping` lives on the upload row.

## Risks / load-bearing assumptions

1. **Consistent-mistake assumption.** All rows of a given count share one
   ordering convention. The whole "ask once" design rests on this; if a file
   mixes conventions within a count, a single per-count permutation is wrong.
2. **Largest-count-only capture.** Smaller counts stay unpopulated in the map.
   A future consumer (or a later UI iteration) needs a derivation rule to apply
   the convention to other counts.
3. **Normalization coupling — mitigated.** To join the sample's raw entries to
   the override tables the frontend must match the backend's `split(",")` →
   `strip` → `lower`. **Resolution:** store the *normalized* description keys in
   `multiEntrySummary` at start (the route already holds the rows), so the
   render-time join is exact-match — no cross-repo string-normalization drift.
4. **Portfolio-wide blast radius.** A verify-step edit changes the mapping for
   every row with that description, not just the sample address. Must be
   messaged in the UI.
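
For risk 3, storing normalized keys at start reduces the render-time join to an
exact-match lookup. The sketch below assumes that JavaScript's `trim` and
`toLowerCase` agree with the backend's `strip` and `lower` for the text in the
file; the function name, the `Map` standing in for the override table, and the
enum values are hypothetical.

```typescript
// Mirrors the described backend pipeline: split(",") -> strip -> lower.
function normalizeDescriptions(cell: string): string[] {
  return cell.split(",").map((entry) => entry.trim().toLowerCase());
}

// Hypothetical stand-in for rows read from a landlord_*_overrides table.
const overrides = new Map<string, string>([
  ["cavity: asbuilt (1976-1982)", "hypothetical_enum_a"],
  ["cavity: filledcavity", "hypothetical_enum_b"],
]);

// Keys are computed once at start and stored in the summary.
const storedKeys = normalizeDescriptions(
  "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity",
);

// At render, the join is a plain exact-match lookup; null marks "not classified".
const joined = storedKeys.map((key) => ({ key, value: overrides.get(key) ?? null }));
```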

## Suggested issues (`/to-issues`)

1. Schema: two jsonb columns on `bulk_address_uploads` + migration.
2. Detection at start: compute + persist `multiEntrySummary` (with normalized
   description keys).
3. Verify step: list sample descriptions → enum (join override tables),
   editable; Next.js upsert route writing `source='user'`; `verifyAck` flag.
4. Order step: largest-count sample, position→part dropdowns → permutation;
   persist `multiEntryOrdering`; `orderingConfirmed` flag.
5. Gate: wire `canFinalize` to the two flags; conditional stepper rendering.