# Multi-entry building-part ordering — in-flight design notes **Status:** Grilling complete (2026-06-02) — ready to break into issues **Branch:** `feature/frontend_landlord_overrides` **Author:** Jun-te (with Claude, via `/grill-me`) A _design-in-progress_ document, not the ADR. It records the decisions reached during grilling so the conversation can resume without re-litigating settled questions. The flow + schema decision is promoted to [ADR-0004](../adr/0004-multi-entry-building-part-ordering.md); new domain terms are promoted to [CONTEXT.md](../../CONTEXT.md#building-parts). ## Goal After address matching and classification finish, a single address row can carry **comma-separated entries** in physical-element columns — e.g. `Walls = "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity"`, `Roofs = "Flat: As Built, PitchedNormalLoftAccess: 200mm"`. Each entry is a **building part** (main building + extensions). The order is ambiguous and a **consistent per-file mistake**, so we capture the correct ordering from the user **once per file** and persist it on the BulkUpload for a later consumer. ## Backstory / ground truth (verified against the example file + code) - In `ARA AddressProfiling_Download_28-04-2026_10501 (2).xlsx` (32,213 data rows): **0 UPRNs appear in more than one row** — multi-entry is comma-separated values **inside one cell**, never multiple rows per address. - In a multi-entry row the multi-valued columns **agree on count** (Walls=2 ∧ Roofs=2) while whole-dwelling columns stay at 1 (`Property Type` = `"House: EndTerrace"`). So position *i* is the **same building part across every multi-valued column**. - The classifier today **discards** this: [`get_col_to_description_mappings`](/workspaces/home/github/Model/orchestration/landlord_description_overrides_orchestrator.py) does `value.split(",")` into a **`set`** — orderless, deduped. Correct for the vocabulary layer (description→enum), but it drops exactly the position/building-part association this feature needs. - This is the **per-Property building-part fact** territory ADR-0002 deferred ("a per-Property fact layer (not yet modelled)"). We are **not** building that layer here — only **capturing** the ordering it will need. ## Decided ### Q1 — Order semantics: full reorder, keyed by count Position *i* = a building part. The user supplies a **permutation per distinct entry-count**; persisted as `{ count: permutation }`. This iteration captures only the **largest-count** sample (see Q5). ### Q1.1 — Order scope: one ordering across all columns A single per-count permutation realigns **every** multi-valued column at once (index-aligned — Walls[i] and Roofs[i] are the same part). Not per-column. Matches the data (counts agree across columns). ### Q1.2 — Mixed counts: single-value columns are whole-property A 1-entry column (e.g. `Property Type`) is a **whole-dwelling** fact attached to the property; only columns with N>1 are sliced into building parts. No padding. ### Q2 — Scope: capture + persist ordering only Detect multi-entry, show one sample address + our classification, capture the per-count ordering, persist on the BulkUpload. **Not** in scope: the per-Property fact table or writing main/extension facts at finalise. The ordering is stored for a later consumer. ### Q2.1 — Editable verification IS in scope (expands Q2) The "verify classification" step lets the user **correct** a classification, written back as `source='user'`. This deliberately picks up ADR-0002 Q7's deferred **vocabulary** user-override write path — distinct from the per-Property fact layer, which stays deferred. ### Q3 — Placement: on the `awaiting_review` surface Render the flow on the existing [OnboardingProgress](<../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/OnboardingProgress.tsx>) page when `status === "awaiting_review"`. Classification finishes *before* the combiner (both subtasks must complete → combiner → `awaiting_review`), so by the time Finalise is offered the classification output exists. No new route. ### Q3.1 — Flow: two-step stepper, steps appear independently - **Step 1 — Verify classification** — shows whenever **≥1 classifier column** was mapped. - **Step 2 — Confirm order** — shows only when **multi-entry was detected**. - A file with classifier columns but no multi-entry shows only Step 1; a file with neither goes straight to Finalise. ### Q3.2 — Gate: both steps gate Finalise (where each applies) `canFinalize = status==="awaiting_review" && (noClassifierCols || verifyAck) && (noMultiEntry || orderingConfirmed)`. Two flags persisted. Finalise is one click but the button stays disabled until its applicable gates are satisfied. ### Q4 — Verify step lists the sample address's entries only Step 1 lists just the descriptions in the **one sample address** (matches "one address"). Because a correction is per-`(portfolio, description)`, editing one changes the mapping **portfolio-wide** for that text — the UI must say so. A spot-check, not full-vocabulary coverage. ### Q4.1 — Write-back: Next.js upsert, `source='user'`, single row (as built) A Next.js route handler / server action upserts the `landlord_*_overrides` row by `(portfolio_id, description)` setting `value` + `source='user'`, validating against the pgEnum. **Schema unchanged** — we keep ADR-0002's `UNIQUE (portfolio_id, description)` and flip the single row's source in place. The Python classifier's existing `ON CONFLICT … WHERE source='classifier'` ([landlord_overrides_postgres_repository.py:84-91](/workspaces/home/github/Model/infrastructure/landlord_overrides/landlord_overrides_postgres_repository.py#L84)) then never re-clobbers it. > Considered and **rejected**: two rows per description (classifier + user) with > read-time `user > classifier` resolution. It buys "revert to our suggestion" + > provenance, and is cheap now (no readers exist yet), but reopens ADR-0002's > `UNIQUE` decision and migrates Drizzle + 4 Python tables + the conflict target. > Not worth it for this iteration; the single-row flip already gives "user wins". > This is the first Next.js writer of a `source='user'` row. ### Q5 — Which sample: the largest-count row Show one sample address — the row with the **most** building parts — so ordering it reveals the fullest convention. In the common case (only N=2) that is a single 2-part address. ### Q5.1 — Reorder UI: label each position Lay the file's entries out as rows (position 0, 1, …), each with a building-part dropdown (**Main building** / **Extension 1** / …). Assigning labels yields the permutation and validates (each part used once, exactly one Main building). All multi-valued columns are shown together, each raw entry annotated with our classified enum, so the user sanity-checks classification **and** alignment. ### Q6 — Detection: at start, persist a summary Compute the multi-entry summary in the **start-address-matching POST** ([route.ts:106](<../../src/app/api/portfolio/[portfolioId]/bulk-uploads/[uploadId]/start-address-matching/route.ts#L106>)) where the full `rows` are already parsed in memory — which columns are multi-valued, the distinct counts (with row-counts so we can pick the largest), and the largest-count sample (address + per-column raw entries). Avoids re-reading a 32k-row file at render. Classification enums are joined at render from the override tables. ### Q7 — Persistence: two jsonb columns on `bulk_address_uploads` - `multiEntrySummary jsonb` — written at start (detection). - `multiEntryOrdering jsonb` — written at confirm: `{ count: permutation }` plus `verifyAck` / `orderingConfirmed` flags (final shape TBD; may split flags into their own columns). No new table — mirrors how `columnMapping` lives on the upload row. ## Risks / load-bearing assumptions 1. **Consistent-mistake assumption.** All rows of a given count share one ordering convention. The whole "ask once" design rests on this; if a file mixes conventions within a count, a single per-count permutation is wrong. 2. **Largest-count-only capture.** Smaller counts stay unpopulated in the map. A future consumer (or a later UI iteration) needs a derivation rule to apply the convention to other counts. 3. **Normalization coupling — mitigated.** To join the sample's raw entries to the override tables the frontend must match the backend's `split(",")` → `strip` → `lower`. **Resolution:** store the *normalized* description keys in `multiEntrySummary` at start (the route already holds the rows), so the render-time join is exact-match — no cross-repo string-normalization drift. 4. **Portfolio-wide blast radius.** A verify-step edit changes the mapping for every row with that description, not just the sample address. Must be messaged in the UI. ## Suggested issues (`/to-issues`) 1. Schema: two jsonb columns on `bulk_address_uploads` + migration. 2. Detection at start: compute + persist `multiEntrySummary` (with normalized description keys). 3. Verify step: list sample descriptions → enum (join override tables), editable; Next.js upsert route writing `source='user'`; `verifyAck` flag. 4. Order step: largest-count sample, position→part dropdowns → permutation; persist `multiEntryOrdering`; `orderingConfirmed` flag. 5. Gate: wire `canFinalize` to the two flags; conditional stepper rendering.