assessment-model/docs/wip/multi-entry-ordering-plan.md
Jun-te Kim af86d53397 Confirm building-part ordering on awaiting_review (#297)
Adds the multiEntryOrdering jsonb column + interactive order picker: the
largest-count multi-entry sample is shown with a building-part dropdown per
file position (one Main building + Extensions), validated as a permutation.
A PATCH route persists { count: permutation } + confirmed, and Finalise is
disabled until the ordering is confirmed when the upload is multi-entry.

Migration for the new column is intentionally not included here — generated
after origin/main is merged (its multi_entry_summary migration lands first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 13:46:55 +00:00

9.3 KiB

Multi-entry building-part ordering — in-flight design notes

Status: Grilling complete (2026-06-02) — ready to break into issues Branch: feature/frontend_landlord_overrides Author: Jun-te (with Claude, via /grill-me)

A design-in-progress document, not the ADR. It records the decisions reached during grilling so the conversation can resume without re-litigating settled questions. The flow + schema decision is promoted to ADR-0004; new domain terms are promoted to CONTEXT.md.

Goal

After address matching and classification finish, a single address row can carry comma-separated entries in physical-element columns — e.g. Walls = "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity", Roofs = "Flat: As Built, PitchedNormalLoftAccess: 200mm". Each entry is a building part (main building + extensions). The order is ambiguous and a consistent per-file mistake, so we capture the correct ordering from the user once per file and persist it on the BulkUpload for a later consumer.

Backstory / ground truth (verified against the example file + code)

  • In ARA AddressProfiling_Download_28-04-2026_10501 (2).xlsx (32,213 data rows): 0 UPRNs appear in more than one row — multi-entry is comma-separated values inside one cell, never multiple rows per address.
  • In a multi-entry row the multi-valued columns agree on count (Walls=2 ∧ Roofs=2) while whole-dwelling columns stay at 1 (Property Type = "House: EndTerrace"). So position i is the same building part across every multi-valued column.
  • The classifier today discards this: get_col_to_description_mappings does value.split(",") into a set — orderless, deduped. Correct for the vocabulary layer (description→enum), but it drops exactly the position/building-part association this feature needs.
  • This is the per-Property building-part fact territory ADR-0002 deferred ("a per-Property fact layer (not yet modelled)"). We are not building that layer here — only capturing the ordering it will need.

Decided

Q1 — Order semantics: full reorder, keyed by count

Position i = a building part. The user supplies a permutation per distinct entry-count; persisted as { count: permutation }. This iteration captures only the largest-count sample (see Q5).

Q1.1 — Order scope: one ordering across all columns

A single per-count permutation realigns every multi-valued column at once (index-aligned — Walls[i] and Roofs[i] are the same part). Not per-column. Matches the data (counts agree across columns).

Q1.2 — Mixed counts: single-value columns are whole-property

A 1-entry column (e.g. Property Type) is a whole-dwelling fact attached to the property; only columns with N>1 are sliced into building parts. No padding.

Q2 — Scope: capture + persist ordering only

Detect multi-entry, show one sample address + our classification, capture the per-count ordering, persist on the BulkUpload. Not in scope: the per-Property fact table or writing main/extension facts at finalise. The ordering is stored for a later consumer.

Q2.1 — Editable verification IS in scope (expands Q2)

The "verify classification" step lets the user correct a classification, written back as source='user'. This deliberately picks up ADR-0002 Q7's deferred vocabulary user-override write path — distinct from the per-Property fact layer, which stays deferred.

Q3 — Placement: on the awaiting_review surface

Render the flow on the existing OnboardingProgress page when status === "awaiting_review". Classification finishes before the combiner (both subtasks must complete → combiner → awaiting_review), so by the time Finalise is offered the classification output exists. No new route.

Q3.1 — Flow: two-step stepper, steps appear independently

  • Step 1 — Verify classification — shows whenever ≥1 classifier column was mapped.
  • Step 2 — Confirm order — shows only when multi-entry was detected.
  • A file with classifier columns but no multi-entry shows only Step 1; a file with neither goes straight to Finalise.

Q3.2 — Gate: both steps gate Finalise (where each applies)

canFinalize = status==="awaiting_review" && (noClassifierCols || verifyAck) && (noMultiEntry || orderingConfirmed). Two flags persisted. Finalise is one click but the button stays disabled until its applicable gates are satisfied.

Q4 — Verify step lists the sample address's entries only

Step 1 lists just the descriptions in the one sample address (matches "one address"). Because a correction is per-(portfolio, description), editing one changes the mapping portfolio-wide for that text — the UI must say so. A spot-check, not full-vocabulary coverage.

Q4.1 — Write-back: Next.js upsert, source='user', single row (as built)

A Next.js route handler / server action upserts the landlord_*_overrides row by (portfolio_id, description) setting value + source='user', validating against the pgEnum. Schema unchanged — we keep ADR-0002's UNIQUE (portfolio_id, description) and flip the single row's source in place. The Python classifier's existing ON CONFLICT … WHERE source='classifier' (landlord_overrides_postgres_repository.py:84-91) then never re-clobbers it.

Considered and rejected: two rows per description (classifier + user) with read-time user > classifier resolution. It buys "revert to our suggestion" + provenance, and is cheap now (no readers exist yet), but reopens ADR-0002's UNIQUE decision and migrates Drizzle + 4 Python tables + the conflict target. Not worth it for this iteration; the single-row flip already gives "user wins". This is the first Next.js writer of a source='user' row.

Q5 — Which sample: the largest-count row

Show one sample address — the row with the most building parts — so ordering it reveals the fullest convention. In the common case (only N=2) that is a single 2-part address.

Q5.1 — Reorder UI: label each position

Lay the file's entries out as rows (position 0, 1, …), each with a building-part dropdown (Main building / Extension 1 / …). Assigning labels yields the permutation and validates (each part used once, exactly one Main building). All multi-valued columns are shown together, each raw entry annotated with our classified enum, so the user sanity-checks classification and alignment.

Q6 — Detection: at start, persist a summary

Compute the multi-entry summary in the start-address-matching POST (route.ts:106) where the full rows are already parsed in memory — which columns are multi-valued, the distinct counts (with row-counts so we can pick the largest), and the largest-count sample (address + per-column raw entries). Avoids re-reading a 32k-row file at render. Classification enums are joined at render from the override tables.

Q7 — Persistence: two jsonb columns on bulk_address_uploads

  • multiEntrySummary jsonb — written at start (detection).
  • multiEntryOrdering jsonb — written at confirm: { count: permutation } plus verifyAck / orderingConfirmed flags (final shape TBD; may split flags into their own columns).

No new table — mirrors how columnMapping lives on the upload row.

Risks / load-bearing assumptions

  1. Consistent-mistake assumption. All rows of a given count share one ordering convention. The whole "ask once" design rests on this; if a file mixes conventions within a count, a single per-count permutation is wrong.
  2. Largest-count-only capture. Smaller counts stay unpopulated in the map. A future consumer (or a later UI iteration) needs a derivation rule to apply the convention to other counts.
  3. Normalization coupling — mitigated. To join the sample's raw entries to the override tables the frontend must match the backend's split(",")striplower. Resolution: store the normalized description keys in multiEntrySummary at start (the route already holds the rows), so the render-time join is exact-match — no cross-repo string-normalization drift.
  4. Portfolio-wide blast radius. A verify-step edit changes the mapping for every row with that description, not just the sample address. Must be messaged in the UI.

Suggested issues (/to-issues)

  1. Schema: two jsonb columns on bulk_address_uploads + migration.
  2. Detection at start: compute + persist multiEntrySummary (with normalized description keys).
  3. Verify step: list sample descriptions → enum (join override tables), editable; Next.js upsert route writing source='user'; verifyAck flag.
  4. Order step: largest-count sample, position→part dropdowns → permutation; persist multiEntryOrdering; orderingConfirmed flag.
  5. Gate: wire canFinalize to the two flags; conditional stepper rendering.