mirror of
https://github.com/Hestia-Homes/assessment-model.git
synced 2026-06-30 12:55:02 +00:00
Adds the multiEntryOrdering jsonb column + interactive order picker: the
largest-count multi-entry sample is shown with a building-part dropdown per
file position (one Main building + Extensions), validated as a permutation.
A PATCH route persists { count: permutation } + confirmed, and Finalise is
disabled until the ordering is confirmed when the upload is multi-entry.
Migration for the new column is intentionally not included here — generated
after origin/main is merged (its multi_entry_summary migration lands first).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
178 lines
9.3 KiB
Markdown
178 lines
9.3 KiB
Markdown
# Multi-entry building-part ordering — in-flight design notes
|
|
|
|
**Status:** Grilling complete (2026-06-02) — ready to break into issues
|
|
**Branch:** `feature/frontend_landlord_overrides`
|
|
**Author:** Jun-te (with Claude, via `/grill-me`)
|
|
|
|
A _design-in-progress_ document, not the ADR. It records the decisions reached
|
|
during grilling so the conversation can resume without re-litigating settled
|
|
questions. The flow + schema decision is promoted to
|
|
[ADR-0004](../adr/0004-multi-entry-building-part-ordering.md); new domain terms
|
|
are promoted to [CONTEXT.md](../../CONTEXT.md#building-parts).
|
|
|
|
## Goal
|
|
|
|
After address matching and classification finish, a single address row can carry
|
|
**comma-separated entries** in physical-element columns — e.g.
|
|
`Walls = "Cavity: AsBuilt (1976-1982), Cavity: FilledCavity"`,
|
|
`Roofs = "Flat: As Built, PitchedNormalLoftAccess: 200mm"`. Each entry is a
|
|
**building part** (main building + extensions). The order is ambiguous and a
|
|
**consistent per-file mistake**, so we capture the correct ordering from the user
|
|
**once per file** and persist it on the BulkUpload for a later consumer.
|
|
|
|
## Backstory / ground truth (verified against the example file + code)
|
|
|
|
- In `ARA AddressProfiling_Download_28-04-2026_10501 (2).xlsx` (32,213 data
|
|
rows): **0 UPRNs appear in more than one row** — multi-entry is
|
|
comma-separated values **inside one cell**, never multiple rows per address.
|
|
- In a multi-entry row the multi-valued columns **agree on count** (Walls=2 ∧
|
|
Roofs=2) while whole-dwelling columns stay at 1 (`Property Type` = `"House:
|
|
EndTerrace"`). So position *i* is the **same building part across every
|
|
multi-valued column**.
|
|
- The classifier today **discards** this: [`get_col_to_description_mappings`](/workspaces/home/github/Model/orchestration/landlord_description_overrides_orchestrator.py)
|
|
does `value.split(",")` into a **`set`** — orderless, deduped. Correct for the
|
|
vocabulary layer (description→enum), but it drops exactly the
|
|
position/building-part association this feature needs.
|
|
- This is the **per-Property building-part fact** territory ADR-0002 deferred
|
|
("a per-Property fact layer (not yet modelled)"). We are **not** building that
|
|
layer here — only **capturing** the ordering it will need.
|
|
|
|
## Decided
|
|
|
|
### Q1 — Order semantics: full reorder, keyed by count
|
|
|
|
Position *i* = a building part. The user supplies a **permutation per distinct
|
|
entry-count**; persisted as `{ count: permutation }`. This iteration captures
|
|
only the **largest-count** sample (see Q5).
|
|
|
|
### Q1.1 — Order scope: one ordering across all columns
|
|
|
|
A single per-count permutation realigns **every** multi-valued column at once
|
|
(index-aligned — Walls[i] and Roofs[i] are the same part). Not per-column.
|
|
Matches the data (counts agree across columns).
|
|
|
|
### Q1.2 — Mixed counts: single-value columns are whole-property
|
|
|
|
A 1-entry column (e.g. `Property Type`) is a **whole-dwelling** fact attached to
|
|
the property; only columns with N>1 are sliced into building parts. No padding.
|
|
|
|
### Q2 — Scope: capture + persist ordering only
|
|
|
|
Detect multi-entry, show one sample address + our classification, capture the
|
|
per-count ordering, persist on the BulkUpload. **Not** in scope: the
|
|
per-Property fact table or writing main/extension facts at finalise. The
|
|
ordering is stored for a later consumer.
|
|
|
|
### Q2.1 — Editable verification IS in scope (expands Q2)
|
|
|
|
The "verify classification" step lets the user **correct** a classification,
|
|
written back as `source='user'`. This deliberately picks up ADR-0002 Q7's
|
|
deferred **vocabulary** user-override write path — distinct from the per-Property
|
|
fact layer, which stays deferred.
|
|
|
|
### Q3 — Placement: on the `awaiting_review` surface
|
|
|
|
Render the flow on the existing
|
|
[OnboardingProgress](<../../src/app/portfolio/[slug]/(portfolio)/bulk-upload/[uploadId]/OnboardingProgress.tsx>)
|
|
page when `status === "awaiting_review"`. Classification finishes *before* the
|
|
combiner (both subtasks must complete → combiner → `awaiting_review`), so by the
|
|
time Finalise is offered the classification output exists. No new route.
|
|
|
|
### Q3.1 — Flow: two-step stepper, steps appear independently
|
|
|
|
- **Step 1 — Verify classification** — shows whenever **≥1 classifier column**
|
|
was mapped.
|
|
- **Step 2 — Confirm order** — shows only when **multi-entry was detected**.
|
|
- A file with classifier columns but no multi-entry shows only Step 1; a file
|
|
with neither goes straight to Finalise.
|
|
|
|
### Q3.2 — Gate: both steps gate Finalise (where each applies)
|
|
|
|
`canFinalize = status==="awaiting_review" && (noClassifierCols || verifyAck) &&
|
|
(noMultiEntry || orderingConfirmed)`. Two flags persisted. Finalise is one
|
|
click but the button stays disabled until its applicable gates are satisfied.
|
|
|
|
### Q4 — Verify step lists the sample address's entries only
|
|
|
|
Step 1 lists just the descriptions in the **one sample address** (matches "one
|
|
address"). Because a correction is per-`(portfolio, description)`, editing one
|
|
changes the mapping **portfolio-wide** for that text — the UI must say so. A
|
|
spot-check, not full-vocabulary coverage.
|
|
|
|
### Q4.1 — Write-back: Next.js upsert, `source='user'`, single row (as built)
|
|
|
|
A Next.js route handler / server action upserts the `landlord_*_overrides` row
|
|
by `(portfolio_id, description)` setting `value` + `source='user'`, validating
|
|
against the pgEnum. **Schema unchanged** — we keep ADR-0002's `UNIQUE
|
|
(portfolio_id, description)` and flip the single row's source in place. The
|
|
Python classifier's existing `ON CONFLICT … WHERE source='classifier'`
|
|
([landlord_overrides_postgres_repository.py:84-91](/workspaces/home/github/Model/infrastructure/landlord_overrides/landlord_overrides_postgres_repository.py#L84))
|
|
then never re-clobbers it.
|
|
|
|
> Considered and **rejected**: two rows per description (classifier + user) with
|
|
> read-time `user > classifier` resolution. It buys "revert to our suggestion" +
|
|
> provenance, and is cheap now (no readers exist yet), but reopens ADR-0002's
|
|
> `UNIQUE` decision and migrates Drizzle + 4 Python tables + the conflict target.
|
|
> Not worth it for this iteration; the single-row flip already gives "user wins".
|
|
> This is the first Next.js writer of a `source='user'` row.
|
|
|
|
### Q5 — Which sample: the largest-count row
|
|
|
|
Show one sample address — the row with the **most** building parts — so ordering
|
|
it reveals the fullest convention. In the common case (only N=2) that is a
|
|
single 2-part address.
|
|
|
|
### Q5.1 — Reorder UI: label each position
|
|
|
|
Lay the file's entries out as rows (position 0, 1, …), each with a building-part
|
|
dropdown (**Main building** / **Extension 1** / …). Assigning labels yields the
|
|
permutation and validates (each part used once, exactly one Main building). All
|
|
multi-valued columns are shown together, each raw entry annotated with our
|
|
classified enum, so the user sanity-checks classification **and** alignment.
|
|
|
|
### Q6 — Detection: at start, persist a summary
|
|
|
|
Compute the multi-entry summary in the **start-address-matching POST**
|
|
([route.ts:106](<../../src/app/api/portfolio/[portfolioId]/bulk-uploads/[uploadId]/start-address-matching/route.ts#L106>))
|
|
where the full `rows` are already parsed in memory — which columns are
|
|
multi-valued, the distinct counts (with row-counts so we can pick the largest),
|
|
and the largest-count sample (address + per-column raw entries). Avoids
|
|
re-reading a 32k-row file at render. Classification enums are joined at render
|
|
from the override tables.
|
|
|
|
### Q7 — Persistence: two jsonb columns on `bulk_address_uploads`
|
|
|
|
- `multiEntrySummary jsonb` — written at start (detection).
|
|
- `multiEntryOrdering jsonb` — written at confirm: `{ count: permutation }` plus
|
|
`verifyAck` / `orderingConfirmed` flags (final shape TBD; may split flags into
|
|
their own columns).
|
|
|
|
No new table — mirrors how `columnMapping` lives on the upload row.
|
|
|
|
## Risks / load-bearing assumptions
|
|
|
|
1. **Consistent-mistake assumption.** All rows of a given count share one
|
|
ordering convention. The whole "ask once" design rests on this; if a file
|
|
mixes conventions within a count, a single per-count permutation is wrong.
|
|
2. **Largest-count-only capture.** Smaller counts stay unpopulated in the map.
|
|
A future consumer (or a later UI iteration) needs a derivation rule to apply
|
|
the convention to other counts.
|
|
3. **Normalization coupling — mitigated.** To join the sample's raw entries to
|
|
the override tables the frontend must match the backend's `split(",")` →
|
|
`strip` → `lower`. **Resolution:** store the *normalized* description keys in
|
|
`multiEntrySummary` at start (the route already holds the rows), so the
|
|
render-time join is exact-match — no cross-repo string-normalization drift.
|
|
4. **Portfolio-wide blast radius.** A verify-step edit changes the mapping for
|
|
every row with that description, not just the sample address. Must be
|
|
messaged in the UI.
|
|
|
|
## Suggested issues (`/to-issues`)
|
|
|
|
1. Schema: two jsonb columns on `bulk_address_uploads` + migration.
|
|
2. Detection at start: compute + persist `multiEntrySummary` (with normalized
|
|
description keys).
|
|
3. Verify step: list sample descriptions → enum (join override tables),
|
|
editable; Next.js upsert route writing `source='user'`; `verifyAck` flag.
|
|
4. Order step: largest-count sample, position→part dropdowns → permutation;
|
|
persist `multiEntryOrdering`; `orderingConfirmed` flag.
|
|
5. Gate: wire `canFinalize` to the two flags; conditional stepper rendering.
|