Model/domain/postcodes/sanitise.py
Jun-te Kim 6198d7a46d postcode_splitter: pure domain (UserAddress, sanitise_postcode, postcode_batching)
Slice 1/6 of the postcode_splitter refactor (Hestia-Homes/Model#1100).
Introduces the pure-domain foundation under domain/, with no AWS, Postgres,
or pandas. UserAddress is a frozen dataclass that sanitises its postcode in
__post_init__ via the canonical sanitise_postcode helper, and
iter_postcode_grouped_batches preserves the legacy splitter's batching
invariants (group-by-postcode in insertion order, never split a group,
oversize single-postcode groups dispatched whole, final flush). Updates
UBIQUITOUS_LANGUAGE.md so the User Address term covers both the dataclass
sense (preferred in domain code) and the raw upstream-string sense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 16:45:47 +00:00
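
The commit message describes the behaviour of `UserAddress` and `iter_postcode_grouped_batches` but their source is not shown here. The following is a minimal sketch reconstructed from that description alone, not the code in this commit. The `address_line` field, the `max_batch_size` parameter, and the exact flush rule are assumptions made for illustration; `sanitise_postcode` is copied from the file below so the block runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


def sanitise_postcode(s: str) -> str:
    # Copied from domain/postcodes/sanitise.py so this sketch is self-contained.
    return "".join(s.split()).upper()


@dataclass(frozen=True)
class UserAddress:
    # Field names are hypothetical; only the class name comes from the commit.
    address_line: str
    postcode: str

    def __post_init__(self) -> None:
        # A frozen dataclass blocks normal attribute assignment, so the
        # sanitised value has to be written via object.__setattr__.
        object.__setattr__(self, "postcode", sanitise_postcode(self.postcode))


def iter_postcode_grouped_batches(
    addresses: Iterable[UserAddress],
    max_batch_size: int,  # assumed parameter name
) -> Iterator[list[UserAddress]]:
    # Group by postcode; a dict keeps keys in first-seen (insertion) order.
    groups: dict[str, list[UserAddress]] = {}
    for address in addresses:
        groups.setdefault(address.postcode, []).append(address)

    batch: list[UserAddress] = []
    for group in groups.values():
        # Never split a group: if it does not fit, flush the current batch first.
        if batch and len(batch) + len(group) > max_batch_size:
            yield batch
            batch = []
        # An oversize group lands in an empty batch and is dispatched whole.
        batch.extend(group)

    # Final flush of whatever is left.
    if batch:
        yield batch


addresses = [
    UserAddress("1 High St", "sw1a 1aa"),
    UserAddress("2 Low St", "E1 6AN"),
    UserAddress("3 High St", "SW1A1AA"),
    UserAddress("4 Low St", "e16an"),
    UserAddress("5 Mid St", "N1 9GU"),
]
batches = list(iter_postcode_grouped_batches(addresses, max_batch_size=3))
# Two batches: the SW1A1AA pair, then the E16AN pair together with N19GU.
```

Grouping before batching is what lets the two spellings of each postcode end up in the same batch: they already compare equal once `__post_init__` has run.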

23 lines
932 B
Python

"""Canonical postcode sanitisation for the domain layer.
The legacy postcode_splitter normalises postcodes inline with
``df["postcode"].str.upper().str.replace(" ", "")``. This module promotes
that operation to a pure, reusable function so the same canonical form is
applied wherever a postcode crosses a domain boundary -- including
:class:`domain.addresses.user_address.UserAddress` construction and future
migrations.
"""
from __future__ import annotations
def sanitise_postcode(s: str) -> str:
"""Return the canonical form of a postcode.
The canonical form is uppercase with all whitespace removed. This matches
the legacy splitter's ``str.upper().str.replace(" ", "")`` for the
overwhelmingly common case of space-separated postcodes (e.g. ``"sw1a 1aa"``
becomes ``"SW1A1AA"``) while also tolerating tabs/newlines that can creep
in from CSV ingestion.
"""
return "".join(s.split()).upper()