mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Slice 1/6 of the postcode_splitter refactor (Hestia-Homes/Model#1100).

Introduces the pure-domain foundation under domain/, with no AWS, Postgres, or pandas. UserAddress is a frozen dataclass that sanitises its postcode in __post_init__ via the canonical sanitise_postcode helper, and iter_postcode_grouped_batches preserves the legacy splitter's batching invariants (group-by-postcode in insertion order, never split a group, oversize single-postcode groups dispatched whole, final flush).

Updates UBIQUITOUS_LANGUAGE.md so the User Address term covers both the dataclass sense (preferred in domain code) and the raw upstream-string sense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
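The commit message names a frozen UserAddress dataclass and four batching invariants, but neither implementation appears in this file. The following is a minimal sketch of one plausible reading, not the repository's actual code: the `address` field, the `batch_size` parameter, and the exact signature of `iter_postcode_grouped_batches` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


def sanitise_postcode(s: str) -> str:
    """Uppercase the postcode and remove all whitespace."""
    return "".join(s.split()).upper()


@dataclass(frozen=True)
class UserAddress:
    # Field names are hypothetical; the commit message only states that the
    # postcode is sanitised in __post_init__.
    address: str
    postcode: str

    def __post_init__(self) -> None:
        # A frozen dataclass blocks ordinary attribute assignment, so the
        # sanitised value is written through object.__setattr__.
        object.__setattr__(self, "postcode", sanitise_postcode(self.postcode))


def iter_postcode_grouped_batches(
    addresses: Iterable[UserAddress], batch_size: int
) -> Iterator[list[UserAddress]]:
    """Yield batches of addresses without ever splitting a postcode group.

    Assumed signature; the real function may differ.
    """
    # Group by postcode. A dict keeps keys in first-seen (insertion) order.
    groups: dict[str, list[UserAddress]] = {}
    for item in addresses:
        groups.setdefault(item.postcode, []).append(item)

    batch: list[UserAddress] = []
    for group in groups.values():
        # Flush before adding a group that would overflow the current batch.
        # A group larger than batch_size therefore lands in an empty batch
        # and is dispatched whole on the next flush.
        if batch and len(batch) + len(group) > batch_size:
            yield batch
            batch = []
        batch.extend(group)

    # Final flush: emit whatever is left after the last group.
    if batch:
        yield batch
```

In this sketch an oversize group is emitted as its own batch, because the group that follows it always triggers a flush first.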
23 lines
932 B
Python
"""Canonical postcode sanitisation for the domain layer.

The legacy postcode_splitter normalises postcodes inline with
``df["postcode"].str.upper().str.replace(" ", "")``. This module promotes
that operation to a pure, reusable function so the same canonical form is
applied wherever a postcode crosses a domain boundary -- including
:class:`domain.addresses.user_address.UserAddress` construction and future
migrations.
"""

from __future__ import annotations


def sanitise_postcode(s: str) -> str:
    """Return the canonical form of a postcode.

    The canonical form is uppercase with all whitespace removed. This matches
    the legacy splitter's ``str.upper().str.replace(" ", "")`` for the
    overwhelmingly common case of space-separated postcodes (e.g. ``"sw1a 1aa"``
    becomes ``"SW1A1AA"``) while also tolerating tabs/newlines that can creep
    in from CSV ingestion.
    """
    return "".join(s.split()).upper()
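
The docstring's claim about matching the legacy behaviour can be checked with a small comparison. This is a sketch only: `legacy_normalise` is a hypothetical plain-string stand-in for the pandas expression quoted above, used so the example runs without pandas.

```python
def sanitise_postcode(s: str) -> str:
    """Same rule as the function in this module."""
    return "".join(s.split()).upper()


def legacy_normalise(s: str) -> str:
    # Hypothetical per-string equivalent of
    # df["postcode"].str.upper().str.replace(" ", "").
    return s.upper().replace(" ", "")


# Space-separated input: both forms agree.
assert sanitise_postcode("sw1a 1aa") == legacy_normalise("sw1a 1aa") == "SW1A1AA"

# Tab or newline from CSV ingestion: only the new helper removes it.
assert sanitise_postcode("sw1a\t1aa\n") == "SW1A1AA"
assert legacy_normalise("sw1a\t1aa\n") == "SW1A\t1AA\n"

# The canonical form is stable: sanitising twice changes nothing.
assert sanitise_postcode(sanitise_postcode(" sw1a 1aa ")) == "SW1A1AA"
```

Calling `str.split()` with no arguments splits on any run of whitespace, which is why tabs and newlines are dropped along with spaces.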