Commit graph

4832 commits

Author SHA1 Message Date
Jun-te Kim
d0cf3d14ad get rid of comments 2026-05-20 13:21:11 +00:00
Jun-te Kim
8bb90a5aa5 sanitisation of postcode 2026-05-20 12:57:03 +00:00
Jun-te Kim
914a8ed51e postcode splitter working e2e 2026-05-20 11:07:40 +00:00
Jun-te Kim
0a04448217 applications/postcode_splitter: PostcodeSplitterOrchestrator + Lambda entrypoint slice
Wires slice 1-5 primitives into a deployable splitter:

- orchestration/postcode_splitter_orchestrator.py: PostcodeSplitterOrchestrator
  loads addresses via UserAddressRepository, groups by postcode via
  iter_postcode_grouped_batches, persists each batch under
  ara_postcode_splitter_batches/{task_id}/{subtask_id}/, creates a WAITING
  child SubTask, and publishes an address2UPRN SQS message per batch.

- applications/postcode_splitter/: Lambda entrypoint. handler.py is decorated
  with @subtask_handler() so the parent SubTask lifecycle is decorator-owned;
  PostcodeSplitterTriggerBody validates the body. Dockerfile is the
  python:3.11 Lambda base with the DDD-shaped source layers and no pandas.

- tests/orchestration/test_postcode_splitter_orchestrator.py: integration
  test using moto S3 + moto SQS + in-memory SQLite that exercises the full
  wiring against a fixture CSV spanning three postcode groups (one
  oversize) and asserts child count, persisted inputs, queue bodies, and
  dispatch order.

backend/postcode_splitter/ and .github/workflows/deploy_terraform.yml are
intentionally unchanged: the dockerfile_path flip is deferred until the
companion backend/address2UPRN/ migration is also ready.
2026-05-19 17:46:12 +00:00
Jun-te Kim
708f1b5d18 repositories: UserAddressRepository + UserAddressCsvS3Repository (CSV-on-S3 adapter)
Adds the persistence layer for UserAddress batches:

- Abstract UserAddressRepository with load_batch / save_batch.
- Concrete UserAddressCsvS3Repository over CsvS3Client:
  - load_batch reads canonical upload columns (Address 1/2/3, Postcode,
    Internal Reference), comma-joins non-empty address parts, and
    passes Internal Reference through (None when missing/empty).
  - save_batch writes a 3-column CSV (user_address,postcode,
    internal_reference) to {path_prefix}/{ISO datetime}_{uuid8}.csv
    and returns the s3://bucket/key URI.
- Postcode sanitisation flows through UserAddress.__post_init__; the
  repo never calls sanitise_postcode directly.

Tests (moto-backed) cover: three-line address load, Address-1-only
load, missing Internal Reference, save->reload round trip, and
unique-filename-per-save. pyright --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:37:02 +00:00
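The load and save rules described in 708f1b5d18 can be sketched roughly as below. This is an illustration under stated assumptions, not the repository's code: the helper names `row_to_fields` and `batch_key` are hypothetical, the ", " separator is a guess at what "comma-joins" means, and `uuid8` is assumed to be the first eight hex characters of a UUID4.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# Canonical upload columns named in the commit message.
ADDRESS_COLUMNS = ("Address 1", "Address 2", "Address 3")


def row_to_fields(row: Dict[str, str]) -> Tuple[str, str, Optional[str]]:
    """Hypothetical helper: map one upload CSV row to (address, postcode, reference)."""
    parts = [(row.get(column) or "").strip() for column in ADDRESS_COLUMNS]
    # Join only the non-empty address parts (separator is an assumption).
    user_address = ", ".join(part for part in parts if part)
    # A missing or empty Internal Reference becomes None.
    reference = (row.get("Internal Reference") or "").strip() or None
    return user_address, (row.get("Postcode") or ""), reference


def batch_key(path_prefix: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a {path_prefix}/{ISO datetime}_{uuid8}.csv key."""
    now = now or datetime.now(timezone.utc)
    return f"{path_prefix}/{now.isoformat()}_{uuid.uuid4().hex[:8]}.csv"
```

The random suffix is what makes two saves within the same timestamp resolve to different keys, which is the property the unique-filename-per-save test relies on.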
Jun-te Kim
d70e8a9e53 utilities/aws_lambda: @subtask_handler injects TaskOrchestrator as third positional arg
The wrapped function now receives the decorator-owned TaskOrchestrator as
a third positional argument so handlers can compose their own use-case
orchestrator that shares the session, instead of opening a second Postgres
connection per invocation.

Both existing callers (backend/ordnanceSurvey/main.py and
backend/bulk_address2uprn_combiner/main.py) have their signatures extended
to accept the new positional argument (typed Optional[TaskOrchestrator] so
the legacy backend.utils.subtasks.subtask_handler — which only passes two
args — keeps working until the migration to the new decorator lands).

@task_handler is intentionally unchanged in this slice; symmetry is
deferred per issue #1103.
2026-05-19 17:31:27 +00:00
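A minimal sketch of the injection pattern described in d70e8a9e53, assuming a decorator factory that builds one orchestrator per invocation and passes it through. The `TaskOrchestrator` stand-in and the `make_session` parameter are hypothetical; the real decorator also owns the SubTask lifecycle, which is omitted here.

```python
import functools
from typing import Any, Callable, Dict, Optional


class TaskOrchestrator:
    """Stand-in for the real orchestrator; it only holds the shared session."""

    def __init__(self, session: Any) -> None:
        self.session = session


def subtask_handler(make_session: Callable[[], Any] = object) -> Callable[..., Any]:
    """Decorator factory (sketch): inject a TaskOrchestrator as the third positional arg."""

    def decorate(func: Callable[..., Any]) -> Callable[[Dict[str, Any], Any], Any]:
        @functools.wraps(func)
        def wrapper(event: Dict[str, Any], context: Any) -> Any:
            # One session per invocation, shared with whatever the handler composes.
            orchestrator = TaskOrchestrator(make_session())
            return func(event, context, orchestrator)

        return wrapper

    return decorate


@subtask_handler()
def handler(
    event: Dict[str, Any],
    context: Any,
    orchestrator: Optional[TaskOrchestrator] = None,
) -> Optional[TaskOrchestrator]:
    # Optional with a default, so a legacy decorator passing two args still works.
    return orchestrator
```

Typing the third parameter as `Optional[TaskOrchestrator]` with a default of `None` is what lets the same function be wrapped by either the new decorator or the two-argument legacy one during the migration.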
Jun-te Kim
d7f14033ba orchestration: add TaskOrchestrator.create_child_subtask primitive
Adds a primitive for creating a new WAITING SubTask under an existing
parent Task, routing all SubTask creation through the orchestrator
(replacing the legacy SubTaskInterface path used by the splitter).
Skips _cascade because a new WAITING child against an IN_PROGRESS
parent is a no-op under Task.recalculate_from_subtasks.
2026-05-19 17:19:41 +00:00
Jun-te Kim
7b00a33cd2 infrastructure: typed S3/SQS clients (S3Client, CsvS3Client, SqsClient, Address2UprnQueueClient)
Slice 3/6 of the postcode_splitter refactor (Hestia-Homes/Model#1101).
Introduces a thin typed infrastructure layer wrapping boto3 for the AWS
side of the splitter. S3Client/SqsClient are bucket-/queue-bound byte
adapters; CsvS3Client subclasses S3Client to round-trip CSV row dicts
via the existing parse_s3_uri helper in utils/s3.py; Address2UprnQueueClient
subclasses SqsClient to publish the typed {task_id, sub_task_id, s3_uri}
fan-out body the downstream consumer expects. moto[s3,sqs] is pulled into
test.requirements.txt and the new tests/infrastructure/ suite exercises
each client against the moto backend (S3 round-trip, CSV round-trip,
SQS send + body inspection, typed publish + body inspection). pyright
--strict is clean on the new modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:12:21 +00:00
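The typed fan-out body mentioned in 7b00a33cd2 might look something like the following. Only the three field names come from the commit message; the `Address2UprnMessage` class name, the string field types, and the JSON encoding are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Address2UprnMessage:
    """Hypothetical typed body for one address2UPRN SQS message."""

    task_id: str      # assumed to be a string identifier
    sub_task_id: str  # assumed to be a string identifier
    s3_uri: str       # s3://bucket/key of the persisted batch

    def to_body(self) -> str:
        # Serialise to the JSON string that would be sent as the SQS MessageBody.
        return json.dumps(asdict(self))
```

Keeping the body in one dataclass means the publisher and any test that inspects queue bodies agree on the same three keys.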
Jun-te Kim
6198d7a46d postcode_splitter: pure domain (UserAddress, sanitise_postcode, postcode_batching)
Slice 1/6 of the postcode_splitter refactor (Hestia-Homes/Model#1100).
Introduces the pure-domain foundation under domain/, with no AWS, Postgres,
or pandas. UserAddress is a frozen dataclass that sanitises its postcode in
__post_init__ via the canonical sanitise_postcode helper, and
iter_postcode_grouped_batches preserves the legacy splitter's batching
invariants (group-by-postcode in insertion order, never split a group,
oversize single-postcode groups dispatched whole, final flush). Updates
UBIQUITOUS_LANGUAGE.md so the User Address term covers both the dataclass
sense (preferred in domain code) and the raw upstream-string sense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 16:45:47 +00:00
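The batching invariants listed in 6198d7a46d (group by postcode in insertion order, never split a group, dispatch oversize groups whole, final flush) can be sketched as follows. This is a simplified illustration, not the project's implementation: the `max_batch_size` parameter, the `UserAddress` field names, and the whitespace-stripping rule inside `sanitise_postcode` are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


def sanitise_postcode(raw: str) -> str:
    # Assumed rule for illustration only: drop whitespace and upper-case.
    return "".join(raw.split()).upper()


@dataclass(frozen=True)
class UserAddress:
    user_address: str
    postcode: str
    internal_reference: Optional[str] = None

    def __post_init__(self) -> None:
        # A frozen dataclass blocks normal assignment, so go through object.__setattr__.
        object.__setattr__(self, "postcode", sanitise_postcode(self.postcode))


def iter_postcode_grouped_batches(
    addresses: Iterable[UserAddress], max_batch_size: int
) -> Iterator[List[UserAddress]]:
    # Group by sanitised postcode; dicts keep first-seen (insertion) order.
    groups: Dict[str, List[UserAddress]] = {}
    for address in addresses:
        groups.setdefault(address.postcode, []).append(address)

    batch: List[UserAddress] = []
    for group in groups.values():
        # Flush before a group that would overflow the batch; never split the group.
        if batch and len(batch) + len(group) > max_batch_size:
            yield batch
            batch = []
        # An oversize group lands in an empty batch and is dispatched whole.
        batch.extend(group)
    if batch:
        # Final flush of whatever remains.
        yield batch
```

Because sanitisation happens in `__post_init__`, two rows whose postcodes differ only in case or spacing end up in the same group without the batching code calling `sanitise_postcode` itself.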
Jun-te Kim
54a674b5c8 added postcode splitter rewrite to ddd 2026-05-19 16:35:09 +00:00
Jun-te Kim
bc8ca3ead3 deployment from infrastructure 2026-05-19 12:55:30 +00:00
Daniel Roth
a11ea1b9b8
Merge pull request #1096 from Hestia-Homes/bug/coordination-hub-file-source-correct
Correctly set file source to be "coordination_hub" when using coordination login for pashub
2026-05-19 12:45:56 +01:00
Daniel Roth
20ad0616bc PAS Hub happy path asserts file_source "pas hub" 🟩 2026-05-19 11:10:45 +00:00
Daniel Roth
a4ad1ca11c Coordination Hub file listing fallback stores correct file_source in DB 🟩 2026-05-19 11:10:18 +00:00
Daniel Roth
1e115ba3de Coordination Hub fallback stores correct file_source in DB 🟩 2026-05-19 11:09:01 +00:00
Daniel Roth
dc3543ac5f Coordination Hub fallback stores correct file_source in DB 🟥 2026-05-19 11:07:41 +00:00
Daniel Roth
b2e896f4eb
Merge pull request #1093 from Hestia-Homes/feature/address_additional
added more test cases
2026-05-18 13:11:39 +01:00
Daniel Roth
30c6a9f2f0
Merge pull request #1094 from Hestia-Homes/feature/coordination-hub-files
Pashub fetcher: try coordination credentials if initial token fails
2026-05-18 13:08:16 +01:00
Daniel Roth
770493ff9e add logging 2026-05-18 11:51:48 +00:00
Daniel Roth
3a7a00051d add new variables to deployment pipeline 2026-05-18 11:09:44 +00:00
Daniel Roth
4cd59768c3 Wire coordination account fallback into config and handler, remove token-refresh retry 🟩 2026-05-18 09:22:32 +00:00
Daniel Roth
dcff529219 UnauthorizedError propagates when both PAS and coordination clients return 401 🟩 2026-05-18 09:13:51 +00:00
Daniel Roth
5a29866245 PAS raises UnauthorizedError when 401 received with no coordination factory configured 🟩 2026-05-18 09:12:19 +00:00
Daniel Roth
0c1ecabf2f PAS falls back to coordination client when file listing returns 401 🟩 2026-05-18 09:09:18 +00:00
Daniel Roth
d49bd3620e PAS falls back to coordination client when file listing returns 401 🟥 2026-05-18 09:08:47 +00:00
Daniel Roth
e044638192 PAS falls back to coordination client when UPRN lookup returns 401 🟩 2026-05-18 09:06:46 +00:00
Daniel Roth
a999724578 PAS falls back to coordination client when UPRN lookup returns 401 🟥 2026-05-18 09:05:54 +00:00
Jun-te Kim
fce1e1008a added more test cases 2026-05-15 16:00:02 +00:00
Jun-te Kim
0573db1151
Merge pull request #1089 from Hestia-Homes/feature/run_docker_compose_tests_early
smoke tests
2026-05-15 13:36:43 +01:00
Jun-te Kim
6afd076005 added 5 second rest every 100 tests 2026-05-15 11:28:04 +00:00
Daniel Roth
d3a4365d6e
Merge pull request #1090 from Hestia-Homes/trigger-pashub-fetcher-lambda
Pashub fetcher: improve job ID extraction logic and write script to trigger deployed lambda
2026-05-15 12:07:39 +01:00
Daniel Roth
ad49bf9d85 tweak logs 2026-05-15 11:00:58 +00:00
Daniel Roth
eeb2f9eb20 tweaks before PR 2026-05-15 10:58:42 +00:00
Jun-te Kim
6c8080ef62 smoke tests 2026-05-14 16:57:31 +00:00
Jun-te Kim
0c3a31ed81 smoke tests 2026-05-14 16:49:45 +00:00
Jun-te Kim
16e6000180 smoke tests 2026-05-14 16:44:18 +00:00
Jun-te Kim
572fcc1406 smoke tests 2026-05-14 16:38:22 +00:00
Daniel Roth
ecd2676c5e pashub_job_id extracts job ID from all valid PasHub link shapes 🟩 2026-05-14 13:42:38 +00:00
Daniel Roth
5677789919 pashub_job_id extracts ID from /evidence/view links 🟩 2026-05-14 13:42:04 +00:00
Daniel Roth
0b358e6de6 pashub_job_id extracts ID from /evidence/view links 🟥 2026-05-14 13:37:14 +00:00
Daniel Roth
03ae73f39a trigger via sqs from local file 2026-05-14 13:37:08 +00:00
Daniel Roth
c98fc8452f
Merge pull request #1086 from Hestia-Homes/feature/pashub-additional-files
Fetch coordination and design documents from pashub
2026-05-14 11:59:43 +01:00
Daniel Roth
955db1c3eb additional typehint 2026-05-14 10:58:38 +00:00
Daniel Roth
faf698eb71 rename functions and include typehints 2026-05-14 10:57:37 +00:00
Daniel Roth
cc6b64ee2b
Merge pull request #1080 from Hestia-Homes/feature/magicplan_uploaded_file_id
Include uploaded file ID on MagicPlan plan
2026-05-14 10:23:33 +01:00
Daniel Roth
e8b7cfdcec remove redundant unknown-file test; rename test_infer_* to test_file_type_for_* 🟪 2026-05-14 09:01:56 +00:00
Daniel Roth
fb9bdbc585 _select_latest_core_files delegates to core_file_for; _get_core_file_type removed 🟪 2026-05-14 08:53:56 +00:00
Daniel Roth
5e31c0f3da file_type_for delegates to core_file_for; _MATCHERS removed 🟪 2026-05-14 08:51:28 +00:00
Daniel Roth
541d5965b7 core_file_for OSM fallback is suppressed when evidence_category is present 🟩 2026-05-14 08:46:48 +00:00
Daniel Roth
d4cc00b5e3 core_file_for returns None for unrecognised filenames 🟩 2026-05-14 08:46:10 +00:00