Model

mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Jun-te Kim 1b53b47048 add this in a sensible branch		2026-03-17 12:37:50 +00:00
handler	only in docker build	2026-02-12 18:52:11 +00:00
tests	added terraform files and test plan	2026-02-02 16:38:17 +00:00
__init__.py	need to download grade pydantic	2026-01-20 20:12:37 +00:00
main.py	add this in a sensible branch	2026-03-17 12:37:50 +00:00
README.md	add this in a sensible branch	2026-03-17 12:37:50 +00:00
script.py	added commnets on script	2026-02-16 14:12:09 +00:00

README.md

So you want to fetch UPRN for an address list?

Before you run:

Step 1) Get the list and ensure the following columns exist

I believe the column names are case-sensitive:

Address 1
Address 2
Address 3
Postcode

Then save it as a .csv file.
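A quick header check before uploading can catch a misnamed column early. This is only a minimal sketch: the helper name `missing_columns` and the sample row are made up for illustration, and it assumes the pipeline matches column names exactly (case included), which the note above suggests but does not confirm.

```python
import csv
import io

# Column names taken from the list above; matching is exact and case-sensitive.
REQUIRED_COLUMNS = ["Address 1", "Address 2", "Address 3", "Postcode"]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Illustrative data only: "address 3" is deliberately lower-cased.
sample = io.StringIO(
    "Address 1,Address 2,address 3,Postcode\n"
    "1 Example Street,,Example Town,AB1 2CD\n"
)
print(missing_columns(sample))  # ['Address 3']
```

For a real file, something like `open(path, newline="", encoding="utf-8-sig")` is a reasonable way to open it, since spreadsheet exports often start with a byte-order mark that would otherwise end up in the first column name.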

Step 2)

Before we run this, we need to upload the .csv file to S3 and create a task and a subtask.

For the S3 upload, I recommend somewhere in retrofit-data-dev. Make a note of the resulting s3_uri.

For this example I'll be using "s3://retrofit-data-dev/ara_raw_inputs/calico/Calico Homes Full list EPC Properties(Sheet2) (1) (1).csv"
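If you would rather upload from Python than through the console, a rough sketch could look like the following. It assumes boto3 is installed and AWS credentials are already configured; the function names `build_s3_uri` and `upload_csv` are hypothetical helpers, not part of this repo.

```python
from pathlib import Path


def build_s3_uri(bucket, key):
    """Format a bucket and key as the s3_uri used in the later steps."""
    return f"s3://{bucket}/{key}"


def upload_csv(local_path, bucket, prefix):
    """Upload a local CSV under the given prefix and return its s3_uri."""
    import boto3  # third-party; imported here so build_s3_uri works without it

    key = f"{prefix.strip('/')}/{Path(local_path).name}"
    boto3.client("s3").upload_file(str(local_path), bucket, key)
    return build_s3_uri(bucket, key)


# Building the URI does not touch AWS:
print(build_s3_uri("retrofit-data-dev", "ara_raw_inputs/calico/list.csv"))
```

Calling `upload_csv("list.csv", "retrofit-data-dev", "ara_raw_inputs/calico")` would then perform the actual upload and hand back the URI to paste into the queue message.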

Go to the Ara DB and create a new task_id, using a randomly generated UUID as the primary key.

task_id = 169ea9b0-01b5-48dc-9f90-ae1989491d09
sub_task_id = e5704f9e-29fe-43c8-8913-05be09f2440f
s3 => s3://retrofit-data-dev/ara_raw_inputs/calico/Calico UPRN Matching Rerun After Address Fix.csv
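The two IDs can be generated with Python's standard `uuid` module. This sketch only produces the values; inserting them into the Ara DB is left out because the table layout is not described here, and the helper name `new_ids` is made up.

```python
import uuid


def new_ids():
    """Generate a fresh random (version 4) UUID for the task and the subtask."""
    return {
        "task_id": str(uuid.uuid4()),
        "sub_task_id": str(uuid.uuid4()),
    }


ids = new_ids()
print(ids["task_id"])
print(ids["sub_task_id"])
```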

Step 3) Alright, now let's make the input for the postcode-splitter SQS queue to get the ball rolling

postcode-splitter-sqs => https://eu-west-2.console.aws.amazon.com/sqs/v3/home?region=eu-west-2#/queues/https%3A%2F%2Fsqs.eu-west-2.amazonaws.com%2F337213553626%2Fpostcode-splitter-queue-dev

{
  "task_id": "169ea9b0-01b5-48dc-9f90-ae1989491d09",
  "sub_task_id": "e5704f9e-29fe-43c8-8913-05be09f2440f",
  "s3_uri": "s3://retrofit-data-dev/ara_raw_inputs/calico/Calico UPRN Matching (1)(Sheet1).csv"
}

Each CSV batch should be saved in retrofit-data-dev/ara_postcode_splitter_batches///timestamp:uuid4.csv
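The queue message above can also be built and sent from Python instead of being pasted into the console. This is a sketch under a few assumptions: boto3 is installed with credentials configured, the queue URL is the one decoded from the console link in this step, and `build_message` / `send_message` are hypothetical helper names.

```python
import json

# Decoded from the console link above (the part after "#/queues/").
QUEUE_URL = "https://sqs.eu-west-2.amazonaws.com/337213553626/postcode-splitter-queue-dev"


def build_message(task_id, sub_task_id, s3_uri):
    """Serialise the three fields the postcode-splitter message contains."""
    return json.dumps(
        {"task_id": task_id, "sub_task_id": sub_task_id, "s3_uri": s3_uri}
    )


def send_message(body, queue_url=QUEUE_URL):
    """Send the JSON body to the queue; requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so build_message works without it

    sqs = boto3.client("sqs", region_name="eu-west-2")
    return sqs.send_message(QueueUrl=queue_url, MessageBody=body)


body = build_message(
    "169ea9b0-01b5-48dc-9f90-ae1989491d09",
    "e5704f9e-29fe-43c8-8913-05be09f2440f",
    "s3://retrofit-data-dev/ara_raw_inputs/calico/Calico UPRN Matching (1)(Sheet1).csv",
)
print(body)
```

Building the body with `json.dumps` avoids hand-quoting mistakes, which matters here because the file names contain spaces and parentheses.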

Outputs of address2uprn (which is triggered automatically by the postcode splitter) will be saved in retrofit-data-dev/ara_raw_outputs///timestamp:uuid4.csv

Run the script in backend/scripts/combine_address2uprn_outputs.py with . This combines the output files from all the address2uprn runs into one big file.

Find out which rows have a missing UPRN, save them as a separate sheet, and upload it somewhere in s3://retrofit-data-dev

I uploaded the missing uprn here: s3://retrofit-data-dev/ara_raw_inputs/calico/missinguprn.csv
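Picking out the rows with a missing UPRN from the combined file could be done along these lines. The column name `uprn` is an assumption (check the header of the combined output), the helper name is made up, and the sample rows are placeholders rather than real data.

```python
import csv
import io


def split_missing_uprn(rows, uprn_column="uprn"):
    """Split rows into (matched, missing) by whether the UPRN cell is non-empty."""
    matched, missing = [], []
    for row in rows:
        value = (row.get(uprn_column) or "").strip()
        (matched if value else missing).append(row)
    return matched, missing


# Placeholder data: the second row has no UPRN.
combined = io.StringIO(
    "Address 1,Postcode,uprn\n"
    "1 Example Street,AB1 2CD,100000000001\n"
    "2 Example Street,AB1 2CD,\n"
)
matched, missing = split_missing_uprn(csv.DictReader(combined))
print(len(matched), len(missing))  # 1 1
```

The `missing` rows can then be written back out with `csv.DictWriter` and uploaded as the input for the next queue.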

ordnance_survey sqs is => https://eu-west-2.console.aws.amazon.com/sqs/v3/home?region=eu-west-2#/queues/https%3A%2F%2Fsqs.eu-west-2.amazonaws.com%2F337213553626%2FordnanceSurvey-queue-dev

{
  "s3_uri": "s3://retrofit-data-dev/ara_raw_inputs/calico/missinguprn.csv",
  "task_id": "a7b70a02-4df4-45b5-a50b-196e095910bb",
  "sub_task_id": "567cf73b-1210-4909-9ecc-36ae7e23420e"
}

Outputs are at s3://retrofit-data-dev/ara_ordnance_survey_outputs//