mirror of https://github.com/Hestia-Homes/Model.git synced 2026-06-08 11:17:27 +00:00

Khalim Conn-Kowlessar abc300b406 merged		2026-02-10 14:31:28 +00:00
mappings/parity	preparing filtered columns	2026-02-05 13:11:53 +00:00
tests	fixed imports on unit tests for parity onboarding	2026-02-07 21:29:07 +00:00
__init__.py	Roof and wall tests green	2026-02-02 18:47:42 +00:00
base.py	merged	2026-02-10 14:31:28 +00:00
factory.py	Updated factory to return instantiated class	2026-02-09 08:03:44 +00:00
handler.py	Updated factory to return instantiated class	2026-02-09 08:03:44 +00:00
parity.py	ready for review (not deployed	2026-02-05 14:07:43 +00:00
README.md	added read me with repo overview and todos	2026-02-09 11:49:32 +00:00
requirements.txt	ready for review (not deployed	2026-02-05 14:07:43 +00:00

README.md

Retrofit Property Data Onboarding

This repository contains an ETL pipeline that transforms raw retrofit property data from external source systems (currently Parity) into a standardised internal format compatible with both address2uprn and engine.

The pipeline is designed to:

Run as an AWS Lambda triggered by SQS
Read raw CSV/XLSX files from S3
Perform rule-based mappings
Infer as-built property attributes from the property's age when they are not recorded
Write a processed CSV back to S3, to be consumed by address2uprn

Structure

SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3

Each source system implements its own Onboarder, while sharing a common base and mapping process.
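The flow above can be sketched as a small registry-based factory. This is a minimal illustration, not the code in factory.py: the `create` method, the `_registry` attribute, the constructor argument and the return type of `transform()` are all assumptions. Only the class names and the fact that the factory returns an instantiated onboarder come from this repository.

```python
from abc import ABC, abstractmethod


class OnboarderBase(ABC):
    """Stand-in for the shared base class in base.py."""

    def __init__(self, s3_uri: str) -> None:
        self.s3_uri = s3_uri

    @abstractmethod
    def transform(self) -> list[dict]:
        """Return the transformed rows (hypothetical return type)."""


class ParityOnboarder(OnboarderBase):
    def transform(self) -> list[dict]:
        # The real mapping logic lives in parity.py; this stub returns no rows.
        return []


class OnboarderFactory:
    # Hypothetical registry keyed by the "system" field of the event payload.
    _registry: dict[str, type[OnboarderBase]] = {"parity": ParityOnboarder}

    @classmethod
    def create(cls, system: str, s3_uri: str) -> OnboarderBase:
        """Return an instantiated onboarder for the given source system."""
        try:
            onboarder_cls = cls._registry[system.lower()]
        except KeyError:
            raise ValueError(f"Unsupported source system: {system!r}") from None
        return onboarder_cls(s3_uri)
```

With this shape, adding a new source system would mean writing one subclass and adding one registry entry; the handler would not need to change.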

Repository Structure

onboarders/
├── handler.py          # Lambda entrypoint
├── factory.py          # Onboarder factory
├── base.py             # Shared onboarding base class
├── parity.py           # Parity-specific transformation logic
├── mappings/
│   └── parity/         # Parity domain mappings & classifiers
│       ├── age_band.py
│       ├── property_type.py
│       ├── built_form.py
│       ├── walls.py
│       ├── roof.py
│       ├── floor.py
│       ├── glazing.py
│       ├── heating.py
│       ├── as_built_wall_classifiers.py
│       ├── as_built_roof_classifiers.py
│       └── as_built_floor_classifiers.py
├── tests/
├── requirements.txt
└── README.md

Lambda Entry Point (`handler.py`)

The Lambda handler:

Consumes messages from the SQS queue
Validates the payload
Instantiates the correct onboarder via OnboarderFactory
Runs the transformation
Writes the transformed CSV back to S3
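As a rough sketch of the first step, an SQS-triggered Lambda receives an event with a `Records` list, and each record carries the message as a string in `body`. The helper below assumes the body is the JSON payload described in the next section. The name `handle_event` and the injected `process` callable are hypothetical and stand in for the validate, instantiate, transform and write steps.

```python
import json
from typing import Any, Callable


def handle_event(
    event: dict[str, Any],
    process: Callable[[dict[str, Any]], str],
) -> list[str]:
    """Run `process` once per SQS record and collect what it returns."""
    outputs = []
    for record in event.get("Records", []):
        # SQS delivers the message body as a string; it is assumed to be JSON.
        payload = json.loads(record["body"])
        outputs.append(process(payload))
    return outputs
```

Passing the processing step in as a callable keeps the loop testable without S3 or a deployed queue.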

Expected Event Payload

{
  "s3_uri": "s3://bucket/path/to/input.xlsx",
  "system": "parity",
  "format": "xlsx",
  "sheet_name": "Sustainability"
}
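One possible way to validate this payload is a small frozen dataclass. This is a sketch under assumptions: the class name `OnboardingRequest`, the choice of which fields are required, and treating `sheet_name` as optional are not confirmed by the handler code.

```python
from dataclasses import dataclass
from typing import Any, Optional

SUPPORTED_FORMATS = {"csv", "xlsx"}


@dataclass(frozen=True)
class OnboardingRequest:
    s3_uri: str
    system: str
    format: str
    sheet_name: Optional[str] = None  # assumed to matter only for XLSX input

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "OnboardingRequest":
        # Assumption: these three fields are mandatory.
        missing = [k for k in ("s3_uri", "system", "format") if not payload.get(k)]
        if missing:
            raise ValueError(f"Missing required fields: {missing}")
        if not payload["s3_uri"].startswith("s3://"):
            raise ValueError(f"Not an S3 URI: {payload['s3_uri']!r}")
        fmt = payload["format"].lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported format: {payload['format']!r}")
        return cls(
            s3_uri=payload["s3_uri"],
            system=payload["system"].lower(),
            format=fmt,
            sheet_name=payload.get("sheet_name"),
        )
```

Failing fast here means a malformed message is rejected before any file is read from S3.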

Onboarder Base (`base.py`)

OnboarderBase provides shared functionality across all systems.

Responsibilities

Reading CSV/XLSX files from S3
Writing transformed CSVs to S3
Defining canonical output column names
Providing validation helpers
Defining a common output format: for the moment, onboarders are expected to return a CSV
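Two of these responsibilities can be illustrated with the standard library alone. The sketch below splits an S3 URI into bucket and key and serialises rows against a fixed column list. The function names and the `OUTPUT_COLUMNS` values are placeholders; the real canonical columns are defined in base.py, and the actual S3 reads and writes are omitted.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical canonical columns; the real list is defined in base.py.
OUTPUT_COLUMNS = ["address", "postcode", "property_type", "wall_type"]


def parse_s3_uri(s3_uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Invalid S3 URI: {s3_uri!r}")
    return parsed.netloc, key


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise rows using only the canonical columns, in a fixed order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=OUTPUT_COLUMNS,
        extrasaction="ignore",  # drop columns outside the canonical schema
        restval="",             # leave missing canonical columns empty
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the column list in one place is what lets every system-specific onboarder produce the same output shape.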

Parity Onboarder (`parity.py`)

ParityOnboarder contains all Parity-specific transformation logic.

Responsibilities

Map raw Parity fields to internal EPC-aligned enums
Infer “as-built” constructions using age bands when insulation data is missing
Resolve energy efficiency ratings deterministically
Normalise output into a fixed schema

The transform() method orchestrates the transformation process.
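The as-built inference step can be pictured as a fallback lookup: keep the recorded value when there is one, otherwise fall back to an assumption keyed by age band. The sketch below is illustrative only. The function name, the age band labels and the inferred values are all made up, and the real rules live in the as_built_*_classifiers modules.

```python
from typing import Optional

# Placeholder age bands and values; NOT the rules used by the real classifiers.
AS_BUILT_WALL_INSULATION = {
    "BAND_OLD": "as built, no insulation (assumed)",
    "BAND_RECENT": "as built, insulated (assumed)",
}


def infer_wall_insulation(
    recorded: Optional[str],
    age_band: Optional[str],
) -> tuple[str, bool]:
    """Return (insulation, was_inferred)."""
    if recorded and recorded.strip():
        # Source data wins whenever it is present.
        return recorded.strip(), False
    if age_band in AS_BUILT_WALL_INSULATION:
        return AS_BUILT_WALL_INSULATION[age_band], True
    # Neither recorded nor inferable from the age band.
    return "unknown", False
```

Returning a flag alongside the value makes it possible to tell downstream consumers which attributes were assumed rather than observed.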

TODOs

In backend/onboarders/mappings/parity/glazing.py we currently map the Parity descriptions to tuples of descriptions and efficiency ratings. This is fine for the moment, but we may consider using a data class, given how error-prone positional tuples are.
The same applies to the heating mappings in backend/onboarders/mappings/parity/heating.py
Implement an AI-enabled version to replace the standardised asset list
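For the glazing and heating TODOs, one possible shape for the data class is sketched below. The class name `MappedFeature`, the mapping keys and the mapped values are hypothetical, and the five-point rating scale is the EPC-style scale, assumed here to match the internal enums.

```python
from dataclasses import dataclass

# EPC-style five-point scale, assumed to match the internal enums.
VALID_RATINGS = ("Very Poor", "Poor", "Average", "Good", "Very Good")


@dataclass(frozen=True)
class MappedFeature:
    description: str
    efficiency: str

    def __post_init__(self) -> None:
        # Catches swapped or misspelled values that a bare tuple would accept.
        if self.efficiency not in VALID_RATINGS:
            raise ValueError(f"Unknown efficiency rating: {self.efficiency!r}")


# Hypothetical entries; the real keys are the raw Parity descriptions.
GLAZING_MAP = {
    "double glazed": MappedFeature("Fully double glazed", "Average"),
    "single glazed": MappedFeature("Single glazed", "Very Poor"),
}
```

Compared with tuples, named fields remove the need to remember which position holds the rating, and the check in `__post_init__` fails at import time if an entry is written the wrong way round.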