# Retrofit Property Data Onboarding

This repository contains an ETL pipeline that transforms raw retrofit property data from external source systems (currently Parity) into a standardised internal format compatible with both address2uprn and engine.

The pipeline is designed to:

- Run as an AWS Lambda triggered by SQS
- Read raw CSV/XLSX files from S3
- Perform rule-based mappings
- Infer as-built property attributes, assumed based on age
- Write a processed CSV back to S3, to be consumed by address2uprn

### Structure

SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3

Each source system implements its own **Onboarder**, while sharing a common base and mapping process.

---

### Repository Structure

onboarders/ \
├── `handler.py` # Lambda entrypoint \
├── `factory.py` # Onboarder factory \
├── `base.py` # Shared onboarding base class \
├── `parity.py` # Parity-specific transformation logic \
├── `mappings/` \
│   └── `parity/` # Parity domain mappings & classifiers \
│       ├── `age_band.py` \
│       ├── `property_type.py` \
│       ├── `built_form.py` \
│       ├── `walls.py` \
│       ├── `roof.py` \
│       ├── `floor.py` \
│       ├── `glazing.py` \
│       ├── `heating.py` \
│       ├── `as_built_wall_classifiers.py` \
│       ├── `as_built_roof_classifiers.py` \
│       └── `as_built_floor_classifiers.py` \
├── `tests/` \
├── `requirements.txt` \
└── `README.md`

---

### Lambda Entry Point (`handler.py`)

The Lambda handler:

1. Consumes messages from the SQS queue
2. Validates the payload
3. Instantiates the correct onboarder via `OnboarderFactory`
4. Runs the transformation
5. Writes the transformed CSV back to S3

### Expected Event Payload

```json
{
  "s3_uri": "s3://bucket/path/to/input.xlsx",
  "system": "parity",
  "format": "xlsx",
  "sheet_name": "Sustainability"
}
```

### Onboarder Base (`base.py`)

`OnboarderBase` provides shared functionality across all systems.

*Responsibilities*

- Reading CSV/XLSX files from S3
- Writing transformed CSVs to S3
- Defining canonical output column names
- Providing validation helpers
- Common output: for the moment, onboarders are expected to return a CSV

### Parity Onboarder (`parity.py`)

`ParityOnboarder` contains all Parity-specific transformation logic.

*Responsibilities*

- Map raw Parity fields to internal EPC-aligned enums
- Infer "as-built" constructions using age bands when insulation data is missing
- Resolve energy efficiency ratings deterministically
- Normalise output into a fixed schema

The `transform()` method orchestrates the transformation process.

### TODOs

- In `backend/onboarders/mappings/parity/glazing.py` we currently map the Parity descriptions to tuples of descriptions and efficiency ratings. This is okay for the moment, but we may consider using a data class, given how error-prone positional tuples are.
- The same applies to the heating mappings in `backend/onboarders/mappings/parity/heating.py`
- Implement an AI-enabled version to replace the standardised asset list
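
As a starting point for the glazing and heating TODOs above, here is a minimal sketch of what a data class based mapping could look like. Everything in it is an assumption made for illustration: the names `GlazingMapping`, `GLAZING_MAPPINGS` and `map_glazing` are hypothetical, and the mapping entries are placeholders rather than real Parity descriptions or ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlazingMapping:
    """Hypothetical replacement for a (description, efficiency_rating) tuple."""

    description: str
    efficiency_rating: str


# Placeholder entries for illustration only; these are NOT real Parity values.
GLAZING_MAPPINGS = {
    "EXAMPLE RAW DOUBLE": GlazingMapping(
        description="Example double glazing description",
        efficiency_rating="Good",
    ),
    "EXAMPLE RAW SINGLE": GlazingMapping(
        description="Example single glazing description",
        efficiency_rating="Very Poor",
    ),
}


def map_glazing(raw: str) -> GlazingMapping:
    """Look up a raw description, failing loudly on unmapped values."""
    key = raw.strip().upper()
    try:
        return GLAZING_MAPPINGS[key]
    except KeyError:
        raise ValueError(f"Unmapped glazing description: {raw!r}") from None
```

Named fields remove the risk of swapping the description and rating by position, and `frozen=True` prevents a shared mapping entry from being mutated by accident. The same shape could be reused for the heating mappings if it proves useful.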