added read me with repo overview and todos

2026-08-02 21:08:24 +00:00 · 2026-02-09 11:49:32 +00:00 · 2026-02-09 11:49:32 +00:00 · bff54a9063
commit bff54a9063
parent bf34393ceb
1 changed files with 102 additions and 0 deletions
--- a/backend/onboarders/README.md
+++ b/backend/onboarders/README.md
@ -0,0 +1,102 @@
+# Retrofit Property Data Onboarding
+
+This repository contains an ETL pipeline for transforming raw retrofit property data from external source systems
+(currently Parity) into a standardised internal format compatible with both address2uprn and engine.
+
+The pipeline is designed to:
+
+- Run as an AWS Lambda triggered by SQS
+- Read raw CSV/XLSX files from S3
+- Perform rule-based mappings
+- Infer as-built property attributes, assumed from the property's age band
+- Write the processed CSV back to S3, to be consumed by address2uprn
+
+### Structure
+
+SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3
+
+Each source system implements its own **Onboarder**, while sharing a common base and mapping process.
+
+---
+
+### Repository Structure
+
+onboarders/
+├── `handler.py` # Lambda entrypoint \
+├── `factory.py` # Onboarder factory \
+├── `base.py` # Shared onboarding base class \
+├── `parity.py` # Parity-specific transformation logic \
+├── `mappings/` \
+│ └── `parity/` # Parity domain mappings & classifiers \
+│ ├── `age_band.py` \
+│ ├── `property_type.py` \
+│ ├── `built_form.py` \
+│ ├── `walls.py` \
+│ ├── `roof.py` \
+│ ├── `floor.py` \
+│ ├── `glazing.py` \
+│ ├── `heating.py` \
+│ ├── `as_built_wall_classifiers.py` \
+│ ├── `as_built_roof_classifiers.py` \
+│ └── `as_built_floor_classifiers.py` \
+├── `tests/` \
+├── `requirements.txt` \
+└── `README.md`
+
+
+---
+
+### Lambda Entry Point (`handler.py`)
+
+The Lambda handler:
+
+1. Consumes messages from the SQS queue
+2. Validates the payload
+3. Instantiates the correct onboarder via `OnboarderFactory`
+4. Runs the transformation
+5. Writes the transformed CSV back to S3
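+
+The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of
+`handler.py` or `factory.py`: the `create` and `run` method names, the registry dict, and the stand-in
+`ParityOnboarder` body are assumptions. The only fixed part is the SQS event shape, where each record carries
+its message as a JSON string under `body`.
+
+```python
+import json
+
+
+class ParityOnboarder:
+    """Stand-in for the real class in parity.py (hypothetical body)."""
+
+    def __init__(self, payload: dict) -> None:
+        self.payload = payload
+
+    def run(self) -> str:
+        # The real onboarder would read from S3, transform, and write a CSV.
+        return f"processed {self.payload['s3_uri']}"
+
+
+class OnboarderFactory:
+    # Hypothetical registry of supported source systems.
+    _registry = {"parity": ParityOnboarder}
+
+    @classmethod
+    def create(cls, payload: dict):
+        system = payload.get("system")
+        if system not in cls._registry:
+            raise ValueError(f"Unsupported system: {system!r}")
+        return cls._registry[system](payload)
+
+
+def handler(event: dict, context: object = None) -> list:
+    results = []
+    # SQS delivers a batch of records; each body is a JSON string.
+    for record in event["Records"]:
+        payload = json.loads(record["body"])
+        results.append(OnboarderFactory.create(payload).run())
+    return results
+```
+
+Keeping the system lookup in a registry means that adding a new source system only needs a new onboarder class
+and one extra registry entry.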
+
+### Expected Event Payload
+
+```json
+{
+  "s3_uri": "s3://bucket/path/to/input.xlsx",
+  "system": "parity",
+  "format": "xlsx",
+  "sheet_name": "Sustainability"
+}
+
+```
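+
+One possible way to validate this payload before handing it to an onboarder is sketched below. The
+`OnboardingRequest` and `parse_request` names are hypothetical, and the rules are assumptions: `s3_uri`, `system`
+and `format` are treated as required, `sheet_name` as optional, and only `csv` and `xlsx` are accepted.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+from urllib.parse import urlparse
+
+
+@dataclass(frozen=True)
+class OnboardingRequest:
+    s3_uri: str
+    system: str
+    format: str
+    sheet_name: Optional[str] = None  # assumed to matter only for XLSX input
+
+    @property
+    def bucket(self) -> str:
+        return urlparse(self.s3_uri).netloc
+
+    @property
+    def key(self) -> str:
+        return urlparse(self.s3_uri).path.lstrip("/")
+
+
+def parse_request(payload: dict) -> OnboardingRequest:
+    # Assumed required fields; adjust to match the real contract.
+    missing = [f for f in ("s3_uri", "system", "format") if not payload.get(f)]
+    if missing:
+        raise ValueError(f"Missing required fields: {missing}")
+    if urlparse(payload["s3_uri"]).scheme != "s3":
+        raise ValueError("s3_uri must start with s3://")
+    if payload["format"] not in ("csv", "xlsx"):
+        raise ValueError(f"Unsupported format: {payload['format']!r}")
+    return OnboardingRequest(
+        s3_uri=payload["s3_uri"],
+        system=payload["system"],
+        format=payload["format"],
+        sheet_name=payload.get("sheet_name"),
+    )
+```
+
+For the example payload above, `parse_request(payload).bucket` gives `bucket` and `.key` gives
+`path/to/input.xlsx`.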
+
+### Onboarder Base (`base.py`)
+
+`OnboarderBase` provides shared functionality across all systems.
+
+*Responsibilities*
+
+- Reading CSV/XLSX files from S3
+- Writing transformed CSVs to S3
+- Defining canonical output column names
+- Providing validation helpers
+- Defining the common output format: for the moment, onboarders are expected to return a CSV
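+
+As a rough sketch of how these responsibilities might fit together (not the actual `base.py`): the column names
+in `OUTPUT_COLUMNS`, the `validate` and `to_csv` method names, and the row-as-dict representation are all
+assumptions, and S3 reads and writes are left out so the example stays self-contained.
+
+```python
+import csv
+import io
+from abc import ABC, abstractmethod
+
+
+class OnboarderBase(ABC):
+    # Hypothetical canonical output columns; the real list lives in base.py.
+    OUTPUT_COLUMNS = ["address", "postcode", "property_type", "age_band"]
+
+    @abstractmethod
+    def transform(self, rows: list) -> list:
+        """System-specific mapping from raw rows to canonical rows."""
+
+    def validate(self, rows: list) -> None:
+        # Validation helper: every row must contain every canonical column.
+        for i, row in enumerate(rows):
+            missing = [c for c in self.OUTPUT_COLUMNS if c not in row]
+            if missing:
+                raise ValueError(f"Row {i} is missing columns: {missing}")
+
+    def to_csv(self, rows: list) -> str:
+        self.validate(rows)
+        buffer = io.StringIO()
+        # extrasaction="ignore" drops any non-canonical keys from the output.
+        writer = csv.DictWriter(
+            buffer, fieldnames=self.OUTPUT_COLUMNS, extrasaction="ignore"
+        )
+        writer.writeheader()
+        writer.writerows(rows)
+        return buffer.getvalue()
+```
+
+Fixing the column order in one place means every onboarder emits the same header, whatever the source system.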
+
+### Parity Onboarder (`parity.py`)
+
+`ParityOnboarder` contains all Parity-specific transformation logic.
+
+*Responsibilities*
+
+- Map raw Parity fields to internal EPC-aligned enums
+- Infer “as-built” constructions using age bands when insulation data is missing
+- Resolve energy efficiency ratings deterministically
+- Normalise output into a fixed schema
+
+The `transform()` method orchestrates the transformation process.
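+
+To illustrate the as-built inference step, here is a minimal sketch of a fallback from a supplied description to
+an age-band default. The `resolve_wall` function, the age-band labels, and the inferred constructions are
+placeholders invented for the example; the real rules live in the `as_built_*_classifiers.py` modules.
+
+```python
+from typing import Optional
+
+# Placeholder lookup: labels and outcomes are invented for illustration only.
+AS_BUILT_WALL_BY_AGE_BAND = {
+    "pre-1930": "solid brick, as built, no insulation (assumed)",
+    "1930-1982": "cavity wall, as built, no insulation (assumed)",
+    "post-1982": "cavity wall, as built, insulated (assumed)",
+}
+
+
+def resolve_wall(insulation: Optional[str], age_band: str) -> str:
+    # Prefer the supplied data when the source system provides it.
+    if insulation and insulation.strip():
+        return insulation.strip().lower()
+    # Otherwise fall back to the as-built assumption for the age band.
+    try:
+        return AS_BUILT_WALL_BY_AGE_BAND[age_band]
+    except KeyError:
+        raise ValueError(f"Unknown age band: {age_band!r}") from None
+```
+
+Raising on an unknown age band, rather than returning a default, keeps unmapped source values visible instead
+of silently producing an assumed construction.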
+
+### TODOs
+
+- In `backend/onboarders/mappings/parity/glazing.py` we currently map the Parity descriptions
+  to tuples of descriptions and efficiency ratings. This is okay for the moment, but we may consider
+  using a data class, given how error-prone positional tuples are.
+- The same applies to the heating mappings in `backend/onboarders/mappings/parity/heating.py`
+- Implement an AI-enabled version, to replace the standardised asset list
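+
+The data class idea from the glazing and heating TODOs could look something like the sketch below. The
+`MappedAttribute` name, its field names, and the example mapping entry are hypothetical and are not taken from
+the current mappings.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class MappedAttribute:
+    description: str
+    efficiency_rating: str
+
+
+# Tuple style (current approach): the field order must be remembered.
+GLAZING_TUPLES = {"Double glazed": ("double glazing", "Good")}
+
+# Data class style: fields are named, so they cannot be swapped silently.
+GLAZING_MAP = {
+    "Double glazed": MappedAttribute(
+        description="double glazing",
+        efficiency_rating="Good",
+    ),
+}
+
+mapped = GLAZING_MAP["Double glazed"]
+print(mapped.description, mapped.efficiency_rating)
+```
+
+Marking the class `frozen=True` keeps mapping entries immutable, which suits lookup tables shared across
+onboarders.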