added read me with repo overview and todos

2026-06-08 11:17:27 +00:00 · 2026-02-09 11:49:32 +00:00 · 2026-02-09 11:49:32 +00:00 · bff54a9063
commit bff54a9063
parent bf34393ceb
1 changed files with 102 additions and 0 deletions
--- a/backend/onboarders/README.md
+++ b/backend/onboarders/README.md
@@ -0,0 +1,102 @@
 # Retrofit Property Data Onboarding
 This repository contains an ETL pipeline that transforms raw retrofit property data from external source systems
 (currently Parity) into a standardised internal format compatible with both address2uprn and the engine.
 The pipeline is designed to:
 - Run as an AWS Lambda triggered by SQS
 - Read raw CSV/XLSX files from S3
 - Perform rule-based mappings
 - Infer as-built property attributes from the property's age where recorded data is missing
 - Write a processed CSV back to S3, to be consumed by address2uprn
 ### Structure
 SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3
 Each source system implements its own **Onboarder**, while sharing a common base and mapping process.
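The relationship between the factory, the shared base and a system-specific onboarder can be sketched roughly as follows. This is a minimal illustration, not the code in `factory.py` or `base.py`: the registry, the `create` method name and the `transform` signature are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class OnboarderBase(ABC):
    """Shared base: each source system subclasses this."""

    @abstractmethod
    def transform(self, rows: list[dict]) -> list[dict]:
        """Convert raw source rows into the standardised format."""


class ParityOnboarder(OnboarderBase):
    def transform(self, rows: list[dict]) -> list[dict]:
        # Placeholder: the real logic lives in parity.py and mappings/parity/.
        return [dict(row) for row in rows]


class OnboarderFactory:
    # Hypothetical registry keyed by the "system" field of the event payload.
    _registry: dict[str, type[OnboarderBase]] = {"parity": ParityOnboarder}

    @classmethod
    def create(cls, system: str) -> OnboarderBase:
        try:
            return cls._registry[system.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported source system: {system!r}") from None


onboarder = OnboarderFactory.create("parity")
```

Under this layout, supporting a new source system would mean adding one subclass and one registry entry, leaving the handler untouched.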
 ---
 ### Repository Structure
 onboarders/
 ├── `handler.py` # Lambda entrypoint \
 ├── `factory.py` # Onboarder factory \
 ├── `base.py` # Shared onboarding base class \
 ├── `parity.py` # Parity-specific transformation logic \
 ├── `mappings/` \
 │ └── `parity/` # Parity domain mappings & classifiers \
 │ ├── `age_band.py` \
 │ ├── `property_type.py` \
 │ ├── `built_form.py` \
 │ ├── `walls.py` \
 │ ├── `roof.py` \
 │ ├── `floor.py` \
 │ ├── `glazing.py` \
 │ ├── `heating.py` \
 │ ├── `as_built_wall_classifiers.py` \
 │ ├── `as_built_roof_classifiers.py` \
 │ └── `as_built_floor_classifiers.py` \
 ├── `tests/` \
 ├── `requirements.txt` \
 └── `README.md`
 ---
 ### Lambda Entry Point (`handler.py`)
 The Lambda handler:
 1. Consumes messages from the SQS queue
 2. Validates the payload
 3. Instantiates the correct onboarder via `OnboarderFactory`
 4. Runs the transformation
 5. Writes the transformed CSV back to S3
 ### Expected Event Payload
 ```json
 {
  "s3_uri": "s3://bucket/path/to/input.xlsx",
  "system": "parity",
  "format": "xlsx",
  "sheet_name": "Sustainability"
 }
 ```
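As a rough sketch of how such a payload might be parsed and validated, the snippet below assumes the JSON above arrives as the `body` of each SQS record (the standard shape of SQS-triggered Lambda events). The `OnboardingRequest` class, the helper names and the specific validation rules are hypothetical and not taken from `handler.py`.

```python
import json
from dataclasses import dataclass
from typing import Optional

SUPPORTED_FORMATS = {"csv", "xlsx"}  # matches the CSV/XLSX inputs described above


@dataclass(frozen=True)
class OnboardingRequest:
    s3_uri: str
    system: str
    format: str
    sheet_name: Optional[str] = None  # assumed to be optional, used for XLSX input


def parse_request(body: str) -> OnboardingRequest:
    """Parse one SQS message body and check the required fields."""
    payload = json.loads(body)
    missing = [key for key in ("s3_uri", "system", "format") if not payload.get(key)]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    if not payload["s3_uri"].startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    fmt = payload["format"].lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    return OnboardingRequest(
        s3_uri=payload["s3_uri"],
        system=payload["system"].lower(),
        format=fmt,
        sheet_name=payload.get("sheet_name"),
    )


def requests_from_event(event: dict) -> list[OnboardingRequest]:
    """Extract one request per SQS record in the Lambda event."""
    return [parse_request(record["body"]) for record in event.get("Records", [])]
```

Rejecting malformed messages before any S3 access keeps failures cheap and makes them easy to route to a dead-letter queue, if one is configured.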
 ### Onboarder Base (`base.py`)
 `OnboarderBase` provides shared functionality across all systems.
 *Responsibilities*
 - Reading CSV/XLSX files from S3
 - Writing transformed CSVs to S3
 - Defining canonical output column names
 - Providing validation helpers
 - Defining a common output: for the moment, onboarders are expected to return a CSV
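To illustrate the canonical-columns and validation responsibilities, here is a minimal sketch using only the standard library. The column names and both helper functions are hypothetical; the real column list and helpers are defined in `base.py`, and S3 I/O is deliberately left out.

```python
import csv
import io

# Hypothetical canonical columns; the real list is defined in base.py.
OUTPUT_COLUMNS = ["address", "postcode", "property_type", "wall_construction"]


def validate_columns(rows, required):
    """Raise if any row lacks one of the required columns."""
    for index, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"Row {index} is missing columns: {missing}")


def to_csv(rows):
    """Serialise rows in canonical column order, ignoring unknown keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=OUTPUT_COLUMNS,
        extrasaction="ignore",  # drop keys outside the canonical schema
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the column order in a single constant means every onboarder emits the same header, whatever the source system.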
 ### Parity Onboarder (`parity.py`)
 `ParityOnboarder` contains all Parity-specific transformation logic.
 *Responsibilities*
 - Map raw Parity fields to internal EPC-aligned enums
 - Infer “as-built” constructions using age bands when insulation data is missing
 - Resolve energy efficiency ratings deterministically
 - Normalise output into a fixed schema
 The `transform()` method orchestrates the transformation process.
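The as-built inference step can be pictured as a fallback lookup: use the recorded construction when it exists, otherwise derive one from the age band. The sketch below is illustrative only. The age-band labels, the wall descriptions and the `resolve_wall` helper are placeholders invented for the example and do not reflect the actual rules in `as_built_wall_classifiers.py`.

```python
from typing import Optional

# Illustrative placeholders only: NOT the real rules used by the pipeline.
AS_BUILT_WALL_BY_AGE_BAND = {
    "band_a": "placeholder wall type A (as built)",
    "band_b": "placeholder wall type B (as built)",
}


def resolve_wall(recorded: Optional[str], age_band: str) -> str:
    """Prefer the recorded description; otherwise infer it from the age band."""
    if recorded and recorded.strip():
        return recorded.strip()
    try:
        return AS_BUILT_WALL_BY_AGE_BAND[age_band]
    except KeyError:
        # Fail loudly rather than silently guessing a construction.
        raise ValueError(f"No as-built rule for age band {age_band!r}") from None
```

Because the lookup is a plain dictionary with no randomness or external calls, the same input always yields the same output, which is what makes the mapping deterministic.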
 ### TODOs
 - In `backend/onboarders/mappings/parity/glazing.py` we currently map the Parity descriptions
  to tuples of descriptions and efficiency ratings. This is okay for the moment, but we may consider
  using a data class, given how error-prone positional tuples are.
 - The same applies to the heating mappings in `backend/onboarders/mappings/parity/heating.py`
 - Implement an AI-enabled version, to replace the standardised asset list
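For the first two TODOs, a data class version of the glazing mapping might look something like the sketch below. The class name, field names and both mapping entries are hypothetical examples, not values copied from `glazing.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlazingMapping:
    """Named fields replace the positional (description, rating) tuple."""

    description: str
    energy_efficiency: str


# Hypothetical entries for illustration; the real keys and values live in glazing.py.
GLAZING_MAPPINGS = {
    "example parity value 1": GlazingMapping(
        description="example description 1",
        energy_efficiency="example rating 1",
    ),
    "example parity value 2": GlazingMapping(
        description="example description 2",
        energy_efficiency="example rating 2",
    ),
}

mapping = GLAZING_MAPPINGS["example parity value 1"]
print(mapping.description, mapping.energy_efficiency)
```

With named fields, swapping the description and the rating by accident becomes a visible mistake at the call site, and `frozen=True` prevents mappings from being mutated at runtime.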