mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
added read me with repo overview and todos
This commit is contained in:
parent
bf34393ceb
commit
bff54a9063
1 changed files with 102 additions and 0 deletions
102
backend/onboarders/README.md
Normal file
@ -0,0 +1,102 @@
# Retrofit Property Data Onboarding

This repository contains an ETL pipeline that transforms raw retrofit property data from external source systems
(currently Parity) into a standardised internal format compatible with both address2uprn and engine.

The pipeline is designed to:

- Run as an AWS Lambda triggered by SQS
- Read raw CSV/XLSX files from S3
- Perform rule-based mappings
- Infer as-built property attributes, assumed from the property's age
- Write a processed CSV back to S3, to be consumed by address2uprn

### Structure

SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3

Each source system implements its own **Onboarder**, while sharing a common base and mapping process.

---

### Repository Structure

onboarders/
├── `handler.py` # Lambda entrypoint \
├── `factory.py` # Onboarder factory \
├── `base.py` # Shared onboarding base class \
├── `parity.py` # Parity-specific transformation logic \
├── `mappings/` \
│ └── `parity/` # Parity domain mappings & classifiers \
│ ├── `age_band.py` \
│ ├── `property_type.py` \
│ ├── `built_form.py` \
│ ├── `walls.py` \
│ ├── `roof.py` \
│ ├── `floor.py` \
│ ├── `glazing.py` \
│ ├── `heating.py` \
│ ├── `as_built_wall_classifiers.py` \
│ ├── `as_built_roof_classifiers.py` \
│ └── `as_built_floor_classifiers.py` \
├── `tests/` \
├── `requirements.txt` \
└── `README.md`

---

### Lambda Entry Point (`handler.py`)

The Lambda handler:

1. Consumes messages from the SQS queue
2. Validates the payload
3. Instantiates the correct onboarder via `OnboarderFactory`
4. Runs the transformation
5. Writes the transformed CSV back to S3

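
Step 3 is a plain factory lookup. The sketch below shows one way it could work; it is not the code in `factory.py`. The `create` method, the `_registry` dictionary and the placeholder `transform` body are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class OnboarderBase(ABC):
    """Hypothetical stand-in for the shared base class in base.py."""

    @abstractmethod
    def transform(self, rows: list) -> list:
        """Convert raw source rows into the standardised format."""


class ParityOnboarder(OnboarderBase):
    def transform(self, rows: list) -> list:
        # Placeholder only: the real Parity mappings live in parity.py.
        return [{**row, "source_system": "parity"} for row in rows]


class OnboarderFactory:
    # Assumed registry of supported systems; one entry per source system.
    _registry = {"parity": ParityOnboarder}

    @classmethod
    def create(cls, system: str) -> OnboarderBase:
        key = system.strip().lower()
        if key not in cls._registry:
            raise ValueError(f"Unsupported source system: {system!r}")
        return cls._registry[key]()
```

With this shape, supporting a new source system means adding one subclass and one registry entry, and the handler itself stays unchanged.
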
### Expected Event Payload

```json
{
  "s3_uri": "s3://bucket/path/to/input.xlsx",
  "system": "parity",
  "format": "xlsx",
  "sheet_name": "Sustainability"
}
```

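
As a rough illustration of payload validation, the sketch below unwraps an SQS-triggered Lambda event (a `Records` list whose entries carry the message text in `body`) and checks each payload. The function names are hypothetical, and the rules are assumptions: that `s3_uri`, `system` and `format` are mandatory, and that `sheet_name` is only required for `xlsx` input.

```python
import json

REQUIRED_KEYS = ("s3_uri", "system", "format")  # assumed to be mandatory
SUPPORTED_FORMATS = ("csv", "xlsx")


def validate_payload(payload: dict) -> dict:
    """Raise ValueError if the payload cannot be processed; otherwise return it."""
    missing = [key for key in REQUIRED_KEYS if not payload.get(key)]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    if not payload["s3_uri"].startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {payload['s3_uri']!r}")
    file_format = payload["format"].lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {payload['format']!r}")
    # Assumption: a sheet name is only needed for Excel workbooks.
    if file_format == "xlsx" and not payload.get("sheet_name"):
        raise ValueError("sheet_name is required for xlsx input")
    return payload


def payloads_from_sqs_event(event: dict) -> list:
    """Decode and validate the JSON body of every SQS record in the event."""
    return [
        validate_payload(json.loads(record["body"]))
        for record in event.get("Records", [])
    ]
```
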
### Onboarder Base (`base.py`)

`OnboarderBase` provides functionality shared by all source systems.

*Responsibilities*

- Reading CSV/XLSX files from S3
- Writing transformed CSVs to S3
- Defining canonical output column names
- Providing validation helpers
- Defining a common output: for the moment, every onboarder is expected to return a CSV

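
Two of these responsibilities can be sketched with the standard library alone: splitting an `s3://` URI into bucket and key, and serialising rows against a fixed column list. The helper names and the `OUTPUT_COLUMNS` values are hypothetical; the real canonical columns are defined in `base.py`, and the actual S3 reads and writes are omitted here.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical column names, for illustration only.
OUTPUT_COLUMNS = ["address", "postcode", "property_type", "wall_type"]


def split_s3_uri(uri: str) -> tuple:
    """Return (bucket, key) for a URI such as s3://bucket/path/to/file.csv."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Invalid S3 URI: {uri!r}")
    return parsed.netloc, key


def rows_to_csv(rows: list) -> str:
    """Serialise rows using the canonical columns, in a fixed order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=OUTPUT_COLUMNS,
        extrasaction="ignore",  # drop source-only columns
        restval="",             # leave missing canonical columns empty
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fixing the column list in one place keeps the header and column order identical across source systems, whatever each onboarder happens to produce internally.
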
### Parity Onboarder (`parity.py`)

`ParityOnboarder` contains all Parity-specific transformation logic.

*Responsibilities*

- Map raw Parity fields to internal EPC-aligned enums
- Infer “as-built” constructions using age bands when insulation data is missing
- Resolve energy efficiency ratings deterministically
- Normalise output into a fixed schema

The `transform()` method orchestrates the transformation process.

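
The as-built inference amounts to a fallback lookup keyed on age band. In the minimal sketch below, the age bands, the descriptions and the function name are invented for illustration; the actual rules live in the `as_built_*_classifiers.py` modules.

```python
from typing import Optional

# Illustrative values only: these bands and descriptions are NOT the real rules.
AS_BUILT_WALL_BY_AGE_BAND = {
    "pre-1930": "Solid brick, as built, no insulation (assumed)",
    "1930-1982": "Cavity wall, as built, no insulation (assumed)",
    "1983-onwards": "Cavity wall, as built, insulated (assumed)",
}


def resolve_wall_description(raw_wall: Optional[str], age_band: str) -> str:
    """Prefer the recorded wall data; fall back to an age-based assumption."""
    if raw_wall and raw_wall.strip():
        return raw_wall.strip()
    if age_band not in AS_BUILT_WALL_BY_AGE_BAND:
        raise ValueError(f"No as-built assumption for age band {age_band!r}")
    return AS_BUILT_WALL_BY_AGE_BAND[age_band]
```

Because the fallback is a pure lookup, the same input always resolves to the same output, which keeps the inference deterministic and easy to unit test.
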
### TODOs

- In `backend/onboarders/mappings/parity/glazing.py` we currently map the Parity descriptions
  to tuples of descriptions and efficiency ratings. This is okay for the moment, but we may consider
  using a data class, given how error-prone this is.
- The same applies to the heating mappings in `backend/onboarders/mappings/parity/heating.py`
- Implement an AI-enabled version to replace the standardised asset list

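
For the first two TODOs, a frozen dataclass could replace the bare tuples. This is only a sketch of the idea: the class name, the field names, the set of allowed ratings and the sample mapping entry are all assumptions, not the contents of `glazing.py`.

```python
from dataclasses import dataclass

# Assumed set of allowed ratings; adjust to whatever the engine expects.
VALID_RATINGS = frozenset({"Very Good", "Good", "Average", "Poor", "Very Poor"})


@dataclass(frozen=True)
class MappedFeature:
    """A named replacement for a (description, efficiency_rating) tuple."""

    description: str
    efficiency_rating: str

    def __post_init__(self) -> None:
        # Fail when the mapping is defined, not when a row is processed.
        if self.efficiency_rating not in VALID_RATINGS:
            raise ValueError(f"Unknown efficiency rating: {self.efficiency_rating!r}")


# Hypothetical mapping entry; the real keys and values live in glazing.py.
GLAZING_MAPPINGS = {
    "double glazed": MappedFeature(
        description="Fully double glazed",
        efficiency_rating="Average",
    ),
}
```

Named fields remove the risk of swapping the two tuple positions, and the `__post_init__` check catches a mistyped rating as soon as the mapping module is imported.
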