Model/backend/onboarders/README.md

# Retrofit Property Data Onboarding

This repository contains an ETL pipeline that transforms raw retrofit property data from external source systems
(currently Parity) into a standardised internal format compatible with both address2uprn and engine.

The pipeline is designed to:

- Run as an AWS Lambda triggered by SQS
- Read raw CSV/XLSX files from S3
- Perform rule-based mappings
- Infer as-built property attributes from the property's age when source data is missing
- Write the processed CSV back to S3, to be consumed by address2uprn

### Structure

SQS → Lambda handler → OnboarderFactory → System-specific Onboarder → Mapping → CSV to S3

Each source system implements its own **Onboarder**, while sharing a common base and mapping process.
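
As a rough illustration of the factory step above, the sketch below shows one way a registry-based factory could pick the system-specific onboarder. The `create` method, the `_registry` attribute and the `transform(rows)` signature are assumptions made for this example and may not match `factory.py` or `base.py`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class OnboarderBase(ABC):
    """Hypothetical shared base: each source system supplies its own transform()."""

    @abstractmethod
    def transform(self, rows: List[dict]) -> List[dict]:
        ...


class ParityOnboarder(OnboarderBase):
    def transform(self, rows: List[dict]) -> List[dict]:
        # Placeholder: the real class applies the Parity mappings here.
        return rows


class OnboarderFactory:
    # Assumed structure: source system name -> onboarder class.
    _registry: Dict[str, Type[OnboarderBase]] = {"parity": ParityOnboarder}

    @classmethod
    def create(cls, system: str) -> OnboarderBase:
        try:
            return cls._registry[system.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported source system: {system!r}") from None
```

With a registry like this, supporting a new source system would mean adding one class and one registry entry, without touching the handler.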

---

### Repository Structure

onboarders/
├── `handler.py` # Lambda entrypoint \
├── `factory.py` # Onboarder factory \
├── `base.py` # Shared onboarding base class \
├── `parity.py` # Parity-specific transformation logic \
├── `mappings/` \
│ └── `parity/` # Parity domain mappings & classifiers \
│ ├── `age_band.py` \
│ ├── `property_type.py` \
│ ├── `built_form.py` \
│ ├── `walls.py` \
│ ├── `roof.py` \
│ ├── `floor.py` \
│ ├── `glazing.py` \
│ ├── `heating.py` \
│ ├── `as_built_wall_classifiers.py` \
│ ├── `as_built_roof_classifiers.py` \
│ └── `as_built_floor_classifiers.py` \
├── `tests/` \
├── `requirements.txt` \
└── `README.md`


---

### Lambda Entry Point (`handler.py`)

The Lambda handler:

1. Consumes messages from the SQS queue
2. Validates the payload
3. Instantiates the correct onboarder via `OnboarderFactory`
4. Runs the transformation
5. Writes the transformed CSV back to S3
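
A minimal sketch of step 1, assuming the standard SQS-to-Lambda event shape in which each entry in `Records` carries the message as a JSON string under `body`. The function names `extract_payloads` and `handler` are illustrative and may differ from those in `handler.py`.

```python
import json
from typing import Any, Dict, List


def extract_payloads(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the decoded JSON body of every SQS record in the Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # SQS delivers each message as a JSON string in the "body" field.
        payloads.append(json.loads(record["body"]))
    return payloads


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    payloads = extract_payloads(event)
    for payload in payloads:
        # Steps 2-5 (validate, build the onboarder, transform, write to S3)
        # would run here for each payload; omitted in this sketch.
        pass
    return {"processed": len(payloads)}
```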

### Expected Event Payload

```json
{
  "s3_uri": "s3://bucket/path/to/input.xlsx",
  "system": "parity",
  "format": "xlsx",
  "sheet_name": "Sustainability"
}

```
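
Payload validation for the event above could look roughly like the following. Which keys are mandatory, the set of supported formats, and the idea that `sheet_name` is optional are all assumptions for this sketch; `OnboardingRequest` and `validate_payload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_KEYS = {"s3_uri", "system", "format"}  # assumed to be mandatory
SUPPORTED_FORMATS = {"csv", "xlsx"}


@dataclass(frozen=True)
class OnboardingRequest:
    s3_uri: str
    system: str
    format: str
    sheet_name: Optional[str] = None  # assumed to matter only for XLSX input


def validate_payload(payload: dict) -> OnboardingRequest:
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"Missing required keys: {sorted(missing)}")
    if not payload["s3_uri"].startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    file_format = payload["format"].lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {payload['format']!r}")
    return OnboardingRequest(
        s3_uri=payload["s3_uri"],
        system=payload["system"].lower(),
        format=file_format,
        sheet_name=payload.get("sheet_name"),
    )
```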

### Onboarder Base (`base.py`)

`OnboarderBase` provides shared functionality across all systems.

*Responsibilities*

- Reading CSV/XLSX files from S3
- Writing transformed CSVs to S3
- Defining canonical output column names
- Providing validation helpers
- Defining a common output format: for the moment, every onboarder is expected to return a CSV
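
Two of these responsibilities can be sketched with the standard library alone: splitting an S3 URI into bucket and key, and serialising rows into a CSV with a fixed column order. The column names below are placeholders rather than the real canonical list, and `split_s3_uri` and `rows_to_csv` are hypothetical helper names. The actual S3 read and write calls are left out.

```python
import csv
import io
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Placeholder columns only; the canonical list is defined in base.py.
OUTPUT_COLUMNS = ["address", "postcode", "property_type"]


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, key


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise rows using the fixed column order, ignoring unknown keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=OUTPUT_COLUMNS,
        extrasaction="ignore",  # drop keys that are not canonical columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty strings
    return buffer.getvalue()
```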

### Parity Onboarder (`parity.py`)

`ParityOnboarder` contains all Parity-specific transformation logic.

*Responsibilities*

- Map raw Parity fields to internal EPC-aligned enums
- Infer “as-built” constructions using age bands when insulation data is missing
- Resolve energy efficiency ratings deterministically
- Normalise output into a fixed schema

The `transform()` method orchestrates the transformation process.
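
The as-built fallback can be pictured as a lookup keyed on age band that is used only when the source description is empty. The age-band labels and construction descriptions below are invented for illustration; the real rules live in the `as_built_*_classifiers.py` modules and may be structured differently.

```python
from typing import Optional

# Illustrative values only, not the project's actual classification rules.
AS_BUILT_WALLS = {
    "1900-1929": "solid wall, as built (assumed)",
    "1950-1966": "cavity wall, as built (assumed)",
}


def resolve_wall(raw_description: Optional[str], age_band: str) -> str:
    """Prefer the source description; otherwise fall back to the age-band default."""
    if raw_description and raw_description.strip():
        return raw_description.strip()
    # No usable source data: infer the construction from the age band.
    return AS_BUILT_WALLS.get(age_band, "unknown")
```

Because the fallback is a plain dictionary lookup, the same input always resolves to the same output.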

### TODOs

- In `backend/onboarders/mappings/parity/glazing.py` we currently map the Parity descriptions
  to 2-tuples of (description, efficiency rating). This is okay for the moment, but bare tuples
  are error-prone, so we may consider replacing them with a data class.
- This is also true for heating mappings in `backend/onboarders/mappings/parity/heating.py`
- Implement an AI-enabled version to replace the standardised asset list
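
For the data-class idea in the glazing and heating TODOs, one possible shape is a small frozen dataclass used as the mapping value. `MappedFeature`, the example keys, and the example ratings are hypothetical and do not reflect the current contents of `glazing.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedFeature:
    """Named replacement for a bare (description, efficiency) tuple."""
    description: str
    efficiency: str


# Hypothetical entries; the real keys are Parity's glazing descriptions.
GLAZING_MAP = {
    "Double": MappedFeature(description="Fully double glazed", efficiency="Average"),
    "Single": MappedFeature(description="Single glazed", efficiency="Very Poor"),
}

mapped = GLAZING_MAP["Double"]
# Fields are read by name, so description and rating cannot be swapped by accident.
print(mapped.description, "|", mapped.efficiency)
```

Marking the class as `frozen=True` keeps the mapping values immutable, as tuples are today.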