mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
# Condition Data Processor

The Condition Data Processor performs the following steps:

- **Extract**
  - Ingest client Condition Survey data files (currently from local files; future support planned for S3 and internal survey sources)
  - Parse input files into Data Transfer Objects (DTOs)
- **Transform**
  - Map source data into the internal domain data model
- **Load**
  - Persist transformed data into the ARA database (not yet implemented)

The processor currently supports file formats provided by **Peabody** and **LBWF**.
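As a rough illustration of how the three stages fit together, the Python sketch below chains them as plain functions. Everything in it is hypothetical: the `SurveyRowDTO` and `ConditionRecord` classes, the CSV column names (`UPRN`, `Element`, `Sub_Element`, `Value`) and the normalisation rule are assumptions, not the processor's actual DTOs or domain model. `load` is a stub because persistence to the ARA database is not yet implemented. `run(lines)` accepts any iterable of CSV lines, such as an open file.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SurveyRowDTO:
    """Hypothetical DTO for one parsed row of a Condition Survey file."""
    uprn: str
    element: str
    sub_element: str
    value: str


@dataclass(frozen=True)
class ConditionRecord:
    """Hypothetical record in the internal domain model."""
    uprn: str
    element_type: str
    aspect_type: str
    value: str


def extract(lines: Iterable[str]) -> list[SurveyRowDTO]:
    """Extract: parse CSV lines (assumed column names) into DTOs."""
    return [
        SurveyRowDTO(row["UPRN"], row["Element"], row["Sub_Element"], row["Value"])
        for row in csv.DictReader(lines)
    ]


def transform(rows: Iterable[SurveyRowDTO]) -> list[ConditionRecord]:
    """Transform: map DTOs onto the domain model (illustrative normalisation only)."""
    return [
        ConditionRecord(r.uprn.strip(), r.element.strip().title(), r.sub_element.strip(), r.value.strip())
        for r in rows
    ]


def load(records: list[ConditionRecord]) -> list[ConditionRecord]:
    """Load: persistence to the ARA database is not yet implemented, so this stub returns its input."""
    return records


def run(lines: Iterable[str]) -> list[ConditionRecord]:
    """Chain the three stages: extract -> transform -> load."""
    return load(transform(extract(lines)))
```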
---

## Running Locally

The `local_runner.py` script runs the processor in a local environment.

1. Copy the sample input file(s) into the `sample_data/` directory. If you are working with Peabody data, also copy the Landlord Reference / UPRN lookup file.
2. Update `local_runner.py` as required, specifically the definitions of:
   - `lbwf_path`
   - `peabody_path`
   - `file_paths`
3. Run `local_runner.py`.
   Breakpoints may be added and the script run in debug mode if required.
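A minimal sketch of the definitions from step 2, assuming the paths are `pathlib.Path` objects relative to the working directory and that `file_paths` simply collects the individual inputs. The file names and the `missing_inputs` helper are hypothetical; check `local_runner.py` for how these variables are actually consumed. Calling `missing_inputs(file_paths)` before a run lists any file that has not yet been copied into `sample_data/`.

```python
from pathlib import Path

# Directory that step 1 copies the sample inputs into.
SAMPLE_DIR = Path("sample_data")

# Hypothetical file names: substitute the files you actually copied.
lbwf_path = SAMPLE_DIR / "lbwf_condition_survey.csv"
peabody_path = SAMPLE_DIR / "peabody_condition_survey.csv"

# Assumption: file_paths gathers every input the run should process.
file_paths = [lbwf_path, peabody_path]


def missing_inputs(paths: list[Path]) -> list[Path]:
    """Return the configured input paths that do not point to an existing file."""
    return [path for path in paths if not path.is_file()]
```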
---

## Known Data Issues

Some inconsistencies exist in the source datasets, primarily involving multiple representations of the same logical element within a single file. In these cases, assumptions have been made in order to normalise the data into the internal domain model.

### Peabody Data – Wall Finish Mapping

In the original Peabody sample dataset, multiple Element/Sub-Element combinations correspond to wall finishes:

| Element_Code | Element  | Sub_Element_Code | Sub_Element           |
|--------------|----------|------------------|-----------------------|
| 53           | External | 23               | Primary Wall Finish   |
| 53           | External | 30               | Secondary Wall Finish |
| 120          | WALLS    | 2                | Wall Finish           |

A single property may contain records for all three combinations, and each combination may appear multiple times.

For example, the property at **55 Burnaby Street, London** contains entries for all three of the above combinations. However, it contains only a single entry for *“WALLS: Wall structure”*, indicating that the property has only one wall structure.

This pattern is also observed in other sampled properties. Based on this, the following assumption is applied:

- “Secondary” refers to a secondary **finish**, not a secondary **wall**.

As a result:

- The property is mapped to a single Wall element.
- That Wall element is assigned three Finish aspects:
  - Two with `aspect_instance = 1`
  - One with `aspect_instance = 2`

This means that the combination `UPRN / ElementType / ElementInstance / AspectType / AspectInstance` is **not guaranteed to be unique**: in the example above, the two Finish aspects with `aspect_instance = 1` share the same value for all five fields.
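The Peabody assumption can be sketched in Python as below. It assumes the element and sub-element codes have already been parsed as integers, that the primary and generic wall finishes both map to `aspect_instance = 1` while the secondary finish maps to `aspect_instance = 2`, and it uses a hypothetical `AspectKey` type for the composite key; none of these names come from the processor itself. `duplicate_keys` counts how often each composite key occurs, which makes the non-uniqueness explicit.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed mapping of the Peabody wall-finish combinations in the table above:
# (Element_Code, Sub_Element_Code) -> aspect_instance of the Wall's Finish aspect.
PEABODY_WALL_FINISH_INSTANCE = {
    (53, 23): 1,   # External / Primary Wall Finish
    (120, 2): 1,   # WALLS / Wall Finish
    (53, 30): 2,   # External / Secondary Wall Finish: a second finish, not a second wall
}


@dataclass(frozen=True)
class AspectKey:
    """Hypothetical composite key: UPRN / ElementType / ElementInstance / AspectType / AspectInstance."""
    uprn: str
    element_type: str
    element_instance: int
    aspect_type: str
    aspect_instance: int


def map_wall_finishes(uprn: str, codes: list[tuple[int, int]]) -> list[AspectKey]:
    """Map a property's wall-finish records onto Finish aspects of a single Wall element."""
    return [
        AspectKey(uprn, "Wall", 1, "Finish", PEABODY_WALL_FINISH_INSTANCE[code])
        for code in codes
        if code in PEABODY_WALL_FINISH_INSTANCE  # ignore records that are not wall finishes
    ]


def duplicate_keys(keys: list[AspectKey]) -> dict[AspectKey, int]:
    """Return every composite key that occurs more than once, with its count."""
    return {key: n for key, n in Counter(keys).items() if n > 1}
```

For a property with one record of each combination, `map_wall_finishes` yields three keys and `duplicate_keys` reports the instance-1 Finish key twice, so code that loads these records should not rely on the composite key being unique.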
### LBWF Data – Wall Finish Mapping

In the LBWF dataset, the following element codes map to wall finishes:

- `EXTWALLFN1`
- `EXTWALLFN2`

These are similarly mapped as multiple instances of the **Finish** aspect for a single Wall element.
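A corresponding sketch for LBWF, assuming the numeric suffix of each code gives the Finish `aspect_instance` (so `EXTWALLFN1` maps to 1 and `EXTWALLFN2` to 2) and that every finish attaches to Wall element instance 1. The tuple layout mirrors the `UPRN / ElementType / ElementInstance / AspectType / AspectInstance` combination used for Peabody; the function and constant names are hypothetical.

```python
# Assumption: the numeric suffix of each LBWF code is the Finish aspect_instance.
LBWF_WALL_FINISH_INSTANCE = {"EXTWALLFN1": 1, "EXTWALLFN2": 2}


def map_lbwf_wall_finishes(uprn: str, element_codes: list[str]) -> list[tuple[str, str, int, str, int]]:
    """Return (UPRN, ElementType, ElementInstance, AspectType, AspectInstance) tuples,
    attaching every wall finish to the same Wall element (instance 1)."""
    return [
        (uprn, "Wall", 1, "Finish", LBWF_WALL_FINISH_INSTANCE[code])
        for code in element_codes
        if code in LBWF_WALL_FINISH_INSTANCE  # skip codes that are not wall finishes
    ]
```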
---