Model/backend/condition/README.md
2026-01-30 09:50:33 +00:00

# Condition Data Processor
The Condition Data Processor performs the following steps:

- **Extract**
  - Ingest client Condition Survey data files (currently from local files; future support planned for S3 and internal survey sources)
  - Parse input files into Data Transfer Objects (DTOs)
- **Transform**
  - Map source data into the internal domain data model
- **Load**
  - Persist transformed data into the ARA database (not yet implemented)
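
The Extract and Transform steps above can be pictured with a minimal sketch. It is illustrative only and does not reproduce the processor's actual classes: `SurveyRowDTO`, `DomainAspect`, `CODE_MAP`, `extract`, `transform`, the CSV input format, and the column names are all assumptions. The only value taken from this README is the `120` / `2` (WALLS / Wall Finish) code pair.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyRowDTO:
    """Hypothetical DTO holding one raw survey row."""
    uprn: str
    element_code: str
    sub_element_code: str


@dataclass(frozen=True)
class DomainAspect:
    """Hypothetical domain record produced by the Transform step."""
    uprn: str
    element_type: str
    aspect_type: str


# Assumed lookup from (Element_Code, Sub_Element_Code) to (element type, aspect type).
CODE_MAP = {("120", "2"): ("Wall", "Finish")}


def extract(handle):
    """Extract: parse an open CSV file into DTOs (CSV input is an assumption)."""
    return [
        SurveyRowDTO(row["UPRN"], row["Element_Code"], row["Sub_Element_Code"])
        for row in csv.DictReader(handle)
    ]


def transform(dtos):
    """Transform: map DTOs onto the domain model, skipping unmapped codes."""
    mapped_rows = []
    for dto in dtos:
        mapped = CODE_MAP.get((dto.element_code, dto.sub_element_code))
        if mapped is not None:
            mapped_rows.append(DomainAspect(dto.uprn, *mapped))
    return mapped_rows


# Made-up sample input: one mapped row and one row with an unknown code pair.
sample = io.StringIO("UPRN,Element_Code,Sub_Element_Code\n100,120,2\n100,999,1\n")
aspects = transform(extract(sample))
print(aspects)
```

The Load step is left out because it is not yet implemented.
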

The processor currently supports file formats provided by **Peabody** and **LBWF**.

---
## Running Locally
The `local_runner` script allows the processor to be executed in a local environment.
1. Copy sample input file(s) into the `sample_data/` directory. If working with Peabody data, you'll need the Landlord Reference / UPRN lookup file as well.
2. Update `local_runner.py` as required, specifically the definitions of:
   - `lbwf_path`
   - `peabody_path`
   - `file_paths`
3. Run `local_runner.py`.
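
As a rough illustration of step 2, the three definitions might look something like the sketch below. The file names, the `.csv` extension, the `SAMPLE_DIR` constant, the `missing_inputs` helper, and the idea that `file_paths` is a list of inputs are all assumptions; check `local_runner.py` for the real shapes.

```python
from pathlib import Path

# Hypothetical file names: substitute the files you copied into sample_data/.
SAMPLE_DIR = Path("sample_data")
lbwf_path = SAMPLE_DIR / "lbwf_sample.csv"
peabody_path = SAMPLE_DIR / "peabody_sample.csv"

# Assumption: file_paths lists the inputs the runner should process.
file_paths = [lbwf_path, peabody_path]


def missing_inputs(paths):
    """Return the configured paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Report any configured input that has not been copied into place yet.
    for path in missing_inputs(file_paths):
        print(f"Missing input file: {path}")
```

A check like `missing_inputs` is optional; it only makes a wrong path fail early with a readable message.
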

Breakpoints may be added and the script run in debug mode if required.

---
## Known Data Issues
Some inconsistencies exist in the source datasets, primarily involving multiple representations of the same logical element within a single file. In these cases, assumptions have been made in order to normalise the data into the internal domain model.
### Peabody Data Wall Finish Mapping
In the original Peabody sample dataset, multiple Element/Sub-Element combinations correspond to wall finishes:

| Element_Code | Element  | Sub_Element_Code | Sub_Element           |
|--------------|----------|------------------|-----------------------|
| 53           | External | 23               | Primary Wall Finish   |
| 53           | External | 30               | Secondary Wall Finish |
| 120          | WALLS    | 2                | Wall Finish           |

A single property may contain records for all three combinations, and each combination may appear multiple times.
For example, the property at **55 Burnaby Street, London** contains entries for all three of the above combinations. However, it contains only a single entry for *“WALLS: Wall structure”*, indicating that the property has only one structure rather than multiple.
This pattern is also observed in other sampled properties. Based on this, the following assumption is applied:

- “Secondary” refers to a secondary **finish**, not a secondary **wall**.

As a result:

- The property is mapped to a single Wall element.
- That Wall element is assigned three Finish aspects:
  - Two with `aspect_instance = 1`
  - One with `aspect_instance = 2`

This means that the combination of
`UPRN / ElementType / ElementInstance / AspectType / AspectInstance`
is **not guaranteed to be unique**.
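
The non-uniqueness can be shown with a small sketch. It assumes that "Primary Wall Finish" and "Wall Finish" are the two records given `aspect_instance = 1` and that "Secondary Wall Finish" is the one given `aspect_instance = 2`; the README implies this split but does not state it. `FINISH_INSTANCE`, `to_key`, and the UPRN value are hypothetical.

```python
from collections import Counter

# (Element_Code, Sub_Element_Code) -> aspect_instance for the Wall "Finish" aspect.
# Assumption: the two non-secondary finishes share instance 1.
FINISH_INSTANCE = {
    (53, 23): 1,  # External / Primary Wall Finish
    (53, 30): 2,  # External / Secondary Wall Finish
    (120, 2): 1,  # WALLS / Wall Finish
}


def to_key(uprn, element_code, sub_element_code):
    """Build the UPRN / ElementType / ElementInstance / AspectType / AspectInstance key."""
    instance = FINISH_INSTANCE[(element_code, sub_element_code)]
    # Every record maps to the same single Wall element (element instance 1).
    return (uprn, "Wall", 1, "Finish", instance)


# Hypothetical UPRN with one source record per combination in the table.
records = [
    ("000000000001", 53, 23),
    ("000000000001", 53, 30),
    ("000000000001", 120, 2),
]

counts = Counter(to_key(*record) for record in records)
duplicates = {key: n for key, n in counts.items() if n > 1}
print(duplicates)
```

Under these assumptions the key ending in `"Finish", 1` occurs twice, so any downstream code that needs a unique identifier has to carry an additional discriminator or tolerate duplicates.
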
### LBWF Data Wall Finish Mapping
In the LBWF dataset, the following element codes map to wall finishes:
- `EXTWALLFN1`
- `EXTWALLFN2`

These are similarly mapped as multiple instances of the **Finish** aspect for a single Wall element.

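
A minimal sketch of this mapping follows. It assumes the trailing digit of each code is used as the aspect instance, which the README does not confirm; `LBWF_WALL_FINISH_CODES` and `finish_aspect_instance` are hypothetical names.

```python
# Element codes named in this README.
LBWF_WALL_FINISH_CODES = ("EXTWALLFN1", "EXTWALLFN2")


def finish_aspect_instance(element_code):
    """Return the Finish aspect instance for an LBWF wall-finish code.

    Assumption: the trailing digit of the code is the aspect instance.
    """
    if element_code not in LBWF_WALL_FINISH_CODES:
        raise ValueError(f"Not a wall-finish code: {element_code}")
    return int(element_code[-1])


# Both codes attach to the same single Wall element (element instance 1).
aspects = [
    ("Wall", 1, "Finish", finish_aspect_instance(code))
    for code in LBWF_WALL_FINISH_CODES
]
print(aspects)
```
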
---