add readme

Daniel Roth 2026-01-27 16:57:59 +00:00
parent 0d9ee79c40
commit 60241f947e

@ -0,0 +1,75 @@
# Condition Data Processor
The Condition Data Processor performs the following steps:
- **Extract**
  - Ingest client Condition Survey data files (currently from local files; future support planned for S3 and internal survey sources)
  - Parse input files into Data Transfer Objects (DTOs)
- **Transform**
  - Map source data into the internal domain data model
- **Load**
  - Persist transformed data into the ARA database (not yet implemented)

The processor currently supports file formats provided by **Peabody** and **LBWF**.
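
The Extract and Transform steps above can be sketched roughly as follows. This is an illustrative sketch only, not the processor's actual code: the class names `SurveyRecordDTO` and `DomainAspect`, the function names, and the assumption that the input is CSV are all hypothetical. The column names are taken from the Peabody table shown later in this README. The `load` step is left unimplemented, matching its current status.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, TextIO


@dataclass(frozen=True)
class SurveyRecordDTO:
    """Hypothetical DTO: one raw row from a client survey file."""
    element_code: str
    element: str
    sub_element_code: str
    sub_element: str


@dataclass(frozen=True)
class DomainAspect:
    """Hypothetical stand-in for an object in the internal domain model."""
    element_type: str
    aspect_type: str


def extract(source: TextIO) -> Iterator[SurveyRecordDTO]:
    """Parse CSV rows into DTOs (CSV input is an assumption)."""
    for row in csv.DictReader(source):
        yield SurveyRecordDTO(
            element_code=row["Element_Code"].strip(),
            element=row["Element"].strip(),
            sub_element_code=row["Sub_Element_Code"].strip(),
            sub_element=row["Sub_Element"].strip(),
        )


def transform(records: Iterable[SurveyRecordDTO]) -> list[DomainAspect]:
    """Map DTOs to domain objects; the real mapping rules are client-specific."""
    return [
        DomainAspect(element_type=r.element.title(), aspect_type=r.sub_element)
        for r in records
    ]


def load(aspects: Iterable[DomainAspect]) -> None:
    """Placeholder: persistence to the ARA database is not yet implemented."""
    raise NotImplementedError("Load step is not yet implemented")
```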
---
## Running Locally
The `local_runner` script allows the processor to be executed in a local environment.
1. Copy a sample input file into the `sample_data/` directory.
2. Update `local_runner.py` as required, specifically the definitions of:
   - `lbwf_path`
   - `peabody_path`
   - `file_paths`
3. Run `local_runner.py`.

Breakpoints may be added and the script run in debug mode if required.
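
As a rough illustration of step 2, the three definitions might look something like the sketch below. The variable names come from this README, but everything else is an assumption: the file names are placeholders, `file_paths` may not be a plain list in the real script, and the `missing_inputs` helper is hypothetical and only shows one way to check inputs before running.

```python
from pathlib import Path

# Directory named in step 1 of this README.
SAMPLE_DATA_DIR = Path("sample_data")

# Placeholder file names (assumptions): replace with the sample files you copied in.
lbwf_path = SAMPLE_DATA_DIR / "lbwf_sample.csv"
peabody_path = SAMPLE_DATA_DIR / "peabody_sample.csv"

# Assumed shape: the list of inputs the runner should process.
file_paths = [lbwf_path, peabody_path]


def missing_inputs(paths):
    """Hypothetical helper: return the configured paths that do not exist on disk."""
    return [p for p in paths if not p.is_file()]
```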
---
## Known Data Issues
Some inconsistencies exist in the source datasets, primarily involving multiple representations of the same logical element within a single file. In these cases, assumptions have been made in order to normalise the data into the internal domain model.
### Peabody Data Wall Finish Mapping
In the original Peabody sample dataset, multiple Element/Sub-Element combinations correspond to wall finishes:
| Element_Code | Element | Sub_Element_Code | Sub_Element |
|--------------|----------|------------------|-----------------------|
| 53 | External | 23 | Primary Wall Finish |
| 53 | External | 30 | Secondary Wall Finish |
| 120 | WALLS | 2 | Wall Finish |
A single property may contain records for all three combinations, and each combination may appear multiple times.
For example, the property at **55 Burnaby Street, London** contains entries for all three of the above combinations but only a single entry for *“WALLS: Wall structure”*, which indicates that the property has one wall structure rather than several.
This pattern is also observed in other sampled properties. Based on this, the following assumption is applied:
- “Secondary” refers to a secondary **finish**, not a secondary **wall**.
As a result:
- The property is mapped to a single Wall element.
- That Wall element is assigned three Finish aspects:
  - Two with `aspect_instance = 1`
  - One with `aspect_instance = 2`

Because two of these Finish aspects share `aspect_instance = 1`, the combination of
`UPRN / ElementType / ElementInstance / AspectType / AspectInstance`
is **not guaranteed to be unique**.
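
The non-uniqueness described above can be reproduced with a small sketch. It is illustrative only: the `FinishAspect` class, the function names, and the placeholder UPRN are hypothetical. The README states only that two aspects receive `aspect_instance = 1` and one receives `2`; assigning Primary Wall Finish and Wall Finish to instance 1 and Secondary Wall Finish to instance 2 is an assumption consistent with those counts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FinishAspect:
    """Hypothetical record carrying the five key fields named above."""
    uprn: str
    element_type: str
    element_instance: int
    aspect_type: str
    aspect_instance: int


# (Element_Code, Sub_Element_Code) -> aspect_instance.
# Which rows map to instance 1 is an assumption (see lead-in).
PEABODY_WALL_FINISH_INSTANCE = {
    (53, 23): 1,   # External / Primary Wall Finish
    (53, 30): 2,   # External / Secondary Wall Finish
    (120, 2): 1,   # WALLS / Wall Finish
}


def map_wall_finishes(uprn, code_pairs):
    """Map wall-finish rows onto Finish aspects of a single Wall element."""
    aspects = []
    for pair in code_pairs:
        instance = PEABODY_WALL_FINISH_INSTANCE.get(pair)
        if instance is None:
            continue  # not a wall-finish row
        aspects.append(FinishAspect(uprn, "Wall", 1, "Finish", instance))
    return aspects


def duplicate_keys(aspects):
    """Return each key combination that occurs more than once."""
    counts = Counter(aspects)  # frozen dataclasses are hashable
    return [aspect for aspect, n in counts.items() if n > 1]
```

For a property with one row for each of the three combinations, `map_wall_finishes` yields three aspects and `duplicate_keys` reports the one key with `aspect_instance = 1` that appears twice, so any downstream storage should not treat this combination as a unique key.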
### LBWF Data Wall Finish Mapping
In the LBWF dataset, the following element codes map to wall finishes:
- `EXTWALLFN1`
- `EXTWALLFN2`
These are similarly mapped as multiple instances of the **Finish** aspect for a single Wall element.
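
A comparable sketch for the LBWF codes is shown below. It is an assumption that the trailing digit of each code gives the aspect instance (`EXTWALLFN1` to instance 1, `EXTWALLFN2` to instance 2); the README does not state this. The function name and the tuple layout are hypothetical.

```python
# Element code -> aspect_instance (assumed from the code suffix, not confirmed).
LBWF_WALL_FINISH_INSTANCE = {
    "EXTWALLFN1": 1,
    "EXTWALLFN2": 2,
}


def lbwf_finish_aspects(element_codes):
    """Return (element_type, element_instance, aspect_type, aspect_instance)
    tuples for the wall-finish codes, all attached to a single Wall element."""
    return [
        ("Wall", 1, "Finish", LBWF_WALL_FINISH_INSTANCE[code])
        for code in element_codes
        if code in LBWF_WALL_FINISH_INSTANCE  # skip non-wall-finish codes
    ]
```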
---