diff --git a/backend/condition/README.md b/backend/condition/README.md new file mode 100644 index 00000000..140d4585 --- /dev/null +++ b/backend/condition/README.md @@ -0,0 +1,75 @@ +# Condition Data Processor + +The Condition Data Processor performs the following steps: + +- **Extract** + - Ingest client Condition Survey data files (currently from local files; future support planned for S3 and internal survey sources) + - Parse input files into Data Transfer Objects (DTOs) + +- **Transform** + - Map source data into the internal domain data model + +- **Load** + - Persist transformed data into the ARA database (not yet implemented) + +The processor currently supports file formats provided by **Peabody** and **LBWF**. + +--- + +## Running Locally + +The `local_runner` script allows the processor to be executed in a local environment. + +1. Copy a sample input file into the `sample_data/` directory. +2. Update `local_runner.py` as required, specifically the definitions of: + - `lbwf_path` + - `peabody_path` + - `file_paths` +3. Run `local_runner.py`. + Breakpoints may be added and the script run in debug mode if required. + +--- + +## Known Data Issues + +Some inconsistencies exist in the source datasets, primarily involving multiple representations of the same logical element within a single file. In these cases, assumptions have been made in order to normalise the data into the internal domain model. + +### Peabody Data – Wall Finish Mapping + +In the original Peabody sample dataset, multiple Element/Sub-Element combinations correspond to wall finishes: + +| Element_Code | Element | Sub_Element_Code | Sub_Element | +|--------------|----------|------------------|-----------------------| +| 53 | External | 23 | Primary Wall Finish | +| 53 | External | 30 | Secondary Wall Finish | +| 120 | WALLS | 2 | Wall Finish | + +A single property may contain records for all three combinations, and each combination may appear multiple times. + +For example, the property at **55 Burnaby Street, London** contains entries for all three of the above combinations. However, it contains only a single entry for *“WALLS: Wall structure”*, indicating that the property has only one structure rather than multiple. + +This pattern is also observed in other sampled properties. Based on this, the following assumption is applied: + +- “Secondary” refers to a secondary **finish**, not a secondary **wall**. + +As a result: +- The property is mapped to a single Wall element. +- That Wall element is assigned three Finish aspects: + - Two with `aspect_instance = 1` + - One with `aspect_instance = 2` + +This means that the combination of +`UPRN / ElementType / ElementInstance / AspectType / AspectInstance` +is **not guaranteed to be unique**. + +### LBWF Data – Wall Finish Mapping + +In the LBWF dataset, the following element codes map to wall finishes: + +- `EXTWALLFN1` +- `EXTWALLFN2` + +These are similarly mapped as multiple instances of the **Finish** aspect for a single Wall element. + +--- +