Condition Data Processor
The Condition Data Processor performs the following steps:
- Extract
  - Ingest client Condition Survey data files (currently from local files; future support planned for S3 and internal survey sources)
  - Parse input files into Data Transfer Objects (DTOs)
- Transform
  - Map source data into the internal domain data model
- Load
  - Persist transformed data into the ARA database (not yet implemented)
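The Extract and Transform stages can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual code: `SurveyRecordDTO`, `DomainAspect`, `extract`, `transform`, the CSV layout, and the column names are all hypothetical, chosen only for the example. The Load stage is left out because it is not yet implemented.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyRecordDTO:
    """Hypothetical DTO holding one raw survey row, as read from the file."""
    uprn: str
    element: str
    sub_element: str


@dataclass(frozen=True)
class DomainAspect:
    """Hypothetical internal-model record produced by the transform step."""
    uprn: str
    element_type: str
    aspect_type: str


def extract(source) -> list[SurveyRecordDTO]:
    """Extract: parse rows from a CSV-like text stream into DTOs."""
    reader = csv.DictReader(source)
    return [
        SurveyRecordDTO(row["UPRN"], row["Element"], row["Sub_Element"])
        for row in reader
    ]


def transform(records: list[SurveyRecordDTO]) -> list[DomainAspect]:
    """Transform: normalise DTO values into the (hypothetical) domain model."""
    return [
        DomainAspect(r.uprn, r.element.strip().title(), r.sub_element.strip().title())
        for r in records
    ]


# Illustrative input only; real client files may be laid out differently.
sample = io.StringIO("UPRN,Element,Sub_Element\n100,WALLS,Wall Finish\n")
aspects = transform(extract(sample))
```

Keeping parsing (DTOs) separate from mapping (domain model) means each client format needs only its own extract logic, while the transform output stays uniform.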
The processor currently supports file formats provided by Peabody and LBWF.
Running Locally
The local_runner script allows the processor to be executed in a local environment.
- Copy a sample input file into the `sample_data/` directory.
- Update `local_runner.py` as required, specifically the definitions of:
  - `lbwf_path`
  - `peabody_path`
  - `file_paths`
- Run `local_runner.py`.
Breakpoints may be added and the script run in debug mode if required.
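As a rough guide, the three definitions might look something like the sketch below. The file names, the `SAMPLE_DATA_DIR` constant, and the `missing` check are placeholders and assumptions, not the script's actual contents; check `local_runner.py` for the real definitions.

```python
from pathlib import Path

# Assumed location of the sample data folder, relative to the working directory.
SAMPLE_DATA_DIR = Path("sample_data")

# Placeholder file names: substitute the sample files you copied in.
lbwf_path = SAMPLE_DATA_DIR / "lbwf_sample.csv"
peabody_path = SAMPLE_DATA_DIR / "peabody_sample.csv"

# The files the runner should process on this run.
file_paths = [lbwf_path, peabody_path]

# Collect any configured file that does not exist, so it can be reported early.
missing = [str(p) for p in file_paths if not p.exists()]
```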
Known Data Issues
Some inconsistencies exist in the source datasets, primarily involving multiple representations of the same logical element within a single file. In these cases, assumptions have been made in order to normalise the data into the internal domain model.
Peabody Data – Wall Finish Mapping
In the original Peabody sample dataset, multiple Element/Sub-Element combinations correspond to wall finishes:
| Element_Code | Element | Sub_Element_Code | Sub_Element |
|---|---|---|---|
| 53 | External | 23 | Primary Wall Finish |
| 53 | External | 30 | Secondary Wall Finish |
| 120 | WALLS | 2 | Wall Finish |
A single property may contain records for all three combinations, and each combination may appear multiple times.
For example, the property at 55 Burnaby Street, London contains entries for all three of the above combinations. However, it contains only a single entry for “WALLS: Wall structure”, indicating that the property has only one structure rather than multiple.
This pattern is also observed in other sampled properties. Based on this, the following assumption is applied:
- “Secondary” refers to a secondary finish, not a secondary wall.
As a result:
- The property is mapped to a single Wall element.
- That Wall element is assigned three Finish aspects:
  - Two with `aspect_instance = 1`
  - One with `aspect_instance = 2`
This means that the combination of `UPRN / ElementType / ElementInstance / AspectType / AspectInstance` is not guaranteed to be unique.
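The mapping and its effect on uniqueness can be illustrated with a short sketch. The lookup table, the `FinishAspect` tuple, the `map_wall_finishes` helper, and the UPRN value are hypothetical. In particular, assigning instance 1 to both "Primary Wall Finish" and the plain "Wall Finish" is an assumption inferred from the counts above, not taken from the processor's code.

```python
from collections import Counter
from typing import NamedTuple


class FinishAspect(NamedTuple):
    """Hypothetical key: UPRN / ElementType / ElementInstance / AspectType / AspectInstance."""
    uprn: str
    element_type: str
    element_instance: int
    aspect_type: str
    aspect_instance: int


# (Element_Code, Sub_Element_Code) -> aspect_instance.
# Assumption: Primary and plain "Wall Finish" both map to instance 1.
PEABODY_WALL_FINISH_INSTANCE = {
    (53, 23): 1,   # External / Primary Wall Finish
    (53, 30): 2,   # External / Secondary Wall Finish
    (120, 2): 1,   # WALLS / Wall Finish
}


def map_wall_finishes(uprn, code_pairs):
    """Map each wall-finish record onto the property's single Wall element."""
    return [
        FinishAspect(uprn, "Wall", 1, "Finish", PEABODY_WALL_FINISH_INSTANCE[pair])
        for pair in code_pairs
        if pair in PEABODY_WALL_FINISH_INSTANCE
    ]


# A property with one record for each of the three combinations.
aspects = map_wall_finishes("UPRN-EXAMPLE", [(53, 23), (53, 30), (120, 2)])

# Any key that occurs more than once shows the combination is not unique.
duplicates = [key for key, count in Counter(aspects).items() if count > 1]
```

Any downstream code that treats this five-part combination as a primary key would therefore need an extra discriminator, or would have to tolerate duplicates.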
LBWF Data – Wall Finish Mapping
In the LBWF dataset, the following element codes map to wall finishes:
- `EXTWALLFN1`
- `EXTWALLFN2`
These are similarly mapped as multiple instances of the Finish aspect for a single Wall element.
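A minimal sketch of such a lookup is shown below. The dictionary and the `lbwf_finish_instance` helper are hypothetical, and taking the trailing digit of each code as the Finish aspect instance is an assumption made for illustration only.

```python
# Assumption: the trailing digit of the LBWF code gives the Finish aspect instance.
LBWF_WALL_FINISH_CODES = {"EXTWALLFN1": 1, "EXTWALLFN2": 2}


def lbwf_finish_instance(element_code):
    """Return the assumed Finish aspect instance, or None for other codes."""
    return LBWF_WALL_FINISH_CODES.get(element_code.strip().upper())
```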