mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
# Spatial - Geospatial Data Processing Service

## Overview

The Spatial service is designed to read, process, and analyze geospatial data related to
conservation areas and special buildings. It uses datasets from Historic England and the
UK government to determine whether a given UPRN (Unique Property Reference Number) is within
a conservation area or is a listed building. The processed data is saved back to an S3 bucket
in Parquet format for easy retrieval and further analysis.

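
As an illustration of the containment test described above, here is a minimal pure-Python sketch of a point-in-polygon check using ray casting. It is not taken from the service, which according to the Workflow section relies on geopandas; the function name `point_in_polygon` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Return True if `point` lies inside `polygon` (ray casting, even-odd rule).

    `polygon` is a list of (x, y) vertices; the last vertex is implicitly
    joined back to the first. Points exactly on an edge are not handled
    specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical conservation area: a 10 x 10 square in arbitrary units.
conservation_area = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

inside_flag = point_in_polygon((5.0, 5.0), conservation_area)
outside_flag = point_in_polygon((15.0, 5.0), conservation_area)
```

A geospatial library would normally perform this test with a spatial index over many polygons at once; the sketch only shows the underlying idea for a single point and a single polygon.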
## Dependencies

Dependencies are listed in requirements.txt. To install them, run:

```
pip install -r requirements.txt
```

## Data Sources

1. **Historic England Conservation Areas**: Shapefile containing polygons of conservation areas.
2. **UK Government Conservation Areas**: GeoJSON file containing polygons of conservation areas.
3. **Open UPRN Data**: CSV file with UPRN and corresponding geospatial data.
4. **Historic England Listed Buildings**: Shapefile with information on listed buildings.
5. **Historic England Heritage Buildings at Risk**: Shapefile with information on heritage buildings at risk.

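
To make two of these input formats concrete, the sketch below parses a tiny in-memory GeoJSON polygon and a two-row UPRN CSV using only the standard library. The column names (`UPRN`, `LATITUDE`, `LONGITUDE`), the `name` property, and all sample values are assumptions for illustration and should be checked against the actual files. Shapefiles are a binary format that needs a geospatial library, so they are not shown.

```python
import csv
import io
import json

# Hypothetical GeoJSON with one conservation-area polygon.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example Area"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-0.2, 51.4], [-0.1, 51.4], [-0.1, 51.5], [-0.2, 51.5], [-0.2, 51.4]]]
      }
    }
  ]
}
"""

# Hypothetical UPRN extract; column names are assumed, not confirmed.
CSV_TEXT = """UPRN,LATITUDE,LONGITUDE
100000000001,51.45,-0.15
100000000002,51.60,-0.30
"""


def load_polygons(geojson_text):
    """Return (name, exterior_ring) pairs for every Polygon feature."""
    collection = json.loads(geojson_text)
    polygons = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            # GeoJSON stores positions as [longitude, latitude];
            # the first ring is the exterior boundary.
            ring = [(lon, lat) for lon, lat in geometry["coordinates"][0]]
            polygons.append((feature["properties"]["name"], ring))
    return polygons


def load_uprns(csv_text):
    """Return a list of dicts with an int UPRN and float coordinates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "uprn": int(row["UPRN"]),
            "lat": float(row["LATITUDE"]),
            "lon": float(row["LONGITUDE"]),
        }
        for row in reader
    ]


polygons = load_polygons(GEOJSON_TEXT)
uprns = load_uprns(CSV_TEXT)
```

Whatever the real column names are, the point and polygon coordinates must be in the same coordinate reference system before any containment test is meaningful.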
## Files

- app.py: Main application file that orchestrates the data processing flow.
- ConservationAreaClient.py: Handles reading and processing of conservation area data.
- OpenUprnClient.py: Manages reading and partitioning of Open UPRN data.
- SpecialBuildingsClient.py: Takes care of reading and processing data related to special buildings.
- requirements.txt: Lists all Python package dependencies.

## How to Run

1. Make sure you have all the required packages installed.
2. Update the S3 bucket and file path constants in app.py.
3. Run app.py.

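
Step 2 refers to constants in app.py that are not reproduced in this README. As a purely hypothetical sketch of what such constants and a small URI helper might look like, with every name and value below invented for illustration:

```python
# All names and values here are placeholders, not the real contents of app.py.
S3_BUCKET = "my-spatial-bucket"
UPRN_KEY = "input/open_uprn.csv"
OUTPUT_PREFIX = "output/spatial"


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def partition_output_uri(partition_index: int) -> str:
    """URI for one processed partition, zero-padded so files sort in order."""
    key = f"{OUTPUT_PREFIX}/part-{partition_index:05d}.parquet"
    return s3_uri(S3_BUCKET, key)


uprn_uri = s3_uri(S3_BUCKET, UPRN_KEY)
first_output_uri = partition_output_uri(0)
```

Keeping the bucket and paths in one place like this means only the constants need editing when the service is pointed at a different bucket.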
## Workflow

1. Read the datasets for conservation areas and special buildings.
2. Read the Open UPRN dataset and partition it into smaller chunks based on UPRN.
3. For each partition:
   - Convert the UPRN data to a geopandas GeoDataFrame.
   - Check whether each UPRN is within a conservation area or is a special building.
   - Save the processed data back to S3 in Parquet format.
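
The partition-and-process loop above can be sketched in plain Python. This is a minimal illustration, not the service's code: it assumes partitions are fixed-size chunks of records sorted by UPRN (the README does not say how the partitioning is done), uses a bounding-box test as a stand-in for the geopandas containment check, and collects results in a list instead of writing Parquet to S3. All function names and sample values are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def partition_by_uprn(records: List[Record], size: int) -> Iterator[List[Record]]:
    """Yield chunks of at most `size` records, ordered by UPRN."""
    ordered = sorted(records, key=lambda r: r["uprn"])
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def in_bbox(record: Record, bbox: BBox) -> bool:
    """Stand-in containment test: is the point inside the bounding box?"""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= record["x"] <= max_x and min_y <= record["y"] <= max_y


def process_partition(chunk: List[Record], areas: List[BBox]) -> List[dict]:
    """Flag each UPRN in the chunk that falls inside any area."""
    return [
        {"uprn": r["uprn"], "in_conservation_area": any(in_bbox(r, a) for a in areas)}
        for r in chunk
    ]


# Hypothetical sample data.
records = [
    {"uprn": 3, "x": 50.0, "y": 50.0},
    {"uprn": 1, "x": 5.0, "y": 5.0},
    {"uprn": 2, "x": 20.0, "y": 5.0},
]
areas = [(0.0, 0.0, 10.0, 10.0)]

# One result list per partition; the real service would write each to S3.
results = [process_partition(chunk, areas) for chunk in partition_by_uprn(records, size=2)]
```

Processing one chunk at a time keeps memory use bounded by the partition size, which is presumably why the UPRN data is partitioned before the spatial checks run.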