mirror of
https://github.com/Hestia-Homes/Model.git
synced 2026-06-08 11:17:27 +00:00
Spatial - Geospatial Data Processing Service
Overview
The Spatial service reads, processes, and analyzes geospatial data on conservation areas and special buildings. It uses datasets from Historic England and the UK government to determine whether a given UPRN (Unique Property Reference Number) lies within a conservation area or is a listed building. The processed data is written back to an S3 bucket in Parquet format for easy retrieval and further analysis.
Dependencies
Dependencies are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
Data Sources
- Historic England Conservation Areas: Shapefile containing polygons of conservation areas.
- UK Government Conservation Areas: GeoJSON file containing polygons of conservation areas.
- Open UPRN Data: CSV file with UPRN and corresponding geospatial data.
- Historic England Listed Buildings: Shapefile with information on listed buildings.
- Historic England Heritage Buildings at Risk: Shapefile with information on heritage buildings at risk.
Files
- app.py: Main application file that orchestrates the data processing flow.
- ConservationAreaClient.py: Handles reading and processing of conservation area data.
- OpenUprnClient.py: Manages reading and partitioning of Open UPRN data.
- SpecialBuildingsClient.py: Takes care of reading and processing data related to special buildings.
- requirements.txt: Lists all Python package dependencies.
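As an illustration of the partitioning that OpenUprnClient.py is described as managing, here is a minimal sketch using only the standard library. It is not the repository's code: the function name partition_by_uprn, the chunk_size parameter, the column names, and the sample values are all assumptions made for the example.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def partition_by_uprn(rows: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows, ordered by numeric UPRN.

    Hypothetical helper: the real client may partition differently.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    # Sort once so each chunk covers a contiguous UPRN range.
    ordered = iter(sorted(rows, key=lambda row: int(row["UPRN"])))
    while True:
        chunk = list(islice(ordered, chunk_size))
        if not chunk:
            return
        yield chunk


# Tiny in-memory stand-in for the Open UPRN CSV (column names and values are assumed).
SAMPLE_CSV = """UPRN,LATITUDE,LONGITUDE
300,51.5010,-0.1420
100,51.5000,-0.1400
200,51.5005,-0.1410
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
chunks = list(partition_by_uprn(sample_rows, chunk_size=2))

for number, chunk in enumerate(chunks):
    print(number, [row["UPRN"] for row in chunk])
```

Sorting loads every row into memory, which is fine for a sketch but may not suit the full dataset; a streaming or range-based split would be the alternative there.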
How to Run
- Make sure you have all the required packages installed.
- Update the S3 bucket and file path constants in app.py.
- Run app.py.
Workflow
- Read the datasets for conservation areas and special buildings.
- Read the Open UPRN dataset and partition it into smaller chunks based on UPRN.
- For each partition:
  - Convert the UPRN data to a geopandas GeoDataFrame.
  - Check whether each UPRN is within a conservation area or is a special building.
- Save the processed data back to S3 in parquet format.
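The "is this UPRN within a conservation area" check in the workflow above comes down to a point-in-polygon test. The sketch below shows that idea with a dependency-free ray-casting routine so it can be run anywhere. It is an illustration rather than the service's implementation: the names point_in_polygon and flag_uprns, the square test area, and the coordinates are all assumptions.

```python
from typing import Sequence

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if point lies strictly inside the polygon ring."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def flag_uprns(
    uprns: dict[str, Point], areas: Sequence[Sequence[Point]]
) -> dict[str, bool]:
    """Map each UPRN to True if its point falls inside any of the areas."""
    return {
        uprn: any(point_in_polygon(pt, area) for area in areas)
        for uprn, pt in uprns.items()
    }


# Hypothetical data: one square "conservation area" and two UPRN points.
AREA = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
UPRNS = {"100": (2.0, 2.0), "200": (5.0, 1.0)}

flags = flag_uprns(UPRNS, [AREA])
print(flags)
```

With geopandas, which the workflow names, the same check is usually expressed as a spatial join, for example geopandas.sjoin with predicate="within", provided the points and polygons share a coordinate reference system. Whether app.py does it that way is not stated here.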