Model/etl/spatial
2025-05-14 15:34:12 +01:00
__init__.py restructuing repo wip 2023-10-05 14:45:56 +01:00
app.py updating plan trigger for new pydantic 2024-10-21 17:04:37 +01:00
BoreholeClient.py implemented property age band cleaning 2023-10-05 18:20:52 +01:00
ConservationAreaClient.py vectorised conservation area, heritage building and listedbuilding check 2023-10-02 22:54:15 +01:00
OpenUprnClient.py updating plan trigger for new pydantic 2024-10-21 17:04:37 +01:00
README.md added spatial readme 2023-10-05 14:27:00 +01:00
requirements.txt vectorised conservation area, heritage building and listedbuilding check 2023-10-02 22:54:15 +01:00
SpecialBuildingsClient.py vectorised conservation area, heritage building and listedbuilding check 2023-10-02 22:54:15 +01:00

Spatial - Geospatial Data Processing Service

Overview

The Spatial service reads, processes, and analyzes geospatial data on conservation areas and special buildings. It uses datasets from Historic England and the UK government to determine whether a given UPRN (Unique Property Reference Number) lies within a conservation area or is a listed building. The processed data is written back to an S3 bucket in Parquet format for easy retrieval and further analysis.

Dependencies

Dependencies are listed in requirements.txt. To install them, run:

pip install -r requirements.txt

Data Sources

  1. Historic England Conservation Areas: Shapefile containing polygons of conservation areas.
  2. UK Government Conservation Areas: GeoJSON file containing polygons of conservation areas.
  3. Open UPRN Data: CSV file with UPRN and corresponding geospatial data.
  4. Historic England Listed Buildings: Shapefile with information on listed buildings.
  5. Historic England Heritage Buildings at Risk: Shapefile with information on heritage buildings at risk.

Files

  • app.py: Main application file that orchestrates the data processing flow.
  • ConservationAreaClient.py: Handles reading and processing of conservation area data.
  • OpenUprnClient.py: Manages reading and partitioning of Open UPRN data.
  • SpecialBuildingsClient.py: Handles reading and processing of special buildings data.
  • requirements.txt: Lists all Python package dependencies.

How to Run

  1. Make sure you have all the required packages installed.
  2. Update the S3 bucket and file path constants in app.py.
  3. Run app.py.
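
As an illustration of step 2, the constants might look something like the sketch below. The names S3_BUCKET, INPUT_PREFIX, OUTPUT_PREFIX and the helper s3_uri are hypothetical, as are the values; check app.py for the names and paths it actually uses.

```python
# Hypothetical constant names and values; the real ones are defined in app.py.
S3_BUCKET = "my-data-bucket"
INPUT_PREFIX = "spatial/raw"
OUTPUT_PREFIX = "spatial/processed"


def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and key parts into an s3:// URI (illustrative helper)."""
    key = "/".join(p.strip("/") for p in parts if p)
    return f"s3://{bucket}/{key}"


# Example: where one processed partition could be written.
output_uri = s3_uri(S3_BUCKET, OUTPUT_PREFIX, "partition_0.parquet")
```

Keeping the bucket and prefixes in one place means only these values need to change when moving between environments.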

Workflow

  1. Read the datasets for conservation areas and special buildings.
  2. Read the Open UPRN dataset and partition it into smaller chunks based on UPRN.
  3. For each partition:
    • Convert the UPRN data to a geopandas GeoDataFrame.
    • Check whether each UPRN is within a conservation area or is a special building.
    • Save the processed data back to S3 in Parquet format.
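
The partition-then-check logic above can be sketched in plain Python. This is a dependency-free illustration only: the service itself works with geopandas, whereas this sketch swaps in a hand-written ray-casting point-in-polygon test and a simple modulo partitioning rule. The function names, the partitioning rule, and the sample coordinates are all assumptions, and the final write to S3 is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]
UprnRow = Tuple[int, float, float]  # (uprn, x, y)


def partition_by_uprn(rows: Sequence[UprnRow], n_partitions: int) -> Dict[int, List[UprnRow]]:
    """Group rows into buckets keyed by uprn % n_partitions (assumed rule)."""
    buckets: Dict[int, List[UprnRow]] = {i: [] for i in range(n_partitions)}
    for row in rows:
        buckets[row[0] % n_partitions].append(row)
    return buckets


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def flag_partition(rows: Sequence[UprnRow], areas: Sequence[Polygon]) -> List[dict]:
    """Flag each UPRN in a partition that falls inside any conservation area."""
    return [
        {
            "uprn": uprn,
            "in_conservation_area": any(point_in_polygon((x, y), a) for a in areas),
        }
        for uprn, x, y in rows
    ]


# Made-up sample data: one square "conservation area" and three UPRN points.
areas = [[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]]
rows = [(100, 5.0, 5.0), (101, 15.0, 5.0), (102, 1.0, 9.0)]

partitions = partition_by_uprn(rows, n_partitions=2)
results = {pid: flag_partition(part, areas) for pid, part in partitions.items()}
```

Processing one partition at a time keeps memory use bounded, since only one chunk of the Open UPRN data and the polygon layers need to be held at once.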