Simulation System

Starter README: pipeline steps

  • (WIP) Set up the training development environment

    • Change directory to this folder (simulation_system)
    • Run the following command: make env PYTHON_VERSION=3.10.12
    • This installs the specified Python version using pyenv and selects it as the global Python version
    • It also installs all training packages specified in the training-dev.txt requirements file, including the pre-commit hooks
    • Run source .training_env/bin/activate to use this environment
  • (WIP) Use the Makefile to start a mock S3 service

    • Running make init runs docker-compose build and docker-compose up -d, which spins up a mock S3 service (MinIO)
    • Docker Compose runs in detached mode (-d), so it will not output anything to the terminal
  • Once the MinIO service is running, you can run training.py to start a model build

    • This outputs a model for a given target column, with a model name composed of some of the hyperparameters
    • An example of running this file is:
      • python3 training.py --train-filepath ./model_build_data/change_data/rdsap_full/train_validation_data.parquet --test-filepath ./model_build_data/change_data/rdsap_full/test_data.parquet
    • Outputs of the pipeline are:
      • A model directory bucket
      • A target variable prefix (e.g. RDSAP_CHANGE or HEAT_DEMAND_CHANGE)
      • A model type prefix (e.g. autogluon, tensorflow)
      • A model name prefix (e.g. rdsap_change_medium_quality_60_TIMESTAMP)
        • This model name is made up of the target variable, quality, time spent training and a timestamp
        • Within this prefix, there are three folders:
          • model
            • The model path that can be loaded in the codebase
          • deployment
            • The optimised model that can be deployed (may or may not be needed)
          • metrics
            • The metrics generated from the model (may or may not be needed, as these can be held in the model registry)
  • Once the model build has finished, you can run predictions.py to generate predictions

    • By default, the prediction pipeline will select the best model based on mean absolute error from the model registry
      • This can be overridden by specifying a model_path, which loads an alternative model
    • There are two ways of getting data into the pipeline:
      • Using the --data argument:
        • This is a JSON string which can be passed as python3 predictions.py --data '{"TOTAL_FLOOR_AREA": 1}'
          • Note the quoting: the JSON keys use double quotes and the whole string is wrapped in single quotes, so the shell passes the JSON through unchanged
      • Using the --data-path argument:
        • This can be a file path (in future we may want to pull data from S3 or a database)
    • An example of running the file is:
      • python3 predictions.py --data-path ../simulation_system/model_build_data/change_data/rdsap_full/test_data.parquet
    • Outputs of the pipeline are:
      • A prediction bucket
      • A target variable prefix (e.g. RDSAP_CHANGE or HEAT_DEMAND_CHANGE)
      • A UPRN prefix (e.g. 0123456789)
      • A prediction.json
      • A metadata.json
        • This contains all the metadata from the model (this can be changed if needed)
  • NOTE: If you wish to change any settings, these are currently all in the Settings.py file

    • This will be split out eventually, but for now it keeps track of anything we might want to respecify
      • For example, the model hyperparameters are defined here but will move into a separate configuration file
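The training and prediction steps above write their outputs under a bucket plus a chain of prefixes. A minimal sketch of how those object keys could be composed follows. The helper names (build_model_name, model_prefixes, prediction_keys), the "/" separators and the %Y%m%d%H%M%S timestamp format are assumptions for illustration, not taken from training.py or predictions.py.

```python
from datetime import datetime, timezone

# Hypothetical helpers: the real training.py and predictions.py may compose keys differently.
MODEL_FOLDERS = ("model", "deployment", "metrics")


def build_model_name(target: str, quality: str, time_limit: int, timestamp: str) -> str:
    """Compose a name such as rdsap_change_medium_quality_60_<TIMESTAMP>."""
    return f"{target.lower()}_{quality}_{time_limit}_{timestamp}"


def model_prefixes(target: str, model_type: str, model_name: str) -> dict[str, str]:
    """Map each folder under the model name prefix to its object-key prefix."""
    base = f"{target}/{model_type}/{model_name}"
    return {folder: f"{base}/{folder}/" for folder in MODEL_FOLDERS}


def prediction_keys(target: str, uprn: str) -> dict[str, str]:
    """Object keys for one property's prediction and metadata files."""
    base = f"{target}/{uprn}"
    return {"prediction": f"{base}/prediction.json", "metadata": f"{base}/metadata.json"}


if __name__ == "__main__":
    # Assumed timestamp format; training.py may use a different one.
    ts = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc).strftime("%Y%m%d%H%M%S")
    name = build_model_name("RDSAP_CHANGE", "medium_quality", 60, ts)
    print(name)  # rdsap_change_medium_quality_60_20230101120000
    print(model_prefixes("RDSAP_CHANGE", "autogluon", name)["model"])
    print(prediction_keys("RDSAP_CHANGE", "0123456789")["prediction"])
```

The bucket names are kept out of the keys on purpose, since the README treats the model directory bucket and the prediction bucket as separate locations.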
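The two ways of passing data to predictions.py (--data and --data-path) could be wired up with argparse roughly as below. This is a sketch, not the script's actual parser: the mutually exclusive group and the use of json.loads as the argument type are assumptions. It also illustrates why the quoting matters: with the outer single quotes the shell hands the script valid JSON, whereas without them the shell strips the double quotes and the value no longer parses.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the two input options of predictions.py."""
    parser = argparse.ArgumentParser(description="Generate predictions")
    # Exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    # json.loads raises ValueError on malformed input, which argparse reports as a usage error.
    source.add_argument("--data", type=json.loads, help="inline JSON object of features")
    source.add_argument("--data-path", help="path to an input file, e.g. a parquet file")
    return parser


if __name__ == "__main__":
    # Equivalent to: python3 predictions.py --data '{"TOTAL_FLOOR_AREA": 1}'
    args = build_parser().parse_args(["--data", '{"TOTAL_FLOOR_AREA": 1}'])
    print(args.data)  # {'TOTAL_FLOOR_AREA': 1}
```

Parsing the JSON at the argument level means a badly quoted --data value fails immediately with a usage message, before any model is loaded.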
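The default model selection in the prediction step (lowest mean absolute error from the registry, unless a model_path is given) can be sketched as follows. Because the model registry is still a TODO, the RegisteredModel record and select_model_path function are hypothetical stand-ins, with the registry represented as a plain in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisteredModel:
    """Hypothetical registry entry; the real model registry is not yet defined."""

    model_path: str
    mean_absolute_error: float


def select_model_path(registry: list[RegisteredModel], model_path: Optional[str] = None) -> str:
    """Return model_path if given, otherwise the path of the entry with the lowest MAE."""
    if model_path is not None:
        # An explicit path overrides the registry lookup.
        return model_path
    if not registry:
        raise ValueError("model registry is empty")
    best = min(registry, key=lambda entry: entry.mean_absolute_error)
    return best.model_path


if __name__ == "__main__":
    registry = [
        RegisteredModel("models/rdsap_change_a/model/", mean_absolute_error=2.4),
        RegisteredModel("models/rdsap_change_b/model/", mean_absolute_error=1.9),
    ]
    print(select_model_path(registry))  # models/rdsap_change_b/model/
    print(select_model_path(registry, model_path="models/custom/model/"))  # models/custom/model/
```

If two entries share the lowest MAE, min() returns the first one in the list, so registry ordering acts as the tie-breaker in this sketch.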

TODO:

  • Structure/ MLOps:
    • Add configuration files (dev, staging, prod), including hyperparameters
    • Add pre-commit hooks (linters, branch names, etc.)
    • Sphinx documentation
    • Sort out local mock up services
    • Sort out Model Registry
    • Sort out Data version control
    • pre-commit hooks:
      • The types of hooks that we want (safety, bandit, iso8 etc)
      • The customisations we require
  • Data Science:
    • Implement a metrics class to hold all metrics
    • Rebuild metrics script (could be a one-off, but good to have)
    • Determine metrics
    • Implement and test custom model (TensorFlow Decision Trees etc.)
  • Orchestration:
    • Lambda handler for the pipeline