Mirror of https://github.com/Hestia-Homes/Model.git, synced 2026-06-08 11:17:27 +00:00
# Simulation System

Starter Readme:

Steps for pipeline:

- (WIP) Set up the training development environment
  - Change directory to this folder (`simulation_system`)
  - Run the following command: `make env PYTHON_VERSION=3.10.12`
    - This installs the specified Python version using `pyenv` and selects it as the global Python version
    - It installs all training packages specified in the `training-dev.txt` requirements file, including the pre-commit hooks
  - Run `source .training_env/bin/activate` to use this environment
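
Once the environment is activated, a quick check from Python can confirm that the interpreter is the one `make env` installed. The snippet below is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and it assumes the virtual environment lives in `.training_env` and was built with Python 3.10.12, as in the steps above.

```python
import sys
from pathlib import Path


def check_environment(expected_version="3.10.12", env_dir=".training_env"):
    """Return a list of problems with the active interpreter (empty if none).

    Hypothetical helper: the defaults mirror the setup steps in this README.
    """
    problems = []

    # Compare the running interpreter's version with the one passed to `make env`.
    running = ".".join(str(part) for part in sys.version_info[:3])
    if running != expected_version:
        problems.append(f"Python {running} is active, expected {expected_version}")

    # Inside an activated virtual environment, sys.prefix points at the env folder.
    if Path(sys.prefix).name != env_dir:
        problems.append(f"interpreter prefix is {sys.prefix}, expected a {env_dir} folder")

    return problems
```

Calling `check_environment()` from a Python session started inside the activated environment should return an empty list; any strings it returns describe what differs from the expected setup.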
- (WIP) Use the Makefile to start up a mock S3 service
  - Running `make init` runs `docker-compose build` and `docker-compose up -d`, which spins up an S3 service
  - Docker Compose runs in detached mode (`-d`), so it will not output anything to the terminal
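
Because `docker-compose up -d` returns without printing anything, it can help to confirm that the service is accepting connections before starting a model build. The sketch below polls a TCP port from Python. The helper names are hypothetical and not part of the codebase, and the endpoint you pass in is an assumption to verify: MinIO's default API port is 9000, but the `docker-compose` file for this project may map a different one.

```python
import socket
import time


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_for_port(host, port, attempts=10, delay=1.0):
    """Poll host:port until it accepts connections or `attempts` run out."""
    for attempt in range(attempts):
        if is_port_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the container time to start
    return False
```

For example, `wait_for_port("localhost", 9000)` returns `True` once something is listening on that port and `False` if nothing answers after ten attempts. A successful connection only shows that the port is open, not that the buckets the pipeline expects exist.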
- Once the MinIO service is running, you can run the `training.py` file to start a model build process
  - This outputs a model for a given target column and gives it a model name composed of some of the hyperparameters
  - An example of running this file is:
    - `python3 training.py --train-filepath ./model_build_data/change_data/rdsap_full/train_validation_data.parquet --test-filepath ./model_build_data/change_data/rdsap_full/test_data.parquet`
  - Outputs of the pipeline are:
    - A model directory bucket
    - A target variable prefix (e.g. RDSAP_CHANGE or HEAT_DEMAND_CHANGE)
    - A model type prefix (e.g. autogluon, tensorflow, etc.)
    - A model name prefix (e.g. rdsap_change_medium_quality_60_TIMESTAMP)
      - This model name is made up of the target variable, quality, time spent training and timestamp
      - Within this prefix, there are three folders:
        - model
          - The model path that can be loaded in the codebase
        - deployment
          - The optimised model that can be deployed (may or may not need this)
        - metrics
          - The metrics generated from the model (may or may not need this, as this can be contained in the registry)
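
To make the output layout concrete, the sketch below composes a model name from the target variable, quality, training time and timestamp, then lists the three folder prefixes beneath it. It illustrates the description above under assumptions and is not the code in `training.py`: the helper names, the timestamp format, the prefix order (target variable, then model type, then model name) and the `/` separator are all guesses.

```python
from datetime import datetime, timezone


def build_model_name(target, quality, time_limit, timestamp=None):
    """Compose a name such as rdsap_change_medium_quality_60_<timestamp>."""
    if timestamp is None:
        # Assumed timestamp format; the real pipeline may use a different one.
        timestamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{target.lower()}_{quality}_{time_limit}_{timestamp}"


def build_output_prefixes(target, model_type, model_name):
    """Return the assumed key prefix for each of the three output folders."""
    base = f"{target}/{model_type}/{model_name}"
    return {folder: f"{base}/{folder}/" for folder in ("model", "deployment", "metrics")}


name = build_model_name("RDSAP_CHANGE", "medium_quality", 60, timestamp="TIMESTAMP")
print(name)  # rdsap_change_medium_quality_60_TIMESTAMP
for folder, prefix in build_output_prefixes("RDSAP_CHANGE", "autogluon", name).items():
    print(folder, "->", prefix)
```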
- Once the model build is finished, you can run the `prediction.py` file to generate predictions
  - By default, the prediction pipeline selects the best model from the model registry based on **mean absolute error**
  - This can be overridden by specifying a `model_path`, which loads an alternative model
  - There are two ways of getting data into the pipeline:
    - Using the `--data` argument:
      - This is a JSON string, which can be passed as `python3 predictions.py --data '{"TOTAL_FLOOR_AREA": 1}'`
      - Note the single and double quotation marks: the outer single quotes stop the shell from stripping the inner double quotes, which JSON requires
    - Using the `--data-path` argument:
      - This can be a filepath (we may later want to pull data from S3 or a database)
      - An example of running the file is:
        - `python3 predictions.py --data-path ../simulation_system/model_build_data/change_data/rdsap_full/test_data.parquet`
  - Outputs of the pipeline are:
    - A prediction bucket
    - A target variable prefix (e.g. RDSAP_CHANGE or HEAT_DEMAND_CHANGE)
    - A UPRN prefix (e.g. 0123456789)
    - A `prediction.json`
    - A `metadata.json`
      - This is all the metadata from the model (can change this if needed)
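
The default model selection and the `--data` input can be sketched as follows. This is not the code in the prediction script: the registry is represented as a plain list of dictionaries with hypothetical `model_path` and `mean_absolute_error` keys, and `select_best_model` and `parse_data_argument` are illustrative names. Lower mean absolute error is treated as better.

```python
import json


def select_best_model(registry_entries, model_path=None):
    """Return the path of the model to load.

    An explicit model_path overrides the registry; otherwise the entry with
    the lowest mean absolute error wins.
    """
    if model_path is not None:
        return model_path
    if not registry_entries:
        raise ValueError("model registry is empty and no model_path was given")
    best = min(registry_entries, key=lambda entry: entry["mean_absolute_error"])
    return best["model_path"]


def parse_data_argument(raw):
    """Parse the JSON string passed via --data into a dictionary of features."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("--data must be a JSON object")
    return record


# Hypothetical registry contents, for illustration only.
registry = [
    {"model_path": "models/a", "mean_absolute_error": 4.2},
    {"model_path": "models/b", "mean_absolute_error": 3.1},
]
print(select_best_model(registry))                         # models/b
print(select_best_model(registry, model_path="models/a"))  # models/a
print(parse_data_argument('{"TOTAL_FLOOR_AREA": 1}'))      # {'TOTAL_FLOOR_AREA': 1}
```

Passing `{'TOTAL_FLOOR_AREA': 1}` with single quotes inside the braces makes `json.loads` raise `json.JSONDecodeError`, which is why the quoting in the shell command matters.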
- NOTE: If you wish to change any settings, these are currently all in the `Settings.py` file
  - It will be separated out eventually, but for now it works to keep track of anything that we might want to respecify
  - E.g. the hyperparameters for models are in here but will move into a separate configuration file
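
One way the planned move of hyperparameters into a separate configuration file could look is sketched below: defaults stay in code and a file overrides them. Everything here is hypothetical, including the `Hyperparameters` fields (borrowed from the model name components described earlier), their default values, and the choice of JSON as the file format; the project may well settle on a different structure.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class Hyperparameters:
    """Hypothetical hyperparameters; field names and defaults are illustrative."""

    target: str = "RDSAP_CHANGE"
    quality: str = "medium_quality"
    time_limit: int = 60


def load_hyperparameters(config_path=None):
    """Return the defaults, overridden by any matching keys in a JSON config file."""
    if config_path is None:
        return Hyperparameters()
    overrides = json.loads(Path(config_path).read_text())
    known = {field.name for field in fields(Hyperparameters)}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly on typos instead of silently ignoring a setting.
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    return Hyperparameters(**overrides)
```

With this shape, a per-environment file containing only `{"time_limit": 120}` would change the training time and leave the other defaults untouched.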
# TODO:

- Structure/MLOps:
  - Add configuration files (dev, staging, prod), including hyperparameters
  - Add pre-commit hooks (linters, branch names, etc.)
  - Sphinx documentation
  - Sort out local mock-up services
  - Sort out Model Registry
  - Sort out Data version control
  - pre-commit hooks:
    - The types of hooks that we want (safety, bandit, iso8, etc.)
    - The customisations we require
  - Add Sphinx documentation
- Data Science:
  - Implement a metrics class to hold all metrics
  - Rebuild metrics script (could be a one-off, but good to have)
  - Determine metrics
  - Implement and test custom model (TensorFlow Decision Trees, etc.)
- Orchestration:
  - Lambda handler for the pipeline