ML Toolkit
Creating an ML toolkit that can be reused:
- ML pipeline: a generic pipeline that has data version control, experiment tracking and a model registry
- ML monitoring: a bolt-on service that can implement model monitoring
There are multiple protected branches which adapt the generic pipeline to produce different models:
- sap_change-**
- heat_change-**
- carbon_change-**
These branches differ in the configuration files that define the data used and the outputs of the ML pipeline.
- Each branch can carry its own additional logic, but the pipeline itself is the same.
Deployment
Scripts associated with deployment can be found in the deployment/ folder.
Deployment is automated via GitHub Actions: a push to one of the protected branches triggers a deployment, and the branch suffix (dev or prod) names the target environment.
The GitHub Actions workflow builds a Docker image, pushes it to ECR, and then deploys a Lambda that produces predictions for the relevant model.
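As a rough illustration of the branch naming convention, the sketch below splits a branch name into a model name and a target environment. It assumes the branch follows a `<model>-<env>` pattern with `dev` or `prod` as the suffix; the function name `parse_branch` and the constant `VALID_ENVIRONMENTS` are hypothetical and do not come from this repository.

```python
from typing import Tuple

# Assumption: only these two suffixes denote a deployable environment.
VALID_ENVIRONMENTS = ("dev", "prod")


def parse_branch(branch: str) -> Tuple[str, str]:
    """Split a branch such as 'sap_change-dev' into (model, environment)."""
    # Split on the last hyphen so the model name keeps its underscores.
    model, separator, environment = branch.rpartition("-")
    if not separator or not model or environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"Branch {branch!r} does not match '<model>-<dev|prod>'")
    return model, environment


if __name__ == "__main__":
    print(parse_branch("sap_change-dev"))
```

A branch without a recognised suffix raises a `ValueError` in this sketch, which mirrors the idea that only suffixed protected branches trigger a deployment.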
For the deployment to work, some key environment variables need to be added to GitHub secrets. Each model and protected branch has its own set of secrets, which allows for flexibility between different pipelines.
For example, for the branch sap_change-dev, the prefix is SAP_CHANGE_DEV, and the following secrets are required:
- {prefix}_ECR_URI: the URI of the ECR repository to push to. For the sap change model, for example, this is the lambda-sap-prediction-dev repository.
- {prefix}_DOMAIN_NAME: the custom domain name. This is likely to be the same across the different models, but is still included in the secrets for flexibility.
- {prefix}_DATA_BUCKET: the name of the S3 bucket where the data to be scored by the model is stored
- {prefix}_MODEL_BUCKET: the name of the S3 bucket where the model is stored
- {prefix}_PREDICTIONS_BUCKET: the name of the S3 bucket where the predictions are stored
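The naming scheme above can be sketched in a few lines of Python, for example to check which secrets a new branch will need. The prefix rule (upper-case the branch name and replace the hyphen with an underscore) is inferred from the sap_change-dev example; the helper names `secret_prefix` and `secret_names` are hypothetical and not part of this repository.

```python
from typing import List

# Secret suffixes taken from the list above.
SECRET_SUFFIXES = (
    "ECR_URI",
    "DOMAIN_NAME",
    "DATA_BUCKET",
    "MODEL_BUCKET",
    "PREDICTIONS_BUCKET",
)


def secret_prefix(branch: str) -> str:
    """Derive the secret prefix from a branch name (assumed rule)."""
    return branch.replace("-", "_").upper()


def secret_names(branch: str) -> List[str]:
    """Return the full secret names expected for a given branch."""
    prefix = secret_prefix(branch)
    return [f"{prefix}_{suffix}" for suffix in SECRET_SUFFIXES]


if __name__ == "__main__":
    for name in secret_names("sap_change-dev"):
        print(name)
```

Running the sketch for sap_change-dev lists five names, starting with SAP_CHANGE_DEV_ECR_URI, which can be compared against the secrets configured for the repository.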