mirror of
https://github.com/Hestia-Homes/ML.git
synced 2026-06-08 11:17:25 +00:00
95 lines
4.5 KiB
Markdown
# ML Toolkit
Creating an ML toolkit that can be reused:

- ML pipeline:
  - A generic pipeline that has data version control, experiment tracking and a model registry
- ML monitoring:
  - A bolt-on service that can implement model monitoring

There are multiple protected branches which adapt the generic pipeline to produce different models:

- sap-{dev/staging/prod}-**
- heat-{dev/staging/prod}-**
- carbon-{dev/staging/prod}-**

These branches differ in the configuration files that define the data used and the outputs of the ML pipeline.

- Each branch can carry additional logic of its own, but the pipeline itself is the same.

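The branch naming convention above can be illustrated with a small helper. This is a minimal sketch, assuming a branch name is split on hyphens with the model first and the environment second; `parse_branch`, `BranchTarget`, `MODELS` and `ENVIRONMENTS` are hypothetical names and are not taken from the repository.

```python
from typing import NamedTuple

# Model and environment names taken from the branch list above.
MODELS = {"sap", "heat", "carbon"}
ENVIRONMENTS = {"dev", "staging", "prod"}


class BranchTarget(NamedTuple):
    model: str
    environment: str


def parse_branch(branch: str) -> BranchTarget:
    """Split a branch such as 'sap-dev' into its model and environment.

    Assumption: anything after the second hyphen-separated part is ignored.
    """
    parts = branch.split("-")
    if len(parts) < 2 or parts[0] not in MODELS or parts[1] not in ENVIRONMENTS:
        raise ValueError(f"not a recognised protected branch: {branch!r}")
    return BranchTarget(model=parts[0], environment=parts[1])
```

For instance, `parse_branch("heat-prod")` yields a target with model `heat` and environment `prod`, while a branch such as `main` raises a `ValueError`.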
# Deployment

Scripts associated with deployment can be found in the deployment/ folder.

Deployment is automated via GitHub Actions: a deployment is triggered by a push to one of the protected branches, with one of dev or prod as the suffix describing the target environment.

The GitHub Actions workflow builds and pushes a Docker image to ECR and then deploys a Lambda which produces predictions for the relevant model.

For this to work, some key environment variables need to be added to GitHub secrets. Each model and protected branch has its own set of secrets, which allows for flexibility between different pipelines.

For example, for the branch sap-dev the prefix is SAP_DEV, and the secrets are as follows:

- {prefix}_ECR_URI is the URI of the ECR repository to push to. For example, for the sap change model this is the lambda-sap-prediction-dev repository.
- {prefix}_DOMAIN_NAME is the custom domain name. This is likely to be the same across the different models, but is still included in the secrets for flexibility.
- {prefix}_DATA_BUCKET is the name of the S3 bucket where the data to be scored by the model is stored.
- {prefix}_MODEL_BUCKET is the name of the S3 bucket where the model is stored.
- {prefix}_PREDICTIONS_BUCKET is the name of the S3 bucket where the predictions are stored.

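The secret naming scheme above can be checked with a short script before adding the secrets to GitHub. This is a minimal sketch, assuming the prefix is the first two hyphen-separated parts of the branch name, upper-cased and joined with an underscore (as in sap-dev giving SAP_DEV); `secret_prefix` and `secret_names` are hypothetical helpers, not part of the repository.

```python
# Suffixes listed in the secrets section above.
SECRET_SUFFIXES = (
    "ECR_URI",
    "DOMAIN_NAME",
    "DATA_BUCKET",
    "MODEL_BUCKET",
    "PREDICTIONS_BUCKET",
)


def secret_prefix(branch: str) -> str:
    """Turn a branch such as 'sap-dev' into the prefix 'SAP_DEV'."""
    model, environment = branch.split("-")[:2]
    return f"{model}_{environment}".upper()


def secret_names(branch: str) -> list:
    """List the secret names expected for a given branch."""
    prefix = secret_prefix(branch)
    return [f"{prefix}_{suffix}" for suffix in SECRET_SUFFIXES]


if __name__ == "__main__":
    for name in secret_names("sap-dev"):
        print(name)
```

Running the script prints one secret name per line for the sap-dev branch, which can be compared against the secrets configured in the repository settings.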
# Building and Testing the Prediction Lambda Function Locally

TODO: Generalise these instructions for the various different pipelines

This guide outlines the steps to build and test the Lambda function locally using Docker. These instructions assume you're working with a machine that has Docker installed.

### Prerequisites

- Docker: Make sure Docker is installed and running on your machine.
- AWS Credentials: Ensure you have AWS credentials set up on your local machine, typically stored in ~/.aws/credentials.
- Root Directory: All commands should be run from the root directory of the repository.

### Step-by-Step Guide

1. Building the Docker Image

First, navigate to the root directory of the repository. Open a terminal and execute the following command to build the Docker image:

```bash
docker build -t sap -f deployment/Dockerfile.prediction.lambda .
```

This builds a Docker image tagged sap using the Dockerfile.prediction.lambda located in the deployment directory.

2. Running the Docker Image

Once the image is built, you can run it using the following command:

```bash
docker run -p 9000:8080 -v ~/.aws/credentials:/root/.aws/credentials:ro -e RUNTIME_ENVIRONMENT=dev -e PREDICTIONS_BUCKET=retrofit-sap-predictions-dev sap
```

This command does the following:

- Maps port 9000 on your local machine to port 8080 on the Docker container.
- Mounts your AWS credentials into the Docker container in read-only mode.
- Sets the RUNTIME_ENVIRONMENT variable to dev.
- Sets the PREDICTIONS_BUCKET variable to retrofit-sap-predictions-dev.

3. Testing the Lambda Function

To test the Lambda function, use the following curl command:

```bash
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{\"file_location\": \"s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet\", \"property_id\": 1, \"portfolio_id\": 4, \"created_at\": \"now\"}"}'
```

This sends a POST request to the running Lambda function, passing the required data as a JSON string in the body field.

To invoke the Lambda with the testing flag or the warm flag set, use:

```bash
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{\"file_location\": \"s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet\", \"property_id\": 1, \"portfolio_id\": 4, \"created_at\": \"now\", \"testing\": \"true\"}"}'
```

or

```bash
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{\"file_location\": \"s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet\", \"property_id\": 1, \"portfolio_id\": 4, \"created_at\": \"now\", \"warm\": \"true\"}"}'
```
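
The curl commands above escape the inner JSON by hand, which is easy to get wrong. As an alternative, the same requests could be built from Python. This is a minimal sketch using only the standard library; `build_event` and `invoke` are hypothetical helper names, and the field names and URL are copied from the curl commands rather than from the Lambda handler itself.

```python
import json
import urllib.request

# Local endpoint exposed by the `docker run -p 9000:8080 ...` command.
INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_event(file_location, property_id, portfolio_id, created_at="now", **flags):
    """Build the invocation event.

    The event is a JSON object whose "body" value is itself a JSON-encoded
    string, matching the escaped payload in the curl commands. Extra keyword
    arguments (for example testing="true" or warm="true") are added to the body.
    """
    body = {
        "file_location": file_location,
        "property_id": property_id,
        "portfolio_id": portfolio_id,
        "created_at": created_at,
        **flags,
    }
    return {"body": json.dumps(body)}


def invoke(event, url=INVOKE_URL):
    """POST the event to the locally running container and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the container running, `invoke(build_event("s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet", 1, 4, warm="true"))` would send the same request as the last curl command. Dropping the `warm` argument, or passing `testing="true"` instead, reproduces the other two.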