
ML Toolkit

A reusable ML toolkit made up of:

  • ML pipeline:

    • A generic pipeline that has data version control, experiment tracking and a model registry
  • ML monitoring:

    • A bolt-on service that can implement model monitoring

There are multiple protected branches which adapt the generic pipeline to produce different models:

  • sap-{dev/staging/prod}-**
  • heat-{dev/staging/prod}-**
  • carbon-{dev/staging/prod}-**

These branches differ in the configuration files that define the data used and the outputs of the ML pipeline.

  • Each branch may carry its own additional logic, but the pipeline itself stays the same.

Deployment

Scripts associated with deployment can be found in the deployment/ folder.

Deployment is automated via GitHub Actions. A deployment is triggered by a push to one of the protected branches with dev or prod as the suffix, which identifies the target environment.
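The trigger rule above can be sketched as a small helper. This is only an illustration, not the logic in the workflow files: the function name `deployment_target` is hypothetical, and it assumes the environment is the second hyphen-separated segment of the branch name, as in sap-dev.

```python
from typing import Optional

# Assumption: model names and environments are taken from the branch
# patterns listed in this README; the real workflow may filter differently.
MODELS = {"sap", "heat", "carbon"}
DEPLOY_ENVIRONMENTS = {"dev", "prod"}  # staging is not named as a deployment target


def deployment_target(branch: str) -> Optional[str]:
    """Return the target environment for a branch, or None if a push should not deploy."""
    parts = branch.split("-")
    if len(parts) < 2:
        return None
    model, environment = parts[0], parts[1]
    if model in MODELS and environment in DEPLOY_ENVIRONMENTS:
        return environment
    return None
```

Under these assumptions, `deployment_target("sap-dev")` gives `"dev"`, while a staging branch or an unrelated branch such as `main` gives `None`.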

The GitHub Actions workflow builds a Docker image, pushes it to ECR, and then deploys a Lambda function that produces predictions for the relevant model.

For this to work, some key environment variables need to be added to GitHub secrets. Each model and protected branch has its own set of secrets, which allows for flexibility between different pipelines.

For example, for the branch sap-dev the prefix is SAP_DEV, and the following secrets are required:

  • {prefix}_ECR_URI: the URI of the ECR repository to push to. For example, for the SAP change model this is the lambda-sap-prediction-dev repository.
  • {prefix}_DOMAIN_NAME: the custom domain name. This is likely to be the same across the different models, but is still included in the secrets for flexibility.
  • {prefix}_DATA_BUCKET: the name of the S3 bucket where the data to be scored by the model is stored.
  • {prefix}_MODEL_BUCKET: the name of the S3 bucket where the model is stored.
  • {prefix}_PREDICTIONS_BUCKET: the name of the S3 bucket where the predictions are stored.
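As a quick way to check which secrets a branch needs, the naming rule above can be sketched in Python. The helpers `secret_prefix` and `secret_names` are hypothetical and not part of the repository. They assume the prefix is built from the first two hyphen-separated segments of the branch name, which matches the sap-dev example but is not confirmed for longer branch names.

```python
from typing import List

# The five secret suffixes listed above.
SECRET_SUFFIXES = (
    "ECR_URI",
    "DOMAIN_NAME",
    "DATA_BUCKET",
    "MODEL_BUCKET",
    "PREDICTIONS_BUCKET",
)


def secret_prefix(branch: str) -> str:
    """Derive the secrets prefix from a branch name, e.g. sap-dev -> SAP_DEV."""
    parts = branch.split("-")
    if len(parts) < 2:
        raise ValueError(f"expected a branch like 'sap-dev', got {branch!r}")
    # Assumption: only the model and environment segments contribute to the prefix.
    return f"{parts[0]}_{parts[1]}".upper()


def secret_names(branch: str) -> List[str]:
    """List the GitHub secret names expected for a branch."""
    prefix = secret_prefix(branch)
    return [f"{prefix}_{suffix}" for suffix in SECRET_SUFFIXES]
```

For sap-dev this yields SAP_DEV_ECR_URI, SAP_DEV_DOMAIN_NAME, SAP_DEV_DATA_BUCKET, SAP_DEV_MODEL_BUCKET and SAP_DEV_PREDICTIONS_BUCKET.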

Building and Testing the Prediction Lambda Function Locally

TODO: Generalise these instructions for the various different pipelines

This guide outlines the steps to build and test the Lambda function locally using Docker. These instructions assume you're working with a machine that has Docker installed.

Prerequisites

  • Docker: Make sure Docker is installed and running on your machine.
  • AWS Credentials: Ensure you have AWS credentials set up on your local machine, typically stored in ~/.aws/credentials.
  • Root Directory: All commands should be run from the root directory of the repository.

Step-by-Step Guide

  1. Building the Docker Image

First, navigate to the root directory of the repository. Open a terminal and execute the following command to build the Docker image:

docker build -t sap -f deployment/Dockerfile.prediction.lambda .

This builds a Docker image tagged as sap using the Dockerfile.prediction.lambda located in the deployment directory.

  2. Running the Docker Image

Once the image is built, you can run it using the following command:
docker run -p 9000:8080 -v ~/.aws/credentials:/root/.aws/credentials:ro -e RUNTIME_ENVIRONMENT=dev -e PREDICTIONS_BUCKET=retrofit-sap-predictions-dev sap

This command does the following:

  • Maps port 9000 on your local machine to port 8080 on the Docker container.
  • Mounts your AWS credentials into the Docker container in read-only mode.
  • Sets the RUNTIME_ENVIRONMENT variable to dev.
  • Sets the PREDICTIONS_BUCKET variable to retrofit-sap-predictions-dev.

  3. Testing the Lambda Function

To test the Lambda function, use the following curl command:

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{\"file_location\": \"s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet\", \"property_id\": 1, \"portfolio_id\": 4, \"created_at\": \"now\"}"}'

This will send a POST request to the running Lambda function and pass in the required data as JSON.
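The escaped quotes in the curl command are easy to get wrong, because the event's body field is itself a JSON-encoded string. As an alternative, the same request can be sketched in Python using only the standard library. The helper names `build_event` and `invoke` are illustrative and not part of the repository, and the sketch assumes the container from the previous step is listening on port 9000.

```python
import json
import urllib.request

# Local endpoint exposed by the docker run command above (host port 9000).
INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_event(file_location, property_id, portfolio_id, created_at="now"):
    """Build the event: a JSON object whose "body" field is a JSON-encoded string."""
    body = {
        "file_location": file_location,
        "property_id": property_id,
        "portfolio_id": portfolio_id,
        "created_at": created_at,
    }
    return {"body": json.dumps(body)}


def invoke(event, url=INVOKE_URL, timeout=60):
    """POST the event to the locally running function and return the response text."""
    request = urllib.request.Request(
        url, data=json.dumps(event).encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires the container to be running):
# event = build_event(
#     "s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet", 1, 4
# )
# print(invoke(event))
```

Building the body with json.dumps avoids hand-escaping the nested quotes, which makes it simpler to vary the file location or IDs between test runs.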