mirror of https://github.com/Hestia-Homes/ML.git synced 2026-07-27 22:45:04 +00:00


ML Toolkit

This repository provides a reusable ML toolkit with two components:

ML pipeline:
- A generic pipeline that has data version control, experiment tracking and a model registry
ML monitoring:
- A bolt-on service that can implement model monitoring

There are multiple protected branches which adapt the generic pipeline to produce different models:

sap-{dev/staging/prod}-**
heat-{dev/staging/prod}-**
carbon-{dev/staging/prod}-**

These branches differ in the configuration files that define the input data and the outputs of the ML pipeline.

Each branch may also contain its own additional logic, but the underlying pipeline is the same.
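The branch naming scheme above can be checked mechanically. The following is a minimal sketch. It assumes branch names of the form {model}-{environment}, optionally followed by a further -suffix, with the models and environments taken from the branch list above. The helper name parse_branch is hypothetical and is not part of the repository's workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a protected-branch name such as "sap-dev" or
# "heat-prod-hotfix" into its model and environment parts.
# Assumes the naming scheme {model}-{dev|staging|prod}[-anything].
parse_branch() {
  local branch="$1"
  # Storing the regex in a variable avoids quoting pitfalls inside [[ ]].
  local pattern='^(sap|heat|carbon)-(dev|staging|prod)(-.*)?$'
  if [[ "$branch" =~ $pattern ]]; then
    # BASH_REMATCH[1] holds the model, BASH_REMATCH[2] the environment.
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    echo "not a protected branch: $branch" >&2
    return 1
  fi
}

parse_branch "sap-dev"         # prints: sap dev
parse_branch "carbon-prod-v2"  # prints: carbon prod
```

A branch such as main, or sap-development, does not match and makes the function return a non-zero status.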

Deployment

Scripts associated with deployment can be found in the deployment/ folder.

Deployment is automated via GitHub Actions. A deployment is triggered by a push to one of the protected branches, whose suffix (dev or prod) describes the target environment.

The GitHub Actions workflow builds a Docker image, pushes it to ECR and then deploys a Lambda function that produces predictions for the relevant model.

For this to work, several key environment variables need to be stored as GitHub secrets. Each model and protected branch has its own set of secrets, which allows flexibility between the different pipelines.

For example, for the branch sap-dev the prefix is SAP_DEV, and the required secrets are:

- {prefix}_ECR_URI: the URI of the ECR repository to push to. For example, for the SAP change model this is the lambda-sap-prediction-dev repository.
- {prefix}_DOMAIN_NAME: the custom domain name. This is likely to be the same across the different models, but is still included in the secrets for flexibility.
- {prefix}_DATA_BUCKET: the name of the S3 bucket where the data to be scored by the model is stored.
- {prefix}_MODEL_BUCKET: the name of the S3 bucket where the model is stored.
- {prefix}_PREDICTIONS_BUCKET: the name of the S3 bucket where the predictions are stored.
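The secret names can be derived from the branch name. This minimal sketch assumes the prefix is the branch name upper-cased, with hyphens replaced by underscores. That matches the sap-dev / SAP_DEV example, but it is not confirmed for branches with extra suffixes, and the actual workflow's mapping may differ. The function names secret_prefix and required_secrets are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the GitHub-secret prefix from a branch name.
# Assumption: prefix = branch name upper-cased with "-" replaced by "_",
# which matches the documented example sap-dev -> SAP_DEV.
secret_prefix() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/-/_/g'
}

# Print, one per line, the secret names expected for a given branch.
required_secrets() {
  local prefix suffix
  prefix="$(secret_prefix "$1")"
  for suffix in ECR_URI DOMAIN_NAME DATA_BUCKET MODEL_BUCKET PREDICTIONS_BUCKET; do
    echo "${prefix}_${suffix}"
  done
}

# Prints SAP_DEV_ECR_URI, SAP_DEV_DOMAIN_NAME, SAP_DEV_DATA_BUCKET,
# SAP_DEV_MODEL_BUCKET and SAP_DEV_PREDICTIONS_BUCKET, one per line.
required_secrets "sap-dev"
```

A list like this can serve as a checklist when setting up secrets for a new protected branch.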

Building and Testing the Prediction Lambda Function Locally

TODO: Generalise these instructions for the various pipelines

This guide outlines the steps to build and test the Lambda function locally using Docker. These instructions assume you are working on a machine that has Docker installed.

Prerequisites

- Docker: make sure Docker is installed and running on your machine.
- AWS credentials: ensure you have AWS credentials set up on your local machine, typically stored in ~/.aws/credentials.
- Root directory: run all commands from the root directory of the repository.

Step-by-Step Guide

1. Building the Docker Image

First, open a terminal and navigate to the root directory of the repository. Then run the following command to build the Docker image:

docker build -t sap -f deployment/Dockerfile.prediction.lambda .

This builds a Docker image tagged sap, using the Dockerfile.prediction.lambda file located in the deployment directory.

2. Running the Docker Image

Once the image is built, run it with the following command:

docker run -p 9000:8080 -v ~/.aws/credentials:/root/.aws/credentials:ro -e RUNTIME_ENVIRONMENT=dev -e PREDICTIONS_BUCKET=retrofit-sap-predictions-dev sap

This command does the following:

- Maps port 9000 on your local machine to port 8080 in the Docker container.
- Mounts your AWS credentials into the container in read-only mode.
- Sets the RUNTIME_ENVIRONMENT variable to dev.
- Sets the PREDICTIONS_BUCKET variable to retrofit-sap-predictions-dev.

3. Testing the Lambda Function

To test the Lambda function, use the following curl command:

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{\"file_location\": \"s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet\", \"property_id\": 1, \"portfolio_id\": 4, \"created_at\": \"now\"}"}'

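This sends a POST request to the Lambda function running in the container and passes in the required data as JSON. The event's body field is itself a JSON-encoded string, which is why the quotes inside it are escaped.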
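Hand-escaping the nested JSON in the test event is error-prone. This minimal sketch builds the same event from its four fields. It assumes the inputs are plain values that contain no quotes or backslashes of their own. The function name build_event is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the test event used in the curl example.
# The event's "body" field is itself a JSON document encoded as a string,
# so the inner JSON's backslashes and double quotes must be escaped.
# Assumes the input values contain no quotes or backslashes themselves.
build_event() {
  local file_location="$1" property_id="$2" portfolio_id="$3" created_at="$4"
  local inner escaped
  # Inner request body with the same fields as the curl example.
  inner=$(printf '{"file_location": "%s", "property_id": %s, "portfolio_id": %s, "created_at": "%s"}' \
    "$file_location" "$property_id" "$portfolio_id" "$created_at")
  # Escape backslashes first, then double quotes, for embedding in a string.
  escaped=$(printf '%s' "$inner" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"body": "%s"}\n' "$escaped"
}

# Example usage against the locally running container:
# curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
#   -d "$(build_event s3://retrofit-data-dev/sap_change_model/one_sample_test_dataset.parquet 1 4 now)"
```

With the arguments shown, build_event produces exactly the payload passed to -d in the curl command above.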