mirror of
https://github.com/Hestia-Homes/ML.git
synced 2026-06-08 11:17:25 +00:00
ML-pipeline
This is a generic ML-pipeline, consisting of:
- dvc tracking for version control (data and models)
- gto for model registry
- docs, created via sphinx (in pre-commit hooks)
- tests for unit, integration and end-to-end testing
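As an illustration of the gto model registry listed above, the following is a minimal sketch of registering a model version and assigning it to a stage. The helper names `run` and `register_and_promote`, the `DRY_RUN` variable, and the model name, version and stage are all hypothetical and not taken from this repository. By default the sketch only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Register a model version in the gto registry and assign it to a stage.
# All three arguments are placeholders chosen for illustration.
register_and_promote() {
  model="$1"
  version="$2"
  stage="$3"
  run gto register "$model" --version "$version"                 # create a version for the model
  run gto assign "$model" --version "$version" --stage "$stage"  # assign that version to a stage
  run gto show                                                   # list the registry state
}

register_and_promote example-model v1.0.0 prod
```

Since gto records versions and stages as git tags, the commands are expected to be run from inside the git repository.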
Within the src folder, the structure is as follows:
- pipeline folder, which contains the codebase for the generic pipeline
- The pipeline can track multiple models through dvc and the gto model registry
- Deployment files:
- Prediction.Dockerfile - code to create the prediction deployment image
- Training.Dockerfile - code to create the training image (i.e. for remote training on EC2/Fargate)
- Docker development environment:
- Use this if you wish to develop inside a Docker container.
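The two deployment images could be built along these lines. This is a sketch under stated assumptions: the Dockerfiles are assumed to sit directly in src, src is assumed to be the build context, and the image tags, the `run` and `build_images` helpers and the `DRY_RUN` variable are hypothetical. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Build the prediction and training images.
# File locations, build context and tags are assumptions, not repository facts.
build_images() {
  run docker build -f src/Prediction.Dockerfile -t ml-pipeline-prediction src
  run docker build -f src/Training.Dockerfile -t ml-pipeline-training src
}

build_images
```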
How to develop using this pipeline:
First, download miniconda so that conda can manage your Python environments
Run conda init to initialise your terminal
Change to this directory and run make init, which will:
- Create a conda virtual environment with the pinned version of Python (currently 3.10.12)
- Install packages in the training and version control directories in the pipeline folder (dev version if applicable)
- Install pre-commit to enable pre-commit hooks
To use the environment, run conda activate dev_env_pipeline
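The setup steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the repository: the names `run` and `setup_dev_env` and the `DRY_RUN` variable are hypothetical, and it assumes miniconda is already installed and that the commands are run from this directory. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# The setup steps in order. conda init only takes effect in a new terminal
# session, so in practice reopen the terminal before the later steps.
setup_dev_env() {
  run conda init                        # one-off: initialise the terminal for conda
  run make init                         # create the environment, install packages and pre-commit
  run conda activate dev_env_pipeline   # switch to the environment
}

setup_dev_env
```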
To enable the virtual environment in VS Code:
- Open settings
- Search 'env'
- Under the extensions tab, there will be Venv path
- Copy the path of the '.dev_env' folder into there.
- When you select a kernel, click through create environment and refresh
- The virtual environment should be there
To use the Docker environment for coding in VS Code:
- Open the "pipeline" folder
- Open with remote container
- Select the Dockerfile
- Add the Git extension (for dvc)
When running experiments, everything is cached. The workflow is:
- dvc repro to regenerate the current experiment
- Change parameters if needed
- Use dvc exp run
- Cache the results by using dvc push -r REMOTE_NAME
- Repeat as needed
- When happy with results, use dvc exp apply EXPERIMENT_NAME
- Use dvc pull
- Commit code
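One pass through the workflow above might look like the sketch below. The helper names `run` and `experiment_cycle` and the `DRY_RUN` variable are hypothetical, REMOTE_NAME and EXPERIMENT_NAME remain placeholders, and the final git commit command and its message are an assumption about how the code is committed. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# One iteration of the experiment workflow.
# $1 is the DVC remote name, $2 the experiment name (both placeholders).
experiment_cycle() {
  remote="$1"
  experiment="$2"
  run dvc repro                     # regenerate the current experiment
  # Change parameters here if needed before running a new experiment.
  run dvc exp run                   # run an experiment with the current parameters
  run dvc push -r "$remote"         # cache the results on the remote
  run dvc exp apply "$experiment"   # apply the chosen experiment to the workspace
  run dvc pull                      # fetch tracked data and models from the remote
  run git commit -am "Apply experiment $experiment"   # assumed way of committing the code
}

experiment_cycle REMOTE_NAME EXPERIMENT_NAME
```

In practice the dvc exp run and dvc push steps are repeated until the results are satisfactory, and only then is an experiment applied.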