# ML-pipeline

This is a generic ML pipeline, consisting of:

- DVC tracking for version control (data and models)
- GTO for the model registry
- Docs, generated with Sphinx (in pre-commit hooks)
- Tests for unit, integration and end-to-end testing

Within the `src` folder, the structure is as follows:

- `pipeline` folder, which contains all the code for the generic pipeline
  - The pipeline can track multiple models through DVC and the GTO model registry
- Deployment files:
  - `Prediction.Dockerfile`: builds the prediction deployment image
  - `Training.Dockerfile`: builds the training image (e.g. for remote training on EC2/Fargate)
- Docker development environment:
  - Use this if you wish to develop inside a Docker container.

# How to develop using this pipeline

First, install Miniconda so that you can use conda to manage Python environments.

Run `conda init` to initialise your terminal.

Change to this directory and run `make init`, which will:

- Create a conda virtual environment with the project's Python version (currently 3.10.12)
- Install the packages from the training and version-control directories in the `pipeline` folder (dev versions, if applicable)
- Install pre-commit to enable the pre-commit hooks

To use the environment, run `conda activate dev_env_pipeline`.

To enable the created virtual environment in VS Code:

- Open Settings
- Search for 'env'
- Under the Extensions tab, there will be a **Venv path** setting
- Copy the path of the `.dev_env` folder into it.
- When you select a kernel, click through 'Create environment' and refresh
- The virtual environment should now be listed

To use the Docker environment for coding in VS Code:

- Open the `pipeline` folder
- Open it in a remote container
- Select the Dockerfile
- Add the Git extension (for DVC)

When running experiments, all results are cached. The workflow is:

- Run `dvc repro` to reproduce the current experiment
- Change parameters if needed
- Run `dvc exp run`
- Cache the results remotely with `dvc push -r REMOTE_NAME`
- Repeat as needed
- When you are happy with the results, run `dvc exp apply EXPERIMENT_NAME`
- Run `dvc pull`
- Commit the code
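The experiment loop can also be scripted. The sketch below is a hypothetical helper, not part of the pipeline: it assumes `dvc` is on your `PATH` and that you run it from the repository root. `REMOTE_NAME` and `EXPERIMENT_NAME` stay placeholders, and the parameter key in the usage example is made up. Instead of editing the params file by hand, it passes parameter changes through `dvc exp run --set-param`.

```python
"""Hypothetical helper that scripts the DVC experiment loop (not part of the pipeline).

Assumes `dvc` is on PATH and the current directory is the repository root.
"""
from __future__ import annotations

import subprocess


def experiment_commands(
    remote: str,
    params: dict[str, str] | None = None,
    experiment: str | None = None,
) -> list[list[str]]:
    """Build the command sequence for one pass of the experiment workflow."""
    # Reproduce the current pipeline state first.
    commands: list[list[str]] = [["dvc", "repro"]]

    # Run a new experiment, overriding parameters with --set-param (-S).
    run = ["dvc", "exp", "run"]
    for key, value in (params or {}).items():
        run += ["--set-param", f"{key}={value}"]
    commands.append(run)

    # Push the cached outputs to the named DVC remote.
    commands.append(["dvc", "push", "-r", remote])

    if experiment is not None:
        # Only once you are happy with an experiment: apply it, then sync data.
        commands.append(["dvc", "exp", "apply", experiment])
        commands.append(["dvc", "pull"])
    return commands


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing command.
            subprocess.run(cmd, check=True)
```

For example, `run_commands(experiment_commands("myremote", {"train.lr": "0.01"}))` only prints the commands, because `dry_run` defaults to `True`. Pass `dry_run=False` to execute them. Committing the code is left as a manual step because it needs a meaningful commit message.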
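To confirm that the environment created by `make init` is actually the one in use, whether in a terminal after `conda activate dev_env_pipeline` or as the VS Code interpreter, a small check like the one below can help. This is a sketch, not part of the pipeline, and the function name is hypothetical. It relies on two facts: the README's stated Python version (3.10.12), and that every conda environment has a `conda-meta` directory under its prefix. It cannot tell one conda environment from another. For that, you would also need to compare the environment name or path.

```python
"""Hypothetical sanity check for the development environment (not part of the pipeline)."""
from __future__ import annotations

import os
import sys

# Version stated in this README; update it if `make init` changes.
EXPECTED_PYTHON = (3, 10, 12)


def environment_problems(
    expected: tuple[int, int, int] = EXPECTED_PYTHON,
    prefix: str | None = None,
    version: tuple[int, ...] | None = None,
) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    # Default to the running interpreter; the arguments exist for testing.
    prefix = sys.prefix if prefix is None else prefix
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)

    problems: list[str] = []
    if version != tuple(expected):
        found = ".".join(map(str, version))
        wanted = ".".join(map(str, expected))
        problems.append(f"Python {found} is active, expected {wanted}")

    # Every conda environment (including base) has a conda-meta directory.
    if not os.path.isdir(os.path.join(prefix, "conda-meta")):
        problems.append(f"{prefix} does not look like a conda environment")
    return problems
```

Calling `environment_problems()` with no arguments checks the running interpreter. Print the returned list to see any warnings.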