mirror of
https://github.com/Hestia-Homes/ML.git
synced 2026-06-08 11:17:25 +00:00
# ML-pipeline

This is a generic ML-pipeline, consisting of:
- dvc tracking for version control (data and models)
- gto for model registry
- docs, created via sphinx (in pre-commit hooks)
- tests for unit, integration and end-to-end testing

Within the `src` folder, the structure is as follows:
- `pipeline` folder, which contains the codebase for the generic pipeline
  - The pipeline can track multiple models through dvc and the gto model registry
- Deployment files:
  - `Prediction.Dockerfile` - code to create the prediction deployment image
  - `Training.Dockerfile` - code to create the training image (i.e. for remote training on EC2/Fargate)
- Docker development environment:
  - Use this if you wish to develop inside a Docker container.
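
The deployment images can be built with `docker build`, pointing `-f` at the relevant Dockerfile. The following is a minimal sketch, assuming the Dockerfile sits in the current directory, which is also used as the build context. The function name `build_image`, the `DOCKER` override, and the image tag `ml-prediction:dev` are hypothetical and not defined by this repository.

```shell
# Build an image from one of the deployment Dockerfiles.
#   $1 = Dockerfile name, e.g. Prediction.Dockerfile
#   $2 = image tag (hypothetical; pick your own)
# Assumes the current directory holds the Dockerfile and is the build context.
# DOCKER can be overridden, e.g. DOCKER=echo prints the arguments as a dry run.
build_image() {
  ${DOCKER:-docker} build -f "$1" -t "$2" .
}
```

For example, `build_image Prediction.Dockerfile ml-prediction:dev` would build the prediction image; the training image can be built the same way from its own Dockerfile.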

# How to develop using this pipeline:

Run `make init`, which will:
- Download pyenv (Python version management)
- Download the Python version defined in the Makefile (currently 3.10.12)
- Create a virtual environment with this version of Python
- Install the packages in the training and version control directories in the pipeline folder (dev version if applicable)
- Install pre-commit to enable pre-commit hooks

To use the environment, run `source .dev_env_pipeline/bin/activate`.
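
Put together, first-time setup might look like the sketch below. It assumes it is run from the directory that contains the Makefile and that `make init` creates `.dev_env_pipeline` there. The function name `setup_dev_env` and the `MAKE` override are illustrative and not part of the repository.

```shell
# First-time setup followed by activation of the virtual environment.
# Assumes the current directory contains the Makefile and that `make init`
# creates .dev_env_pipeline here.
setup_dev_env() {
  # MAKE can be overridden for testing; by default this runs `make init`.
  ${MAKE:-make} init || return 1
  # `.` is the portable spelling of `source`; it activates the environment
  # in the current shell, so repeat this line in every new terminal.
  . .dev_env_pipeline/bin/activate
}
```

After calling `setup_dev_env`, `python --version` should report the version pinned in the Makefile.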

To enable the virtual environment in VS Code:
- Open settings
- Search for 'env'
- Under the Extensions tab, find **Venv path**
- Copy the path of the '.dev_env' folder into it
- When you select a kernel, click through "Create environment" and refresh
- The virtual environment should then be listed

To use the Docker environment for coding in VS Code:
- Open the "pipeline" folder
- Open with remote container
- Select the Dockerfile
- Add the Git extension (for dvc)

When running experiments, everything is cached. The workflow is:
- Use `dvc repro` to regenerate the current experiment
- Change parameters if needed
- Use `dvc exp run`
- Cache the results by using `dvc push -r REMOTE_NAME`
- Repeat as needed
- When happy with the results, use `dvc exp apply EXPERIMENT_NAME`
- Use `dvc pull`
- Commit code
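
The loop above can be sketched as three small shell functions. The DVC subcommands are the ones named in the list, plus `dvc exp run --set-param` as one way to change a parameter from the command line. The function names, the `DVC` override, and the parameter key `train.lr` are hypothetical.

```shell
# DVC can be overridden, e.g. DVC=echo prints the arguments instead of running dvc.
DVC=${DVC:-dvc}

# Regenerate the current pipeline outputs.
baseline() {
  $DVC repro
}

# Run one experiment with a parameter override, then push the cached results.
#   $1 = DVC remote name
#   $2 = override such as train.lr=0.01 (hypothetical parameter key)
iterate() {
  $DVC exp run --set-param "$2" && $DVC push -r "$1"
}

# Apply the chosen experiment to the workspace, then pull.
#   $1 = experiment name
accept() {
  $DVC exp apply "$1" && $DVC pull
}
```

A session would then be `baseline`, any number of `iterate REMOTE_NAME train.lr=0.01` calls, and finally `accept EXPERIMENT_NAME` followed by a normal `git commit`.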