ML/modules/ml-pipeline/README.MD
2023-09-10 10:01:26 +01:00

# ML-pipeline

This is a dummy ML pipeline consisting of:

- DVC tracking for version control (data and models)
- GTO for the model registry
- docs, built with Sphinx (in pre-commit hooks)
- tests for unit, integration, and end-to-end testing

Within the `src` folder, the structure is as follows:

- multiple pipelines can be defined
  - e.g. for a product, we might require multiple pipelines to deliver a result
  - i.e. multiple models
- these models can all be tracked within the same GTO model registry
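
As a rough illustration of tracking several models in one registry, the sketch below registers a version for each model and then lists the registry. The model names `model-a` and `model-b`, the `MODELS` variable, and the `DRY_RUN` switch are assumptions made for this example and are not part of this repository. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env sh
# Sketch only: model names are hypothetical placeholders.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"
MODELS="${MODELS:-model-a model-b}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Register a new version of every model, then show the shared registry.
register_all() {
    for model in $MODELS; do
        run gto register "$model"
    done
    run gto show
}

register_all
```

Setting `DRY_RUN=0` runs the commands for real, which assumes `gto` is installed and the script is run inside the Git repository.
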

To enable the virtual environment in VS Code:

- Open settings
- Search for 'env'
- Under the extensions tab, there will be a **Venv path** setting
- Copy the path of the '.dev_env' folder into it
- When you select a kernel, click through "create environment" and refresh
- The virtual environment should now be listed

To use the Docker environment for coding in VS Code:

- Open the "pipeline" folder
- Open it with a remote container
- Select the Dockerfile
- Add the Git extension (for DVC)

When running experiments, everything is cached. The workflow is:

- Use `dvc repro` to regenerate the current experiment
- Change parameters if needed
- Use `dvc exp run`
- Cache the results using `dvc push -r REMOTE_NAME`
- Repeat as needed
- When happy with the results, use `dvc exp apply EXPERIMENT_NAME`
- Use `dvc pull`
- Commit the code
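
The experiment loop above can be sketched as a small shell script. This is a minimal outline: `REMOTE_NAME` and `EXPERIMENT_NAME` are the placeholders from the list, while the `run` helper and the `DRY_RUN` switch are assumptions added for illustration. By default the script only prints the commands in order.

```shell
#!/usr/bin/env sh
# Sketch only: REMOTE_NAME and EXPERIMENT_NAME are placeholders to replace.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
REMOTE_NAME="${REMOTE_NAME:-REMOTE_NAME}"
EXPERIMENT_NAME="${EXPERIMENT_NAME:-EXPERIMENT_NAME}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

workflow() {
    # Regenerate the current experiment.
    run dvc repro
    # Change parameters by hand at this point if needed, then run an experiment.
    run dvc exp run
    # Cache the results on the remote.
    run dvc push -r "$REMOTE_NAME"
    # Repeat the two steps above as needed; once happy, apply the experiment.
    run dvc exp apply "$EXPERIMENT_NAME"
    run dvc pull
    # Finally, commit the code with Git (left as a manual step here).
}

workflow
```

For example, `DRY_RUN=0 REMOTE_NAME=myremote EXPERIMENT_NAME=my-exp sh workflow.sh` would execute the steps for real. Here `myremote`, `my-exp`, and `workflow.sh` are hypothetical names.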