mirror of
https://github.com/Hestia-Homes/ML.git
synced 2026-06-08 11:17:25 +00:00
# ML-pipeline

This is a dummy ML pipeline consisting of:
- dvc tracking for version control of data and models
- gto for the model registry
- docs, generated with Sphinx (via pre-commit hooks)
- unit, integration, and end-to-end tests

Within the `src` folder, the structure is as follows:
- multiple pipelines can be defined
- e.g. a product might require multiple pipelines to deliver a result
- i.e. multiple models
- these models can all be tracked within the same gto model registry

To enable the virtual environment in VS Code:
- Open settings
- Search 'env'
- Under the Extensions tab, find **Venv Path**
- Paste the path of the `.dev_env` folder there
- When you select a kernel, click through Create Environment and refresh
- The virtual environment should now be listed

To use the Docker environment for coding in VS Code:
- Open the "pipeline" folder
- Open with remote container
- Select the Dockerfile
- Add the Git extension (for dvc)

When running experiments, everything is cached; the workflow is:
- `dvc repro` to regenerate the current experiment
- Change parameters if needed
- Use `dvc exp run`
- Cache the results on the remote using `dvc push -r REMOTE_NAME`
- Repeat as needed
- When happy with the results, use `dvc exp apply EXPERIMENT_NAME`
- Use `dvc pull`
- Commit code
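The workflow above can be sketched as a small Python helper that assembles the same `dvc` commands in order. This is a minimal sketch and not part of the repository: the function names, the `params` mapping, and the `train.lr` parameter are assumptions for illustration, and `REMOTE_NAME` and `EXPERIMENT_NAME` remain placeholders. Commands are only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def iteration_commands(remote, params=None):
    """Commands for one experiment iteration: run, then push the cache."""
    exp_run = ["dvc", "exp", "run"]
    # Hypothetical parameter overrides, passed with dvc's --set-param flag.
    for key, value in (params or {}).items():
        exp_run += ["--set-param", f"{key}={value}"]
    return [exp_run, ["dvc", "push", "-r", remote]]


def finalize_commands(experiment):
    """Commands to run once the results look good."""
    return [["dvc", "exp", "apply", experiment], ["dvc", "pull"]]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    steps = [["dvc", "repro"]]
    # "train.lr" is a made-up parameter name, used only as an example.
    steps += iteration_commands("REMOTE_NAME", {"train.lr": 0.01})
    steps += finalize_commands("EXPERIMENT_NAME")
    run_all(steps)  # dry run: prints the commands without calling dvc
```

Repeating an iteration amounts to calling `iteration_commands` again with different parameters. The final commit is left to Git, as in the list above.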