
ML-pipeline

This is a dummy ML-pipeline, consisting of:

  • DVC tracking for version control (data and models)
  • GTO for the model registry
  • documentation generated with Sphinx (via pre-commit hooks)
  • tests for unit, integration and end-to-end testing

Within the src folder, the structure is as follows:

  • multiple pipelines can be defined
    • e.g. a single product might require multiple pipelines to deliver a result
    • e.g. one pipeline per model
  • all of these models can be tracked within the same GTO model registry
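Because GTO keeps its registry in Git tags, one way to see several models living in the same registry is to group the tags by artifact name. The sketch below assumes GTO's default tag conventions (`<artifact>@v<semver>` for a registered version, `<artifact>#<stage>#<n>` for a stage assignment); the model names are hypothetical, and the regexes are a simplification rather than GTO's own parser.

```python
"""Sketch: group GTO-style git tags into a per-model registry view.

Assumed tag conventions (GTO defaults):
  - version registration: "<artifact>@v<semver>", e.g. "churn-model@v1.2.0"
  - stage assignment:     "<artifact>#<stage>#<n>", e.g. "churn-model#prod#3"
Model names used here are hypothetical.
"""
import re
from collections import defaultdict

VERSION_TAG = re.compile(r"^(?P<artifact>[^@#]+)@v(?P<version>\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<artifact>[^@#]+)#(?P<stage>[^#]+)#(?P<n>\d+)$")


def build_registry(tags):
    """Return {artifact: {"versions": [...], "stages": {stage: latest_n}}}."""
    registry = defaultdict(lambda: {"versions": [], "stages": {}})
    for tag in tags:
        if m := VERSION_TAG.match(tag):
            registry[m["artifact"]]["versions"].append(m["version"])
        elif m := STAGE_TAG.match(tag):
            stages = registry[m["artifact"]]["stages"]
            # Keep only the highest assignment counter seen for each stage.
            stages[m["stage"]] = max(int(m["n"]), stages.get(m["stage"], 0))
        # Tags matching neither pattern are ignored.
    for entry in registry.values():
        # Sort numerically so 1.10.0 comes after 1.9.0.
        entry["versions"].sort(key=lambda v: tuple(int(p) for p in v.split(".")))
    return dict(registry)


if __name__ == "__main__":
    tags = [
        "churn-model@v1.0.0", "churn-model@v1.10.0", "churn-model@v1.9.0",
        "churn-model#prod#1", "churn-model#prod#2", "price-model@v0.1.0",
    ]
    for name, entry in build_registry(tags).items():
        print(name, entry)
```

In a real checkout the tag list would come from `git tag`; GTO's own `gto show` command gives an equivalent overview without any custom code.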

To enable the virtual environment created in VS Code:

  • Open Settings
  • Search for 'env'
  • Under the Extensions tab, find the Venv Path setting
  • Copy the path of the '.dev_env' folder into it
  • When you select a kernel, click through Create Environment and refresh
  • The virtual environment should now be listed
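If the '.dev_env' folder does not exist yet, it can be created with the standard library `venv` module. The sketch below is an assumption about a plain CPython virtual environment (only the folder name comes from this README); it prints the folder path to paste into Venv Path and the interpreter to pick as the kernel.

```python
"""Sketch: create the '.dev_env' virtual environment and print the paths
VS Code asks for. Only the folder name comes from this README; the rest
assumes a standard CPython 'venv' setup.
"""
import os
import venv
from pathlib import Path


def create_dev_env(project_root=".", name=".dev_env", with_pip=True):
    """Create <project_root>/<name> if missing and return its interpreter path."""
    env_dir = Path(project_root).resolve() / name
    if not (env_dir / "pyvenv.cfg").exists():
        venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter in Scripts\, POSIX systems in bin/.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    interpreter = create_dev_env()
    print("'.dev_env' folder (for Venv Path):", interpreter.parents[1])
    print("Interpreter to select as the kernel:", interpreter)
```

Running it twice is safe: an existing environment (detected via its `pyvenv.cfg`) is reused rather than recreated.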

To use the Docker environment for coding in VS Code:

  • Open the "pipeline" folder
  • Reopen it in a remote container
  • Select the Dockerfile
  • Add the Git extension (needed for DVC)
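Instead of selecting the Dockerfile by hand each time, the choice can be saved in a `.devcontainer/devcontainer.json` file. The sketch below writes a minimal one; it assumes the Dockerfile sits at the root of the "pipeline" folder, and `Iterative.dvc` (the VS Code DVC extension) and the container name `ml-pipeline` are illustrative assumptions to adjust or verify.

```python
"""Sketch: write a minimal .devcontainer/devcontainer.json so the 'pipeline'
folder reopens in a container built from its Dockerfile.

Assumptions (not from this repo): the Dockerfile is at <pipeline_dir>/Dockerfile,
the container name is hypothetical, and 'Iterative.dvc' is the extension to preinstall.
"""
import json
from pathlib import Path


def write_devcontainer(pipeline_dir=".", extensions=("Iterative.dvc",)):
    config = {
        "name": "ml-pipeline",  # hypothetical display name
        # Paths are relative to devcontainer.json itself, so "../Dockerfile"
        # points at <pipeline_dir>/Dockerfile and ".." builds from that folder.
        "build": {"dockerfile": "../Dockerfile", "context": ".."},
        "customizations": {"vscode": {"extensions": list(extensions)}},
    }
    target = Path(pipeline_dir) / ".devcontainer" / "devcontainer.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target


if __name__ == "__main__":
    print("Wrote", write_devcontainer())
```

The `git` binary still has to be installed in the image itself, since DVC's Git integration shells out to it.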

When running experiments, results are cached; the workflow is:

  • Run dvc repro to regenerate the current experiment
  • Change parameters if needed
  • Run dvc exp run
  • Cache the results with dvc push -r REMOTE_NAME
  • Repeat as needed
  • When happy with the results, run dvc exp apply EXPERIMENT_NAME
  • Run dvc pull
  • Commit the code
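The loop above can be scripted as a thin wrapper around the `dvc` and `git` command-line tools. This is a sketch only: the remote name, parameter key and experiment name in the usage are hypothetical, parameters are assumed to live in a params file that `dvc exp run -S` (`--set-param`) can override, and with `dry_run=True` the commands are just recorded instead of executed.

```python
"""Sketch: the experiment workflow as a wrapper around the dvc/git CLIs.

Assumptions: 'myremote', 'train.lr' and 'exp-name' are hypothetical
placeholders. With dry_run=True commands are only recorded, not run.
"""
import subprocess


class Workflow:
    def __init__(self, dry_run=True):
        self.dry_run = dry_run
        self.history = []  # every command issued, in order

    def _run(self, *cmd):
        self.history.append(list(cmd))
        if not self.dry_run:
            subprocess.run(cmd, check=True)  # raise if a step fails

    def reproduce(self):
        self._run("dvc", "repro")

    def run_experiment(self, params=None):
        # Each key=value becomes a '-S' (--set-param) override for this run.
        args = []
        for key, value in (params or {}).items():
            args += ["-S", f"{key}={value}"]
        self._run("dvc", "exp", "run", *args)

    def push(self, remote):
        self._run("dvc", "push", "-r", remote)

    def apply(self, experiment):
        self._run("dvc", "exp", "apply", experiment)

    def pull(self):
        self._run("dvc", "pull")

    def commit(self, message):
        # Stage everything not excluded by .gitignore; narrow this if needed.
        self._run("git", "add", "-A")
        self._run("git", "commit", "-m", message)


if __name__ == "__main__":
    wf = Workflow(dry_run=True)
    wf.reproduce()
    wf.run_experiment({"train.lr": 0.01})
    wf.push("myremote")
    wf.apply("exp-name")
    wf.pull()
    wf.commit("Apply exp-name")
    for cmd in wf.history:
        print(" ".join(cmd))
```

Keeping a dry-run mode makes it easy to check the exact command sequence before letting it touch the cache or the remote.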