Mirror of https://github.com/Hestia-Homes/Model.git (synced 2026-06-08 11:17:27 +00:00)
changed readme
Parent: 98ff1b6fc5
Commit: 91605127f2
3 changed files with 20 additions and 1 deletion
@@ -3,7 +3,7 @@
 Starter Readme:
 Steps for pipeline:
 
-- (WIP) Set up the training development environment
+- Set up the training development environment
 - Change directory to this folder (simulation_system)
 - Run the following command: `make env PYTHON_VERSION=3.10.12`
 - This will install the specified Python version using `pyenv` and select it as the global Python version
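After running `make env PYTHON_VERSION=3.10.12`, one quick way to confirm that `pyenv` actually selected the expected interpreter is to compare `sys.version_info` against the target version. This is a minimal sketch; the helper name `matches_expected_version` and the hard-coded `3.10.12` default are illustrative assumptions, not part of the repo's Makefile.

```python
import sys
from typing import Optional, Sequence

# Version requested in the README's `make env PYTHON_VERSION=3.10.12` step.
EXPECTED_VERSION = "3.10.12"


def matches_expected_version(
    expected: str = EXPECTED_VERSION,
    actual: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if the running (or given) interpreter matches `expected`.

    Hypothetical helper: compares only as many components as `expected` has,
    so "3.10" accepts any 3.10.x release.
    """
    if actual is None:
        actual = sys.version_info[:3]
    wanted = tuple(int(part) for part in expected.split("."))
    return tuple(actual[: len(wanted)]) == wanted


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_expected_version():
        print(f"OK: running Python {current}")
    else:
        print(f"Expected Python {EXPECTED_VERSION}, found {current}; check `pyenv global`")
```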
@@ -56,6 +56,21 @@ Steps for pipeline:
 - I.e., the model hyperparameters are currently defined here but will move into a separate configuration file
 
 
+Data Workflow (DVC):
+- We can store artifacts (data/models) in S3, and we can add versioning to these artifacts by leveraging DVC (Not Just Data Version Control)
+- How does this work:
+- (Initial run): Use the `dvc init` command to turn a git repo into a DVC repo
+- This will add a DVC config file and a `.gitignore` file
+- Use git to commit these files
+- For any data/artifact file that is generated, use the `dvc add <FILE/directory>` command to track the file
+- This creates a `<FILE/directory>.dvc` file that holds the metadata of the corresponding artifact
+- Next, add a remote location where the files (the cached versions of the data) should be stored. This is done with `dvc remote add REMOTE-NAME s3://REMOTE-LOCATION`
+- The two remotes set up for this repo are `build-data-remote` and `etl-data-remote`, each pointing to a different folder in S3
+- Use `dvc push` to move the data files to the remote storage locations
+- If you ever delete your data, run `dvc pull` to download it back into the folders that contain the corresponding `.dvc` files
+
+- In this repo, this has already been set up, so all you need to do is run `dvc pull` to get the latest data
+
 # TODO:
 - Structure/MLOps:
 - Add configuration files (dev, staging, prod), including hyperparameters
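The README notes that `dvc add` writes a small `.dvc` metadata file instead of committing the data itself. The idea behind this is content addressing: an artifact is identified by a hash of its bytes, git versions only the small pointer, and the bytes live in a cache or remote keyed by that hash. The sketch below illustrates the idea with MD5 and a `files/md5/<first two hex chars>/<rest>` layout, which, as far as we know, mirrors DVC 3.x's cache layout. Treat the exact layout, the simplified pointer record, and the helper names (`content_hash`, `cache_path`, `make_pointer`) as assumptions. It also ignores directories, which DVC tracks through a separate manifest.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Any, Dict


def content_hash(data: bytes) -> str:
    """MD5 hex digest of the raw bytes (assumed to match DVC 3.x for plain files)."""
    return hashlib.md5(data).hexdigest()


def cache_path(md5: str) -> PurePosixPath:
    """Object location in a content-addressed store: files/md5/ab/cdef... (assumed layout)."""
    return PurePosixPath("files", "md5", md5[:2], md5[2:])


def make_pointer(path: str, data: bytes) -> Dict[str, Any]:
    """Small metadata record that git versions instead of the data itself.

    Hypothetical helper, loosely modeled on the `outs` entry of a `.dvc` file.
    """
    return {"outs": [{"md5": content_hash(data), "size": len(data), "path": path}]}


if __name__ == "__main__":
    # "train.csv" is a hypothetical artifact name used only for illustration.
    pointer = make_pointer("train.csv", b"hello")
    print(pointer)
    print(cache_path(pointer["outs"][0]["md5"]))
```

Because identical bytes always map to the same object, re-adding unchanged data stores nothing new, while any edit produces a new hash and therefore a new version that `dvc push` and `dvc pull` can move independently.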
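The one-time DVC setup described in the README can be scripted. This minimal sketch only builds the command sequence and prints it, executing it only when `dry_run=False`. The tracked path `data`, the S3 URLs, and the choice of `build-data-remote` as the default remote are placeholders or assumptions, since the README elides the real bucket locations; `plan_initial_setup` and `run` are hypothetical helper names. The commands themselves (`dvc init`, `dvc add`, `dvc remote add` with `-d` to mark a default, `dvc push`, `dvc pull`) are standard DVC CLI commands.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Dict, List, Sequence

Command = List[str]


def plan_initial_setup(
    tracked_paths: Sequence[str],
    remotes: Dict[str, str],
    default_remote: str,
) -> List[Command]:
    """Commands for the README's one-time DVC setup (hypothetical helper)."""
    cmds: List[Command] = [
        ["dvc", "init"],  # creates .dvc/ (config, .gitignore) and .dvcignore
        ["git", "add", ".dvc", ".dvcignore"],
        ["git", "commit", "-m", "Initialize DVC"],
    ]
    for raw in tracked_paths:
        path = PurePosixPath(raw)  # normalizes e.g. "data/" to "data"
        cmds.append(["dvc", "add", str(path)])  # writes <path>.dvc and git-ignores <path>
        dvc_file = path.with_name(path.name + ".dvc")
        gitignore = path.parent / ".gitignore"  # dvc add updates the parent's .gitignore
        cmds.append(["git", "add", str(dvc_file), str(gitignore)])
    for name, url in remotes.items():
        flag = ["-d"] if name == default_remote else []
        cmds.append(["dvc", "remote", "add", *flag, name, url])
    cmds.append(["git", "add", ".dvc/config"])
    cmds.append(["git", "commit", "-m", "Track data with DVC"])
    cmds.append(["dvc", "push"])  # uploads cached data to the default remote
    return cmds


def run(cmds: Sequence[Command], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder locations: the README does not give the real S3 paths.
REMOTES = {
    "build-data-remote": "s3://<bucket>/<build-prefix>",
    "etl-data-remote": "s3://<bucket>/<etl-prefix>",
}

if __name__ == "__main__":
    run(plan_initial_setup(["data"], REMOTES, default_remote="build-data-remote"))
    run([["dvc", "pull"]])  # all a fresh clone of an already set-up repo needs
```

Keeping the plan as plain data makes it easy to review (or test) before anything touches git or S3.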
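The TODO list plans to move hyperparameters out of the code into per-environment configuration files (dev, staging, prod). One common shape for this, sketched here under assumptions, is a shared base config overlaid with a small per-environment override selected by an environment variable. The `APP_ENV` variable, the keys, and the values are hypothetical placeholders, not settings from this repo; the `time_limit` and `presets` keys are only shaped after arguments of AutoGluon's `TabularPredictor.fit`, since `autogluon==0.8.2` appears in the requirements.

```python
import copy
import os
from typing import Any, Dict, Optional

# Hypothetical settings; in practice these would live in files such as
# config/base.json and config/<env>.json rather than in code.
BASE: Dict[str, Any] = {
    "model": {"time_limit": 600, "presets": "medium_quality"},
}
OVERRIDES: Dict[str, Dict[str, Any]] = {
    "dev": {"model": {"time_limit": 60}},
    "staging": {},
    "prod": {"model": {"time_limit": 3600, "presets": "best_quality"}},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on a copy of `base` (base is not mutated)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(env: Optional[str] = None) -> Dict[str, Any]:
    """Return the config for `env` (defaults to $APP_ENV, then "dev")."""
    env = env or os.environ.get("APP_ENV", "dev")
    if env not in OVERRIDES:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(OVERRIDES)}")
    return deep_merge(BASE, OVERRIDES[env])


if __name__ == "__main__":
    print(load_config("prod")["model"])
```

Keeping overrides small means each environment file only states what differs from the base, which keeps dev, staging, and prod from drifting apart silently.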
@@ -3,3 +3,5 @@ pandas==1.5.3
 seaborn==0.12.2
 s3fs==2023.6.0
 pre-commit==3.3.3
+dvc
+dvc[s3]
@@ -2,3 +2,5 @@ autogluon==0.8.2
 pandas==1.5.3
 seaborn==0.12.2
 s3fs==2023.6.0
+dvc
+dvc[s3]
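Both requirements files pin most packages exactly but add `dvc` and `dvc[s3]` unpinned. Installing the `dvc[s3]` extra already installs `dvc` itself plus its S3 dependencies, so the bare `dvc` line is redundant, though harmless. The sketch below parses lines like these and reports pinned packages whose installed version differs, using the standard library's `importlib.metadata`. It assumes a simplified requirement grammar (name, optional extras, optional `==` pin; no environment markers or range specifiers), and the names `parse_requirements` and `check_pins` are hypothetical.

```python
import re
from importlib import metadata
from typing import Dict, List, NamedTuple, Optional, Tuple

# name, optional [extras], optional ==version (simplified; no markers or ranges)
REQ_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*(?:\[([^\]]*)\])?\s*(?:==\s*(\S+))?\s*$")


class Requirement(NamedTuple):
    name: str
    extras: Tuple[str, ...]
    pin: Optional[str]


def parse_requirements(text: str) -> List[Requirement]:
    """Parse simple requirement lines, skipping blanks and comments."""
    reqs: List[Requirement] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = REQ_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {line!r}")
        name, extras, pin = match.groups()
        extras_t = tuple(e.strip() for e in extras.split(",")) if extras else ()
        reqs.append(Requirement(name, extras_t, pin))
    return reqs


def check_pins(reqs: List[Requirement]) -> Dict[str, str]:
    """Map package name -> problem for pinned packages that are missing or differ."""
    problems: Dict[str, str] = {}
    for req in reqs:
        if req.pin is None:
            continue  # unpinned lines such as `dvc` are not checked
        try:
            installed = metadata.version(req.name)
        except metadata.PackageNotFoundError:
            problems[req.name] = "not installed"
            continue
        if installed != req.pin:
            problems[req.name] = f"installed {installed}, pinned {req.pin}"
    return problems


REQUIREMENTS = """\
pandas==1.5.3
seaborn==0.12.2
s3fs==2023.6.0
pre-commit==3.3.3
dvc
dvc[s3]
"""

if __name__ == "__main__":
    for name, problem in check_pins(parse_requirements(REQUIREMENTS)).items():
        print(f"{name}: {problem}")
```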