How to use different remotes for different folders?

You can first add the different DVC remotes you want to use (let's say you call them data and models, each one pointing to a different Google Cloud bucket), but don't set any of them as the project's default. This way, dvc push won't work without the -r (or --remote) option.
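A minimal sketch of that setup, wrapped in a function so nothing runs until you call it. The function name setup_remotes and both bucket URLs are placeholders I made up for illustration; the remote names data and models are the ones used above. Note that dvc remote add is called without -d, so neither remote becomes the default.

```shell
# Hypothetical helper: register two remotes, neither of them as default.
# The gs:// URLs are placeholders; substitute your own buckets.
setup_remotes() {
  dvc remote add data gs://my-data-bucket/dvcstore &&
  dvc remote add models gs://my-models-bucket/dvcstore &&
  # List the configured remotes to confirm both were added.
  dvc remote list
}
```

Run setup_remotes from the root of the DVC project, then commit the updated .dvc/config.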

You would then need to push each directory or file individually to the appropriate remote, like dvc push data/ -r data and dvc push model.dat -r models.
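Since every push now needs an explicit remote, it may help to keep the two commands above together in a small helper. This is only a sketch: the function name push_all is hypothetical, and the targets and remote names are the ones from the example above.

```shell
# Hypothetical helper: push each target to its own remote.
# -r is required on every call because no default remote is configured.
push_all() {
  dvc push data/ -r data &&
  dvc push model.dat -r models
}
```

Calling push_all from the project root pushes data/ to the data remote first, and pushes model.dat to the models remote only if the first push succeeded.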

Note that a feature request to configure this exists on the DVC repo too. See "Specify file types that can be pushed to remote".


Yes, you can use multiple remotes without Git submodules.

There is a separate command for using data artifacts from external repositories: dvc import http://your-repo datadir. The command brings the data into your repo and keeps the connection to the original repo (to avoid duplicating the data in different remotes).

In your case, one repository can be used for the dataset, with its own data remote. A second repo can hold the code and models; it imports the dataset project, while all of its models and outputs go to another data remote.

With import, no dvc push -r myremote is needed. A plain dvc push synchronizes the data with the proper remote.
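The workflow in the second repo (code and models) could look roughly like the sketch below. The function name import_dataset, the remote name models, and the s3:// URL are assumptions made for illustration; http://your-repo and datadir are the placeholders from the command above.

```shell
# Hypothetical helper, meant to be run inside the code/models repo.
import_dataset() {
  # Bring the dataset in from the dataset repo; the link to that repo is kept.
  dvc import http://your-repo datadir &&
  # Set a default remote for this repo's own models and outputs (placeholder URL).
  dvc remote add -d models s3://my-bucket/models &&
  # With a default remote configured, no -r option is needed.
  dvc push
}
```

The dataset repo keeps its own default remote, so each repo uses a plain dvc push for the data it owns.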

EDITED: Simply use one Git repo for the dataset, with its own data remote/S3 folder, and import it from another repo that holds the code and models and has another data remote/S3 folder.
