Managing large binary files with Git

If the program won't work without the files, it seems like splitting them into a separate repo is a bad idea. We have large test suites that we break into a separate repo, but those are truly "auxiliary" files.

However, you may be able to manage the files in a separate repo and then use git-submodule to pull them into your project in a sane way. So, you'd still have the full history of all your source but, as I understand it, you'd only have the one relevant revision of your images submodule. The git-submodule facility should help you keep the correct version of the code in line with the correct version of the images.

Here's a good introduction to submodules from the Git Book.
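
As a minimal sketch of that workflow (the repository URLs and the images path below are hypothetical placeholders):

$ git submodule add https://example.com/images.git images   ## hypothetical repo and path
$ git commit -m 'track images as a submodule'
...
## on a fresh clone, fetch the submodule content too:
$ git clone --recurse-submodules https://example.com/project.git
$ git submodule update --init   ## or, if you already cloned without --recurse-submodules
...
## to move the project to a newer revision of the images:
$ cd images && git checkout <revision> && cd ..
$ git add images
$ git commit -m 'bump images submodule'

The key point is that the main repo records only a commit ID for the submodule, so each revision of your code is pinned to exactly one revision of the images.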


Another solution, since April 2015, is Git Large File Storage (LFS), from GitHub.

It uses the git-lfs client (see git-lfs.github.com) and can be tested with a server that supports it, lfs-test-server: you store only the metadata in the Git repo, and the large files elsewhere.

(Animated demo: https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif)
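
For a rough idea of the workflow (the file pattern, file name, and remote/branch names below are just placeholders):

$ git lfs install                 ## one-time setup of the LFS hooks per machine
$ git lfs track '*.png'           ## choose which file patterns LFS manages
$ git add .gitattributes          ## the tracking rules are stored in this file
$ git add image.png
$ git commit -m 'add image via LFS'
$ git push origin master          ## Git stores a small pointer file; the content goes to the LFS server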


I discovered git-annex recently, and I find it awesome. It was designed for managing large files efficiently; I use it for my photo/music (etc.) collections, and its development is very active. The content of the files can be removed from the Git repository; only the tree hierarchy is tracked by Git (through symlinks). However, to get the content of a file, a second step is necessary after pulling/pushing, e.g.:

$ git annex add mybigfile
$ git commit -m 'add mybigfile'
$ git push myremote
$ git annex copy --to myremote mybigfile ## This command copies the actual content to myremote
$ git annex drop mybigfile ## Remove content from local repo
...
$ git annex get mybigfile ## Retrieve the content
## or to specify the remote from which to get:
$ git annex copy --from myremote mybigfile

There are many commands available, and there is great documentation on the website. A package is available in Debian.