Whitelisting and subdirectories in Git
I searched for a long time:
Assume I have a large folder structure with ~100,000 directories recursively nested. In those folders, there are about 30,000 files of type .txt (in my case: type *.md). Next to these *.md files, there are, let's say, 500 GB of (a million+) files that I don't want to track. I want git to track only .txt (or *.md) files in all folders and subdirs.
The correct answer should be: this is not possible in Git.
What I did instead:
[edit: this did not work either - I tried to create a folder with symlinks (or hardlinks) and use git there, but git doesn't follow symlinks and overwrites hardlinks. Doh!]
A simpler way of achieving this is:
# Ignore all files...
*.*
# ...except the ones we want
!*.txt
This works because gitignore applies patterns that contain no / at the beginning or in the middle to every level below the .gitignore file:
If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.
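A quick way to check this behaviour is to build a throwaway repository and ask git what it would track. The sketch below assumes git is installed; the directory and file names (a/b, notes.txt, blob.bin, LICENSE) are made up for the test.

```shell
#!/bin/sh
# Sketch: verify that "*.*" / "!*.txt" in a root .gitignore applies at every level.
# All file and directory names here are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# The two patterns from the answer above.
printf '%s\n' '*.*' '!*.txt' > .gitignore

mkdir -p a/b
touch notes.txt LICENSE a/b/deep.txt a/b/blob.bin

# Stage everything that is not ignored, then list what ended up in the index.
git add -A
tracked=$(git ls-files)
echo "$tracked"
```

Both .txt files are staged, at the top level and two directories down, while blob.bin is skipped. Note that LICENSE is staged as well: *.* only matches names that contain a dot, so files without an extension are not ignored by this variant.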
If you wanted to do this to files inside a directory, things get more complex:
# Ignore all files in all directories inside subdir...
/subdir/**/*.*
# ...except the ones we want
!/subdir/**/*.txt
This works because gitignore has special rules for **:
Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning:
- A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
The key piece is to make sure you don't ignore directories, because then every file within that directory is ignored regardless of other rules.
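The subdir patterns and the warning about ignored directories can both be tried out in a scratch repository. This is only a sketch, assuming git is installed; every name below (top.bin, x/y, v1.0 and so on) is invented for the test.

```shell
#!/bin/sh
# Sketch: "/subdir/**/*.*" with "!/subdir/**/*.txt", plus a directory whose
# name contains a dot. All paths here are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

printf '%s\n' '/subdir/**/*.*' '!/subdir/**/*.txt' > .gitignore

mkdir -p subdir/x/y subdir/v1.0
touch top.bin                               # outside subdir: not affected
touch subdir/a.txt subdir/a.bin             # "**/" also matches zero directories
touch subdir/x/y/b.txt subdir/x/y/b.bin     # nested two levels down
touch subdir/v1.0/c.txt                     # parent directory name matches *.*

git add -A
tracked=$(git ls-files)
echo "$tracked"
```

Files outside subdir are left alone, and .txt files inside it are kept at any depth. subdir/v1.0/c.txt is missing from the result: the directory name v1.0 itself matches *.*, so git never descends into it and the !...*.txt rule gets no chance to apply.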
If you try it that way, it'll fail, because you'll end up blacklisting the directories in your structure.
To solve this, you want to blacklist everything that is neither a directory nor one of the file types you want to commit, while not blacklisting directories.
The .gitignore file that will do this:
# First, ignore everything
*
# Now, whitelist anything that's a directory
!*/
# And all the file types you're interested in.
!*.one
!*.two
!*.etc
Tested this in a three-level structure whitelisting for .txt files in the presence of *.one, *.two and *.three files using a .gitignore located in the root directory of the repository - works for me. You won't have to add .gitignore files to all directories in your structure.
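To repeat a test like this, something along the following lines should do. It is a sketch, assuming git is installed; the helper tracked_with and all file names are made up. It builds the same small tree twice, once without the !*/ line and once with it, and prints what git would track in each case.

```shell
#!/bin/sh
# Sketch: compare a whitelist with and without the "!*/" directory rule.
# tracked_with and every path below are hypothetical helpers for this test.
set -e

# Create a scratch repo whose .gitignore holds the given lines,
# then print the files git stages.
tracked_with() {
    repo=$(mktemp -d)
    (
        cd "$repo"
        git init -q .
        printf '%s\n' "$@" > .gitignore
        mkdir -p a/b/c
        touch readme.txt junk.bin a/noext a/b/c/notes.txt a/b/c/data.one
        git add -A
        git ls-files
    )
}

without_dirs=$(tracked_with '*' '!*.txt')
with_dirs=$(tracked_with '*' '!*/' '!*.txt')

echo 'without !*/:'; echo "$without_dirs"
echo 'with !*/:';    echo "$with_dirs"
```

Without !*/ only the top-level readme.txt is staged, because the directory a is itself ignored. With it, a/b/c/notes.txt is picked up too. The .gitignore file is matched by * as well, so add a !.gitignore line if you want it committed.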
Information I used to figure out the answer came from, amongst other things, this (stackoverflow.com).