New repo with copied history of only currently tracked files
Run git filter-branch only once
The script in the question processes thousands of commits, thousands of times, and on every iteration it repeats various (very slow) steps that you would ordinarily do only once, at the end. That really is going to take forever.
Instead run the script once, removing all files in one go:
del=$(cat deleted.txt)
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch $del" \
--prune-empty --tag-name-filter cat -- --all
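The command above reads deleted.txt, which comes from the question. If that file ever needs to be regenerated, something along these lines might work. It is a sketch under assumptions, not the question's own method: the function name list_deleted_paths is hypothetical, and it treats a path as deleted when some commit removed it and it is not tracked now.

```shell
# Print every path that some commit deleted and that is not tracked now.
# Assumption: this runs inside the repository that is about to be rewritten.
list_deleted_paths() {
    deleted=$(mktemp) && tracked=$(mktemp) || return 1
    # Paths removed by any commit on any ref, without blank separator lines.
    git log --all --diff-filter=D --name-only --pretty=format: |
        sed '/^$/d' | LC_ALL=C sort -u > "$deleted"
    # Paths tracked in the current index.
    git ls-files | LC_ALL=C sort -u > "$tracked"
    # Lines only in the first file: deleted at some point, not tracked now.
    LC_ALL=C comm -23 "$deleted" "$tracked"
    rm -f "$deleted" "$tracked"
}

# Usage, inside the repository:
#   list_deleted_paths > deleted.txt
```

Both lists are sorted under LC_ALL=C because comm expects its inputs in the same collation order.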
Once the process has finished, clean up:
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
# optional extra gc. Slow and may not further-reduce the repo size
git gc --aggressive --prune=now
If the above fails due to the number of files
If deleted.txt lists so many files that the command above is too long to run, it can be rewritten as something like so:
git filter-branch --force --index-filter \
'cat /abs/path/to/deleted.txt | xargs git rm --cached --ignore-unmatch' \
--prune-empty --tag-name-filter cat -- --all
(cleanup steps are the same)
This is identical to the version above, except that xargs splits the file list into batches that each fit within the command-line length limit, instead of passing every file in a single command.
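xargs packs as many arguments into each invocation as the limits allow, then starts another invocation for the rest. A minimal sketch of that behaviour, assuming seq is available; the -n 1000 cap is artificial and only there to make the batching visible with a small input:

```shell
# Upper bound, in bytes, on the combined size of arguments and environment
# for a single command on this system.
getconf ARG_MAX

# Feed 5000 arguments to xargs. -n 1000 caps each batch at 1000 arguments
# purely for demonstration; without -n, xargs fills each command line as
# far as the limit allows before starting the next one.
batches=$(seq 1 5000 | xargs -n 1000 echo | wc -l | tr -d ' ')
echo "echo was invoked $batches times"   # prints: echo was invoked 5 times
```

Each echo invocation prints one line, so the line count equals the number of batches.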
As of April 2020, git produces the following warning when using git filter-branch:
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
I'm sure there's a safe way to use git filter-branch, but for those (like myself) unaware of how to avoid the gotchas mentioned above, git-filter-repo makes it pretty easy to retain the history of only currently tracked files:
$ git checkout master
$ git ls-files > /tmp/keep-these.txt
$ git filter-repo --paths-from-file /tmp/keep-these.txt
While git filter-branch took about 5 minutes to run on my repo, git filter-repo ran and repacked the repo in a little under a second!
It can be installed by following the instructions on its GitHub page. Alternatively, on a Mac you can just run brew install git-filter-repo.
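To check the result, it can help to list every path that still appears anywhere in the rewritten history and compare it with the keep-list. A minimal sketch, assuming it runs inside the rewritten repository; the function name unexpected_paths is hypothetical:

```shell
# Print paths that appear somewhere in history but are missing from the
# keep-list file given as $1. No output means nothing unexpected remains.
unexpected_paths() {
    keep_sorted=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$keep_sorted"
    # Every path touched by any commit on any ref, compared with the list.
    git log --all --name-only --pretty=format: |
        sed '/^$/d' | LC_ALL=C sort -u |
        LC_ALL=C comm -23 - "$keep_sorted"
    rm -f "$keep_sorted"
}

# Usage, inside the rewritten repository:
#   unexpected_paths /tmp/keep-these.txt
```

Git prints paths containing unusual characters in quoted form by default, so such paths may show up as false mismatches.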
Based on AD7six's answer, with the history of renamed files preserved. (You can skip the preliminary optional section.)
Optional
remove all remotes:
git remote | while read -r line; do (git remote rm "$line"); done
remove all tags:
git tag | xargs git tag -d
remove all other branches:
git branch | grep -v \* | xargs git branch -D
remove all stashes:
git stash clear
remove all submodules configuration and cache:
git config --local -l | grep submodule | sed -e 's/^\(submodule\.[^.]*\)\(.*\)/\1/g' | while read -r line; do (git config --local --remove-section "$line"); done
rm -rf .git/modules/
Pruning untracked files history, keeping tracked files history & renames
git ls-files | sed -e 's/^/"/g' -e 's/$/"/g' > keep-these.txt
git ls-files | while read -r line; do (git log --follow --raw --diff-filter=R --pretty=format:%H "$line" | while true; do if ! read hash; then break; fi; IFS=$'\t' read mode_etc oldname newname; read blankline; echo "$oldname"; done); done | sed -e 's/^/"/g' -e 's/$/"/g' >> keep-these.txt
git filter-branch --force --index-filter "git rm --ignore-unmatch --cached -qr .; cat \"$PWD/keep-these.txt\" | xargs git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all
rm keep-these.txt
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
- The first two commands list the tracked files and their old names, wrapping each path in quotes to preserve paths with spaces.
- The third command rewrites the commits so that only those files remain.
- The subsequent commands clean up the history.
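The quoting matters because xargs, by default, splits its input on spaces as well as newlines. A small sketch of the difference, using a made-up two-line list (the sample paths are hypothetical):

```shell
# Hypothetical sample list: one path contains a space.
list=$(mktemp)
printf '%s\n' 'docs/read me.txt' 'src/main.c' > "$list"

# Unquoted: xargs splits on the space, so echo runs for three "paths".
unquoted=$(xargs -n 1 echo < "$list" | wc -l | tr -d ' ')

# Wrapped in quotes with the same sed expressions: two paths, as intended.
quoted=$(sed -e 's/^/"/g' -e 's/$/"/g' "$list" | xargs -n 1 echo | wc -l | tr -d ' ')

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 3, quoted: 2
rm -f "$list"
```

This approach still breaks on paths that themselves contain a double quote.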
Optional (not recommended)
repack (from the-woes-of-git-gc-aggressive):
git repack -a -d --depth=250 --window=250
Delete everything and restore what you want
Rather than delete this-list-of-files one at a time, do the almost-opposite: delete everything and just restore the files you want to keep.
Like so:
# for Linux (GNU xargs)
$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
"git rm --ignore-unmatch --cached -qr . ; \
cat \"$PWD/keep-these.txt\" | tr '\n' '\0' | xargs -d '\0' git reset -q \$GIT_COMMIT --" \
--prune-empty --tag-name-filter cat -- --all
# for macOS
$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
"git rm --ignore-unmatch --cached -qr . ; \
cat \"$PWD/keep-these.txt\" | tr '\n' '\0' | xargs -0 git reset -q \$GIT_COMMIT --" \
--prune-empty --tag-name-filter cat -- --all
Depending on the repository, this approach may also be faster to execute.
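The tr and xargs pair in those commands converts the newline-separated list into NUL-separated arguments, so each line reaches git reset as exactly one path, whatever spaces or quotes it contains. A minimal sketch with a made-up list (the sample paths are hypothetical):

```shell
# Hypothetical sample list: one path contains spaces and double quotes.
sample=$(mktemp)
printf '%s\n' 'notes/a "draft" copy.txt' 'src/main.c' > "$sample"

# Each line becomes exactly one argument, so echo runs twice.
nul_count=$(tr '\n' '\0' < "$sample" | xargs -0 -n 1 echo | wc -l | tr -d ' ')
echo "arguments seen: $nul_count"   # prints: arguments seen: 2
rm -f "$sample"
```

git ls-files -z emits NUL-terminated paths directly, which would make the tr step unnecessary if the list were piped straight into xargs -0.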
Cleanup steps
Once the whole process has finished, clean up:
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
# optional extra gc. Slow and may not further-reduce the repo size
$ git gc --aggressive --prune=now
Comparing the repository size before and after should show a sizable reduction. The history will contain only commits that touch the kept files, plus merge commits, which are kept even when empty (because that's how --prune-empty works).
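One way to make that comparison is to record the size of the .git directory before the rewrite and again after the cleanup steps. A minimal sketch; the helper name git_dir_size_kib is hypothetical, and git count-objects -vH is a built-in alternative that reports object and pack sizes:

```shell
# Print the size of the .git directory, in KiB, for the repository at $1
# (defaults to the current directory).
git_dir_size_kib() {
    du -sk "${1:-.}/.git" | cut -f1
}

# Usage, inside the repository:
#   before=$(git_dir_size_kib)
#   ... run git filter-branch and the cleanup steps ...
#   after=$(git_dir_size_kib)
#   echo "before: ${before} KiB, after: ${after} KiB"
```

Measuring before the cleanup steps can be misleading, because the old objects stay on disk until the reflog is expired and git gc has pruned them.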
$GIT_COMMIT?
The use of $GIT_COMMIT seems to have caused some confusion. From the git filter-branch documentation (emphasis added):
The argument is always evaluated in the shell context using the eval command (with the notable exception of the commit filter, for technical reasons). Prior to that, the $GIT_COMMIT environment variable will be set to contain the id of the commit being rewritten.
That means git filter-branch provides the variable at run time; you do not set it yourself beforehand. If there's any doubt, this can be demonstrated with a no-op filter-branch command:
$ git filter-branch --index-filter "echo current commit is \$GIT_COMMIT"
Rewrite d832800a85be9ef4ee6fda2fe4b3b6715c8bb860 (1/xxxxx)current commit is d832800a85be9ef4ee6fda2fe4b3b6715c8bb860
Rewrite cd86555549ac17aeaa28abecaf450b49ce5ae663 (2/xxxxx)current commit is cd86555549ac17aeaa28abecaf450b49ce5ae663
...
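The backslash in \$GIT_COMMIT is what keeps your own shell from expanding the variable before filter-branch sees it. The following sketch imitates the mechanism with plain eval, without touching a repository; run_filter is a hypothetical stand-in for what filter-branch does internally, and abc123 is a made-up commit id:

```shell
# Stand-in for filter-branch: set GIT_COMMIT, then eval the filter string.
run_filter() {
    filter=$1
    GIT_COMMIT=$2
    eval "$filter"
}

GIT_COMMIT=""   # not set to anything useful in your own shell

escaped="echo commit is \$GIT_COMMIT"    # stays literal until eval runs
unescaped="echo commit is $GIT_COMMIT"   # expanded right now, to nothing

run_filter "$escaped" abc123     # prints: commit is abc123
run_filter "$unescaped" abc123   # prints: commit is
```

Writing the filter in single quotes has the same effect as escaping the dollar sign, since nothing inside single quotes is expanded by the calling shell.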