Git copy file preserving history
Unlike Subversion, Git does not keep a per-file history. A commit object only points to its parent commit(s) and to the tree object for that commit. It stores no explicit record of which files the commit changed, nor of the nature of those changes.
The tools that inspect changes detect renames heuristically. E.g. "git diff" has the option -M, which turns on rename detection. So in case of a rename, "git diff" might show you that one file has been deleted and another one created, while "git diff -M" will actually detect the move and display the change accordingly (see "man git-diff" for details).
So in Git this is not a matter of how you commit your changes, but of how you look at the committed changes later.
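To illustrate that rename detection happens when you look at a commit rather than when you create it, here is a minimal sketch that renames a file in a throwaway repository and then inspects the same commit twice. The scratch directory, the identity settings, and the variable names are assumptions made for the illustration. Many current Git installations turn rename detection on by default, so the sketch passes --no-renames to get the undetected view.

```shell
#!/bin/sh
# Sketch: one commit, two views of it. Everything happens in a scratch
# repository created with mktemp; nothing here touches an existing repo.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

printf 'line 1\nline 2\nline 3\n' > foo
git add foo
git commit -q -m "Add foo"

git mv foo bar
git commit -q -m "Rename foo to bar"

# Same commit, inspected without and with rename detection.
no_renames=$(git diff --no-renames --name-status HEAD^ HEAD)
with_renames=$(git diff -M --name-status HEAD^ HEAD)

printf 'without rename detection:\n%s\n\n' "$no_renames"
printf 'with rename detection:\n%s\n' "$with_renames"
```

The first view reports one file added and one deleted, the second reports a single rename, even though the stored commit is identical in both cases.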
All you have to do is:
- move the file to two different locations, in two separate commits that share the same parent,
- merge the two commits that do the above, and
- move one copy back to the original location.
You will be able to see historical attributions (using "git blame") and the full history of changes (using "git log --follow", since plain "git log" does not continue across renames) for both files.
Suppose you want to create a copy of file foo called bar. In that case the workflow you'd use would look like this:
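git mv foo bar
git commit                    # SAVED: foo renamed to bar
SAVED=$(git rev-parse HEAD)
git reset --hard HEAD^        # go back to the starting commit
git mv foo copy
git commit                    # ALTERNATE: foo renamed to copy
git merge "$SAVED"            # This will generate conflicts
git commit -a                 # Trivially resolved like this (MERGED)
git mv copy foo
git commit                    # RESTORED: copy renamed back to foo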
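The steps above can be wrapped into a small helper and tried out safely in a scratch repository. This is only a sketch: git_copy_with_history is a hypothetical function name (not a Git command), the temporary file name and the commit messages are made up, and it assumes a clean working tree in which the temporary name is not already in use.

```shell
#!/bin/sh
# Sketch: copy SRC to DST so that both paths can be traced to SRC's history.
# git_copy_with_history is a hypothetical helper, not part of Git.
set -eu

git_copy_with_history() {
    src=$1
    dst=$2
    tmp="$src.copy-tmp"     # temporary name, assumed to be unused

    git mv "$src" "$dst"
    git commit -q -m "Rename $src to $dst"
    saved=$(git rev-parse HEAD)

    git reset -q --hard HEAD^
    git mv "$src" "$tmp"
    git commit -q -m "Rename $src to $tmp"

    # The merge stops on a rename/rename conflict, so its exit status
    # is ignored here; committing with -a keeps both renamed files.
    git merge --no-edit "$saved" >/dev/null 2>&1 || true
    git commit -q -a -m "Merge: keep both $dst and $tmp"

    git mv "$tmp" "$src"
    git commit -q -m "Restore $src"
}

# Scratch repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"
printf 'one\ntwo\nthree\n' > foo
git add foo
git commit -q -m "Add foo"

git_copy_with_history foo bar
git log --oneline --graph
```

Because the helper runs "git reset --hard", it should only be tried where there is no uncommitted work to lose.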
Why this works
After you execute the above commands, you end up with a revision history that looks like this:
( revision history )            ( files )

      ORIG_HEAD                    foo
       /     \                    /   \
  SAVED       ALTERNATE        bar     copy
       \     /                    \   /
       MERGED                   bar,copy
          |                         |
      RESTORED                  bar,foo
When you ask Git about the history of foo, it will:
- detect the rename from copy between MERGED and RESTORED,
- detect that copy came from the ALTERNATE parent of MERGED, and
- detect the rename from foo between ORIG_HEAD and ALTERNATE.
From there it will dig into the history of foo.
When you ask Git about the history of bar, it will:
- notice no change between MERGED and RESTORED,
- detect that bar came from the SAVED parent of MERGED, and
- detect the rename from foo between ORIG_HEAD and SAVED.
From there it will dig into the history of foo.
It's that simple. :)
You just need to force Git into a merge situation where you can accept two traceable copies of the file(s), and we do this with a parallel move of the original (which we soon revert).
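One way to check the tracing described above is to rebuild the example history in a scratch repository and ask Git where each file came from. The sketch below does that. The commit messages reuse the labels from the diagram purely for readability, and the directory, identity, and variable names are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch: rebuild the example history in a scratch repository, then trace
# foo and bar back to the commit that first added foo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

printf 'one\ntwo\nthree\n' > foo
git add foo
git commit -q -m "ORIG_HEAD: add foo"
orig=$(git rev-parse HEAD)

git mv foo bar
git commit -q -m "SAVED: foo -> bar"
saved=$(git rev-parse HEAD)
git reset -q --hard HEAD^
git mv foo copy
git commit -q -m "ALTERNATE: foo -> copy"
git merge --no-edit "$saved" >/dev/null 2>&1 || true   # conflict expected
git commit -q -a -m "MERGED: keep bar and copy"
git mv copy foo
git commit -q -m "RESTORED: copy -> foo"

# --follow lets git log continue across renames of a single path.
foo_log=$(git log --follow --format=%s -- foo)
bar_log=$(git log --follow --format=%s -- bar)

# git blame follows whole-file renames by itself; the first field of the
# first porcelain line is the commit blamed for line 1 of the file.
foo_blame=$(git blame --porcelain foo | head -n 1 | cut -d ' ' -f 1)
bar_blame=$(git blame --porcelain bar | head -n 1 | cut -d ' ' -f 1)

printf 'history of foo:\n%s\n\n' "$foo_log"
printf 'history of bar:\n%s\n\n' "$bar_log"
printf 'original commit:  %s\n' "$orig"
printf 'foo line 1 blame: %s\n' "$foo_blame"
printf 'bar line 1 blame: %s\n' "$bar_blame"
```

Both logs should reach the commit that first added foo, and both blame results should name that same commit. The exact set of intermediate commits listed by "git log --follow" can vary, since it depends on the order in which Git walks the two parallel renames.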