git tree contains duplicate file entries

I finally fixed the repo by doing the following:

  1. do a fresh clone from GitHub, which only included commits before the problem occurred
  2. add my messed up repo from the filesystem as a remote on the new clone
  3. painstakingly check out commits from the bad repo into the working copy of the new clone

    git checkout fe3254FIRSTCOMMITAFTERORIGIN/MASTER/HEAD .   # note the dot at the end
    # Without the dot, you move HEAD to that commit instead of copying the
    # commit's files into the working copy, and that seems to bring the
    # corrupt object into your good clone.
    
  4. commit each in turn, manually copying the commit message from the other repo
  5. remove the corrupt repo from remotes
  6. garbage collect + prune

    git gc --aggressive --prune=now
    
  7. weep happily as git fsck shows no duplicate file entries
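
To see what the trailing dot in step 3 changes, here is a minimal sketch in a throwaway repository. The repository, the file name and the branch name `rebuilt` are made up for illustration, and nothing in it is corrupt; it only shows that HEAD stays where it is while the files of the other commit are copied in and staged.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name demo && git config user.email demo@example.com

# Two commits that change the same file.
echo one > file.txt && git add file.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo two > file.txt && git commit -q -am "second"
second=$(git rev-parse HEAD)

# Start a new branch (hypothetical name) at the first commit.
git checkout -q -b rebuilt "$first"

# With the dot: the files of $second land in the working copy and index,
# but HEAD does not move.
git checkout "$second" -- .

git rev-parse HEAD      # still prints the hash stored in $first
git status --short      # file.txt is listed as a staged change
```

One caveat, stated as an assumption to check against your own history: this form of checkout adds and overwrites files but does not delete files that the other commit removed.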

Check out a new branch just before the problematic commit. Now check out the files from the problematic commit, then add and commit them using the same message (use the -C option of git commit). Repeat for the rest of the commits. After you're done, reset the other branch to point to this correct one. You can then push.
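
A rough sketch of that replay loop, run in a throwaway repository so it is safe to try. It assumes a linear history (no merges); the branch name `rebuilt` and the three demo commits are invented for the example, and the second commit merely plays the role of the problematic one.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name demo && git config user.email demo@example.com

# Three commits; pretend "commit 2" is the first problematic one.
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
orig=$(git symbolic-ref --short HEAD)   # name of the original branch
old_tip=$(git rev-parse HEAD)
good=$(git rev-parse HEAD~2)            # last commit before the problem

# New branch just before the problematic commit.
git checkout -q -b rebuilt "$good"

# Replay each later commit, oldest first: take its files, then reuse its
# message and author information with -C.
for c in $(git rev-list --reverse "$good..$old_tip"); do
    git checkout "$c" -- .
    git commit -q -C "$c"
done
new_tip=$(git rev-parse HEAD)

# Reset the other branch to point to the rebuilt one.
git branch -f "$orig" rebuilt
```

In a real repository you would skip the setup, set `good` and `old_tip` by hand, and check the result with git fsck before pushing.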


I used git-replace and git-mktree to fix this in the past. You essentially keep the broken tree object, but override all links and make them point to a new object.

  1. First we grab the bad tree: git ls-tree bad_tree_hash > tmpfile.txt. This writes out your bad tree. For example:

     040000·tree·3cdcc756ee0ed636c44828927126911d0ab28a18 →  xNotAlphabetic
     040000·tree·4ad0d8ef014b8cc09c95694399254eff43217bfb →  EXT
     040000·tree·d65085e4a05ea9ac8b79e37b87202dd64d402c2e →  duplicateFolder
     040000·tree·d65085e4a05ea9ac8b79e37b87202dd64d402c2e →  duplicateFolder
     040000·tree·fd0661d698ace91135a8473b26707892b7c89c32 →  ToolTester
     040000·tree·d65085e4a05ea9ac8b79e37b87202dd64d402c2e →  duplicateFolder
    

NB: · and → stand for whitespace, [space] and [tab] respectively.

  2. Next, edit the text, removing the offending lines, and save with Unix-style endings (i.e. only LF, not CRLF). With this example, we make the following (the entries are also sorted here, although git mktree normalizes the order itself, so pre-sorting is not required):

     040000·tree·4ad0d8ef014b8cc09c95694399254eff43217bfb →  EXT
     040000·tree·d65085e4a05ea9ac8b79e37b87202dd64d402c2e →  duplicateFolder
     040000·tree·fd0661d698ace91135a8473b26707892b7c89c32 →  ToolTester
     040000·tree·3cdcc756ee0ed636c44828927126911d0ab28a18 →  xNotAlphabetic
    
  3. Run cat tmpfile.txt | git mktree, which makes a new, fixed tree object, saves it, and returns the new hash: a55115e4a05ea9ac8b79e37b872024d64d4r2c2e (called new_tree_hash here for demo purposes).

  4. Next, git replace creates a new reference that makes everything which pointed at the bad tree use the new, fixed object instead: git replace bad_tree_hash new_tree_hash

This will solve your immediate problem. If you're interested, look at the overriding link in the .git/refs/replace folder.
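
If you would rather not edit the listing by hand, the steps can be sketched as a small shell helper. `rebuild_tree` is a hypothetical name, and keeping the first entry for each name is an assumption that fits the example (where the duplicates are identical); if your duplicates point at different hashes, pick the right one manually instead.

```shell
# Hypothetical helper: print the hash of a copy of tree $1 in which each
# entry name appears only once (the first occurrence wins).
rebuild_tree() {
    # ls-tree separates the name from the rest with a tab, so field 2 is
    # the entry name; awk keeps a line only the first time its name is seen.
    git ls-tree "$1" |
        awk -F '\t' '!seen[$2]++' |
        git mktree
}

# Usage (bad_tree_hash is a placeholder for the real hash):
#   new_tree_hash=$(rebuild_tree bad_tree_hash)
#   git replace bad_tree_hash "$new_tree_hash"
#   git replace -l            # lists the replaced object
#   ls .git/refs/replace      # the overriding ref, named after bad_tree_hash
```

On a tree that has no duplicates the helper should simply return the same hash, which makes it easy to try on a healthy tree first.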


The bad tree object will continue to generate warnings whenever you do a check on your repository with git fsck, but it can be ignored, and all your commits and other links will be consistent and working regardless.


8-year retrospective: There's probably a way to just delete the old, corrupt tree, since git replace should make it moot.

Further warning: This hack could also be rejected by a git service, e.g. BitBucket or GitHub, since they could view it as corruption.


I had a problem of this ilk, and all the solutions here and in other SO threads failed to fix it for me. In the end I used BFG Repo-Cleaner to destroy all the commits that referenced the bad folder name, which was probably overkill but successfully repaired the repo.
