How does git compute file hashes?

I am only expanding on the answer by @Leif Gruenwoldt and detailing what is in the reference he provided.

Do It Yourself..

Step 1. Create an empty text document (the name does not matter) in your repository.

Step 2. Stage and commit the document.

Step 3. Identify the hash of the blob by executing git ls-tree HEAD.

Step 4. Note that the blob's hash is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.

Step 5. Snap out of your surprise and read below

How does Git compute its blob (file) hashes

    Blob Hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)

The text blob⎵ (the word "blob" followed by a single space) is a constant prefix, and \0 is also constant: it is the NUL character. <size_of_file> is the size of the contents in bytes, written as a decimal number. Only <size_of_file> and <contents_of_file> vary from file to file.
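If you want to check the formula without Git at hand, here is a minimal Python sketch of it. The function name git_blob_hash is my own and is not part of Git; it only assumes that the size is the byte length of the contents written in decimal.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 that the formula above assigns to these file contents."""
    # Header: the literal "blob ", the size in bytes as decimal text, then a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty file from Step 1.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# 13 characters plus a trailing newline, so the size is 14.
print(git_blob_hash(b"Hello, World!\n"))  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

Note that the size counts bytes, not characters, so the input is handled as bytes throughout.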

See: What is the file format of a git commit object?

And that's all, folks!

But wait! Did you notice that the <filename> is not a parameter used in the hash computation? Two files have the same hash whenever their contents are the same, regardless of their names or the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.

Do It Yourself (Ext)

Step 6. Create another empty file with a different filename in the same directory, then stage and commit it.

Step 7. Compare the hashes of both your files by running git ls-tree HEAD again.
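Steps 6 and 7 can also be simulated outside a repository. The sketch below is only an illustration: the file names first.txt and second.txt and both helper functions are made up for the example, and it hashes the files directly with hashlib rather than asking Git.

```python
import hashlib
import tempfile
from pathlib import Path


def git_blob_hash_of_file(path: Path) -> str:
    """Hash a file's contents as SHA1("blob " + size + NUL + contents)."""
    data = path.read_bytes()
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    # The path itself is never fed into the hash, only the bytes read from it.
    return hashlib.sha1(header + data).hexdigest()


def hashes_for_empty_files(names):
    """Create one empty file per name in a temporary directory and hash each."""
    with tempfile.TemporaryDirectory() as tmp:
        result = {}
        for name in names:
            path = Path(tmp) / name
            path.write_bytes(b"")
            result[name] = git_blob_hash_of_file(path)
        return result


hashes = hashes_for_empty_files(["first.txt", "second.txt"])
print(hashes)
print(hashes["first.txt"] == hashes["second.txt"])  # True
```

Both names map to the same hash because the name never enters the computation.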

Note:

The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; however, from my observation, it probably computes a hash based on all the blobs and trees it contains (most likely on their hashes).
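For what it is worth, here is a rough sketch of how I understand the tree format: the same header scheme with the word "tree", over a body made of one "<mode> <name>\0<20 raw hash bytes>" entry per child. Treat it as an assumption to verify against your own repository, not as Git's implementation. The function name git_tree_hash and the file name empty.txt are hypothetical, and the sorting is simplified (Git orders directory names slightly differently), so it is only meant for trees that contain plain files.

```python
import hashlib


def git_tree_hash(entries):
    """entries: iterable of (mode, name, hex_sha1) tuples for the tree's direct children."""
    body = b""
    # Simplification: sort by the raw bytes of the name (adequate for plain files only).
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry stores the child's hash as 20 raw bytes, not as 40 hex characters.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(hex_sha)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# A tree with no entries gives Git's well-known empty-tree hash.
print(git_tree_hash([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# A tree holding one empty file; compare the result with git ls-tree / git cat-file yourself.
print(git_tree_hash([("100644", "empty.txt", EMPTY_BLOB)]))
```

Unlike a blob, a tree does include the file names, so renaming a file changes the tree's hash even though the blob's hash stays the same.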

Git prefixes the object with "blob ", followed by the length in bytes (as a human-readable decimal integer), followed by a NUL character:

    $ echo -e 'blob 14\0Hello, World!' | shasum
    8ab686eafeb1f44702738c8b0f24f2567c36da6d

Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
