A commit in Git: Is it a snapshot/state/image or is it a change/diff/patch/delta?

While it could be construed as both, the GitHub Engineering team is clear (Dec. 2020):

Commits are snapshots, not diffs

Derrick Stolee starts with

  • Object IDs
  • Blobs (file content)
  • Trees (directory listings)
  • Commits: snapshots!

Object ID

The most important part to know about Git objects is that Git references each by its object ID (OID for short), providing a unique name for the object.
We will use the git rev-parse <ref> command to discover these OIDs.
Each object is essentially a plain-text file and we can examine its contents using the git cat-file -p <oid> command.

Blobs (file content)

To discover the OID for a file at your current revision, run git rev-parse HEAD:<path>.
Then, use git cat-file -p <oid> to find its contents.
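To make the OID less magical, here is a minimal Python sketch of how a blob's OID can be derived from file content alone. It assumes a repository using the default SHA-1 object format (not SHA-256), and the helper name blob_oid is made up for this illustration. The file name is not an input.

```python
import hashlib


def blob_oid(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Assumes the SHA-1 object format; blob_oid is a hypothetical helper name.
    """
    # Git hashes a short header ("blob <size>" plus a NUL byte), then the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_oid(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The result should match what git hash-object prints for a file with the same content, whatever that file is called.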

Trees (directory listings)

Note that blobs contain file contents, but not the file names!
The names come from Git’s representation of directories: trees.
A tree is an ordered list of path entries, paired with object types, file modes, and the OID for the object at that path.
Subdirectories are also represented as trees, so trees can point to other trees!
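As a rough illustration of that structure, the sketch below serializes a tree and hashes it. It assumes the SHA-1 object format; the helper names (object_oid, tree_oid) and the file names are hypothetical. Each entry is written as mode, name, a NUL byte, and the raw 20-byte OID.

```python
import hashlib


def object_oid(kind: bytes, body: bytes) -> str:
    """Hash an object body with Git's '<type> <size>' header and NUL (SHA-1 format assumed)."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_oid(entries):
    """entries: iterable of (mode, name, oid_hex), e.g. ("100644", "a.txt", "<40 hex chars>").

    Directories use mode "40000" and point at another tree's OID.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name bytes, treating directory names as if they ended in "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid_hex in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(oid_hex)
    return object_oid(b"tree", body)


hello = object_oid(b"blob", b"hello\n")
print(tree_oid([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
# Same blob, different file name: the blob OID is unchanged, the tree OID is not.
print(tree_oid([("100644", "a.txt", hello)]) != tree_oid([("100644", "b.txt", hello)]))
```

This is where a rename shows up in the stored data: the tree changes, the blob does not.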

Finally:

Commits (snapshots in time)

A commit is a snapshot in time. Each commit contains a pointer to its root tree, representing the state of the working directory at that time.
The commit has a list of parent commits corresponding to the previous snapshots.
A commit with no parents is a root commit and a commit with multiple parents is a merge commit.
Commits also contain metadata describing the snapshot such as author and committer (including name, email address, and date) and a commit message.
The commit message is an opportunity for the commit author to describe the purpose of that commit with respect to the parents.
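The pieces listed above can be sketched as a small text record. This is a simplified illustration assuming the SHA-1 object format; the helper name commit_oid, the identity string, and the timestamps are invented for the example. There is no diff anywhere in the record, only a tree pointer, parent pointers, and metadata.

```python
import hashlib


def commit_oid(tree, parents, author, committer, message):
    """Serialize a commit the way `git cat-file -p` displays it, then hash it.

    Hypothetical helper; tree and parents are 40-character hex OIDs.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parents: root commit; several: merge
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "A U Thor <author@example.com> 1607000000 +0000"  # made-up identity and date

root = commit_oid(EMPTY_TREE, [], who, who, "Initial commit\n")
child = commit_oid(EMPTY_TREE, [root], who, who, "Initial commit\n")
# Same snapshot and message, but a different parent gives a different commit OID.
print(root != child)  # True
```

Because the parent OIDs are part of what gets hashed, the same snapshot recorded on top of a different history is a different commit.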

https://github.blog/wp-content/uploads/2020/12/commit.png?resize=399%2C268?w=399

Even though commits are snapshots, we frequently look at a commit in a history view or on GitHub as a diff. In fact, the commit message frequently refers to this diff.

The diff is dynamically generated from the snapshot data by comparing the root trees of the commit and its parent. Git can compare any two snapshots in time, not just adjacent commits.
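A toy version of that comparison, assuming each snapshot has already been flattened into a mapping from path to blob OID (real Git walks the trees recursively and can skip subtrees whose OIDs are equal). The function name, paths, and shortened OIDs are placeholders.

```python
def diff_snapshots(old, new):
    """Compare two snapshots given as {path: oid} dicts (a simplifying assumption)."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in old if p in new and old[p] != new[p]),
    }


parent = {"README": "aaa1", "src/main.c": "bbb2", "docs/old.txt": "ccc3"}
commit = {"README": "aaa1", "src/main.c": "ddd4", "docs/new.txt": "eee5"}

print(diff_snapshots(parent, commit))
# {'added': ['docs/new.txt'], 'deleted': ['docs/old.txt'], 'modified': ['src/main.c']}
```

Nothing here requires the two snapshots to be parent and child; any pair of snapshots works.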

Computing diffs is what enables git cherry-pick and git rebase.

And since commits are not diffs...

Git doesn’t track renames. There is no data structure inside Git that stores a record that a rename happened between a commit and its parent.
Instead, Git tries to detect renames during the dynamic diff calculation. There are two stages to this rename detection: exact renames and edit-renames.

After first computing a diff, Git inspects the internal model of that diff to discover which paths were added or deleted.
Naturally, a file that was moved from one location to another would appear as a deletion from the first location and an add in the second. Git attempts to match these adds and deletes to create a set of inferred renames.
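The exact-rename stage can be sketched as matching deleted and added paths that carry the same blob OID. This is only an illustration under the assumption that snapshots are flattened {path: oid} mappings; the function name is hypothetical, and the edit-rename stage (which needs similarity scoring) is not shown.

```python
def detect_exact_renames(old, new):
    """Pair deleted and added paths whose blob OIDs are identical.

    old/new are {path: oid} dicts (an assumption of this sketch).
    """
    deleted = {p: oid for p, oid in old.items() if p not in new}
    added = {p: oid for p, oid in new.items() if p not in old}

    # Index the deleted paths by content so each added path can look up a source.
    deleted_by_oid = {}
    for path in sorted(deleted):
        deleted_by_oid.setdefault(deleted[path], []).append(path)

    renames = []
    for path in sorted(added):
        candidates = deleted_by_oid.get(added[path])
        if candidates:
            renames.append((candidates.pop(0), path))  # (old path, new path)
    return renames


parent = {"docs/guide.txt": "abc1", "src/util.c": "def2"}
commit = {"manual/guide.txt": "abc1", "src/util.c": "def2", "src/new.c": "fff9"}
print(detect_exact_renames(parent, commit))
# [('docs/guide.txt', 'manual/guide.txt')]
```

The rename is inferred at diff time from matching content; nothing in either snapshot records it.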


Understand the Git particle/wave duality

Short answer: both.

Medium answer: It depends.

Long answer: Git is a bit like quantum phenomena: Neither of the two views alone can explain all observations. Read on.

Internally, Git will use both representations, depending (conceptually) on which one it deems more efficient in terms of storage space and execution time for a given commit. The snapshot representation is the primary one.

From the user's point of view, however, it depends on what you do:

Duality 1: Commit as a snapshot vs. commit as a change

Indeed, some commands only make sense when you think of commits as snapshots of the working tree. This is most pronounced for checkout, but it is also true for stash and at least halfway for fetch and reset.

For other commands, madness is the likely result when you try to think of commits in this manner. For those other commands, commits are clearly treated as changes,

  • either in the form of patches you can look at (e.g. show, diff)
  • or in the form of operators you can apply to modify your working tree (e.g. apply, cherry-pick, pull)
  • or in the form of operators you can apply to modify other commits (e.g. rebase)
  • or in the form of operators you can apply to create new commits (e.g. merge, cherry-pick)

Duality 2: Commit as a fixed thing vs. commit as something fluid

There is a side-effect of duality 1 that can shock Git newbies accustomed to other versioning systems. It is the fact that Git appears to not even commit itself to its commits.

Huh?

Assume you have created a branch X containing what you like to think of as your commits A and B. But master has progressed a little, so you rebase X onto master.

When you think of A and B as changes, but of master as a snapshot (hey, particles and waves in a single experiment!), this is not a problem: Just apply the changes A and B to the snapshot master.

This thinking is so natural that you will barely notice that Git has now rewritten your commits A and B: They now have different snapshot content and a different parent, and hence a different SHA-1 ID. In Git, the conceptual commit that you think of as a developer is not a fixed-for-all-times kind of thing, but rather some fluid object that changes as a result of working with your repository.

In contrast, if you think of all three (A, B, and master) as snapshots or of all three as changes, your brain will hurt and you will get nowhere.
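The mixed view can be played through in a few lines of Python. This is a deliberately naive sketch: snapshots are plain {path: content} dicts, the helper names are made up, and there is no conflict handling (real rebase performs a three-way merge).

```python
def change_between(parent, commit):
    """View a commit as a change: which paths to set, which to delete."""
    to_set = {p: c for p, c in commit.items() if parent.get(p) != c}
    to_delete = [p for p in parent if p not in commit]
    return to_set, to_delete


def apply_change(base, change):
    """Apply a change to a snapshot, producing a new snapshot."""
    to_set, to_delete = change
    result = dict(base)
    for path in to_delete:
        result.pop(path, None)
    result.update(to_set)
    return result


old_base = {"README": "v1"}                # where branch X started
a = {"README": "v1", "feature.py": "A"}    # commit A as a snapshot
master = {"README": "v2", "LICENSE": "MIT"}

# A as a change, master as a snapshot:
a_rebased = apply_change(master, change_between(old_base, a))
print(a_rebased)       # {'README': 'v2', 'LICENSE': 'MIT', 'feature.py': 'A'}
print(a_rebased == a)  # False: same change, different snapshot
```

The change carried by A survives the rebase, while the snapshot (and therefore the commit ID) does not.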

Disclaimer

The above is a much-simplified description. In Git reality,

  • a commit is not a snapshot at all; it is a piece of metadata (the who/when/why of a snapshot) plus a pointer to a snapshot;
  • the snapshot is called a tree in Git lingo;
  • the commits-as-changes internal representation uses packfiles;
  • some of the above-mentioned commands have further roles that do not fit the same characterization;
  • and even for the given roles it is to some degree a matter of taste into which category (or -ies) certain commands belong.

And don't get confused by the fact that the Pro Git book's very first characterization of Git (in section "Git Basics") is "Snapshots, Not Differences".

Git is complicated after all.
