In what cases could `git pull` be harmful?

My answer, pulled from the discussion that arose on Hacker News:

I feel tempted to just answer the question using Betteridge's law of headlines: Why is git pull considered harmful? It isn't.

  • Nonlinearities aren't intrinsically bad. If they represent the actual history, they are OK.
  • Accidental reintroduction of commits rebased upstream is the result of wrongly rewriting history upstream. You can't rewrite history when history is replicated across several repos.
  • Modifying the working directory is an expected result. Its usefulness is debatable, particularly compared with the behaviour of hg/monotone/darcs/other DVCSs predating Git, but again it is not intrinsically bad.
  • Pausing to review others' work is needed for a merge, and is again an expected behaviour of git pull. If you do not want to merge, you should use git fetch. Again, this is an idiosyncrasy of git in comparison with previous popular DVCSs, but it is expected behaviour and not intrinsically bad.
  • Making it hard to rebase against a remote branch is good. Don't rewrite history unless you absolutely need to. I can't for the life of me understand this pursuit of a (fake) linear history.
  • Not cleaning up branches is good. Each repo knows what it wants to hold. Git has no notion of master-slave relationships.

Summary

By default, git pull creates merge commits which add noise and complexity to the code history. In addition, git pull makes it easy not to think about how your changes might be affected by incoming changes.

The git pull command is safe so long as it only performs fast-forward merges. If git pull is configured to only do fast-forward merges, then whenever a fast-forward merge isn't possible, Git will exit with an error instead of merging. This will give you an opportunity to study the incoming commits, think about how they might affect your local commits, and decide the best course of action (merge, rebase, reset, etc.).

With Git 2.0 and newer, you can run:

git config --global pull.ff only

to alter the default behavior to only fast-forward. With Git versions between 1.6.6 and 1.9.x you'll have to get into the habit of typing:

git pull --ff-only

However, with all versions of Git, I recommend configuring a git up alias like this:

git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'

and using git up instead of git pull. I prefer this alias over git pull --ff-only because:

  • it works with all (non-ancient) versions of Git,
  • it fetches all upstream branches (not just the branch you're currently working on), and
  • it cleans out old origin/* branches that no longer exist upstream.

Problems with git pull

git pull isn't bad if it is used properly. Several recent changes to Git have made it easier to use git pull properly, but unfortunately the default behavior of a plain git pull has several problems:

  • it introduces unnecessary nonlinearities in the history
  • it makes it easy to accidentally reintroduce commits that were intentionally rebased out upstream
  • it modifies your working directory in unpredictable ways
  • pausing what you are doing to review someone else's work is annoying with git pull
  • it makes it hard to correctly rebase onto the remote branch
  • it doesn't clean up branches that were deleted in the remote repo

These problems are described in greater detail below.

Nonlinear History

By default, the git pull command is equivalent to running git fetch followed by git merge @{u}. If there are unpushed commits in the local repository, the merge part of git pull creates a merge commit.
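To make the fetch-then-merge decomposition concrete, here is a self-contained sketch that builds a throwaway shared repository with two hypothetical clones, `alice` and `bob`, in a temporary directory. It runs the two halves explicitly rather than calling `git pull`, partly because recent Git releases may refuse a divergent pull outright unless `pull.rebase` or `pull.ff` is configured. Because Bob has an unpushed commit, the merge step produces a two-parent merge commit.

```shell
set -e
# Throwaway identity so the demo never depends on your own Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: a shared bare repo and two clones, alice and bob.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m base
git remote add origin ../shared.git
git push -q -u origin master
cd ..
git clone -q shared.git bob

# Alice pushes a commit while Bob has an unpushed commit of his own.
(cd alice && echo alice > a.txt && git add a.txt && git commit -q -m alice && git push -q)
cd bob
echo bob > b.txt && git add b.txt && git commit -q -m bob

# The two halves of a default (merge-style) pull, spelled out.
git fetch -q
git merge -q --no-edit '@{u}'   # --no-edit: keep the generated merge message
git log --oneline --graph
```

The resulting graph shows a merge commit whose only message is that Alice happened to push before Bob was done.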

There is nothing inherently bad about merge commits, but they can be dangerous and should be treated with respect:

  • Merge commits are inherently difficult to examine. To understand what a merge is doing, you have to understand the differences from all parents. A conventional diff doesn't convey this multi-dimensional information well. In contrast, a series of normal commits is easy to review.
  • Merge conflict resolution is tricky, and mistakes often go undetected for a long time because merge commits are difficult to review.
  • Merges can quietly supersede the effects of regular commits. The code is no longer the sum of incremental commits, leading to misunderstandings about what actually changed.
  • Merge commits may disrupt some continuous integration schemes (e.g., auto-build only the first-parent path under the assumed convention that second parents point to incomplete work in progress).
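A sketch of how one might inspect such a merge with standard Git commands, on a hypothetical local history where `master` and `feature` each add one file before being merged: `git show --cc` gives the combined view, a `git diff` against each parent (`HEAD^1`, `HEAD^2`) gives the per-parent views the first bullet refers to, and `git log --first-parent` follows the path a first-parent CI scheme would build.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

# Hypothetical history: master and feature each add one file, then merge.
echo base > base.txt && git add base.txt && git commit -q -m base
git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m feature
git checkout -q master
echo main > main.txt && git add main.txt && git commit -q -m main
git merge -q --no-edit feature

# Combined diff: only hunks that differ from every parent are shown, so a
# clean merge like this one typically prints just the commit header.
git show --cc HEAD
# One diff per parent: what the merge changed relative to each side.
git diff --stat 'HEAD^1' HEAD   # feature.txt arrives via the second parent
git diff --stat 'HEAD^2' HEAD   # main.txt arrives via the first parent
# First-parent history: merge, main, base (feature is skipped).
git log --oneline --first-parent
```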

Of course there is a time and a place for merges, but understanding when merges should and should not be used can improve the usefulness of your repository.

Note that the purpose of Git is to make it easy to share and consume the evolution of a codebase, not to precisely record history exactly as it unfolded. (If you disagree, consider the rebase command and why it was created.) The merge commits created by git pull do not convey useful semantics to others—they just say that someone else happened to push to the repository before you were done with your changes. Why have those merge commits if they aren't meaningful to others and could be dangerous?

It is possible to configure git pull to rebase instead of merge, but this also has problems (discussed later). Instead, git pull should be configured to only do fast-forward merges.

Reintroduction of Rebased-out Commits

Suppose someone rebases a branch and force pushes it. This generally shouldn't happen, but it's sometimes necessary (e.g., to remove a 50GiB log file that was accidentally committed and pushed). The merge done by git pull will merge the new version of the upstream branch into the old version that still exists in your local repository. If you push the result, pitchforks and torches will start coming your way.
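One way to be prepared, sketched below with a throwaway repository and hypothetical clones `alice` and `bob`: remember the old upstream tip, fetch, and ask `git merge-base --is-ancestor` whether the new tip still contains the old one. If it doesn't, upstream was rewritten, and `git rebase --onto` can move only your own commits onto the new upstream instead of merging the removed commit back in. This is one possible recovery, not the only one.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: alice pushes a good commit and then a bad one.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > file.txt && git add file.txt && git commit -q -m base
echo huge > huge.log && git add huge.log && git commit -q -m "oops: huge log"
bad=$(git rev-parse HEAD)
git remote add origin ../shared.git
git push -q -u origin master
cd ..
git clone -q shared.git bob

# Bob builds on top of the bad commit but has not pushed yet.
cd bob
echo work > work.txt && git add work.txt && git commit -q -m "bob's work"

# Alice drops the bad commit and force-pushes the rewritten branch.
cd ../alice
git reset -q --hard 'HEAD~1'
git push -q --force origin master

# Bob records the old upstream tip, fetches, and checks whether the new
# upstream still contains it. If not, upstream history was rewritten.
cd ../bob
old=$(git rev-parse '@{u}')
git fetch -q
if git merge-base --is-ancestor "$old" '@{u}'; then
  echo "upstream only moved forward"
else
  echo "upstream was rewritten; not merging"
  # Replay only Bob's own commits (old..HEAD) onto the new upstream,
  # which leaves the removed commit behind.
  git rebase -q --onto '@{u}' "$old"
fi
```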

Some may argue that the real problem is force updates. Yes, it's generally advisable to avoid force pushes whenever possible, but they are sometimes unavoidable. Developers must be prepared to deal with force updates, because they will happen sometimes. This means not blindly merging in the old commits via an ordinary git pull.

Surprise Working Directory Modifications

There's no way to predict what the working directory or index will look like until git pull is done. There might be merge conflicts that you have to resolve before you can do anything else, it might introduce a 50GiB log file in your working directory because someone accidentally pushed it, it might rename a directory you are working in, etc.

git remote update -p (or git fetch --all -p) allows you to look at other people's commits before you decide to merge or rebase, allowing you to form a plan before taking action.

Difficulty Reviewing Other People's Commits

Suppose you are in the middle of making some changes and someone else wants you to review some commits they just pushed. git pull's merge (or rebase) operation modifies the working directory and index, which means your working directory and index must be clean.

You could use git stash and then git pull, but what do you do when you're done reviewing? To get back to where you were you have to undo the merge created by git pull and apply the stash.

git remote update -p (or git fetch --all -p) doesn't modify the working directory or index, so it's safe to run at any time—even if you have staged and/or unstaged changes. You can pause what you're doing and review someone else's commit without worrying about stashing or finishing up the commit you're working on. git pull doesn't give you that flexibility.
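The sketch below, again using hypothetical `alice` and `bob` clones of a throwaway repository, checks this claim: Bob has a staged edit and an untracked scratch file, runs `git remote update -p`, and reviews Alice's commit with `git log -p HEAD..@{u}`. His `HEAD`, index, and working directory are unchanged afterwards.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: a shared bare repo and two clones, alice and bob.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > file.txt && git add file.txt && git commit -q -m base
git remote add origin ../shared.git
git push -q -u origin master
cd ..
git clone -q shared.git bob

# Alice pushes a commit she wants Bob to review.
cd alice
echo review-me > new.txt && git add new.txt && git commit -q -m "please review"
git push -q

# Bob is mid-task: a staged edit plus an untracked scratch file.
cd ../bob
echo edit >> file.txt && git add file.txt
echo notes > scratch.txt
head_before=$(git rev-parse HEAD)

# Updates only remote-tracking refs such as origin/master; HEAD, the index,
# and the working directory are left alone.
git remote update -p
# Review what arrived without merging it.
git log -p 'HEAD..@{u}'
```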

Rebasing onto a Remote Branch

A common Git usage pattern is to do a git pull to bring in the latest changes followed by a git rebase @{u} to eliminate the merge commit that git pull introduced. It's common enough that Git has some configuration options to reduce these two steps to a single step by telling git pull to perform a rebase instead of a merge (see the branch.<branch>.rebase, branch.autosetuprebase, and pull.rebase options).
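For reference, the three options named above can be set like this (a configuration fragment; `master` is only an example branch name). Each one makes `git pull` rebase instead of merge, at a different scope.

```shell
# Rebase instead of merge on every `git pull`, for all branches:
git config --global pull.rebase true

# Or for tracking branches created from now on: Git then sets
# branch.<name>.rebase=true automatically when the branch is set up.
git config --global branch.autosetuprebase always

# Or for one existing branch only (master is just an example):
git config branch.master.rebase true
```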

Unfortunately, if you have an unpushed merge commit that you want to preserve (e.g., a commit merging a pushed feature branch into master), neither a rebase-pull (git pull with branch.<branch>.rebase set to true) nor a merge-pull (the default git pull behavior) followed by a rebase will work. This is because git rebase eliminates merges (it linearizes the DAG) without the --preserve-merges option. The rebase-pull operation can't be configured to preserve merges, and a merge-pull followed by a git rebase -p @{u} won't eliminate the merge caused by the merge-pull. Update: Git v1.8.5 added git pull --rebase=preserve and git config pull.rebase preserve. These cause git pull to do git rebase --preserve-merges after fetching the upstream commits. (Thanks to funkaster for the heads-up!)
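The settings from the update look like this. As far as I know, later Git releases deprecated and then removed `preserve` along with `git rebase --preserve-merges`, and the replacement value is `merges`; treat the version details as an assumption and check `git help config` for the Git you are running.

```shell
# Git 1.8.5+: pull with rebase but keep local merge commits
# (one-off form: git pull --rebase=preserve).
git config pull.rebase preserve

# Newer Git, where `preserve` is gone: `merges` makes git pull run
# git rebase --rebase-merges (one-off form: git pull --rebase=merges).
git config pull.rebase merges
```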

Cleaning Up Deleted Branches

git pull doesn't prune remote tracking branches corresponding to branches that were deleted from the remote repository. For example, if someone deletes branch foo from the remote repo, you'll still see origin/foo.

This leads to users accidentally resurrecting killed branches because they think they're still active.
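A small reproduction, assuming a throwaway shared repository with a hypothetical branch `foo` that Alice deletes: Bob's ordinary fetch (the fetch step of `git pull`, absent any prune configuration) leaves `origin/foo` behind, and only a pruning fetch removes it.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: shared repo with branches master and foo.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > file.txt && git add file.txt && git commit -q -m base
git remote add origin ../shared.git
git push -q -u origin master
git push -q origin master:foo
cd ..
git clone -q shared.git bob

# Alice deletes foo from the shared repo.
git -C alice push -q origin --delete foo

# A plain fetch leaves bob's stale origin/foo in place.
cd bob
git fetch -q
lingering=$(git for-each-ref --format='%(refname)' refs/remotes/origin/foo)
echo "after plain fetch: ${lingering:-none}"

# Pruning removes remote-tracking branches whose upstream branch is gone.
git fetch -q --prune   # equivalently: git remote prune origin
git branch -r
```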

A Better Alternative: Use git up instead of git pull

Instead of git pull, I recommend creating and using the following git up alias:

git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'

This alias downloads all of the latest commits from all upstream branches (pruning the dead branches) and tries to fast-forward the local branch to the latest commit on the upstream branch. If successful, then there were no local commits, so there was no risk of merge conflict. The fast-forward will fail if there are local (unpushed) commits, giving you an opportunity to review the upstream commits before taking action.
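The sketch below exercises both outcomes in a throwaway repository with hypothetical clones `alice` and `bob`. The alias is the same one, but it is defined only in Bob's repository rather than with `--global`, so running the sketch leaves your own configuration untouched.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: a shared bare repo and two clones, alice and bob.
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > a.txt && git add a.txt && git commit -q -m base
git remote add origin ../shared.git
git push -q -u origin master
cd ..
git clone -q shared.git bob
# Same alias as above, scoped to bob's repo instead of --global.
git -C bob config alias.up '!git remote update -p; git merge --ff-only @{u}'

# Case 1: only upstream has new commits, so git up fast-forwards.
(cd alice && echo one >> a.txt && git commit -q -am one && git push -q)
(cd bob && git up)
ff_head=$(git -C bob rev-parse HEAD)
ff_upstream=$(git -C bob rev-parse '@{u}')

# Case 2: both sides have new commits. The fast-forward is refused, so
# nothing is merged and no conflict prompt appears.
(cd alice && echo two >> a.txt && git commit -q -am two && git push -q)
cd bob
echo mine > b.txt && git add b.txt && git commit -q -m mine
if git up; then result=fast-forwarded; else result=refused; fi
echo "second git up: $result"
```

After the refusal, Bob can inspect `HEAD...@{u}` and decide whether to merge, rebase, or reset.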

This still modifies your working directory in unpredictable ways, but only if you don't have any local changes. Unlike git pull, git up will never drop you to a prompt expecting you to fix a merge conflict.

Another Option: git pull --ff-only --all -p

The following is an alternative to the above git up alias:

git config --global alias.up 'pull --ff-only --all -p'

This version of git up has the same behavior as the previous git up alias, except:

  • the error message is a bit more cryptic if your local branch isn't configured with an upstream branch
  • it relies on an undocumented feature (the -p argument, which is passed to fetch) that may change in future versions of Git

If you are running Git 2.0 or newer

With Git 2.0 and newer you can configure git pull to only do fast-forward merges by default:

git config --global pull.ff only

This causes git pull to act like git pull --ff-only, but it still doesn't fetch all upstream commits or clean out old origin/* branches, so I still prefer git up.
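If you do stick with plain `git pull`, one way to get part of the way toward `git up` is to pair `pull.ff` with `fetch.prune`, which makes every fetch (and therefore every pull) prune stale `origin/*` branches. This is a sketch, not a full replacement for the alias: `git pull` still fetches only from the current branch's remote, whereas `git remote update` fetches from every remote.

```shell
# Git 2.0+: plain `git pull` refuses anything but a fast-forward.
git config --global pull.ff only
# Git 1.8.5+: prune stale remote-tracking branches on every fetch,
# including the fetch that git pull runs.
git config --global fetch.prune true
```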
