In what cases could `git pull` be harmful?
My answer, pulled from the discussion that arose on HackerNews:
I feel tempted to just answer the question using Betteridge's law of headlines: Why is `git pull` considered harmful? It isn't.
- Nonlinearities aren't intrinsically bad. If they represent the actual history they are ok.
- Accidental reintroduction of commits rebased upstream is the result of wrongly rewriting history upstream. You can't rewrite history when history is replicated along several repos.
- Modifying the working directory is the expected result. Its usefulness is debatable, especially given how hg/monotone/darcs/other_dvcs_predating_git behave, but again it is not intrinsically bad.
- Pausing to review others' work is needed for a merge, and is again an expected behaviour on git pull. If you do not want to merge, you should use git fetch. Again, this is an idiosyncrasy of git in comparison with previous popular dvcs, but it is expected behaviour and not intrinsically bad.
- Making it hard to rebase against a remote branch is good. Don't rewrite history unless you absolutely need to. I can't for the life of me understand this pursuit of a (fake) linear history.
- Not cleaning up branches is good. Each repo knows what it wants to hold. Git has no notion of master-slave relationships.
Summary
By default, `git pull` creates merge commits, which add noise and complexity to the code history. In addition, `pull` makes it easy to not think about how your changes might be affected by incoming changes.
The `git pull` command is safe so long as it only performs fast-forward merges. If `git pull` is configured to only do fast-forward merges and a fast-forward merge isn't possible, Git exits with an error. This gives you an opportunity to study the incoming commits, think about how they might affect your local commits, and decide the best course of action (merge, rebase, reset, etc.).
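As a rough illustration of that failure mode, the sketch below builds a throwaway setup in a temporary directory and shows `git pull --ff-only` refusing to act once the histories diverge. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are made up for the demo.

```shell
#!/bin/sh
# Sketch: `git pull --ff-only` exits with an error when histories diverge.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

# One shared bare repository and two clones of it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# Bob publishes a commit while Alice commits locally: the histories diverge.
echo bob > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "bob: add bob.txt"
git -C bob push -q origin main
echo alice > alice/alice.txt
git -C alice add alice.txt
git -C alice commit -q -m "alice: add alice.txt"

# The pull cannot fast-forward, so it fails and leaves Alice's branch alone.
before=$(git -C alice rev-parse HEAD)
if git -C alice pull -q --ff-only 2>/dev/null; then result=merged; else result=refused; fi
after=$(git -C alice rev-parse HEAD)
echo "pull --ff-only: $result"
```

Because the branch is left untouched, Alice can still choose between merging, rebasing, or resetting after looking at what arrived.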
With Git 2.0 and newer, you can run:
git config --global pull.ff only
to alter the default behavior to only fast-forward. With Git versions between 1.6.6 and 1.9.x you'll have to get into the habit of typing:
git pull --ff-only
However, with all versions of Git, I recommend configuring a `git up` alias like this:
git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'
and using `git up` instead of `git pull`. I prefer this alias over `git pull --ff-only` because:
- it works with all (non-ancient) versions of Git,
- it fetches all upstream branches (not just the branch you're currently working on), and
- it cleans out old `origin/*` branches that no longer exist upstream.
Problems with git pull
`git pull` isn't bad if it is used properly. Several recent changes to Git have made it easier to use `git pull` properly, but unfortunately the default behavior of a plain `git pull` has several problems:
- it introduces unnecessary nonlinearities in the history
- it makes it easy to accidentally reintroduce commits that were intentionally rebased out upstream
- it modifies your working directory in unpredictable ways
- pausing what you are doing to review someone else's work is annoying with `git pull`
- it makes it hard to correctly rebase onto the remote branch
- it doesn't clean up branches that were deleted in the remote repo
These problems are described in greater detail below.
Nonlinear History
By default, the `git pull` command is equivalent to running `git fetch` followed by `git merge @{u}`. If there are unpushed commits in the local repository, the merge part of `git pull` creates a merge commit.
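The two-step equivalent can be tried out in a scratch setup. The sketch below runs `git fetch` and then `git merge @{u}` by hand with an unpushed local commit present, and counts the parents of the resulting commit. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: fetch + merge with an unpushed local commit yields a merge commit.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# Bob pushes a commit; Alice has an unpushed commit of her own.
echo bob > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "bob: add bob.txt"
git -C bob push -q origin main
echo alice > alice/alice.txt
git -C alice add alice.txt
git -C alice commit -q -m "alice: add alice.txt"

# The two steps spelled out by hand.
cd alice
git fetch -q
git merge -q --no-edit @{u}

# A merge commit has more than one parent.
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --graph --oneline
echo "parents of HEAD: $parents"
```

The graph output shows the extra merge node sitting on top of the two otherwise unrelated commits.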
There is nothing inherently bad about merge commits, but they can be dangerous and should be treated with respect:
- Merge commits are inherently difficult to examine. To understand what a merge is doing, you have to understand the differences to all parents. A conventional diff doesn't convey this multi-dimensional information well. In contrast, a series of normal commits is easy to review.
- Merge conflict resolution is tricky, and mistakes often go undetected for a long time because merge commits are difficult to review.
- Merges can quietly supersede the effects of regular commits. The code is no longer the sum of incremental commits, leading to misunderstandings about what actually changed.
- Merge commits may disrupt some continuous integration schemes (e.g., auto-build only the first-parent path under the assumed convention that second parents point to incomplete works in progress).
Of course there is a time and a place for merges, but understanding when merges should and should not be used can improve the usefulness of your repository.
Note that the purpose of Git is to make it easy to share and consume the evolution of a codebase, not to record history exactly as it unfolded. (If you disagree, consider the `rebase` command and why it was created.) The merge commits created by `git pull` do not convey useful semantics to others; they just say that someone else happened to push to the repository before you were done with your changes. Why have those merge commits if they aren't meaningful to others and could be dangerous?
It is possible to configure `git pull` to rebase instead of merge, but this also has problems (discussed later). Instead, `git pull` should be configured to only do fast-forward merges.
Reintroduction of Rebased-out Commits
Suppose someone rebases a branch and force pushes it. This generally shouldn't happen, but it's sometimes necessary (e.g., to remove a 50GiB log file that was accidentally committed and pushed). The merge done by `git pull` will merge the new version of the upstream branch into the old version that still exists in your local repository. If you push the result, pitchforks and torches will start coming your way.
Some may argue that the real problem is force updates. Yes, it's generally advisable to avoid force pushes whenever possible, but they are sometimes unavoidable. Developers must be prepared to deal with force updates, because they will happen sometimes. This means not blindly merging in the old commits via an ordinary `git pull`.
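One way to notice a force update before merging is to remember the upstream tip, fetch, and check whether the old tip is still an ancestor of the new one. The sketch below is only an illustration of that idea; the repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect that upstream was rewritten before merging anything.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# Bob pushes a commit by mistake, and Alice picks it up.
echo noise > bob/big.log
git -C bob add big.log
git -C bob commit -q -m "oops: add big.log"
git -C bob push -q origin main
git -C alice pull -q --ff-only

# Bob drops the commit, adds a different one, and force-pushes.
git -C bob reset -q --hard HEAD~1
echo fix > bob/fix.txt
git -C bob add fix.txt
git -C bob commit -q -m "real fix"
git -C bob push -q --force origin main

cd alice
old=$(git rev-parse @{u})        # upstream tip as Alice last saw it
git fetch -q
# If the old tip is no longer an ancestor of the new tip, history was rewritten.
if git merge-base --is-ancestor "$old" @{u}; then rewritten=no; else rewritten=yes; fi
# Commits Alice still has that upstream dropped; a merge would bring them back.
stale=$(git rev-list --count @{u}..HEAD)
echo "upstream rewritten: $rewritten; commits a merge would reintroduce: $stale"
```

Here Alice made no commits of her own, yet her branch is one commit "ahead" of upstream. That commit is exactly the one a blind merge would push back.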
Surprise Working Directory Modifications
There's no way to predict what the working directory or index will look like until `git pull` is done. There might be merge conflicts that you have to resolve before you can do anything else, a 50GiB log file might appear in your working directory because someone accidentally pushed it, a directory you are working in might be renamed, etc.
`git remote update -p` (or `git fetch --all -p`) allows you to look at other people's commits before you decide to merge or rebase, allowing you to form a plan before taking action.
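A minimal sketch of that "look first" step, assuming a throwaway setup in a temporary directory: after `git remote update -p`, range expressions against `@{u}` list what would come in and what is still local, without touching any files. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: inspect incoming commits after fetching, before merging or rebasing.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

echo bob > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "bob: add bob.txt"
git -C bob push -q origin main
echo alice > alice/alice.txt
git -C alice add alice.txt
git -C alice commit -q -m "alice: add alice.txt"

cd alice
git remote update -p >/dev/null 2>&1

incoming=$(git rev-list --count HEAD..@{u})   # upstream commits not yet local
outgoing=$(git rev-list --count @{u}..HEAD)   # local commits not yet upstream
echo "incoming: $incoming, local only: $outgoing"
git log --oneline HEAD..@{u}                  # the commits you would receive
git diff --stat HEAD...@{u}                   # their changes since the common ancestor
```

Nothing in the working directory changes during these commands, so the decision to merge, rebase, or wait can be made afterwards.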
Difficulty Reviewing Other People's Commits
Suppose you are in the middle of making some changes and someone else wants you to review some commits they just pushed. `git pull`'s merge (or rebase) operation modifies the working directory and index, which means your working directory and index must be clean.
You could use `git stash` and then `git pull`, but what do you do when you're done reviewing? To get back to where you were you have to undo the merge created by `git pull` and apply the stash.
`git remote update -p` (or `git fetch --all -p`) doesn't modify the working directory or index, so it's safe to run at any time, even if you have staged and/or unstaged changes. You can pause what you're doing and review someone else's commit without worrying about stashing or finishing up the commit you're working on. `git pull` doesn't give you that flexibility.
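To make the contrast concrete, here is a small sketch in which the reviewer has both staged and unstaged changes, fetches, reads the new commit, and ends up with exactly the same `git status` as before. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: review pushed commits while the working tree and index are dirty.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# Bob pushes something he wants reviewed.
echo bob > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "bob: add bob.txt"
git -C bob push -q origin main

# Alice is mid-change: one unstaged edit and one staged new file.
cd alice
echo "half-finished" >> file.txt
echo draft > draft.txt
git add draft.txt
status_before=$(git status --porcelain)

# Fetch and review without merging anything.
git fetch -q --all -p
review=$(git log --oneline HEAD..@{u})
git --no-pager log -p HEAD..@{u}

status_after=$(git status --porcelain)
echo "status unchanged: $([ "$status_before" = "$status_after" ] && echo yes || echo no)"
```

No stash is needed because the fetched commits only update the remote-tracking branch.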
Rebasing onto a Remote Branch
A common Git usage pattern is to do a `git pull` to bring in the latest changes followed by a `git rebase @{u}` to eliminate the merge commit that `git pull` introduced. It's common enough that Git has some configuration options to reduce these two steps to a single step by telling `git pull` to perform a rebase instead of a merge (see the `branch.<branch>.rebase`, `branch.autosetuprebase`, and `pull.rebase` options).
Unfortunately, if you have an unpushed merge commit that you want to preserve (e.g., a commit merging a pushed feature branch into `master`), neither a rebase-pull (`git pull` with `branch.<branch>.rebase` set to `true`) nor a merge-pull (the default `git pull` behavior) followed by a rebase will work. This is because `git rebase` eliminates merges (it linearizes the DAG) without the `--preserve-merges` option. The rebase-pull operation can't be configured to preserve merges, and a merge-pull followed by a `git rebase -p @{u}` won't eliminate the merge caused by the merge-pull.
Update: Git v1.8.5 added `git pull --rebase=preserve` and `git config pull.rebase preserve`. These cause `git pull` to do `git rebase --preserve-merges` after fetching the upstream commits. (Thanks to funkaster for the heads-up!)
Cleaning Up Deleted Branches
`git pull` doesn't prune remote tracking branches corresponding to branches that were deleted from the remote repository. For example, if someone deletes branch `foo` from the remote repo, you'll still see `origin/foo`.
This leads to users accidentally resurrecting killed branches because they think they're still active.
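The stale `origin/foo` can be reproduced in a scratch setup. In the sketch below a branch is deleted upstream, a pull leaves the remote-tracking ref in place, and a pruning fetch removes it. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the helper `has_foo` are hypothetical.

```shell
#!/bin/sh
# Sketch: a deleted upstream branch lingers as origin/foo until a pruning fetch.
# All repository and branch names, and the has_foo helper, are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

# Bob publishes branch foo, Alice fetches it, then Bob deletes it upstream.
git -C bob checkout -q -b foo
git -C bob push -q origin foo
git -C alice pull -q --ff-only
git -C bob push -q origin --delete foo

# Succeeds only while Alice still has a remote-tracking ref for foo.
has_foo() { git -C alice rev-parse -q --verify refs/remotes/origin/foo >/dev/null; }

git -C alice pull -q --ff-only
if has_foo; then after_pull=present; else after_pull=gone; fi

git -C alice remote update -p >/dev/null 2>&1
if has_foo; then after_prune=present; else after_prune=gone; fi

echo "origin/foo after pull: $after_pull; after 'remote update -p': $after_prune"
```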
A Better Alternative: Use `git up` instead of `git pull`
Instead of `git pull`, I recommend creating and using the following `git up` alias:
git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'
This alias downloads all of the latest commits from all upstream branches (pruning the dead branches) and tries to fast-forward the local branch to the latest commit on the upstream branch. If successful, then there were no local commits, so there was no risk of merge conflict. The fast-forward will fail if there are local (unpushed) commits, giving you an opportunity to review the upstream commits before taking action.
This still modifies your working directory in unpredictable ways, but only if you don't have any local changes. Unlike `git pull`, `git up` will never drop you to a prompt expecting you to fix a merge conflict.
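The two outcomes described above can be exercised in a throwaway setup. This sketch defines the alias in one scratch clone only (rather than with `--global`, so it does not touch your own configuration) and runs it once when only upstream has moved and once when both sides have new commits. The repository names (`seed`, `hub.git`, `alice`, `bob`), the branch name `main`, and the file names are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: `git up` fast-forwards when it can and refuses when it cannot.
# All repository, branch, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
export HOME="$work" XDG_CONFIG_HOME="$work"   # ignore your real global config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "base"
git clone -q --bare seed hub.git
git clone -q hub.git alice
git clone -q hub.git bob

cd alice
# Repo-local for the demo; the answer itself configures it with --global.
git config alias.up '!git remote update -p; git merge --ff-only @{u}'

# Case 1: only upstream has new commits, so the fast-forward succeeds.
echo bob > ../bob/bob.txt
git -C ../bob add bob.txt
git -C ../bob commit -q -m "bob: add bob.txt"
git -C ../bob push -q origin main
git up >/dev/null 2>&1
if [ "$(git rev-parse HEAD)" = "$(git rev-parse @{u})" ]; then case1=fast-forwarded; else case1=unchanged; fi

# Case 2: both sides have new commits, so the fast-forward is refused.
echo more > ../bob/bob2.txt
git -C ../bob add bob2.txt
git -C ../bob commit -q -m "bob: add bob2.txt"
git -C ../bob push -q origin main
echo alice > alice.txt
git add alice.txt
git commit -q -m "alice: add alice.txt"
before=$(git rev-parse HEAD)
if git up >/dev/null 2>&1; then case2=merged; else case2=refused; fi
after=$(git rev-parse HEAD)

echo "case 1: $case1; case 2: $case2"
```

In the second case the upstream commits have still been fetched, so they can be reviewed with `git log HEAD..@{u}` before deciding what to do.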
Another Option: git pull --ff-only --all -p
The following is an alternative to the above `git up` alias:
git config --global alias.up 'pull --ff-only --all -p'
This version of `git up` has the same behavior as the previous `git up` alias, except:
- the error message is a bit more cryptic if your local branch isn't configured with an upstream branch
- it relies on an undocumented feature (the `-p` argument, which is passed to `fetch`) that may change in future versions of Git
If you are running Git 2.0 or newer
With Git 2.0 and newer you can configure `git pull` to only do fast-forward merges by default:
git config --global pull.ff only
This causes `git pull` to act like `git pull --ff-only`, but it still doesn't fetch all upstream commits or clean out old `origin/*` branches, so I still prefer `git up`.