What does GIT PUSH do exactly?
The technical, jargon-laden answer from the manual is as follows:
git push "updates remote refs using local refs, while sending objects necessary to complete the given refs."
So basically, it is copying information, so as to make sure your remote is up to date with your local repo. But what are refs, and what are objects? Paraphrasing the manual:
- Refs (manual entry) are files which "store the SHA-1 value [of an object, like a commit] under a simple name so you can use that pointer rather than the raw SHA-1 value" [to find the content associated with it]. You can see them by looking at files like .git/refs/heads/<branch name> or .git/refs/remotes/origin/<branch name> in your repo.
- Objects (manual entry) include commits, trees, blobs, and tags (the last of which are not pushed by default). As an example, quoting Mark Longair from another SO answer, "a commit records the exact content of the source code at that point in time with the date, author's name, and references to parent commits".
Thus, when you git push, git uses your local refs (updated each time you git commit) to update the equivalent refs on the remote, so that they point to the most recent commits, and any new content that you've created gets copied to the remote as objects, each labeled with some metadata and a SHA-1.
As an extra illustration of what a ref is, the GitHub API docs show example JSON results of API calls asking for refs in a given repo. It might help you understand how the different pieces of information relate to each other.
My simplest description is that push just does the following (assuming you do git push origin master):
- Copy the local commits that do not exist in the remote repo to the remote repo
- Move origin/master (both in your local git and in the remote git) to point to the same commit as local/master
- Push DOES NOT merge
HOWEVER, it will check whether your local/master is based on origin/master. Conceptually, this means that in the git graph you can go back from local/master directly to origin/master (not the origin/master of your local git, but the master on the remote repo) by only moving "downward", meaning no modification was made to the remote repo before your push. Otherwise the push will be rejected.
Assuming you already understand git's "objects" model (your commits and files and so on are all just "objects in the git database", with "loose" objects, those not packed up to save space, stored in .git/objects/12/34567... and the like)...
You are correct: git fetch retrieves objects "they" (origin, in this case) have that you don't, and sticks labels on them: origin/master and the like. More specifically, your git calls up theirs on the Internet-phone (or any other suitable transport) and asks: what branches do you have, and what commit IDs are those? They have master and the ID is 1234567..., so your git asks for 1234567... and any other objects needed that you don't already have, and makes your origin/master point to commit object 1234567....
The part of git push that is symmetric here is this: your git calls up their git on the same Internet-phone as usual, but this time, instead of just asking them about their branches, your git tells them about your branches and your git repository objects, and then says: "How about I get you to set your master to 56789ab...?"
Their git takes a look at the objects you sent over (the new commit 56789ab... and whatever other objects you have that they didn't, that they would need to take it). Their git then considers the request to set their master to 56789ab....
As Chris K already answered, there is no merging happening here: your git simply proposes that their git overwrite their master with this new commit-ID. It's up to their git to decide whether to allow that.
If "they" (whoever they are) have not set up any special rules, the default rule that git uses here is very simple: the overwrite is allowed if the change is a "fast forward". It has one additional feature: the overwrite is also allowed if the change is done with the "force" flag set. It's usually not a good idea to set the force flag here, as the default rule, "only fast forwards", is usually the right rule.
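To make the exchange concrete, here is a rough Python sketch of the push conversation. It is a toy model, not how git is actually implemented: a repository is just a dict of commits (ID to parent IDs) plus a dict of refs, and the names reachable and push are made up for this illustration.

```python
# Toy model (an assumption for illustration): a repository is a dict with
# "commits" (commit id -> list of parent ids) and "refs" (name -> commit id).

def reachable(commits, start):
    """Return every commit id reachable from start by following parents."""
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen

def push(local, remote, ref, force=False):
    """Offer local's value of ref to the remote; return True if accepted."""
    new_tip = local["refs"][ref]
    old_tip = remote["refs"].get(ref)
    # Send over the commits the remote does not have yet.
    # (In this toy model they stay behind, unreferenced, if the push is refused.)
    for cid in reachable(local["commits"], new_tip):
        remote["commits"].setdefault(cid, local["commits"][cid])
    # Default rule: allow the overwrite only if it is a fast forward,
    # i.e. the old tip can still be reached from the new one.
    fast_forward = old_tip is None or old_tip in reachable(remote["commits"], new_tip)
    if fast_forward or force:
        remote["refs"][ref] = new_tip
        return True
    return False

remote = {"commits": {"a": [], "b": ["a"]}, "refs": {"master": "b"}}
local = {"commits": {"a": [], "c": ["a"]}, "refs": {"master": "c"}}
print(push(local, remote, "master"))              # refused: "b" would be abandoned
print(push(local, remote, "master", force=True))  # allowed only because of force
```

Note that nothing in push merges anything: the remote's master is either overwritten with the proposed ID or left alone.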
The obvious question here is: what exactly is a fast forward? We'll get to that in a moment; first I need to expand a bit on labels, or "references" to be more formal.
Git's references
In git, a branch, or a tag, or even things like the stash and HEAD are all references. Most of them are found in .git/refs/, a sub-directory of the git repository. (A few top-level references, including HEAD, are right in .git itself.) All a reference is, is a file[1] containing an SHA-1 ID like 7452b4b5786778d5d87f5c90a94fab8936502e20. SHA-1 IDs are cumbersome and impossible for people to remember, so we use names, like v2.1.0 (a tag in this case, version 2.1.0 of git itself) to save them for us.
Some references are, or at least are intended to be, totally static. The tag v2.1.0 should never refer to something other than the SHA-1 ID above. But some references are more dynamic. Specifically, your own local branches, like master, are moving targets. One special case, HEAD, is not even a target of its own: it generally contains the name of the moving-target branch. So there's one exception for "indirect" references: HEAD usually contains the string ref: refs/heads/master, or ref: refs/heads/branch, or something along those lines; and git does not (and cannot) enforce a "never change" rule for references. Branches in particular change a lot.
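If it helps to see the "indirect" reference idea as code, here is a small Python sketch that follows HEAD to a branch and then to an ID. It builds a throwaway directory laid out like .git with loose reference files only; the function name resolve_ref is invented here, and the ID written into master is just a placeholder borrowed from above. In a real repository you would ask git itself (for example with git rev-parse) rather than read these files.

```python
import os
import tempfile

def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow 'ref: ...' indirections until a file holding a raw ID is found."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, name)) as f:
            content = f.read().strip()
        if not content.startswith("ref: "):
            return content               # direct reference: the ID itself
        name = content[len("ref: "):]    # indirect reference: look there next
    raise ValueError("too many levels of indirection")

# Build a throwaway directory that mimics the layout of .git (loose refs only).
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("7452b4b5786778d5d87f5c90a94fab8936502e20\n")  # placeholder ID

print(resolve_ref(git_dir))
```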
How do you know if a reference is supposed to change? Well, a lot of this is just by convention: branches move and tags don't. But you should then ask: how do you know if a reference is a branch, or a tag, or what?
Name spaces of references: refs/heads/, refs/tags/, etc.
Other than the special top-level references, all of git's references are in refs/ as we already noted above. Within the refs/ directory (or "folder" if you're on Windows or Mac), though, we can have a whole collection of sub-directories. Git has, at this point, four well-defined subdirectories: refs/heads/ contains all your branches, refs/tags/ contains all your tags, refs/remotes/ contains all your "remote-tracking branches", and refs/notes/ contains git's "notes" (which I will ignore here as they get a bit complicated).
Since all your branches are in refs/heads/, git can tell that these should be allowed to change, and since all your tags are in refs/tags/, git can tell that these should not.
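The name-space idea boils down to a prefix check. A minimal Python sketch, where NAMESPACES, classify and expected_to_move are names invented for this illustration and not anything git exposes:

```python
# Map each well-known prefix to the kind of reference it holds.
NAMESPACES = {
    "refs/heads/": "branch",
    "refs/tags/": "tag",
    "refs/remotes/": "remote-tracking branch",
    "refs/notes/": "notes",
}

def classify(ref_name):
    """Return (kind, short name) for a full reference name."""
    for prefix, kind in NAMESPACES.items():
        if ref_name.startswith(prefix):
            return kind, ref_name[len(prefix):]
    return "other", ref_name   # e.g. top-level references such as HEAD

def expected_to_move(ref_name):
    """By convention, tags are the references that should stay put."""
    kind, _ = classify(ref_name)
    return kind != "tag"

print(classify("refs/heads/master"))
print(classify("refs/tags/v2.1.0"))
```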
Automatic motion of branches
When you make a new commit, and are on a branch like master, git will automatically move the reference. Your new commit is created with its "parent commit" being the previous branch-tip, and once your new commit is safely saved away, git changes master to contain the ID of the new commit. In other words, it makes sure that the branch name, the reference in the heads sub-directory, always points to the tip-most commit.
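Here is that two-step dance (save the commit, then move the label) as a toy Python sketch. The dicts and the commit function are stand-ins invented for illustration, and the hashing is only a placeholder for git's real object format.

```python
import hashlib

commits = {}                        # commit id -> {"parent": ..., "message": ...}
refs = {"refs/heads/master": None}  # no commits yet, so the branch has no tip
head = "refs/heads/master"          # HEAD names the branch we are "on"

def commit(message):
    """Create a commit whose parent is the current tip, then move the branch."""
    parent = refs[head]
    # Placeholder for git's real hashing: just hash the parent id and message.
    cid = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()
    commits[cid] = {"parent": parent, "message": message}  # save the commit first
    refs[head] = cid                                       # then advance the label
    return cid

first = commit("initial")
second = commit("more work")
print(refs[head] == second, commits[second]["parent"] == first)
```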
(In fact, the branch, in the sense of a collection of commits that is part of the commit-graph stored in the repository, is a data structure made out of the commits in the repository. Its only connection with the branch name is that the tip commit of the branch itself is stored in the reference label with that name. This is important later, if and when branch names are changed or erased as the repository grows many more commits. For now it's just something to keep in mind: there's a difference between the "branch tip", which is where the "branch name" points, and the branch-as-a-subset-of-commit-DAG. It's a bit unfortunate that git tends to lump these different concepts under a single name, "branch".)
What exactly is a fast forward?
Usually you see "fast forward" in the context of merge, often with the merge done as the second step in a git pull. But in fact, "fast forwarding" is actually a property of a label move.
Let's draw a little bit of a commit graph. The little o nodes represent commits, and each one has an arrow pointing left, left-and-up, or left-and-down (or in one case, two arrows) to its parent (or parents). To be able to refer to three by name I'll give them uppercase letter names instead of o. Also, this character-based artwork doesn't have arrows, so you have to imagine them; just remember that they all point left or left-ish, just like the three names.
           o - A       <-- name1
         /
o - o - o - o - B      <-- name2
     \       /
       o - C           <-- name3
When you ask git to change a reference, you simply ask it to stick a new commit ID into the label. In this case, these labels live in refs/heads/ and are thus branch names, so they are supposed to be able to take on new values.
If we tell git to put B into name1, we get this:
           o - A
         /
o - o - o - o - B      <-- name1, name2
     \       /
       o - C           <-- name3
Note that commit A now has no name, and the o to the left of it is found only by finding A... which is hard since A has no name. Commit A has been abandoned, and these two commits have become eligible for "garbage collection". (In git, there's a "ghost name" left behind in the "reflog", that keeps the branch with A around for 30 days in general. But that's a different topic entirely.)
What about telling git to put B into name3? If we do that next, we get this:
           o - A
         /
o - o - o - o - B      <-- name1, name2, name3
     \       /
       o - C
Here, commit C still has a way to find it: start at B and work down-and-left, to its other (second) parent commit, and you find commit C. So commit C is not abandoned.
Updating name1 like this is not a fast-forward, but updating name3 is.
More specifically, a reference-change is a "fast forward" if and only if the object—usually a commit—that the reference used to point-to is still reachable by starting from the new place and working backwards, along all possible backwards paths. In graph terms, it's a fast-forward if the old node is an ancestor of the new one.
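The ancestor test is easy to try out on the little graph above. In this Python sketch the unnamed o commits get made-up names (m1 to m4 on the main line, a0 before A, c0 before C), and exactly where the two side lines fork off the main line is my assumption, since only the A, B and C relationships matter here.

```python
# Commit id -> list of parent ids, for the example graph.
# The m*/a0/c0 names and the fork points are assumptions for illustration.
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"],
    "a0": ["m3"], "A": ["a0"],
    "c0": ["m2"], "C": ["c0"],
    "B": ["m4", "C"],          # B has two parents; C is the second one
}

def is_ancestor(parents, old, new):
    """True if old can be reached from new by following parent links."""
    stack, seen = [new], set()
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False

# Moving name1 from A to B abandons A, so it is not a fast-forward.
print(is_ancestor(parents, "A", "B"))
# Moving name3 from C to B keeps C reachable, so it is a fast-forward.
print(is_ancestor(parents, "C", "B"))
```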
Making a push be a fast-forward, by merging
Branch-name fast-forwards occur when the only thing you do is add new commits; but also when, if you've added new commits, you've also merged-in whatever new commits someone else added. That is, suppose your repo has this in it, after you've made one new commit:
           o   <-- master
          /
...- o - o   <-- origin/master
At this point, moving origin/master "up and right" would be a fast-forward. However, someone else comes along and updates the other (origin) repo, so you do a git fetch and get a new commit from them. Your git moves your origin/master label (in a fast-forward operation on your repo, as it happens):
           o   <-- master
          /
...- o - o - o   <-- origin/master
At this point, moving origin/master to master would not be a fast-forward, as it would abandon that one new commit.
You, however, can do a git merge origin/master operation to make a new commit on your master, with two parent commit IDs. Let's label this one M (for merge):
           o - M   <-- master
          /   /
...- o - o - o   <-- origin/master
You can now git push this back to origin and ask them to set their master, which you are calling origin/master, equal to your (new) M, because for them, that's now a fast-forward operation!
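As a quick sanity check of the before-and-after picture, here is a toy Python walk-through of the same situation. The commit names (p1, p2, mine, theirs) and the ancestors helper are invented for this sketch; it only models the graph, not git itself.

```python
def ancestors(parents, tip):
    """All commits reachable from tip, including tip itself."""
    found = {tip}
    for p in parents[tip]:
        found |= ancestors(parents, p)
    return found

# "...- o - o" plus one commit of yours and one fetched from origin.
parents = {"p1": [], "p2": ["p1"], "mine": ["p2"], "theirs": ["p2"]}
master, origin_master = "mine", "theirs"

# Before merging: their tip is not reachable from yours, so no fast-forward.
before = origin_master in ancestors(parents, master)

# git merge origin/master: a new commit M with two parents becomes your master.
parents["M"] = [master, origin_master]
master = "M"

# After merging: their tip is an ancestor of M, so the push is a fast-forward.
after = origin_master in ancestors(parents, master)
print(before, after)
```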
Note that you can also do a git rebase, but let's leave that for a different stackoverflow posting. :-)
[1] In fact, git references always start out as individual files in various sub-directories, but if a reference doesn't get updated for a long while, it tends to get "packed" (along with all the other mostly-static references) into a single file full of packed references. This is just a time-saving optimization, and the key here is not to depend on the exact implementation, but rather to use git's rev-parse and update-ref commands to extract the current SHA-1 from a reference, or update a reference to contain a new SHA-1.
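Purely to illustrate why depending on the file layout is fragile, here is a rough Python sketch of a lookup that tries the loose file first and then falls back to the packed-refs file. It assumes the usual packed-refs layout (one "ID name" pair per line, a header line starting with #, and extra lines starting with ^ for peeled tags), ignores indirect references, and the name read_ref is made up. Real code should just call git rev-parse.

```python
import os
import tempfile

def read_ref(git_dir, name):
    """Return the ID stored for name: loose file first, then packed-refs."""
    loose = os.path.join(git_dir, name)
    if os.path.isfile(loose):
        with open(loose) as f:
            return f.read().strip()
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed) as f:
            for line in f:
                # Skip the header comment and the "peeled tag" lines.
                if line.startswith(("#", "^")):
                    continue
                sha, _, ref = line.strip().partition(" ")
                if ref == name:
                    return sha
    return None

# Throwaway directory with only a packed-refs file (no loose refs at all).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "packed-refs"), "w") as f:
    f.write("# pack-refs with: peeled fully-peeled sorted\n")
    f.write("7452b4b5786778d5d87f5c90a94fab8936502e20 refs/tags/v2.1.0\n")

print(read_ref(demo_dir, "refs/tags/v2.1.0"))
```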
It only performs a copy, no merge.
More specifically, it copies the parts of the object store that are in the local repo/branch and are missing from the remote side. This includes commit objects, refs, trees and blobs.
Tags are a notable exception: they require the --tags flag to be included.
The blog post "git is simpler than you think" has more detail.