How does ancestry path work with git log?
Matthieu Moy's answer is correct but may not help you very much, if you haven't been exposed to the necessary graph theory.
DAGs
First, let's take a quick look at Directed Acyclic Graphs or DAGs. A DAG is just a graph (hence the g), i.e., a collection of nodes and connections between them (these work like train stations on rail lines, for instance, where the stations are the nodes), that is "directed" (the d: trains only run one way) and has no loops in it (the a).
Linear chains and tree structures are valid DAGs (note: newer commits are to the right, in general, here):
o <- o <- o
or:
       o <- o
      /
o <- o
      \   o
       \ /
        o
         \
          o <- o
(imagine the diagonal connections having arrow heads so that they point up-and-left or down-and-left, as needed).
However, non-tree graphs can have nodes that merge back (these are git's merges):
       o <- o
      /      \
o <- o        \
      \   o    \
       \ /      \
        o        o
         \      /
          o <- o
or:
     o--o
    /    \
o--o      o--o
    \    /
     o--o
(I'm just compressing the notation further here, nodes still generally point leftward).
Next, git's .. notation does not mean what most people usually first think it means. In particular, let's take a look at this graph again, add another node, and use some single letters to mark particular nodes:
        o---o
       /     \
   A--o       \
       \   B   \
        \ /     \
         o       C--D
          \     /
           o---o
And, let's do one more thing, and stop thinking about this as just git log but rather the more general case of "selecting revisions with ancestry".
Selecting revisions (commits), with ancestry
If we select revision A, we get just revision A, because it has no ancestors (nothing to the left of it).
If we select revision B we get this piece of the graph:
   A--o
       \   B
        \ /
         o
This is because select-with-ancestry means "Take the commit I identify, and all the commits I can get to by following the arrows back out of it." Here the result is somewhat interesting, but not very interesting since there are no merges and following the arrows nets us a linear chain of four commits, starting from B and going back to A.
Selecting either C or D with ancestry, though, gets us much further. Let's see what we get with D:
        o---o
       /     \
   A--o       \
       \       \
        \       \
         o       C--D
          \     /
           o---o
This is, in fact, everything except commit B. Why didn't we get B? Because the arrows all point leftward: we get D, which points to C, which points to two un-lettered commits; those two point left, and so on, but when we hit the node just left-and-down of B, we aren't allowed to go rightward, against the arrow, so we can't reach B.
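As a rough sketch of "select with ancestry", the lettered graph can be modelled as a table of parent links and walked backwards. The names o1, m, t1, t2, b1 and b2 are labels invented here for the un-lettered o nodes, and the function is only an illustration of the idea, not how Git implements it.

```python
# Each commit maps to the commits its arrows point back at (its parents).
# o1, m, t1, t2, b1, b2 are invented labels for the un-lettered "o" nodes.
PARENTS = {
    "A": [],
    "o1": ["A"],                    # the o just right of A
    "t1": ["o1"], "t2": ["t1"],     # top row
    "m": ["o1"],                    # the o just left-and-down of B
    "B": ["m"],
    "b1": ["m"], "b2": ["b1"],      # bottom row
    "C": ["t2", "b2"],              # the merge commit
    "D": ["C"],
}


def ancestry(start, parents=PARENTS):
    """Return start plus every commit reachable by following the arrows back."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


print(sorted(ancestry("A")))
print(sorted(ancestry("B")))
print("B" in ancestry("D"))
```

The walk never moves against an arrow, which is why B can never show up when starting from D.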
Two-dot notation
Now, the two-dot notation in git is really just shorthand syntax for set subtraction.1 That is, if we write B..D for instance, it means: "Select D with ancestry, and then select B with ancestry, and then give me the set of commits from the D selection after excluding (subtracting away) all commits from the B selection."
Selecting D with ancestry gets the entire graph except for the B commit. Subtracting away the B selection removes A, the two o nodes we drew earlier, and B. How can we remove B when it's not in the set? Easy: we just pretend to remove it and say we're done! That is, set subtraction only bothers to remove things that are actually in the set.
The result for B..D is therefore this graph:
        o---o
             \
              \
               \
                \
                 C--D
                /
           o---o
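The set-subtraction reading of B..D can be checked with a small model. This is a sketch under the same assumptions as the diagrams: the graph is a hand-written parent table, and o1, m, t1, t2, b1, b2 are invented labels for the un-lettered o nodes.

```python
# Parent links for the lettered graph; the "o" labels are invented for illustration.
PARENTS = {
    "A": [], "o1": ["A"],
    "t1": ["o1"], "t2": ["t1"],
    "m": ["o1"], "B": ["m"],
    "b1": ["m"], "b2": ["b1"],
    "C": ["t2", "b2"], "D": ["C"],
}


def ancestry(start):
    """Return start plus everything reachable by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def two_dot(exclude, include):
    """Model of exclude..include: ancestry of include minus ancestry of exclude."""
    return ancestry(include) - ancestry(exclude)


print(sorted(two_dot("B", "D")))
```

Python's set difference behaves the way the text describes: removing B from a set that never held it is simply a no-op.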
Three-dot notation
The three-dot notation is different. It's more useful in a simple branch-y graph, perhaps even a straight tree. Let's start with the tree-like graph this time and look at both two- and three-dot notation. Here's our tree-like graph, with some single letter names for nodes put in:
       o--I
      /
  G--H
      \   J
       \ /
        K
         \
          o--L
This time I've added extra letters because we'll need to talk about some of the places the commits "join up", in particular at nodes H and K.
Using two-dot notation, what do we get for L..I? To find the answer, start at node I and work backwards. You must always move leftward, even if you also go up or down. These are the commits that are selected. Then, start at node L and work backwards, finding the nodes to un-select; if you come across any earlier selected ones, toss them out. (Making the final list is left as an exercise, though I'll put the answer in as a footnote.2)
Now let's see the three-dot notation in action. What it does is a bit complicated, because it must find the merge base between two branches in the graph. The merge base has a formal definition,3 but for our purposes it's just: "The point where, when following the graph backwards, we meet up at some commit."
In this case, for instance, if we ask for L...I or I...L (both produce the same result), git finds all commits that are reachable from either commit, but not from both. That is, it excludes the merge base and all earlier commits, but keeps the commits beyond that point.
The merge base of L and I (or I and L) is commit H, so we get things after H, but not H itself, and we cannot reach node J from either I or L since it's not in their ancestry. Hence, the result for I...L or L...I is:
       o--I
        K
         \
          o--L
(Note that these histories do not join up, since we tossed out node H.)
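Both notations can be compared side by side on the tree-like graph with a short sketch. It treats three-dot as a symmetric difference of the two ancestry sets, which matches the "reachable from either, but not from both" description; oI and oL are invented labels for the un-lettered o nodes left of I and L.

```python
# Parent links for the tree-like graph; oI and oL are invented labels.
PARENTS = {
    "G": [], "H": ["G"],
    "oI": ["H"], "I": ["oI"],
    "K": ["H"], "J": ["K"],
    "oL": ["K"], "L": ["oL"],
}


def ancestry(start):
    """Return start plus everything reachable by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def two_dot(exclude, include):
    """Model of exclude..include as set subtraction."""
    return ancestry(include) - ancestry(exclude)


def three_dot(left, right):
    """Model of left...right: reachable from either side, but not from both."""
    return ancestry(left) ^ ancestry(right)


print(sorted(two_dot("L", "I")))
print(sorted(three_dot("I", "L")))
```

Swapping the arguments of three_dot gives the same set, while swapping them for two_dot does not.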
--ancestry-path
Now, all these are ordinary selection operations. None have been modified with --ancestry-path. The documentation for git log and git rev-list (these two are almost the same command, except for their output format) describes --ancestry-path this way:
When given a range of commits to display (e.g. commit1..commit2 or commit2 ^commit1), only display commits that exist directly on the ancestry chain between the commit1 and commit2, i.e. commits that are both descendants of commit1, and ancestors of commit2.
We define ancestors here in terms of the commit DAG: a first commit is a direct ancestor of a second if the second has an arrow pointing back at the first, and an indirect ancestor if the second points back at the first through some chain of commits. (For selection purposes a commit is also considered an ancestor of itself.)
Descendants (also sometimes called children) are defined similarly, but by going against the arrows in the graph. A commit is a child (or descendant) of another commit if following the arrows back from it eventually reaches that other commit.
Note that the description of --ancestry-path talks about using the two-dot notation, not the three-dot notation, probably because the implementation of the three-dot notation is a little bit weird inside. As noted earlier, B...D excludes (as if with leading ^) the merge base (or bases, if there is more than one) of the two commits, so the merge base is the one that plays the "must be child-of" role. I'll mention how --ancestry-path works with this, though I'm not sure how useful it is in "real world" examples.
Practical examples
What does this mean in practice? Well, it depends on the arguments you give, and the actual commit DAG. Let's look at the funky loopy graph again:
        o---o
       /     \
   A--o       \
       \   B   \
        \ /     \
         o       C--D
          \     /
           o---o
Suppose we ask for B..D here without --ancestry-path. This means we take commit D and its ancestors, but exclude B and its ancestors, just as we saw before. Now let's add --ancestry-path. Everything we had earlier was an ancestor of D, and that's still true, but this new flag says we must also toss out commits that are not children of B.
How many children does node B have? Well, none! So we must toss out every commit, giving us a completely empty list.
What if we ask for B...D, without the special --ancestry-path notation? That gives us everything reachable from either D or B, but excludes everything reachable from both D and B:
        o---o
             \
              \
           B   \
                \
                 C--D
                /
           o---o
This is the same as B..D except that we get node B as well.
[Note: the section below on mixing --ancestry-path with B...D was wrong for almost a year, between April 2016 and Feb 2017. It has been fixed to note that the "must be child" part starts from the merge base(s), not from the left side of the B...D notation.]
Suppose we add --ancestry-path here. We start with the same graph we just got for B...D without --ancestry-path, but then discard items that are not children of the merge base. The merge base is the o just to the left of B. The top row o commits are not children of this node, so they are discarded. Again, as with ancestors, we consider a node its own child, so we would keep this node itself, giving this partial result:
           B
          /
         o       C--D
          \     /
           o---o
But, while we are (or --ancestry-path is) discarding commits that are not children of this merge base node, the merge base node itself, to the down-and-left of B, was not in the B...D graph in the first place. Hence, the final result (actually tested in Git 2.10.1) is:
           B
                 C--D
                /
           o---o
(Again, I'm not really sure how useful this is in practice. The starting graph, again, is that of B...D: everything reachable from either commit, minus everything reachable from both commits. This works by discarding starting from every merge base, if there are two or more. The child-of checking code also handles a list of commits. It retains everything that is a child of any of the merge bases, if there are multiple merge bases. See the function limit_to_ancestry in revision.c.)
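The two --ancestry-path results above can be reproduced with a simplified model. This is only a sketch of the rule as described in this answer (keep the selected commits that are descendants of some "bottom" commit), not a transcription of limit_to_ancestry; the o labels and the helper names are invented for illustration.

```python
# Parent links for the lettered graph; the "o" labels are invented for illustration.
PARENTS = {
    "A": [], "o1": ["A"],
    "t1": ["o1"], "t2": ["t1"],
    "m": ["o1"], "B": ["m"],
    "b1": ["m"], "b2": ["b1"],
    "C": ["t2", "b2"], "D": ["C"],
}


def ancestry(start):
    """Return start plus everything reachable by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestry(x) & ancestry(y)
    return {c for c in common
            if not any(c != d and c in ancestry(d) for d in common)}


def ancestry_path(selected, bottoms):
    """Keep only selected commits that descend from at least one bottom commit."""
    return {c for c in selected if any(b in ancestry(c) for b in bottoms)}


two_dot = ancestry("D") - ancestry("B")      # B..D
three_dot = ancestry("D") ^ ancestry("B")    # B...D

print(sorted(ancestry_path(two_dot, {"B"})))                    # B..D case
print(sorted(ancestry_path(three_dot, merge_bases("B", "D"))))  # B...D case
```

For B..D the bottom is B itself, which has no children, so nothing survives. For B...D the bottom is the merge base m, which was already excluded from the selection, so only its descendants remain.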
Thus, it depends on the graph and the selectors
The final action of X..Y or X...Y, with or without --ancestry-path, depends on the commit graph. To predict it, you must draw the graph. (Use git log --graph, perhaps with --oneline --decorate --all, or use a viewer that draws the graph for you.)
1 There's an exception in git diff, which does its own special handling for X..Y and X...Y. When you are not using git diff you should just ignore its special handling.
2 We start with I and the o to its left, and also H and G. Then we lose H and G when we work back from L, so the result is just o--I.
3 The formal definition is that the merge base is the Lowest Common Ancestor, or LCA, of the given nodes in the graph. In some graphs there may be multiple LCAs; for Git, these are all merge bases, and X...Y will exclude all of them.
It's interesting / instructive to run git rev-parse B...D for the graph I drew. These commit hashes here depend on not just the graph itself, and the commit, but also the time stamps at which one makes the commits, so if you build this same graph, you will get different hashes, but here are the ones I got while revising the answer to fix the description of --ancestry-path interacting with B...D:
$ git rev-parse B...D
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2
7f0b666b4098282301a9f95e056a646483c2e5fc
^843eaf75d78520f9a569da35d4e561a036a7f107
but we can see that these are D, B, and the merge base, in that order, using several more commands:
$ git rev-parse B # this produces the middle hash
7f0b666b4098282301a9f95e056a646483c2e5fc
and:
$ git rev-parse D # this produces the first hash
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2
and:
$ git merge-base B D # this produces the last, negated, hash
843eaf75d78520f9a569da35d4e561a036a7f107
Graphs with multiple merge bases do occur, but they're somewhat harder to construct. The easy way is with "criss cross" merges, where you run git checkout br1; git merge br2; git checkout br2; git merge br1. If you get this situation and run git rev-list you will see several negated hashes, one per merge base. Run git merge-base --all and you will see the same set of merge bases.
As the documentation says, --ancestry-path removes commits that are not descendants of origin/master. If you have a local, unmerged branch, and this branch is based on a commit which is older than origin/master, then commits in this branch will not be shown because these commits are not descendants of origin/master.
Git 2.38 (Q3 2022) illustrates how git log --ancestry-path works, and extends that --ancestry-path option with a value.
"git rev-list --ancestry-path=C A..B" is a natural extension of "git rev-list A..B"; instead of choosing a subset of A..B to those that have ancestry relationship with A, it lets a subset with ancestry relationship with C.
See commit 1838e21 (19 Aug 2022) by Derrick Stolee (derrickstolee).
See commit 257418c, commit 11ea33c (19 Aug 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 0b08ba7, 29 Aug 2022)
revision: allow --ancestry-path to take an argument
Signed-off-by: Elijah Newren
Acked-by: Derrick Stolee
We have long allowed users to run e.g.
git log --ancestry-path master..seen
which shows all commits which satisfy all three of these criteria:
- are an ancestor of seen
- are not an ancestor of master
- have master as an ancestor
This commit allows another variant:
git log --ancestry-path=$TOPIC master..seen
which shows all commits which satisfy all of these criteria:
- are an ancestor of seen
- are not an ancestor of master
- have $TOPIC in their ancestry-path
that last bullet can be defined as commits meeting any of these criteria:
- are an ancestor of $TOPIC
- have $TOPIC as an ancestor
- are $TOPIC
This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path.
rev-list-options now includes in its man page:
--ancestry-path[=<commit>]
When given a range of commits to display (e.g. 'commit1..commit2' or 'commit2 ^commit1'), only display commits in that range that are ancestors of <commit>, descendants of <commit>, or <commit> itself. If no commit is specified, use 'commit1' (the excluded part of the range) as <commit>. Can be passed multiple times; if so, a commit is included if it is any of the commits given or if it is an ancestor or descendant of one of them.
As an example use case, consider the following commit history:
       D---E-------F
      /     \       \
     B---C---G---H---I---J
    /                     \
   A-------K---------------L--M
When we want to find out what commits in M are contaminated with the bug introduced by D and need fixing, however, we might want to view only the subset of 'D..M' that are actually descendants of D, i.e. excluding C and K.
This is exactly what the --ancestry-path option does. Applied to the 'D..M' range, it results in:
           E-------F
            \       \
             G---H---I---J
                          \
                           L--M
rev-list-options
now includes in its man page:
We can also use --ancestry-path=D instead of --ancestry-path which means the same thing when applied to the 'D..M' range but is just more explicit.
If we instead are interested in a given topic within this range, and all commits affected by that topic, we may only want to view the subset of D..M which contain that topic in their ancestry path.
So, using --ancestry-path=H D..M for example would result in:
           E
            \
             G---H---I---J
                          \
                           L--M
Whereas --ancestry-path=K D..M would result in
           K---------------L--M
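The rule quoted from the man page (keep commits in the range that are ancestors of, descendants of, or equal to the given commit) can be tried on the example history with a small model. This is a sketch of the documented wording only, not Git's implementation; the parent table is transcribed by hand from the diagram and the function names are invented. It is exercised here on the default case and the K case.

```python
# Parent links transcribed from the example history in the quoted man page.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"],
    "E": ["D"], "F": ["E"], "G": ["C", "E"], "H": ["G"],
    "I": ["H", "F"], "J": ["I"], "K": ["A"],
    "L": ["K", "J"], "M": ["L"],
}


def ancestry(start):
    """Return start plus everything reachable by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def ancestry_path(exclude, include, topic=None):
    """Model of --ancestry-path[=topic] exclude..include, per the quoted wording."""
    if topic is None:
        topic = exclude  # no value given: use the excluded end of the range
    in_range = ancestry(include) - ancestry(exclude)
    return {c for c in in_range
            if c in ancestry(topic) or topic in ancestry(c)}


print(sorted(ancestry_path("D", "M")))
print(sorted(ancestry_path("D", "M", topic="K")))
```

Because ancestry() counts a commit as its own ancestor, the "or the commit itself" clause needs no separate check.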