How does ancestry path work with git log?

Matthieu Moy's answer is correct but may not help you very much, if you haven't been exposed to the necessary graph theory.

DAGs

First, let's take a quick look at Directed Acyclic Graphs, or DAGs. A DAG is just a graph (hence the g), i.e., a collection of nodes and connections between them (think of train stations on rail lines, where the stations are the nodes), that is "directed" (the d: trains only run one way) and has no loops in it (the a).

Linear chains and tree structures are valid DAGs (note: newer commits are to the right, in general, here):

o <- o <- o

or:

       o <- o
      /
o <- o
      \   o
       \ /
        o
         \
          o <- o

(imagine the diagonal connections having arrow heads so that they point up-and-left or down-and-left, as needed).

However, non-tree graphs can have nodes that merge back (these are git's merges):

       o <- o
      /      \
o <- o        \
      \   o    \
       \ /      \
        o        o
         \      /
          o <- o

or:

     o--o
    /    \
o--o      o--o
    \    /
     o--o

(I'm just compressing the notation further here; the nodes still generally point leftward.)

Next, git's .. notation does not mean what most people usually first think it means. In particular, let's take a look at this graph again, add another node, and use some single letters to mark particular nodes:

     o---o
    /     \
A--o       \
    \   B   \
     \ /     \
      o       C--D
       \     /
        o---o

And let's do one more thing: stop thinking about this as just git log, and think instead about the more general case of "selecting revisions with ancestry".

Selecting revisions (commits), with ancestry

If we select revision A, we get just revision A, because it has no ancestors (nothing to the left of it).

If we select revision B we get this piece of the graph:

A--o
    \   B
     \ /
      o

This is because select-with-ancestry means "Take the commit I identify, and all the commits I can get to by following the arrows back out of it." Here the result is somewhat interesting, but not very interesting since there are no merges and following the arrows nets us a linear chain of four commits, starting from B and going back to A.

Selecting either C or D with ancestry, though, gets us much further. Let's see what we get with D:

     o---o
    /     \
A--o       \
    \       \
     \       \
      o       C--D
       \     /
        o---o

This is, in fact, everything except commit B. Why didn't we get B? Because the arrows all point leftward: we get D, which points to C, which points to two un-lettered commits; those two point left, and so on, but when we hit the node just left-and-down of B, we aren't allowed to go rightward, against the arrow, so we can't reach B.
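Select-with-ancestry can be sketched as a plain graph walk. This is only a toy model of the drawing above, not how Git stores or walks commits: the names o1 through o6 are made up for the unlabeled o commits, and with_ancestry is a hypothetical helper.

```python
# Toy model of the lettered graph: each commit maps to its parents
# (the commits its arrows point back to). o1..o6 are invented names.
PARENTS = {
    "A": [],
    "o1": ["A"],        # the o just right of A
    "o2": ["o1"],       # top row, left
    "o3": ["o2"],       # top row, right
    "o4": ["o1"],       # the o down-and-left of B
    "B": ["o4"],
    "o5": ["o4"],       # bottom row, left
    "o6": ["o5"],       # bottom row, right
    "C": ["o3", "o6"],  # the merge
    "D": ["C"],
}


def with_ancestry(start, parents=PARENTS):
    """Return start plus every commit reachable by following the arrows back."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


print(sorted(with_ancestry("A")))   # ['A']
print(sorted(with_ancestry("B")))   # ['A', 'B', 'o1', 'o4']
print("B" in with_ancestry("D"))    # False: we can never walk rightward to B
```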

Two-dot notation

Now, the two-dot notation in git is really just shorthand syntax for set subtraction.[1] That is, if we write B..D for instance, it means: "Select D with ancestry, and then select B with ancestry, and then give me the set of commits from the D selection after excluding (subtracting away) all commits from the B selection."

Selecting D with ancestry gets the entire graph except for the B commit. Subtracting away the B selection removes A, the two o nodes we drew earlier, and B. How can we remove B when it's not in the set? Easy: we just pretend to remove it and say we're done! That is, set subtraction only bothers to remove things that are actually in the set.

The result for B..D is therefore this graph:

     o---o
          \
           \
            \
             \
              C--D
             /
        o---o
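As a rough sketch, B..D is literally a set difference over the two ancestry selections. The graph encoding and the helper names (with_ancestry, two_dot, o1 through o6) are my own illustration, not anything Git exposes.

```python
# Toy model of the lettered graph; o1..o6 are invented names for the o commits.
PARENTS = {
    "A": [], "o1": ["A"], "o2": ["o1"], "o3": ["o2"],
    "o4": ["o1"], "B": ["o4"], "o5": ["o4"], "o6": ["o5"],
    "C": ["o3", "o6"], "D": ["C"],
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def two_dot(exclude, include):
    """Model of exclude..include: include's ancestry minus exclude's ancestry."""
    return with_ancestry(include) - with_ancestry(exclude)


print(sorted(two_dot("B", "D")))  # ['C', 'D', 'o2', 'o3', 'o5', 'o6']
```

Note that B itself never shows up in the result: it was not in D's ancestry to begin with, so subtracting it is a no-op.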

Three-dot notation

The three-dot notation is different. It's more useful in a simple branch-y graph, perhaps even a straight tree. Let's start with the tree-like graph this time and look at both two- and three-dot notation. Here's our tree-like graph, with some single letter names for nodes put in:

     o--I
    /
G--H
    \   J
     \ /
      K
       \
        o--L

This time I've added extra letters because we'll need to talk about some of the places the commits "join up", in particular at nodes H and K.

Using two-dot notation, what do we get for L..I? To find the answer, start at node I and work backwards. You must always move leftward, even if you also go up or down. These are the commits that are selected. Then, start at node L and work backwards, finding the nodes to un-select; if you come across any earlier selected ones, toss them out. (Making the final list is left as an exercise, though I'll put the answer in as a footnote.[2])

Now let's see the three-dot notation in action. What it does is a bit complicated, because it must find the merge base between two branches in the graph. The merge base has a formal definition,[3] but for our purposes it's just: "The point where, when following the graph backwards, we meet up at some commit."

In this case, for instance, if we ask for L...I or I...L—both produce the same result—git finds all commits that are reachable from either commit, but not from both. That is, it excludes the merge base and all earlier commits, but keeps the commits beyond that point.

The merge base of L and I (or I and L) is commit H, so we get things after H, but not H itself, and we cannot reach node J from either I or L since it's not in their ancestry. Hence, the result for I...L or L...I is:

     o--I
 



      K
       \
        o--L

(Note that these histories do not join up, since we tossed out node H.)
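In set terms, "reachable from either commit, but not from both" is a symmetric difference. Here is a small sketch on the tree-like graph; i1 and l1 are made-up names for the two unlabeled o commits, and the helpers are hypothetical.

```python
# Toy model of the tree-like graph: commit -> parents.
PARENTS = {
    "G": [], "H": ["G"],
    "i1": ["H"], "I": ["i1"],   # upper branch
    "K": ["H"], "J": ["K"],     # J hangs off K
    "l1": ["K"], "L": ["l1"],   # lower branch
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def three_dot(x, y):
    """Model of x...y: reachable from either commit, but not from both."""
    return with_ancestry(x) ^ with_ancestry(y)


print(sorted(three_dot("I", "L")))                 # ['I', 'K', 'L', 'i1', 'l1']
print(three_dot("I", "L") == three_dot("L", "I"))  # True: the order does not matter
print(sorted(with_ancestry("I") - with_ancestry("L")))  # L..I: ['I', 'i1']
```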

--ancestry-path

Now, all these are ordinary selection operations. None have been modified with --ancestry-path. The documentation for git log and git rev-list—these two are almost the same command, except for their output format—describes --ancestry-path this way:

When given a range of commits to display (e.g. commit1..commit2 or commit2 ^commit1), only display commits that exist directly on the ancestry chain between the commit1 and commit2, i.e. commits that are both descendants of commit1, and ancestors of commit2.

We define ancestors here in terms of the commit DAG: a first commit is a direct ancestor of a second if the second has an arrow pointing back at the first, and an indirect ancestor if the second points back at the first through some chain of commits. (For selection purposes a commit is also considered an ancestor of itself.)

Descendants (also sometimes called children) are defined similarly, but by going against the arrows in the graph. A commit is a child (or descendant) of another commit if there's a path from it back to that other commit, following the arrows.

Note that the description of --ancestry-path talks about using the two-dot notation, not the three-dot notation, probably because the implementation of the three-dot notation is a little bit weird inside. As noted earlier, B...D excludes (as if with a leading ^) the merge base (or bases, if there is more than one) of the two commits, so the merge base is the one that plays the "must be child-of" role. I'll mention how --ancestry-path works with this, though I'm not sure how useful it is in "real world" examples.

Practical examples

What does this mean in practice? Well, it depends on the arguments you give, and the actual commit DAG. Let's look at the funky loopy graph again:

     o---o
    /     \
A--o       \
    \   B   \
     \ /     \
      o       C--D
       \     /
        o---o

Suppose we ask for B..D here without --ancestry-path. This means we take commit D and its ancestors, but exclude B and its ancestors, just as we saw before. Now let's add --ancestry-path. Everything we had earlier was an ancestor of D, and that's still true, but this new flag says we must also toss out commits that are not children of B.

How many children does node B have? Well, none! So we must toss out every commit, giving us a completely empty list.
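The extra filter can be modeled as "keep only the commits that can walk back to the excluded commit". This is a sketch under my own encoding of the graph (o1 through o6 are invented names), not Git's actual code.

```python
# Toy model of the loopy graph: commit -> parents.
PARENTS = {
    "A": [], "o1": ["A"], "o2": ["o1"], "o3": ["o2"],
    "o4": ["o1"], "B": ["o4"], "o5": ["o4"], "o6": ["o5"],
    "C": ["o3", "o6"], "D": ["C"],
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def ancestry_path(exclude, include):
    """Model of --ancestry-path exclude..include."""
    candidates = with_ancestry(include) - with_ancestry(exclude)
    # A commit is a descendant of `exclude` if `exclude` is in its ancestry.
    return {c for c in candidates if exclude in with_ancestry(c)}


print(ancestry_path("B", "D"))           # set(): nothing descends from B
print(sorted(ancestry_path("o4", "D")))  # ['C', 'D', 'o5', 'o6']
```

The second call uses the o just down-and-left of B as the excluded commit, to show a non-empty result: the top-row commits drop out because they never pass through that node.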


What if we ask for B...D, without --ancestry-path? That gives us everything reachable from either D or B, but excludes everything reachable from both D and B:

     o---o
          \
           \
        B   \
             \
              C--D
             /
        o---o

This is the same as B..D except that we get node B as well.

[Note: the section below on mixing --ancestry-path with B...D was wrong for almost a year, between April 2016 and Feb 2017. It has been fixed to note that the "must be child" part starts from the merge base(s), not from the left side of the B...D notation.]

Suppose we add --ancestry-path here. We start with the same graph we just got for B...D without --ancestry-path, but then discard items that are not children of the merge base. The merge base is the o just to the left of B. The top row o commits are not children of this node, so they are discarded. Again, as with ancestors, we consider a node its own child, so we would keep this node itself—giving this partial result:

        B
       /
      o       C--D
       \     /
        o---o

But, while --ancestry-path keeps the children of this merge base node, the merge base node itself, down-and-left of B, was not in the B...D graph in the first place. Hence, the final result (actually tested in Git 2.10.1) is:

        B

              C--D
             /
        o---o

(Again, I'm not really sure how useful this is in practice. The starting graph, again, is that of B...D: everything reachable from either commit, minus everything reachable from both commits: this works by discarding starting from every merge base, if there are two or more. The child-of checking code also handles a list of commits. It retains everything that is a child of any of the merge bases, if there are multiple merge bases. See the function limit_to_ancestry in revision.c.)
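To make the three-dot case concrete, here is a toy model that finds the merge base(s), takes the symmetric difference, and then keeps only descendants of a merge base. It mirrors the description above rather than the real limit_to_ancestry code; all helper names and the o1 through o6 labels are assumptions of mine.

```python
# Toy model of the loopy graph: commit -> parents.
PARENTS = {
    "A": [], "o1": ["A"], "o2": ["o1"], "o3": ["o2"],
    "o4": ["o1"], "B": ["o4"], "o5": ["o4"], "o6": ["o5"],
    "C": ["o3", "o6"], "D": ["C"],
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = with_ancestry(x) & with_ancestry(y)
    return {c for c in common
            if not any(c != other and c in with_ancestry(other)
                       for other in common)}


def three_dot_ancestry_path(x, y):
    """Model of --ancestry-path x...y."""
    candidates = with_ancestry(x) ^ with_ancestry(y)
    bases = merge_bases(x, y)
    # Keep commits that have at least one merge base in their ancestry.
    return {c for c in candidates if bases & with_ancestry(c)}


print(merge_bases("B", "D"))                      # {'o4'}
print(sorted(three_dot_ancestry_path("B", "D")))  # ['B', 'C', 'D', 'o5', 'o6']
```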

Thus, it depends on the graph and the selectors

The final action of X..Y or X...Y, with or without --ancestry-path, depends on the commit graph. To predict it, you must draw the graph. (Use git log --graph, perhaps with --oneline --decorate --all, or use a viewer that draws the graph for you.)


[1] There's an exception in git diff, which does its own special handling for X..Y and X...Y. When you are not using git diff you should just ignore its special handling.

[2] We start with I and the o to its left, and also H and G. Then we lose H and G when we work back from L, so the result is just o--I.

[3] The formal definition is that the merge base is the Lowest Common Ancestor, or LCA, of the given nodes in the graph. In some graphs there may be multiple LCAs; for Git, these are all merge bases, and X...Y will exclude all of them.

It's interesting / instructive to run git rev-parse B...D for the graph I drew. The commit hashes depend not just on the graph itself and the commit contents, but also on the time stamps at which the commits are made, so if you build this same graph, you will get different hashes. Here are the ones I got while revising the answer to fix the description of --ancestry-path interacting with B...D:

$ git rev-parse B...D
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2
7f0b666b4098282301a9f95e056a646483c2e5fc
^843eaf75d78520f9a569da35d4e561a036a7f107

but we can see that these are D, B, and the merge base, in that order, using several more commands:

$ git rev-parse B     # this produces the middle hash
7f0b666b4098282301a9f95e056a646483c2e5fc

and:

$ git rev-parse D     # this produces the first hash
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2

and:

$ git merge-base B D  # this produces the last, negated, hash
843eaf75d78520f9a569da35d4e561a036a7f107

Graphs with multiple merge bases do occur, but they're somewhat harder to construct. The easy way is with "criss cross" merges, where you run git checkout br1; git merge br2; git checkout br2; git merge br1^ (br1^ being the tip br1 had before its merge; merging br1 itself would just fast-forward). If you get this situation and run git rev-parse on the three-dot expression you will see several negated hashes, one per merge base. Run git merge-base --all on the same two commits and you will see the same set of merge bases.
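A criss-cross can be modeled as two merges that each take the other branch's pre-merge tip as a parent. The sketch below uses a made-up five-commit graph (root, x1, y1, x2, y2 are hypothetical names) to show why two merge bases fall out.

```python
# Toy criss-cross: commit -> parents.
PARENTS = {
    "root": [],
    "x1": ["root"],      # tip of br1 before any merge
    "y1": ["root"],      # tip of br2 before any merge
    "x2": ["x1", "y1"],  # br1 after merging br2's old tip
    "y2": ["y1", "x1"],  # br2 after merging br1's old tip
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = with_ancestry(x) & with_ancestry(y)
    return {c for c in common
            if not any(c != other and c in with_ancestry(other)
                       for other in common)}


print(sorted(merge_bases("x2", "y2")))  # ['x1', 'y1']
```

Neither x1 nor y1 is an ancestor of the other, so both qualify; root is dropped because it is an ancestor of both.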


As the documentation says, --ancestry-path removes commits that are not descendants of origin/master. If you have a local, unmerged branch, and this branch is based on a commit that is older than origin/master, then the commits in this branch will not be shown, because they are not descendants of origin/master.


Git 2.38 (Q3 2022) illustrates how git log --ancestry-path works, and extends that --ancestry-path option with a value.

"git rev-list --ancestry-path=C A..B" is a natural extension of "git rev-list A..B";
instead of choosing a subset of A..B to those that have ancestry relationship with A, it lets a subset with ancestry relationship with C.

See commit 1838e21 (19 Aug 2022) by Derrick Stolee (derrickstolee).
See commit 257418c, commit 11ea33c (19 Aug 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 0b08ba7, 29 Aug 2022)

revision: allow --ancestry-path to take an argument

Signed-off-by: Elijah Newren
Acked-by: Derrick Stolee

We have long allowed users to run e.g.

git log --ancestry-path master..seen

which shows all commits which satisfy all three of these criteria:

  • are an ancestor of seen
  • are not an ancestor of master
  • have master as an ancestor

This commit allows another variant:

git log --ancestry-path=$TOPIC master..seen

which shows all commits which satisfy all of these criteria:

  • are an ancestor of seen
  • are not an ancestor of master
  • have $TOPIC in their ancestry-path

that last bullet can be defined as commits meeting any of these criteria:

  • are an ancestor of $TOPIC
  • have $TOPIC as an ancestor
  • are $TOPIC

This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path.

rev-list-options now includes in its man page:

--ancestry-path[=<commit>]

When given a range of commits to display (e.g. 'commit1..commit2' or 'commit2 ^commit1'), only display commits in that range that are ancestors of <commit>, descendants of <commit>, or <commit> itself.

If no commit is specified, use 'commit1' (the excluded part of the range) as <commit>.

Can be passed multiple times; if so, a commit is included if it is any of the commits given or if it is an ancestor or descendant of one of them.

As an example use case, consider the following commit history:

-----------------------------------------------------------------------
      D---E-------F
     /     \       \
    B---C---G---H---I---J
   /                     \
  A-------K---------------L--M
-----------------------------------------------------------------------

When we want to find out what commits in M are contaminated with the bug introduced by D and need fixing, however, we might want to view only the subset of 'D..M' that are actually descendants of D, i.e. excluding C and K.

This is exactly what the --ancestry-path option does.
Applied to the 'D..M' range, it results in:

-----------------------------------------------------------------------
      E-------F
       \       \
        G---H---I---J
                     \
                      L--M
-----------------------------------------------------------------------

rev-list-options now includes in its man page:

We can also use --ancestry-path=D instead of --ancestry-path which means the same thing when applied to the 'D..M' range but is just more explicit.

If we instead are interested in a given topic within this range, and all commits affected by that topic, we may only want to view the subset of D..M which contain that topic in their ancestry path.

So, using --ancestry-path=H D..M for example would result in:

-----------------------------------------------------------------------
      E
       \
        G---H---I---J
                     \
                      L--M
-----------------------------------------------------------------------

Whereas --ancestry-path=K D..M would result in

-----------------------------------------------------------------------
K---------------L--M
-----------------------------------------------------------------------
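The documented rule can be sketched on the A..M history as follows. This is a toy model that applies the man page wording literally to two of the cases (plain --ancestry-path and --ancestry-path=K); the function names and the pivots parameter are hypothetical, not Git's API.

```python
# Toy model of the man page's example history: commit -> parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"], "F": ["E"],
    "G": ["C", "E"], "H": ["G"], "I": ["H", "F"], "J": ["I"],
    "K": ["A"], "L": ["K", "J"], "M": ["L"],
}


def with_ancestry(start):
    """Return start plus all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def ancestry_path(exclude, include, pivots=None):
    """Model of --ancestry-path[=<commit>] applied to exclude..include."""
    pivots = pivots or [exclude]  # no argument: use the excluded commit
    candidates = with_ancestry(include) - with_ancestry(exclude)
    # Keep commits that are a pivot, an ancestor of one, or a descendant of one.
    return {c for c in candidates
            if any(c in with_ancestry(p) or p in with_ancestry(c)
                   for p in pivots)}


print(sorted(ancestry_path("D", "M")))         # ['E', 'F', 'G', 'H', 'I', 'J', 'L', 'M']
print(sorted(ancestry_path("D", "M", ["K"])))  # ['K', 'L', 'M']
```

Passing several pivots models repeating the option: a commit is kept if it relates to any one of them.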
