Efficient graph traversal with LINQ - eliminating recursion

You can always eliminate recursion by replicating the basics of how recursion works with a stack.

  1. place the first item on the top of the stack
  2. While the stack isn't empty, pop an item off the stack
  3. if the current node has children, add them to the stack
  4. Yield return the current item.
  5. Go to step 1!

Crazy smart theoretical answer: https://stackoverflow.com/a/933979/29093

http://cs.saddleback.edu/rwatkins/CS2B/Lab%20Exercises/Stacks%20and%20Recursion%20Lab.pdf


You are right, walking trees and graphs recursively in code that does yield return is a big source of inefficiency.

Generally, you rewrite recursive code with a stack - in a similar way to how it is usually implemented in compiled code.

I did not get a chance to try it out, but this should work:

public static IEnumerable<T> Traverse<T>(this IEnumerable<T> enumerable, Func<T, IEnumerable<T>> recursivePropertySelector) {
    var stack = new Stack<IEnumerable<T>>();
    stack.Push(enumerable);
    while (stack.Count != 0) {
        enumerable = stack.Pop();
        foreach (T item in enumerable) {
            yield return item;
            var seqRecurse = recursivePropertySelector(item);
            if (seqRecurse != null) {
                stack.Push(seqRecurse);
            }
        }
    }
}

First off, you are absolutely correct; if the graph has n nodes of average depth d then the naive nested iterators yield a solution which is O(n*d) in time, and O(d) in stack. If d is a large fraction of n then this can become an O(n2) algorithm, and if d is large then you can blow the stack entirely.

If you are interested in a performance analysis of nested iterators, see former C# compiler developer Wes Dyer's blog post:

http://blogs.msdn.microsoft.com/wesdyer/2007/03/23/all-about-iterators

dasblinkenlight's solution is a variation on the standard approach. I would typically write the program like this:

public static IEnumerable<T> Traverse<T>(
    T root, 
    Func<T, IEnumerable<T>> children)
{
    var stack = new Stack<T>();
    stack.Push(root);
    while(stack.Count != 0)
    {
        T item = stack.Pop();
        yield return item;
        foreach(var child in children(item))
            stack.Push(child);
    }
}

And then if you have multiple roots:

public static IEnumerable<T> Traverse<T>(
    IEnumerable<T> roots, 
    Func<T, IEnumerable<T>> children)
{
    return from root in roots 
           from item in Traverse(root, children)
           select item ;
}

Now, note that a traversal is not what you want if you have a highly interconnected graph or a cyclic graph! If you have a graph with downward pointing arrows:

          A
         / \
        B-->C
         \ /
          D

then the traversal is A, B, D, C, D, C, D. If you have a cyclic or interconnected graph then what you want is the transitive closure.

public static IEnumerable<T> Closure<T>(
    T root, 
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    stack.Push(root);

    while(stack.Count != 0)
    {
        T item = stack.Pop();
        if (seen.Contains(item))
            continue;
        seen.Add(item);
        yield return item;
        foreach(var child in children(item))
            stack.Push(child);
    }
}

This variation only yields items that have not been yielded before.

I also happen to not be very good at eliminating recursion.

I have written a number of articles on ways to eliminate recursion, and about recursive programming in general. If this subject interests you, see:

http://blogs.msdn.com/b/ericlippert/archive/tags/recursion/

In particular:

http://blogs.msdn.com/b/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx