Efficient graph traversal with LINQ - eliminating recursion
You can always eliminate recursion by replicating the basics of how recursion works with a stack.
- place the first item on the top of the stack
- While the stack isn't empty, pop an item off the stack
- if the current node has children, add them to the stack
- Yield return the current item.
- Go to step 1!
Crazy smart theoretical answer: https://stackoverflow.com/a/933979/29093
http://cs.saddleback.edu/rwatkins/CS2B/Lab%20Exercises/Stacks%20and%20Recursion%20Lab.pdf
You are right, walking trees and graphs recursively in code that does yield return
is a big source of inefficiency.
Generally, you rewrite recursive code with a stack - in a similar way to how it is usually implemented in compiled code.
I did not get a chance to try it out, but this should work:
public static IEnumerable<T> Traverse<T>(this IEnumerable<T> enumerable, Func<T, IEnumerable<T>> recursivePropertySelector) {
var stack = new Stack<IEnumerable<T>>();
stack.Push(enumerable);
while (stack.Count != 0) {
enumerable = stack.Pop();
foreach (T item in enumerable) {
yield return item;
var seqRecurse = recursivePropertySelector(item);
if (seqRecurse != null) {
stack.Push(seqRecurse);
}
}
}
}
First off, you are absolutely correct; if the graph has n nodes of average depth d then the naive nested iterators yield a solution which is O(n*d) in time, and O(d) in stack. If d is a large fraction of n then this can become an O(n2) algorithm, and if d is large then you can blow the stack entirely.
If you are interested in a performance analysis of nested iterators, see former C# compiler developer Wes Dyer's blog post:
http://blogs.msdn.microsoft.com/wesdyer/2007/03/23/all-about-iterators
dasblinkenlight's solution is a variation on the standard approach. I would typically write the program like this:
public static IEnumerable<T> Traverse<T>(
T root,
Func<T, IEnumerable<T>> children)
{
var stack = new Stack<T>();
stack.Push(root);
while(stack.Count != 0)
{
T item = stack.Pop();
yield return item;
foreach(var child in children(item))
stack.Push(child);
}
}
And then if you have multiple roots:
public static IEnumerable<T> Traverse<T>(
IEnumerable<T> roots,
Func<T, IEnumerable<T>> children)
{
return from root in roots
from item in Traverse(root, children)
select item ;
}
Now, note that a traversal is not what you want if you have a highly interconnected graph or a cyclic graph! If you have a graph with downward pointing arrows:
A
/ \
B-->C
\ /
D
then the traversal is A, B, D, C, D, C, D. If you have a cyclic or interconnected graph then what you want is the transitive closure.
public static IEnumerable<T> Closure<T>(
T root,
Func<T, IEnumerable<T>> children)
{
var seen = new HashSet<T>();
var stack = new Stack<T>();
stack.Push(root);
while(stack.Count != 0)
{
T item = stack.Pop();
if (seen.Contains(item))
continue;
seen.Add(item);
yield return item;
foreach(var child in children(item))
stack.Push(child);
}
}
This variation only yields items that have not been yielded before.
I also happen to not be very good at eliminating recursion.
I have written a number of articles on ways to eliminate recursion, and about recursive programming in general. If this subject interests you, see:
http://blogs.msdn.com/b/ericlippert/archive/tags/recursion/
In particular:
http://blogs.msdn.com/b/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx
http://blogs.msdn.com/b/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx
http://blogs.msdn.com/b/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx