Functional Breadth First Search
It is pretty OK to have a mutable state hidden inside the function. If it is not visible, then it doesn't exist. I usually use hash sets for this. But in general, you should stick to this if your profiling pinpointed that. Otherwise, just use set data structure. OCaml has an excellent Set based on eagerly balanced AVL trees.
You have to keep track of the nodes you visit. Lists are not king in the ML family, they're just one of the oligarchs. You should just use a set (tree based) to track the visited nodes. This will add a log factor compared to mutating the node state, but is so much cleaner it's not funny. If you know more about your nodes you can possibly eliminate the log factor by using a set not based on a tree (a bit vector say).
See example implementation of BFS, with explanation in Martin Erwig: Inductive Graphs and Functional Graph Algorithms. Also, DFS implementation, based on David King , John Launchbury: Structuring Depth-First Search Algorithms in Haskell
(Hint for the S.O. police: yes, this looks like a link-only answer, but that is how science works - you have to actually read the papers, re-typing their abstracts isn't very useful.)
One option is to use inductive graphs, which are a functional way of representing and working with arbitrary graph structures. They are provided by Haskell's fgl library and described in "Inductive Graphs and Funtional Graph Algorithms" by Martin Erwig.
For a gentler introduction (with illustrations!), see my blog post Generating Mazes with Inductive Graphs.
The trick with inductive graphs is that they let you pattern match on graphs. The common functional idiom for working with lists is to decompose them into a head element and the rest of the list, then recurse on that:
map f [] = []
map f (x:xs) = f x : map f xs
Inductive graphs let you do the same thing, but for graphs. You can decompose an inductive graph into a node, its edges and the rest of the graph.
(source: jelv.is)
Here we match on the node 1
and all of its edges (highlighted in blue), separate from the rest of the graph.
This lets us write a map
for graphs (in Haskellish pseudocode that can be realized with pattern synonyms):
gmap f Empty = Empty
gmap f ((in, node, out) :& rest) = f (in, node, out) :& gmap f rest
The main shortcoming of this approach as opposed to lists is that graphs do not have a single natural way to decompose: the same graph can be built up in multiple ways. The map code above would visit all the vertices, but in an arbitrary (implementation-dependent) order.
To overcome this, we add another construct: a match
function that takes a specific node. If that node is in our graph, we get a successful match just like above; if it isn't, the whole match fails.
This construct is enough to write a DFS or a BFS—with elegant code that looks almost identical for both!
Instead of manually marking nodes as visited, we just recurse on the rest of the graph except the node we're seeing now: at each step, we're working with a smaller and smaller portion of the original graph. If we try to access a node we've already seen with match
, it won't be in the remaining graph and that branch will fail. This lets our graph-processing code look just like our normal recursive functions over lists.
Here's a DFS for this sort of graph. It keeps the stack of nodes to visit as a list (the frontier), and takes the initial frontier to start. The output is a list of nodes traversed in order. (The exact code here can't be written with the library directly without some custom pattern synonyms.)
dfs _frontier Empty = []
dfs [] _graph = []
dfs (n:ns) (match n -> Just (ctx, rest)) = -- not visited n
dfs (neighbors' ctx ++ ns) rest
dfs (n:ns) graph = -- visited n
dfs ns graph
A pretty simple recursive function. To turn it into a breadth-first search, all we have to do is replace our stack frontier with a queue: instead of putting the neighbors on the front of the list, we put them on the back:
bfs _frontier Empty = []
bfs [] _graph = []
bfs (n:ns) (match n -> Just (ctx, rest)) = -- not visited n
bfs (ns ++ neighbors' ctx) rest
bfs (n:ns) graph = -- visited n
bfs ns graph
Yep, that's all we need! We don't have to do anything special to keep track of the nodes we visited as we recurse over the graph, just like we don't have to keep track of the list cells we've visited: each time we recurse, we're only getting the part of the graph we haven't seen.