Recursive Operation in Pandas
Check with networkx
, you need a direction graph with 'root'
to 'leaf'
path
import networkx as nx
G=nx.from_pandas_edgelist(df,source='operator',target='nextval', edge_attr=None, create_using=nx.DiGraph())
road=[]
for n in G:
if G.out_degree(n)==0: #leaf
road.append(nx.shortest_path(G, 1, n))
road
Out[82]: [[1, 2, 4], [1, 3, 5, 6]]
Update
import networkx as nx
G=nx.from_pandas_edgelist(df,source='operator',target='nextval', edge_attr=None, create_using=nx.DiGraph())
road=[]
for n in G:
if G.out_degree(n)==0: #leaf
road.append(list(nx.all_simple_paths(G, 1, n)))
road
Out[509]: [[[1, 3, 5, 6], [1, 6]], [[1, 2, 4]]]
Let's try to hand-roll a solution, because thinking about this kind of recursive algorithm is educational. (Of course it is appropriate to just use an existing library in the real world; it will probably be much more fault-tolerant.)
The code you show builds a recognizable representation of the graph itself, but it would be better to use lists (or sets, or tuple) for the values even when there is only one successor node, for consistency. I would argue that sets make the most sense here, since if there are duplicate entries in the input then we should discard the duplicate graph nodes. So let us suppose we start with:
graph = {1: {2, 3}, 2: {4}, 3: {5}, 5: {6}}
We have agreed to restrict ourselves to considering directed acyclic graphs. I propose that the paths from our root node can be found recursively as follows: recursively check each path from each successor node; accumulate these results, and prepend each with the link from the root to the corresponding successor.
Of course, when we write recursive code we like to avoid side effects, since they make it harder to reason. So let us instead say: the accumulation of all paths, defined as (link from node to successor) + (path from successor to end), for each successor, for each pat from that successor. Of course, the way we represent the "link from node to successor" is just the current node name and an arrow; we get the rest of the path from the recursion, including the successor name.
And then we need a base case: if there are no successors, then we have a single path from here to the end (since we are at an end), which is just that node name by itself. It would be simpler for our code if the dead ends in our graph were represented with empty sets; but it's clearly easier to generate the graph just omitting those keys. So we'll lean on dict.get
rather than indexing when we make the check.
Well, that first part sounds an awful lot like a list comprehension to me (with two for
clauses`. For the base case, to match that, we need a list containing one path. That gives us:
def paths(graph, root):
successors = graph.get(root, set())
if not successors:
return [str(root)] # a single path starting here and going nowhere.
return [
f'{root} -> {path}'
for successor in successors
for path in paths(graph, successor)
]
Let's try it:
>>> paths({1: {2, 3}, 2: {4}, 3: {5}, 5: {6}}, 1)
['1 -> 2 -> 4', '1 -> 3 -> 5 -> 6']
Alternatively, you could use generator expressions instead of list comprehensions, or even write it as a recursive generator (using yield
and yield from
).
(If we are feeling cheeky enough, we can continue the functional-programming theme by using a conditional expression:)
def paths(graph, root):
successors = graph.get(root, set())
return [
f'{root} -> {path}'
for successor in successors
for path in paths(graph, successor)
] if successors else [str(root)]