Generating a random DAG

The answer to https://mathematica.stackexchange.com/questions/608/how-to-generate-random-directed-acyclic-graphs applies: if you have a adjacency matrix representation of the edges of your graph, then if the matrix is lower triangular, it's a DAG by necessity.

A similar approach would be to take an arbitrary ordering of your nodes, and then consider edges from node x to y only when x < y. That constraint should also get your DAGness by construction. Memory comparison would be one arbitrary way to order your nodes if you're using structs to represent nodes.

Basically, the pseudocode would be something like:

for(i = 0; i < N; i++) {
    for (j = i+1; j < N; j++) {
        maybePutAnEdgeBetween(i, j);
    }
}

where N is the number of nodes in your graph.

The pseudocode suggests that the number of potential DAGs, given N nodes, is

2^(n*(n-1)/2),

since there are

n*(n-1)/2

ordered pairs ("N choose 2"), and we can choose either to have the edge between them or not.


I cooked up a C program that does this. The key is to 'rank' the nodes, and only draw edges from lower ranked nodes to higher ranked ones.

The program I wrote prints in the DOT language.

Here is the code itself, with comments explaining what it means:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MIN_PER_RANK 1 /* Nodes/Rank: How 'fat' the DAG should be.  */
#define MAX_PER_RANK 5
#define MIN_RANKS 3    /* Ranks: How 'tall' the DAG should be.  */
#define MAX_RANKS 5
#define PERCENT 30     /* Chance of having an Edge.  */

int main (void)
{
  int i, j, k,nodes = 0;
  srand (time (NULL));

  int ranks = MIN_RANKS
              + (rand () % (MAX_RANKS - MIN_RANKS + 1));

  printf ("digraph {\n");
  for (i = 0; i < ranks; i++)
    {
      /* New nodes of 'higher' rank than all nodes generated till now.  */
      int new_nodes = MIN_PER_RANK
                      + (rand () % (MAX_PER_RANK - MIN_PER_RANK + 1));

      /* Edges from old nodes ('nodes') to new ones ('new_nodes').  */
      for (j = 0; j < nodes; j++)
        for (k = 0; k < new_nodes; k++)
          if ( (rand () % 100) < PERCENT)
            printf ("  %d -> %d;\n", j, k + nodes); /* An Edge.  */

      nodes += new_nodes; /* Accumulate into old node set.  */
    }
  printf ("}\n");
  return 0;
}

And here is the graph generated from a test run:

A randomly generated DAG


So, to try to put all these reasonable answers together:

(In the following, I used V for the number of vertices in the generated graph, and E for the number of edges, and we assume that E ≤ V(V-1)/2.)

Personally, I think the most useful answer is in a comment, by Flavius, who points at the code at http://condor.depaul.edu/rjohnson/source/graph_ge.c. That code is really simple, and it's conveniently described by a comment, which I reproduce:

To generate a directed acyclic graph, we first
generate a random permutation dag[0],...,dag[v-1].
(v = number of vertices.)
This random permutation serves as a topological
sort of the graph. We then generate random edges of the
form (dag[i],dag[j]) with i < j.

In fact, what the code does is generate the request number of edges by repeatedly doing the following:

  1. generate two numbers in the range [0, V);
  2. reject them if they're equal;
  3. swap them if the first is larger;
  4. reject them if it has generated them before.

The problem with this solution is that as E gets closes to the maximum number of edges V(V-1)/2, then the algorithm becomes slower and slower, because it has to reject more and more edges. A better solution would be to make a vector of all V(V-1)/2 possible edges; randomly shuffle it; and select the first (requested edges) edges in the shuffled list.

The reservoir sampling algorithm lets us do this in space O(E), since we can deduce the endpoints of the kth edge from the value of k. Consequently, we don't actually have to create the source vector. However, it still requires O(V2) time.

Alternatively, one can do a Fisher-Yates shuffle (or Knuth shuffle, if you prefer), stopping after E iterations. In the version of the FY shuffle presented in Wikipedia, this will produce the trailing entries, but the algorithm works just as well backwards:

// At the end of this snippet, a consists of a random sample of the
// integers in the half-open range [0, V(V-1)/2). (They still need to be
// converted to pairs of endpoints).
vector<int> a;
int N = V * (V - 1) / 2;
for (int i = 0; i < N; ++i) a.push_back(i);
for (int i = 0; i < E; ++i) {
  int j = i + rand(N - i);
  swap(a[i], a[j]);
a.resize(E);

This requires only O(E) time but it requires O(N2) space. In fact, this can be improved to O(E) space with some trickery, but an SO code snippet is too small to contain the result, so I'll provide a simpler one in O(E) space and O(E log E) time. I assume that there is a class DAG with at least:

class DAG {
  // Construct an empty DAG with v vertices
  explicit DAG(int v);

  // Add the directed edge i->j, where 0 <= i, j < v
  void add(int i, int j);
};

Now here goes:

// Return a randomly-constructed DAG with V vertices and and E edges.
// It's required that 0 < E < V(V-1)/2.
template<typename PRNG>
DAG RandomDAG(int V, int E, PRNG& prng) {
  using dist = std::uniform_int_distribution<int>;
  // Make a random sample of size E
  std::vector<int> sample;
  sample.reserve(E);
  int N = V * (V - 1) / 2;
  dist d(0, N - E);  // uniform_int_distribution is closed range
  // Random vector of integers in [0, N-E]
  for (int i = 0; i < E; ++i) sample.push_back(dist(prng));
  // Sort them, and make them unique
  std::sort(sample.begin(), sample.end());
  for (int i = 1; i < E; ++i) sample[i] += i;
  // Now it's a unique sorted list of integers in [0, N-E+E-1]
  // Randomly shuffle the endpoints, so the topological sort
  // is different, too.
  std::vector<int> endpoints;
  endpoints.reserve(V);
  for (i = 0; i < V; ++i) endpoints.push_back(i);
  std::shuffle(endpoints.begin(), endpoints.end(), prng);
  // Finally, create the dag
  DAG rv;
  for (auto& v : sample) {
    int tail = int(0.5 + sqrt((v + 1) * 2));
    int head = v - tail * (tail - 1) / 2;
    rv.add(head, tail);
  }
  return rv;
}