Generalized nontransitive dice

Note: I wrote this answer before question 3 became a separate part of the question; this answer only answers questions 1 and 2. Question 3 seems harder.

Every simple directed graph arises in this way. Consider probability distributions over all $n!$ possible total orders of the values of the $X_i$, and associate a potential directed edge $i\to j$ with the distribution that has probability zero for orders with $X_i\lt X_j$ and $2/n!$ for orders with $X_i\gt X_j$. Given a directed graph, choose any network flow with positive flow along all the directed edges (you can always find one by taking any flow from the sources to the sinks and adding circular flow to the cycles if necessary), and form the normalized linear combination of the distributions associated with the edges, with coefficients given by the flow.

Now consider an edge $i\to j$. The contributions from any edges not incident at $i$ or $j$ don't affect whether $\mathbb P(X_i\gt X_j)\gt\frac12$, since they assign weight to equally many orders with $X_i\gt X_j$ as with $X_i\lt X_j$. Thus we only need to consider edges incident at $i$ or $j$. For an edge $j\to k$, consider the six possible orders of $X_i$, $X_j$ and $X_k$. Three of these are assigned weight by the edge $j\to k$, namely $X_i\gt X_j\gt X_k$, $X_j\gt X_i\gt X_k$ and $X_j\gt X_k\gt X_i$, and of these, one has $X_i\gt X_j$ and two have $X_i\lt X_j$. For an edge $k\to j$, it's the other way around.

Thus, considering the effect of all edges on $\mathbb P(X_i\gt X_j)-\frac12$, the edge $i\to j$ itself contributes $1-\frac12=\frac12$, edges not incident at $i$ or $j$ contribute $\frac12-\frac12=0$, edges "aligned with" $i\to j$, of the forms $k\to j$ and $i\to k$, contribute $\frac23-\frac12=\frac16$, and edges "opposed to" $i\to j$, of the forms $j\to k$ and $k\to i$, contribute $\frac13-\frac12=-\frac16$. Equal but opposite contributions from "aligned" and "opposed" edges cancel, so for a non-terminal edge the net result is equivalent by flow conservation to that of one incoming "opposed" edge incident at $i$ and one outgoing "opposed" edge incident at $j$ with net flows equal to that of $i\to j$. Thus the net result is that flow times $\frac12-\frac16-\frac16=\frac16\gt0$. The situation is only improved by sources and sinks.

If neither $i\to j$ nor $j\to i$ is an edge in the graph, then if $i$ and $j$ are not sources or sinks all contributions cancel and $\mathbb P(X_i\gt X_j)=\frac12\not\gt\frac12$ as desired; and again the situation is only improved by sources and sinks.

Qiaochu, I should add that I probably wouldn't have come up with this solution if I hadn't learned from you to try to reduce everything to linear algebra. :-)


joriki has already presented a solution to Questions 1 and 2; but maybe the following alternate solution has some insight to provide. I'll describe some $n$-tuples of random variables, as sets of vectors in $\mathbb R^n$, with the vectors selected uniformly from that set. For example, with $n=3$, the three random variables $X_1,X_2,X_3$ could be given by the set of vectors $\{(0,1,2), (1,2,0), (2,0,1)\}$ (as in Michael Hardy's comment). Then each $X_i$ individually would be uniformly chosen from $\{0,1,2\}$, but the $X_i$ are far from independent. Notationally, let $e_j = \{0,\dots,0 ,1,0,\dots,0\}$ denote the $j$th standard basis vector in $\mathbb R^n$. Let $f=(1,2,\dots,n)$ and $b=(n,n-1,\dots,1)$.

To begin, consider the following set $U_n = \{u_{ij}\colon 1\le i,j \le n,\, i\ne j\}$ of $n(n-1)$ vectors in $\mathbb R^n$, indexed by ordered pairs of distinct integers from $1$ to $n$, where: $$ u_{ij} = ne_i + ne_j + \begin{cases} f, &\text{if } i<j, \\ b, &\text{if } i>j. \end{cases} $$ For example, with $n=7$, $u_{52} = (0,0,0,0,7,0,0) + (0,7,0,0,0,0,0) + (7,6,5,4,3,2,1) = (7,13,5,4,10,2,1)$, while $u_{25} = (1,8,3,4,11,6,7)$. Note that for each unordered pair ${i,j}$, there are exactly 2 elements of $U_n$, namely $u_{ij}$ and $u_{ji}$, for which the $i$th and $j$th coordinates are the two largest (each one gets to be largest once). Note also that if $i<j$, then the sign of the $i$th coordinate of $u_{kl}$ minus the $j$th coordinate of $u_{kl}$ always equals the sign of $k-l$; in other words, the random variables $X_1,\dots,X_n$ defined by the set $U_n$ have the property that $\mathbb P(X_i>X_j) = \frac12$ for all $i,j$.

Now we modify the set $U_n$ to account for a given directed graph $G$. Let $V_n = \{v_{ij}\colon 1\le i,j \le n,\, i\ne j\}$, where

$$ v_{ij} = \begin{cases} u_{ij}+ne_i, &\text{if there's a directed edge from $i$ to $j$ in $G$,} \\ u_{ij}+ne_j, &\text{if there's a directed edge from $j$ to $i$ in $G$,} \\ u_{ij}, &\text{otherwise.} \end{cases} $$ For example, with $n=7$, suppose that there is a directed edge from 5 to 2 in $G$. Then $v_{52} = (7,13,5,4,17,2,1)$, while $v_{25} = (1,8,3,4,18,6,7)$. With this modification, it's easy to see (after grokking the notation) that $$ \mathbb P(X_i>X_j) = \begin{cases} \frac12+\frac1{n(n-1)}, &\text{if there's a directed edge from $i$ to $j$ in $G$,} \\ \frac12-\frac1{n(n-1)}, &\text{if there's a directed edge from $j$ to $i$ in $G$,} \\ \frac12, &\text{otherwise.} \end{cases} $$


This question is answered fully by NovaDenizen in this answer to a similar question. It turns out that arbitrary graphs can be achieved with independent random variables and a finite sample space. The construction is as follows:

I'm just going to describe how you'd build it up an edge at a time.

Say your graph has $n$ nodes. Start out with $n$ identical 2-sided dice. Each having 0 on both sides will work fine.

Pick an arbitrary edge in the graph, specifying that $D_a$ should beat $D_b$. All the dice currently have $k$ sides, and satisfy the requirements for previously added edges, and unconnected dice are even*. Find $m$, the maximum of all the faces on the dice. To $D_a$, add two faces with $m+3$. To $D_b$, add two faces of $m+2$. To all other dice, add two faces, one of $m+1$ and one of $m+4$. Repeat this procedure for all edges in the graph, winding up with $2e+2$-sided dice.

*I assume "even" here means that neither beats the other with probability $>1/2$.