Why is a Hermitian operator a "quantum random variable"?

Quantum mechanics is indeed a probability theory, but it is a non-commutative probability theory.

So it is not just a matter of having signed/complex measures, but really of having a non-commutative probabilistic framework. Quantum mechanics was developed, historically, before non-commutative probability theories, and I think that people in probability modelled non-commutative probability theory on quantum mechanics and not vice versa. One mathematical example of non-commutative probability is the free probability introduced by Voiculescu (it is similar to quantum mechanics, but in quantum mechanics some of Voiculescu's axioms about freeness are not necessary).

The idea of non-commutative probability is to extend the usual probability theory by exploiting the fact that random variables usually form an abelian algebra, and then dropping the commutativity requirement. So you start directly from a $C^*$ or $W^*$ algebra $\mathfrak{A}$ of random variables, possibly non-commutative, and introduce (non-commutative, complex) measures as the topological dual $\mathfrak{A}'$. The interpretation in quantum mechanical terms is that states are the non-commutative probabilities, i.e. the positive and norm one elements of $\mathfrak{A}'$, while the observables are usually taken to be the self-adjoint elements affiliated to $\mathfrak{A}$ (i.e. possibly unbounded operators $a$ whose spectral family $\bigl(P_{t}(a)\bigr)_{t\in\mathbb{R}}\subset\mathfrak{A}$). The usual concepts of probability extend, mutatis mutandis, to this framework; e.g. the evaluation $\mathbb{E}_\omega(a\in (0,1])$, giving the probability of finding a value in the interval $(0,1]$ for the observable $a$, in the state $\omega$, is given by $\omega\bigl(P_1(a)-P_0(a)\bigr)$. Here the spectral family is taken right-continuous, $P_t(a)=\mathbf{1}_{(-\infty,t]}(a)$, so the endpoint $0$ is excluded; this only matters if $0$ carries spectral weight in the state $\omega$.
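As a finite-dimensional illustration of the formula $\omega\bigl(P_1(a)-P_0(a)\bigr)$, here is a minimal NumPy sketch. Everything concrete in it is an assumption made for the example: the algebra is the $3\times 3$ complex matrices, the observable `a` and the vector `psi` are made-up values, the state is $\omega(x)=\operatorname{Tr}(\rho x)$, and `spectral_projection` is a hypothetical helper using the convention $P_t(a)=\mathbf{1}_{(-\infty,t]}(a)$.

```python
import numpy as np

def spectral_projection(a, t):
    """P_t(a): projection onto the eigenspaces of a with eigenvalue <= t."""
    evals, evecs = np.linalg.eigh(a)
    cols = evecs[:, evals <= t]          # eigenvectors with eigenvalue <= t
    return cols @ cols.conj().T

# Hypothetical observable with eigenvalues -1, 0.5 and 2.
a = np.diag([-1.0, 0.5, 2.0])

# Hypothetical pure state, written as a density matrix rho = |psi><psi|.
psi = np.sqrt(np.array([0.2, 0.5, 0.3]))
rho = np.outer(psi, psi.conj())

def omega(x):
    """The state as a linear functional on the algebra: omega(x) = Tr(rho x)."""
    return float(np.real(np.trace(rho @ x)))

# Probability of finding a value of a in (0, 1] in the state omega.
prob = omega(spectral_projection(a, 1.0) - spectral_projection(a, 0.0))
print(prob)
```

Only the eigenvalue $0.5$ lies in $(0,1]$, so the result is the weight that `psi` puts on the corresponding eigenvector.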


I'm going to try to explain why and how density operators in quantum mechanics correspond to random variables in classical probability theory, something none of the other answers have even tried to do.

Let's work in a two-dimensional quantum space. We'll use standard physics bra-ket notation. A quantum state is a column vector in this space, and we'll represent a column vector as $\alpha|0\rangle + \beta |1 \rangle.$ A row vector is $\gamma \langle 0 | + \delta \langle 1 |\,$.

Now, you might think that a probability distribution is a measure on quantum states. You can think of it that way, but it turns out that this is too much information. For example, consider two probability distributions on quantum states. First, let's take the probability distribution

$$ \begin{array}{cc} |0\rangle & \mathrm{with\ probability\ }2/3,\\ |1\rangle & \mathrm{with\ probability\ }1/3. \end{array} $$

Next, let's take the probability distribution $$ \begin{array}{cc} \sqrt{{2}/{3}}\,\left|0\right\rangle +\sqrt{1/3}\, \left|1\right\rangle & \mathrm{with\ probability\ }1/2,\\ \sqrt{{2}/{3}}\,\left|0\right\rangle -\sqrt{1/3}\, \left|1\right\rangle & \mathrm{with\ probability\ }1/2. \end{array} $$

It turns out that these two probability distributions are indistinguishable. That is, any measurement you make on one will give exactly the same probability distribution of results as the same measurement made on the other. The reason for that is that $$ \frac{2}{3} |0\rangle\langle0| +\frac{1}{3}|1\rangle\langle 1| $$ and $$ \frac{1}{2}\left(\sqrt{2/3}\left|0\right\rangle +\sqrt{1/3}\, \left|1\right\rangle\right) \left(\sqrt{2/3}\left\langle 0\right| +\sqrt{1/3}\, \left\langle 1\right|\right) +\frac{1}{2}\left(\sqrt{{2}/{3}}\left|0\right\rangle -\sqrt{1/3}\, \left|1\right\rangle\right) \left(\sqrt{2/3}\left\langle 0\right| -\sqrt{1/3}\, \left\langle 1\right|\right) $$ are the same matrix: the cross terms $\pm\frac{\sqrt{2}}{3}\,|0\rangle\langle 1|$ and $\pm\frac{\sqrt{2}}{3}\,|1\rangle\langle 0|$ cancel between the two halves of the second expression.
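The equality of the two matrices can be checked numerically. The following is a small NumPy sketch; the helper name `density` and the representation of an ensemble as a list of `(probability, vector)` pairs are choices made here for illustration.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

def density(ensemble):
    """Sum of p * |v><v| over the (probability, state) pairs of an ensemble."""
    return sum(p * np.outer(v, v.conj()) for p, v in ensemble)

# First distribution: |0> with probability 2/3, |1> with probability 1/3.
ensemble_a = [(2 / 3, ket0), (1 / 3, ket1)]

# Second distribution: the two superpositions, each with probability 1/2.
plus = np.sqrt(2 / 3) * ket0 + np.sqrt(1 / 3) * ket1
minus = np.sqrt(2 / 3) * ket0 - np.sqrt(1 / 3) * ket1
ensemble_b = [(0.5, plus), (0.5, minus)]

rho_a = density(ensemble_a)
rho_b = density(ensemble_b)
print(rho_a)
print(rho_b)
print(np.allclose(rho_a, rho_b))
```

Both ensembles produce the diagonal matrix with entries $2/3$ and $1/3$, up to floating-point rounding.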

That is, a probability distribution on quantum states is an overly specified distribution, and it is quite cumbersome to work with. We can predict any experimental outcome for a probability distribution on quantum states if we know the corresponding density operator, and many probability distributions yield the same density operator. If we have a probability density $\mu_v$ on quantum states $v$, we can predict any experimental outcome from the density operator $$ \int v v^* d \mu_v \,. $$
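To see why the density operator is enough, note that for a measurement in an orthonormal basis $\{e_k\}$ the probability of outcome $k$, averaged over the ensemble, is $\sum_i p_i\,|\langle e_k | v_i\rangle|^2 = \langle e_k|\rho|e_k\rangle$. The sketch below checks this on a randomly generated discrete ensemble and a random basis; the ensemble, the seed and the helper `random_state` are all illustrative assumptions, and a finite sum stands in for the integral.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2

def random_state(d):
    """A random unit vector in C^d (illustrative only)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

# A made-up ensemble: five random states with random probabilities.
weights = rng.random(5)
weights /= weights.sum()
states = [random_state(dim) for _ in weights]

# Density operator: the discrete analogue of the integral of v v^* d(mu).
rho = sum(p * np.outer(v, v.conj()) for p, v in zip(weights, states))

# A random orthonormal measurement basis: the columns of a unitary from QR.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
basis = [q[:, k] for k in range(dim)]

# Outcome probabilities computed state by state, then averaged over the ensemble.
from_ensemble = [sum(p * abs(np.vdot(e, v)) ** 2 for p, v in zip(weights, states))
                 for e in basis]

# The same probabilities computed from the density operator alone.
from_rho = [float(np.real(np.vdot(e, rho @ e))) for e in basis]

print(np.allclose(from_ensemble, from_rho))
```

The second computation never looks at `weights` or `states`, which is the sense in which the ensemble carries more information than any measurement can reveal.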

So for quantum probability theory, instead of working with probability distributions on quantum states, we work with density operators.

Classical states correspond to the vectors of a fixed orthonormal basis in Hilbert space, and classical probability distributions correspond to density operators that are diagonal in that basis.


You could certainly model any one quantum observable as a random variable.

The problem comes in when you have multiple observables, which you might attempt to model as classical random variables with some joint distribution. From this joint distribution, you can compute various probabilities (like $\textrm{Prob}(Y\neq X)$, for example), according to the standard rules you learned in your undergraduate probability classes.

The problem is that in general, no joint distribution can yield the probabilities that are predicted by quantum mechanics (and observed in the laboratory).

For example, for classical random variables, it's easy to prove that no matter what the joint distribution of $X,Y$ and $Z$ might be, you have $$\textrm{Prob}(X\neq Z)\leq \textrm{Prob}(X\neq Y)+\textrm{Prob}(Y\neq Z)$$
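The proof is a union bound: if $X\neq Z$ then $X\neq Y$ or $Y\neq Z$, so the event $\{X\neq Z\}$ is contained in $\{X\neq Y\}\cup\{Y\neq Z\}$. As a sanity check, here is a sketch that draws random joint distributions of three binary variables and confirms the inequality for each one. The restriction to binary values, the seed, and the helper `mismatch_probs` are assumptions of this example.

```python
import itertools
import numpy as np

def mismatch_probs(joint):
    """Return Prob(X != Y), Prob(Y != Z), Prob(X != Z).

    joint[x, y, z] is the probability that (X, Y, Z) = (x, y, z).
    """
    p_xy = p_yz = p_xz = 0.0
    for x, y, z in itertools.product(range(joint.shape[0]), repeat=3):
        p = joint[x, y, z]
        p_xy += p * (x != y)
        p_yz += p * (y != z)
        p_xz += p * (x != z)
    return p_xy, p_yz, p_xz

rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(1000):
    joint = rng.random((2, 2, 2))
    joint /= joint.sum()                 # normalise to a probability table
    p_xy, p_yz, p_xz = mismatch_probs(joint)
    # The gap is non-negative exactly when the inequality holds.
    worst_gap = min(worst_gap, p_xy + p_yz - p_xz)

print(worst_gap >= -1e-12)
```

A random search proves nothing by itself, of course; it only illustrates the pointwise argument, which holds for every joint distribution.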

For quantum observables, such inequalities can be violated. Therefore you need a different formalism.
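One standard way to exhibit a violation, which is an assumption of this example rather than something spelled out above, uses two spin-1/2 particles in the singlet state. Measuring both particles along the same axis always gives opposite results, so the negated result on the second particle serves as a stand-in for the first particle's value along that axis. With axes $x, y, z$ at angles $0$, $60^\circ$ and $120^\circ$ in a plane, "$X\neq Y$" then means that the first particle measured along $x$ and the second measured along $y$ give equal results. The names `spin` and `prob_differ` are hypothetical helpers.

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)

def spin(theta):
    """Spin observable (eigenvalues +1 and -1) along angle theta in the x-z plane."""
    return np.cos(theta) * SZ + np.sin(theta) * SX

# Singlet state (|01> - |10>) / sqrt(2) in the basis |00>, |01>, |10>, |11>.
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def prob_differ(theta_a, theta_b):
    """Probability that the value along theta_a differs from the value along
    theta_b, where the second value is read off as minus the second particle's
    result. That is the probability that the two raw results are equal."""
    corr = np.real(singlet.conj() @ np.kron(spin(theta_a), spin(theta_b)) @ singlet)
    return float((1 + corr) / 2)

ax, ay, az = 0.0, np.pi / 3, 2 * np.pi / 3
p_xy = prob_differ(ax, ay)
p_yz = prob_differ(ay, az)
p_xz = prob_differ(ax, az)

print(round(p_xy, 6), round(p_yz, 6), round(p_xz, 6))
print(p_xz > p_xy + p_yz)
```

Here $\textrm{Prob}(X\neq Y)=\textrm{Prob}(Y\neq Z)=\sin^2 30^\circ = 1/4$ while $\textrm{Prob}(X\neq Z)=\sin^2 60^\circ = 3/4$, so the classical bound $3/4 \le 1/2$ fails. Each pair of axes is measured in a separate run of the experiment, which is exactly why no single joint distribution of $X, Y, Z$ is being sampled.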