Is there an introduction to probability theory from a structuralist/categorical perspective?
$\def\Spec{\mathop{\rm Spec}} \def\R{{\bf R}} \def\Ep{{\rm E}^+} \def\L{{\rm L}} \def\EpL{\Ep\L}$ One can argue that an object of the right category of spaces in measure theory is not a set equipped with a σ-algebra of measurable sets, but rather a set $S$ equipped with a σ-algebra $M$ of measurable sets and a σ-ideal $N$ of negligible sets, i.e., sets of measure 0. The reason for this is that you can hardly state any theorem of measure theory or probability theory without referring to sets of measure 0. However, objects of this category contain less data than the usual measured spaces, because they are not equipped with a measure. Therefore I prefer to call them enhanced measurable spaces, since they are measurable spaces enhanced with a σ-ideal of negligible sets. A morphism of enhanced measurable spaces $(S,M,N)→(T,P,Q)$ is a map $S\to T$ such that the preimage of every element of $P$ is a union of an element of $M$ and a subset of an element of $N$ and the preimage of every element of $Q$ is a subset of an element of $N$.
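On finite sets the morphism condition can be checked mechanically. The following is only a minimal sketch under that finiteness assumption; the helper names (`preimage`, `inside_negligible`, `almost_measurable`, `is_morphism`) and the toy spaces are hypothetical and chosen purely for illustration.

```python
def preimage(f, subset, domain):
    """Preimage of `subset` under the map `f`, given as a dict on `domain`."""
    return frozenset(x for x in domain if f[x] in subset)

def inside_negligible(a, N):
    """True if `a` is a subset of some element of N."""
    return any(a <= n for n in N)

def almost_measurable(a, M, N):
    """True if `a` is a union of an element of M and a subset of an element of N."""
    return any(m <= a and inside_negligible(a - m, N) for m in M)

def is_morphism(f, S, M, N, P, Q):
    """Check both preimage conditions for a map (S, M, N) -> (T, P, Q)."""
    return (all(almost_measurable(preimage(f, p, S), M, N) for p in P)
            and all(inside_negligible(preimage(f, q, S), N) for q in Q))

fs = frozenset
# Toy domain: the points 1 and 2 cannot be separated by measurable sets,
# and together they form a negligible set.
S = fs({0, 1, 2})
M = [fs(), fs({0}), fs({1, 2}), S]
N = [fs(), fs({1, 2})]
# Toy codomain: every subset is measurable and {"b"} is negligible.
T = fs({"a", "b"})
P = [fs(), fs({"a"}), fs({"b"}), T]
Q = [fs(), fs({"b"})]

f = {0: "a", 1: "a", 2: "b"}  # preimage of {"a"} is {0} plus a subset of {1, 2}
g = {0: "b", 1: "a", 2: "a"}  # preimage of the negligible {"b"} is {0}, not negligible

print(is_morphism(f, S, M, N, P, Q))  # True
print(is_morphism(g, S, M, N, P, Q))  # False
```

The map `f` shows why the definition allows "a union of an element of $M$ and a subset of an element of $N$": the preimage $\{0,1\}$ is not itself in $M$, but it differs from $\{0\}$ only inside a negligible set.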
Irving Segal proved in “Equivalences of measure spaces” (see also Kelley's “Decomposition and representation theorems in measure theory”) that for an enhanced measurable space $(S,M,N)$ that admits a faithful measure (meaning $μ(A)=0$ if and only if $A∈N$) the following properties are equivalent.
- The Boolean algebra $M/N$ of equivalence classes of measurable sets is complete;
- The space of equivalence classes of all bounded (or unbounded) real-valued measurable functions on $S$ modulo equality almost everywhere is Dedekind-complete;
- The Radon-Nikodym theorem is true for $(S,M,N)$;
- The Riesz representation theorem is true for $(S,M,N)$ (the dual of $\L^1$ is isomorphic to $\L^∞$);
- Equivalence classes of bounded measurable functions on $S$ form a von Neumann algebra (alias W*-algebra).
An enhanced measurable space that satisfies these conditions (including the existence of a faithful measure) is called localizable. This theorem tells us that if we want to prove anything nontrivial about measurable spaces, we had better restrict ourselves to localizable enhanced measurable spaces. We also have a nice illustration of the claim I made in the first paragraph: none of these statements would be true without identifying objects that differ on a set of measure 0. For example, take a nonmeasurable set $G$ and the family of singleton subsets of $G$ indexed by themselves. This family of measurable sets does not have a supremum in the Boolean algebra of measurable sets, thus disproving a naive version of the first property in the list above.
But restricting to localizable enhanced measurable spaces does not eliminate all the pathologies: one must further restrict to the so-called compact and strictly localizable enhanced measurable spaces, and use a coarser equivalence relation on measurable maps: $f$ and $g$ are weakly equal almost everywhere if for any measurable subset $B$ of the codomain the symmetric difference $f^*B⊕g^*B$ of preimages of $B$ under $f$ and $g$ is a negligible subset of the domain. (For codomains like real numbers this equivalence relation coincides with equality almost everywhere.)
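For finite toy spaces, weak equality almost everywhere can be tested directly from the definition. This is a hedged sketch only: the function name `weakly_equal_ae` and the example data are made up for illustration, and "negligible" is read as "contained in an element of $N$".

```python
def preimage(f, subset, domain):
    """Preimage of `subset` under the map `f`, given as a dict on `domain`."""
    return frozenset(x for x in domain if f[x] in subset)

def weakly_equal_ae(f, g, domain, N, P):
    """True if, for every measurable B in P, the symmetric difference of the
    preimages of B under f and g is contained in an element of N."""
    for B in P:
        diff = preimage(f, B, domain) ^ preimage(g, B, domain)
        if not any(diff <= n for n in N):
            return False
    return True

fs = frozenset
S = fs({0, 1, 2})
N = [fs(), fs({2})]                       # only the point 2 is negligible
T = fs({"a", "b"})
P_fine = [fs(), fs({"a"}), fs({"b"}), T]  # every subset of T is measurable
P_coarse = [fs(), T]                      # only the trivial sets are measurable

f = {0: "a", 1: "a", 2: "a"}
g = {0: "a", 1: "a", 2: "b"}  # differs from f only at the negligible point 2
h = {0: "b", 1: "a", 2: "a"}  # differs from f at the non-negligible point 0

print(weakly_equal_ae(f, g, S, N, P_fine))    # True
print(weakly_equal_ae(f, h, S, N, P_fine))    # False
print(weakly_equal_ae(f, h, S, N, P_coarse))  # True
```

The last line shows in what sense the relation is coarser: with few measurable sets in the codomain, maps that differ on a non-negligible set can still be weakly equal almost everywhere.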
An enhanced measurable space is strictly localizable if it splits as a coproduct (disjoint union) of σ-finite (meaning there is a faithful finite measure) enhanced measurable spaces. An enhanced measurable space $(X,M,N)$ is (Marczewski) compact if there is a compact class $K⊂M$ such that for any $m∈M∖N$ there is $k∈K∖N$ such that $k⊂m$. Here a compact class is a collection $K⊂2^X$ of subsets of $X$ such that for any $K'⊂K$ the following finite intersection property holds: if for any finite $K''⊂K'$ we have $⋂K''≠∅$, then also $⋂K'≠∅$.
The best argument for such restrictions is the following Gelfand-type duality theorem for commutative von Neumann algebras.
Theorem. The following 5 categories are equivalent.
- The category of compact strictly localizable enhanced measurable spaces with measurable maps modulo weak equality almost everywhere.
- The category of hyperstonean topological spaces and open continuous maps.
- The category of hyperstonean locales and open maps.
- The category of measurable locales (and arbitrary maps of locales).
- The opposite category of commutative von Neumann algebras and normal (alias ultraweakly continuous) unital *-homomorphisms.
I actually prefer to work with the opposite category of the category of commutative von Neumann algebras, or with the category of measurable locales. The reason for this is that the point-set definition of a measurable space exhibits immediate connections only (perhaps) to descriptive set theory, and with additional effort to Boolean algebras, whereas the description in terms of operator algebras or locales immediately connects measure theory to other areas of the central core of mathematics (noncommutative geometry, algebraic geometry, complex geometry, differential geometry, topos theory, etc.).
Additionally, note how the fourth category (measurable locales) is a full subcategory of the category of locales. Roughly, the latter can be seen as a slight enlargement of the usual category of topological spaces, for which all the usual theorems of general topology continue to hold (e.g., Tychonoff, Urysohn, Tietze, various results about paracompact and uniform spaces, etc.). In particular, there is a fully faithful functor from sober topological spaces (which include all Hausdorff spaces) to locales. This functor is not essentially surjective, i.e., there are nonspatial locales that do not come from topological spaces. As it turns out, all measurable locales (excluding discrete ones) are nonspatial. Thus, measure theory is part of (pointfree) general topology, in the strictest sense possible.
The non-point-set languages (those of the last four categories in the theorem above) are also easier to use in practice. Let me illustrate this statement with just one example: when we try to define measurable bundles of Hilbert spaces on a compact strictly localizable enhanced measurable space in a point-set way, we run into all sorts of problems if the fibers can be nonseparable, and I do not know how to fix this problem in the point-set framework. On the other hand, in the algebraic framework we can simply say that a bundle of Hilbert spaces is a Hilbert W*-module over the corresponding von Neumann algebra.
Categorical properties of von Neumann algebras (hence of compact strictly localizable enhanced measurable spaces) were investigated by Guichardet in “Sur la catégorie des algèbres de von Neumann”. Let me mention some of his results, translated into the language of enhanced measurable spaces. The category of compact strictly localizable enhanced measurable spaces admits equalizers and coequalizers, arbitrary coproducts, hence also arbitrary colimits. It also admits products (and hence arbitrary limits), although they are quite different from what one might think. For example, the product of two real lines is not $\R^2$ with the two obvious projections. The product contains $\R^2$, but it also has a lot of other stuff, for example, the diagonal of $\R^2$, which is needed to satisfy the universal property for the two identity maps on $\R$. The more intuitive product of measurable spaces ($\R\times\R=\R^2$) corresponds to the spatial tensor product of von Neumann algebras and forms a part of a symmetric monoidal structure on the category of measurable spaces. See Guichardet's paper for other categorical properties (monoidal structures on measurable spaces, flatness, existence of filtered limits, etc.).
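The failure of the universal property for $\R^2$ can be spelled out in one line. This is only a sketch, assuming $\R$ and $\R^2$ carry the Lebesgue measurable sets with the Lebesgue null sets as negligible sets; the symbols $\Delta$, $D$ and $\lambda_2$ (Lebesgue measure on $\R^2$) are introduced here for illustration. The two identity maps on $\R$ would have to factor through the diagonal map:

$$\Delta\colon\R\to\R^2,\qquad \Delta(x)=(x,x),\qquad D=\Delta(\R),\qquad \lambda_2(D)=0,\qquad \Delta^{-1}(D)=\R.$$

The set $D$ is negligible in $\R^2$, but its preimage is all of $\R$, which is not negligible. So $\Delta$ (or any map equal to it almost everywhere) violates the second condition in the definition of a morphism, and no morphism $\R\to\R^2$ can have both projections equal to the identity.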
Another property worth mentioning is that the category of commutative von Neumann algebras is a locally presentable category, which immediately allows one to use the adjoint functor theorem to construct commutative von Neumann algebras (hence enhanced measurable spaces) via their representable functors.
Finally, let me mention pushforward and pullback properties of measures on enhanced measurable spaces. I will talk about the more general case of $\L^p$-spaces instead of just measures (i.e., $\L^1$-spaces). For the sake of convenience, denote $\L_p(M)=\L^{1/p}(M)$, where $M$ is an enhanced measurable space. Here $p$ can be an arbitrary complex number with a nonnegative real part. We do not need a measure on $M$ to define $\L_p(M)$. For instance, $\L_0$ is the space of all bounded functions (i.e., the commutative von Neumann algebra corresponding to $M$), $\L_1$ is the space of finite complex-valued measures (the dual of $\L_0$ in the ultraweak topology), and $\L_{1/2}$ is the Hilbert space of half-densities. I will also talk about the extended positive part $\EpL_p$ of $\L_p$ for real $p$. In particular, $\EpL_1$ is the space of all (not necessarily finite) positive measures on $M$.
Pushforward for $\L_p$-spaces. Suppose we have a morphism of enhanced measurable spaces $M\to N$. If $p=1$, then we have a canonical map $\L_1(M)\to\L_1(N)$, which is just the dual of $\L_0(N)→\L_0(M)$ in the ultraweak topology. Geometrically, this is the fiberwise integration map. If $p≠1$, then we only have a pushforward map of the extended positive parts, namely, $\EpL_p(M)→\EpL_p(N)$, which is nonadditive unless $p=1$. Geometrically, this is the fiberwise $\L_p$-norm. Thus $\L_1$ is a functor from the category of enhanced measurable spaces to the category of Banach spaces and $\EpL_p$ is a functor to the category of “positive homogeneous $p$-cones”. The pushforward map preserves the trace on $\L_1$ and hence sends a probability measure to a probability measure.
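In the simplest possible setting, a finite set with a measure given by point masses, the case $p=1$ reduces to summing masses over each fiber. The sketch below assumes exactly that finite setting; the function name `pushforward` and the sample data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(measure, f):
    """Fiberwise integration: the mass of y is the total mass of the fiber over y.

    `measure` maps points of the domain to masses; `f` maps points of the
    domain to points of the codomain.
    """
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, mass in measure.items():
        out[f[x]] += mass
    return dict(out)

mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
f = {0: "a", 1: "a", 2: "b"}

nu = pushforward(mu, f)
print(nu)                # masses 3/4 on "a" and 1/4 on "b"
print(sum(nu.values()))  # 1
```

The total mass is unchanged, which is the finite shadow of the statement that the pushforward preserves the trace on $\L_1$.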
To define pullbacks of $\L_p$-spaces (in particular, $\L_1$-spaces) one needs to pass to a different category of enhanced measurable spaces. In the algebraic language, if we have two commutative von Neumann algebras $A$ and $B$, then a morphism from $A$ to $B$ is a usual morphism of commutative von Neumann algebras $f\colon A\to B$ together with an operator valued weight $T\colon\Ep(B)\to\Ep(A)$ associated to $f$. Here $\Ep(A)$ denotes the extended positive part of $A$. (Think of positive functions on $\Spec A$ that can take infinite values.) Geometrically, this is a morphism $\Spec f\colon\Spec B\to\Spec A$ between the corresponding enhanced measurable spaces and a choice of measure on each fiber of $\Spec f$. Now we have a canonical additive map $\EpL_p(\Spec A)\to\EpL_p(\Spec B)$, which makes $\EpL_p$ into a contravariant functor from the category of enhanced measurable spaces and measurable maps equipped with a fiberwise measure to the category of “positive homogeneous additive cones”.
If we want to have a pullback of $\L_p$-spaces themselves and not just their extended positive parts, we need to replace operator valued weights in the above definition by finite complex-valued operator valued weights $T\colon B\to A$ (think of a fiberwise finite complex-valued measure). Then $\L_p$ becomes a functor from the category of enhanced measurable spaces to the category of Banach spaces (if the real part of $p$ is at most $1$) or quasi-Banach spaces (if the real part of $p$ is greater than $1$). Here $p$ is an arbitrary complex number with a nonnegative real part. Notice that for $p=0$ we get the original map $f\colon A\to B$ and in this (and only this) case we do not need $T$.
Finally, if we restrict ourselves to an even smaller subcategory defined by the additional condition $T(1)=1$ (i.e., $T$ is a conditional expectation; think of a fiberwise probability measure), then the pullback map preserves the trace on $\L_1$ and in this case the pullback of a probability measure is a probability measure.
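A finite toy version of this pullback: a measure on the base is spread over each fiber according to a chosen fiberwise probability measure. This is a sketch under the assumption of finite sets with point masses; `pullback`, `fiber_weight` and the data are illustrative names, not notation from the text above.

```python
from fractions import Fraction as F

def pullback(nu, f, fiber_weight):
    """Pull a measure `nu` on the base back along `f`.

    `fiber_weight[x]` is the mass of x inside its own fiber f^{-1}(f(x));
    when the weights in each fiber sum to 1 this plays the role of T(1) = 1.
    """
    return {x: nu[f[x]] * fiber_weight[x] for x in f}

f = {0: "a", 1: "a", 2: "b"}                    # map from total space to base
fiber_weight = {0: F(1, 3), 1: F(2, 3), 2: F(1)}  # a probability measure on each fiber
nu = {"a": F(3, 5), "b": F(2, 5)}               # probability measure on the base

mu = pullback(nu, f, fiber_weight)
print(mu)                # masses 1/5, 2/5, 2/5 on the points 0, 1, 2
print(sum(mu.values()))  # 1

# Summing mu over each fiber recovers nu.
recovered = {y: sum(m for x, m in mu.items() if f[x] == y) for y in nu}
print(recovered == nu)   # True
```

If the fiber weights did not sum to 1 on each fiber, the pullback would still be a measure, but its total mass would in general differ from that of `nu`.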
There is also a smooth analog of the theory described above. The category of enhanced measurable spaces and their morphisms is replaced by the category of smooth manifolds and submersions, $\L_p$-spaces are replaced by bundles of $p$-densities, operator valued weights are replaced by sections of the bundle of relative 1-densities, the integration map on 1-densities is defined via Poincaré duality (to avoid any dependence on measure theory), etc. There is a forgetful functor that sends a smooth manifold to its underlying enhanced measurable space.
Of course, the story does not end here; there are many other interesting topics to consider: products of measurable spaces, the difference between Borel and Lebesgue measurability, conditional expectations, etc. An index of my writings on this topic is available.
In the spirit of this answer to a different question, I'll offer a contrarian answer. How to understand probability theory from a structuralist perspective:
Don't.
To put it less provocatively, what I really mean is that probabilists don't think about probability theory that way, which is why they don't write their introductory books that way. The reason probabilists don't think that way is that probability theory is not about probability spaces. Probability theory is about families of random variables. Probability spaces are the mathematical formalism used to talk about random variables, but most probabilists keep the probability spaces in the background as much as possible. Doing probability theory while dwelling on probability spaces is a little like doing number theory while dwelling on a definition of 1 as $\{\{\}\}$ etc. (That last sentence is definitely an overstatement, but I can't think of a more apt analogy offhand.)
That said, multiple perspectives are always good to have, so I'm very happy you asked this question and that you've gotten some very nice noncontrarian answers that I hope to digest better myself.
Added: Here is something which is perhaps more similar to dwelling on probability spaces. To set the stage for graph theory carefully one may start by defining a graph as a pair $(V,E)$ in which $V$ is a (finite, nonempty) set and $E$ is a set of cardinality 2 subsets of $V$. You need to start tweaking this in various ways to allow loops, directed graphs, multigraphs, infinite graphs, etc. But worrying about the details of how you do this is a distraction from actually doing graph theory.
A few months ago, Terry Tao had a really insightful post about "the probabilistic way of thinking", in which he suggested that a nice category of probability spaces was one in which the objects were probability spaces and the morphisms were extensions (i.e., measurable surjections which are probability preserving). By avoiding looking at the details of the sample space, you can elegantly capture the style of probabilistic arguments in which you introduce new sources of randomness as needed.
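For finite probability spaces the notion of extension is easy to make concrete. The sketch below is only an illustration under that finiteness assumption (every subset is taken to be measurable); the name `is_extension` and the coin example are made up here and are not taken from Tao's post.

```python
from fractions import Fraction as F

def is_extension(pi, big, small):
    """True if `pi` is a probability-preserving surjection from `big` onto `small`.

    `big` and `small` map sample points to probabilities; `pi` maps points
    of `big` to points of `small`.
    """
    surjective = set(pi.values()) == set(small)
    pushed = {y: sum(p for x, p in big.items() if pi[x] == y) for y in small}
    return surjective and pushed == small

# One fair coin.
small = {"H": F(1, 2), "T": F(1, 2)}

# Introduce a second, independent fair coin: the sample space grows,
# and forgetting the second coin maps it back to the original space.
big = {(a, b): F(1, 4) for a in "HT" for b in "HT"}
pi = {(a, b): a for (a, b) in big}

print(is_extension(pi, big, small))  # True

# The event "first coin shows heads" has the same probability in both spaces.
lifted = sum(p for x, p in big.items() if pi[x] == "H")
print(lifted == small["H"])          # True
```

Any event or random variable on the smaller space lifts along `pi` without changing its distribution, which is what lets one add new randomness without revisiting earlier statements.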