Probability and measure theory

Two reasons why measure theory is needed in probability:

  1. We need to work with random variables that are neither discrete nor continuous, like the $X$ below:

Let $(\Omega, \mathscr{F}, \mathbb{P})$ be a probability space and let $Z, B$ be random variables in $(\Omega, \mathscr{F}, \mathbb{P})$ s.t.

$Z \sim N(\mu,\sigma^2)$, $B \sim \text{Bin}(n,p)$.

Consider the random variable $X = Z1_A + B1_{A^c}$ where $A \in \mathscr{F}$. Depending on $A$, $X$ can be continuous, discrete, or a mixture of the two: when $0 < \mathbb{P}(A) < 1$, it is neither discrete nor continuous.
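A quick simulation sketch of such a mixed random variable (the parameter values and the choice $\mathbb{P}(A) = 1/2$ are hypothetical, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration.
mu, sigma = 0.0, 1.0   # Z ~ N(mu, sigma^2)
n, p = 10, 0.5         # B ~ Bin(n, p)
prob_A = 0.5           # P(A)

size = 100_000
Z = rng.normal(mu, sigma, size)
B = rng.binomial(n, p, size)
A = rng.random(size) < prob_A  # indicator 1_A

# X = Z*1_A + B*1_{A^c}
X = np.where(A, Z, B)

# X has atoms at the integers 0..n (inherited from B on A^c) ...
print(np.mean(X == 5))  # positive: X puts an atom at 5
# ... yet also spreads continuous mass over the reals (from Z on A),
# so X is neither discrete nor continuous when 0 < P(A) < 1.
```

The empirical frequency of the event $\{X = 5\}$ stays bounded away from zero as the sample grows, which is exactly what cannot happen for a continuous random variable.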

  2. We need to work with certain sets:

Consider $U \sim \text{Unif}([0,1])$ s.t. $f_U(u) = 1_{[0,1]}(u)$ on $([0,1], \mathscr{B}([0,1]), \lambda)$. (Note that $\lambda$ cannot be defined on all of $2^{[0,1]}$; see below.)

In probability w/o measure theory:

If $(i_1, i_2) \subseteq [0,1]$, then $$P(U \in (i_1, i_2)) = \int_{i_1}^{i_2} 1 du = i_2 - i_1$$

In probability w/ measure theory:

$$P(U \in (i_1, i_2)) = \lambda((i_1, i_2)) = i_2 - i_1$$

So who needs measure theory, right? Well, what if we try to compute

$$P(U \in \mathbb{Q} \cap [0,1])?$$

We need measure theory to say $$P(U \in \mathbb{Q} \cap [0,1]) = \lambda(\mathbb{Q} \cap [0,1]) = 0$$

Riemann integration gives no answer for $$\int_{\mathbb{Q} \cap [0,1]} 1 \, du$$ The integrand here is the Dirichlet function $1_{\mathbb{Q} \cap [0,1]}$, which is not Riemann integrable: every subinterval of a partition contains both rationals and irrationals, so every lower sum is $0$ and every upper sum is $1$.
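A small sketch of that failure. Since rationality of a floating-point number cannot be decided numerically, the code below represents tags symbolically (a `Fraction` stands for a rational tag, a tuple stands for a point shifted by an irrational amount inside the same cell); this symbolic device is the only assumption:

```python
from fractions import Fraction

# Riemann sums for the Dirichlet function 1_{Q ∩ [0,1]} on an
# equal-width partition of [0,1] into N cells.
def dirichlet(tag):
    # 1 on rational tags, 0 on (symbolically) irrational tags.
    return 1 if isinstance(tag, Fraction) else 0

N = 1_000
width = Fraction(1, N)

# Rational tags: the left endpoint k/N of each cell.
rational_tags = [Fraction(k, N) for k in range(N)]
# Irrational tags: each left endpoint shifted by sqrt(2)/(2N),
# which lands strictly inside the cell and is irrational.
irrational_tags = [(Fraction(k, N), "+ sqrt(2)/(2N)") for k in range(N)]

upper = sum(dirichlet(t) * width for t in rational_tags)    # every cell meets Q
lower = sum(dirichlet(t) * width for t in irrational_tags)  # every cell meets R \ Q

print(upper, lower)  # 1 0 -- for every N, so the Riemann sums never converge
```

However fine the mesh, the tagged sums can be forced to $1$ or to $0$, so the Riemann integral does not exist; the Lebesgue integral handles the set $\mathbb{Q} \cap [0,1]$ effortlessly because it has measure zero.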

Furthermore, $\exists A \subset [0,1]$ (e.g., a Vitali set) s.t. $P(U \in A)$ is undefined, which is why $\lambda$ cannot be defined on all of $2^{[0,1]}$.


From Rosenthal's A First Look at Rigorous Probability Theory:

[excerpted pages from the book not reproduced]


Since the measure-theoretic axiomatization of probability was formulated by Kolmogorov, I think you'd be very interested in this article. I had questions similar to yours, and most of them were clarified after reading it (although I also read Kolmogorov's original work afterwards).

One of the ideas is that historically there were proofs of the LLN and CLT available without explicit use of measure theory; however, both Borel and Kolmogorov started using measure-theoretic tools to solve probabilistic problems on $[0,1]$ and similar spaces, for example treating the binary expansion of $x \in [0,1]$ as the coordinates of a random walk. The idea then was: this works well, so what if we use the method much more often, and even declare it the way to go? When Kolmogorov's work first came out, not every mathematician agreed with his claim (to say the least). But you are somewhat right in saying that measure theory makes dealing with probability easier. It's like solving basic geometric problems using vector algebra.

Regarding facts exclusively available for discrete/continuous distributions: usually a good probabilistic theorem is quite general and works fine in both cases. However, some things hold only for "continuous" measures. The proper name here is atomless: $\mu$ is atomless if every measurable set $F$ with $\mu(F) > 0$ has a measurable subset $E \subseteq F$ with $0 < \mu(E) < \mu(F)$, where the inequalities must be strict. Then the range of $\mu$ is convex (a theorem of Sierpiński): for every $0 \leq c \leq \mu(\Omega)$ there exists a set $C$ such that $\mu(C) = c$. Of course, that does not hold for measures with atoms. Not a very probabilistic fact, though.
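For contrast, a tiny sketch (the two-point space and the masses are toy choices for illustration) showing that a measure with atoms can have a non-convex range:

```python
from itertools import chain, combinations

# A purely atomic probability measure on Omega = {a, b}.
# Masses chosen to be exactly representable as floats.
mass = {"a": 0.25, "b": 0.75}

def measure(subset):
    return sum(mass[w] for w in subset)

omega = list(mass)
# All subsets of Omega (the full sigma-algebra here, since Omega is finite).
subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
attained = sorted({measure(s) for s in subsets})

print(attained)  # [0, 0.25, 0.75, 1.0] -- no set has measure 0.5
```

The range $\{0, 0.25, 0.75, 1\}$ skips the whole interval $(0.25, 0.75)$, so convexity of the range genuinely requires the atomless hypothesis.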


There's an exciting and amazing theorem in Kai Lai Chung's A Course in Probability Theory about distribution functions (d.f.'s), which states:

[statement of Theorem 1.3.2 not reproduced]

or, with this refinement,

[statement of the refined theorem not reproduced]

This statement makes the old-fashioned separate treatments of discrete, continuous, and mixed distribution functions obsolete!

Theorem 1.3.2 cannot be proved without the powerful machinery of measure theory.


Moreover, measure theory provides many more tools for studying probability theory. In fact:

  • The Strong Law of Large Numbers can NOT be proved without measure theory.

  • You can NOT define Brownian motion precisely without it.

  • You can NOT work with stochastic differential equations without measure theory.
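The almost-sure convergence in the SLLN is itself a measure-theoretic statement (it concerns a set of sample paths of probability one on the space of infinite sequences), but its content is easy to illustrate numerically. A minimal simulation sketch with fair coin flips:

```python
import numpy as np

rng = np.random.default_rng(42)

# One sample path of i.i.d. Bernoulli(1/2) coin flips.
flips = rng.integers(0, 2, size=1_000_000)

# Running sample means along this single path.
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

# The SLLN says running_mean -> 1/2 for P-almost every path; "almost every"
# refers to an exception set of measure zero, a notion that requires
# measure theory even to state.
print(running_mean[999], running_mean[-1])
```

The simulation only shows one path wandering toward $1/2$; turning "the set of paths that fail to converge has probability zero" into a theorem is precisely where measure theory becomes unavoidable.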