Why is free probability a generalization of probability theory?
Quite a lot of questions here!
It is perhaps worth making a distinction between scalar classical probability theory - the study of scalar classical random variables - and more general classical probability theory, in which one studies more general random objects such as random graphs, random sets, random matrices, etc. The former has the structure of a commutative algebra in addition to an expectation, which allows one to form many familiar concepts in probability theory such as moments, variances, correlations, characteristic functions, etc., though in many cases one has to impose some integrability condition on the random variables involved in order to ensure that these concepts are well defined; in particular, it can be technically convenient to restrict attention to bounded random variables in order to avoid all integrability issues. In the more general case, one usually does not have the commutative algebra structure, and (in the case of random variables not taking values in a vector space) one no longer has an expectation structure either.
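To make the scalar commutative case concrete: the algebra structure lets one form powers $X^k$, and the expectation then produces moments. A minimal numerical sketch (assuming Python with NumPy; the uniform distribution here is my illustrative choice, not from the original text) shows that for a bounded variable all moments exist and match the exact values $ {\bf E} X^k = 1/(k+1)$ for even $k$, $0$ for odd $k$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A bounded scalar classical random variable: X uniform on [-1, 1].
# Boundedness guarantees every moment E[X^k] exists (no integrability issues).
samples = rng.uniform(-1.0, 1.0, size=1_000_000)

for k in range(1, 5):
    empirical = np.mean(samples**k)
    exact = 1.0 / (k + 1) if k % 2 == 0 else 0.0
    print(f"E[X^{k}]: empirical = {empirical:+.4f}, exact = {exact:+.4f}")
```

For an unbounded heavy-tailed variable such as the Cauchy distribution, by contrast, even the first moment fails to exist, which is exactly the integrability caveat mentioned above.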
My focus in my free probability notes is on scalar random variables (commutative or noncommutative), in which one needs both the algebra structure and the expectation structure in order to define the concepts mentioned above. Neither structure is necessary to define the other, but they enjoy some compatibility conditions (e.g. ${\bf E} X^2 \geq 0$ for any real random variable $X$, in both the commutative and noncommutative settings). In my notes, I also restricted largely to the case of bounded random variables $X \in L^\infty$ for simplicity (or at least to random variables $X \in L^{\infty-}$ in which all moments are finite), but one can certainly study unbounded noncommutative random variables as well, though the theory becomes significantly more delicate (much as the spectral theorem becomes significantly more subtle when working with unbounded operators rather than bounded operators).
When teaching classical probability theory, one usually focuses first on the scalar case, and then perhaps moves on to the general case in more advanced portions of the course. Similarly, noncommutative probability (of which free probability is a subfield) usually focuses first on the case of scalar noncommutative variables, which was also the focus of my post. For instance, random $n \times n$ matrices, using the normalised expected trace $X \mapsto \frac{1}{n} {\bf E} \mathrm{tr} X$ as the trace structure, would be examples of scalar noncommutative random variables (note that the normalised expected trace of a random matrix is a scalar, not a matrix). It is true that random $n \times n$ matrices, when equipped with the classical expectation ${\bf E}$ instead of the normalised expected trace $\frac{1}{n} {\bf E} \mathrm{tr}$, can also be viewed as classical non-scalar random variables, but this is a rather different structure (note now that the expectation is a matrix rather than a scalar) and should not be confused with the scalar noncommutative probability structure one can place here.
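The point that $\frac{1}{n} {\bf E} \mathrm{tr}$ produces scalar moments can be checked numerically. Below is a small sketch (assuming Python with NumPy; the Wigner-matrix normalisation and the trial counts are my choices for illustration) that estimates $\tau(X^k) = \frac{1}{n} {\bf E} \mathrm{tr}\, X^k$ for a symmetric random matrix; the even moments approach the Catalan numbers of the semicircle law:

```python
import numpy as np

rng = np.random.default_rng(0)

def wigner(n):
    """A real symmetric Wigner matrix with off-diagonal entries of variance 1/n."""
    a = rng.normal(0.0, 1.0, size=(n, n)) / np.sqrt(n)
    return (a + a.T) / np.sqrt(2)

n, trials = 400, 50
moments = np.zeros(5)  # moments[k] will estimate tau(X^k) = (1/n) E tr X^k
for _ in range(trials):
    x = wigner(n)
    xk = np.eye(n)
    for k in range(1, 5):
        xk = xk @ x
        moments[k] += np.trace(xk) / n
moments /= trials

# Each tau(X^k) is a single scalar, even though X itself is a matrix.
# Expected limits: 0, 1, 0, 2 (odd moments vanish; even ones are Catalan numbers).
print(moments[1:])
```

Note the contrast with the classical expectation ${\bf E} X^k$, which for the same random matrix would be an $n \times n$ matrix rather than a number.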
It is certainly possible to consider non-scalar noncommutative random variables, such as a matrix in which the entries are themselves elements of some noncommutative tracial von Neumann algebra (e.g. a matrix of random matrices); see e.g. Section 5 of these slides of Speicher. Similarly, there is certainly literature on free point processes (see e.g. this paper), noncommutative white noise (see e.g. this paper), etc., but these are rather advanced topics and beyond the scope of the scalar noncommutative probability theory discussed in my notes. I would not recommend trying to think about these objects until one is completely comfortable conceptually both with non-scalar classical random variables and with scalar noncommutative random variables, as one is likely to become rather confused otherwise when dealing with them. (This is analogous to how one should not attempt to study quantum field theory until one is completely comfortable conceptually both with classical field theory and with the quantum theory of particles. Much as one should not conflate the superficially similar notions of a classical field and a quantum wave function, one should also not conflate the superficially similar notions of a non-scalar classical random variable and a scalar noncommutative random variable.)
Regarding localisable measurable spaces: all standard probability spaces generate localisable measurable spaces. Technically, it is true that there do exist some pathological probability spaces whose corresponding measurable spaces are not localisable; however the vast majority of probability theory can be conducted on standard probability spaces, and there are some technical advantages to doing so, particularly when it comes to studying conditional expectations with respect to continuous random variables or continuous $\sigma$-algebras.
There have been many good answers to this question, but it might be that the main point gets lost in too many details. So, as a kind of expert on free probability theory, let me try to give a short direct answer to the question “Why is free probability a generalization of probability theory?”
There are two main ingredients in free probability theory: first, the general notion of a non-commutative probability space and second, the more specific notion of freeness (or free independence).
A non-commutative probability space consists of an algebra and a linear functional. The algebra can (despite the use of “non-commutative”) also be commutative, and thus a classical probability space (encoded in the commutative algebra of random variables and the functional given by taking the expectation with respect to the underlying probability measure) is also an example of a non-commutative probability space. However, in this generality non-commutative (as well as classical) probability spaces are not too exciting. One needs more structure for interesting statements. In the classical setting, the most basic additional structure is “independence”. In free probability the corresponding structure is “free independence”. However, free independence is NOT a generalization of independence; it is an analogue. What independence means for classical (commuting) random variables, free independence means for non-commuting variables. Apart from trivial situations, there are no classical random variables which are free. Hence freeness is not a kind of dependence for classical variables; it is a special relation for non-commuting variables, which behaves in many respects like independence.
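One concrete way to see that freeness is an analogue rather than a generalization of independence is to compare mixed moments. The following is the standard textbook computation (stated with the usual normalization $\varphi(1)=1$), not specific to any particular model:

```latex
% For classically independent commuting random variables X, Y:
%   E[XYXY] = E[X^2 Y^2] = E[X^2] \, E[Y^2].
% For freely independent elements a, b of a non-commutative probability
% space (A, \varphi), the alternating word abab instead satisfies:
\[
\varphi(abab)
  \;=\; \varphi(a^2)\,\varphi(b)^2
  \;+\; \varphi(a)^2\,\varphi(b^2)
  \;-\; \varphi(a)^2\,\varphi(b)^2 .
\]
% In particular, for centered variables (\varphi(a) = \varphi(b) = 0)
% freeness forces \varphi(abab) = 0, whereas independence would give
% E[XYXY] = E[X^2] E[Y^2], which is nonzero in general.
```

The two rules agree only in degenerate cases, which is why nontrivial classical random variables are never free.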
Hence the above question has two possible answers, depending on how it is interpreted.
Read as “Why is a non-commutative probability space a generalization of a classical probability space?” the answer is just: because a commutative algebra is also allowed as an example of a non-commutative algebra.
Read as “Why is free independence a generalization of classical independence?” the answer is: this is actually not true, free independence is not a generalization, but an analogue of classical independence.
First of all, you are mixing many questions into one post...
It depends on what you mean by "generalization". And I am not sure what you mean by talking about "commutativity" without mentioning the notions of "exchangeability" or "conditional independence".
Classical probability theory does deal with dependent random variables, but usually they are discussed in the setting of stochastic processes, e.g. via autocorrelation, where the dependence structure is tractable. In free probability, the dependence can be wilder.
But in what sense does classical probability theory only concern itself with commutative quantities?
In short, it mostly discusses nothing beyond exchangeability.
Free probability is one of many possible generalizations of the notion of exchangeability. There are many other generalizations of that notion, for example exchangeable pairs. Free probability provides a method that deals with such "non-commutative relations". But free probability is not the only method that deals with dependence beyond exchangeability.
From the perspective of a statistician, free probability is a natural generalization of the notion of exchangeability. The generalization of de Finetti's theorem is one very interesting application of the free probability framework. If you are a Bayesian, a natural question to ask is how to justify the conditional independence assumption in model building. de Finetti's theorem is a strong justification of why putting a prior on an exchangeable sequence is natural. After this justification, some people asked what happens if we have a weaker assumption than exchangeability. Then we ask what we can assert if a pair of random variables is "non-commutative" in some sense.
From the perspective of a probabilist, I understand why you interpret free probability in that way. You can regard free probability as a general framework that includes, say, matrix-valued random variables. In that way you can also treat the $W^*$-algebra of bounded random operators on the sample space $X$ (a Hilbert space) with a specified value space $Y$ (say, a matrix space) as a generalization of $M(X)$, the space of probability measures on $X$. Then free probability is a formalism for the notion of conditional independence. There is a monograph that discusses this view in depth.
But doesn't classical probability theory study random variables with non-existent moments? Even in an elementary course I remember learning about the Cauchy distribution.
The reason why Tao's post is limited to the bounded case is partially the nice correspondence between $W^*$-algebras and the formalism that I mentioned above. Your claim seems a bit odd to me: $L^p$ spaces do not include many real-valued functions, and Sobolev spaces do not include many $L^p$ functions; are they generalizations of the former notions?