Motivation for tensor product in Physics

It is essentially impossible to answer the general question of "how does multilinearity come up naturally in physics?" because of the myriad of possible examples that make up the total answer. Instead, let me describe a situation that very loudly cries out for the use of tensor products of two vectors.

Consider the problem of conservation of momentum for a continuous distribution of electric charge and current, which interacts with an electromagnetic field, under the action of no other external force. I will describe it more or less along the lines of Jackson (Classical Electrodynamics, 3rd edition, §6.7) but depart from it towards the end. This will get very electromagneticky for a while, so if you want to skip to the tensors, you can go straight to equation (1).

The rate of change of the total mechanical momentum of the system is the total Lorentz force, given by $$ \frac{d\mathbf{P}_\mathrm{mech}}{dt} =\int_V(\rho\mathbf{E}+\mathbf{J}\times \mathbf{B})d\mathbf{x}. $$ To simplify this, one can take $\rho$ and $\mathbf{J}$ from Maxwell's equations: $$ \rho=\epsilon_0\nabla\cdot\mathbf{E} \ \ \ \text{ and }\ \ \ \mathbf{J}=\frac1{\mu_0}\nabla\times \mathbf{B}-\epsilon_0\frac{\partial \mathbf{E}}{\partial t}.$$ (In particular, this means that what follows is only valid "on shell": momentum is only conserved if the equations of motion are obeyed. Of course!)

One can then put these expressions back in and, after a nice vector-calculus workout, come up with the following relation: $$ \begin{align} \frac{d\mathbf{P}_\mathrm{mech}}{dt} +&\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}d\mathbf{x} \\ &= \epsilon_0\int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) + c^2 \mathbf{B} (\nabla \cdot \mathbf{B})- c^2 \mathbf{B} \times (\nabla \times \mathbf{B}) \right]d\mathbf{x}. \end{align} $$

The integral on the left-hand side can be identified as the total electromagnetic momentum, and differs from the integral of the Poynting vector by a factor of $1/c^2$. To get this in the proper form for a conservation law, though, such as the one for energy in this setting, $$ \frac{dE_\mathrm{mech}}{dt} +\frac{d}{dt}\frac{\epsilon_0}{2}\int_V(\mathbf{E}^2 +c^2\mathbf{B}^2)d\mathbf{x} = -\oint_S \mathbf{S}\cdot d\mathbf{a}, $$ we need to reduce the huge, ugly volume integral to a surface integral.

The way to do this is, of course, the divergence theorem. However, that theorem is for scalars, and what we have so far is a vector equation. To work further, then, we need to (at least temporarily) work in some specific basis $\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}$, and write $\mathbf{E}=\sum_i E_i \mathbf{e}_i$. Let's work with the electric field term first; after that the results also apply to the magnetic term. Thus, to start with, $$ \begin{align} \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} = \sum_i \mathbf{e}_i \int_V \left[ E_i(\nabla\cdot \mathbf{E})-\mathbf{e}_i\cdot\left(\mathbf{E} \times(\nabla \times \mathbf{E})\right) \right]d\mathbf{x}. \end{align} $$ These terms should be simplified using the vector calculus identities $$ E_i(\nabla\cdot \mathbf{E}) = \nabla\cdot\left(E_i \mathbf{E}\right) - \mathbf{E}\cdot \nabla E_i $$ and $$ \mathbf{E} \times(\nabla \times \mathbf{E}) = \frac12\nabla(\mathbf{E}\cdot\mathbf{E})-(\mathbf{E}\cdot\nabla)\mathbf{E}, $$ which mean that the whole combination can be simplified as $$ \begin{align} \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} = \sum_i \mathbf{e}_i \int_V \left[ \nabla\cdot\left(E_i \mathbf{E}\right) - \mathbf{e}_i\cdot\left( \frac12\nabla(\mathbf{E}\cdot\mathbf{E}) \right) \right]d\mathbf{x}, \end{align} $$ since the terms in $\mathbf{E}\cdot \nabla E_i$ and $\mathbf{e}_i\cdot\left( (\mathbf{E}\cdot\nabla)\mathbf{E}\right)$ cancel. This means we can write the whole integrand as the divergence of some vector field, and use the divergence theorem: $$ \begin{align} \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} &= \sum_i \mathbf{e}_i \int_V \nabla\cdot\left[ E_i \mathbf{E} - \frac12 \mathbf{e}_i E^2 \right]d\mathbf{x} \\ & = \sum_i \mathbf{e}_i \oint_S\left[ E_i \mathbf{E} - \frac12 \mathbf{e}_i E^2 \right]\cdot d\mathbf{a}. \tag 1 \end{align} $$
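If you want to double-check the two vector-calculus identities used above without grinding through indices by hand, here is a small symbolic spot-check. It is only a sketch (it uses SymPy, and the component-wise `grad`/`div`/`curl` helpers are written out here for the occasion, not taken from any particular library convention):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
# a generic smooth vector field E with unspecified components
E = [sp.Function(f'E{i}')(x, y, z) for i in range(3)]

def grad(f):   # gradient of a scalar, as a list of components
    return [sp.diff(f, c) for c in coords]

def div(F):    # divergence of a vector field given as a list
    return sum(sp.diff(F[i], coords[i]) for i in range(3))

def curl(F):   # curl of a vector field given as a list
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

def cross(A, B):
    return [A[1]*B[2] - A[2]*B[1],
            A[2]*B[0] - A[0]*B[2],
            A[0]*B[1] - A[1]*B[0]]

# Identity 1:  E_i (div E) = div(E_i E) - E . grad(E_i), for each component i
for i in range(3):
    lhs = E[i] * div(E)
    rhs = div([E[i] * E[j] for j in range(3)]) \
          - sum(E[j] * sp.diff(E[i], coords[j]) for j in range(3))
    assert sp.expand(lhs - rhs) == 0

# Identity 2:  E x (curl E) = (1/2) grad(E.E) - (E . grad) E
lhs = cross(E, curl(E))
half_grad_E2 = [sp.Rational(1, 2) * g for g in grad(sum(Ei**2 for Ei in E))]
E_dot_grad_E = [sum(E[j] * sp.diff(E[i], coords[j]) for j in range(3)) for i in range(3)]
assert all(sp.expand(l - (g - d)) == 0
           for l, g, d in zip(lhs, half_grad_E2, E_dot_grad_E))
print("both identities check out")
```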

In terms of conservation law structure, we're essentially done, as we've reduced the rate of change of momentum to a surface term. However, it is crying out for some simplification. In particular, this expression is basis-dependent, but it is so close to being basis independent that it's worth a closer look.

The first term, for instance, is simply crying out for a simplification that would look something like $$ \sum_i \mathbf{e}_i \oint_S E_i \mathbf{E}\cdot d\mathbf{a} = \oint_S \mathbf{E}\, \mathbf{E}\cdot d\mathbf{a} $$ if we could only make sense of an object like $\mathbf{E}\, \mathbf{E}$. Even better, if we could make sense of such a combination, then the seemingly basis-dependent combination that would come up in the second term, $\sum_i \mathbf{e}_i\,\mathbf{e}_i$, turns out to be basis independent: one can prove that for any two orthonormal bases $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ and $\{\mathbf{e}_1', \mathbf{e}_2', \mathbf{e}_3'\}$, those combinations are the same: $$ \sum_i \mathbf{e}_i\,\mathbf{e}_i = \sum_i \mathbf{e}_i'\,\mathbf{e}_i' $$ as long as the product $\mathbf{u}\,\mathbf{v}$ of two vectors, whatever it ends up being, is linear in each factor, which is definitely a reasonable assumption.
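That basis-independence claim is also easy to test numerically. Here is a minimal NumPy sketch (the second basis is generated by a QR factorization, which is just one convenient way to get a random orthonormal set) showing that $\sum_i \mathbf{e}_i\,\mathbf{e}_i$, represented as a sum of outer products, always comes out as the identity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthonormal_basis(n=3):
    # QR decomposition of a random matrix gives orthonormal columns
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q.T  # rows are the basis vectors

basis1 = np.eye(3)                   # the standard basis
basis2 = random_orthonormal_basis()  # some other orthonormal basis

sum1 = sum(np.outer(e, e) for e in basis1)
sum2 = sum(np.outer(e, e) for e in basis2)

print(np.allclose(sum1, sum2), np.allclose(sum1, np.eye(3)))  # True True
```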

So what, then, should this new vector multiplication be? One key to realizing what we really need is noticing that we haven't yet assigned any real physical meaning to the combination $\mathbf{E}\,\mathbf{E}$; instead, we only ever interact with it by dotting "one of the vectors of the product" with the surface area element $d\mathbf{a}$, and that leaves an ordinary vector $\mathbf{E}\,\mathbf{E}\cdot d\mathbf{a}$, which we can integrate to get another vector, and that requires no new structure.
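To make that concrete, here is a tiny NumPy sketch (with made-up numbers) in which the would-be object $\mathbf{E}\,\mathbf{E}$ is represented by the outer product, and the only operation we ever perform on it, dotting the right factor with $d\mathbf{a}$, indeed returns an ordinary vector:

```python
import numpy as np

E  = np.array([1.0, 2.0, -0.5])   # a sample field value (made-up numbers)
da = np.array([0.1, 0.0,  0.3])   # a sample surface element (made-up numbers)

EE = np.outer(E, E)               # the would-be "E E" object, as a 3x3 array
# (E E) . da  equals  (E . da) E, an ordinary vector
print(np.allclose(EE @ da, np.dot(E, da) * E))   # True
```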

Let's then write a list of how we want this new product to behave. To keep things clear, let's give it some fancy new symbol like $\otimes$, mostly to avoid unseemly combinations like $\mathbf{u}\,\mathbf{v}$. We want then,

  • a function $\otimes:V\times V\to W$, which takes Euclidean vectors in $V=\mathbb R^3$ into some vector space $W$ in which we'll keep our fancy new objects.
  • Combinations of the form $\mathbf{u}\otimes \mathbf{v}$ should be linear in both $\mathbf{u}$ and $\mathbf{v}$.
  • For all vectors $\mathbf{w}$ in $V$, and all combinations $(\mathbf{u},\mathbf{v})\in V\times V$, we want the combination $(\mathbf{u}\otimes \mathbf{v})\cdot\mathbf{w}$ to be a vector in $V$. Even more, we want that to be the vector $(\mathbf{v}\cdot\mathbf{w})\mathbf{u}\in V$.

That last one actually looks pretty strong, but there's evidently room for improvement. For one, it depends on the Euclidean structure, which is not actually necessary: we can make an equivalent statement that uses the vector space's dual.

  • For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f\in V^\ast$, we want $f_\to(\mathbf{u}\otimes \mathbf{v})=f(\mathbf{v})\mathbf{u}\in V$ to hold, where $f_\to$ simply means that $f$ acts on the factor on the right.

Finally, if we're doing stuff with the dual, we can reformulate that in a slightly prettier way. Since two vectors $\mathbf{u},\mathbf{v}\in V$ are equal if and only if $f(\mathbf{u})=f(\mathbf{v})$ for all $f\in V^\ast$, we can give yet another equivalent version of the same statement:

  • For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f,g\in V^\ast$, we want $g_\leftarrow f_\to(\mathbf{u}\otimes \mathbf{v})=g(\mathbf{u})f(\mathbf{v})\in \mathbb R$, where $g_\leftarrow$ means that $g$ acts on the factor on the left.

[Note, here, that this last rephrasing isn't really that fancy. Essentially, it is saying that the vector equation (1) is really to be interpreted as a component-by-component equality, and that's not really off the mark of how we actually do things.]
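For what it's worth, this last requirement is also trivial to check in coordinates; a quick NumPy sketch (random made-up vectors, with covectors represented as plain arrays) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.normal(size=3), rng.normal(size=3)
f, g = rng.normal(size=3), rng.normal(size=3)   # covectors, as plain arrays

T = np.outer(u, v)                  # u (x) v represented as a matrix
# g acting on the left factor, f on the right factor, gives g(u) f(v)
print(np.allclose(g @ T @ f, (g @ u) * (f @ v)))   # True
```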

I could keep going, but it's clear that this requirement can be rephrased into the universal property of the tensor product, and that rephrasing is a job for the mathematicians. Thus, you can see the story like this: Upon hitting equation (1), we give to the mathematicians this list of requirements. They go off, think for a bit, and come back telling us that such a structure does exist (i.e. there exist rigorous constructions that obey those requirements) and that it is essentially unique, in the sense that multiple such constructions are possible, but they are canonically isomorphic. For a physicist, what that means is that it's OK to write down objects like $\mathbf{u}\otimes \mathbf{v}$ as long as one does keep within the rules of the game.

As far as electromagnetism goes, this means that we can write our conservation law in the form $$ \frac{d\mathbf{P}_\mathrm{mech}}{dt} +\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}d\mathbf{x} = \oint_S \mathcal T\cdot d\mathbf{a} $$ where $$ \mathcal T = \epsilon_0\left[ \mathbf{E}\otimes\mathbf{E}+c^2\mathbf{B}\otimes\mathbf{B} -\frac12\sum_i\mathbf{e}_i\otimes\mathbf{e}_i\left(E^2+c^2 B^2\right) \right] $$ is, of course, the Maxwell stress tensor.
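Written out in components, with outer products standing in for the tensor products, $\mathcal T$ is easy to assemble; here is a short NumPy sketch (the field values are made up, and the constants are the usual SI ones):

```python
import numpy as np

eps0 = 8.8541878128e-12     # vacuum permittivity, SI
c    = 299792458.0          # speed of light, SI

E = np.array([1.0e3, 0.0, 0.0])     # made-up field values, V/m
B = np.array([0.0, 1.0e-5, 0.0])    # made-up field values, T

# the Maxwell stress tensor, with np.eye(3) playing the role of sum_i e_i (x) e_i
T = eps0 * (np.outer(E, E) + c**2 * np.outer(B, B)
            - 0.5 * np.eye(3) * (E @ E + c**2 * (B @ B)))

# the integrand of the surface term: dot the right factor with the unit normal n
n = np.array([0.0, 0.0, 1.0])
print(T @ n)
```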

I could go on and on about this, but I think this really captures the essence of how and where it happens in physics that a situation is really begging the use of a tensor product. There are other such situations, of course, but this is the clearest one I know.


I'd like to add another answer to expand on Federico's answer, because quantum mechanics does offer another very clean way of getting directly to the universal property of a tensor product from physical constraints. This universal property is, as you state,

given $k$ vector spaces $V_1,\dots,V_k$ over the same field $\Bbb K$ we want to find a new space $S$ and a universal multilinear map $T$ such that for every vector space $W$ and multilinear mapping $g : V_1\times\cdots\times V_k\to W$ we have a linear map $f : S\to W$ such that $g = f\circ T$.

Say, then, that you have $n$ quantum mechanical systems that you know are described by Hilbert spaces $\mathcal H_1,\ldots,\mathcal H_n$ of their own, and you want to give a coherent description of the whole composite system, whether they're interacting or not. The postulates of quantum mechanics require that there be a Hilbert space that describes the whole system: that is, that it be possible, in principle, to superpose any two given states of the global system. Let's call this Hilbert space $\mathcal H$.

The universal map, of course, is simply 'the formation of a composite state': it is the map $$T:\mathcal H_1\times\cdots\times\mathcal H_n\to\mathcal H$$ that will give me the state of the global system in $\mathcal H$ if I give it the state of each system in its Hilbert space. Thus far that's all nomenclature. We do require, though, that $T$ be linear in each input, and that's where the physics comes in. Say I can prepare systems 2 through $n$ using given preparations, and that I have two distinct states I can prepare system 1 in. Each such state will give a global state in $\mathcal H$. If I now prepare a superposition in system 1, I also want the global system to be in a superposition, because I can see the preparation of systems 2 through $n$ as "addenda" to my superposition procedure in system 1, and it would be inconsistent to have the outcome (the superposition) depend on the presence of other systems.
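In a finite-dimensional toy model this requirement is exactly the bilinearity of the Kronecker product; a minimal NumPy sketch (made-up two-level states, with $T$ realized as `np.kron`) reads:

```python
import numpy as np

psi1 = np.array([1.0, 0.0])          # two states of system 1
phi1 = np.array([0.0, 1.0])
psi2 = np.array([0.6, 0.8])          # a fixed preparation of system 2
a, b = 0.3 + 0.4j, 0.5 - 0.2j        # superposition coefficients

lhs = np.kron(a * psi1 + b * phi1, psi2)                  # superpose, then combine
rhs = a * np.kron(psi1, psi2) + b * np.kron(phi1, psi2)   # combine, then superpose
print(np.allclose(lhs, rhs))   # True: T is linear in the first slot
```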

Finally, let's see what I can get out, physically, from my composite system. The basic physical prediction that I can get from any of the systems, individually, is the set of matrix elements of the form $$\langle \phi|A|\psi\rangle,$$ where I'm measuring on the state $\langle\phi|$ after evolving for some time and possibly multiplying by some observable, which is encoded in some operator $A$. Outcome probabilities, for example, are encoded in the mod-squares of such linear forms, though their phase is also recoverable using suitable measurements.

For our composite system, on the other hand, I need to let the whole system evolve, and this may involve interactions. The final amplitude for the evolution-plus-measurement must still be linear in each system, because if I prepare a superposition in one system and leave the others alone the amplitudes must be linear, as the appendage, preparation and measurement of the other subsystems can be seen as part of the measurement procedure on each system. Thus, all probability amplitudes of measurements on the whole system must be multilinear functions $g$. Similarly, by construction, this set of multilinear functions covers all physical amplitudes.

In terms of the composite Hilbert space, though, all I've done is prepare the system in some global state $|\Psi\rangle$, let it evolve for a while, and project it on some global state $\langle \Phi|$, and the amplitude for that, $$\langle\Phi|A|\Psi\rangle,$$ is necessarily a linear function $f=\langle\Phi|A$ on the global Hilbert space $\mathcal H$. Thus we have that all multilinear functions $g$ on the component Hilbert spaces admit a decomposition of the sort $$g=f\circ T,$$ which simply states that all joint measurements on the subsystems are a measurement on the global system. By now, of course, we've provided all the elements for the universal property of the tensor product, so this fixes the structure of $\mathcal H$ and it provides a setting where these requirements come up naturally.
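Again in a finite-dimensional toy model, with $T$ realized as the Kronecker product, the statement $g=f\circ T$ can be checked directly for product observables; a short NumPy sketch (all states and operators are random, made-up $2\times 2$ examples) is:

```python
import numpy as np

rng = np.random.default_rng(2)
A1 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A2 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
psi1 = rng.normal(size=2) + 1j * rng.normal(size=2)
psi2 = rng.normal(size=2) + 1j * rng.normal(size=2)
phi1 = rng.normal(size=2) + 1j * rng.normal(size=2)
phi2 = rng.normal(size=2) + 1j * rng.normal(size=2)

# g(psi1, psi2): the joint amplitude computed system by system ...
g = (phi1.conj() @ A1 @ psi1) * (phi2.conj() @ A2 @ psi2)
# ... equals f(T(psi1, psi2)): a single linear functional on the composite space
f_of_T = np.kron(phi1, phi2).conj() @ np.kron(A1, A2) @ np.kron(psi1, psi2)
print(np.allclose(g, f_of_T))   # True
```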


Keeping the discussion within the realm of quantum mechanics, a nice explanation could be the following:

The state of a quantum mechanical system is described by a ray in a complex (separable) Hilbert space $\mathcal{H}$. By this I mean that you fix a Hilbert space $\mathcal{H}$ and one state of your system corresponds to a point in $\mathbb{C}\mathbb{P}(\mathcal{H})$.

Physically you usually say this in the following way: two vectors $|\psi_1\rangle$ and $|\psi_2\rangle$ that differ by a multiplicative complex number, i.e. $|\psi_1\rangle =\alpha |\psi_2\rangle$ with $\alpha \in \mathbb{C}$, lie in the same ray and describe the same state. Furthermore, you can ask for normalized states, and in that case $\alpha$ is just a pure phase.
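As a concrete (and admittedly trivial) illustration, here is a tiny NumPy check, with a made-up two-level state, that multiplying a normalized state by a pure phase changes no measurement probability:

```python
import numpy as np

psi = np.array([0.6, 0.8j])   # a normalized state of a two-level system (made up)
phi = np.array([1.0, 0.0])    # some other state to project on
theta = 1.234                 # an arbitrary phase

p1 = np.abs(np.vdot(phi, psi))**2
p2 = np.abs(np.vdot(phi, np.exp(1j * theta) * psi))**2
print(np.isclose(p1, p2))     # True: both vectors describe the same physical state
```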

Now, if you have two different systems, with Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ of their own, you often want to build the composite system. By this I mean that you would like to treat the two systems together as just one big system.

Then you should find a Hilbert space $\mathcal{H}_{1+2}$ for the total system, and this has to be built from $\mathcal{H}_1$ and $\mathcal{H}_2$.

You can show that neither $\mathcal{H}_1 \times \mathcal{H}_2$ nor $\mathcal{H}_1 \oplus \mathcal{H}_2$ will work, for different reasons. If I am not mistaken, in some infinite-dimensional cases the Cartesian product of two Hilbert spaces is no longer a Hilbert space. As for the direct sum, it is basically made of pairs of states of $\mathcal{H}_1$ and $\mathcal{H}_2$ taken separately, and with it you are not able to discuss interactions between them.

Moreover, in both the Cartesian product and the direct sum of vector spaces, every vector splits into a pair of $|v_1\rangle \in \mathcal{H}_1$ and $|v_2\rangle \in \mathcal{H}_2$; there is no room for vectors $|w\rangle$ that cannot be split this way. Physically this is terrible, since states such as $|w\rangle$ are entangled states, which are experimentally proven to exist.
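To see concretely that the tensor product does make room for such states, here is a small NumPy sketch (two qubits, using the standard trick of reshaping a state of $\mathbb C^2\otimes\mathbb C^2$ into a $2\times2$ matrix, whose rank is the Schmidt rank):

```python
import numpy as np

# |0> (x) (0.6|0> + 0.8|1>): a product state
product = np.kron(np.array([1.0, 0.0]), np.array([0.6, 0.8]))
# (|00> + |11>)/sqrt(2): a Bell state
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

print(np.linalg.matrix_rank(product.reshape(2, 2)))   # 1 -> separable
print(np.linalg.matrix_rank(bell.reshape(2, 2)))      # 2 -> entangled: no single product decomposition
```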

Therefore you are led to consider $\mathcal{H}_1 \otimes \mathcal{H}_2$.

As for why you need the construction mathematicians use and not the one physicists use, I think a good answer could be the fact that the physicists' way does not prove that the tensor product of two given vector (here also Hilbert) spaces is unique.

This is extremely important in my opinion, since given a system I want to associate to it exactly one well-defined Hilbert space.

Hope this was useful, but yes, as someone suggested above, it is really hard to answer this kind of question in a good way.