Intuitive proof of multivariable changing of variables formula (jacobian) without using mapping and/or measure theory?

The multivariable change of variables formula is nicely intuitive, and it's not too hard to imagine how somebody might have derived the formula from scratch. However, it seems that proving the theorem rigorously is not as easy as one might hope.

Here's my attempt at explaining the intuition -- how you would derive or discover the formula.

The first thing to understand is that if $A$ is an $N \times N$ matrix with real entries and $S \subset \mathbb R^N$, then $m(AS) = |\det A| \, m(S)$. (Technically I should assume that $S$ is measurable.) This is intuitively clear from the SVD of $A$: \begin{equation} A = U \Sigma V^T \end{equation} where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative diagonal entries. Multiplying by $V^T$ doesn't change the measure of $S$. Multiplying by $\Sigma$ scales along each axis, so the measure gets multiplied by $\det \Sigma = | \det A|$. Multiplying by $U$ doesn't change the measure.
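As a quick sanity check of $m(AS) = |\det A| \, m(S)$ in the simplest case, here is a minimal Python sketch with $N = 2$ and $S$ the unit square. It does not use the SVD; it just maps the corners of the square through $A$ and measures the resulting parallelogram with the shoelace formula. The matrix `A` and the helper names (`apply`, `shoelace_area`, `det2`) are arbitrary choices made for illustration.

```python
def apply(A, p):
    """Multiply the 2x2 matrix A (nested lists) by the point p = (x, y)."""
    (a, b), (c, d) = A
    x, y = p
    return (a * x + b * y, c * x + d * y)


def shoelace_area(poly):
    """Area of a simple polygon given as a list of vertices in order."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def det2(A):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = A
    return a * d - b * c


# Hypothetical example matrix; any invertible 2x2 matrix would do.
A = [[2.0, 1.0], [0.5, 3.0]]
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # m(S) = 1
image = [apply(A, p) for p in unit_square]  # corners of the parallelogram AS

print(shoelace_area(image), abs(det2(A)))  # both are 5.5
```

Since $m(S) = 1$ here, the area of the image should equal $|\det A|$ directly.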

Next suppose $\Omega$ and $\Theta$ are open subsets of $\mathbb R^N$ and suppose $g:\Omega \to \Theta$ is one-to-one and onto. We should probably assume $g$ and $g^{-1}$ are $C^1$ just to be safe. (Since we're just seeking an intuitive derivation of the change of variables formula, we aren't obligated to worry too much about what assumptions we make on $g$.) Suppose also that $f:\Theta \to \mathbb R$ is, say, continuous (or whatever conditions we need for the theorem to actually be true).

Partition $\Theta$ into tiny subsets $\Theta_i$. For each $i$, let $u_i$ be a point in $\Theta_i$. Then \begin{equation} \int_{\Theta} f(u) \, du \approx \sum_i f(u_i) m(\Theta_i). \end{equation}

Now let $\Omega_i = g^{-1}(\Theta_i)$ and $x_i = g^{-1}(u_i)$ for each $i$. The sets $\Omega_i$ are tiny and they partition $\Omega$. Then \begin{align} \sum_i f(u_i) m(\Theta_i) &= \sum_i f(g(x_i)) m(g(\Omega_i)) \\ &\approx \sum_i f(g(x_i)) m(g(x_i) + Jg(x_i) (\Omega_i - x_i)) \\ &= \sum_i f(g(x_i)) m(Jg(x_i) \Omega_i) \\ &= \sum_i f(g(x_i)) |\det Jg(x_i)| m(\Omega_i) \\ &\approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{align} The third line uses the translation invariance of $m$, and the fourth is the linear fact $m(AS) = |\det A| \, m(S)$ with $A = Jg(x_i)$; only the second and last steps are approximations.

We have discovered that \begin{equation} \int_{g(\Omega)} f(u) \, du \approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{equation} By using even tinier subsets $\Theta_i$, the approximation would be even better -- so we see by a limiting argument that we actually have equality.

At a key step in the above argument, we used the approximation \begin{equation} g(x) \approx g(x_i) + Jg(x_i)(x - x_i) \end{equation} which is a good approximation when $x$ is close to $x_i$.
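To see the formula at work numerically, here is a rough Python sketch using polar coordinates as the map $g(r, t) = (r\cos t, r\sin t)$, for which $|\det Jg(r, t)| = r$. It takes $\Omega = (0, R) \times (0, 2\pi)$, so that $\Theta = g(\Omega)$ is the disc of radius $R$ minus a slit of measure zero, and compares a midpoint Riemann sum of $f(g(x)) |\det Jg(x)|$ over $\Omega$ with the known value of $\int_\Theta e^{-|u|^2} \, du = \pi(1 - e^{-R^2})$. The choice of $f$, of $R$, and of the grid sizes are assumptions made purely for illustration.

```python
import math


def riemann_polar(f, R, n_r=400, n_t=400):
    """Midpoint Riemann sum of f(g(r, t)) * |det Jg(r, t)| over (0, R) x (0, 2*pi),
    where g(r, t) = (r cos t, r sin t) and |det Jg(r, t)| = r."""
    dr = R / n_r
    dt = 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            u1, u2 = r * math.cos(t), r * math.sin(t)  # the point g(x_i)
            total += f(u1, u2) * r * dr * dt           # f(g(x_i)) |det Jg(x_i)| m(Omega_i)
    return total


def gaussian(u1, u2):
    """Example integrand f(u) = exp(-|u|^2)."""
    return math.exp(-(u1 * u1 + u2 * u2))


R = 2.0
approx = riemann_polar(gaussian, R)
exact = math.pi * (1.0 - math.exp(-R * R))
print(approx, exact)
```

Refining the grid (larger `n_r` and `n_t`) plays the role of "using even tinier subsets" in the argument above.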


For a particular number of variables, the argument is very easy to follow. Consider what you do when you integrate a function of $x$ and $y$ over some region. Basically, you chop up the region into boxes of area ${\rm d}x~{\rm d}y$, evaluate the function at a point in each box, multiply it by the area of the box, and add up the results. This can be notated a bit sloppily as:

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

What you do when changing variables is to chop the region into boxes that are not rectangular: instead, you chop it along the curves on which some function, call it $u(x,y)$, is constant. For example, if $u=x+y^2$, these are the parabolas $x+y^2=c$. You then do the same thing for another function, $v$, say $v=y+3$. Now in order to evaluate the expression above, you need to find the "area of box" for the new boxes - it's not ${\rm d}x~{\rm d}y$ anymore.

As the boxes are infinitesimal, the edges cannot be curved, so they must be parallelograms (adjacent lines of constant $u$ or constant $v$ are parallel.) The parallelograms are defined by two vectors - the vector resulting from a small change in $u$, and the one resulting from a small change in $v$. In component form, these vectors are ${\rm d}u\left\langle\frac{\partial x}{\partial u}, ~\frac{\partial y}{\partial u}\right\rangle $ and ${\rm d}v\left\langle\frac{\partial x}{\partial v}, ~\frac{\partial y}{\partial v}\right\rangle $. To see this, imagine moving a small distance ${\rm d}u$ along a line of constant $v$. What's the change in $x$ when you change $u$ but hold $v$ constant? The partial of $x$ with respect to $u$, times ${\rm d}u$. Same with the change in $y$. (Notice that this involves writing $x$ and $y$ as functions of $u$, $v$, rather than the other way round. The main condition of a change in variables is that both ways round are possible.)

The area of a parallelogram bounded by $\langle x_0,~ y_0\rangle $ and $\langle x_1,~ y_1\rangle $ is $\vert y_0x_1-y_1x_0 \vert$ (or the absolute value of the determinant of the 2 by 2 matrix formed by writing the two column vectors next to each other).* So the area of each box is

$$\left\vert\frac{\partial x}{\partial u}{\rm d}u\frac{\partial y}{\partial v}{\rm d}v - \frac{\partial y}{\partial u}{\rm d}u\frac{\partial x}{\partial v}{\rm d}v\right\vert$$

or

$$\left\vert \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}\right\vert~{\rm d}u~{\rm d}v$$

which you will recognise as being $|\mathbf J|~{\rm d}u~{\rm d}v$, where $\mathbf J$ is the Jacobian determinant of $(x, y)$ with respect to $(u, v)$.
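As a worked instance, take the example functions from above, $u = x + y^2$ and $v = y + 3$. Solving for $x$ and $y$ gives

$$y = v - 3, \qquad x = u - (v-3)^2$$

so the four partial derivatives are

$$\frac{\partial x}{\partial u} = 1, \qquad \frac{\partial x}{\partial v} = -2(v-3), \qquad \frac{\partial y}{\partial u} = 0, \qquad \frac{\partial y}{\partial v} = 1$$

and the area of each box is

$$\left\vert 1 \cdot 1 - 0 \cdot \bigl(-2(v-3)\bigr)\right\vert~{\rm d}u~{\rm d}v = {\rm d}u~{\rm d}v$$

For this particular pair of functions the factor happens to be $1$: the boxes are sheared, but their area is unchanged. Most changes of variables (polar coordinates, for instance) give a factor that varies from box to box.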

So, to go back to our original expression

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

becomes

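$$\sum_{b \in \text{Boxes}} f(u, v) \cdot |\mathbf J| \cdot {\rm d}u~{\rm d}v$$

where $f(u, v)$ is shorthand for $f(x(u,v),~y(u,v))$: the same function, evaluated at the same point, with that point's coordinates written in terms of $u$ and $v$. This is possible because $u$ and $v$ can be written in terms of $x$ and $y$, and vice versa. As the boxes shrink and their number goes to infinity, this becomes an integral in the $uv$ plane.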

To generalize to $n$ variables, all you need is that the area/volume/equivalent of the $n$ dimensional box that you integrate over equals the absolute value of the determinant of an n by n matrix of partial derivatives. This is hard to prove, but easy to intuit.
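To illustrate the $n = 3$ case, here is a small Python sketch that builds the 3 by 3 matrix of partial derivatives for spherical coordinates by central differences and compares the absolute value of its determinant with the familiar volume factor $r^2 \sin\phi$. The coordinate convention ($\phi$ measured from the $z$-axis), the sample point, and the step size `h` are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math


def spherical(r, phi, theta):
    """(r, phi, theta) -> (x, y, z), with phi the polar angle from the z-axis."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))


def numerical_jacobian(g, p, h=1e-6):
    """Matrix of partial derivatives of g at p, by central differences.
    Entry [i][j] approximates d g_i / d p_j."""
    n = len(p)
    cols = []
    for j in range(n):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        gp, gm = g(*plus), g(*minus)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(gp, gm)])
    # cols[j][i] holds d g_i / d p_j; transpose so rows are indexed by i.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


r, phi, theta = 1.5, 0.7, 2.1  # arbitrary sample point
J = numerical_jacobian(spherical, (r, phi, theta))
print(abs(det3(J)), r * r * math.sin(phi))
```

The two printed numbers should agree up to finite-difference error, which is the three-variable analogue of the parallelogram area computed above.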


*To prove this, take two vectors of magnitudes $A$ and $B$, with angle $\theta$ between them. Then write them in a basis such that one of them points along a specific direction, e.g.:

$$A\left\langle \frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\rangle \text{ and } B\left\langle \frac{1}{\sqrt 2}(\cos(\theta)+\sin(\theta)),~ \frac{1}{\sqrt 2} (\cos(\theta)-\sin(\theta))\right\rangle $$

Now perform the operation described above and you get $$\begin{align} & AB\cdot \frac12 \cdot (\cos(\theta) - \sin(\theta)) - AB \cdot \frac12 \cdot (\cos(\theta) + \sin(\theta)) \\ = & \frac 12 AB(\cos(\theta)-\sin(\theta)-\cos(\theta)-\sin(\theta)) \\ = & -AB\sin(\theta) \end{align}$$

The absolute value of this, $AB\sin(\theta)$, is how you find the area of a parallelogram - the product of the lengths of the sides times the sine of the angle between them.
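The footnote's computation can also be checked numerically. The sketch below (plain Python, hypothetical helper names) builds two vectors of lengths $A$ and $B$ with angle $\theta$ between them, with the first vector at an angle `alpha` to the $x$-axis, and compares the determinant formula with $AB\sin(\theta)$. Taking `alpha` equal to $\pi/4$ reproduces the pair of vectors written out above; the other values of `alpha` and the values of $A$, $B$, $\theta$ are arbitrary.

```python
import math


def parallelogram_area(v0, v1):
    """|x0*y1 - x1*y0| for edge vectors v0 = (x0, y0) and v1 = (x1, y1)."""
    return abs(v0[0] * v1[1] - v1[0] * v0[1])


def edge_vectors(A, B, theta, alpha):
    """Vectors of lengths A and B with angle theta between them;
    the first makes angle alpha with the x-axis, the second alpha - theta."""
    v0 = (A * math.cos(alpha), A * math.sin(alpha))
    v1 = (B * math.cos(alpha - theta), B * math.sin(alpha - theta))
    return v0, v1


A, B, theta = 2.0, 3.0, 0.9
for alpha in (0.0, math.pi / 4, 1.3):
    v0, v1 = edge_vectors(A, B, theta, alpha)
    # The two numbers on each line should agree regardless of alpha.
    print(parallelogram_area(v0, v1), A * B * math.sin(theta))
```

That the result does not depend on `alpha` reflects the fact that rotating both vectors together does not change the area they span.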


A lengthy proof of the change of variables formula for Riemann integrals in $\mathbb R^n$ (that does not use measure theory) is given in Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach by Hubbard and Hubbard. A discussion of the intuition behind it is given on page 493.