Why do we define change of basis matrix to be the transpose of the transformation?
This is a good question. It might be comforting to know that there are always some arbitrary choices involved in the whole issue of representing vectors of an abstract vector space as arrays of numbers, linear maps as double arrays of numbers (matrices), and what one means by changing a basis. The most important thing is to set up a notation that minimizes the number of arbitrary choices and is self-consistent.
Let me try to explain the motivation behind the most popular notation and then reconsider your example. Fix a vector space $V$ and let $\mathcal{B} = (v_1, \dots, v_n)$ be some basis of $V$. The basis $\mathcal{B}$ allows us to identify a vector $v \in V$ with a list of scalars by representing $v$ (uniquely) as $v = a_1 v_1 + \dots + a_n v_n$ and identifying $v$ with the list $(a_1,\dots,a_n)$, whose entries are called the coordinates of $v$ with respect to $\mathcal{B}$. The convention is that we treat this list as a column vector and write
$$ [v]_{\mathcal{B}} := \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. $$
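If it helps to see this concretely, here is a minimal numpy sketch of extracting coordinates in $\mathbb{R}^2$; the basis is the $\mathcal{B} = ((1,2),(3,5))$ from the example below, and the sample vector $v$ is a hypothetical value chosen for illustration:

```python
import numpy as np

# Columns of Bmat are the basis vectors; the coordinates [v]_B are the
# unique solution a of  Bmat @ a = v.
Bmat = np.column_stack([(1.0, 2.0), (3.0, 5.0)])
v = np.array([7.0, 12.0])          # a vector in standard coordinates

a = np.linalg.solve(Bmat, v)       # [v]_B = (1, 2)
assert np.allclose(Bmat @ a, v)    # i.e. v = a_1 v_1 + a_2 v_2
```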
Given a linear map $T \colon V \rightarrow W$, a basis $\mathcal{B} = (v_1,\dots,v_n)$ of $V$ and a basis $\mathcal{C} = (w_1,\dots,w_m)$ of $W$, we can represent each vector $T(v_j)$ as a linear combination $T(v_j) = \sum_{i=1}^m a_{ij} w_i$. The convention is that we treat the array $A = (a_{ij})$ as a double array (which we call a matrix) for which $i$ is the row index and $j$ is the column index. The matrix $A \in M_{m \times n}(\mathbb{F})$ is denoted by $A = [T]^{\mathcal{B}}_{\mathcal{C}}$ and is called the matrix representing $T$ with respect to the basis $\mathcal{B}$ (of the domain) and the basis $\mathcal{C}$ (of the codomain). This convention has the slightly annoying feature (especially for beginners) that a linear map from an $n$-dimensional space to an $m$-dimensional space is represented by an $m \times n$ matrix (so the dimensions are "reversed"), but its most important advantage is that it identifies matrix multiplication with composition/evaluation. Namely, we have the following formulas:
$$ [T(v)]_{\mathcal{C}} = [T]^{\mathcal{B}}_{\mathcal{C}} \cdot [v]_{\mathcal{B}}, \,\,\, [T \circ S]^{\mathcal{B}}_{\mathcal{D}} = [T]^{\mathcal{C}}_{\mathcal{D}} \cdot [S]_{\mathcal{C}}^{\mathcal{B}} $$
where $\cdot$ is matrix multiplication.
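To see the evaluation formula in action, here is a sketch, assuming as hypothetical illustration data a rotation map $T$, the basis $\mathcal{B} = ((1,2),(3,5))$ for the domain, and the standard basis $\mathcal{C}$ for the codomain:

```python
import numpy as np

T = lambda x: np.array([[0.0, -1.0], [1.0, 0.0]]) @ x   # rotation by 90 degrees
Bmat = np.column_stack([(1.0, 2.0), (3.0, 5.0)])        # basis B of the domain
Cmat = np.eye(2)                                        # basis C of the codomain

# Column j of A = [T]^B_C is [T(v_j)]_C, i.e. the solution of Cmat @ col = T(v_j).
A = np.column_stack([np.linalg.solve(Cmat, T(Bmat[:, j])) for j in range(2)])

# Check the evaluation formula [T(v)]_C = [T]^B_C [v]_B on a sample vector.
v = np.array([7.0, 12.0])
assert np.allclose(A @ np.linalg.solve(Bmat, v),        # [T]^B_C [v]_B
                   np.linalg.solve(Cmat, T(v)))         # [T(v)]_C
```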
Next, let us discuss change of basis matrices. If $\mathcal{B} = (u_1,\dots,u_n)$ and $\mathcal{B}' = (v_1,\dots,v_n)$ are two bases of $V$, the change of basis matrix "from $\mathcal{B}'$ to $\mathcal{B}$" is the matrix $P = [\operatorname{id}]_{\mathcal{B}}^{\mathcal{B}'}$ where $\operatorname{id} \colon V \rightarrow V$ is the identity transformation. Using the properties above, we see that we have
$$ P[v]_{\mathcal{B}'} = [\operatorname{id}]_{\mathcal{B}}^{\mathcal{B'}} [v]_{\mathcal{B}'} = [v]_{\mathcal{B}}. $$
Thus, given the coordinates of a vector $v \in V$ in the "new basis" $\mathcal{B}'$, the matrix $P$ allows us to compute the coordinates of $v$ in the "old basis" $\mathcal{B}$ by performing matrix multiplication. The decision of which basis to call "the old" one and which "the new" is not entirely standard and depends on whether you prefer to change basis vectors or coordinates. In physics, this is related to the "passive vs. active" point of view of linear transformations.
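Numerically: if both bases are stored (in standard coordinates of $\mathbb{R}^n$) as the columns of matrices `Bmat` and `Bpmat`, then column $j$ of $P$ is $[v_j]_{\mathcal{B}}$, so $P$ is the inverse of `Bmat` times `Bpmat`. A minimal sketch, with randomly generated bases as hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
Bmat = rng.standard_normal((3, 3))   # "old" basis B (almost surely invertible)
Bpmat = rng.standard_normal((3, 3))  # "new" basis B'

P = np.linalg.solve(Bmat, Bpmat)     # change of basis matrix from B' to B

v = rng.standard_normal(3)           # a vector in standard coordinates
b = np.linalg.solve(Bpmat, v)        # [v]_{B'}
a = np.linalg.solve(Bmat, v)         # [v]_B
assert np.allclose(P @ b, a)         # P turns new coordinates into old ones
```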
Finally, let me reconsider your example. We have $V = \mathbb{R}^2$, $\mathcal{B} = (u_1 = (1,2),u_2 = (3,5))$ and $\mathcal{B}' = (v_1 = (1,-1), v_2 = (1,-2))$. When representing elements in the basis $\mathcal{B}$ I'll use the letter $a$ for the coefficients, and when representing elements in the basis $\mathcal{B}'$ I'll use the letter $b$. That is,
$$ v = a_1 u_1 + a_2 u_2 = b_1 v_1 + b_2 v_2. $$
The matrix $P$ has the feature that
$$ P \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} $$
and so it tells you how to transform the coordinates of an arbitrary vector in the basis $\mathcal{B}'$ to its coordinates in the basis $\mathcal{B}$. For example, if
$$ v = 1 \cdot v_1 + 1 \cdot v_2 = 1 \cdot (-8u_1 + 3u_2) + 1 \cdot (-11u_1 + 4u_2) = -19 u_1 + 7 u_2 $$
we have
$$ [v]_{\mathcal{B}} = \begin{pmatrix} -19 \\ 7 \end{pmatrix}, \,\,\, [v]_{\mathcal{B}'} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$
and indeed
$$ P [v]_{\mathcal{B}'} = \begin{pmatrix} -8 & -11 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -19 \\ 7 \end{pmatrix} = [v]_{\mathcal{B}}. $$
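For completeness, the same computation can be checked numerically (a small sketch, again assuming numpy):

```python
import numpy as np

u1, u2 = np.array([1.0, 2.0]), np.array([3.0, 5.0])    # basis B
v1, v2 = np.array([1.0, -1.0]), np.array([1.0, -2.0])  # basis B'

# Columns of P are [v_1]_B and [v_2]_B.
P = np.linalg.solve(np.column_stack([u1, u2]), np.column_stack([v1, v2]))
assert np.allclose(P, [[-8.0, -11.0], [3.0, 4.0]])

v = v1 + v2                                            # so [v]_{B'} = (1, 1)
assert np.allclose(P @ [1.0, 1.0], [-19.0, 7.0])       # [v]_B
assert np.allclose(v, -19.0 * u1 + 7.0 * u2)           # the same vector v
```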
When a basis (or coordinate) transformation is in action, we are not so much interested in how the new basis vectors are defined (that is dealt with once, at the start); rather, we need to transform the coordinates of vectors a thousand times. Therefore things are set up in such a way that the formulas for the transformation of coordinates are as simple as possible.
A second point: by the definition of matrix multiplication, matrices operate on (that is, are multiplied with) tuples of numbers, not tuples of vectors. In the setup you propose, however, it is the latter that occurs.