Why does expressing a matrix $B$ in an eigenbasis of a commuting matrix $A$ give a block diagonal matrix?

Suppose that $A$ and $B$ are matrices that commute. Let $\lambda$ be an eigenvalue for $A$, and let $E_{\lambda}$ be the eigenspace of $A$ corresponding to $\lambda$. Let $\mathbf{v}_1,\ldots,\mathbf{v}_k$ be a basis for $E_{\lambda}$.

I claim that $B$ maps $E_{\lambda}$ to itself; in particular, $B\mathbf{v}_i$ can be expressed as a linear combination of $\mathbf{v}_1,\ldots,\mathbf{v}_k$, for $i=1,\ldots,k$.

To show that $B$ maps $E_{\lambda}$ to itself, it is enough to show that $B\mathbf{v}_i$ lies in $E_{\lambda}$; that is, that if we apply $A$ to $B\mathbf{v}_i$, the result will be $\lambda(B\mathbf{v}_i)$. This is where the fact that $A$ and $B$ commute comes in. We have: $$A\Bigl(B\mathbf{v}_i\Bigr) = (AB)\mathbf{v}_i = (BA)\mathbf{v}_i = B\Bigl(A\mathbf{v}_i\Bigr) = B(\lambda\mathbf{v}_i) = \lambda(B\mathbf{v}_i).$$ Therefore, $B\mathbf{v}_i\in E_{\lambda}$, as claimed.
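For concreteness, here is a minimal numerical check of this computation (the matrices, eigenvalue, and eigenvector below are made up for illustration; $B$ is taken to be a polynomial in $A$, which guarantees that the two commute):

```python
import numpy as np

# Minimal numerical check of the computation above (all matrices and
# vectors are made up for illustration).  B is chosen as a polynomial
# in A, which guarantees that A and B commute.
A = np.diag([2.0, 2.0, 5.0])
B = 3 * A @ A - A + 4 * np.eye(3)     # commutes with A by construction

lam = 2.0
v = np.array([1.0, -1.0, 0.0])        # an eigenvector of A for lambda = 2

# A(Bv) = lambda (Bv), i.e. Bv stays inside the eigenspace E_lambda.
assert np.allclose(A @ (B @ v), lam * (B @ v))
```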

So, now take the basis $\mathbf{v}_1,\ldots,\mathbf{v}_k$, and extend it to a basis for $\mathbf{V}$, $\beta=[\mathbf{v}_1,\ldots,\mathbf{v}_k,\mathbf{v}_{k+1},\ldots,\mathbf{v}_n]$. To find the coordinate matrix of $B$ relative to $\beta$, we compute $B\mathbf{v}_i$ for each $i$, write $B\mathbf{v}_i$ as a linear combination of the vectors in $\beta$, and then place the corresponding coefficients in the $i$th column of the matrix.

When we compute $B\mathbf{v}_1,\ldots,B\mathbf{v}_k$, each of these will lie in $E_{\lambda}$. Therefore, each of these can be expressed as a linear combination of $\mathbf{v}_1,\ldots,\mathbf{v}_k$ (since they form a basis for $E_{\lambda}$). So, to express them as linear combinations of $\beta$, we just add $0$s; we will have: $$\begin{align*} B\mathbf{v}_1 &= b_{11}\mathbf{v}_1 + b_{21}\mathbf{v}_2+\cdots+b_{k1}\mathbf{v}_k + 0\mathbf{v}_{k+1}+\cdots + 0\mathbf{v}_n\\ B\mathbf{v}_2 &= b_{12}\mathbf{v}_1 + b_{22}\mathbf{v}_2 + \cdots +b_{k2}\mathbf{v}_k + 0\mathbf{v}_{k+1}+\cdots + 0\mathbf{v}_n\\ &\vdots\\ B\mathbf{v}_k &= b_{1k}\mathbf{v}_1 + b_{2k}\mathbf{v}_2 + \cdots + b_{kk}\mathbf{v}_k + 0\mathbf{v}_{k+1}+\cdots + 0\mathbf{v}_n \end{align*}$$ where $b_{ij}$ are some scalars (some possibly equal to $0$). So the matrix of $B$ relative to $\beta$ would start off something like: $$\left(\begin{array}{ccccccc} b_{11} & b_{12} & \cdots & b_{1k} & * & \cdots & *\\ b_{21} & b_{22} & \cdots & b_{2k} & * & \cdots & *\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ b_{k1} & b_{k2} & \cdots & b_{kk} & * & \cdots & *\\ 0 & 0 & \cdots & 0 & * & \cdots & *\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & * & \cdots & * \end{array}\right).$$

So, now suppose that you have a basis for $\mathbf{V}$ that consists entirely of eigenvectors of $A$; let $\beta=[\mathbf{v}_1,\ldots,\mathbf{v}_n]$ be this basis, with $\mathbf{v}_1,\ldots,\mathbf{v}_{m_1}$ corresponding to $\lambda_1$ (with $m_1$ the algebraic multiplicity of $\lambda_1$, which equals the geometric multiplicity of $\lambda_1$); $\mathbf{v}_{m_1+1},\ldots,\mathbf{v}_{m_1+m_2}$ the eigenvectors corresponding to $\lambda_2$, and so on until we get to $\mathbf{v}_{m_1+\cdots+m_{k-1}+1},\ldots,\mathbf{v}_{m_1+\cdots+m_k}$ corresponding to $\lambda_k$. Note that $\mathbf{v}_{1},\ldots,\mathbf{v}_{m_1}$ are a basis for $E_{\lambda_1}$; that $\mathbf{v}_{m_1+1},\ldots,\mathbf{v}_{m_1+m_2}$ are a basis for $E_{\lambda_2}$, etc.

By what we just saw, each of $B\mathbf{v}_1,\ldots,B\mathbf{v}_{m_1}$ lies in $E_{\lambda_1}$, and so when we express it as a linear combination of vectors in $\beta$, the only vectors with nonzero coefficients are $\mathbf{v}_1,\ldots,\mathbf{v}_{m_1}$, because they are a basis for $E_{\lambda_1}$. So in the first $m_1$ columns of $[B]_{\beta}^{\beta}$ (the coordinate matrix of $B$ relative to $\beta$), the only nonzero entries occur in the first $m_1$ rows.

Likewise, each of $B\mathbf{v}_{m_1+1},\ldots,B\mathbf{v}_{m_1+m_2}$ lies in $E_{\lambda_2}$, so when we express them as linear combinations of $\beta$, the only coefficients that can be nonzero are those of $\mathbf{v}_{m_1+1},\ldots,\mathbf{v}_{m_1+m_2}$. So the $(m_1+1)$st through $(m_1+m_2)$th columns of $[B]_{\beta}^{\beta}$ can only have nonzero entries in the $(m_1+1)$st through $(m_1+m_2)$th rows. And so on.

That means that $[B]_{\beta}^{\beta}$ is in fact block-diagonal, with the blocks corresponding to the eigenspaces $E_{\lambda_i}$ of $A$, exactly as described.
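As a sanity check, here is a small numerical sketch of this conclusion (all matrices below are made up; $A$ is kept symmetric only so that `np.linalg.eigh` can be used to produce an eigenbasis with the eigenvalues grouped together). Starting from a commuting pair whose structure has been hidden by a change of coordinates, expressing $B$ in an eigenbasis of $A$ recovers the block diagonal form:

```python
import numpy as np

# Numerical illustration with made-up matrices.  A0 has eigenvalue 2 with
# multiplicity 2 and eigenvalue 5 with multiplicity 1; B0 is block diagonal,
# so it commutes with A0.  Conjugating both by the same orthogonal S hides
# the block structure but preserves commutativity (and keeps A symmetric,
# so np.linalg.eigh can be used below).
A0 = np.diag([2.0, 2.0, 5.0])
B0 = np.array([[1.0, 2.0, 0.0],
               [3.0, 4.0, 0.0],
               [0.0, 0.0, 7.0]])
S, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))
A, B = S @ A0 @ S.T, S @ B0 @ S.T
assert np.allclose(A @ B, B @ A)

# Diagonalize A; eigh returns eigenvalues in ascending order, so the first
# two columns of P are a basis of the eigenspace for lambda = 2 and the
# last column spans the eigenspace for lambda = 5.
_, P = np.linalg.eigh(A)

# In this eigenbasis of A, the matrix of B is block diagonal: a 2x2 block,
# a 1x1 block, and zeros elsewhere (up to round-off).
print(np.round(P.T @ B @ P, 6))
```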


I will write $k_i=m_g(\lambda_i)$, the geometric multiplicity of $\lambda_i$.

You are looking for the general form of a matrix $B$ that commutes with $$A= \begin{pmatrix} \lambda_1 I_{k_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{k_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m I_{k_m} \end{pmatrix}.$$

If you put $B$ in the same block structure, you have

$$B= \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1m} \\ B_{21} & B_{22} & \cdots & B_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ B_{m1} & B_{m2} & \cdots & B_{mm} \end{pmatrix},$$ where $B_{ij}$ is a $k_i$-by-$k_j$ matrix.

Then $$AB= \begin{pmatrix} \lambda_1 B_{11} & \lambda_1 B_{12} & \cdots & \lambda_1 B_{1m} \\ \lambda_2 B_{21} & \lambda_2 B_{22} & \cdots & \lambda_2 B_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_m B_{m1} & \lambda_m B_{m2} & \cdots & \lambda_m B_{mm} \end{pmatrix},$$ while $$BA= \begin{pmatrix} \lambda_1 B_{11} & \lambda_2 B_{12} & \cdots & \lambda_m B_{1m} \\ \lambda_1 B_{21} & \lambda_2 B_{22} & \cdots & \lambda_m B_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1 B_{m1} & \lambda_2 B_{m2} & \cdots & \lambda_m B_{mm} \end{pmatrix}.$$

Comparing the off-diagonal blocks, $AB=BA$ forces $\lambda_i B_{ij}=\lambda_j B_{ij}$, i.e. $(\lambda_i-\lambda_j)B_{ij}=0$, for all $i\neq j$; since $\lambda_i\neq\lambda_j$ when $i\neq j$, this gives $B_{ij}=0$. So if $BA=AB$, then $B$ must have the desired block diagonal form.
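Here is a small numerical sketch of that block computation (the sizes $k_1=2$, $k_2=1$ and the eigenvalues are made up): for an arbitrary $B$, the $(i,j)$ block of $AB-BA$ is $(\lambda_i-\lambda_j)B_{ij}$, so the commutator vanishes exactly when the off-diagonal blocks of $B$ do.

```python
import numpy as np

# Sketch of the block computation above, with made-up sizes k_1 = 2, k_2 = 1
# and eigenvalues lambda_1 = 2, lambda_2 = 5.  For an arbitrary B, the (i, j)
# block of AB - BA equals (lambda_i - lambda_j) * B_ij, so AB = BA forces the
# off-diagonal blocks of B to vanish.
lam1, lam2 = 2.0, 5.0
A = np.diag([lam1, lam1, lam2])
B = np.random.default_rng(1).normal(size=(3, 3))

C = A @ B - B @ A
# Diagonal blocks of the commutator are zero ...
assert np.allclose(C[:2, :2], 0) and np.isclose(C[2, 2], 0)
# ... while the off-diagonal blocks are (lam1 - lam2) B_12 and (lam2 - lam1) B_21.
assert np.allclose(C[:2, 2:], (lam1 - lam2) * B[:2, 2:])
assert np.allclose(C[2:, :2], (lam2 - lam1) * B[2:, :2])
```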


Getting a block diagonal matrix expressing$~B$ just means that each eigenspace for$~A$ (whose direct sum fills the entire space, since $A$ is assumed diagonalisable) is $B$-stable. A useful fact that applies here is that when two linear operators commute, every subspace that is the kernel or the image of a polynomial in one of the operators is automatically stable under the other operator. A polynomial in the first operator is just another operator$~\psi$ that commutes with the second operator$~\phi$, so it suffices to show that the kernel and the image of $\psi$ are $\phi$-stable when $\psi$ and $\phi$ commute:

  • Kernel: if $v\in\ker\psi$ then $\psi(\phi(v))=\phi(\psi(v))=\phi(0)=0$, so indeed $\phi(v)\in\ker\psi$.
  • Image: if $v=\psi(w)$ then $\phi(v)=\phi(\psi(w))=\psi(\phi(w))$, which indeed is in the image of $\psi$.

The eigenspace of$~A$ for $\lambda$ is of course just the special case where the subspace is the kernel of the polynomial $A-\lambda I$ in$~A$.
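A quick numerical spot-check of the two bullet points, using a made-up commuting pair with $\psi=A-2I$ (so $\ker\psi$ is the eigenspace of $A$ for $\lambda=2$) and $\phi=B$:

```python
import numpy as np

# Numerical spot-check of the two bullets, with a made-up commuting pair.
# psi = A - 2I is a polynomial in A, so its kernel (the eigenspace of A
# for lambda = 2) and its image must both be B-stable.
A = np.diag([2.0, 2.0, 5.0])
B = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 7.0]])          # block diagonal, hence commutes with A
psi = A - 2.0 * np.eye(3)

# Kernel: v in ker(psi)  =>  B v in ker(psi).
v = np.array([1.0, -2.0, 0.0])
assert np.allclose(psi @ v, 0) and np.allclose(psi @ (B @ v), 0)

# Image: B(psi w) is again in the image of psi, i.e. the system
# psi x = B (psi w) is solvable.
w = np.array([1.0, 1.0, 1.0])
x, *_ = np.linalg.lstsq(psi, B @ (psi @ w), rcond=None)
assert np.allclose(psi @ x, B @ (psi @ w))
```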