Formulas for the (top) coefficients of the characteristic polynomial of a matrix

The first guess is correct and follows directly from the permutation expansion of the determinant. The second guess is also correct. The generalization is

$$\text{tr}^{(n)}(A) = \text{tr } \Lambda^n(A) = \frac{1}{n!} \sum_{\pi \in S_n} \text{sgn}(\pi) \text{tr}_{\pi}(A)$$

where $\Lambda^n(A)$ is the action of $A$ on the exterior power $\Lambda^n(V)$, $\text{tr}_n(A) = \text{tr}(A^n)$, and, for a permutation $\pi$ with cycle type $(\lambda_1, ... \lambda_k)$, we define $\text{tr}_{\pi}(A) = \text{tr}_{\lambda_1}(A) ... \text{tr}_{\lambda_k}(A)$.

For example, $S_3$ has one element with cycle type $(1, 1, 1)$, three elements with cycle type $(1, 2)$, and two elements with cycle type $(3)$. It follows that

$$\text{tr}^{(3)}(A) = \frac{1}{6} \left( \text{tr}(A)^3 - 3 \text{tr}(A^2) \text{tr}(A) + 2 \text{tr}(A^3) \right).$$

This identity is really an identity of symmetric functions expressing the elementary symmetric functions in terms of the power sum symmetric functions. It can be deduced from Newton's identities or the exponential formula. You can find a more thorough exposition in Stanley's Enumerative Combinatorics, Vol. II, Chapter 7.
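For readers who want to see the identity in action, here is a minimal numerical check (a sketch assuming numpy; the random test matrix is just an illustration, not part of the original argument). It compares the right-hand side of the $n=3$ formula above with the corresponding coefficient of the characteristic polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

tr = np.trace
# Right-hand side of the S_3 formula above.
rhs = (tr(A)**3 - 3 * tr(A @ A) * tr(A) + 2 * tr(A @ A @ A)) / 6

# tr^(3)(A) is, up to sign, the coefficient of x^(n-3) in det(xI - A):
# det(xI - A) = x^n - tr^(1)(A) x^(n-1) + tr^(2)(A) x^(n-2) - tr^(3)(A) x^(n-3) + ...
coeffs = np.poly(A)              # characteristic polynomial coefficients, highest degree first
lhs = -coeffs[3]                 # coefficient of x^(n-3) is -tr^(3)(A)
print(np.isclose(lhs, rhs))      # True, up to floating-point error
```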


Some exposition is in order regarding what this has to do with exterior powers. Suppose that $A$ is diagonalizable with eigenvalues $\lambda_1, ... \lambda_n$ and eigenvectors $e_1, ... e_n$. Then of course $\text{tr}(A) = \sum \lambda_i$. More generally $\text{tr}^{(k)}(A) = \sum_{i_1 < ... < i_k} \lambda_{i_1} ... \lambda_{i_k}$ is the $k^{th}$ elementary symmetric polynomial in the eigenvalues.

Now, if $A$ acts on a vector space $V$, then the exterior power $\Lambda^k(V)$ inherits an action of $A$ by functoriality. In the basis of wedges of the eigenvectors $e_i$, this action is particularly easy to describe: it is given by

$$\Lambda^k(A)(e_{i_1} \wedge ... \wedge e_{i_k}) = \lambda_{i_1} ... \lambda_{i_k} e_{i_1} \wedge ... \wedge e_{i_k}.$$

Hence the wedges of all possible $k$-element sets of distinct eigenvectors of $A$ form an eigenbasis for $\Lambda^k(A)$ acting on $\Lambda^k(V)$. It follows that $\text{tr } \Lambda^k(A) = \text{tr}^{(k)}(A)$ for all diagonalizable $A$, hence for all $A$ by continuity. Of course you can ignore all of this if you want to, but I thought I would mention it because it is a precise sense in which the other coefficients of the characteristic polynomial encode "higher traces."
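A quick way to see this numerically (a sketch assuming numpy; the choice of $n$ and $k$ is arbitrary): compute the $k$th elementary symmetric polynomial of the eigenvalues directly and compare it with the coefficient of $x^{n-k}$ in the characteristic polynomial.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n))
eigs = np.linalg.eigvals(A)

# tr^(k)(A): the k-th elementary symmetric polynomial in the eigenvalues.
e_k = sum(np.prod(eigs[list(idx)]) for idx in combinations(range(n), k))

# The coefficient of x^(n-k) in det(xI - A) is (-1)^k tr^(k)(A).
coeffs = np.poly(A)
print(np.isclose(e_k, (-1) ** k * coeffs[k]))  # True
```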


I thought I might add a bit on the background of what we are talking about.

Consider a vector space $V$ with a linear operator $A$, and let's choose a basis $e_i$. Let $A\otimes A$ be the linear operator on $V\otimes V$ given by $A\otimes A (e_i\otimes e_j)=Ae_i\otimes Ae_j$. It is easy to see that $tr(A\otimes A)=tr(A)^2$. Let $\theta$ be the automorphism of $V\otimes V$ given by $\theta(e_i\otimes e_j)=e_j\otimes e_i$. We can define $Sym^2(V)$, the symmetric square, to be the set of all elements of $V\otimes V$ which are annihilated by $\theta -Id$, and $Alt^2(V)$, the alternating square, to be the set of all elements that are annihilated by $\theta + Id$. Then $V\otimes V$ decomposes as $Sym^2(V)\oplus Alt^2(V)$, and what you are looking for is the trace of $A\otimes A$ restricted to the alternating square. (Note: restriction to an invariant subspace gives a linear operator, so we can take its trace.)

Why is this trace the sum of $\lambda_i \lambda_j$? First, the set of all $e_i\otimes e_j+e_j\otimes e_i$ is a basis for $Sym^2(V)$ and the set of all $e_i\otimes e_j - e_j\otimes e_i$ is a basis for $Alt^2(V)$. To see this, write a vector in either subspace in the basis for $V\otimes V$ and use the fact that it is annihilated by either $\theta +Id$ or $\theta -Id$. Applying these operators gives an equation that allows us to write the original vector in one of the bases I mentioned above.
Now let's do the case where $A$ is diagonalizable, and the $e_i$ above form an eigenbasis. Then $$A\otimes A (e_i\otimes e_j - e_j\otimes e_i)= \lambda_i\lambda_j(e_i\otimes e_j - e_j\otimes e_i).$$ Hence the $\lambda_i\lambda_j$ are exactly the eigenvalues of $A\otimes A$ when restricted to $Alt^2(V)$.
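Here is a small numerical illustration of this (my own sketch, assuming numpy, and not part of the answer). The trace of $A\otimes A$ restricted to $Alt^2(V)$ can be computed as $tr\big((A\otimes A)P\big)$, where $P=(Id-\theta)/2$ is the projection onto $Alt^2(V)$ along $Sym^2(V)$, and it agrees with $\sum_{i<j}\lambda_i\lambda_j = \big(tr(A)^2 - tr(A^2)\big)/2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))

# The swap automorphism theta: e_i (x) e_j  |->  e_j (x) e_i,
# in the basis ordering used by np.kron (index i*n + j for e_i (x) e_j).
theta = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        theta[j * n + i, i * n + j] = 1.0

P_alt = (np.eye(n * n) - theta) / 2    # projection onto Alt^2(V) along Sym^2(V)
T = np.kron(A, A)                      # the operator A (x) A on V (x) V

lhs = np.trace(T @ P_alt)              # trace of A (x) A restricted to Alt^2(V)
rhs = (np.trace(A) ** 2 - np.trace(A @ A)) / 2
print(np.isclose(lhs, rhs))            # True
```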

In more generality, the sum of the products of $k$ distinct eigenvalues will be the trace of the $k$-fold tensor power $A\otimes \cdots \otimes A$ restricted to the intersection of all subspaces annihilated by $\theta_\pi +Id$, where $\theta_\pi$ is the automorphism permuting the tensor factors according to an odd permutation $\pi$. This subspace is sometimes called the exterior power.

Now, how do we actually calculate a formula for these Traces based on the original matrix $A$? That is supplied in Qiaochu's Answer.


We want a "simple" formula for the coefficients of the characteristic polynomial in terms of the entries of the matrix, at least for the top few coefficients.

The characteristic polynomial can be written in terms of the eigenvalues:

$$ \chi(A) = (x-\lambda_1)(x-\lambda_2)\cdots(x-\lambda_n) $$ $$= x^n - (\lambda_1 + \dots + \lambda_n) x^{n-1} + (\lambda_1 \lambda_2 + \lambda_1 \lambda_3 + \dots + \lambda_{n-1} \lambda_n) x^{n-2} - \dots + (-1)^n \lambda_1 \lambda_2 \cdots \lambda_n $$

Each coefficient $\operatorname{tr}^{(k)}(A)$ is just ($\pm$) the sum of the products of all $k$-sets of the eigenvalues $\lambda_i$. We'd like to use the regular trace to write this sum down, and so we try to find some crazy matrix $\Lambda^{(k)}(A)$ whose eigenvalues are exactly those products!

For $k=1$ this is easy: just use $A$ itself. For $k=0$, this is degenerate: just use $[1]$. For $k=n$, there is a sneaky trick: use $[\det(A)]$. That is basically cheating, but it turns out to be part of a larger pattern.

To define $B = \Lambda^{(k)}(A)$ we consider all $k$-subsets $K$ of $\{1,2,\ldots,n\}$ and use those to index the rows and columns of $B$. For two such $k$-subsets $K,L$ define the $(K,L)$th entry of $B$ to be the determinant of the submatrix of $A$ formed by taking only the rows from $K$ and the columns from $L$. In other words, $B$ is the matrix formed of the $k$-minors of $A$.

For instance if $K=\{i,j\}$ and $L=\{i,j\}$, then the $(K,L)$th entry is $A_{ii}A_{jj}-A_{ij}A_{ji}$. The trace of this matrix is just the sum of the $(K,K)$th entries as $K$ varies over all $k$-subsets of $\{1,2,\ldots,n\}$. For $k=2$, this is exactly the first formula given:

$$\operatorname{tr}^{(2)}(A) = \operatorname{tr}(B) = \sum_{1 \leq i < j \leq n} \left( A_{ii}A_{jj}-A_{ij}A_{ji} \right)$$

Now for $k=3$, the formula is a little bit awful, but primarily because the formula for a $3\times 3$ determinant is a little awful.

$$\operatorname{tr}^{(3)}(A) = \operatorname{tr}(B) = \sum_{1\leq i<j<k \leq n} \left|\begin{array}{rrr}A_{ii} & A_{ij} & A_{ik} \\ A_{ji} & A_{jj} & A_{jk} \\ A_{ki} & A_{kj} & A_{kk} \end{array}\right|$$ $$= \sum_{1 \leq i < j < k \leq n} A_{ii}A_{jj}A_{kk}+A_{ij}A_{jk}A_{ki}+A_{ik}A_{ji}A_{kj}-A_{ik}A_{jj}A_{ki}-A_{ij}A_{ji}A_{kk}-A_{ii}A_{jk}A_{kj}$$
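Here is a short sketch of that construction (assuming numpy and itertools; the helper name `exterior_power` is mine). It builds $B=\Lambda^{(k)}(A)$ from the $k$-minors and checks that $\operatorname{tr}(B)$ matches the corresponding coefficient of the characteristic polynomial.

```python
import numpy as np
from itertools import combinations

def exterior_power(A, k):
    """Matrix of k-minors of A, rows and columns indexed by k-subsets of {0, ..., n-1}."""
    n = A.shape[0]
    subsets = list(combinations(range(n), k))
    B = np.zeros((len(subsets), len(subsets)))
    for r, K in enumerate(subsets):
        for c, L in enumerate(subsets):
            B[r, c] = np.linalg.det(A[np.ix_(K, L)])   # rows from K, columns from L
    return B

rng = np.random.default_rng(3)
n, k = 5, 3
A = rng.standard_normal((n, n))
B = exterior_power(A, k)

# tr(B) should equal tr^(k)(A), i.e. (-1)^k times the coefficient of x^(n-k) in det(xI - A).
coeffs = np.poly(A)
print(np.isclose(np.trace(B), (-1) ** k * coeffs[k]))  # True
```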


Now, to relate this to Eric Naslund's answer:

The Kronecker product of two matrices, $M\otimes N$, is another related construction. Its rows are indexed by pairs $(i_1,i_2)$ of row indices ($i_1$ from $M$ and $i_2$ from $N$), its columns are indexed by pairs $(j_1,j_2)$ of column indices ($j_1$ from $M$ and $j_2$ from $N$), and the $((i_1,i_2),(j_1,j_2))$th entry is just the product of the $(i_1,j_1)$st entry of the first matrix with the $(i_2,j_2)$nd entry of the second matrix. In other words, one gets a block matrix in which the first matrix decides a common multiplier for the $(i_1,j_1)$st block, and the second matrix simply provides that block.

For instance

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \otimes \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \\ 11 & 12 & 13 \end{pmatrix} = \left(\begin{array}{rrr|rrr}0&0&0&5&6&7\\0&0&0&8&9&10\\0&0&0&11&12&13\\\hline -5 & -6 & -7 & 0&0&0 \\ -8 & -9 & -10 & 0&0&0 \\ -11&-12&-13&0&0&0 \end{array}\right) $$
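For what it is worth, `np.kron` computes exactly this product (a tiny sketch using the two matrices above):

```python
import numpy as np

M = np.array([[0, 1], [-1, 0]])
N = np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]])

# Each entry of M scales a full copy of N, giving the block matrix displayed above.
print(np.kron(M, N))
```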

The Kronecker product has the nice feature that its eigenvalues are the products of eigenvalues, one from the first matrix and one from the second. In particular, for $A\otimes A$, one gets all products of eigenvalues, even $\lambda_1\lambda_1$, and both $\lambda_1\lambda_2$ and $\lambda_2\lambda_1$.
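This, too, is easy to check numerically (a sketch assuming numpy; comparing sorted eigenvalue lists is only a heuristic check, since floating-point ties could in principle reorder them):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
eigs = np.linalg.eigvals(A)

# Eigenvalues of A (x) A: all n^2 products lambda_i * lambda_j, repeats and both orders included.
kron_eigs = np.sort_complex(np.linalg.eigvals(np.kron(A, A)))
products = np.sort_complex(np.array([li * lj for li in eigs for lj in eigs]))
print(np.allclose(kron_eigs, products))  # True
```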

The Kronecker product of the two matrices acts on $V\otimes W$, the vector space whose basis consists of pairs of basis vectors, one from $V$ and one from $W$, and where $M$ acts on $V$ and $N$ acts on $W$.

This vector space $V\otimes V$ is too large: it gives not only the products of 2-sets of eigenvalues, but also any ordered 2-tuple, including $(1,1)$ and both $(1,2)$ and $(2,1)$. However, there is a nice subspace called the exterior power, which is the intersection of the kernels of $\theta_\pi + Id$ for all those coordinate switchers $\theta_\pi$ mentioned by Eric Naslund. The matrix of $A$ acting on the $k$th exterior power $\Lambda^{(k)}(V)$ is just the matrix of $k$-minors, $\Lambda^{(k)}(A)$.

The eigenvalues divide up nicely in the second tensor power: the subspace spanned by the vectors $x\otimes x$ is called the symmetric power $Sym^2(V)$, and it gets one copy of each of the products $\lambda_i\lambda_j$ for $i \leq j$, including all of the repeats $\lambda_i\lambda_i$, so it has dimension $n(n+1)/2$. The space spanned by all $x\otimes y - y\otimes x$ is called the exterior power $Alt^2(V)$, and it gets the leftover eigenvalues, one copy each of the $\lambda_i\lambda_j$ for $i < j$, so it has dimension $n(n-1)/2$.
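The dimension count can be verified with the same swap operator $\theta$ as in Eric Naslund's answer: the projections $(Id\pm\theta)/2$ onto $Sym^2(V)$ and $Alt^2(V)$ have ranks $n(n+1)/2$ and $n(n-1)/2$ (a sketch assuming numpy).

```python
import numpy as np

n = 4
theta = np.zeros((n * n, n * n))       # swap: e_i (x) e_j  |->  e_j (x) e_i
for i in range(n):
    for j in range(n):
        theta[j * n + i, i * n + j] = 1.0

sym_dim = np.linalg.matrix_rank((np.eye(n * n) + theta) / 2)  # dim Sym^2(V)
alt_dim = np.linalg.matrix_rank((np.eye(n * n) - theta) / 2)  # dim Alt^2(V)
print(sym_dim, alt_dim)  # 10 6, i.e. n(n+1)/2 and n(n-1)/2 for n = 4
```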

For higher tensor powers the eigenvalues still have to be divided up, and one still has spaces $Sym^k(V)$ and $Alt^k(V)=\Lambda^{(k)}(V)$, and these get their corresponding eigenvalues: all products of non-decreasing $k$-tuples of eigenvalues $\lambda_i\cdots\lambda_j$ with $1\leq i\leq \dots \leq j\leq n$ for $Sym^k(V)$, and all products of increasing $k$-tuples (or just plain $k$-sets) of eigenvalues for $Alt^k(V)$. Weirdly though, that doesn't add up to the total dimension, and there are still other subspaces left! These are called Schur functors and apparently don't have too direct a relation to this question, but you can see they have to exist by looking at the third tensor power of a three-dimensional space (a vector space of dimension $3\times3\times3=27$): $Sym^3$ has $(1,1,1)$, $(1,1,2)$, $(1,1,3)$, $(1,2,2)$, $(1,2,3)$, $(1,3,3)$, $(2,2,2)$, $(2,2,3)$, $(2,3,3)$, $(3,3,3)$ for a total of 10. $Alt^3$ has only $(1,2,3)$, for a total of 1. Where did the other 16 dimensions go? For instance, who got $\lambda_3\lambda_2\lambda_1$?
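The bookkeeping in that last example is easy to reproduce (a sketch using only the Python standard library): count the non-decreasing $3$-tuples for $Sym^3$ and the strictly increasing ones for $Alt^3$, and see how many of the $27$ dimensions are left over.

```python
from itertools import combinations, combinations_with_replacement

n, k = 3, 3
sym_dim = len(list(combinations_with_replacement(range(1, n + 1), k)))  # non-decreasing k-tuples
alt_dim = len(list(combinations(range(1, n + 1), k)))                   # strictly increasing k-tuples
print(sym_dim, alt_dim, n ** k - sym_dim - alt_dim)  # 10 1 16
```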