How to calculate the matrix exponential explicitly for a matrix which isn't diagonalizable?
I suppose it's the proper time for something elaborate.
The reason why one would want to try to diagonalize a matrix before applying the exponential (or any function, really) is that it's easy to compute the exponential of a diagonal matrix: one merely takes the exponential of the diagonal entries. For matrices that can be diagonalized (e.g. normal matrices) $\mathbf A=\mathbf V\mathbf\Lambda\mathbf V^{-1}$, the exponential then is given by $\exp(\mathbf A)=\mathbf V\exp(\mathbf\Lambda)\mathbf V^{-1}$
Now, the problem is when your matrix is defective (i.e., nondiagonalizable, does not possess a complete eigenvector set). We still want to be able to perform a similarity transformation in such a way that one can (relatively) easily apply the exponential function to the resulting matrix.
The Jordan decomposition is one such convenient decomposition: one decomposes a matrix as $\mathbf A=\mathbf S\mathbf J\mathbf S^{-1}$, where $\mathbf J$ is a triangular matrix. If $\mathbf A$ were diagonalizable to begin with, $\mathbf J$ is diagonal, and we're back to having an eigendecomposition. For defective $\mathbf A$, one can treat $\mathbf J$ as a block diagonal matrix whose diagonal blocks are either scalars or so-called Jordan blocks, matrices of the form
$$\begin{pmatrix}\lambda&1&&\\&\lambda&\ddots&\\&&\ddots&1\\&&&\lambda\end{pmatrix}$$
(elements not indicated are zero), which $\mathbf J$ will have for each cluster of repeated eigenvalues $\mathbf A$ has. Computing $\exp(\mathbf A)$ then formally proceeds in the same way as in the diagonalizable case: $\exp(\mathbf A)=\mathbf S\exp(\mathbf J)\mathbf S^{-1}$.
So, how would one compute $\exp(\mathbf J)$? The scalar blocks are easy to take care of, so what remains is a formula for computing
$$f\begin{pmatrix}\lambda&1&&\\&\lambda&\ddots&\\&&\ddots&1\\&&&\lambda\end{pmatrix}$$
The formula we need goes like this (if the Jordan block is a $k\times k$ matrix):
$$f\begin{pmatrix}\lambda&1&&\\&\lambda&\ddots&\\&&\ddots&1\\&&&\lambda\end{pmatrix}=\begin{pmatrix}f(\lambda)&f^\prime(\lambda)&\cdots&\frac{f^{(k-1)}(\lambda)}{(k-1)!}\\&f(\lambda)&\ddots&\vdots\\&&\ddots&f^\prime(\lambda)\\&&&f(\lambda)\end{pmatrix}$$
For the exponential, this becomes
$$\begin{pmatrix}\exp(\lambda)&\exp(\lambda)&\cdots&\frac{\exp(\lambda)}{(k-1)!}\\&\exp(\lambda)&\ddots&\vdots\\&&\ddots&\exp(\lambda)\\&&&\exp(\lambda)\end{pmatrix}$$
A proof of this formula is given here.
In the realm of inexact arithmetic, the Jordan decomposition becomes unreliable due to a number of elaborate reasons; for reliably computing the exponential of a matrix, there are a bunch of other alternatives, such as scaling+squaring and the use of the Schur decomposition, a similarity transformation to a triangular matrix which is safer to compute in inexact arithmetic. Moler and Van Loan tackle this topic in this article and its followup.
For any element $a$ of any finite dimensional $\mathbb C$-algebra with 1, let $f_a\in\mathbb C[X]$ be the unique polynomial of degree $<\dim\mathbb C[a]$ satisfying $f_a(a)=e^a$. (The letter $X$ is an indeterminate.)
Let $m\in\mathbb C[X]$ be the minimal polynomial of $a$, let $\lambda$ be a multiplicity $\mu(\lambda)$ root of $m$, and let $x(\lambda)$ be the image of $X$ in $\mathbb C[X]/(X-\lambda)^{\mu(\lambda)}$.
Then $f_a$ can be computed by solving, thanks to Taylor's Formula, the congruences $$f_a\equiv f_{x(\lambda)}\quad\bmod\quad(X-\lambda)^{\mu(\lambda)},$$ where $\lambda$ runs over the roots of $m$.
EDIT OF APRIL 22, 2011. Here are some more details: $$f_{x(\lambda)}=e^\lambda\ \sum_{n<\mu(\lambda)}\ \frac{(X-\lambda)^n}{n!}\quad,$$ $$f_a=\sum_\lambda\ T_\lambda\left(f_{x(\lambda)}\ \frac{(X-\lambda)^{\mu(\lambda)}}{m}\right)\frac{m}{(X-\lambda)^{\mu(\lambda)}}\quad,$$ where $T_\lambda$ means "degree $<\mu(\lambda)$ Taylor polynomial at $X=\lambda$".
EDIT OF APRIL 23, 2011. Let me try to describe what is (in my humble opinion) the relationship between the various approaches.
Again, let $a$ be an element of a finite dimensional $\mathbb C$-algebra with 1 (e.g. $a$ is an $n$ by $n$ complex matrix), and $m$ its minimal polynomial. Put $d := \deg m = \dim \mathbb C[a]$. Then there is a unique polynomial $f_a$ of degree $< d$ satisfying $f_a(a) = e^a$. Thus, computing $e^a$ is equivalent to computing $f_a$. Now, $f_a$ is determined by the congruences spelled out above (along with the condition deg $\deg f_a < d$). Thus, computing $e^a$ is equivalent to solving these congruences.
Here is the way things are usually stated. There is a unique degree $<d$ polynomial $g\in\mathbb C[X]$ such that $g(a)$ is semisimple and $a-g(a)$ is nilpotent. The equality $$a=g(a)+(a-g(a))$$ is called the Jordan decomposition of $a$. [To get the Jordan normal form of a matrix, you must in addition find, in each generalized eigenspace, a basis which is compatible, in a certain sense, with the nilpotent part.] Of course, $e^a$ can be easily expressed in terms of the Jordan decomposition. But the point is this: How do you compute $g$? The answer is: By solving the congruences $$g\equiv \lambda\quad\bmod\quad(X-\lambda)^{\mu(\lambda)},$$ where $\lambda$ runs over the roots of $m$. Thus, we see that computing the Jordan decomposition is exactly as hard as computing the exponential.
As a side remark, let me recall Lagrange's Formula for the exponential of a diagonalizable matrix $a$ with eigenvalues $(\lambda_j)$: $$e^a=\sum_j\ e^{\lambda_j}\ \prod_{k\not= j}\ \frac{a-\lambda_k}{\lambda_j-\lambda_k}\quad.$$
I'll say that there are at least three ways of computing $e^a$ for an $n$ by $n$ complex matrix $a$.
(1) Compute the exponential directly, as explained above.
(2) Compute the Jordan decomposition first, and then the exponential (using the decomposition).
(3) Compute the Jordan normal form first, and then the exponential (using the normal form).
I believe that (3) is an unnecessary complication: In the two extreme cases, namely when $a$ is diagonalizable and when $a$ is nilpotent, there is no difference between (1) and (2). In the diagonalizable case, (1) and (2) yield Lagrange's formula. But, if you want to use (3), you must diagonalize $a$. Isn't it a loss of time? In the nilpotent case, (1) and (2) give the obvious answer, but to get the normal form, you must solve a lot of linear systems. Isn't it a loss of time?
As a last comment, let me just spell out the starting point of the argument: There is an obvious isomorphism $\mathbb C[X]/(m)\to\mathbb C[a]$, mapping the image $x$ of $X$ to $a$. Thus, computing $e^a$ is the same as computing $e^x$. Moreover, $\mathbb C[X]/(m)$ this algebra is canonically isomorphic to a product of truncated polynomial algebras.
SECOND EDIT OF APRIL 23, 2011. The reader who has doubts can test the various methods on examples. The first example which is not entirely trivial is the companion matrix to $(X-a)^2(X-b)$, where $a$ and $b$ are distinct complex numbers.
EDIT OF APRIL 24, 2011. More generally, given an $n$ by $n$ complex matrix $a$, one can ask the following three questions:
(a) What is the minimal polynomial of $a$?
(b) What is the conjugacy class of $a$?
(c) How to put $a$ in Jordan normal form?
It is instructive to consider the nilpotent case case (to which the general case reduces in some sense):
To answer (a), you must find the least exponent $\mu$ satisfying $a^\mu=0$.
To answer (b), you must, in addition, compute the rank of $a^j$ for $1\le j < \mu$.
To answer (c), you must, in addition, solve a bunch of linear systems.
Only (a) is needed to compute the exponential.
You can find the exponential of a matrix by Putzer's algorithm, which avoids both computing either the Jordan canonical form or any eigenvectors. Putzer's algorithm uses the Cayley-Hamilton theorem, which states that a matrix satisfies its characteristic equation, in an essential way.
Here is a summary of the method for a $3 \times 3$ matrix $a$. The general case should be similar. The basic idea is that all powers of $a$ starting with $a^3, a^4, \ldots$ can be expressed as a linear combination of $i, a, a^2.$ We will use different combinations of $i, a, a^2$ instead, as explained in the next paragraph.
Let the characteristic polynomial of $a$ be written as $\det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_1)(\lambda - \lambda_3)$. It does not matter how the roots are labeled or even whether they are all real. Now define $p_0 = i, p_1 = (a - \lambda_1 i)p_0, p_2 = (a - \lambda_2 i)p_1.$ We will need the following consequences $ a p_0 = \lambda_1 p_0 + p_1, a p_1 = \lambda_2 p_1 + p_2, a p_2 = \lambda_3 p_2.$
Now look for $e^{at} = i + at + \frac{t^2a^2}{2!} + \ldots$. Since all powers starting with $a^3$ is a linear combination of $i, p_0, p_1$, thanks to c-h, $e^{at} = r_0 p_0 + r_1 p_1 + r_2 p_2.$ Using $\frac{d}{dt}e^{at} = ae^{at}$ it is easy to see that $r_0,r_1, r_2$ satisfy $$\frac{dr_0}{dt} = \lambda_1 r_0, \frac{dr_1}{dt} = \lambda_2 r_1 + r_0, \frac{dr_2}{dt} = \lambda_3 r_2 + r_1$$ together with the initial conditions $r_0(0) = 1, r_1(0) = 0, r_2(0) = 0.$
Once we have $r_0, r_1, r_2$ the fundamental matrix $e^{at} = r_0 p_0 + r_1 p_1 + r_2 p_2$ and we are done.