What is the time complexity of the matrix exponential?

The exponential of a matrix is not, in general, computable explicitly. Strictly speaking, "the complexity of the matrix exponential" does not exist. Instead, you have an approximate calculation, and you are asking about its complexity. A few comments:

  • There is not just one approximate exponential, but a sequence of approximations. Therefore the complexity must involve the tolerance parameter.
  • One cannot overlook the stability of the approximation. At least two ingredients are at stake: either the diagonalisation of the matrix can be ill-conditioned (the matrix is far from normal), or (even worse) its spectrum is moderately large. Say that $30$ and $-30$ are eigenvalues of $A$; then the magnitude of $\exp A$ is $e^{30}$, a large number. But $e^{-30}$ is also an (extremely small) eigenvalue of $\exp A$, and the action of the approximate $\exp A$ on the corresponding eigenvector will be totally erroneous. More generally, even a moderate spread in the spectrum of $A$ causes huge errors in the modes associated with the smallest eigenvalues of $\exp A$. These phenomena are well known to numerical analysts dealing with ODEs, who must choose a version of the Euler scheme adapted to the matrix at hand.
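To make the second point concrete, here is a small sketch (my own illustration, using `scipy.linalg.expm`) with a perfectly normal matrix whose eigenvalues are $\pm 30$. The computed $\exp A$ has norm $\approx e^{30}$, so rounding noise of size roughly $e^{30}\varepsilon$ swamps the exact answer $e^{-30}v$ on the small mode:

```python
import numpy as np
from scipy.linalg import expm

# Symmetric (hence normal) matrix with eigenvalues +30 and -30;
# exactly, exp(A) = [[cosh 30, sinh 30], [sinh 30, cosh 30]].
A = np.array([[0.0, 30.0],
              [30.0, 0.0]])
E = expm(A)

# Eigenvector for the eigenvalue -30, so exp(A) v = e^{-30} v exactly.
v = np.array([1.0, -1.0]) / np.sqrt(2.0)
exact = np.exp(-30.0) * v
computed = E @ v

rel_err = np.linalg.norm(computed - exact) / np.linalg.norm(exact)
print(f"relative error on the small mode: {rel_err:.2e}")
```

The matvec subtracts two entries of size $\approx e^{30}/2$ whose true difference is $\approx e^{-30}$, so the relative error on that mode is enormous even though the diagonalisation itself is perfectly conditioned.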

Moler's paper "Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later" contains the following extracts:

In estimating the time required by matrix computations it is traditional to estimate the number of multiplications and then employ some factor to account for the other operations. We suggest making this slightly more precise by defining a basic floating point operation, or “flop”, to be the time required for a particular computer system to execute the FORTRAN statement

`A(I,J) = A(I,J) + T*A(I,K)`

This involves one floating point multiplication, one floating point addition, a few subscript and index calculations, and a few storage references. We can then say, for example, that Gaussian elimination requires $n^3/3$ flops to solve an $n$-by-$n$ linear system $Ax = b$....

About $qn^3$ flops are required to evaluate [a Padé approximant]...

The scaling and squaring method of section 3 and some of the matrix decomposition methods of section 6 require on the order of $10$ to $20n^3$ flops and they obtain higher accuracies than those obtained with $200 n^3$ or more flops for the o.d.e. solvers...

Most of the computational cost [in block diagonal methods] lies in obtaining the basic Schur decomposition. Although this cost varies somewhat from matrix to matrix because of the iterative nature of the QR algorithm, a good average figure is $15 n^3$ flops, including the further reduction to block diagonal form.

and there are a few more nuggets buried in there.
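As a sanity check on the flop convention, here is a rough counter of my own (an illustrative sketch, not Moler's exact accounting) confirming that naive Gaussian elimination performs about $n^3/3$ of these multiply-add flops:

```python
import numpy as np

def lu_flops(n):
    """Run naive Gaussian elimination on an n x n matrix and count flops,
    where one flop = one multiply-add update in the inner loop (Moler's sense)."""
    A = np.random.rand(n, n) + n * np.eye(n)  # diagonally dominant: no pivoting needed
    count = 0
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]             # multiplier (division not counted)
            for j in range(k + 1, n):
                A[i, j] -= m * A[k, j]        # one flop
                count += 1
    return count

n = 30
print(lu_flops(n), round(n**3 / 3))  # 8555 vs 9000: already within ~5% at n = 30
```

The exact count is $\sum_{m=1}^{n-1} m^2 = (n-1)n(2n-1)/6$, which approaches $n^3/3$ as $n$ grows.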

Based purely on this (and without reading the paper you cite, but noting that its title is "A new scaling and squaring algorithm for the matrix exponential"; see also http://eprints.ma.man.ac.uk/1442/), I would guess that Al-Mohy and Higham's scaling and squaring method is in the ballpark of $Cn^3$ flops with $C \approx 15$.
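To put the quoted figures side by side, here is a trivial back-of-the-envelope comparison (the constants $15$ and $200$ are taken directly from the extracts above; the helper names are my own):

```python
# Rough cost estimates from Moler & Van Loan's figures (illustrative only).
def flops_scaling_squaring(n, c=15):
    """Scaling and squaring / Schur-based methods: 'on the order of 10 to 20 n^3'."""
    return c * n**3

def flops_ode_solver(n, c=200):
    """General-purpose ODE solvers applied to x' = Ax: '200 n^3 or more'."""
    return c * n**3

n = 1000
print(f"scaling and squaring: ~{flops_scaling_squaring(n) / 1e9:.0f} Gflops")
print(f"o.d.e. solver:        ~{flops_ode_solver(n) / 1e9:.0f} Gflops")
```

Both are $\Theta(n^3)$; the point of the quoted passage is that the constant differs by an order of magnitude, and the cheaper methods are also the more accurate ones.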