How to calculate the expectation of the Wishart distribution?

This can be shown with the characteristic function.

If $X$ is a random matrix, its characteristic function is $Z \mapsto E\Bigl[\exp\bigl(\textrm{tr}(iZX)\bigr)\Bigr]$.
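Since the trace is linear, $\text{tr}(ZX) = \sum_{j,k} Z_{jk}X_{kj}$, so this is nothing more than the joint characteristic function of the entries of $X$:
$$ E\Bigl[\exp\bigl(\textrm{tr}(iZX)\bigr)\Bigr] = E\Bigl[\exp\Bigl(i\sum_{j,k} Z_{jk}X_{kj}\Bigr)\Bigr]. $$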

The map $\psi : Z \mapsto \exp\bigl(\textrm{tr}(iZX)\bigr)$ is differentiable, with differential at $Z_0$ given by $\textsf{D}_{Z_0}\psi(Z) = \exp\bigl(\text{tr}(iZ_0X)\bigr)\text{tr}(iZX)$. For $Z_0=0$ (the zero matrix), $\textsf{D}_{0}\psi(Z) = \text{tr}(iZX)$. If $X$ is integrable, one may differentiate under the expectation sign, so $\phi$ is differentiable at $0$ with $\textsf{D}_0\phi(Z) = \text{tr}\bigl(iZE[X]\bigr)$. Conversely, if $\phi$ is differentiable at $0$, then $X$ is integrable and $\textsf{D}_0\phi(Z) = \text{tr}\bigl(iZE[X]\bigr)$.
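For intuition, in the scalar case $p=1$ this reduces to the familiar fact about characteristic functions of real random variables: if $X$ is integrable and $\phi(z) = E[e^{izX}]$, then $\phi'(0) = iE[X]$, i.e.
$$ \textsf{D}_0\phi(z) = z\,\phi'(0) = izE[X] = \text{tr}\bigl(izE[X]\bigr) $$
once everything is viewed as a $1\times 1$ matrix.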

Therefore, if you can show that $\phi$ is differentiable at $0$ with $\textsf{D}_0\phi(Z) = \text{tr}\bigl(iZA\bigr)$ for some matrix $A$, then $E[X]=A$.

The characteristic function of the Wishart distribution $W_p(\nu, \Sigma)$ is $\phi(Z) = \det(I - 2iZ\Sigma)^{-\frac{\nu}{2}}$.
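If you want to convince yourself of this formula numerically, here is a minimal Monte Carlo sketch (assuming NumPy and SciPy are available; `scipy.stats.wishart` uses the parameterization `df` $=\nu$, `scale` $=\Sigma$, and the values of $\nu$, $\Sigma$ and $Z$ below are arbitrary test values):

```python
import numpy as np
from scipy.stats import wishart

p, nu = 2, 5
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
# Small symmetric matrix Z at which to evaluate the characteristic function.
Z = np.array([[0.05, 0.02],
              [0.02, 0.03]])

# Monte Carlo estimate of E[exp(tr(iZX))] over draws X ~ W_p(nu, Sigma).
X = wishart(df=nu, scale=Sigma).rvs(size=200_000, random_state=0)
mc = np.mean(np.exp(1j * np.trace(Z @ X, axis1=1, axis2=2)))

# Closed form det(I - 2iZ*Sigma)^(-nu/2).
closed = np.linalg.det(np.eye(p) - 2j * Z @ Sigma) ** (-nu / 2)

print(mc)       # Monte Carlo estimate
print(closed)   # closed-form value; the two should agree up to Monte Carlo error
```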

Let's set $V(Z) = I - 2iZ\Sigma$ and $U(Z) = \det\bigl(V(Z)\bigr)$. Since $V$ is affine, it is differentiable with $\textsf{D}_{Z_0}V(Z) = -2iZ\Sigma$. By Jacobi's formula, $U$ is differentiable and $\textsf{D}_{Z_0}U(Z) = \text{tr}\bigl({V(Z_0)}^\# \textsf{D}_{Z_0}V(Z)\bigr)$, where $M^\#$ denotes the adjugate of a square matrix $M$. For $Z_0=0$ we have $V(0)=I$ and $I^\#=I$, so $\textsf{D}_{0}U(Z) = \text{tr}(-2iZ\Sigma)$. By the chain rule, $\phi$ is differentiable and $$ \textsf{D}_{Z_0}\phi(Z) = -\frac{\nu}{2}{\det(I - 2iZ_0\Sigma)}^{-\frac{\nu}{2}-1} \textsf{D}_{Z_0}U(Z), $$ and for $Z_0=0$, $$ \textsf{D}_{0}\phi(Z) = -\frac{\nu}{2} \text{tr}(-2iZ\Sigma) = \text{tr}\bigl(iZ(\nu\Sigma)\bigr). $$ This shows that the expectation is $\nu\Sigma$.
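The differentiation step can also be checked numerically: the directional derivative $\textsf{D}_0\phi(Z) = \frac{d}{dt}\big|_{t=0}\,\phi(tZ)$ should match $\text{tr}\bigl(iZ(\nu\Sigma)\bigr)$. A minimal NumPy sketch, reusing the same arbitrary test values for $\nu$, $\Sigma$ and $Z$ as above:

```python
import numpy as np

p, nu = 2, 5
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Z = np.array([[0.05, 0.02],
              [0.02, 0.03]])

def phi(W):
    """Characteristic function of W_p(nu, Sigma) evaluated at the matrix W."""
    return np.linalg.det(np.eye(p) - 2j * W @ Sigma) ** (-nu / 2)

# Directional derivative of phi at 0 in the direction Z, by finite differences.
h = 1e-6
numeric = (phi(h * Z) - phi(0 * Z)) / h
exact = np.trace(1j * Z @ (nu * Sigma))

print(numeric)  # finite-difference approximation, accurate up to O(h)
print(exact)    # tr(iZ(nu*Sigma)), a purely imaginary number
```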


$X$ follows a Wishart distribution, $X \sim W_{p}(\Sigma,v)$, if \begin{equation} X = \sum_{i=1}^{v}Z_{i}Z_{i}^{T} \end{equation} where the $Z_i$ are independent, identically distributed $p$-dimensional normal random vectors with $Z_{i} \sim N_{p}(0,\Sigma)$, and $\Sigma$ is the covariance matrix of each $Z_{i}$.

The expectation of a random matrix $A$ is defined elementwise, $\mathbb{E}(A)_{i,j} = \mathbb{E}(A_{i,j})$, i.e., it is the expectation of each of its entries. This definition is consistent with the usual one for random vectors.

Now \begin{align*} \mathbb{E}(X) &= \mathbb{E} \left( \sum_{i=1}^{v}Z_{i}Z_{i}^{T} \right) \\&= \sum_{i=1}^{v} \mathbb{E}(Z_{i}Z_{i}^{T}) \\ &= \sum_{i=1}^{v} \mathbb{E}(Z_{1}Z_{1}^{T}) \\ &= v\mathbb{E}(Z_{1}Z_{1}^{T}) \end{align*}

Here the second equality follows from the linearity of expectation, and the third from the fact that the $Z_{i}$ are identically distributed.

But for any random vector $Y$, $Cov(Y) = \mathbb{E}(YY^{T}) - \mathbb{E}(Y)\mathbb{E}(Y^{T})$. The mean of $Z_{1}$ is zero and hence $\mathbb{E}(Z_{1}Z_{1}^{T}) = Cov(Z_{1}) = \Sigma$.

Therefore \begin{equation} \mathbb{E}(X) = v\Sigma. \end{equation}
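As a sanity check of this derivation, one can simulate $X$ directly from the definition $X = \sum_{i=1}^{v} Z_{i}Z_{i}^{T}$ and compare the elementwise sample mean with $v\Sigma$. A minimal NumPy sketch (the values of $p$, $v$ and $\Sigma$ are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
p, v = 3, 7
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# n_rep independent replications of X = sum_{i=1}^{v} Z_i Z_i^T with Z_i ~ N_p(0, Sigma).
n_rep = 50_000
Zs = rng.multivariate_normal(np.zeros(p), Sigma, size=(n_rep, v))  # shape (n_rep, v, p)
Xs = np.einsum('nvi,nvj->nij', Zs, Zs)                             # sum of outer products per replication

print(np.round(Xs.mean(axis=0), 2))  # elementwise sample mean of X
print(v * Sigma)                     # should be close to v * Sigma
```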