Minimize $\mbox{tr}(AX)$ where $X$ is an orthogonal matrix

The method of Lagrange multipliers yields the solution.

One studies the (real-valued) function $F(X)$ with the (matrix) constraint that $C(X)=I$ for $F(X)=\text{trace}(AX)$ and $C(X)=X^TX$, hence $$ F(X)=\sum\limits_{ij}A_{ji}X_{ij},\quad C_{ij}(X)=\sum\limits_{k}X_{ki}X_{kj}. $$ The gradients of $F$ and $C$ are given by $\partial_{ij}F(X)=A_{ji}$ and $$ \partial_{ij}C_{k\ell}(X)=X_{i\ell}[k=j]+X_{ik}[\ell=j]. $$ One wants that there exists some multipliers $\lambda_{ij}$ such that, for every $i$ and $j$, $$ \partial_{ij}F(X)=\sum\limits_{k\ell}\lambda_{k\ell}\partial_{ij}C_{k\ell}(X). $$ This condition reads $$ A_{ji}=\sum\limits_{k,\ell}\lambda_{k\ell}\left(X_{i\ell}[k=j]+X_{ik}[\ell=j]\right)=\sum\limits_{\ell}\lambda_{j\ell}X_{i\ell}+\sum\limits_{k}\lambda_{kj}X_{ik}, $$ or, equivalently, introducing the matrix $\Lambda=(\lambda_{ij})$, $$ A^T=X\Lambda^T+X\Lambda. $$ The matrix $M=\Lambda^T+\Lambda$ is such that $M^T=M$ and $XM=A^T$, hence $X^TX=I$ implies that $M$ should be such that $M^TM=M^2=AA^T$.

Using pseudo-inverse matrices if $A$ is not invertible and the usual definition of the square root of a symmetric matrix, one sees that the maximizing and minimizing matrices $X$ and the maximum and minimum values of $F$ are

When $A$ is invertible, one can use the usual definition of the square root of a symmetric matrix to find $M$. The maximizing and minimizing matrices $X$ and the maximum and minimum values of $F$ are$$ X_0=A^TM^{-1}=\pm A^T(AA^T)^{-1/2},\quad F(X_0)=\pm\mbox{trace}((AA^T)^{1/2}). $$ When $A$ is not invertible, the formula $X_0=A^TM^{-1}$, whichever meaning one gives to the notation $M^{-1}$, cannot yield an orthogonal matrix $X_0$ since, if $A$ is not invertible, neither are $A^T$, nor $A^T$ times any matrix. On the other hand, the formula $F(X_0)=\pm\mbox{trace}((AA^T)^{1/2})$ might still be valid.


Another way to get did's answer is to use the SVD to reduce to the case of $A$ being diagonal: We have $A = USV'$, where $U$ and $V$ are orthogonal and $S$ is diagonal, and so

$$\def\Tr{\operatorname{Tr}}\Tr(AX) = \Tr(SV'XU)\;.$$

Since $X\to V'XU$ is a bijection on the set of orthogonal matrices,

$$\min\{\Tr(AX)\mid X\text{ orthogonal}\} = \min\{\Tr(SW)\mid W\text{ orthogonal}\}\;.$$

But $\Tr(SW) = \sum_iS_{ii}W_{ii}$, and all the $S_{ii}$ are non-negative and $-1\le W_{ii}\le1$, so that $\Tr(SW)\ge\Tr(S(-I))$, and so the minimum occurs at $W=-I$ (or $X=-VU'$) and is $-\sum_iS_{ii}$.