Reference for trace/norm inequality
Here is a proof using only linear algebra. It assumes you're familiar with the singular value decomposition (SVD).
Lemma 1 For any square matrix $A$, $|tr(A)|\le \sum_i \sigma_i(A)$
Proof: Write the SVD as $A = U\Sigma V$ with $U, V$ unitary. By the cyclic property of the trace, $$tr(A) = tr(U\Sigma V) = tr(\Sigma VU) $$ The product $Z=VU$ is again a unitary matrix, so $$|tr(\Sigma Z)| = |\sum_i \sigma_i(A)z_{ii}|\le \sum_i |\sigma_i(A)z_{ii}|\le \sum_i \sigma_i(A) $$ since $|z_{ii}|\le 1$: every column of a unitary matrix is a unit vector, so no entry exceeds $1$ in modulus.
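As a sanity check (not a proof), Lemma 1 can be tested numerically. The sketch below assumes NumPy; the helper name `trace_bound_gap` is made up for illustration. It computes $\sum_i \sigma_i(A) - |tr(A)|$ for random complex square matrices, which should never be negative beyond rounding error.

```python
import numpy as np

def trace_bound_gap(A):
    """Return sum_i sigma_i(A) - |tr(A)|, which Lemma 1 says is nonnegative."""
    # compute_uv=False returns only the singular values
    singular_values = np.linalg.svd(A, compute_uv=False)
    return float(singular_values.sum() - abs(np.trace(A)))

rng = np.random.default_rng(0)
gaps = []
for n in range(1, 7):
    for _ in range(50):
        # random complex n x n matrix
        A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        gaps.append(trace_bound_gap(A))

print("smallest gap over random trials:", min(gaps))
print("gap for the identity (equality case):", trace_bound_gap(np.eye(3)))
```

The identity matrix is an equality case: all its singular values are $1$, and so are its diagonal entries.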
Lemma 2 For any matrices $A,B$ for which $A^*B$ is defined, $\sigma_i(A^*B)\le \sigma_i(A)\sigma_1(B)$
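Proof: By the Courant-Fischer min-max theorem, we know $$ \sigma_i(A^*B) = \max_{\dim V=i}\min_{x\in V,\,\|x\|=1}\|A^*Bx\| $$ We may restrict to subspaces $V$ on which $B$ is injective, since otherwise the inner minimum is $0$; then $\dim BV = i$. For such $V$, $$ \min_{x\in V,\,\|x\|=1} \|A^*Bx\| \le \max_{x\in V,\,\|x\|=1}\|Bx\| \min_{y\in BV,\,\|y\|=1}\|A^*y\| $$ because a minimizer $y$ on the right has the form $y = Bx/\|Bx\|$ for some unit $x\in V$, and then $\|A^*Bx\| = \|Bx\|\,\|A^*y\|$. So $$ \sigma_i(A^*B) \le \max_{\dim V=i}(\max_{x\in V,\,\|x\|=1}\|Bx\| \min_{y\in BV,\,\|y\|=1}\|A^*y\|) $$ $$ \le \max_{\dim V=i}\max_{x\in V,\,\|x\|=1}\|Bx\| \max_{\dim V=i}\min_{y\in BV,\,\|y\|=1}\|A^*y\| \le \sigma_1(B)\sigma_i(A^*) $$ Since $\sigma_i(A^*) = \sigma_i(A)$, this is the claimed bound.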
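Lemma 2 can also be spot-checked numerically. This is only a sketch under assumptions: it uses NumPy, the helper name `lemma2_gaps` is hypothetical, and it compares only the indices $i$ for which both $\sigma_i(A^*B)$ and $\sigma_i(A)$ are returned by the SVD routine. Each returned entry is $\sigma_i(A)\sigma_1(B) - \sigma_i(A^*B)$ and should be nonnegative up to rounding.

```python
import numpy as np

def lemma2_gaps(A, B):
    """Return sigma_i(A) * sigma_1(B) - sigma_i(A^* B) for each comparable i."""
    s_prod = np.linalg.svd(A.conj().T @ B, compute_uv=False)
    s_A = np.linalg.svd(A, compute_uv=False)
    s_B = np.linalg.svd(B, compute_uv=False)
    # singular values come back in descending order, so s_B[0] is sigma_1(B)
    k = min(len(s_prod), len(s_A))
    return s_A[:k] * s_B[0] - s_prod[:k]

rng = np.random.default_rng(1)
worst = np.inf
for _ in range(200):
    # A is m x n and B is m x p, so that A^* B is defined
    m, n, p = rng.integers(1, 6, size=3)
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    B = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
    worst = min(worst, lemma2_gaps(A, B).min())

print("smallest gap over random trials:", worst)
```

For a hand-checkable case, take $A = \mathrm{diag}(3,2)$ and $B = \mathrm{diag}(1,5)$: then $A^*B = \mathrm{diag}(3,10)$ has singular values $10, 3$, while the bounds $\sigma_i(A)\sigma_1(B)$ are $15, 10$.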