What is the intuition behind the trace of an endomorphism?
For finite-dimensional vector spaces $V$, there is a canonical isomorphism of $V$ with its double dual $V^{**}$ and this makes the vector space $V \otimes V^*$ naturally isomorphic to its own dual space: $$ (V \otimes V^{*})^* \cong V^* \otimes V^{**} \cong V^* \otimes V \cong V \otimes V^{*}, $$ where the first and last isomorphisms are the natural ones involving tensor products of (finite-dimensional) vector spaces. Since $V \otimes V^{*} \cong {\rm End}(V)$, we get that ${\rm End}(V)$ is naturally isomorphic as a vector space to its own dual space. If you unwrap all of these isomorphisms, the isomorphism ${\rm End}(V) \to ({\rm End}(V))^{*}$ sends each linear operator $A$ on $V$ to the following linear functional on operators on $V$: $B \mapsto {\rm Tr}(AB)$. In particular, the identity map on $V$ is sent to the trace map on ${\rm End}(V)$.
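As a concrete sanity check (not part of the argument above; the dimension and random seed are arbitrary), one can verify numerically that the functional $B \mapsto {\rm Tr}(AB)$ determines $A$ completely, by testing it against the matrix units $E_{ij}$, and that the identity matrix corresponds to the trace functional itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))

# Matrix unit E_ij has a 1 in position (i, j) and zeros elsewhere.
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        # Tr(A @ E_ij) equals the entry A[j, i], so the functional
        # B -> Tr(AB) recovers every entry of A: the pairing is nondegenerate.
        assert np.isclose(np.trace(A @ E), A[j, i])

# The identity matrix is sent to the trace functional itself:
B = rng.standard_normal((n, n))
assert np.isclose(np.trace(np.eye(n) @ B), np.trace(B))
```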
One way to think about it is that the trace is the unique linear functional on matrices with the property
$$\operatorname{tr}(|u\rangle\langle v|) = \langle v | u\rangle $$
In particular, for a rank-1 matrix $A=\lambda|u\rangle\langle v|$ with $u$ and $v$ normalized, $\operatorname{tr}(A)=\lambda\langle v|u\rangle$, so the trace in some sense measures "how much $\ker(A)^\perp$ (spanned by $v$) is aligned with $\operatorname{Im}(A)$ (spanned by $u$)". Does this viewpoint carry over to matrices that are not rank-1?
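A quick NumPy check of the rank-1 case (the vectors, dimension, and seed are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4)
v = rng.standard_normal(4)
u /= np.linalg.norm(u)  # normalize so the formula reads cleanly
v /= np.linalg.norm(v)

lam = 2.5
A = lam * np.outer(u, v)  # A = lambda |u><v|

# tr(A) = lambda <v|u>: the trace measures how much v aligns with u.
assert np.isclose(np.trace(A), lam * np.dot(v, u))
assert np.linalg.matrix_rank(A) == 1
```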
Consider a higher-rank matrix $A=\sum_{i=1}^k \sigma_i|u_i\rangle\langle v_i|$, where again we may assume the $u_i$ and $v_i$ are normalized. (You may notice that when $k=n$ and the $u_i$ and the $v_i$ each form an orthonormal set, this is the SVD of $A$.) Then
$$\operatorname{tr}(A)= \operatorname{tr}\Big(\sum_i \sigma_i|u_i\rangle\langle v_i|\Big) = \sum_i \sigma_i\langle v_i|u_i\rangle, $$
which is a weighted sum of how much the orthogonal complements of the kernels of the rank-1 components align with their images. By linearity, it does not matter how we represent $A$ as a sum of rank-1 matrices.
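Taking the SVD as the rank-1 decomposition, the formula can be checked numerically (a sketch; the matrix is random and real, so bras are plain dot products):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# The SVD writes A = sum_i sigma_i |u_i><v_i| with orthonormal u_i, v_i.
U, sigma, Vt = np.linalg.svd(A)

# tr(A) = sum_i sigma_i <v_i|u_i>
alignment_sum = sum(s * np.dot(Vt[i], U[:, i]) for i, s in enumerate(sigma))
assert np.isclose(np.trace(A), alignment_sum)
```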
Notable special cases:
If $A$ has an orthonormal eigenbasis, then $A=\sum_i \lambda_i |v_i\rangle\langle v_i|$ and so $\operatorname{tr}(A)=\sum_i\lambda_i\langle v_i|v_i\rangle = \sum_i \lambda_i$. Here the orthogonal complement of the kernel and the image of each rank-1 component are perfectly aligned.
For a projection matrix $P$ (i.e. $P^2=P$) we have $\operatorname{tr}(P)=\dim \operatorname{Im}(P)$. Since $P$ acts as the identity on the subspace it projects onto, again the orthogonal complement of the kernel and the image are perfectly aligned.
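A numerical illustration (the construction of $P$ as an orthogonal projection onto a random 2-dimensional column space is just one convenient choice):

```python
import numpy as np

rng = np.random.default_rng(3)
# Orthogonal projection onto the column space of a random 5x2 matrix X:
# P = X (X^T X)^{-1} X^T.
X = rng.standard_normal((5, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(P @ P, P)         # P is idempotent
assert np.isclose(np.trace(P), 2.0)  # tr(P) = dim Im(P) = 2
```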
For a nilpotent matrix $N$, the trace is zero. In fact, we can write $N$ as a sum of rank-1 matrices for each of which the orthogonal complement of the kernel is orthogonal to the image (this can be proven, for example, via the Schur decomposition).
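For a strictly upper triangular (hence nilpotent) matrix this decomposition is easy to exhibit: write $N = \sum_j |n_j\rangle\langle e_j|$, where $n_j$ is the $j$-th column and $e_j$ the $j$-th standard basis vector; then $\langle e_j|n_j\rangle = N_{jj} = 0$ for every component. A small check (the example matrix is arbitrary):

```python
import numpy as np

# A strictly upper triangular, hence nilpotent, matrix.
N = np.array([[0., 1., 2.],
              [0., 0., 3.],
              [0., 0., 0.]])
assert np.allclose(np.linalg.matrix_power(N, 3), 0)  # N^3 = 0
assert np.isclose(np.trace(N), 0.0)

# N = sum_j |n_j><e_j| with n_j the j-th column; each rank-1 piece
# satisfies <e_j|n_j> = N[j, j] = 0.
for j in range(3):
    assert np.isclose(np.dot(np.eye(3)[j], N[:, j]), 0.0)
```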
Here's a cute geometric interpretation: the trace is the derivative of the determinant at the identity. That is, we have
$$\det (1 + At) = 1 + \text{tr}(A) t + O(|t|^2)$$
So if you think of the determinant geometrically in terms of volumes, the trace is telling you something about how a matrix very close to the identity changes volumes. Similarly we have the identity
$$\det \exp(At) = \exp(\text{tr}(A) t).$$
This identity explains, among other things, why the Lie algebra of the special linear group $SL_n$ is the Lie algebra $\mathfrak{sl}_n$ of matrices with zero trace.
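Both identities are easy to test numerically (a sketch; the matrix exponential is computed here by a truncated power series rather than a library routine, and the matrix, seed, and value of $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))

# det(I + tA) = 1 + t*tr(A) + O(t^2): check the first-order term.
t = 1e-6
lhs = np.linalg.det(np.eye(3) + t * A)
assert np.isclose(lhs, 1 + t * np.trace(A), atol=1e-10)

# det(exp(tA)) = exp(t*tr(A)), with exp computed as sum_k M^k / k!.
def expm(M, terms=30):
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

t = 0.3
assert np.isclose(np.linalg.det(expm(t * A)), np.exp(t * np.trace(A)))
```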
The argument in KCd's answer can be substantially generalized and you can get some pretty pictures out of it. There is a way of defining the trace using what are called string diagrams which, among other things, makes it immediately clear why the trace satisfies the cyclicity property $\text{tr}(AB) = \text{tr}(BA)$ (note that this is at least apparently slightly stronger than being conjugation-invariant): see this blog post and this blog post. As a teaser, once the appropriate notation has been introduced and the appropriate lemmas proven, here is a complete proof of cyclicity:
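The sense in which cyclicity is stronger than conjugation-invariance is easy to illustrate numerically: $\text{tr}(AB) = \text{tr}(BA)$ holds even for rectangular $A$ and $B$, where $AB$ and $BA$ have different sizes, and conjugation-invariance is the special case where one factor is invertible (shapes and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Cyclicity even for rectangular factors: A is 2x3 and B is 3x2,
# so AB is 2x2 while BA is 3x3, yet their traces agree.
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Conjugation-invariance follows as the special case tr(S^{-1} M S) = tr(M):
S = rng.standard_normal((3, 3))
M = rng.standard_normal((3, 3))
assert np.isclose(np.trace(np.linalg.inv(S) @ M @ S), np.trace(M))
```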