Why are characters so well-behaved?
Orthogonality makes sense without character theory. There's an inner product on the space of representations given by $\dim \operatorname{Hom}(V, W)$, where $\operatorname{Hom}$ means $G$-equivariant maps. By Schur's lemma, the irreps are an orthonormal basis. This is "character orthogonality", but without the characters.
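As a minimal numerical sketch (assuming NumPy, and taking $S_3$ as an example group, which the text does not specify), one can compute $\dim \operatorname{Hom}(V, W)$ directly as the dimension of the solution space of the intertwining equations $\rho_W(g) X = X \rho_V(g)$, without ever computing a character, and check that the three irreps of $S_3$ come out orthonormal. The helper names (`perm_matrix`, `sign`, `dim_hom`) and the basis `B` are hypothetical choices made for illustration.

```python
import itertools

import numpy as np

# The six elements of S3, as permutations of (0, 1, 2).
G = list(itertools.permutations(range(3)))


def perm_matrix(s):
    """3x3 permutation matrix sending basis vector e_i to e_{s[i]}."""
    P = np.zeros((3, 3))
    for i, j in enumerate(s):
        P[j, i] = 1.0
    return P


def sign(s):
    """+1 or -1 according to the parity of the number of inversions of s."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if s[i] > s[j])
    return (-1.0) ** inversions


# Basis (e0 - e1, e1 - e2) of the sum-zero plane, which every permutation preserves.
B = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])

# The three irreducible representations of S3, each a dict g -> matrix.
# In the basis B the standard representation has integer entries, so rint removes rounding noise.
irreps = {
    "trivial": {g: np.eye(1) for g in G},
    "sign": {g: np.array([[sign(g)]]) for g in G},
    "standard": {g: np.rint(np.linalg.pinv(B) @ perm_matrix(g) @ B) for g in G},
}


def dim_hom(rho_V, rho_W):
    """Dimension of {X : rho_W(g) X = X rho_V(g) for all g}, the G-equivariant maps V -> W."""
    dV = rho_V[G[0]].shape[0]
    dW = rho_W[G[0]].shape[0]
    # Column-major vec: vec(A X) = (I kron A) vec(X) and vec(X C) = (C^T kron I) vec(X).
    eqs = np.vstack(
        [np.kron(np.eye(dV), rho_W[g]) - np.kron(rho_V[g].T, np.eye(dW)) for g in G]
    )
    return dV * dW - np.linalg.matrix_rank(eqs)


names = list(irreps)
gram = np.array([[dim_hom(irreps[a], irreps[b]) for b in names] for a in names])
print(names)
print(gram)
```

The Gram matrix of $\dim \operatorname{Hom}$ values comes out as the $3 \times 3$ identity, which is Schur's lemma in numerical form.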
How do we recover the usual version from this conceptual version? Notice that $$\dim \operatorname{Hom}(V,W) = \dim \operatorname{Hom}(V \otimes W^*, 1)$$ where $1$ is the trivial representation. So to make the theory more concrete, you want to know how to pick off the trivial part of a representation. This is just the image of the projection $\frac1{|G|} \sum_{g\in G} g$.
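A small sketch of the averaging projection, again assuming NumPy and using the permutation representation of $S_3$ on $\mathbb{R}^3$ as an illustrative example (`perm_matrix` is a hypothetical helper): the operator is idempotent, every vector in its image is fixed by the whole group, and it fixes every invariant vector, so its image is exactly the trivial part.

```python
import itertools

import numpy as np

G = list(itertools.permutations(range(3)))  # the six elements of S3


def perm_matrix(s):
    """3x3 permutation matrix sending basis vector e_i to e_{s[i]}."""
    P = np.zeros((3, 3))
    for i, j in enumerate(s):
        P[j, i] = 1.0
    return P


# Averaging operator (1/|G|) * sum_g rho(g) for the permutation representation.
P = sum(perm_matrix(g) for g in G) / len(G)

# It is idempotent, i.e. a projection.
assert np.allclose(P @ P, P)
# rho(h) P = P for every h, so every vector in the image of P is G-invariant.
assert all(np.allclose(perm_matrix(h) @ P, P) for h in G)
# Conversely, P v = v for any invariant v (each term of the average is v); check v = (1, 1, 1).
v = np.ones(3)
assert np.allclose(P @ v, v)

print(np.linalg.matrix_rank(P), np.trace(P))
```

Here the invariants are the line spanned by $(1,1,1)$, so the rank is 1 and the trace is 1 up to floating-point rounding.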
The dimension of a space is the same as the trace of the projection onto that space, so
$$
\dim \operatorname{Hom}(V \otimes W^*, 1) = \operatorname{tr}\left(\frac1{|G|} \sum_{g\in G} \rho_{V \otimes W^*}(g)\right)
= \frac1{|G|} \sum_{g\in G} \chi_{V}(g)\, \chi_{W}\left(g^{-1}\right)
$$
using the properties of trace under tensor products and duals: $\chi_{V \otimes W^*}(g) = \chi_V(g)\,\chi_{W^*}(g)$ and $\chi_{W^*}(g) = \chi_W(g^{-1})$.
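To check the displayed identity numerically, here is a sketch (NumPy assumed; $S_3$ with its permutation representation and its 2-dimensional standard representation are chosen as examples, and all helper names are hypothetical). The left side is computed as the trace of the averaging projection on $V \otimes W^*$, realizing $\rho_{W^*}(g)$ as $\rho_W(g^{-1})^T$ and the tensor product as a Kronecker product; the right side is the character sum.

```python
import itertools

import numpy as np

G = list(itertools.permutations(range(3)))  # the six elements of S3


def perm_matrix(s):
    """3x3 permutation matrix sending basis vector e_i to e_{s[i]}."""
    P = np.zeros((3, 3))
    for i, j in enumerate(s):
        P[j, i] = 1.0
    return P


def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)


# Basis (e0 - e1, e1 - e2) of the sum-zero plane; the standard rep has integer entries in it.
B = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
reps = {
    "perm": perm_matrix,
    "standard": lambda g: np.rint(np.linalg.pinv(B) @ perm_matrix(g) @ B),
}


def lhs(rho_V, rho_W):
    """Trace of the averaging projection on V (x) W^*, with rho_{W^*}(g) = rho_W(g^{-1})^T."""
    avg = sum(np.kron(rho_V(g), rho_W(inverse(g)).T) for g in G) / len(G)
    return np.trace(avg)


def rhs(rho_V, rho_W):
    """(1/|G|) * sum_g chi_V(g) chi_W(g^{-1})."""
    return sum(np.trace(rho_V(g)) * np.trace(rho_W(inverse(g))) for g in G) / len(G)


for a in reps:
    for b in reps:
        print(a, b, round(lhs(reps[a], reps[b]), 6), round(rhs(reps[a], reps[b]), 6))
```

Both columns agree: 2 for (perm, perm), reflecting that the permutation representation is trivial plus standard, and 1 for the other three pairs.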
The trace is about the strongest general way we have to linearly project a non-abelian situation (matrices) to an abelian situation (scalars): $\operatorname{tr}(AB)=\operatorname{tr}(BA)$. By using the trace, the representation theory of non-abelian groups begins to resemble the representation theory of abelian groups, i.e. Fourier analysis. (Note, though, that the correspondence becomes less tight for triple products: the trace is only invariant under cyclic rotations, so in general $\operatorname{tr}(ABC) \neq \operatorname{tr}(CBA)$. For related (though not identical) reasons, the theory of tensor products of representations is far richer in the non-abelian world (Littlewood-Richardson coefficients, etc.) than in the abelian world (convolution), and characters aren't always the best way to proceed there.)
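A quick illustration of these trace identities (NumPy assumed; the matrices are arbitrary choices, not from the text): the trace is unchanged when two factors are swapped and when three factors are rotated cyclically, but reversing three factors can change it. The elementary matrices $E_{11}, E_{12}, E_{21}$ give a minimal counterexample.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 4, 4))
# The trace forgets the order of a product of two matrices ...
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# ... and of three matrices up to cyclic rotation, but not under reversal.
E11 = np.array([[1, 0], [0, 0]])
E12 = np.array([[0, 1], [0, 0]])
E21 = np.array([[0, 0], [1, 0]])
print(
    np.trace(E11 @ E12 @ E21),  # tr(ABC)
    np.trace(E12 @ E21 @ E11),  # tr(BCA), a cyclic rotation
    np.trace(E21 @ E12 @ E11),  # tr(CBA), the reversal
)  # 1 1 0
```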
This of course raises the question of why Fourier analysis is so miraculous, but I tend to take that as axiomatic. :-)
As to "why take the trace and not any other coefficient of the characteristic polynomial", note that for completely elementary reasons the trace of the whole representation still knows the characteristic polynomial of each individual element, since $\operatorname{tr}(\rho(g)^k) = \chi(g^k)$ gives all the power sums of the eigenvalues of $\rho(g)$. For instance, for an $n$-dimensional representation the coefficient of $t^{n-2}$ in $\det(t - \rho(g))$ is $\frac12\left(\operatorname{tr}(\rho(g))^2 - \operatorname{tr}(\rho(g^2))\right)$. Writing down the formula for subsequent coefficients is an exercise with symmetric functions (Newton's identities). On the other hand, the other coefficients of the characteristic polynomial, taken on their own, do lose information: e.g. non-isomorphic representations rather often have the same determinant.
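A sketch of that symmetric-function exercise, assuming NumPy: Newton's identities $k\,e_k = \sum_{i=1}^{k} (-1)^{i-1} e_{k-i}\, p_i$ turn the power sums $p_k = \operatorname{tr}(\rho(g)^k) = \chi(g^k)$ into the coefficients of $\det(t - \rho(g))$. The function name `charpoly_from_power_traces` is hypothetical, and the example matrix (a 3-cycle in the permutation representation of $S_3$) is an illustrative choice. The last lines illustrate the determinant remark with $\mathbb{Z}/2$, whose 2-dimensional representations trivial $\oplus$ trivial and sign $\oplus$ sign have the same determinant but different characters.

```python
import numpy as np


def charpoly_from_power_traces(p):
    """Coefficients of det(t - A), highest degree first, from p[k-1] = tr(A^k), k = 1..n.

    Uses Newton's identities k e_k = sum_{i=1}^k (-1)^(i-1) e_{k-i} p_i, where e_k is the
    k-th elementary symmetric function of the eigenvalues; det(t - A) = sum_k (-1)^k e_k t^(n-k).
    """
    n = len(p)
    e = [1.0]
    for k in range(1, n + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1)) / k)
    return [(-1) ** k * e[k] for k in range(n + 1)]


# rho(g) for a 3-cycle g in the permutation representation of S3; tr(rho(g)^k) = chi(g^k).
A = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
p = [float(np.trace(np.linalg.matrix_power(A, k))) for k in range(1, 4)]
coeffs = charpoly_from_power_traces(p)
print(coeffs)  # coefficients of t^3 - 1 (possibly with signed zeros)
print(np.allclose(coeffs, np.poly(A)))

# Determinant alone loses information: for Z/2 = {e, s}, the representations
# trivial + trivial and sign + sign send s to I and -I respectively.
rho1_s, rho2_s = np.eye(2), -np.eye(2)
print(np.linalg.det(rho1_s), np.linalg.det(rho2_s))  # both 1 (up to rounding)
print(np.trace(rho1_s), np.trace(rho2_s))  # 2.0 -2.0
```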