Intuition behind classical virial theorem
The conclusion – the claim of the virial theorem – is not "just some math" because all the objects in the claim have a physical interpretation. So it's physics and it has big implications in theoretical physics as well as applied physics.
The derivation is a mathematical derivation, but it's not right to attach the disrespectful word "just" to a mathematical derivation. Mathematical derivations are the most solid, and the only truly solid, derivations one may have in science. On the contrary, it's the derivations and intuitions that are not mathematical that should be accompanied by the word "just" because they are inferior. The right way is to adjust one's intuition so that it's compatible with the most solid results in physics, and those are the mathematically formulated results. Incidentally, there are various derivations, dealing with the microcanonical ensemble, the canonical ensemble, etc. The details of the proof differ among these variants but the overall physical conclusion is shared and important.
The exact proof of the theorem can't be simplified much (otherwise people would have done so), but one may offer heuristic, approximate proofs for approximate versions of the virial theorem and its special cases. For example, the quantity in the expectation value contains the derivative of $H$ with respect to a coordinate. The larger the derivative is, the faster the Hamiltonian increases with the coordinate, and the faster the Boltzmann factor $\exp(-H/kT)$ of the canonical distribution decreases with the coordinate, which confines the coordinate to smaller typical values. So if we multiply the derivative by the coordinate itself, the two effects compensate and we get something that stays constant regardless of the slope. And indeed, the expectation value of the product depends only on the temperature: it equals $kT$.
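The heuristic above can be made precise for a single coordinate $x$ in the canonical ensemble. What follows is a sketch under stated assumptions rather than the general proof: $Z$ denotes the partition function, $d\Gamma$ the phase-space measure, and the boundary term $x\, e^{-H/kT}$ is assumed to vanish at the ends of the integration range (true for a confining Hamiltonian). Integrating by parts in $x$,

$$
\left\langle x \frac{\partial H}{\partial x} \right\rangle
= \frac{1}{Z} \int d\Gamma \; x \, \frac{\partial H}{\partial x} \, e^{-H/kT}
= -\frac{kT}{Z} \int d\Gamma \; x \, \frac{\partial}{\partial x} e^{-H/kT}
= \frac{kT}{Z} \int d\Gamma \; e^{-H/kT}
= kT .
$$

The same steps with two different coordinates give $\langle x_i \, \partial H / \partial x_j \rangle = \delta_{ij}\, kT$, because $\partial x_i / \partial x_j = \delta_{ij}$ is what survives the integration by parts.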
This theorem is important in statistical physics for three reasons. Statistical physics is all about the computation of statistical averages of various quantities. The theorem allows us to express some of these expectation values in a simpler way. And the products $x_i \cdot \partial H / \partial x_j$ are among the simplest and most important quantities whose statistical averages one may want to compute. So we had better know how they behave.
An important special case of the theorem you mentioned deals with the expectation values of the kinetic energy and the potential energy. The former is $n/2$ times the latter for power-law potentials of the form $ar^n$, for example, i.e. $2\langle T\rangle = n\langle V\rangle$. So we know what fraction of the energy is stored as kinetic energy and what fraction as potential energy. For example, the kinetic and potential energy each contribute 50% for harmonic-oscillator-like $r^2$ potentials. For the Keplerian or Coulomb $-C/r$ potential, i.e. $n=-1$, the potential energy is negative, $-|V|$, and the kinetic energy is $+|V|/2$. The kinetic energy therefore cancels half of the potential energy, and the total energy, $-|V|/2$, stays negative. There are many other things we may learn from the theorem in various situations, and in whole classes of situations.
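The power-law relation can be checked numerically in the canonical ensemble. The following is a minimal sketch, not a general tool: it assumes a one-dimensional Hamiltonian $H = p^2/2m + a|x|^n$ with $n \ge 1$, and the function name `canonical_averages` and its parameters (`a`, `kT`, `m`, `half_width`, `points`) are hypothetical names chosen for illustration. It only covers confining potentials; the Kepler case $n=-1$ has no normalizable Boltzmann weight and needs time averages over bound orbits instead.

```python
import numpy as np


def canonical_averages(n, a=1.0, kT=1.0, m=1.0, half_width=40.0, points=200001):
    """Canonical averages for H = p^2/(2m) + a|x|^n in one dimension (n >= 1).

    Returns (<T>, <V>, <x dV/dx>), computed by simple quadrature on a
    uniform grid. All parameter names are illustrative assumptions.
    """
    # Uniform grids for position and momentum; the Boltzmann weight factorizes,
    # so the x- and p-integrals can be done separately.
    x = np.linspace(-half_width, half_width, points)
    p = np.linspace(-half_width, half_width, points)

    V = a * np.abs(x) ** n
    dVdx = a * n * np.sign(x) * np.abs(x) ** (n - 1)
    T = p ** 2 / (2.0 * m)

    weight_x = np.exp(-V / kT)  # Boltzmann factor for the potential part
    weight_p = np.exp(-T / kT)  # Boltzmann factor for the kinetic part

    # On a uniform grid the spacing cancels in a ratio of sums.
    avg_V = np.sum(V * weight_x) / np.sum(weight_x)
    avg_T = np.sum(T * weight_p) / np.sum(weight_p)
    avg_virial = np.sum(x * dVdx * weight_x) / np.sum(weight_x)
    return avg_T, avg_V, avg_virial


if __name__ == "__main__":
    for n in (1, 2, 4):
        avg_T, avg_V, avg_virial = canonical_averages(n)
        # Compare <T> with (n/2) <V>, and <x dV/dx> with kT = 1.
        print(n, avg_T, 0.5 * n * avg_V, avg_virial)
```

For each exponent, the second and third printed numbers should agree with each other, and the last one should be close to $kT$, up to the quadrature error of the grid.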