What do heat kernels have to do with the Riemann-Roch theorem and the Gauss-Bonnet theorem?
Added 2 June:
Since the summary below is already a bit long, I thought I'd add a few lines at the beginning as a guide. The proofs all proceed as follows:
Identify the quantity of interest (like the Euler characteristic) as the index of an operator going from an 'even' bundle to an 'odd' bundle.
Use Hodge theory to write the index in terms of the dimensions of harmonic sections, i.e., kernels of Laplacians.
Use the heat evolution operator for the Laplacians and 'supersymmetry' to rewrite this as a 'supertrace.'
Write the heat evolution operator in terms of the heat kernel to express the supertrace as the integral of a local density.
Use the eigenfunction expansion of the heat kernel to identify the constant (in time) part of the local density.
Most of this is general nonsense, and the difficult step is 5. By and large, the advances made after the seventies all had to do with finding interpretations of this last step that employed intuition arising from physics.
I suffered over this proof quite a bit in my pre-arithmetic youth and wrote up a number of summaries. A condensed and extremely superficial version is given here, mostly for my own review. If by chance someone finds it at all useful, of course I will be delighted. I apologize that I don't say anything about physical intuition (because I have none), and for repeating parts of the previous nice answers. It's been years since I've thought about these matters, so I will forgo all attempts at even a semblance of analytic rigor. In fact, the main pedagogical reason for posting is that a basic outline of the proof is possible to understand with almost no analysis.
The usual setting has a compact Riemannian manifold $M$, two hermitian bundles $E^+$ and $E^-$, and a linear operator $$P:H^+\rightarrow H^-,$$ where $H^{\pm}:=L^2(E^{\pm})$. With suitable assumptions (ellipticity), $ker(P)$ and $coker(P)$ have finite dimension, and the number of interest is the index: $$Ind(P)=dim(ker(P))-dim(coker(P)).$$ This can also be expressed as $$dim(ker(P))-dim(ker(P^{ *})),$$ where $$P^{*}:H^-\rightarrow H^+$$ is the Hilbert space adjoint. A straightforward generalization of the Hodge theorem allows us also to write this in terms of Laplacians $\Delta^+=P^* P$ and $\Delta^-=PP^*$ as $$dim(ker(\Delta^+))-dim(ker(\Delta^-)).$$ Things get a bit more tricky when we try to identify the index with the expression ('supertrace,' so-called) $$Tr(e^{-t\Delta^+})-Tr(e^{-t\Delta^-}).$$ The operator $$e^{-t\Delta^{\pm}}:H^{\pm}\rightarrow H^{\pm}$$ sends a section $f$ to the solution of the heat equation $$\frac{\partial}{\partial t} F(t,x)+\Delta^{\pm}F(t,x)=0$$ ($x$ denoting a point of $M$) at time $t$ with intial condition $F(0,x)=f(x).$ One important part of this is that there are discrete Hilbert direct sum decompositions $$H^+=\oplus_{\lambda} H^+(\lambda)$$ and $$H^-=\oplus_{\mu} H^-(\mu)$$ in terms of finite-dimensional eigenspaces for the Laplacians with non-negative eigenvalues. And then, the identities $$\Delta^-P=PP^{*}P=P\Delta^+$$ and $$\Delta^+P^{*}=P^{*}PP^{*}=P^{*}\Delta^-$$ show that the (supersymmetry) operators $P$ and $P^{*}$ can be used to define isomorphisms between all non-zero eigenspaces of the two Laplacians with a correspondence for eigenvalues as well. Thus, once you believe that the exponential operators are trace class, it's easy to see that the only contributions to the trace are from the kernels of the plus and minus Laplacians. This is the 'easy cancellation' that occurs in this proof. But on the zero eigenspaces, the heat evolution operators are clearly the identity, allowing us to identify the supertrace with the index. To summarize up to here, we have $$Ind(P)=Tr(e^{-t\Delta^+})-Tr(e^{-t\Delta^-}).$$ This identity also makes it obvious that the supertrace is in fact independent of $t>0$.
The proofs under discussion all have to do with identifying this supertrace in terms of local expressions that relate naturally to characteristic classes. The beginning of this process involves first writing the operator $e^{-t\Delta^+}$ in terms of an integral kernel $$K^+_t(x,y)=\sum_i e^{-t\lambda_i } \phi^+_i(x)\otimes \phi^+_i(y)$$ where the $\phi^+_i$ make up an orthonormal basis of eigenvectors for the Laplacian. That is, $$[e^{-t\Delta^+}f](x)=\int_M K^+_t(x,y)f(y)dvol(y)=\sum_i e^{-t\lambda_i } \int_M \phi^+_i(x) \langle \phi^+_i(y),f(y)\rangle dvol(y).$$ Formally, this identity is obvious, and the real work consists of the global analysis necessary to justify the formal computation. Obviously, there is a parallel discussion for $\Delta^-$. Now, by an infinite-dimensional version of the formula that expresses the trace of a matrix as a sum of diagonals, we get that $$Tr(e^{-t\Delta^+})=\int_M Tr(K^+_t(x,x))dvol(x)=\int_M \sum_ie^{-t\lambda_i}||\phi^+_i(x)||^2 dvol(x),$$ an integral of local (point-wise) traces, and similarly for $Tr(e^{-t\Delta^-})$. One needs therefore, techniques to evaluate the density
$$\sum_ie^{-t\lambda_i}||\phi^+_i(x)||^2 dvol(x)-\sum_ie^{-t\mu_i}||\phi^-_i(x)||^2 dvol(x).$$
More analysis gives an asymptotic expansion for the plus and minus densities of the form $$ a^{ \pm }_{-d/2}(x) t^{-d/2}+a^{ \pm }_{d/2+1}(x) t^{-d/2+1}+\cdots $$ where $d$ is the dimension of $M$.
Up to here the discussion was completely general, but then the proof begins to involve special cases, or at least, broad division into classes of cases. But note that even for the special cases mentioned in the original question, one would essentially carry out the procedure outlined above for a specific operator $P$.
The breakthrough in this line of thinking came from Patodi's incredibly complicated computations for the operator $d+d^*$ going from even to odd differential forms, where one saw that the $$a^{+}_i(x)$$ and $$a^{-}_i(x)$$ canceled each other out locally, that is, for each point $x$, for all the terms with negative $i$. I think it was fashionable to refer to this cancellation as 'miraculous,' which it is, compared to the easy cancellation above. At this point, Patodi could take a limit $$\lim_{t\rightarrow 0}[\sum_ie^{-t\lambda_i}||\phi^+_i(x)||^2 dvol(x)-\sum_ie^{-t\mu_i}||\phi^-_i(x)||^2 dvol(x)],$$ that he identified with the Euler form. This important calculation set a pattern that recurred in all other versions of the heat kernel approach to index theorems. One proves the existence of an analogous limit as $t\rightarrow 0$ and identifies it. The identification as a precise differential form representative for a characteristic class is referred to sometimes as a local index theorem, a statement more refined than the topological formula for the global index. There is even a beautiful version of a local families index theorem that relates eventually to deep work in arithmetic intersection theory and Vojta's proof of the Mordell conjecture.
As I understand it, Gilkey's contribution was an invariant theory argument that tremendously simplified the calculation and allowed a differential form representative for the $\hat{A}$ genus to emerge naturally in the case of the Dirac operator. And then, I believe there is a $K$-theory argument that deduces the index theorem for a general elliptic operator from the one for the twisted Dirac operator.
Experts can correct me if I'm wrong, but from a purely mathematical point of view, essentially all the work on the heat kernel proof was done at this point. Subsequent interpretations of the proof (more precisely, the supertrace), in terms of supersymmetry, path integrals, loop spaces, etc., were tremendously influential to many areas of mathematics and physics, but the mathematical core of the index theorem itself appears to have remained largely unchanged for almost forty years. In particular, the terminology I've used myself above, the super- things, didn't occur at all in the original papers of Patodi, Atiyah-Bott-Patodi, or Gilkey.
Added:
Here is just a little bit of geometric-physical intuition regarding the heat kernel in the Gauss-Bonnet case, which I'm sure is completely banal for most people. The density $$\sum_ie^{-t\lambda_i}||\phi^+_i(x)||^2 dvol(x)-\sum_ie^{-t\mu_i}||\phi^-_i(x)||^2 dvol(x)$$ expresses the heat kernel in terms of orthonormal bases for the even and odd forms. When $t\rightarrow \infty$ all terms involving the positive eigenvalues decay to zero, leaving only contributions from orthonormal harmonic forms. This is one way to to see that the integral of this density, which is independent of $t$, must be the Euler characteristic. On the other hand, as $t\rightarrow 0$, the operator $$K^+_t(x,y)dvol(y)=[\sum_i e^{-t\lambda_i } \phi^+_i(x)\otimes \phi^+_i(y)]dvol(y)$$ literally approaches the identity operator on all even forms (except for the fact that it diverges). That is, the heat kernel interpolates between the identity and the projection to the harmonic forms, in some genuine sense expressing the diffusion of heat from a point distribution to a harmonic steady-state. A similar discussion holds for the odd forms as well. I can't justify this next point even vaguely at the moment, but one should therefore think of $$[K^+_t(x,y)-K^-_t(x,y)]dvol(y)$$ as regularizing the current on $M\times M$ given by the diagonal $M\subset M\times M$. Thus, the integral of $$[TrK^{+}_t(x,x)-TrK^-_t(x,x)]dvol(x)$$ ends up computing a deformed self-intersection number of the diagonal in $M\times M$. From this perspective, it shouldn't be too surprising that the Euler class, representing exactly this self-intersection, shows up.
Added:
I forgot to mention that the Riemann-Roch case is where $$P=\bar{\partial}+\bar{\partial}^*$$ going from the even to the odd part of the Dolbeault resolution associated to a holomorphic vector bundle. The limit of the local density is a differential form representing the top degree portion of the Chern character of the bundle multiplied by the Todd class of the tangent bundle. Perhaps it's worth stressing that these special cases all go through the general argument outlined above.
Here is how the heat kernel proof of Atiyah-Singer goes at a high level. Let $(\partial_t - \Delta)u = 0$ and define the heat kernel (HK) or Green function via $\exp(-t\Delta):u(0,\cdot) \rightarrow u(t,\cdot)$. The HK derives from the solution of the heat equation on the circle:
$u(t,\theta) = \sum_n a_n(t) \exp(in\theta) \implies a_n(t) = a_n(0)\cdot \exp(-tn^2)$
For a sufficiently nice case the solution of the heat equation is $u(t,\cdot) = \exp(-t\Delta) * u(0,\cdot)$.
The hard part is building the HK: we have to compute the eigenstuff of $\Delta$ (this is the Hodge theorem). But once we do that, a miracle occurs and we get the
Atiyah-Singer Theorem: The supertrace of the HK on forms is constant: viz.
$\mathrm{Tr}_s \exp(-t\Delta) = \sum_k (-1)^k \,\mathrm{Tr} \exp(-t\Delta^k) = \mathrm{const}.$
For $t$ large, this can be evaluated topologically; for small $t$, it can be evaluated analytically as an integral of a characteristic class.
Edit per Qiaochu's clarification
This article of Kotake (really in here as the books seem to be mixed up) proves Riemann-Roch directly using the heat kernel.
In my opinion, the most "physical" recasting of the index theorem is the Witten index for supersymmetric quantum theories. It is superficially similar to the heat kernel, but much more general. The Witten index of a supersymmetric quantum mechanical system is the regularised supertrace $$\mathrm{Tr} (-1)^F \exp(-\beta H)$$ where $(-1)^F$ is the grading operator which is $-1$ on fermionic states and $+1$ on bosonic states and $H$ is the hamiltonian. The trace is taken over the Hilbert space of states.
One can show that this does not depend on $\beta$ and hence can be evaluated both at small $\beta$ ("large temperature expansion") or large $\beta$ ("small temperature expansion"). In one expansion one sees that it computes the trace of $(-1)^F$ on zero modes of the hamiltonian, since for a supersymmetric system the dimensions of the bosonic and fermionic positive-energy eigenstates are equal. In the other expansion one writes the Witten index in terms of a functional integral, which (when formally manipulated) becomes a geometric integral. The formal manipulations can be justified as in Getzler's proof of the local Atiyah-Singer index theorem.
The relation with the heat kernel comes from taking a particular supersymmetric model in which the hamiltonian is the laplacian. The relation with the Gauss-Bonnet theorem comes from considering a supersymmetric sigma model in which the hamiltonian is the Hodge laplacian acting on ($L^2$) differential forms. The Witten index then is computing the index of the elliptic operator $d + \delta$ from the odd to the even rank differential forms, which is the Euler characteristic of the manifold.
There are supersymmetric models for which the Witten index computes the index of a generalised Dirac operator as in the original work of Atiyah and Singer.
Witten introduced "his" index in order to study supersymmetry breaking. A nonzero value of the index shows that there exists a vacuum state (=a state of minimal energy) which is invariant under supersymmetry and hence supersymmetry is not (spontaneously) broken.
There is yet another relation between the heat kernel and the index theorem and it comes from a certain regularisation of the functional integral measure as in Fujikawa's celebrated derivation of the chiral anomaly.