Characterization of the exterior derivative $d$
I think this might be a fairly intuitive way of understanding this statement. You are defining the operator $d$ by the property $f^*(d\omega)=df^*(\omega)$ for all forms $\omega$ and all diffeomorphisms $f$. A more general linear map from $\Lambda^p$ to $\Lambda^{p+1}$ (I like this notation for the exterior $p$-forms) would be a general affine connection $\nabla$. Now, using your defining identity twice, we find \begin{equation} f^*(d^2\omega)=d^2f^*(\omega)=0, \end{equation} which is clearly true of the exterior derivative. For a more general connection, we have \begin{equation} f^*(\nabla^2\omega)=f^*(\Omega \omega)=\tilde{\nabla}^2 f^*(\omega)=\tilde{\Omega}f^*(\omega), \end{equation} where $\Omega$ is the curvature of the connection, and $\tilde{\nabla}$ and $\tilde{\Omega}$ denote the induced connection and curvature under the diffeomorphism. In general, the form of the induced curvature depends on the diffeomorphism. However, when the curvature is zero, the induced curvature is zero as well, so the identity keeps the same form under all diffeomorphisms.
So we have established a relationship between your definition and the more usual requirement that $d^2=0$. As $d$ is the unique affine connection that satisfies this requirement, we have established the result.
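As a sanity check of the defining identity $f^*(d\omega)=df^*(\omega)$, here is a small worked example. The form $\omega$ and the map $f$ are chosen purely for illustration and do not come from the question: take $\omega = x\,dy$ on the half-plane $\{x>0\}$ and $f(u,v)=(u,uv)$, which is a diffeomorphism of $\{u>0\}$ onto $\{x>0\}$ with inverse $(x,y)\mapsto(x,y/x)$.

$$\begin{aligned}
d\omega &= dx\wedge dy, \\
f^*\omega &= u\,d(uv) = uv\,du + u^2\,dv, \\
d(f^*\omega) &= -u\,du\wedge dv + 2u\,du\wedge dv = u\,du\wedge dv, \\
f^*(d\omega) &= du\wedge d(uv) = du\wedge(v\,du + u\,dv) = u\,du\wedge dv.
\end{aligned}$$

Both sides agree, as the identity requires.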
I personally think Palais' proof is clear, motivated, and reasonably elementary. Maybe I can add some intuition.
- The key observation is that if $T:\Omega^p\to\Omega^{p+1}$ is linear and commutes with diffeomorphisms, then it must be local, in the sense that if $V$ is open, then $T\omega|_V$ is determined by $\omega|_V$. By linearity, it is enough to show that if $\omega$ vanishes on a neighborhood of a point $x$, then $T\omega|_x=0$. To see this, we observe that for any point $x$ we can cook up a diffeo $f$ that fixes $x$, acts like a dilatation by $r>0$, $r\neq 1$, on the tangent space at $x$, and is the identity outside some neighborhood $U$ of $x$. Now, if $\omega$ vanishes on this neighborhood, then $f^\star\omega=\omega$. Since $T\omega$ is a $(p+1)$-form, pulling it back at the fixed point $x$ multiplies it by $r^{p+1}$. So, $$r^{p+1}(T\omega)|_x=f^\star(T\omega)|_x=T(f^\star \omega)|_x=T\omega|_x,$$ thus $T\omega|_x=0.$
- You can further restrict possible properties of $T$ by observing that in any coordinate chart, it must commute with shifts (which makes sense due to locality). Indeed, you can cook up a diffeo that looks like a shift locally in a coordinate chart and which is the identity far away.
- It is clear that this is already quite restrictive. For example, a continuous map on the space of test functions that commutes with shifts is a convolution with a generalized function. If our map is local, then the support of the latter must be the origin. It is known that such a generalized function is a finite linear combination of Dirac's delta and its derivatives. Of course we don't know that our map is continuous, but it is plausible that commuting with all diffeos is strong enough to rule out pathological examples. Thus, we may conclude that in any coordinate chart, our map should be a differential operator with constant coefficients.
- Using scaling again, we can moreover infer that our operator must be first-order.
- At this point, just looking at a bunch of concrete linear maps (say, ones that dilate one coordinate) will fix $T$ to be the exterior derivative.
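To make the scaling step concrete, here is a sketch of the computation in a chart centered at the origin. It assumes, as argued in the bullets above, that $T$ is a constant-coefficient differential operator in that chart. The constant linear maps $A_\alpha$ (acting on the coefficients of a form) and the dilation $\delta_r$ are notation introduced here for illustration only. By locality, $\delta_r$ can be replaced by a diffeo that agrees with it near the origin and is the identity far away.

$$\begin{aligned}
T\omega &= \sum_{\alpha} A_\alpha\,\partial^\alpha\omega, \qquad \delta_r(x)=rx,\quad r>0,\\
(\delta_r^\star\omega)(x) &= r^{p}\sum_I \omega_I(rx)\,dx^I \quad\text{for}\quad \omega=\sum_I\omega_I\,dx^I\in\Omega^p,\\
T(\delta_r^\star\omega)|_0 &= \sum_\alpha r^{p+|\alpha|}\,A_\alpha(\partial^\alpha\omega)|_0,\\
\delta_r^\star(T\omega)|_0 &= r^{p+1}\sum_\alpha A_\alpha(\partial^\alpha\omega)|_0.
\end{aligned}$$

The last two lines must agree for all $r>0$ and all $\omega$. Since the derivatives of $\omega$ at the origin can be prescribed independently, comparing powers of $r$ forces $A_\alpha=0$ unless $|\alpha|=1$, which is the first-order claim.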