Zagier's one-sentence proof of a theorem of Fermat
This paper by Christian Elsholtz seems to be exactly what you're looking for. It motivates the Zagier/Liouville/Heath-Brown proof and uses the method to prove some other similar statements. Here is a German version, with slightly different content.
Essentially, Elsholtz takes the idea of using a group action and examining orbits as given (and why not -- it's relatively common) and writes down the axioms such a group action would have to fulfill to be useful in a proof of the two-squares theorem. He then algorithmically determines that there is a unique group action satisfying his axioms -- that is, the one in the Zagier proof. The important thing is that having written down these (fairly natural) axioms, there's no cleverness required; finding the involution in Zagier's proof boils down to solving a system of equations.
Let me answer your question "where do these involution come from" with an elementary geometric explanation. You can skip ahead to the pictures, which are somewhat self-explanatory, I hope.
The elements in the the $S$, i.e. triplets $(x,y,z)$ such that $x^2+4yz=p$, can be visualized as a square of side length $x$ together with $4$ rectangles of size $y\times z$, where we place the rectangles such that for each corner of the square, a corner of a rectangle coincides with the corner of the square and a side of length $y$ coincides with the side of the square clockwise adjacent to the corner, and the interior of the rectangles and the square do not intersect. See the pictures below for examples.
While it might seem trivial to display the number $x^2+4yz$ in this fashion, it very nicely illustrate Zagier's involution $$(x, y, z)\mapsto\begin{cases}(x+2z, z, y-x-z) &\text{if } x < y-z\\(2y-x, y, x-y+z)&\text{if } y-z < x < 2y\\(x-2y, x-y+z, y)&\text{if } 2y < x \end{cases} .$$ In fact, an element of $S$, visualized in the way described above covers an area of a square (of side length $x+2z$) with $4$ rectangles removed. However, given such a square with oblong rectangles removed, there are precisely two ways of cutting it into a smaller configuration of a square and $4$ rectangles. Interchanging the two possible cuttings is precisely what Zagier's involution does on the non fixed points. The only fixpoint occurs if the area covered is a square with $4$ squares removed. For a prime number $p=1+4k$, this happens presicely once, namely for the configuration associated to $(x,y,z)=(1,1,k)$.
We provide a complete example for $p=41$ (so $k=10$). Each picture shows two elements of $S$, which are mapped to each other under Zagier's involution and their common shape (in grey).
The last image dispays the unique fixpoint of the involution.
The other involution $(x,y,z)\mapsto (x,z,y)$ can of course also be visualized; the rectangles are simply rotated.
If you have this simple illustration of Zagier's involution in mind, it is easy to recover the formula of it, which I find hard to remember otherwise. In this sense, I hope this answers where this involution comes from; although I am not sure, that Zagier was thinking in these images, when he wrote his one-liner.
I heard about this illustration from Günter M. Ziegler, and as I understand it, Aigner and Ziegler plan to include it in the next edition of their "Proofs from the book". The first source seems to be some notes from a A. Spivak, which can be found here: A. Spivak : Крылатые квадраты (Winged squares), Lecture notes for the mathematical circle at Moscow State University, 15th lecture 2007
I would be interested in knowing where this illustration appeared first, since it does not (yet) appear to be well known.
As the answers above linked to an old paper of mine (in German, and a somewhat different English preprint), some readers might like to know that an updated version is to appear very soon and is now linked on my webpage:
http://www.math.tugraz.at/~elsholtz/WWW/papers/papers30nathanson-new-address3.pdf
In addition to the motivation of the Heath-Brown/Zagier proof it contains for example
a) a discussion of a lattice point proof (section 1.6)
b) much more historical information and links to other work
c) an alternative motivation of the Heath-Brown-Zagier proof, due to Dijkstra (section 2.3)