Why is the standard definition of cocycle the one that _always_ comes up??
Let me just beat a dead horse a little. As Mariano and DC have mentioned (and you mentioned in your original question), there are multiple equivalent ways to compute group cohomology, or Ext in general, because you can pick any resolution you want. However, if you want a systematic resolution that always works then you need to work harder.
The resolution that you wrote down in terms of G acting on $G^{i+1}$ works perfectly well but it depends on a special fact about the group algebra: it's a Hopf algebra, and as a result it has a sensible way to act on a tensor product of copies of itself. Not all rings have such properties.
The "clunky" definition works for more general rings. Given any ring R and an R-module M, there is always an exact sequence
$$\cdots \to R \otimes R \otimes M \to R \otimes M \to M \to 0$$
called the bar resolution of R over M. It arises from a simplicial construction via some natural adjoint-functor considerations: there is an adjoint pair consisting of the forgetful functor from R-modules to ℤ-modules and its left adjoint, the free R-module functor, and together they construct a simplicial object, which you then convert into a chain complex. If you specialize this construction to R = ℤ[G], you recover the "clunky" resolution and its specific formula for the boundary and coboundary operators. So somehow the definition that is less intuitive comes from forgetting that we're working in a grouplike context and being systematic there.
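For R = ℤ[G] the resulting coboundary can be checked mechanically. Here is a minimal Python sketch, assuming the usual inhomogeneous formula with trivial action on integer coefficients. The group ℤ/3 and the names `coboundary` and `random_cochain` are illustrative choices, not part of the construction above.

```python
import itertools
import random

N = 3                      # assumption: G = Z/3, written additively
G = list(range(N))

def mul(a, b):
    """Group law of Z/N."""
    return (a + b) % N

def coboundary(f, n):
    """Inhomogeneous coboundary C^n -> C^(n+1), trivial action on Z.

    f maps n-tuples of group elements to integers.
    """
    df = {}
    for gs in itertools.product(G, repeat=n + 1):
        total = f[gs[1:]]                      # first term: drop g_1 (trivial action)
        for i in range(n):                     # merge the adjacent pair gs[i], gs[i+1]
            merged = gs[:i] + (mul(gs[i], gs[i + 1]),) + gs[i + 2:]
            total += (-1) ** (i + 1) * f[merged]
        total += (-1) ** (n + 1) * f[gs[:-1]]  # last term: drop the final argument
        df[gs] = total
    return df

def random_cochain(n, seed=0):
    """An arbitrary integer-valued n-cochain (hypothetical test data)."""
    rng = random.Random(seed)
    return {gs: rng.randint(-5, 5) for gs in itertools.product(G, repeat=n)}

f1 = random_cochain(1)
ddf = coboundary(coboundary(f1, 1), 2)
print(all(v == 0 for v in ddf.values()))
```

The final line checks that applying the coboundary twice gives zero, which is the property that makes the "clunky" formula a cochain complex.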
Because I can't help myself, I should mention that the above bar resolution is only a free resolution if R and M are free over ℤ (or if you replace ℤ and tensor with some ground ring over which they are both free). If not, then in order to get a canonical resolution you have to dip all the way down to the forgetful functor from R-modules to sets and its adjoint, the free R-module on a set, and this gives you an absolutely nightmarish but always-valid resolution that is even less intuitive.
The difference between f(1,g) and f(g,1) is generally an issue of whether mathematicians give preference to "domains" or "ranges" of maps.
Here is one way that you could think of this. I can write EG for a category whose objects are the elements of G, and where each pair of objects has a unique map between them. This category has an action of G on it, and you can ask about G-equivariant functors from this category to another category that has a G-action on it.
To define such a functor on the level of objects it suffices to define F(1), where 1 is the unit; equivariance forces us to define F(g) = g F(1). On the level of morphisms, however, we have to make a choice. The unique morphism g→h becomes a morphism F(g)→F(h), and to make such maps compatible with the G-action it suffices to make one of the following sets of choices:
- We could define maps $f_h : F(1) \to F(h)$ for all h, and get all the other maps as $g f_h : F(g) \to F(gh)$. To be a functor, we need this to satisfy the cocycle condition $f_{gh} = (g f_h) f_g$.
- We could define maps $d_h : F(h) \to F(1)$ for all h, and get all the other maps as $g d_h : F(gh) \to F(g)$. To be a functor, we need this to satisfy the cocycle condition $d_{gh} = d_g (g d_h)$.
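Both conventions can be tested on a small example. The sketch below assumes a very special target: a one-object category whose morphisms are the elements of $S_3$, with G = $S_3$ acting by conjugation, so that composition of morphisms is just the group product. The element `a` and the formula `f[g] = (g.a) a^{-1}` are hypothetical choices made only to produce one family satisfying the first condition; inverting it gives a family satisfying the second.

```python
import itertools

G = list(itertools.permutations(range(3)))   # assumption: G = S_3

def compose(p, q):
    """Product of permutations: (p * q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def act(g, a):
    """Left action of G on itself by conjugation: g . a = g a g^{-1}."""
    return compose(compose(g, a), inverse(g))

a = (1, 2, 0)                                        # hypothetical choice: a 3-cycle

f = {g: compose(act(g, a), inverse(a)) for g in G}   # plays the role of f_h
d = {g: inverse(f[g]) for g in G}                    # plays the role of d_h

# First convention: f_{gh} = (g f_h) f_g
first = all(f[compose(g, h)] == compose(act(g, f[h]), f[g]) for g in G for h in G)
# Second convention: d_{gh} = d_g (g d_h)
second = all(d[compose(g, h)] == compose(d[g], act(g, d[h])) for g in G for h in G)
print(first, second)
```

Because the target here is nonabelian, the order of the factors in each condition matters, which is exactly the domain-versus-range bookkeeping described above.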
In group cohomology, $H^1(G,M)$ classifies splittings in the semidirect product of G with M, and the cocycle condition we get comes from our convention of writing this group as pairs (m,g) (which is in the same order as the exact sequence it fits into) and not (g,m). Similarly for $H^2(G,M)$.
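As a small illustration of the (m,g) convention, the following sketch assumes G = ℤ/2 acting on M = ℤ/3 by negation (a choice made here purely for concreteness) and the multiplication rule (m,g)(m',g') = (m + g·m', gg'). It checks that a function f : G → M gives a section g ↦ (f(g), g) that is a homomorphism exactly when f(gh) = f(g) + g·f(h).

```python
N = 3                      # assumption: M = Z/3, G = Z/2 acting by negation
G = [0, 1]
M = list(range(N))

def act(g, m):
    """Action of G on M: the nontrivial element negates."""
    return m % N if g == 0 else (-m) % N

def mult(x, y):
    """Product in the semidirect product, with pairs written (m, g)."""
    (m1, g1), (m2, g2) = x, y
    return ((m1 + act(g1, m2)) % N, (g1 + g2) % 2)

def is_cocycle(f):
    """f(gh) = f(g) + g.f(h) for all g, h."""
    return all(f[(g + h) % 2] == (f[g] + act(g, f[h])) % N for g in G for h in G)

def is_splitting(f):
    """g -> (f(g), g) is a group homomorphism into the semidirect product."""
    s = {g: (f[g], g) for g in G}
    return all(s[(g + h) % 2] == mult(s[g], s[h]) for g in G for h in G)

candidates = [{0: a, 1: b} for a in M for b in M]
print(all(is_cocycle(f) == is_splitting(f) for f in candidates))
```

Writing the pairs as (g,m) instead would move the action onto the other factor in `mult` and change the shape of the condition accordingly.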
I would say that I've hit nonstandard cocycle definitions several times because I've been too lazy to come up with sensible conventions about when I'm thinking about domains and ranges or trying to sweep it under the rug, especially when dealing with Hopf algebroids and cohomological calculations there.
I don't have a good answer for higher cocycle conditions other than saying that writing 2-cochains using f(g,1,h) is somehow more unusual than either of the other 2 choices because it's somehow derived from focusing on the "middle" object in a double composite of maps.
Late to the party as usual, but: the goal of this answer is to convince you that the standard convention for $2$-cocycles is so natural that you should consider it perverse to consider any other convention, modulo "applying a canonical involution to everything," as you say. To keep things simple let's only deal with trivial action on coefficients. The motivating question is the following:
What does it mean for a group $G$ to act on a category $C$?
For starters we should attach to each element of $G$ a functor $F(g) : C \to C$. Next we could require that $F(g) \circ F(h) = F(gh)$, but we should really weaken equalities of functors to natural isomorphisms whenever possible. Hence we should attach to each pair of elements of $G$ a natural isomorphism
$$\eta(g, h) : F(g) \circ F(h) \to F(gh).$$
This is the point at which we pick a convention for how we're going to represent $2$-cocycles. Instead of talking about $\eta(g, h)$ we could talk about its inverse; which we choose corresponds to whether we prefer to talk about lax monoidal or oplax monoidal functors, since what we're going to end up writing down is a lax monoidal resp. an oplax monoidal functor from $G$ (regarded as a discrete monoidal category) to $\text{Aut}(C)$ (regarded as a monoidal category under composition).
In any case, let's stick to the above choice (the lax one). Then the isomorphisms $\eta(g, h)$ should satisfy some coherence conditions, the important one being the "associativity" condition that the two obvious ways of going from $F(g_1) \circ F(g_2) \circ F(g_3)$ to $F(g_1 g_2 g_3)$ should agree.
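Written out, and assuming the usual notation $\ast$ for horizontal composition of natural transformations (notation not otherwise used in this answer), the associativity condition would read

$$\eta(g_1 g_2, g_3) \circ \big(\eta(g_1, g_2) \ast \text{id}_{F(g_3)}\big) = \eta(g_1, g_2 g_3) \circ \big(\text{id}_{F(g_1)} \ast \eta(g_2, g_3)\big).$$

Both sides are natural isomorphisms $F(g_1) \circ F(g_2) \circ F(g_3) \to F(g_1 g_2 g_3)$.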
Now let's assume that in addition all of the functors $F(g)$ are the identity functor $\text{id}_C : C \to C$. Then the only remaining data in a group action is a collection of natural automorphisms
$$\eta(g, h) : \text{id}_C \to \text{id}_C$$
of the identity functor. For any category $C$, the natural automorphisms of the identity functor naturally form an abelian (by the Eckmann-Hilton argument) group which here I'll call its center $Z(C)$ (but this notation is also used for the commutative monoid of natural endomorphisms of the identity). So we get a function
$$\eta : G \times G \to Z(C).$$
The important coherence condition I mentioned above now reduces (again by the Eckmann-Hilton argument) to the condition that for any $g_1, g_2, g_3 \in G$ we have
$$\eta(g_1, g_2) \eta(g_1 g_2, g_3) = \eta(g_2, g_3) \eta(g_1, g_2 g_3)$$
which is precisely the standard cocycle condition. (Coboundaries come in when you ask what it means for two group actions to be equivalent; I'm going to ignore this.)
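For a concrete instance, here is a hedged sketch that checks this condition numerically. It assumes G = ℤ/4 and integer coefficients written additively (so products of $\eta$ values become sums); the "carry" cocycle and the 1-cochain `c` are illustrative choices, not something taken from the discussion above.

```python
import itertools

N = 4   # assumption: G = Z/4, coefficients Z written additively

def eta(a, b):
    """Carry produced when adding residues a, b in range(N)."""
    return (a + b) // N

def satisfies_cocycle(e):
    """eta(g1,g2) + eta(g1 g2, g3) == eta(g2,g3) + eta(g1, g2 g3) for all triples."""
    return all(
        e(g1, g2) + e((g1 + g2) % N, g3) == e(g2, g3) + e(g1, (g2 + g3) % N)
        for g1, g2, g3 in itertools.product(range(N), repeat=3)
    )

def coboundary_of(c):
    """The 2-cochain (g, h) -> c(g) + c(h) - c(gh) built from a 1-cochain c."""
    return lambda g, h: c[g] + c[h] - c[(g + h) % N]

c = {0: 0, 1: 7, 2: -2, 3: 5}   # arbitrary 1-cochain (hypothetical data)
print(satisfies_cocycle(eta), satisfies_cocycle(coboundary_of(c)))
```

Both the carry function and any coboundary pass the check, while a generic function such as `lambda g, h: g * h` on representatives does not.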
The only reason this condition, which recall is in general just the statement that the two obvious ways of going from $F(g_1) \circ F(g_2) \circ F(g_3)$ to $F(g_1 g_2 g_3)$ should agree, could ever have looked anything other than completely natural is that it's a degenerate special case where the sources and targets of the various maps involved have been obscured because they are identical. In particular, of course I could have instead chosen to think about the natural isomorphisms
$$\eta(g, g^{-1} h) : F(g) \circ F(g^{-1} h) \to F(h)$$
(which corresponds to your $f(1, g, h)$), but now
- it's no longer at all obvious how to state the associativity condition succinctly, and
- this requires that I make explicit use of the fact that $G$ is a group.
The discussion up until now in fact gives a perfectly reasonable definition for what it means for a monoid to act on a category. (If I want to weaken "natural isomorphism" to "natural transformation," though, I get two genuinely different possibilities depending on whether I pick lax or oplax monoidal functors.)
Reflecting on associativity suggests that, for a more "unbiased" point of view, we should consider families of natural isomorphisms
$$\eta(g_1, g_2, \dots, g_n) : F(g_1) \circ F(g_2) \circ \dots \circ F(g_n) \to F(g_1 g_2 \dots g_n)$$
and then impose a "generalized associativity" condition that every way of composing them to get a natural isomorphism with the same source and target as $\eta(g_1, g_2, \dots, g_n)$ should give $\eta(g_1, g_2, \dots, g_n)$. Another way to say this is that the cocycle condition (in the $F(g) = \text{id}_C$ special case, at least) should really be written
$$\eta(g_1, g_2, g_3) = \eta(g_1, g_2) \eta(g_1 g_2, g_3) = \eta(g_2, g_3) \eta(g_1, g_2 g_3).$$
This is in the same way that we can consider a monoid operation to be a family $m(g_1, g_2, \dots, g_n) = g_1 g_2 \dots g_n$ of operations satisfying a generalized associativity condition, and in particular satisfying
$$m(g_1, g_2, g_3) = m(m(g_1, g_2), g_3) = m(g_1, m(g_2, g_3)).$$
Namely, by "associativity" we usually mean that the middle expression equals the right, but really the reason that the middle expression equals the right is that they both equal the left.
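The "unbiased" point of view can be illustrated with a short sketch. It assumes the same degenerate setting as above ($F(g) = \text{id}_C$, coefficients written additively) and uses the carry cocycle on ℤ/4 as a hypothetical example. The function `bracketings` enumerates every binary way of combining $g_1, \dots, g_n$, and `unbiased_eta` confirms that all of them produce the same value.

```python
import itertools

N = 4   # assumption: G = Z/4, coefficients Z written additively

def eta2(a, b):
    """Binary carry cocycle on residues in range(N)."""
    return (a + b) // N

def bracketings(gs):
    """Yield (product, accumulated eta) for every binary bracketing of gs."""
    if len(gs) == 1:
        yield gs[0], 0
        return
    for k in range(1, len(gs)):
        for left_prod, left_eta in bracketings(gs[:k]):
            for right_prod, right_eta in bracketings(gs[k:]):
                yield ((left_prod + right_prod) % N,
                       left_eta + right_eta + eta2(left_prod, right_prod))

def unbiased_eta(gs):
    """The common value of all bracketings; fails if they disagree."""
    values = {e for _, e in bracketings(gs)}
    assert len(values) == 1, "bracketings disagree, so eta2 is not a cocycle"
    return values.pop()

print(unbiased_eta((1, 3, 2)), unbiased_eta((3, 3, 3, 3)))
```

For this particular cocycle the common value is the total carry, `sum(gs) // N`, which is the n-ary operation that the binary one was secretly computing all along.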