Is there a preferable convention for defining the wedge product?
I think a lot of people run into this issue. The way I think about it is the following:
Take your finite-dimensional vector space $V$ and form its tensor algebra $T(V)$. Define $\mathcal{J}$ to be the 2-sided ideal in $T(V)$ generated by elements of the form $v \otimes v$, and then define the exterior algebra to be $\Lambda(V) = T(V) / \mathcal{J}$. This exhibits the exterior algebra as a quotient of the tensor algebra.
The different conventions you see for the wedge product arise from different embeddings of the exterior algebra into the tensor algebra. Define on $V^{\otimes n}$ the map $$ A_n (v_1 \otimes \dots \otimes v_n) = \frac{1}{n!} \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, v_{\pi(1)} \otimes \dots \otimes v_{\pi(n)} $$ (using $\pi^{-1}$ instead of $\pi$ makes no difference, since $\pi \mapsto \pi^{-1}$ is a sign-preserving bijection of $S_n$), and then define on the tensor algebra the map $$A = \bigoplus_{n=0}^{\infty} A_n.$$ Then you can easily show that $A_n^2 = A_n$ for all $n$, so that $A$ is a projection.
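A minimal sketch of $A_n$ in Python, assuming $V = \mathbb{Q}^d$ with standard basis $e_0, \dots, e_{d-1}$ and a tensor in $V^{\otimes n}$ stored as a dictionary from index tuples $(i_1,\dots,i_n)$ to exact rational coefficients. The representation and the names `sign` and `antisymmetrize` are hypothetical choices for illustration. It checks $A_3^2 = A_3$ on a sample tensor and that $A_2$ kills $v \otimes v$ for $v = e_0 + e_1$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# A tensor in V^{(x)n}, V = Q^d, is a dict {(i_1, ..., i_n): coefficient},
# standing for the sum of coefficient * e_{i_1} (x) ... (x) e_{i_n}.

def sign(perm):
    """Sign of a permutation of 0..n-1, computed from its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def antisymmetrize(t):
    """A_n: (1/n!) * sum over pi of sgn(pi) * (tensor factors permuted by pi), extended linearly."""
    result = {}
    for idx, coeff in t.items():
        n = len(idx)
        for perm in permutations(range(n)):
            new_idx = tuple(idx[perm[k]] for k in range(n))
            term = Fraction(sign(perm) * coeff, factorial(n))
            result[new_idx] = result.get(new_idx, 0) + term
    # Drop cancelled terms so that equal tensors compare equal as dicts.
    return {k: v for k, v in result.items() if v != 0}

# A sample tensor in V^{(x)3}; A_3 is idempotent on it.
t = {(0, 1, 2): Fraction(1), (0, 0, 1): Fraction(3), (2, 1, 0): Fraction(-2)}
assert antisymmetrize(antisymmetrize(t)) == antisymmetrize(t)

# (e_0 + e_1) (x) (e_0 + e_1), a generator of the ideal J, lies in the kernel of A_2.
v_tensor_v = {(0, 0): Fraction(1), (0, 1): Fraction(1), (1, 0): Fraction(1), (1, 1): Fraction(1)}
assert antisymmetrize(v_tensor_v) == {}
```

Exact `Fraction` arithmetic is used so that the idempotence check is an equality of dictionaries rather than a floating-point comparison.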
The point is that $\mathcal{J} = \ker(A)$, so that you can identify the quotient $\Lambda(V)$ with $\operatorname{im} A$; that is, we have now embedded the exterior algebra as a subspace of the tensor algebra. This is where the two conventions differ. I have defined $A_n$ with a $\frac{1}{n!}$ in front, but some authors omit it. Of course, this doesn't change the kernel of the map, but it does change the embedding of the exterior algebra into the tensor algebra. (Without the factor, the map $n!\,A_n$ satisfies $(n!\,A_n)^2 = n!\,(n!\,A_n)$, so it is no longer a projection.)
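A small sketch comparing the two conventions, assuming $V = \mathbb{Q}^d$ with tensors stored as dictionaries from index tuples to rational coefficients (a hypothetical representation; `alt` and `times` are illustrative names). With the $\frac{1}{n!}$, the class of $e_0 \wedge e_1$ is represented by $\tfrac12(e_0\otimes e_1 - e_1\otimes e_0)$; without it, by $e_0\otimes e_1 - e_1\otimes e_0$. The unnormalized map $B_n$ satisfies $B_n^2 = n!\,B_n$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..n-1 via its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(t, normalized=True):
    """Sum over pi of sgn(pi) * permuted tensor, divided by n! only if normalized."""
    result = {}
    for idx, coeff in t.items():
        n = len(idx)
        factor = Fraction(1, factorial(n)) if normalized else Fraction(1)
        for perm in permutations(range(n)):
            new_idx = tuple(idx[p] for p in perm)
            result[new_idx] = result.get(new_idx, 0) + sign(perm) * coeff * factor
    return {k: v for k, v in result.items() if v != 0}

def times(t, c):
    """Multiply every coefficient of t by the scalar c."""
    return {k: c * v for k, v in t.items() if c * v != 0}

e0_e1 = {(0, 1): Fraction(1)}
# Representative of e_0 ^ e_1 in each convention: same line, different scaling.
assert alt(e0_e1) == {(0, 1): Fraction(1, 2), (1, 0): Fraction(-1, 2)}
assert alt(e0_e1, normalized=False) == {(0, 1): Fraction(1), (1, 0): Fraction(-1)}

# In degree 3 the unnormalized map B_3 satisfies B_3(B_3(t)) = 3! * B_3(t).
t = {(0, 1, 2): Fraction(1), (2, 2, 1): Fraction(5)}
b = alt(t, normalized=False)
assert alt(b, normalized=False) == times(b, factorial(3))
```

Both maps kill exactly the same tensors (the term with the repeated index $2$ vanishes in either convention); they differ only in how the image is scaled.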
The important point is that $A$ is not an algebra map from $T(V)$ to itself, so the embedding $\Lambda(V) \to T(V)$ is not an embedding of algebras. Now you ask how to describe the exterior product in terms of the product in $T(V)$. Take $\alpha \in \Lambda^k(V)$ and $\beta \in \Lambda^l(V)$ with representatives $\tilde{\alpha} \in \mathrm{im}(A_k)$ and $\tilde{\beta} \in \mathrm{im}(A_l)$, respectively. Then $A_{k+l}(\tilde{\alpha} \otimes \tilde{\beta})$ is the representative of $\alpha \wedge \beta$ that you're looking for: $\tilde{\alpha} \otimes \tilde{\beta}$ represents $\alpha \wedge \beta$ in the quotient, and $A$ is the projection onto $\mathrm{im}\,A$ along $\mathcal{J}$.
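The recipe $\alpha\wedge\beta \leftrightarrow A_{k+l}(\tilde\alpha\otimes\tilde\beta)$ can be tried out in a short Python sketch, assuming $V = \mathbb{Q}^3$ with tensors stored as dictionaries from index tuples to exact rational coefficients. The function names are hypothetical. It checks graded commutativity and associativity on basis vectors, and that $A$ itself does not respect $\otimes$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..n-1 via its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def antisymmetrize(t):
    """A_n with the 1/n! normalization, extended linearly."""
    result = {}
    for idx, coeff in t.items():
        n = len(idx)
        for perm in permutations(range(n)):
            new_idx = tuple(idx[p] for p in perm)
            result[new_idx] = result.get(new_idx, 0) + Fraction(sign(perm) * coeff, factorial(n))
    return {k: v for k, v in result.items() if v != 0}

def tensor(a, b):
    """Tensor product of homogeneous tensors: concatenate index tuples, multiply coefficients."""
    return {i + j: x * y for i, x in a.items() for j, y in b.items()}

def wedge(alpha, beta):
    """Representative of alpha ^ beta in im(A), given representatives alpha, beta in im(A)."""
    return antisymmetrize(tensor(alpha, beta))

def neg(t):
    return {k: -v for k, v in t.items()}

# Degree-1 tensors are already in im(A_1), since A_1 is the identity.
e0, e1, e2 = ({(i,): Fraction(1)} for i in range(3))

# Graded commutativity in degree 1: e1 ^ e0 = -(e0 ^ e1).
assert wedge(e1, e0) == neg(wedge(e0, e1))
# Associativity: both bracketings give A_3(e0 (x) e1 (x) e2).
assert wedge(wedge(e0, e1), e2) == wedge(e0, wedge(e1, e2))
# A is not an algebra map: A(e0) (x) A(e1) = e0 (x) e1, while A(e0 (x) e1) is antisymmetrized.
assert tensor(antisymmetrize(e0), antisymmetrize(e1)) != antisymmetrize(tensor(e0, e1))
```

Associativity holds because $\tilde\alpha - A\tilde\alpha$ lies in $\mathcal{J} = \ker A$ and $\mathcal{J}$ is a two-sided ideal, so applying $A$ to an inner factor before tensoring does not change the final result.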
Essentially, it boils down to whether or not you put a $\frac{1}{n!}$ in front of your alternating map.
The issue here is that there are really two tasks: (1) define the algebra of differential forms on a manifold, and (2) implement them as multilinear functions on tangent vectors. The natural way to define them is along the lines of what Donu said: the differential forms at a point are like a polynomial algebra over the vector space $V = T^*_pM$, except supercommutative (or graded-commutative) instead of commutative. Imposing supercommutativity amounts to taking a quotient of the tensor algebra, which is the free noncommutative algebra over the vector space $V$.
But then for the second task, you would like a monomial in a standard basis of cotangent vectors to take values in $\{0,1,-1\}$ when you pair it with a standard basis of vectors. For example, you would like to say $$dx \wedge dy = dx \otimes dy - dy \otimes dx,$$ because that evaluates to $1$ on $(\hat{x},\hat{y})$ and $-1$ on $(\hat{y},\hat{x})$. In order to do this, you have to implement the wedge product with antisymmetrization and with factorials, actually the reciprocal of the factor you give: $$\alpha \wedge \beta = \frac{(a+b)!}{a!\,b!} \mathrm{Alt}(\alpha \otimes \beta),$$ where $a$ and $b$ are the degrees of $\alpha$ and $\beta$, and $\mathrm{Alt}$ is the antisymmetrization including the $\frac{1}{n!}$.
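To see the $\{0,1,-1\}$ normalization at work, here is a Python sketch that stores a form at a point of $\mathbb{R}^3$ as a dictionary from index tuples to coefficients in the basis $dx^{i_1}\otimes\dots\otimes dx^{i_k}$ (a hypothetical representation, with exact rationals), defines $\mathrm{Alt}$ with the $\frac{1}{n!}$ and the wedge with the factor $\frac{(a+b)!}{a!\,b!}$, and evaluates on basis vectors. For contrast, $\mathrm{Alt}(dx\otimes dy)$ on its own gives $\tfrac12$ on $(\hat x,\hat y)$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..n-1 via its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(t):
    """Alt with the 1/n! normalization."""
    result = {}
    for idx, coeff in t.items():
        n = len(idx)
        for perm in permutations(range(n)):
            new_idx = tuple(idx[p] for p in perm)
            result[new_idx] = result.get(new_idx, 0) + Fraction(sign(perm) * coeff, factorial(n))
    return {k: v for k, v in result.items() if v != 0}

def tensor(a, b):
    return {i + j: x * y for i, x in a.items() for j, y in b.items()}

def degree(t):
    """Degree of a nonzero homogeneous tensor (length of any index tuple)."""
    return len(next(iter(t)))

def wedge(alpha, beta):
    """alpha ^ beta = (a+b)!/(a! b!) * Alt(alpha (x) beta)."""
    a, b = degree(alpha), degree(beta)
    factor = Fraction(factorial(a + b), factorial(a) * factorial(b))
    return {k: factor * v for k, v in alt(tensor(alpha, beta)).items()}

def evaluate(form, *vectors):
    """Pair a k-form with k vectors: sum of coeff * v_1[i_1] * ... * v_k[i_k]."""
    total = Fraction(0)
    for idx, coeff in form.items():
        term = coeff
        for vec, i in zip(vectors, idx):
            term *= vec[i]
        total += term
    return total

dx, dy, dz = ({(i,): Fraction(1)} for i in range(3))
x_hat, y_hat, z_hat = [1, 0, 0], [0, 1, 0], [0, 0, 1]

dxdy = wedge(dx, dy)
assert dxdy == {(0, 1): 1, (1, 0): -1}  # dx (x) dy - dy (x) dx
assert evaluate(dxdy, x_hat, y_hat) == 1
assert evaluate(dxdy, y_hat, x_hat) == -1

dxdydz = wedge(dxdy, dz)
assert evaluate(dxdydz, x_hat, y_hat, z_hat) == 1
assert evaluate(dxdydz, y_hat, x_hat, z_hat) == -1
assert evaluate(alt(tensor(dx, dy)), x_hat, y_hat) == Fraction(1, 2)
```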
If I were explaining the subject, I would handle points (1) and (2) separately. It is common to conflate the two concerns. It amounts to defining forms either as a subspace of tensors (the usual solution) or as a quotient space of tensors. The real issue is that they need to be both, and that double role leads you to the factorial factors.
The answers by Greg and MTS are quite thorough, so there is not much more to say about that. However, I would like to explain my comment that viewing differential forms as antisymmetric tensors is often inadvisable, although I don't want to seem too dogmatic about this.
My first argument is pedagogical. Making the above identification can be confusing (as evidenced by the question) and is frequently beside the point. A few years back, when I taught a vector calculus class, I decided to do differential forms. The students had no idea about tensor products or multilinear algebra, so it would have been a bad idea to attempt this approach. Instead, I told them that $dx$ etc. were symbols subject to the chain rule $dx = \frac{\partial x}{\partial u}du+\ldots$, and that they could be multiplied in such a way that $dx\wedge dy= - dy\wedge dx$. I gave a heuristic explanation, in terms of oriented areas of "infinitesimal" rectangles, of why this should be so... I won't claim that the experiment was entirely successful, but it could have been a lot worse.
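As an illustration of how far these two rules go on their own (polar coordinates are simply a convenient choice of example), the chain rule together with $dr\wedge d\theta = -d\theta\wedge dr$, hence $dr\wedge dr = d\theta\wedge d\theta = 0$, already produces the polar area element:

$$\begin{aligned}
x &= r\cos\theta, \qquad y = r\sin\theta,\\
dx &= \cos\theta\,dr - r\sin\theta\,d\theta, \qquad dy = \sin\theta\,dr + r\cos\theta\,d\theta,\\
dx\wedge dy &= r\cos^2\theta\,dr\wedge d\theta - r\sin^2\theta\,d\theta\wedge dr
= r\,(\cos^2\theta + \sin^2\theta)\,dr\wedge d\theta = r\,dr\wedge d\theta.
\end{aligned}$$

No tensors or multilinear maps appear anywhere in this computation.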
My second argument is more mathematical. Differential forms can be defined within algebraic geometry for quite general spaces, and here the approach using antisymmetric tensors can lead to serious problems. In characteristic $p>0$, the denominators $\frac{1}{n!}$ are undefined once $n \ge p$, since then $p$ divides $n!$. As an interesting side note, Deligne and Illusie, in their algebraic proof of the Hodge theorem, do find it necessary to make this identification, but they have to restrict the dimension of the space to be less than $p$ for precisely this reason. In the limiting case of characteristic $0$, of course, this is a nonissue.
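A quick arithmetic check of when the denominators break down: $n!$ is divisible by a prime $p$ exactly from $n = p$ on, so $\frac{1}{n!}$ exists in $\mathbb{F}_p$ only for $n < p$. Since forms on a space of dimension $d$ have degree at most $d$, this is consistent with the bound $d < p$ mentioned above. The helper name `first_degree_without_inverse` is hypothetical.

```python
from math import factorial

def first_degree_without_inverse(p):
    """Smallest n such that p divides n!, i.e. 1/n! does not exist in F_p."""
    n = 0
    while factorial(n) % p != 0:
        n += 1
    return n

print([factorial(n) % 5 for n in range(7)])  # [1, 1, 2, 1, 4, 0, 0]
for p in (2, 3, 5, 7, 11, 13):
    assert first_degree_without_inverse(p) == p
```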