Why is the tensor product constructed in this way?
The "product" of vector spaces is more properly thought of as a direct sum, since dimensions add: $\dim(V\oplus W)=\dim(V)+\dim(W)$. Tensor products are generally much bigger than sums, since dimensions multiply: $\dim(V\otimes W)=\dim(V)\times\dim(W)$. In fact, in analogy with elementary arithmetic, we have distributivity $(A\oplus B)\otimes C\cong (A\otimes C)\oplus(B\otimes C)$.
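In coordinates (assuming $A={\bf R}^2$, $B={\bf R}^3$, $C={\bf R}^4$ with standard bases, and using NumPy purely as a stand-in), the direct sum corresponds to concatenating coordinate vectors and the tensor product to the Kronecker product. This sketch checks the dimension counts and, on a pure tensor, the distributivity isomorphism:

```python
import numpy as np

# Coordinate vectors: a in A = R^2, b in B = R^3, c in C = R^4.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])
c = np.array([1.0, -1.0, 0.5, 2.0])

# Direct sum: concatenate coordinates, so dimensions add (2 + 3 = 5).
ab = np.concatenate([a, b])

# Tensor product: Kronecker product, so dimensions multiply (5 * 4 = 20).
abc = np.kron(ab, c)

# Distributivity (A ⊕ B) ⊗ C ≅ (A ⊗ C) ⊕ (B ⊗ C): with these basis orderings,
# the isomorphism sends (a, b) ⊗ c to (a ⊗ c, b ⊗ c), the identity on coordinates.
rhs = np.concatenate([np.kron(a, c), np.kron(b, c)])

print(ab.size, abc.size)      # 5 20
print(np.allclose(abc, rhs))  # True
```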
Note (as KCd says in the comments) that linearity works very differently in $V\oplus W$ and in $V\otimes W$:
$$\begin{array}{lll}V\oplus W: & (a+b,c+d) & = (a,0)+(b,0)+(0,c)+(0,d) \\ V\otimes W: & (a+b)\otimes(c+d) & = a\otimes c+a\otimes d+b\otimes c+b\otimes d. \end{array}$$
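A quick numerical check of both expansions, assuming $V={\bf R}^2$ and $W={\bf R}^3$, with pairs in $V\oplus W$ stored as concatenated vectors and tensors in $V\otimes W$ stored as $2\times 3$ coefficient matrices (so $v\otimes w$ is `np.outer(v, w)`):

```python
import numpy as np

# a, b in V = R^2 and c, d in W = R^3; zV, zW are the zero vectors.
a, b = np.array([1.0, 2.0]), np.array([0.0, -1.0])
c, d = np.array([1.0, 0.0, 3.0]), np.array([2.0, 1.0, -1.0])
zV, zW = np.zeros(2), np.zeros(3)

# V ⊕ W: (a+b, c+d) splits into four pieces, each lying in one summand.
direct = np.concatenate([a + b, c + d])
pieces = (np.concatenate([a, zW]) + np.concatenate([b, zW])
          + np.concatenate([zV, c]) + np.concatenate([zV, d]))

# V ⊗ W: (a+b) ⊗ (c+d) expands into four cross terms by bilinearity.
tensor = np.outer(a + b, c + d)
expanded = np.outer(a, c) + np.outer(a, d) + np.outer(b, c) + np.outer(b, d)

print(np.allclose(direct, pieces), np.allclose(tensor, expanded))  # True True
```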
Another point that is easy to forget is that the tensor product $V\otimes W$ does not consist only of the so-called "pure tensors" of the form $v\otimes w$; it also contains linear combinations of them. Every tensor is a finite sum of pure tensors, and a pure tensor can be expanded into a sum of pure tensors (using linearity on both sides of the symbol $\otimes$), but not every tensor can be written as a single pure tensor.
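In finite dimensions a tensor in $K^m\otimes K^n$ can be stored as its $m\times n$ coefficient matrix (the coefficient of $e_i\otimes f_j$ in entry $(i,j)$), and it is a pure tensor exactly when that matrix has rank at most 1. A NumPy sketch (floating-point rank, so only a numerical check; `is_pure` is a hypothetical helper name):

```python
import numpy as np

def is_pure(t: np.ndarray) -> bool:
    """True if the coefficient matrix t equals v ⊗ w for some v, w, i.e. rank <= 1."""
    return bool(np.linalg.matrix_rank(t) <= 1)

e1, e2 = np.eye(2)

pure = np.outer(e1 + 2 * e2, 3 * e1 - e2)       # (e1 + 2e2) ⊗ (3e1 - e2)
not_pure = np.outer(e1, e1) + np.outer(e2, e2)  # e1⊗e1 + e2⊗e2, rank 2

print(is_pure(pure), is_pure(not_pure))  # True False
```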
Here's one way to think formally about the difference, in a quantum-mechanical spirit. Fix distinguished bases $X={\cal B}_V$ of $V$ and $Y={\cal B}_W$ of $W$. Then $V$ and $W$ are the free vector spaces generated by $X$ and $Y$: their elements are formal $K$-linear combinations ($K$ being the base field) of elements of $X$ and $Y$ respectively, i.e. $V=KX$ and $W=KY$.
Then, assuming $X$ and $Y$ are disjoint, we have $V\oplus W\cong K(X\cup Y)$, i.e. the two bases together form a basis of the direct sum. But $V\otimes W\cong K(X\times Y)$, and $X\times Y$ is certainly different from the union $X\cup Y$. If we think of $X$ and $Y$ as sets of "pure states" of some theoretical system, and of the vector spaces as superpositions of pure states, then the direct sum treats $X$ and $Y$ as disjoint collections of pure states of a single system: forming $V\oplus W$ just means allowing superpositions drawn from both collections at once.
But the tensor product has as its basis the collection of pure states of the composite system made of the two systems underlying $X$ and $Y$. That is, we view them as distinct systems that together make up a larger system, so that the state of system 1 may vary independently of the state of system 2; the pure states of the composite system are then the pairs in $X\times Y$.
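The two bases can be listed explicitly. In this sketch the state labels are hypothetical placeholders; the direct sum basis is a tagged disjoint union (so the two sets cannot collide), while the tensor basis is the set of ordered pairs:

```python
from itertools import product

# Hypothetical "pure states" of two systems (bases of V and W).
X = ["up", "down"]
Y = ["a", "b", "c"]

# Basis of V ⊕ W: disjoint union, tagged by which space each element came from.
direct_sum_basis = [("V", x) for x in X] + [("W", y) for y in Y]

# Basis of V ⊗ W: ordered pairs, i.e. joint states of the composite system.
tensor_basis = list(product(X, Y))

print(len(direct_sum_basis))  # 5
print(len(tensor_basis))      # 6
print(tensor_basis[:3])       # [('up', 'a'), ('up', 'b'), ('up', 'c')]
```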
The tensor product is a way to encode multilinearity, though the binary operation $\otimes$ by itself only encodes bilinearity. That is, the space of bilinear maps into the ground field $K$ whose first argument takes vectors from $U$ and whose second takes vectors from $V$ is (in finite dimensions) the tensor product $U^*\otimes V^*$. The dual spaces $U^*$ and $V^*$ (viewing everything as finite-dimensional) have bases that come from bases $\{u_i\}$ and $\{v_j\}$ of $U$ and $V$ respectively.
Specifically, $U^*$ has basis $\{u_i^*\}$, where $u_i^*(u_j)=\delta_{ij}$, i.e. $u_i^*$ reads off the $u_i$-coordinate of a vector (the scalar part of the projection onto the line spanned by $u_i$); similarly for the $v_j^*$. A bilinear map $U\times V\to K$ is determined by where it sends the basis pairs $(u_k,v_\ell)$, so we can define maps $(u_i^*\otimes v_j^*)(u_k,v_\ell)=\delta_{ik}\delta_{j\ell}$ in exactly the same spirit, and these bilinear maps $u_i^*\otimes v_j^*$ form a basis of the space of all bilinear maps. (Note that $u_i^*\otimes v_j^*$ says "apply $u_i^*$ to the first argument and $v_j^*$ to the second, and multiply the two resulting scalars.") This is the ground covered by muzzlator.
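Concretely, assuming $U=K^m$ and $V=K^n$ with standard bases, a bilinear map $B:U\times V\to K$ is determined by the matrix $M$ with $M_{ij}=B(u_i,v_j)$, and $B=\sum_{i,j}M_{ij}\,(u_i^*\otimes v_j^*)$. The NumPy sketch below checks this expansion on random vectors (`basis_form` is a hypothetical helper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
M = rng.standard_normal((m, n))  # M[i, j] = B(u_i, v_j)

def B(u, v):
    """The bilinear map U x V -> K with matrix M."""
    return u @ M @ v

def basis_form(i, j):
    """u_i^* ⊗ v_j^*: take the i-th coordinate of u, the j-th of v, and multiply."""
    return lambda u, v: u[i] * v[j]

u, v = rng.standard_normal(m), rng.standard_normal(n)
expansion = sum(M[i, j] * basis_form(i, j)(u, v) for i in range(m) for j in range(n))
print(np.isclose(B(u, v), expansion))  # True
```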
This allows us to reinterpret linear maps between (finite-dimensional) vector spaces in a number of new ways. In particular, the linear maps $U\to V$ may be reinterpreted as linear maps $U\otimes V^*\to K$, or $V^*\to U^*$, or $K\to U^*\otimes V$. We also have the tensor-hom adjunction $$\hom(U\otimes V,W)\cong\hom(U,\hom(V,W)),$$ where $\hom(A,B)$ is the space of linear maps $A\to B$. This is the "category of ($K$-)vector spaces" version of the set-theoretic concept of "currying," where a map $A\times B\to C$ can be reinterpreted as a map from $A$ into the set of maps $B\to C$ (here $A,B,C$ are plain sets, and the maps are arbitrary functions rather than homomorphisms of any algebraic kind).
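In coordinates the adjunction is just a reshape. Assuming $U=K^2$, $V=K^3$, $W=K^4$ with standard bases and $u\otimes v$ realized as `np.kron(u, v)`, a linear map $U\otimes V\to W$ is a $4\times 6$ matrix; the hypothetical helpers `curry` and `uncurry` below convert it to a map $U\to\hom(V,W)$ and back:

```python
import numpy as np

rng = np.random.default_rng(1)
dU, dV, dW = 2, 3, 4

# A linear map U ⊗ V -> W, as a dW x (dU*dV) matrix acting on np.kron(u, v).
A = rng.standard_normal((dW, dU * dV))

def curry(A):
    """Turn A: U ⊗ V -> W into u |-> (a dW x dV matrix, i.e. an element of Hom(V, W))."""
    A3 = A.reshape(dW, dU, dV)  # A3[w, i, j] is the coefficient on u_i ⊗ v_j
    return lambda u: np.einsum("wij,i->wj", A3, u)

def uncurry(f):
    """Recover the dW x (dU*dV) matrix by evaluating f on basis vectors of U."""
    cols = [f(e)[:, j] for e in np.eye(dU) for j in range(dV)]
    return np.column_stack(cols)

u, v = rng.standard_normal(dU), rng.standard_normal(dV)
f = curry(A)
print(np.allclose(A @ np.kron(u, v), f(u) @ v))  # True
print(np.allclose(uncurry(f), A))                # True
```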
Tensor products are the formal machinery behind the concept of "extension of scalars." For instance, given a real vector space $V$, how could we make it a complex vector space? We aren't a priori allowed to multiply vectors by nonreal scalars, but if we pretend we can (just look at the space $V\oplus iV$ with the obvious complex scalar multiplication), we get a complex vector space. This process is called complexification, and it can be done simply by tensoring $V$ with $\bf C$ over $\bf R$, i.e. $V_{\bf C}\cong{\bf C}\otimes_{\bf R}V$. Complex scalars then act by left multiplication on the $\bf C$ factor, which is consistent by construction.
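One way to see ${\bf C}\otimes_{\bf R}{\bf R}^n$ in coordinates: every element can be written uniquely as $1\otimes x+i\otimes y$ with $x,y\in{\bf R}^n$, so store it as a pair of real vectors. Complex scalars act on the left factor, and a real linear map $T$ extends as $1\otimes T$. The class name below is hypothetical, chosen for illustration:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ComplexifiedVector:
    """Element 1⊗x + i⊗y of C ⊗_R R^n, stored as two real vectors."""
    x: np.ndarray
    y: np.ndarray

    def scale(self, z: complex) -> "ComplexifiedVector":
        # (a + bi)(1⊗x + i⊗y) = 1⊗(ax - by) + i⊗(bx + ay)
        a, b = z.real, z.imag
        return ComplexifiedVector(a * self.x - b * self.y, b * self.x + a * self.y)

    def apply(self, T: np.ndarray) -> "ComplexifiedVector":
        # A real linear map T: V -> V extends to 1 ⊗ T on C ⊗_R V.
        return ComplexifiedVector(T @ self.x, T @ self.y)

    def to_complex(self) -> np.ndarray:
        # The isomorphism C ⊗_R R^n ≅ C^n.
        return self.x + 1j * self.y

# Usage: both operations agree with ordinary complex arithmetic in C^n.
T = np.array([[0.0, -1.0], [1.0, 0.0]])
w = ComplexifiedVector(np.array([1.0, 2.0]), np.array([0.5, -1.0]))
z = 2 - 3j
print(np.allclose(w.scale(z).to_complex(), z * w.to_complex()))  # True
print(np.allclose(w.apply(T).to_complex(), T @ w.to_complex()))  # True
```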
Extension of scalars is not limited to going from real to complex vector spaces, though. If $V$ is a $K$-vector space and $L/K$ is a field extension, we can make $V$ into an $L$-vector space via $L\otimes_KV$. Since $L$ is itself a $K$-vector space, we could pick a $K$-basis $\{\ell_i\}$ of it (possibly infinite, even uncountable) and extend scalars formally via $\bigoplus_i \ell_i V$, but tensoring is succinct and coordinate-free. The very same ideas apply to modules, which are more general than vector spaces.
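A worked dimension count, assuming $V$ is finite-dimensional and $[L:K]$ is finite: if $\{e_j\}$ is a $K$-basis of $V$ and $\{\ell_i\}$ a $K$-basis of $L$, then $\{1\otimes e_j\}$ is an $L$-basis of $L\otimes_KV$ and $\{\ell_i\otimes e_j\}$ is a $K$-basis, so
$$\dim_L(L\otimes_KV)=\dim_KV,\qquad \dim_K(L\otimes_KV)=[L:K]\cdot\dim_KV.$$
For example, ${\bf C}\otimes_{\bf R}{\bf R}^n\cong{\bf C}^n$ has complex dimension $n$ and real dimension $2n$, matching the picture $V\oplus iV$.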
When our vector space also has a multiplication compatible with its linear structure (that is, it is a $K$-algebra), the multiplication extends to the tensor product. This lets us "glue" algebras together, which is more than just tacking on extra scalars. In fact this works over rings in general, not just over fields.
In particular, for a commutative ring $R$, we have $R[x]\otimes_R R[y]\cong R[x,y]$ as rings. The multiplication is defined on pure tensors by $(a\otimes b)(c\otimes d)=ac\otimes bd$ and extended by linearity (one checks that this is well-defined).
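A small sketch over $R={\bf Z}$ (an assumption; any commutative ring behaves the same way): store an element of ${\bf Z}[x]\otimes_{\bf Z}{\bf Z}[y]$ as a dict mapping $(i,j)$ to the coefficient of $x^i\otimes y^j$. On basis elements the rule $(a\otimes b)(c\otimes d)=ac\otimes bd$ becomes $(x^i\otimes y^j)(x^k\otimes y^l)=x^{i+k}\otimes y^{j+l}$, which is monomial multiplication in ${\bf Z}[x,y]$ once $x^i\otimes y^j$ is read as $x^iy^j$. Function names are hypothetical:

```python
from collections import defaultdict

Tensor = dict[tuple[int, int], int]  # (i, j) -> coefficient of x^i ⊗ y^j

def pure(p: list[int], q: list[int]) -> Tensor:
    """p(x) ⊗ q(y) for coefficient lists p, q (index = degree)."""
    return {(i, j): a * b for i, a in enumerate(p) for j, b in enumerate(q) if a * b}

def multiply(s: Tensor, t: Tensor) -> Tensor:
    """Extend (x^i ⊗ y^j)(x^k ⊗ y^l) = x^(i+k) ⊗ y^(j+l) bilinearly."""
    out = defaultdict(int)
    for (i, j), a in s.items():
        for (k, l), b in t.items():
            out[(i + k, j + l)] += a * b
    return {key: c for key, c in out.items() if c != 0}

# (1 + x) ⊗ y times x ⊗ (1 - y); in Z[x, y] this is (x + x^2)(y - y^2).
s = pure([1, 1], [0, 1])
t = pure([0, 1], [1, -1])
print(multiply(s, t))  # {(1, 1): 1, (1, 2): -1, (2, 1): 1, (2, 2): -1}
```

The output reads as $xy-xy^2+x^2y-x^2y^2$, which is indeed $(x+x^2)(y-y^2)$.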
Finally, KCd mentions in passing induction and restriction of representations. Since a representation of a group over $K$ is the same thing as a module over its group algebra, for a representation $V$ of a subgroup $H\le G$ (a $K[H]$-module) induction can be described as ${\rm Ind}_H^GV\cong K[G]\otimes_{K[H]}V$, although the more natural definition is probably "induction is the left adjoint (and coinduction the right adjoint) of restriction" (see adjoint functor), which is the categorical version of Frobenius reciprocity.
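Written out, with $V$ a $K[H]$-module and $W$ a $K[G]$-module, Frobenius reciprocity is an instance of the tensor-hom adjunction, using $\hom_{K[G]}(K[G],W)\cong{\rm Res}^G_HW$:
$$\hom_{K[G]}\big(K[G]\otimes_{K[H]}V,\,W\big)\cong\hom_{K[H]}\big(V,\,\hom_{K[G]}(K[G],W)\big)\cong\hom_{K[H]}\big(V,\,{\rm Res}^G_HW\big).$$
For finite $G$, $K[G]$ is a free right $K[H]$-module with basis a set of left coset representatives of $H$, which gives the familiar count
$$\dim_K{\rm Ind}_H^GV=[G:H]\cdot\dim_KV.$$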
By quotienting a tensor power $V^{\otimes n}:=V\otimes V\otimes\cdots\otimes V$ by certain relations, we can obtain the exterior power $\Lambda^nV$ (there is also a symmetric power), which is spanned by alternating multilinear symbols of the form $v_1\wedge v_2\wedge\cdots\wedge v_n$. This gives a new definition of the determinant map, and hence of characteristic polynomials too; it also gives the exterior algebra of differential forms, a very intrinsic way of working with geometric, multidimensional infinitesimals (informally speaking). Another application of tensor powers: directly summing the tensor powers of a Lie algebra ${\frak g}$ gives its tensor algebra $T({\frak g})=\bigoplus_n{\frak g}^{\otimes n}$, and quotienting by the ideal generated by the elements $x\otimes y-y\otimes x-[x,y]$ gives the universal enveloping algebra $U({\frak g})$.
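To make the determinant definition concrete: if $\dim V=n$, then $\Lambda^nV$ is one-dimensional and $Ae_1\wedge\cdots\wedge Ae_n=\det(A)\,e_1\wedge\cdots\wedge e_n$. The sketch below (hypothetical helper names, exact arithmetic via `fractions`) stores wedges of basis vectors as sorted index tuples, applies the alternating rule $e_j\wedge e_j=0$ and the sign change under swapping, and reads off that coefficient:

```python
from fractions import Fraction

Wedge = dict[tuple[int, ...], Fraction]  # sorted index tuple -> coefficient

def wedge_basis(idx: tuple[int, ...], j: int):
    """Rewrite e_idx ∧ e_j as ±e_sorted; return None if j repeats (alternating)."""
    if j in idx:
        return None
    # Moving e_j left past each larger index flips the sign once.
    sign = (-1) ** sum(1 for i in idx if i > j)
    return tuple(sorted(idx + (j,))), sign

def wedge_vector(w: Wedge, v: list) -> Wedge:
    """Compute w ∧ v, expanding v = Σ v_j e_j multilinearly."""
    out = {}
    for idx, c in w.items():
        for j, vj in enumerate(v):
            if vj == 0:
                continue
            res = wedge_basis(idx, j)
            if res is None:
                continue
            key, sign = res
            out[key] = out.get(key, 0) + sign * c * vj
    return {k: c for k, c in out.items() if c != 0}

def top_wedge_coefficient(A) -> Fraction:
    """Coefficient of e_0∧...∧e_(n-1) in Ae_0∧...∧Ae_(n-1), which is det(A)."""
    n = len(A)
    w = {(): Fraction(1)}
    for k in range(n):
        column = [Fraction(A[i][k]) for i in range(n)]  # the vector A e_k
        w = wedge_vector(w, column)
    return w.get(tuple(range(n)), Fraction(0))

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(top_wedge_coefficient(A))  # 18
```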
Tensor products turn multilinear algebra into linear algebra. That's the point (or at least one point).
They let you treat different kinds of base extension (e.g., viewing a real matrix as a complex matrix, making a polynomial in ${\mathbf Z}[X]$ into a polynomial in $({\mathbf Z}/m{\mathbf Z})[X]$, turning a representation of a subgroup $H$ into a representation of the whole group $G$) as special instances of one general construction.
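For the polynomial example, $({\bf Z}/m{\bf Z})\otimes_{\bf Z}{\bf Z}[X]\cong({\bf Z}/m{\bf Z})[X]$, and in coordinates the base change simply reduces every coefficient mod $m$. A minimal sketch (coefficient lists indexed by degree; function names are hypothetical) checking that this reduction respects multiplication, i.e. is a ring homomorphism:

```python
def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply integer polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def base_change(p: list[int], m: int) -> list[int]:
    """Image of p under Z[X] -> (Z/mZ)[X], i.e. 1 ⊗ p in (Z/mZ) ⊗_Z Z[X]."""
    return [a % m for a in p]

p, q, m = [3, -1, 4], [1, 5], 6
lhs = base_change(poly_mul(p, q), m)
rhs = base_change(poly_mul(base_change(p, m), base_change(q, m)), m)
print(lhs, lhs == rhs)  # [3, 2, 5, 2] True
```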
They provide a mathematical explanation for the phenomenon of "entangled" states in quantum mechanics (an entangled state is a tensor that is not an elementary tensor).
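For two qubits (assuming the state space ${\bf C}^2\otimes{\bf C}^2$ with the standard basis), a state is elementary exactly when its $2\times 2$ coefficient matrix has a single nonzero singular value; these singular values are the Schmidt coefficients. The Bell state $(|00\rangle+|11\rangle)/\sqrt2$ has two nonzero ones, so it is entangled. A NumPy sketch with hypothetical helper names and a floating-point tolerance:

```python
import numpy as np

def schmidt_coefficients(psi: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """Singular values of the dA x dB coefficient matrix of psi in C^dA ⊗ C^dB."""
    return np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)

def schmidt_rank(psi: np.ndarray, dA: int, dB: int, tol: float = 1e-12) -> int:
    """Number of nonzero Schmidt coefficients; 1 means an elementary tensor."""
    return int(np.sum(schmidt_coefficients(psi, dA, dB) > tol))

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
product_state = np.kron(ket0 + ket1, ket0) / np.sqrt(2)          # |+> ⊗ |0>
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)  # entangled

print(schmidt_rank(product_state, 2, 2), schmidt_rank(bell, 2, 2))  # 1 2
```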
See Why is the tensor product important when we already have direct and semidirect products? for more answers to your question (it's a duplicate question).
When I studied tensor products, I was lucky to find this wonderful article by Tom Coates. Starting from very simple functions on a product space, he explains the intuition behind tensor products very clearly.