Tensor product vs. Cartesian Product for composite quantum systems
Difference between Cartesian and tensor product
When the Cartesian product is equipped with the "natural" vector space structure, it's usually called the direct sum and denoted by the symbol $\oplus$. As other answers state, the direct sum (Cartesian product) and the tensor product of two vector spaces can be clearly seen to be different by their dimension.
If $\{v_i\}$ and $\{w_j\}$ are bases of $V$ and $W$, then $\{(v_i,0)\}\cup\{(0,w_j)\}$ is a basis of $V\oplus W$ and $\{v_i\otimes w_j\}$ is a basis of $V\otimes W$. Therefore,
$$\operatorname{dim}(V\oplus W)=\operatorname{dim}V+\operatorname{dim}W$$
$$\operatorname{dim}(V\otimes W)=
\operatorname{dim}V\cdot\operatorname{dim}W$$
As you can see, the "Cartesian product" behaves more like a sum when dealing with vector spaces whereas the role of a product is adopted by the tensor product.
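As a quick numerical check of the two dimension formulas, here is a minimal sketch using NumPy. The dimensions 2 and 3 and all variable names are chosen purely for illustration; coordinate vectors stand in for the abstract basis elements.

```python
import numpy as np

dim_v, dim_w = 2, 3
v_basis = np.eye(dim_v)  # rows are the basis vectors v_i of V
w_basis = np.eye(dim_w)  # rows are the basis vectors w_j of W

# Direct sum: embed each v_i as (v_i, 0) and each w_j as (0, w_j).
direct_sum_basis = np.vstack([
    np.hstack([v_basis, np.zeros((dim_v, dim_w))]),
    np.hstack([np.zeros((dim_w, dim_v)), w_basis]),
])

# Tensor product: one basis vector v_i (x) w_j for every pair (i, j).
tensor_basis = np.array([np.kron(v, w) for v in v_basis for w in w_basis])

# The rank counts linearly independent rows, i.e. the dimension they span.
dim_direct_sum = np.linalg.matrix_rank(direct_sum_basis)
dim_tensor = np.linalg.matrix_rank(tensor_basis)

print(dim_direct_sum)  # 5 = 2 + 3
print(dim_tensor)      # 6 = 2 * 3
```

Here `np.kron` plays the role of $\otimes$ on coordinate vectors, and stacking with zero blocks plays the role of $\oplus$.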
Composite systems
Now, to get some intuition for why we should use the tensor product of the state spaces of two quantum systems when we want to describe the composite system, we can use the following analogy.
Consider two classical systems with finite numbers $m$ and $n$ of states $\{s_i\}$ and $\{r_j\}$. When describing the joint system, would we want the space of states to be the union or the Cartesian product of the original sets?
We would want the product, because we expect to have all the possible combined states $\{S_{ij}=(s_i,r_j)\}$ for all $i$, $j$. As you can see, the number of elements of the new set is $m\cdot n$.
In the quantum analog of this setting, the two subsystems are described by vector spaces whose bases are $\{s_i\}$ and $\{r_j\}$. In the same way, the composite system will have a basis $\{S_{ij}= (s_i,r_j)\}$ (which we write as $S_{ij}=s_i\otimes r_j$), so it must be the tensor product.
Note: The key to this argument is the observation that the set whose number of elements is the product is the Cartesian product (classical case), whereas the vector space whose dimension is the product is the tensor product (quantum case).
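The counting in the classical case can be sketched in a few lines of Python. The state labels and the choice $m=3$, $n=2$ are illustrative assumptions, not part of the argument.

```python
from itertools import product

s_states = ["s1", "s2", "s3"]  # m = 3 states of the first system
r_states = ["r1", "r2"]        # n = 2 states of the second system

# Joint states: every pair (s_i, r_j), i.e. the Cartesian product of the sets.
joint_states = list(product(s_states, r_states))

# The union only lists the individual labels, m + n of them.
union_states = s_states + r_states

print(len(joint_states))  # 6 = 3 * 2
print(len(union_states))  # 5 = 3 + 2
```

In the quantum case the same pairs $(s_i, r_j)$ label the basis vectors $s_i\otimes r_j$ of the composite space, so its dimension is again $m\cdot n$.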
Theorems from linear algebra 101
You seem to mix up some basic math facts which don't depend on anything from quantum mechanics or physics:
- The Cartesian product of vector spaces $A\times B$ and the direct sum of vector spaces $A\oplus B$ are the same thing*.
- If $A$ and $B$ are sets, then $|A\times B|=|A|\cdot |B|$, where $|X|$ is the cardinality of the set $X$.
- If $A$ and $B$ are vector spaces, and $\operatorname{dim}(A)$ denotes the dimension of $A$, then $\operatorname{dim}(A\times B)=\operatorname{dim}(A)+\operatorname{dim}(B)$.
- Finally, $\operatorname{dim}(A\otimes B)=\operatorname{dim}(A)\cdot\operatorname{dim}(B)$.
These formulas all work even if the numbers are infinite, in which case they're cardinal numbers. This has no dependence on quantum mechanics. There is no difference with the classical case, because these aren't quantum or classical, they're just mathematical definitions/theorems.
An example
The Cartesian product of $\mathbb{R}^3$ with itself is spanned by $(e_1,0)$, $(e_2,0)$, $(e_3,0)$, $(0,e_1)$, $(0,e_2)$, $(0,e_3)$. It's 6-dimensional.
The tensor product of $\mathbb{R}^3$ with itself is spanned by $e_1\otimes e_1$, $e_2\otimes e_1$, $e_3\otimes e_1$, $e_1\otimes e_2$, $e_2\otimes e_2$, $e_3\otimes e_2$, $e_1\otimes e_3$, $e_2\otimes e_3$, $e_3\otimes e_3$. It's 9-dimensional.
Why do we use the tensor product and not the Cartesian product?
It's easiest to compare with a classical probability distribution. To describe the most general probability distribution over three possible states, you need three real numbers (subject to the constraints that the probabilities are non-negative and add to one). To describe the most general probability distribution over two separate systems, each with three possible states, you need nine real numbers: the probability that object 1 is in state 1 and object 2 is in state 1, the probability that object 1 is in state 2 and object 2 is in state 1, and so on. So you can see how this corresponds to nine dimensions: the dimension of the tensor product. The Cartesian product, with only six dimensions, is not of high enough dimension to store this information in a sensible/straightforward manner.
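To make the counting concrete, here is a small NumPy sketch. The specific probabilities are made up for illustration. It builds a joint distribution over two three-state objects as a $3\times 3$ table (nine numbers), and shows that a correlated joint distribution cannot be recovered from the six numbers given by the two single-object distributions.

```python
import numpy as np

# Two independent three-state objects: the joint table is the outer product.
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.1, 0.6, 0.3])
independent = np.outer(p1, p2)  # entry [i, j] = P(object 1 in i, object 2 in j)

# A perfectly correlated pair: both objects are always in the same state.
correlated = np.eye(3) / 3

# Its single-object (marginal) distributions are both uniform ...
marginal_1 = correlated.sum(axis=1)
marginal_2 = correlated.sum(axis=0)

# ... and their outer product is the uniform table, not the correlated one.
reconstructed = np.outer(marginal_1, marginal_2)

print(independent.size)                        # 9
print(np.allclose(reconstructed, correlated))  # False
```

The pair of marginals lives in the 6-dimensional Cartesian product, while the full table lives in the 9-dimensional tensor product; only the latter can hold correlations like the one above.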
*($A\times B=A\oplus B$, but in the case of the product/sum of infinitely many vector spaces they are distinct: $\prod_i A_i\neq \bigoplus_i A_i$. This wouldn't be something covered in introductory classes. The deep distinction between the two is that one is a category theory product and one is a category theory coproduct, but that's not useful or relevant for introductory quantum mechanics.)