Why is the tensor product of linear maps defined as $(S\otimes T)(v\otimes w)=S(v)\otimes T(w)$?
At some level, the tensor product of maps is, like most things in math, a convenient choice of definition. However, it arises naturally, in a precise sense.
The standard definition of a tensor product of two spaces, $V \otimes W$, actually provides more than a vector space constructed from $V$ and $W$. It is a universal construction, meaning that it satisfies a particular property, and is the best choice of a vector space that does so. There is a bilinear map $i: V \times W \rightarrow V \otimes W$ taking $(v,w) \mapsto v \otimes w$. Now for any bilinear map $f: V \times W \rightarrow U$, there exists a unique linear map $\tilde{f}: V\otimes W \rightarrow U$ such that $\tilde{f}\circ i = f$.
Now if $S: V \rightarrow V'$ and $T: W \rightarrow W'$ are linear maps, we want to define a new map $S \otimes T: V \otimes W \rightarrow V' \otimes W'$. We choose this domain and codomain because we want the tensor product of maps to be compatible with the tensor product of spaces (in a precise sense: we want the assignment $(V, W) \mapsto V \otimes W$, $(S, T) \mapsto S \otimes T$ to be a "bifunctor").
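Now to actually define the map, we appeal to the universal property. Let $(S \times T)(v,w) = S(v) \otimes T(w)$. This map from $V \times W$ to $V' \otimes W'$ is bilinear, because $S$ and $T$ are linear and $\otimes$ is bilinear, so it induces a unique linear map $S \otimes T: V \otimes W \rightarrow V' \otimes W'$ with $(S \otimes T)\circ i = S \times T$, that is, $(S \otimes T)(v \otimes w) = S(v) \otimes T(w)$.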
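If a concrete check helps, here is a minimal sketch in plain Python. It assumes finite-dimensional spaces with chosen bases, so that $v \otimes w$ has coordinates given by all products $v_i w_j$ and $S \otimes T$ is represented by the Kronecker product of the two matrices. The helper names `mat_vec`, `kron_vec`, and `kron_mat` and the sample matrices are made up for illustration; they are not from any library.

```python
def mat_vec(A, x):
    """Apply a matrix A (given as a list of rows) to a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def kron_vec(v, w):
    """Coordinates of v (x) w in the basis e_i (x) f_j, ordered with i outermost."""
    return [a * b for a in v for b in w]

def kron_mat(S, T):
    """Kronecker product of S and T: the matrix of S (x) T in the product bases."""
    return [[s * t for s in s_row for t in t_row]
            for s_row in S for t_row in T]

# Hypothetical example: S maps R^2 -> R^3, T maps R^2 -> R^2.
S = [[1, 2], [0, 1], [3, -1]]
T = [[2, 0], [1, 1]]
v = [1, -2]
w = [3, 4]

# Left side: (S (x) T) applied to the pure tensor v (x) w.
lhs = mat_vec(kron_mat(S, T), kron_vec(v, w))
# Right side: S(v) (x) T(w).
rhs = kron_vec(mat_vec(S, v), mat_vec(T, w))

print(lhs == rhs)
```

The two sides agree on pure tensors, and since pure tensors span $V \otimes W$, linearity determines $S \otimes T$ everywhere. This is only a coordinate-level illustration of the defining identity, not a substitute for the universal-property argument.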
These concepts become clearer if you're familiar with the language of category theory - natural constructions, universal properties, functors, etc.