Why is matrix multiplication not done in the same way as matrix addition, i.e. entrywise, by combining corresponding entries?

Suppose \begin{align} p & = 3x+10y, \\ q & = 7x-2y, \end{align} and \begin{align} a & = -13p + 9 q, \\ b & = \phantom{+}6p + 5q. \end{align} Substituting the first pair of equations into the second gives \begin{align} a &= 24 x -148y, \\ b & = 53x + 50y. \end{align} That's why matrix multiplication is defined the way it is: the coefficients of the combined substitution are exactly what the row-by-column rule produces.
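
Written with coefficient matrices, that substitution is precisely a matrix product:

$$ \begin{pmatrix} -13 & 9 \\ 6 & 5 \end{pmatrix} \begin{pmatrix} 3 & 10 \\ 7 & -2 \end{pmatrix} = \begin{pmatrix} 24 & -148 \\ 53 & 50 \end{pmatrix}, $$

and the right-hand side is the coefficient matrix of $a$ and $b$ in terms of $x$ and $y$.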


In short, matrix multiplication is defined the way it is because it corresponds to the composition of linear transformations on (finite-dimensional) vector spaces. This is what Cameron Williams mentioned in the comments.


For a very simple example of this in action, consider rotations of vectors in the plane $\mathbb{R}^2$. A rotation is an example of a linear transformation, which can be represented as a matrix. For instance, rotation counterclockwise by $90^\circ$ can be represented by the matrix

$$ R_{90} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} $$

Note that if you start with the vector $(1,0)^T$, then $R_{90}$ rotates it by $90^\circ$ as follows:

$$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$

In the same way, if you do the calculation with any other vector in $\mathbb{R}^2$, the matrix $R_{90}$ will simply rotate that vector by $90^\circ$. You can obtain a matrix like this for any angle of rotation. For example,

$$ R_{180} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} $$

is the matrix corresponding to rotation counterclockwise by $180^\circ$. Try it out on a few vectors if you haven't seen this before.
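
If you want to experiment, here is a minimal sketch using NumPy (the test vectors are arbitrary examples) that applies $R_{90}$ and $R_{180}$ to a few vectors:

```python
import numpy as np

# Counterclockwise rotation by 90 and by 180 degrees
R90 = np.array([[0, -1],
                [1,  0]])
R180 = np.array([[-1,  0],
                 [ 0, -1]])

# Apply both rotations to a few arbitrary vectors
for v in [np.array([1, 0]), np.array([0, 1]), np.array([2, 3])]:
    print(v, "-> 90:", R90 @ v, " 180:", R180 @ v)
```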

Now to the point. Note that rotations may be composed (i.e. we may perform one rotation after another to achieve a total rotation). Composing two $90^\circ$ rotations results in an overall $180^\circ$ rotation. In the same way that we can naturally compose these linear transformations, it would be nice if there were a way to "compose" the matrix $R_{90}$ with itself to produce the matrix $R_{180}$. Well, because of the way matrix multiplication is defined, we can do just this by multiplying $R_{90}$ by itself. Observe that

$$ R_{90}\cdot R_{90} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = R_{180} $$
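
The same check as a NumPy sketch: matrix multiplication (`@`) composes the two rotations, and rotating a vector twice by $90^\circ$ matches rotating it once by $180^\circ$.

```python
import numpy as np

R90 = np.array([[0, -1],
                [1,  0]])
R180 = np.array([[-1,  0],
                 [ 0, -1]])

# Composing the transformation with itself is just the matrix product
print(np.array_equal(R90 @ R90, R180))             # True

# Rotating a vector by 90 degrees twice equals one 180-degree rotation
v = np.array([2, 3])
print(np.array_equal(R90 @ (R90 @ v), R180 @ v))   # True
```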


Matrices are sometimes multiplied entrywise, exactly as the question suggests. This is known as the Hadamard product, the Schur product, or the entrywise product, and it comes up in the theory of association schemes, for example.
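
For concreteness, here is a small NumPy sketch contrasting the two products; in NumPy, `*` on arrays is the entrywise (Hadamard) product, while `@` is ordinary matrix multiplication:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)   # Hadamard (entrywise) product: [[ 5 12] [21 32]]
print(A @ B)   # ordinary matrix product:      [[19 22] [43 50]]
```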

Matrix multiplication is often taught as if it were a completely arbitrary operation, much as one might define multiplication of integers by simply specifying the long multiplication algorithm. It is much better to think of matrix multiplication semantically. There are at least two prominent semantic ways to think about matrices:

  1. A matrix is a way of representing a linear transformation given a specific basis for the underlying vector space. When you compose two linear transformations, the matrix of the composed transformation is obtained by matrix multiplication.

  2. A system of linear equations can be expressed in the form $Ax = b$. When $x$ and $b$ have the same dimension (so that $A$ is square) and the system has a unique solution, that solution is given by $x = A^{-1}b$, where $A^{-1}$ is the multiplicative inverse of $A$ with respect to matrix multiplication (see the sketch after this list).
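
As an illustration of the second point, here is a minimal NumPy sketch; the particular system is an arbitrary example, and in practice `np.linalg.solve` is preferable to forming $A^{-1}$ explicitly:

```python
import numpy as np

# The system  2x + y = 5,  x + 3y = 10,  written as Ax = b
A = np.array([[2, 1],
              [1, 3]])
b = np.array([5, 10])

print(np.linalg.solve(A, b))   # [1. 3.]  -- the unique solution
print(np.linalg.inv(A) @ b)    # [1. 3.]  -- the same solution via x = A^{-1} b
```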

There are other situations in which matrix multiplication naturally occurs, but these are perhaps the two simplest ones. Also, it's not really a coincidence that the same notion of matrix multiplication works in both situations: we can think of a system of linear equations as a specification of a linear transformation, and we use this insight to solve the system.