Why are matrices ubiquitous but hypermatrices rare?
Note that in linear algebra matrices describe at least two different things: linear maps between vector spaces (we consider only finite-dimensional vector spaces here) and bilinear forms. When thinking of matrices as tensors, linear maps from $V$ to $W$ are elements of the space $V^* \otimes W$, whereas bilinear forms on $V \times W$ are elements of $V^* \otimes W^*$. Now you can easily generalize the latter case to more than two spaces, but not the former. Yet it is the former case where concepts like composition (matrix multiplication), determinants, eigenvalues, etc. apply. (Eigenvalues and determinants can be defined for bilinear forms on a vector space equipped with an inner product, but not for bilinear forms on plain vector spaces.) Of course you can consider spaces like $V^* \otimes W^* \otimes X$, but elements of this space are better thought of as linear maps from $V \otimes W$ to $X$ than as three-dimensional hypermatrices. So what is special about the number 2 is that there is a notion of duality for vector spaces, but no "$n$-ality".
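Concretely, the difference shows up in the change-of-basis rules: under a basis change $P$,
$$M' = P^{-1} M P \quad\text{(linear map: similarity)}, \qquad B' = P^{\mathsf T} B P \quad\text{(bilinear form: congruence)}.$$
Similarity preserves eigenvalues and the determinant, while congruence does not (take $P = 2I$: then $B' = 4B$), which is why these invariants belong to linear maps rather than to bare bilinear forms.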
A problem this broad cannot have a single answer. I see several, all of which justify the tremendous interest that mathematicians have so far devoted to matrices rather than to hypermatrices.
Ubiquity. Matrices are used by every species of mathematician and, beyond, by a large fraction of scientists. This is perhaps the only mathematical area to enjoy such versatility. Let me provide a few examples. The matrix exponential is fundamental in differential equations (more generally in dynamical systems) and in the theory of Lie groups. Symmetric matrices are used in quantum mechanics, statistics, optimisation and numerical analysis; they have deep relations with representation theory and combinatorics (see the solution of Horn's conjecture by Knutson & Tao). Positive matrices are encountered in probability and numerical analysis (discrete maximum principle). Matrix groups are used in representation theory, in number theory (including modular forms), and in dynamical systems (because of symmetries). When depending on parameters, matrices enter PDE theory as symbols.
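To illustrate just the first of these examples, here is a small numerical sketch (numpy/scipy, with random data and a tolerance of my own choosing) checking that $t \mapsto e^{tA}x_0$ solves $x' = Ax$:

```python
import numpy as np
from scipy.linalg import expm

# Sketch: the solution of x'(t) = A x(t), x(0) = x0, is x(t) = expm(t*A) @ x0.
# We verify the ODE numerically with a central finite difference.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x0 = rng.standard_normal(3)

t, h = 0.7, 1e-6
x = lambda s: expm(s * A) @ x0
dxdt = (x(t + h) - x(t - h)) / (2 * h)   # central difference approximation
assert np.allclose(dxdt, A @ x(t), atol=1e-4)
```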
Simplicity. The concept of a matrix is by definition simpler than that of a hypermatrix. It is natural that the study of matrices precedes that of hypermatrices (HM). This argument will fade with time, of course.
Richness. What makes a field particularly attractive is that it brings together several apparently unrelated concepts to produce unexpected results. This happens in matrix theory because, on the one hand, we may view matrices as linear maps (where conjugation is relevant) and, on the other hand, we may see them as bilinear or sesquilinear forms (where congruence is relevant). It becomes especially fruitful when we go back and forth between both points of view. This happens in the remarkable theorem that normal matrices are unitarily diagonalizable, but also in the parametrization of a Lie group by its Lie algebra via the exponential and the Hermitian square root. I am not at all familiar with the theory of HM, but if they do not naturally form an algebra, I doubt that their theory could be so rich; or if it is, it will be for completely different mathematical reasons.
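For a numerical illustration of the first theorem (a minimal sketch; the construction of the normal matrix and the tolerance are my own choices), the complex Schur form of a normal matrix comes out diagonal:

```python
import numpy as np
from scipy.linalg import schur

# A normal matrix A (A A* = A* A) built by conjugating a complex diagonal
# matrix by a random unitary; its complex Schur form A = Z T Z* is then
# diagonal, i.e. A is unitarily diagonalizable, as the theorem asserts.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
D = np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4))
A = Q @ D @ Q.conj().T                      # normal by construction

T, Z = schur(A, output='complex')           # A = Z T Z*, Z unitary
off = T - np.diag(np.diag(T))
assert np.allclose(off, 0, atol=1e-10)      # T is (numerically) diagonal
```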
To temper this claim, let me say that hypermatrices have been studied (although not so deeply) under the name of tensors. They are of great importance in differential geometry (the Ricci curvature tensor, with the many identities named after Christoffel, Gauss, Codazzi, ...) and in its applications: general relativity, elasticity. These are undoubtedly difficult topics, where even simple problems are not well understood. To mention one of them, there is still no satisfactory description of the twice-symmetric tensors of fourth order ($a_{ijkl}=a_{jikl}=a_{ijlk}$) that satisfy the Legendre–Hadamard condition $$\sum_{i,j,k,l}a_{ijkl}x_ix_j\xi_k\xi_l\ge0,\qquad\forall x\in\mathbb R^n,\ \xi\in\mathbb R^d.$$ It seems to me that the use of HM is too scattered, and therefore there is no research community specializing in all their aspects. Edit. Likewise, the notion of rank, although correctly defined in the case of tensors, is hard to manipulate and to compute explicitly. This is the reason why the exact algorithmic complexity of matrix multiplication is still not known (the operation $(A,B)\mapsto AB$ in $M_n(k)$ may be viewed as a $3$-tensor, and its tensorial rank governs the number of operations needed in an $n\times n$ multiplication).
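To make the last point concrete, Strassen's identity multiplies $2\times2$ matrices with seven scalar multiplications instead of eight, witnessing that the $2\times2$ matrix-multiplication tensor has rank at most $7$; a minimal numpy check (function name mine):

```python
import numpy as np

def strassen_2x2(A, B):
    """Strassen's seven-multiplication formula for the 2x2 product:
    a rank-7 decomposition of the matrix-multiplication tensor."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```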
An awfully simplistic answer: we work on two-dimensional paper, so two-dimensional matrices are very convenient to write down and compute with, while higher-dimensional hypermatrices are not.
So while we could represent multilinear forms, tensors, etc. as hypermatrices, we often don’t, because doing so is not nearly as fruitful as representing linear maps, bilinear forms, etc. as matrices. Instead, we usually use other notations when working with higher tensors by hand.
In computer algebra, the dimension of the paper is not significant, while some kinds of abstraction are harder, so in this context, higher tensors are much more often represented as hypermatrices.
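For instance (a minimal numpy sketch, with names of my own choosing), a trilinear form stored as a 3-way array and evaluated by index contraction:

```python
import numpy as np

# A trilinear form T on R^3 x R^4 x R^2 stored as a 3-way array
# ("hypermatrix"); evaluating it is just a triple index contraction.
rng = np.random.default_rng(2)
T = rng.standard_normal((3, 4, 2))
x, y, z = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(2)

value = np.einsum('ijk,i,j,k->', T, x, y, z)   # sum_{ijk} T[i,j,k] x_i y_j z_k

# Contracting only two slots views the same array as a bilinear map
# (x, y) -> T(x, y, .), echoing the V* ⊗ W* ⊗ X viewpoint above.
w = np.einsum('ijk,i,j->k', T, x, y)
assert np.allclose(value, w @ z)
```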