Why is the determinant defined in terms of permutations?
This is only one of many possible definitions of the determinant.
A more "immediately meaningful" definition could be, for example, to define the determinant as the unique function on $\mathbb R^{n\times n}$ such that
- The identity matrix has determinant $1$.
- Every singular matrix has determinant $0$.
- The determinant is linear in each column of the matrix separately.
(Or the same thing with rows instead of columns).
While this seems to connect to high-level properties of the determinant in a cleaner way, it is only half a definition because it requires you to prove that a function with these properties exists in the first place and is unique.
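As a quick numerical sanity check (not a substitute for the existence and uniqueness proof), the three defining properties can be verified for NumPy's built-in determinant; the matrix size, random seed, and coefficients below are arbitrary illustrative choices:

```python
import numpy as np

n = 4
# Property 1: the identity matrix has determinant 1.
assert np.isclose(np.linalg.det(np.eye(n)), 1.0)

# Property 2: a singular matrix (here rank 1) has determinant 0.
S = np.ones((n, n))
assert np.isclose(np.linalg.det(S), 0.0)

# Property 3: linearity in a single column (column 0, say), all other
# columns held fixed: det(..., a*u + b*v, ...) = a*det(..., u, ...) + b*det(..., v, ...).
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)
a, b = 2.0, -3.0
Au = A.copy(); Au[:, 0] = u
Av = A.copy(); Av[:, 0] = v
Aw = A.copy(); Aw[:, 0] = a * u + b * v
assert np.isclose(np.linalg.det(Aw), a * np.linalg.det(Au) + b * np.linalg.det(Av))
```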
It is technically cleaner to choose the permutation-based definition because it is obvious that it defines something, and then afterwards prove that the thing it defines has all of the high-level properties we're really after.
The permutation-based definition is also very easy to generalize to settings where the matrix entries are not real numbers (e.g. matrices over a general commutative ring) -- in contrast, the characterization above does not generalize easily without a close study of whether our existence and uniqueness proofs will still work with a new scalar ring.
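To see why the permutation-based definition generalizes so readily, note that it uses nothing but addition and multiplication. A minimal sketch (the function names are my own, not standard library API):

```python
from itertools import permutations
from math import prod

def sign(p):
    """Parity of permutation p: +1 if even, -1 if odd, counted via inversions."""
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Leibniz formula: sum over all permutations p of sign(p) * prod_i A[i][p[i]].

    Only + and * are used, so the same code works verbatim for entries in
    any commutative ring: ints, Fractions, polynomials, and so on.
    """
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

For example, `det([[1, 2], [3, 4]])` returns `-2`, and feeding in `fractions.Fraction` entries gives an exact rational result with no change to the code.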
The amazing fact is that matrices, it seems, were originally developed to study determinants, not the other way around. The "formula" definition of the determinant you give is known as the Leibniz formula. Let me quote some lines from the following source, Tucker (1993):
Matrices and linear algebra did not grow out of the study of coefficients of systems of linear equations, as one might guess. Arrays of coefficients led mathematicians to develop determinants, not matrices. Leibnitz, co-inventor of calculus, used determinants in 1693 about one hundred and fifty years before the study of matrices in their own right. Cramer presented his determinant-based formula for solving systems of linear equations in 1750. The first implicit use of matrices occurred in Lagrange's work on bilinear forms in the late 18th century.
--
In 1848, J. J. Sylvester introduced the term "matrix," the Latin word for womb, as a name for an array of numbers. He used womb, because he viewed a matrix as a generator of determinants. That is, every subset of k rows and k columns in a matrix generated a determinant (associated with the submatrix formed by those rows and columns).
You would probably have to dig through historical texts and articles to find out exactly why Leibniz devised the definition; most likely he had an intuition that it could lead to a breakthrough in understanding the connection between the coefficients of a system of equations and its solution...
Hint:
Determinants appear in the solution of linear systems of equations, among other places. If you permute the equations, the solution cannot change, so the determinant must treat all rows symmetrically: no row can play a privileged role. This is why it is a combination of products of terms $a_{ip_i}$, taking exactly one entry from each row and each column.
This explains the pattern $$\sum_p \sigma_p\prod_i a_{ip_i},$$ whose sum-of-products structure makes the expression multilinear in the rows. Moreover, the form must be antisymmetric, so that two equal rows yield a zero determinant (signaling that the system has no unique solution), and this explains why $\sigma_p=\pm1$ is the parity of the permutation.
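The antisymmetry claimed above is easy to check numerically; here is a small sketch using NumPy's determinant (the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Swapping two rows flips the sign of the determinant.
B = A.copy()
B[[0, 1]] = B[[1, 0]]
assert np.isclose(np.linalg.det(B), -np.linalg.det(A))

# Two equal rows force the determinant to zero: by antisymmetry,
# swapping them negates det(C) yet leaves C unchanged, so det(C) = -det(C) = 0.
C = A.copy()
C[1] = C[0]
assert np.isclose(np.linalg.det(C), 0.0)
```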