Why is the definition of the determinant so weird?

Look at the properties that signed volume has. Think of it as a function $d : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}$ ($n$ factors), taking $n$ vectors to the signed volume of the parallelepiped they span.

(i) It is multilinear, i.e. linear in each parameter separately. (ii) If you swap two parameters, the sign flips. (iii) $d(e_1,\ldots,e_n) = 1$.
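For a quick sanity check, here is a small numerical sketch (it leans on numpy, which is my choice of tool and not part of the argument) that the usual determinant, viewed as a function of the columns, really does satisfy (i)–(iii):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))

# (i) multilinearity, checked in column 0 with the other columns fixed:
x, y = rng.standard_normal(3), rng.standard_normal(3)
Mx, My, Mxy = M.copy(), M.copy(), M.copy()
Mx[:, 0], My[:, 0], Mxy[:, 0] = x, y, 2 * x + 3 * y
assert np.isclose(np.linalg.det(Mxy),
                  2 * np.linalg.det(Mx) + 3 * np.linalg.det(My))

# (ii) swapping two columns flips the sign:
assert np.isclose(np.linalg.det(M[:, [1, 0, 2]]), -np.linalg.det(M))

# (iii) the standard basis (the identity matrix) has determinant 1:
assert np.isclose(np.linalg.det(np.eye(3)), 1.0)
```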

From these properties alone you can derive the textbook formula.

So, if you think of the above as defining the determinant, the definition is far from weird.

Addendum:

Here is a quick sketch of how we obtain the formula:

As Ian noted in the comments, (iii) says that the determinant of the identity is one.

From (ii), if two of the parameters to $d$ are the same, then swapping them changes nothing yet flips the sign, so the value equals its own negative and must be zero.

Take a square matrix $A$. Then the $j$th column is $\sum_{\sigma=1}^n A_{\sigma,j} e_\sigma$.

Taking $\det A$ to mean $d$ applied to the columns of $A$ and expanding with (i), we have $\det A = \sum_{\sigma_1 =1}^n \cdots \sum_{\sigma_n =1}^n A_{\sigma_1,1} \cdots A_{\sigma_n,n}\, d(e_{\sigma_1},\ldots,e_{\sigma_n})$.

Now note that $d(e_{\sigma_1},...,e_{\sigma_n}) = 0$ whenever any index is repeated. Hence we can replace the sum $\sum_{\sigma_1 =1}^n \cdots \sum_{\sigma_n =1}^n$ by $\sum_{\sigma \in S}$, where $S$ is the set of permutations $\sigma: \{1,...,n\} \to \{1,...,n\} $.

Hence we have $\det A = \sum_{\sigma \in S} A_{\sigma_1,1} \cdots A_{\sigma_n,n} d(e_{\sigma_1},...,e_{\sigma_n})$.

Using (ii) & (iii), we can show that $d(e_{\sigma_1},\ldots,e_{\sigma_n}) = \operatorname{sgn} \sigma$ (sorting the indices contributes a factor of $-1$ per swap, and once sorted (iii) gives $1$), and we end up with the textbook formula $\det A = \sum_{\sigma \in S} (\operatorname{sgn} \sigma)\, A_{\sigma_1,1} \cdots A_{\sigma_n,n}$.
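If it helps to see this formula in action, here is a minimal Python sketch (the function names are mine, not standard) that sums over permutations exactly as written above:

```python
from itertools import permutations
from math import prod

def sgn(p):
    # Sign of a permutation: (-1) to the number of inversions.
    n = len(p)
    inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return (-1) ** inversions

def leibniz_det(A):
    # det A = sum over permutations sigma of sgn(sigma) * prod_j A[sigma(j), j].
    n = len(A)
    return sum(sgn(p) * prod(A[p[j]][j] for j in range(n))
               for p in permutations(range(n)))

print(leibniz_det([[1, 2], [3, 4]]))                    # 1*4 - 2*3 = -2
print(leibniz_det([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))   # 25
```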


Take a system of equations where the coefficients are variables, e.g.

$$a x + b y = e$$ $$c x + d y = f$$

Solve it:

$$x = \frac{d e - b f}{a d - b c}, \qquad y = \frac{a f - c e}{a d - b c}$$

Notice that both expressions have the same denominator, namely $a d - b c$. This can be proven to hold for an arbitrary $n \times n$ system: the solutions share a common denominator, and that common denominator is the determinant of the system. [1]
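As a quick illustration, here is a sketch using sympy (my choice of tool; the variable names mirror the system above) that solves the symbolic system and exposes the shared denominator:

```python
import sympy as sp

a, b, c, d, e, f, x, y = sp.symbols('a b c d e f x y')
sol = sp.solve([a*x + b*y - e, c*x + d*y - f], [x, y])

print(sp.cancel(sol[x]))  # (d*e - b*f)/(a*d - b*c), up to term ordering
print(sp.cancel(sol[y]))  # (a*f - c*e)/(a*d - b*c), up to term ordering

# The shared denominator is the determinant of the coefficient matrix:
print(sp.Matrix([[a, b], [c, d]]).det())  # a*d - b*c
```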

This can be made into a formal definition; see for instance the expository paper,

  • Garibaldi, Skip. “The Characteristic Polynomial and Determinant Are Not Ad Hoc Constructions.” The American Mathematical Monthly, vol. 111, no. 9, 2004, pp. 761–778. http://www.jstor.org/stable/4145188.

[1] To be entirely fair, we could also have chosen the negative of what we normally call the "determinant." If you like to think of the determinant as a signed volume, this choice amounts to choosing a left-handed or a right-handed orientation on space. Such a choice is of course completely arbitrary, but it makes little difference as long as we all agree on the same one.


Another way to interpret the determinant arises naturally from the alternating product construction on vector spaces. Briefly, given a vector space $V$, the space $\Lambda^n V$ is defined to be generated by formal expressions of the form $x_1 \wedge \ldots \wedge x_n$ for $x_1, \ldots, x_n \in V$, subject to the relations:

  1. The wedge product is linear in each of the terms, i.e. \begin{equation} x_1 \wedge \ldots \wedge (\lambda_1 x_i + \lambda_2 x_i') \wedge \ldots \wedge x_n = \lambda_1 (x_1 \wedge \ldots \wedge x_i \wedge \ldots \wedge x_n) + \lambda_2 (x_1 \wedge \ldots \wedge x_i' \wedge \ldots \wedge x_n). \end{equation}
  2. The wedge product is zero if any two adjacent terms are equal: \begin{equation} x_1 \wedge \ldots \wedge y \wedge y \wedge \ldots \wedge x_n = 0. \end{equation} (Note that since also $\ldots \wedge (y+z) \wedge (y+z) \wedge \ldots = 0$, expanding by linearity kills the $y \wedge y$ and $z \wedge z$ terms and leaves $\ldots \wedge y \wedge z \wedge \ldots + \ldots \wedge z \wedge y \wedge \ldots = 0$, i.e. \begin{equation} \ldots \wedge y \wedge z \wedge \ldots = -(\ldots \wedge z \wedge y \wedge \ldots). \end{equation} This is the reason for the name "alternating product" or "antisymmetric product".)

Now, it turns out that if $V$ is an $n$-dimensional vector space, then $\Lambda^k V$ is an $\binom{n}{k}$-dimensional vector space; in particular, $\Lambda^n V$ is a 1-dimensional vector space. Also, for any linear transformation $T : V \rightarrow W$, it is easy to define a corresponding linear transformation $\Lambda^k T : \Lambda^k V \rightarrow \Lambda^k W$ such that $(\Lambda^k T)(x_1 \wedge \ldots \wedge x_k) = Tx_1 \wedge \ldots \wedge Tx_k$.

Now, the interpretation of the determinant is as follows: given a linear operator $T : V \to V$ on an $n$-dimensional vector space, then $\det T$ is simply defined to be the unique scalar such that $\Lambda^n T$ is equal to multiplication by $\det T$. And for a matrix $A \in M_{n \times n}(F)$, $\det A$ is the determinant of the corresponding linear operator on $F^n$.
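To make this concrete, here is a small Python sketch (the function name is my own) that computes $\det A$ exactly this way: expand $Ae_1 \wedge \ldots \wedge Ae_n$ by multilinearity into wedges of basis vectors, drop any wedge with a repeated factor, and sort each survivor with adjacent swaps, flipping the sign on each swap:

```python
from itertools import product

def det_via_top_wedge(A):
    # The scalar by which Lambda^n A acts on e_1 ^ ... ^ e_n.
    n = len(A)
    total = 0
    for idx in product(range(n), repeat=n):  # expand: Ae_j = sum_i A[i][j] e_i
        if len(set(idx)) < n:                # repeated factor, so the wedge is 0
            continue
        coeff = 1
        for col, row in enumerate(idx):
            coeff *= A[row][col]
        # Bubble-sort the indices; each adjacent swap flips the sign (relation 2).
        sign, seq = 1, list(idx)
        for i in range(n):
            for j in range(n - 1):
                if seq[j] > seq[j + 1]:
                    seq[j], seq[j + 1] = seq[j + 1], seq[j]
                    sign = -sign
        total += sign * coeff                # coefficient on e_1 ^ ... ^ e_n
    return total

print(det_via_top_wedge([[1, 2], [3, 4]]))  # 1*4 - 2*3 = -2
```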

This definition has some distinct advantages. For example, it makes it clear why the determinant is multiplicative: $\Lambda^n (T \circ U) = (\Lambda^n T) \circ (\Lambda^n U)$, so $\det(T \circ U) = \det(T) \det(U)$. It also gives a relatively natural proof that the determinant of a singular linear operator is 0: just choose a basis that includes a vector in the null space. On the other hand, the most straightforward way to prove the formula for the dimension of $\Lambda^k V$ uses the determinant as a tool, which would make this a circular definition. Even in that "bootstrapping" phase, however, keeping this interpretation in mind helps motivate the usual initial definition of the determinant.
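For instance, here is a quick numerical spot check of multiplicativity (a check, not a proof) reusing the `det_via_top_wedge` sketch above:

```python
import random

n = 3
A = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]
B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]
AB = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]

assert det_via_top_wedge(AB) == det_via_top_wedge(A) * det_via_top_wedge(B)
```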