Born Interpretation of Wave Function
Well there are multiple reasons, but a very important one is that it can be proven (from the Schrödinger equation) that
$$\frac{\mathrm d}{\mathrm dt}\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=0$$
so that, if at any moment in time we have $\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=1$, this will remain true at any other time.
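A sketch of that proof, under assumptions not spelled out above: a single particle of mass $m$ in a real potential $V(\boldsymbol x)$, and a wave function that falls off fast enough at infinity. The Schrödinger equation $i\hbar\,\partial_t\psi=-\frac{\hbar^2}{2m}\nabla^2\psi+V\psi$ and its complex conjugate give
$$\partial_t|\psi|^2=\psi^*\partial_t\psi+\psi\,\partial_t\psi^*=\frac{i\hbar}{2m}\left(\psi^*\nabla^2\psi-\psi\nabla^2\psi^*\right)=-\nabla\cdot\boldsymbol j,\qquad \boldsymbol j=\frac{\hbar}{2mi}\left(\psi^*\nabla\psi-\psi\nabla\psi^*\right)$$
The potential terms cancel because $V$ is real. Integrating over all space and using the divergence theorem turns the right-hand side into a surface term at infinity, which vanishes under the fall-off assumption, so the integral of $|\psi|^2$ does not change in time.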
On the other hand, the integral of $|\psi|$ is not time independent (its time derivative does not vanish in general), so a consistent normalization is not possible.
We need the integral to be time independent, because otherwise a probabilistic interpretation wouldn't be possible: the probability of finding the particle somewhere has to be $1$. If we used $|\psi|$ as a probability distribution, and at some point in time its integral equalled $1$, this would change over time, which wouldn't make any sense. On the other hand, as I already stated, we can think of $|\psi|^2$ as a probability density precisely because its integral is time independent. So if at any point in time the integral of $|\psi|^2$ equals $1$, this will remain true at any other point in time.
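To see the difference between the two candidates numerically, here is a minimal sketch. It assumes a specific example that is not part of the argument above: a free Gaussian wave packet at rest in one dimension, in units with $\hbar=m=1$, whose closed-form time evolution is used directly. The function names and grid parameters are hypothetical choices.

```python
import cmath
import math

def gaussian_packet(x, t, sigma=1.0):
    """Free Gaussian wave packet at rest, in units with hbar = m = 1 (assumed example)."""
    z = 1.0 + 1j * t / (2.0 * sigma**2)  # complex spreading factor
    prefactor = (2.0 * math.pi * sigma**2) ** -0.25 / cmath.sqrt(z)
    return prefactor * cmath.exp(-x**2 / (4.0 * sigma**2 * z))

def integrate(f, half_width=30.0, n=6001):
    """Trapezoidal rule for f on [-half_width, half_width]."""
    dx = 2.0 * half_width / (n - 1)
    values = [f(-half_width + i * dx) for i in range(n)]
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))

def norms(t):
    """Return (integral of |psi|^2, integral of |psi|) at time t."""
    norm_sq = integrate(lambda x: abs(gaussian_packet(x, t)) ** 2)
    norm_abs = integrate(lambda x: abs(gaussian_packet(x, t)))
    return norm_sq, norm_abs

if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 4.0):
        norm_sq, norm_abs = norms(t)
        print(f"t = {t:3.1f}   int |psi|^2 = {norm_sq:.6f}   int |psi| = {norm_abs:.6f}")
```

For this packet the first column should stay at $1$ while the second grows as the packet spreads, so a normalization based on $|\psi|$ fixed at $t=0$ would be lost at later times.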
Also, this has nothing to do with the Schrödinger equation being of 2nd order: the Dirac equation is a 1st order equation and (in some sense) the probability distribution is still $\psi^\dagger\psi$.
Edit: there is another explanation that might be more "physical", closer to our intuition. You probably know about the double-slit experiment, a standard way of introducing QM. When learning about this experiment, we are given two scenarios. First, think of the double slit being hit by light. We know from optics about the phenomenon of interference: the electromagnetic field is radiated from each slit, and the two contributions interfere when they reach the screen. The interference pattern is easily understood, mathematically, when we think of the electric field as a wave propagating through space. We know that the intensity observed at the screen is the modulus squared of $\boldsymbol E$, where $\boldsymbol E=\boldsymbol E_1+\boldsymbol E_2$. When calculating the modulus squared, we get the expected interference (cross) term. The observed intensity is just $I(x)=|\boldsymbol E(x)|^2$.
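The cross term can be made explicit with a small sketch. It makes simplifying assumptions that are not in the text above: the field is treated as a complex scalar rather than a vector, each slit is a point source emitting a spherical wave, and the wavelength, slit separation and screen distance are arbitrary hypothetical values.

```python
import cmath
import math

def slit_field(x, slit_y, wavelength=1.0, screen_distance=100.0):
    """Complex scalar amplitude at screen position x from a point-like slit at slit_y."""
    path = math.hypot(screen_distance, x - slit_y)  # slit-to-screen distance
    k = 2.0 * math.pi / wavelength                  # wave number
    return cmath.exp(1j * k * path) / path

def intensities(x, separation=5.0):
    """Return (I_1, I_2, cross term, total intensity) at screen position x."""
    e1 = slit_field(x, +separation / 2.0)
    e2 = slit_field(x, -separation / 2.0)
    i1 = abs(e1) ** 2
    i2 = abs(e2) ** 2
    cross = 2.0 * (e1.conjugate() * e2).real  # interference term 2 Re(E_1^* E_2)
    total = abs(e1 + e2) ** 2                 # |E_1 + E_2|^2
    return i1, i2, cross, total

if __name__ == "__main__":
    for x in (0.0, 5.0, 10.0, 15.0, 20.0):
        i1, i2, cross, total = intensities(x)
        print(f"x = {x:5.1f}   I1 + I2 = {i1 + i2:.3e}   cross = {cross:+.3e}   total = {total:.3e}")
```

The total is always $I_1+I_2$ plus the cross term; it is the cross term, which changes sign along the screen, that produces the bright and dark fringes.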
On the other hand, if we think of the same experiment performed with electrons, we know that the interference pattern is still produced. So, taking inspiration from classical electrodynamics, we think of another wave propagating through space, such that its modulus squared gives the intensity on the screen, i.e., the modulus squared of the wave function is like the intensity of the light: where it is high, there is a high chance of finding an electron. In this way, we can think of $|\psi|^2$ as a probability distribution, in the same way we can think of $|\boldsymbol E|^2$ as a probability distribution of the photon. There is actually a lot in QM taken from classical electromagnetism.
For the record, I must say that this analogy between the electric field and the wave-function is rather limited, and should not be pushed too far: it will lead to incorrect conclusions. The electric field is not the wave function of the photon.
Your suspicion that the square of the modulus of the wave function has to be used as the probability density is indeed related to the Schrödinger equation. I approach the question using the theory of eigenvalue equations. The (time-independent) Schrödinger equation is an eigenvalue equation. The wave function is a superposition of eigenfunctions of the Schrödinger equation. In other words, the wave function can be expanded in a complete set of eigenfunctions, $\Psi = \sum_l a_l\psi_l$, each of which fulfills the Schrödinger equation for a particular energy eigenvalue $E_l$: $H\psi_l =E_l\psi_l$. The coefficients $a_l$ express the probability (actually it is $|a_l|^2$) of finding a particle with wave function $\Psi$ in state $\psi_l$.
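As an illustration of the expansion, here is a minimal sketch for one concrete system that I am choosing for convenience (it is not mentioned above): the infinite square well on $[0, L]$, with the example state $\Psi(x)\propto x(L-x)$. The helper names and the numerical integration scheme are hypothetical choices.

```python
import math

L = 1.0  # box width (hypothetical choice of units)

def eigenfunction(n, x):
    """n-th normalized energy eigenfunction of the infinite square well on [0, L]."""
    return math.sqrt(2.0 / L) * math.sin(n * math.pi * x / L)

def state(x):
    """Example state Psi(x) proportional to x (L - x), normalized to unit norm."""
    return math.sqrt(30.0 / L**5) * x * (L - x)

def overlap(f, g, n_points=4000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [0, L] (real functions)."""
    dx = L / n_points
    return dx * sum(f((i + 0.5) * dx) * g((i + 0.5) * dx) for i in range(n_points))

def coefficients(n_max=9):
    """Expansion coefficients a_l = integral of psi_l * Psi for l = 1 .. n_max."""
    return [overlap(lambda x, l=l: eigenfunction(l, x), state) for l in range(1, n_max + 1)]

if __name__ == "__main__":
    a = coefficients()
    for l, a_l in enumerate(a, start=1):
        print(f"l = {l}   a_l = {a_l:+.6f}   |a_l|^2 = {a_l**2:.6f}")
    print("sum of |a_l|^2 =", sum(a_l**2 for a_l in a))
```

Because $\Psi$ is normalized, the $|a_l|^2$ should add up to a value just below $1$ for the truncated sum, approaching $1$ as more eigenfunctions are included. This is what allows them to be read as probabilities.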
The different $\psi_l$ have to be orthogonal to each other, not only for mathematical reasons but also for a physical one. If an energy measurement is carried out and the particle governed by the Schrödinger equation is found to be in energy state $E_n$, then it will stay in that energy state. That means that after the measurement $\Psi =\psi_n$ and $a_l= \delta_{ln}$ (the wave function $\Psi$ collapses to a particular function $\psi_n$). In the quantum-mechanical formalism this is expressed by saying that the projection of the wave function $\Psi$ on the eigenfunction of another energy state $m \neq n$ vanishes, $\int \Psi^{\star} \psi_m dx =0$, whereas the projection of the wave function on the eigenfunction of state $n$ is one: $\int \Psi^{\star} \psi_n dx =1$. You can check this by putting $\Psi = \sum_l a_l\psi_l$ with $a_l= \delta_{ln}$ in the integral. The eigenfunctions of the Schrödinger eigenvalue equation fulfill this property very well (the eigenfunctions can of course easily be normalized so that they fulfill $\int \psi_m^{\star} \psi_m dx =1$ for all $m$). From this we learn that in a particular state $m$, $\int |\psi_m|^2 dx =1$. The same consideration can be made for any other well-defined operator and its corresponding eigenstates.

Finally, as a physicist you would like to make sense of this relation $\int |\psi_m|^2 dx =1$. Born ended up stating that it means that $|\psi_m|^2$ is the probability density, i.e. $|\psi_m|^2 dx$ is the probability of finding the particle in the interval $dx$. Here $x$ does not necessarily have to mean position; a priori it could also be momentum $p$, etc. But the concept remains the same: $|\psi_m|^2$ would then be the probability density for finding the particle in the momentum interval $dp$. And obviously, trying to define the probability density via $\int |\psi_m| dx =1$ does not make much sense, as this expression does not appear in the eigenmode formalism.
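The substitution check mentioned above can be written out in one line. Assuming the eigenfunctions are orthonormal, $\int \psi_l^{\star}\psi_m\,dx=\delta_{lm}$, and taking the post-measurement coefficients $a_l=\delta_{ln}$:
$$\int \Psi^{\star}\psi_m\,dx=\sum_l a_l^{\star}\int \psi_l^{\star}\psi_m\,dx=\sum_l \delta_{ln}\,\delta_{lm}=\delta_{nm}$$
which is $0$ for $m\neq n$ and $1$ for $m=n$. For arbitrary coefficients the same orthonormality gives $\int |\Psi|^2dx=\sum_l |a_l|^2$, so normalizing $\Psi$ is the same as demanding that the probabilities $|a_l|^2$ add up to one.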
It may be useful to stress that eigenvalue equations like the Schrödinger equation appear everywhere in physics, for instance in electrodynamics, and their eigenfunctions fulfill the same relations as those of the Schrödinger equation. This means the formalism does not fall from the sky; it is also used in classical physics. It is the way to solve eigenvalue equations in mathematical physics. But on top of it, a typically quantum-mechanical assumption is used: the collapse of the wave function upon measurement to a particular eigenfunction. With that, the orthogonality relations finally get a particular physical, even quantum-mechanical, meaning.
My explanation is certainly not the most rigorous one; in the full-fledged QM formalism with bra and ket vectors it can be shown more rigorously. I recommend that you continue reading the book: although I don't know it, it will certainly demonstrate this in some way later on.