Why don't humans perceive sound waves as twice the frequency they are?
Humans hear the correct perceptual signal for a sound wave of that frequency.
We really can't say much more than that. The psychology of acoustics is very complicated and could fill volumes.
It's closer to the truth to say that we have cells that resonate at specific frequencies. Our brain identifies which cells are resonating at any point in time and constructs the signal from that: it receives information that cell A or cell B is signalling. The association between those neural signals and frequencies is a learned response that we pick up early on, as infants or perhaps even in the womb.
So obviously the audible frequency is twice the envelope
Sorry, that's wrong. If you play two tones (say 440 Hz and 267 Hz), you simply hear two tones at two different frequencies: there are two excitations at different spots on the basilar membrane and two different sets of nerves firing. You don't hear the envelope at all; they just sound like two steady-state tones.
"Beats" only happen when you have two frequencies that are VERY close together, say 237 Hz and 238 Hz. In this case, your ear can't resolve the frequency difference anymore but you hear a single tone at 237.5 Hz that's amplitude modulated at 1 Hz.
Taking the magnitude (as Wikipedia says, i.e. by squaring A) gives you an audible frequency of $2f_T$
No. You can square the amplitude to estimate power or energy, but there is no mechanism that would square the actual waveform. If you play 100 Hz, you hear 100 Hz; that's all there is to it.
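For illustration, here is where a factor of two would come from if the waveform itself were squared (a standard trigonometric identity, added here as a worked example):

$$A^2\cos^2(2\pi f t) = \frac{A^2}{2} + \frac{A^2}{2}\cos\big(2\pi (2f)\, t\big)$$

Squaring a tone at $f$ produces a constant term plus a component at $2f$. The constant $A^2/2$ is the time-averaged power, which is what squaring the amplitude is used to estimate; the $2f$ term would only appear in what we hear if something squared the pressure waveform itself, which is the mechanism the answer says is absent.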
The human perception of a wave at frequency $f$ is the human perception of a wave at frequency $f$. There is no "objective" qualia for frequency $f$ other than what people perceive, so it's nonsensical to ask whether people, when they hear $f$, perceive $2f$; there is no meaning to "perceive $2f$" other than "experience the qualia associated with $2f$", and clearly when someone hears $f$, they experience the qualia associated with $f$, not $2f$.
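The human ear is basically a device for detecting components of the Fourier transform of sound. The reason that $f_2-f_1$ dominates with beats is that the sum of the two tones can be rewritten as a product, $\cos(2\pi f_1 t)+\cos(2\pi f_2 t) = 2\cos\big(\pi(f_2+f_1)t\big)\cos\big(\pi(f_2-f_1)t\big)$. If $f_2+f_1$ is high enough compared with $f_2-f_1$, the slow envelope (whose magnitude repeats at $f_2-f_1$) is not significantly affected by being multiplied by the fast carrier wave at $(f_2+f_1)/2$.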
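To make the Fourier-component view concrete, here is a small sketch that computes a few bins of a naive discrete Fourier transform of the 237 Hz + 238 Hz signal. With a one-second window, bin $k$ corresponds to $k$ Hz. The sample rate (1000 Hz), window length, and helper names (`dft_bin`, `two_tone_samples`) are assumptions for illustration; the ear is not literally computing a DFT.

```python
import cmath
import math

def dft_bin(samples: list[float], k: int) -> complex:
    """Naive DFT coefficient X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_total) for n, x in enumerate(samples))

def two_tone_samples(f1: float, f2: float, sample_rate: int, duration: float) -> list[float]:
    """Sampled sum of two equal-amplitude cosines (hypothetical test signal)."""
    n_samples = int(sample_rate * duration)
    return [
        math.cos(2 * math.pi * f1 * n / sample_rate) + math.cos(2 * math.pi * f2 * n / sample_rate)
        for n in range(n_samples)
    ]

if __name__ == "__main__":
    samples = two_tone_samples(237, 238, sample_rate=1000, duration=1.0)
    # Bins 237 and 238 hold the tones; bin 1 is the beat rate, bin 475 is f1 + f2.
    for freq in (1, 237, 238, 475):
        print(f"{freq:>4} Hz: |X| = {abs(dft_bin(samples, freq)):.3f}")
```

Only the 237 Hz and 238 Hz bins come out large; the 1 Hz beat rate is a property of the envelope, not a separate Fourier component of the signal.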