How to distinguish female and male voices via Fourier analysis?

This has been extensively studied in linguistics and acoustics. Humans and other primates predict speaker gender through a combination of fundamental frequency $F_0$ ("pitch") and Vocal-Tract-Length estimates ($VTL$) which are a proxy for body size.

Sometimes "formant dispersion" is used as a proxy for $VTL$. It is usually defined as $$\frac{\sum_{i=1}^{n-1}(F_{i+1}-F_i)}{n-1}$$ where $F_i$ is the $i$th formant frequency and $n$ is the number of formants measured. However, this measure is problematic: the sum telescopes to $(F_n-F_1)/(n-1)$, so it depends only on the lowest and highest formants measured and captures no information about midrange formants or about formant positioning. See Masculine voices signal men's threat potential in forager and industrial societies
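As a rough illustration, formant dispersion is just the mean spacing between adjacent formants, so it can be sketched in a few lines of NumPy. The function name and the formant values below are hypothetical and only there for demonstration; in practice the formants would come from a formant tracker. The second call shows the weakness mentioned above: moving a midrange formant leaves the result unchanged.

```python
import numpy as np

def formant_dispersion(formants_hz):
    """Mean spacing (Hz) between adjacent formants F1..Fn of one speaker."""
    f = np.asarray(formants_hz, dtype=float)
    if f.size < 2:
        raise ValueError("need at least two formants")
    # np.diff gives the n-1 differences F_{i+1} - F_i
    return float(np.mean(np.diff(f)))

# Hypothetical formant values in Hz, for illustration only
evenly_spaced = [500.0, 1500.0, 2500.0, 3500.0]
shifted_f2 = [500.0, 900.0, 2500.0, 3500.0]  # same F1 and F4, different F2

print(formant_dispersion(evenly_spaced))
print(formant_dispersion(shifted_f2))  # same value: only F1 and F4 matter
```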

An alternative $VTL$ measure is 'formant position', defined as: $$\frac{\sum_{i=1}^nF'_i}{n}$$ where $F'_i$ is the $i$th formant standardized across the population measured.
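A minimal sketch of formant position, assuming that "standardized" means z-scoring each formant across the speakers in the sample (using the sample standard deviation is my assumption, as are the function name and the example values):

```python
import numpy as np

def formant_position(formants_hz):
    """Formant position per speaker.

    formants_hz: 2-D array-like, rows = speakers, columns = formants F1..Fn.
    Assumes z-score standardization of each formant across speakers.
    """
    f = np.asarray(formants_hz, dtype=float)
    # Standardize each formant (column) across the measured population
    z = (f - f.mean(axis=0)) / f.std(axis=0, ddof=1)
    # Average the standardized formants within each speaker (row)
    return z.mean(axis=1)

# Hypothetical formants (Hz) for three speakers, for illustration only
population = [
    [700.0, 1800.0, 2800.0, 3900.0],
    [500.0, 1500.0, 2500.0, 3500.0],
    [600.0, 1650.0, 2650.0, 3700.0],
]
print(formant_position(population))
```

Lower values indicate formants that sit lower than the population average, which is what you would expect from a longer vocal tract. Unlike dispersion, every formant contributes to the result.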

However, the usual finding is that a combination of pitch and estimates of vocal tract length gives us information about speaker gender and sexual maturity. Looking at male vs female spectra, on average you'd see that male voices are lower-pitched and have more closely spaced formants.
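Since the question asks about Fourier analysis, here is one possible way to get at the pitch cue: estimate $F_0$ from the autocorrelation of the signal, computed via the FFT. This is a sketch under stated assumptions, not a robust pitch tracker. The search range of 75 to 400 Hz, the function names, and the synthetic test tone are all my own choices, and real speech would need framing and voicing detection first.

```python
import numpy as np

def estimate_f0(signal, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 (Hz) as the strongest autocorrelation peak in [fmin, fmax]."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    # Autocorrelation via the power spectrum; zero-pad to 2n so the
    # result is a linear rather than circular correlation
    spectrum = np.fft.rfft(x, 2 * n)
    autocorr = np.fft.irfft(np.abs(spectrum) ** 2)[:n]
    # Only search lags that correspond to plausible pitch periods
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max + 1]))
    return sample_rate / best_lag

def harmonic_tone(f0, sample_rate, duration=0.5):
    """Synthetic stand-in for a voiced sound: five harmonics of f0."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

sr = 16000
print(estimate_f0(harmonic_tone(120.0, sr), sr))  # close to 120 Hz
print(estimate_f0(harmonic_tone(220.0, sr), sr))  # close to 220 Hz
```

The estimate is quantized to whole-sample lags, so it lands near, not exactly on, the true value.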

Acoustic correlates of talker sex and individual talker identity are present in a short vowel segment produced in running speech

Vocal Tract Length Perception and the Evolution of Language

Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques, but see Formant frequencies and body size of speaker: a weak relationship in adult humans


My impression would be that the lower frequencies are more apparent in the male spectrum than the female spectrum.

If you want to build a nice test, my approach would be to determine an average male and an average female spectrum. Then you can see which of the two average spectra correlates best with the test person's spectrum.

However, you should take care of the noise in the measured spectra.
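The template idea above could be sketched roughly as follows. Everything here is an assumption for illustration: the synthetic harmonic tones stand in for real recordings, the sample rate and frame length are arbitrary, and averaging the magnitude spectrum over several windowed frames is just one simple way to reduce the influence of noise.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, assumed

def harmonic_tone(f0, rng, duration=1.0, noise=0.1):
    """Synthetic stand-in for a voiced sound: five harmonics plus white noise."""
    t = np.arange(int(duration * SAMPLE_RATE)) / SAMPLE_RATE
    tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))
    return tone + noise * rng.standard_normal(t.size)

def average_spectrum(signal, frame_len=1024):
    """Magnitude spectrum averaged over non-overlapping Hann-windowed frames."""
    x = np.asarray(signal, dtype=float)
    n_frames = x.size // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    window = np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames * window, axis=1)).mean(axis=0)

def best_match(test_spec, templates):
    """Return the label whose template correlates best (Pearson) with test_spec."""
    scores = {label: np.corrcoef(test_spec, tmpl)[0, 1]
              for label, tmpl in templates.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
# Build each template as the mean spectrum of a few hypothetical speakers
templates = {
    "male": np.mean([average_spectrum(harmonic_tone(f, rng))
                     for f in (110, 120, 130)], axis=0),
    "female": np.mean([average_spectrum(harmonic_tone(f, rng))
                       for f in (200, 210, 220)], axis=0),
}
print(best_match(average_spectrum(harmonic_tone(115, rng)), templates))
```

With real recordings you would need many speakers per group and comparable recording conditions, since the correlation is sensitive to anything that changes the overall shape of the spectrum.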