Why can't the human voice produce a Shepard tone?

The human voice box produces a fundamental frequency and its harmonics because the mechanism is like that of a relaxation oscillator. However, we have limited control over the relative amplitude of the harmonics (we do have some - that is how we change the "color" of a tone we sing, and the sound of vowels).

In order to produce the Shepard scale, you need to be able to control the relative amplitude of the different harmonics - especially the ratio of the lowest two harmonics. To a limited extent we do this when we change the vowel that we sing - with the "oo" sound having few "really high" harmonics, while the "ah" has lots. For example, from the hyperphysics site we get this image:

[Image from the HyperPhysics site: harmonic content of the voice]

showing that there is a lot of harmonic content in the voice. But it's not "evenly distributed", so if you were to drop by an octave, you would be creating a sound that is sufficiently different that you don't really get the feeling of an "eternal" scale.

I suspect the most important problem is that you would want to re-introduce the lowest harmonic with a slowly increasing amplitude, so that the note "returns to the lower range" without ever appearing to jump there. But the mechanism of the vocal cords is too simple to allow it.

Incidentally, when sopranos sing very high notes, many people lose the ability to distinguish what vowel they are singing since the harmonics are further apart, and the ear distinguishes between vowels by estimating the shape of the frequency envelope in the range up to a few kHz; when there are very few harmonics in that range, the shape cannot be determined. The "high C" (C7) has a frequency of 2093 Hz, so there might be just a couple of harmonics available to figure out the sound. That makes vowels in the highest register hard to distinguish.
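The harmonic-counting argument above can be checked with a few lines of Python. This is only a rough sketch: the 5 kHz cutoff is my own stand-in for "a few kHz", and `harmonics_below` is a hypothetical helper name, not something from a library.

```python
def harmonics_below(f0_hz, cutoff_hz=5000.0):
    """Return the harmonics of f0_hz (f0, 2*f0, ...) at or below cutoff_hz."""
    # cutoff_hz is an assumed value standing in for "a few kHz".
    count = int(cutoff_hz // f0_hz)
    return [f0_hz * n for n in range(1, count + 1)]


# Compare a low sung note (A3, 220 Hz) with the 2093 Hz note from the text.
for name, f0 in [("A3", 220.0), ("C7", 2093.0)]:
    print(name, len(harmonics_below(f0)))
```

With that assumed cutoff, the low note leaves the ear about two dozen harmonics to trace the vowel envelope with, while the 2093 Hz note leaves only two.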


I've programmed some Shepard tones and even a voice generator.

The human voice can't make that sound for the same reason that a single trombone, or even three, couldn't make it. If you had 12 trombones, you could conceivably put them on a wheel system so that the pitch of each one rises, and when the top one reaches the top of the range it is muted and sent back down to the lowest pitch. Perhaps someone has built a mechanical Shepard tone, but I doubt it, and emulating the sound with voices would require multiple singers. It is generally a digital effect, not an acoustic-instrument one.

The human voice is a monophonic sound generator (except for Tibetan tantric voice) with one major output channel, the mouth, and some lower-volume output channels (the cheeks, throat and nose), all fed by a single voice box.

It is the polyphonic nature of the Shepard tone that confuses the ear: it presents too many components for the ear to settle on one clearly defined pitch at any given time. It is similar to a chord of 12 or 20 notes, a very wide array of tones.

A Shepard tone requires either multiple oscillators changing pitch or multiple static oscillators going through multiple filters. The one I found on YouTube is especially good because it uses about 50 sines with soft attacks, so it's difficult to tell one sound from the next.

The human voice box cannot do something similar because it would need to generate at least a dozen controlled tones simultaneously for a basic Shepard tone illusion: tones which are equally spaced and cyclic in nature, i.e. the lowest tone's amplitude increases as the highest tone's decreases.
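To make the "equally spaced and cyclic" requirement concrete, here is a minimal sketch of the component table behind a Shepard tone. It assumes the usual construction with octave-spaced sine components and a raised-cosine loudness envelope over log-frequency; the function name `shepard_components`, the 8-octave range and the 27.5 Hz base are my own illustrative choices, not taken from any particular implementation.

```python
import math


def shepard_components(phase, n_octaves=8, base_hz=27.5):
    """Return (frequency_hz, amplitude) pairs for one point in the cycle.

    phase runs from 0.0 to 1.0. At 1.0 every component has risen exactly
    one octave, and the set is identical to the set at 0.0.
    """
    components = []
    for k in range(n_octaves):
        # Position of this component across the whole range, in [0, 1).
        pos = (k + phase % 1.0) / n_octaves
        # Components sit one octave apart (assumed spacing).
        freq = base_hz * 2.0 ** (pos * n_octaves)
        # Raised-cosine envelope: silent at both edges, loudest mid-range,
        # so the lowest tone fades in while the highest fades out.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * pos))
        components.append((freq, amp))
    return components


# Print the table a quarter of the way through one cycle.
for freq, amp in shepard_components(0.25):
    print(f"{freq:9.2f} Hz  amplitude {amp:.3f}")
```

Each of the eight components needs its own pitch and its own independently controlled amplitude at every instant, which is exactly the kind of control a single voice box does not offer. Turning this table into audio would additionally require summing one sine oscillator per component, which is left out here.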

Humans can barely produce a low tone and a high tone simultaneously and independently, let alone control the pitch and the volume of each precisely relative to the other. The voice box certainly can't produce multiple harmonics of equal volume and constant pitch spacing while also controlling their volumes.

Also, the human voice struggles to produce even a single clear, carefully pitched tone, whereas a Shepard tone needs multiple carefully controlled signals, either before or after filtering.

The voice box would have to have multiple independent resonators.