Why does our voice sound different on inhaling helium?

In order to properly understand this without any unnecessary "controversy", let's break the whole process of sound generation and perception into 5 important, but completely separate parts. We'll then proceed to explain each part using a few different examples and pieces of derivative logic:

  1. Vibration of the vocal folds
  2. Transmission of energy from vocal folds to air in the vocal tract
  3. Resonance and Attenuation in the vocal tract
  4. Transmission of energy from the end of the vocal tract (mouth) to the surrounding medium
  5. Reception and perception of sound by another human.

Now:

1. The frequency generated by the vocal folds depends on the tension in them and in the surrounding muscles. This is a neuromuscular process and is NOT affected by Helium or any other gas (at least in the short term).

So our vocal folds continue to vibrate at the same frequency in helium as in normal air.


2. Sound is produced by the transmission of the vibrations of the vocal folds to the air in the vocal tract. This "transmission" doesn't occur by any magic. The vocal folds - as they vibrate - push and pull columns of air in their immediate vicinity, not very differently from the way you may push a child on a swing at specific intervals, so as to produce sustained oscillations and brief enjoyment. (The "pull" in this analogy, though, is provided by gravity.)

The point is, the child oscillates at the same frequency at which you are pushing the swing. i.e. If you are pushing the swing once every N seconds, the child also completes a swing once every N seconds. This is true regardless of the weight of the child, correct?

Similarly, the air in the vocal tract also vibrates at the same frequency as the vocal cords. This, too, is true regardless of the mass of the air particles.

In other words, the frequency of sound does not change, regardless of the medium in which it is transmitted.
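To make the swing argument concrete, here is a minimal sketch that treats the pushed swing as a driven, damped oscillator and measures the frequency it settles into for two different masses. The model and every parameter value (stiffness, damping, force, the 2 Hz push rate) are illustrative assumptions of mine, not measurements of a real swing or of vocal folds.

```python
import math

def steady_state_frequency(mass, drive_hz=2.0, stiffness=40.0, damping=2.0,
                           force=1.0, dt=1e-3, total_time=60.0):
    """Integrate m*x'' + c*x' + k*x = F*cos(2*pi*f*t) and return the
    oscillation frequency (Hz) measured over the second half of the run,
    after the start-up transient has died away."""
    omega = 2.0 * math.pi * drive_hz
    x, v = 0.0, 0.0
    crossings = []  # times at which x passes upward through zero
    for i in range(int(total_time / dt)):
        t = i * dt
        a = (force * math.cos(omega * t) - damping * v - stiffness * x) / mass
        v += a * dt            # semi-implicit Euler step
        x_new = x + v * dt
        if t > total_time / 2 and x < 0.0 <= x_new:
            # linear interpolation of the zero-crossing time within this step
            crossings.append(t + dt * (-x) / (x_new - x))
        x = x_new
    # number of full cycles divided by the time they took
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])

for mass in (1.0, 4.0):
    print(f"mass {mass}: settles at {steady_state_frequency(mass):.3f} Hz")
```

Both masses should settle very close to the 2 Hz push rate, even though their natural frequencies differ by a factor of two. The mass changes how large the swing is, not how often it repeats.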

Time-Out

The last one was a doozy. Frequency of sound does not change? Then why on earth does helium sound different from normal air?

While the frequency of sound does not change, the SPEED of sound does. Why? Consider this old classical physics equation:

Kinetic Energy = 0.5 * m * v^2,

where m = mass and v = speed (let's not say 'velocity' for now)


Now, the vocal cords vibrate with the same force and at the same frequency whatever gas surrounds them. Thus the energy they convey must be the same in ALL media.

In other words, for a given constant value of Kinetic Energy, v^2 is inversely proportional to the mass of the particles.

This naturally means sound travels faster in Helium than in air, since helium atoms are much lighter than the nitrogen and oxygen molecules that make up air.
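For actual numbers, a quick check using the standard ideal-gas formula c = sqrt(gamma * R * T / M) may help. Note that this is the textbook formula rather than the kinetic-energy argument above, although it shows the same inverse dependence on particle mass. The 20 °C temperature and the rounded gas properties are my assumptions.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def speed_of_sound(gamma, molar_mass_kg, temperature_k=293.15):
    """Ideal-gas speed of sound in m/s: sqrt(gamma * R * T / M)."""
    return math.sqrt(gamma * R * temperature_k / molar_mass_kg)

# Assumed properties: dry air (diatomic, gamma ~ 1.4) and pure helium
# (monatomic, gamma = 5/3); molar masses in kg/mol.
c_air = speed_of_sound(1.4, 0.02897)
c_he = speed_of_sound(5.0 / 3.0, 0.004003)

print(f"air:    {c_air:.0f} m/s")
print(f"helium: {c_he:.0f} m/s")
print(f"ratio:  {c_he / c_air:.2f}")
```

With these inputs sound travels a little under three times faster in pure helium than in air. The gas in your throat after one breath from a balloon is a helium/air mixture, so the real ratio will be smaller.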

Now, we know the other old equation:

Speed = Wavelength x Frequency

Now since we know that the FREQUENCY of sound is the same in Helium and in Air, and the speed of sound is greater in Helium, it follows that the Wavelength of sound is greater in Helium than in Air. This is a very important conclusion, that bears directly on our next deduction.
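A small worked example of Speed = Wavelength x Frequency, holding the frequency fixed and changing only the gas. The 125 Hz tone and the rounded speeds (343 m/s for air, 1007 m/s for pure helium at room temperature) are illustrative assumptions.

```python
def wavelength(speed_m_s, frequency_hz):
    """Wavelength in metres, from speed = wavelength * frequency."""
    return speed_m_s / frequency_hz

FREQUENCY_HZ = 125.0   # assumed fundamental of a low speaking voice
SPEED_AIR = 343.0      # m/s, assumed
SPEED_HELIUM = 1007.0  # m/s, assumed (pure helium)

print(f"air:    {wavelength(SPEED_AIR, FREQUENCY_HZ):.2f} m")
print(f"helium: {wavelength(SPEED_HELIUM, FREQUENCY_HZ):.2f} m")
```

Same frequency, but the wave is roughly three times longer in helium.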


3. Now, we have a very important conclusion in our kitty - "Wavelength of sound is greater in Helium than in Air".

Remember that the vocal tract is often modelled (simplistically) as an open or closed tube. To refresh why that's important, see Wikipedia.

The vocal tract is actually not really a cylinder, but a fairly complex shape. This means it has areas of constriction and expansion that change depending on the position of your tongue, tension in the tract, and several other factors.

So in a sense, in these complex configurations, the vocal tract can be modelled as a series of tubes of varying diameters and varying levels of "closure" of either opening.

Now this means, that different parts of the vocal tract, depending on their geometrical configuration and their material characteristics, resonate with different WAVELENGTHS of sound.

Notice I said WAVELENGTHS and not FREQUENCIES. In common parlance, "Frequencies" is often used since W and F are directly inter-related in a common medium. However, even if we change the medium through which sound is being propagated, the interaction of sound waves with open and closed tubes depends strictly on its wavelength and not its frequency.

Now would be a good point to return to the marquee conclusion we drew from point 2 - "Wavelength of sound is greater in Helium than in Air".

This leads us to the following KEY/FINAL CONCLUSIONS:

In a vocal tract filled with Helium:

1. The frequencies of sound do not change

2. The wavelengths of sound DO change

3. Because the wavelengths have changed, the portions of the sound spectrum produced by the vocal cords that are attenuated and resonated by different portions of the vocal tract also change.

4. This results in the sound spectrum output by the combination of the vocal cords and vocal tract in Helium being different from the sound spectrum output in normal Air.

5. This means the net distribution of energies among high and low frequencies (the timbre) changes with a change in sound medium, whereas the fundamental frequency of the sound (closely related to pitch) does not change.
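As a rough illustration of conclusions 2 and 3, here is a sketch using the simplest tube model mentioned earlier: a uniform tube closed at one end, which resonates when its length is an odd number of quarter wavelengths. The 17 cm tube length and the rounded sound speeds are assumptions, and a real vocal tract is of course not a uniform tube.

```python
def tube_resonances(speed_m_s, length_m=0.17, count=3):
    """First few resonant frequencies (Hz) of a tube closed at one end:
    the resonant wavelengths are 4L/(2n-1), so f_n = (2n-1) * c / (4L)."""
    return [(2 * n - 1) * speed_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]

SPEED_AIR = 343.0      # m/s, assumed
SPEED_HELIUM = 1007.0  # m/s, assumed (pure helium)

air = tube_resonances(SPEED_AIR)
helium = tube_resonances(SPEED_HELIUM)
for f_air, f_he in zip(air, helium):
    print(f"air {f_air:7.0f} Hz   ->   helium {f_he:7.0f} Hz")
```

The resonant wavelengths are fixed by the tube geometry, so every resonance moves up in frequency by the ratio of the sound speeds, while the source frequency feeding the tube stays where it was. That is the timbre change without a pitch change.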

Let's look at the spectrogram of two sample sounds helpfully provided in the NSW article.

[Image: melodic spectrograms of the two sample sounds]

Unfortunately due to the experimental conditions the two sounds do not have the same content (different sentences are spoken) and therefore the spectrogram cannot be exactly relied upon. However, the fundamental frequency in both is roughly the same and therefore supports our conclusion that the pitch is the same. Since different words are used in either sound, a timbre comparison cannot be made (since the difference in energy distributions visible in the spectrogram can be attributed to the different words spoken).

Also, for simplicity and ease of understanding, a "Melodic Spectrogram" has been used in favor of the raw, noisier spectrogram. It was generated using Sonic Visualiser.



We are not Done!

We started with the promise of explaining sound transmission and reception/perception in FIVE parts. We are done with only three. Let's get through the remaining parts very quickly.

4. Transmission of sound from mouth to air - As covered by point 2, with a change in medium, the sound frequency does not change, but the wavelength does. This means that the only effect of filling a room with helium as well (rather than just the vocal tract) is to increase the wavelength of the sound.

5. The above has no impact on sound perception. The ear and brain together are primarily a FREQUENCY receiver. The ear translates air pulsations into hair cell oscillations, which then translate to synchronous pulses on attached neurons. Since the timing of the pulses is correlated ONLY to frequency, and the timing of the pulses is what produces notions of pitch, timbre etc, we can safely assume that the ear transcribes sound to the brain faithfully based on frequency. Wavelength has no impact on this process.

However, the ear, just like the vocal tract, is non-linear, which means that it too is going to attenuate/resonate some sounds (the specific non-linear properties of the cochlea are still being studied). However, UNLIKE the vocal tract, the ear/cochlea is a sealed, fluid-filled chamber. The properties of the cochlea are affected not by the surrounding air but only by the fluid, which of course could be affected by blood composition and other biological factors, but NOT by the immediate environment.

Thus at the root of all the confusion around production and reception of sound in alternative media like Helium, is that the vocal tract's non-linear characteristics are affected by the surrounding medium, whereas the ear's are not. That's it.


Let's start with the experimental facts. Here are some Fourier power spectra that I recorded of myself singing the vowel "ah" with air (top) and helium (bottom) in my lungs:

power spectrum with air

spectrum with helium

This was with me attempting to do the same thing with my vocal tract in both cases. The two sounds clearly do not differ much in the spacing of the "picket fence" of harmonics, which corresponds to the fundamental frequency. The first (air) has a fundamental at about 126 Hz, the second (He) about 124 Hz. Given the not-so-great quality of the spectra, I would say these are basically the same to within experimental error.
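To put the 126 Hz vs. 124 Hz difference in musical terms, here is a short sketch converting frequency ratios to cents (100 cents = one semitone). The comparison figure of 2.9 for the helium-to-air sound-speed ratio is an assumed round number for pure helium, used only to show what a wind-instrument-style shift would look like.

```python
import math

def cents(f_from_hz, f_to_hz):
    """Signed musical interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_to_hz / f_from_hz)

# Measured fundamentals quoted above.
measured = cents(124.0, 126.0)

# Hypothetical shift if the fundamental scaled with the speed of sound,
# as it would for an air column; the 2.9 ratio is an assumption.
air_column = cents(1.0, 2.9)

print(f"measured difference:     {measured:.0f} cents")
print(f"air-column style shift:  {air_column:.0f} cents")
```

The measured difference is about a quarter of a semitone, whereas an air-column shift would be well over an octave.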

The distribution of power among the different harmonics is clearly different, which explains why the timbres sound so different.

This is different from the physics of a wind instrument. E.g., you can put a helium-filled balloon over the mouthpiece of a toy recorder, and the pitch clearly pops up high -- the fundamental really does change. The fundamental of a saxophone really does relate in the way you'd expect to the length of the air column.

So some of the physics explanations I've heard of this don't sound quite right.

There is the theory that the vibration of the vocal cords must obviously depend only on their mechanical properties, not on the medium. I don't think this is at all self-evident. They are strongly coupled to the medium. If you look at similar systems such as reed instruments, it is not true that the reed vibrates at a frequency set by its own mechanical properties (inertia and stiffness). E.g., in a saxophone the frequency of the reed is set entirely by the effective length of the air column.

There is also the claim that the vocal tract acts as an air column, so that the wavelength of the sound is required to be $\lambda_0/n$. I don't think this is really true either. The vocal tract is a complicated system of coupled, resonating cavities, but to a first approximation I think it's basically a Helmholtz resonator. See, e.g., Rivero. The frequency of a Helmholtz resonance is set by its volume and the dimensions of the opening, not by its length when considered as an air column.
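For reference, here is a sketch of the textbook Helmholtz resonance formula, f = (c / 2π) * sqrt(A / (V * L)), where V is the cavity volume, A the area of the opening and L the effective length of the neck. The dimensions below are made-up round numbers, not measurements of a vocal tract, and the sound speeds are assumed values for air and pure helium.

```python
import math

def helmholtz_frequency(speed_m_s, volume_m3, neck_area_m2, neck_length_m):
    """Textbook Helmholtz resonance: f = c/(2*pi) * sqrt(A / (V * L))."""
    return speed_m_s / (2.0 * math.pi) * math.sqrt(
        neck_area_m2 / (volume_m3 * neck_length_m))

# Hypothetical cavity: 50 cm^3 volume, 1 cm^2 opening, 1 cm effective neck.
VOLUME = 50e-6       # m^3
NECK_AREA = 1e-4     # m^2
NECK_LENGTH = 0.01   # m

f_air = helmholtz_frequency(343.0, VOLUME, NECK_AREA, NECK_LENGTH)
f_he = helmholtz_frequency(1007.0, VOLUME, NECK_AREA, NECK_LENGTH)
print(f"air:    {f_air:.0f} Hz")
print(f"helium: {f_he:.0f} Hz")
print(f"ratio:  {f_he / f_air:.2f}")
```

The geometry enters through volume and opening size rather than through a column length, but the resonance still scales linearly with the speed of sound in the gas.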

I think what's going on here is probably somewhat more complicated than any of the simple explanations. A reed is a highly nonlinear thing, and I suspect the vocal cords are as well. You have resonances with some width to them, and the width probably matters. This width is going to be set by things like the sizes of the openings and the efficiency with which sound is radiated from the mouth. (Efficient radiation should give a low $Q$ and therefore a wide resonance.) Without considering these complications carefully, I don't see how one can hope to explain ab initio why the vocal tract behaves differently from a reed instrument.

Rivero et al., "Approach Model of Speech Production Using Helmholtz Resonator and Wave Equation," DOI 10.1109/EMS.2010.102, https://ieeexplore.ieee.org/document/5703676