Why do different letters sound different?

You don't sing a single frequency: you sing a fundamental and its harmonics. Using a simple spectrum analyzer, this is me "singing" the letters A and M, alternately (AMAMA, actually):

[image: spectrum analyzer screenshot of "AMAMA"]

The letter "A" is the one with more harmonics (brighter lines at higher frequencies); the letter "M" seems to have a bigger second harmonic. The frequency scale is not calibrated correctly (cheap iPhone app...).

Here are two other shots, side by side (M, then A). You can see that the 2nd harmonic of the M is bigger than the first; by contrast, the higher harmonics from the A are dropping off more slowly:

[image: spectrum of M] [image: spectrum of A]

Simple vowels have this in common: the shape of your mouth changes the relative intensity of the harmonics, and your ear is good at picking that up. Incidentally, this is why it is sometimes hard to understand what a soprano is singing: at the top of her range the harmonics are spaced so far apart that few of them fall near the resonances that distinguish one vowel from another, so your ear has little to go on.
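To make the "relative intensity of harmonics" idea concrete, here is a minimal sketch. It builds two tones with the same fundamental but different harmonic weights and then measures the strength of each harmonic, which is roughly what the spectrum analyzer app displays. The sample rate, the 200 Hz fundamental and both sets of weights are made-up illustration values, not measurements of a real voice.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
F0 = 200.0          # fundamental frequency in Hz (assumed)
N = 400             # 0.05 s of signal, exactly 10 periods of F0

# Made-up harmonic amplitudes (harmonic 1, 2, 3, ...), not measured data.
M_LIKE = [0.5, 1.0, 0.2, 0.05, 0.02]  # second harmonic dominates
A_LIKE = [1.0, 0.8, 0.6, 0.5, 0.4]    # higher harmonics fall off slowly


def synthesize(harmonic_amps):
    """Sum of sine harmonics of F0; harmonic_amps[k-1] scales harmonic k."""
    return [
        sum(a * math.sin(2 * math.pi * k * F0 * n / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps, start=1))
        for n in range(N)
    ]


def amplitude_at(signal, freq):
    """Amplitude of the component at `freq`, from a single-frequency DFT."""
    re = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def spectrum(signal, n_harmonics=5):
    """Measured amplitude of harmonics 1..n_harmonics of F0."""
    return [amplitude_at(signal, k * F0) for k in range(1, n_harmonics + 1)]


for name, amps in (("M-like", M_LIKE), ("A-like", A_LIKE)):
    measured = spectrum(synthesize(amps))
    print(name, [round(a, 3) for a in measured])
```

Both tones repeat 200 times per second, so they have the same pitch; only the pattern of harmonic strengths differs, and that pattern is what the ear uses to tell the sounds apart.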

For short ("plosive") consonants (P, T, B, K, etc.), the story is a bit more complicated, as the frequency content changes while the letter is being sounded. But then it's hard to "sing" the letter P... you could sing "peeeee", but then it's the "E" that carries the pitch.

The app I used for this is SignalSpy - I am not affiliated with it in any way.


The fundamental frequency is determined by the vocal cords. They make the air flow pulsate at a frequency of 100 Hz to 200 Hz. The pulses are short, so there are overtones up to several kHz.
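Why short pulses imply strong overtones can be sketched with the Fourier series of an idealized pulse train. This assumes rectangular pulses, which real glottal pulses are not, so treat it only as an illustration of the trend; the function names and the 1 kHz / 4 kHz band limits are choices made for this example.

```python
import math

F0 = 100.0  # pulse rate in Hz, the low end of the range quoted above


def harmonic_amplitude(k, duty):
    """Amplitude of harmonic k for a unit-height rectangular pulse train.

    `duty` is the fraction of each period during which the pulse is "on".
    """
    return abs(2.0 / (k * math.pi) * math.sin(math.pi * k * duty))


def high_band_fraction(duty, split_hz=1000.0, top_hz=4000.0):
    """Share of harmonic power (up to top_hz) that lies at or above split_hz."""
    n_top = int(top_hz // F0)
    powers = [harmonic_amplitude(k, duty) ** 2 for k in range(1, n_top + 1)]
    first_high = int(math.ceil(split_hz / F0))
    return sum(powers[first_high - 1:]) / sum(powers)


for duty in (0.05, 0.5):
    share = high_band_fraction(duty)
    print(f"duty {duty:.2f}: {share:.1%} of harmonic power at or above 1 kHz")
```

The short pulse (on for 5% of each period) keeps a much larger share of its power in the kHz range than the long one (on for 50%), which matches the statement that short pulses carry overtones up to several kHz.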

The mouth and tongue make the vocal tract resonant at different frequency ranges. Those are called formants. Have a look at the formant map here: https://www2.warwick.ac.uk/fac/sci/physics/staff/academic/bell/sonify/ttm/sound_files/
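As a rough sketch of how formants shape the sound, the vocal tract can be modelled as a filter that boosts harmonics near its resonances. The formant frequencies below are approximate illustrative values, not read from the linked map, and the Lorentzian peak shape, the 100 Hz bandwidth and the 1/k source spectrum are all simplifying assumptions.

```python
F0 = 100.0  # fundamental in Hz (assumed)

# Rough, illustrative formant centres in Hz; not taken from the linked map.
FORMANTS = {
    "ah": [700.0, 1100.0],
    "ee": [300.0, 2300.0],
}
BANDWIDTH = 100.0  # half-width of each resonance in Hz (assumed)


def tract_gain(freq, formants):
    """Sum of Lorentzian-shaped resonance peaks, one per formant."""
    return sum(1.0 / (1.0 + ((freq - fc) / BANDWIDTH) ** 2) for fc in formants)


def shaped_harmonics(formants, n_harmonics=30):
    """(frequency, amplitude) pairs: a 1/k source spectrum times the tract gain."""
    return [(k * F0, (1.0 / k) * tract_gain(k * F0, formants))
            for k in range(1, n_harmonics + 1)]


def strongest(formants):
    """Frequency of the harmonic that ends up loudest after filtering."""
    return max(shaped_harmonics(formants), key=lambda pair: pair[1])[0]


for vowel, formants in FORMANTS.items():
    print(vowel, "strongest harmonic at", strongest(formants), "Hz")
```

The source (the vocal cords) is identical for both vowels here; only the resonances change, and with them which harmonics dominate.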


Here's something different to note. Let's say I would tell a band to play the musical note "C3". The bass, the guitar, the piano, the voice, the banjo, all of them sound different and yet we perceive them as the same note that has been played.

Similarly, think of a sung "A" and a sung "B" (as in "bee") as two different instruments. Each has its own unique "sound", and yet both can be used to create the same "musical note" of a given pitch and volume.

What makes a C3 note of a sung "A" different from a C3 note from a sung "B" then? (Or what makes a C3 of a piano different from a C3 of a guitar?)

Note what "same pitch and volume" actually means. I'll keep it simple.

Pitch: perceived frequency

Volume: air pressure or amplitude

Here are two pictures to illustrate what I mean:

[image: first waveform] [image: second waveform]

Both of them have the same amplitude, or volume.

Both of them have the same perceived frequency, or pitch.

Thus both of them play the same musical note we perceive.

But looking at the waveforms, you could probably tell that they will sound different, even though we would perceive them as the same note.

This difference is similar to a piano C3 vs a guitar C3.

Essentially: the same perceived frequency and the same air-pressure amplitude are heard by a listener as the same musical note. Completely different waveforms (sounds) can be perceived as the same note, as long as they repeat at the same rate and reach the same amplitude (which is what the two pictures above illustrate).
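The same point can be checked numerically. The sketch below builds two waveforms with the same repetition period and the same peak amplitude but different shapes, then estimates the period of each with a simple autocorrelation search. The sample rate, the 50-sample period (160 Hz, not exactly C3) and the harmonic weights are assumptions chosen to keep the arithmetic clean.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
PERIOD = 50         # samples per cycle, i.e. 160 Hz (assumed)
N = 400             # eight full cycles


def make_wave(harmonic_amps):
    """Sum of sine harmonics, rescaled so the peak amplitude is exactly 1."""
    raw = [sum(a * math.sin(2 * math.pi * k * n / PERIOD)
               for k, a in enumerate(harmonic_amps, start=1))
           for n in range(N)]
    peak = max(abs(x) for x in raw)
    return [x / peak for x in raw]


def estimate_period(signal, min_lag=20, max_lag=100):
    """Lag (in samples) at which the signal best matches a shifted copy of itself."""
    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)


pure = make_wave([1.0])             # plain sine
rich = make_wave([1.0, 0.5, 0.33])  # same fundamental plus two overtones

for name, wave in (("pure", pure), ("rich", rich)):
    period = estimate_period(wave)
    print(name, "estimated pitch:", SAMPLE_RATE / period, "Hz")
```

Both waveforms come out with the same estimated pitch and the same peak amplitude, even though their sample values, and therefore their sound, differ.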

So a sung "A" and a sung "B" are actually quite different from each other. But if sung at the same pitch, they will produce the same musical note (as perceived by humans).

Source of the images used