Why aren't active headphones equalized to a flat frequency response?

When something sits right at your ear reproducing standard stereo recordings, you don't want a flat frequency response, because the head-related transfer function that normally shapes the sound of a source much further away looks very different when the source is against your ear.

Let me quote you a couple of paragraphs from a book:

Of all the components in the electroacoustic transmission chain, headphones are the most controversial. High fidelity in its true sense, involving not only timbre but also spatial localization, is associated more with loudspeaker stereophony due to the well-known in-head localization of headphones. And yet binaural recordings with a dummy head, which are the most promising for true-to-life high fidelity, are destined for headphone reproduction. Even in their heyday they found no place in routine recording and broadcasting. At that time the causes were unreliable frontal localization, incompatibility with loudspeaker reproduction, as well as their tendency to be unaesthetic. Since digital signal processing (DSP) can filter routinely using binaural head-related transfer functions, HRTF, dummy heads are no longer needed.

Still the most common application of headphones is to feed them with stereo signals originally intended for loudspeakers. This raises the question of the ideal frequency response. For other devices in the transmission chain (Fig. 14.1), such as microphones, amplifiers and loudspeakers, a flat response is usually the design objective, with easily definable departures from this response in special cases. A loudspeaker is required to produce a flat SPL response at a distance of typically 1 m. The free-field SPL at this point reproduces the SPL at the microphone location in the sound field of, say, a concert being recorded. Listening to the recording in front of a LS, the head of the listener distorts the SPL linearly by diffraction. His ear signals no longer show a flat response. However, this need not concern the loudspeaker manufacturer, since this would also have happened if the listener had been present at the live performance. On the other hand, the headphone manufacturer is directly concerned with producing these ear signals. The requirements laid down in the standards have led to the free field calibrated headphone, whose frequency response replicates the ear signals for a loudspeaker in front, as well as the diffuse field calibration, in which the aim is to replicate the SPL in the ear of a listener for sound impinging from all directions. It is assumed that many loudspeakers have incoherent sources each with a flat voltage response.


(a) Free-field response: For want of any better reference, the various international and other standards have set up the following requirement for high-fidelity headphones: the frequency response and perceived loudness for a constant voltage mono signal input is to approximate that of a flat response loudspeaker in front of the listener under anechoic conditions. The free-field (FF) transfer function of a headphone at a given frequency (1000 Hz chosen as 0 dB reference) is equal to the amount in dB by which the headphone signal is to be amplified to give equal loudness. Averaging over a minimum number of subjects (typically eight) is required. [...] Figure 14.76 shows a typical tolerance field.

[Fig. 14.76: typical tolerance field for the free-field response]

(b) Diffuse-field response: During the 1980s there began a movement to replace the free-field standard requirements with another, where the diffuse field (DF) is the reference. As it turned out, it has made its way into the standards, but without replacing the old one. The two now stand side by side. The dissatisfaction with the FF reference arose principally from the magnitude of the 2 kHz peak. It was held responsible for coloration of the image, since frontal localization is not achieved even for a mono signal. The way in which the hearing mechanism perceives coloration is described by the association model of Theile (Fig. 14.62). A comparison of the ear responses for diffuse field and free field is shown in Fig. 14.77. [...] Since the subjective listening test is the one that counts, FF headphones have so far been more the exception than the rule. A palate of different frequency responses is available to cater for individual preferences, and each manufacturer has its own headphone philosophy with frequency responses ranging from flat to free field and beyond.

[Fig. 14.77: ear responses for diffuse field vs. free field]

This HRTF difference is also why angled drivers (in headphones) sound better to enough people that companies like Sennheiser sell them. Angled drivers don't fully make headphones sound like speakers, though.

In the factory or in a lab, an artificial ear is used to measure the frequency response. The one below is a lab-grade unit; the factory-floor ones are a bit simpler.

[photo: lab-grade artificial ear (ear simulator)]

I also found the methodology used by that HeadRoom site:

How we test frequency response: To perform this test we drive the headphones with a series of 200 tones at the same voltage and of ever increasing frequency. We then measure the output at each frequency through the ears of the highly-specialized (and pricey!) Head Acoustics microphone. After that we apply an audio correction curve that removes the head-related transfer function and accurately produces the data for display.

The microphone used is probably this one. It seems they actually invert the transfer function of the dummy head/ears in software, because right before that they say "Theoretically, this graph should be a flat line at 0dB."... but I'm not entirely sure what they do, because after that they say "A “natural sounding” headphone should be slightly higher in the bass (about 3 or 4 dB) between 40Hz and 500Hz." and "Headphones also need to be rolled-off in the highs to compensate for the drivers being so close to the ear; a gently sloping flat line from 1kHz to about 8-10dB down at 20kHz is about right." That doesn't quite square, for me, with their previous statement about inverting/removing the HRTF.
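For what it's worth, my reading of "removes the head-related transfer function" is simply a subtraction of a reference dummy-head curve from the raw ear-drum measurement, in dB. A minimal sketch of that idea (all the data here are placeholders, not HeadRoom's actual curves):

```python
import numpy as np

# Placeholder data: raw SPL measured at the dummy head's ear drum, and the
# reference curve being "removed" (free-field or diffuse-field HRTF), both in
# dB on the same frequency grid as the test tones.
freqs = np.logspace(np.log10(20), np.log10(20_000), 200)  # the "200 tones"
raw_eardrum_db = np.zeros_like(freqs)        # stand-in for measured data
hrtf_reference_db = np.zeros_like(freqs)     # stand-in for the reference HRTF

# "Removing" the HRTF is then just a subtraction in dB: what remains is the
# headphone's deviation from the chosen reference, which is why a headphone
# that matches the reference exactly would plot as a flat line at 0 dB.
corrected_db = raw_eardrum_db - hrtf_reference_db
```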

Looking at some certificates that people got from the manufacturer (Sennheiser) for the headphone model (HD800) used in that HeadRoom example, it seems that HeadRoom displays the data without applying any correction model for the headphone itself (which would explain why they offer those later interpretation hints, so it's their initial "flat" suggestion that is misleading), whereas Sennheiser applies a DF (diffuse field) correction, so their graphs look almost flat.

[images: Sennheiser measurement certificates for the HD800]

This is just a guess, though; differences in measurement equipment (and/or between headphone samples) could well account for those discrepancies, since they aren't that big.


Anyway, this is an area of active and ongoing research (as you probably guessed from the last sentences quoted above about DF). There's quite a bit of work on this by some HK researchers; I don't have (free) access to their AES papers, but some fairly extensive summaries can be read on the InnerFidelity blog (2013, 2014), as well as by following links from the main HK author's blog, Sean Olive's; as a shortcut, here are some free slides from their most recent (Nov 2015) presentation found there. It's quite a bit of material... I have only looked at it briefly, but the theme seems to be that DF isn't good enough.

Here are a couple of interesting slides from one of their earlier presentations. First, the full frequency response of the HD800 (not truncated at 12 kHz), measured on more clearly disclosed equipment:

[slide: full-bandwidth HD800 frequency response measurement]

And, perhaps of most interest to the OP, the bassy sound of the Beats isn't all that appealing; granted, the comparison is with headphones that cost four to six times as much.

[slide: Beats frequency response compared with far more expensive headphones]


The simple answer is that a flat-frequency-response system built with op-amps to correct the driver response will necessarily have a very un-flat phase response in the passband. That non-flatness means the component frequencies of transient sounds are unevenly delayed, producing a subtle transient distortion that hinders recognition of individual sound components, so fewer distinct sounds can be discerned.

Consequently, it sounds terrible: as if all the sound is coming from a fuzzy ball centred exactly between one's ears.
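A quick numerical illustration of the first point, using a digital biquad as a stand-in for an analogue op-amp EQ stage (the filter parameters are made up; the point is only that a minimum-phase magnitude correction drags the group delay around with it):

```python
import numpy as np
from scipy.signal import group_delay

fs = 48_000                            # assumed sample rate
f0, gain_db, Q = 3_000.0, 6.0, 1.0     # hypothetical +6 dB correction peak

# RBJ "Audio EQ Cookbook" peaking-EQ biquad: minimum-phase, much like an
# analogue EQ built around op-amps.
A = 10 ** (gain_db / 40)
w0 = 2 * np.pi * f0 / fs
alpha = np.sin(w0) / (2 * Q)
b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
b, a = b / a[0], a / a[0]

w, gd = group_delay((b, a), w=2048, fs=fs)   # group delay in samples vs. Hz
print(f"group-delay spread across the band: {gd.max() - gd.min():.2f} samples")
# Flattening the amplitude response has made the delay frequency-dependent:
# components near f0 arrive later than the rest of the transient.
```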

The HRTF issue in the answer above is only part of this - the other is that a realizable analogue domain circuit can only have a causal time response, and to correct the driver properly one needs an acausal filter.

This can be approximated digitally with a driver-matched Finite Impulse Response filter, but this requires a small time delay which is enough to make movies very jarringly out-of-sync.
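As a rough sketch of why the delay is unavoidable: a linear-phase FIR with N taps delays everything by (N-1)/2 samples, whatever its magnitude response. The target curve below is invented, but the latency arithmetic is generic:

```python
import numpy as np
from scipy.signal import firwin2

fs = 44_100
numtaps = 4_095                          # hypothetical filter length

# Made-up correction target: fill in an assumed 3 dB dip around 4 kHz.
freq = [0, 2_000, 4_000, 8_000, fs / 2]
gain = 10 ** (np.array([0, 0, 3, 0, 0]) / 20)

h = firwin2(numtaps, freq, gain, fs=fs)  # symmetric taps -> linear phase

latency_ms = (numtaps - 1) / 2 / fs * 1e3
print(f"inherent latency: {latency_ms:.1f} ms")   # about 46 ms at 44.1 kHz
# Several tens of milliseconds of extra audio delay is plenty to throw lip
# sync off in video unless the player compensates for it.
```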

And it still sounds like it's coming from inside your head, unless the HRTF is also added back in too.

So, it's not so simple after all.

To make a "transparent" system, you don't merely need a flat passband over the human hearing range; you also need linear phase - a flat group-delay plot - and there is some evidence to suggest that this linear phase needs to continue up to a surprisingly high frequency so that directional cues are not lost.

This is easy to verify by experiment: open a .wav of some music you are familiar with in a sound-file editor like Audacity or snd, delete one single sample (at a 44100 Hz sample rate) from just one channel so that from then on that channel leads the other by one sample, and play it back.
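If you'd rather script it than click around in an editor, here's a minimal sketch of the same experiment with numpy and the soundfile library (the filename is a placeholder):

```python
import numpy as np
import soundfile as sf   # any WAV read/write library would do

x, fs = sf.read("music.wav")           # hypothetical stereo file, shape (n, 2)
left, right = x[:, 0], x[:, 1]

# Drop the first sample of the left channel only, then trim both channels to
# equal length: the left channel now leads the right by exactly one sample.
shifted = np.column_stack([left[1:], right[:-1]])

sf.write("music_one_sample_shift.wav", shifted, fs)
```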

You will hear a very noticeable difference, even though the difference is a time delay of only 1/44100th of a second.

Consider this: sound travels about 340 mm per millisecond, so one full cycle at 20 kHz lasts 50 microseconds, which corresponds to 17 mm of travel. Yet you can hear the difference made by that missing sample - a shift of only 22.67 microseconds, or just 7.7 mm of sound travel.
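Spelling the numbers out as a quick back-of-the-envelope check:

```python
c = 340.0                    # speed of sound, m/s (340 mm per ms)
period_20k = 1 / 20_000      # one cycle at 20 kHz
one_sample = 1 / 44_100      # one sample at 44.1 kHz

print(f"20 kHz period: {period_20k * 1e6:.0f} us = {c * period_20k * 1e3:.1f} mm")
print(f"one sample:    {one_sample * 1e6:.2f} us = {c * one_sample * 1e3:.1f} mm")
# -> 50 us = 17.0 mm, versus roughly 22.7 us = 7.7 mm for the missing sample
```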

The absolute cut-off of human hearing is generally regarded to be around 20 kHz, so what's happening?

The answer is that hearing tests are conducted with test tones which mostly consist of just one frequency at a time, for a fairly long time at each part of the test. But our inner ears consist of a physical structure that performs an FFT of sorts on the sound while exposing neurons to it, so that neurons at different positions correlate to different frequencies.

Individual neurons can only re-fire so fast, so in some cases a few are used one-after-the-other to keep up... but this only works up to about 4 kHz or so... Which is right where our perception of tone ends. Yet there's nothing in the brain to stop a neuron firing just any time it feels so inclined, so what's the highest frequency that matters?

The point is that the tiny phase difference between the ears is perceptible, but rather than changing how we identify sounds (by their spectrographic structure) it affects how we perceive their direction (which the HRTF also changes!), even though it seems like such a fine timing difference should be "rolled off" beyond our range of hearing.

The answer is that the -3 dB or even -10 dB point is still too low - you need to go to about the -80 dB point to get it all. And if you want to handle loud sounds as well as quiet ones, you need to be good down to better than -100 dB. A single-tone listening test is unlikely ever to see such frequencies, largely because they only "count" when they arrive in phase with their fellow harmonics as part of a sharp transient: their energy then adds together, reaching enough of a concentration to trigger a neural response, even though as isolated frequency components they may be too small to matter.
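Here's a small numerical illustration of that energy-piling-up effect (the fundamental, harmonic count and amplitudes are all made up):

```python
import numpy as np

fs = 96_000
t = np.arange(fs) / fs                      # one second of time
harmonics = np.arange(1, 41) * 500.0        # 40 harmonics of a 500 Hz tone
amp = 0.01                                  # each component is tiny on its own

rng = np.random.default_rng(0)
aligned = sum(amp * np.cos(2 * np.pi * f * t) for f in harmonics)
scrambled = sum(amp * np.cos(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                for f in harmonics)

print(f"peak with phases aligned:   {np.max(np.abs(aligned)):.2f}")    # ~0.40
print(f"peak with phases scrambled: {np.max(np.abs(scrambled)):.2f}")  # much lower
# The same components, at the same individual levels, form a much sharper and
# taller transient when they arrive in phase with one another.
```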

Another issue is that we're constantly bombarded by many sources of ultrasonic noise anyway, probably much of it from broken neurons in our own inner ears, damaged by excessive sound level at some prior point in our lives. It would be hard to discern the isolated output tone of a listening test over such loud "local" noise!

This therefore requires a "transparent" system design to use a much higher low-pass frequency, so that there is room for the human low-pass to fade out (with its own phase behaviour, which your brain is already "calibrated" to) before the system's phase modulation starts changing the shape of transients and shifting them around in time such that the brain can no longer recognise which sound they belong to.

With headphones it is far easier simply to build them around a single broadband driver with sufficient bandwidth, and rely on the naturally wide frequency response of the 'uncorrected' driver to prevent temporal distortion. This works far better with earphones, as the small moving mass of the driver lends itself well to this condition.

The reason for needing phase linearity is deeply rooted in the time-domain frequency-domain duality, as is the reason you can't construct a zero-delay filter that can "perfectly correct" any real physical system.

The reason it's "phase linearity" that matters and not "phase flatness" is that the overall slope of the phase curve doesn't matter - by duality, any phase slope is just equivalent to a constant time delay.
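In symbols (just the Fourier shift theorem): a unit-magnitude response whose phase is a straight line through the origin is nothing more than a delay, so it leaves the waveform shape untouched:

```latex
H(\omega) = e^{\,j\varphi(\omega)}, \quad \varphi(\omega) = -\tau\omega
\;\Longrightarrow\;
Y(\omega) = e^{-j\omega\tau} X(\omega)
\;\Longleftrightarrow\;
y(t) = x(t - \tau),
\qquad
\text{group delay} = -\frac{\mathrm{d}\varphi}{\mathrm{d}\omega} = \tau .
```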

Everyone's outer ear has a different shape, and thus a different transfer function with features at slightly different frequencies. Your brain is used to what it has, with its own distinct resonances. If you use the wrong one, it will actually just sound worse, as the corrections your brain is used to making will no longer correspond to the ones in the earphone's transfer function, and you will have something worse than a lack of cancellation of resonances - you'll have twice as many unbalanced poles/zeros cluttering up your phase delay, utterly mangling your group delays and component arrival-time relationships.

It will sound very unclear, and you won't be able to make out the spatial imaging encoded by the recording.

If you do a blind A/B listening test, everyone will select the uncorrected headphones, which at least don't mangle the group delays so much, so that their brains can retune themselves to them.

And this is really why active headphones don't try to equalise. It's just too hard to get right.

It's also why digital room correction is the niche it is: using it properly requires frequent measurements, which are hard or impossible to do live, and which consumers generally don't want to know about.

Mostly because the acoustic resonances in the room under correction, which dominate the bass response, keep shifting slightly as the air pressure, temperature and humidity change - altering the speed of sound slightly and moving the resonances away from where they were when the measurement was taken.
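As a rough sense of scale (room dimension and temperatures invented for the example), even a few degrees of temperature change moves a low-frequency room mode by a noticeable fraction of its bandwidth:

```python
# How much does the lowest axial mode of a room move with temperature?
def speed_of_sound(temp_c):
    return 331.3 + 0.606 * temp_c        # m/s, standard dry-air approximation

L = 5.0                                  # hypothetical room dimension, metres
for temp_c in (18, 22, 26):
    c = speed_of_sound(temp_c)
    f = c / (2 * L)                      # first axial mode along that dimension
    print(f"{temp_c} C: c = {c:.1f} m/s, first axial mode = {f:.2f} Hz")
# The mode drifts by several tenths of a hertz - enough, for a narrow high-Q
# resonance, to make a fixed correction filter slowly go stale.
```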