Listening to stereo music through headphones has never sounded quite right to me. The extreme stereo separation can give me headaches when I use headphones for more than an hour, and some people report feeling “pressure” in their ears. This discomfort could simply be due to how unnatural headphone listening is: no sound from the left channel makes it to your right ear, and vice versa. There are some tricks that can get rid of these problems, but before I show you how to try them out, let’s get into the math and science behind head-related transfer functions (HRTFs) and how they are used to simulate the way sound travels to our ears. The result is a better listening experience when using headphones with stereo content, and the same idea can even be used to create what is known as virtual surround sound.
First, let me explain what a transfer function is. In engineering, a transfer function is a mathematical representation of how a system changes a signal that passes through it. In signal processing there are several things we can look at when examining a transfer function, but today we are interested in two: the frequency response and the input-to-output delay.
The frequency response tells us how the system changes the amplitude of a signal depending on the signal’s frequency. For our purposes, frequency translates to the pitch of a note and amplitude to its loudness. The input-to-output delay is just what it sounds like: how long it takes for the input to make it to the output.
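To make the idea of a frequency response concrete, here is a minimal sketch that evaluates it for a very simple system, a one-pole low-pass filter. The filter itself, its coefficient `a`, and the 44.1 kHz sample rate are assumptions chosen purely for illustration; this is not an HRTF, just a system whose effect on amplitude depends on frequency.

```python
import cmath
import math


def one_pole_lowpass_response(freq_hz, sample_rate_hz=44100.0, a=0.9):
    """Complex frequency response of y[n] = (1 - a) * x[n] + a * y[n - 1].

    The filter and the default values are illustrative assumptions.
    """
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return (1.0 - a) / (1.0 - a * cmath.exp(-1j * omega))


def magnitude_db(response):
    """Convert a complex response to a gain in decibels."""
    return 20.0 * math.log10(abs(response))


# Print the gain at a low, a middle, and a high frequency.
for freq in (100.0, 1000.0, 10000.0):
    gain = magnitude_db(one_pole_lowpass_response(freq))
    print(f"{freq:7.0f} Hz: {gain:6.1f} dB")
```

The printed gains fall as the frequency rises, which is what “the system attenuates high frequencies” means in terms of a frequency response.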
The HRTF is a representation of how our head affects sound as it travels around it. Our brain figures out which direction a sound comes from using two cues: the time delay between when one ear hears the sound versus the other, and how the sound’s tone and loudness differ between the two ears. By using the HRTF to simulate the effects our head has on sound traveling around it, we can make it seem like you are listening to speakers instead of headphones.
There is a way we can attempt to measure the HRTF, and not surprisingly it uses a setup similar to the one used for binaural recordings. A binaural recording is made with two microphones placed to model the ears on a person’s head. When such a recording is played back on headphones, it reproduces the auditory environment of the recording as if you were there. A good example of a binaural recording can be found here.
To capture the effect of a binaural recording in a way that lets us emulate a speaker setup using headphones, we can play white noise (noise that contains all frequencies) from two speakers placed one meter apart, each one meter away from the head model, and record the input from both microphones placed where your ears are. If we compare the frequency response of the sound recorded by the left microphone versus the right one, we can determine which frequencies the human head attenuates more than others. Other people have done these measurements and found that a person’s head attenuates high-frequency sounds more than low-frequency ones. Another unsurprising fact is that the sound arrives at the two microphones at different times. Sound travels at about 343 meters per second, which might seem fast but is slow enough for our brains to work out that the sound came from one side and not the other. A person’s head is about 0.15 meters wide, which means the distance from the left speaker to the left ear differs from the distance from the left speaker to the right ear. Using some trigonometry we can calculate that the distance from the left speaker to the left ear is about 0.965 meters and the distance from the left speaker to the right ear is about 1.04 meters, a difference of about 0.075 meters. This results in a time delay of roughly 220 microseconds between one ear hearing the sound from the speaker and the other ear hearing it.
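The distances and the delay quoted above can be checked with a few lines of Python. This is only a sketch under simplifying assumptions that are mine: the ears are treated as two points 0.15 meters apart on a line through the center of the head, each speaker is one meter from that center, and sound travels in a straight line to each ear (in reality it has to bend around the head to reach the far ear).

```python
import math

SPEED_OF_SOUND = 343.0   # meters per second
SPEAKER_SPACING = 1.0    # meters between the two speakers
SPEAKER_DISTANCE = 1.0   # meters from each speaker to the center of the head
HEAD_WIDTH = 0.15        # meters between the two ears


def ear_distances():
    """Return (near_ear, far_ear) distances in meters for one speaker."""
    half_spacing = SPEAKER_SPACING / 2.0
    half_head = HEAD_WIDTH / 2.0
    # How far in front of the listener the line joining the speakers sits.
    forward = math.sqrt(SPEAKER_DISTANCE ** 2 - half_spacing ** 2)
    near = math.hypot(half_spacing - half_head, forward)
    far = math.hypot(half_spacing + half_head, forward)
    return near, far


def interaural_delay_seconds():
    """Extra travel time to the far ear, in seconds."""
    near, far = ear_distances()
    return (far - near) / SPEED_OF_SOUND


near, far = ear_distances()
print(f"near ear: {near:.3f} m, far ear: {far:.3f} m")
print(f"delay: {interaural_delay_seconds() * 1e6:.0f} microseconds")
```

With these assumptions the delay comes out slightly under 220 microseconds, which matches the rounded figure in the text.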
Now that we understand how sound travels around our head, we can better understand how to simulate speakers using headphones. By taking into account the measurements made with a head model, we can create a digital signal processing (DSP) block that adds together the sound from the left channel and a modified (delayed and filtered) version of the sound from the right channel, and feeds the result into the left side of our headphones. If we do the same for the other side, we have created something that better simulates the sound of speakers.
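As a rough illustration of such a DSP block, here is a minimal crossfeed sketch in plain Python. It is not the algorithm used by any particular plugin. The function name `crossfeed`, the simple one-pole low-pass filter standing in for the head’s high-frequency attenuation, and the default values (a 10-sample delay, which is about 220 microseconds at an assumed 44.1 kHz sample rate, a gain of 0.5, and a smoothing factor of 0.7) are all assumptions made for this example.

```python
def crossfeed(left, right, delay_samples=10, gain=0.5, smoothing=0.7):
    """Mix a delayed, low-passed, quieter copy of each channel into the other.

    left, right: equal-length lists of float samples.
    All default values are illustrative assumptions, not measured data.
    """

    def shadowed(channel):
        # Simulate the path to the far ear: delay, low-pass, then attenuate.
        out = []
        state = 0.0
        for n in range(len(channel)):
            delayed = channel[n - delay_samples] if n >= delay_samples else 0.0
            state = (1.0 - smoothing) * delayed + smoothing * state
            out.append(gain * state)
        return out

    from_right = shadowed(right)
    from_left = shadowed(left)
    out_left = [direct + cross for direct, cross in zip(left, from_right)]
    out_right = [direct + cross for direct, cross in zip(right, from_left)]
    return out_left, out_right


# A single click in the left channel only.
left_in = [1.0] + [0.0] * 19
right_in = [0.0] * 20
left_out, right_out = crossfeed(left_in, right_in)
print(right_out[8:14])
```

The click stays unchanged in the left output, while the right output is silent for the first ten samples and then carries a quieter, smeared copy of it. A real implementation would also need to guard against clipping, since the two summed signals can exceed the original peak level.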
I have recorded the first 15 seconds of the output for a few audio files that exhibit exceptionally high amounts of stereo separation for you to try. You will notice that you can still tell which direction the sound is coming from, but to many people it will feel less awkward. Make sure you do not have any crossfeed plugins enabled when you give these a try, and make sure you are using headphones, since it makes no sense to simulate the sound of a speaker if the sound is already being played from a speaker.
I will now show you how you can listen to your own music with the effect of your head applied to the sound before it is output to your headphones.
To apply the effect of the HRTF to your music I will use Foobar2000, a popular, extensible media player, and a digital signal processing plugin that performs what is known as crossfeed on the input audio stream.
First, install Foobar2000 and open it up. Configure it the way you like using the setup dialog. Once you are done, close Foobar2000, extract the zip file containing the dll file from the crossfeed plugin link, and move the dll file into the Foobar2000 components folder. You will find it at C:\Program Files (x86)\foobar2000\components on a 64-bit machine or C:\Program Files\foobar2000\components on a 32-bit machine.
Next, open Foobar2000 again and open the Preferences dialog by clicking File -> Preferences. Open the DSP Manager by clicking it in the tree on the left, then double-click Crossfeed under Available DSPs to add it to the Active DSPs list.
Now you can configure the Crossfeed plugin by clicking on the Crossfeed plugin under Active DSPs and clicking Configure Selected.
Here we see some of the options that change how this DSP block affects the sound we hear. ILD stands for interaural level difference: the difference in volume between the original sound and the copy that is played into the opposite ear after the ITD, or interaural time delay, has passed. ILD low determines how much the bass frequencies are attenuated, and ILD high determines how much the treble is attenuated, before the sound is repeated to the other ear. The numbers are in decibels; -3 dB corresponds to about half the intensity of the original. Above you can find the settings I like to use.
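If you want to get a feel for what a decibel setting means, the conversion can be sketched in a couple of lines. The helper names below are my own; the formulas are the standard definitions of the decibel for power (intensity) and for amplitude.

```python
def db_to_power_ratio(db):
    """Decibels to a ratio of intensities (power)."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Decibels to a ratio of signal amplitudes."""
    return 10.0 ** (db / 20.0)


for level in (-3.0, -6.0, -12.0):
    print(
        f"{level:6.1f} dB -> power x{db_to_power_ratio(level):.3f}, "
        f"amplitude x{db_to_amplitude_ratio(level):.3f}"
    )
```

Note that -3 dB halves the intensity but only reduces the signal’s amplitude to about 0.71 of the original; halving the amplitude takes about -6 dB.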
Now that you are done with that, OK out of both dialogs, add your music to Foobar2000, and give it a try. You can remove the crossfeed effect at any time by removing Crossfeed from the Active DSPs list.
I hope you enjoyed this introduction to HRTFs. Comments and suggestions are always welcome, so please comment down below.