Imagine you want to render the illusion of a virtual sound object on headphones and be able to control its direction in real time. A straightforward way to do it is to simulate a virtual speaker setup using head-related transfer functions (HRTFs) and pan the audio object around using amplitude panning. This is a well-known approach. However, it typically does not sound very good, for reasons I will explain. And there is a lesser-known “trick” that you can use to drastically improve the sound quality. In this post I will briefly describe what it is and show you an audio example.
I wrote about this “trick” in a research article titled “A practical method for HRTF phase pre-processing”, which I presented at the International Congress on Acoustics (ICA) 2022 in South Korea. The idea has actually been with me for much longer; it is inspired by a patent that I filed in 2014 (link).
In regular stereo loudspeaker listening, you will hear sound images in directions in between the speakers.
To generate sound images in any horizontal direction, you can use a ring of speakers.
It is easy to simulate the ear signals that would occur for a listener in the middle of such a circle of speakers, using an HRTF database (readers who are not familiar with HRTFs can start with Wikipedia). Below you can listen to a simulation of white noise being panned around the listener using the 12-speaker layout above. For this example I use the common vector base amplitude panning (VBAP) method to pan audio between the speakers, and an HRTF database for the Neumann KU100 artificial head:
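To make the simulation concrete, here is a minimal sketch of this kind of VBAP-over-HRTF rendering. It is my own illustration, not code from the article: the 30° speaker spacing matches the layout above, but the function names and the shape of the `hrirs` array (one left/right HRIR pair per speaker direction, e.g. extracted from a KU100 database) are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

# 12 virtual speakers, one every 30 degrees (matching the layout above)
SPEAKER_AZ_DEG = np.arange(0, 360, 30)

def vbap_gains_2d(azimuth_deg):
    """Pairwise 2-D VBAP: find the adjacent speaker pair that encloses
    the source direction, solve for non-negative gains, and normalize
    them for constant power."""
    az = np.deg2rad(azimuth_deg)
    src = np.array([np.cos(az), np.sin(az)])
    gains = np.zeros(len(SPEAKER_AZ_DEG))
    for i in range(len(SPEAKER_AZ_DEG)):
        j = (i + 1) % len(SPEAKER_AZ_DEG)
        pair = np.deg2rad(SPEAKER_AZ_DEG[[i, j]])
        basis = np.array([np.cos(pair), np.sin(pair)])  # columns = speaker unit vectors
        g = np.linalg.solve(basis, src)
        if np.all(g >= -1e-9):  # the source lies inside this pair's wedge
            gains[[i, j]] = np.clip(g, 0.0, None) / np.linalg.norm(g)
            break
    return gains

def render_binaural(x, azimuth_deg, hrirs):
    """Simulate the two ear signals for a source panned to azimuth_deg.
    hrirs: array of shape (12, 2, taps) holding the left/right HRIR
    for each speaker direction (the shape is an assumption on my part)."""
    g = vbap_gains_2d(azimuth_deg)
    out = np.zeros((2, len(x) + hrirs.shape[-1] - 1))
    for k in np.nonzero(g)[0]:
        for ear in (0, 1):
            out[ear] += fftconvolve(g[k] * x, hrirs[k, ear])
    return out
```

For a moving source like the one in the clip, you would update the gains over time and crossfade between blocks; the static version above keeps the core idea visible.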
Here is the audible result. Try to imagine the sound source moving counterclockwise around you, starting in the front (headphones required):
As you can hear, it doesn’t sound great. The sound “pumps” as it circles around you, and the sound character in the speaker directions differs from that in the intermediate directions. What we want is for the sound source to move smoothly around without any significant change in sound character. And this is what can be achieved with the “trick” I referred to above: a modification of the HRTF phase response at high frequencies. Here is the result with the modification:
Sounds a lot better, doesn’t it?
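The exact pre-processing is described in the article (and is patent pending), so I won’t reproduce it here. But to give a feel for the general family of methods, here is a sketch of one generic way to remove excess phase at high frequencies: keep the measured HRTF phase below a cutoff and crossfade to the minimum-phase response above it. The cutoff frequency, the crossfade shape, and all names are illustrative assumptions of mine, not the published parameters.

```python
import numpy as np

def min_phase_spectrum(mag, nfft):
    """Minimum-phase spectrum for a given magnitude response, built
    via the real cepstrum (the standard homomorphic construction)."""
    log_mag = np.log(np.maximum(mag, 1e-12))
    cep = np.fft.irfft(log_mag, nfft)
    fold = np.zeros(nfft)          # window that folds the cepstrum
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0
    fold[nfft // 2] = 1.0
    return np.exp(np.fft.rfft(cep * fold, nfft))

def hf_phase_preprocess(hrir, fs, f_c=1500.0, nfft=1024):
    """Keep the measured HRTF phase below f_c and crossfade to the
    minimum-phase response above it, so all HRTFs share near-zero
    excess phase at high frequencies. f_c and the crossfade width
    are illustrative guesses, not the article's values."""
    H = np.fft.rfft(hrir, nfft)
    mag = np.abs(H)
    H_min = min_phase_spectrum(mag, nfft)
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    # raised-cosine crossfade: 0 well below f_c, 1 well above it
    w = np.clip((freqs - 0.75 * f_c) / (0.5 * f_c), 0.0, 1.0)
    w = 0.5 - 0.5 * np.cos(np.pi * w)
    phase = (1.0 - w) * np.unwrap(np.angle(H)) + w * np.unwrap(np.angle(H_min))
    return np.fft.irfft(mag * np.exp(1j * phase), nfft)
```

Because minimum phase is derived from the magnitude alone, any two HRTFs processed this way end up nearly phase-aligned at high frequencies, so the speaker contributions add coherently instead of cancelling. Whether this particular construction matches the published method, I leave to the article.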
So what does this (patent pending) HRTF phase adjustment do? It corrects for a fundamental flaw in loudspeaker reproduction. The sounds from the two active speakers reach each ear with slightly different delays, leading to interference and cancellation of the sound waves at certain frequencies:
The plots below explain this in a slightly more technical way. Imagine a sound object panned between speakers 1 and 2 in the circle of speakers illustrated above. The blue curve in the upper figure shows the phase difference between the signals from speakers 1 and 2 at the left ear. It reaches odd multiples of 180 degrees several times, which means the two signals are out of phase and cancel each other at those frequencies.
The lower figure shows the loss in sound pressure that occurs due to the interference between the speakers. It indicates that the resulting coloration is large and hampers any prospect of high-fidelity sound reproduction. (It should be mentioned that the problem is much smaller in real loudspeaker listening than in this simulation, for several reasons: room reflections contribute to the overall sound pressure at your ears, and you can move your head around.)
The red curves show the corresponding result with the phase modification applied to the HRTFs, which effectively fixes the problem in this case and leads to the audible result you heard above.
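If you want to reproduce curves of this kind for your own HRTF set, the computation is small. Below is a sketch: `h1` and `h2` are the HRIRs from speakers 1 and 2 to the left ear, and the equal gains correspond to a source panned midway between the pair. The baseline for the loss (an incoherent power sum of the two paths) is my choice for illustration; the article’s exact reference may differ.

```python
import numpy as np

def phase_difference_deg(h1, h2, nfft=4096):
    """Unwrapped phase difference between the two speaker signals at
    one ear; cancellation occurs where it hits odd multiples of 180 deg."""
    H1 = np.fft.rfft(h1, nfft)
    H2 = np.fft.rfft(h2, nfft)
    return np.rad2deg(np.unwrap(np.angle(H1 * np.conj(H2))))

def interference_loss_db(h1, h2, g1=2**-0.5, g2=2**-0.5, nfft=4096):
    """Level at one ear when the two speaker feeds interfere, relative
    to an incoherent power sum of the two paths (one possible baseline)."""
    H1 = g1 * np.fft.rfft(h1, nfft)
    H2 = g2 * np.fft.rfft(h2, nfft)
    coherent = np.abs(H1 + H2)
    power_sum = np.sqrt(np.abs(H1) ** 2 + np.abs(H2) ** 2)
    return 20 * np.log10(np.maximum(coherent, 1e-12) / power_sum)
```

Running `interference_loss_db` once with the raw HRIRs and once with phase-pre-processed ones gives you the kind of blue-versus-red comparison shown above.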
A side effect of the phase adjustment is that the time differences between the ears disappear above a certain frequency. Is that a problem? Usually not. If you want more details, take a look at the full article using the link below!
“The rationale for high frequency phase pre-processing is that the auditory system is relatively insensitive to interaural phase differences (IPD) at high frequencies in the localization of single anechoic sound sources. The well-known duplex theory, which applies to localization of sound in the lateral dimension, states that interaural time difference (ITD) is a dominant localization cue at low frequencies, and interaural level difference (ILD) is a dominant cue at high frequencies.”
The work I describe in this post is part of my PhD studies conducted in collaboration with Dirac Research and the Signals & Systems group at Uppsala University, Sweden. The core idea is currently patent pending.