Here’s a multi-year research project that resulted in an IEEE journal publication on headphone simulation of speaker systems. The ultimate goal is to exactly duplicate the experience of listening to a speaker setup in a specific real room.
Why would you want to simulate speaker listening using headphones? One application is to simulate a high-fidelity music listening experience. A lot of music is produced to sound best on speakers. Simulation of an immersive speaker setup is also great for watching movies with surround sound. Another application would be to simulate a studio speaker setup for remote work in music or film production. Also, auralization of speaker systems is an interesting problem scientifically, as discussed below.
In principle, one way to achieve headphone simulation of a speaker system is to put microphones in the ears and record the impulse responses from each speaker to each ear. By also recording the impulse response of a pair of headphones at the same microphone locations, it is possible to design digital filters that reproduce the sound pressure in the ears that the speakers would produce, hence duplicating the speaker listening experience. Illustration below.
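To make the idea concrete, here is a minimal sketch of that rendering chain in Python, assuming you already have measured binaural room impulse responses (BRIRs) and a headphone correction filter. All names and data here are hypothetical placeholders, not the actual filters from the study.

```python
# A minimal sketch of the in-ear measurement approach, assuming the BRIRs
# (impulse responses from each speaker to each ear) and a headphone
# equalization filter are already measured. Names are hypothetical.
import numpy as np
from scipy.signal import fftconvolve

def render_speakers_to_ears(speaker_signals, brirs, hp_eq):
    """speaker_signals: list of equal-length mono arrays, one per speaker.
    brirs: brirs[s][ear] is the (equal-length) IR from speaker s to that ear.
    hp_eq: headphone equalization filter, i.e. the inverse of the headphone
    response measured at the same in-ear microphone positions."""
    ears = []
    for ear in (0, 1):  # left, right
        # Sum the contribution of every speaker at this ear.
        sig = sum(fftconvolve(x, brirs[s][ear])
                  for s, x in enumerate(speaker_signals))
        # Undo the headphone's own coloration so the eardrum receives the
        # pressure the speakers would have produced.
        ears.append(fftconvolve(sig, hp_eq))
    return np.stack(ears)  # (2, n_samples) headphone signal
```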
However, this procedure is not feasible if the target group extends beyond a handful of listeners – e.g. if designing a room auralization product for the consumer market – since everyone would need to be brought to every room that should be auralized. At least if exact reproduction is the goal. Our ears put a unique fingerprint on the sound that reaches our eardrums. Listening to room auralization filters designed using measurements taken in another person’s ears, with headphone equalization for another person, risks sounding awfully colored.
Hence, the idea is to measure the room and sound system with a microphone array instead. This way, the acoustic fingerprint of each individual listener’s ears can be added in a later step to the measured room response. The approach we took builds on a technique that uses digital filtering to implement a “virtual head” with the microphone array, so that one obtains the ear signals that would occur for a real head in the position of the mic. array. It is also easy to simulate ear signals for different look-directions of the “virtual head”, which enables head-tracked listening. In this study we used a Zylia ZM-1 microphone array that has 19 mics. on a sphere, illustrated in my living room below.
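As a rough sketch of what the “virtual head” rendering looks like computationally – one FIR filter per microphone and ear, applied to the array recording and summed – assuming the filters have already been designed (the hard part); everything here is a hypothetical placeholder:

```python
# A filter-and-sum sketch of the "virtual head", assuming the rendering
# filters have already been obtained by optimization against measured array
# responses and HRTFs. Names are placeholders, not the actual method.
import numpy as np
from scipy.signal import fftconvolve

def virtual_head(mic_signals, filters):
    """mic_signals: (M, n_samples) array recording, M = 19 for the Zylia ZM-1.
    filters: dict with 'left' and 'right' lists of per-mic FIR filters,
    valid for one look direction of the virtual head."""
    def render(ear_filters):
        # Filter each mic channel and sum all channels into one ear signal.
        return sum(fftconvolve(mic_signals[m], h)
                   for m, h in enumerate(ear_filters))
    return np.stack([render(filters["left"]), render(filters["right"])])

# Head tracking: keep one filter set per look direction and switch (or
# interpolate) between them as the listener turns their head.
```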
The application of speaker system auralization is scientifically both challenging and interesting. It is typically not possible to obtain complete information about the sound field in the room at the position of the mic. array, so simplifications have to be made in the filter design. It turns out that perceptually transparent simplifications are possible, leading to insights about the auditory system. Accurate headphone equalization is also part of the problem (see this post). Being able to create authentic auralization of a speaker system is a good baseline application that is interrelated with other interesting applications such as live recording and transmission of general 3D sound fields for remote listening.
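To give a flavor of what the filter design involves (this is the generic least-squares shape of such problems, not necessarily the exact formulation in the paper): for each ear and look direction, one seeks per-microphone filters $g_m$ whose summed output matches the listener’s HRTF across all incidence directions,

$$\min_{g_1,\dots,g_M} \; \sum_{\theta} \int \Big| \sum_{m=1}^{M} G_m(\omega)\, A_m(\omega,\theta) - H(\omega,\theta) \Big|^2 \, d\omega ,$$

where $A_m(\omega,\theta)$ is the response of microphone $m$ to sound from direction $\theta$ (measured with the robot described below) and $H(\omega,\theta)$ is the target HRTF for that ear. Incomplete sound field information means this fit cannot be made exact, which is where the perceptually motivated simplifications come in.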
To use the room auralization method we developed, it is necessary to know the response of the microphone array itself for sound coming from any direction. We used a measurement robot for that purpose, which you can see in the picture below. The robot turns the array step-wise through basically every direction, and for each direction a speaker plays a sweep signal that is recorded by the array.
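For readers who haven’t done sweep measurements: here is a sketch of one measurement step, using an exponential sine sweep and recovering the impulse response by regularized spectral division. The parameter values are illustrative, not the ones used in our setup.

```python
# A sketch of one sweep measurement step: generate an exponential sine sweep,
# then deconvolve it out of a recorded mic channel. Parameters are
# illustrative, not those of the actual measurement system.
import numpy as np

fs = 48000
T = 5.0                   # sweep duration in seconds
f1, f2 = 20.0, 20000.0    # sweep frequency range in Hz
t = np.arange(int(fs * T)) / fs
R = np.log(f2 / f1)
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))

def impulse_response(recording, sweep, n_fft, eps=1e-8):
    """Recover the IR from one mic channel by regularized spectral division.
    n_fft should be at least len(recording) + len(sweep) - 1."""
    S = np.fft.rfft(sweep, n_fft)
    Y = np.fft.rfft(recording, n_fft)
    # Regularization avoids blow-ups where the sweep has little energy.
    H = Y * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n_fft)
```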
You can imagine that this measurement process took a bit of time and effort. Just below the mic. array you can see that I’ve mounted a small micro-speaker that plays a special signal for time-synchronization before each measurement, to know the exact time that each measurement starts. A larger speaker that is not in the picture then plays the real test signal.
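The synchronization step itself can be as simple as locating the known sync signal in the recording by cross-correlation; a sketch, with hypothetical names:

```python
# Locate a known sync signal in a recording by cross-correlation, to find
# the exact sample at which a measurement starts. Names are hypothetical.
import numpy as np
from scipy.signal import correlate

def find_sync_offset(recording, sync_signal):
    """Return the sample index where sync_signal best aligns with recording."""
    corr = correlate(recording, sync_signal, mode="full")
    # With mode="full", lag 0 corresponds to index len(sync_signal) - 1.
    return int(np.argmax(corr)) - (len(sync_signal) - 1)
```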
Here is a small demonstration that you can listen to, a headphone simulation of one of the labs at Dirac that has a 7.x.4 Dolby Atmos speaker setup (7 horizontal speakers, 4 ceiling speakers, and no subs in this demo). It’s a channel check.
This is of course not using the acoustical fingerprint of your ears (HRTFs/headphone eq.), so you’ll probably not hear the correct elevation of the speakers, and you may perceive the front speakers to be in the back and vice versa. In the published article we show that in principle, the array measurement method makes it possible to achieve an audible result that is very close to actual in-ear measurements, as long as HRTF data is available for your ears. Getting hold of your individual HRTFs is a challenge in itself that I may write about in another post.
Comments, questions, feedback welcome in the comment field. You can find a link to the related publication below. One reason that it was a few years in the making is that we describe a whole mathematical framework for multichannel filter optimization for microphone arrays, which potentially has other applications as well. It may be a bit of a dense read even if you’re familiar with the field.
The work I describe in this post is part of my PhD studies conducted in collaboration with Dirac Research and the Signals & Systems group at Uppsala University, Sweden.