The Cocktail-Party-Effect

Extraction of one Sound Source

The term Cocktail-Party-Effect describes the ability of the human auditory system to focus on the signals of a sound source when multiple sound sources are simultaneously active. Thus, at a cocktail party where many people speak simultaneously, a listener is able to concentrate on only one speaker and disregard the other speakers.

When the personal listening experience in "cocktail party" situations is compared with microphone recordings of the same situation, the microphone recordings appear much more strongly disturbed by noise sources (other speakers) than what the listener hears. Due to the Cocktail-Party-Effect, the human auditory system can improve the signal-to-noise ratio by 9 to 15 dB. That is, a human listener can reduce the perceived loudness of interfering sound sources by a factor of about 2 to 3.
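
The relation between the 9 to 15 dB figure and the loudness factor of 2 to 3 can be checked with a short calculation. The sketch below assumes the common psychoacoustic rule of thumb that a level change of 10 dB corresponds to roughly a doubling or halving of perceived loudness. This rule and the function name `loudness_factor` are assumptions for illustration and are not taken from the text.

```python
def loudness_factor(level_change_db: float) -> float:
    """Approximate perceived-loudness ratio for a level change in dB.

    Assumption: 10 dB corresponds to a factor of 2 in perceived loudness.
    """
    return 2.0 ** (level_change_db / 10.0)


for gain_db in (9.0, 15.0):
    print(f"{gain_db:4.1f} dB -> loudness factor {loudness_factor(gain_db):.2f}")
```

Under this assumption, 9 dB and 15 dB map to factors of about 1.9 and 2.8, which is consistent with the range of 2 to 3 given for the Cocktail-Party-Effect.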

Reduction of Echo and Reverberation

When listening in enclosed rooms, a listener is able to suppress distracting reflections and reverberation considerably. When the personal listening experience in a room is compared with microphone recordings from the same room, the microphone recordings sound much more reverberant.

Inside enclosed rooms, the sound of a sound source is reflected by the walls again and again. A wall reflection has the same effect on the sound field as the introduction of a corresponding "mirror sound source" behind the wall. Like the wall reflections, these "mirror sound sources" are distributed over all directions.
A Cocktail-Party-Processor that is able to suppress sound from directions other than the direction of a desired sound source can also be used to reduce the effect of echoes and reverberation. Since all "mirror sound sources" whose directions do not match the direction of the desired sound source are suppressed, the total number of perceived mirror sound sources is reduced. As a result, the perceived sound contains less echo and reverberation.
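
The "mirror sound source" idea can be illustrated with a small geometric sketch. It assumes a two-dimensional room, a single flat wall along the line x = wall_x and a first-order reflection only. The coordinates and the helper names `mirror_source` and `azimuth_deg` are hypothetical and chosen for illustration.

```python
import math


def mirror_source(source, wall_x):
    """Mirror a 2-D source position (x, y) at a wall along the line x = wall_x."""
    x, y = source
    return (2.0 * wall_x - x, y)


def azimuth_deg(listener, source):
    """Direction of the source as seen from the listener, in degrees.

    0 degrees points along +y (straight ahead), positive angles towards +x.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    return math.degrees(math.atan2(dx, dy))


listener = (0.0, 0.0)
source = (0.0, 2.0)                         # desired source, 2 m straight ahead
image = mirror_source(source, wall_x=3.0)   # side wall 3 m to the right

print("direct sound from", azimuth_deg(listener, source), "degrees")
print("first reflection from", round(azimuth_deg(listener, image), 1), "degrees")
```

In this example the direct sound arrives from straight ahead, while the reflection appears to come from roughly 72 degrees to the right. A processor that only accepts sound from the frontal direction would therefore suppress this reflection.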

Separation of Sound Sources

The human auditory system uses several methods to separate a desired sound source from interfering sound sources.

Separation of sound sources on the basis of signal characteristics.
If the signal characteristics of the desired sound source or of the interfering sound sources are known, the auditory system can use this information to improve the perception of the desired signal: signal portions (e.g. frequency bands) in which the characteristics of the interfering signals dominate are suppressed, while signal portions in which the characteristics of the desired signal dominate are accepted.
Separation of sound sources on the basis of signal directions.
Sound from a specific direction leads to characteristic interaural time and level differences between the two ear signals. If the direction of a desired sound source is known, its interaural time and level differences can be used to extract it: signal portions whose interaural time and level differences fit the desired direction are accepted, while signal portions whose interaural time and level differences do not fit the desired direction are suppressed.
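
The accept/suppress decision based on interaural time differences can be sketched as follows. The sketch assumes a very simple free-field model with two point receivers and no head shadowing, so it covers time differences only. The ear distance of 0.18 m, the speed of sound of 343 m/s, the tolerance of 50 microseconds and all function names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
EAR_DISTANCE = 0.18      # m, assumed distance between the two receivers


def interaural_time_difference(azimuth_deg):
    """Time difference in seconds for a distant source at the given azimuth.

    0 degrees is straight ahead, positive angles are to the right.
    """
    return EAR_DISTANCE * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def accept(measured_itd, desired_azimuth_deg, tolerance_s=50e-6):
    """True if the measured time difference fits the desired direction."""
    expected = interaural_time_difference(desired_azimuth_deg)
    return abs(measured_itd - expected) <= tolerance_s


# A signal portion from 30 degrees fits a desired direction of 30 degrees,
# a signal portion from -60 degrees does not.
print(accept(interaural_time_difference(30.0), desired_azimuth_deg=30.0))
print(accept(interaural_time_difference(-60.0), desired_azimuth_deg=30.0))
```

A hard accept/suppress decision is used here only to keep the example short. A gradual weighting would follow the same comparison.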

Types of Cocktail-Party-Processors

There are several approaches to extracting the signals of a desired sound source from a mixture of sound signals. Depending on the number of sound receivers used, Cocktail-Party-Processors can be classified as follows:

Monaural Cocktail-Party-Processors process the sound signals of only one sound receiver. These Cocktail-Party-Processors need to know the characteristics of the desired sound signal or of the interfering sound signals (e.g. spectrum, statistics). Signal portions whose characteristics fit the desired signal (or do not fit known interfering signals) are accepted. Signal portions whose characteristics do not fit the desired signal (or do fit the interfering signals) are suppressed.
Binaural Cocktail-Party-Processors process two sound signals that are recorded at different positions (similar to the sound signals at the two ears). These Cocktail-Party-Processors use directional information to separate different sound sources from each other. For each sound source direction there are characteristic time and level differences between the two recorded sound signals. Signal portions whose time and level differences fit the desired sound source are passed on. Signal portions whose time and level differences do not fit the desired sound source are suppressed.
Cocktail-Party-Processors with microphone arrays process more than two sound signals. The microphones of the array can be arranged along a line, in a plane or distributed in space. These Cocktail-Party-Processors also separate sound sources according to their direction. The method used is so-called "beamforming": by introducing additional time and level differences between the recorded signals, the directional characteristic of the microphone array can be steered towards the direction of the desired sound source. In this way, the sound of the desired sound source is amplified, while the sound of interfering sound sources is attenuated.
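
The beamforming principle can be illustrated with a minimal delay-and-sum sketch. It uses time differences only (equal weights for all microphones), whole-sample delays and artificial impulse signals. The array of four microphones, the sample positions and the helper names are assumptions for illustration, not a description of a particular processor.

```python
def delay_and_sum(channels, steering_delays):
    """Advance channel i by steering_delays[i] samples, then average all channels."""
    length = min(len(ch) - d for ch, d in zip(channels, steering_delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, steering_delays)) / len(channels)
        for n in range(length)
    ]


def impulse_channel(length, positions):
    """Signal of the given length with unit impulses at the given sample positions."""
    return [1.0 if n in positions else 0.0 for n in range(length)]


NUM_MICS = 4
# Desired source: its wavefront reaches microphone i two samples later
# than microphone i - 1 (first arrival at sample 5).
# Interfering source: its wavefront reaches all microphones at sample 10.
channels = [impulse_channel(20, {5 + 2 * i, 10}) for i in range(NUM_MICS)]
steering_delays = [2 * i for i in range(NUM_MICS)]   # matches the desired direction

output = delay_and_sum(channels, steering_delays)
print("desired peak:", output[5])
print("largest interfering peak:",
      max(value for n, value in enumerate(output) if n != 5))
```

After the steering delays are applied, the four copies of the desired impulse line up and add coherently, while the copies of the interfering impulse are spread over different samples and each contribute only a quarter of the amplitude.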

Properties of Cocktail-Party-Processors

The properties of a Cocktail-Party-Processor depend on its type.

Monaural Cocktail-Party-Processors are especially effective when signal characteristics of desired and interfering signal differ significantly (e.g. speech and white noise). Then major improvements in the signal-to-noise-ratio can be achieved even for signals with the same direction of incidence. If the signal characteristics of desired and interfering signal are similar, the effect of this Cocktail-Party-Processor type is rather low.
Binaural Cocktail-Party-Processors that are based on the evaluation of phase differences between two sound receivers are most effective when the distance between the sound receivers is smaller than the wavelength of the sound, but still large enough that phase differences can be evaluated with sufficient accuracy. In that case, the direction of incidence can be determined from the phase differences between the two sound receivers alone. If the wavelength is smaller than the distance between the sound receivers, the result is ambiguous: several directions of incidence then produce the same phase difference, so that signal components from wrong directions are also assigned to the desired sound signal.
Binaural Cocktail-Party-Processors that are based on the evaluation of level differences need pronounced, direction-dependent level differences between the two sound receivers. These level differences can be introduced by placing a shadowing object between the sound receivers (similar to the human head between the ears). To achieve pronounced level differences between the sound receivers, the wavelength of the sound has to be significantly smaller than the size of the inserted object. At longer wavelengths, no usable level differences appear at all.
Cocktail-Party-Processors with microphone arrays are most effective when the wavelength of the sound is much smaller than the size of the microphone array, so that significant phase differences arise between the microphone signals, which improve the directional characteristics of the microphone array. For large wavelengths that exceed the size of the microphone array, the directional filtering effect of the array is relatively small.
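
The phase ambiguity described for binaural processors with phase-difference evaluation can be made concrete with a short calculation. The sketch assumes two point receivers in a free field, a distant source in the frontal half-plane and a speed of sound of 343 m/s. Under these assumptions the phase difference is 2π · f · d · sin(azimuth) / c, and two directions are indistinguishable when their phase differences differ by a multiple of 2π. The receiver distance of 0.18 m and the function name are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed


def directions_with_same_phase(frequency_hz, distance_m, azimuth_deg):
    """Frontal azimuths (degrees) that share the wrapped phase difference
    of a source at azimuth_deg, including azimuth_deg itself."""
    step = SPEED_OF_SOUND / (frequency_hz * distance_m)   # wavelength / distance
    s0 = math.sin(math.radians(azimuth_deg))
    k_max = int(2.0 / step) + 1
    result = []
    for k in range(-k_max, k_max + 1):
        s = s0 + k * step          # sine of a candidate direction
        if -1.0 <= s <= 1.0:
            result.append(round(math.degrees(math.asin(s)), 1))
    return result


# Wavelength (about 0.69 m) larger than the receiver distance: one direction only.
print(directions_with_same_phase(500.0, 0.18, 0.0))
# Wavelength (about 0.09 m) smaller than the receiver distance: several directions.
print(directions_with_same_phase(4000.0, 0.18, 0.0))
```

In this simplified model, a frontal source stays unambiguous as long as the wavelength exceeds the receiver distance, whereas sources far to the side can become ambiguous at somewhat longer wavelengths.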

Combining Cocktail-Party-Processors

To achieve good results for broadband signals, it is useful to combine different types of Cocktail-Party-Processors, such as:

Combine several binaural Cocktail-Party-Processors with different microphone distances for different frequency ranges.
Combine a binaural Cocktail-Party-Processor with phase difference evaluation for low frequencies and a binaural Cocktail-Party-Processor with level difference analysis for high frequencies.
Combine binaural Cocktail-Party-Processors for low frequencies and microphone arrays for high frequencies.
Combine monaural Cocktail-Party-Processors with binaural Cocktail-Party-Processors or microphone arrays. Thus, the effects of directional filtering can be enhanced by the effects of filtering by signal characteristics.

Cocktail-Party-Processor-Algorithms

Depending on the type of Cocktail-Party-Processor, different types of algorithms are used to extract the signal of the desired sound source.

Monaural and binaural Cocktail-Party-Processors usually evaluate the signal characteristics of the received signals in separate frequency bands and for successive time intervals. These processors estimate the percentage of the desired signal within each frequency band and each time interval, and weight the signals in these frequency bands and time frames by the estimated percentages. In this way the received signals are adjusted to the spectrum and the time course of the desired signal.
Monaural Cocktail-Party-Processors often use correlation algorithms between a known signal (desired or interfering) and the received signals in order to determine the percentage of the known signal within the received signal.
Binaural Cocktail-Party-Processors often use cross-correlation algorithms between the two received signals in order to evaluate the direction of incidence. The degree of coincidence between the evaluated direction and the direction of the desired sound source is often used as a measure of the percentage of the desired signal within the received signal.
Microphone arrays typically work under broadband conditions. The received microphone signals are individually delayed and attenuated in order to achieve an ideal directional characteristic of the array. The array is considered optimally tuned if the signal from the desired direction is amplified as much as possible and the signals from interfering sound source directions are attenuated as much as possible.
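
The binaural cross-correlation step can be sketched as follows. The example operates on one short broadband frame instead of separate frequency bands, uses whole-sample lags as a stand-in for direction, and turns the agreement between estimated and desired lag into a weight with a triangular function. The test signals, the triangular weighting and all function names are assumptions for illustration.

```python
def cross_correlation(left, right, lag):
    """Sum of left[n] * right[n + lag] over all valid sample indices n."""
    return sum(
        left[n] * right[n + lag]
        for n in range(len(left))
        if 0 <= n + lag < len(right)
    )


def estimate_lag(left, right, max_lag):
    """Lag (in samples) at which the cross-correlation is largest."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(left, right, lag))


def direction_weight(estimated_lag, desired_lag, width=4):
    """Hypothetical triangular weight: 1 for a perfect match, 0 beyond `width` samples."""
    return max(0.0, 1.0 - abs(estimated_lag - desired_lag) / width)


left = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 3, -2, 1, 0, 0, 0]   # same pulse, 3 samples later

lag = estimate_lag(left, right, max_lag=5)
print("estimated lag:", lag)
print("weight if desired lag is 3:", direction_weight(lag, 3))
print("weight if desired lag is -1:", direction_weight(lag, -1))
```

In a complete processor, a weight of this kind would be computed per frequency band and time frame and then applied to the corresponding signal portion.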

The evaluation algorithms for monaural and binaural Cocktail-Party-Processors can also be classified into at least two types:

First order Cocktail-Party-Processor-Algorithms determine the percentage of the desired signal in the received signals from the deviation between known signal characteristics of the desired signal and evaluated signal characteristics of the received signals.
These algorithms perform well when the desired signal is relatively strong (at least within several frequency bands or several time intervals) and the received signals are not dominated by interfering signals. If the interfering signals dominate, the performance of first order algorithms remains low.
Second order Cocktail-Party-Processor-Algorithms evaluate statistics, based on the signal characteristics of the received signals. The percentage of the desired signal within the received signals is estimated from the results of this statistical analysis.
These algorithms also perform well if the desired signal is much weaker than the interfering signals. In these cases, for example, the variance of the evaluated sound direction can give an estimate of the amplitude of a weak desired signal within strong interfering signals.