Relative Transfer Function Estimation Exploiting Instantaneous Signals and the Signal Subspace

Maja Taseska and Emanuel A. P. Habets

Published in Proc. of the European Signal Processing Conference (EUSIPCO), Nice, France, 2015.

Abstract

Multichannel noise reduction can be achieved without distorting the desired signals, provided that the relative transfer functions (RTFs) of the sources are known. Many RTF estimators exploit periods during which only one source is active, which is a restrictive requirement in practice. We propose an RTF estimator that does not require such periods. For each time-frequency bin, a time-varying RTF is computed that corresponds to the source that is dominant in that bin. We demonstrate that a minimum variance distortionless response (MVDR) filter based on the proposed RTF estimate can extract multiple sources with low distortion. Because it imposes only a single constraint, the MVDR filter has the maximum degrees of freedom and hence achieves significantly better noise reduction than a linearly constrained minimum variance (LCMV) filter that uses a separate RTF for each source.
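For reference, the quantities named in the abstract can be written in a standard narrowband STFT-domain signal model. The notation below (M microphones, J sources, microphone 1 as the reference, time frame n, frequency bin k) is chosen here for illustration and may not match the notation used in the paper.

```latex
% Narrowband STFT-domain model (assumed notation)
\mathbf{y}(n,k) = \sum_{j=1}^{J} \mathbf{h}_j(k)\, X_j(n,k) + \mathbf{v}(n,k),
\qquad
\mathbf{h}_j(k) = \frac{\mathbf{a}_j(k)}{a_{j,1}(k)}

% MVDR filter with a single (time-varying) RTF vector h(n,k)
\mathbf{w}_{\mathrm{MVDR}}(n,k) =
\frac{\boldsymbol{\Phi}_{\mathbf{v}}^{-1}(n,k)\,\mathbf{h}(n,k)}
     {\mathbf{h}^{H}(n,k)\,\boldsymbol{\Phi}_{\mathbf{v}}^{-1}(n,k)\,\mathbf{h}(n,k)},
\qquad \mathbf{w}_{\mathrm{MVDR}}^{H}(n,k)\,\mathbf{h}(n,k) = 1

% LCMV filter with one constraint per source, H = [h_1, ..., h_J]
\mathbf{w}_{\mathrm{LCMV}} =
\boldsymbol{\Phi}_{\mathbf{v}}^{-1}\mathbf{H}
\left(\mathbf{H}^{H}\boldsymbol{\Phi}_{\mathbf{v}}^{-1}\mathbf{H}\right)^{-1}\mathbf{1}_J,
\qquad \mathbf{w}_{\mathrm{LCMV}}^{H}\,\mathbf{h}_j = 1,\quad j = 1,\dots,J
```

Here a_j(k) is the acoustic transfer function vector of source j, X_j(n,k) is source j as received at the reference microphone, and v(n,k) is the noise with PSD matrix Φ_v. Under the assumption that one source dominates each bin, setting h(n,k) to the RTF of that dominant source lets the MVDR output approximate the sum of all sources at the reference microphone, while the LCMV filter spends J of its M degrees of freedom on the constraints.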

Description

The proposed RTF estimator was evaluated in a simulated room with dimensions 4.5 m × 4 m × 3 m. The microphone signals were obtained by convolving clean speech with simulated room impulse responses [1]. Diffuse babble noise [2] and white sensor noise were added.
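Diffuse noise can be generated by imposing a prescribed spatial coherence on mutually independent signals, in the spirit of [2]. The sketch below is not the code used for the experiments: it assumes a spherically isotropic noise field, a speed of sound of 343 m/s, and a uniform linear geometry for the five microphones (only the 3 cm spacing is given on this page), and the function names are hypothetical. It works per STFT bin and uses an eigendecomposition rather than a Cholesky factorization because the coherence matrix is singular at 0 Hz.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def diffuse_coherence(positions, freqs, c=SPEED_OF_SOUND):
    """Spherically isotropic coherence, Gamma[f, i, j] = sin(w d_ij / c) / (w d_ij / c).

    positions: (M, 3) microphone coordinates in metres; freqs: (F,) in Hz.
    Returns an (F, M, M) real array.
    """
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), so the argument 2 f d / c
    # yields sin(2 pi f d / c) / (2 pi f d / c).
    return np.sinc(2.0 * freqs[:, None, None] * d[None, :, :] / c)

def impose_coherence(independent, gamma):
    """Mix M mutually independent, unit-variance STFT signals so that their
    spatial coherence matches gamma.

    independent: (M, F, T) complex array; gamma: (F, M, M).
    """
    out = np.empty_like(independent)
    for f in range(gamma.shape[0]):
        eigval, eigvec = np.linalg.eigh(gamma[f])
        # Mixing matrix C with C C^H = Gamma; tiny negative eigenvalues are clipped.
        mix = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
        out[:, f, :] = mix @ independent[:, f, :]
    return out

# Example: five microphones on a line with 3 cm spacing (geometry assumed).
positions = np.stack([np.arange(5) * 0.03, np.zeros(5), np.zeros(5)], axis=1)
freqs = np.fft.rfftfreq(2048, d=1.0 / 16_000)  # 1025 one-sided bins
gamma = diffuse_coherence(positions, freqs)
```

Feeding STFTs of independent babble recordings into impose_coherence and transforming back would give one realization of a diffuse babble field for this assumed array.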

The experiments were performed using an array of five omnidirectional microphones with an inter-microphone distance of 3 cm. However, the method is applicable to any arrangement of co-located or distributed microphones. The sampling rate was 16 kHz, and the STFT frame length was 128 ms with 50% overlap. The results shown below were obtained using the power spectral density (PSD) matrix estimation framework from [3].
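The stated STFT parameters translate into samples as follows: 128 ms at 16 kHz is 2048 samples per frame, 50% overlap gives a hop of 1024 samples, and a one-sided spectrum has 1025 frequency bins. The sketch below computes a multichannel STFT with scipy.signal.stft; the window type is not given on this page, so the Hann window is an assumption, and multichannel_stft is a hypothetical helper.

```python
import numpy as np
from scipy.signal import stft

FS = 16_000       # sampling rate in Hz (from the setup above)
FRAME_S = 0.128   # STFT frame length in seconds (from the setup above)
OVERLAP = 0.5     # 50% overlap

nperseg = int(round(FS * FRAME_S))  # 2048 samples per frame
noverlap = int(nperseg * OVERLAP)   # 1024 samples of overlap, i.e. a hop of 1024
n_bins = nperseg // 2 + 1           # 1025 one-sided frequency bins

def multichannel_stft(x):
    """x: (M, num_samples) real array -> (M, n_bins, num_frames) complex STFT."""
    # Window type is an assumption; the text specifies only length and overlap.
    _, _, X = stft(x, fs=FS, window="hann", nperseg=nperseg, noverlap=noverlap)
    return X
```

At this resolution each bin spans 16000 / 2048 ≈ 7.8 Hz, which is the grid on which the per-bin PSD matrices and RTFs are estimated.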

The goal is to extract the sum of all active speech sources while minimizing the residual background and sensor noise power at the output. A spatial filter retains its maximum degrees of freedom only if a single constraint is imposed: with M microphones and J linear constraints, M − J degrees of freedom remain for noise reduction. By using an MVDR filter with an appropriately estimated RTF vector, we demonstrate that the desired signal can be extracted with low distortion regardless of the number of active sources. Compared to an LCMV filter that imposes one constraint per source, the MVDR filter achieves significantly higher noise reduction. Audio examples for two, three, and four sources are provided for the scenarios illustrated in the figure below.
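The degrees-of-freedom argument can be checked numerically for a single time-frequency bin. The sketch below uses the closed-form MVDR and LCMV weights and compares the output noise power of an MVDR filter constrained on one RTF with that of an LCMV filter constrained on three. The noise PSD matrix and RTFs are random placeholders, not data from the paper, and the function names are hypothetical.

```python
import numpy as np

def mvdr(phi_v, h):
    """MVDR weights w = Phi_v^{-1} h / (h^H Phi_v^{-1} h), so that w^H h = 1."""
    p = np.linalg.solve(phi_v, h)
    return p / (h.conj() @ p)

def lcmv(phi_v, H, g):
    """LCMV weights w = Phi_v^{-1} H (H^H Phi_v^{-1} H)^{-1} g, so that H^H w = g."""
    P = np.linalg.solve(phi_v, H)  # Phi_v^{-1} H, shape (M, J)
    return P @ np.linalg.solve(H.conj().T @ P, g)

def output_noise_power(w, phi_v):
    """Residual noise power w^H Phi_v w (real for Hermitian Phi_v)."""
    return np.real(w.conj() @ phi_v @ w)

# Hypothetical single-bin example: M = 5 microphones, J = 3 sources.
rng = np.random.default_rng(0)
M, J = 5, 3
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_v = A @ A.conj().T + 0.1 * np.eye(M)  # Hermitian positive definite noise PSD
H = rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))
H = H / H[0]  # normalize each column to its reference-microphone entry

w_mvdr = mvdr(phi_v, H[:, 0])        # 1 constraint:  M - 1 = 4 DoF left
w_lcmv = lcmv(phi_v, H, np.ones(J))  # J constraints: M - J = 2 DoF left
p_mvdr = output_noise_power(w_mvdr, phi_v)
p_lcmv = output_noise_power(w_lcmv, phi_v)
```

Every filter that satisfies all LCMV constraints also satisfies the single MVDR constraint, so p_mvdr can never exceed p_lcmv. In this toy example the MVDR filter passes only the first source undistorted; according to the description above, the proposed estimator avoids this by letting h(n,k) follow whichever source dominates each bin.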

We compare MVDR filters based on three RTF estimators:

  1. Spatial-prediction-based RTF estimator (denoted by SP in the paper)
  2. Standard subspace-based RTF estimator (denoted by GEVD in the paper)
  3. The proposed RTF estimator
[Figure: Scenarios used for the audio examples]
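For orientation, the two baseline estimators can be sketched per time-frequency bin from the noisy and noise PSD matrices. The forms below are common textbook versions (covariance whitening via a generalized eigenvalue decomposition, and normalization of the first column of the estimated speech PSD matrix) and may differ in detail from the implementations evaluated in the paper. The proposed estimator is not sketched, because this page does not describe it. Microphone 1 (index 0) is taken as the reference, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def rtf_gevd(phi_y, phi_v):
    """Subspace (GEVD) RTF estimate for one time-frequency bin.

    Takes the principal generalized eigenvector q of (Phi_y, Phi_v) and
    returns Phi_v q normalized to the reference microphone.
    """
    _, vecs = eigh(phi_y, phi_v)  # solves Phi_y q = lambda Phi_v q, ascending eigenvalues
    a = phi_v @ vecs[:, -1]       # de-whiten the eigenvector of the largest eigenvalue
    return a / a[0]

def rtf_sp(phi_y, phi_v):
    """Spatial-prediction-type RTF estimate: first column of the speech PSD
    matrix Phi_x = Phi_y - Phi_v, divided by its reference entry."""
    phi_x = phi_y - phi_v
    return phi_x[:, 0] / phi_x[0, 0]

# Hypothetical bin with a single active source (rank-1 speech PSD matrix).
rng = np.random.default_rng(2)
M = 5
h_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h_true = h_true / h_true[0]
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_v = B @ B.conj().T + np.eye(M)
phi_y = 2.0 * np.outer(h_true, h_true.conj()) + phi_v
```

Both functions recover h_true exactly when the speech PSD matrix has rank one, as in this example. When several sources are active in the same bin that assumption no longer holds, which is the situation the proposed estimator is designed to handle.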

Audio Examples

Two-source scenario, SNR = 18 dB, reverberation time 350 ms

Two-source scenario, SNR = 8 dB, reverberation time 350 ms

Two-source scenario, SNR = 18 dB, reverberation time 500 ms

Two-source scenario, SNR = 8 dB, reverberation time 500 ms

Three-source scenario, SNR = 18 dB, reverberation time 350 ms

Three-source scenario, SNR = 8 dB, reverberation time 350 ms

Three-source scenario, SNR = 18 dB, reverberation time 500 ms

Three-source scenario, SNR = 8 dB, reverberation time 500 ms

Four-source scenario, SNR = 18 dB, reverberation time 350 ms

Four-source scenario, SNR = 8 dB, reverberation time 350 ms

Four-source scenario, SNR = 18 dB, reverberation time 500 ms

Four-source scenario, SNR = 8 dB, reverberation time 500 ms
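The scenarios above are specified by their SNR. This page does not state how the SNR was measured, so the sketch below assumes it is the ratio of the speech power to the total noise power (babble plus sensor noise), averaged over all microphones and samples; scale_noise_to_snr is a hypothetical helper, and the signals in the example are synthetic placeholders.

```python
import numpy as np

def scale_noise_to_snr(speech, noise, snr_db):
    """Return noise scaled so that 10*log10(P_speech / P_noise) equals snr_db.

    speech, noise: arrays of equal shape, e.g. (M, num_samples). Power is
    averaged over all entries (an assumed SNR definition).
    """
    p_speech = np.mean(np.abs(speech) ** 2)
    p_noise = np.mean(np.abs(noise) ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return gain * noise

# Synthetic placeholders for 5-channel speech and noise, 1 s at 16 kHz.
rng = np.random.default_rng(3)
speech = rng.standard_normal((5, 16_000))
noise = 3.0 * rng.standard_normal((5, 16_000))
mixture_8db = speech + scale_noise_to_snr(speech, noise, 8.0)
mixture_18db = speech + scale_noise_to_snr(speech, noise, 18.0)
```

If the SNR were instead measured at the reference microphone only, or over speech-active segments, the power estimates would be restricted accordingly.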

References

[1] E. A. P. Habets, “Room impulse response generator,” Tech. Rep., Technische Universiteit Eindhoven, 2006.

[2] E. A. P. Habets and S. Gannot, “Generating sensor signals in isotropic noise fields,” J. Acoust. Soc. Am., vol. 122, no. 6, pp. 3464–3470, Dec. 2007.

[3] M. Taseska and E. A. P. Habets, “MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator,” in Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC), Sep. 2012.