Neural Directional Filtering - Far-Field Directivity Control with a Small Microphone Array

J. Wechsler, S. R. Chetupalli, M. M. Halimeh, O. Thiergart and E. A. P. Habets

Published in Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC), 2024.

Best Student Paper Award - 2nd Place

Contents of this Page

  1. Abstract of the IWAENC 2024 paper
  2. Note to iOS users
  3. Audio Examples
    • Example 1: Sources at 2.5 and 87.5 degrees
    • Example 2: Sources at 177.5 and 187.5 degrees
    • Example 3: Sources at 92.5 and 182.5 degrees
  4. References

Abstract

Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.
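
The mask-and-apply step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the array layout, and the oracle mask that stands in for the DNN output are all assumptions made for this sketch.

```python
import numpy as np

def apply_complex_mask(mic_stft, mask, ref_channel=0):
    """Apply a single-channel complex mask to the reference microphone.

    mic_stft: complex array, shape (channels, freqs, frames) (assumed layout)
    mask:     complex array, shape (freqs, frames)
    """
    return mask * mic_stft[ref_channel]

def oracle_mask(target_stft, reference_stft, eps=1e-12):
    """Hypothetical stand-in for the DNN: the mask mapping reference to target."""
    return target_stft * np.conj(reference_stft) / (np.abs(reference_stft) ** 2 + eps)

# Toy data: 3 microphones, 5 frequency bins, 4 time frames.
rng = np.random.default_rng(0)
mic_stft = rng.standard_normal((3, 5, 4)) + 1j * rng.standard_normal((3, 5, 4))

# Toy target: the reference signal with a per-frequency gain and phase shift.
gain = 0.5 * np.exp(1j * np.linspace(0.0, np.pi / 2, 5))[:, None]
target_stft = gain * mic_stft[0]

mask = oracle_mask(target_stft, mic_stft[0])
output_stft = apply_complex_mask(mic_stft, mask)
```

In the proposed method the mask would come from the DNN, which sees all microphone signals; only the final multiplication with the reference channel is shown faithfully here.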

Note to iOS users

To listen to the audio examples on this website, iOS devices must not be in silent mode (as of 06 August 2024). We apologise for the inconvenience.

Audio Examples

The audio examples below compare the proposed method with two baseline methods: Least-Squares Beamforming [1] and Parametric Filtering [2].

Each example is a mixture of 2 speakers from LibriSpeech [3] that we spatialized. We evaluate three models: one trained on a single speaker, one trained on a maximum of 3 speakers, and one trained on a maximum of 5 speakers. For every example, we first give the cardioid target and the corresponding estimates, then the 3rd-order differential microphone array (DMA) target and the corresponding estimates.

The DNN architecture is FT-JNF [4].

Sources at 2.5 and 87.5 degrees

The target patterns attenuate the two sources (at 2.5 and 87.5 degrees, respectively) by:

  • Cardioid: -0.00 dB and -5.65 dB
  • 3rd-Order DMA: -0.02 dB and -41.67 dB
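
The cardioid values above follow from the standard first-order cardioid pattern, 0.5 + 0.5 cos(θ). The sketch below evaluates it in dB. The page does not state the 3rd-order DMA pattern; the function `third_order_assumed` is an assumption inferred from the listed values (a pattern with nulls at 90, 120 and 180 degrees, normalized to unit gain at 0 degrees) and may not be the authors' exact definition. All function names are illustrative.

```python
import numpy as np

def cardioid(theta_deg):
    """Standard first-order cardioid: 0.5 + 0.5 * cos(theta)."""
    c = np.cos(np.deg2rad(theta_deg))
    return 0.5 + 0.5 * c

def third_order_assumed(theta_deg):
    """ASSUMED 3rd-order pattern (not given on the page): nulls at
    90, 120 and 180 degrees, unit gain at 0 degrees."""
    c = np.cos(np.deg2rad(theta_deg))
    return c * (1.0 + c) * (1.0 + 2.0 * c) / 6.0

def gain_db(pattern, theta_deg):
    """Gain of a directivity pattern towards theta_deg, in dB."""
    return 20.0 * np.log10(np.abs(pattern(theta_deg)))

# Source directions of this example.
for angle in (2.5, 87.5):
    print(f"{angle:6.1f} deg: cardioid {gain_db(cardioid, angle):7.2f} dB, "
          f"3rd order (assumed) {gain_db(third_order_assumed, angle):7.2f} dB")
```

Under these assumptions the computed gains agree with the listed values to within about 0.01 dB, and the same functions can be evaluated at the source directions of the other examples.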

Audio tracks:

  • Mixture
  • Cardioid Target
  • LS Beamformer [SDR 16.6 dB], Cardioid Target
  • Parametric Filtering [SDR 20.7 dB], Cardioid Target
  • FT-JNF Trained on 1 Source [SDR 11.3 dB], Cardioid Target
  • FT-JNF Trained on max. 3 Sources [SDR 30.0 dB], Cardioid Target
  • FT-JNF Trained on max. 5 Sources [SDR 29.9 dB], Cardioid Target
  • 3rd-Order DMA Target
  • LS Beamformer [SDR 2.8 dB], 3rd-Order DMA Target
  • Parametric Filtering [SDR 13.7 dB], 3rd-Order DMA Target
  • FT-JNF Trained on 1 Source [SDR 4.2 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 3 Sources [SDR 23.7 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 5 Sources [SDR 23.6 dB], 3rd-Order DMA Target
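
The page reports a signal-to-distortion ratio (SDR) for each estimate but does not define it. One common definition, assumed here for illustration, is the ratio of target energy to error energy in dB. The function name `sdr_db` and the small `eps` regularizer are choices made for this sketch and may differ from the authors' evaluation code.

```python
import numpy as np

def sdr_db(target, estimate, eps=1e-12):
    """Plain SDR in dB: target energy over error energy (assumed definition)."""
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(target ** 2)
    error_power = np.sum((target - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

# Toy signals: an estimate with a 10 % amplitude error, and an all-zero estimate.
target = np.array([1.0, 0.0, -1.0, 0.0])
sdr_scaled = sdr_db(target, 0.9 * target)
sdr_silent = sdr_db(target, np.zeros_like(target))
```

Under this definition an all-zero estimate scores 0 dB, so a negative SDR means the error carries more energy than the target itself. That can happen easily when the target pattern attenuates every source strongly and the target signal is therefore very quiet.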

Sources at 177.5 and 187.5 degrees

The target patterns attenuate the two sources (at 177.5 and 187.5 degrees, respectively) by:

  • Cardioid: -66.45 dB and -47.38 dB
  • 3rd-Order DMA: -76.02 dB and -57.14 dB

Audio tracks:

  • Mixture
  • Cardioid Target
  • LS Beamformer [SDR -36.9 dB], Cardioid Target
  • Parametric Filtering [SDR 12.7 dB], Cardioid Target
  • FT-JNF Trained on 1 Source [SDR 4.6 dB], Cardioid Target
  • FT-JNF Trained on max. 3 Sources [SDR -2.5 dB], Cardioid Target
  • FT-JNF Trained on max. 5 Sources [SDR -2.3 dB], Cardioid Target
  • 3rd-Order DMA Target
  • LS Beamformer [SDR -48.4 dB], 3rd-Order DMA Target
  • Parametric Filtering [SDR 5.9 dB], 3rd-Order DMA Target
  • FT-JNF Trained on 1 Source [SDR -2.1 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 3 Sources [SDR -7.5 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 5 Sources [SDR -12.7 dB], 3rd-Order DMA Target

Sources at 92.5 and 182.5 degrees

The target patterns attenuate the two sources (at 92.5 and 182.5 degrees, respectively) by:

  • Cardioid: -6.41 dB and -66.45 dB
  • 3rd-Order DMA: -43.95 dB and -76.02 dB

Audio tracks:

  • Mixture
  • Cardioid Target
  • LS Beamformer [SDR 5.6 dB], Cardioid Target
  • Parametric Filtering [SDR 13.9 dB], Cardioid Target
  • FT-JNF Trained on 1 Source [SDR 6.7 dB], Cardioid Target
  • FT-JNF Trained on max. 3 Sources [SDR 23.8 dB], Cardioid Target
  • FT-JNF Trained on max. 5 Sources [SDR 23.8 dB], Cardioid Target
  • 3rd-Order DMA Target
  • LS Beamformer [SDR -39.2 dB], 3rd-Order DMA Target
  • Parametric Filtering [SDR 5.3 dB], 3rd-Order DMA Target
  • FT-JNF Trained on 1 Source [SDR -1.0 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 3 Sources [SDR 1.4 dB], 3rd-Order DMA Target
  • FT-JNF Trained on max. 5 Sources [SDR -3.2 dB], 3rd-Order DMA Target

References

[1] E. Rasumow et al., "Regularization Approaches for Synthesizing HRTF Directivity Patterns," IEEE/ACM Trans. Aud., Sp., Lang. Proc., vol. 24, no. 2, pp. 215-225, 2016.

[2] K. Kowalczyk, O. Thiergart, M. Taseska, G. Del Galdo, V. Pulkki and E. A. P. Habets, "Parametric Spatial Sound Processing: A flexible and efficient solution to sound scene acquisition, modification, and reproduction," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 31-42, 2015.

[3] V. Panayotov, G. Chen, D. Povey and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," in Proc. IEEE Intl. Conf. on Ac., Sp. and Sig. Proc. (ICASSP), 2015, pp. 5206-5210.

[4] K. Tesch and T. Gerkmann, "Spatially selective deep non-linear filters for speaker extraction," in Proc. IEEE Intl. Conf. on Ac., Sp. and Sig. Proc. (ICASSP), 2023.