AudioLabs - Neural Directional Filtering - Far-Field Directivity Control with a Small Microphone Array

Neural Directional Filtering - Far-Field Directivity Control with a Small Microphone Array

J. Wechsler, S. R. Chetupalli, M. M. Halimeh, O. Thiergart and E. A. P. Habets

Published in Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC), 2024.

Best Student Paper Award - 2nd Place

Click here for the paper.

Contents of this Page

Abstract of the IWAENC 2024 paper
Note to iOS users
Audio Examples
- Example 1: Sources at 2.5 and 87.5 degrees
- Example 2: Sources at 177.5 and 187.5 degrees
- Example 3: Sources at 92.5 and 182.5 degrees
References

Abstract

Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.

Note to iOS users

In order to listen to the audio examples on this website, devices using iOS must not be in silent mode (as of 06 August 2024). We apologise for the inconvenience.

Audio Examples

Below, we illustrate the performance with some audio examples. We show both the performance of our baseline methods (Least-Squares Beamforming [1], Parametric Filtering [2]) and that of our proposed method.

The examples constitute mixtures of 2 speakers from LibriSpeech [3] that we spatialized. The models trained on a single speaker as well as a maximum of {3,5} speakers are evaluated. For all audio examples, we first give the cardioid target and the corresponding estimates, then the 3rd-order DMA target and the corresponding estimates.

The employed architecture is FT-JNF [4].

Sources at 2.5 and 87.5 degrees

The attenuation values are:

Cardioid: -0.00 dB and -5.65 dB
3rd-Order DMA: -0.02 dB and -41.67 dB

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Mixture
- Solo
Cardioid Target
- Solo
LS Beamformer [SDR 16.6 dB], Cardioid Target
- Solo
Parametric Filtering [SDR 20.7 dB], Cardioid Target
- Solo
FT-JNF Trained on 1 Source [SDR 11.3 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR 30.0 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR 29.9 dB], Cardioid Target
- Solo
3rd-Order DMA Target
- Solo
LS Beamformer [SDR 2.8 dB], 3rd-Order DMA Target
- Solo
Parametric Filtering [SDR 13.7 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on 1 Source [SDR 4.2 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR 23.7 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR 23.6 dB], 3rd-Order DMA Target
- Solo

Sources at 177.5 and 187.5 degrees

The attenuation values are:

Cardioid: -66.45 dB and -47.38 dB
3rd-Order DMA: -76.02 dB and -57.14 dB

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Mixture
- Solo
Cardioid Target
- Solo
LS Beamformer [SDR -36.9 dB], Cardioid Target
- Solo
Parametric Filtering [SDR 12.7 dB], Cardioid Target
- Solo
FT-JNF Trained on 1 Source [SDR 4.6 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR -2.5 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR -2.3 dB], Cardioid Target
- Solo
3rd-Order DMA Target
- Solo
LS Beamformer [SDR -48.4 dB], 3rd-Order DMA Target
- Solo
Parametric Filtering [SDR 5.9 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on 1 Source [SDR -2.1 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR -7.5 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR -12.7 dB], 3rd-Order DMA Target
- Solo

Sources at 92.5 and 182.5 degrees

The attenuation values are:

Cardioid: -6.41 dB and -66.45 dB
3rd-Order DMA: -43.95 dB and -76.02 dB

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Mixture
- Solo
Cardioid Target
- Solo
LS Beamformer [SDR 5.6 dB], Cardioid Target
- Solo
Parametric Filtering [SDR 13.9 dB], Cardioid Target
- Solo
FT-JNF Trained on 1 Source [SDR 6.7 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR 23.8 dB], Cardioid Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR 23.8 dB], Cardioid Target
- Solo
3rd-Order DMA Target
- Solo
LS Beamformer [SDR -39.2 dB], 3rd-Order DMA Target
- Solo
Parametric Filtering [SDR 5.3 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on 1 Source [SDR -1.0 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 3 Sources [SDR 1.4 dB], 3rd-Order DMA Target
- Solo
FT-JNF Trained on max. 5 Sources [SDR -3.2 dB], 3rd-Order DMA Target
- Solo

References

[1] E. Rasumow et al., "Regularization Approaches for Synthesizing HRTF Directivity Patterns," IEEE/ACM Trans. Aud., Sp., Lang. Proc., vol. 24, no. 2, pp. 215-225, 2016.

[2] K. Kowalczyk, O. Thiergart, M. Taseska, G. Del Galdo, V. Pulkki and E. A. P. Habets, "Parametric Spatial Sound Processing: A flexible and efficient solution to sound scene acquisition, modification, and reproduction," in IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 31-42, 2015.

[3] V. Panayotov, G. Chen, D. Povey and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in Proc. IEEE Intl. Conf. on Ac., Sp. and Sig. Proc. (ICASSP), 2015, pp. 5206-5210.

[4] K. Tesch and T. Gerkmann, “Spatially selective deep non-linear filters for speaker extraction,” in Proc. IEEE Intl. Conf. on Ac., Sp. and Sig. Proc. (ICASSP), 2023.

International Audio Laboratories Erlangen

Neural Directional Filtering - Far-Field Directivity Control with a Small Microphone Array

Contents of this Page

Abstract

Note to iOS users

Audio Examples

Sources at 2.5 and 87.5 degrees

Sources at 177.5 and 187.5 degrees

Sources at 92.5 and 182.5 degrees

References