AudioLabs - Sample Rate Offset Compensated Acoustic Echo Cancellation for Multi-Device Scenarios

Sample Rate Offset Compensated Acoustic Echo Cancellation for Multi-Device Scenarios

S. Korse, O. Thiergart and E. A. P. Habets

Published in the Proc. of the International Workshop on Acoustic Signal Enhancement (IWAENC), 2024.

Best Student Paper Award - Finalist

Abstract

Acoustic echo cancellation (AEC) in a multi-device scenario is a challenging problem in the presence of sample rate offset (SRO) between the devices. SRO prevents the AEC filter from converging, which reduces its overall performance. To mitigate the convergence issue, we formulate the multi-device AEC scenario as a multi-channel AEC problem that comprises a multi-channel Kalman filter, SRO estimation, and resampling of the far-end signals. Experiments using a two-device scenario show that, for both correlated and uncorrelated playback signals, our proposed system can successfully mitigate the divergence of the multi-channel Kalman filter in the presence of SRO during both echo-only and double-talk periods. In addition, we show that for devices with correlated playback signals, an independent single-channel AEC filter is essential to ensure fast convergence of the SRO estimation.

Example Audio

Below, we illustrate the performance with some audio examples. In our examples, we assume a two-device scenario in which the primary device, on which the acoustic echo cancellation (AEC) [1] is running, is connected to an auxiliary device via WiFi or Bluetooth. In addition, the primary device is assumed to have access to all far-end signals. We also assume an unknown sample rate offset (SRO) between loudspeaker and microphone signals that do not belong to the same device. Constant SROs were simulated using the STFT method proposed in [2] with a segment length of 8192 samples. For SRO estimation, we use the dynamic weighted average coherence drift (DWACD) algorithm [3].
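To give a feeling for the SRO values used in the examples, the following minimal Python sketch converts an SRO in ppm into an accumulated sample drift and applies a constant SRO to a signal. It is an illustration only: the sampling rate of 16 kHz, the function names, and the sign convention (a positive SRO means the second device samples faster) are assumptions, and the plain linear interpolation used here is a simple stand-in, not the STFT method of [2].

```python
import numpy as np


def drift_in_samples(sro_ppm: float, fs_hz: float, duration_s: float) -> float:
    """Accumulated drift (in samples) between two clocks after duration_s seconds."""
    return sro_ppm * 1e-6 * fs_hz * duration_s


def apply_constant_sro(x: np.ndarray, sro_ppm: float) -> np.ndarray:
    """Resample x as if it had been sampled at fs * (1 + sro_ppm * 1e-6).

    Uses linear interpolation, which is a crude stand-in for a proper
    band-limited resampler and is chosen here only for brevity.
    """
    ratio = 1.0 + sro_ppm * 1e-6
    # Number of output samples that still fall inside the input time span.
    n_out = int(np.floor((len(x) - 1) * ratio)) + 1
    # Output sample n is taken at position n / ratio on the input sample grid.
    positions = np.arange(n_out) / ratio
    return np.interp(positions, np.arange(len(x)), x)


if __name__ == "__main__":
    fs = 16000.0  # assumed sampling rate, not stated on this page
    for ppm in (-125.0, 10.0, 150.0):
        print(ppm, "ppm ->", drift_in_samples(ppm, fs, 10.0), "samples after 10 s")
```

At the assumed 16 kHz, an SRO of -125 ppm corresponds to a drift of two samples per second, which is enough to misalign a long adaptive filter within a few seconds if it is left uncompensated.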

Example 1: Uncorrelated playback signals with an SRO of -125 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

Room Size: [7, 7, 5]m
RT60: 0.33s
Microphone Position of the Primary Device: [6.31, 1.3, 1.16]m
Loudspeaker Position of the Primary Device: [6.17, 1.59, 1.25]m
Loudspeaker Position of the Auxiliary Device: [0.89, 4.63, 2.03]m
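As a rough sanity check on the geometry of Example 1, the positions listed above can be turned into direct-path distances and delays. The sketch below is a hypothetical helper, not part of the simulation used for the audio files; the speed of sound of 343 m/s and all names are assumptions.

```python
import math

# Geometry of Example 1, copied from the parameter list above (metres).
ROOM = (7.0, 7.0, 5.0)
MIC_PRIMARY = (6.31, 1.3, 1.16)
LS_PRIMARY = (6.17, 1.59, 1.25)
LS_AUXILIARY = (0.89, 4.63, 2.03)

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def inside_room(position, room):
    """True if every coordinate lies within the room dimensions."""
    return all(0.0 <= c <= r for c, r in zip(position, room))


def direct_path_delay_ms(source, mic, c=SPEED_OF_SOUND):
    """Propagation delay of the direct path in milliseconds."""
    return 1000.0 * math.dist(source, mic) / c


if __name__ == "__main__":
    for name, pos in (("primary", LS_PRIMARY), ("auxiliary", LS_AUXILIARY)):
        assert inside_room(pos, ROOM)
        print(name, round(math.dist(pos, MIC_PRIMARY), 2), "m,",
              round(direct_path_delay_ms(pos, MIC_PRIMARY), 2), "ms")
```

With these positions, the primary loudspeaker is roughly 0.33 m from the microphone and the auxiliary loudspeaker roughly 6.42 m, so the auxiliary echo path has a much longer direct-path delay.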

Example 2: Uncorrelated playback signals with an SRO of 10 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

Room Size: [6, 6, 3]m
RT60: 0.42s
Microphone Position of the Primary Device: [2.18, 5.03, 1.03]m
Loudspeaker Position of the Primary Device: [2.08, 5.2, 1.31]m
Loudspeaker Position of the Auxiliary Device: [1.01, 2.89, 2.3]m

Example 3: Correlated playback signals with an SRO of 50 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

Room Size: [6, 5, 4]m
RT60: 0.42s
Microphone Position of the Primary Device: [4.27, 2.44, 1.79]m
Loudspeaker Position of the Primary Device: [4.08, 2.72, 2.09]m
Loudspeaker Position of the Auxiliary Device: [5.03, 3.15, 0.95]m

Example 4: Correlated playback signals with an SRO of -100 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

Room Size: [7, 5, 4]m
RT60: 0.22s
Microphone Position of the Primary Device: [6.4, 3.89, 1.61]m
Loudspeaker Position of the Primary Device: [6.25, 4.09, 1.89]m
Loudspeaker Position of the Auxiliary Device: [1.25, 3.16, 2.37]m

Example 5: Uncorrelated playback signals with an SRO of 25 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

Room Size: [7, 5, 4]m
RT60: 0.45s
Microphone Position of the Primary Device: [3.16, 1.09, 2.46]m
Loudspeaker Position of the Primary Device: [3.16, 0.84, 2.42]m
Loudspeaker Position of the Auxiliary Device: [3.14, 1.87, 0.86]m
Near-End Speaker Position: [3.59, 1.25, 1.98]m

Example 6: Uncorrelated playback signals with an SRO of -125 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

Room Size: [7, 7, 5]m
RT60: 0.33s
Microphone Position of the Primary Device: [2.95, 4.96, 0.69]m
Loudspeaker Position of the Primary Device: [2.94, 4.94, 0.5]m
Loudspeaker Position of the Auxiliary Device: [1.61, 3.4, 0.74]m
Near-End Speaker Position: [2.9, 5.16, 0.81]m

Example 7: Correlated playback signals with an SRO of 150 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

Room Size: [5, 5, 5]m
RT60: 0.29s
Microphone Position of the Primary Device: [1.8, 2.22, 0.67]m
Loudspeaker Position of the Primary Device: [1.99, 2.44, 0.74]m
Loudspeaker Position of the Auxiliary Device: [1.28, 3.23, 1.66]m
Near-End Speaker Position: [1.93, 2.37, 0.82]m

Example 8: Correlated playback signals with an SRO of -150 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

Room Size: [6, 6, 4]m
RT60: 0.47s
Microphone Position of the Primary Device: [2.18, 4.68, 2.47]m
Loudspeaker Position of the Primary Device: [2.44, 4.8, 2.4]m
Loudspeaker Position of the Auxiliary Device: [1.81, 2.4, 2.37]m
Near-End Speaker Position: [3.11, 4.53, 2.5]m

References

[1] E. Hänsler and G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach, Wiley-Interscience, USA, 2004.

[2] J. Schmalenstroeer and R. Haeb-Umbach, “Efficient sampling rate offset compensation - an overlap-save based approach,” in 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 499–503.

[3] T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling Rate Offsets and Speaker Changes,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, May 2022, pp. 916–920.

International Audio Laboratories Erlangen
