Sample Rate Offset Compensated Acoustic Echo Cancellation for Multi-Device Scenarios

S. Korse, O. Thiergart and E. A. P. Habets

To be submitted to the International Workshop on Acoustic Signal Enhancement (IWAENC 2024).


Contents of this Page

  1. Abstract of the paper to be submitted to IWAENC 2024
  2. Example Audio
  3. References

Abstract

Acoustic echo cancellation (AEC) in a multi-device scenario is a challenging problem in the presence of a sample rate offset (SRO) between the devices. The SRO prevents the AEC filter from converging, which degrades the overall AEC performance. To mitigate this convergence issue, we formulate the multi-device AEC scenario as a multi-channel AEC problem comprising a multi-channel Kalman filter, SRO estimation, and resampling of the far-end signals. Experiments in a two-device scenario show that, for both correlated and uncorrelated playback signals, the proposed system successfully mitigates the divergence of the multi-channel Kalman filter in the presence of SRO during both echo-only and double-talk periods. In addition, we show that for devices with correlated playback signals, an independent single-channel AEC filter is essential to ensure fast convergence of the SRO estimation.
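The multi-channel formulation can be illustrated with a minimal Python/NumPy sketch: the far-end signals of both devices are stacked into a single regressor, and the stacked echo path is tracked by a time-domain Kalman filter with a random-walk state model. Everything here is an assumption made for illustration (time-domain processing, the random-walk model, the fixed variances `q` and `r`, the hypothetical function name `mc_kalman_aec`, the toy echo paths). The abstract does not specify these details, and the sketch omits the SRO estimation and resampling stages, so it only covers the SRO-free case.

```python
import numpy as np

def mc_kalman_aec(x, y, taps=16, q=1e-6, r=1e-3):
    """Multi-channel time-domain Kalman filter AEC (illustrative sketch).

    x: (C, N) far-end (loudspeaker) signals, y: (N,) microphone signal.
    The echo paths of all C channels are stacked into one state vector of
    length C * taps that follows a random walk with per-tap variance q;
    r is the assumed observation-noise (near-end) variance.
    Returns the a-priori error signal and the final stacked estimate.
    """
    C, N = x.shape
    L = C * taps
    w = np.zeros(L)              # stacked echo-path estimate
    P = np.eye(L)                # state-error covariance
    Q = q * np.eye(L)            # process-noise covariance (assumed)
    e = np.zeros(N)
    # Zero-pad so every regressor holds `taps` samples per channel.
    xp = np.concatenate([np.zeros((C, taps - 1)), x], axis=1)
    for n in range(N):
        # Newest sample first per channel, channels stacked one after another.
        u = xp[:, n:n + taps][:, ::-1].reshape(L)
        P = P + Q                        # prediction step (random walk)
        e[n] = y[n] - u @ w              # a-priori error = echo-cancelled output
        Pu = P @ u
        k = Pu / (u @ Pu + r)            # Kalman gain
        w = w + k * e[n]                 # state update
        P = P - np.outer(k, Pu)          # covariance update, (I - k u^T) P
    return e, w

# Toy two-device example without SRO: independent white playback signals
# and short, exponentially decaying random echo paths (all assumed).
rng = np.random.default_rng(0)
N, taps = 4000, 16
x = rng.standard_normal((2, N))
h = rng.standard_normal((2, taps)) * np.exp(-np.arange(taps) / 4)
echo = sum(np.convolve(x[c], h[c])[:N] for c in range(2))
y = echo + 1e-3 * rng.standard_normal(N)
e, w = mc_kalman_aec(x, y, taps)
reduction_db = 10 * np.log10(np.mean(y[-1000:] ** 2) / np.mean(e[-1000:] ** 2))
print(f"echo reduction over the last 1000 samples: {reduction_db:.1f} dB")
```

In this SRO-free toy setting the error power drops far below the microphone power once the filter has converged. If one loudspeaker signal were instead played out with a drifting clock, the echo path seen by this filter would keep shifting, which is the convergence problem that the SRO estimation and resampling stages are meant to remove.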

Example Audio

Below, we illustrate the performance with some audio examples. In our examples, we assume a two-device scenario in which the primary device, on which the acoustic echo cancellation (AEC) [1] runs, is connected to an auxiliary device via WiFi or Bluetooth. We further assume that the primary device has access to all far-end signals, and that an unknown sample rate offset (SRO) exists between any loudspeaker and microphone signals that do not belong to the same device. Constant SROs were simulated using the STFT-based method proposed in [2] with a segment length of 8192 samples. For SRO estimation, we use the dynamic weighted average coherence drift (DWACD) algorithm [3].
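An SRO of ε ppm means that one device's sampling clock runs at fs(1 + ε·10⁻⁶) relative to the other's, so the misalignment grows linearly with time: after n samples it amounts to roughly ε·10⁻⁶·n samples. Assuming a sampling rate of 16 kHz (the rate is not stated on this page), the -125 ppm of Examples 1 and 6 corresponds to a drift of -2 samples per second, or -20 samples after 10 s. The sketch below computes this drift and simulates a constant SRO by windowed-sinc interpolation. It is a simple stand-in, not the STFT-based method of [2] used for the examples; the hypothetical function names, the sign convention (positive ε means the resampled device samples faster), and the kernel width are assumptions.

```python
import math

def accumulated_drift(sro_ppm, fs, seconds):
    """Samples gained (>0) or lost (<0) by a clock with a constant SRO."""
    return sro_ppm * 1e-6 * fs * seconds

def simulate_sro(x, sro_ppm, half_width=32):
    """Resample x as if captured by a clock running at fs * (1 + eps).

    Output sample n is read from the fractional input position
    t = n / (1 + eps) using a Hann-windowed sinc kernel with
    2 * half_width taps (a simple stand-in, not the method of [2]).
    """
    eps = sro_ppm * 1e-6
    out = []
    n = 0
    while True:
        t = n / (1.0 + eps)              # fractional read position in x
        if t > len(x) - 1:
            break
        i0 = math.floor(t)
        acc = 0.0
        for i in range(i0 - half_width + 1, i0 + half_width + 1):
            if 0 <= i < len(x):          # samples outside x are treated as zero
                d = t - i
                sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
                hann = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += x[i] * sinc * hann
        out.append(acc)
        n += 1
    return out

# Drift for -125 ppm after 10 s, assuming fs = 16 kHz (about -20 samples).
print(accumulated_drift(-125, 16000, 10))

# Exaggerated SRO of 1000 ppm on a slow sinusoid so the effect is visible.
x = [math.sin(2 * math.pi * 0.01 * i) for i in range(2000)]
y = simulate_sro(x, 1000)
```

A per-sample time-domain resampler like this is easy to check but costly for long signals, which is part of why efficient block-based approaches such as the overlap-save method of [2] are attractive.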

Example 1: Uncorrelated playback signals with an SRO of -125 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

  • Room Size: [7, 7, 5]m
  • RT60: 0.33s
  • Microphone Position of the Primary Device: [6.31, 1.3, 1.16]m
  • Loudspeaker Position of the Primary Device: [6.17, 1.59, 1.25]m
  • Loudspeaker Position of the Auxiliary Device: [0.89, 4.63, 2.03]m


[Figure: power spectral density]
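The room geometry listed for Example 1 determines, among other things, how long the echo paths are. The following sketch computes the direct-path distance and propagation delay from each loudspeaker to the primary microphone of Example 1, plus the RT60 expressed in samples as a rough indication of the echo-tail length an AEC filter would have to cover. The speed of sound (343 m/s), the sampling rate (16 kHz), and the hypothetical helper name `direct_path` are assumptions; the page does not state them.

```python
import math

C_SOUND = 343.0   # speed of sound in m/s (assumed)
FS = 16000        # sampling rate in Hz (assumed; not stated on this page)

def direct_path(src, mic):
    """Return (distance in m, propagation delay in samples)."""
    dist = math.dist(src, mic)
    return dist, dist / C_SOUND * FS

# Positions and RT60 taken from the Example 1 parameter list.
mic_primary = (6.31, 1.3, 1.16)
ls_primary = (6.17, 1.59, 1.25)
ls_auxiliary = (0.89, 4.63, 2.03)
rt60 = 0.33

for name, ls in [("primary", ls_primary), ("auxiliary", ls_auxiliary)]:
    dist, delay = direct_path(ls, mic_primary)
    print(f"{name} loudspeaker: {dist:.2f} m, {delay:.1f} samples")
print(f"RT60 in samples: {rt60 * FS:.0f}")
```

Under these assumptions, the auxiliary loudspeaker is about 6.4 m from the primary microphone (roughly 300 samples of direct-path delay), compared with about 0.33 m for the primary loudspeaker, and an RT60 of 0.33 s spans 5280 samples.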


Example 2: Uncorrelated playback signals with an SRO of 10 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

  • Room Size: [6, 6, 3]m
  • RT60: 0.42s
  • Microphone Position of the Primary Device: [2.18, 5.03, 1.03]m
  • Loudspeaker Position of the Primary Device: [2.08, 5.2, 1.31]m
  • Loudspeaker Position of the Auxiliary Device: [1.01, 2.89, 2.3]m


[Figure: power spectral density]


Example 3: Correlated playback signals with an SRO of 50 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

  • Room Size: [6, 5, 4]m
  • RT60: 0.42s
  • Microphone Position of the Primary Device: [4.27, 2.44, 1.79]m
  • Loudspeaker Position of the Primary Device: [4.08, 2.72, 2.09]m
  • Loudspeaker Position of the Auxiliary Device: [5.03, 3.15, 0.95]m


[Figure: power spectral density]
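Examples 3 and 4 use correlated playback signals. A standard difficulty of multi-channel AEC with correlated inputs, known from stereophonic echo cancellation, is that the covariance matrix of the stacked far-end regressor becomes ill-conditioned, so many different filter solutions explain the echo almost equally well. The NumPy sketch below compares the condition number for independent signals with that for two signals obtained by filtering one common source with short filters plus a little independent noise. How the correlated signals are generated, the filter length, and the hypothetical helper name are assumptions; the page does not state how its correlated playback signals were produced, and this is not an analysis taken from the paper.

```python
import numpy as np

def stacked_condition_number(x, taps):
    """Condition number of the sample covariance of the stacked regressor.

    x has shape (channels, samples); each regressor row holds the most
    recent `taps` samples of every channel, stacked channel by channel.
    """
    _, n_samples = x.shape
    rows = [x[:, n - taps + 1:n + 1][:, ::-1].reshape(-1)
            for n in range(taps - 1, n_samples)]
    u = np.array(rows)
    r = u.T @ u / len(rows)
    return np.linalg.cond(r)

rng = np.random.default_rng(1)
n_samples, taps = 20000, 8

# Independent playback signals.
uncorrelated = rng.standard_normal((2, n_samples))

# Correlated playback (assumed model): one source filtered two ways,
# plus weak independent noise in each channel.
source = rng.standard_normal(n_samples)
correlated = np.stack([
    np.convolve(source, [1.0, 0.5])[:n_samples],
    np.convolve(source, [0.3, 1.0])[:n_samples],
]) + 1e-3 * rng.standard_normal((2, n_samples))

cond_uncorrelated = stacked_condition_number(uncorrelated, taps)
cond_correlated = stacked_condition_number(correlated, taps)
print(f"uncorrelated: {cond_uncorrelated:.1f}, correlated: {cond_correlated:.1e}")
```

With these settings, the correlated case yields a condition number several orders of magnitude larger than the uncorrelated case.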


Example 4: Correlated playback signals with an SRO of -100 ppm between the devices in an echo-only scenario.
The following parameters were used to simulate the files:

  • Room Size: [7, 5, 4]m
  • RT60: 0.22s
  • Microphone Position of the Primary Device: [6.4, 3.89, 1.61]m
  • Loudspeaker Position of the Primary Device: [6.25, 4.09, 1.89]m
  • Loudspeaker Position of the Auxiliary Device: [1.25, 3.16, 2.37]m


[Figure: power spectral density]


Example 5: Uncorrelated playback signals with an SRO of 25 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

  • Room Size: [7, 5, 4]m
  • RT60: 0.45s
  • Microphone Position of the Primary Device: [3.16, 1.09, 2.46]m
  • Loudspeaker Position of the Primary Device: [3.16, 0.84, 2.42]m
  • Loudspeaker Position of the Auxiliary Device: [3.14, 1.87, 0.86]m
  • Near-End Speaker Position: [3.59, 1.25, 1.98]m


[Figure: power spectral density]


Example 6: Uncorrelated playback signals with an SRO of -125 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

  • Room Size: [7, 7, 5]m
  • RT60: 0.33s
  • Microphone Position of the Primary Device: [2.95, 4.96, 0.69]m
  • Loudspeaker Position of the Primary Device: [2.94, 4.94, 0.5]m
  • Loudspeaker Position of the Auxiliary Device: [1.61, 3.4, 0.74]m
  • Near-End Speaker Position: [2.9, 5.16, 0.81]m


[Figure: power spectral density]


Example 7: Correlated playback signals with an SRO of 150 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

  • Room Size: [5, 5, 5]m
  • RT60: 0.29s
  • Microphone Position of the Primary Device: [1.8, 2.22, 0.67]m
  • Loudspeaker Position of the Primary Device: [1.99, 2.44, 0.74]m
  • Loudspeaker Position of the Auxiliary Device: [1.28, 3.23, 1.66]m
  • Near-End Speaker Position: [1.93, 2.37, 0.82]m


[Figure: power spectral density]


Example 8: Correlated playback signals with an SRO of -150 ppm between the devices in a double-talk scenario.
The following parameters were used to simulate the files:

  • Room Size: [6, 6, 4]m
  • RT60: 0.47s
  • Microphone Position of the Primary Device: [2.18, 4.68, 2.47]m
  • Loudspeaker Position of the Primary Device: [2.44, 4.8, 2.4]m
  • Loudspeaker Position of the Auxiliary Device: [1.81, 2.4, 2.37]m
  • Near-End Speaker Position: [3.11, 4.53, 2.5]m


[Figure: power spectral density]

References

[1] E. Hänsler and G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach, Wiley-Interscience, USA, 2004.

[2] J. Schmalenstroeer and R. Haeb-Umbach, “Efficient sampling rate offset compensation - an overlap-save based approach,” in 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 499–503.

[3] T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling Rate Offsets and Speaker Changes,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, May 2022, pp. 916–920.