Binaural Rendering of Heterogeneous Sound Sources with Extent

Carlotta Anemüller, Oliver Thiergart, and Emanuël A. P. Habets

Published in proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

Abstract

In spatial audio rendering applications, it is often desirable to render sound sources with a certain spatial extent realistically. While existing methods mainly consider rendering of homogeneously extended sound sources (i.e., with constant radiation characteristics over the extent), rendering of heterogeneously extended sound sources (i.e., with position-dependent radiation characteristics) has barely been discussed in the literature. In this paper, we propose an approach for binaural rendering of heterogeneously extended sound sources. The input to the algorithm is a two-channel signal, which provides information about the position-dependent radiation characteristics of the sound source. Based on a model for an extended sound source with position-dependent energy and spectral content, the target covariance matrix of the binaural output signal is determined. Using a previously proposed optimal mixing approach, a binaural output signal with the desired properties is obtained, ensuring that the spatial characteristics encoded in the two-channel input signal are preserved. The proposed approach is evaluated both objectively and subjectively by comparing it to two homogeneous extent-rendering baselines as well as to simple point source reproduction.
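The covariance-matching step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a full-rank 2x2 input covariance in a single time-frequency tile, an identity prototype matrix Q, and no regularization or decorrelated residual signal. The function name covariance_matching_mixer and all variable names are hypothetical. Given an input covariance C_x and a target covariance C_y, it computes a mixing matrix M with M C_x M^H = C_y that stays as close as possible (in the least-squares sense) to the prototype mix Q.

```python
import numpy as np

def covariance_matching_mixer(C_x, C_y, Q=None):
    """Return M such that M @ C_x @ M^H == C_y (hypothetical helper).

    Assumes C_x and C_y are Hermitian positive definite. Q is the
    prototype mixing matrix; identity is an assumption made here.
    """
    n_out, n_in = C_y.shape[0], C_x.shape[0]
    if Q is None:
        Q = np.eye(n_out, n_in)
    # Factorize both covariances: C = K @ K^H
    K_x = np.linalg.cholesky(C_x)
    K_y = np.linalg.cholesky(C_y)
    # Unitary matrix P that keeps the output closest to Q @ x
    U, _, Vh = np.linalg.svd(K_y.conj().T @ Q @ K_x)
    P = U @ Vh
    return K_y @ P @ np.linalg.inv(K_x)

# Usage: estimate C_x from a random two-channel tile, pick a target C_y
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 4096)) + 1j * rng.standard_normal((2, 4096))
C_x = X @ X.conj().T / X.shape[1]
C_y = np.array([[1.0, 0.3 + 0.1j],
                [0.3 - 0.1j, 0.8]])   # illustrative target, not from the paper

M = covariance_matching_mixer(C_x, C_y)
C_out = M @ C_x @ M.conj().T          # covariance of the mixed signal M @ X
```

In this sketch the target covariance is chosen arbitrarily; in the paper it is derived from the extended-source model, which is not reproduced here.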

Audio Examples

The following audio examples correspond to the items included in the listening test.

The input to each processing method is a simulated stereo signal. For each item, the following conditions are included:

  • Reference: Simulated binaural reference signal.
  • Proposed: Output of the proposed method.
  • PS: Point source reproduction of the two input channels at the outer left and outer right edges of the extent, respectively.
  • Hom [8] 1dec: Homogeneous extent rendering based on [8]. The heterogeneous spatially extended sound source (SESS) is rendered as two homogeneous SESSs, covering the left and right parts of the desired extent, respectively. One decorrelator per extent range is used.
  • Hom [8] 2dec: Homogeneous extent rendering based on [8]. The heterogeneous SESS is rendered as two homogeneous SESSs, covering the left and right parts of the desired extent, respectively. Two decorrelators per extent range are used.
  • Hom [9]: Homogeneous extent rendering based on [9]. The heterogeneous SESS is rendered as two homogeneous SESSs, covering the left and right parts of the desired extent, respectively.
  • Anchor: Point source reproduction of both input channels at the center of the extent, followed by a 3.5 kHz low-pass filter.
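As an illustration of the filtering step in the Anchor condition, the following sketch applies a 3.5 kHz low-pass filter to a mono signal. Only the cutoff frequency comes from the description above; the filter type (a Hamming-windowed sinc FIR), the number of taps, the 48 kHz sampling rate, and the function names are assumptions, and the point source (HRTF) rendering that precedes the filter is omitted.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs_hz, num_taps=255):
    """Windowed-sinc low-pass FIR coefficients (assumed filter design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz                    # normalized cutoff (cycles/sample)
    h = 2 * fc * np.sinc(2 * fc * n)          # ideal low-pass impulse response
    h *= np.hamming(num_taps)                 # taper to reduce ripple
    return h / np.sum(h)                      # unit gain at DC

def anchor_lowpass(signal, fs_hz, cutoff_hz=3500.0):
    """Low-pass filter a mono signal; output has the same length."""
    return np.convolve(signal, lowpass_fir(cutoff_hz, fs_hz), mode="same")

# Usage: a 1 kHz tone passes, an 8 kHz tone is strongly attenuated
fs = 48000                                    # assumed sampling rate
t = np.arange(fs) / fs
low_tone = np.sin(2 * np.pi * 1000 * t)
high_tone = np.sin(2 * np.pi * 8000 * t)

def rms(x):
    core = x[1000:-1000]                      # ignore filter edge effects
    return np.sqrt(np.mean(core ** 2))

rms_low = rms(anchor_lowpass(low_tone, fs))
rms_high = rms(anchor_lowpass(high_tone, fs))
```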

 
φ̅=0°, Δφ=60°

Speech

Sparse applause

Dense applause

φ̅=0°, Δφ=120°

Speech

Sparse applause

Dense applause

φ̅=60°, Δφ=60°

Speech

Sparse applause

Dense applause

φ̅=60°, Δφ=120°

Speech

Sparse applause

Dense applause

References

[8] Carlotta Anemüller, Alexander Adami, and Jürgen Herre, “Efficient Binaural Rendering of Spatially Extended Sound Sources,” J. Audio Eng. Soc., vol. 71, no. 5, pp. 281–292, May 2023.

[9] Leo McCormack, Archontis Politis, and Ville Pulkki, “Rendering of Source Spread for Arbitrary Playback Setups Based on Spatial Covariance Matching,” in Proc. WASPAA 2021, Oct. 2021, pp. 371–375.