Carlotta Anemüller, Oliver Thiergart, and Emanuël A. P. Habets
Published in proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
The spatial perception of a sound image is significantly influenced by the degree of correlation between the two audio signals received by the ears. Audio signal decorrelation is, therefore, a commonly used tool in various spatial audio processing applications. In this paper, we propose a novel approach to audio decorrelation using generative adversarial networks. As the generator, we employ a convolutional neural network architecture that has recently been proposed for audio decorrelation. In contrast to previous work, the loss function is defined solely with respect to the input audio signal; a reference output signal is not required. This makes it possible to tailor the training procedure specifically to the desired output signal properties and, potentially, to outperform conventional decorrelation techniques. The proposed approach is compared to a state-of-the-art conventional decorrelation method by means of objective evaluations and a listening test, considering a variety of signal types.
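The abstract starts from the observation that spatial perception depends on how strongly the two ear signals are correlated. The sketch below is an illustration only and is not the objective evaluation used in the paper. It computes an IACC-style measure: the maximum absolute normalized cross-correlation between two signals within a ±1 ms lag window. The function name `max_normalized_xcorr`, the 48 kHz sample rate, and the white-noise test signals are assumptions chosen for the example. The GAN generator and its loss are not modeled here.

```python
import math
import random


def max_normalized_xcorr(x, y, max_lag):
    """Return the maximum absolute normalized cross-correlation of x and y
    over lags in [-max_lag, max_lag] (IACC-style: 1 means fully correlated,
    values near 0 mean decorrelated). Illustrative helper, not from the paper."""
    if len(x) != len(y):
        raise ValueError("signals must have equal length")
    # Normalize by the geometric mean of the two signal energies.
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0.0
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Sum x[i] * y[i + lag] over the samples where both indices are valid.
        acc = sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, abs(acc) / norm)
    return best


# Usage with synthetic white noise (hypothetical test signals).
fs = 48000                 # assumed sample rate in Hz
max_lag = fs // 1000       # +/- 1 ms lag window (48 samples)
rng = random.Random(0)
n = 4800                   # 100 ms of signal at 48 kHz
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same signal at both ears.
identical = max_normalized_xcorr(x, x, max_lag)
# Second channel is x delayed by 10 samples (about 0.2 ms).
delayed = max_normalized_xcorr(x, [0.0] * 10 + x[:-10], max_lag)
# Second channel is independent noise.
independent = max_normalized_xcorr(
    x, [rng.gauss(0.0, 1.0) for _ in range(n)], max_lag)

print(f"identical:   {identical:.3f}")
print(f"delayed:     {delayed:.3f}")
print(f"independent: {independent:.3f}")
```

The identical and delayed pairs both score close to 1 because the lag search finds the short delay. Independent noise scores close to 0. A pure time shift therefore does not count as decorrelation under this measure. A decorrelator has to change the waveform of one channel relative to the other while preserving its perceived timbre. That is the kind of output property the abstract says the input-only loss can target.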
The following examples correspond to the items included in the listening test. The original audio files are taken partly from the EBU SQAM CD [1] and partly from the FSD50K dataset [2].
Music 1¹
Music 2²
Castanets³
Violin⁴
Applause⁵
Speech⁶
Waves⁷
[1] “Sound quality assessment material recordings for subjective tests - users’ handbook for the EBU-SQAM compact disk,” Tech. Rep. 3253-E, Apr. 1988.
[2] E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra, “FSD50K: An open dataset of human-labeled sound events,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 30, pp. 829–852, 2022.
[3] S. Disch, “Decorrelation for immersive audio applications and sound effects,” in Proc. DAFx-23, Copenhagen, Denmark, Sept. 2023.
¹ Track 70 of the EBU SQAM CD.
² Track 69 of the EBU SQAM CD.
³ Track 27 of the EBU SQAM CD.
⁴ Track 59 of the EBU SQAM CD.
⁵ Item 395414 of the FSD50K evaluation subset, uploader: debsound, license: http://creativecommons.org/licenses/by-nc/3.0/.
⁶ Track 49 of the EBU SQAM CD.
⁷ Item 161700 of the FSD50K evaluation subset, uploader: xserra, license: http://creativecommons.org/licenses/by/3.0/.