Martin Strauss, Matteo Torcoli and Bernd Edler
Presented at SLT 2022
Listening test items. The test data is taken from the dataset published by Valentini et al. [1]. The methods used for comparison are MetricGAN+ [2] and CDiffuSE [3].
ID | Gender | Noise | Test input SNR [dB] | Website |
---|---|---|---|---|
p232_003 | male | Bus | 7.5 | Link |
p232_219 | male | Cafe | 7.5 | Link |
p232_005 | male | Bus | 2.5 | Link |
p232_032 | male | Cafe | 2.5 | Link |
p257_186 | female | Bus | 7.5 | Link |
p257_049 | female | Cafe | 7.5 | Link |
p257_367 | female | Bus | 2.5 | Link |
p257_251 | female | Cafe | 2.5 | Link |
[1] C. Valentini-Botinhao, X. Wang, S. Takaki, and J. Yamagishi, “Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks,” in Proceedings Interspeech Conference, 2016, pp. 352–356.
[2] S.-W. Fu, C. Yu, T.-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, and Y. Tsao, “MetricGAN+: An improved version of MetricGAN for speech enhancement,” in Proceedings Interspeech Conference, 2021, pp. 201–205.
[3] Y.-J. Lu, Z.-Q. Wang, S. Watanabe, A. Richard, C. Yu, and Y. Tsao, “Conditional diffusion probabilistic model for speech enhancement,” in Proceedings ICASSP, 2022, pp. 7402–7406.