SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

Martin Strauss, Nicola Pia, Nagashree K. S. Rao and Bernd Edler

presented at WASPAA, 2023

Listening test items

The test items were created from the WSJ-CHiME3 dataset described in [5]. The proposed methods are SE-Flow+condNet and SEFGAN. They are compared against MetricGAN+ [1], Conv-TasNet [2] and SGMSE+ [3].

m_1.73

f_0.66

f_6.47

m_0.6

f_6.04

f_1.69

m_2.64

m_2.67

References

[1] S.-W. Fu, C. Yu, T.-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, Y. Tsao, "MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement", in Proceedings Interspeech Conference, 2021, pp. 201–205.

[2] Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, pp. 1256–1266, 2019.

[3] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, T. Gerkmann, "Speech enhancement and dereverberation with diffusion-based generative models," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 31, pp. 2351–2364, 2023.