SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement
Martin Strauss, Nicola Pia, Nagashree K. S. Rao and Bernd Edler
Presented at IEEE WASPAA, 2023.
Listening test items
The test items were created from the WSJ-CHiME3 dataset described in [5].
The proposed methods are SE-Flow+condNet and SEFGAN.
The comparison methods are MetricGAN+ [1], Conv-TasNet [2], and SGMSE+ [3].
Each test item can be auditioned in the following conditions: Noisy, Reference, LP35, MetricGAN+, Conv-TasNet, SGMSE+, SE-Flow+condNet, and SEFGAN.

Test items:
- m_1.73
- f_0.66
- f_6.47
- m_0.6
- f_6.04
- f_1.69
- m_2.64
- m_2.67
References
[1] S.-W. Fu, C. Yu, T.-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, and Y. Tsao, "MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement," in Proc. Interspeech, 2021, pp. 201–205.
[2] Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, pp. 1256–1266, 2019.
[3] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 31, pp. 2351–2364, 2023.