Declipping Speech Using Deep Filtering

Wolfgang Mack and Emanuël A. P. Habets

Published in the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019), New Paltz, New York.


Recorded signals can be clipped if the sound pressure or the analog signal amplification is too high. Clipping is a non-linear distortion which limits the maximum magnitude of the signal and changes its energy distribution in the frequency domain, and hence degrades the quality of the recording. Consequently, for declipping, some frequencies have to be amplified and others attenuated. We propose a declipping method based on the recently proposed deep filtering technique, which is capable of extracting and reconstructing a desired signal from a degraded input. Deep filtering operates in the short-time Fourier transform (STFT) domain, estimating a complex multidimensional filter for each desired STFT bin. Each filter is applied to a defined area of the clipped STFT to obtain a single complex bin estimate of the declipped STFT. The filters are estimated by a deep neural network trained on simulated data degraded by soft or hard clipping, with a loss function that minimizes the reconstruction mean-squared error between the non-clipped and the declipped STFTs. We evaluated our approach on simulated data degraded by hard and soft clipping, and conducted a pairwise comparison listening test on measured signals, comparing our approach to one commercial and one open-source declipping method. Our approach outperformed both baselines at declipping measured speech signals with strong and medium clipping.
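To make the two ideas in the abstract concrete, the following minimal sketch illustrates (a) the hard- and soft-clipping distortions used to simulate training data, and (b) how a single complex deep-filtering output bin is formed as a weighted sum over a neighborhood of clipped STFT bins. The tanh soft-clipping model, the 3×3 filter size, and the function names are illustrative assumptions, not the paper's exact choices; in the paper the filter coefficients are predicted by a deep neural network, whereas here they are simply passed in.

```python
import numpy as np

def hard_clip(x, threshold):
    # Hard clipping: samples exceeding the threshold are truncated,
    # limiting the maximum magnitude of the waveform.
    return np.clip(x, -threshold, threshold)

def soft_clip(x, threshold):
    # One common soft-clipping model (an assumption here): a scaled tanh
    # that saturates smoothly toward +/- threshold instead of truncating.
    return threshold * np.tanh(x / threshold)

def deep_filter_bin(stft, t, f, filt):
    # Apply a complex 2-D filter (e.g. 3x3: +/-1 time frame, +/-1 frequency
    # bin) to the area of the clipped STFT around bin (t, f). The weighted
    # sum is the estimate of the corresponding declipped STFT bin.
    # stft: complex array of shape (frames, freq_bins)
    # filt: complex array of odd shape, centered on (t, f)
    ft, ff = filt.shape
    dt, df = ft // 2, ff // 2
    patch = stft[t - dt:t + dt + 1, f - df:f + df + 1]
    return np.sum(filt * patch)
```

Because the filter is complex-valued, each output bin can simultaneously amplify or attenuate magnitude and adjust phase, which matches the abstract's observation that declipping requires boosting some frequencies while attenuating others.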


[Figure: pairwise comparison listening test results for medium and strong measured clipping]