Towards Transient Restoration in Score-informed Audio Decomposition

This is the accompanying website for the paper "Towards Transient Restoration in Score-informed Audio Decomposition" by Christian Dittmar and Meinard Müller.

Abstract

Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques are based on applying a suitable decomposition to the magnitude Short-Time Fourier Transform (STFT) of the mixture signal (see, e.g., [2, 3]). The phase information required for the reconstruction of individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). There are different methods for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. We propose a simple yet effective extension of the iterative signal reconstruction procedure by Griffin and Lim to remedy this problem. We denote the classic procedure proposed in [4] as GL and our extended method proposed in [1] as TR.
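To make the GL procedure more concrete, the following is a minimal NumPy sketch of the classic iteration: alternate between an inverse STFT and a re-analysis, keeping the target magnitude and updating only the phase. The helper names (`stft`, `istft`, `griffin_lim`), the framing without padding, and the window and hop parameters are assumptions made for illustration; they are not taken from the paper or its implementation.

```python
import numpy as np

def stft(x, win, hop):
    """Windowed frames of x (no padding) -> complex array of shape (bins, frames)."""
    n = len(win)
    num_frames = 1 + (len(x) - n) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n] * win for i in range(num_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(X, win, hop, length):
    """Least-squares overlap-add inverse in the spirit of Griffin and Lim [4]."""
    n = len(win)
    frames = np.fft.irfft(X, n=n, axis=0)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i in range(X.shape[1]):
        x[i * hop:i * hop + n] += frames[:, i] * win
        norm[i * hop:i * hop + n] += win ** 2
    # Guard against division by zero where no window energy is present.
    return x / np.maximum(norm, 1e-12)

def griffin_lim(target_mag, init_phase, win, hop, length, num_iter=200):
    """Iteratively estimate a signal whose STFT magnitude approximates target_mag."""
    X = target_mag * np.exp(1j * init_phase)
    for _ in range(num_iter):
        x = istft(X, win, hop, length)          # project to a time-domain signal
        phase = np.angle(stft(x, win, hop))     # re-analyze to get a consistent phase
        X = target_mag * np.exp(1j * phase)     # keep target magnitude, update phase
    return istft(X, win, hop, length)
```

In this sketch, `length` is expected to match the framing, i.e. `len(win) + hop * (num_frames - 1)`, so that re-analysis yields the same number of frames as `target_mag`.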

The following two animations show the effect of GL iterations vs. TR iterations in the reconstruction of a single drum hit from an MSTFT. Case 2 refers to initialization of the MSTFT with oracle (known ground-truth) magnitude and zero phase. We show the evolution of the signal reconstruction over 200 iterations of the GL (left) and the TR (right) procedure. It can clearly be seen that the TR method quickly attenuates the pre-echos. The vertical blue line marks the known transient position.

[Animations: Case 2 GL (left), Case 2 TR (right)]
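This page does not spell out the TR update itself; the details are given in [1]. As a rough illustration only, one plausible way to exploit a known transient position is to zero every sample of the intermediate time-domain estimate that precedes the transient before re-analyzing it in the next iteration. The function name `suppress_pre_echo` and this exact form of the constraint are assumptions, not the authors' implementation.

```python
import numpy as np

def suppress_pre_echo(x, transient_pos):
    """Return a copy of x with all samples before transient_pos set to zero.

    Hypothetical constraint: it assumes a single drum hit whose onset sample
    index (transient_pos) is known, e.g. from an onset annotation.
    """
    y = np.array(x, dtype=float, copy=True)
    y[:max(int(transient_pos), 0)] = 0.0
    return y
```

Within an iterative reconstruction, such a step would sit between the inverse STFT and the subsequent STFT of each iteration, so that energy placed before the onset is discarded rather than reinforced.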

Further down this page, we provide five audio examples: the first is used as an illustrative example in the paper, and the remaining four are contained in the test set that we used for evaluation. All evaluation items are taken from the "WaveDrum02" subset of the "IDMT-SMT-Drums" dataset. Our test items are drum loops for which perfectly isolated single tracks of the involved instruments as well as annotations of each drum onset are available. For further details about the items in the test set, please refer to [2]. Below, you can listen to reconstructions of the drum loop mixtures as well as each single track. For each reconstruction, we present the following cases:

| Case | MSTFT initialization |
|---|---|
| Original | Oracle (known ground-truth) signals used, no reconstruction applied. |
| Case 1 | Oracle-based magnitude combined with mixture phase. |
| Case 2 | Oracle-based magnitude combined with zero phase. |
| Case 3 | NMFD-based magnitude combined with mixture phase. |
| Case 4 | NMFD-based magnitude combined with zero phase. |
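The four cases differ only in which magnitude estimate is used and which phase it is paired with. A minimal sketch of the two phase choices, assuming the magnitude estimate and the mixture STFT are NumPy arrays of equal shape (the name `init_mstft` is hypothetical):

```python
import numpy as np

def init_mstft(component_mag, mixture_stft=None):
    """Build an initial MSTFT from a magnitude estimate.

    With mixture_stft given, the mixture phase is used (Cases 1 and 3);
    without it, zero phase is used (Cases 2 and 4).
    """
    if mixture_stft is None:
        return component_mag.astype(complex)
    return component_mag * np.exp(1j * np.angle(mixture_stft))
```

Here `component_mag` would be either the oracle magnitude (Cases 1 and 2) or the NMFD-based estimate (Cases 3 and 4).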

Our evaluation results showed that the reduction of pre-echos worked best for TR reconstruction in Case 1 and worst for GL reconstruction in Case 4. We recommend that you listen to the results below using headphones. Using the audio player, you can switch seamlessly between the different reconstructions as well as the original signals using the radio buttons. If desired, you can also download the example signals from the tables below the audio players.

License and Acknowledgements

We would like to thank the colleagues from Fraunhofer IDMT for making the "IDMT-SMT-Drums" dataset publicly available under the Creative Commons Attribution-ShareAlike 4.0 International License ("by-sa"). Our reconstructed signals are derived from the original dataset and consequently fall under the same license.

Audio Examples

Drum loop: TechnoDrum02_00

TechnoDrum02_00_MIX
TechnoDrum02_00_KD
TechnoDrum02_00_SD
TechnoDrum02_00_HH
| Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
|---|---|---|---|---|---|---|---|---|---|
| Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |

Drum loop: WaveDrum02_58

WaveDrum02_58_MIX
WaveDrum02_58_KD
WaveDrum02_58_SD
WaveDrum02_58_HH
| Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
|---|---|---|---|---|---|---|---|---|---|
| Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |

Drum loop: WaveDrum02_23

WaveDrum02_23_MIX
WaveDrum02_23_KD
WaveDrum02_23_SD
WaveDrum02_23_HH
| Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
|---|---|---|---|---|---|---|---|---|---|
| Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |

Drum loop: WaveDrum02_55

WaveDrum02_55_MIX
WaveDrum02_55_KD
WaveDrum02_55_SD
WaveDrum02_55_HH
| Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
|---|---|---|---|---|---|---|---|---|---|
| Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |

Drum loop: WaveDrum02_42

WaveDrum02_42_MIX
WaveDrum02_42_KD
WaveDrum02_42_SD
WaveDrum02_42_HH
| Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
|---|---|---|---|---|---|---|---|---|---|
| Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
| Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |

References

  1. Christian Dittmar and Meinard Müller
    Towards Transient Restoration in Score-informed Audio Decomposition
    In Proceedings of the International Conference on Digital Audio Effects (DAFx): 145–152, 2015. PDF
    @inproceedings{DittmarM15_TransientRestore_DAFx,
    author      = {Christian Dittmar and Meinard M\"{u}ller},
    booktitle   = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})},
    title       = {Towards Transient Restoration in Score-informed Audio Decomposition},
    year        = {2015},
    month       = {December},
    address     = {Trondheim, Norway},
    pages       = {145--152},
    url-pdf     = {http://www.ntnu.edu/documents/1001201110/1266017954/DAFx-15_submission_31.pdf/bd700f13-cbb0-441b-b1a7-9078e268e99f},
    }
  2. Christian Dittmar and Daniel Gärtner
    Real-Time Transcription and Separation of Drum Recordings based on NMF Decomposition
    In Proceedings of the International Conference on Digital Audio Effects (DAFx): 187–194, 2014. PDF
    @inproceedings{DittmarG14_DrumTranscription_DAFX,
    author = {Christian Dittmar and Daniel G{\"a}rtner},
    title = {Real-Time Transcription and Separation of Drum Recordings based on {NMF} Decomposition},
    booktitle = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})},
    year = {2014},
    address = {Erlangen, Germany},
    month = {September},
    pages={187--194},
    url-pdf={http://www.dafx14.fau.de/papers/dafx14_christian_dittmar_real_time_transcription_a.pdf},
    }
  3. Paris Smaragdis
    Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs
    In Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation ICA: 494–499, 2004. PDF
    @inproceedings{Smaragdis04_NMD,
    author      = {Paris Smaragdis},
    title       = {Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs},
    booktitle   = {Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation {ICA}},
    pages       = {494--499},
    address     = {Granada, Spain},
    year        = {2004},
    month       = {September},
    url-pdf={http://www.merl.com/publications/docs/TR2004-104.pdf},
    }
  4. Daniel W. Griffin and Jae S. Lim
    Signal estimation from modified short-time Fourier transform
    IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2): 236–243, 1984. PDF
    @article{GriffinL84_SpecgramInversion_TASSP,
    author={Daniel W. Griffin and Jae S. Lim},
    title={Signal estimation from modified short-time {F}ourier transform},
    journal={{IEEE} Transactions on Acoustics, Speech, and Signal Processing},
    year={1984},
    volume={32},
    number={2},
    pages={236--243},
    url-pdf={http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1164317},
    }