This is the accompanying website for the paper "Towards Transient Restoration in Score-informed Audio Decomposition" by Christian Dittmar and Meinard Müller.
Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques apply a suitable decomposition to the magnitude Short-Time Fourier Transform (STFT) of the mixture signal (see, e.g., [2, 3]). The phase information required to reconstruct the individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). Several methods exist for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. To remedy this problem, we propose a simple yet effective extension of the iterative signal reconstruction procedure by Griffin and Lim. We refer to the classic procedure proposed in [4] as GL and to our extended method proposed in [1] as TR.
The following two animations compare GL and TR iterations when reconstructing a single drum hit from an MSTFT. Case 2 refers to initialization of the MSTFT with oracle (known ground-truth) magnitude and zero phase. For this case, we show the evolution of the signal reconstruction over 200 iterations of the GL (left) and the TR (right) procedure. As can be seen, the TR method quickly attenuates the pre-echos. The vertical blue line marks the known transient position.
Case 2 GL | Case 2 TR |
---|---|
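The two procedures compared above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `stft`, `istft`, and `reconstruct`, the Hann window, the hop size, and the synthetic decaying noise burst are all assumptions made here. In particular, the TR variant is modeled by the assumption that the intermediate time-domain signal is set to zero before the known transient position in every iteration; please refer to [1] for the exact formulation.

```python
import numpy as np


def stft(x, win, hop):
    """Frame the signal, apply the analysis window, and take one FFT per frame."""
    n_frames = 1 + (len(x) - len(win)) // hop
    frames = np.stack([x[i * hop:i * hop + len(win)] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, win, hop):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    n = len(win)
    length = n + hop * (X.shape[0] - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(X, n=n, axis=1)):
        x[i * hop:i * hop + n] += frame * win
        norm[i * hop:i * hop + n] += win ** 2
    return x / np.maximum(norm, 1e-8)  # guard against division by zero at the borders


def reconstruct(target_mag, init_phase, win, hop, n_iter=200, onset=None):
    """GL iterations; if `onset` is given, additionally zero all samples before it
    in every iteration (ASSUMED form of the TR constraint, see [1] for details)."""
    x = istft(target_mag * np.exp(1j * init_phase), win, hop)
    for _ in range(n_iter):
        if onset is not None:
            x[:onset] = 0.0  # assumed transient constraint
        phase = np.angle(stft(x, win, hop))  # keep only the phase of the current estimate
        x = istft(target_mag * np.exp(1j * phase), win, hop)  # re-impose target magnitude
    if onset is not None:
        x[:onset] = 0.0
    return x


# Synthetic "drum hit": a decaying noise burst starting at a known onset (illustrative only).
rng = np.random.default_rng(0)
n_win, hop, onset = 512, 128, 1024
win = np.hanning(n_win)
x_true = np.zeros(4096)
x_true[onset:] = rng.standard_normal(4096 - onset) * np.exp(-np.arange(4096 - onset) / 200.0)

# Case 2: oracle magnitude combined with zero phase.
target_mag = np.abs(stft(x_true, win, hop))
zero_phase = np.zeros_like(target_mag)

x_gl = reconstruct(target_mag, zero_phase, win, hop, n_iter=50)
x_tr = reconstruct(target_mag, zero_phase, win, hop, n_iter=50, onset=onset)

print("pre-onset energy GL:", np.sum(x_gl[:onset] ** 2))
print("pre-onset energy TR:", np.sum(x_tr[:onset] ** 2))
```

In this sketch, the only difference between the two variants is the line that zeroes the samples before `onset`; everything else is the standard alternation between the time domain and the STFT domain.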
Further down this page we provide five audio examples. The first one is used as the illustrative example in the paper; the remaining four are contained in the test set that we used for evaluation. All evaluation items are taken from the "WaveDrum02" subset of the "IDMT-SMT-Drums" dataset. Our test items are drum loops for which perfectly isolated single tracks of the involved instruments as well as annotations of each drum onset are available. For further details about the items in the test set, please refer to [2]. Below, you can listen to reconstructions of the drum loop mixtures as well as each single track. Per reconstruction, we present the following cases:
Case | MSTFT Initialization |
---|---|
Original | Oracle (known ground-truth) signals used, no reconstruction applied. |
Case 1 | Oracle-based magnitude combined with mixture phase. |
Case 2 | Oracle-based magnitude combined with zero phase. |
Case 3 | NMFD-based magnitude combined with mixture phase. |
Case 4 | NMFD-based magnitude combined with zero phase. |
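The four initializations in the table differ only in where the magnitude comes from (oracle or NMFD) and which phase it is combined with (mixture or zero). A minimal sketch of this combination step is given below; the helper name `init_mstft` and the random placeholder arrays are assumptions for illustration and do not stem from the paper.

```python
import numpy as np


def init_mstft(magnitude, mixture_stft=None):
    """Build an initial MSTFT from a magnitude estimate (hypothetical helper).

    With `mixture_stft` given, the magnitude is combined with the mixture phase
    (Cases 1 and 3); otherwise zero phase is used (Cases 2 and 4).
    """
    if mixture_stft is None:
        return magnitude.astype(complex)
    return magnitude * np.exp(1j * np.angle(mixture_stft))


# Placeholder data standing in for a component magnitude and the mixture STFT.
rng = np.random.default_rng(0)
magnitude = rng.random((10, 257)) + 0.1
mixture_stft = rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))

mstft_mix_phase = init_mstft(magnitude, mixture_stft)  # Case 1 or 3
mstft_zero_phase = init_mstft(magnitude)               # Case 2 or 4
```

Whether `magnitude` is the oracle magnitude or an NMFD-based estimate does not change this step; only the preceding decomposition differs.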
Our evaluation results showed that the reduction of pre-echos worked best for TR reconstruction in Case 1 and worst for GL reconstruction in Case 4. We recommend that you listen to the results below using headphones. Using the audio player, you can switch seamlessly between the different reconstructions as well as the original signals using the radio buttons. If desired, you can also download the example signals from the tables below the audio players.
We would like to thank the colleagues from Fraunhofer IDMT for making the "IDMT-SMT-Drums" dataset publicly available under the Creative Commons Attribution-ShareAlike 4.0 International License ("by-sa"). Our reconstructed signals are derived from the original dataset and consequently fall under the same license.
Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
---|---|---|---|---|---|---|---|---|---|
Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
---|---|---|---|---|---|---|---|---|---|
Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
---|---|---|---|---|---|---|---|---|---|
Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
---|---|---|---|---|---|---|---|---|---|
Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Instrument | Original signal | Case 1 GL | Case 1 TR | Case 2 GL | Case 2 TR | Case 3 GL | Case 3 TR | Case 4 GL | Case 4 TR |
---|---|---|---|---|---|---|---|---|---|
Mixture | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Kick drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Snare drum | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
Hi-hat | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] | [wav] |
@inproceedings{DittmarM15_TransientRestore_DAFx, author = {Christian Dittmar and Meinard M\"{u}ller}, booktitle = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})}, title = {Towards Transient Restoration in Score-informed Audio Decomposition}, year = {2015}, month = {December}, address = {Trondheim, Norway}, pages = {145--152}, url-pdf = {http://www.ntnu.edu/documents/1001201110/1266017954/DAFx-15_submission_31.pdf/bd700f13-cbb0-441b-b1a7-9078e268e99f}, }
@inproceedings{DittmarG14_DrumTranscription_DAFX, author = {Christian Dittmar and Daniel G{\"a}rtner}, title = {Real-Time Transcription and Separation of Drum Recordings based on {NMF} Decomposition}, booktitle = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})}, year = {2014}, address = {Erlangen, Germany}, month = {September}, pages={187--194}, url-pdf={http://www.dafx14.fau.de/papers/dafx14_christian_dittmar_real_time_transcription_a.pdf}, }
@inproceedings{Smaragdis04_NMD, author = {Paris Smaragdis}, title = {Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs}, booktitle = {Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation {ICA}}, pages = {494--499}, address = {Granada, Spain}, year = {2004}, month = {September}, url-pdf={http://www.merl.com/publications/docs/TR2004-104.pdf}, }
@article{GriffinL84_SpecgramInversion_TASSP, author={Daniel W. Griffin and Jae S. Lim}, title={Signal estimation from modified short-time {F}ourier transform}, journal={{IEEE} Transactions on Acoustics, Speech, and Signal Processing}, year={1984}, volume={32}, number={2}, pages={236--243}, url-pdf={http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1164317}, }