This is the accompanying page for the paper "Improving Time-Scale Modification of Music Audio Signals Using Harmonic-Percussive Separation" [bib] by Jonathan Driedger, Meinard Müller, and Sebastian Ewert. The paper was published in IEEE Signal Processing Letters and can be found here.

Abstract

A major problem in time-scale modification (TSM) of music signals is that percussive transients are often perceptually degraded. To prevent this degradation, some TSM approaches try to explicitly identify transients in the input signal and to handle them in a special way. However, such approaches are problematic for two reasons. First, errors in the transient detection have an immediate influence on the final TSM result. Second, a perceptually transparent preservation of transients is far from trivial. In this paper we present a TSM approach that handles transients implicitly by first separating the signal into a harmonic component and a percussive component, the latter of which typically contains the transients. While the harmonic component is modified with a phase vocoder approach using a large frame size, the noise-like percussive component is modified with a simple time-domain overlap-add technique using a short frame size, which preserves the transients to a high degree without any explicit transient detection.
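
The signal flow described in the abstract can be outlined in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `hp_tsm` and its three callable parameters are hypothetical placeholders for a harmonic-percussive separation, a phase vocoder, and an overlap-add procedure, and the sample-wise sum used to recombine the two stretched components is an assumption.

```python
def hp_tsm(x, alpha, separate, stretch_harmonic, stretch_percussive):
    """Sketch of the pipeline: separate, stretch each part, recombine.

    All three callables are hypothetical placeholders (assumptions):
      separate(x)                  -> (harmonic, percussive)
      stretch_harmonic(h, alpha)   -> stretched harmonic part
                                      (e.g. phase vocoder, large frames)
      stretch_percussive(p, alpha) -> stretched percussive part
                                      (e.g. overlap-add, short frames)
    """
    harmonic, percussive = separate(x)
    y_h = stretch_harmonic(harmonic, alpha)
    y_p = stretch_percussive(percussive, alpha)
    # Assumed recombination: sample-wise sum, truncated to the shorter output.
    n = min(len(y_h), len(y_p))
    return [y_h[i] + y_p[i] for i in range(n)]


# Toy stand-ins that only demonstrate the calling convention.
# They are NOT real separation or time-scale modification methods.
def toy_separate(x):
    half = [0.5 * v for v in x]
    return half, half


def toy_stretch(x, alpha):
    # Sample repetition to the new length (changes pitch; placeholder only).
    n_out = int(round(len(x) * alpha))
    return [x[min(int(i / alpha), len(x) - 1)] for i in range(n_out)]


y = hp_tsm([0.0, 1.0, 2.0, 3.0], 2.0, toy_separate, toy_stretch, toy_stretch)
```

Passing the three stages as callables keeps the outline independent of any particular separation or stretching algorithm; real implementations would be substituted for the toy stand-ins.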




Evaluation

Here, we present the audio files used in our listening experiment. For different audio examples (all mono, sampled at 22050 Hz), the outputs of the following TSM algorithms are provided.

  • HP: Our proposed method.
  • EL: The elastique algorithm by zPlane. The algorithm was run in "Pro" mode.
  • PV: The phase vocoder with identity phase locking as proposed by Jean Laroche and Mark Dolson.
  • NW: An algorithm proposed by Frederik Nagel and Andreas Walther that relies on transient detection.
  • WS: The Waveform Similarity Overlap-Add (WSOLA) algorithm proposed by Werner Verhelst and Marc Roelands.

Additionally, the following auxiliary audio files are given.

  • H: The harmonic component of the signal as used by the method HP.
  • H-PV: The harmonic component H stretched with PV.
  • P: The percussive component of the signal as used by the method HP.
  • P-OLA: The percussive component P stretched with overlap-add (OLA).
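
For orientation, the plain overlap-add procedure behind P-OLA can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used to produce the audio files: the function names, the Hann window, the frame length of 256 samples, and the synthesis hop of 128 samples are illustrative choices, and α > 1 is taken to mean a longer output.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, alpha, frame_len=256, syn_hop=128):
    """Time-stretch the sample list x by factor alpha with plain overlap-add.

    Frames are read from the input at positions spaced syn_hop / alpha apart
    and written to the output at positions spaced syn_hop apart. No phase or
    waveform alignment is performed between frames.
    """
    window = hann(frame_len)
    out_len = int(round(len(x) * alpha))
    y = [0.0] * (out_len + frame_len)       # accumulated windowed frames
    norm = [0.0] * (out_len + frame_len)    # accumulated window weights

    m = 0
    while m * syn_hop < out_len:
        out_pos = m * syn_hop
        in_pos = int(round(out_pos / alpha))  # matching input position
        for i in range(frame_len):
            j = in_pos + i
            sample = x[j] if j < len(x) else 0.0  # zero-pad past the end
            y[out_pos + i] += window[i] * sample
            norm[out_pos + i] += window[i]
        m += 1

    # Divide by the summed window weights to compensate for the overlap.
    return [y[i] / norm[i] if norm[i] > 1e-8 else 0.0 for i in range(out_len)]
```

For a signal `p` sampled at 22050 Hz, `ola_stretch(p, 1.8)` would return a list that is 1.8 times as long as `p`. With a frame this short, each transient falls into only a few frames, which is the property the abstract relies on for the percussive component.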

Constant stretching factor of α=1.2

Audio Original HP EL PV NW WS H H-PV P P-OLA
Bongo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
CastanetsViolin [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
DrumSolo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Glockenspiel [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Stepdad [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Jazz [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Pop [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SingingVoice [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthMono [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthPoly [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]

Constant stretching factor of α=1.8

Audio Original HP EL PV NW WS H H-PV P P-OLA
Bongo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
CastanetsViolin [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
DrumSolo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Glockenspiel [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Stepdad [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Jazz [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Pop [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SingingVoice [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthMono [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthPoly [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]

You can download a RAR archive containing all sound files in WAV format here.

Listening Experiment

The listening experiment can be found here.




Additional Material

Here, we present additional time-scale modification results for the algorithms listed above, using extreme stretching factors α that were not included in our listening experiment.

Constant stretching factor of α=0.5

Audio Original HP EL PV NW WS H H-PV P P-OLA
Bongo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
CastanetsViolin [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
DrumSolo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Glockenspiel [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Stepdad [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Jazz [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Pop [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SingingVoice [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthMono [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthPoly [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]

Constant stretching factor of α=3.0

Audio Original HP EL PV NW WS H H-PV P P-OLA
Bongo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
CastanetsViolin [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
DrumSolo [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Glockenspiel [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Stepdad [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Jazz [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
Pop [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SingingVoice [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthMono [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]
SynthPoly [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3] [mp3]






For comments and feedback, please contact Jonathan Driedger.

page last modified Wednesday, 13. November 2013 - 09:20