In Chapter 8 of [Müller, FMP, Springer 2015] on audio decomposition, we present a challenging research direction that is closely related to source separation. Within this wide research area, we consider three subproblems: harmonic–percussive separation, main melody extraction, and score-informed audio decomposition. In these scenarios, we discuss a number of key techniques including instantaneous frequency estimation, fundamental frequency (F0) estimation, spectrogram inversion, and nonnegative matrix factorization (NMF). Along the way, we revisit several acoustic and musical properties of audio recordings that were introduced and discussed in previous chapters.
8.1 Harmonic–Percussive Separation
8.2 Melody Extraction
8.3 NMF-Based Audio Decomposition
8.4 Further Notes
Topic | Relation to [Müller, FMP, Springer 2015] & Description | HTML | IPYNB |
Harmonic–Percussive Separation (HPS) | [Section 8.1.1] Harmonic sound; percussive sound; median filter; binary mask; soft mask; signal reconstruction; HPR experiments; Violin–Castanets example; audio examples (diverse) | [html] | [ipynb] |
Harmonic–Residual–Percussive Separation (HRPS) | [Section 8.1.1, Exercise 8.5] Separation factor; residual component; binary mask; cascaded HRPS; Violin–Applause–Castanets example; Bornemark example (Stop Messing With Me) | [html] | [ipynb] |
Signal Reconstruction | [Section 8.1.2] Inverse DFT; inverse STFT; modified STFT; overlap–add procedure; Griffin–Lim optimization problem | [html] | [ipynb] |
Applications of HPS and HRPS | [Section 8.1.3] Feature enhancement; chroma feature; onset detection; time-scale modification; Violin–Castanets example | [html] | [ipynb] |
Instantaneous Frequency Estimation | [Section 8.2.1] Phase wrapping; principal argument; exponential function; phase prediction; instantaneous frequency (IF); polar coordinates; bin offset; visualization of IF values; dependency on hop size; C4 piano example | [html] | [ipynb] |
Salience Representation | [Section 8.2.2] Log-frequency spectrogram; instantaneous frequency; binning; harmonic summation; salience; Weber example (Freischütz) | [html] | [ipynb] |
Fundamental Frequency Tracking | [Section 8.2.3] Frequency trajectory; sonification; salience representation; continuity constraint; dynamic programming; score-informed constraint; constraint region; Weber example (Freischütz); Bornemark example (Stop Messing With Me) | [html] | [ipynb] |
Melody Extraction and Separation | [Section 8.2, Section 8.2.3.3] Melody; salience representation; predominant frequency; F0-trajectory; separation; binary mask; harmonics; frequency-dependent tolerance; signal reconstruction; Weber example (Freischütz); Bornemark example (Stop Messing With Me) | [html] | [ipynb] |
Nonnegative Matrix Factorization (NMF) | [Section 8.3.1] Matrix factorization; nonnegative matrix; rank; template vector; activation vector; gradient descent; multiplicative update rule; magnitude spectrogram; Chopin example (Op. 28, No. 4); C-major scale example | [html] | [ipynb] |
NMF-Based Spectrogram Factorization | [Section 8.3.2] Spectrogram factorization; score-informed NMF; initialization; template constraints; pitch information; activation constraints; score information; onset model; Chopin example (Op. 28, No. 4) | [html] | [ipynb] |
NMF-Based Audio Decomposition | [Section 8.3.3] Score-informed NMF; activation matrix; spectral masking; audio decomposition; audio editing; Chopin example (Op. 28, No. 4) | [html] | [ipynb] |
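
To give a flavor of the material, the multiplicative update rule listed under Nonnegative Matrix Factorization (NMF) can be sketched in a few lines of NumPy. This is a minimal illustration and not code taken from the notebooks: the function name `nmf`, its parameters, and the random initialization are assumptions made here, and the updates shown are the standard ones for the Euclidean distance between V and the product W·H.

```python
import numpy as np

def nmf(V, rank, num_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (K x N) by W @ H.

    W (K x rank) holds template vectors as columns,
    H (rank x N) holds activation vectors as rows.
    Illustrative sketch; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    K, N = V.shape
    # Strictly positive random initialization (an assumption of this sketch).
    W = rng.random((K, rank)) + eps
    H = rng.random((rank, N)) + eps
    errors = []
    for _ in range(num_iter):
        # Multiplicative updates: each entry is scaled by a nonnegative
        # ratio, so W and H stay nonnegative without any projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the Euclidean (Frobenius) approximation error.
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy usage: a random nonnegative matrix of rank 2 stands in for a
# magnitude spectrogram.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))
W, H, errors = nmf(V, rank=2, num_iter=300)
```

In the setting of the chapter, V would be a magnitude spectrogram, so the columns of W can be read as spectral templates and the rows of H as their activations over time. Because entries that start at zero remain zero under these updates, a structured initialization acts as a hard constraint, which is the mechanism that score-informed variants rely on.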