AudioLabs - Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques

This is the accompanying website for the submission Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques.

Yigitcan Özer and Meinard Müller
Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 32: 1214–1225, 2024. DOI: 10.1109/TASLP.2024.3356980

@article{OezerM24_PCSeparation_TASLP,
author    = {Yigitcan {\"O}zer and Meinard M{\"u}ller},
title     = {Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques},
journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing ({TASLP})},
year      = {2024},
volume    = {32},
pages     = {1214--1225},
doi       = {10.1109/TASLP.2024.3356980},
url-pdf   = {2024_OezerM_PCSeparation_TASLP_ePrint.pdf}
}

Abstract

In this work, we address the novel and rarely considered source separation task of decomposing piano concerto recordings into separate piano and orchestral tracks. Being a genre written for a pianist typically accompanied by an ensemble or orchestra, piano concertos often involve an intricate interplay of the piano and the entire orchestra, leading to high spectro–temporal correlations between the constituent instruments. Moreover, in the case of piano concertos, the lack of multi-track data for training constitutes another challenge in view of data-driven source separation approaches. As a basis for our work, we adapt existing deep learning (DL) techniques, mainly used for the separation of popular music recordings. In particular, we investigate spectrogram- and waveform-based approaches as well as hybrid models operating in both spectrogram and waveform domains. As a main contribution, we introduce a musically motivated data augmentation approach for training based on artificially generated samples. Furthermore, we systematically investigate the effects of various augmentation techniques for DL-based models. For our experiments, we use a recently published, open-source dataset of multi-track piano concerto recordings. Our main findings demonstrate that the best source separation performance is achieved by a hybrid model when combining all augmentation techniques.

Test Dataset

For the quantitative and subjective evaluation of our experiments, we use the dry recordings (without artificial reverberation) from the Piano Concerto Dataset (PCD) as our test dataset. It contains 81 excerpts with separate piano and orchestral tracks, performed by five pianists.

Source Code

To make our results reproducible, we provide open-source code and pretrained models in our GitHub repository.

Audio Examples

Excerpts selected from our test corpora, separated with different source separation models:

[Bach]	[Beet1]	[Beet3]	[Chopin]	[Grieg]	[Mendel]
[Moz]	[Rach2-1]	[Rach2-3]	[Saint]	[Schum]	[Tchai]

Listening Test Results

Results of our listening tests based on the MUSHRA framework for the separated (a) piano and (b) orchestral tracks. The colored markers indicate the average rating scores enclosed by 95% confidence intervals (shown as the vertical lines).
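
As a rough illustration of how such markers and intervals can be derived from raw listener scores, the following sketch computes the average rating and a 95% confidence interval for a single condition. It is a minimal example under stated assumptions, not the evaluation code used for the paper: the function name `mean_with_ci`, the example scores, and the choice of a normal approximation (rather than, for instance, a Student's t interval) are all hypothetical.

```python
import math
import statistics


def mean_with_ci(ratings, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    Assumption: ratings are independent scores for one condition; the exact
    interval definition used for the plots on this page is not known here.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    mean = statistics.mean(ratings)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * sem


# Hypothetical MUSHRA scores (0-100) from eight listeners for one condition.
example_ratings = [62, 70, 55, 81, 66, 74, 59, 68]
mean, half_width = mean_with_ci(example_ratings)
print(f"mean = {mean:.1f}, 95% CI = [{mean - half_width:.1f}, {mean + half_width:.1f}]")
```

With few listeners, a t-distribution critical value would give a somewhat wider interval than the normal approximation used in this sketch.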

References

Yigitcan Özer and Meinard Müller
Source Separation of Piano Concertos with Test-Time Adaptation
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 493–500, 2022. Demo: https://www.audiolabs-erlangen.de/resources/MIR/2022-PianoSep

@inproceedings{OezerM22_PianoSepAdapt_ISMIR,
author    = {Yigitcan {\"O}zer and Meinard M{\"u}ller},
title     = {Source Separation of Piano Concertos with Test-Time Adaptation},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address   = {Bengaluru, India},
year      = {2022},
pages     = {493--500},
url-demo  = {https://www.audiolabs-erlangen.de/resources/MIR/2022-PianoSep}
}

Fabian-Robert Stöter, Stefan Uhlich, Antoine Liutkus, and Yuki Mitsufuji
Open-Unmix — A Reference Implementation for Music Source Separation
Journal of Open Source Software, 4(41), 2019. DOI: 10.21105/joss.01667

@article{StoeterULM19_Unmix_JOSS,
author    = {Fabian{-}Robert St{\"{o}}ter and Stefan Uhlich and Antoine Liutkus and Yuki Mitsufuji},
title     = {{Open-Unmix} -- {A} Reference Implementation for Music Source Separation},
journal   = {Journal of Open Source Software},
volume    = {4},
number    = {41},
year      = {2019},
url       = {https://doi.org/10.21105/joss.01667},
doi       = {10.21105/joss.01667}
}

Romain Hennequin, Anis Khlif, Felix Voituret, and Manuel Moussallam
Spleeter: A Fast and Efficient Music Source Separation Tool with Pre-trained Models
Journal of Open Source Software, 5(50): 2154, 2020. DOI: 10.21105/joss.02154

@article{HennequinKVM2020_Spleeter_JOSS,
author    = {Romain Hennequin and Anis Khlif and Felix Voituret and Manuel Moussallam},
title     = {Spleeter: A Fast and Efficient Music Source Separation Tool with Pre-trained Models},
journal   = {Journal of Open Source Software},
publisher = {The Open Journal},
volume    = {5},
number    = {50},
pages     = {2154},
year      = {2020},
url       = {https://doi.org/10.21105/joss.02154},
doi       = {10.21105/joss.02154},
note      = {Deezer Research}
}

Alexandre Défossez
Hybrid Spectrogram and Waveform Source Separation
In Proceedings of the ISMIR 2021 Workshop on Music Source Separation, 2021.

@inproceedings{Defossez21_Demucs_ISMIR,
author    = {Alexandre D{\'e}fossez},
title     = {Hybrid Spectrogram and Waveform Source Separation},
booktitle = {Proceedings of the {ISMIR} 2021 Workshop on Music Source Separation},
year      = {2021},
address   = {Online}
}

Alexandre Défossez, Nicolas Usunier, Léon Bottou, and Francis R. Bach
Music Source Separation in the Waveform Domain
arXiv preprint arXiv:1911.13254, 2019.

@misc{DefossezUBB21_MSSWaveFormDomain_arXiV,
author    = {Alexandre D{\'e}fossez and Nicolas Usunier and L{\'e}on Bottou and Francis R. Bach},
title     = {Music Source Separation in the Waveform Domain},
year      = {2019},
url       = {http://arxiv.org/abs/1911.13254},
eprint    = {1911.13254}
}

Yigitcan Özer, Simon Schwär, Vlora Arifi-Müller, Jeremy Lawrence, Emre Sen, and Meinard Müller
Piano Concerto Dataset (PCD): A Multitrack Dataset of Piano Concertos
Transactions of the International Society for Music Information Retrieval (TISMIR), 6(1): 75–88, 2023. Demo: https://www.audiolabs-erlangen.de/resources/MIR/PCD, DOI: 10.5334/tismir.160

@article{OezerSALSM23_PCD_TISMIR,
title     = {{P}iano {C}oncerto {D}ataset {(PCD)}: A Multitrack Dataset of Piano Concertos},
author    = {Yigitcan {\"O}zer and Simon Schw{\"a}r and Vlora Arifi-M{\"u}ller and Jeremy Lawrence and Emre Sen and Meinard M{\"u}ller},
journal   = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
volume    = {6},
number    = {1},
year      = {2023},
pages     = {75--88},
doi       = {10.5334/tismir.160},
url-demo  = {https://www.audiolabs-erlangen.de/resources/MIR/PCD}
}

Hyemi Kim, Jiyun Park, Taegyun Kwon, Dasaem Jeong, and Juhan Nam
A study of audio mixing methods for piano transcription in violin-piano ensembles
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023. DOI: 10.1109/ICASSP49357.2023.10095061

@inproceedings{KimPKJN23_AudioMixingMethods_ICASSP,
author    = {Hyemi Kim and Jiyun Park and Taegyun Kwon and Dasaem Jeong and Juhan Nam},
title     = {A study of audio mixing methods for piano transcription in violin-piano ensembles},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
year      = {2023},
doi       = {10.1109/ICASSP49357.2023.10095061},
address   = {Rhodes Island, Greece}
}

Ching-Yu Chiu, Wen-Yi Hsiao, Yin-Cheng Yeh, Yi-Hsuan Yang, and Alvin Wen-Yu Su
Mixing-specific data augmentation techniques for improved blind violin/piano source separation
In 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): 1–6, 2020.

@inproceedings{ChiuHYYSA2020_DataAugViolinPianoSeparation_MMSP,
author       = {Ching-Yu Chiu and Wen-Yi Hsiao and Yin-Cheng Yeh and Yi-Hsuan Yang and Alvin Wen-Yu Su},
title        = {Mixing-specific data augmentation techniques for improved blind violin/piano source separation},
booktitle    = {2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)},
pages        = {1--6},
year         = {2020},
organization = {IEEE}
}

International Telecommunication Union (ITU-R)
Method for the subjective assessment of intermediate quality level of audio systems
Recommendation ITU-R BS.1534, International Telecommunication Union Radiocommunication Assembly, 2014.

@article{Mushra14,
author      = {{International Telecommunication Union}},
title       = {Method for the subjective assessment of intermediate quality level of audio systems},
journal     = {International Telecommunication Union Radiocommunication Assembly},
note        = {Recommendation {ITU-R} {BS}.1534},
year        = {2014}
}

Fabian-Robert Stöter, Stefan Bayer, and Bernd Edler
Unison Source Separation
In Proceedings of the International Conference on Digital Audio Effects (DAFx): 235–241, 2014.

@inproceedings{StoeterBE14_UnisonSourceSep_DAFx,
author    = {Fabian-Robert St{\"o}ter and Stefan Bayer and Bernd Edler},
title     = {Unison Source Separation},
booktitle = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})},
pages     = {235--241},
year      = {2014},
address   = {Erlangen, Germany}
}

William Cole
The Form of Music
The Associated Board of the Royal Schools of Music (ABRSM), 1997.

@book{Cole97_FormOfMusic_BOOK,
author    = {William Cole},
publisher = {The {A}ssociated {B}oard of the {R}oyal {S}chools of {M}usic ({ABRSM})},
address   = {London, UK},
title     = {The Form of Music},
year      = {1997}
}

Cuthbert Morton Girdlestone
Mozart & His Piano Concertos
Cassell & Company Ltd., 1948.

@book{Girdlestone48_MozartPianoConcertos_BOOK,
author    = {Cuthbert Morton Girdlestone},
title     = {Mozart \& His Piano Concertos},
publisher = {Cassell \& Company Ltd.},
address   = {London, UK},
year      = {1948}
}

Acknowledgments

This work was supported by the German Research Foundation (DFG MU 2686/10-2). The authors are with the International Audio Laboratories Erlangen, a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.