This is the accompanying website for the following paper:
@inproceedings{ZalkowM20_WeaklyAlignedTrain_ISMIR, author = {Frank Zalkow and Meinard M{\"u}ller}, title = {Using Weakly Aligned Score--Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Montréal, Canada}, year = {2020}, pages = {184--191}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2020-ISMIR-chroma-ctc}, url-pdf = {2020_ZalkowM_CTC_ISMIR.pdf} }
Many music information retrieval tasks involve the comparison of a symbolic score representation with an audio recording. A typical strategy is to compare score–audio pairs based on a common mid-level representation, such as chroma features. Several recent studies demonstrated the effectiveness of deep learning models that learn task-specific mid-level representations from temporally aligned training pairs. However, in practice, there is often a lack of strongly aligned training data, in particular for real-world scenarios. In our study, we use weakly aligned score–audio pairs for training, where only the beginning and end of a score excerpt are annotated in an audio recording, without aligned correspondences in between. To exploit such weakly aligned data, we employ the Connectionist Temporal Classification (CTC) loss to train a deep learning model for computing an enhanced chroma representation. We then apply this model to a cross-modal retrieval task, where we aim at finding relevant audio recordings of Western classical music, given a short monophonic musical theme in symbolic notation as a query. We present systematic experiments that show the effectiveness of the CTC-based model for this theme-based retrieval task.
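As an illustrative aside on the CTC loss mentioned above: CTC sums the probability of a target label sequence over all monotonic frame-level alignments, which is what makes training on weakly aligned pairs possible. The following is a minimal NumPy sketch of the underlying forward (alpha) recursion from Graves et al.; the function name `ctc_forward_prob` is our own, and this is not the implementation used in the paper — in practice one would use a framework routine such as `torch.nn.CTCLoss`, which operates in log space for numerical stability.

```python
import numpy as np

def ctc_forward_prob(probs, target, blank=0):
    """Probability of `target` under the frame-wise class distributions
    `probs` (a T x C array), summed over all CTC alignments.
    The CTC loss is the negative log of this quantity."""
    T = probs.shape[0]
    # Extend the target with blanks: l' = [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for c in target:
        ext.extend([c, blank])
    S = len(ext)
    alpha = np.zeros((T, S))
    # Initialization: a path may start with a blank or the first label
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                 # stay on the same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]        # advance by one symbol
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]        # skip a blank (distinct labels only)
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the final label or the final blank
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```

For example, with two frames, per-frame probabilities `[[0.2, 0.8], [0.5, 0.5]]` (class 0 is the blank), and target `[1]`, the three valid paths (1,1), (blank,1), and (1,blank) sum to 0.9; a repeated label like `[1, 1]` has probability 0 with only two frames, since CTC requires a separating blank.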
Please note that the following journal article represents a significant expansion of the ISMIR conference paper.
@article{ZalkowMueller21_ChromaCTC_TASLP, author = {Frank Zalkow and Meinard M{\"u}ller}, title = {{CTC}-Based Learning of Chroma Features for Score--Audio Music Retrieval}, journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing}, volume = {29}, pages = {2957--2971}, year = {2021}, doi = {10.1109/TASLP.2021.3110137}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2021_TASLP-ctc-chroma}, url-pdf = {https://ieeexplore.ieee.org/document/9531521}, }
Pre-trained models and code to apply them are available at:
The repository also contains two public domain audio excerpts. These excerpts are given in the following table.
ID | Composer | Work | Performer | Description | Audio |
---|---|---|---|---|---|
1 | Beethoven | Symphony no. 5, op. 67 | Davis High School Symphony Orchestra | First movement, first theme | |
2 | Beethoven | Piano Sonata no. 2, op. 2 no. 2 | Paul Pitman | First movement, second theme | |
Furthermore, the repository contains a Jupyter notebook that shows how to apply the model described in the paper. The following table links to HTML exports of this notebook for the two audio excerpts and all model variants in the repository. The model variants result from different splits of the data into training and validation sets.
Audio Excerpt | Model Variant | Link |
---|---|---|
1 | train123valid4 | [link] |
1 | train234valid5 | [link] |
1 | train345valid1 | [link] |
1 | train451valid2 | [link] |
1 | train512valid3 | [link] |
1 | train1234valid5 | [link] |
1 | train2345valid1 | [link] |
1 | train3451valid2 | [link] |
1 | train4512valid3 | [link] |
1 | train5123valid4 | [link] |
2 | train123valid4 | [link] |
2 | train234valid5 | [link] |
2 | train345valid1 | [link] |
2 | train451valid2 | [link] |
2 | train512valid3 | [link] |
2 | train1234valid5 | [link] |
2 | train2345valid1 | [link] |
2 | train3451valid2 | [link] |
2 | train4512valid3 | [link] |
2 | train5123valid4 | [link] |
Frank Zalkow and Meinard Müller are supported by the German Research Foundation (DFG-MU 2686/11-1, MU 2686/12-1). We thank Daniel Stoller for fruitful discussions on the CTC loss, and Michael Krause for proof-reading the manuscript. We also thank Stefan Balke and Vlora Arifi-Müller as well as all students involved in the annotation work, especially Lena Krauß and Quirin Seilbeck. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS. The authors gratefully acknowledge the compute resources and support provided by the Erlangen Regional Computing Center (RRZE).
@book{BarlowM75_MusicalThemes_BOOK, author = {Harold Barlow and Sam Morgenstern}, edition = {Revised edition}, publisher = {Crown Publishers, Inc.}, title = {A Dictionary of Musical Themes}, year = {1975} }
@inproceedings{GravesFGS06_CTC_ICML, author = {Alex Graves and Santiago Fern{\'{a}}ndez and Faustino J. Gomez and J{\"{u}}rgen Schmidhuber}, title = {Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks}, booktitle = {Proceedings of the International Conference on Machine Learning ({ICML})}, pages = {369--376}, year = {2006}, address = {Pittsburgh, Pennsylvania, USA} }
@inproceedings{BalkeALM16_BarlowRetrieval_ICASSP, author = {Stefan Balke and Vlora Arifi-M{\"u}ller and Lukas Lamprecht and Meinard M{\"u}ller}, title = {Retrieving Audio Recordings Using Musical Themes}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, address = {Shanghai, China}, year = {2016}, pages = {281--285}, }
@inproceedings{ZalkowBM19_SalienceRetrieval_ICASSP, author = {Frank Zalkow and Stefan Balke and Meinard M{\"u}ller}, title = {Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, address = {Brighton, United Kingdom}, year = {2019}, pages = {331--335}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2019-ICASSP-BarlowMorgenstern/}, }