This is the accompanying website for the following paper:
@inproceedings{KrauseWM23_CrossVersionRepresentationLearning_ISMIR, author = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller}, title = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Milano, Italy}, year = {2023} }
Deep learning systems have become popular for tackling a variety of music information retrieval tasks. However, these systems often require large amounts of labeled data for supervised training, which can be very costly to obtain. To alleviate this problem, recent papers on learning music audio representations employ alternative training strategies that utilize unannotated data. In this paper, we introduce a novel cross-version approach to audio representation learning that can be used with music datasets containing several versions (performances) of a musical work. Our method exploits the correspondences that exist between two versions of the same musical section. We evaluate our proposed cross-version approach qualitatively and quantitatively on complex orchestral music recordings and show that it can better capture aspects of instrumentation compared to techniques that do not use cross-version information.
Extract into the subfolder outputs/models of the code repository.
The aligned annotations for the recordings used in this paper are made publicly available as a dataset for further research.
Extract into the subfolder data/ of the code repository.
This dataset contains aligned instrument activity annotations (in 02_Annotations/ann_audio_instruments_npz and 02_Annotations/ann_audio_instruments_csv). The corresponding .wav files of music audio need to be obtained individually and placed in 01_RawData/audio_wav.
Furthermore, we provide warping paths (in 02_Annotations/ann_audio_sync) that map the different versions to a common musical time axis.
Finally, we provide note annotations in 02_Annotations/ann_audio_note_npz.
All files follow the naming convention
(Subset)_(Composer)_(Work)_V(Version)
where Subset, Composer, Work, and Version take the values listed in the tables further below. For example, the file
01_RawData/audio_wav/Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.wav
contains the audio for the third version of the first movement of Tschaikowsky's Violin Concerto.
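The naming convention can be parsed programmatically. The following is a minimal sketch, assuming that none of the four fields contains an underscore (which holds for all file names listed on this page). The function name `parse_name` is hypothetical and not part of the code repository.

```python
import re
from pathlib import Path

# Pattern for (Subset)_(Composer)_(Work)_V(Version).
# Assumption: no field contains an underscore.
NAME_PATTERN = re.compile(
    r"^(?P<subset>[^_]+)_(?P<composer>[^_]+)_(?P<work>[^_]+)_V(?P<version>\d+)$"
)

def parse_name(path):
    """Split a dataset file name into subset, composer, work and version."""
    stem = Path(path).stem  # drop directories and the file extension
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unexpected file name: {path}")
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    return fields

info = parse_name("01_RawData/audio_wav/Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.wav")
print(info)
```

Because only the file stem is inspected, the same helper works for the annotation files that share the stem of the audio file.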
The instrument annotations in 02_Annotations/ann_audio_instruments_npz are provided as .npz files (i.e., NumPy arrays) to be loaded with Python. The arrays have the shape (N, 18), with the first dimension corresponding to frames of the recording and the second dimension corresponding to different classes. We use a frame rate of approximately 43.0664 Hz (obtained with a hop size of 512 samples at a sample rate of 22050 Hz). The individual entries in the second dimension correspond to the following classes:
Index | Class identifier |
---|---|
00 | INST |
01 | WW |
02 | BR |
03 | TMP |
04 | VOC |
05 | ST |
06 | Fl |
07 | Ob |
08 | Cl |
09 | Bn |
10 | Hn |
11 | Tpt |
12 | Fe |
13 | Ma |
14 | Vn |
15 | Va |
16 | Vc |
17 | Db |
(see "Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings" for explanations of the identifiers)
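As an illustration of the array layout, the sketch below loads an annotation array and looks up which classes are active at a given time. Several points are assumptions rather than documented facts: the key under which the array is stored in the .npz archive is unknown, so the first stored array is used; entries are treated as active when greater than zero; and frame n is taken to correspond to time n / frame rate. A synthetic file stands in for a real annotation file, and the helper names are hypothetical.

```python
import os
import tempfile
import numpy as np

FRAME_RATE = 22050 / 512  # approx. 43.0664 Hz (hop size 512 at 22050 Hz)

# Class identifiers in the order of the table above
CLASSES = ["INST", "WW", "BR", "TMP", "VOC", "ST", "Fl", "Ob", "Cl", "Bn",
           "Hn", "Tpt", "Fe", "Ma", "Vn", "Va", "Vc", "Db"]

def load_first_array(npz_path):
    """Return the first array stored in an .npz archive (key name is not assumed)."""
    with np.load(npz_path) as archive:
        return archive[archive.files[0]]

def active_classes(activity, time_sec):
    """List the class identifiers that are active at a given time in seconds."""
    # Assumption: frame n corresponds to time n / FRAME_RATE
    frame = min(int(round(time_sec * FRAME_RATE)), activity.shape[0] - 1)
    return [name for name, value in zip(CLASSES, activity[frame]) if value > 0]

# Synthetic stand-in for a real annotation file: 100 frames, columns 0, 5 and 14 set
demo = np.zeros((100, 18), dtype=np.int8)
demo[:, [0, 5, 14]] = 1
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path, demo)

activity = load_first_array(demo_path)
print(activity.shape, active_classes(activity, 1.0))
```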
Exactly the same annotations are also provided as .csv files in 02_Annotations/ann_audio_instruments_csv. Here, each row corresponds to an activity region for one of the classes, with start and end times given in seconds. In 03_ExtraMaterial/npz_to_csv_anno.py, we provide a Python script for obtaining the .csv files from the .npz files.
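The relation between the frame-wise and the region-based representation can be sketched as follows. This is not the provided conversion script but an independent, minimal illustration for a single class column, assuming binary activity values and that frame n starts at time n / frame rate. The function name `activity_to_regions` is hypothetical.

```python
import numpy as np

FRAME_RATE = 22050 / 512  # approx. 43.0664 Hz

def activity_to_regions(column, frame_rate=FRAME_RATE):
    """Convert a binary frame-wise activity vector into (start, end) regions in seconds."""
    active = np.asarray(column) > 0
    # Pad with False so that regions touching the borders are detected as well
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)   # first active frame of each region
    ends = np.flatnonzero(changes == -1)    # first inactive frame after each region
    return [(float(s / frame_rate), float(e / frame_rate)) for s, e in zip(starts, ends)]

# Toy example with a frame rate of 1 Hz so that frames and seconds coincide
demo = np.array([0, 1, 1, 0, 0, 1, 1, 1])
regions = activity_to_regions(demo, frame_rate=1.0)
print(regions)
```

Applying the function to each of the 18 columns in turn yields one list of regions per class.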
The .csv files in 02_Annotations/ann_audio_sync map the different versions of each piece to a common time axis. Each file contains two comma-separated columns that pair time positions in a recording with the corresponding positions on the common musical time axis.
For the recordings in the WagnerRing subset, the common musical time axis is given in measures. For the remaining recordings, the common musical time axis does not have a unit and simply progresses from 0 (beginning of the piece) to 1 (end of the piece).
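Two warping paths can be chained to transfer a time position from one version to another via the common axis. The sketch below uses linear interpolation and makes assumptions that should be checked against the actual files: the first column is taken to hold the time in seconds and the second column the position on the common axis, and the files are assumed to have no header row. Synthetic paths are used in place of real files, and the function names are hypothetical.

```python
import io
import numpy as np

def load_warping_path(csv_file):
    """Read a two-column, comma-separated file (assumed: no header, seconds first)."""
    return np.loadtxt(csv_file, delimiter=",", ndmin=2)

def transfer_time(time_sec, path_src, path_dst):
    """Map a time in the source version to the corresponding time in the target version."""
    # Step 1: source time -> position on the common musical time axis
    common = np.interp(time_sec, path_src[:, 0], path_src[:, 1])
    # Step 2: common position -> time in the target version
    return float(np.interp(common, path_dst[:, 1], path_dst[:, 0]))

# Synthetic paths: version A lasts 100 s, version B lasts 120 s,
# and the common axis runs from 0 to 1
path_a = load_warping_path(io.StringIO("0.0,0.0\n50.0,0.5\n100.0,1.0\n"))
path_b = load_warping_path(io.StringIO("0.0,0.0\n60.0,0.5\n120.0,1.0\n"))
time_in_b = transfer_time(25.0, path_a, path_b)
print(time_in_b)
```

Note that `np.interp` expects increasing sample points, so a path with repeated values in either column may need deduplication first.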
The note annotations in 02_Annotations/ann_audio_note_npz are provided as .npz files (i.e., NumPy arrays) to be loaded with Python. The arrays have the shape (N, 128), with the first dimension corresponding to frames of the recording (see above for the frame rate) and the second dimension corresponding to different MIDI pitches.
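A minimal sketch of how such a note array might be queried is given below. It assumes that entries greater than zero mark active pitches and that frame n corresponds to time n / frame rate. The conversion to frequency uses the standard equal-tempered relation with A4 (MIDI pitch 69) at 440 Hz. The array is synthetic and the function names are hypothetical.

```python
import numpy as np

FRAME_RATE = 22050 / 512  # approx. 43.0664 Hz

def active_pitches(notes, time_sec, frame_rate=FRAME_RATE):
    """Return the MIDI pitches that are active at a given time in seconds."""
    frame = min(int(round(time_sec * frame_rate)), notes.shape[0] - 1)
    return np.flatnonzero(notes[frame] > 0).tolist()

def midi_to_hz(pitch):
    """Convert a MIDI pitch number to its fundamental frequency (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)

# Synthetic stand-in: 50 frames, pitches 57 (A3) and 69 (A4) sounding in frames 10 to 19
demo = np.zeros((50, 128), dtype=np.int8)
demo[10:20, [57, 69]] = 1

pitches = active_pitches(demo, 15 / FRAME_RATE)
print(pitches, [midi_to_hz(p) for p in pitches])
```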
This dataset consists of a combination of audio from different previously released datasets, as well as audio we collected ourselves. In all cases, annotations are provided by us. More details on the individual subsets of the dataset follow.
The audio files are from the WagnerRing dataset described in
[WRD] Christof Weiß, Vlora Arifi-Müller, Michael Krause, Frank Zalkow, Stephanie Klauk, Rainer Kleinertz, and Meinard Müller. Wagner Ring Dataset: A complex opera scenario for music processing and computational musicology. TISMIR, 2023
Concretely, we use the following versions of the first act of Die Walküre (also referred to as B1 in the dataset):
Subset | Composer | Work | Version ID (in our dataset) | In our test set? | Performer / Label | Identifier in [WRD] |
---|---|---|---|---|---|---|
WagnerRing | Wagner | WalkuereAct1 | 1 | x | Karajan / DG 1998 | Karajan1966 |
WagnerRing | Wagner | WalkuereAct1 | 2 | | Neuhold / MEMBRAN 1995 | Neuhold1993 |
WagnerRing | Wagner | WalkuereAct1 | 3 | | Levine / DG 2012 | Levine1987 |
WagnerRing | Wagner | WalkuereAct1 | 4 | | Böhm / DECCA 2008 | Bohm1967 |
WagnerRing | Wagner | WalkuereAct1 | 5 | | Keilberth/Furtwängler / ZYX 2012 | KeilberthFurtw1952 |
WagnerRing | Wagner | WalkuereAct1 | 6 | | Boulez / PHILIPS 2006 | Boulez1980 |
WagnerRing | Wagner | WalkuereAct1 | 7 | x | Barenboim / Warner Classics 2009 | Barenboim1991 |
WagnerRing | Wagner | WalkuereAct1 | 8 | x | Haitink / EMI Classics 2008 | Haitink1988 |
Information on the individual audio files is provided in the following table. Some entries link to the website https://cc0.oer-musik.de/ (which contains recordings that are in the public domain in Germany).
Subset | Composer | Work | Version ID | In our test set? | Performer / Label | cc0 Link |
---|---|---|---|---|---|---|
Ours | Beethoven | Symphony3Mvmt1 | 1 | x | Abbado / DG | |
Ours | Beethoven | Symphony3Mvmt2 | 1 | | Blomstedt / BC | |
Ours | Beethoven | Symphony3Mvmt2 | 2 | | Drahos / NAXOS | |
Ours | Beethoven | Symphony3Mvmt2 | 3 | | Jarvi / SONY | |
Ours | Beethoven | Symphony3Mvmt2 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/ |
Ours | Beethoven | Symphony3Mvmt2 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/ |
Ours | Beethoven | Symphony3Mvmt3 | 1 | | Blomstedt / BC | |
Ours | Beethoven | Symphony3Mvmt3 | 2 | | Drahos / NAXOS | |
Ours | Beethoven | Symphony3Mvmt3 | 3 | | Jarvi / SONY | |
Ours | Beethoven | Symphony3Mvmt3 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/ |
Ours | Beethoven | Symphony3Mvmt3 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/ |
Ours | Beethoven | Symphony3Mvmt4 | 1 | | Blomstedt / BC | |
Ours | Beethoven | Symphony3Mvmt4 | 2 | | Drahos / NAXOS | |
Ours | Beethoven | Symphony3Mvmt4 | 3 | | Jarvi / SONY | |
Ours | Beethoven | Symphony3Mvmt4 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/ |
Ours | Beethoven | Symphony3Mvmt4 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/ |
Ours | Dvorak | Symphony9Mvmt1 | 1 | | Kubelik / MERCURY | |
Ours | Dvorak | Symphony9Mvmt1 | 2 | | Szell / SONY | |
Ours | Dvorak | Symphony9Mvmt1 | 3 | | Karajan / MEMBRAN | https://cc0.oer-musik.de/600001041-9/ |
Ours | Dvorak | Symphony9Mvmt1 | 4 | | Toscanini / DG | https://cc0.oer-musik.de/at114/ |
Ours | Dvorak | Symphony9Mvmt1 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/lpm18142/ |
Ours | Dvorak | Symphony9Mvmt2 | 1 | | Kubelik / MERCURY | |
Ours | Dvorak | Symphony9Mvmt2 | 2 | | Szell / SONY | |
Ours | Dvorak | Symphony9Mvmt2 | 3 | | Leaper / SONY | |
Ours | Dvorak | Symphony9Mvmt2 | 4 | | Toscanini / DG | https://cc0.oer-musik.de/at114/ |
Ours | Dvorak | Symphony9Mvmt2 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/lpm18142/ |
Ours | Dvorak | Symphony9Mvmt4 | 1 | x | Suitner / BC | |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 1 | | Ashkenazy / DECCA | |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 2 | | Francescatti / SONY | |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 3 | | Menuhin / MEMBRAN | |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 4 | | Nishizaki / NAXOS | |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 5 | | Szering / RCA-CCV | https://cc0.oer-musik.de/ccv5015-tschaikowski/ |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 1 | | Ashkenazy / DECCA | |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 2 | | Francescatti / SONY | |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 3 | | Menuhin / MEMBRAN | |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 4 | | Heifetz / SONY | |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 5 | | Kogan / BC | |
Ours | Tschaikowsky | ViolinConcertoMvmt3 | 1 | x | Mullova / PHILIPS | |
This work was supported by the German Research Foundation (DFG MU 2686/7-2, MU 2686/11-2). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institut für Integrierte Schaltungen IIS. The authors gratefully acknowledge the compute resources and support provided by the Erlangen Regional Computing Center (RRZE).
@inproceedings{McCallum19_UnsupervisedStructureLearning_ICASSP, author = {Matthew C. McCallum}, title = {Unsupervised Learning of Deep Features for Music Segmentation}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, pages = {346--350}, address = {Brighton, {UK}}, year = {2019}, doi = {10.1109/ICASSP.2019.8683407}, }
@inproceedings{SpijkervetB21_ContrastiveLearningMusical_ISMIR, author = {Janne Spijkervet and John Ashley Burgoyne}, title = {Contrastive Learning of Musical Representations}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Online}, pages = {673--681}, year = {2021}, OPTurl = {https://archives.ismir.net/ismir2021/paper/000084.pdf} }
@article{ZalkowMueller20_Shingles_AppliedSciences, author = {Frank Zalkow and Meinard M{\"u}ller}, title = {Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music}, journal = {Applied Sciences}, volume = {10}, number = {1}, year = {2020}, doi = {10.3390/app10010019}, }
@article{MuellerOKPD21_SyncToolbox_JOSS, author = {Meinard M{\"u}ller and Yigitcan {\"O}zer and Michael Krause and Thomas Pr{\"a}tzlich and Jonathan Driedger}, title = {{S}ync {T}oolbox: {A} {P}ython Package for Efficient, Robust, and Accurate Music Synchronization}, journal = {Journal of Open Source Software ({JOSS})}, volume = {6}, number = {64}, year = {2021}, pages = {3434:1--4}, doi = {10.21105/joss.03434} }
@inproceedings{Foote00_segmentationNovelty_ICME, author = {Jonathan Foote}, title = {Automatic audio segmentation using a measure of audio novelty}, pages = {452--455}, booktitle = {Proceedings of the {IEEE} International Conference on Multimedia and Expo (ICME)}, year = {2000}, address = {New York, NY, USA}, }