This is the accompanying website for the following paper:
@article{KrauseM23_HierarchicalInstruments_TASLP, author = {Michael Krause and Meinard M{\"u}ller}, title = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings}, journal = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing ({TASLP})}, year = {2023}, volume = {31}, pages = {2567--2578}, doi = {10.1109/TASLP.2023.3291506}, url-pdf = {https://ieeexplore.ieee.org/document/10171391}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass} }
Instrument activity detection is a fundamental task in music information retrieval, serving as a basis for many applications, such as music recommendation, music tagging, or remixing. Most published works on this task deal with popular music and music for smaller ensembles. In this paper, we cover orchestral and opera music recordings as a rarely considered scenario for automated instrument activity detection. Orchestral music is particularly challenging since it consists of intricate polyphonic and polytimbral sound mixtures where multiple instruments are playing simultaneously. Orchestral instruments can naturally be arranged in hierarchical taxonomies, according to instrument families. As the main contribution of this paper, we show that a hierarchical classification approach can be used to detect instrument activity in our scenario, even if only few fine-grained, instrument-level annotations are available. We further consider additional loss terms for improving hierarchical consistency of predictions. For our experiments, we collect a dataset containing 14 hours of orchestral music recordings with aligned instrument activity annotations. Finally, we analyze the behavior of our proposed approach with regard to potential confounding errors.
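To make "hierarchical consistency" concrete, the sketch below computes a generic hinge penalty that is positive whenever a child class (an instrument) is predicted as more likely than its parent (the instrument family). This is an illustrative example only, not necessarily the loss term used in the paper. The parent/child pairs are an assumption inferred from the class identifiers and should be checked against the paper. NumPy is used for readability; in training, the same expression would be written in a deep learning framework.

```python
import numpy as np

# Column order of the 18 classes, as listed in the class table on this page.
CLASSES = ["INST", "WW", "BR", "TMP", "VOC", "ST",
           "Fl", "Ob", "Cl", "Bn", "Hn", "Tpt",
           "Fe", "Ma", "Vn", "Va", "Vc", "Db"]
IDX = {name: i for i, name in enumerate(CLASSES)}

# ASSUMPTION: family membership inferred from the identifiers, not taken from the paper.
PARENT_CHILD = ([("WW", c) for c in ("Fl", "Ob", "Cl", "Bn")]
                + [("BR", c) for c in ("Hn", "Tpt")]
                + [("VOC", c) for c in ("Fe", "Ma")]
                + [("ST", c) for c in ("Vn", "Va", "Vc", "Db")])


def consistency_penalty(probs):
    """Mean of max(0, p_child - p_parent) over all frames and parent/child pairs.

    probs: array of shape (N, 18) with per-class activity probabilities in [0, 1].
    The penalty is zero exactly when no child is ever more likely than its parent.
    """
    parents = probs[:, [IDX[p] for p, _ in PARENT_CHILD]]
    children = probs[:, [IDX[c] for _, c in PARENT_CHILD]]
    return float(np.mean(np.maximum(0.0, children - parents)))
```

A hinge on the difference only penalizes violations (child above parent) and leaves consistent predictions untouched.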
The aligned instrument activity annotations for the recordings used in this paper are made publicly available as a dataset for further research.
The annotations are provided as .npy files (i.e., NumPy arrays) to be loaded with Python. The arrays have the shape
(N, 18)
with the first dimension corresponding to frames of the recording and the second dimension corresponding to the different classes. We use a frame rate of 43.0664 Hz (obtained with a hop size of 512 samples at a sample rate of 22050 Hz). The individual entries in the second dimension correspond to the following classes:
Index | Class identifier |
---|---|
00 | INST |
01 | WW |
02 | BR |
03 | TMP |
04 | VOC |
05 | ST |
06 | Fl |
07 | Ob |
08 | Cl |
09 | Bn |
10 | Hn |
11 | Tpt |
12 | Fe |
13 | Ma |
14 | Vn |
15 | Va |
16 | Vc |
17 | Db |
(see the paper for explanations of the identifiers)
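The following is a minimal sketch for loading an annotation file and mapping between frame indices, seconds, and class columns. The constants come from the description above; the helper names are hypothetical, and the convention that frame n starts at sample n · 512 (with no centering offset) is an assumption.

```python
import numpy as np

SAMPLE_RATE = 22050                   # Hz, as stated above
HOP_SIZE = 512                        # samples between consecutive frames
FRAME_RATE = SAMPLE_RATE / HOP_SIZE   # 43.06640625 frames per second

# Class identifiers in column order, copied from the table above.
CLASSES = ["INST", "WW", "BR", "TMP", "VOC", "ST",
           "Fl", "Ob", "Cl", "Bn", "Hn", "Tpt",
           "Fe", "Ma", "Vn", "Va", "Vc", "Db"]
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def load_annotations(path):
    """Load one annotation array and check that it has shape (N, 18)."""
    annotations = np.load(path)
    if annotations.ndim != 2 or annotations.shape[1] != len(CLASSES):
        raise ValueError(f"expected shape (N, {len(CLASSES)}), got {annotations.shape}")
    return annotations


def frame_times(num_frames):
    """Start time in seconds of each frame (ASSUMES frame n starts at n * HOP_SIZE)."""
    return np.arange(num_frames) * HOP_SIZE / SAMPLE_RATE


def seconds_to_frame(seconds):
    """Index of the frame that contains the given time (same assumption as above)."""
    return int(np.floor(seconds * FRAME_RATE))


def class_activity(annotations, name):
    """Activity curve (one value per frame) of the class with the given identifier."""
    return annotations[:, CLASS_INDEX[name]]
```

For example, `class_activity(load_annotations("Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy"), "Vn")` would return the violin activity for that recording, provided the file is in the working directory.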
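The .npy files follow the naming convention
(Subset)_(Composer)_(Work)_V(Version).npy
where (Subset) denotes the source of the recording (Ours for the commercial recordings listed below, otherwise FreischuetzDigital, PhenicxAnechoic, or BeethovenAnechoic), (Composer) and (Work) identify the piece, and (Version) numbers the different recordings of the same work.
See also the paper for a description of the recordings in this dataset.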
So, for example, the file
Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy
contains annotations for the third version of the first movement of Tschaikowsky's Violin Concerto.
FreischuetzDigital, PhenicxAnechoic, and BeethovenAnechoic each contain only one version of each piece. For FreischuetzDigital, we use the stereo mixtures provided with the dataset. For PhenicxAnechoic and BeethovenAnechoic, we obtain stereo mixes by summing the individual instrument tracks, applying a simple reverb filter, and normalizing the result.
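The mixing procedure can be sketched as follows. The specific reverb filter is not described here, so this example substitutes a hypothetical one: convolution with exponentially decaying white noise, blended with the dry signal. It further assumes that every instrument track is already stereo, that the reverb tail is cut to the original length, and that normalization means scaling to a peak of 0.99. None of these parameters are taken from the paper.

```python
import numpy as np


def synthetic_impulse_response(sample_rate, decay_seconds=0.5, seed=0):
    """HYPOTHETICAL reverb: white noise decaying by about 60 dB over decay_seconds."""
    rng = np.random.default_rng(seed)
    length = int(decay_seconds * sample_rate)
    t = np.arange(length) / sample_rate
    ir = rng.standard_normal(length) * np.exp(-6.9 * t / decay_seconds)
    ir[0] = 1.0                            # keep a direct-sound component
    return ir / np.sqrt(np.sum(ir ** 2))   # unit-energy impulse response


def fft_convolve(x, h):
    """Full linear convolution of two 1-D signals via zero-padded real FFTs."""
    n = len(x) + len(h) - 1
    size = 1 << (n - 1).bit_length()       # next power of two >= n
    y = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(h, size), size)
    return y[:n]


def make_stereo_mix(tracks, sample_rate, wet=0.3, peak=0.99):
    """Sum tracks, add reverb, and peak-normalize.

    tracks: array of shape (num_tracks, 2, num_samples), ASSUMED to be stereo already.
    """
    dry = np.sum(tracks, axis=0)           # (2, num_samples)
    ir = synthetic_impulse_response(sample_rate)
    reverb = np.stack([fft_convolve(ch, ir)[: dry.shape[1]] for ch in dry])
    mix = (1.0 - wet) * dry + wet * reverb
    max_abs = np.max(np.abs(mix))
    return mix if max_abs == 0 else mix * (peak / max_abs)
```

FFT-based convolution keeps the cost manageable for full-length recordings, where direct convolution with a reverb tail of several thousand taps would be slow.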
For the remaining works, we use the following commercial audio recordings:
Subset | Composer | Work | Version | In test set? | Performer / Label |
---|---|---|---|---|---|
Ours | Wagner | WalkuereAct1 | 1 | x | Karajan / DG 1998 |
Ours | Wagner | WalkuereAct1 | 2 | | Neuhold / MEMBRAN 1995 |
Ours | Wagner | WalkuereAct1 | 3 | | Levine / DG 2012 |
Ours | Wagner | WalkuereAct1 | 4 | | Böhm / DECCA 2008 |
Ours | Wagner | WalkuereAct1 | 5 | | Keilberth/Furtwängler / ZYX 2012 |
Ours | Wagner | WalkuereAct1 | 6 | | Boulez / PHILIPS 2006 |
Ours | Beethoven | Symphony3Mvmt1 | 1 | x | Abbado / DG |
Ours | Beethoven | Symphony3Mvmt2 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt2 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt2 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt2 | 4 | | Scherchen / Heliodor |
Ours | Beethoven | Symphony3Mvmt2 | 5 | | Fricsay / DG |
Ours | Beethoven | Symphony3Mvmt3 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt3 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt3 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt3 | 4 | | Scherchen / Heliodor |
Ours | Beethoven | Symphony3Mvmt3 | 5 | | Fricsay / DG |
Ours | Beethoven | Symphony3Mvmt4 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt4 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt4 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt4 | 4 | | Scherchen / Heliodor |
Ours | Beethoven | Symphony3Mvmt4 | 5 | | Fricsay / DG |
Ours | Dvorak | Symphony9Mvmt1 | 1 | | Kubelik / MERCURY |
Ours | Dvorak | Symphony9Mvmt1 | 2 | | Szell / SONY |
Ours | Dvorak | Symphony9Mvmt1 | 3 | | Karajan / MEMBRAN |
Ours | Dvorak | Symphony9Mvmt1 | 4 | | Toscanini / DG |
Ours | Dvorak | Symphony9Mvmt1 | 5 | | Fricsay / DG |
Ours | Dvorak | Symphony9Mvmt2 | 1 | | Kubelik / MERCURY |
Ours | Dvorak | Symphony9Mvmt2 | 2 | | Szell / SONY |
Ours | Dvorak | Symphony9Mvmt2 | 3 | | Leaper / SONY |
Ours | Dvorak | Symphony9Mvmt2 | 4 | | Toscanini / DG |
Ours | Dvorak | Symphony9Mvmt2 | 5 | | Fricsay / DG |
Ours | Dvorak | Symphony9Mvmt4 | 1 | x | Suitner / BC |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 1 | | Ashkenazy / DECCA |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 2 | | Francescatti / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 3 | | Menuhin / MEMBRAN |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 4 | | Nishizaki / NAXOS |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 5 | | Szering / RCA-CCV |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 1 | | Ashkenazy / DECCA |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 2 | | Francescatti / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 3 | | Menuhin / MEMBRAN |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 4 | | Heifetz / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 5 | | Kogan / BC |
Ours | Tschaikowsky | ViolinConcertoMvmt3 | 1 | x | Mullova / PHILIPS |
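The rows marked "x" can be used to split the annotation files of the Ours subset. The following sketch hard-codes those four rows; the function name is hypothetical. Files from the other subsets are kept in a separate group, since their split is not listed in this table (see the paper), and Ours files not marked "x" are labeled "not_test" rather than "train" because this table does not distinguish training from validation data.

```python
import os

# (Composer, Work, Version) of the rows marked "x" in the table above.
TEST_SET = {
    ("Wagner", "WalkuereAct1", 1),
    ("Beethoven", "Symphony3Mvmt1", 1),
    ("Dvorak", "Symphony9Mvmt4", 1),
    ("Tschaikowsky", "ViolinConcertoMvmt3", 1),
}


def split_by_test_set(filenames):
    """Group annotation files into 'test', 'not_test', and 'other_subsets'."""
    groups = {"test": [], "not_test": [], "other_subsets": []}
    for filename in filenames:
        parts = os.path.splitext(os.path.basename(filename))[0].split("_")
        if parts[0] != "Ours":
            groups["other_subsets"].append(filename)
            continue
        _, composer, work, version = parts        # Ours_(Composer)_(Work)_V(Version)
        key = (composer, work, int(version[1:]))  # strip the leading "V"
        groups["test" if key in TEST_SET else "not_test"].append(filename)
    return groups
```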
We thank Christof Weiß for helpful discussions. This work was supported by the German Research Foundation (DFG MU 2686/7-2, MU 2686/11-2). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institut für Integrierte Schaltungen IIS.
@inproceedings{KrauseM22_HierarchyClass_ICASSP, author = {Michael Krause and Meinard M{\"u}ller}, title = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, pages = {406--410}, address = {Singapore}, year = {2022}, doi = {10.1109/ICASSP43922.2022.9747690} }
@article{BoehmAW21_BeethovenAnechoic_AES, author={Böhm, Christoph and Ackermann, David and Weinzierl, Stefan}, journal = {Journal of the Audio Engineering Society}, title={A multi-channel anechoic orchestra recording of {B}eethoven's {S}ymphony no. 8 op. 93}, year={2021}, volume={68}, number={12}, pages={977--984}, doi={10.17743/jaes.2020.0056} }
@inproceedings{PraetzlichMBV15_FreiDi_ISMIR-LBD, author = {Thomas Pr{\"a}tzlich and Meinard M{\"u}ller and Benjamin W. Bohl and Joachim Veit}, title = {{F}reisch{\"u}tz {D}igital: {D}emos of audio-related contributions}, booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {M{\'a}laga, Spain}, year = {2015}, url-pdf = {2015_PraetzlichMBV_FreiDi_ISMIR-LBD.pdf}, url-details = {http://freischuetz-digital.de/} }
@article{MironCBGJ16_OrchestraSourceSeparation_JECE, author = {Marius Miron and Julio J. Carabias{-}Orti and Juan J. Bosch and Emilia G{\'{o}}mez and Jordi Janer}, title = {Score-Informed Source Separation for Multichannel Orchestral Recordings}, journal = {Journal of Electrical and Computer Engineering}, volume = {2016}, pages = {8363507:1--8363507:19}, year = {2016}, }
@inproceedings{GururaniL21_PartialLabels_ISM, author = {Siddharth Gururani and Alexander Lerch}, title = {Semi-Supervised Audio Classification with Partially Labeled Data}, booktitle = {{IEEE} International Symposium on Multimedia (ISM)}, address = {Naples, Italy}, pages = {111--114}, year = {2021}, doi = {10.1109/ISM52913.2021.00027} }