In the ISAD project, we explored fundamental techniques and computational tools for detecting sound sources or characteristic sound events present in a given music recording. The project was funded by the German Research Foundation (DFG). On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.
Informed Sound Activity Detection in Music Recordings
In music information retrieval, the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. One prominent task is singing voice detection, where the objective is to automatically locate all sections of a given music recording in which a main singer is active. Although this task seems simple for human listeners, detecting the singing voice with computational methods remains difficult due to the complex superpositions of sound sources that typically occur in music, where the singing voice interacts with accompanying instruments. Extending this scenario, the goal of automatic instrument recognition is to identify all performing instruments in a given music recording and to derive a segmentation into sections of homogeneous instrumentation. Related problems include finding all monophonic sections, identifying solo parts or sections with a predominant melody, and locating sections with a specific timbre.

Motivated by these segmentation problems, we adopted a comprehensive perspective in this project. Our goal was to explore fundamental techniques and computational tools for detecting sound sources or characteristic sound events present in a given music recording. To cope with a wide range of musical properties and complex superpositions of different sound sources, we focused on informed approaches that exploit various types of additional knowledge. Such knowledge may be given in the form of musical parameters (e.g., number of instruments, score information), sound examples (e.g., instrument samples, representative sections), or user input (e.g., annotations, interactive feedback). By combining audio segmentation, detection, and classification techniques, we developed novel approaches that can efficiently adapt to the requirements of specific application scenarios. To test and evaluate our activity detection algorithms, we considered various challenging musical scenarios, including Western classical music, jazz, and opera recordings.
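To make the frame-wise nature of such activity detection tasks concrete, the following minimal sketch illustrates a generic baseline for singing voice detection: frame-level features are extracted from a recording, classified as vocal or non-vocal, and smoothed into contiguous segments. This is an illustration only, not the method developed in the project; the feature choice, classifier, file names, and label format are all assumptions.

import numpy as np
import librosa
from scipy.ndimage import median_filter
from sklearn.ensemble import RandomForestClassifier

def frame_features(path, sr=22050, hop=512):
    # Frame-wise MFCC features; any frame-level feature could be used here.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=hop)
    return mfcc.T  # shape: (num_frames, 20)

# Hypothetical training data: a recording with one 0/1 vocal-activity
# label per feature frame (file names and label format are assumptions).
X_train = frame_features("train_recording.wav")
y_train = np.load("vocal_activity_labels.npy")

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Frame-wise prediction on an unseen recording, median-smoothed so that
# isolated misclassified frames do not fragment the detected segments.
X_test = frame_features("test_recording.wav")
activity = median_filter(clf.predict(X_test), size=21)

In the informed setting studied in the project, the training material in such a pipeline would come from the available side information, for example instrument samples, score information, or user-annotated excerpts of the recording at hand.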
The following list provides an overview of the most important publicly accessible resources created in the ISAD project:
The following publications reflect the main scientific contributions of the work carried out in the ISAD project:
@inproceedings{AbesserM19_ContourFeature_ICASSP, author = {Jakob Abe{\ss}er and Meinard M{\"u}ller}, title = {Fundamental Frequency Contour Classification: {A} Comparison between Hand-Crafted and {CNN}-Based Features}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, pages = {486--490}, address = {Brighton, {UK}}, year = {2019}, doi = {10.1109/ICASSP.2019.8682252} }
@article{KrauseMW21_OperaSingingActivity_Electronics, author = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}}, title = {Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization}, journal = {Electronics}, volume = {10}, number = {10}, pages = {1214:1--14}, year = {2021}, doi = {10.3390/electronics10101214}, url-pdf = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf} }
@article{MimilakisDCS19_DenoisingAutoencoders_TASLP, author = {Stylianos Ioannis Mimilakis and Konstantinos Drossos and Estefan{\'{\i}}a Cano and Gerald Schuller}, title = {Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation}, journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing}, volume = {28}, pages = {266--278}, year = {2020}, doi = {10.1109/TASLP.2019.2952013} }
@inproceedings{MimilakisWAAM19_SingingVDetWagner_MML, author = {Stylianos I. Mimilakis and Christof Wei{\ss} and Vlora Arifi-M{\"u}ller and Jakob Abe{\ss}er and Meinard M{\"u}ller}, title = {Cross-Version Singing Voice Detection in Opera Recordings: {C}hallenges for Supervised Learning}, booktitle = {Machine Learning and Knowledge Discovery in Databases -- Proceedings of the International Workshops of {ECML} {PKDD} 2019, Part {II}}, series = {Communications in Computer and Information Science}, volume = {1168}, pages = {429--436}, address = {W{\"u}rzburg, Germany}, year = {2019}, doi = {10.1007/978-3-030-43887-6_35} }
@inproceedings{TaenzerAMWLM19_InstrumentCNN_ISMIR, author = {Michael Taenzer and Jakob Abe{\ss}er and Stylianos I. Mimilakis and Christof Wei{\ss} and Hanna Lukashevich and Meinard M{\"u}ller}, title = {Investigating {CNN}-based Instrument Family Recognition for {W}estern Classical Music Recordings}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Delft, The Netherlands}, pages = {612--619}, year = {2019}, doi = {10.5281/zenodo.3527884}, url-pdf = {2019_TaenzerAMWML_InstrumentCNN_ISMIR_PrintedVersion.pdf} }
@article{TaenzerMA21_LocalPolyphonyEstimation_Electronics, title = {Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks}, author = {Michael Taenzer and Stylianos I. Mimilakis and Jakob Abe{\ss}er}, journal = {Electronics}, volume = {10}, number = {7}, year = {2021}, doi = {10.3390/electronics10070851} }
@inproceedings{WeissBAM18_JazzComplexity_ISMIR, author = {Christof Wei{\ss} and Stefan Balke and Jakob Abe{\ss}er and Meinard M{\"u}ller}, title = {Computational Corpus Analysis: {A} Case Study on Jazz Solos}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, pages = {416--423}, address = {Paris, France}, year = {2018}, doi = {10.5281/zenodo.1492439}, url-pdf = {2018_WeissBAM_JazzComplexity_ISMIR_PrintedVersion.pdf} }
@article{ZalkowBAM20_MTD_TISMIR, title = {{MTD}: {A} Multimodal Dataset of Musical Themes for {MIR} Research}, author = {Frank Zalkow and Stefan Balke and Vlora Arifi-M{\"{u}}ller and Meinard M{\"{u}}ller}, journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})}, volume = {3}, number = {1}, year = {2020}, pages = {180--192}, doi = {10.5334/tismir.68}, url-details = {https://transactions.ismir.net/articles/10.5334/tismir.68/}, url-pdf = {2020_ZalkowBAM_MTD_TISMIR_ePrint.pdf}, url-demo = {https://www.audiolabs-erlangen.de/resources/MIR/MTD} }
@inproceedings{ZalkowBM19_SalienceRetrieval_ICASSP, author = {Frank Zalkow and Stefan Balke and Meinard M{\"u}ller}, title = {Evaluating Salience Representations for Cross-Modal Retrieval of {W}estern Classical Music Recordings}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, address = {Brighton, {UK}}, year = {2019}, pages = {331--335}, doi = {10.1109/ICASSP.2019.8683609} }
@inproceedings{ZalkowM20_CTC_ISMIR, author = {Frank Zalkow and Meinard M{\"u}ller}, title = {Using Weakly Aligned Score--Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Montr{\'{e}}al, Canada}, pages = {184--191}, year = {2020}, doi = {10.5281/zenodo.4245400}, url-pdf = {2020_ZalkowM_WeaklyAlignedTrain_ISMIR.pdf} }
@article{ZalkowMueller21_ChromaCTC_TASLP, author = {Frank Zalkow and Meinard M{\"u}ller}, title = {{CTC}-Based Learning of Chroma Features for Score-Audio Music Retrieval}, journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing}, volume = {29}, pages = {2957--2971}, year = {2021}, doi = {10.1109/TASLP.2021.3110137}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2021_TASLP-ctc-chroma}, url-pdf = {https://ieeexplore.ieee.org/document/9531521} }
@phdthesis{Zalkow21_CrossVersionRetrieval_PhD, author = {Frank Zalkow}, year = {2021}, title = {Learning Audio Representations for Cross-Version Retrieval of Western Classical Music}, school = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg}, url-details = {https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/16777}, url-pdf = {2021_Zalkow_AudioRepRetrieval_ThesisPhD.pdf} }
@phdthesis{Balke18_MultimediaProcMusic_PhD, author = {Stefan Balke}, year = {2018}, title = {Multimedia Processing Techniques for Retrieving, Extracting, and Accessing Musical Content}, school = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg}, url-details = {https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/9635}, url-pdf = {2018_Balke_MultimediaProcessing_ThesisPhD.pdf} }