There has been a rapid growth of digitally available music data including audio recordings, images of scanned sheet music, album covers, and an increasing number of video clips. In this tutorial, we cover general signal processing and machine learning concepts designed to bridge the gap between these different music representations. In particular, we discuss traditional approaches based on musically motivated features, generalized audio fingerprinting, as well as recent data embedding techniques based on deep learning. These technologies form the building blocks for many exciting music navigation and browsing applications including the classical problem of automated score following.
Music not only connects people but also relates to many different research disciplines. Adopting an interdisciplinary perspective, our aim is to show that music is an attractive, rich, and challenging problem domain that has many things to offer to the signal processing community. Using cross-modal music retrieval tasks as examples, we demonstrate that these scenarios are well suited for discussing signal processing and machine learning techniques (comparing traditional feature extraction with deep learning approaches). Furthermore, we want to give some examples of fascinating music retrieval applications of academic, educational, and commercial relevance.
The main goal of this tutorial is to give an exciting and easy-to-understand introduction to music processing appealing to a wide audience in academia and industry. By providing many illustrative audio examples and by working with pictures (rather than with formulas), we will make an effort to convey the main ideas, in particular to non-experts and to researchers who are new to the field of music processing. By doing so, the tutorial should appeal to a wide and interdisciplinary audience working in different fields including signal processing, information retrieval, and machine learning.
It is our primary goal to give an exciting tutorial that ranges from basic to advanced techniques used in music processing. Our tutorial consists of three one-hour sessions, and we make sure that each session leaves enough room for questions and interaction with the audience. To have some variety throughout the tutorial, we will discuss theoretical as well as practical aspects in each of the three sessions. Recent techniques based on deep learning will play a role throughout the tutorial, even though they are mentioned explicitly only in the third session of the following overview.
The tutorial consists of a short introduction and three sessions. The following links provide the PDFs of the slides and the handouts.
Meinard Müller, Andreas Arzt, Stefan Balke, Matthias Dorfer, Gerhard Widmer
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
IEEE Signal Processing Magazine, 36(1), 2019, pp. 52-62.
@article{MuellerABDW19_MusicRetrieval_IEEE-SPM, author = {Meinard Müller and Andreas Arzt and Stefan Balke and Matthias Dorfer and Gerhard Widmer}, title = {Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies}, journal = {{IEEE} Signal Processing Magazine}, volume = {36}, number = {1}, pages = {52--62}, year = {2019}, url = {https://doi.org/10.1109/MSP.2018.2868887}, doi = {10.1109/MSP.2018.2868887}, url-pdf = {https://ieeexplore.ieee.org/document/8588416/} }
Meinard Müller
Fundamentals of Music Processing — Audio, Analysis, Algorithms, Applications
Springer Verlag, ISBN: 978-3-319-21944-8, 2015.
@book{Mueller15_FundamentalsMusicProcessig_SPRINGER, author = {Meinard M\"{u}ller}, title = {Fundamentals of Music Processing -- Audio, Analysis, Algorithms, Applications}, type = {Monograph}, year = {2015}, isbn = {978-3-319-21944-8}, publisher = {Springer Verlag}, url-details={http://www.music-processing.de} }
@inproceedings{ArztWS14_TempoTranspInvariantIdent_ISMIR, address = {Taipei, Taiwan}, author = {Andreas Arzt and Gerhard Widmer and Reinhard Sonnleitner}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {549--554}, title = {Tempo- and Transposition-invariant Identification of Piece and Score Position}, year = {2014} }
@inproceedings{arzt:ijcai:2015, address = {Buenos Aires, Argentina}, author = {Andreas Arzt and Harald Frostel and Thassilo Gadermaier and Martin Gasser and Maarten Grachten and Gerhard Widmer}, booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {2424--2430}, title = {Artificial Intelligence in the Concertgebouw}, year = {2015} }
@inproceedings{ArztBFFGLW14_MusicCompanion_ECAI, acmid = {3006927}, address = {Prague, Czech Republic}, author = {Andreas Arzt and Sebastian Böck and Sebastian Flossmann and Harald Frostel and Martin Gasser and Cynthia C. S. Liem and Gerhard Widmer}, booktitle = {Proceedings of the European Conference on Artificial Intelligence}, doi = {10.3233/978-1-61499-419-0-1221}, pages = {1221--1222}, title = {The Piano Music Companion}, year = {2014} }
@inproceedings{ArztBW12_SymbolicFingerprint_ISMIR, address = {Porto, Portugal}, author = {Andreas Arzt and Sebastian Böck and Gerhard Widmer}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {433--438}, title = {Fast Identification of Piece and Score Position via Symbolic Fingerprinting}, year = {2012} }
@inproceedings{arzt:ecai:2008, address = {Patras, Greece}, author = {Andreas Arzt and Gerhard Widmer and Simon Dixon}, booktitle = {Proceedings of the European Conference on Artificial Intelligence (ECAI)}, date-modified = {2016-11-14 13:21:41 +0000}, pages = {241--245}, title = {Automatic Page Turning for Musicians via Real-Time Machine Listening}, year = {2008} }
@inproceedings{BalkeALM16_BarlowRetrieval_ICASSP, address = {Shanghai, China}, author = {Stefan Balke and Vlora Arifi-Müller and Lukas Lamprecht and Meinard Müller}, booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, month = {March}, pages = {281--285}, title = {Retrieving Audio Recordings Using Musical Themes}, year = {2016} }
@inproceedings{BalkeDAM17_SoloVoiceEnhancement_ICASSP, author = {Stefan Balke and Christian Dittmar and Jakob Abeßer and Meinard Müller}, booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, location = {New Orleans, USA}, pages = {196--200}, title = {Data-Driven Solo Voice Enhancement for Jazz Music Retrieval}, url-demo = {https://www.audiolabs-erlangen.de/resources/MIR/2017-ICASSP-SoloVoiceEnhancement}, year = {2017} }
@inproceedings{BalkePM15_MatchingMusicalThemes_ICASSP, address = {Brisbane, Australia}, author = {Stefan Balke and Sanu Pulimootil Achankunju and Meinard Müller}, booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, pages = {703--707}, title = {Matching Musical Themes based on Noisy OCR and OMR Input}, year = {2015} }
@book{BarlowM75_MusicalThemes_BOOK, author = {Harold Barlow and Sam Morgenstern}, edition = {Revised edition Third Printing}, publisher = {Crown Publishers, Inc.}, title = {A Dictionary of Musical Themes}, year = {1975} }
@inproceedings{BoeckS12_TranscriptionRecurrentNetwork_ICASSP, address = {Kyoto, Japan}, author = {Sebastian Böck and Markus Schedl}, booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, month = {March}, pages = {121--124}, title = {Polyphonic piano note transcription with recurrent neural networks}, year = {2012} }
@inproceedings{BoschBSG16_MelodyExtraction_ISMIR, address = {New York City, USA}, author = {Juan J. Bosch and Rachel M. Bittner and Justin Salamon and Emilia Gómez}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {571--577}, title = {A Comparison of Melody Extraction Methods Based on Source-Filter Modelling}, year = {2016} }
@article{DBLP:journals/spm/BenetosDDE19, author = {Emmanouil Benetos and Simon Dixon and Zhiyao Duan and Sebastian Ewert}, bibsource = {dblp computer science bibliography, https://dblp.org}, biburl = {https://dblp.org/rec/bib/journals/spm/BenetosDDE19}, doi = {10.1109/MSP.2018.2869928}, journal = {IEEE Signal Processing Magazine}, number = {1}, pages = {20--30}, timestamp = {Fri, 18 Jan 2019 23:22:47 +0100}, title = {Automatic Music Transcription: An Overview}, url = {https://doi.org/10.1109/MSP.2018.2869928}, volume = {36}, year = {2019} }
@article{BenetosDGKK13_MusicTranscription_JIIS, author = {Emmanouil Benetos and Simon Dixon and Dimitrios Giannoulis and Holger Kirchhoff and Anssi Klapuri}, doi = {10.1007/s10844-013-0258-3}, journal = {Journal of Intelligent Information Systems}, number = {3}, pages = {407--434}, title = {Automatic music transcription: challenges and future directions}, url = {http://dx.doi.org/10.1007/s10844-013-0258-3}, volume = {41}, year = {2013} }
@article{Byrd2015_OMR_JNMR, author = {Donald Byrd and Jakob G. Simonsen}, doi = {10.1080/09298215.2015.1045424}, issn = {0929-8215}, journal = {Journal of New Music Research}, keywords = {optical music recognition, empirical evaluation, notation, notation complexity}, number = {3}, pages = {169--195}, publisher = {Routledge}, title = {Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images}, volume = {44}, year = {2015} }
@article{CanoBKH05_FingerprintingReview_VLSI, acmid = {1107829}, address = {Hingham, Massachusetts, USA}, author = {Pedro Cano and Eloi Batlle and Ton Kalker and Jaap Haitsma}, doi = {10.1007/s11265-005-4151-3}, issn = {0922-5773}, journal = {The Journal of VLSI Signal Processing}, month = {November}, number = {3}, numpages = {14}, pages = {271--284}, publisher = {Kluwer Academic Publishers}, title = {A review of audio fingerprinting}, url = {http://dx.doi.org/10.1007/s11265-005-4151-3}, volume = {41}, year = {2005} }
@article{CaseyRS08_MinimumDistances_IEEE-TASLP, author = {Michael A. Casey and Christophe Rhodes and Malcolm Slaney}, doi = {10.1109/TASL.2008.925883}, journal = {IEEE Transactions on Audio, Speech, and Language Processing}, number = {5}, pages = {1015--1028}, title = {Analysis of Minimum Distances in High-Dimensional Musical Spaces}, volume = {16}, year = {2008} }
@inproceedings{ChengMBD16_AttackDecayPianoTranscription_ISMIR, address = {New York City, USA}, author = {Tian Cheng and Matthias Mauch and Emmanouil Benetos and Simon Dixon}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {584--590}, title = {An attack/decay model for piano transcription}, year = {2016} }
@inproceedings{DorferAW17_ScoreIdentification_ISMIR, address = {Suzhou, China}, author = {Matthias Dorfer and Andreas Arzt and Gerhard Widmer}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {115--122}, title = {Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment}, year = {2017} }
@article{DorferSVKW18_CCALayer_IJMIR, author = {Matthias Dorfer and Jan Schlüter and Andreu Vall and Filip Korzeniowski and Gerhard Widmer}, day = {01}, doi = {10.1007/s13735-018-0151-5}, issn = {2192-662X}, journal = {International Journal of Multimedia Information Retrieval}, month = {Jun}, number = {2}, pages = {117--128}, title = {End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss}, url = {https://doi.org/10.1007/s13735-018-0151-5}, volume = {7}, year = {2018} }
@article{DorferHAFW18_MSMD_TISMIR, author = {Matthias Dorfer and Jan Hajič jr. and Andreas Arzt and Harald Frostel and Gerhard Widmer}, journal = {Transactions of the International Society for Music Information Retrieval}, number = {1}, publisher = {Ubiquity Press}, title = {Learning Audio--Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification}, volume = {1}, year = {2018} }
@article{DBLP:journals/spm/GotoD19, author = {Masataka Goto and Roger B. Dannenberg}, bibsource = {dblp computer science bibliography, https://dblp.org}, biburl = {https://dblp.org/rec/bib/journals/spm/GotoD19}, doi = {10.1109/MSP.2018.2874360}, journal = {IEEE Signal Processing Magazine}, number = {1}, pages = {74--81}, timestamp = {Fri, 18 Jan 2019 23:22:47 +0100}, title = {Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and Listen to Music}, url = {https://doi.org/10.1109/MSP.2018.2874360}, volume = {36}, year = {2019} }
@inproceedings{GroscheM12_RetrievalShingles_ICASSP, address = {Kyoto, Japan}, author = {Peter Grosche and Meinard Müller}, booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, pages = {473--476}, title = {Toward Characteristic Audio Shingles for Efficient Cross-Version Music Retrieval}, year = {2012} }
@inproceedings{hawthorne:ismir:2018, author = {Curtis Hawthorne and Erich Elsen and Jialin Song and Adam Roberts and Ian Simon and Colin Raffel and Jesse Engel and Sageev Oore and Douglas Eck}, booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27, 2018}, pages = {50--57}, title = {Onsets and Frames: Dual-Objective Piano Transcription}, year = {2018} }
@inproceedings{IzmirliS12_BridgingPrintedMusicAudio_ISMIR, address = {Porto, Portugal}, author = {Özgür İzmirli and Gyanendra Sharma}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, title = {Bridging Printed Music and Audio Through Alignment Using a Mid-level Score Representation}, year = {2012} }
@inproceedings{kelz:icassp:2019, address = {Brighton, United Kingdom}, author = {Rainer Kelz and Sebastian Böck and Gerhard Widmer}, booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages = {246--250}, title = {Deep Polyphonic ADSR Piano Note Transcription}, year = {2019} }
@inproceedings{KurthMFCC07_AutomatedSynchronization_ISMIR, address = {Vienna, Austria}, author = {Frank Kurth and Meinard Müller and Christian Fremerey and Yoonha Chang and Michael Clausen}, booktitle = {Proceedings of the International Conference on Music Information Retrieval (ISMIR)}, month = {September}, pages = {261--266}, title = {Automated Synchronization of Scanned Sheet Music with Audio Recordings}, url-pdf = {2007_KurthMuellerFremereyClausen_AutomatedSynchronization_ISMIR.pdf}, year = {2007} }
@book{Mueller07_InformationRetrieval_SPRINGER, author = {Meinard Müller}, isbn = {3540740473}, publisher = {Springer Verlag}, title = {Information Retrieval for Music and Motion}, type = {Monograph}, year = {2007} }
@book{Mueller15_FMP_SPRINGER, author = {Meinard Müller}, isbn = {978-3-319-21944-8}, publisher = {Springer Verlag}, title = {Fundamentals of Music Processing}, type = {Monograph}, year = {2015} }
@article{MuellerABDW19_MusicRetrieval_IEEE-SPM, author = {Meinard Müller and Andreas Arzt and Stefan Balke and Matthias Dorfer and Gerhard Widmer}, doi = {10.1109/MSP.2018.2868887}, journal = {IEEE Signal Processing Magazine}, number = {1}, pages = {52--62}, title = {Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies}, url = {https://doi.org/10.1109/MSP.2018.2868887}, url-pdf = {https://ieeexplore.ieee.org/document/8588416/}, volume = {36}, year = {2019} }
@inproceedings{MuellerKC05_ChromaFeatures_ISMIR, address = {London, UK}, author = {Meinard Müller and Frank Kurth and Michael Clausen}, booktitle = {Proceedings of the International Conference on Music Information Retrieval (ISMIR)}, pages = {288--295}, title = {Audio Matching via Chroma-Based Statistical Features}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/chromatoolbox}, url-pdf = {2005_MuellerKurthClausen_AudioMatching_ISMIR.pdf}, year = {2005} }
@article{SalamonSG13_Retrieval_IJMRI, author = {Justin Salamon and Joan Serrà and Emilia Gómez}, bibsource = {dblp computer science bibliography, http://dblp.org}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/ijmir/SalamonSG13}, doi = {10.1007/s13735-012-0026-0}, journal = {International Journal of Multimedia Information Retrieval}, number = {1}, pages = {45--58}, timestamp = {Fri, 15 Mar 2013 10:07:17 +0100}, title = {Tonal representations for music retrieval: from version identification to query-by-humming}, url = {http://dx.doi.org/10.1007/s13735-012-0026-0}, volume = {2}, year = {2013} }
@article{Schmidhuber15_DeepLearningOverview_NN, author = {Jürgen Schmidhuber}, doi = {10.1016/j.neunet.2014.09.003}, journal = {Neural Networks}, pages = {85--117}, title = {Deep learning in neural networks: An overview}, volume = {61}, year = {2015} }
@incollection{SerraGH10_coversong_BOOKCHAP, address = {Berlin, Germany}, author = {Joan Serrà and Emilia Gómez and Perfecto Herrera}, booktitle = {Advances in Music Information Retrieval}, chapter = {14}, editor = {Ras, Z. W. and Wieczorkowska, A. A.}, pages = {307--332}, publisher = {Springer}, series = {Studies in Computational Intelligence}, title = {Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation and Beyond}, volume = {274}, year = {2010} }
@article{SigtiaBD16_DNNPolyPianoTrans_TASLP, author = {Siddharth Sigtia and Emmanouil Benetos and Simon Dixon}, journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing}, number = {5}, pages = {927--939}, title = {An End-to-End Neural Network for Polyphonic Piano Music Transcription}, volume = {24}, year = {2016} }
@inproceedings{SixL14_PanakoAcousFP_ISMIR, address = {Taipei, Taiwan}, author = {Joren Six and Marc Leman}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {259--264}, title = {Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification}, year = {2014} }
@article{SonnleitnerW16_QuadFingerp_TASLP, author = {Reinhard Sonnleitner and Gerhard Widmer}, doi = {10.1109/TASLP.2015.2509248}, journal = {IEEE Transactions on Audio, Speech, and Language Processing}, number = {3}, pages = {409--421}, title = {Robust Quad-Based Audio Fingerprinting}, volume = {24}, year = {2016} }
@inproceedings{Wang03_Shazam_ISMIR, address = {Baltimore, Maryland, USA}, author = {Avery Wang}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)}, pages = {7--13}, title = {An Industrial Strength Audio Search Algorithm}, year = {2003} }
@inproceedings{ZalkowBM19_SalienceRep_ICASSP, address = {Brighton, UK}, author = {Frank Zalkow and Stefan Balke and Meinard Müller}, booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, month = {May}, title = {Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings}, year = {2019} }
The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS. The work by Meinard Müller and Stefan Balke was supported by the German Research Foundation (DFG MU 2686/11-1). Furthermore, the work of Andreas Arzt and Stefan Balke was supported by the European Research Council (ERC) under the European Union's Horizon 2020 Framework Programme (H2020, 2014-2020) / ERC Advanced Grant Agreement n.670035, project "Con Espressione".