AES International Conference on Semantic Audio

22.06.2017 - 24.06.2017
Erlangen, Germany

Semantic Audio is concerned with the extraction of meaning from audio signals and with the development of applications that use this information to support the user in identifying, organizing, and exploring audio signals, and interacting with them. These applications include music information retrieval, semantic web technologies, audio production, sound reproduction, education, and gaming. Semantic technology involves some kind of understanding of the meaning of the information it deals with and to this end may incorporate machine learning, digital signal processing, speech processing, source separation, perceptual models of hearing, musicological knowledge, metadata, and ontologies. This conference will be the third AES conference on the topic and will provide the opportunity to present and discuss the latest advancements in the field.

For more information, see the AES website.


The Audio Engineering Society’s (AES) International Conference on Semantic Audio took place on June 22-24, 2017, at Fraunhofer IIS in Erlangen, Germany. The conference combined technical discussions with demos, poster sessions, and cultural and social activities each evening. Before the conference, attendees were offered an in-depth introduction in the form of three tutorials on music performance analysis, sonic interactions for virtual reality applications, and phase reconstruction from magnitude spectrograms. Although the tutorials were optional, almost all delegates attended them.

In the first tutorial, "Music Performance Analysis", Alexander Lerch and Stefan Weinzierl explained how semantic analysis can support learning a musical instrument by providing feedback on the accuracy of a performance with respect to rhythm, pitch and tone.

In the second tutorial, entitled "Sonic Interactions for Virtual Reality Applications", Stefania Serafin discussed ways to enhance VR applications with sonic and haptic feedback, with use cases in gaming, virtual musical instruments and rehabilitation.

In the third tutorial, Christian Dittmar presented a comprehensive overview of different methods for phase reconstruction from magnitude spectrograms. Such algorithms are used to improve the sound quality of source separation results.

The conference officially started on June 22 with a welcome speech from Bernhard Grill, Director of Fraunhofer IIS. Mark Plumbley, Professor of Signal Processing at the University of Surrey, gave the keynote speech on audio event detection and scene recognition. He talked about computational methods for analyzing recordings of everyday sounds, with the aim of classifying both the sound events, such as a door slamming or gunshots, and the environment, such as a train station or an office space.

Fraunhofer IIS’s Audio and Media Technologies division presented demos of EVS, MPEG-H and Cingo. The delegates were keen to see the latest audio technologies in use and engaged in discussions about the implementation of MPEG-H in Korea.

Awards were presented later that evening. Rodrigo Schramm and Emmanouil Benetos received the Best Paper Award for their paper "Automatic Transcription of A Cappella Recordings from Multiple Singers". The Best Student Paper Award went to Rachel M. Bittner, Justin Salamon, Juan J. Bosch and Juan P. Bello for their paper "Pitch Contours as a Mid-Level Representation for Music Informatics". The evening continued with a concert by LINda Capo, who performed a pleasant fusion of jazz and pop.

The conference’s second day started with the "Pitch Tracking" oral session, which included the keynote "Pitch-based Audio Algorithms" by Udo Zölzer. The TU Hamburg professor, who is well versed in digital audio processing, discussed various pitch-related topics, including how to estimate the fundamental frequency of an audio signal and how to use this information for creative audio effects, such as automatic harmonization of singing and novel methods for waveform synthesis.

Attendees also dined at a Franconian beer garden, where they toured the maze of historic cellars that kept beer cool in the summer before the advent of modern refrigeration.

The conference finished on June 24 with an oral session on Deep Learning, followed by an invited talk from Masataka Goto, a well-known researcher who has made major contributions to the field of Semantic Audio. Goto introduced Hatsune Miku, a digital avatar based on singing synthesis that has inspired many people in Japan to create and share multimedia content. He also explained how audio analysis technology can be applied to browse and interact with this web-native content.