AudioLabs - A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

This is the accompanying website to the article [1].

Simon Schwär, Michael Krause, Michael Fast, Sebastian Rosenzweig, Frank Scherbaum, and Meinard Müller
A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction
Transaction of the International Society for Music Information Retrieval (TISMIR), 7(1): 30–43, 2024. PDF Details DOI

@article{SchwaerKFRSM24_LarynxMicSVR_TISMIR,
author  = {Simon Schw{\"a}r and Michael Krause and Michael Fast and Sebastian Rosenzweig and Frank Scherbaum and Meinard M{\"u}ller},
title   = {A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction},
journal = {Transaction of the International Society for Music Information Retrieval ({TISMIR})},
year    = {2024},
volume  = {7},
number  = {1},
pages   = {30--43},
doi     = {10.5334/tismir.166},
url-pdf = {2024_SchwaerKFRSM_LarynxMicSVR_TISMIR.pdf},
url-details = {https://www.audiolabs-erlangen.de/resources/MIR/LM-SVR/}
}

Abstract

Larynx microphones (LMs) provide a practical way to obtain crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with lots of interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. In this paper, we introduce the task of reconstructing a natural sounding, high-quality singing voice recording from an LM signal. With an explicit focus on the singing voice, the problem lies at the intersection of speech enhancement and singing voice synthesis with the additional requirement of faithful reproduction of expressive parameters like dynamics and intonation. In this context, we make three main contributions. First, we publish a dataset with over 3.5 hours of popular music we recorded with four amateur singers accompanied by a guitar, where both LM and clean close-up microphone signals are available. Second, we propose a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects. Third, we evaluate the baseline with a listening test and further show that it can improve the accuracy of lyrics transcription as an exemplary downstream tasks.

Dataset: LM-SSD

The full dataset with all 348 uncompressed audio files can be downloaded here (ca. 6 GB).
To preview the provided signals for individual songs, please click on "View Song" in the overview table below.

ID	Song Title	Original Artist	Singer	Duration (mm:ss)	Link
AA	All Alone	Michael Fast	1M	27:03	View Song
TS	The Scientist	Coldplay	1M	21:37	View Song
YF	Your Fires	All The Luck In The World	1M	24:21	View Song
DL	Dezemberluft	Heisskalt	2M	14:47	View Song
BB	Books From Boxes	Maxïmo Park	2M	17:39	View Song
NB	Narben	Alligatoah	2M	11:47	View Song
SG	Supergirl	Reamonn	3F, 1M	26:34	View Song
OC	One Call Away	Charlie Puth	3F, 1M	19:32	View Song
PL	Past Life	Trevor Daniel & Selena Gomez	3F, 1M	17:45	View Song
CC	Chasing Cars	Snow Patrol	4F	28:10	View Song
BT	Breakfast At Tiffany's	Deep Blue Something	4F	22:16	View Song
LL	Little Lion Man	Mumford & Sons	4F	19:06	View Song

Baseline LM-SVR System: Reconstruction Results

The following example excerpts are presented as used in the listening test. The used train/test splits for each take can be downloaded here.

Example I (1M)

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Reference (CM)
- Solo
Larynx Microphone (LM)
- Solo
Naive Approach (NA)
- Solo
Overfitting (OF)
- Solo
Different Take (DT)
- Solo
Different Song (DS)
- Solo

Example II (2M)

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Reference (CM)
- Solo
Larynx Microphone (LM)
- Solo
Naive Approach (NA)
- Solo
Overfitting (OF)
- Solo
Different Take (DT)
- Solo
Different Song (DS)
- Solo

Example III (3F)

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Reference (CM)
- Solo
Larynx Microphone (LM)
- Solo
Naive Approach (NA)
- Solo
Overfitting (OF)
- Solo
Different Take (DT)
- Solo
Different Song (DS)
- Solo

Example IV (4F)

Activate

Play
Stop
Repeat
--:--:--:--- / --:--:--:---

Reference (CM)
- Solo
Larynx Microphone (LM)
- Solo
Naive Approach (NA)
- Solo
Overfitting (OF)
- Solo
Different Take (DT)
- Solo
Different Song (DS)
- Solo

Acknowledgments

We would like to thank the musicians for their contribution to the dataset and Daniel Vollmer for his help with the LMs. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS. This project is supported by the German Research Foundation (DFG MU 2686/13-2).

License

The recordings in LM-SSD are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Note that some songs in the dataset are cover versions. In these cases, it may be required to obtain a separate license for the composition, depending on the intended use case.