Ferienakademie 2024, Sarntal (22.09. - 04.10.2024)

Course 8: Learning with Music Signals

Main Tutor/Lecturer: Simon Schwär, Hans-Ulrich Berendes, Prof. Dr. Meinard Müller

Group: Music and Singing Voice Synthesis (SYNTH)

A vocoder (voice encoder) is a system that analyzes and synthesizes the human voice by decomposing it into a spectral envelope and an excitation signal. Common vocoder models include source-filter models, which are inspired by human voice production (an excitation generated at the vocal folds is shaped by the vocal tract acting as a filter), and sinusoidal models, which combine time-varying sine waves. Sinusoidal modeling represents an audio signal as a sum of sinusoidal components and thereby captures the harmonic structure of pitched sounds. More recently, Differentiable Digital Signal Processing (DDSP) has been introduced as a framework that combines traditional DSP with deep learning: classic components such as oscillators and filters are implemented as differentiable operations, so that a neural network predicting their control parameters can be trained end to end with gradient descent. This combination can improve flexibility, accuracy, and expressivity in audio applications. In this group, we explore recent advances in DDSP, specifically for singing voice synthesis.
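
To make the sinusoidal idea concrete, here is a minimal sketch of additive harmonic synthesis, assuming a pure-Python setting without NumPy or an autodiff library. Each output sample is a sum of sinusoids at integer multiples of a time-varying fundamental frequency f0, weighted by per-harmonic amplitudes. The function name `harmonic_synth`, its parameters, and the 16 kHz default sampling rate are hypothetical choices for illustration. The structure is similar in spirit to the harmonic oscillator described by Engel et al. (2020), but it is not the API of any DDSP library. In DDSP, the same computation is written with differentiable tensor operations, so gradients of an audio loss can flow back to the network that predicts the synthesis parameters (for example, the harmonic amplitudes).

```python
import math


def harmonic_synth(f0, amplitudes, sample_rate=16000):
    """Hypothetical additive synthesizer: sum of harmonics of a time-varying f0.

    f0:          list of per-sample fundamental frequencies in Hz
    amplitudes:  list of per-sample lists; amplitudes[n][k] is the amplitude
                 of harmonic k + 1 at sample n
    sample_rate: sampling rate in Hz (16 kHz is an arbitrary default)
    Returns a list of output samples with the same length as f0.
    """
    if len(f0) != len(amplitudes):
        raise ValueError("f0 and amplitudes must have the same length")
    nyquist = sample_rate / 2
    phase = 0.0  # running phase of the fundamental, in radians
    out = []
    for n, freq in enumerate(f0):
        # Phase accumulator: integrate the instantaneous frequency so that
        # pitch changes do not cause phase discontinuities.
        phase = math.fmod(phase + 2 * math.pi * freq / sample_rate, 2 * math.pi)
        sample = 0.0
        for k, amp in enumerate(amplitudes[n]):
            harmonic = k + 1
            if harmonic * freq >= nyquist:
                break  # this and all higher harmonics would alias, so drop them
            # Harmonic k+1 has phase (k+1) * phase; wrapping the fundamental's
            # phase to [0, 2*pi) does not change sin() of integer multiples.
            sample += amp * math.sin(harmonic * phase)
        out.append(sample)
    return out


if __name__ == "__main__":
    sr = 16000
    num_samples = sr // 10  # 100 ms of audio
    # Glide from 220 Hz to 330 Hz with 1/k harmonic amplitudes (sawtooth-like).
    f0 = [220.0 + 110.0 * n / num_samples for n in range(num_samples)]
    amps = [[1.0 / k for k in range(1, 21)] for _ in range(num_samples)]
    y = harmonic_synth(f0, amps, sample_rate=sr)
    print(len(y), max(abs(s) for s in y))
```

Amplitudes proportional to 1/k give a band-limited, sawtooth-like tone, and harmonics at or above the Nyquist frequency are skipped to avoid aliasing. Adding a filtered-noise component to this deterministic part would give a deterministic-plus-stochastic model in the spirit of Serra and Smith (1990).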

Literature

  1. Xavier Serra and Julius Smith III
    Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition
    Computer Music Journal, 14(4): 12–24, 1990.
    @article{SerraS90_SinesTransientsNoiseModel_CMJ,
    author    = {Xavier Serra and Julius {Smith III}},
    journal   = {Computer Music Journal},
    number    = {4},
    pages     = {12--24},
    publisher = {The MIT Press},
    title     = {Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition},
    volume    = {14},
    year      = {1990},
    url-pdf   = {1990_SerraS_SpectralModelingSynthesis_CMJ.pdf}
    }
  2. Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, and Adam Roberts
    DDSP: Differentiable Digital Signal Processing
    In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
    @inproceedings{EngelHGR20_DifferentiableDSP_ICLR,
    title     = {{DDSP}: Differentiable Digital Signal Processing},
    author    = {Jesse Engel and Lamtharn Hantrakul and Chenjie Gu and Adam Roberts},
    booktitle = {Proceedings of the International Conference on Learning Representations ({ICLR})},
    year      = {2020},
    address   = {Virtual},
    url-pdf   = {2020_EngelHGR_DDSP_ICLR.pdf}
    }
  3. Simon Schwär and Meinard Müller
    Multi-Scale Spectral Loss Revisited
    IEEE Signal Processing Letters, 30: 1712–1716, 2023.
    @article{SchwaerM23_MultiScaleSpecLoss_IEEE-SPL,
    author    = {Simon Schw{\"a}r and Meinard M{\"u}ller},
    title     = {Multi-Scale Spectral Loss Revisited},
    journal   = {{IEEE} Signal Processing Letters},
    year      = {2023},
    volume    = {30},
    pages     = {1712--1716},
    doi       = {10.1109/LSP.2023.3333205},
    url-pdf   = {2023_SchwaerM_MSSLossRevisited_IEEE-SPL.pdf}
    }
  4. Chin-Yun Yu and György Fazekas
    Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 667–675, 2023.
    @inproceedings{YuF23_DifferentiableLPC_ISMIR,
    title     = {Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables},
    author    = {Chin-Yun Yu and Gy{\"o}rgy Fazekas},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    pages     = {667--675},
    address   = {Milano, Italy},
    year      = {2023},
    url-pdf   = {2023_YuF_GlottalFlowSVS_ISMIR.pdf}
    }
  5. Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, and Charalampos Saitis
    A review of differentiable digital signal processing for music and speech synthesis
    Frontiers in Signal Processing, 3, 2024.
    @article{HayesSFMS24_DDSPReview_Frontiers,
    author    = {Hayes, Ben and Shier, Jordie and Fazekas, Gy{\"{o}}rgy and McPherson, Andrew and Saitis, Charalampos},
    title     = {A review of differentiable digital signal processing for music and speech synthesis},
    journal   = {Frontiers in Signal Processing},
    volume    = {3},
    year      = {2024},
    doi       = {10.3389/frsip.2023.1284100},
    issn      = {2673-8198},
    url-pdf   = {2024_HayesSFMS_DDSPReview_Frontiers.pdf}
    }
  6. Simon Schwär, Michael Krause, Michael Fast, Sebastian Rosenzweig, Frank Scherbaum, and Meinard Müller
    A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction
    Transactions of the International Society for Music Information Retrieval (TISMIR), 7(1): 30–43, 2024.
    @article{SchwaerKFRSM24_LarynxMicSVR_TISMIR,
    author    = {Simon Schw{\"a}r and Michael Krause and Michael Fast and Sebastian Rosenzweig and Frank Scherbaum and Meinard M{\"u}ller},
    title     = {A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction},
    journal   = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
    year      = {2024},
    volume    = {7},
    number    = {1},
    pages     = {30--43},
    doi       = {10.5334/tismir.166},
    url-pdf   = {2024_SchwaerKFRSM_LarynxMicSVR_TISMIR.pdf}
    }

Further Links