Neural Binaural Upmix of Stereo Content

Philipp Grundhuber, Michael Lovedee-Turner, and Emanuël A. P. Habets

Abstract

While immersive music productions have become popular in recent years, most music produced over the past decades has been mixed for stereo. This paper presents a data-driven approach to automatic binaural upmixing of stereo music. We adapt the HDemucs network architecture, previously used for both source separation and binauralization, to perform binaural upmixing end to end. We train on two distinct datasets and show that custom-designed training data improves the accuracy of spatial positioning, whereas professionally mixed music yields superior spatialization. The trained networks can process multiple simultaneous sources individually and add valid binaural cues, positioning sources with an average azimuthal error below 11.3 degrees. A listening test with binaural experts shows that the proposed approach outperforms digital signal processing (DSP)-based approaches to binauralizing stereo content in terms of spaciousness while preserving audio quality.
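The azimuthal error quoted above is an angular quantity, so averaging it naively breaks near the ±180 degree boundary. The sketch below shows one common way to compute a mean absolute azimuth error with wrap-around. It is an illustration under assumptions, not the paper's evaluation code: the exact error definition used by the authors is not given here, and the function name `mean_azimuth_error_deg` and the example angles are hypothetical.

```python
def mean_azimuth_error_deg(estimated, reference):
    """Mean absolute azimuth error in degrees (assumed definition).

    Each signed difference is wrapped into [-180, 180) so that, for example,
    179 vs. -179 degrees counts as a 2-degree error rather than 358.
    """
    if len(estimated) != len(reference) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for est, ref in zip(estimated, reference):
        # Wrap the difference onto the circle before taking its magnitude
        diff = (est - ref + 180.0) % 360.0 - 180.0
        total += abs(diff)
    return total / len(estimated)


# Hypothetical estimated vs. target azimuths (degrees) for three sources
print(mean_azimuth_error_deg([28.0, -31.0, 179.0], [30.0, -30.0, -179.0]))
```

How the estimated azimuths are obtained (for example, by a localization model or by listeners) is outside the scope of this sketch.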

Audio Examples

  • Original Stereo: unprocessed two-channel stereo input signal
  • Binaural Upmix NBU S: Binaural upmix using NBU S, trained on studio mixes
  • Binaural Upmix NBU C+: Binaural upmix using NBU C+, trained on the Cambridge MT database with added silence

Music 1 (see footnote 1)

Music 2 (see footnote 1)

Music 3 (see footnote 2)


1 Wonder Under by Glad Rags from https://freemusicarchive.org/music/Glad_Rags/Wonder_Under under CC-BY License

2 Every Time by Katy Kirby from https://freemusicarchive.org/music/Katy_Kirby/Katy_Kirby/Katy_Kirby_-_3_-_01_Every_Time/ under CC-BY License