Audio Processing Seminar Winter 2022/23

This is a joint seminar offered by the professors of the International Audio Laboratories Erlangen. In our seminar, we offer topics in various areas of audio and acoustic signal processing. In close cooperation with a tutor, students can deepen their knowledge in signal processing and machine learning by studying individually assigned research papers. In addition, the seminar allows students to further develop their skills in scientific reading, writing, and presenting.

Note that this is a technically oriented seminar aiming at master students of engineering and computer science. The offered topics require a good mathematical background, a solid understanding of digital signal processing, as well as general knowledge and interest in the field of audio. Below, we provide further information on the available subgroups and examples for seminar topics and literature. Note that the specific topics may change and will be announced at the beginning of each semester.

Dates and Deadlines

to be defined

Subgroup: Audio signal processing with the Internet of Things

In this seminar group, we study research papers about audio algorithms in the context of emerging IoT applications. Students will use the role playing concept to study and present the assigned papers from the perspectives of different expert groups.

Press here for example papers
  1. Alexandru Nelus, Rene Glitza and Rainer Martin
    Estimation of Microphone Clusters in Acoustic Sensor Networks Using Unsupervised Federated Learning
    In Proceedings of the IEEE international Conference on Acoustics, Speech, and Signal Processing (ICASSP): 761–765, 2021. DOI
    @inproceedings{nelus2021estimation,
    title={Estimation of Microphone Clusters in Acoustic Sensor Networks Using Unsupervised Federated Learning},
    author={Alexandru Nelus, Rene Glitza, and Rainer Martin},
    booktitle={Proceedings of the {IEEE} international Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages={761--765},
    year={2021},
    organization={IEEE},
    doi = {10.1109/ICASSP39728.2021.9414186}
    }
  2. Yanping Zhu, Aimin Jiang and Hon Keung Kwan
    ADMM-Based Sensor Network Localization Using Low-Rank Approximation
    IEEE Sensors Journal, 18(20): 8463–8471, 2018. DOI
    @article{zhu2018admm,
    title={{ADMM}-Based Sensor Network Localization Using Low-Rank Approximation},
    author={Yanping Zhu, Aimin Jiang and Hon Keung Kwan},
    journal={{IEEE} Sensors Journal},
    volume={18},
    number={20},
    pages={8463--8471},
    year={2018},
    publisher={IEEE},
    doi = {10.1109/JSEN.2018.2866686}
    }
  3. Krishna Subramani and Paris Smaragdis
    Point Cloud Audio Processing
    In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA): 31–35, 2021. PDF
    @inproceedings{subramani2021point,
    title={Point Cloud Audio Processing},
    author={Krishna Subramani and Paris Smaragdis},
    booktitle={Proceedings of the {IEEE} Workshop on Applications of Signal Processing to Audio and Acoustics ({WASPAA})},
    pages={31--35},
    year={2021},
    organization={IEEE},
    url-pdf = {https://arxiv.org/pdf/2105.02469.pdf}
    }
  4. Sofia-Eirini Kotti, Richard Heusdens and Richard C Hendriks
    Clock-Offset and Microphone Gain Mismatch Invariant Beamforming
    In Proceedings of the 28th European Signal Processing Conference (EUSIPCO): 176–180, 2021. PDF DOI
    @inproceedings{kotti2021clock,
    title={Clock-Offset and Microphone Gain Mismatch Invariant Beamforming},
    author={Sofia-Eirini Kotti, Richard Heusdens, and Richard C Hendriks},
    booktitle={Proceedings of the 28th European Signal Processing Conference ({EUSIPCO})},
    pages={176--180},
    year={2021},
    organization={IEEE},
    doi = {10.23919/Eusipco47968.2020.9287852},
    url-pdf = {https://pure.tudelft.nl/ws/portalfiles/portal/83830242/0000176.pdf}
    }
  5. Tomoki Hayashi, Takenori Yoshimura and Yusuke Adachi
    Conformer-Based ID-Aware Autoencoder for Unsupervised Anomalous Sound Detection
    DCASE2020 Challenge, Tech. Rep., 2020. PDF
    @article{hayashi2020conformer,
    title={Conformer-Based ID-Aware Autoencoder for Unsupervised Anomalous Sound Detection},
    author={Tomoki Hayashi, Takenori Yoshimura, and Yusuke Adachi},
    journal={DCASE2020 Challenge, Tech. Rep.},
    year={2020},
    url-pdf = {https://dcase.community/documents/challenge2020/technical_reports/DCASE2020_Hayashi_111_t2.pdf}
    }
  6. Yasser Alsouda, Sabri Pllana and Arianit Kurti
    A Machine Learning Driven IoT Solution for Noise Classification in Smart Cities
    arXiv preprint arXiv:1809.00238, 2018. PDF
    @article{alsouda2018machine,
    title={A Machine Learning Driven IoT Solution for Noise Classification in Smart Cities},
    author={Yasser Alsouda, Sabri Pllana, and Arianit Kurti},
    journal={arXiv preprint arXiv:1809.00238},
    year={2018},
    url-pdf = {https://arxiv.org/pdf/1809.00238}
    }

Subgroup: Efficient Spatial Audio Coding & Rendering

  • Responsible Lecturer: Prof. Jürgen Herre
  • Topics: Signal Processing, Spatial Audio Coding, Auditory Perception
Press here for example papers
  1. Breebaart, Jeroen, Cengarle, Giulio, Lu, Lie, Mateos, Toni, Purnhagen, Heiko, and Tsingos, Nicolas
    Spatial Coding of Complex Object-Based Program Material
    Journal of the Audio Engineering Society, 67(7/8): 486–497, 2019. DOI
    @article{breebaart2019spatial,
    author={Breebaart, Jeroen and  Cengarle, Giulio and  Lu, Lie and  Mateos, Toni and Purnhagen, Heiko and Tsingos, Nicolas},
    journal={Journal of the Audio Engineering Society},
    title={Spatial Coding of Complex Object-Based Program Material},
    year={2019},
    volume={67},
    number={7/8},
    pages={486-497},
    doi={https://doi.org/10.17743/jaes.2018.0067},
    month={July}
    }
  2. Kentgens, Maximilian, Behler, Andreas, and Jax, Peter
    Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 151–155, 2020. DOI
    @inproceedings{9054414,
    author={Kentgens, Maximilian and Behler, Andreas and Jax, Peter},
    booktitle={Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    title={Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition},
    year={2020},
    volume={},
    number={},
    pages={151-155},
    doi={10.1109/ICASSP40776.2020.9054414}
    }
  3. Kelly, Jack, Woszczyk, Wieslaw, and King, Richard
    3D Impulse Response Convolution With Multichannel Direct Sound: Assessing Perceptual Equivalency Between Room- and Source-Impression for Music Production
    In Proceedings of the 151st Audio Engineering Society Convention, 2021. Details
    @inproceedings{kelly20213d,
    author={Kelly, Jack and Woszczyk, Wieslaw and King, Richard},
    booktitle={Proceedings of the 151st Audio Engineering Society Convention},
    title={3D Impulse Response Convolution With Multichannel Direct Sound: Assessing Perceptual Equivalency Between Room- and Source-Impression for Music Production},
    year={2021},
    month={October},
    url-details={https://www.aes.org/e-lib/online/browse.cfm?elib=21496}
    }
  4. Nishiguchi, Masayuki, Kato, Kodai, Watanabe, Kanji, Abe, Koji, and Takane, Shouichi
    Spatial Auditory Masking for Three-Dimensional Audio Coding
    In Proceedings of the 147th Audio Engineering Society Convention, 2019. Details
    @inproceedings{nishiguchi2019spatial,
    author={Nishiguchi, Masayuki and Kato, Kodai and Watanabe, Kanji and Abe, Koji and Takane, Shouichi},
    booktitle={Proceedings of the 147th Audio Engineering Society Convention},
    title={Spatial Auditory Masking for Three-Dimensional Audio Coding},
    year={2019},
    month={October},
    url-details={https://www.aes.org/e-lib/browse.cfm?elib=20632}
    }
  5. van de Par, Steven, Disch, Sascha, Niedermeier, Andreas, Burdiel Pérez, Elena, and Edler, Bernd
    Temporal Envelope-Based Psychoacoustic Modelling for Evaluating Non-Waveform Preserving Audio Codecs
    In Proceedings of the 147th Audio Engineering Society Convention, 2019. Details
    @inproceedings{vandepar2019temporal,
    author={van de Par, Steven and Disch, Sascha and Niedermeier, Andreas and Burdiel Pérez, Elena and Edler, Bernd},
    booktitle={Proceedings of the 147th Audio Engineering Society Convention},
    title={Temporal Envelope-Based Psychoacoustic Modelling for Evaluating Non-Waveform Preserving Audio Codecs},
    year={2019},
    month={October},
    url-details={https://www.aes.org/e-lib/online/browse.cfm?elib=20686}
    }

Subgroup: Music Processing

Music processing and music information retrieval (MIR) are exciting and challenging research areas. Music is not only a ubiquitous and vital part of our lives but also relates to many different research disciplines including signal processing, information retrieval, machine learning, and musicology. In this subgroup, we study computational techniques for processing, searching, organizing, and accessing music-related data. This will require a good understanding of general concepts in signal processing, data science, and machine learning. Furthermore, basic knowledge in music theory and a strong interest in music are extremely helpful to get enthusiastic about the field of music processing. This subgroup particularly aims at students who have participated in the lecture Music Processing Analysis.

Press here for example papers
  1. Jonathan Driedger and Meinard Müller
    A Review on Time-Scale Modification of Music Signals
    Applied Sciences, 6(2): 57–82, 2016. PDF
    @article{DriedgerMueller16_ReviewTSM_AppliedSciences,
    author  = {Jonathan Driedger and Meinard M{\"u}ller},
    journal = {Applied Sciences},
    title   = {A Review on Time-Scale Modification of Music Signals},
    year    = {2016},
    month   = {February},
    volume  = {6},
    number  = {2},
    pages   = {57--82},
    url-pdf   = {aps-w23/papers/2016_DriedgerMueller_TSMOverview_AppliedSciences_PrintedVersion.pdf}
    }
  2. Sebastian Ewert and Meinard Müller
    Using Score-Informed Constraints for NMF-based Source Separation
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 129–132, 2012. PDF Details
    @inproceedings{EwertM12_ScoreInformedNMF_ICASSP,
    author     = {Sebastian Ewert and Meinard M{\"u}ller},
    title      = {Using Score-Informed Constraints for {NMF}-based Source Separation},
    booktitle  = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    address  = {Kyoto, Japan},
    year       = {2012},
    pages    = {129--132},
    url-pdf   = {aps-w23/papers/2012_EwertMueller_ScoreConstrainedNMF_ICASSP.pdf},
    url-details = {http://resources.mpi-inf.mpg.de/MIR/ICASSP2012-ScoreInformedNMF/}
    }
  3. Derry FitzGerald
    Harmonic/Percussive Separation Using Median Filtering
    In Proceedings of the International Conference on Digital Audio Effects (DAFx): 246–253, 2010. PDF
    @inproceedings{Fitzgerald10_HarmPercSep_DAFX,
    author      = {Derry FitzGerald},
    title       = {Harmonic/Percussive Separation Using Median Filtering},
    booktitle   = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})},
    address     = {Graz, Austria},
    year        = {2010},
    pages       = {246--253},
    month       = {September},
    url-pdf   = {aps-w23/papers/2010_FitzGerald_HarmonicPercussiveSep_DAFx.pdf}
    }
  4. Daniel Stoller, Simon Durand, and Sebastian Ewert
    End-To-End Lyrics Alignment for Polyphonic Music Using an Audio-To-Character Recognition Model
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 181–185, 2019. PDF
    @inproceedings{StollerDE19_LyricsAlignment_ICASSP,
    author    = {Daniel Stoller and Simon Durand and Sebastian Ewert},
    title     = {End-To-End Lyrics Alignment for Polyphonic Music Using an Audio-To-Character Recognition Model},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages     = {181--185},
    address   = {Brighton, {UK}},
    year      = {2019},
    url-pdf   = {aps-w23/papers/2019_StollerEwert_LyricsCTC_ICASSP_arXiv.pdf},
    }

Subgroup: Spatial Audio Processing

This subgroup particularly aims at students who attended the Virtual Acoustics or Speech Enhancement lecture.

  • Responsible Lecturer: Prof. Emanuël Habets
  • Tutors: TBD
  • Topics: Microphone array processing, speech enhancement, acoustic scene analysis, Ambisonics
  • Expected background: Signal processing and/or machine learning
Press here for example papers
  1. N. Antonello, E. De Sena, M. Moonen, P.A. Naylor, and T. van Waterschoot
    Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field
    IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(10): 1929–1941, 2017. PDF
    @article{Antonello2017,
    author  = {N. Antonello and E. De Sena and M. Moonen and P.A. Naylor and T. van Waterschoot},
    journal = {IEEE/ACM Transactions on Audio, Speech and Language Processing},
    title   = {Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field},
    year    = {2017},
    volume  = {25},
    number  = {10},
    pages   = {1929--1941},
    url-pdf   = {aps-w23/papers/sap_Antonello2017.pdf}
    }
  2. S. Tervo, J. Pätynen, A. Kuusinen, and T. Lokki
    Spatial Decomposition Method for Room Impulse Responses
    Journal of the Audio Engineering Society, 61: 17–28, 2012. PDF
    @article{Tervo2012,
    author  = {S. Tervo and J. P\"{a}tynen and A. Kuusinen and T. Lokki,},
    journal = {Journal of the Audio Engineering Society},
    title   = {Spatial Decomposition Method for Room Impulse Responses},
    year    = {2012},
    volume  = {61},
    pages   = {17--28},
    url-pdf   = {aps-w23/papers/sap_Tervo2012.pdf}
    }
  3. S. Chakrabarty and E.A.P. Habets
    Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals
    IEEE Journal of Selected Topics in Signal Processing, 13(1): 8–21, 2019. PDF
    @article{Chakrabarty2019,
    author  = {S. Chakrabarty and E.A.P. Habets},
    title = {Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals},
    journal   = {IEEE Journal of Selected Topics in Signal Processing},
    year    = {2019},
    month   = {March},
    volume  = {13},
    number  = {1},
    pages   = {8--21},
    url-pdf   = {aps-w23/papers/sap_Chakrabarty2019.pdf}
    }
  4. V. Pulkki
    Virtual Sound Source Positioning Using Vector Base Amplitude Panning
    Journal of the Audio Engineering Society, 45(6): 456–466, 1997. PDF
    @article{Pulkki1997,
    author  = {V. Pulkki},
    journal = {Journal of the Audio Engineering Society},
    title   = {Virtual Sound Source Positioning Using Vector Base Amplitude Panning},
    year    = {1997},
    month   = {June},
    volume  = {45},
    number  = {6},
    pages   = {456--466},
    url-pdf   = {aps-w23/papers/sap_Pulkki1997.pdf}
    }
  5. M. Noisternig, T. Musil, A. Sontacchi, and R. Holdrich
    3D Binaural Sound Reproduction Using a Virtual Ambisonic Approach
    In Proceedings of the International Symposium an Virtual Environments, Human-Computer Interfaces, and Measurement Systems: 174–178, 2003. PDF
    @inproceedings{Noisternig2003,
    author  = {M. Noisternig and T. Musil and A. Sontacchi and R. Holdrich},
    booktitle = {Proceedings of the International Symposium an Virtual Environments, Human-Computer Interfaces, and Measurement Systems},
    title   = {3D Binaural Sound Reproduction Using a Virtual Ambisonic Approach},
    pages   ={174-178},
    year    = {2003},
    month   = {July},
    url-pdf   = {aps-w23/papers/sap_Noisternig2003.pdf}
    }
  6. S. Adavanne, A. Politis, J. Nikunen, and T. Virtanen
    Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
    IEEE Journal of Selected Topics in Signal Processing, 13(1): 34–48, 2019. PDF
    @article{Adavanne2019,
    author  = {S. Adavanne and A. Politis and J. Nikunen and T. Virtanen,},
    journal = {IEEE Journal of Selected Topics in Signal Processing},
    title   = {Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks},
    year    = {2019},
    month   = {March},
    volume  = {13},
    number  = {1},
    pages   = {34--48},
    url-pdf   = {aps-w23/papers/sap_Adavanne2019.pdf}
    }
  7. Yoshioka, T. and Nakatani, T.
    Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening
    IEEE Transactions on Audio, Speech, and Language Processing, 20(10): 2707–2720, 2012. PDF
    @article{Yoshioka2012,
    author={Yoshioka, T. and Nakatani, T.},
    journal={IEEE Transactions on Audio, Speech, and Language Processing},
    title={Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening},
    year={2012},
    volume={20},
    number={10},
    pages={2707-2720},
    url-pdf   = {aps-w23/papers/sap_Yoshioka2012.pdf}
    }

Subgroup: Speech Processing / Perceptual Models

The offered topics usually assume a broad range of prior knowledge. If terms like DFT, Spectral Analysis, Filterbanks, but also Deep Neural Networks are new to you, it is highly recommended to take a look into these topics before the seminar starts.

  • Responsible Lecturer: Prof. Bernd Edler
  • Tutors: Bernd Edler and his research team
  • Topics: Signal Processing and Machine Learning for Speech Processing and/or Perceptual Models
  • Expected background: Signal Processing and/or Machine Learning
Press here for example papers
  1. Florian Schuh, Sascha Dick, Richard Füg, Christian R. Helmrich, Nikolaus Rettelbach, and Tobias Schwegler
    Efficient Multichannel Audio Transform Coding with Low Delay and Complexity
    In Proceedings of the 141st Audio Engineering Society Convention, 2016. Details
    @inproceedings{schuh2016efficient,
    title = {Efficient {M}ultichannel {A}udio {T}ransform {C}oding with {L}ow {D}elay and {C}omplexity},
    author = {Florian Schuh and Sascha Dick and Richard Füg and Christian R. Helmrich and Nikolaus Rettelbach and Tobias Schwegler},
    booktitle = {Proceedings of the 141st Audio Engineering Society Convention},
    month = {Sep},
    year = {2016},
    url-details = {http://www.aes.org/e-lib/browse.cfm?elib=18464}
    }
  2. Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae
    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
    In Advances in Neural Information Processing Systems: 17022–17033, 2020.
    @inproceedings{NEURIPS2020_c5d73680,
    author = {Jungil Kong and Jaehyeon Kim and Jaekyoung Bae},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages = {17022--17033},
    publisher = {Curran Associates, Inc.},
    title = {HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis},
    volume = {33},
    year = {2020}
    }
  3. Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber
    Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
    In Proceedings of the 23rd International Conference on Machine Learning: 369–376, 2006. DOI
    @inproceedings{10.1145/1143844.1143891,
    author = {Alex Graves and Santiago Fern\'{a}ndez and Faustino Gomez and J\"{u}rgen Schmidhuber},
    title = {Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks},
    year = {2006},
    isbn = {1595933832},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/1143844.1143891},
    doi = {10.1145/1143844.1143891},
    booktitle = {Proceedings of the 23rd International Conference on Machine Learning},
    pages = {369–376},
    numpages = {8},
    location = {Pittsburgh, Pennsylvania, USA},
    series = {ICML '06}
    }
  4. Ryan Prenger, Rafael Valle, and Bryan Catanzaro
    Waveglow: A Flow-based Generative Network for Speech Synthesis
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 3617–3621, 2019. DOI
    @inproceedings{8683143,
    author={Ryan Prenger and Rafael Valle and Bryan Catanzaro},
    booktitle={Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    title={Waveglow: A Flow-based Generative Network for Speech Synthesis},
    year={2019},
    volume={},
    number={},
    pages={3617-3621},
    doi={10.1109/ICASSP.2019.8683143}
    }
  5. Ning Guo and Bernd Edler
    Frequency Domain Long-Term Prediction for Low Delay General Audio Coding
    IEEE Signal Processing Letters, 28: 1185–1189, 2021. DOI
    @article{9444770,
    author={Ning Guo and Bernd Edler},
    journal={IEEE Signal Processing Letters},
    title={Frequency Domain Long-Term Prediction for Low Delay General Audio Coding},
    year={2021},
    volume={28},
    number={},
    pages={1185-1189},
    doi={10.1109/LSP.2021.3084503}
    }
  6. Xavier Serra and Julius Smith
    Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition
    Computer Music Journal, 14(4): 12–24, 1990.
    @article{10.2307/3680788,
    ISSN = {01489267, 15315169},
    URL = {http://www.jstor.org/stable/3680788},
    author = {Xavier Serra and Julius Smith},
    journal = {Computer Music Journal},
    number = {4},
    pages = {12--24},
    publisher = {The MIT Press},
    title = {Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition},
    urldate = {2022-08-16},
    volume = {14},
    year = {1990}
    }