Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models

Ünal Ege Gaznepoğlu and Nils Peters

Presented at the 2021 Workshop on Machine Learning in Speech and Language Processing (MLSLP), September 6, 2021.

MLSLP 2021 is colocated with Interspeech 2021 and is a workshop of SIGML, the SIG on machine learning in speech and language processing of ISCA (the International Speech Communication Association).

Abstract

Voice conversion for speaker anonymization is an emerging field in speech processing research. Many state-of-the-art approaches are based on resynthesizing the phoneme posteriorgrams (PPG) and the fundamental frequency (F0) of the input signal together with modified X-vectors. Our research focuses on the role of F0 in speaker anonymization, which is an understudied area. Using the VoicePrivacy Challenge 2020 framework and its datasets, we developed and evaluated eight low-complexity F0 modifications applied prior to resynthesis. We found that modifying the F0 can improve speaker anonymization by as much as 8% with minor word error rate degradation.

Audio Examples

Female Voice Example

Male Voice Example

Poster

GaznepogluPoster

Paper

Ünal Ege Gaznepoğlu and Nils Peters
Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models
In Proc. of the 2021 Workshop on Machine Learning in Speech and Language Processing (MLSLP), 2021.

How to Reference

@inproceedings{GaznepogluMLSLP2021,
  author = {{\"U}nal Ege Gaznepo{\u g}lu and Nils Peters},
  title = {Exploring the Importance of {F0} Trajectories for Speaker Anonymization using {X}-vectors and Neural Waveform Models},
  booktitle = {Proc. of the 2021 Workshop on Machine Learning in Speech and Language Processing (MLSLP)},
  keywords = {voice privacy, anonymization, pseudonymization, F0},
  year = {2021}}