Speaker anonymization systems continue to improve their ability to obfuscate the original speaker characteristics in a speech signal, but they often introduce processing artifacts and unnatural-sounding voices as a tradeoff. Many of these systems stem from the VoicePrivacy Challenge Baseline B1, which uses a neural vocoder to synthesize speech from a representation based on F0, x-vectors, and bottleneck features. Inspired by this, we investigate the reproduction capabilities of this baseline to assess how successful the shared methodology is in synthesizing human-like speech. We use four objective metrics to measure speech quality, waveform similarity, and F0 similarity. Our findings indicate that both the speech representation and the vocoder introduce artifacts, leading to speech that is perceived as unnatural. A MUSHRA-like listening test with 18 subjects corroborates our findings, motivating further research on the analysis and synthesis components of the VoicePrivacy Challenge Baseline B1.
Listening test sound stimuli
Each stimulus below is presented under the same eight conditions: mel-nsf-spk-4k, joint-hifigan-spk, am-nsf-spk, mel-nsf-spk, reference, B1b-utt, B1b-spk, mel-nsf-pt-spk.

8455-210777-0034.wav
p229_016_mic2.wav
p226_080_mic2.wav
p232_015_mic2.wav
5142-36377-0023.wav
p287_015_mic2.wav
1995-1837-0004.wav
8463-294828-0006.wav
6829-68771-0013.wav
p269_015_mic2.wav
7729-102255-0031.wav
8230-279154-0024.wav
p225_024_mic2.wav
260-123286-0018.wav
Paper
@techreport{2023_gaznepoglu_evaluation_vpc,
author = {Ünal Ege Gaznepoğlu and Nils Peters},
institution = {3rd ISCA Symp. on Security and Privacy in Speech Communication (SPSC)},
title = {Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy Challenge Baseline B1},
keywords = {},
year = {2023}}