An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

3 Jul 2023 · Sheng Zhao, Qilong Yuan, Yibo Duan, Zhuoyue Chen

The task of synthetic speech generation is to generate language content from a given text and then simulate a fake human voice. The key factors that determine the quality of synthetic speech generation include generation speed, word segmentation accuracy, and the naturalness of the synthesized speech. This paper builds an end-to-end multi-module synthetic speech generation model consisting of a speaker encoder, a synthesizer based on Tacotron2, and a vocoder based on WaveRNN. In addition, we perform extensive comparative experiments on different datasets and various model structures. Finally, we won first place in Track 1.1 of the ADD 2023 challenge with a weighted deception success rate (WDSR) of 44.97%.
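The three-module design named in the abstract can be pictured as a simple data flow: a speaker encoder maps reference audio to a fixed-size embedding, a synthesizer maps text plus that embedding to a mel spectrogram, and a vocoder maps the spectrogram to a waveform. The sketch below only illustrates that flow with toy stand-in functions. Every name, dimension, and computation in it is hypothetical, and it does not reproduce the authors' Tacotron2-based or WaveRNN-based models.

```python
"""Toy data-flow sketch of a three-stage speech generation pipeline.

All functions are hypothetical stand-ins, not real models.
"""
from typing import List

EMBED_DIM = 4  # assumed speaker-embedding size (illustrative only)
N_MELS = 3     # assumed number of mel bins (illustrative only)
HOP = 2        # assumed waveform samples produced per mel frame


def speaker_encoder(reference_audio: List[float]) -> List[float]:
    """Stand-in: summarize reference audio into a fixed-size embedding."""
    if not reference_audio:
        raise ValueError("reference audio must not be empty")
    mean = sum(reference_audio) / len(reference_audio)
    return [mean] * EMBED_DIM


def synthesizer(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in for a Tacotron2-style synthesizer: one mel frame per character."""
    bias = sum(embedding) / len(embedding)
    return [[(ord(ch) % 7) / 7.0 + bias] * N_MELS for ch in text]


def vocoder(mel: List[List[float]]) -> List[float]:
    """Stand-in for a WaveRNN-style vocoder: HOP samples per mel frame."""
    wave: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wave.extend([level] * HOP)
    return wave


def generate(text: str, reference_audio: List[float]) -> List[float]:
    """Chain the three modules: audio -> embedding -> mel -> waveform."""
    embedding = speaker_encoder(reference_audio)
    mel = synthesizer(text, embedding)
    return vocoder(mel)
```

Calling `generate("hello", [0.1, 0.2, 0.3])` returns a list of `5 * HOP` samples. In a real system each stand-in would be replaced by a trained network, while the interfaces between the stages would keep roughly this shape.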

Tasks

Face Swapping

Datasets

LibriTTS • AISHELL-3

Methods

ReLU • Sigmoid Activation • Softmax • SPEED • Tanh Activation • WaveRNN
