Search Results for author: Berrak Sisman

Found 22 papers, 7 papers with code

Speech Synthesis with Mixed Emotions

no code implementations • 11 Aug 2022 • Kun Zhou, Berrak Sisman, Rajib Rana, B. W. Schuller, Haizhou Li

We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.

Emotional Speech Synthesis
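
To make the entry above concrete, here is a minimal PyTorch sketch of one way mixed emotions could condition a sequence-to-sequence TTS decoder: blending emotion embeddings with relative weights. The table size, weight values, and function name are illustrative assumptions, not the paper's implementation.

import torch

# Hypothetical sketch: blend emotion embeddings with relative weights to form
# a single style vector for a seq2seq acoustic decoder. Sizes are assumptions.
emotion_table = torch.nn.Embedding(num_embeddings=5, embedding_dim=128)

def mixed_emotion_embedding(weights):
    # weights: (5,) non-negative mixture weights summing to 1
    return weights @ emotion_table.weight       # (128,) blended style vector

w = torch.tensor([0.0, 0.6, 0.4, 0.0, 0.0])     # e.g. 60% happy, 40% sad
style = mixed_emotion_embedding(w)              # would condition the decoder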

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

1 code implementation • 15 Jun 2022 • Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li

In this paper, we propose a data-driven deep learning model, i.e., StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.

Emotion Classification • Multi-Task Learning +1
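
A minimal sketch of a data-driven strength assessor in the spirit of this entry: a recurrent encoder over mel frames with a strength-regression head plus an auxiliary emotion-classification head (matching the multi-task tag). All layer sizes and module choices are assumptions, not StrengthNet's actual architecture.

import torch, torch.nn as nn

class StrengthRegressor(nn.Module):
    # Illustrative only: mel encoder -> strength scalar + emotion logits.
    def __init__(self, n_mels=80, n_emotions=5):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, 128, batch_first=True, bidirectional=True)
        self.strength_head = nn.Linear(256, 1)      # scalar strength in [0, 1]
        self.emotion_head = nn.Linear(256, n_emotions)

    def forward(self, mel):                         # mel: (batch, frames, n_mels)
        h, _ = self.encoder(mel)
        pooled = h.mean(dim=1)                      # average over frames
        strength = torch.sigmoid(self.strength_head(pooled)).squeeze(-1)
        return strength, self.emotion_head(pooled)

model = StrengthRegressor()
strength, logits = model(torch.randn(2, 120, 80))   # two toy utterances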

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

no code implementations • 20 Oct 2021 • Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style.

Disentanglement • Voice Conversion
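
A hypothetical encoder/decoder sketch of the joint conversion idea described above: strip speaker and emotion codes from reference utterances and recombine them with the source content at the decoder. All modules and dimensions are illustrative stand-ins, not the paper's model.

import torch, torch.nn as nn

content_enc = nn.GRU(80, 128, batch_first=True)
speaker_enc = nn.GRU(80, 32, batch_first=True)
emotion_enc = nn.GRU(80, 32, batch_first=True)
decoder = nn.GRU(128 + 32 + 32, 80, batch_first=True)

def convert(src_mel, tgt_spk_mel, tgt_emo_mel):
    content, _ = content_enc(src_mel)               # (B, T, 128)
    _, spk = speaker_enc(tgt_spk_mel)               # (1, B, 32) final state
    _, emo = emotion_enc(tgt_emo_mel)               # (1, B, 32) final state
    T = content.size(1)
    spk = spk.transpose(0, 1).expand(-1, T, -1)     # broadcast over time
    emo = emo.transpose(0, 1).expand(-1, T, -1)
    out, _ = decoder(torch.cat([content, spk, emo], dim=-1))
    return out                                      # converted mel (B, T, 80)

converted = convert(torch.randn(1, 100, 80),
                    torch.randn(1, 80, 80), torch.randn(1, 90, 80))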

DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

no code implementations • 13 Oct 2021 • Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li

At the same time, because the deep neural analyzer is learnable, it is expected to be more accurate in signal reconstruction and manipulation, and to generalize from speech to singing.

Speech Synthesis • Voice Conversion

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over

no code implementations • 7 Oct 2021 • Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li

The proposed VisualTTS adopts two novel mechanisms: 1) textual-visual attention and 2) a visual fusion strategy during acoustic decoding, both of which contribute to forming an accurate alignment between the input text content and the lip motion in the input lip sequence.

Speech Synthesis
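
A minimal sketch of the two mechanisms named above, with assumed shapes: each decoder step attends separately over text features and lip-motion features, and the two contexts are fused before acoustic decoding. nn.MultiheadAttention stands in for both attentions; this is not the paper's exact formulation.

import torch, torch.nn as nn

text_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
visual_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
fuse = nn.Linear(512, 256)

def decode_step(query, text_feats, lip_feats):
    # query: (B, 1, 256); text_feats: (B, N_text, 256); lip_feats: (B, N_frames, 256)
    text_ctx, _ = text_attn(query, text_feats, text_feats)
    lip_ctx, _ = visual_attn(query, lip_feats, lip_feats)
    return fuse(torch.cat([text_ctx, lip_ctx], dim=-1))    # fused context (B, 1, 256)

ctx = decode_step(torch.randn(2, 1, 256),
                  torch.randn(2, 20, 256), torch.randn(2, 75, 256))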

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

1 code implementation • 7 Oct 2021 • Rui Liu, Berrak Sisman, Haizhou Li

The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function.

Data Augmentation • Emotional Speech Synthesis +1
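
One illustrative reading of the "emotion attribute ranking function" mentioned above: learn a scoring function whose output orders emotional speech above neutral speech, then use the normalized score as a strength descriptor. The toy features and the margin loss are assumptions, not the paper's exact formulation.

import torch

w = torch.zeros(80, requires_grad=True)         # linear scoring weights
opt = torch.optim.SGD([w], lr=0.1)
loss_fn = torch.nn.MarginRankingLoss(margin=1.0)

emotional = torch.randn(64, 80) + 0.5           # toy emotional features
neutral = torch.randn(64, 80)                   # toy neutral features
for _ in range(100):
    s_emo, s_neu = emotional @ w, neutral @ w
    loss = loss_fn(s_emo, s_neu, torch.ones(64))    # require s_emo > s_neu
    opt.zero_grad(); loss.backward(); opt.step()

strength = torch.sigmoid(emotional @ w)         # per-sample strength in (0, 1)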

Emotional Voice Conversion: Theory, Databases and ESD

1 code implementation • 31 May 2021 • Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li

In this paper, we first provide a review of state-of-the-art emotional voice conversion research and existing emotional speech databases.

Voice Conversion

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training

1 code implementation • 31 Mar 2021 • Kun Zhou, Berrak Sisman, Haizhou Li

In stage 2, we perform emotion training with a limited amount of emotional speech data, to learn how to disentangle emotional style and linguistic information from speech.

Voice Conversion
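
A skeleton of the two-stage recipe described above, with toy data and a stand-in model: stage 1 trains on a plentiful neutral corpus, and stage 2 fine-tunes the same model on a small emotional set. Every module and tensor here is an illustrative stand-in.

import torch, torch.nn as nn

model = nn.GRU(80, 80, batch_first=True)        # stand-in seq2seq model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train(pairs, steps):
    for _ in range(steps):
        for src, tgt in pairs:
            out, _ = model(src)
            loss = mse(out, tgt)
            opt.zero_grad(); loss.backward(); opt.step()

neutral_pairs = [(torch.randn(1, 50, 80), torch.randn(1, 50, 80)) for _ in range(8)]
emotional_pairs = neutral_pairs[:2]             # "limited" emotional data
train(neutral_pairs, steps=10)                  # stage 1: large neutral corpus
train(emotional_pairs, steps=10)                # stage 2: limited emotional data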

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

no code implementations • 3 Nov 2020 • Kun Zhou, Berrak Sisman, Haizhou Li

Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity.

Disentanglement • Voice Conversion

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

2 code implementations • 28 Oct 2020 • Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li

Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.

Speech Emotion Recognition • Style Transfer +1

GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis

no code implementations • 23 Oct 2020 • Rui Liu, Berrak Sisman, Haizhou Li

Attention-based end-to-end text-to-speech synthesis (TTS) is superior to conventional statistical methods in many ways.

Graph Attention • Speech Synthesis +1
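
A minimal sketch of syntax-aware attention as the title suggests: restrict token-to-token attention to the edges of a dependency graph via an attention mask. The toy adjacency and dimensions are assumptions; GraphSpeech's actual encoder is more elaborate.

import torch, torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

tokens = torch.randn(1, 5, 128)                 # 5 input tokens
adj = torch.eye(5, dtype=torch.bool)            # self-loops
adj[0, 1] = adj[1, 0] = True                    # dependency edges (toy parse)
adj[1, 2] = adj[2, 1] = True
adj[2, 3] = adj[3, 2] = True
adj[3, 4] = adj[4, 3] = True
mask = ~adj                                     # True = attention NOT allowed

out, weights = attn(tokens, tokens, tokens, attn_mask=mask)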

Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

no code implementations • 11 Aug 2020 • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the Mel spectrum and phrase breaks.

Multi-Task Learning • Speech Synthesis
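
A sketch of the multi-task objective just described: a mel-reconstruction loss summed with a per-token phrase-break classification loss. The task weighting and shapes are illustrative assumptions.

import torch, torch.nn as nn

mel_loss = nn.MSELoss()
break_loss = nn.CrossEntropyLoss()              # classes: break / no-break per token

pred_mel, true_mel = torch.randn(2, 100, 80), torch.randn(2, 100, 80)
break_logits = torch.randn(2, 20, 2)            # 20 input tokens, 2 classes
true_breaks = torch.randint(0, 2, (2, 20))

loss = mel_loss(pred_mel, true_mel) + 0.5 * break_loss(
    break_logits.reshape(-1, 2), true_breaks.reshape(-1))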

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

no code implementations • 11 Aug 2020 • Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li

Cross-lingual voice conversion relies on non-parallel training data from two different languages, and is hence more challenging than mono-lingual voice conversion.

Voice Conversion
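
The heart of CycleGAN-based conversion without parallel data, sketched below: two generators map between source and target voices and are trained so that a round trip reconstructs the input. The adversarial terms are omitted, and the linear modules are toy stand-ins for the paper's generators.

import torch, torch.nn as nn

G_xy = nn.Linear(80, 80)                        # source voice -> target voice
G_yx = nn.Linear(80, 80)                        # target voice -> source voice
l1 = nn.L1Loss()

x = torch.randn(16, 80)                         # source-language features
y = torch.randn(16, 80)                         # target-language features
cycle_loss = l1(G_yx(G_xy(x)), x) + l1(G_xy(G_yx(y)), y)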

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

no code implementations • 10 Aug 2020 • Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

We train an encoder to disentangle singer identity and singing prosody (F0 contour) from phonetic content.

Voice Conversion
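
A sketch of the disentangle-and-recompose idea in the abstract above: a phonetic encoder output is recombined with an explicit F0 contour and a target singer embedding at the decoder input. Shapes and modules are assumptions, not the VAW-GAN model itself.

import torch, torch.nn as nn

phonetic_enc = nn.GRU(80, 128, batch_first=True)
decoder = nn.GRU(128 + 1 + 16, 80, batch_first=True)
singer_table = nn.Embedding(10, 16)             # 10 hypothetical singers

mel = torch.randn(1, 200, 80)
f0 = torch.rand(1, 200, 1)                      # normalized F0 contour
singer = singer_table(torch.tensor([3])).unsqueeze(1).expand(-1, 200, -1)

content, _ = phonetic_enc(mel)
converted, _ = decoder(torch.cat([content, f0, singer], dim=-1))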

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

1 code implementation • 13 May 2020 • Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li

We consider that there is a common code between speakers for emotional expression in a spoken language; therefore, a speaker-independent mapping between emotional states is possible.

Voice Conversion

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

no code implementations • 2 Feb 2020 • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

To address this problem, we propose a new training scheme for Tacotron-based TTS, referred to as WaveTTS, that has two loss functions: 1) a time-domain loss, denoted the waveform loss, which measures the distortion between the natural and generated waveforms; and 2) a frequency-domain loss, which measures the Mel-scale acoustic feature loss between the natural and generated acoustic features.
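
A sketch of that dual objective: compare waveforms directly in the time domain and compare Mel features in the frequency domain, then sum the two terms. The transform settings and equal weighting are illustrative assumptions.

import torch, torchaudio

to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_mels=80)
l1 = torch.nn.L1Loss()

natural = torch.randn(1, 22050)                 # 1 s of toy audio
generated = torch.randn(1, 22050)
loss = l1(generated, natural) + l1(to_mel(generated), to_mel(natural))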

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

1 code implementation • 1 Feb 2020 • Kun Zhou, Berrak Sisman, Haizhou Li

Many studies require parallel speech data between different emotional patterns, which is impractical in real-life scenarios.

Voice Conversion

Teacher-Student Training for Robust Tacotron-based TTS

no code implementations • 7 Nov 2019 • Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao, Haizhou Li

We first train a Tacotron2-based TTS model by always providing natural speech frames to the decoder; this model serves as the teacher.

Knowledge Distillation
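
A skeleton of the teacher-student idea in the abstract above: the teacher decoder always sees natural frames, the student runs free (feeding back its own predictions), and the student is trained to match the teacher's outputs frame by frame. The GRU cells and the L1 matching loss are stand-ins, not the paper's exact setup.

import torch, torch.nn as nn

teacher = nn.GRUCell(80, 80)
student = nn.GRUCell(80, 80)
student.load_state_dict(teacher.state_dict())   # initialize student from teacher
l1 = nn.L1Loss()

natural = torch.randn(50, 1, 80)                # 50 frames, batch of 1
h_t = h_s = torch.zeros(1, 80)
prev_s = torch.zeros(1, 80)
loss = torch.tensor(0.0)
for t in range(50):
    h_t = teacher(natural[t], h_t)              # teacher-forced input
    h_s = student(prev_s, h_s)                  # free-running input
    loss = loss + l1(h_s, h_t.detach())         # distill teacher into student
    prev_s = h_s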

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

Our proposed approach significantly improved intelligibility (in CER), MOS, and ABX discrimination scores compared to the official ZeroSpeech 2019 baseline, and even the topline.
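
A minimal sketch of the vector-quantization step behind VQ-VAE unit discovery as named in the title: each continuous frame embedding is snapped to its nearest codebook entry, and the chosen indices act as discrete "units". Codebook size and dimensions are assumptions.

import torch

codebook = torch.randn(64, 32)                  # 64 discrete units, dim 32

def quantize(z):                                # z: (frames, 32)
    dists = torch.cdist(z, codebook)            # (frames, 64) distances
    codes = dists.argmin(dim=-1)                # discovered unit sequence
    z_q = codebook[codes]
    # straight-through estimator: gradients flow from z_q back to z in training
    return z + (z_q - z).detach(), codes

z = torch.randn(100, 32)
z_q, units = quantize(z)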
