Search Results for author: Eric Battenberg

Found 13 papers, 4 papers with code

Learning the joint distribution of two sequences using little or no paired data

no code implementations • 6 Dec 2022 • Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan

We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available.

Variational Inference

Speaker Generation

no code implementations • 7 Nov 2021 • Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao

We explore the task of synthesizing speech in novel, nonexistent human voices. We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at it.

Transfer Learning

Non-saturating GAN training as divergence minimization

no code implementations • 15 Oct 2020 • Matt Shannon, Ben Poole, Soroosh Mariooryad, Tom Bagby, Eric Battenberg, David Kao, Daisy Stanton, RJ Skerry-Ryan

Non-saturating generative adversarial network (GAN) training is widely used and has continued to obtain groundbreaking results.

Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

3 code implementations • 23 Oct 2019 • Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby

Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text.

Speech Synthesis
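The location-relative mechanisms studied in this paper (GMM-based attention and Dynamic Convolution Attention) share the property that each decoder step computes its alignment relative to the previous step's position, so attention can only drift forward through the input text. A minimal single-step sketch of GMM attention in NumPy, with illustrative parameter names and shapes that are assumptions rather than the paper's exact configuration:

```python
import numpy as np

def gmm_attention_step(prev_means, deltas, log_sigmas, weights_logits, num_enc):
    """One step of GMM-based location-relative attention.

    Mixture means advance monotonically from the previous step (the shift is
    passed through a softplus so it is always positive), which is what makes
    this family of mechanisms robust on long utterances.
    """
    means = prev_means + np.log1p(np.exp(deltas))      # softplus: strictly positive shift
    sigmas = np.exp(log_sigmas)
    mix = np.exp(weights_logits - weights_logits.max())
    mix /= mix.sum()                                   # softmax over mixture components
    positions = np.arange(num_enc)[None, :]            # (1, num_enc) encoder positions
    probs = np.exp(-0.5 * ((positions - means[:, None]) / sigmas[:, None]) ** 2)
    probs /= sigmas[:, None] * np.sqrt(2 * np.pi)      # Gaussian density per component
    alignment = (mix[:, None] * probs).sum(axis=0)     # (num_enc,) attention weights
    return alignment, means

# Run a few decoder steps: the means (alignment positions) only move forward.
means = np.zeros(3)
for _ in range(5):
    alignment, means = gmm_attention_step(
        means,
        deltas=np.full(3, 0.5),
        log_sigmas=np.zeros(3),
        weights_logits=np.zeros(3),
        num_enc=50,
    )
print(means)  # strictly increasing across steps
```

Because the means can never move backward, the model cannot loop or jump around the input, which is the failure mode behind repeated or skipped words in long-form synthesis.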

Semi-Supervised Generative Modeling for Controllable Speech Synthesis

no code implementations • ICLR 2020 • Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby

We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models.

Speech Synthesis

Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

no code implementations • 8 Jun 2019 • Eric Battenberg, Soroosh Mariooryad, Daisy Stanton, RJ Skerry-Ryan, Matt Shannon, David Kao, Tom Bagby

Recent work has explored sequence-to-sequence latent variable models for expressive speech synthesis (supporting control and transfer of prosody and style), but has not presented a coherent framework for understanding the trade-offs between the competing methods.

Expressive Speech Synthesis, Style Transfer

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

2 code implementations • ICML 2018 • RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous

We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody.

Expressive Speech Synthesis

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

11 code implementations • ICML 2018 • Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous

In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system.

Speech Synthesis, Style Transfer, +1
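The core of a global style token layer can be sketched as attention over a small bank of embeddings, with the query vector coming from a reference encoder. The NumPy sketch below uses a single attention head and random initialization for brevity; the dimensions and single-head simplification are illustrative assumptions, not the paper's exact configuration (which trains the bank jointly with Tacotron and uses multi-head attention).

```python
import numpy as np

rng = np.random.default_rng(0)

num_tokens, token_dim, query_dim = 10, 256, 128

# Bank of style token embeddings (learnable in the real model; random here).
tokens = rng.standard_normal((num_tokens, token_dim))

# Projection of tokens into the query space (attention keys).
W_k = rng.standard_normal((token_dim, query_dim)) / np.sqrt(token_dim)

def style_embedding(query):
    """Attend over the token bank with a reference-encoder query vector."""
    keys = tokens @ W_k                        # (num_tokens, query_dim)
    scores = keys @ query / np.sqrt(query_dim)  # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over the tokens
    return weights @ tokens                    # (token_dim,) weighted sum

query = rng.standard_normal(query_dim)         # stand-in for the reference encoder output
emb = style_embedding(query)
print(emb.shape)  # (256,)
```

At inference time, style can be controlled either by conditioning on a reference utterance (producing the query) or by picking attention weights over the tokens directly.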

Exploring Neural Transducers for End-to-End Speech Recognition

no code implementations • 24 Jul 2017 • Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition.

Language Modelling, Speech Recognition, +1
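Of the three model families compared, CTC is the simplest to write down: it marginalizes over all blank-augmented alignments between the acoustic frames and the label sequence with a forward recursion. A toy NumPy sketch of that recursion, kept in probability space for readability (real implementations work in log space to avoid underflow):

```python
import numpy as np

def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC, via the forward algorithm.

    probs: (T, V) per-frame label probabilities (each row sums to 1).
    labels: target label sequence, without blanks.
    """
    T = probs.shape[0]
    ext = [blank]
    for l in labels:
        ext += [l, blank]                    # interleave blanks: length 2U + 1
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]           # start with blank...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]       # ...or the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]              # stay on the same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]     # advance by one
            # Skipping a blank is allowed only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the last label or the trailing blank.
    likelihood = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(likelihood)

probs = np.array([[0.5, 0.5],
                  [0.5, 0.5]])               # T=2 frames, V=2, index 0 is the blank
print(ctc_loss(probs, [1]))                  # -log(0.75): paths (blank,1), (1,blank), (1,1)
```

RNN-Transducer and attention-based Seq2Seq models drop CTC's per-frame conditional-independence assumption, which is the main axis of the paper's comparison.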

Reducing Bias in Production Speech Models

no code implementations • 11 May 2017 • Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition.

Object Recognition
