Search Results for author: Kyle Kastner

Found 18 papers, 8 papers with code

Zero-shot Cross-lingual Voice Transfer for TTS

no code implementations20 Sep 2024 Fadi Biadsy, Youzheng Chen, Isaac Elias, Kyle Kastner, Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran

In this paper, we introduce a zero-shot Voice Transfer (VT) module that can be seamlessly integrated into a multilingual text-to-speech (TTS) system to transfer an individual's voice across languages.

Text-to-Speech
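A minimal sketch of how such a voice-transfer module might plug into a TTS decoder: a speaker encoder maps reference audio to a fixed embedding that conditions frame generation. The classes, layers, and dimensions here (SpeakerEncoder, ConditionedDecoderStep, d_spk) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Maps reference-audio mel features to a fixed-size speaker embedding."""
    def __init__(self, n_mels=80, d_spk=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, d_spk, batch_first=True)

    def forward(self, ref_mels):                       # (B, T, n_mels)
        _, h = self.rnn(ref_mels)                      # h: (1, B, d_spk)
        return nn.functional.normalize(h[-1], dim=-1)  # (B, d_spk)

class ConditionedDecoderStep(nn.Module):
    """One TTS decoder step that consumes text state plus speaker embedding."""
    def __init__(self, d_text=512, d_spk=256, n_mels=80):
        super().__init__()
        self.proj = nn.Linear(d_text + d_spk, n_mels)

    def forward(self, text_state, spk_emb):
        return self.proj(torch.cat([text_state, spk_emb], dim=-1))

# One short reference utterance yields a speaker embedding that can then
# condition synthesis in any of the system's languages.
spk = SpeakerEncoder()(torch.randn(1, 200, 80))
frame = ConditionedDecoderStep()(torch.randn(1, 512), spk)
```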

High-precision Voice Search Query Correction via Retrievable Speech-text Embeddings

no code implementations8 Jan 2024 Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic

In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio. The embeddings of the utterance audio and of the candidate corrections are produced by multimodal speech-text embedding networks, trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript.

Automatic Speech Recognition (ASR) +1
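A hedged sketch of the retrieval step described above: candidate corrections are embedded offline with a text encoder, the utterance is embedded online with an audio encoder, and the closest corrections are retrieved by cosine similarity. The random arrays stand in for the paper's multimodal embedding networks.

```python
import numpy as np

def cosine_topk(query_emb, correction_embs, k=5):
    """Return indices and scores of the k corrections closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = correction_embs / np.linalg.norm(correction_embs, axis=1, keepdims=True)
    scores = c @ q                          # cosine similarity per candidate
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Offline: corrections embedded with the *text* tower; online: the utterance
# embedded with the *audio* tower, so no ASR hypothesis is needed for lookup.
corrections = np.random.randn(10000, 512)   # placeholder text embeddings
audio_query = np.random.randn(512)          # placeholder audio embedding
idx, scores = cosine_topk(audio_query, corrections)
```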

Understanding Shared Speech-Text Representations

no code implementations27 Apr 2023 Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang

Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and Speech Translation (ST) performance.

Automatic Speech Recognition (ASR) +2

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

no code implementations30 Jun 2022 Kyle Kastner, Aaron Courville

This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis.

Decoder Speech Synthesis +3
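A schematic of the two-part design, built from stub models: an autoregressive frontend emits mel frames one at a time (in the spirit of MelNet's first tier), and a WaveRNN-style backend decodes samples conditioned on the mels. Names, shapes, and the lack of real upsampling are assumptions for brevity, not the released architecture.

```python
import torch
import torch.nn as nn

class MelFrontend(nn.Module):
    """Autoregressive frontend emitting one mel frame at a time."""
    def __init__(self, n_mels=80, d=256):
        super().__init__()
        self.rnn = nn.GRUCell(n_mels, d)
        self.out = nn.Linear(d, n_mels)

    def generate(self, n_frames):
        h = torch.zeros(1, self.rnn.hidden_size)
        frame = torch.zeros(1, self.out.out_features)
        frames = []
        for _ in range(n_frames):
            h = self.rnn(frame, h)          # condition on the previous frame
            frame = self.out(h)
            frames.append(frame)
        return torch.stack(frames, dim=1)   # (1, n_frames, n_mels)

class WaveRNNBackend(nn.Module):
    """Sample-level RNN vocoder conditioned on mel frames."""
    def __init__(self, n_mels=80, d=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels + 1, d, batch_first=True)
        self.out = nn.Linear(d, 1)

    def forward(self, mels, prev_samples):  # (B, T, n_mels), (B, T, 1)
        y, _ = self.rnn(torch.cat([mels, prev_samples], dim=-1))
        return self.out(y)                  # next-sample predictions

# In a real pipeline the mels would be upsampled to the audio rate; the two
# stages here share a time axis only to keep the sketch short.
mels = MelFrontend().generate(100)                       # stage 1: mel frames
audio = WaveRNNBackend()(mels, torch.zeros(1, 100, 1))   # stage 2: waveform
```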

Planning in Dynamic Environments with Conditional Autoregressive Models

1 code implementation25 Nov 2018 Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek

We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS.
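To make the planning loop concrete, here is a flat Monte Carlo simplification of MCTS in which a learned forward model replaces the environment simulator; `forward_model` and `reward_fn` are assumed interfaces, and a full implementation would add tree statistics and UCB-style selection.

```python
import random

N_ACTIONS = 4  # assumed small discrete action set

def rollout_value(state, forward_model, reward_fn, depth=10):
    """Estimate a state's value by simulating random actions in latent space."""
    total = 0.0
    for _ in range(depth):
        action = random.randrange(N_ACTIONS)
        state = forward_model(state, action)  # model-predicted next latent state
        total += reward_fn(state)
    return total

def choose_action(root, forward_model, reward_fn, n_sims=100):
    """Flat search: pick the action whose simulated futures score best."""
    per_action = max(1, n_sims // N_ACTIONS)
    scores = {}
    for a in range(N_ACTIONS):
        child = forward_model(root, a)
        scores[a] = sum(rollout_value(child, forward_model, reward_fn)
                        for _ in range(per_action))
    return max(scores, key=scores.get)
```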

Harmonic Recomposition using Conditional Autoregressive Modeling

1 code implementation18 Nov 2018 Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville

We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017).

Representation Mixing for TTS Synthesis

no code implementations17 Nov 2018 Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville

Recent character- and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation.
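A hedged sketch of mixing character and phoneme inputs in a single encoder, the core idea the title refers to; the shared embedding dimensions and the per-position boolean mask are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixedEmbedding(nn.Module):
    """Embeds a sequence where each position may be a character or a phoneme."""
    def __init__(self, n_chars=64, n_phones=96, d=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)
        self.phone_emb = nn.Embedding(n_phones, d)

    def forward(self, char_ids, phone_ids, use_phone_mask):
        # use_phone_mask: (B, T) bool -- per position, take phoneme or character
        c = self.char_emb(char_ids)
        p = self.phone_emb(phone_ids)
        return torch.where(use_phone_mask.unsqueeze(-1), p, c)

emb = MixedEmbedding()
chars = torch.randint(0, 64, (2, 12))
phones = torch.randint(0, 96, (2, 12))
mask = torch.rand(2, 12) < 0.5       # randomly mix representations per position
x = emb(chars, phones, mask)         # (2, 12, 256)
```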

Learning Distributed Representations from Reviews for Collaborative Filtering

no code implementations18 Jun 2018 Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville

However, interestingly, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.

Collaborative Filtering
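To illustrate the regularization framing in the excerpt above, here is a sketch of matrix factorization with an auxiliary review-reconstruction loss tied to the product embedding; the bag-of-words output head is one simple stand-in for the review models the paper studies.

```python
import torch
import torch.nn as nn

class ReviewRegularizedMF(nn.Module):
    def __init__(self, n_users, n_items, vocab, d=64):
        super().__init__()
        self.user = nn.Embedding(n_users, d)
        self.item = nn.Embedding(n_items, d)
        self.word_out = nn.Linear(d, vocab)  # predicts review words from item

    def forward(self, u, i, rating, review_bow, alpha=0.1):
        pred = (self.user(u) * self.item(i)).sum(-1)
        mf_loss = ((pred - rating) ** 2).mean()
        # Auxiliary loss: the item embedding must also explain its review
        # words, tying the representation to review content (the regularizer).
        logits = self.word_out(self.item(i))
        rev_loss = nn.functional.binary_cross_entropy_with_logits(
            logits, review_bow)
        return mf_loss + alpha * rev_loss

model = ReviewRegularizedMF(n_users=100, n_items=200, vocab=5000)
loss = model(torch.tensor([3]), torch.tensor([7]),
             torch.tensor([4.0]), torch.zeros(1, 5000))
```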

Learning to Discover Sparse Graphical Models

1 code implementation ICML 2017 Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.
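An illustrative sketch of the learned-function idea: train a network to map empirical covariance matrices to edge predictions, supervised by synthetic sparse precision matrices. The MLP and the synthetic generator are stand-ins, not the paper's architecture.

```python
import numpy as np
import torch
import torch.nn as nn

P = 10  # number of variables
net = nn.Sequential(nn.Linear(P * P, 256), nn.ReLU(), nn.Linear(256, P * P))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    # Sample a sparse precision matrix, then simulate data from it.
    prec = np.eye(P) + 0.3 * np.triu(np.random.rand(P, P) < 0.1, 1)
    prec = (prec + prec.T) / 2 + P * np.eye(P)  # diagonally dominant -> PD
    x = np.random.multivariate_normal(np.zeros(P), np.linalg.inv(prec), size=50)
    emp_cov = torch.tensor(np.cov(x.T), dtype=torch.float32)
    target = torch.tensor((np.abs(prec) > 1e-6) & ~np.eye(P, dtype=bool),
                          dtype=torch.float32)
    # Learn covariance -> edge-probability map; sparsity is absorbed from data.
    logits = net(emp_cov.flatten()).reshape(P, P)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```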

A Recurrent Latent Variable Model for Sequential Data

5 code implementations NeurIPS 2015 Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

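A minimal single-step sketch of the resulting VRNN recurrence: a prior over z_t conditioned on the hidden state, an approximate posterior that also sees x_t, and a recurrence that consumes both x_t and z_t. The diagonal-Gaussian parameterization follows the common VRNN setup; the exact layers are illustrative.

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    def __init__(self, x_dim=28, z_dim=16, h_dim=64):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)        # p(z_t | h_{t-1})
        self.enc = nn.Linear(x_dim + h_dim, 2 * z_dim)  # q(z_t | x_t, h_{t-1})
        self.dec = nn.Linear(z_dim + h_dim, x_dim)      # p(x_t | z_t, h_{t-1})
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)     # h_t = f(x_t, z_t, h_{t-1})

    def forward(self, x, h):
        pm, plv = self.prior(h).chunk(2, dim=-1)
        qm, qlv = self.enc(torch.cat([x, h], -1)).chunk(2, dim=-1)
        z = qm + torch.randn_like(qm) * (0.5 * qlv).exp()  # reparameterization
        x_recon = self.dec(torch.cat([z, h], -1))
        h_next = self.rnn(torch.cat([x, z], -1), h)
        # KL between diagonal Gaussians q and p: the per-step latent regularizer
        kl = 0.5 * (plv - qlv
                    + (qlv.exp() + (qm - pm) ** 2) / plv.exp() - 1).sum(-1)
        return x_recon, h_next, kl

cell = VRNNCell()
x, h = torch.randn(8, 28), torch.zeros(8, 64)
x_recon, h, kl = cell(x, h)  # training loss = reconstruction + kl, per step
```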
