Search Results for author: Kyle Kastner

Found 14 papers, 8 papers with code

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

no code implementations29 Feb 2024 Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% to ground truth).

Representation Learning Speech Synthesis
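The headline metric above is character error rate (CER) on ASR transcripts of the synthesized speech. The paper's evaluation pipeline is not public; the sketch below only illustrates how CER itself is computed, as edit distance normalized by reference length.

```python
# Character error rate (CER): Levenshtein edit distance between a hypothesis
# transcript and a reference transcript, divided by the reference length.
# Illustrative only; not the paper's actual evaluation code.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    return edit_distance(hypothesis, reference) / max(len(reference), 1)

print(cer("helo wrld", "hello world"))  # two missing characters
```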

High-precision Voice Search Query Correction via Retrievable Speech-text Embeddings

no code implementations8 Jan 2024 Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic

In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio. The embeddings of the utterance audio and of the candidate corrections are produced by multimodal speech-text embedding networks trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript.

Automatic Speech Recognition (ASR) +1
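The retrieval step described above can be sketched as a nearest-neighbor lookup in the shared embedding space. The embeddings below are random stand-ins (in the paper they come from jointly trained speech-text networks), and all names are illustrative.

```python
import numpy as np

# Hypothetical sketch: an audio embedding of the utterance queries a database
# of candidate-correction text embeddings by cosine similarity. Random vectors
# stand in for the trained speech-text embedding networks.

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve(audio_emb, correction_embs, top_k=1):
    """Return indices of the top_k corrections by cosine similarity."""
    sims = normalize(correction_embs) @ normalize(audio_emb)
    return np.argsort(-sims)[:top_k]

rng = np.random.default_rng(0)
db = normalize(rng.normal(size=(100, 64)))   # 100 candidate corrections
query = db[42] + 0.05 * rng.normal(size=64)  # audio embedding near entry 42
print(retrieve(query, db))                   # entry 42 should rank first
```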

Understanding Shared Speech-Text Representations

no code implementations27 Apr 2023 Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang

Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and Speech Translation (ST) performance.

Automatic Speech Recognition (ASR) +2

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

no code implementations30 Jun 2022 Kyle Kastner, Aaron Courville

This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis.

Decoder Speech Synthesis +1

Planning in Dynamic Environments with Conditional Autoregressive Models

1 code implementation25 Nov 2018 Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek

We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS.
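The planning loop above can be reduced to a toy: score candidate action sequences by rolling them out through a forward model of the environment. In the paper the forward model is a conditional autoregressive network over discrete VQ-VAE latents and the search is MCTS; here a hand-written transition table and exhaustive enumeration stand in for both.

```python
import itertools

# Toy sketch of forward planning with a (here: trivial, deterministic)
# forward model over a discrete state space. Hypothetical dynamics.

TRANSITIONS = {  # (state, action) -> next state
    (0, "L"): 0, (0, "R"): 1,
    (1, "L"): 0, (1, "R"): 2,
    (2, "L"): 1, (2, "R"): 2,
}
GOAL = 2

def plan(state, horizon=2):
    """Exhaustively score action sequences with the forward model."""
    best, best_score = None, -1
    for seq in itertools.product("LR", repeat=horizon):
        s = state
        for a in seq:
            s = TRANSITIONS[(s, a)]
        score = 1 if s == GOAL else 0
        if score > best_score:
            best, best_score = seq, score
    return best

print(plan(0))  # ('R', 'R') reaches the goal
```

MCTS replaces the exhaustive enumeration with guided sampling, which matters once the action space and horizon grow.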

Harmonic Recomposition using Conditional Autoregressive Modeling

1 code implementation18 Nov 2018 Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville

We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017).

Representation Mixing for TTS Synthesis

no code implementations17 Nov 2018 Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville

Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation.

Learning Distributed Representations from Reviews for Collaborative Filtering

no code implementations18 Jun 2018 Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville

However, interestingly, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.

Collaborative Filtering Recommendation Systems

Learning to Discover Sparse Graphical Models

1 code implementation ICML 2017 Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.

A Recurrent Latent Variable Model for Sequential Data

5 code implementations NeurIPS 2015 Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.
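The generative pass of that model (the VRNN) can be sketched in a few lines: at each step a latent z_t is sampled from a prior conditioned on the previous RNN state, the output is generated from (z_t, h), and the state is updated from all three. The weights below are random stand-ins; a real model learns them by optimizing the variational lower bound.

```python
import numpy as np

# Minimal numpy sketch of a VRNN-style generative pass. Shapes and weight
# names are illustrative stand-ins for the learned networks in the paper.

rng = np.random.default_rng(0)
H, Z, X, T = 8, 4, 3, 5  # state, latent, output dims; sequence length
Wp = rng.normal(size=(H, Z))          # prior network (here: linear)
Wx = rng.normal(size=(H + Z, X))      # output/decoder network
Wh = rng.normal(size=(H + Z + X, H))  # recurrence network

h = np.zeros(H)
outputs = []
for t in range(T):
    mu_z = h @ Wp                                # prior mean from h_{t-1}
    z = mu_z + rng.normal(size=Z)                # sample z_t ~ N(mu_z, I)
    x = np.tanh(np.concatenate([h, z]) @ Wx)     # generate x_t from (h, z)
    h = np.tanh(np.concatenate([h, z, x]) @ Wh)  # update h_t from (h, z, x)
    outputs.append(x)

print(np.stack(outputs).shape)  # (5, 3)
```

The key difference from a plain RNN generator is that z_t injects per-step stochasticity whose prior itself depends on the recurrent state.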
