no code implementations • 29 Feb 2024 • Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% to ground truth).
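For context, character error rate (CER) is just Levenshtein edit distance normalized by the reference length; a minimal sketch of the metric (the `cer` helper is illustrative, not code from the paper):

```python
# Minimal character error rate (CER): Levenshtein distance between the
# hypothesis and the reference, normalized by reference length.
def cer(ref: str, hyp: str) -> float:
    m, n = len(ref), len(hyp)
    # dp[j] holds the edit distance between ref[:i] and hyp[:j] as we sweep rows.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n] / max(m, 1)

print(cer("hello world", "helo world"))  # 1 deletion / 11 chars ≈ 0.09
```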
no code implementations • 8 Jan 2024 • Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic
In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio; both the utterance-audio embeddings and the candidate-correction embeddings are produced by multimodal speech-text embedding networks trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript.
Automatic Speech Recognition (ASR) +1
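A minimal sketch of the retrieval step this describes, assuming precomputed unit-norm embeddings and cosine similarity (the database contents and dimensions below are made up):

```python
import numpy as np

# Candidate corrections and their (hypothetical) text-side embeddings,
# produced offline by the multimodal speech-text embedding network.
corrections = ["play led zeppelin", "call aunt zelda", "navigate to zilker park"]
rng = np.random.default_rng(0)
db = rng.normal(size=(len(corrections), 128))    # stand-in embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalize rows

def retrieve(audio_embedding: np.ndarray, k: int = 2):
    """Query the correction database directly with the utterance-audio
    embedding; because audio and text were trained into a shared space,
    no ASR hypothesis text is needed (avoiding hypothesis-audio mismatch)."""
    q = audio_embedding / np.linalg.norm(audio_embedding)
    scores = db @ q                              # cosine similarity
    top = np.argsort(-scores)[:k]
    return [(corrections[i], float(scores[i])) for i in top]

# In practice the query comes from the audio encoder; here it is random.
print(retrieve(rng.normal(size=128)))
```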
no code implementations • 27 Apr 2023 • Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang
Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and speech translation (ST) performance.
Automatic Speech Recognition (ASR) +2
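As a rough illustration of the general text-injection recipe (not Maestro's actual objective or architecture), a toy PyTorch sketch with hypothetical module names, mean-pooled consistency in place of learned alignment, and an arbitrary loss weight:

```python
import torch
import torch.nn as nn

class JointSpeechText(nn.Module):
    """Toy joint model: a speech encoder and a text encoder map into one
    shared space; an ASR head reads from it, and a consistency loss pulls
    paired speech/text representations together."""
    def __init__(self, n_mels=80, vocab=64, dim=256):
        super().__init__()
        self.speech_enc = nn.GRU(n_mels, dim, batch_first=True)
        self.embed = nn.Embedding(vocab, dim)
        self.text_enc = nn.GRU(dim, dim, batch_first=True)
        self.asr_head = nn.Linear(dim, vocab)

    def forward(self, mels, tokens):
        s, _ = self.speech_enc(mels)               # (B, T_speech, dim)
        t, _ = self.text_enc(self.embed(tokens))   # (B, T_text, dim)
        logits = self.asr_head(s)                  # per-frame token logits
        # Mean-pooled consistency; real systems align the two sequences
        # (e.g., via durations) rather than pooling them away.
        consistency = (s.mean(1) - t.mean(1)).pow(2).mean()
        return logits, consistency

model = JointSpeechText()
mels = torch.randn(4, 100, 80)
tokens = torch.randint(0, 64, (4, 12))
logits, consistency = model(mels, tokens)
targets = torch.randint(1, 64, (4, 12))            # avoid the CTC blank (0)
ctc = nn.CTCLoss()
loss = ctc(logits.log_softmax(-1).transpose(0, 1), targets,
           torch.full((4,), 100, dtype=torch.long),
           torch.full((4,), 12, dtype=torch.long)) + 0.1 * consistency
loss.backward()
```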
no code implementations • 30 Jun 2022 • Kyle Kastner, Aaron Courville
This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis.
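Schematically, the two parts are a frame-level autoregressive spectral model feeding a sample-level vocoder; a stand-in sketch of that pipeline, with both models replaced by random stubs:

```python
import numpy as np

rng = np.random.default_rng(0)

def frontend_step(text_ids, mel_so_far):
    """Stand-in for the MelNet-tier frontend: predict the next coarse
    mel frame given text and previously generated frames (autoregressive)."""
    return rng.normal(size=80)      # one 80-bin mel frame

def backend_vocoder(mels, hop=256):
    """Stand-in for the WaveRNN-style backend: decode audio samples
    conditioned on the (upsampled) mel frames."""
    return rng.normal(size=len(mels) * hop)

text_ids = [3, 17, 42]
mels = []
for _ in range(20):                 # frame-level autoregression
    mels.append(frontend_step(text_ids, mels))
audio = backend_vocoder(mels)       # sample-level decoding
print(len(mels), audio.shape)
```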
1 code implementation • ICLR 2022 • Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel
Musical expression requires control of both what notes are played and how they are performed.
1 code implementation • 25 Nov 2018 • Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek
We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS.
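To make the planning setup concrete, here is a simplified Monte Carlo rollout planner over discrete latent codes; full MCTS would add a search tree with UCB selection and value backup, and both the learned conditional prior and the reward are replaced by toy stubs:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 32  # size of the discrete latent codebook (as in a VQ-VAE)

def dynamics(z, action):
    """Stand-in for the learned conditional autoregressive prior
    p(z_next | z, action); a deterministic toy map here."""
    return (z * 31 + action * 7 + 1) % K

def reward(z):
    """Stand-in reward; a real system decodes z and scores the result."""
    return -abs(z - K // 2)

def plan(z0, actions=(0, 1, 2), depth=5, n_rollouts=64):
    """Pick the first action whose random rollouts score best on average."""
    best, best_value = None, -np.inf
    for a in actions:
        total = 0.0
        for _ in range(n_rollouts):
            z = dynamics(z0, a)
            total += reward(z)
            for _ in range(depth - 1):
                z = dynamics(z, rng.choice(actions))
                total += reward(z)
        if total / n_rollouts > best_value:
            best, best_value = a, total / n_rollouts
    return best

print(plan(z0=3))
```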
1 code implementation • 18 Nov 2018 • Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville
We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017).
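Recomposition in this setting amounts to freezing part of a discrete token sequence and resampling the rest from a conditional autoregressive model; a toy sketch with a random bigram table standing in for the trained prior:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 12                                        # toy pitch-class vocabulary
bigram = rng.dirichlet(np.ones(V), size=V)    # stand-in for the learned prior

def model_probs(prev_token):
    """Stand-in for a trained conditional autoregressive model p(x_t | x_<t);
    a real system conditions on the full prefix, not just the last token."""
    return bigram[prev_token]

def recompose(tokens, keep_mask, temperature=1.0):
    """Resample positions where keep_mask is False, left to right, each
    draw conditioned on the (partly original, partly resampled) prefix."""
    out = list(tokens)
    for t in range(1, len(out)):
        if not keep_mask[t]:
            p = model_probs(out[t - 1]) ** (1.0 / temperature)
            out[t] = rng.choice(V, p=p / p.sum())
    return out

melody = [0, 4, 7, 4, 0, 7, 4, 0]
keep = [True, True, False, False, True, False, False, True]
print(recompose(melody, keep))
```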
no code implementations • 17 Nov 2018 • Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville
Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation.
1 code implementation • 12 Nov 2018 • Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville
We explore blindfold (question-only) baselines for Embodied Question Answering.
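A blindfold baseline answers from the question text alone, never observing the environment; strong scores from such a baseline expose dataset bias. A minimal sketch of the idea using a bag-of-words classifier (questions and answers below are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

questions = [
    "what color is the sofa in the living room",
    "what color is the car in the garage",
    "what room is the oven located in",
    "what room is the television located in",
]
answers = ["brown", "silver", "kitchen", "living room"]

X = CountVectorizer().fit_transform(questions)   # bag-of-words features only
clf = LogisticRegression(max_iter=1000).fit(X, answers)
print(clf.predict(X))  # no visual observation was ever used
```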
no code implementations • 18 Jun 2018 • Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville
However, interestingly, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.
1 code implementation • ICML 2017 • Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko
Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.
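The key move is to train on synthetic pairs drawn from the desired structural prior, mapping empirical covariances to edge indicators. A simplified sketch: the real model consumes the whole covariance matrix with a richer network, not one scalar per edge as here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p, n_samples = 5, 200

def sample_task():
    """Draw a sparse precision matrix (the structural prior), sample data
    from it, and return (per-edge empirical covariances, true edge labels)."""
    prec = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < 0.2:               # sparse edges
                prec[i, j] = prec[j, i] = 0.4
    # Shift the spectrum if needed so the precision stays positive definite.
    prec += np.eye(p) * max(0.0, 1e-3 - np.linalg.eigvalsh(prec).min())
    x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), n_samples)
    emp_cov = np.cov(x, rowvar=False)
    iu = np.triu_indices(p, 1)
    return emp_cov[iu], (prec[iu] != 0).astype(int)

# Train a per-edge classifier from empirical covariance to edge presence;
# the learned function absorbs the sparsity prior instead of a likelihood.
feats, labels = zip(*[sample_task() for _ in range(500)])
X = np.concatenate(feats)[:, None]
y = np.concatenate(labels)
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```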
2 code implementations • 22 Nov 2015 • Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville
Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features.
Ranked #18 on Semantic Segmentation on CamVid
5 code implementations • NeurIPS 2015 • Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.
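One VRNN step wires a VAE into the recurrence: the prior and approximate posterior over z_t both condition on h_{t-1}, and the hidden-state update consumes both x_t and the sampled z_t. A minimal PyTorch sketch (Gaussians everywhere, dimensions arbitrary):

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    """One step of a variational RNN (illustrative): prior and posterior over
    z_t both condition on h_{t-1}; the recurrence then consumes x_t and z_t."""
    def __init__(self, x_dim=8, z_dim=4, h_dim=16):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)             # -> mu, logvar
        self.posterior = nn.Linear(h_dim + x_dim, 2 * z_dim)
        self.decoder = nn.Linear(h_dim + z_dim, x_dim)       # p(x_t | z_t, h)
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h):
        pm, plv = self.prior(h).chunk(2, dim=-1)
        qm, qlv = self.posterior(torch.cat([h, x_t], -1)).chunk(2, dim=-1)
        z = qm + torch.randn_like(qm) * (0.5 * qlv).exp()    # reparameterize
        recon = self.decoder(torch.cat([h, z], -1))
        # KL between the two diagonal Gaussians q(z|x,h) and p(z|h).
        kl = 0.5 * (plv - qlv + (qlv.exp() + (qm - pm) ** 2) / plv.exp() - 1).sum(-1)
        h_next = self.rnn(torch.cat([x_t, z], -1), h)
        return recon, kl, h_next

cell = VRNNCell()
x = torch.randn(2, 10, 8)                   # (batch, time, features)
h = torch.zeros(2, 16)
loss = 0.0
for t in range(x.size(1)):
    recon, kl, h = cell(x[:, t], h)
    loss = loss + ((recon - x[:, t]) ** 2).sum(-1) + kl
loss.mean().backward()
```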
4 code implementations • 3 May 2015 • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio
In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks.
Ranked #34 on Image Classification on MNIST
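A ReNet layer replaces convolution with RNN sweeps: a bidirectional RNN across each row, then a second one across each column of the result, so every output position sees full-image context. A minimal PyTorch sketch (patch flattening omitted); for ReSeg-style segmentation, the same layer is stacked on pretrained convolutional feature maps:

```python
import torch
import torch.nn as nn

class ReNetLayer(nn.Module):
    """Sweep a bidirectional RNN horizontally across rows, then vertically
    across columns, instead of convolving (as in ReNet)."""
    def __init__(self, in_ch, hid):
        super().__init__()
        self.row_rnn = nn.GRU(in_ch, hid, batch_first=True, bidirectional=True)
        self.col_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_rnn(rows)            # sweep left<->right per row
        cols = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_rnn(cols)            # sweep up<->down per column
        return cols.reshape(b, w, h, -1).permute(0, 3, 2, 1)  # (B, 2*hid, H, W)

layer = ReNetLayer(in_ch=3, hid=32)
print(layer(torch.randn(2, 3, 16, 16)).shape)   # torch.Size([2, 64, 16, 16])
```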