no code implementations • 29 Feb 2024 • Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% to ground truth).
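For context, character error rate (CER) is just Levenshtein edit distance normalized by the reference length; a minimal sketch of the metric (the `cer` helper is illustrative, not code from the paper):

```python
# Minimal character error rate (CER): Levenshtein distance between the
# hypothesis and the reference, normalized by reference length.
def cer(ref: str, hyp: str) -> float:
    m, n = len(ref), len(hyp)
    # dp[j] holds the edit distance between ref[:i] and hyp[:j] as we sweep rows.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n] / max(m, 1)

print(cer("hello world", "helo world"))  # 1 deletion / 11 chars ≈ 0.09
```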
no code implementations • 8 Jan 2024 • Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic
In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio; both the utterance-audio embeddings and the candidate-correction embeddings are produced by multimodal speech-text embedding networks trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript.
Automatic Speech Recognition (ASR) +1
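A minimal sketch of the retrieval step this describes, assuming precomputed unit-norm embeddings and cosine similarity (the database contents and dimensions below are made up):

```python
import numpy as np

# Candidate corrections and their (hypothetical) text-side embeddings,
# produced offline by the multimodal speech-text embedding network.
corrections = ["play led zeppelin", "call aunt zelda", "navigate to zilker park"]
rng = np.random.default_rng(0)
db = rng.normal(size=(len(corrections), 128))    # stand-in embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalize rows

def retrieve(audio_embedding: np.ndarray, k: int = 2):
    """Query the correction database directly with the utterance-audio
    embedding; because audio and text were trained into a shared space,
    no ASR hypothesis text is needed (avoiding hypothesis-audio mismatch)."""
    q = audio_embedding / np.linalg.norm(audio_embedding)
    scores = db @ q                              # cosine similarity
    top = np.argsort(-scores)[:k]
    return [(corrections[i], float(scores[i])) for i in top]

# In practice the query comes from the audio encoder; here it is random.
print(retrieve(rng.normal(size=128)))
```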
no code implementations • 27 Apr 2023 • Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang
Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and speech translation (ST) performance.
Automatic Speech Recognition (ASR) +2
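As a rough illustration of the general text-injection recipe (not Maestro's actual objective or architecture), a toy PyTorch sketch with hypothetical module names, mean-pooled consistency in place of learned alignment, and an arbitrary loss weight:

```python
import torch
import torch.nn as nn

class JointSpeechText(nn.Module):
    """Toy joint model: a speech encoder and a text encoder map into one
    shared space; an ASR head reads from it, and a consistency loss pulls
    paired speech/text representations together."""
    def __init__(self, n_mels=80, vocab=64, dim=256):
        super().__init__()
        self.speech_enc = nn.GRU(n_mels, dim, batch_first=True)
        self.embed = nn.Embedding(vocab, dim)
        self.text_enc = nn.GRU(dim, dim, batch_first=True)
        self.asr_head = nn.Linear(dim, vocab)

    def forward(self, mels, tokens):
        s, _ = self.speech_enc(mels)               # (B, T_speech, dim)
        t, _ = self.text_enc(self.embed(tokens))   # (B, T_text, dim)
        logits = self.asr_head(s)                  # per-frame token logits
        # Mean-pooled consistency; real systems align the two sequences
        # (e.g., via durations) rather than pooling them away.
        consistency = (s.mean(1) - t.mean(1)).pow(2).mean()
        return logits, consistency

model = JointSpeechText()
mels = torch.randn(4, 100, 80)
tokens = torch.randint(0, 64, (4, 12))
logits, consistency = model(mels, tokens)
targets = torch.randint(1, 64, (4, 12))            # avoid the CTC blank (0)
ctc = nn.CTCLoss()
loss = ctc(logits.log_softmax(-1).transpose(0, 1), targets,
           torch.full((4,), 100, dtype=torch.long),
           torch.full((4,), 12, dtype=torch.long)) + 0.1 * consistency
loss.backward()
```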
no code implementations • 30 Jun 2022 • Kyle Kastner, Aaron Courville
This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis.
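Schematically, the two parts are a frame-level autoregressive spectral model feeding a sample-level vocoder; a stand-in sketch of that pipeline, with both models replaced by random stubs:

```python
import numpy as np

rng = np.random.default_rng(0)

def frontend_step(text_ids, mel_so_far):
    """Stand-in for the MelNet-tier frontend: predict the next coarse
    mel frame given text and previously generated frames (autoregressive)."""
    return rng.normal(size=80)      # one 80-bin mel frame

def backend_vocoder(mels, hop=256):
    """Stand-in for the WaveRNN-style backend: decode audio samples
    conditioned on the (upsampled) mel frames."""
    return rng.normal(size=len(mels) * hop)

text_ids = [3, 17, 42]
mels = []
for _ in range(20):                 # frame-level autoregression
    mels.append(frontend_step(text_ids, mels))
audio = backend_vocoder(mels)       # sample-level decoding
print(len(mels), audio.shape)
```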
1 code implementation • ICLR 2022 • Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel
Musical expression requires control of both what notes are played and how they are performed.
1 code implementation • 25 Nov 2018 • Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek
We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS.
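To make the planning setup concrete, here is a simplified Monte Carlo rollout planner over discrete latent codes; full MCTS would add a search tree with UCB selection and value backup, and both the learned conditional prior and the reward are replaced by toy stubs:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 32  # size of the discrete latent codebook (as in a VQ-VAE)

def dynamics(z, action):
    """Stand-in for the learned conditional autoregressive prior
    p(z_next | z, action); a deterministic toy map here."""
    return (z * 31 + action * 7 + 1) % K

def reward(z):
    """Stand-in reward; a real system decodes z and scores the result."""
    return -abs(z - K // 2)

def plan(z0, actions=(0, 1, 2), depth=5, n_rollouts=64):
    """Pick the first action whose random rollouts score best on average."""
    best, best_value = None, -np.inf
    for a in actions:
        total = 0.0
        for _ in range(n_rollouts):
            z = dynamics(z0, a)
            total += reward(z)
            for _ in range(depth - 1):
                z = dynamics(z, rng.choice(actions))
                total += reward(z)
        if total / n_rollouts > best_value:
            best, best_value = a, total / n_rollouts
    return best

print(plan(z0=3))
```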
1 code implementation • 18 Nov 2018 • Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville
We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017).
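Recomposition in this setting amounts to freezing part of a discrete token sequence and resampling the rest from a conditional autoregressive model; a toy sketch with a random bigram table standing in for the trained prior:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 12                                        # toy pitch-class vocabulary
bigram = rng.dirichlet(np.ones(V), size=V)    # stand-in for the learned prior

def model_probs(prev_token):
    """Stand-in for a trained conditional autoregressive model p(x_t | x_<t);
    a real system conditions on the full prefix, not just the last token."""
    return bigram[prev_token]

def recompose(tokens, keep_mask, temperature=1.0):
    """Resample positions where keep_mask is False, left to right, each
    draw conditioned on the (partly original, partly resampled) prefix."""
    out = list(tokens)
    for t in range(1, len(out)):
        if not keep_mask[t]:
            p = model_probs(out[t - 1]) ** (1.0 / temperature)
            out[t] = rng.choice(V, p=p / p.sum())
    return out

melody = [0, 4, 7, 4, 0, 7, 4, 0]
keep = [True, True, False, False, True, False, False, True]
print(recompose(melody, keep))
```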
no code implementations • 17 Nov 2018 • Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville
Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation.
1 code implementation • 12 Nov 2018 • Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville
We explore blindfold (question-only) baselines for Embodied Question Answering.
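A blindfold baseline answers from the question text alone, never observing the environment; strong scores from such a baseline expose dataset bias. A minimal sketch of the idea using a bag-of-words classifier (questions and answers below are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

questions = [
    "what color is the sofa in the living room",
    "what color is the car in the garage",
    "what room is the oven located in",
    "what room is the television located in",
]
answers = ["brown", "silver", "kitchen", "living room"]

X = CountVectorizer().fit_transform(questions)   # bag-of-words features only
clf = LogisticRegression(max_iter=1000).fit(X, answers)
print(clf.predict(X))  # no visual observation was ever used
```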
no code implementations • 18 Jun 2018 • Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville
However, interestingly, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.
1 code implementation • ICML 2017 • Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko
Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.
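The key move is to train on synthetic pairs drawn from the desired structural prior, mapping empirical covariances to edge indicators. A simplified sketch: the real model consumes the whole covariance matrix with a richer network, not one scalar per edge as here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p, n_samples = 5, 200

def sample_task():
    """Draw a sparse precision matrix (the structural prior), sample data
    from it, and return (per-edge empirical covariances, true edge labels)."""
    prec = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < 0.2:               # sparse edges
                prec[i, j] = prec[j, i] = 0.4
    # Shift the spectrum if needed so the precision stays positive definite.
    prec += np.eye(p) * max(0.0, 1e-3 - np.linalg.eigvalsh(prec).min())
    x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), n_samples)
    emp_cov = np.cov(x, rowvar=False)
    iu = np.triu_indices(p, 1)
    return emp_cov[iu], (prec[iu] != 0).astype(int)

# Train a per-edge classifier from empirical covariance to edge presence;
# the learned function absorbs the sparsity prior instead of a likelihood.
feats, labels = zip(*[sample_task() for _ in range(500)])
X = np.concatenate(feats)[:, None]
y = np.concatenate(labels)
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```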
2 code implementations • 22 Nov 2015 • Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville
Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features.
Ranked #18 on Semantic Segmentation on CamVid
5 code implementations • NeurIPS 2015 • Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.
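One VRNN step wires a VAE into the recurrence: the prior and approximate posterior over z_t both condition on h_{t-1}, and the hidden-state update consumes both x_t and the sampled z_t. A minimal PyTorch sketch (Gaussians everywhere, dimensions arbitrary):

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    """One step of a variational RNN (illustrative): prior and posterior over
    z_t both condition on h_{t-1}; the recurrence then consumes x_t and z_t."""
    def __init__(self, x_dim=8, z_dim=4, h_dim=16):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)             # -> mu, logvar
        self.posterior = nn.Linear(h_dim + x_dim, 2 * z_dim)
        self.decoder = nn.Linear(h_dim + z_dim, x_dim)       # p(x_t | z_t, h)
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h):
        pm, plv = self.prior(h).chunk(2, dim=-1)
        qm, qlv = self.posterior(torch.cat([h, x_t], -1)).chunk(2, dim=-1)
        z = qm + torch.randn_like(qm) * (0.5 * qlv).exp()    # reparameterize
        recon = self.decoder(torch.cat([h, z], -1))
        # KL between the two diagonal Gaussians q(z|x,h) and p(z|h).
        kl = 0.5 * (plv - qlv + (qlv.exp() + (qm - pm) ** 2) / plv.exp() - 1).sum(-1)
        h_next = self.rnn(torch.cat([x_t, z], -1), h)
        return recon, kl, h_next

cell = VRNNCell()
x = torch.randn(2, 10, 8)                   # (batch, time, features)
h = torch.zeros(2, 16)
loss = 0.0
for t in range(x.size(1)):
    recon, kl, h = cell(x[:, t], h)
    loss = loss + ((recon - x[:, t]) ** 2).sum(-1) + kl
loss.mean().backward()
```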
4 code implementations • 3 May 2015 • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio
In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks.
Ranked #34 on Image Classification on MNIST
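A ReNet layer replaces convolution with RNN sweeps: a bidirectional RNN across each row, then a second one across each column of the result, so every output position sees full-image context. A minimal PyTorch sketch (patch flattening omitted); for ReSeg-style segmentation, the same layer is stacked on pretrained convolutional feature maps:

```python
import torch
import torch.nn as nn

class ReNetLayer(nn.Module):
    """Sweep a bidirectional RNN horizontally across rows, then vertically
    across columns, instead of convolving (as in ReNet)."""
    def __init__(self, in_ch, hid):
        super().__init__()
        self.row_rnn = nn.GRU(in_ch, hid, batch_first=True, bidirectional=True)
        self.col_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_rnn(rows)            # sweep left<->right per row
        cols = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_rnn(cols)            # sweep up<->down per column
        return cols.reshape(b, w, h, -1).permute(0, 3, 2, 1)  # (B, 2*hid, H, W)

layer = ReNetLayer(in_ch=3, hid=32)
print(layer(torch.randn(2, 3, 16, 16)).shape)   # torch.Size([2, 64, 16, 16])
```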