Search Results for author: Navdeep Jaitly

Found 36 papers, 13 papers with code

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations7 May 2020 Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Speech Recognition
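
For readers unfamiliar with the metric quoted above, a minimal sketch (not from the paper) of how word error rate is computed: the word-level edit distance between a reference and a hypothesis transcript, divided by the reference length.

```python
import numpy as np

def wer(ref_words, hyp_words):
    """Word error rate: word-level edit distance / reference length."""
    d = np.zeros((len(ref_words) + 1, len(hyp_words) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref_words) + 1)
    d[0, :] = np.arange(len(hyp_words) + 1)
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            sub = d[i - 1, j - 1] + (ref_words[i - 1] != hyp_words[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)  # sub, del, ins
    return d[-1, -1] / len(ref_words)

print(wer("the cat sat".split(), "the cat sat down".split()))  # 0.333..., one insertion
```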

Robotic Table Tennis with Model-Free Reinforcement Learning

no code implementations31 Mar 2020 Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz.

Curriculum Learning

SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong

no code implementations13 Dec 2019 Steven Schwarcz, Peng Xu, David D'Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly

The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models -- tracking of the ping pong ball and poses of humans in the videos and the spin of the ball being hit by humans.

Action Recognition · Pose Estimation

Policy Optimization by Local Improvement through Search

no code implementations25 Sep 2019 Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini

On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration do not choose next step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.

Imitation Learning

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

3 code implementations21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Peptide-Spectra Matching from Weak Supervision

no code implementations20 Aug 2018 Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

4 code implementations5 Dec 2017 Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.

Speech Recognition
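
A minimal sketch of the single attention step at the heart of such encoder-decoder (LAS-style) models, which is what lets one network subsume the acoustic, pronunciation and language model components; the layer sizes and variable names below are illustrative, not the paper's.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_spell_step(encoder_states, decoder_state, W_out):
    """One decoder step: dot-product attention over the 'listener' states,
    then a character distribution conditioned on [decoder_state; context]."""
    scores = encoder_states @ decoder_state          # (T,) similarity per input frame
    alpha = softmax(scores)                          # attention weights
    context = alpha @ encoder_states                 # (H,) weighted sum of encoder states
    logits = W_out @ np.concatenate([decoder_state, context])
    return softmax(logits), alpha                    # next-character distribution

T, H, V = 50, 32, 30                                 # frames, hidden size, vocab (hypothetical)
rng = np.random.default_rng(0)
enc, dec = rng.normal(size=(T, H)), rng.normal(size=H)
W = rng.normal(size=(V, 2 * H))
probs, alpha = attend_and_spell_step(enc, dec, W)
print(probs.shape, alpha.shape)                      # (30,) (50,)
```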

An online sequence-to-sequence model for noisy speech recognition

no code implementations16 Jun 2017 Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition

Learning Hard Alignments with Variational Inference

no code implementations16 May 2017 Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Image Captioning · Object Recognition +3

Discrete Sequential Prediction of Continuous Actions for Deep RL

no code implementations ICLR 2018 Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson

Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions.

Continuous Control · Q-Learning +1
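
A toy sketch of the idea described above: treat a continuous action as a sequence of discretized dimensions and commit to bin values one dimension at a time with a next-step scoring model. The quadratic stand-in `q` below replaces the learned Q-network of the paper.

```python
import numpy as np

BINS = np.linspace(-1.0, 1.0, 11)       # discretization of each action dimension
TARGET = np.array([0.3, -0.7, 0.5])     # optimum of the toy scoring function

def q(partial_action, candidate, dim):
    """Score for extending the partial action with `candidate` in dimension `dim`."""
    a = np.array(list(partial_action) + [candidate])
    return -np.sum((a - TARGET[: dim + 1]) ** 2)

action = []
for dim in range(len(TARGET)):
    scores = [q(action, b, dim) for b in BINS]
    action.append(BINS[int(np.argmax(scores))])   # greedy next-step choice per dimension

print(action)   # lands near TARGET, one discretized dimension at a time
```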

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

1 code implementation24 Mar 2017 Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.

Machine Translation · Sequence-To-Sequence Speech Recognition +1

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

no code implementations13 Feb 2017 Akosua Busia, Navdeep Jaitly

This sequential model achieves 70.3% Q8 accuracy on CB513 with a single model; an ensemble of these models produces 71.4% Q8 accuracy on the same test set, improving upon the previous overall state of the art for the eight-class secondary structure problem.

Protein Secondary Structure Prediction

Towards better decoding and language model integration in sequence to sequence models

no code implementations8 Dec 2016 Jan Chorowski, Navdeep Jaitly

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion.

Speech Recognition

An Online Sequence-to-Sequence Model Using Partial Conditioning

no code implementations NeurIPS 2016 Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning

no code implementations4 Nov 2016 Akosua Busia, Jasmine Collins, Navdeep Jaitly

We first train a series of deep neural networks to predict eight-class secondary structure labels given a protein's amino acid sequence information and find that using recent methods for regularization, such as dropout and weight-norm constraining, leads to measurable gains in accuracy.

Protein Secondary Structure Prediction · Protein Structure Prediction
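
A small numpy sketch of the two regularizers named above: inverted dropout and a max-norm ("weight-norm") constraint on each unit's incoming weight vector. The rate and norm bound are illustrative defaults, not necessarily the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, train=True):
    """Inverted dropout: zero units with probability `rate` and rescale the rest."""
    if not train:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def constrain_weight_norms(W, max_norm=3.0):
    """Rescale each unit's incoming weight vector so its L2 norm stays <= max_norm."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return W * scale

W = rng.normal(scale=2.0, size=(64, 8))        # 64 inputs -> 8 hidden units
W = constrain_weight_norms(W)                  # applied after each gradient update
h = np.maximum(0.0, dropout(rng.normal(size=64)) @ W)   # dropped-out ReLU layer
print(np.linalg.norm(W, axis=0).max(), h.shape)          # max column norm <= 3.0
```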

RNN Approaches to Text Normalization: A Challenge

1 code implementation31 Oct 2016 Richard Sproat, Navdeep Jaitly

Though our conclusions are largely negative on this point, we are actually not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it is not going to be something that can be solved simply by having huge amounts of annotated text data and feeding that to a general RNN model.

Very Deep Convolutional Networks for End-to-End Speech Recognition

2 code implementations10 Oct 2016 Yu Zhang, William Chan, Navdeep Jaitly

Sequence-to-sequence models have shown success in end-to-end speech recognition.

Speech Recognition

Latent Sequence Decompositions

no code implementations10 Oct 2016 William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly

We present the Latent Sequence Decompositions (LSD) framework.

Speech Recognition

Learning Online Alignments with Continuous Rewards Policy Gradient

no code implementations3 Aug 2016 Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.

Machine Translation · Question Answering +2

Chained Predictions Using Convolutional Neural Networks

no code implementations8 May 2016 Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly

In this model the output variables for a given input are predicted sequentially using neural networks.

Pose Estimation

Adversarial Autoencoders

25 code implementations18 Nov 2015 Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.

Data Visualization · Dimensionality Reduction +4
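
A compact PyTorch sketch of the adversarial-autoencoder training step described above, with its three phases (reconstruction, discriminator, and regularization of the aggregated posterior toward the prior), using tiny MLPs and an isotropic Gaussian prior. Sizes, losses and optimizers are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 8
enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
disc = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x):
    # 1) reconstruction: ordinary autoencoder loss
    opt_ae.zero_grad()
    nn.functional.mse_loss(dec(enc(x)), x).backward()
    opt_ae.step()
    # 2) discriminator: prior samples are "real", encoder codes are "fake"
    z_prior = torch.randn(x.size(0), z_dim)
    z_fake = enc(x).detach()
    opt_d.zero_grad()
    (bce(disc(z_prior), torch.ones(x.size(0), 1))
     + bce(disc(z_fake), torch.zeros(x.size(0), 1))).backward()
    opt_d.step()
    # 3) regularization: push the aggregated posterior toward the prior
    opt_ae.zero_grad()
    bce(disc(enc(x)), torch.ones(x.size(0), 1)).backward()
    opt_ae.step()

train_step(torch.randn(32, x_dim))   # one step on a random batch
```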

A Neural Transducer

no code implementations16 Nov 2015 Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Object Recognition from Short Videos for Robotic Perception

no code implementations4 Sep 2015 Ivan Bogun, Anelia Angelova, Navdeep Jaitly

Videos, unlike still images, are temporally coherent which makes the application of deep networks non-trivial.

Object Recognition

Listen, Attend and Spell

38 code implementations5 Aug 2015 William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Speech Recognition

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

9 code implementations NeurIPS 2015 Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.

Constituency Parsing · Curriculum Learning +3
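
A minimal sketch of the scheduled-sampling idea from this paper: during training, the previous token fed to the decoder is the ground truth with a probability that decays over training steps (here an inverse-sigmoid schedule), and the model's own prediction otherwise. The token values and decay constant are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def scheduled_sampling_inputs(targets, predictions, step, k=1000.0):
    """For each position, feed the ground-truth previous token with probability
    eps (decayed over training steps), otherwise feed the model's own prediction."""
    eps = k / (k + np.exp(step / k))          # inverse-sigmoid decay schedule
    use_truth = rng.random(len(targets)) < eps
    return np.where(use_truth, targets, predictions), eps

targets = np.array([4, 8, 15, 16, 23, 42])        # ground-truth previous tokens
predictions = np.array([4, 7, 15, 16, 22, 42])    # model's previous predictions
for step in (0, 5000, 20000):
    mixed, eps = scheduled_sampling_inputs(targets, predictions, step)
    print(step, round(float(eps), 3), mixed)       # eps shrinks from ~1.0 toward 0.0
```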

Pointer Networks

17 code implementations NeurIPS 2015 Oriol Vinyals, Meire Fortunato, Navdeep Jaitly

It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.

Combinatorial Optimization
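
A numpy sketch of that pointer mechanism: additive attention scores over the input positions are used directly as the output distribution, so the model "points" at one input element instead of blending encoder states into a context vector. Names and sizes below are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(encoder_states, decoder_state, W1, W2, v):
    """Additive attention over input positions; the resulting distribution
    itself is the output, pointing at one element of the input sequence."""
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v   # (n,)
    return softmax(scores)

n, d, h = 6, 16, 32                       # input length, state size, attention size
rng = np.random.default_rng(0)
enc = rng.normal(size=(n, d))             # one encoder state per input element
dec = rng.normal(size=d)                  # current decoder state
W1, W2, v = rng.normal(size=(h, d)), rng.normal(size=(h, d)), rng.normal(size=h)
p = pointer_step(enc, dec, W1, W2, v)
print(p.round(3), "-> points at input element", int(np.argmax(p)))
```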

Occlusion Edge Detection in RGB-D Frames using Deep Convolutional Networks

no code implementations22 Dec 2014 Soumik Sarkar, Vivek Venugopalan, Kishore Reddy, Michael Giering, Julian Ryde, Navdeep Jaitly

Occlusion edges in images which correspond to range discontinuity in the scene from the point of view of the observer are an important prerequisite for many vision and mobile robot tasks.

Edge Detection

Multi-task Neural Networks for QSAR Predictions

no code implementations4 Jun 2014 George E. Dahl, Navdeep Jaitly, Ruslan Salakhutdinov

Although artificial neural networks have occasionally been used for Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in the past, the literature has of late been dominated by other machine learning techniques such as random forests.

Deep Neural Networks for Acoustic Modeling in Speech Recognition

no code implementations Signal Processing Magazine 2012 Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

Speech Recognition
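
A small numpy sketch of the per-frame score such a GMM-HMM system computes: the log-likelihood of one acoustic frame under a diagonal-covariance Gaussian mixture attached to an HMM state. Dimensions (e.g. 13 MFCC coefficients, 4 mixture components) are illustrative.

```python
import numpy as np

def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one acoustic frame under a diagonal-covariance GMM,
    i.e. how well one HMM state's emission model fits this frame."""
    d = frame.shape[0]
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((frame - means) ** 2 / variances, axis=1)
    return np.logaddexp.reduce(np.log(weights) + log_norm + log_exp)

rng = np.random.default_rng(0)
d, m = 13, 4                                  # feature dimension, mixture components
frame = rng.normal(size=d)
weights = np.full(m, 1.0 / m)                 # mixture weights sum to 1
means = rng.normal(size=(m, d))
variances = np.ones((m, d))
print(gmm_log_likelihood(frame, weights, means, variances))
```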
