Search Results for author: Navdeep Jaitly

Found 53 papers, 18 papers with code

Adversarial Autoencoders

28 code implementations • 18 Nov 2015 • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.

Clustering • Data Visualization • +5
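To make the AAE objective concrete, here is a minimal PyTorch sketch of its two training phases (reconstruction, then adversarial regularization of the code space toward a Gaussian prior). The module sizes and optimizers are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn as nn

    x_dim, z_dim = 784, 8  # hypothetical input/code sizes
    encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
    decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
    discrim = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    bce = nn.BCEWithLogitsLoss()
    opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
    opt_d = torch.optim.Adam(discrim.parameters())
    opt_g = torch.optim.Adam(encoder.parameters())

    def train_step(x):
        # Phase 1: plain autoencoder reconstruction.
        opt_ae.zero_grad()
        recon = decoder(encoder(x))
        ((recon - x) ** 2).mean().backward()
        opt_ae.step()

        # Phase 2a: the discriminator separates prior samples from encoder
        # codes, i.e. it compares the aggregated posterior with the prior.
        opt_d.zero_grad()
        z_prior = torch.randn(x.size(0), z_dim)  # arbitrary prior: N(0, I)
        z_post = encoder(x).detach()
        ones, zeros = torch.ones(x.size(0), 1), torch.zeros(x.size(0), 1)
        (bce(discrim(z_prior), ones) + bce(discrim(z_post), zeros)).backward()
        opt_d.step()

        # Phase 2b: the encoder is trained to fool the discriminator,
        # pushing the aggregated posterior toward the prior.
        opt_g.zero_grad()
        bce(discrim(encoder(x)), ones).backward()
        opt_g.step()

    train_step(torch.randn(32, x_dim))  # one update on a dummy batch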

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Listen, Attend and Spell

40 code implementations • 5 Aug 2015 • William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Language Modelling • Reading Comprehension • +1

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

4 code implementations • 5 Dec 2017 • Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Attention-based encoder-decoder architectures such as Listen, Attend and Spell (LAS) subsume the acoustic, pronunciation, and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

9 code implementations • NeurIPS 2015 • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.

Constituency Parsing • Image Captioning • +2
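The core trick is a per-step coin flip during training: feed the decoder the gold token with probability eps, otherwise its own previous prediction, and decay eps over training so the model gradually trains on the distribution it will see at inference. A minimal sketch with an assumed GRU decoder (names and sizes are illustrative, not the paper's code):

    import random
    import torch
    import torch.nn as nn

    vocab, hidden = 1000, 128  # hypothetical sizes
    embed = nn.Embedding(vocab, hidden)
    cell = nn.GRUCell(hidden, hidden)
    out = nn.Linear(hidden, vocab)

    def decode(targets, eps):
        # targets: (T,) gold token ids for one sequence;
        # eps: probability of feeding the gold token back in.
        h = torch.zeros(1, hidden)
        prev = targets[:1]
        logits_seq = []
        for t in range(1, targets.size(0)):
            h = cell(embed(prev), h)
            logits = out(h)
            logits_seq.append(logits)
            # Scheduled sampling: gold token w.p. eps, else the model's own guess.
            prev = targets[t:t + 1] if random.random() < eps else logits.argmax(-1)
        return torch.cat(logits_seq)  # (T-1, vocab); train against targets[1:]

    targets = torch.randint(0, vocab, (12,))
    loss = nn.functional.cross_entropy(decode(targets, eps=0.9), targets[1:])

The paper proposes linear, exponential, and inverse-sigmoid schedules for decaying eps from 1.0 (pure teacher forcing) toward 0.0.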

Pointer Networks

21 code implementations • NeurIPS 2015 • Oriol Vinyals, Meire Fortunato, Navdeep Jaitly

It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.

Ranked #9 on Point Cloud Completion on ShapeNet (using extra training data)

Combinatorial Optimization • Point Cloud Completion
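The pointer mechanism is just additive attention whose softmax is used directly as the output distribution over input positions. A minimal sketch (hypothetical dimensions, not the authors' code):

    import torch
    import torch.nn as nn

    hidden = 64
    W1 = nn.Linear(hidden, hidden, bias=False)  # transforms encoder states
    W2 = nn.Linear(hidden, hidden, bias=False)  # transforms the decoder state
    v = nn.Linear(hidden, 1, bias=False)

    def pointer_distribution(enc_states, dec_state):
        # enc_states: (n, hidden), one vector per input element; dec_state: (hidden,).
        # Instead of blending enc_states into a context vector, the attention
        # weights themselves are the output: a distribution over input positions.
        scores = v(torch.tanh(W1(enc_states) + W2(dec_state))).squeeze(-1)  # (n,)
        return scores.softmax(dim=-1)

    probs = pointer_distribution(torch.randn(10, hidden), torch.randn(hidden))
    print(probs.argmax().item())  # index of the input element being "pointed at"

Because the output vocabulary is the input itself, the same network handles variable-length instances of combinatorial problems such as convex hull or TSP.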

RNN Approaches to Text Normalization: A Challenge

1 code implementation • 31 Oct 2016 • Richard Sproat, Navdeep Jaitly

Though our conclusions are largely negative on this point, we are not arguing that the text normalization problem is intractable with a pure RNN approach, merely that it cannot be solved simply by feeding huge amounts of annotated text data to a general RNN model.

An Online Sequence-to-Sequence Model Using Partial Conditioning

1 code implementation • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.

Denoising

How Far Are We from Intelligent Visual Deductive Reasoning?

1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks.

In-Context Learning • Visual Reasoning

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

1 code implementation • 24 Mar 2017 • Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.

Machine Translation • Sequence-To-Sequence Speech Recognition • +2

Position Prediction as an Effective Pretraining Strategy

1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in speech, and, recently, MAE models in vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.

Position • speech-recognition • +1
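A loose sketch of the position-prediction idea (my assumptions, not the paper's exact recipe): feed a transformer content embeddings with no positional information attached, and train a head to classify which position each patch originally occupied.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_patches, d = 196, 128  # hypothetical ViT-style patch count and width
    layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    pos_head = nn.Linear(d, n_patches)  # classify each patch's original position

    def position_prediction_loss(patch_embeddings):
        # patch_embeddings: (B, n_patches, d) with NO positional embeddings added,
        # so position must be inferred from content relationships alone.
        perm = torch.randperm(n_patches)
        shuffled = patch_embeddings[:, perm, :]   # hide the original order
        logits = pos_head(encoder(shuffled))      # (B, n_patches, n_patches)
        target = perm.unsqueeze(0).expand(logits.size(0), -1)
        return F.cross_entropy(logits.reshape(-1, n_patches), target.reshape(-1))

    loss = position_prediction_loss(torch.randn(4, n_patches, d))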

Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games

1 code implementation • 2 Oct 2023 • Yizhe Zhang, Jiarui Lu, Navdeep Jaitly

In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries.

Discrete Sequential Prediction of Continuous Actions for Deep RL

no code implementations • ICLR 2018 • Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson

Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions.

Continuous Control • Q-Learning • +1
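A hedged sketch of that idea (architecture and sizes are my assumptions, not the paper's): discretize each action dimension into bins, and predict per-dimension Q-values sequentially, each step conditioned on the bins already chosen.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    state_dim, n_dims, n_bins, d = 8, 3, 11, 64  # hypothetical sizes
    state_enc = nn.Linear(state_dim, d)
    step = nn.GRUCell(n_bins, d)   # consumes the previously chosen bin
    q_head = nn.Linear(d, n_bins)  # Q-values over bins for the current dimension

    def select_action(state):
        # Greedy selection: choose bins one action dimension at a time, so a
        # continuous action becomes a short sequence of discrete predictions.
        h = torch.tanh(state_enc(state))  # (1, d)
        prev = torch.zeros(1, n_bins)     # "no choice yet" input
        bins = []
        for _ in range(n_dims):
            h = step(prev, h)
            b = q_head(h).argmax(dim=-1)  # best bin for this dimension
            bins.append(b.item())
            prev = F.one_hot(b, n_bins).float()
        # Map bin indices back to continuous values in [-1, 1].
        return [2.0 * b / (n_bins - 1) - 1.0 for b in bins]

    print(select_action(torch.randn(1, state_dim)))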

Learning Hard Alignments with Variational Inference

no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Hard Attention • Image Captioning • +5

An online sequence-to-sequence model for noisy speech recognition

no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition • speech-recognition

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

no code implementations • 13 Feb 2017 • Akosua Busia, Navdeep Jaitly

This sequential model achieves 70.3% Q8 accuracy on CB513 with a single model; an ensemble of these models produces 71.4% Q8 accuracy on the same test set, improving upon the previous overall state of the art for the eight-class secondary structure problem.

Protein Secondary Structure Prediction

Towards better decoding and language model integration in sequence to sequence models

no code implementations • 8 Dec 2016 • Jan Chorowski, Navdeep Jaitly

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning

no code implementations • 4 Nov 2016 • Akosua Busia, Jasmine Collins, Navdeep Jaitly

We first train a series of deep neural networks to predict eight-class secondary structure labels given a protein's amino acid sequence information and find that using recent methods for regularization, such as dropout and weight-norm constraining, leads to measurable gains in accuracy.

Protein Secondary Structure Prediction • Protein Structure Prediction

Chained Predictions Using Convolutional Neural Networks

no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly

In this model the output variables for a given input are predicted sequentially using neural networks.

Pose Estimation

A Neural Transducer

no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Learning Online Alignments with Continuous Rewards Policy Gradient

no code implementations • 3 Aug 2016 • Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.

Machine Translation • Question Answering • +4

Object Recognition from Short Videos for Robotic Perception

no code implementations • 4 Sep 2015 • Ivan Bogun, Anelia Angelova, Navdeep Jaitly

Videos, unlike still images, are temporally coherent which makes the application of deep networks non-trivial.

Object • Object Recognition

Occlusion Edge Detection in RGB-D Frames using Deep Convolutional Networks

no code implementations • 22 Dec 2014 • Soumik Sarkar, Vivek Venugopalan, Kishore Reddy, Michael Giering, Julian Ryde, Navdeep Jaitly

Occlusion edges in images which correspond to range discontinuity in the scene from the point of view of the observer are an important prerequisite for many vision and mobile robot tasks.

Edge Detection

Multi-task Neural Networks for QSAR Predictions

no code implementations • 4 Jun 2014 • George E. Dahl, Navdeep Jaitly, Ruslan Salakhutdinov

Although artificial neural networks have occasionally been used for Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in the past, the literature has of late been dominated by other machine learning techniques such as random forests.

Peptide-Spectra Matching from Weak Supervision

no code implementations • 20 Aug 2018 • Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.

SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong

no code implementations • 13 Dec 2019 • Steven Schwarcz, Peng Xu, David D'Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly

The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models: tracking of the ping pong ball, poses of humans in the videos, and the spin of the ball being hit by humans.

Action Recognition • Pose Estimation • +1

Robotic Table Tennis with Model-Free Reinforcement Learning

no code implementations • 31 Mar 2020 • Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100 Hz.

reinforcement-learning • Reinforcement Learning (RL)

Deep Neural Networks for Acoustic Modeling in Speech Recognition

no code implementations • Signal Processing Magazine 2012 • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

speech-recognition • Speech Recognition

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Policy Optimization by Local Improvement through Search

no code implementations • 25 Sep 2019 • Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini

On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.

Imitation Learning • reinforcement-learning • +1

Efficient Representation Learning via Adaptive Context Pooling

no code implementations • 5 Jul 2022 • Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind

Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer.

Representation Learning

Continuous Pseudo-Labeling from the Start

no code implementations • 17 Oct 2022 • Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko

Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

More Speaking or More Speakers?

no code implementations • 2 Nov 2022 • Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko

We perform a systematic analysis on both labeled and unlabeled data by varying the number of speakers while keeping the number of hours fixed and vice versa.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Continuous Soft Pseudo-Labeling in ASR

no code implementations • 11 Nov 2022 • Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio

Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.

speech-recognition • Speech Recognition

Understanding the Robustness of Multi-Exit Models under Common Corruptions

no code implementations • 3 Dec 2022 • Akshay Mehra, Skyler Seto, Navdeep Jaitly, Barry-John Theobald

Furthermore, the lack of calibration increases the inconsistency in the predictions of the model across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data.

REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation

no code implementations • 7 Sep 2023 • Skyler Seto, Barry-John Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge

In online F-TTA, a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization.

Test-time Adaptation
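For reference, a minimal sketch of the plain entropy-minimization step that F-TTA methods build on; REALM's contribution is replacing this loss with a robust, adaptive variant, which is not reproduced here. The model and optimizer choices below are illustrative assumptions:

    import torch
    import torch.nn as nn

    def tta_step(model, optimizer, x):
        # One online adaptation step: minimize the entropy of the model's
        # own predictions on the incoming (unlabeled) test batch.
        probs = model(x).softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
        return entropy.item()

    # Commonly only lightweight parameters (e.g. normalization affine terms)
    # are adapted while the rest of the network stays frozen.
    model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 10))
    norm_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm1d) for p in m.parameters()]
    optimizer = torch.optim.SGD(norm_params, lr=1e-3)
    tta_step(model, optimizer, torch.randn(4, 16))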

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

no code implementations • 20 Sep 2023 • Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation.

Hallucination • Knowledge Graphs

Matryoshka Diffusion Models

no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.

Image Generation • Zero-shot Generalization

Generating Molecular Conformer Fields

no code implementations • 27 Nov 2023 • Yuyang Wang, Ahmed A. Elhag, Navdeep Jaitly, Joshua M. Susskind, Miguel Angel Bautista

In this paper we tackle the problem of generating conformers of a molecule in 3D space given its molecular graph.

KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know

no code implementations • 15 Dec 2023 • Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly

Measuring the alignment between a Knowledge Graph (KG) and Large Language Models (LLMs) is an effective method to assess the factualness and identify the knowledge blind spots of LLMs.

Knowledge Graphs

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

no code implementations • 29 Jan 2024 • Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly

Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased.

Language Modelling

Divide-or-Conquer? Which Part Should You Distill Your LLM?

no code implementations • 22 Feb 2024 • Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang

Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first.

Problem Decomposition
