no code implementations • 14 Jan 2025 • Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines.
1 code implementation • 30 Dec 2024 • Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang
When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents.
1 code implementation • 9 Dec 2024 • Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind
Normalizing Flows (NFs) are likelihood-based models for continuous inputs.
Ranked #12 on Image Generation on ImageNet 128x128
no code implementations • 26 Nov 2024 • Akshita Gupta, Tatiana Likhomanenko, Karren Dai Yang, Richard He Bai, Zakaria Aldeneh, Navdeep Jaitly
In this paper, we propose a new task -- generating speech from videos of people and their transcripts (VTTS) -- to motivate new techniques for multimodal speech generation.
no code implementations • 2 Nov 2024 • Georgia Gabriela Sampaio, Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Josh Susskind, Navdeep Jaitly, Yizhe Zhang
In this work, we focus on the text rendering aspect of these models, which provides a lens for evaluating a generative model's fine-grained instruction-following capabilities.
no code implementations • 31 Oct 2024 • Chen Huang, Skyler Seto, Samira Abnar, David Grangier, Navdeep Jaitly, Josh Susskind
Then we jointly train a prompt generator, optimized to produce a prompt embedding that stays close to the aggregated summary while minimizing task loss at the same time.
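A rough illustration of the joint objective described above (the module names, distance term, and weighting are assumptions, not the paper's exact formulation): a minimal PyTorch sketch combining the task loss with a penalty that keeps the generated prompt embedding close to the aggregated summary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGenerator(nn.Module):
    """Hypothetical generator mapping an input representation to a soft prompt."""
    def __init__(self, input_dim: int, prompt_len: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, prompt_len * embed_dim)
        self.prompt_len, self.embed_dim = prompt_len, embed_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).view(-1, self.prompt_len, self.embed_dim)

def joint_loss(task_loss: torch.Tensor,
               prompt_embedding: torch.Tensor,
               summary_embedding: torch.Tensor,
               reg_weight: float = 0.1) -> torch.Tensor:
    # Keep the generated prompt close to the aggregated summary while
    # still minimizing the downstream task loss.
    closeness = F.mse_loss(prompt_embedding, summary_embedding)
    return task_loss + reg_weight * closeness
```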
no code implementations • 10 Oct 2024 • Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai
Furthermore, DART seamlessly trains with both text and image data in a unified model.
no code implementations • 22 Jul 2024 • He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly
Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data.
1 code implementation • 2 Jun 2024 • Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai
Diffusion models, which are trained to match the distribution of the training dataset, have become the de facto approach for generating visual data.
no code implementations • 31 May 2024 • Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind
Furthermore, we show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.
no code implementations • 24 May 2024 • Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly
Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make.
Automatic Speech Recognition (ASR) +3
1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly
Vision-Language Models (VLMs) have recently demonstrated incredible strides on diverse vision language tasks.
1 code implementation • 22 Feb 2024 • Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first.
no code implementations • 29 Jan 2024 • Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased.
no code implementations • 15 Dec 2023 • Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly
Large Language Models (LLMs) might hallucinate facts, while curated Knowledge Graphs (KGs) are typically factually reliable, especially for domain-specific knowledge.
no code implementations • 27 Nov 2023 • Yuyang Wang, Ahmed A. Elhag, Navdeep Jaitly, Joshua M. Susskind, Miguel Angel Bautista
We present a novel way to predict molecular conformers through a simple formulation that sidesteps many of the heuristics of prior works and achieves state-of-the-art results by leveraging the advantages of scale.
1 code implementation • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.
1 code implementation • 2 Oct 2023 • Yizhe Zhang, Jiarui Lu, Navdeep Jaitly
In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries.
no code implementations • 20 Sep 2023 • Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly
Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation.
no code implementations • 7 Sep 2023 • Skyler Seto, Barry-John Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge
In online F-TTA, a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization.
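A minimal sketch of the online entropy-minimization update described above (generic fully test-time adaptation, not this paper's specific method; in practice such methods usually restrict updates to a subset of parameters, e.g. normalization layers).

```python
import torch
import torch.nn.functional as F

def entropy_minimization_step(model: torch.nn.Module,
                              batch: torch.Tensor,
                              optimizer: torch.optim.Optimizer) -> torch.Tensor:
    """One online F-TTA update on a batch of unlabeled test samples."""
    logits = model(batch)                       # forward pass on the test stream
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1).mean()   # self-supervised objective
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.detach()
```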
no code implementations • 6 Sep 2023 • David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, Barney J Reed, Krista Reymann, Pannag R. Sanketi, Anish Shankar, Pierre Sermanet, Vikas Sindhwani, Avi Singh, Vincent Vanhoucke, Grace Vesom, Peng Xu
We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets.
1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly
Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.
no code implementations • 3 Dec 2022 • Akshay Mehra, Skyler Seto, Navdeep Jaitly, Barry-John Theobald
Furthermore, the lack of calibration increases the inconsistency in the predictions of the model across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data.
no code implementations • 11 Nov 2022 • Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.
no code implementations • 2 Nov 2022 • Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
We perform a systematic analysis on both labeled and unlabeled data by varying the number of speakers while keeping the number of hours fixed and vice versa.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Oct 2022 • Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko
Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.
Automatic Speech Recognition (ASR) +2
1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind
This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in Speech, and, recently, MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.
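A generic sketch of the masked-prediction pretraining strategy referenced above (BERT/Wav2Vec/MAE style; illustrative only, not this paper's proposed objective).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_autoencoding_loss(encoder: nn.Module,
                             decoder: nn.Module,
                             tokens: torch.Tensor,
                             mask_ratio: float = 0.5) -> torch.Tensor:
    """Hide a random subset of positions and reconstruct them from the rest."""
    batch, seq_len, dim = tokens.shape
    mask = torch.rand(batch, seq_len, device=tokens.device) < mask_ratio
    visible = tokens.masked_fill(mask.unsqueeze(-1), 0.0)   # zero out masked positions
    recon = decoder(encoder(visible))                       # (batch, seq_len, dim)
    return F.mse_loss(recon[mask], tokens[mask])            # loss only on masked positions
```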
no code implementations • 5 Jul 2022 • Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind
Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer.
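A toy sketch of pooling neighboring token features before computing attention, as described above (the average-pooling operator and layer layout are assumptions, not the paper's exact design).

```python
import torch
import torch.nn as nn

class PooledAttention(nn.Module):
    """Pool each token's neighborhood along the sequence before attention."""
    def __init__(self, dim: int, num_heads: int = 8, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool1d(pool_size, stride=1, padding=pool_size // 2)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Pool along the sequence axis so each
        # key/value summarizes a local neighborhood of tokens.
        pooled = self.pool(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(query=x, key=pooled, value=pooled)
        return out
```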
no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu
On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.
Automatic Speech Recognition (ASR) +1
no code implementations • 31 Mar 2020 • Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly
We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz.
1 code implementation • ICML 2020 • William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations.
no code implementations • 13 Dec 2019 • Steven Schwarcz, Peng Xu, David D'Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly
The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models: tracking of the ping pong ball, poses of the humans in the videos, and the spin of the ball as it is hit.
no code implementations • 25 Sep 2019 • Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini
On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.
2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon
Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.
no code implementations • 20 Aug 2018 • Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller
As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.
33 code implementations • 16 Dec 2017 • Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
Ranked #2 on Speech Synthesis on North American English
4 code implementations • 5 Dec 2017 • Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani
Attention-based encoder-decoder architectures such as Listen, Attend and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.
Automatic Speech Recognition (ASR) +4
no code implementations • 20 Nov 2017 • Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang
We explored both CTC and LAS systems for building speech recognition models.
Automatic Speech Recognition (ASR) +1
no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly
This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.
no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.
no code implementations • ICLR 2018 • Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson
Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions.
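A toy sketch of modeling Q-values over a continuous action space by discretizing each action dimension and predicting dimensions one at a time, as described above (the network, bin encoding, and greedy decoding are illustrative assumptions).

```python
import torch
import torch.nn as nn

class PerDimensionQ(nn.Module):
    """Predict Q-values over the bins of one action dimension, conditioned on
    the state and on the bins already chosen for earlier dimensions."""
    def __init__(self, state_dim: int, action_dims: int, bins: int, hidden: int = 128):
        super().__init__()
        self.bins, self.action_dims = bins, action_dims
        in_dim = state_dim + action_dims * bins          # state + one-hot prefix (zero-padded)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, bins))

    def forward(self, state: torch.Tensor, prefix_onehot: torch.Tensor) -> torch.Tensor:
        # prefix_onehot: (batch, action_dims * bins), zeros for dims not chosen yet.
        return self.net(torch.cat([state, prefix_onehot], dim=-1))

def greedy_action(model: PerDimensionQ, state: torch.Tensor) -> torch.Tensor:
    """Decode an action greedily, one discretized dimension at a time."""
    batch = state.shape[0]
    prefix = torch.zeros(batch, model.action_dims * model.bins, device=state.device)
    choices = []
    for d in range(model.action_dims):
        q = model(state, prefix)                          # (batch, bins)
        best = q.argmax(dim=-1)
        choices.append(best)
        prefix[torch.arange(batch), d * model.bins + best] = 1.0
    return torch.stack(choices, dim=-1)                   # bin index per action dimension
```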
31 code implementations • 29 Mar 2017 • Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
Ranked #5 on Speech Synthesis on North American English
1 code implementation • 24 Mar 2017 • Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.
no code implementations • 13 Feb 2017 • Akosua Busia, Navdeep Jaitly
This sequential model achieves 70.3% Q8 accuracy on CB513 with a single model; an ensemble of these models produces 71.4% Q8 accuracy on the same test set, improving upon the previous overall state of the art for the eight-class secondary structure problem.
no code implementations • 8 Dec 2016 • Jan Chorowski, Navdeep Jaitly
The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion.
Automatic Speech Recognition (ASR) +3
1 code implementation • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
no code implementations • 4 Nov 2016 • Akosua Busia, Jasmine Collins, Navdeep Jaitly
We first train a series of deep neural networks to predict eight-class secondary structure labels given a protein's amino acid sequence information and find that using recent methods for regularization, such as dropout and weight-norm constraining, leads to measurable gains in accuracy.
Protein Secondary Structure Prediction • Protein Structure Prediction
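A minimal sketch of the regularizers named in the entry above, dropout plus a max-norm ("weight-norm") constraint, on a hypothetical window-based Q8 classifier (layer sizes and input window are illustrative, not the paper's architecture).

```python
import torch
import torch.nn as nn

# Hypothetical eight-class classifier over a window of amino-acid features.
model = nn.Sequential(
    nn.Linear(21 * 17, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 8),            # Q8 secondary-structure classes
)

def constrain_weight_norms(module: nn.Module, max_norm: float = 3.0) -> None:
    """Max-norm constraint: rescale each unit's incoming weight vector so its
    L2 norm never exceeds `max_norm`; call after each optimizer step."""
    with torch.no_grad():
        for layer in module.modules():
            if isinstance(layer, nn.Linear):
                norms = layer.weight.norm(dim=1, keepdim=True).clamp(min=1e-12)
                layer.weight.mul_(norms.clamp(max=max_norm) / norms)
```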
1 code implementation • 31 Oct 2016 • Richard Sproat, Navdeep Jaitly
Though our conclusions are largely negative on this point, we are not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it is not going to be solved simply by feeding huge amounts of annotated text data to a general RNN model.
no code implementations • 10 Oct 2016 • William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
We present the Latent Sequence Decompositions (LSD) framework.
2 code implementations • 10 Oct 2016 • Yu Zhang, William Chan, Navdeep Jaitly
Sequence-to-sequence models have shown success in end-to-end speech recognition.
no code implementations • NeurIPS 2016 • Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans
A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation.
no code implementations • 3 Aug 2016 • Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever
Though capable and easy to use, they require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.
no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly
In this model the output variables for a given input are predicted sequentially using neural networks.
29 code implementations • 18 Nov 2015 • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey
In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.
Ranked #7 on Unsupervised MNIST on MNIST
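A compact sketch of the adversarial autoencoder training loop described above, using a standard Gaussian as the arbitrary prior (network sizes, losses, and optimizers are illustrative, not the paper's exact configuration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 8
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_gen = torch.optim.Adam(encoder.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def aae_step(x: torch.Tensor):
    # 1) Reconstruction: train encoder + decoder as an ordinary autoencoder.
    recon_loss = F.mse_loss(decoder(encoder(x)), x)
    opt_ae.zero_grad(); recon_loss.backward(); opt_ae.step()

    # 2) Discriminator: separate prior samples from encoder codes.
    z_fake = encoder(x).detach()
    z_real = torch.randn_like(z_fake)                    # samples from the prior p(z)
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(z_real), torch.ones(x.size(0), 1)) +
              F.binary_cross_entropy_with_logits(discriminator(z_fake), torch.zeros(x.size(0), 1)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 3) Generator: train the encoder to fool the discriminator, pushing the
    #    aggregated posterior q(z) toward the prior.
    g_loss = F.binary_cross_entropy_with_logits(discriminator(encoder(x)), torch.ones(x.size(0), 1))
    opt_gen.zero_grad(); g_loss.backward(); opt_gen.step()
    return recon_loss.item(), d_loss.item(), g_loss.item()
```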
no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
no code implementations • 4 Sep 2015 • Ivan Bogun, Anelia Angelova, Navdeep Jaitly
Videos, unlike still images, are temporally coherent which makes the application of deep networks non-trivial.
40 code implementations • 5 Aug 2015 • William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.
9 code implementations • NeurIPS 2015 • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer
Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.
21 code implementations • NeurIPS 2015 • Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.
Ranked #9 on Point Cloud Completion on ShapeNet (using extra training data)
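A small sketch of attention used as a pointer over input positions, as described above: the attention scores form a distribution over the input sequence, and the selected position is the output (the additive scoring form is an assumption about the exact parameterization).

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """Score each encoder position against the decoder state; the softmax over
    positions acts as a pointer rather than blending into a context vector."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_enc = nn.Linear(dim, dim, bias=False)
        self.w_dec = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, enc_states: torch.Tensor, dec_state: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, src_len, dim); dec_state: (batch, dim)
        scores = self.v(torch.tanh(self.w_enc(enc_states) +
                                   self.w_dec(dec_state).unsqueeze(1))).squeeze(-1)
        return scores.softmax(dim=-1)   # pointer distribution over input positions
```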
6 code implementations • 3 Apr 2015 • Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton
Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients.
Ranked #27 on Sequential Image Classification on Sequential MNIST
no code implementations • 22 Dec 2014 • Soumik Sarkar, Vivek Venugopalan, Kishore Reddy, Michael Giering, Julian Ryde, Navdeep Jaitly
Occlusion edges in images, which correspond to range discontinuities in the scene from the point of view of the observer, are an important prerequisite for many vision and mobile robot tasks.
no code implementations • 4 Jun 2014 • George E. Dahl, Navdeep Jaitly, Ruslan Salakhutdinov
Although artificial neural networks have occasionally been used for Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in the past, the literature has of late been dominated by other machine learning techniques such as random forests.
no code implementations • Proceedings of the 31st International Conference on International Conference on Machine Learning 2014 • Alex Graves, Navdeep Jaitly
A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function.
no code implementations • Signal Processing Magazine 2012 • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.