Search Results for author: Navdeep Jaitly

Found 53 papers, 18 papers with code

Adversarial Autoencoders

28 code implementations • 18 Nov 2015 • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.

Clustering • Data Visualization • +5
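To make the AAE objective concrete, here is a minimal PyTorch sketch of its two training phases (reconstruction, then adversarial regularization of the code space toward a Gaussian prior). The module sizes and optimizers are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn as nn

    x_dim, z_dim = 784, 8  # hypothetical input/code sizes
    encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
    decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
    discrim = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    bce = nn.BCEWithLogitsLoss()
    opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
    opt_d = torch.optim.Adam(discrim.parameters())
    opt_g = torch.optim.Adam(encoder.parameters())

    def train_step(x):
        # Phase 1: plain autoencoder reconstruction.
        opt_ae.zero_grad()
        recon = decoder(encoder(x))
        ((recon - x) ** 2).mean().backward()
        opt_ae.step()

        # Phase 2a: the discriminator separates prior samples from encoder
        # codes, i.e. it compares the aggregated posterior with the prior.
        opt_d.zero_grad()
        z_prior = torch.randn(x.size(0), z_dim)  # arbitrary prior: N(0, I)
        z_post = encoder(x).detach()
        ones, zeros = torch.ones(x.size(0), 1), torch.zeros(x.size(0), 1)
        (bce(discrim(z_prior), ones) + bce(discrim(z_post), zeros)).backward()
        opt_d.step()

        # Phase 2b: the encoder is trained to fool the discriminator,
        # pushing the aggregated posterior toward the prior.
        opt_g.zero_grad()
        bce(discrim(encoder(x)), ones).backward()
        opt_g.step()

    train_step(torch.randn(32, x_dim))  # one update on a dummy batch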

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Listen, Attend and Spell

40 code implementations • 5 Aug 2015 • William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Language Modelling • Reading Comprehension • +1

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

4 code implementations • 5 Dec 2017 • Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Attention-based encoder-decoder architectures such as Listen, Attend and Spell (LAS) subsume the acoustic, pronunciation, and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

9 code implementations • NeurIPS 2015 • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.

Constituency Parsing • Image Captioning • +2
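The core trick is a per-step coin flip during training: feed the decoder the gold token with probability eps, otherwise its own previous prediction, and decay eps over training so the model gradually trains on the distribution it will see at inference. A minimal sketch with an assumed GRU decoder (names and sizes are illustrative, not the paper's code):

    import random
    import torch
    import torch.nn as nn

    vocab, hidden = 1000, 128  # hypothetical sizes
    embed = nn.Embedding(vocab, hidden)
    cell = nn.GRUCell(hidden, hidden)
    out = nn.Linear(hidden, vocab)

    def decode(targets, eps):
        # targets: (T,) gold token ids for one sequence;
        # eps: probability of feeding the gold token back in.
        h = torch.zeros(1, hidden)
        prev = targets[:1]
        logits_seq = []
        for t in range(1, targets.size(0)):
            h = cell(embed(prev), h)
            logits = out(h)
            logits_seq.append(logits)
            # Scheduled sampling: gold token w.p. eps, else the model's own guess.
            prev = targets[t:t + 1] if random.random() < eps else logits.argmax(-1)
        return torch.cat(logits_seq)  # (T-1, vocab); train against targets[1:]

    targets = torch.randint(0, vocab, (12,))
    loss = nn.functional.cross_entropy(decode(targets, eps=0.9), targets[1:])

The paper proposes linear, exponential, and inverse-sigmoid schedules for decaying eps from 1.0 (pure teacher forcing) toward 0.0.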

Pointer Networks

21 code implementations • NeurIPS 2015 • Oriol Vinyals, Meire Fortunato, Navdeep Jaitly

It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.

Ranked #9 on Point Cloud Completion on ShapeNet (using extra training data)

Combinatorial Optimization • Point Cloud Completion
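The pointer mechanism is just additive attention whose softmax is used directly as the output distribution over input positions. A minimal sketch (hypothetical dimensions, not the authors' code):

    import torch
    import torch.nn as nn

    hidden = 64
    W1 = nn.Linear(hidden, hidden, bias=False)  # transforms encoder states
    W2 = nn.Linear(hidden, hidden, bias=False)  # transforms the decoder state
    v = nn.Linear(hidden, 1, bias=False)

    def pointer_distribution(enc_states, dec_state):
        # enc_states: (n, hidden), one vector per input element; dec_state: (hidden,).
        # Instead of blending enc_states into a context vector, the attention
        # weights themselves are the output: a distribution over input positions.
        scores = v(torch.tanh(W1(enc_states) + W2(dec_state))).squeeze(-1)  # (n,)
        return scores.softmax(dim=-1)

    probs = pointer_distribution(torch.randn(10, hidden), torch.randn(hidden))
    print(probs.argmax().item())  # index of the input element being "pointed at"

Because the output vocabulary is the input itself, the same network handles variable-length instances of combinatorial problems such as convex hull or TSP.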

RNN Approaches to Text Normalization: A Challenge

1 code implementation • 31 Oct 2016 • Richard Sproat, Navdeep Jaitly

Though our conclusions are largely negative on this point, we are not arguing that the text normalization problem is intractable with a pure RNN approach, merely that it cannot be solved simply by feeding huge amounts of annotated text data to a general RNN model.

An Online Sequence-to-Sequence Model Using Partial Conditioning

1 code implementation • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.

Denoising

How Far Are We from Intelligent Visual Deductive Reasoning?

1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks.

In-Context Learning • Visual Reasoning

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

1 code implementation • 24 Mar 2017 • Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.

Machine Translation • Sequence-To-Sequence Speech Recognition • +2

Position Prediction as an Effective Pretraining Strategy

1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in speech, and, recently, MAE models in vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.

Position • speech-recognition • +1
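A loose sketch of the position-prediction idea (my assumptions, not the paper's exact recipe): feed a transformer content embeddings with no positional information attached, and train a head to classify which position each patch originally occupied.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_patches, d = 196, 128  # hypothetical ViT-style patch count and width
    layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    pos_head = nn.Linear(d, n_patches)  # classify each patch's original position

    def position_prediction_loss(patch_embeddings):
        # patch_embeddings: (B, n_patches, d) with NO positional embeddings added,
        # so position must be inferred from content relationships alone.
        perm = torch.randperm(n_patches)
        shuffled = patch_embeddings[:, perm, :]   # hide the original order
        logits = pos_head(encoder(shuffled))      # (B, n_patches, n_patches)
        target = perm.unsqueeze(0).expand(logits.size(0), -1)
        return F.cross_entropy(logits.reshape(-1, n_patches), target.reshape(-1))

    loss = position_prediction_loss(torch.randn(4, n_patches, d))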

Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games

1 code implementation • 2 Oct 2023 • Yizhe Zhang, Jiarui Lu, Navdeep Jaitly

In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries.

Discrete Sequential Prediction of Continuous Actions for Deep RL

no code implementations • ICLR 2018 • Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson

Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions.

Continuous Control • Q-Learning • +1
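A hedged sketch of that idea (architecture and sizes are my assumptions, not the paper's): discretize each action dimension into bins, and predict per-dimension Q-values sequentially, each step conditioned on the bins already chosen.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    state_dim, n_dims, n_bins, d = 8, 3, 11, 64  # hypothetical sizes
    state_enc = nn.Linear(state_dim, d)
    step = nn.GRUCell(n_bins, d)   # consumes the previously chosen bin
    q_head = nn.Linear(d, n_bins)  # Q-values over bins for the current dimension

    def select_action(state):
        # Greedy selection: choose bins one action dimension at a time, so a
        # continuous action becomes a short sequence of discrete predictions.
        h = torch.tanh(state_enc(state))  # (1, d)
        prev = torch.zeros(1, n_bins)     # "no choice yet" input
        bins = []
        for _ in range(n_dims):
            h = step(prev, h)
            b = q_head(h).argmax(dim=-1)  # best bin for this dimension
            bins.append(b.item())
            prev = F.one_hot(b, n_bins).float()
        # Map bin indices back to continuous values in [-1, 1].
        return [2.0 * b / (n_bins - 1) - 1.0 for b in bins]

    print(select_action(torch.randn(1, state_dim)))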

Learning Hard Alignments with Variational Inference

no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Hard Attention • Image Captioning • +5

An online sequence-to-sequence model for noisy speech recognition

no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition • speech-recognition

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

no code implementations • 13 Feb 2017 • Akosua Busia, Navdeep Jaitly

This sequential model achieves 70.3% Q8 accuracy on CB513 with a single model; an ensemble of these models produces 71.4% Q8 accuracy on the same test set, improving upon the previous overall state of the art for the eight-class secondary structure problem.

Protein Secondary Structure Prediction

Towards better decoding and language model integration in sequence to sequence models

no code implementations • 8 Dec 2016 • Jan Chorowski, Navdeep Jaitly

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning

no code implementations • 4 Nov 2016 • Akosua Busia, Jasmine Collins, Navdeep Jaitly

We first train a series of deep neural networks to predict eight-class secondary structure labels given a protein's amino acid sequence information and find that using recent methods for regularization, such as dropout and weight-norm constraining, leads to measurable gains in accuracy.

Protein Secondary Structure Prediction • Protein Structure Prediction

Chained Predictions Using Convolutional Neural Networks

no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly

In this model the output variables for a given input are predicted sequentially using neural networks.

Pose Estimation

A Neural Transducer

no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Learning Online Alignments with Continuous Rewards Policy Gradient

no code implementations • 3 Aug 2016 • Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.

Machine Translation • Question Answering • +4

Object Recognition from Short Videos for Robotic Perception

no code implementations • 4 Sep 2015 • Ivan Bogun, Anelia Angelova, Navdeep Jaitly

Videos, unlike still images, are temporally coherent which makes the application of deep networks non-trivial.

Object • Object Recognition

Occlusion Edge Detection in RGB-D Frames using Deep Convolutional Networks

no code implementations • 22 Dec 2014 • Soumik Sarkar, Vivek Venugopalan, Kishore Reddy, Michael Giering, Julian Ryde, Navdeep Jaitly

Occlusion edges in images which correspond to range discontinuity in the scene from the point of view of the observer are an important prerequisite for many vision and mobile robot tasks.

Edge Detection

Multi-task Neural Networks for QSAR Predictions

no code implementations • 4 Jun 2014 • George E. Dahl, Navdeep Jaitly, Ruslan Salakhutdinov

Although artificial neural networks have occasionally been used for Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in the past, the literature has of late been dominated by other machine learning techniques such as random forests.

Peptide-Spectra Matching from Weak Supervision

no code implementations • 20 Aug 2018 • Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.

SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong

no code implementations • 13 Dec 2019 • Steven Schwarcz, Peng Xu, David D'Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly

The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models: tracking of the ping pong ball, poses of humans in the videos, and the spin of the ball being hit by humans.

Action Recognition • Pose Estimation • +1

Robotic Table Tennis with Model-Free Reinforcement Learning

no code implementations • 31 Mar 2020 • Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100 Hz.

reinforcement-learning • Reinforcement Learning (RL)

Deep Neural Networks for Acoustic Modeling in Speech Recognition

no code implementations • Signal Processing Magazine 2012 • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

speech-recognition • Speech Recognition

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Policy Optimization by Local Improvement through Search

no code implementations • 25 Sep 2019 • Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini

On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.

Imitation Learning • reinforcement-learning • +1

Efficient Representation Learning via Adaptive Context Pooling

no code implementations • 5 Jul 2022 • Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind

Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer.

Representation Learning

Continuous Pseudo-Labeling from the Start

no code implementations • 17 Oct 2022 • Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko

Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

More Speaking or More Speakers?

no code implementations • 2 Nov 2022 • Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko

We perform a systematic analysis on both labeled and unlabeled data by varying the number of speakers while keeping the number of hours fixed and vice versa.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Continuous Soft Pseudo-Labeling in ASR

no code implementations • 11 Nov 2022 • Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio

Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.

speech-recognition • Speech Recognition

Understanding the Robustness of Multi-Exit Models under Common Corruptions

no code implementations • 3 Dec 2022 • Akshay Mehra, Skyler Seto, Navdeep Jaitly, Barry-John Theobald

Furthermore, the lack of calibration increases the inconsistency in the predictions of the model across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data.

REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation

no code implementations • 7 Sep 2023 • Skyler Seto, Barry-John Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge

In online F-TTA, a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization.

Test-time Adaptation
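For reference, a minimal sketch of the plain entropy-minimization step that F-TTA methods build on; REALM's contribution is replacing this loss with a robust, adaptive variant, which is not reproduced here. The model and optimizer choices below are illustrative assumptions:

    import torch
    import torch.nn as nn

    def tta_step(model, optimizer, x):
        # One online adaptation step: minimize the entropy of the model's
        # own predictions on the incoming (unlabeled) test batch.
        probs = model(x).softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
        return entropy.item()

    # Commonly only lightweight parameters (e.g. normalization affine terms)
    # are adapted while the rest of the network stays frozen.
    model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 10))
    norm_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm1d) for p in m.parameters()]
    optimizer = torch.optim.SGD(norm_params, lr=1e-3)
    tta_step(model, optimizer, torch.randn(4, 16))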

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

no code implementations • 20 Sep 2023 • Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation.

Hallucination • Knowledge Graphs

Matryoshka Diffusion Models

no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.

Image Generation • Zero-shot Generalization

Generating Molecular Conformer Fields

no code implementations • 27 Nov 2023 • Yuyang Wang, Ahmed A. Elhag, Navdeep Jaitly, Joshua M. Susskind, Miguel Angel Bautista

In this paper we tackle the problem of generating conformers of a molecule in 3D space given its molecular graph.

KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know

no code implementations • 15 Dec 2023 • Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly

Measuring the alignment between a Knowledge Graph (KG) and Large Language Models (LLMs) is an effective method to assess the factualness and identify the knowledge blind spots of LLMs.

Knowledge Graphs

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

no code implementations • 29 Jan 2024 • Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly

Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased.

Language Modelling

Divide-or-Conquer? Which Part Should You Distill Your LLM?

no code implementations • 22 Feb 2024 • Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang

Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first.

Problem Decomposition
