Search Results for author: Quoc V. Le

Found 117 papers, 73 papers with code

Primer: Searching for Efficient Transformers for Language Modeling

4 code implementations 17 Sep 2021 David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.

Language Modelling

STraTA: Self-Training with Task Augmentation for Better Few-shot Learning

1 code implementation EMNLP 2021 Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer

Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available.

Few-Shot Learning Few-Shot NLI +1

Finetuned Language Models Are Zero-Shot Learners

1 code implementation 3 Sep 2021 Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially boosts zero-shot performance on unseen tasks.

Common Sense Reasoning Language Modelling +5

Multi-Task Self-Training for Learning General Representations

no code implementations ICCV 2021 Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.

Multi-Task Learning

CoAtNet: Marrying Convolution and Attention for All Data Sizes

3 code implementations NeurIPS 2021 Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Transformers have attracted increasing interests in computer vision, but they still fall behind state-of-the-art convolutional networks.

Ranked #1 on Image Classification on ImageNet (using extra training data)

Image Classification

Pay Attention to MLPs

16 code implementations NeurIPS 2021 Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years.

Image Classification Natural Language Inference +2

EfficientNetV2: Smaller Models and Faster Training

12 code implementations 1 Apr 2021 Mingxing Tan, Quoc V. Le

By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.

Data Augmentation Image Classification +1

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

2 code implementations 11 Feb 2021 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset.

Cross-Modal Retrieval Fine-Grained Image Classification +2

Evolving Reinforcement Learning Algorithms

3 code implementations ICLR 2021 John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust

Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.

Atari Games Meta-Learning

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

1 code implementation 5 Jan 2021 Hieu Pham, Quoc V. Le

As a result, these conventional methods are less effective than methods that leverage the structures, such as SpatialDropout and DropBlock, which randomly drop the values at certain contiguous areas in the hidden states and set them to zero.

Image Classification Language Modelling +1

Towards Domain-Agnostic Contrastive Learning

no code implementations 9 Nov 2020 Vikas Verma, Minh-Thang Luong, Kenji Kawaguchi, Hieu Pham, Quoc V. Le

Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation.

Contrastive Learning Data Augmentation +3

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

no code implementations 20 Oct 2020 Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

automatic-speech-recognition Speech Recognition

Smooth Adversarial Training

1 code implementation 25 Jun 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.

Adversarial Defense Adversarial Robustness

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Data Augmentation Object Detection +1

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

3 code implementations NeurIPS 2020 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost.

Language understanding Reading Comprehension +1

AutoHAS: Efficient Hyperparameter and Architecture Search

no code implementations 5 Jun 2020 Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le

Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.

Hyperparameter Optimization Neural Architecture Search

Improved Noisy Student Training for Automatic Speech Recognition

no code implementations 19 May 2020 Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le

Noisy student training is an iterative self-training method that leverages augmentation to improve network performance.

Ranked #4 on Speech Recognition on LibriSpeech test-clean (using extra training data)

automatic-speech-recognition Image Classification +1

Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension

no code implementations ICLR 2020 Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le

Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are hard to scale to more domains or more complex reasoning.

Data Augmentation Question Answering +1

Evolving Normalization-Activation Layers

8 code implementations NeurIPS 2020 Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other.

Image Classification Image Generation +2

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

14 code implementations ICLR 2020 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.

Language Modelling Language understanding +3

Meta Pseudo Labels

7 code implementations CVPR 2021 Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le

We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art.

Ranked #1 on Image Classification on ImageNet ReaL (using extra training data)

Meta-Learning Semi-Supervised Image Classification

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

2 code implementations 6 Mar 2020 Esteban Real, Chen Liang, David R. So, Quoc V. Le

However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spaces.

AutoML

Towards a Human-like Open-Domain Chatbot

2 code implementations 27 Jan 2020 Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.

Chatbot

SpecAugment on Large Scale Datasets

no code implementations 11 Dec 2019 Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets.

automatic-speech-recognition Speech Recognition

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

5 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

Classification General Classification +5

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

2 code implementations CVPR 2020 Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adams, Quoc V. Le

We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.

Object Detection

Adversarial Examples Improve Image Recognition

6 code implementations CVPR 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Image Classification

Self-training with Noisy Student improves ImageNet classification

12 code implementations CVPR 2020 Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.

Ranked #9 on Image Classification on ImageNet ReaL (using extra training data)

Classification Data Augmentation +2

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

no code implementations NeurIPS 2019 Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time.

Optical Flow Estimation Video Prediction

RandAugment: Practical automated data augmentation with a reduced search space

13 code implementations NeurIPS 2020 Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size.

Data Augmentation Image Classification +1

Semi-supervised Learning by Coaching

no code implementations 25 Sep 2019 Hieu Pham, Quoc V. Le

Recent semi-supervised learning (SSL) methods often have a teacher to train a student in order to propagate labels from labeled data to unlabeled data.

Saccader: Improving Accuracy of Hard Attention Models for Vision

2 code implementations NeurIPS 2019 Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le

Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret.

Image Classification

MixConv: Mixed Depthwise Convolutional Kernels

12 code implementations 22 Jul 2019 Mingxing Tan, Quoc V. Le

In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency.

AutoML Image Classification +1

Neural Input Search for Large Scale Recommendation Models

no code implementations 10 Jul 2019 Manas R. Joglekar, Cong Li, Jay K. Adams, Pranav Khaitan, Quoc V. Le

During training we use reinforcement learning to find the optimal vocabulary size for each feature and embedding dimension for each value of the feature.

Learning Data Augmentation Strategies for Object Detection

6 code implementations ECCV 2020 Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy.

Image Augmentation Image Classification +1

XLNet: Generalized Autoregressive Pretraining for Language Understanding

22 code implementations NeurIPS 2019 Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

Document Ranking Humor Detection +8

Selfie: Self-supervised Pretraining for Image Embedding

1 code implementation 7 Jun 2019 Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le

Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, an improvement of 11.1 points in absolute accuracy.

Language Modelling

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

97 code implementations ICML 2019 Mingxing Tan, Quoc V. Le

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.

Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Fine-Grained Image Classification Neural Architecture Search +1

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

no code implementations 9 May 2019 Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.

Diversity and Depth in Per-Example Routing Models

no code implementations ICLR 2019 Prajit Ramachandran, Quoc V. Le

Both architectural diversity and routing depth can increase the representational power of a routing network.

Multi-Task Learning

Do Language Models Have Common Sense?

no code implementations ICLR 2019 Trieu H. Trinh, Quoc V. Le

It has been argued that current machine learning models do not have commonsense, and therefore must be hard-coded with prior knowledge (Marcus, 2018).

Common Sense Reasoning Language Modelling

Unsupervised Data Augmentation for Consistency Training

16 code implementations NeurIPS 2020 Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

Image Augmentation Semi-Supervised Image Classification +2

CondConv: Conditionally Parameterized Convolutions for Efficient Inference

8 code implementations NeurIPS 2019 Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam

We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks.

Classification General Classification +2

The Evolved Transformer

2 code implementations 30 Jan 2019 David R. So, Chen Liang, Quoc V. Le

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models.

Machine Translation Neural Architecture Search

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

25 code implementations ACL 2019 Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling.

Language Modelling

Domain Adaptive Transfer Learning with Specialist Models

no code implementations 16 Nov 2018 Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V. Le, Ruoming Pang

Our method to compute importance weights follows from ideas in domain adaptation, and we show a novel application to transfer learning.

Ranked #2 on Fine-Grained Image Classification on Stanford Cars (using extra training data)

Domain Adaptation Fine-Grained Image Classification +2

DropBlock: A regularization method for convolutional networks

6 code implementations NeurIPS 2018 Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.

Image Classification Object Detection

Semi-Supervised Sequence Modeling with Cross-View Training

2 code implementations EMNLP 2018 Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le

We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data.

CCG Supertagging Dependency Parsing +6

MnasNet: Platform-Aware Neural Architecture Search for Mobile

12 code implementations CVPR 2019 Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.

Image Classification Neural Architecture Search +1

Stochastic natural gradient descent draws posterior samples in function space

no code implementations 25 Jun 2018 Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.

AutoAugment: Learning Augmentation Policies from Data

20 code implementations 24 May 2018 Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le

In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch.

Fine-Grained Image Classification Image Augmentation

Do Better ImageNet Models Transfer Better?

no code implementations CVPR 2019 Simon Kornblith, Jonathon Shlens, Quoc V. Le

Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer.

Classification Fine-Grained Image Classification +2

Regularized Evolution for Image Classifier Architecture Search

4 code implementations 5 Feb 2018 Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le

The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically.

Image Classification Neural Architecture Search

A Hierarchical Model for Device Placement

no code implementations ICLR 2018 Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean

We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices.

Machine Translation Translation

Cross-View Training for Semi-Supervised Learning

no code implementations ICLR 2018 Kevin Clark, Thang Luong, Quoc V. Le

The students can learn from the teacher (the full model) because the teacher sees more of each example.

Ranked #3 on Chunking on CoNLL 2000 (using extra training data)

Chunking

A Goal-oriented Neural Conversation Model by Self-Play

no code implementations ICLR 2018 Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li

One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals.

Language Modelling Language understanding +1

Exploring Neural Architecture Search for Language Tasks

no code implementations ICLR 2018 Minh-Thang Luong, David Dohan, Adams Wei Yu, Quoc V. Le, Barret Zoph, Vijay Vasudevan

Neural architecture search (NAS), the task of finding neural architectures automatically, has recently emerged as a promising approach for unveiling better models over human-designed ones.

Language Modelling Neural Architecture Search +2

Faster Discovery of Neural Architectures by Searching for Paths in a Large Model

no code implementations ICLR 2018 Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean

We propose Efficient Neural Architecture Search (ENAS), a faster and less expensive approach to automated model design than previous methods.

Neural Architecture Search

Code Synthesis with Priority Queue Training

no code implementations ICLR 2018 Daniel A. Abolafia, Quoc V. Le, Mohammad Norouzi

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards.

Program Synthesis

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

1 code implementation ICML 2018 Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg

Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization.

Don't Decay the Learning Rate, Increase the Batch Size

3 code implementations ICLR 2018 Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

no code implementations 17 Oct 2017 Samuel L. Smith, Quoc V. Le

Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.

Searching for Activation Functions

20 code implementations ICLR 2018 Prajit Ramachandran, Barret Zoph, Quoc V. Le

The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.

Image Classification

Neural Optimizer Search with Reinforcement Learning

3 code implementations 21 Sep 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation Translation

Neural Optimizer Search using Reinforcement Learning

no code implementations ICML 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation Translation

Learning Transferable Architectures for Scalable Image Recognition

10 code implementations CVPR 2018 Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture".

Image Classification Neural Architecture Search

Device Placement Optimization with Reinforcement Learning

1 code implementation ICML 2017 Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.

Language Modelling Machine Translation +1

Learning to Skim Text

3 code implementations ACL 2017 Adams Wei Yu, Hongrae Lee, Quoc V. Le

Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.

Document Classification General Classification +4

An Online Sequence-to-Sequence Model Using Partial Conditioning

no code implementations NeurIPS 2016 Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Neural Combinatorial Optimization with Reinforcement Learning

9 code implementations 29 Nov 2016 Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio

Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.

Combinatorial Optimization Traveling Salesman Problem

Learning a Natural Language Interface with Neural Programmer

2 code implementations 28 Nov 2016 Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei

The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision.

Language understanding Program induction +1

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

4 code implementations TACL 2017 Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation.

Machine Translation Transfer Learning +1

Unsupervised Pretraining for Sequence to Sequence Learning

no code implementations EMNLP 2017 Prajit Ramachandran, Peter J. Liu, Quoc V. Le

We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.

Abstractive Text Summarization Machine Translation +1

Neural Architecture Search with Reinforcement Learning

11 code implementations 5 Nov 2016 Barret Zoph, Quoc V. Le

Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model.

Image Classification Language Modelling +3

HyperNetworks

7 code implementations 27 Sep 2016 David Ha, Andrew Dai, Quoc V. Le

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network.

Handwriting generation Language Modelling +2

Adding Gradient Noise Improves Learning for Very Deep Networks

5 code implementations 21 Nov 2015 Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks.

Question Answering

Multi-task Sequence to Sequence Learning

no code implementations 19 Nov 2015 Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.

Machine Translation Multi-Task Learning +1

Neural Programmer: Inducing Latent Programs with Gradient Descent

no code implementations 16 Nov 2015 Arvind Neelakantan, Quoc V. Le, Ilya Sutskever

In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations.

Question Answering Speech Recognition

A Neural Transducer

no code implementations 16 Nov 2015 Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Semi-supervised Sequence Learning

171 code implementations NeurIPS 2015 Andrew M. Dai, Quoc V. Le

In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better.

Language Modelling Text Classification

Listen, Attend and Spell

37 code implementations 5 Aug 2015 William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Language Modelling Speech Recognition

Document Embedding with Paragraph Vectors

5 code implementations 29 Jul 2015 Andrew M. Dai, Christopher Olah, Quoc V. Le

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts.

Document Embedding Sentiment Analysis +1

Addressing the Rare Word Problem in Neural Machine Translation

5 code implementations IJCNLP 2015 Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba

Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique.

Machine Translation Translation +1

Sequence to Sequence Learning with Neural Networks

52 code implementations NeurIPS 2014 Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

Machine Translation Time Series Forecasting +1

Distributed Representations of Sentences and Documents

26 code implementations 16 May 2014 Quoc V. Le, Tomas Mikolov

Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models.

Question Answering Sentiment Analysis +1

Grounded Compositional Semantics for Finding and Describing Images with Sentences

no code implementations TACL 2014 Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng

Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images.

Exploiting Similarities among Languages for Machine Translation

8 code implementations 17 Sep 2013 Tomas Mikolov, Quoc V. Le, Ilya Sutskever

Dictionaries and phrase tables are the basis of modern statistical machine translation systems.

Machine Translation Translation

Large Scale Distributed Deep Networks

no code implementations NeurIPS 2012 Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng

Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance.

Object Recognition Speech Recognition

Tiled convolutional neural networks

no code implementations NeurIPS 2010 Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture.

Object Recognition

Measuring Invariances in Deep Networks

no code implementations NeurIPS 2009 Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Our evaluation metrics can also be used to evaluate future work in unsupervised deep learning, and thus help the development of future algorithms.
