Search Results for author: Quoc V. Le

Found 138 papers, 90 papers with code

Long-form factuality in large language models

2 code implementations 27 Mar 2024 Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time.


Self-Discover: Large Language Models Self-Compose Reasoning Structures

2 code implementations 6 Feb 2024 Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods.


AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

no code implementations 13 Dec 2023 Esteban Real, Yao Chen, Mirko Rossini, Connal de Souza, Manav Garg, Akhil Verghese, Moritz Firsching, Quoc V. Le, Ekin Dogus Cubuk, David H. Park

Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions.

Large Language Models as Optimizers

2 code implementations 7 Sep 2023 Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen

In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language.


Simple synthetic data reduces sycophancy in large language models

1 code implementation 7 Aug 2023 Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

no code implementations 7 Aug 2023 Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

With integer models, we increase the accuracy of ResNet-18 on ImageNet by 1.31% and ResNet-50 by 0.90% with equivalent model cost over previous methods.


DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations NeurIPS 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.

Language Modelling

Symbol tuning improves in-context learning in language models

no code implementations 15 May 2023 Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").

In-Context Learning
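
The label substitution described in the symbol tuning snippet above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `symbol_tune_examples` and `format_prompt`, the prompt template, and the toy data are all assumptions made for illustration.

```python
import random


def symbol_tune_examples(examples, symbols, seed=0):
    """Swap each natural-language label for an arbitrary symbol.

    `examples` is a list of (text, label) pairs. All names here are
    hypothetical; the paper's own pipeline may differ.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in examples})
    if len(symbols) < len(labels):
        raise ValueError("need at least one symbol per distinct label")
    # Draw distinct symbols so the label-to-symbol mapping stays one-to-one.
    mapping = dict(zip(labels, rng.sample(symbols, len(labels))))
    remapped = [(text, mapping[label]) for text, label in examples]
    return remapped, mapping


def format_prompt(pairs, query):
    """Render in-context input-label pairs followed by an unlabeled query."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in pairs]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Toy data, invented for this example.
examples = [
    ("A delightful film.", "positive sentiment"),
    ("A tedious mess.", "negative sentiment"),
    ("I loved every minute.", "positive sentiment"),
]
remapped, mapping = symbol_tune_examples(examples, ["foo", "bar"])
prompt = format_prompt(remapped, "Not worth the ticket price.")
```

Because the symbols carry no meaning of their own, the only way to infer what "foo" or "bar" stands for in such a prompt is from the input-label pairs themselves.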

Unified Functional Hashing in Automatic Machine Learning

1 code implementation 10 Feb 2023 Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real

In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present.

Neural Architecture Search

PyGlove: Efficiently Exchanging ML Ideas as Code

1 code implementation 3 Feb 2023 Daiyi Peng, Xuanyi Dong, Esteban Real, Yifeng Lu, Quoc V. Le

We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

1 code implementation 31 Jan 2023 Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022).

Inverse scaling can become U-shaped

no code implementations 3 Nov 2022 Jason Wei, Najoung Kim, Yi Tay, Quoc V. Le

The Inverse Scaling Prize (McKenzie et al. 2022) identified eleven such inverse scaling tasks, evaluated on models of up to 280B parameters and up to 500 zettaFLOPs of training compute.


Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

1 code implementation 17 Oct 2022 Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models.

Language Modelling

Revisiting Multi-Scale Feature Fusion for Semantic Segmentation

no code implementations 23 Mar 2022 Tianjian Meng, Golnaz Ghiasi, Reza Mahjourian, Quoc V. Le, Mingxing Tan

It is commonly believed that high internal resolution combined with expensive operations (e.g., atrous convolutions) are necessary for accurate semantic segmentation, resulting in slow speed and large memory usage.

Segmentation Semantic Segmentation

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

1 code implementation CVPR 2022 Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

In this paper, we propose two novel techniques: InverseAug that inverses geometric-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign that leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.

3D Object Detection Autonomous Driving +2

Transformer Quality in Linear Time

1 code implementation 21 Feb 2022 Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le

We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences.

8k Language Modelling +1

Combined Scaling for Zero-shot Transfer Learning

no code implementations 19 Nov 2021 Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

Second, while increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.

Classification Contrastive Learning +3

Primer: Searching for Efficient Transformers for Language Modeling

4 code implementations 17 Sep 2021 David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.

Language Modelling

STraTA: Self-Training with Task Augmentation for Better Few-shot Learning

1 code implementation EMNLP 2021 Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer

Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available.

Few-Shot Learning Few-Shot NLI +1

Finetuned Language Models Are Zero-Shot Learners

5 code implementations ICLR 2022 Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.

Common Sense Reasoning Coreference Resolution +8

Multi-Task Self-Training for Learning General Representations

no code implementations ICCV 2021 Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.

Multi-Task Learning Partially Labeled Datasets +1

CoAtNet: Marrying Convolution and Attention for All Data Sizes

14 code implementations NeurIPS 2021 Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Image Classification Inductive Bias

Pay Attention to MLPs

20 code implementations NeurIPS 2021 Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years.

Image Classification Natural Language Inference +2

EfficientNetV2: Smaller Models and Faster Training

20 code implementations 1 Apr 2021 Mingxing Tan, Quoc V. Le

By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.

Classification Data Augmentation +2

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

4 code implementations 11 Feb 2021 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig

In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset.

 Ranked #1 on Image Classification on VTAB-1k (using extra training data)

Cross-Modal Retrieval Fine-Grained Image Classification +6

Evolving Reinforcement Learning Algorithms

5 code implementations ICLR 2021 John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust

Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.

Atari Games Meta-Learning +2

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

1 code implementation 5 Jan 2021 Hieu Pham, Quoc V. Le

As a result, these conventional methods are less effective than methods that leverage the structures, such as SpatialDropout and DropBlock, which randomly drop the values at certain contiguous areas in the hidden states and set them to zero.

Image Classification Language Modelling +1
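
The idea of zeroing a contiguous area of a hidden state, as opposed to independent units, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not DropBlock, SpatialDropout, or AutoDropout as published: it drops exactly one square block and does no rescaling, and the function name `drop_contiguous_block` and the list-of-lists feature map are assumptions.

```python
import random


def drop_contiguous_block(feature_map, block_size, rng):
    """Return a copy of a 2D feature map with one square block set to zero.

    Simplified sketch: a single block at a uniformly random position.
    """
    height, width = len(feature_map), len(feature_map[0])
    if block_size > min(height, width):
        raise ValueError("block does not fit inside the feature map")
    # Pick the top-left corner so the whole block stays in bounds.
    top = rng.randrange(height - block_size + 1)
    left = rng.randrange(width - block_size + 1)
    dropped = [row[:] for row in feature_map]  # leave the input untouched
    for r in range(top, top + block_size):
        for c in range(left, left + block_size):
            dropped[r][c] = 0.0
    return dropped


# Toy 4x4 activation map, invented for this example.
activations = [[1.0] * 4 for _ in range(4)]
masked = drop_contiguous_block(activations, block_size=2, rng=random.Random(0))
```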

Towards Domain-Agnostic Contrastive Learning

no code implementations 9 Nov 2020 Vikas Verma, Minh-Thang Luong, Kenji Kawaguchi, Hieu Pham, Quoc V. Le

Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation.

Contrastive Learning Data Augmentation +3

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

1 code implementation 20 Oct 2020 Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.

 Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Smooth Adversarial Training

1 code implementation 25 Jun 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.

Adversarial Defense Adversarial Robustness

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Data Augmentation Object +4

AutoHAS: Efficient Hyperparameter and Architecture Search

no code implementations 5 Jun 2020 Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le

Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.

Hyperparameter Optimization Neural Architecture Search +1

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

3 code implementations NeurIPS 2020 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost.

Decoder Reading Comprehension +1

Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension

no code implementations ICLR 2020 Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le

Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are hard to scale to more domains or more complex reasoning.

Data Augmentation Math +2

Evolving Normalization-Activation Layers

8 code implementations NeurIPS 2020 Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other.

Image Classification Image Generation +2

Meta Pseudo Labels

9 code implementations CVPR 2021 Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le

We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art.

Meta-Learning Semi-Supervised Image Classification

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

17 code implementations ICLR 2020 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.

Language Modelling Masked Language Modeling +3
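
The discriminative objective in the ELECTRA snippet above (predict, per token, whether it was replaced by a generator sample) can be sketched as a data-preparation step. This is a toy illustration, not the authors' implementation: `corrupt_and_label`, `toy_generator`, the vocabulary, and the replacement probability are hypothetical, and a uniform random sampler stands in for a trained generator network.

```python
import random


def corrupt_and_label(tokens, sample_replacement, replace_prob=0.15, seed=0):
    """Build a corrupted sequence and per-token replaced/original labels.

    `replace_prob` is an illustrative setting, not a value from the source.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < replace_prob:
            candidate = sample_replacement(token, rng)
        else:
            candidate = token
        corrupted.append(candidate)
        # Label 1 = replaced. In this sketch, a sampled token that happens
        # to equal the original is labeled 0 (treated as original).
        labels.append(int(candidate != token))
    return corrupted, labels


VOCAB = ["the", "chef", "cooked", "ate", "meal", "a"]  # toy vocabulary


def toy_generator(token, rng):
    """Stand-in for a generator: samples uniformly from the toy vocabulary."""
    return rng.choice(VOCAB)


tokens = ["the", "chef", "cooked", "the", "meal"]
corrupted, labels = corrupt_and_label(tokens, toy_generator, replace_prob=0.5)
```

A discriminator would then be trained on `(corrupted, labels)`, receiving a learning signal from every position rather than only from the corrupted ones.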

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

1 code implementation 6 Mar 2020 Esteban Real, Chen Liang, David R. So, Quoc V. Le

However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spaces.

AutoML BIG-bench Machine Learning

Towards a Human-like Open-Domain Chatbot

2 code implementations 27 Jan 2020 Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.

Chatbot Specificity

SpecAugment on Large Scale Datasets

no code implementations 11 Dec 2019 Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

13 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

Decoder General Classification +6

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

2 code implementations CVPR 2020 Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adams, Quoc V. Le

We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.

object-detection Object Detection

Adversarial Examples Improve Image Recognition

6 code implementations CVPR 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Domain Generalization Image Classification

Self-training with Noisy Student improves ImageNet classification

13 code implementations CVPR 2020 Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.

Ranked #16 on Image Classification on ImageNet ReaL (using extra training data)

Data Augmentation General Classification +1

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

no code implementations NeurIPS 2019 Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time.

Inductive Bias Optical Flow Estimation +2

RandAugment: Practical automated data augmentation with a reduced search space

16 code implementations NeurIPS 2020 Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size.

Data Augmentation Domain Generalization +3

Semi-supervised Learning by Coaching

no code implementations 25 Sep 2019 Hieu Pham, Quoc V. Le

Recent semi-supervised learning (SSL) methods often have a teacher to train a student in order to propagate labels from labeled data to unlabeled data.

Saccader: Improving Accuracy of Hard Attention Models for Vision

2 code implementations NeurIPS 2019 Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le

Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret.

Hard Attention Image Classification

MixConv: Mixed Depthwise Convolutional Kernels

13 code implementations 22 Jul 2019 Mingxing Tan, Quoc V. Le

In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency.

AutoML Image Classification +2

Neural Input Search for Large Scale Recommendation Models

no code implementations 10 Jul 2019 Manas R. Joglekar, Cong Li, Jay K. Adams, Pranav Khaitan, Quoc V. Le

During training we use reinforcement learning to find the optimal vocabulary size for each feature and embedding dimension for each value of the feature.


XLNet: Generalized Autoregressive Pretraining for Language Understanding

23 code implementations NeurIPS 2019 Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

Audio Question Answering Chinese Reading Comprehension +9

Selfie: Self-supervised Pretraining for Image Embedding

1 code implementation 7 Jun 2019 Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le

Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, an improvement of 11.1 points in absolute accuracy.

Language Modelling Masked Language Modeling

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

134 code implementations ICML 2019 Mingxing Tan, Quoc V. Le

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.

Action Recognition Domain Generalization +4

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

no code implementations 9 May 2019 Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.

Diversity and Depth in Per-Example Routing Models

no code implementations ICLR 2019 Prajit Ramachandran, Quoc V. Le

Both architectural diversity and routing depth can increase the representational power of a routing network.

Multi-Task Learning

Do Language Models Have Common Sense?

no code implementations ICLR 2019 Trieu H. Trinh, Quoc V. Le

It has been argued that current machine learning models do not have commonsense, and therefore must be hard-coded with prior knowledge (Marcus, 2018).

Common Sense Reasoning Language Modelling

Unsupervised Data Augmentation for Consistency Training

20 code implementations NeurIPS 2020 Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

Image Augmentation Semi-Supervised Image Classification +2

CondConv: Conditionally Parameterized Convolutions for Efficient Inference

9 code implementations NeurIPS 2019 Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam

We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks.

General Classification Image Classification +1

The Evolved Transformer

3 code implementations 30 Jan 2019 David R. So, Chen Liang, Quoc V. Le

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models.

Machine Translation Neural Architecture Search

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

35 code implementations ACL 2019 Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling.

Language Modelling

Domain Adaptive Transfer Learning with Specialist Models

no code implementations 16 Nov 2018 Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V. Le, Ruoming Pang

Our method to compute importance weights follows from ideas in domain adaptation, and we show a novel application to transfer learning.

Ranked #3 on Fine-Grained Image Classification on Stanford Cars (using extra training data)

Domain Adaptation Fine-Grained Image Classification +2

DropBlock: A regularization method for convolutional networks

6 code implementations NeurIPS 2018 Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.

Image Classification Object Detection

Semi-Supervised Sequence Modeling with Cross-View Training

2 code implementations EMNLP 2018 Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le

We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data.

CCG Supertagging Dependency Parsing +7

MnasNet: Platform-Aware Neural Architecture Search for Mobile

28 code implementations CVPR 2019 Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.

Ranked #833 on Image Classification on ImageNet (using extra training data)

Image Classification Neural Architecture Search +2

Stochastic natural gradient descent draws posterior samples in function space

no code implementations 25 Jun 2018 Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.


AutoAugment: Learning Augmentation Policies from Data

33 code implementations 24 May 2018 Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le

In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch.

Domain Generalization Fine-Grained Image Classification +1
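
The policy structure described in the AutoAugment snippet above (a policy made of sub-policies, one chosen at random per image in each mini-batch) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: images are modeled as flat lists of pixel values, a sub-policy is modeled as a plain tuple of operations, and the operations `flip`, `invert`, and `identity` as well as `apply_policy` are hypothetical names.

```python
import random


def flip(image):
    """Toy operation: reverse the pixel order."""
    return image[::-1]


def invert(image):
    """Toy operation: invert 8-bit pixel intensities."""
    return [255 - p for p in image]


def identity(image):
    """Toy operation: return an unchanged copy."""
    return list(image)


def apply_policy(batch, policy, rng):
    """Apply one randomly chosen sub-policy to each image in the batch."""
    augmented = []
    for image in batch:
        sub_policy = rng.choice(policy)  # independent draw per image
        for op in sub_policy:
            image = op(image)
        augmented.append(image)
    return augmented


# A toy policy with three sub-policies, invented for this example.
policy = [(flip, invert), (invert,), (identity,)]
batch = [[0, 10, 20], [30, 40, 50]]
augmented = apply_policy(batch, policy, random.Random(0))
```

Drawing the sub-policy per image means two images in the same mini-batch can be transformed differently, and the same image can be transformed differently across epochs.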

Do Better ImageNet Models Transfer Better?

no code implementations CVPR 2019 Simon Kornblith, Jonathon Shlens, Quoc V. Le

Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer.

Fine-Grained Image Classification General Classification +1

Regularized Evolution for Image Classifier Architecture Search

4 code implementations 5 Feb 2018 Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le

The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically.

Evolutionary Algorithms Image Classification +1

Faster Discovery of Neural Architectures by Searching for Paths in a Large Model

no code implementations ICLR 2018 Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean

We propose Efficient Neural Architecture Search (ENAS), a faster and less expensive approach to automated model design than previous methods.

Neural Architecture Search

A Goal-oriented Neural Conversation Model by Self-Play

no code implementations ICLR 2018 Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li

One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals.

Language Modelling Natural Language Understanding

Cross-View Training for Semi-Supervised Learning

no code implementations ICLR 2018 Kevin Clark, Thang Luong, Quoc V. Le

The students can learn from the teacher (the full model) because the teacher sees more of each example.

Ranked #4 on Chunking on CoNLL 2000 (using extra training data)


Code Synthesis with Priority Queue Training

no code implementations ICLR 2018 Daniel A. Abolafia, Quoc V. Le, Mohammad Norouzi

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards.

Program Synthesis

A Hierarchical Model for Device Placement

no code implementations ICLR 2018 Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean

We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices.

Machine Translation Reinforcement Learning (RL) +1


no code implementations ICLR 2018 Minh-Thang Luong, David Dohan, Adams Wei Yu, Quoc V. Le, Barret Zoph, Vijay Vasudevan

Neural architecture search (NAS), the task of finding neural architectures automatically, has recently emerged as a promising approach for unveiling better models over human-designed ones.

Language Modelling Neural Architecture Search +2

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

1 code implementation ICML 2018 Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg

Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization.

reinforcement-learning Reinforcement Learning (RL)

Don't Decay the Learning Rate, Increase the Batch Size

3 code implementations ICLR 2018 Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

no code implementations 17 Oct 2017 Samuel L. Smith, Quoc V. Le

Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.
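
The noise-scale expression above is simple enough to check numerically. The sketch below evaluates both the exact form $g = \epsilon (\frac{N}{B} - 1)$ and the approximation $\epsilon N/B$; the function names and the numeric values are illustrative assumptions, not figures from the paper.

```python
def noise_scale(learning_rate, train_size, batch_size):
    """Exact form from the snippet: g = eps * (N / B - 1)."""
    return learning_rate * (train_size / batch_size - 1)


def approx_noise_scale(learning_rate, train_size, batch_size):
    """Approximation for N much larger than B: g ~= eps * N / B."""
    return learning_rate * train_size / batch_size


# Illustrative values only (not taken from the paper).
eps, N, B = 0.1, 50_000, 128
exact = noise_scale(eps, N, B)
approx = approx_noise_scale(eps, N, B)

# The two forms differ by exactly eps, which is negligible when N >> B.
gap = approx - exact

# Under the approximation, multiplying eps and B by the same factor
# leaves the noise scale unchanged.
scaled = approx_noise_scale(4 * eps, N, 4 * B)
```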

Searching for Activation Functions

21 code implementations ICLR 2018 Prajit Ramachandran, Barret Zoph, Quoc V. Le

The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.

Image Classification
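
To make the comparison with ReLU concrete, here is a small pure-Python sketch of Swish, which the paper defines as $f(x) = x \cdot \sigma(\beta x)$. The helper names and the default `beta=1.0` are choices made for this example; in a real network the framework's built-in activation would be used instead.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x):
    """ReLU, for comparison."""
    return max(0.0, x)


# Compare the two activations on a few sample inputs.
samples = [-10.0, -1.0, 0.0, 1.0, 10.0]
comparison = [(x, relu(x), swish(x)) for x in samples]
```

For large positive inputs Swish approaches the identity and for large negative inputs it approaches zero, which is where the resemblance to ReLU comes from; unlike ReLU it is smooth and dips slightly below zero for moderately negative inputs.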

Neural Optimizer Search with Reinforcement Learning

2 code implementations 21 Sep 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation reinforcement-learning +2

Neural Optimizer Search using Reinforcement Learning

no code implementations ICML 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation reinforcement-learning +2

Learning Transferable Architectures for Scalable Image Recognition

17 code implementations CVPR 2018 Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture".

Classification Image Classification +1

Device Placement Optimization with Reinforcement Learning

1 code implementation ICML 2017 Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.

Language Modelling Machine Translation +3

Learning to Skim Text

4 code implementations ACL 2017 Adams Wei Yu, Hongrae Lee, Quoc V. Le

Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.

Document Classification General Classification +4

An Online Sequence-to-Sequence Model Using Partial Conditioning

1 code implementation NeurIPS 2016 Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Neural Combinatorial Optimization with Reinforcement Learning

10 code implementations29 Nov 2016 Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio

Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.

Combinatorial Optimization reinforcement-learning +2

Learning a Natural Language Interface with Neural Programmer

2 code implementations28 Nov 2016 Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei

The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision.

Natural Language Queries Program induction +1

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

4 code implementations TACL 2017 Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation.

Machine Translation NMT +3

Unsupervised Pretraining for Sequence to Sequence Learning

no code implementations EMNLP 2017 Prajit Ramachandran, Peter J. Liu, Quoc V. Le

We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.

Abstractive Text Summarization Decoder +2

Neural Architecture Search with Reinforcement Learning

11 code implementations5 Nov 2016 Barret Zoph, Quoc V. Le

Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model.

Image Classification Language Modelling +4

HyperNetworks

8 code implementations27 Sep 2016 David Ha, Andrew Dai, Quoc V. Le

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network.

Handwriting generation Language Modelling +2

Adding Gradient Noise Improves Learning for Very Deep Networks

4 code implementations21 Nov 2015 Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks.

Question Answering

Multi-task Sequence to Sequence Learning

no code implementations19 Nov 2015 Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.

Caption Generation Decoder +3

A Neural Transducer

no code implementations16 Nov 2015 Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Neural Programmer: Inducing Latent Programs with Gradient Descent

no code implementations16 Nov 2015 Arvind Neelakantan, Quoc V. Le, Ilya Sutskever

In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations.

Question Answering speech-recognition +1

Semi-supervised Sequence Learning

162 code implementations NeurIPS 2015 Andrew M. Dai, Quoc V. Le

In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better.

Language Modelling Text Classification

Listen, Attend and Spell

40 code implementations5 Aug 2015 William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Decoder Language Modelling +2

Document Embedding with Paragraph Vectors

5 code implementations29 Jul 2015 Andrew M. Dai, Christopher Olah, Quoc V. Le

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts.

Document Embedding Sentiment Analysis +1

Addressing the Rare Word Problem in Neural Machine Translation

5 code implementations IJCNLP 2015 Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba

Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique.

Machine Translation NMT +3

Sequence to Sequence Learning with Neural Networks

73 code implementations NeurIPS 2014 Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

Ranked #4 on Traffic Prediction on PeMS-M (using extra training data)

Machine Translation Sentence +2

Distributed Representations of Sentences and Documents

27 code implementations16 May 2014 Quoc V. Le, Tomas Mikolov

Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models.

Question Answering Sentiment Analysis +1

Grounded Compositional Semantics for Finding and Describing Images with Sentences

no code implementations TACL 2014 Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng

Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images.


Exploiting Similarities among Languages for Machine Translation

8 code implementations17 Sep 2013 Tomas Mikolov, Quoc V. Le, Ilya Sutskever

Dictionaries and phrase tables are the basis of modern statistical machine translation systems.

Machine Translation Translation

Tiled convolutional neural networks

no code implementations NeurIPS 2010 Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture.

Object Recognition

Measuring Invariances in Deep Networks

no code implementations NeurIPS 2009 Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Our evaluation metrics can also be used to evaluate future work in unsupervised deep learning, and thus help the development of future algorithms.
