7 code implementations • 13 Feb 2023 • Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Yao Liu, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training.
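For context, the optimizer this search discovered (Lion) reduces to a compact sign-momentum update. A minimal sketch of that rule, with illustrative hyperparameter values rather than tuned recommendations:

```python
import numpy as np

# Sketch of the sign-momentum update discovered by the search (Lion).
def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    update = np.sign(beta1 * m + (1 - beta1) * g)  # sign of interpolated momentum
    w = w - lr * (update + wd * w)                 # uniform-magnitude step + decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum tracks the gradient
    return w, m
```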
1 code implementation • 10 Feb 2023 • Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real
In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present.
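A functional hash of this kind can be sketched as hashing a candidate's outputs on a fixed set of probe inputs, so syntactically different but functionally equivalent candidates collapse to one cache entry. The probe scheme and names below are illustrative assumptions, not the paper's implementation:

```python
import hashlib
import random

random.seed(0)
PROBES = [random.uniform(-1.0, 1.0) for _ in range(16)]  # fixed once per search

def functional_hash(candidate):
    """Hash a candidate by what it computes, not by how it is written."""
    outputs = tuple(round(candidate(x), 6) for x in PROBES)
    return hashlib.sha256(repr(outputs).encode()).hexdigest()

cache = {}
def cached_evaluate(candidate, expensive_eval):
    key = functional_hash(candidate)
    if key not in cache:                 # functionally equivalent candidates hit the cache
        cache[key] = expensive_eval(candidate)
    return cache[key]

# Two different programs, one cache entry:
assert functional_hash(lambda x: x + x) == functional_hash(lambda x: 2 * x)
```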
no code implementations • 8 Feb 2023 • Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
1 code implementation • 3 Feb 2023 • Daiyi Peng, Xuanyi Dong, Esteban Real, Yifeng Lu, Quoc V. Le
We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.
1 code implementation • 31 Jan 2023 • Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022).
no code implementations • 3 Nov 2022 • Jason Wei, Najoung Kim, Yi Tay, Quoc V. Le
The Inverse Scaling Prize (McKenzie et al. 2022) identified eleven such inverse scaling tasks, evaluated on models of up to 280B parameters and up to 500 zettaFLOPs of training compute.
1 code implementation • 20 Oct 2022 • Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
Ranked #1 on Multi-task Language Understanding on BBH-nlp
1 code implementation • 20 Oct 2022 • Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani
This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute.
Ranked #1 on Question Answering on StrategyQA
no code implementations • 19 Oct 2022 • Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park
Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training.
1 code implementation • 17 Oct 2022 • Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei
BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models.
no code implementations • 23 Mar 2022 • Tianjian Meng, Golnaz Ghiasi, Reza Mahjourian, Quoc V. Le, Mingxing Tan
It is commonly believed that high internal resolution combined with expensive operations (e.g., atrous convolutions) is necessary for accurate semantic segmentation, resulting in slow speed and large memory usage.
1 code implementation • CVPR 2022 • Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan
In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.
no code implementations • 21 Feb 2022 • Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences.
Ranked #1 on Language Modelling on Wiki-40B
no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le
Second, while increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.
no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.
3 code implementations • 17 Sep 2021 • David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le
For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4x.
Ranked #1 on Language Modelling on C4
1 code implementation • EMNLP 2021 • Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer
Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available.
Ranked #1 on Few-Shot NLI on SNLI (8 training examples per class)
3 code implementations • ICLR 2022 • Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.
Ranked #1 on Question Answering on StoryCloze
no code implementations • ICCV 2021 • Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin
The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.
10 code implementations • NeurIPS 2021 • Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.
Ranked #1 on Image Classification on GasHisSDB
18 code implementations • NeurIPS 2021 • Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le
Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years.
Ranked #22 on Natural Language Inference on MultiNLI
20 code implementations • 1 Apr 2021 • Mingxing Tan, Quoc V. Le
By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.
Ranked #2 on Image Classification on Stanford Cars
3 code implementations • 11 Feb 2021 • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, YunHsuan Sung, Zhen Li, Tom Duerig
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
Ranked #1 on Image Classification on VTAB-1k (using extra training data)
no code implementations • NeurIPS 2020 • Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Hanxiao Liu, Gabriel Bender, Adam Kraft, Chen Liang, Quoc V. Le
As a result, AutoML can be reformulated as an automated process of symbolic manipulation.
5 code implementations • ICLR 2021 • John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust
Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.
1 code implementation • 5 Jan 2021 • Hieu Pham, Quoc V. Le
As a result, these conventional methods are less effective than methods that leverage structure, such as SpatialDropout and DropBlock, which randomly drop the values in certain contiguous areas of the hidden states, setting them to zero.
Ranked #1 on Image Classification on cifar-10,4000
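To make the contrast concrete, here is a minimal sketch of DropBlock-style structured dropout: a contiguous square of the feature map is zeroed rather than independent scalars. The block size and single-block simplification are illustrative:

```python
import numpy as np

def drop_block(feature_map, block_size=3, rng=None):
    """Zero one contiguous block_size x block_size region of a 2D feature map."""
    rng = rng or np.random.default_rng(0)
    h, w = feature_map.shape
    top = rng.integers(0, h - block_size + 1)     # random block location
    left = rng.integers(0, w - block_size + 1)
    mask = np.ones_like(feature_map)
    mask[top:top + block_size, left:left + block_size] = 0.0
    return feature_map * mask

x = np.arange(64, dtype=float).reshape(8, 8)
print(drop_block(x))   # one 3x3 patch of the map is set to zero
```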
1 code implementation • EMNLP 2020 • Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
We introduce Electric, an energy-based cloze model for representation learning over text.
5 code implementations • CVPR 2021 • Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph
Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
Ranked #2 on Object Detection on PASCAL VOC 2007
no code implementations • 9 Nov 2020 • Vikas Verma, Minh-Thang Luong, Kenji Kawaguchi, Hieu Pham, Quoc V. Le
Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation.
1 code implementation • 20 Oct 2020 • Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset.
Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)
1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.
2 code implementations • NeurIPS 2020 • Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le
For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 val
3 code implementations • NeurIPS 2020 • Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le
With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost.
Ranked #6 on Reading Comprehension on RACE
no code implementations • 5 Jun 2020 • Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le
Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.
no code implementations • 19 May 2020 • Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le
Noisy student training is an iterative self-training method that leverages augmentation to improve network performance.
Ranked #4 on Speech Recognition on LibriSpeech test-clean (using extra training data)
no code implementations • ICLR 2020 • Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le
Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting, and arithmetic, but most existing approaches are hard to scale to more domains or more complex reasoning.
Ranked #4 on Question Answering on DROP Test
1 code implementation • 22 Apr 2020 • Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean
To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas.
8 code implementations • NeurIPS 2020 • Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le
Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other.
no code implementations • ECCV 2020 • Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang Song, Benjamin Caine, Vijay Vasudevan, Cong-Cong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov
Data augmentation has been widely adopted for object detection in 3D point clouds.
7 code implementations • CVPR 2021 • Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le
We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art.
17 code implementations • ICLR 2020 • Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
Ranked #7 on Question Answering on Quora Question Pairs
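The core of this setup is the relabeling of the corrupted input: every position gets a binary target, "replaced" or "original". A minimal sketch of that data construction, with a random-word stand-in for the small MLM generator:

```python
import random

random.seed(0)
VOCAB = ["the", "chef", "cooked", "ate", "meal", "repaired", "engine"]

def corrupt(tokens, replace_prob=0.15):
    """Return a corrupted sequence plus per-token discriminator labels."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < replace_prob:
            sample = random.choice(VOCAB)             # stand-in for a generator sample
            corrupted.append(sample)
            labels.append(1 if sample != tok else 0)  # label 0 if the sample matches the original
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = ["the", "chef", "cooked", "the", "meal"]
corrupted, labels = corrupt(tokens)
# The discriminator is trained to predict `labels` over every position of
# `corrupted`, instead of reconstructing the original token identities.
```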
2 code implementations • 6 Mar 2020 • Esteban Real, Chen Liang, David R. So, Quoc V. Le
However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks, or on similarly restrictive search spaces.
2 code implementations • 27 Jan 2020 • Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.
no code implementations • 11 Dec 2019 • Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu
Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets.
5 code implementations • CVPR 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.
Ranked #8 on Image Classification on iNaturalist
2 code implementations • CVPR 2020 • Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le
We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.
Ranked #231 on Object Detection on COCO test-dev
no code implementations • NeurIPS 2019 • Zhilin Yang, Thang Luong, Russ R. Salakhutdinov, Quoc V. Le
The softmax bottleneck has been shown to limit the expressiveness of neural language models.
6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
Ranked #186 on Image Classification on ImageNet
52 code implementations • CVPR 2020 • Mingxing Tan, Ruoming Pang, Quoc V. Le
Model efficiency has become increasingly important in computer vision.
Ranked #6 on Object Detection on COCO minival (APS metric)
12 code implementations • CVPR 2020 • Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le
During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment into the student so that the student generalizes better than the teacher.
Ranked #14 on Image Classification on ImageNet ReaL (using extra training data)
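Two of the three noise sources named above (dropout and stochastic depth) live inside the student model itself; RandAugment is applied to its inputs. A minimal sketch of a student block with model noise, assuming illustrative sizes and rates (the teacher, pseudo-labeling loop, and EfficientNet models are omitted):

```python
import torch
import torch.nn as nn

class NoisyStudentBlock(nn.Module):
    """Residual block with dropout inside and stochastic depth outside."""
    def __init__(self, dim=64, drop_path_prob=0.2):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(0.5))
        self.drop_path_prob = drop_path_prob

    def forward(self, x):
        if self.training and torch.rand(()) < self.drop_path_prob:
            return x                    # stochastic depth: skip the block entirely
        return x + self.layer(x)        # dropout fires inside the residual branch

# The teacher pseudo-labels clean images; the student is trained on
# RandAugment-noised images against those pseudo-labels.
```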
no code implementations • NeurIPS 2019 • Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time.
15 code implementations • NeurIPS 2020 • Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le
Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size.
Ranked #10 on Data Augmentation on ImageNet
no code implementations • 25 Sep 2019 • Hieu Pham, Quoc V. Le
Recent semi-supervised learning (SSL) methods often have a teacher to train a student in order to propagate labels from labeled data to unlabeled data.
2 code implementations • NeurIPS 2019 • Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le
Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret.
12 code implementations • 22 Jul 2019 • Mingxing Tan, Quoc V. Le
In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency.
Ranked #646 on Image Classification on ImageNet
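A mixed-kernel depthwise convolution of the kind studied here can be sketched by splitting channels into groups and giving each group its own kernel size; the group counts and kernel sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MixedDepthwiseConv(nn.Module):
    """Split channels into groups; convolve each group with a different kernel size."""
    def __init__(self, channels=12, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.split = channels // len(kernel_sizes)
        self.convs = nn.ModuleList(
            nn.Conv2d(self.split, self.split, k, padding=k // 2, groups=self.split)
            for k in kernel_sizes
        )

    def forward(self, x):
        chunks = torch.split(x, self.split, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)

y = MixedDepthwiseConv()(torch.randn(1, 12, 32, 32))   # shape preserved: (1, 12, 32, 32)
```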
no code implementations • 10 Jul 2019 • Manas R. Joglekar, Cong Li, Jay K. Adams, Pranav Khaitan, Quoc V. Le
During training we use reinforcement learning to find the optimal vocabulary size for each feature and embedding dimension for each value of the feature.
1 code implementation • ACL 2019 • Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le
It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts.
6 code implementations • ECCV 2020 • Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy.
Ranked #80 on Object Detection on COCO test-dev
23 code implementations • NeurIPS 2019 • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
1 code implementation • 7 Jun 2019 • Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le
Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, an improvement of 11.1 points in absolute accuracy.
117 code implementations • ICML 2019 • Mingxing Tan, Quoc V. Le
Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.
Ranked #1 on Medical Image Classification on NCT-CRC-HE-100K
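The scaling rule behind this result grows depth, width, and resolution together with a single compound coefficient phi. A sketch using the base coefficients reported for EfficientNet-B0 (treat the exact values as this sketch's assumption):

```python
# Compound scaling: one knob (phi) scales depth, width, and resolution together.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # per-dimension base coefficients (assumed)

def compound_scale(phi, base_depth, base_width, base_resolution):
    return (round(base_depth * ALPHA ** phi),
            round(base_width * BETA ** phi),
            round(base_resolution * GAMMA ** phi))

# ALPHA * BETA**2 * GAMMA**2 ~= 2, so each increment of phi roughly doubles FLOPs.
print(compound_scale(3, base_depth=18, base_width=64, base_resolution=224))
```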
no code implementations • 9 May 2019 • Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith
We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.
49 code implementations • ICCV 2019 • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
We achieve new state of the art results for mobile classification, detection and segmentation.
Ranked #4 on Dichotomous Image Segmentation on DIS-TE3
no code implementations • ICLR 2019 • Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Moreover, Transformer-XL is up to 1,800+ times faster than vanilla Transformer during evaluation.
no code implementations • ICLR 2019 • Prajit Ramachandran, Quoc V. Le
Both architectural diversity and routing depth can increase the representational power of a routing network.
Ranked #2 on Multi-Task Learning on OMNIGLOT
no code implementations • ICLR 2019 • Trieu H. Trinh, Quoc V. Le
It has been argued that current machine learning models do not have commonsense, and therefore must be hard-coded with prior knowledge (Marcus, 2018).
20 code implementations • NeurIPS 2020 • Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le
In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
Ranked #1 on Sentiment Analysis on Amazon Review Full
14 code implementations • ICCV 2019 • Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
Convolutional networks have been the paradigm of choice in many computer vision applications.
Ranked #109 on Image Classification on CIFAR-100
29 code implementations • 18 Apr 2019 • Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard
3 code implementations • CVPR 2019 • Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le
Here we aim to learn a better architecture of feature pyramid network for object detection.
8 code implementations • NeurIPS 2019 • Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam
We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks.
Ranked #682 on Image Classification on ImageNet
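The conditional parameterization can be sketched as an example-dependent mixture of expert kernels: a routing function maps the input to mixture weights, and the effective kernel is their weighted sum. Shapes and the routing stand-in below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, K = 4, 3
experts = rng.standard_normal((N_EXPERTS, K, K))   # expert kernels W_1..W_n
route_w = rng.standard_normal(N_EXPERTS)           # stand-in for a learned routing layer

def condconv_kernel(x):
    """W(x) = sum_i sigmoid(r_i(x)) * W_i, with a pooled-input routing stand-in."""
    logits = x.mean() * route_w                    # global average pool -> routing logits
    r = 1.0 / (1.0 + np.exp(-logits))              # sigmoid routing weights
    return np.tensordot(r, experts, axes=1)        # mix the expert kernels

x = rng.standard_normal((8, 8))                    # toy single-channel input
kernel = condconv_kernel(x)                        # a (3, 3) input-dependent kernel
```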
2 code implementations • 30 Jan 2019 • David R. So, Chen Liang, Quoc V. Le
Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models.
Ranked #1 on Machine Translation on WMT2014 English-Czech
33 code implementations • ACL 2019 • Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.
Ranked #3 on Language Modelling on One Billion Word
no code implementations • 16 Nov 2018 • Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V. Le, Ruoming Pang
Our method to compute importance weights follows from ideas in domain adaptation, and we show a novel application to transfer learning.
Ranked #2 on Fine-Grained Image Classification on Stanford Cars (using extra training data)
12 code implementations • NeurIPS 2019 • Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen
Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks.
Ranked #4 on Fine-Grained Image Classification on Birdsnap (using extra training data)
6 code implementations • NeurIPS 2018 • Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le
This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.
Ranked #681 on Image Classification on ImageNet
2 code implementations • EMNLP 2018 • Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le
We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data.
Ranked #3 on CCG Supertagging on CCGbank
18 code implementations • CVPR 2019 • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
In this paper, we propose an automated mobile neural architecture search (MNAS) approach that explicitly incorporates model latency into the main objective, so that the search can identify a model that achieves a good trade-off between accuracy and latency.
Ranked #731 on Image Classification on ImageNet
no code implementations • 25 Jun 2018 • Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein
Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.
2 code implementations • 7 Jun 2018 • Trieu H. Trinh, Quoc V. Le
Commonsense reasoning is a long-standing challenge for deep learning.
Ranked #6 on Common Sense Reasoning on Winograd Schema Challenge
27 code implementations • 24 May 2018 • Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le
In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch.
Ranked #4 on Data Augmentation on ImageNet
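Concretely, a learned policy is a list of sub-policies, each a pair of (operation, probability, magnitude) tuples; one sub-policy is sampled per image per mini-batch. A minimal sketch of that control flow, with identity stand-ins in place of the real image operations:

```python
import random

def rotate(img, magnitude):  return img   # placeholder for a real rotation
def invert(img, magnitude):  return img   # placeholder for a real inversion
def shear_x(img, magnitude): return img   # placeholder for a real shear

policy = [                                 # illustrative sub-policies
    [(rotate, 0.7, 2), (invert, 0.3, 0)],
    [(shear_x, 0.9, 4), (rotate, 0.2, 6)],
]

def apply_policy(img):
    sub_policy = random.choice(policy)     # one sub-policy per image
    for op, prob, magnitude in sub_policy:
        if random.random() < prob:         # each op fires with its own probability
            img = op(img, magnitude)
    return img
```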
no code implementations • CVPR 2019 • Simon Kornblith, Jonathon Shlens, Quoc V. Le
Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer.
16 code implementations • ICLR 2018 • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models.
Ranked #27 on Question Answering on SQuAD1.1 dev
no code implementations • ICML 2018 • Trieu H. Trinh, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le
Despite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge.
Ranked #10 on Sequential Image Classification on Sequential CIFAR-10
27 code implementations • 9 Feb 2018 • Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean
The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set.
4 code implementations • 5 Feb 2018 • Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le
The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically.
4 code implementations • 10 Jan 2018 • Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le
Models and examples built with TensorFlow
no code implementations • ICLR 2018 • Kevin Clark, Thang Luong, Quoc V. Le
The students can learn from the teacher (the full model) because the teacher sees more of each example.
Ranked #4 on Chunking on CoNLL 2000 (using extra training data)
no code implementations • ICLR 2018 • Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean
We propose Efficient Neural Architecture Search (ENAS), a faster and less expensive approach to automated model design than previous methods.
no code implementations • ICLR 2018 • Daniel A. Abolafia, Quoc V. Le, Mohammad Norouzi
We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards.
no code implementations • ICLR 2018 • Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean
We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices.
no code implementations • ICLR 2018 • Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li
One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals.
no code implementations • ICLR 2018 • Minh-Thang Luong, David Dohan, Adams Wei Yu, Quoc V. Le, Barret Zoph, Vijay Vasudevan
Neural architecture search (NAS), the task of finding neural architectures automatically, has recently emerged as a promising approach for unveiling better models over human-designed ones.
no code implementations • ICLR 2018 • Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le
Finally, we study the effect of network architectures on adversarial sensitivity.
1 code implementation • ICML 2018 • Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg
Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization.
3 code implementations • ICLR 2018 • Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.
no code implementations • 17 Oct 2017 • Samuel L. Smith, Quoc V. Le
Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.
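A quick worked example of the noise scale, with illustrative values:

```python
# Worked example of g = eps * (N/B - 1) ~= eps * N / B.
eps, N, B = 0.1, 50_000, 128      # learning rate, training-set size, batch size
g_exact = eps * (N / B - 1)       # 38.96...
g_approx = eps * N / B            # 39.06...
# For N >> B the approximation is tight, so doubling B at fixed eps halves g.
```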
21 code implementations • ICLR 2018 • Prajit Ramachandran, Barret Zoph, Quoc V. Le
The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
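Swish itself is a one-liner, f(x) = x * sigmoid(x), which is what makes the drop-in replacement easy:

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x); a smooth, non-monotonic ReLU replacement."""
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-2.0, 0.0, 2.0])))   # [-0.238  0.     1.762]
```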
3 code implementations • 21 Sep 2017 • Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.
12 code implementations • CVPR 2018 • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le
In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset, then apply this cell to the ImageNet dataset by stacking together more copies of it, each with its own parameters, to design a convolutional architecture named the "NASNet architecture".
Ranked #31 on Image Classification on ImageNet ReaL
1 code implementation • ICML 2017 • Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean
Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.
4 code implementations • ACL 2017 • Adams Wei Yu, Hongrae Lee, Quoc V. Le
Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.
no code implementations • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
12 code implementations • 29 Nov 2016 • Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.
2 code implementations • 28 Nov 2016 • Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei
The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision.
4 code implementations • TACL 2017 • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation.
no code implementations • EMNLP 2017 • Prajit Ramachandran, Peter J. Liu, Quoc V. Le
We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.
10 code implementations • 5 Nov 2016 • Barret Zoph, Quoc V. Le
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model.
8 code implementations • 27 Sep 2016 • David Ha, Andrew Dai, Quoc V. Le
This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network.
Ranked #14 on Language Modelling on Penn Treebank (Character Level)
24 code implementations • 26 Sep 2016 • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder.
Ranked #35 on Machine Translation on WMT2014 English-French
4 code implementations • 21 Nov 2015 • Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens
This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks.
no code implementations • 19 Nov 2015 • Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting, where the encoder is shared between several tasks such as machine translation and syntactic parsing; (b) the many-to-one setting, useful when only the decoder can be shared, as in the case of translation and image caption generation; and (c) the many-to-many setting, where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.
no code implementations • 16 Nov 2015 • Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations.
no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
164 code implementations • NeurIPS 2015 • Andrew M. Dai, Quoc V. Le
In our experiments, we find that long short-term memory recurrent networks, after being pretrained with the two approaches, are more stable and generalize better.
40 code implementations • 5 Aug 2015 • William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.
5 code implementations • 29 Jul 2015 • Andrew M. Dai, Christopher Olah, Quoc V. Le
Paragraph Vectors has recently been proposed as an unsupervised method for learning distributed representations of pieces of text.
4 code implementations • 3 Apr 2015 • Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton
Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients.
Ranked #25 on Sequential Image Classification on Sequential MNIST
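The snippet above states the problem but not the remedy; assuming the paper's proposal is a plain RNN of rectified linear units whose recurrent matrix is initialized to the identity (an assumption of this sketch, not stated in the snippet), a minimal version looks like:

```python
import torch
import torch.nn as nn

# A vanilla ReLU RNN with identity-initialized recurrent weights, so early
# gradients neither vanish nor explode. Sizes are illustrative.
rnn = nn.RNN(input_size=1, hidden_size=100, nonlinearity="relu", batch_first=True)
with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(100))   # recurrent matrix starts as identity
    rnn.bias_hh_l0.zero_()

out, h = rnn(torch.randn(2, 784, 1))          # e.g. pixel-by-pixel sequential MNIST
```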
5 code implementations • IJCNLP 2015 • Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba
Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique.
Ranked #40 on Machine Translation on WMT2014 English-French
64 code implementations • NeurIPS 2014 • Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Ranked #5 on Traffic Prediction on PeMS-M (using extra training data)
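The encoder-decoder factorization described above is easy to sketch: one multilayer LSTM compresses the source into its final state, and a second LSTM decodes conditioned on that state. Sizes below are illustrative:

```python
import torch
import torch.nn as nn

enc = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
dec = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)

src = torch.randn(1, 10, 32)        # 10 embedded source tokens
_, state = enc(src)                 # (h, c): the fixed-dimensional summary of the source
tgt_in = torch.randn(1, 7, 32)      # shifted target embeddings (teacher forcing)
out, _ = dec(tgt_in, state)         # decode the target from the summary
```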
26 code implementations • 16 May 2014 • Quoc V. Le, Tomas Mikolov
Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models.
Ranked #4 on Question Answering on QASent
no code implementations • TACL 2014 • Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng
Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images.
8 code implementations • 17 Sep 2013 • Tomas Mikolov, Quoc V. Le, Ilya Sutskever
Dictionaries and phrase tables are the basis of modern statistical machine translation systems.
no code implementations • NeurIPS 2012 • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng
Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance.
1 code implementation • 29 Dec 2011 • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
For example, is it possible to learn a face detector using only unlabeled images?
no code implementations • NeurIPS 2011 • Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks.
Ranked #116 on Image Classification on STL-10
no code implementations • NeurIPS 2010 • Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng
Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture.
no code implementations • NeurIPS 2009 • Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng
Our evaluation metrics can also be used to evaluate future work in unsupervised deep learning, and thus help the development of future algorithms.
no code implementations • NeurIPS 2008 • Olivier Chapelle, Chuong B. Do, Choon H. Teo, Quoc V. Le, Alex J. Smola
Large-margin structured estimation methods work by minimizing a convex upper bound of loss functions.