1 code implementation • 17 Feb 2022 • Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus
But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.
Ranked #1 on
Common Sense Reasoning
on ARC (Easy)
3 code implementations • 3 Sep 2021 • Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello
A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition.
Ranked #38 on
Action Classification
on Kinetics-600
3 code implementations • NeurIPS 2021 • Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1. 7x - 2. 7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
Ranked #1 on
Document Image Classification
on AIP
7 code implementations • ICLR 2021 • Irwan Bello
We present lambda layers -- an alternative framework to self-attention -- for capturing long-range interactions between an input and structured contextual information (e. g. a pixel surrounded by other pixels).
Ranked #278 on
Image Classification
on ImageNet
no code implementations • 1 Jan 2021 • Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen
Based on the proposed GSA module, we introduce new standalone global attention-based deep networks that use GSA modules instead of convolutions to model pixel interactions.
no code implementations • 6 Oct 2020 • Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen
Based on the proposed GSA module, we introduce new standalone global attention-based deep networks that use GSA modules instead of convolutions to model pixel interactions.
7 code implementations • NeurIPS 2019 • Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.
14 code implementations • ICCV 2019 • Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
Convolutional networks have been the paradigm of choice in many computer vision applications.
Ranked #114 on
Image Classification
on CIFAR-100
(using extra training data)
2 code implementations • ICLR 2019 • Irwan Bello, Sayali Kulkarni, Sagar Jain, Craig Boutilier, Ed Chi, Elad Eban, Xiyang Luo, Alan Mackey, Ofer Meshi
Ranking is a central task in machine learning and information retrieval.
no code implementations • 8 Aug 2018 • Maximilian Alber, Irwan Bello, Barret Zoph, Pieter-Jan Kindermans, Prajit Ramachandran, Quoc Le
The back-propagation algorithm is the cornerstone of deep learning.
3 code implementations • 21 Sep 2017 • Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.
no code implementations • ICML 2017 • Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.
10 code implementations • 29 Nov 2016 • Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.