Search Results for author: Irwan Bello

Found 13 papers, 9 papers with code

ST-MoE: Designing Stable and Transferable Sparse Expert Models

1 code implementation 17 Feb 2022 Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.

 Ranked #1 on Common Sense Reasoning on ARC (Challenge) (using extra training data)

Natural Language Processing Natural Questions +3

Revisiting 3D ResNets for Video Recognition

1 code implementation 3 Sep 2021 Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello

A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition.

Action Classification Contrastive Learning +1

Revisiting ResNets: Improved Training and Scaling Strategies

4 code implementations NeurIPS 2021 Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Action Classification Document Image Classification +2

LambdaNetworks: Modeling Long-Range Interactions Without Attention

6 code implementations ICLR 2021 Irwan Bello

We present lambda layers -- an alternative framework to self-attention -- for capturing long-range interactions between an input and structured contextual information (e.g. a pixel surrounded by other pixels).

Image Classification Instance Segmentation +3

Global Self-Attention Networks

no code implementations 1 Jan 2021 Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen

Based on the proposed GSA module, we introduce new standalone global attention-based deep networks that use GSA modules instead of convolutions to model pixel interactions.

Video Understanding

Global Self-Attention Networks for Image Recognition

no code implementations 6 Oct 2020 Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen

Based on the proposed GSA module, we introduce new standalone global attention-based deep networks that use GSA modules instead of convolutions to model pixel interactions.

Video Understanding

Stand-Alone Self-Attention in Vision Models

7 code implementations NeurIPS 2019 Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.

object-detection Object Detection

Backprop Evolution

no code implementations 8 Aug 2018 Maximilian Alber, Irwan Bello, Barret Zoph, Pieter-Jan Kindermans, Prajit Ramachandran, Quoc Le

The back-propagation algorithm is the cornerstone of deep learning.

Neural Optimizer Search with Reinforcement Learning

3 code implementations 21 Sep 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation reinforcement-learning +1

Neural Optimizer Search using Reinforcement Learning

no code implementations ICML 2017 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures.

Machine Translation reinforcement-learning +1

Neural Combinatorial Optimization with Reinforcement Learning

11 code implementations 29 Nov 2016 Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio

Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.

Combinatorial Optimization reinforcement-learning +1
