This design fully exploits the contextual information among input keys to guide the learning of the dynamic attention matrix, thereby strengthening the capacity of the visual representation.
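As a rough illustration, here is a minimal PyTorch sketch of a contextual-attention block in this spirit: a grouped convolution mines static context among neighboring keys, and that context is concatenated with the input to predict a dynamic attention map. The layer sizes, group count, and fusion scheme below are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class ContextualAttention(nn.Module):
    """Sketch: static context among keys guides a dynamic attention map."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Static context: a k x k grouped conv contextualizes the keys
        # (dim must be divisible by the group count, assumed 4 here).
        self.key_embed = nn.Conv2d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=4, bias=False)
        self.value_embed = nn.Conv2d(dim, dim, 1, bias=False)
        # Dynamic attention predicted from [input, static context].
        self.attn = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 1),
        )

    def forward(self, x):
        k_static = self.key_embed(x)                    # contextualized keys
        v = self.value_embed(x)
        w = self.attn(torch.cat([x, k_static], dim=1))  # dynamic attention map
        return k_static + w.sigmoid() * v               # fuse static and dynamic
```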
Light-weight convolutional neural networks (CNNs) are the de facto standard for mobile vision tasks.
This paper presents an efficient multi-scale vision Transformer, called ResT, that capably serves as a general-purpose backbone for image recognition.
Specifically, we append two types of attention modules on top of a traditional dilated FCN, which model semantic interdependencies in the spatial and channel dimensions, respectively.
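A hedged PyTorch sketch of the two modules follows: the position branch lets every spatial location attend to all others, and the channel branch computes channel-to-channel affinities; each result is added back through a learned residual weight, as in the paper. The dim // 8 bottleneck and the plain softmax in the channel branch are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Spatial branch: each position attends to all other positions."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim // 8, 1)
        self.k = nn.Conv2d(dim, dim // 8, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # B x HW x C'
        k = self.k(x).flatten(2)                    # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)         # B x HW x HW
        v = self.v(x).flatten(2)                    # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel branch: interdependencies between channel maps."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                      # B x C x HW
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)   # B x C x C
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x
```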
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
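A minimal sketch of one such block, assuming PyTorch: a linear layer mixes information across patches, a two-layer MLP mixes across channels, and learned per-channel affine transforms stand in for normalization, as the paper describes. The class names and expansion factor are illustrative.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Learned per-channel scale and shift in place of LayerNorm."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    """One block: linear mixing across patches, then an MLP across channels."""
    def __init__(self, dim, num_patches, expansion=4):
        super().__init__()
        self.norm1 = Affine(dim)
        self.token_mix = nn.Linear(num_patches, num_patches)  # cross-patch
        self.norm2 = Affine(dim)
        self.channel_mlp = nn.Sequential(                     # cross-channel
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x):  # x: (B, num_patches, dim)
        x = x + self.token_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x
```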
Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.
Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission, which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%.
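For concreteness, a compact PyTorch sketch of the squeeze-and-excitation block behind these results: global average pooling squeezes each channel to a scalar, a bottleneck MLP produces per-channel gates, and the feature maps are rescaled. The reduction ratio of 16 is a commonly used setting, assumed here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: pool, bottleneck MLP, channel gating."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: global average pool
        w = self.fc(s).view(b, c, 1, 1)   # excitation: per-channel weights
        return x * w                      # recalibrate the feature maps
```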
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks.
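A hedged PyTorch sketch of the module: channel attention built from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention built from channel-wise statistics fused by a large-kernel convolution. The reduction ratio and 7x7 kernel are standard settings assumed for illustration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(          # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: avg- and max-pooled descriptors, shared MLP.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: channel-wise avg and max, fused by a conv.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```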
In this paper, we propose an efficient Shuffle Attention (SA) module, which adopts Shuffle Units to combine two types of attention mechanisms effectively.
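A minimal sketch of such a module, assuming PyTorch: the channels are split into groups, each group is halved into a channel-gating branch and a spatial-gating branch, and a final channel shuffle lets information flow across groups. Parameter shapes and the group count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Sketch: grouped channel + spatial attention fused by channel shuffle.
    channels must be divisible by 2 * groups."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        ch = channels // (2 * groups)       # channels per half-branch
        self.cw = nn.Parameter(torch.zeros(1, ch, 1, 1))
        self.cb = nn.Parameter(torch.ones(1, ch, 1, 1))
        self.sw = nn.Parameter(torch.zeros(1, ch, 1, 1))
        self.sb = nn.Parameter(torch.ones(1, ch, 1, 1))
        self.gn = nn.GroupNorm(ch, ch)

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, c // self.groups, h, w)
        xc, xs = x.chunk(2, dim=1)
        # Channel branch: global context gates each channel.
        xc = xc * torch.sigmoid(self.cw * xc.mean(dim=(2, 3), keepdim=True) + self.cb)
        # Spatial branch: normalized features gate each position.
        xs = xs * torch.sigmoid(self.sw * self.gn(xs) + self.sb)
        out = torch.cat([xc, xs], dim=1).view(b, c, h, w)
        # Channel shuffle enables cross-group information flow.
        return out.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```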