The Bottleneck Transformer (BoTNet) is an image backbone architecture that incorporates self-attention for multiple computer vision tasks, including image classification, object detection and instance segmentation. It replaces the 3×3 spatial convolutions with global multi-head self-attention in the final three bottleneck blocks of a ResNet and makes no other changes. This alone significantly improves on the baselines for instance segmentation and object detection while also reducing the number of parameters, with minimal latency overhead.
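The following is a minimal NumPy sketch, under stated assumptions, of what swapping the spatial convolution for global self-attention means inside one bottleneck block. All function names (`mhsa_2d`, `bot_block`, `spatial_params`) are hypothetical. Batch norm is omitted. The position term (per-row and per-column embeddings summed into a grid and added as content-position logits) is a simplified stand-in for the paper's relative position encodings, not its exact formulation. The parameter count compares only the weights of the swapped layer, using a width of 512 as in ResNet-50's final stage.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def relu(t):
    return np.maximum(t, 0.0)


def mhsa_2d(x, wq, wk, wv, pos_h, pos_w, heads):
    """Global multi-head self-attention over an (H, W, d) feature map.

    Every position attends to every other position (global, not windowed).
    pos_h (H, d) and pos_w (W, d) are a simplified, hypothetical stand-in for
    the paper's relative position encodings.
    """
    H, W, d = x.shape
    dh = d // heads
    n = H * W
    flat = x.reshape(n, d)
    # The q/k/v projections are 1x1 convs, i.e. matrix products over channels.
    q, k, v = flat @ wq, flat @ wk, flat @ wv
    # Sum row and column embeddings into one (H, W, d) grid, then flatten.
    r = (pos_h[:, None, :] + pos_w[None, :, :]).reshape(n, d)

    def split(t):
        # (n, d) -> (heads, n, dh)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)

    q, k, v, r = split(q), split(k), split(v), split(r)
    # Content-content plus content-position logits, scaled by sqrt(dh) (assumed).
    logits = (q @ k.transpose(0, 2, 1) + q @ r.transpose(0, 2, 1)) / np.sqrt(dh)
    out = softmax(logits, axis=-1) @ v  # (heads, n, dh)
    return out.transpose(1, 0, 2).reshape(H, W, d)


def bot_block(x, p, heads=4):
    """Hypothetical BoT block: 1x1 reduce -> MHSA (in place of 3x3 conv) -> 1x1 expand.

    Batch norm is omitted; the identity shortcut assumes the channel count
    is unchanged by the block.
    """
    h = relu(x @ p["w_reduce"])  # (H, W, c) -> (H, W, d)
    h = relu(mhsa_2d(h, p["wq"], p["wk"], p["wv"], p["pos_h"], p["pos_w"], heads))
    h = h @ p["w_expand"]  # (H, W, d) -> (H, W, c)
    return relu(h + x)  # residual connection


def spatial_params(d):
    conv3x3 = 3 * 3 * d * d  # 3x3 conv weights, d in/out channels, no bias
    mhsa = 3 * d * d  # q, k, v 1x1 projections, no bias (position terms excluded)
    return conv3x3, mhsa


rng = np.random.default_rng(0)
H, W, c, d = 4, 4, 32, 8
params = {
    "w_reduce": rng.normal(scale=0.1, size=(c, d)),
    "wq": rng.normal(scale=0.1, size=(d, d)),
    "wk": rng.normal(scale=0.1, size=(d, d)),
    "wv": rng.normal(scale=0.1, size=(d, d)),
    "pos_h": rng.normal(scale=0.1, size=(H, d)),
    "pos_w": rng.normal(scale=0.1, size=(W, d)),
    "w_expand": rng.normal(scale=0.1, size=(d, c)),
}
x = rng.normal(size=(H, W, c))
y = bot_block(x, params, heads=2)
print(y.shape)  # (4, 4, 32)
print(spatial_params(512))  # (2359296, 786432)
```

Because the q, k and v projections are 1×1, they cost 3d² weights against 9d² for the 3×3 convolution they replace, which is one way to see why the swap can reduce parameters. The attention logits, however, grow quadratically with H·W, which is one reason to apply global self-attention only in the final, lowest-resolution stage.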
Source: Bottleneck Transformers for Visual Recognition

PAPER | DATE |
---|---|
Bottleneck Transformers for Visual Recognition | 2021-01-27 |
TASK | PAPERS | SHARE |
---|---|---|
Image Classification | 1 | 33.33% |
Instance Segmentation | 1 | 33.33% |
Object Detection | 1 | 33.33% |
COMPONENT TYPE |
---|
Image Model Blocks |
Convolutions |
Pooling Operations |