ISTR: End-to-End Instance Segmentation with Transformers

3 May 2021 · Jie Hu, Liujuan Cao, Yao Lu, Shengchuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji

End-to-end paradigms significantly improve the accuracy of various deep-learning-based computer vision models. To this end, tasks like object detection have been upgraded by replacing non-end-to-end components, such as removing non-maximum suppression by training with a set loss based on bipartite matching...
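To make the "set loss based on bipartite matching" concrete, the sketch below shows the general idea used by end-to-end detectors: each prediction is assigned to at most one ground-truth instance via the Hungarian algorithm, so duplicates are penalized during training and no non-maximum suppression is needed at inference. This is a generic illustration, not the authors' implementation; the cost terms, the weight value, and the helper name `hungarian_match` are assumptions for this example.

```python
# Minimal sketch of a bipartite-matching assignment for a set loss.
# Cost combines a classification term and an L1 box term (illustrative choice).
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes, box_weight=5.0):
    """Return (pred_idx, gt_idx) pairs minimizing the total matching cost.

    pred_probs: (N, num_classes) class probabilities per prediction
    pred_boxes: (N, 4) predicted boxes (normalized cx, cy, w, h)
    gt_labels:  (M,)  ground-truth class ids
    gt_boxes:   (M, 4) ground-truth boxes
    """
    # Classification cost: negative probability of the ground-truth class.
    cls_cost = -pred_probs[:, gt_labels]                                       # (N, M)
    # Localization cost: L1 distance between predicted and ground-truth boxes.
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)   # (N, M)
    cost = cls_cost + box_weight * box_cost
    return linear_sum_assignment(cost)  # one-to-one assignment (Hungarian algorithm)


# Toy usage: 3 predictions, 2 ground-truth objects.
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
pred_boxes = np.array([[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.3, 0.3], [0.8, 0.8, 0.1, 0.1]])
gt_labels = np.array([0, 1])
gt_boxes = np.array([[0.48, 0.52, 0.2, 0.2], [0.12, 0.1, 0.3, 0.3]])
pred_idx, gt_idx = hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes)
print(list(zip(pred_idx, gt_idx)))  # matched pairs; unmatched predictions take a "no object" loss
```

The losses (classification, box, and, for instance segmentation, mask terms) are then computed only on the matched pairs, which is what removes the need for NMS at inference time.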


Datasets

COCO

Results from the Paper


TASK                  | DATASET       | MODEL                                | METRIC  | VALUE | GLOBAL RANK
Object Detection      | COCO test-dev | ISTR (ResNet50-FPN-3x)               | box AP  | 46.8  | #57
Instance Segmentation | COCO test-dev | ISTR (ResNet50-FPN-3x)               | mask AP | 38.6  | #30
Instance Segmentation | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APS     | 22.1  | #12
Instance Segmentation | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APM     | 40.4  | #18
Instance Segmentation | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APL     | 50.6  | #20
Object Detection      | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APS     | 27.8  | #58
Object Detection      | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APM     | 48.7  | #61
Object Detection      | COCO test-dev | ISTR (ResNet50-FPN-3x, multi-scale)  | APL     | 59.9  | #49
Instance Segmentation | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | mask AP | 39.9  | #21
Instance Segmentation | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APS     | 22.8  | #6
Instance Segmentation | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APM     | 41.9  | #14
Instance Segmentation | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APL     | 52.3  | #14
Object Detection      | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | box AP  | 48.1  | #48
Object Detection      | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APS     | 28.7  | #55
Object Detection      | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APM     | 50.4  | #45
Object Detection      | COCO test-dev | ISTR (ResNet101-FPN-3x, multi-scale) | APL     | 61.5  | #35

Methods used in the Paper


METHOD                       | TYPE
Multi-Head Attention         | Attention Modules
Adam                         | Stochastic Optimization
Layer Normalization          | Normalization
Residual Connection          | Skip Connections
Label Smoothing              | Regularization
BPE                          | Subword Segmentation
Dropout                      | Regularization
Softmax                      | Output Functions
Dense Connections            | Feedforward Networks
Scaled Dot-Product Attention | Attention Mechanisms
Transformer                  | Transformers
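
Several of the listed building blocks (scaled dot-product attention, residual connections, layer normalization) are standard Transformer components. The snippet below is a textbook NumPy formulation of these blocks for reference only; it is not code from the paper, and the toy shapes are arbitrary.

```python
# Generic sketch of scaled dot-product attention plus a residual connection
# followed by layer normalization, as listed in the methods table above.
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)      # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # softmax over keys
    return weights @ v                                   # (..., seq_q, d_v)


def layer_norm(x, eps=1e-5):
    """Layer normalization over the feature dimension."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


# Toy usage: 4 tokens with 8-dimensional features.
x = np.random.randn(4, 8)
out = layer_norm(x + scaled_dot_product_attention(x, x, x))  # residual connection
print(out.shape)  # (4, 8)
```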