DETR, or Detection Transformer, is a set-based object detector that places a Transformer on top of a convolutional backbone. A conventional CNN backbone learns a 2D representation of the input image. The model flattens this feature map and supplements it with a positional encoding before passing it into a transformer encoder. A transformer decoder then takes as input a small, fixed number of learned positional embeddings, called object queries, and additionally attends to the encoder output. Each output embedding of the decoder is passed to a shared feed-forward network (FFN) that predicts either a detection (class and bounding box) or a “no object” class.
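The last step of the description, where each decoder output becomes either a detection or a “no object” prediction, can be illustrated with a small sketch. This is not the reference implementation: the function name `decode_queries`, the `min_score` threshold, and the convention that “no object” occupies the last class slot are assumptions made for illustration, and the box format is left opaque.

```python
import math

NO_OBJECT = "no object"  # assumed label for the extra class slot


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_queries(class_logits, boxes, class_names, min_score=0.5):
    """Turn per-query FFN outputs into a list of detections.

    class_logits: one list of len(class_names) + 1 scores per object query;
                  the extra last slot is assumed to be the "no object" class.
    boxes:        one predicted box per query (format not interpreted here).
    min_score:    hypothetical confidence cut-off, not taken from the source.
    """
    labels = list(class_names) + [NO_OBJECT]
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        # A query whose top class is "no object" contributes nothing.
        if labels[best] == NO_OBJECT or probs[best] < min_score:
            continue
        detections.append({"label": labels[best], "score": probs[best], "box": box})
    return detections


# Three object queries, two real classes plus the "no object" slot.
class_names = ["cat", "dog"]
class_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.0]]
boxes = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.1, 0.1, 0.1), (0.7, 0.4, 0.3, 0.3)]
detections = decode_queries(class_logits, boxes, class_names)
print([d["label"] for d in detections])
```

Because the number of object queries is fixed, the model always emits the same number of predictions; the “no object” class is what lets that fixed-size set represent images with fewer objects than queries.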
Source: End-to-End Object Detection with Transformers

PAPER | DATE |
---|---|
Efficient DETR: Improving End-to-End Object Detector with Dense Prior | 2021-04-03 |
You Only Look One-level Feature | 2021-03-17 |
Fast Convergence of DETR with Spatially Modulated Co-Attention | 2021-01-19 |
TrackFormer: Multi-Object Tracking with Transformers | 2021-01-07 |
DETR for Crowd Pedestrian Detection | 2020-12-12 |
MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers | 2020-12-01 |
Rethinking Transformer-based Set Prediction for Object Detection | 2020-11-21 |
End-to-End Object Detection with Adaptive Clustering Transformer | 2020-11-18 |
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers | 2020-11-18 |
Deformable DETR: Deformable Transformers for End-to-End Object Detection | 2020-10-08 |
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | 2020-06-17 |
End-to-End Object Detection with Transformers | 2020-05-26 |

TASK | PAPERS | SHARE |
---|---|---|
Object Detection | 8 | 40.00% |
Panoptic Segmentation | 3 | 15.00% |
Multi-Object Tracking | 1 | 5.00% |
Object Tracking | 1 | 5.00% |
Video Understanding | 1 | 5.00% |
Pedestrian Detection | 1 | 5.00% |
Multi-Task Learning | 1 | 5.00% |
Unsupervised Pre-training | 1 | 5.00% |
Image Classification | 1 | 5.00% |

COMPONENT | TYPE |
---|---|
Convolutions | |
Feedforward Networks | |
Convolutional Neural Networks | (optional) |
Transformers | |