Detection Transformer

Introduced by Carion et al. in End-to-End Object Detection with Transformers

DETR, or Detection Transformer, is a set-based object detector that uses a Transformer on top of a convolutional backbone. A conventional CNN backbone learns a 2D representation of the input image; the model flattens this feature map and supplements it with a positional encoding before passing it into a transformer encoder. A transformer decoder then takes as input a small, fixed number of learned positional embeddings, called object queries, and additionally attends to the encoder output. Each output embedding of the decoder is passed to a shared feed-forward network (FFN) that predicts either a detection (class and bounding box) or a "no object" class.

Source: End-to-End Object Detection with Transformers
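The paper includes a simplified PyTorch listing of exactly this forward pass; the sketch below is closely modeled on it, using the paper's defaults (hidden width 256, 8 attention heads, 6 encoder and 6 decoder layers, 100 object queries) and a torchvision ResNet-50 backbone. It covers inference only: the Hungarian matching and set-prediction loss used for training are omitted, and the fixed 50x50 grid of learned positional embeddings is an assumption carried over from that listing.

```python
import torch
from torch import nn
from torchvision.models import resnet50


class DETR(nn.Module):
    def __init__(self, num_classes, hidden_dim=256, nheads=8,
                 num_encoder_layers=6, num_decoder_layers=6, num_queries=100):
        super().__init__()
        # Conventional CNN backbone: ResNet-50 with the pooling and
        # classification layers removed, leaving a 2048-channel feature map.
        backbone = resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # 1x1 convolution reduces the channel dimension to the transformer width.
        self.conv = nn.Conv2d(2048, hidden_dim, 1)
        self.transformer = nn.Transformer(hidden_dim, nheads,
                                          num_encoder_layers, num_decoder_layers)
        # Shared prediction heads: class logits (num_classes + 1 for "no object")
        # and normalized box coordinates (center x, center y, width, height).
        self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
        self.linear_bbox = nn.Linear(hidden_dim, 4)
        # Learned embeddings: object queries for the decoder, plus row/column
        # embeddings that form the 2D positional encoding for the encoder
        # (the 50x50 grid size is assumed, as in the paper's listing).
        self.query_pos = nn.Parameter(torch.rand(num_queries, hidden_dim))
        self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))

    def forward(self, inputs):
        x = self.conv(self.backbone(inputs))  # (B, hidden_dim, H, W)
        H, W = x.shape[-2:]
        # Build the 2D positional encoding, flattened to (H*W, 1, hidden_dim).
        pos = torch.cat([
            self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
            self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
        ], dim=-1).flatten(0, 1).unsqueeze(1)
        # Flattened features plus positional encoding feed the encoder; the
        # object queries feed the decoder, which attends to the encoder output.
        src = pos + x.flatten(2).permute(2, 0, 1)                        # (H*W, B, D)
        tgt = self.query_pos.unsqueeze(1).repeat(1, inputs.shape[0], 1)  # (Q, B, D)
        h = self.transformer(src, tgt)                                   # (Q, B, D)
        # Each decoder output embedding goes through the shared FFN heads.
        return self.linear_class(h), self.linear_bbox(h).sigmoid()


detr = DETR(num_classes=91)
logits, boxes = detr(torch.rand(1, 3, 800, 1200))
print(logits.shape, boxes.shape)
```

For a random 800x1200 input this prints torch.Size([100, 1, 92]) and torch.Size([100, 1, 4]): one class distribution (91 COCO classes plus "no object") and one normalized box per object query.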

Tasks

Task                          Papers   Share
Object Detection                 144   25.09%
Object                            72   12.54%
Decoder                           53    9.23%
Semantic Segmentation             16    2.79%
Instance Segmentation             14    2.44%
Real-Time Object Detection        10    1.74%
Autonomous Driving                 9    1.57%
2D Object Detection                8    1.39%
Image Classification               7    1.22%
