Detection Transformer

Introduced by Carion et al. in End-to-End Object Detection with Transformers

DETR, or Detection Transformer, is a set-based object detector that places a Transformer on top of a convolutional backbone. A conventional CNN backbone learns a 2D representation of the input image. The model flattens this representation and supplements it with a positional encoding before passing it into a Transformer encoder. A Transformer decoder then takes as input a small, fixed number of learned positional embeddings, called object queries, and additionally attends to the encoder output. Each output embedding of the decoder is passed to a shared feed-forward network (FFN) that predicts either a detection (class and bounding box) or a “no object” class.

Source: End-to-End Object Detection with Transformers
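
The final step of the description, where each decoder output becomes either a detection or a “no object” prediction, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The function name decode_predictions, the placement of the “no object” score as the last logit, the (cx, cy, w, h) box layout, and the example class names and numbers are all assumptions made for illustration.

```python
import math

NO_OBJECT = "no object"  # hypothetical label for the extra class


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_predictions(class_logits, boxes, class_names):
    """Turn per-query FFN outputs into a list of detections.

    class_logits: one list of len(class_names) + 1 scores per object query;
                  the last score is assumed to be the "no object" class.
    boxes:        one (cx, cy, w, h) tuple per object query (assumed layout).
    """
    labels = list(class_names) + [NO_OBJECT]
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if labels[best] == NO_OBJECT:
            continue  # this query slot predicts nothing
        detections.append({"label": labels[best], "score": probs[best], "box": box})
    return detections


# Three object queries, two real classes plus "no object" (made-up numbers).
class_names = ["cat", "dog"]
class_logits = [
    [4.0, 0.5, 0.1],  # highest score: cat
    [0.2, 0.1, 3.0],  # highest score: "no object"
    [0.3, 2.5, 0.4],  # highest score: dog
]
boxes = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.1, 0.05, 0.05), (0.7, 0.4, 0.3, 0.2)]

detections = decode_predictions(class_logits, boxes, class_names)
for det in detections:
    print(det["label"], det["box"])
```

Because the number of object queries is fixed, the model always emits the same number of predictions; slots that do not correspond to an object are simply dropped here when their most likely class is “no object”.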

Latest Papers

Efficient DETR: Improving End-to-End Object Detector with Dense Prior
Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang
You Only Look One-level Feature
Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer
DETR for Crowd Pedestrian Detection
Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
Rethinking Transformer-based Set Prediction for Object Detection
Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
End-to-End Object Detection with Adaptive Clustering Transformer
Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
End-to-End Object Detection with Transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko