
Detection Transformer

Introduced by Carion et al. in End-to-End Object Detection with Transformers

DETR, or Detection Transformer, is a set-based object detector that places a Transformer on top of a convolutional backbone. A conventional CNN backbone learns a 2D representation of the input image. The model flattens this feature map and adds a positional encoding to it before passing it to a Transformer encoder. A Transformer decoder then takes as input a small, fixed number of learned positional embeddings, called object queries, and additionally attends to the encoder output. Each output embedding of the decoder is passed to a shared feed-forward network (FFN) that predicts either a detection (class and bounding box) or a “no object” class.
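The pipeline described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' released implementation. The class name MinimalDETR, the tiny two-layer convolutional backbone, the layer sizes, and the single linear layers used as prediction heads are all assumptions chosen to keep the example small. The positional encoding here is a learned row/column embedding added once at the encoder input, which is a simplification.

```python
import torch
from torch import nn


class MinimalDETR(nn.Module):
    """Illustrative DETR-style detector. All sizes are assumptions."""

    def __init__(self, num_classes=91, hidden_dim=64, nheads=4,
                 num_encoder_layers=2, num_decoder_layers=2, num_queries=10):
        super().__init__()
        # Stand-in CNN backbone (hypothetical): downsamples the image 16x.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, hidden_dim, kernel_size=3, stride=4, padding=1), nn.ReLU(),
        )
        # Standard encoder-decoder Transformer (sequence-first layout).
        self.transformer = nn.Transformer(
            d_model=hidden_dim, nhead=nheads,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            dim_feedforward=4 * hidden_dim, dropout=0.0,
        )
        # Object queries: a small fixed number of learned embeddings.
        self.query_embed = nn.Parameter(torch.rand(num_queries, hidden_dim))
        # Learned 2D positional encoding: half the channels encode the
        # column index, half the row index (supports feature maps up to 50x50).
        self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        # Shared heads applied to every decoder output embedding.
        # Index num_classes of the class head is the "no object" class.
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_head = nn.Linear(hidden_dim, 4)

    def forward(self, images):
        feats = self.backbone(images)                      # (B, C, H, W)
        B, C, H, W = feats.shape
        # Build one positional vector per spatial location: (H*W, 1, C).
        pos = torch.cat([
            self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
            self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
        ], dim=-1).flatten(0, 1).unsqueeze(1)
        # Flatten the feature map into a sequence: (H*W, B, C).
        src = feats.flatten(2).permute(2, 0, 1)
        # Every image in the batch gets the same object queries: (Q, B, C).
        tgt = self.query_embed.unsqueeze(1).repeat(1, B, 1)
        # The decoder attends to the encoder output for each query.
        hs = self.transformer(src + pos, tgt).transpose(0, 1)  # (B, Q, C)
        return {
            "pred_logits": self.class_head(hs),            # (B, Q, classes + 1)
            "pred_boxes": self.bbox_head(hs).sigmoid(),    # (B, Q, 4) in [0, 1]
        }
```

Calling `MinimalDETR(num_classes=5, num_queries=7)` on a batch of two 128x128 RGB images yields 7 predictions per image, each with 6 class logits (5 classes plus "no object") and one normalized box. The number of predictions is fixed by the number of object queries, not by the image content; slots that match no object are expected to predict the "no object" class after training.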

Source: End-to-End Object Detection with Transformers

Latest Papers

PAPER DATE
Efficient DETR: Improving End-to-End Object Detector with Dense Prior
Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang
2021-04-03
You Only Look One-level Feature
Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun
2021-03-17
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
2021-01-19
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer
2021-01-07
DETR for Crowd Pedestrian Detection
Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
2020-12-12
MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2020-12-01
Rethinking Transformer-based Set Prediction for Object Detection
Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
2020-11-21
End-to-End Object Detection with Adaptive Clustering Transformer
Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong
2020-11-18
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen
2020-11-18
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
2020-10-08
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
2020-06-17
End-to-End Object Detection with Transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
2020-05-26

Categories