Pix2seq: A Language Modeling Framework for Object Detection

ICLR 2022 · Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton

We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well-optimized detection algorithms.
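The abstract's core idea, expressing bounding boxes and class labels as discrete tokens and then reading them back out, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official google-research/pix2seq implementation: it assumes coordinates normalized to [0, 1] in (ymin, xmin, ymax, xmax) order, a hypothetical `NUM_BINS = 1000` quantization, and a shared vocabulary in which class token ids start right after the coordinate bins. The helper names `quantize`, `dequantize`, `encode` and `decode` are hypothetical, and details such as object ordering, end-of-sequence tokens and sequence augmentation are left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical setting for illustration only (not taken from the paper's config).
NUM_BINS = 1000  # quantization bins per coordinate; token ids 0..NUM_BINS-1 are coordinates

# (ymin, xmin, ymax, xmax), each normalized to [0, 1] by image height/width.
Box = Tuple[float, float, float, float]


def quantize(v: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin in [0, num_bins - 1]."""
    v = min(max(v, 0.0), 1.0)  # clip out-of-range coordinates
    return int(round(v * (num_bins - 1)))


def dequantize(token: int, num_bins: int = NUM_BINS) -> float:
    """Map a coordinate bin back to the normalized value it represents."""
    return token / (num_bins - 1)


def encode(boxes: Sequence[Box], labels: Sequence[int],
           num_bins: int = NUM_BINS) -> List[int]:
    """Flatten objects into one token sequence: 4 coordinate tokens + 1 class token each.

    Class ids are offset by num_bins so coordinates and classes share one
    vocabulary without colliding (an assumption of this sketch).
    """
    tokens: List[int] = []
    for box, label in zip(boxes, labels):
        tokens.extend(quantize(c, num_bins) for c in box)
        tokens.append(num_bins + label)
    return tokens


def decode(tokens: Sequence[int],
           num_bins: int = NUM_BINS) -> Tuple[List[Box], List[int]]:
    """Read objects back out of a token sequence, five tokens at a time.

    An incomplete trailing group is ignored, and malformed groups (a class
    token where a coordinate is expected, or vice versa) are skipped.
    """
    boxes: List[Box] = []
    labels: List[int] = []
    usable = len(tokens) - len(tokens) % 5
    for i in range(0, usable, 5):
        *coords, cls = tokens[i:i + 5]
        if any(t >= num_bins for t in coords) or cls < num_bins:
            continue
        boxes.append(tuple(dequantize(t, num_bins) for t in coords))
        labels.append(cls - num_bins)
    return boxes, labels


if __name__ == "__main__":
    seq = encode([(0.1, 0.2, 0.5, 0.6)], [3])
    print(seq)  # [100, 200, 500, 599, 1003]
    print(decode(seq))
```

With this scheme, quantization limits the coordinate error to half a bin (about 0.0005 of the image side for 1000 bins), so a finer grid trades a larger vocabulary for more precise boxes.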


Code

google-research/pix2seq (official) · 806 stars · Colab quickstart
gaopengcuhk/Stable-Pix2Seq · 232 stars
gaopengcuhk/Unofficial-Pix2Seq · 164 stars
moein-shariatnia/Pix2Seq · 109 stars · Colab quickstart
gaopengcuhk/Pretrained-Pix2Seq

Showing 5 of 6 listed implementations.

Tasks

Language Modelling

Object Detection

Datasets

MS COCO

Objects365

Results from the Paper

Pix2seq (ViT-L), at 50.0 box AP, ranks #77 on Object Detection on COCO minival (using extra training data).

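Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Object Detection	COCO minival	Pix2seq (ViT-L)	box AP	50.0	#77
Object Detection	COCO minival	Pix2seq (R50)	box AP	42.6	#137
Object Detection	COCO minival	Pix2seq (ViT-B)	box AP	47.1	#92
Object Detection	COCO minival	Pix2seq (R50-C4)	box AP	47.3	#91
Object Detection	COCO minival	Pix2seq (R101-DC5)	box AP	45.0	#107
Object Detection	COCO minival	Pix2seq (R101-DC5)	AP50	63.2	#51
Object Detection	COCO minival	Pix2seq (R101-DC5)	AP75	48.6	#36
Object Detection	COCO minival	Pix2seq (R101-DC5)	APS	28.2	#26
Object Detection	COCO minival	Pix2seq (R101-DC5)	APM	48.9	#22
Object Detection	COCO minival	Pix2seq (R101-DC5)	APL	60.4	#27
Object Detection	COCO minival	Pix2seq (R50-DC5)	box AP	43.2	#129
Object Detection	COCO minival	Pix2seq (R50-DC5)	AP50	61.0	#71
Object Detection	COCO minival	Pix2seq (R50-DC5)	AP75	46.1	#54
Object Detection	COCO minival	Pix2seq (R50-DC5)	APS	26.6	#36
Object Detection	COCO minival	Pix2seq (R50-DC5)	APM	47.0	#37
Object Detection	COCO minival	Pix2seq (R50-DC5)	APL	58.6	#39

AP50 and AP75 are average precision at IoU thresholds of 0.5 and 0.75; APS, APM and APL are AP on small (area < 32² px), medium (32² to 96² px) and large (> 96² px) objects.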
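The AP50 and AP75 columns count a prediction as correct when its intersection-over-union (IoU) with a same-class ground-truth box reaches 0.5 or 0.75, while the headline box AP averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that matching criterion, assuming boxes in (ymin, xmin, ymax, xmax) order; it is not the official COCO evaluation code (pycocotools), which additionally ranks detections by confidence, enforces one-to-one matching, and integrates precision over recall. The names `iou`, `matches` and `COCO_IOU_THRESHOLDS` are illustrative.

```python
from typing import Tuple

# (ymin, xmin, ymax, xmax) in any consistent unit, e.g. pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Box, gt: Box, threshold: float) -> bool:
    """True if a prediction would count as a hit for a same-class ground truth at this IoU threshold."""
    return iou(pred, gt) >= threshold


# COCO box AP averages over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pred = (0.0, 0.0, 10.0, 6.0)  # covers 60% of the ground-truth box
    print(iou(pred, gt))  # 0.6
    print(matches(pred, gt, 0.5), matches(pred, gt, 0.75))  # True False
```

A box like this one helps AP50 but not AP75, which is why AP75 is consistently lower than AP50 in the table: it rewards tighter localization.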
