TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	Recall@50	32.9	# 1
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	Recall@100	36.0	# 1
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	mean Recall @100	14.1	# 4
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	R@100	36.0	# 3
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	mR@100	14.1	# 2
Scene Graph Generation	Visual Genome	SpeaQ (without reweighting)	mR@50	11.8	# 2
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	Recall@50	32.1	# 2
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	Recall@100	35.5	# 2
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	mean Recall @100	17.6	# 2
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	R@100	35.5	# 4
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	mR@100	17.6	# 3
Scene Graph Generation	Visual Genome	SpeaQ (with reweighting)	mR@50	15.1	# 1

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

26 Mar 2024 · Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

Visual Relationship Detection (VRD) has recently seen significant advances with Transformer-based architectures. However, we identify two key limitations in the conventional label assignment used to train Transformer-based VRD models, i.e., the process of mapping a ground-truth (GT) to a prediction. First, queries are trained to be unspecialized: every query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Second, queries are insufficiently trained: a GT is assigned to only a single prediction, so near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains specialized queries by dividing queries and relations into disjoint groups and directing each query in a given query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates training by assigning a GT to multiple predictions that are sufficiently close to the GT in terms of the subject, the object, and the relation between them. Experimental results and analyses show that SpeaQ effectively trains specialized queries that better utilize model capacity, yielding consistent performance gains at zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
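The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It is a simplified illustration under stated assumptions: `query_group`, `relation_group`, the `quality` matrix, and `threshold` are hypothetical names, the quality score is assumed to be a precomputed value in [0, 1] that combines subject, object, and relation agreement, and the Hungarian matching normally used for the primary assignment is replaced here by a per-GT argmax. The sketch also does not resolve the case where one query is selected for two GTs.

```python
from typing import Dict, List


def build_group_mask(query_group: List[int],
                     gt_relation: List[int],
                     relation_group: Dict[int, int]) -> List[List[bool]]:
    """mask[q][g] is True when query q and ground truth g share a group.

    query_group[q]      -> group id of query q (hypothetical partition)
    gt_relation[g]      -> relation class of ground truth g
    relation_group[rel] -> group id of relation class rel (hypothetical)
    """
    return [[qg == relation_group[rel] for rel in gt_relation]
            for qg in query_group]


def multi_assign(quality: List[List[float]],
                 mask: List[List[bool]],
                 threshold: float) -> Dict[int, List[int]]:
    """Map each GT index to the list of queries assigned to it.

    Each GT keeps its best in-group query (the single-assignment case) and
    additionally receives every in-group query whose quality score reaches
    the threshold. Queries outside the GT's group are never considered.
    """
    num_queries = len(quality)
    num_gts = len(quality[0]) if quality else 0
    assignment: Dict[int, List[int]] = {}
    for g in range(num_gts):
        candidates = [q for q in range(num_queries) if mask[q][g]]
        if not candidates:
            assignment[g] = []
            continue
        best = max(candidates, key=lambda q: quality[q][g])
        extra = [q for q in candidates
                 if q != best and quality[q][g] >= threshold]
        assignment[g] = [best] + extra
    return assignment


# Toy example: four queries in two groups, three relation classes.
query_group = [0, 0, 1, 1]
relation_group = {0: 0, 1: 0, 2: 1}
gt_relation = [0, 2]            # GT 0 is in group 0, GT 1 is in group 1
quality = [                     # quality[q][g], assumed precomputed
    [0.90, 0.80],
    [0.70, 0.10],
    [0.95, 0.60],
    [0.20, 0.30],
]

mask = build_group_mask(query_group, gt_relation, relation_group)
assignment = multi_assign(quality, mask, threshold=0.65)
print(assignment)
```

In this toy run, GT 0 is assigned to queries 0 and 1, both in its group and both above the threshold. Query 2 scores highest on GT 0 but is skipped because it belongs to the other group. GT 1 keeps only its best in-group query, since no second candidate reaches the threshold.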

Code

mlvlab/speaq official

Tasks

Relation

Relationship Detection

Scene Graph Generation

Visual Relationship Detection

Datasets

Visual Genome

HICO-DET

Results from the Paper

Ranked #1 on Scene Graph Generation on Visual Genome
