GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, representation degradation and noise accumulation in online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation offers a promising direction, albeit at the cost of quadratic attention over memory. Moreover, such methods remain susceptible to instance-feature degradation under the above challenges and suffer from cascading errors. The detection and rectification of such errors remain largely underexplored. To this end, we introduce \textbf{GRAtt-VIS}, \textbf{G}ated \textbf{R}esidual \textbf{Att}ention for \textbf{V}ideo \textbf{I}nstance \textbf{S}egmentation. First, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Then, based on the gate activation, we rectify a degraded feature from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Second, we propose a novel inter-instance interaction that uses the gate activation as a mask for self-attention. This masking strategy dynamically restricts unrepresentative instance queries in self-attention and preserves vital information for long-term tracking. We refer to this novel combination of gated residual connection and masked self-attention as the \textbf{GRAtt} block, which can easily be integrated into existing propagation-based frameworks. Furthermore, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods. Code is available at \url{https://github.com/Tanveer81/GRAttVIS}.
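The two mechanisms described above, a Gumbel-Softmax gate that decides per query whether to keep the current-frame feature or fall back to its past representation, and a self-attention whose interactions are masked by that same gate, can be illustrated with a minimal NumPy sketch. This is not the released implementation; all names here (`gumbel_softmax_gate`, `gratt_block`) and the single-head, unprojected attention are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_gate(logits, tau=1.0):
    """Sample a hard 0/1 gate per query via Gumbel-Softmax over two classes.
    (Training would use the straight-through estimator; here we only sample.)"""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                              # Gumbel(0, 1) noise
    y = np.exp((logits + g) / tau)
    y = y / y.sum(axis=-1, keepdims=True)                # soft relaxation
    return (y.argmax(axis=-1) == 1).astype(np.float64)   # 1 = keep current frame

def gratt_block(q_curr, q_prev, gate_logits):
    """Hypothetical sketch of the GRAtt block's two ideas:
    1) gated residual: a query flagged as degraded (gate = 0) is replaced by
       its representation from the previous frame,
    2) masked self-attention: only queries with gate = 1 interact."""
    gate = gumbel_softmax_gate(gate_logits)              # (N,)
    q = gate[:, None] * q_curr + (1.0 - gate[:, None]) * q_prev

    mask = gate[:, None] * gate[None, :]                 # (N, N) pairwise mask
    scores = q @ q.T / np.sqrt(q.shape[-1])
    scores = np.where(mask > 0, scores, -1e9)            # mask out gated-off pairs
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    alive = mask.sum(axis=-1, keepdims=True) > 0         # no update for gated-off rows
    return q + alive * (attn @ q), gate
```

In this sketch, a query whose gate fires 0 is carried forward unchanged from the previous frame and is excluded from inter-instance interaction, which mirrors the paper's claim that no dedicated memory is needed: the propagated queries themselves hold the last reliable representation.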


Results from the Paper


Ranked #4 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Video Instance Segmentation | OVIS validation | GRAtt-VIS (ResNet-50) | mask AP | 36.2 | #21 |
| | | | AP50 | 60.8 | #19 |
| | | | AP75 | 36.8 | #21 |
| | | | AR1 | 16.8 | #14 |
| | | | AR10 | 40.1 | #17 |
| Video Instance Segmentation | OVIS validation | GRAtt-VIS (Swin-L) | mask AP | 45.7 | #9 |
| | | | AP50 | 69.1 | #9 |
| | | | AP75 | 47.8 | #8 |
| | | | AR1 | 19.2 | #6 |
| | | | AR10 | 49.4 | #8 |
| Video Instance Segmentation | YouTube-VIS 2021 | GRAtt-VIS (Swin-L) | mask AP | 60.3 | #4 |
| | | | AP50 | 81.3 | #7 |
| | | | AP75 | 67.1 | #6 |
| | | | AR10 | 64.5 | #7 |
| | | | AR1 | 48.8 | #3 |
| Video Instance Segmentation | YouTube-VIS 2021 | GRAtt-VIS (ResNet-50) | mask AP | 48.9 | #19 |
| | | | AP50 | 69.2 | #21 |
| | | | AP75 | 53.1 | #20 |
| | | | AR10 | 56.0 | #18 |
| | | | AR1 | 41.8 | #19 |
