TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Object Detection	ImageNet VID	SLTnet FPN-X101	MAP	82.4	# 21
Video Object Detection	USC-GRAD-STDdb	SLTnet FPN-X101	AP 0.5	44.9	# 1
Video Object Detection	USC-GRAD-STDdb	SLTnet FPN-X101	AP	16.6	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/short-term-anchor-linking-and-long-term-self/video-object-detection-on-usc-grad-stddb)](https://paperswithcode.com/sota/video-object-detection-on-usc-grad-stddb?p=short-term-anchor-linking-and-long-term-self)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/short-term-anchor-linking-and-long-term-self/video-object-detection-on-imagenet-vid)](https://paperswithcode.com/sota/video-object-detection-on-imagenet-vid?p=short-term-anchor-linking-and-long-term-self)`

Short-term anchor linking and long-term self-guided attention for video object detection

18 Apr 2021 · Daniel Cores, Víctor M Brea, Manuel Mucientes

We present a new network architecture that takes advantage of the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates short-term enhanced box features to exploit long-term spatio-temporal information. This module takes advantage of geometrical features over the long term for the first time in the video object detection domain. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that takes into account the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics, in order to prove its robustness across a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet.
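The two aggregation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes each frame is a dictionary mapping a hypothetical anchor id to a box feature vector, uses a plain mean as a stand-in for the learned short-term aggregation, and uses scaled dot-product attention as a stand-in for the long-term module. The geometrical features used by the paper's attention module are left out. All function names here are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def link_by_anchor(frames: List[Dict[int, Vector]], anchor_id: int) -> List[Vector]:
    """Collect the box features that share one anchor id across nearby frames."""
    return [frame[anchor_id] for frame in frames if anchor_id in frame]


def mean_aggregate(features: List[Vector]) -> Vector:
    """Average linked features element-wise (assumed stand-in for learned aggregation)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def attention_aggregate(query: Vector, memory: List[Vector]) -> Vector:
    """Softmax-weighted sum of memory vectors, scored by scaled dot product with the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, item)) / scale for item in memory]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return [
        sum(w * item[i] for w, item in zip(weights, memory)) / total
        for i in range(len(query))
    ]


# Three nearby frames; anchor 0 yields a proposal in every frame.
frames = [
    {0: [1.0, 0.0], 1: [0.0, 1.0]},
    {0: [3.0, 0.0]},
    {0: [2.0, 0.0], 1: [0.0, 3.0]},
]
short_term = mean_aggregate(link_by_anchor(frames, anchor_id=0))

# Hypothetical long-term memory of previously enhanced box features.
long_term_memory = [[1.0, 0.0], [0.0, 1.0]]
long_term = attention_aggregate(short_term, long_term_memory)
```

In this toy setup the memory entry that is more similar to the short-term feature receives the larger attention weight, so the long-term output leans toward it.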

Code

daniel-cores/SLTnet

Tasks

Object Detection

Video Object Detection

Datasets

ImageNet VID, USC-GRAD-STDdb

Results from the Paper

Ranked #1 on Video Object Detection on USC-GRAD-STDdb
