Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Action Recognition	Something-Something V1	EAN ResNet50 (single clip, center crop,8+16 ensemble, with sparse Transformer)	Top 1 Accuracy	57.2	# 14
Action Recognition	Something-Something V1	EAN ResNet50 (single clip, center crop,8+16 ensemble, with sparse Transformer)	Top 5 Accuracy	83.9	# 11

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ean-event-adaptive-network-for-enhanced/action-recognition-in-videos-on-something-1)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something-1?p=ean-event-adaptive-network-for-enhanced)`

EAN: Event Adaptive Network for Enhanced Action Recognition

22 Jul 2021 · Yuan Tian, Yichao Yan, Guangtao Zhai, Guodong Guo, Zhiyong Gao

Efficiently modeling spatial-temporal information in videos is crucial for action recognition. To achieve this goal, state-of-the-art methods typically employ convolution operators and dense interaction modules such as non-local blocks. However, these methods cannot accurately fit the diverse events in videos. On the one hand, the adopted convolutions have fixed scales and thus struggle with events of various scales. On the other hand, the dense interaction modeling paradigm achieves only sub-optimal performance, because action-irrelevant parts add noise to the final prediction. In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs. First, when extracting local cues, we generate spatial-temporal kernels of dynamic scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, which further improves the performance of the framework. Extensive experiments on several large-scale video datasets, e.g., Something-Something V1 & V2, Kinetics, and Diving48, verify that our models achieve state-of-the-art or competitive performance at low FLOPs. Code is available at: https://github.com/tianyuan168326/EAN-Pytorch.
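
The sparse paradigm described in the abstract (interactions mined only among a few selected foreground objects) can be illustrated with a small, dependency-free sketch. This is not the EAN implementation: the function name `sparse_self_attention`, the per-token `scores` used for selection, and the choice to leave unselected tokens unchanged are all assumptions made for illustration. It only shows the general idea of restricting scaled dot-product attention to the top-k tokens, which reduces the number of pairwise interactions from n² to k².

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_self_attention(tokens, scores, k):
    """Attend only among the k highest-scoring tokens (hypothetical helper).

    tokens: list of equal-length feature vectors.
    scores: one relevance score per token (an assumed stand-in for a
            foreground selector; the paper's selection rule is not shown here).
    Returns the updated tokens and the sorted indices that were selected.
    """
    selected = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    dim = len(tokens[0])
    out = [list(t) for t in tokens]  # unselected tokens pass through unchanged
    for i in selected:
        # Scaled dot-product attention weights over the selected set only.
        logits = [dot(tokens[i], tokens[j]) / math.sqrt(dim) for j in selected]
        weights = softmax(logits)
        out[i] = [
            sum(w * tokens[j][d] for w, j in zip(weights, selected))
            for d in range(dim)
        ]
    return out, sorted(selected)


# Toy input: four 2-D tokens, of which the two highest-scoring ones interact.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.1, 0.8, 0.2]
updated, kept = sparse_self_attention(tokens, scores, k=2)
print(kept)
print(updated)
```

In this toy run only tokens 0 and 2 exchange information; tokens 1 and 3 are returned as they were. A real model would use learned query, key, and value projections and a learned selector, which are omitted here.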

Code

tianyuan168326/EAN-Pytorch official

Tasks

Action Recognition

Datasets

ImageNet

Something-Something V1

Results from the Paper

Ranked #14 on Action Recognition on Something-Something V1
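
For readers unfamiliar with the Top 1 and Top 5 Accuracy metrics listed for this paper, the following is a minimal sketch of how top-k accuracy is commonly computed. The function name `top_k_accuracy` and the toy scores and labels are hypothetical; they are not taken from the paper or from the benchmark's evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example: 4 samples, 3 classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

Top-k accuracy can never be lower than top-1 accuracy on the same predictions, which is why the Top 5 figure for a model is at least as high as its Top 1 figure.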

Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
