DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Human action recognition has recently become one of the most popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions of video action recognition with competitive results. However, these methods suffer from fundamental limitations in robustness and generalization, e.g., how does the temporal ordering of video frames affect the recognition results? This work presents DirecFormer, a novel end-to-end Transformer-based Directed Attention framework for robust action recognition. The method takes a simple but novel Transformer-based perspective to learn the correct order of action sequences. The contributions of this work are therefore three-fold. Firstly, we introduce the problem of ordered temporal learning to action recognition. Secondly, a new Directed Attention mechanism is introduced to understand and attend to human actions in the correct order. Thirdly, we introduce a conditional dependency in action sequence modeling that covers both frame orders and action classes. The proposed approach consistently achieves state-of-the-art (SOTA) results compared with recent action recognition methods on three standard large-scale benchmarks, i.e., Jester, Kinetics-400, and Something-Something V2.
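
To make the idea of order-aware attention concrete, below is a minimal illustrative sketch (not the paper's actual DirecFormer formulation) of a temporal self-attention block whose scores are modulated by a learned directional bias, i.e., whether a key frame precedes or follows the query frame in time. The class name DirectedTemporalAttention and the specific bias scheme are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class DirectedTemporalAttention(nn.Module):
    """Hypothetical sketch of direction-aware temporal self-attention."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learned per-head bias: one value for "key precedes query",
        # another for "key follows (or equals) query".
        self.direction_bias = nn.Parameter(torch.zeros(num_heads, 2))

    def forward(self, x):
        # x: (batch, num_frames, dim) -- one token per frame for simplicity.
        B, T, C = x.shape
        qkv = self.qkv(x).reshape(B, T, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, T, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, T, T)

        # Directional index: 0 if key frame j < query frame i, else 1.
        i = torch.arange(T, device=x.device)
        idx = (i[None, :] >= i[:, None]).long()         # (T, T)
        attn = attn + self.direction_bias[:, idx]       # broadcast to (heads, T, T)

        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)


# Usage example on a random clip of 16 frame tokens with 768-dim features.
layer = DirectedTemporalAttention(dim=768, num_heads=8)
clip_tokens = torch.randn(2, 16, 768)
out = layer(clip_tokens)  # (2, 16, 768)
```

The key point of the sketch is that the attention weight between two frames is no longer symmetric in time: swapping "past" and "future" keys changes the bias term, so the model can, in principle, distinguish a correctly ordered sequence from a shuffled one.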

CVPR 2022
Task                   | Dataset                      | Model       | Metric         | Value | Global Rank
Action Recognition     | Jester (Gesture Recognition) | DirecFormer | Val            | 98.15 | #1
Action Classification  | Kinetics-400                 | DirecFormer | Acc@1          | 82.75 | #66
Action Classification  | Kinetics-400                 | DirecFormer | Acc@5          | 94.86 | #57
Action Recognition     | Something-Something V2       | DirecFormer | Top-1 Accuracy | 64.94 | #90
Action Recognition     | Something-Something V2       | DirecFormer | Top-5 Accuracy | 87.9  | #82

Methods


No methods listed for this paper.