Region-based Non-local Operation for Video Classification

17 Jul 2020 Guoxi Huang Adrian G. Bors

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes the optimizations difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms, which can directly capture long-range dependencies without using a deep stack of local operations... (read more)

PDF Abstract PDF Abstract
TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK RESULT BENCHMARK
Action Classification Kinetics-400 RNL+TSM Ensemble(ResNet50, 8 + 16 frames) Accuracy 77.4 # 11
Action Recognition Something-Something V1 RNL+TSM Ensemble(ResNet50, ImageNet pretrained) Top 1 Accuracy 52.7 # 8
Top 5 Accuracy 81.5 # 6
Action Recognition Something-Something V1 RNL+TSM Ensemble(R50+R101, ImageNet pretrained) Top 1 Accuracy 54.1 # 5
Top 5 Accuracy 82.2 # 5

Methods used in the Paper