MARS: Motion-Augmented RGB Stream for Action Recognition

CVPR 2019. Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid

Most state-of-the-art methods for action recognition consist of a two-stream architecture with 3D convolutions: an appearance stream for RGB frames and a motion stream for optical flow frames. Although combining flow with RGB improves performance, computing accurate optical flow is costly and increases action recognition latency...
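
To make the idea of combining an appearance stream with a motion stream concrete, here is a minimal sketch of late fusion, in which each stream's class scores are turned into probabilities and averaged. This is an illustration only: the function names (`softmax`, `late_fusion`), the equal weighting, and the toy logits are assumptions, not details taken from the abstract above.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities.

    stream_logits: one list of class scores per stream (e.g. RGB, flow).
    weights: optional per-stream weights; equal weighting is assumed here.
    """
    n = len(stream_logits)
    if weights is None:
        weights = [1.0 / n] * n
    probs = [softmax(logits) for logits in stream_logits]
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]


# Toy scores for three hypothetical action classes.
rgb_logits = [2.0, 1.0, 0.1]   # appearance stream
flow_logits = [0.5, 2.5, 0.3]  # motion stream

fused = late_fusion([rgb_logits, flow_logits])
predicted = max(range(len(fused)), key=fused.__getitem__)
print("fused probabilities:", fused)
print("predicted class index:", predicted)
```

In this toy case the RGB stream alone favours the first class, while the fused scores favour the second, which shows how a motion stream can change the final decision. The flow scores still have to be computed first, which is the latency cost the abstract refers to.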


Evaluation Results from the Paper


| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Action Recognition In Videos | HMDB-51 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Average accuracy of 3 splits | 80.9 | #6 |
| Action Classification | Kinetics-400 | MARS+RGB+Flow (64 frames) | Accuracy | 74.9 | #12 |
| Action Classification | Kinetics-400 | MARS+RGB+Flow (16 frames) | Accuracy | 68.9 | #15 |
| Action Classification | MiniKinetics | MARS+RGB+Flow (16 frames) | Top-1 Accuracy | 73.5 | #1 |
| Action Recognition In Videos | Something-Something V1 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Top-1 Accuracy | 53.0 | #3 |
| Action Recognition In Videos | Something-Something V1 | MARS+RGB+Flow (16 frames, Kinetics pretrained) | Top-1 Accuracy | 40.4 | #18 |
| Action Recognition In Videos | UCF101 | MARS+RGB+Flow (16 frames) | 3-fold Accuracy | 95.8 | #9 |
| Action Recognition In Videos | UCF101 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 3-fold Accuracy | 97.8 | #2 |