MARS: Motion-Augmented RGB Stream for Action Recognition

Most state-of-the-art methods for action recognition use a two-stream architecture with 3D convolutions: an appearance stream for RGB frames and a motion stream for optical flow frames. Although combining flow with RGB improves performance, computing accurate optical flow is expensive and increases action recognition latency...
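As a rough illustration of the two-stream idea described above, the sketch below combines per-class scores from an appearance (RGB) stream and a motion (optical flow) stream. The excerpt does not say how the streams are fused, so the weighted average of class probabilities used here is an assumption, and the names `softmax`, `fuse_streams`, `predict`, and `rgb_weight` are hypothetical, chosen only for this example.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: late fusion by probability averaging; the source text
    does not specify the fusion rule.
    """
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def predict(rgb_logits, flow_logits, rgb_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(rgb_logits, flow_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

Setting `rgb_weight=1.0` in this sketch ignores the flow stream entirely, which corresponds to the lower-latency setting the abstract alludes to, where optical flow is not computed at test time.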

Results from the Paper

| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Action Recognition | HMDB-51 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Average accuracy of 3 splits | 80.9 | # 10 |
| Action Classification | Kinetics-400 | MARS+RGB+Flow (16 frames) | Vid acc@1 | 68.9 | # 81 |
| Action Classification | Kinetics-400 | MARS+RGB+Flow (64 frames) | Vid acc@1 | 74.9 | # 62 |
| Action Classification | MiniKinetics | MARS+RGB+Flow (16 frames) | Top-1 Accuracy | 73.5 | # 1 |
| Action Recognition | Something-Something V1 | MARS+RGB+Flow (16 frames, Kinetics pretrained) | Top 1 Accuracy | 40.4 | # 43 |
| Action Recognition | Something-Something V1 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Top 1 Accuracy | 53.0 | # 11 |
| Action Recognition | UCF101 | MARS+RGB+Flow (16 frames) | 3-fold Accuracy | 95.8 | # 33 |
| Action Recognition | UCF101 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 3-fold Accuracy | 97.8 | # 7 |
