Two-Stream Video Classification with Cross-Modality Attention

1 Aug 2019 | Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian

Fusing multi-modality information is known to bring significant improvement in video classification. However, the most popular method to date is still simply fusing each stream's prediction scores at the last stage...
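
For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of score-level (late) fusion for a two-stream model. It illustrates only the baseline, not the paper's cross-modality attention method. The function names, the `rgb_weight` parameter, and the example logits are hypothetical and chosen purely for illustration.

```python
import math


def softmax(logits):
    """Convert raw class logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    rgb_weight is an illustrative assumption: 0.5 gives a plain average,
    and the flow stream receives the remaining weight.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same set of classes")
    rgb_scores = softmax(rgb_logits)
    flow_scores = softmax(flow_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * f
        for r, f in zip(rgb_scores, flow_scores)
    ]


def predict(rgb_logits, flow_logits, rgb_weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = late_fusion(rgb_logits, flow_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up logits for a 3-class problem: the streams disagree, and the
# fused scores decide the final label.
example_rgb = [2.0, 0.5, 0.1]
example_flow = [0.1, 3.0, 0.2]
example_label = predict(example_rgb, example_flow)
```

In this scheme the two streams interact only through their final scores, which is the limitation the abstract points to.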


Results from the Paper


TASK                          DATASET       MODEL                  METRIC NAME      METRIC VALUE  GLOBAL RANK
Action Classification         Kinetics-400  CMA iter1 (16 frames)  Accuracy         75.98         # 12
Action Recognition In Videos  UCF101        CMA iter1-S            3-fold Accuracy  96.5          # 7
