Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

19 Apr 2018  ·  Okan Köpüklü, Neslihan Köse, Gerhard Rigoll ·

Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets - Jester, ChaLearn LAP IsoGD and NVIDIA Dynamic Hand Gesture datasets - which require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on the Jester and ChaLearn benchmarks with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark.
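The core idea of data-level fusion is to append motion information (e.g. optical-flow fields) to a static RGB frame along the channel axis, so a single network input already encodes a spatio-temporal state. The sketch below illustrates this, assuming flow is stored as two-channel x/y fields; the function name and image sizes are illustrative, not taken from the paper. The "3f1c" in the model name suggests 3 flow frames fused with 1 color frame.

```python
import numpy as np

def motion_fused_frame(rgb_frame, flow_frames):
    """Fuse optical-flow frames into one RGB frame along the channel
    axis (a sketch of the data-level fusion idea behind MFFs).

    rgb_frame:   (H, W, 3) RGB image
    flow_frames: list of (H, W, 2) optical-flow fields (x/y components)
    Returns an (H, W, 3 + 2 * len(flow_frames)) array that can be fed
    to a 2D CNN as a single multi-channel "frame".
    """
    channels = [rgb_frame.astype(np.float32)]
    channels += [f.astype(np.float32) for f in flow_frames]
    return np.concatenate(channels, axis=-1)

# Illustrative "3f1c" setting: 3 flow frames + 1 color frame,
# giving 3 + 3 * 2 = 9 input channels per MFF.
rgb = np.zeros((112, 112, 3), dtype=np.uint8)
flows = [np.zeros((112, 112, 2), dtype=np.float32) for _ in range(3)]
mff = motion_fused_frame(rgb, flows)
print(mff.shape)  # (112, 112, 9)
```

Because fusion happens at the data level, only the first convolutional layer of an off-the-shelf network needs to be adjusted to accept the extra input channels.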

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Hand Gesture Recognition | ChaLearn test | 8-MFFs-3f1c | Accuracy | 56.7 | # 1 |
| Hand Gesture Recognition | ChaLearn val | 8-MFFs-3f1c (5 crop) | Accuracy | 57.4 | # 1 |
| Hand Gesture Recognition | Jester test | DRX3D | Top 1 Accuracy | 96.6 | # 1 |
| Hand Gesture Recognition | Jester val | 8-MFFs-3f1c (5 crop) | Top 1 Accuracy | 96.33 | # 1 |
| Hand Gesture Recognition | Jester val | 8-MFFs-3f1c (5 crop) | Top 5 Accuracy | 99.86 | # 1 |
| Hand Gesture Recognition | NVGesture | 8-MFFs-3f1c | Accuracy | 84.7 | # 5 |
