Learning Deep and Compact Models for Gesture Recognition

29 Dec 2017 · Koustav Mullick, Anoop M. Namboodiri

We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. To this end, we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capturing the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, which is less than one hundredth of the size of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices.
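The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the commonly used temperature-softened formulation, not necessarily the loss used in the paper. The function names `softmax` and `distillation_loss`, and the `temperature` and `alpha` values, are illustrative assumptions. It shows how a small student model can be trained against both the softened outputs of a larger teacher and the ground-truth gesture label.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Assumed generic formulation (hypothetical hyperparameters):
      soft term = T^2 * KL(teacher || student) on temperature-softened outputs
      hard term = cross-entropy of the student against the true class index
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft_term = (temperature ** 2) * kl
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up logits for three gesture classes
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label cross-entropy remains; `alpha` trades off the two terms.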


Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Gesture Recognition | ChaLearn 2014 | 3D-CNN + LSTM | Accuracy | 93.2 | #1 |