Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

5 Jun 2015 • Lionel Pigou • Aäron van den Oord • Sander Dieleman • Mieke Van Herreweghe • Joni Dambre

Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions...


Results from the Paper


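TASK                 DATASET     MODEL             METRIC NAME     METRIC VALUE  GLOBAL RANK
Gesture Recognition  Montalbano  Temp Conv + LSTM  Error rate      2.77          # 1
Gesture Recognition  Montalbano  Temp Conv + LSTM  Jaccard (Mean)  90.6          # 1
Gesture Recognition  Montalbano  Temp Conv + LSTM  Precision       94.49         # 1
Gesture Recognition  Montalbano  Temp Conv + LSTM  Recall          94.57         # 1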
