Automatic Sign Language Translation (SLT) systems can be a great asset for improving communication with and within deaf communities. Currently, the main obstacle to effective translation models lies in the scarcity of labelled data, which hinders the use of modern deep learning models. SLT is a complex problem that involves many subtasks, of which handshape recognition is the most important. We compare a series of models specially tailored for small datasets to improve their performance on handshape recognition tasks. We evaluate Wide-DenseNet and few-shot Prototypical Network models with and without transfer learning, as well as Model-Agnostic Meta-Learning (MAML). Our findings indicate that Wide-DenseNet without transfer learning and Prototypical Networks with transfer learning provide the best results. Prototypical Networks, in particular, are vastly superior when fewer than 30 samples per class are available, while Wide-DenseNet achieves the best results with more samples. MAML, on the other hand, does not improve performance in any scenario. These results can help to design better SLT models.
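The core idea behind Prototypical Networks is to represent each handshape class by the mean of its embedded support examples and classify a query by its nearest prototype. The sketch below illustrates this classification step with NumPy, assuming embeddings have already been produced by some encoder (the encoder, dimensions, and toy data here are illustrative, not the paper's actual setup):

```python
import numpy as np

def prototypes(support_embeddings, support_labels, n_classes):
    # Each class prototype is the mean of that class's support embeddings.
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def classify(query_embedding, protos):
    # Assign the query to the nearest prototype (squared Euclidean distance).
    dists = ((protos - query_embedding) ** 2).sum(axis=1)
    return int(np.argmin(dists))

# Toy example: 2 classes, 3 support samples each, 4-dim embeddings.
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0.0, 0.1, (3, 4)),   # class 0 near the origin
                     rng.normal(1.0, 0.1, (3, 4))])  # class 1 near (1,1,1,1)
labels = np.array([0, 0, 0, 1, 1, 1])
protos = prototypes(support, labels, n_classes=2)
query = rng.normal(1.0, 0.1, 4)   # query drawn near class 1
print(classify(query, protos))    # prints 1
```

This nearest-prototype rule is what makes the method effective in the low-data regime reported above: it needs only a handful of support examples per class rather than enough data to fit a full classifier head.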

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Hand Gesture Recognition | LSA16 | Prototypical Networks + CNN | Accuracy | 98.38 | #1 |
| Hand Gesture Recognition | RWTH-PHOENIX Handshapes (dev set) | DenseNet | Accuracy | 96.05 | #1 |