Sign Language Translation
23 papers with code • 4 benchmarks • 11 datasets
Given a video containing sign language, the task is to predict its translation into a (written) spoken language.
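The task is usually formulated as conditional sequence generation: a model assigns a probability to a target sentence given the input video and translates by decoding autoregressively. A minimal, framework-free sketch of that pipeline (encoder over frame features, then greedy word-by-word decoding) is below; all names (`ToySLTModel`, `VOCAB`, the toy scoring) are illustrative stand-ins, not any paper's actual model.

```python
import math

# Toy vocabulary; real systems use the full target-language vocabulary.
VOCAB = ["<bos>", "<eos>", "hello", "world"]

class ToySLTModel:
    """Stand-in for a video encoder + autoregressive text decoder."""

    def encode(self, frames):
        # Encoder: collapse per-frame feature vectors into one context
        # vector (real systems use CNN/Transformer encoders over time).
        dim = len(frames[0])
        return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

    def step(self, context, prefix):
        # Decoder step: score each vocabulary word given the video context
        # and the words generated so far. Scores here are deterministic
        # toys; a real decoder computes them with learned parameters.
        scores = [
            (hash((tuple(context), tuple(prefix), w)) % 1000) / 1000.0
            for w in VOCAB
        ]
        total = sum(math.exp(s) for s in scores)
        return [math.exp(s) / total for s in scores]  # softmax

def greedy_translate(model, frames, max_len=10):
    """Greedy autoregressive decoding of p(sentence | video)."""
    context = model.encode(frames)
    prefix = ["<bos>"]
    while len(prefix) < max_len:
        probs = model.step(context, prefix)
        word = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
        prefix.append(word)
        if word == "<eos>":
            break
    return [w for w in prefix if w not in ("<bos>", "<eos>")]
```

In practice the greedy loop is replaced by beam search, and the encoder/decoder are trained jointly with cross-entropy on parallel video-sentence pairs.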
Image credit: How2Sign
These leaderboards are used to track progress in Sign Language Translation.
Most implemented papers
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.
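The recipe above can be summarized as a staged training schedule: each sub-network is pretrained on general-domain data first, then on within-domain data, before the full model is fine-tuned end to end. The sketch below only records that schedule; the stage names and the `run_stage` helper are illustrative, not from the paper's code.

```python
def run_stage(log, network, data):
    # Placeholder for an actual training loop of `network` over `data`;
    # here we just record the stage order.
    log.append((network, data))
    return log

def transfer_learning_schedule():
    log = []
    # Visual (sign-to-gloss) network: general human-action videos,
    # then in-domain sign-to-gloss supervision.
    run_stage(log, "sign2gloss", "human-action-videos")
    run_stage(log, "sign2gloss", "sign-gloss-pairs")
    # Translation (gloss-to-text) network: general multilingual corpus,
    # then in-domain gloss-to-text pairs.
    run_stage(log, "gloss2text", "multilingual-corpus")
    run_stage(log, "gloss2text", "gloss-text-pairs")
    # Finally, connect the two networks and fine-tune end to end
    # on parallel sign-video/text data.
    run_stage(log, "joint", "sign-text-pairs")
    return log
```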
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios.
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
Neural Sign Language Translation
SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language.
Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation
We report state-of-the-art sign language recognition and translation results achieved by our Sign Language Transformers.
Better Sign Language Translation with STMC-Transformer
This contradicts previous claims that GT gloss translation acts as an upper bound for SLT performance and reveals that glosses are an inefficient representation of sign language.
ASL Recognition with Metric-Learning based Lightweight Network
Over the past decades, the set of human tasks solved by machines has expanded dramatically.
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language
Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
Frozen Pretrained Transformers for Neural Sign Language Translation
Our results show that pretrained language models can be used to improve sign language translation performance and that the self-attention patterns in BERT transfer in zero-shot to the encoder and decoder of sign language translation models.
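The core mechanism behind "frozen pretrained transformers" is that the pretrained weights are kept fixed during fine-tuning while only the newly added layers receive gradient updates. A minimal, framework-free sketch of that update rule follows; the parameter names and the `sgd_update` helper are hypothetical, chosen only to illustrate the freezing.

```python
def sgd_update(params, grads, frozen, lr=0.1):
    """Apply a plain SGD step, skipping any parameter marked as frozen.

    params: dict mapping parameter name -> current value
    grads:  dict mapping parameter name -> gradient
    frozen: set of parameter names to leave untouched (e.g. BERT layers)
    """
    return {
        name: (w if name in frozen else w - lr * grads[name])
        for name, w in params.items()
    }
```

In a real setup the same effect is achieved by disabling gradient tracking on the pretrained layers (e.g. `requires_grad = False` in PyTorch) so the optimizer never updates them.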
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation
In this paper, we attenuate this need, by introducing an end-to-end SLT model that does not entail explicit use of glosses; the model only needs text groundtruth.