Sign Language Translation
51 papers with code • 6 benchmarks • 17 datasets
Given a video containing sign language, the task is to predict the translation into (written) spoken language.
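As an illustrative sketch of this task, an SLT system maps a sequence of video frames to a spoken-language sentence, often via an intermediate gloss sequence. Everything below (frame labels, lookup tables, function names) is hypothetical toy data standing in for learned models:

```python
# Toy sketch of a gloss-based SLT pipeline (hypothetical data and models).
# Real systems use learned video encoders and neural translation models.

# Hypothetical per-frame predictions a visual recognizer might emit.
FRAME_TO_GLOSS = {"frame_a": "HELLO", "frame_b": "HELLO", "frame_c": "YOU"}
# Hypothetical gloss-to-text mapping standing in for a translation model.
GLOSS_TO_TEXT = {("HELLO", "YOU"): "Hello to you."}

def recognize_glosses(frames):
    """Sign-to-gloss: collapse repeated frame-level predictions (CTC-style)."""
    glosses = []
    for frame in frames:
        gloss = FRAME_TO_GLOSS[frame]
        if not glosses or glosses[-1] != gloss:
            glosses.append(gloss)
    return glosses

def translate(glosses):
    """Gloss-to-text: a dictionary lookup stands in for a neural decoder."""
    return GLOSS_TO_TEXT.get(tuple(glosses), " ".join(glosses).lower())

def sign_language_translate(frames):
    """Full pipeline: video frames -> glosses -> spoken-language text."""
    return translate(recognize_glosses(frames))

print(sign_language_translate(["frame_a", "frame_b", "frame_c"]))
# -> Hello to you.
```

The gloss step is optional; several papers listed below study gloss-free (end-to-end) alternatives.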
Image credit: How2Sign
Datasets
(See the dataset listing on the task page; highlights include How2Sign, WLASL, and SignBank+, introduced by the papers below.)
Most implemented papers
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.
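The progressive pretraining schedule described above can be summarized as a sketch; the stage descriptions are paraphrased from the abstract, and the actual training code is assumed rather than shown:

```python
# Sketch of the two-network, general-to-within-domain pretraining schedule.
# Each entry is (sub-network, pretraining data); order follows the paper's
# description, and the final joint fine-tuning stage is an assumption.

def pretraining_schedule():
    """Return the ordered pretraining stages for the SLT baseline."""
    visual_stages = [
        ("sign-to-gloss visual network", "general domain: human-action data"),
        ("sign-to-gloss visual network", "within-domain: sign-to-gloss dataset"),
    ]
    translation_stages = [
        ("gloss-to-text translation network", "general domain: multilingual corpus"),
        ("gloss-to-text translation network", "within-domain: gloss-to-text corpus"),
    ]
    # After both networks are pretrained, they are composed for end-to-end SLT.
    return visual_stages + translation_stages + [
        ("full model", "fine-tune: sign-to-text"),
    ]
```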
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios.
Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation
We report state-of-the-art sign language recognition and translation results achieved by our Sign Language Transformers.
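Joint training of recognition and translation is typically realized by optimizing a single combined objective. A minimal sketch, assuming hypothetical scalar loss values and an illustrative weighting scheme (the exact loss formulation in the paper is not reproduced here):

```python
# Joint objective sketch: combine a CTC-style recognition loss with a
# cross-entropy translation loss. The weights and loss values below are
# hypothetical and chosen only for illustration.

def joint_loss(recognition_loss, translation_loss,
               recognition_weight=1.0, translation_weight=1.0):
    """Weighted sum of the recognition and translation objectives."""
    return (recognition_weight * recognition_loss
            + translation_weight * translation_loss)

# Example: equal weighting of the two objectives.
total = joint_loss(recognition_loss=2.5, translation_loss=3.1)
```

Tuning the two weights trades off gloss recognition accuracy against translation quality during training.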
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation
Sign language translation systems are complex and require many components.
SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models
We introduce SignBank+, a clean version of the SignBank dataset, optimized for machine translation between spoken language text and SignWriting, a phonetic sign language writing system.
Neural Sign Language Translation
Sign language recognition (SLR) seeks to recognize a sequence of continuous signs but neglects the rich grammatical and linguistic structure of sign language, which differs from that of spoken language.
Better Sign Language Translation with STMC-Transformer
This contradicts previous claims that ground-truth (GT) gloss translation acts as an upper bound on SLT performance, and reveals that glosses are an inefficient representation of sign language.
ASL Recognition with Metric-Learning based Lightweight Network
Over the past decades, the set of human tasks solved by machines has expanded dramatically.
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language
Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.