Our approach comprises three phases: 1) building a sign language dictionary that covers all glosses present in the target sign language dataset; 2) training an isolated sign language recognition (ISLR) model on augmented signs using both a conventional classification loss and our novel saliency loss; 3) applying a sliding window over the input sign sequence and feeding each windowed clip to the trained model for online recognition (a sketch of this inference loop follows).
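The snippet below is a minimal sketch of the third phase only, assuming a PyTorch ISLR model `isolated_model` produced by phase 2; the window size, stride, and greedy per-window decoding are illustrative assumptions, not the paper's exact settings.

```python
import torch

@torch.no_grad()
def online_recognition(frames, isolated_model, window_size=16, stride=8):
    """Sliding-window inference over a continuous signing video.

    frames: (T, C, H, W) tensor of video frames.
    Returns one predicted gloss index per window.
    """
    isolated_model.eval()
    predictions = []
    for start in range(0, frames.size(0) - window_size + 1, stride):
        clip = frames[start:start + window_size]       # one candidate sign clip
        logits = isolated_model(clip.unsqueeze(0))     # (1, num_glosses)
        predictions.append(logits.argmax(dim=-1).item())
    return predictions
```

In practice, duplicate predictions from overlapping windows would be merged in a post-processing step before emitting the final gloss sequence.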
The objective of this paper is to develop a functional system for translating spoken languages into sign languages, referred to as Spoken2Sign translation.
Sign languages are visual languages that convey information through signers' handshapes, facial expressions, body movements, and so forth.
The first task enhances the visual module, which is particularly sensitive to insufficient training data, from the perspective of consistency (an illustrative consistency objective is sketched below).
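The following is an illustrative sketch only, not the paper's actual auxiliary task: one common way to impose a consistency objective on a visual module is to penalize divergence between its features for two independently augmented views of the same clip. The `augment` callable and the MSE-on-normalized-features loss are assumptions for the sake of the example.

```python
import torch.nn.functional as F

def consistency_loss(visual_module, clip, augment):
    """Penalize feature disagreement between two augmented views of one clip."""
    feats_a = visual_module(augment(clip))   # features for view 1
    feats_b = visual_module(augment(clip))   # features for view 2
    # Encourage the two views to yield matching (L2-normalized) features.
    return F.mse_loss(F.normalize(feats_a, dim=-1),
                      F.normalize(feats_b, dim=-1))
```

Such a term would be added to the main recognition loss, giving the visual module an extra supervisory signal when labeled data is scarce.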
RGB videos, however, are raw signals with substantial visual redundancy, which leads the encoder to overlook information that is key to sign language understanding.
The backbone of most deep-learning-based continuous sign language recognition (CSLR) models consists of a visual module, a sequential module, and an alignment module; a minimal sketch of this pipeline follows.
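As a sketch of this three-module design, the code below assumes common instantiations rather than any specific paper's architecture: a small 2D-CNN as the visual module, a BiLSTM as the sequential module, and CTC as the alignment module. All class names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class CSLRBackbone(nn.Module):
    def __init__(self, num_glosses, feat_dim=512, hidden_dim=256):
        super().__init__()
        # Visual module: frame-wise feature extractor (stand-in CNN).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Sequential module: temporal modeling over frame features.
        self.sequential = nn.LSTM(feat_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
        # Classifier feeding the alignment module (CTC reserves index 0 as blank).
        self.classifier = nn.Linear(2 * hidden_dim, num_glosses + 1)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, videos):                  # videos: (B, T, C, H, W)
        b, t = videos.shape[:2]
        feats = self.visual(videos.flatten(0, 1)).view(b, t, -1)
        seq_feats, _ = self.sequential(feats)
        return self.classifier(seq_feats)       # (B, T, num_glosses + 1)
```

At training time, the per-frame logits would be log-softmaxed and passed to `self.ctc` together with the gloss label sequences; at inference, CTC decoding collapses repeated predictions and blanks to recover the gloss sequence.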