Sign languages are visual languages that convey information through signers' handshapes, facial expressions, body movements, and so forth. The backbone of most deep-learning-based continuous sign language recognition (CSLR) models consists of a visual module, a sequential module, and an alignment module. RGB videos, however, are raw signals with substantial visual redundancy, which can lead the encoder to overlook the key information for sign language understanding. We name the CSLR model trained with the above auxiliary tasks consistency-enhanced CSLR; it performs well on signer-dependent datasets, in which all signers appear during both training and testing.
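The three-module backbone described above can be illustrated with a minimal sketch. The function names, shapes, and the stand-in operations below are assumptions for illustration only (a real model would use a CNN, a BiLSTM, and a learned CTC alignment), but the decomposition into visual, sequential, and alignment stages mirrors the pipeline:

```python
import numpy as np

# Hypothetical sketch of the three-module CSLR backbone.
# All function names and internals are illustrative stand-ins,
# not the implementation from any specific paper.

def visual_module(frames):
    # Per-frame spatial feature extraction (stand-in for a 2D-CNN):
    # collapse each frame's pixels to a single feature by mean pooling.
    # frames: (T, H, W) -> features: (T, 1)
    return frames.reshape(frames.shape[0], -1).mean(axis=1, keepdims=True)

def sequential_module(features, window=3):
    # Temporal modelling (stand-in for a BiLSTM / temporal conv):
    # smooth features over a sliding window along the time axis.
    pad = window // 2
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[i:i + window].mean(axis=0)
                     for i in range(features.shape[0])])

def alignment_module(frame_logits, blank=0):
    # CTC-style greedy alignment: argmax per frame, collapse repeated
    # labels, then drop blanks to obtain the predicted gloss sequence.
    path = frame_logits.argmax(axis=1)
    collapsed = [p for i, p in enumerate(path) if i == 0 or p != path[i - 1]]
    return [int(p) for p in collapsed if p != blank]

# Usage: a toy 6-frame clip with a 3-class (blank, gloss-1, gloss-2) output.
logits = np.array([[1, 0, 0],   # blank
                   [0, 0, 1],   # gloss 2
                   [0, 0, 1],   # gloss 2 (repeat, collapsed)
                   [1, 0, 0],   # blank
                   [0, 1, 0],   # gloss 1
                   [0, 1, 0]],  # gloss 1 (repeat, collapsed)
                  dtype=float)
print(alignment_module(logits))  # -> [2, 1]
```

The greedy collapse in `alignment_module` shows why CTC-style alignment suits CSLR: frame-level predictions far outnumber glosses, so repeated labels and blanks must be merged into a shorter gloss sequence without frame-level annotations.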