no code implementations • 8 Jul 2022 • Junfu Pu, Ying Shan
By introducing a local neighbor position embedding, the cross-modal transformer decoder synthesizes smooth dance motion sequences that remain consistent with the key poses at their corresponding positions.
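A minimal sketch of what such a local neighbor position embedding might look up: each frame is tagged with its signed offset to the nearest key-pose position, so the decoder can condition motion on proximity to key poses. The function name and formulation here are illustrative assumptions, not the paper's exact design.

```python
def local_neighbor_offsets(seq_len, key_positions):
    """Illustrative assumption: for each frame index t, return the signed
    offset to the nearest key-pose position. Such offsets could index an
    embedding table; the paper's actual formulation may differ."""
    return [min((k - t for k in key_positions), key=abs) for t in range(seq_len)]
```

For example, with key poses at frames 0 and 4 of a 5-frame sequence, the offsets are `[0, -1, -2, 1, 0]`; frames near a key pose receive small offsets, which the embedding can exploit to enforce consistency at those positions.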
no code implementations • 7 Jul 2022 • Jiashuo Yu, Junfu Pu, Ying Cheng, Rui Feng, Ying Shan
Although audio-visual representation learning has proven applicable to many downstream tasks, the representation of dancing videos, which is more specific and typically accompanied by music with complex auditory content, remains challenging and largely unexplored.
no code implementations • CVPR 2021 • Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li
Finally, the synthetic parallel data serves as a strong supplement for the end-to-end training of the encoder-decoder SLT framework.
Ranked #5 on Sign Language Translation on CSL-Daily
no code implementations • 11 Oct 2020 • Junfu Pu, Wengang Zhou, Hezhen Hu, Houqiang Li
Continuous sign language recognition (SLR) deals with unaligned video-text pairs and uses the word error rate (WER), i.e., edit distance, as the main evaluation metric.
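The WER metric mentioned above is a word-level edit distance normalized by reference length. A minimal sketch (standard Levenshtein dynamic programming, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For instance, `wer("i like signing", "i love signing")` gives 1/3: one substitution against a three-word reference.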
no code implementations • 24 Aug 2020 • Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li
Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), e.g., facial expressions, mouth shapes, etc.
no code implementations • CVPR 2019 • Junfu Pu, Wengang Zhou, Houqiang Li
Our framework consists of two modules: a 3D convolutional residual network (3D-ResNet) for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling.
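CTC, used in the sequence-modelling module above, allows training without frame-level alignment; at inference, its standard greedy decoding collapses repeated per-frame labels and removes blanks. A minimal sketch of that collapse rule (a generic illustration, not the paper's code; the blank index 0 follows common convention):

```python
BLANK = 0  # conventional blank label index in CTC

def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame label path into an output sequence:
    drop consecutive repeats, then remove blanks (standard CTC rule)."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

For example, the frame path `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`: the blank between the two 1s keeps them as distinct output symbols, while the repeated 2s merge.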