no code implementations • 19 Aug 2024 • Zhigang Chen, Benjia Zhou, Yiqing Huang, Jun Wan, Yibo Hu, Hailin Shi, Yanyan Liang, Zhen Lei, Du Zhang
Notably, C$^2$RL improves the BLEU-4 score by +5.3 on P14T, +10.6 on CSL-Daily, +6.2 on OpenASL, and +1.3 on How2Sign.
Gloss-free Sign Language Translation
Representation Learning
no code implementations • 19 Mar 2024 • Zhigang Chen, Benjia Zhou, Jun Li, Jun Wan, Zhen Lei, Ning Jiang, Quan Lu, Guoqing Zhao
Although some approaches work towards gloss-free SLT through jointly training the visual encoder and translation network, these efforts still suffer from poor performance and inefficient use of the powerful Large Language Model (LLM).
Ranked #4 on Gloss-free Sign Language Translation on PHOENIX14T
no code implementations • 5 Dec 2023 • Tianshun Han, Shengnan Gui, Yiqing Huang, Baihui Li, Lijian Liu, Benjia Zhou, Ning Jiang, Quan Lu, Ruicong Zhi, Yanyan Liang, Du Zhang, Jun Wan
The framework entails three modules: PMMTalk encoder, cross-modal alignment module, and PMMTalk decoder.
1 code implementation • 23 Aug 2023 • Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang
RGB-D action and gesture recognition remains an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation in human motion.
1 code implementation • ICCV 2023 • Benjia Zhou, Zhigang Chen, Albert Clapés, Jun Wan, Yanyan Liang, Sergio Escalera, Zhen Lei, Du Zhang
Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT).
Ranked #6 on Gloss-free Sign Language Translation on CSL-Daily
1 code implementation • 16 Nov 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
Although improving motion recognition to some extent, these methods still face sub-optimal situations in the following aspects: (i) data augmentation, i.e., the scale of the RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure brings more challenges to spatiotemporal information modeling; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion.
Ranked #4 on Action Recognition on NTU RGB+D
no code implementations • 29 Sep 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
To achieve these two purposes, we propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
1 code implementation • CVPR 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors.
Ranked #1 on Hand Gesture Recognition on NVGesture
1 code implementation • 10 Feb 2021 • Benjia Zhou, Yunan Li, Jun Wan
Meanwhile, a more adaptive architecture-searched network structure can also outperform block-fixed ones such as ResNet, since it better increases the diversity of features at different stages of the network.
no code implementations • 12 Nov 2020 • Jiangtao Kong, Yu Cheng, Benjia Zhou, Kai Li, Junliang Xing
To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.
1 code implementation • 21 Aug 2020 • Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao
Gesture recognition has attracted considerable attention owing to its great potential in applications.
no code implementations • 23 Apr 2020 • Ajian Liu, Xuan Li, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Meysam Madadi, Yi Jin, Zhuoyuan Wu, Xiaogang Yu, Zichang Tan, Qi Yuan, Ruikun Yang, Benjia Zhou, Guodong Guo, Stan Z. Li
Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing.