Search Results for author: Benjia Zhou

Found 11 papers, 6 papers with code

Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

no code implementations • 19 Mar 2024 • Zhigang Chen, Benjia Zhou, Jun Li, Jun Wan, Zhen Lei, Ning Jiang, Quan Lu, Guoqing Zhao

Although some approaches work towards gloss-free SLT through jointly training the visual encoder and translation network, these efforts still suffer from poor performance and inefficient use of the powerful Large Language Model (LLM).

Ranked #1 on Gloss-free Sign Language Translation on CSL-Daily

Gloss-free Sign Language Translation Language Modelling +3

PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features

no code implementations • 5 Dec 2023 • Tianshun Han, Shengnan Gui, Yiqing Huang, Baihui Li, Lijian Liu, Benjia Zhou, Ning Jiang, Quan Lu, Ruicong Zhi, Yanyan Liang, Du Zhang, Jun Wan

The framework entails three modules: PMMTalk encoder, cross-modal alignment module, and PMMTalk decoder.

Speech Recognition +1

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

1 code implementation • 23 Aug 2023 • Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang

RGB-D action and gesture recognition remains an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation in human motion.

Gesture Recognition Scene Understanding

Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining

1 code implementation • ICCV 2023 • Benjia Zhou, Zhigang Chen, Albert Clapés, Jun Wan, Yanyan Liang, Sergio Escalera, Zhen Lei, Du Zhang

Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT).

Ranked #2 on Gloss-free Sign Language Translation on PHOENIX14T

Gloss-free Sign Language Translation Self-Supervised Learning +3

A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition

1 code implementation • 16 Nov 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

Although improving motion recognition to some extent, these methods still face sub-optimal situations in the following aspects: (i) data augmentation, i.e., the scale of the RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure brings more challenges to spatiotemporal information modeling; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion.

Ranked #3 on Action Recognition on NTU RGB+D

Action Recognition Data Augmentation +2

Effective Vision Transformer Training: A Data-Centric Perspective

no code implementations • 29 Sep 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

To achieve these two purposes, we propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

1 code implementation • CVPR 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin

Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors.

Ranked #1 on Hand Gesture Recognition on NVGesture

Hand Gesture Recognition

Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition

1 code implementation • 10 Feb 2021 • Benjia Zhou, Yunan Li, Jun Wan

Meanwhile, a more adaptive architecture-searched network structure can also outperform block-fixed ones such as ResNet, since it better increases the diversity of features at different stages of the network.

Gesture Recognition Neural Architecture Search

DSAM: A Distance Shrinking with Angular Marginalizing Loss for High Performance Vehicle Re-identification

no code implementations • 12 Nov 2020 • Jiangtao Kong, Yu Cheng, Benjia Zhou, Kai Li, Junliang Xing

To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.

Person Re-Identification Vehicle Re-Identification

Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

1 code implementation • 21 Aug 2020 • Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao

Gesture recognition has attracted considerable attention owing to its great potential in applications.

Gesture Recognition Neural Architecture Search

Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review

no code implementations • 23 Apr 2020 • Ajian Liu, Xuan Li, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Meysam Madadi, Yi Jin, Zhuoyuan Wu, Xiaogang Yu, Zichang Tan, Qi Yuan, Ruikun Yang, Benjia Zhou, Guodong Guo, Stan Z. Li

Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing.

Face Anti-Spoofing Face Recognition
