Search Results for author: Minglei Li

Found 18 papers, 7 papers with code

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

1 code implementation • 2 Apr 2024 • Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work.

Video Generation

Paper
Code

Syllable based DNN-HMM Cantonese Speech to Text System

no code implementations • LREC 2016 • Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng

This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model.

speech-recognition Speech Recognition

Paper
Add Code

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

no code implementations • 25 Dec 2023 • Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e. g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.

Paper
Add Code

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

no code implementations • 23 Dec 2023 • Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

However, talking head translation, converting audio-visual speech (i. e., talking head video) from one language into another, still confronts several challenges compared to audio speech: (1) Existing methods invariably rely on cascading, synthesizing via both audio and text, resulting in delays and cascading errors.

Self-Supervised Learning Speech-to-Speech Translation +1

Paper
Add Code

Practical Deep Dispersed Watermarking with Synchronization and Fusion

1 code implementation • 23 Oct 2023 • Hengchang Guo, Qilong Zhang, Junwei Luo, Feng Guo, Wenbin Zhang, Xiaodong Su, Minglei Li

Compared with state-of-the-art approaches, our blind watermarking can achieve better performance: averagely improve the bit accuracy by 5. 28\% and 5. 93\% against single and combined attacks, respectively, and show less file size increment and better visual quality.

Paper
Code

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

1 code implementation • 13 Sep 2023 • Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai

The automatic co-speech gesture generation draws much attention in computer animation.

Gesture Generation

Paper
Code

The DiffuseStyleGesture+ entry to the GENEA Challenge 2023

1 code implementation • 26 Aug 2023 • Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai

In this paper, we introduce the DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures.

120

Paper
Code

The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu

In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.

Depth Estimation Image Restoration +1

246

Paper
Code

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

1 code implementation • CVPR 2023 • Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang

Levenshtein distance based on audio quantization as a similarity metric of corresponding speech of gestures helps match more appropriate gestures with speech, and solves the alignment problem of speech and gestures well.

Gesture Generation Quantization

Paper
Code

Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards

no code implementations • CVPR 2023 • Mingyang Sun, Mengchen Zhao, Yaqing Hou, Minglei Li, Huang Xu, Songcen Xu, Jianye Hao

There is a growing demand of automatically synthesizing co-speech gestures for virtual characters.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

The ReprGesture entry to the GENEA Challenge 2022

1 code implementation • 25 Aug 2022 • Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao

This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.

Gesture Generation Representation Learning

Paper
Code

LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework

no code implementations • Findings (NAACL) 2022 • Mengjie Zhao, Fei Mi, Yasheng Wang, Minglei Li, Xin Jiang, Qun Liu, Hinrich Schütze

We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers.

Active Learning Language Modelling

Paper
Add Code

A Signal Detection Scheme Based on Deep Learning in OFDM Systems

no code implementations • 24 Jul 2021 • Guangliang Pan, Zitong Liu, Wei Wang, Minglei Li

Simulation results show that the DDLSD scheme outperforms the existing traditional methods in terms of improving channel estimation and signal detection performance.

Time Series Time Series Analysis