no code implementations • 5 Sep 2024 • Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihu Hu
Ultimately, we inject the previously generated talking segmentation and style codes into a mask-guided StyleGAN to synthesize video frames.
no code implementations • 3 Aug 2024 • Jintao Tan, Xize Cheng, Lingyu Xiong, Lei Zhu, Xiandong Li, Xianjia Wu, Kai Gong, Minglei Li, Yi Cai
Audio-driven talking head generation is a significant and challenging task applicable to various fields such as virtual avatars, film production, and online conferences.
no code implementations • 5 Jun 2024 • Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang
Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size.
no code implementations • 29 May 2024 • Guangliang Pan, Jie Li, Minglei Li
The advantage of this fusion mode is that it can deeply capture the long-term dependence of multichannel spectrum data.
1 code implementation • CVPR 2024 • Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu
While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work.
no code implementations • LREC 2016 • Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng
This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model.
no code implementations • 25 Dec 2023 • Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang
We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.
no code implementations • 23 Dec 2023 • Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao
However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges compared to audio speech: (1) Existing methods invariably rely on cascading, synthesizing via both audio and text, resulting in delays and cascading errors.
1 code implementation • 23 Oct 2023 • Hengchang Guo, Qilong Zhang, Junwei Luo, Feng Guo, Wenbin Zhang, Xiaodong Su, Minglei Li
Compared with state-of-the-art approaches, our blind watermarking can achieve better performance: averagely improve the bit accuracy by 5.28% and 5.93% against single and combined attacks, respectively, and show less file size increment and better visual quality.
1 code implementation • 13 Sep 2023 • Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai
The automatic co-speech gesture generation draws much attention in computer animation.
1 code implementation • 26 Aug 2023 • Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai
In this paper, we introduce the DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures.
1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu
In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.
1 code implementation • CVPR 2023 • Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang
Levenshtein distance based on audio quantization as a similarity metric of corresponding speech of gestures helps match more appropriate gestures with speech, and solves the alignment problem of speech and gestures well.
no code implementations • CVPR 2023 • Mingyang Sun, Mengchen Zhao, Yaqing Hou, Minglei Li, Huang Xu, Songcen Xu, Jianye Hao
There is a growing demand for automatically synthesizing co-speech gestures for virtual characters.
1 code implementation • 25 Aug 2022 • Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao
This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.
no code implementations • Findings (NAACL) 2022 • Mengjie Zhao, Fei Mi, Yasheng Wang, Minglei Li, Xin Jiang, Qun Liu, Hinrich Schütze
We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers.
no code implementations • 24 Jul 2021 • Guangliang Pan, Zitong Liu, Wei Wang, Minglei Li
Simulation results show that the DDLSD scheme outperforms the existing traditional methods in terms of improving channel estimation and signal detection performance.
no code implementations • IJCNLP 2017 • Yunfei Long, Qin Lu, Rong Xiang, Minglei Li, Chu-Ren Huang
This paper proposes a novel method to incorporate speaker profiles into an attention based LSTM model for fake news detection.
no code implementations • IJCNLP 2017 • Minglei Li, Qin Lu, Yunfei Long
In this paper, we investigate the effectiveness of different affective lexicons through sentiment analysis of phrases.
no code implementations • EMNLP 2017 • Yunfei Long, Qin Lu, Rong Xiang, Minglei Li, Chu-Ren Huang
Evaluations show the CBA based method outperforms the state-of-the-art local context based attention methods significantly.
no code implementations • LREC 2016 • Minglei Li, Yunfei Long, Lu Qin, Wenjie Li
Secondly, a SVM based classifier is used to select the data whose natural labels are consistent with the predicted labels.