Search Results for author: Yongqi Wang

Found 10 papers, 0 papers with code

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

no code implementations • 14 Apr 2024 • Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, RuiQi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

A song is a combination of singing voice and accompaniment.

Music Generation, Singing Voice Synthesis

AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts

no code implementations • 20 Mar 2024 • Jun Yu, Zerui Zhang, Zhihong Wei, Gongpeng Zhao, Zhongpeng Cai, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu

Leveraging the synergy of audio and visual data is essential for understanding human emotions and behaviors, especially in in-the-wild settings.

Action Unit Detection, Facial Action Unit Detection

Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

no code implementations • 19 Mar 2024 • Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition.

Arousal Estimation

Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling

no code implementations • 18 Mar 2024 • Jun Yu, Zhihong Wei, Zhongpeng Cai, Gongpeng Zhao, Zerui Zhang, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu

Facial Expression Recognition (FER) plays a crucial role in computer vision and finds extensive applications across various fields.

Facial Expression Recognition, Facial Expression Recognition (FER)

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

no code implementations • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.

Attribute, Singing Voice Synthesis

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

no code implementations • 2 Mar 2024 • Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

This paper proposes a novel data-free framework for multi-label image recognition without any training data. It uses the knowledge of a pre-trained Large Language Model (LLM) to learn prompts that adapt a pre-trained Vision-Language Model (VLM) such as CLIP to multi-label classification.

Language Modelling, Large Language Model

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

no code implementations • 14 Sep 2023 • Yongqi Wang, Jionghao Bai, Rongjie Huang, RuiQi Li, Zhiqing Hong, Zhou Zhao

Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but is unable to preserve the speaker timbre of the source speech during translation.

In-Context Learning, Language Modelling, +3

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common.

Singing Voice Synthesis, Voice Conversion

Connecting Multi-modal Contrastive Representations

no code implementations • NeurIPS 2023 • Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

This paper proposes a novel training-efficient method for learning multi-modal contrastive representations (MCR) without paired data, called Connecting Multi-modal Contrastive Representations (C-MCR).

3D Point Cloud Classification, counterfactual, +4

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

no code implementations • 8 Jul 2022 • Yongqi Wang, Zhou Zhao

To tackle these problems, we propose FastLTS, a non-autoregressive end-to-end model that directly synthesizes high-quality speech audio from unconstrained talking videos with low latency and has a relatively small model size.

Lip to Speech Synthesis, Speech Synthesis
