Search Results for author: RuiQi Li

Found 29 papers, 11 papers with code

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

no code implementations • 26 Feb 2025 • Ziyue Jiang, Yi Ren, RuiQi Li, Shengpeng Ji, Boyang Zhang, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo, Yu Zhang, Rui Liu, Xiang Yin, Zhou Zhao

While recent zero-shot text-to-speech (TTS) models have significantly improved speech quality and expressiveness, mainstream systems still suffer from issues related to speech-text alignment modeling: 1) models without explicit speech-text alignment modeling exhibit less robustness, especially for hard sentences in practical applications; 2) predefined alignment-based models suffer from naturalness constraints of forced alignments.

Speech Synthesis Text to Speech

A Learnable Multi-views Contrastive Framework with Reconstruction Discrepancy for Medical Time-Series

no code implementations • 30 Jan 2025 • Yifan Wang, Hongfeng Ai, RuiQi Li, Maowei Jiang, Cheng Jiang, Chenzhong Li

In medical time series disease diagnosis, two key challenges are identified. First, the high annotation cost of medical data leads to overfitting in models trained on label-limited, single-center datasets.

Contrastive Learning Diagnostic +1

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

no code implementations • 16 Oct 2024 • RuiQi Li, Siqi Zheng, Xize Cheng, Ziang Zhang, Shengpeng Ji, Zhou Zhao

Generating music that aligns with the visual content of a video has been a challenging task, as it requires a deep understanding of visual semantics and involves generating music whose melody, rhythm, and dynamics harmonize with the visual narratives.

In-Context Learning Music Generation +1

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

1 code implementation • 24 Sep 2024 • Yu Zhang, Ziyue Jiang, RuiQi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao

To address these challenges, we introduce TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control.

Clustering Language Modelling +4

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

1 code implementation • 20 Sep 2024 • Yu Zhang, Changhao Pan, Wenxiang Guo, RuiQi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, Lichao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao

The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability.

All Singing Voice Synthesis +2

Distributionally Robust Stochastic Data-Driven Predictive Control with Optimized Feedback Gain

no code implementations • 9 Sep 2024 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith

We consider the problem of direct data-driven predictive control for unknown stochastic linear time-invariant (LTI) systems with partial state observation.

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

1 code implementation • 29 Aug 2024 • Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao

Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.

Language Modeling Language Modelling

WindowMixer: Intra-Window and Inter-Window Modeling for Time Series Forecasting

no code implementations • 14 Jun 2024 • Quangao Liu, RuiQi Li, Maowei Jiang, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou

Time series forecasting (TSF) is crucial in fields like economic forecasting, weather prediction, traffic flow analysis, and public health surveillance.

Missing Values Time Series +1

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

no code implementations • 4 Jun 2024 • RuiQi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao

We adopt discrete-unit random resampling and pitch corruption strategies, enabling training with unpaired singing data and thus mitigating the issue of data scarcity.

In-Context Learning Language Modeling +5

Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching

1 code implementation • 1 Jun 2024 • Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, RuiQi Li, Zhou Zhao

By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video.

Video-to-Sound Generation

FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting

1 code implementation • 22 May 2024 • RuiQi Li, Maowei Jiang, Kai Wang, Kaiduo Feng, Quangao Liu, Yue Sun, Xiufang Zhou

Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.

Time Series Time Series Forecasting

Robust Singing Voice Transcription Serves Synthesis

no code implementations • 16 May 2024 • RuiQi Li, Yu Zhang, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao

Note-level Automatic Singing Voice Transcription (AST) converts singing recordings into note sequences, facilitating the automatic annotation of singing datasets for Singing Voice Synthesis (SVS) applications.

Decoder Singing Voice Synthesis

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

1 code implementation • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.

Attribute Decoder +1

Stochastic Data-Driven Predictive Control with Equivalence to Stochastic MPC

no code implementations • 23 Dec 2023 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith

We propose a data-driven receding-horizon control method dealing with the chance-constrained output-tracking problem of unknown stochastic linear time-invariant (LTI) systems with partial state observation.

Model Predictive Control

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

1 code implementation • 17 Dec 2023 • Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.

Quantization Singing Voice Synthesis +1

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

no code implementations • 14 Sep 2023 • Yongqi Wang, Jionghao Bai, Rongjie Huang, RuiQi Li, Zhiqing Hong, Zhou Zhao

The acoustic language model we introduce for style transfer leverages self-supervised in-context learning, acquiring style transfer ability without relying on any speaker-parallel data, thereby overcoming data scarcity.

In-Context Learning Language Modeling +4

When Do Discourse Markers Affect Computational Sentence Understanding?

no code implementations • 1 Sep 2023 • RuiQi Li, Liesbeth Allein, Damien Sileo, Marie-Francine Moens

The capabilities and use cases of automatic natural language processing (NLP) have grown significantly over the last few years.

Sentence

DisCover: Disentangled Music Representation Learning for Cover Song Identification

no code implementations • 19 Jul 2023 • Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, RuiQi Li, Lichao Zhang, Fei Wu

We analyze the CSI task in a disentanglement view with the causal graph technique, and identify the intra-version and inter-version effects biasing the invariant learning.

Blocking Cover song identification +3

Automated Action Model Acquisition from Narrative Texts

no code implementations • 17 Jul 2023 • RuiQi Li, Leyang Cui, Songtuan Lin, Patrik Haslum

Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents.

model

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

no code implementations • 8 May 2023 • RuiQi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation.

cross-modal alignment Rhythm +2

Zero-shot Medical Image Translation via Frequency-Guided Diffusion Models

1 code implementation • 5 Apr 2023 • Yunxiang Li, Hua-Chieh Shao, Xiao Liang, Liyuan Chen, RuiQi Li, Steve Jiang, Jing Wang, You Zhang

However, for medical image translation, the existing diffusion models are deficient in accurately retaining structural information since the structure details of source domain images are lost during the forward diffusion process and cannot be fully recovered through learned reverse diffusion, while the integrity of anatomical structures is extremely important in medical images.

Anatomy SSIM +2

EDeR: A Dataset for Exploring Dependency Relations Between Events

1 code implementation • 4 Apr 2023 • RuiQi Li, Patrik Haslum, Leyang Cui

We argue that an important type of relation not explored in NLP or IR research to date is that of an event being an argument - required or optional - of another event.

Event Extraction Information Retrieval +3

Data-Driven Model Predictive Control for Linear Time-Periodic Systems

no code implementations • 30 Mar 2022 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith

Robustness of the algorithm to noisy data is illustrated via simulation of a regularized version of the algorithm applied to a stochastic multi-input multi-output LTP system.

LEMMA Model Predictive Control

Latent Space Arc Therapy Optimization

no code implementations • 24 May 2021 • Noah Bice, Mohamad Fakhreddine, RuiQi Li, Dan Nguyen, Christopher Kabat, Pamela Myers, Niko Papanikolaou, Neil Kirby

Volumetric modulated arc therapy planning is a challenging problem in high-dimensional, non-convex optimization.

ARC

The Geometry of Information Cocoon: Analyzing the Cultural Space with Word Embedding Models

1 code implementation • 20 Jul 2020 • Huimin Xu, Zhicong Chen, RuiQi Li, Cheng-Jun Wang

In contrast, the people of higher social class have more capability to stride over the constraints of information cocoon.

Computers and Society
