no code implementations • 26 Feb 2025 • Ziyue Jiang, Yi Ren, RuiQi Li, Shengpeng Ji, Boyang Zhang, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo, Yu Zhang, Rui Liu, Xiang Yin, Zhou Zhao
While recent zero-shot text-to-speech (TTS) models have significantly improved speech quality and expressiveness, mainstream systems still suffer from issues related to speech-text alignment modeling: 1) models without explicit speech-text alignment modeling exhibit less robustness, especially for hard sentences in practical applications; 2) predefined alignment-based models suffer from naturalness constraints of forced alignments.
no code implementations • 30 Jan 2025 • Yifan Wang, Hongfeng Ai, RuiQi Li, Maowei Jiang, Cheng Jiang, Chenzhong Li
In medical time series disease diagnosis, two key challenges are identified. First, the high annotation cost of medical data leads to overfitting in models trained on label-limited, single-center datasets.
no code implementations • 31 Oct 2024 • Xiufeng Huang, RuiQi Li, Yiu-ming Cheung, Ka Chun Cheung, Simon See, Renjie Wan
3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets.
no code implementations • 16 Oct 2024 • RuiQi Li, Siqi Zheng, Xize Cheng, Ziang Zhang, Shengpeng Ji, Zhou Zhao
Generating music that aligns with the visual content of a video has been a challenging task, as it requires a deep understanding of visual semantics and involves generating music whose melody, rhythm, and dynamics harmonize with the visual narratives.
1 code implementation • 24 Sep 2024 • Yu Zhang, Ziyue Jiang, RuiQi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao
To address these challenges, we introduce TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control.
1 code implementation • 20 Sep 2024 • Yu Zhang, Changhao Pan, Wenxiang Guo, RuiQi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, Lichao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao
The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability.
no code implementations • 9 Sep 2024 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith
We consider the problem of direct data-driven predictive control for unknown stochastic linear time-invariant (LTI) systems with partial state observation.
1 code implementation • 29 Aug 2024 • Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao
Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.
no code implementations • 2 Jul 2024 • RuiQi Li, Zhiqing Hong, Yongqi Wang, Lichao Zhang, Rongjie Huang, Siqi Zheng, Zhou Zhao
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
no code implementations • 14 Jun 2024 • Quangao Liu, RuiQi Li, Maowei Jiang, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou
Time series forecasting (TSF) is crucial in fields like economic forecasting, weather prediction, traffic flow analysis, and public health surveillance.
no code implementations • 4 Jun 2024 • RuiQi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao
We adopt discrete-unit random resampling and pitch corruption strategies, enabling training with unpaired singing data and thus mitigating the issue of data scarcity.
1 code implementation • 1 Jun 2024 • Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, RuiQi Li, Zhou Zhao
By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video.
Ranked #4 on
Video-to-Sound Generation
on VGG-Sound
1 code implementation • 22 May 2024 • RuiQi Li, Maowei Jiang, Kai Wang, Kaiduo Feng, Quangao Liu, Yue Sun, Xiufang Zhou
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.
no code implementations • 16 May 2024 • RuiQi Li, Yu Zhang, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao
Note-level Automatic Singing Voice Transcription (AST) converts singing recordings into note sequences, facilitating the automatic annotation of singing datasets for Singing Voice Synthesis (SVS) applications.
no code implementations • 14 Apr 2024 • Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, RuiQi Li, Fuming You, Zhou Zhao, Zhimeng Zhang
A song is a combination of singing voice and accompaniment.
1 code implementation • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.
no code implementations • 23 Dec 2023 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith
We propose a data-driven receding-horizon control method dealing with the chance-constrained output-tracking problem of unknown stochastic linear time-invariant (LTI) systems with partial state observation.
1 code implementation • 17 Dec 2023 • Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao
Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.
no code implementations • 14 Sep 2023 • Yongqi Wang, Jionghao Bai, Rongjie Huang, RuiQi Li, Zhiqing Hong, Zhou Zhao
The acoustic language model we introduce for style transfer leverages self-supervised in-context learning, acquiring style transfer ability without relying on any speaker-parallel data, thereby overcoming data scarcity.
no code implementations • 1 Sep 2023 • RuiQi Li, Liesbeth Allein, Damien Sileo, Marie-Francine Moens
The capabilities and use cases of automatic natural language processing (NLP) have grown significantly over the last few years.
no code implementations • 19 Jul 2023 • Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, RuiQi Li, Lichao Zhang, Fei Wu
We analyze the CSI task in a disentanglement view with the causal graph technique, and identify the intra-version and inter-version effects biasing the invariant learning.
no code implementations • 17 Jul 2023 • RuiQi Li, Leyang Cui, Songtuan Lin, Patrik Haslum
Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents.
no code implementations • 8 May 2023 • RuiQi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation.
1 code implementation • 5 Apr 2023 • Yunxiang Li, Hua-Chieh Shao, Xiao Liang, Liyuan Chen, RuiQi Li, Steve Jiang, Jing Wang, You Zhang
However, for medical image translation, the existing diffusion models are deficient in accurately retaining structural information since the structure details of source domain images are lost during the forward diffusion process and cannot be fully recovered through learned reverse diffusion, while the integrity of anatomical structures is extremely important in medical images.
1 code implementation • 4 Apr 2023 • RuiQi Li, Patrik Haslum, Leyang Cui
We argue that an important type of relation not explored in NLP or IR research to date is that of an event being an argument - required or optional - of another event.
1 code implementation • NIPS 2022 • Lichao Zhang, RuiQi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, Jinzheng He, Rongjie Huang, Jieming Zhu, Xiao Chen, Zhou Zhao
The lack of publicly available high-quality and accurately labeled datasets has long been a major bottleneck for singing voice synthesis (SVS).
no code implementations • 30 Mar 2022 • RuiQi Li, John W. Simpson-Porco, Stephen L. Smith
Robustness of the algorithm to noisy data is illustrated via simulation of a regularized version of the algorithm applied to a stochastic multi-input multi-output LTP system.
no code implementations • 24 May 2021 • Noah Bice, Mohamad Fakhreddine, RuiQi Li, Dan Nguyen, Christopher Kabat, Pamela Myers, Niko Papanikolaou, Neil Kirby
Volumetric modulated arc therapy planning is a challenging problem in high-dimensional, non-convex optimization.
1 code implementation • 20 Jul 2020 • Huimin Xu, Zhicong Chen, RuiQi Li, Cheng-Jun Wang
In contrast, the people of higher social class have more capability to stride over the constraints of information cocoon.
Computers and Society