1 code implementation • 26 Jun 2025 • Huadai Liu, Jialei Wang, Kaicheng Luo, Wen Wang, Qian Chen, Zhou Zhao, Wei Xue
While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging.
no code implementations • 26 May 2025 • Chunyang Jiang, Chi-Min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo
To bridge this gap, in this paper we propose a novel framework, Learning With Forgetting (LWF), to achieve graceful forgetting in generative language models.
1 code implementation • 19 May 2025 • Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, Ruibin Yuan, Zhisheng Zheng, Ziya Zhou, Haina Zhu, Wei Xue, Emmanouil Benetos, Kai Yu, Eng-Siong Chng, Xie Chen
Each item in the benchmark demands multi-step deep reasoning beyond surface-level understanding.
no code implementations • 17 May 2025 • Chi-Min Chan, Chunpu Xu, Jiaming Ji, Zhen Ye, Pengcheng Wen, Chunyang Jiang, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo
The current focus of AI research is shifting from emphasizing model training towards enhancing evaluation quality, a transition that is crucial for driving further advancements in AI systems.
1 code implementation • 16 May 2025 • Jixun Yao, Guobin Ma, Huixin Xue, Huakang Chen, Chunbo Hao, Yuepeng Jiang, Haohe Liu, Ruibin Yuan, Jin Xu, Wei Xue, Hao liu, Lei Xie
Moreover, to assess the effectiveness of song aesthetic evaluation, we conduct experiments using SongEval to predict aesthetic scores and demonstrate better performance than existing objective evaluation metrics in predicting human-perceived musical quality.
no code implementations • 11 May 2025 • Peng Li, Suizhi Ma, Jialiang Chen, YuAn Liu, Chongyi Zhang, Wei Xue, Wenhan Luo, Alla Sheffer, Wenping Wang, Yike Guo
This conditional multiview diffusion model not only allows the generation of 3D models part by part but also enables local editing of 3D models according to the local revision of the input image without changing other 3D parts.
no code implementations • 3 May 2025 • Xingqun Qi, Yatian Wang, Hengyuan Zhang, Jiahao Pan, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo
Considering the asymmetric body dynamics of two speakers, our framework is built upon two cooperative generation branches conditioned on separated speaker audio.
no code implementations • 29 Apr 2025 • Xiang Wang, Hui Xu, Jing Cai, Ta Zhou, XiBei Yang, Wei Xue
Secondly, to more fully perceive the global information of cross-frequency and within-frequency coupling, the global perception branch of DB-GNN adopts a Transformer architecture.
1 code implementation • 21 Apr 2025 • Huadai Liu, Tianyi Luo, Kaicheng Luo, Qikai Jiang, Peiwen Sun, Jialei Wang, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue
To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.
no code implementations • 13 Mar 2025 • Zeyue Tian, Yizhu Jin, Zhaoyang Liu, Ruibin Yuan, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo
Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs.
1 code implementation • 11 Mar 2025 • Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, Yatian Wang, Xiaowei Chi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Shansong Liu, Lingrui Mei, Peng Li, Junjie Wang, Jianwei Yu, Guojian Pang, Xu Li, ZiHao Wang, Xiaohuan Zhou, Lijun Yu, Emmanouil Benetos, Yong Chen, Chenghua Lin, Xie Chen, Gus Xia, Zhaoxiang Zhang, Chao Zhang, Wenhu Chen, Xinyu Zhou, Xipeng Qiu, Roger Dannenberg, Jiaheng Liu, Jian Yang, Wenhao Huang, Wei Xue, Xu Tan, Yike Guo
We tackle the task of long-form music generation--particularly the challenging \textbf{lyrics-to-song} problem--by introducing YuE, a family of open foundation models based on the LLaMA2 architecture.
1 code implementation • 3 Mar 2025 • Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.
1 code implementation • 1 Mar 2025 • Boyi Kang, Xinfa Zhu, Zihan Zhang, Zhen Ye, Mingshuai Liu, Ziqian Wang, Yike Zhu, Guobin Ma, Jun Chen, Longshuai Xiao, Chao Weng, Wei Xue, Lei Xie
In this paper, we introduce LLaSE-G1, a LLaMA-based language model that incentivizes generalization capabilities for speech enhancement.
1 code implementation • 24 Feb 2025 • Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo
Based on observations of expert diversity, we decompose their weights into a shared base weight and unique delta weights.
1 code implementation • 23 Feb 2025 • Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue
Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e. g., transcription, comprehension) and generation (e. g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner.
no code implementations • 9 Feb 2025 • Xinyu Liu, Ailing Zeng, Wei Xue, Harry Yang, Wenhan Luo, Qifeng Liu, Yike Guo
Crafting magic and illusions is one of the most thrilling aspects of filmmaking, with visual effects (VFX) serving as the powerhouse behind unforgettable cinematic experiences.
1 code implementation • 6 Feb 2025 • Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi Dai, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue
Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute.
no code implementations • 5 Feb 2025 • Junkun Jiang, Jie Chen, Ho Yin Au, Mingyuan Chen, Wei Xue, Yike Guo
Multi-person motion capture over sparse angular observations is a challenging problem under interference from both self- and mutual-occlusions.
no code implementations • CVPR 2025 • Peng Li, Wangguandong Zheng, YuAn Liu, Tao Yu, Yangguang Li, Xingqun Qi, Xiaowei Chi, Siyu Xia, Yan-Pei Cao, Wei Xue, Wenhan Luo, Yike Guo
It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces.
no code implementations • 4 Dec 2024 • Yan Li, Ziya Zhou, Zhiqiang Wang, Wei Xue, Wenhan Luo, Yike Guo
The differences between human talking and singing limit the performance of existing talking face video generation models when applied to singing.
no code implementations • 22 Nov 2024 • Yiyang Cai, Zhengkai Jiang, Yulong Liu, Chunyang Jiang, Wei Xue, Wenhan Luo, Yike Guo
Facial personalization represents a crucial downstream task in the domain of text-to-image generation.
no code implementations • 21 Nov 2024 • Fengxin Li, Yi Li, Yue Liu, Chao Zhou, YuAn Wang, Xiaoxiang Deng, Wei Xue, Dapeng Liu, Lei Xiao, Haijie Gu, Jie Jiang, Hongyan Liu, Biao Qin, Jun He
Display advertising provides significant value to advertisers, publishers, and users.
no code implementations • 20 Oct 2024 • Xiaowei Chi, Chun-Kai Fan, Hengyuan Zhang, Xingqun Qi, Rongyu Zhang, Anthony Chen, Chi-Min Chan, Wei Xue, Qifeng Liu, Shanghang Zhang, Yike Guo
It leverages the complementary strengths of pre-trained vision-language and video generation models, enabling them to function as a world model in embodied scenarios.
1 code implementation • 16 Oct 2024 • Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Heng Lu, Zhou Zhao, Wei Xue
To alleviate the inefficient timesteps allocation and suboptimal distribution of noise, FlashAudio optimizes the time distribution of rectified flow with Bifocal Samplers and proposes immiscible flow to minimize the total distance of data-noise pairs in a batch vias assignment.
no code implementations • 14 Oct 2024 • Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo
However, when it comes to stereo audio generation, the soundscapes often have a complex scene of multiple objects and directions.
no code implementations • 7 Oct 2024 • Siyuan Hou, Shansong Liu, Ruibin Yuan, Wei Xue, Ying Shan, Mangsuo Zhao, Chao Zhang
For more precise and fine-grained melody control, we introduce a novel top-$k$ constant-Q Transform representation as the melody prompt, reducing ambiguity compared to previous representations (e. g., chroma), particularly for music with multiple tracks or a wide range of pitch values.
1 code implementation • 4 Oct 2024 • Tianyu Wu, Lingrui Mei, Ruibin Yuan, Lujun Li, Wei Xue, Yike Guo
While recent advancements in large language model (LLM) alignment have enabled the effective identification of malicious objectives involving scene nesting and keyword rewriting, our study reveals that these methods remain inadequate at detecting malicious objectives expressed through context within nested harmless objectives.
no code implementations • 16 Sep 2024 • Peng Li, Wangguandong Zheng, YuAn Liu, Tao Yu, Yangguang Li, Xingqun Qi, Mengfei Li, Xiaowei Chi, Siyu Xia, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo
Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress.
1 code implementation • 4 Sep 2024 • Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo
The hierarchical prompts offer both global and local guidance.
1 code implementation • 30 Aug 2024 • Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue
By enhancing the semantic ability of the codec, X-Codec significantly reduces WER in speech synthesis tasks and extends these benefits to non-speech applications, including music and sound generation.
1 code implementation • 27 Aug 2024 • Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo
However, configuring an MAS for a task remains challenging, with performance only observable post-execution.
1 code implementation • 19 Aug 2024 • Chunyang Jiang, Chi-Min Chan, Wei Xue, Qifeng Liu, Yike Guo
Large language models (LLMs) have shown remarkable capability in numerous tasks and applications.
no code implementations • 18 Aug 2024 • Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo
This approach allows the model to more precisely adapt to specific tasks while maintaining a compact parameter space.
no code implementations • 3 Aug 2024 • Peijie Dong, Lujun Li, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision.
no code implementations • 31 Jul 2024 • Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo
Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step reasoning perspective, which is a critical aspect in the conditioned, editable, and interactive human-computer co-creation process.
1 code implementation • 30 Jul 2024 • Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo
To fulfill this gap, we present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions, and 2M high-quality clips with multimodal captions.
1 code implementation • CVPR 2025 • Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo
In this work, we systematically study music generation conditioned solely on the video.
1 code implementation • 29 May 2024 • Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen
With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning.
no code implementations • 27 May 2024 • Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo
Here, we construct the audio ControlNet through a trainable copy of our pre-trained diffusion model.
no code implementations • 22 May 2024 • Yi Xiao, Qilong Jia, Wei Xue, Lei Bai
Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data.
no code implementations • 13 May 2024 • Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo
By intensive experimental studies, we demonstrate that the proposed method can generate better samples than SingSong, and accelerate the generation by at least 30 times.
1 code implementation • 28 Apr 2024 • Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo
Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
1 code implementation • 23 Apr 2024 • Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Wei Xue, Qifeng Liu, Yike Guo
The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.
no code implementations • 22 Apr 2024 • Xiaoxia Cheng, Zeqi Tan, Wei Xue, Weiming Lu
In this paper, we propose an information re-organization (InfoRE) method before proceeding with the reasoning to enhance the reasoning ability of LLMs.
no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Jie Fu, Ge Zhang
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.
1 code implementation • 31 Mar 2024 • Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu
To this end, we propose learning to Refine Query for Retrieval Augmented Generation (RQ-RAG) in this paper, endeavoring to enhance the model by equipping it with capabilities for explicit rewriting, decomposition, and disambiguation.
no code implementations • 25 Mar 2024 • Lei Liu, Xiaoyan Yang, Fangzhou Li, Chenfei Chi, Yue Shen, Shiwei Lyu Ming Zhang, Xiaowei Ma, Xiangguo Lyu, Liya Ma, Zhiqiang Zhang, Wei Xue, Yiran Huang, Jinjie Gu
Applying such paradigm, we construct an evaluation benchmark in the field of urology, including a LCP, a SPs dataset, and an automated RAE.
1 code implementation • 25 Feb 2024 • Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language.
1 code implementation • 22 Feb 2024 • Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, Jie Jiang
We present Tencent's ads recommendation system and examine the challenges and practices of learning appropriate recommendation representations.
no code implementations • 19 Feb 2024 • Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang
Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis.
1 code implementation • 3 Jan 2024 • Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
The diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performances, producing natural audios with high similarity to the target timbre.
no code implementations • 22 Dec 2023 • Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang
In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary without facing constraints from original 3D datasets.
no code implementations • 16 Dec 2023 • Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang
Weather forecasting is a crucial yet highly challenging task.
1 code implementation • 15 Dec 2023 • Shiwei Lyu, Chenfei Chi, Hongbo Cai, Lei Shi, Xiaoyan Yang, Lei Liu, Xiang Chen, Deng Zhao, Zhiqiang Zhang, Xianguo Lyu, Ming Zhang, Fangzhou Li, Xiaowei Ma, Yue Shen, Jinjie Gu, Wei Xue, Yiran Huang
We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, contributing to bridge the gap between general large language models (LLMs) and medical-specific LLM applications.
no code implementations • CVPR 2024 • Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike Guo
In addition, the lack of large-scale available datasets with emotional transition speech and corresponding 3D human gestures also limits the addressing of this task.
no code implementations • 13 Sep 2023 • Min Zeng, Wei Xue, Qifeng Liu, Yike Guo
Recent advancements in data-driven task-oriented dialogue systems (ToDs) struggle with incremental learning due to computational constraints and time-consuming issues.
no code implementations • 18 Aug 2023 • Wei Xue, Zongyi Guo, Baoliang Cui, Zheng Xing, Xiaoyi Zeng, Xiufei Wang, Shuhui Wu, Weiming Lu
E-commerce platforms benefit from accurate product understanding to enhance user experience and operational efficiency.
3 code implementations • 14 Aug 2023 • Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu
Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost.
1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.
1 code implementation • NeurIPS 2023 • Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu
This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark.
2 code implementations • 7 Jun 2023 • Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang
Note that, our method can be regarded as a novel transfer paradigm for large-scale models, delivering promising results in adaptation to continually changing distributions.
no code implementations • 22 May 2023 • Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
Since expert knowledge is hard to acquire, it hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds.
1 code implementation • 18 May 2023 • Wei Xue, Yongliang Shen, Wenqi Ren, Jietian Guo, ShiLiang Pu, Weiming Lu
Taxonomy completion, enriching existing taxonomies by inserting new concepts as parents or attaching them as children, has gained significant interest.
1 code implementation • 11 May 2023 • Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo
In this paper, we propose a "Co"nsistency "Mo"del-based "Speech" synthesis method, CoMoSpeech, which achieve speech synthesis through a single diffusion sampling step while achieving high audio quality.
no code implementations • 1 Mar 2023 • Lianyu Zhou, Dong Wei, Donghuan Lu, Wei Xue, Liansheng Wang, Yefeng Zheng
As an essential indicator for cancer progression and treatment response, tumor size is often measured following the response evaluation criteria in solid tumors (RECIST) guideline in CT slices.
no code implementations • 18 Aug 2022 • Yike Guo, Qifeng Liu, Jie Chen, Wei Xue, Jie Fu, Henrik Jensen, Fernando Rosas, Jeffrey Shaw, Xing Wu, Jiji Zhang, Jianliang Xu
This report presents a comprehensive view of our vision on the development path of the human-machine symbiotic art creation.
no code implementations • 15 Aug 2022 • Xin Wang, Wei Xue, Yilun Han, Guangwen Yang
We develop a user-friendly platform NeuroGCM for efficiently developing hybrid modeling in climate simulation.
no code implementations • 16 Jul 2022 • Jiahao Qi, Zhiqiang Gong, Xingyue Liu, Kangcheng Bin, Chen Chen, YongQian Li, Wei Xue, Yu Zhang, Ping Zhong
Deep learning methodology contributes a lot to the development of hyperspectral image (HSI) analysis community.
no code implementations • 19 May 2022 • Yu Zhang, Zhiqiang Gong, Yichuang Zhang, YongQian Li, Kangcheng Bin, Jiahao Qi, Wei Xue, Ping Zhong
Transferable adversarial attack is always in the spotlight since deep learning models have been demonstrated to be vulnerable to adversarial samples.
no code implementations • 28 Jan 2022 • Wei Xue, Xiaojiang Peng
Stereo matching is crucial for binocular stereo vision.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Ximing Liu, Wei Xue, Qi Su, Weiran Nie, Wei Peng
Creating high-quality annotated dialogue corpora is challenging.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 26 Sep 2020 • Zixuan Xiao, Wei Xue, Ping Zhong
Particularly, in order to fit the object detection settings, our proposed few-shot detector concentrates on the relations that lie in the level of objects instead of the full image with the assistance of Self-Adaptive Attention Network (SAAN).
no code implementations • 3 Sep 2020 • Zixuan Xiao, Ping Zhong, Yuan Quan, Xuping Yin, Wei Xue
Then the object-specific features are delivered to the two-stage detection backend for the detection results.
no code implementations • 28 Jul 2020 • Wei Xue, Gang Quan, Chao Zhang, Guohong Ding, Xiaodong He, BoWen Zhou
Statistical signal processing based speech enhancement methods adopt expert knowledge to design the statistical models and linear filters, which is complementary to the deep neural network (DNN) based methods which are data-driven.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 17 Jul 2020 • Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang
The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications.
no code implementations • 25 Jun 2020 • Christina Gao, Jia Liu, Lian-Tao Wang, Xiao-Ping Wang, Wei Xue, Yi-Ming Zhong
Meanwhile, they can also scatter with the atoms through the inverse Primakoff process via the axion-photon coupling, which emits a photon and mimics the electronic recoil signals.
High Energy Physics - Phenomenology High Energy Physics - Experiment
1 code implementation • 4 May 2019 • Yushu Chen, Hao Jing, Wenlai Zhao, Zhi-Qiang Liu, Ouyi Li, Liang Qiao, Wei Xue, Guangwen Yang
RSG is further combined with adaptive methods to construct ARSG for acceleration.
1 code implementation • ACL 2018 • Wei Xue, Tao Li
Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text.
Aspect-Based Sentiment Analysis
Aspect Category Sentiment Analysis
1 code implementation • 15 Jan 2018 • Philip Ilten, Yotam Soreq, Mike Williams, Wei Xue
Searches for dark photons provide serendipitous discovery potential for other types of vector particles.
High Energy Physics - Phenomenology High Energy Physics - Experiment
no code implementations • IJCNLP 2017 • Wei Xue, Wubai Zhou, Tao Li, Qing Wang
Online reviews are valuable resources not only for consumers to make decisions before purchase, but also for providers to get feedbacks for their services or commodities.
no code implementations • 4 Jul 2017 • Xun Liu, Wei Xue, Lei Xiao, Bo Zhang
Then we extend the model family to a variety of bayesian online models with increasing feature embedding capabilities, such as Sparse-MLP, FM-MLP and FFM-MLP.