no code implementations • 17 Apr 2025 • Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen
In this work, we propose EmoVoice, a novel emotion-controllable TTS model that exploits large language models (LLMs) to enable fine-grained freestyle natural language emotion control, and a phoneme boost variant design that makes the model output phoneme tokens and audio tokens in parallel to enhance content consistency, inspired by chain-of-thought (CoT) and modality-of-thought (CoM) techniques.
no code implementations • 17 Feb 2025 • Zhiyuan Fang, Zicong Hong, Yuegui Huang, Yufeng Lyu, Wuhui Chen, Yue Yu, Fan Yu, Zibin Zheng
Large Language Models (LLMs) have demonstrated impressive performance across various tasks, and their application in edge scenarios has attracted significant attention.
no code implementations • 9 Feb 2025 • Zhiyuan Fang, Yuegui Huang, Zicong Hong, Yufeng Lyu, Wuhui Chen, Yue Yu, Fan Yu, Zibin Zheng
Although offloading techniques utilise memory from the CPU and disk and parallelise the I/O and computation for efficiency, the computation for each expert in MoE models is often less than the I/O, resulting in numerous bubbles in the pipeline.
no code implementations • 10 Jan 2025 • Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou
Previous models for voice interactions are categorized as native and aligned.
1 code implementation • 13 Dec 2024 • Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou
By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-parity naturalness, minimal response latency, and virtually lossless synthesis quality in the streaming mode.
no code implementations • 4 Nov 2024 • Yiming Sun, Fan Yu, Shaoxiang Chen, Yu Zhang, Junwei Huang, Chenhui Li, Yang Li, Changbo Wang
Recently, Vision-Language~(VL) trackers have proposed to utilize additional natural language descriptions to enhance versatility in various applications.
no code implementations • 22 Oct 2024 • Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen
While automatic speech recognition (ASR) systems have achieved remarkable performance with large-scale datasets, their efficacy remains inadequate in low-resource settings, encompassing dialects, accents, minority languages, and long-tail hotwords, domains with significant practical relevance.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 17 Jun 2024 • Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong
While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked.
1 code implementation • 9 Jun 2024 • Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen
As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest.
2 code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie
The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 21 May 2023 • Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai
In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability resulting in a better recognition performance.
no code implementations • 1 Nov 2022 • Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR task.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
1 code implementation • 17 Oct 2022 • Yingcan Wei, Matthias Langer, Fan Yu, Minseok Lee, Kingsley Liu, Jerry Shi, Joey Wang
Recommendation systems are of crucial importance for a variety of modern apps and web services, such as news feeds, social networks, e-commerce, search, etc.
no code implementations • 17 Oct 2022 • Joey Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Alex Liu, Daniel Abel, Gems Guo, Jianbing Dong, Jerry Shi, Kunlun Li
In this talk, we introduce Merlin HugeCTR.
2 code implementations • 20 Aug 2022 • Jun Zhang, Sirui Liu, Mengyun Chen, Haotian Chu, Min Wang, Zidong Wang, Jialiang Yu, Ningxi Ni, Fan Yu, Diqing Chen, Yi Isaac Yang, Boxin Xue, Lijiang Yang, YuAn Liu, Yi Qin Gao
Data-driven predictive methods which can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development.
1 code implementation • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022 • Xiang Huang, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Min Wang, Haotian Chu, Jing Zhou, Fan Yu, Bei Hua, Bin Dong, Lei Chen
In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs)method emerges to be a promising method for solving both forward and inverse PDE problems.
2 code implementations • 24 Jun 2022 • Sirui Liu, Jun Zhang, Haotian Chu, Min Wang, Boxin Xue, Ningxi Ni, Jialiang Yu, Yuhao Xie, Zhenyu Chen, Mengyun Chen, YuAn Liu, Piya Patra, Fan Xu, Jie Chen, Zidong Wang, Lijiang Yang, Fan Yu, Lei Chen, Yi Qin Gao
We provide in addition the benchmark training procedure for SOTA protein structure prediction model on this dataset.
no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie
Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
1 code implementation • ACL 2022 • Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen
To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).
no code implementations • 15 Nov 2021 • Xiang Huang, Zhanhong Ye, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Bingya Weng, Min Wang, Haotian Chu, Fan Yu, Bei Hua, Lei Chen, Bin Dong
Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i. e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc.
no code implementations • 2 Nov 2021 • Xiang Huang, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Bingya Weng, Min Wang, Haotian Chu, Jing Zhou, Fan Yu, Bei Hua, Lei Chen, Bin Dong
In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) emerges to be a promising method for solving both forward and inverse PDE problems.
no code implementations • 25th International Conference on Pattern Recognition (ICPR) 2021 • Ge Pan, Han Zhang, Fan Yu, Yonghong Song, Yuanlin Zhang, Han Yuan
In this paper, we propose a method called YOLA (You Ought to Look Around) which includes three parts: 1) a robust backbone SPN-I3D for extracting spatio-temporal features.
no code implementations • 14 Oct 2021 • Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao
To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts in a same device (GPU).
4 code implementations • CVPR 2021 • Changsu Cao, Jiaqi Hu, Wengang Zhang, Xusheng Xu, Dechin Chen, Fan Yu, Jun Li, Hanshi Hu, Dingshun Lv, Man-Hong Yung
We also propose a lossless compression algorithm based on iVPF.
3 code implementations • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 • Xinyu Zhang, Ke Zhan, Enrui Hu, Chengzhen Fu, Lan Luo, Hao Jiang, Yantao Jia, Fan Yu, Zhicheng Dou, Zhao Cao, Lei Chen
Currently, the most popular method for open-domain Question Answering (QA) adopts "Retriever and Reader" pipeline, where the retriever extracts a list of candidate documents from a large set of documents followed by a ranker to rank the most relevant documents and the reader extracts answer from the candidates.
1 code implementation • CVPR 2021 • Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang
For the fully connected layers, by utilizing the low-rank property of Kronecker factors of Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy.
no code implementations • Proceedings of the 2021 International Conference on Management of Data 2021 • Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu
Graph neural networks (GNNs) have achieved remarkable performance in many graph analytics tasks such as node classification, link prediction and graph clustering.
1 code implementation • Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 • Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang
For the fully connected layers, by utilizing the low-rank property of Kronecker factors of Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy.
no code implementations • AAAI Technical Track on Machine Learning 2021 • Mengyun Chen, Kaixin Gao, Xiaolei Liu, Zidong Wang, Ningxi Ni, Qian Zhang, Lei Chen, Chao Ding, ZhengHai Huang, Min Wang, Shuangling Wang, Fan Yu, Xinyuan Zhao, Dachuan Xu
It is well-known that second-order optimizer can accelerate the training of deep neural networks, however, the huge computation cost of second-order optimization makes it impractical to apply in real practice.
1 code implementation • Proceedings of the Sixteenth European Conference on Computer Systems 2021 • Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu
Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics.
no code implementations • Proceedings of the Sixteenth European Conference on Computer Systems 2021 • Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, Fan Yu
Graph neural networks (GNNs) have achieved breakthrough performance in graph analytics such as node classification, link prediction and graph clustering.
no code implementations • IEEE Transactions on Parallel and Distributed Systems 2021 • Yidi Wu, Kaihao Ma, Xiao Yan, Zhi Liu, Zhenkun Cai, Yuzhen Huang, James Cheng, Han Yuan, Fan Yu
We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i. e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster.
4 code implementations • 2 Feb 2021 • Zhuoyuan Yao, Di wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
6 code implementations • ICCV 2021 • Xinjun Ma, Yue Gong, Qirui Wang, Jingwei Huang, Lei Chen, Fan Yu
As a result, we achieve promising results on all datasets and the highest F-Score on the online TNT intermediate benchmark.
Ranked #9 on
Point Clouds
on Tanks and Temples
no code implementations • 24 Dec 2020 • Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, Fan Yu, Zidong Wang, Min Wang
Optimizers that further adjust the scale of gradient, such as Adam, Natural Gradient (NG), etc., despite widely concerned and used by the community, are often found poor generalization performance, compared with Stochastic Gradient Descent (SGD).
5 code implementations • 10 Dec 2020 • BinBin Zhang, Di wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
Ranked #12 on
Speech Recognition
on AISHELL-1
no code implementations • 27 Nov 2020 • Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu
Using second-order optimization methods for training deep neural networks (DNNs) has attracted many researchers.
no code implementations • 21 Nov 2020 • Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu
There have been many attempts to use second-order optimization methods for training deep neural networks.
no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound Audio and Speech Processing
no code implementations • 17 Jul 2020 • Georgios Akrivis, Minghua Chen, Fan Yu, Zhi Zhou
In combination with the Grenander--Szeg\"o theorem, we observe that a relaxed positivity condition on multipliers, milder than the basic %fundamental requirement of the Nevanlinna--Odeh multipliers that the sum of the absolute values of their components is strictly less than $1$, makes the energy technique applicable to the stability analysis of BDF methods for parabolic equations with selfadjoint elliptic part.
Numerical Analysis Numerical Analysis
1 code implementation • 16 Apr 2020 • Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, Fan Yu
A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs).