no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia
Our final average result on speech translation is 31.02 BLEU.
Automatic Speech Recognition (ASR) +6
no code implementations • 28 Dec 2024 • Zibin Pan, Zhichao Wang, Chi Li, Kaiyan Zheng, Boqi Wang, Xiaoying Tang, Junhua Zhao
A steepest descent direction for unlearning is then calculated in the condition of being non-conflicting with other clients' gradients and closest to the target client's gradient.
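The projection step described above can be sketched as follows. This is a minimal NumPy illustration in the spirit of gradient-surgery methods, not the paper's exact algorithm; `unlearning_direction` is a hypothetical helper name.

```python
import numpy as np

def unlearning_direction(g_target, other_grads):
    """Start from the target client's gradient and, gradient-surgery
    style, project out any component that conflicts (negative inner
    product) with a remaining client's gradient."""
    d = np.array(g_target, dtype=float)
    for g in other_grads:
        dot = d @ g
        if dot < 0:  # conflicting: remove the component along g
            d = d - (dot / (g @ g)) * g
    return d
```

Starting from the target client's gradient keeps the direction closest to it, while the projections enforce non-negative inner products with the remaining clients' gradients.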
no code implementations • 18 Dec 2024 • Hai-Xiao Wang, Zhichao Wang
Venturing into the transductive learning landscape, we, for the first time, pinpoint the information-theoretic threshold for the exact recovery of all test nodes in CSBM.
1 code implementation • 3 Nov 2024 • Aliyah R. Hsu, James Zhu, Zhichao Wang, Bin Bi, Shubham Mehrotra, Shiva K. Pentyala, Katherine Tan, Xiang-Bo Mao, Roshanak Omrani, Sougata Chaudhuri, Regunathan Radhakrishnan, Sitaram Asur, Claire Na Cheng, Bin Yu
LLMs have demonstrated impressive proficiency in generating coherent and high-quality text, making them valuable across a range of text-generation tasks.
no code implementations • 28 Oct 2024 • Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang
Due to the differing nature and objective functions of SFT and alignment, catastrophic forgetting has become a significant issue.
no code implementations • 26 Oct 2024 • Zhichao Wang, Lin Wang, Yongxin Guo, Ying-Jun Angela Zhang, Xiaoying Tang
The increasing concern for data privacy has driven the rapid development of federated learning (FL), a privacy-preserving collaborative paradigm.
no code implementations • 19 Oct 2024 • Zhichao Wang, Xinhai Chen, Chunye Gong, Bo Yang, Liang Deng, Yufei Sun, Yufei Pang, Jie Liu
We verified the proposed model on both 2D and 3D meshes.
no code implementations • 27 Aug 2024 • Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng
DPO proposes a mapping between an optimal policy and a reward, greatly simplifying the training process of RLHF.
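The policy-reward mapping underlying DPO, r(x, y) = β log(π(y|x)/π_ref(y|x)) up to a constant, reduces reward learning to a logistic loss on preference pairs. A minimal sketch (function name and scalar inputs are illustrative):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: log-probabilities of the chosen
    (w) and rejected (l) responses under the policy and the frozen
    reference model. Implements -log sigmoid(beta * margin)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))
```

When the policy matches the reference model the margin is zero and the loss is log 2; making the chosen response relatively more likely drives the loss down.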
no code implementations • 5 Aug 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang
StreamVoice+ integrates a semantic encoder and a connector with the original StreamVoice framework, now trained using a non-streaming ASR.
Automatic Speech Recognition (ASR) +3
no code implementations • 2 Aug 2024 • Parthe Pandit, Zhichao Wang, Yizhe Zhu
In this regime, under certain conditions on the data distribution, the kernel random matrix involved in KRR exhibits behavior akin to that of a linear kernel.
no code implementations • 24 Jul 2024 • Zhichao Wang, Xiaoliang Yan, Shreyes Melkote, David Rosen
Generative design (GD) methods aim to automatically generate a wide variety of designs that satisfy functional or aesthetic design requirements.
no code implementations • 23 Jul 2024 • Zhichao Wang, Bin Bi, Shiva Kumar Pentyala, Kiran Ramnath, Sougata Chaudhuri, Shubham Mehrotra, Zixu Zhu, Xiang-Bo Mao, Sitaram Asur, Na Cheng
With advancements in self-supervised learning, the availability of trillions of tokens in pre-training corpora, instruction fine-tuning, and the development of large Transformers with billions of parameters, large language models (LLMs) are now capable of generating factual and coherent responses to human queries.
no code implementations • 25 Jun 2024 • Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng
This paper introduces PAFT, a new PArallel training paradigm for effective LLM Fine-Tuning, which independently performs SFT and preference alignment (e.g., DPO or ORPO).
1 code implementation • 17 Jun 2024 • Lin Wang, Zhichao Wang, Xiaoying Tang
It enables full parameter training in FL with only selected block updates and uploads, thereby reducing communication, computation, and memory costs.
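One way to picture the block-wise scheme: each round the server announces a single parameter block, and the client trains and uploads only that block. A toy NumPy sketch under assumed names (`client_round` and `grad_fn` are illustrative, not the paper's API):

```python
import numpy as np

def client_round(global_params, block_name, grad_fn, lr=0.1):
    """One client round: compute a local gradient, update only the block
    chosen by the server, and upload only that block."""
    params = {k: v.copy() for k, v in global_params.items()}
    grads = grad_fn(params)                       # local computation
    params[block_name] -= lr * grads[block_name]  # touch one block only
    return {block_name: params[block_name]}       # upload one block only

# Toy objective: minimize 0.5 * ||w1||^2 + 0.5 * ||w2||^2, so grad = params.
global_params = {"w1": np.array([2.0]), "w2": np.array([4.0])}
upload = client_round(global_params, "w1",
                      grad_fn=lambda p: {k: v for k, v in p.items()},
                      lr=0.5)
```

Because only the selected block travels over the network, per-round communication scales with the block size rather than the full model size.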
no code implementations • 12 Jun 2024 • Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi
With speaker-independent semantic tokens to guide the training of the content encoder, the dependency on ASR is removed and the model can operate under extremely small chunks, with cascading errors eliminated.
Automatic Speech Recognition (ASR) +5
no code implementations • 15 Feb 2024 • Zhichao Wang, Denny Wu, Zhou Fan
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang
Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, alternately processing semantic and acoustic features at each autoregression step, which eliminates the dependence on complete source speech.
no code implementations • 31 Oct 2023 • Lin Wang, Zhichao Wang, Xi Leng, Xiaoying Tang
Preserving privacy and reducing communication costs for edge users pose significant challenges in recommendation systems.
no code implementations • 4 Oct 2023 • Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie
This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023.
no code implementations • 24 Sep 2023 • Zhichao Wang, Xinhai Chen, Junjun Yan, Jie Liu
With a lightweight model, GMSNet can effectively smooth mesh nodes of varying degrees and remains unaffected by the order of input data.
no code implementations • 3 Sep 2023 • Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang
In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.
1 code implementation • 12 Aug 2023 • Zhichao Wang, Mengyu Dai, Keld Lundgaard
In the second stage, an audio-driven talking head generation method is employed to produce compelling videos given the audio generated in the first stage.
1 code implementation • 12 Jul 2023 • Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu
To alleviate these issues, we propose auxiliary-task learning-based physics-informed neural networks (ATL-PINNs), which provide four different auxiliary-task learning modes, and investigate their performance against the original PINNs.
no code implementations • 18 Jun 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang
An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, and convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker.
1 code implementation • 15 Jun 2023 • Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu
To address the issue of low accuracy and convergence problems of existing PINNs, we propose a self-training physics-informed neural network, ST-PINN.
1 code implementation • 23 May 2023 • Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay Choudhury, Anand Sarwate, Tony Chiang
A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks.
no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Specifically, to flexibly adapt to the time-varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
no code implementations • 29 Jan 2023 • Lin Wang, Zhichao Wang, Sai Praneeth Karimireddy, Xiaoying Tang
Ensuring fairness is a crucial aspect of Federated Learning (FL), which enables the model to perform consistently across all clients.
no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, are both essential in voice conversion (VC).
no code implementations • 11 Nov 2022 • Zhichao Wang, Yizhe Zhu
Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR).
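The flavor of such concentration results can be illustrated with random Fourier features, whose expected kernel is the Gaussian kernel: for a large number of features, the RFRR predictor is close to the KRR predictor built from the limiting kernel. A simplified sanity check, not the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 40, 3, 50000           # p random features; large p => RFRR ~ KRR
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lam = 1.0                        # ridge regularization

# Random Fourier features whose expected kernel is exp(-||x - x'||^2 / 2).
Wf = rng.standard_normal((d, p))
b = rng.uniform(0.0, 2.0 * np.pi, p)
Phi = np.sqrt(2.0 / p) * np.cos(X @ Wf + b)

# RFRR in-sample predictions, written in kernel form.
K_rf = Phi @ Phi.T
f_rf = K_rf @ np.linalg.solve(K_rf + lam * np.eye(n), y)

# KRR with the limiting Gaussian kernel.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
f_krr = K @ np.linalg.solve(K + lam * np.eye(n), y)
```

As p grows, the empirical kernel matrix `K_rf` concentrates entrywise around `K`, and the two in-sample predictors nearly coincide.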
no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi
We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features, produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input, are adopted as the attention query.
no code implementations • 27 Oct 2022 • Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang
In this paper, we propose to use intermediate bottleneck features (IBFs) to replace PPGs.
Automatic Speech Recognition (ASR) +2
no code implementations • 18 Oct 2022 • Xinhai Chen, Jie Liu, Junjun Yan, Zhichao Wang, Chunye Gong
To improve the prediction accuracy of the neural network, we also introduce a novel auxiliary line strategy and an efficient network model during meshing.
no code implementations • 24 May 2022 • Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang
We introduce torchNTK, a Python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework.
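The quantity being computed is the empirical NTK, K(x, x') = ⟨∂f(x)/∂θ, ∂f(x')/∂θ⟩ at the current parameters. A tiny NumPy sketch using finite-difference Jacobians (for illustration only; this is not torchNTK's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 8
theta = rng.standard_normal(d * N + N)   # flattened parameters [W, a]

def f(x, theta):
    """Tiny two-layer scalar network f(x) = a^T tanh(W^T x) / sqrt(N)."""
    W = theta[: d * N].reshape(d, N)
    a = theta[d * N:]
    return a @ np.tanh(W.T @ x) / np.sqrt(N)

def grad_theta(x, theta, eps=1e-6):
    """Central finite-difference gradient of f(x, .) at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (f(x, tp) - f(x, tm)) / (2.0 * eps)
    return g

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
g1, g2 = grad_theta(x1, theta), grad_theta(x2, theta)
ntk_12 = g1 @ g2   # empirical NTK entry K(x1, x2)
```

In practice one uses autodiff rather than finite differences, but the resulting Gram matrix has the same structure: symmetric and positive semidefinite.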
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
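The setup can be written out directly. The following NumPy sketch takes one gradient descent step on W under the empirical MSE loss with a held fixed; the step size and dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 5, 16, 32
W = rng.standard_normal((d, N)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(N)                     # second layer, held fixed
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2

def f(X, W):
    return sigma(X @ W) @ a / np.sqrt(N)

def loss(W_):
    return np.mean((f(X, W_) - y) ** 2)

# Gradient of the empirical MSE loss with respect to W:
# dL/dW = (2 / (n sqrt(N))) X^T [ outer(r, a) * dsigma(X W) ],  r = f(X) - y.
r = f(X, W) - y
G = X.T @ (np.outer(r, a) * dsigma(X @ W)) * (2.0 / (n * np.sqrt(N)))

eta = 0.5                # arbitrary learning rate for illustration
W1 = W - eta * G         # the first gradient descent step on W
```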
no code implementations • 5 Feb 2022 • Xiangmeng Wang, Qian Li, Dianer Yu, Peng Cui, Zhichao Wang, Guandong Xu
Traditional recommendation models trained on observational interaction data have had a large impact across a wide range of applications, but they face bias problems that obscure users' true intent and thus degrade recommendation effectiveness.
no code implementations • 2 Jan 2022 • Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li
Specifically, the prosody vector is first extracted from a pre-trained VQ-Wav2Vec model, in which rich prosody information is embedded while most speaker and environment information is removed effectively by quantization.
no code implementations • 23 Dec 2021 • Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan
Moreover, the explicit prosody features used in the prosody predicting module can increase the diversity of synthetic speech by adjusting the value of prosody features.
no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi
One-shot style transfer is a challenging task, since training on a single utterance makes the model extremely prone to over-fitting the training data, causing low speaker similarity and a lack of expressiveness.
no code implementations • 20 Sep 2021 • Zhichao Wang, Yizhe Zhu
As an application, we show that random feature regression induced by the empirical kernel achieves the same asymptotic performance as its limiting kernel regression under the ultra-wide regime.
no code implementations • CVPR 2021 • Qian Li, Zhichao Wang, Gang Li, Jun Pang, Guandong Xu
Sinkhorn divergence has become a very popular metric to compare probability distributions in optimal transport.
no code implementations • 16 Jun 2021 • Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li
Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrogram and bottleneck features as input, respectively.
no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have attracted increasing attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • NeurIPS 2020 • Zhou Fan, Zhichao Wang
We study the eigenvalue distributions of the Conjugate Kernel and Neural Tangent Kernel associated to multi-layer feedforward neural networks.
no code implementations • CVPR 2019 • Zhichao Wang, Qian Li, Gang Li, Guandong Xu
In this work, we discover a set of general polynomials that vanish on vectorized PDs and extract the task-adapted feature representation from these polynomials.