Search Results for author: Zhichao Wang

Found 31 papers, 4 papers with code

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

no code implementations · 15 Feb 2024 · Zhichao Wang, Denny Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
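
For intuition, here is a minimal sketch of the object in question: the empirical Conjugate Kernel of a one-layer random feature map and its eigenvalue spectrum. The activation, dimensions, and scaling below are illustrative choices, not the paper's setup.

```python
# Sketch (illustrative, not the paper's setup): empirical Conjugate Kernel
# CK = sigma(W X)^T sigma(W X) / N for a one-layer random feature map, and its spectrum.
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 512, 1024, 768                      # input dim, width, sample count (arbitrary)
X = rng.standard_normal((d, n)) / np.sqrt(d)  # data columns with roughly unit norm
W = rng.standard_normal((N, d))               # random first-layer weights

features = np.tanh(W @ X)                     # nonlinear feature map, shape (N, n)
CK = features.T @ features / N                # empirical Conjugate Kernel, shape (n, n)
spectrum = np.linalg.eigvalsh(CK)             # eigenvalues whose distribution is studied
```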

Representation Learning

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code implementations · 19 Jan 2024 · Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, Yuping Wang, Yuxuan Wang

Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, alternately processing semantic and acoustic features at each time step of autoregression, which eliminates the dependence on complete source speech.
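
To make the alternating layout concrete, here is a toy sketch of one streaming conversion loop; the names `semantic_tokens`, `predict_acoustic`, and `speaker_prompt` are hypothetical placeholders, not StreamVoice's actual interfaces.

```python
# Toy sketch of alternating semantic/acoustic processing in a causal loop
# (all interfaces below are hypothetical placeholders).
def stream_convert(semantic_tokens, predict_acoustic, speaker_prompt):
    history = list(speaker_prompt)            # acoustic prompt of the target speaker
    converted = []
    for sem in semantic_tokens:               # one semantic token arrives per step
        history.append(("semantic", sem))     # the causal LM only ever sees past tokens
        acoustic = predict_acoustic(history)  # acoustic prediction for this step
        history.append(("acoustic", acoustic))
        converted.append(acoustic)            # emitted immediately, no complete utterance needed
    return converted
```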

Language Modelling Voice Conversion

FedRec+: Enhancing Privacy and Addressing Heterogeneity in Federated Recommendation Systems

no code implementations · 31 Oct 2023 · Lin Wang, Zhichao Wang, Xi Leng, Xiaoying Tang

Preserving privacy and reducing communication costs for edge users pose significant challenges in recommendation systems.

Federated Learning Recommendation Systems

VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

no code implementations · 4 Oct 2023 · Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023.

Voice Conversion

Proposing an intelligent mesh smoothing method with graph neural networks

no code implementations · 24 Sep 2023 · Zhichao Wang, Xinhai Chen, Junjun Yan, Jie Liu

With a lightweight model, GMSNet can effectively smooth mesh nodes of varying degrees and remains unaffected by the order of the input data.

Data Augmentation

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

no code implementations · 3 Sep 2023 · Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.

Data Augmentation Disentanglement +3

Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

1 code implementation · 12 Aug 2023 · Zhichao Wang, Mengyu Dai, Keld Lundgaard

In the second stage, an audio-driven talking head generation method is employed to produce compelling videos given the audio generated in the first stage.

Talking Head Generation

Auxiliary-Tasks Learning for Physics-Informed Neural Network-Based Partial Differential Equations Solving

1 code implementation · 12 Jul 2023 · Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu

To alleviate these issues, we propose auxiliary-task learning-based physics-informed neural networks (ATL-PINNs), which provide four different auxiliary-task learning modes, and investigate their performance compared with the original PINNs.
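
As a rough illustration of one possible auxiliary-task mode (hard parameter sharing between a main PDE head and an auxiliary head), consider the sketch below; the toy PDE u_t = u_xx, the network sizes, and the loss weight are assumptions, and the paper's four modes are not reproduced.

```python
# Minimal PINN-with-auxiliary-task sketch (assumed setup: shared trunk, toy PDE u_t = u_xx).
import torch

trunk = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 64), torch.nn.Tanh())
main_head = torch.nn.Linear(64, 1)            # predicts the primary solution u(x, t)
aux_head = torch.nn.Linear(64, 1)             # auxiliary target sharing the same trunk

def pde_residual(xt):                         # columns of xt are (x, t)
    xt = xt.requires_grad_(True)
    u = main_head(trunk(xt))
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    return u_t - u_xx                         # residual of u_t = u_xx

xt = torch.rand(256, 2)
aux_target = torch.zeros(256, 1)              # placeholder auxiliary supervision
loss = pde_residual(xt).pow(2).mean() + 0.1 * (aux_head(trunk(xt)) - aux_target).pow(2).mean()
loss.backward()                               # gradients flow to both heads and the shared trunk
```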

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

no code implementations · 18 Jun 2023 · Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream respectively, and convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker.
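
A schematic of that pipeline is sketched below; the tokenizer and language-model calls are hypothetical stand-ins, not the actual HuBERT, SoundStream, or LM-VC interfaces.

```python
# Schematic only: hypothetical stand-ins for "semantic tokenizer (HuBERT-like),
# acoustic tokenizer (SoundStream-like), and a LM that generates target acoustic tokens".
def zero_shot_convert(source_wav, target_prompt_wav, semantic_tokenizer, acoustic_tokenizer, lm):
    sem = semantic_tokenizer(source_wav)                # linguistic content of the source speech
    prompt_ac = acoustic_tokenizer(target_prompt_wav)   # acoustic prompt carrying target-speaker timbre
    # The LM continues the acoustic token stream conditioned on the prompt and the semantics.
    target_ac = lm.generate(prefix=prompt_ac, condition=sem)
    return target_ac                                    # decoded back to a waveform by the acoustic codec
```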

Audio Generation Disentanglement +2

ST-PINN: A Self-Training Physics-Informed Neural Network for Partial Differential Equations

1 code implementation · 15 Jun 2023 · Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu

To address the issue of low accuracy and convergence problems of existing PINNs, we propose a self-training physics-informed neural network, ST-PINN.
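
One way to picture the self-training idea is the pseudo-labeling round sketched below; the residual function, the selection threshold, and the model interface are illustrative assumptions, not the authors' code.

```python
# Illustrative self-training round for a PINN: keep collocation points the current
# network already fits well and reuse its predictions there as pseudo labels.
import torch

def self_training_round(model, residual, candidate_points, threshold=1e-3):
    with torch.no_grad():
        r = residual(model, candidate_points).abs().squeeze(-1)  # PDE residual per candidate point
        keep = r < threshold                                     # confident points only
        pseudo_inputs = candidate_points[keep]
        pseudo_labels = model(pseudo_inputs)                     # current predictions become pseudo labels
    # These pairs are then added as extra supervised data for the next training round.
    return pseudo_inputs, pseudo_labels
```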

Pseudo Label Self-Learning

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations · 12 May 2023 · Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
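
As a toy illustration of retrieval along the temporal and channel axes, the sketch below attends over reference-speech features with small learnable query banks; the shapes and queries are assumptions, not the paper's architecture.

```python
# Toy temporal-channel retrieval: queries attend over time ("when") and over channels ("where").
import torch

T, C, Q = 120, 256, 8                      # frames, channels, retrieval queries (illustrative)
feats = torch.randn(T, C)                  # features of the reference (target-speaker) speech
time_queries = torch.randn(Q, C)           # queries over the time axis
chan_queries = torch.randn(Q, T)           # queries over the channel axis

time_attn = torch.softmax(time_queries @ feats.T / C ** 0.5, dim=-1)  # (Q, T): when speaker info appears
time_summary = time_attn @ feats                                      # (Q, C) time-retrieved embeddings
chan_attn = torch.softmax(chan_queries @ feats / T ** 0.5, dim=-1)    # (Q, C): where it appears
chan_summary = chan_attn @ feats.T                                    # (Q, T) channel-retrieved embeddings
```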

Disentanglement Retrieval +2

FedEBA+: Towards Fair and Effective Federated Learning via Entropy-Based Model

no code implementations · 29 Jan 2023 · Lin Wang, Zhichao Wang, Sai Praneeth Karimireddy, Xiaoying Tang

Ensuring fairness is a crucial aspect of Federated Learning (FL), which enables the model to perform consistently across all clients.

Fairness Federated Learning

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

no code implementations · 16 Nov 2022 · Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).

Voice Conversion

Overparameterized random feature regression with nearly orthogonal data

no code implementations · 11 Nov 2022 · Zhichao Wang, Yizhe Zhu

Our analysis shows high-probability non-asymptotic concentration results for the training, cross-validation, and generalization errors of RFRR, centered around the corresponding values for a kernel ridge regression (KRR).
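
For concreteness (notation ours, not quoted from the paper): RFRR with feature map $\phi(\boldsymbol{x}) = \sigma(\boldsymbol{W}^\top\boldsymbol{x})/\sqrt{N}$ predicts $\hat f_{\mathrm{RF}}(\boldsymbol{x}) = \phi(\boldsymbol{x})^\top(\boldsymbol{\Phi}^\top\boldsymbol{\Phi} + \lambda I)^{-1}\boldsymbol{\Phi}^\top \boldsymbol{y}$ with $\boldsymbol{\Phi} = [\phi(\boldsymbol{x}_1), \dots, \phi(\boldsymbol{x}_n)]^\top$, while the reference KRR predictor uses the limiting kernel $K(\boldsymbol{x}, \boldsymbol{x}') = \mathbb{E}_{\boldsymbol{W}}[\phi(\boldsymbol{x})^\top\phi(\boldsymbol{x}')]$, i.e. $\hat f_{\mathrm{KRR}}(\boldsymbol{x}) = K(\boldsymbol{x}, X)(K(X, X) + \lambda I)^{-1}\boldsymbol{y}$; the concentration results compare the errors of these two estimators.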

regression

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations · 9 Nov 2022 · Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query; these are produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input.
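
A toy sketch of that fusion step is given below, with prosody features acting as the per-frame attention query over the two feature streams; the shapes are illustrative and this is not the Expressive-VC network.

```python
# Toy attention fusion of linguistic (bottleneck) and para-linguistic (perturbation) features,
# queried by prosody features (illustrative shapes only).
import torch

T, D = 200, 256
bottleneck = torch.randn(T, D)             # linguistic features (recognition bottleneck)
perturbed = torch.randn(T, D)              # para-linguistic features (perturbation branch)
prosody_query = torch.randn(T, D)          # speaker-dependent prosody features as the query

keys = torch.stack([bottleneck, perturbed], dim=1)                  # (T, 2, D): two sources per frame
scores = torch.einsum("td,tsd->ts", prosody_query, keys) / D ** 0.5
weights = torch.softmax(scores, dim=-1)                             # per-frame weights over the sources
fused = torch.einsum("ts,tsd->td", weights, keys)                   # (T, D) fused representation
```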

Voice Conversion

An Improved Structured Mesh Generation Method Based on Physics-informed Neural Networks

no code implementations · 18 Oct 2022 · Xinhai Chen, Jie Liu, Junjun Yan, Zhichao Wang, Chunye Gong

To improve the prediction accuracy of the neural network, we also introduce a novel auxiliary line strategy and an efficient network model during meshing.

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

no code implementations · 24 May 2022 · Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang

We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework.
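
For reference, the empirical NTK is the Gram matrix of per-example parameter gradients, $\mathrm{NTK}_{ij} = \langle \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x}_i), \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x}_j)\rangle$. The plain-PyTorch sketch below illustrates this definition only; it does not use torchNTK's own API.

```python
# Plain-PyTorch illustration of the empirical NTK definition (not torchNTK's API).
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
X = torch.randn(8, 10)                                 # small batch (illustrative)
params = list(model.parameters())

def per_example_grad(x):
    out = model(x.unsqueeze(0)).squeeze()              # scalar output for one example
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])   # flattened gradient w.r.t. all parameters

J = torch.stack([per_example_grad(x) for x in X])      # (batch, num_params) Jacobian
ntk = J @ J.T                                          # empirical NTK Gram matrix
```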

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations · 3 May 2022 · Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
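
Spelled out in that notation, the gradient used in this first step is $\nabla_{\boldsymbol{W}} = \frac{2}{n\sqrt{N}}\sum_{i=1}^n \big(f(\boldsymbol{x}_i) - y_i\big)\,\boldsymbol{x}_i\,\big(\boldsymbol{a}\odot\sigma'(\boldsymbol{W}^\top\boldsymbol{x}_i)\big)^\top$, and the update is a single step $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta\,\nabla_{\boldsymbol{W}}$ with learning rate $\eta$; the particular step-size scalings analyzed in the paper are not reproduced here.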

Causal Disentanglement for Semantics-Aware Intent Learning in Recommendation

no code implementations · 5 Feb 2022 · Xiangmeng Wang, Qian Li, Dianer Yu, Peng Cui, Zhichao Wang, Guandong Xu

Traditional recommendation models trained on observational interaction data have had a large impact in a wide range of applications, but they face bias problems that obscure users' true intent and thus deteriorate recommendation effectiveness.

Disentanglement

IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion

no code implementations · 2 Jan 2022 · Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li

Specifically, the prosody vector is first extracted from a pre-trained VQ-Wav2Vec model, in which rich prosody information is embedded while most speaker and environment information is effectively removed by quantization.
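
As a generic illustration of the quantization step (not the VQ-Wav2Vec implementation), the sketch below maps continuous frame-level prosody vectors to discrete codes by nearest-codebook lookup; the codebook and features are random stand-ins.

```python
# Toy nearest-codebook vector quantization of a prosody representation
# (codebook and features are random stand-ins, not VQ-Wav2Vec).
import torch

T, D, K = 150, 256, 320
prosody = torch.randn(T, D)                 # frame-level prosody vectors (stand-in)
codebook = torch.randn(K, D)                # learned codebook (stand-in)

dists = torch.cdist(prosody, codebook)      # (T, K) distances to every code
codes = dists.argmin(dim=-1)                # discrete prosody code per frame
quantized = codebook[codes]                 # quantized prosody fed to the conversion model
```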

Quantization Voice Conversion

Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

no code implementations · 23 Dec 2021 · Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

Moreover, the explicit prosody features used in the prosody prediction module can increase the diversity of synthetic speech by adjusting the values of the prosody features.

Speech Synthesis Style Transfer +1

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations · 24 Nov 2021 · Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness.

Style Transfer Voice Conversion

Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks

no code implementations · 20 Sep 2021 · Zhichao Wang, Yizhe Zhu

As an application, we show that random feature regression induced by the empirical kernel achieves the same asymptotic performance as its limiting kernel regression under the ultra-wide regime.
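
In that spirit, a small numerical sketch is given below: it compares random-feature ridge regression at a finite width with the same estimator at a much larger width standing in for the limiting kernel (dimensions, activation, and ridge are illustrative, not the paper's ultra-wide regime).

```python
# Numerical sketch: random-feature ridge regression vs. a much wider stand-in
# for its limiting kernel ridge regression (all sizes illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 20, 100, 1e-2
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sin(X @ rng.standard_normal(d))                  # arbitrary smooth target
Xt = rng.standard_normal((50, d)) / np.sqrt(d)          # test points

def rf_ridge_predict(width):
    W = rng.standard_normal((d, width))
    F = np.tanh(X @ W) / np.sqrt(width)                 # train features
    Ft = np.tanh(Xt @ W) / np.sqrt(width)               # test features
    # Dual (kernel) form of ridge regression keeps the solve at size n x n.
    return Ft @ F.T @ np.linalg.solve(F @ F.T + lam * np.eye(n), y)

print(np.mean((rf_ridge_predict(1000) - rf_ridge_predict(50000)) ** 2))  # gap shrinks as width grows
```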

regression

Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion

no code implementations · 16 Jun 2021 · Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li

Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrum and bottleneck features as input, respectively.

Style Transfer Voice Conversion

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

no code implementations · 8 Apr 2021 · Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks

no code implementations · NeurIPS 2020 · Zhou Fan, Zhichao Wang

We study the eigenvalue distributions of the Conjugate Kernel and Neural Tangent Kernel associated to multi-layer feedforward neural networks.

Polynomial Representation for Persistence Diagram

no code implementations · CVPR 2019 · Zhichao Wang, Qian Li, Gang Li, Guandong Xu

In this work, we discover a set of general polynomials that vanish on vectorized PDs and extract the task-adapted feature representation from these polynomials.

BIG-bench Machine Learning Topological Data Analysis
