Search Results for author: Zhichao Wang

Found 31 papers, 4 papers with code

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

no code implementations · 15 Feb 2024 · Zhichao Wang, Denny Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
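
For intuition, here is a minimal sketch of the object in question: the empirical Conjugate Kernel of a one-layer random feature map and its eigenvalue spectrum. The activation, dimensions, and scaling below are illustrative choices, not the paper's setup.

```python
# Sketch (illustrative, not the paper's setup): empirical Conjugate Kernel
# CK = sigma(W X)^T sigma(W X) / N for a one-layer random feature map, and its spectrum.
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 512, 1024, 768                      # input dim, width, sample count (arbitrary)
X = rng.standard_normal((d, n)) / np.sqrt(d)  # data columns with roughly unit norm
W = rng.standard_normal((N, d))               # random first-layer weights

features = np.tanh(W @ X)                     # nonlinear feature map, shape (N, n)
CK = features.T @ features / N                # empirical Conjugate Kernel, shape (n, n)
spectrum = np.linalg.eigvalsh(CK)             # eigenvalues whose distribution is studied
```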

Representation Learning

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code implementations · 19 Jan 2024 · Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, Yuping Wang, Yuxuan Wang

Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, alternately processing semantic and acoustic features at each time step of autoregression, which eliminates the dependence on complete source speech.
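
To make the alternating layout concrete, here is a toy sketch of one streaming conversion loop; the names `semantic_tokens`, `predict_acoustic`, and `speaker_prompt` are hypothetical placeholders, not StreamVoice's actual interfaces.

```python
# Toy sketch of alternating semantic/acoustic processing in a causal loop
# (all interfaces below are hypothetical placeholders).
def stream_convert(semantic_tokens, predict_acoustic, speaker_prompt):
    history = list(speaker_prompt)            # acoustic prompt of the target speaker
    converted = []
    for sem in semantic_tokens:               # one semantic token arrives per step
        history.append(("semantic", sem))     # the causal LM only ever sees past tokens
        acoustic = predict_acoustic(history)  # acoustic prediction for this step
        history.append(("acoustic", acoustic))
        converted.append(acoustic)            # emitted immediately, no complete utterance needed
    return converted
```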

Language Modelling Voice Conversion

FedRec+: Enhancing Privacy and Addressing Heterogeneity in Federated Recommendation Systems

no code implementations · 31 Oct 2023 · Lin Wang, Zhichao Wang, Xi Leng, Xiaoying Tang

Preserving privacy and reducing communication costs for edge users pose significant challenges in recommendation systems.

Federated Learning Recommendation Systems

VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

no code implementations · 4 Oct 2023 · Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023.

Voice Conversion

Proposing an intelligent mesh smoothing method with graph neural networks

no code implementations · 24 Sep 2023 · Zhichao Wang, Xinhai Chen, Junjun Yan, Jie Liu

With a lightweight model, GMSNet can effectively smooth mesh nodes of varying degrees and remains unaffected by the order of the input data.

Data Augmentation

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

no code implementations · 3 Sep 2023 · Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.

Data Augmentation Disentanglement +3

Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

1 code implementation · 12 Aug 2023 · Zhichao Wang, Mengyu Dai, Keld Lundgaard

In the second stage, an audio-driven talking head generation method is employed to produce compelling videos given the audio generated in the first stage.

Talking Head Generation

Auxiliary-Tasks Learning for Physics-Informed Neural Network-Based Partial Differential Equations Solving

1 code implementation · 12 Jul 2023 · Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu

To alleviate these issues, we propose auxiliary-task learning-based physics-informed neural networks (ATL-PINNs), which provide four different auxiliary-task learning modes, and investigate their performance compared with the original PINNs.
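
As a rough illustration of one possible auxiliary-task mode (hard parameter sharing between a main PDE head and an auxiliary head), consider the sketch below; the toy PDE u_t = u_xx, the network sizes, and the loss weight are assumptions, and the paper's four modes are not reproduced.

```python
# Minimal PINN-with-auxiliary-task sketch (assumed setup: shared trunk, toy PDE u_t = u_xx).
import torch

trunk = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 64), torch.nn.Tanh())
main_head = torch.nn.Linear(64, 1)            # predicts the primary solution u(x, t)
aux_head = torch.nn.Linear(64, 1)             # auxiliary target sharing the same trunk

def pde_residual(xt):                         # columns of xt are (x, t)
    xt = xt.requires_grad_(True)
    u = main_head(trunk(xt))
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    return u_t - u_xx                         # residual of u_t = u_xx

xt = torch.rand(256, 2)
aux_target = torch.zeros(256, 1)              # placeholder auxiliary supervision
loss = pde_residual(xt).pow(2).mean() + 0.1 * (aux_head(trunk(xt)) - aux_target).pow(2).mean()
loss.backward()                               # gradients flow to both heads and the shared trunk
```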

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

no code implementations · 18 Jun 2023 · Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream respectively, and convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker.
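
A schematic of that pipeline is sketched below; the tokenizer and language-model calls are hypothetical stand-ins, not the actual HuBERT, SoundStream, or LM-VC interfaces.

```python
# Schematic only: hypothetical stand-ins for "semantic tokenizer (HuBERT-like),
# acoustic tokenizer (SoundStream-like), and a LM that generates target acoustic tokens".
def zero_shot_convert(source_wav, target_prompt_wav, semantic_tokenizer, acoustic_tokenizer, lm):
    sem = semantic_tokenizer(source_wav)                # linguistic content of the source speech
    prompt_ac = acoustic_tokenizer(target_prompt_wav)   # acoustic prompt carrying target-speaker timbre
    # The LM continues the acoustic token stream conditioned on the prompt and the semantics.
    target_ac = lm.generate(prefix=prompt_ac, condition=sem)
    return target_ac                                    # decoded back to a waveform by the acoustic codec
```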

Audio Generation Disentanglement +2

ST-PINN: A Self-Training Physics-Informed Neural Network for Partial Differential Equations

1 code implementation · 15 Jun 2023 · Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu

To address the issue of low accuracy and convergence problems of existing PINNs, we propose a self-training physics-informed neural network, ST-PINN.
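
One way to picture the self-training idea is the pseudo-labeling round sketched below; the residual function, the selection threshold, and the model interface are illustrative assumptions, not the authors' code.

```python
# Illustrative self-training round for a PINN: keep collocation points the current
# network already fits well and reuse its predictions there as pseudo labels.
import torch

def self_training_round(model, residual, candidate_points, threshold=1e-3):
    with torch.no_grad():
        r = residual(model, candidate_points).abs().squeeze(-1)  # PDE residual per candidate point
        keep = r < threshold                                     # confident points only
        pseudo_inputs = candidate_points[keep]
        pseudo_labels = model(pseudo_inputs)                     # current predictions become pseudo labels
    # These pairs are then added as extra supervised data for the next training round.
    return pseudo_inputs, pseudo_labels
```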

Pseudo Label Self-Learning

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations · 12 May 2023 · Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
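
As a toy illustration of retrieval along the temporal and channel axes, the sketch below attends over reference-speech features with small learnable query banks; the shapes and queries are assumptions, not the paper's architecture.

```python
# Toy temporal-channel retrieval: queries attend over time ("when") and over channels ("where").
import torch

T, C, Q = 120, 256, 8                      # frames, channels, retrieval queries (illustrative)
feats = torch.randn(T, C)                  # features of the reference (target-speaker) speech
time_queries = torch.randn(Q, C)           # queries over the time axis
chan_queries = torch.randn(Q, T)           # queries over the channel axis

time_attn = torch.softmax(time_queries @ feats.T / C ** 0.5, dim=-1)  # (Q, T): when speaker info appears
time_summary = time_attn @ feats                                      # (Q, C) time-retrieved embeddings
chan_attn = torch.softmax(chan_queries @ feats / T ** 0.5, dim=-1)    # (Q, C): where it appears
chan_summary = chan_attn @ feats.T                                    # (Q, T) channel-retrieved embeddings
```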

Disentanglement Retrieval +2

FedEBA+: Towards Fair and Effective Federated Learning via Entropy-Based Model

no code implementations · 29 Jan 2023 · Lin Wang, Zhichao Wang, Sai Praneeth Karimireddy, Xiaoying Tang

Ensuring fairness is a crucial aspect of Federated Learning (FL), which enables the model to perform consistently across all clients.

Fairness Federated Learning

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

no code implementations · 16 Nov 2022 · Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).

Voice Conversion

Overparameterized random feature regression with nearly orthogonal data

no code implementations · 11 Nov 2022 · Zhichao Wang, Yizhe Zhu

Our analysis shows high-probability non-asymptotic concentration results for the training, cross-validation, and generalization errors of RFRR, centered around the corresponding values for a kernel ridge regression (KRR).
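
For concreteness (notation ours, not quoted from the paper): RFRR with feature map $\phi(\boldsymbol{x}) = \sigma(\boldsymbol{W}^\top\boldsymbol{x})/\sqrt{N}$ predicts $\hat f_{\mathrm{RF}}(\boldsymbol{x}) = \phi(\boldsymbol{x})^\top(\boldsymbol{\Phi}^\top\boldsymbol{\Phi} + \lambda I)^{-1}\boldsymbol{\Phi}^\top \boldsymbol{y}$ with $\boldsymbol{\Phi} = [\phi(\boldsymbol{x}_1), \dots, \phi(\boldsymbol{x}_n)]^\top$, while the reference KRR predictor uses the limiting kernel $K(\boldsymbol{x}, \boldsymbol{x}') = \mathbb{E}_{\boldsymbol{W}}[\phi(\boldsymbol{x})^\top\phi(\boldsymbol{x}')]$, i.e. $\hat f_{\mathrm{KRR}}(\boldsymbol{x}) = K(\boldsymbol{x}, X)(K(X, X) + \lambda I)^{-1}\boldsymbol{y}$; the concentration results compare the errors of these two estimators.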

regression

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations · 9 Nov 2022 · Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query; these are produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input.
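
A toy sketch of that fusion step is given below, with prosody features acting as the per-frame attention query over the two feature streams; the shapes are illustrative and this is not the Expressive-VC network.

```python
# Toy attention fusion of linguistic (bottleneck) and para-linguistic (perturbation) features,
# queried by prosody features (illustrative shapes only).
import torch

T, D = 200, 256
bottleneck = torch.randn(T, D)             # linguistic features (recognition bottleneck)
perturbed = torch.randn(T, D)              # para-linguistic features (perturbation branch)
prosody_query = torch.randn(T, D)          # speaker-dependent prosody features as the query

keys = torch.stack([bottleneck, perturbed], dim=1)                  # (T, 2, D): two sources per frame
scores = torch.einsum("td,tsd->ts", prosody_query, keys) / D ** 0.5
weights = torch.softmax(scores, dim=-1)                             # per-frame weights over the sources
fused = torch.einsum("ts,tsd->td", weights, keys)                   # (T, D) fused representation
```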

Voice Conversion

An Improved Structured Mesh Generation Method Based on Physics-informed Neural Networks

no code implementations · 18 Oct 2022 · Xinhai Chen, Jie Liu, Junjun Yan, Zhichao Wang, Chunye Gong

To improve the prediction accuracy of the neural network, we also introduce a novel auxiliary line strategy and an efficient network model during meshing.

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

no code implementations · 24 May 2022 · Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang

We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework.
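
For reference, the empirical NTK is the Gram matrix of per-example parameter gradients, $\mathrm{NTK}_{ij} = \langle \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x}_i), \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x}_j)\rangle$. The plain-PyTorch sketch below illustrates this definition only; it does not use torchNTK's own API.

```python
# Plain-PyTorch illustration of the empirical NTK definition (not torchNTK's API).
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
X = torch.randn(8, 10)                                 # small batch (illustrative)
params = list(model.parameters())

def per_example_grad(x):
    out = model(x.unsqueeze(0)).squeeze()              # scalar output for one example
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])   # flattened gradient w.r.t. all parameters

J = torch.stack([per_example_grad(x) for x in X])      # (batch, num_params) Jacobian
ntk = J @ J.T                                          # empirical NTK Gram matrix
```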

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations · 3 May 2022 · Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
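
Spelled out in that notation, the gradient used in this first step is $\nabla_{\boldsymbol{W}} = \frac{2}{n\sqrt{N}}\sum_{i=1}^n \big(f(\boldsymbol{x}_i) - y_i\big)\,\boldsymbol{x}_i\,\big(\boldsymbol{a}\odot\sigma'(\boldsymbol{W}^\top\boldsymbol{x}_i)\big)^\top$, and the update is a single step $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta\,\nabla_{\boldsymbol{W}}$ with learning rate $\eta$; the particular step-size scalings analyzed in the paper are not reproduced here.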

Causal Disentanglement for Semantics-Aware Intent Learning in Recommendation

no code implementations · 5 Feb 2022 · Xiangmeng Wang, Qian Li, Dianer Yu, Peng Cui, Zhichao Wang, Guandong Xu

Traditional recommendation models trained on observational interaction data have had a large impact in a wide range of applications, but they face bias problems that obscure users' true intent and thus deteriorate recommendation effectiveness.

Disentanglement

IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion

no code implementations · 2 Jan 2022 · Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li

Specifically, the prosody vector is first extracted from a pre-trained VQ-Wav2Vec model, in which rich prosody information is embedded while most speaker and environment information is effectively removed by quantization.
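
As a generic illustration of the quantization step (not the VQ-Wav2Vec implementation), the sketch below maps continuous frame-level prosody vectors to discrete codes by nearest-codebook lookup; the codebook and features are random stand-ins.

```python
# Toy nearest-codebook vector quantization of a prosody representation
# (codebook and features are random stand-ins, not VQ-Wav2Vec).
import torch

T, D, K = 150, 256, 320
prosody = torch.randn(T, D)                 # frame-level prosody vectors (stand-in)
codebook = torch.randn(K, D)                # learned codebook (stand-in)

dists = torch.cdist(prosody, codebook)      # (T, K) distances to every code
codes = dists.argmin(dim=-1)                # discrete prosody code per frame
quantized = codebook[codes]                 # quantized prosody fed to the conversion model
```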

Quantization Voice Conversion

Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

no code implementations · 23 Dec 2021 · Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

Moreover, the explicit prosody features used in the prosody prediction module can increase the diversity of synthetic speech by adjusting the values of the prosody features.

Speech Synthesis Style Transfer +1

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations · 24 Nov 2021 · Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness.

Style Transfer Voice Conversion

Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks

no code implementations · 20 Sep 2021 · Zhichao Wang, Yizhe Zhu

As an application, we show that random feature regression induced by the empirical kernel achieves the same asymptotic performance as its limiting kernel regression under the ultra-wide regime.
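
In that spirit, a small numerical sketch is given below: it compares random-feature ridge regression at a finite width with the same estimator at a much larger width standing in for the limiting kernel (dimensions, activation, and ridge are illustrative, not the paper's ultra-wide regime).

```python
# Numerical sketch: random-feature ridge regression vs. a much wider stand-in
# for its limiting kernel ridge regression (all sizes illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 20, 100, 1e-2
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sin(X @ rng.standard_normal(d))                  # arbitrary smooth target
Xt = rng.standard_normal((50, d)) / np.sqrt(d)          # test points

def rf_ridge_predict(width):
    W = rng.standard_normal((d, width))
    F = np.tanh(X @ W) / np.sqrt(width)                 # train features
    Ft = np.tanh(Xt @ W) / np.sqrt(width)               # test features
    # Dual (kernel) form of ridge regression keeps the solve at size n x n.
    return Ft @ F.T @ np.linalg.solve(F @ F.T + lam * np.eye(n), y)

print(np.mean((rf_ridge_predict(1000) - rf_ridge_predict(50000)) ** 2))  # gap shrinks as width grows
```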

regression

Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion

no code implementations · 16 Jun 2021 · Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li

Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrum and bottleneck features as input, respectively.

Style Transfer Voice Conversion

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

no code implementations · 8 Apr 2021 · Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks

no code implementations · NeurIPS 2020 · Zhou Fan, Zhichao Wang

We study the eigenvalue distributions of the Conjugate Kernel and Neural Tangent Kernel associated to multi-layer feedforward neural networks.

Polynomial Representation for Persistence Diagram

no code implementations · CVPR 2019 · Zhichao Wang, Qian Li, Gang Li, Guandong Xu

In this work, we discover a set of general polynomials that vanish on vectorized PDs and extract the task-adapted feature representation from these polynomials.

BIG-bench Machine Learning Topological Data Analysis
