Search Results for author: Kaizhi Qian

Found 18 papers, 11 papers with code

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

11 code implementations • 14 May 2019 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

Style Transfer • Voice Conversion

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.

Disentanglement • Self-Supervised Learning
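
The workflow described above is the standard self-supervised recipe: pretrain a representation network on unlabeled audio with a proxy objective, then reuse the learned features downstream. Below is a minimal, hypothetical PyTorch sketch of that pattern using a masked-prediction objective; the module names and sizes are illustrative assumptions, not ContentVec's actual architecture (which additionally disentangles speaker information during pretraining).

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Toy stand-in for a self-supervised speech representation network."""
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        self.net = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, x):            # x: (batch, frames, n_mels)
        out, _ = self.net(x)
        return out                   # (batch, frames, dim)

def pretrain_step(encoder, predictor, x, mask):
    """Masked prediction: hide random frames, regress them from context."""
    target = x.clone()
    x = x.masked_fill(mask.unsqueeze(-1), 0.0)   # zero out masked frames
    pred = predictor(encoder(x))                 # (batch, frames, n_mels)
    return ((pred - target) ** 2)[mask].mean()   # loss on masked frames only

encoder = SpeechEncoder()
predictor = nn.Linear(256, 80)
x = torch.randn(4, 100, 80)                      # fake unlabeled batch
mask = torch.rand(4, 100) < 0.15
loss = pretrain_step(encoder, predictor, x, mask)
loss.backward()
# Downstream: freeze `encoder` and fit a light classifier on its outputs.
```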

Global Rhythm Style Transfer Without Text Transcriptions

1 code implementation • 16 Jun 2021 • Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.

Representation Learning • Style Transfer

SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks

1 code implementation • 26 Mar 2022 • Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson

SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner.

Disentanglement • Voice Conversion
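
A minimal sketch of the multi-bottleneck design the abstract describes: separate encoders with deliberately narrow outputs for content, rhythm, and pitch, plus a speaker (timbre) embedding, all concatenated into one decoder. Every dimension and module choice below is an illustrative assumption, not SpeechSplit's actual code.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, in_dim, bottleneck):
        super().__init__()
        self.rnn = nn.GRU(in_dim, bottleneck, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out                     # (batch, frames, bottleneck)

class SpeechSplitSketch(nn.Module):
    def __init__(self, n_mels=80, spk_dim=16):
        super().__init__()
        self.content = TinyEncoder(n_mels, 8)   # narrow: squeezes out timbre
        self.rhythm  = TinyEncoder(n_mels, 2)   # very narrow: keeps only pacing
        self.pitch   = TinyEncoder(1, 4)        # sees only the F0 contour
        self.decoder = nn.GRU(8 + 2 + 4 + spk_dim, n_mels, batch_first=True)

    def forward(self, mel, f0, spk_emb):
        codes = torch.cat([self.content(mel),
                           self.rhythm(mel),
                           self.pitch(f0)], dim=-1)
        spk = spk_emb.unsqueeze(1).expand(-1, codes.size(1), -1)
        out, _ = self.decoder(torch.cat([codes, spk], dim=-1))
        return out   # reconstructed mel; swap spk_emb at test time to convert

model = SpeechSplitSketch()
mel, f0 = torch.randn(2, 120, 80), torch.randn(2, 120, 1)
out = model(mel, f0, torch.randn(2, 16))
```

The catch, per the title, is that bottleneck widths like these normally require careful tuning for the disentanglement to work; removing that tuning burden is what SpeechSplit 2.0 targets.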

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

1 code implementation • 29 Mar 2022 • Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4
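
The title indicates the route to this setup: pseudo-label the untranscribed speech with an unsupervised ASR system and train TTS on the resulting pairs. A hedged pseudocode sketch of that pipeline, with all function names as placeholders rather than the paper's actual code:

```python
def build_unsupervised_tts(speech_corpus, text_corpus,
                           train_unsupervised_asr, train_tts):
    # 1) Learn ASR without any paired data, e.g. by matching the
    #    distribution of its outputs to the unpaired text corpus.
    asr = train_unsupervised_asr(speech_corpus, text_corpus)

    # 2) Pseudo-label every utterance in the speech corpus.
    pseudo_pairs = [(asr(wav), wav) for wav in speech_corpus]

    # 3) Train TTS as if the pseudo transcripts were ground truth.
    return train_tts(pseudo_pairs)
```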

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation • 15 Apr 2020 • Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a conditional autoencoder (CAE) based method, achieved state-of-the-art results by disentangling speaker identity from speech content using information-constraining bottlenecks; it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.

Style Transfer • Voice Conversion
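
A minimal sketch of the bottleneck-plus-embedding-swap mechanism the abstract describes, assuming mel-spectrogram inputs; all sizes are illustrative. This paper's F0-consistency contribution would additionally condition the decoder on a normalized pitch contour, which is omitted here.

```python
import torch
import torch.nn as nn

class AutoVCSketch(nn.Module):
    def __init__(self, n_mels=80, bottleneck=32, spk_dim=256):
        super().__init__()
        self.encoder = nn.GRU(n_mels + spk_dim, bottleneck, batch_first=True)
        self.decoder = nn.GRU(bottleneck + spk_dim, n_mels, batch_first=True)

    def forward(self, mel, src_emb, tgt_emb):
        T = mel.size(1)
        enc_in = torch.cat([mel, src_emb.unsqueeze(1).expand(-1, T, -1)], -1)
        code, _ = self.encoder(enc_in)        # narrow information bottleneck
        dec_in = torch.cat([code, tgt_emb.unsqueeze(1).expand(-1, T, -1)], -1)
        out, _ = self.decoder(dec_in)
        return out

model = AutoVCSketch()
mel = torch.randn(1, 100, 80)
src, tgt = torch.randn(1, 256), torch.randn(1, 256)
recon = model(mel, src, src)     # training: reconstruction loss only
converted = model(mel, src, tgt) # zero-shot: swap in an unseen speaker
```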

Speech Denoising with Auditory Models

1 code implementation • 21 Nov 2020 • Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott

Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform.

Denoising • Speech Denoising +1
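
A sketch contrasting the waveform-reconstruction objective the abstract mentions with a loss computed in the feature space of an auditory model, which is what the title points to. The "auditory front end" below is a frozen stand-in filter bank, not the actual auditory model used in the paper.

```python
import torch
import torch.nn as nn

# Toy denoiser operating directly on waveforms.
denoiser = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                         nn.Conv1d(16, 1, 9, padding=4))

# Frozen stand-in for an auditory model's feature extractor.
auditory = nn.Conv1d(1, 32, 64, stride=16)
for p in auditory.parameters():
    p.requires_grad = False

noisy = torch.randn(8, 1, 16000)
clean = torch.randn(8, 1, 16000)
est = denoiser(noisy)

waveform_loss = (est - clean).pow(2).mean()                    # standard objective
auditory_loss = (auditory(est) - auditory(clean)).pow(2).mean()
auditory_loss.backward()   # train the denoiser in auditory feature space
```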

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

1 code implementation • 15 Nov 2023 • Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang

Uncertainty decomposition refers to the task of decomposing the total uncertainty of a model into data (aleatoric) uncertainty, resulting from the inherent complexity or ambiguity of the data, and model (epistemic) uncertainty, resulting from the lack of knowledge in the model.

Uncertainty Quantification
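
The decomposition defined above has a standard ensemble-based form: total predictive entropy equals the average per-member entropy (aleatoric part) plus the mutual information between the prediction and the ensemble member (epistemic part). The paper builds its ensemble from input clarifications; the sketch below uses a toy probability array to show only the arithmetic.

```python
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

# probs: (n_members, n_classes) predictive distributions from an ensemble
probs = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.4, 0.2],
                  [0.6, 0.3, 0.1]])

total = entropy(probs.mean(axis=0))   # H[ E_m p(y|x,m) ]  total uncertainty
aleatoric = entropy(probs).mean()     # E_m H[ p(y|x,m) ]  data uncertainty
epistemic = total - aleatoric         # mutual information: model uncertainty
print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```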

Deep Learning Based Speech Beamforming

no code implementations • 15 Feb 2018 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.

Speech Enhancement

An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack

no code implementations • ICLR 2019 • Yang Zhang, Shiyu Chang, Mo Yu, Kaizhi Qian

The second paradigm, called the zero-confidence attack, finds the smallest perturbation needed to cause mis-classification, also known as the margin of an input feature.

Adversarial Attack
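
A toy illustration of the margin notion defined above: for a fixed direction, bisect on the perturbation size to find the smallest step that flips the classifier's decision. This shows what a zero-confidence attack estimates; it is not the paper's algorithm.

```python
import numpy as np

def margin_along(classify, x, direction, hi=10.0, iters=30):
    """Smallest perturbation along `direction` that flips the label."""
    direction = direction / np.linalg.norm(direction)
    label = classify(x)
    if classify(x + hi * direction) == label:
        return None              # no flip within the search radius
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if classify(x + mid * direction) == label:
            lo = mid             # still original label: push further out
        else:
            hi = mid             # already misclassified: pull back in
    return hi                    # approx. margin along this direction

# Example: a linear classifier, whose true margin is |w@x + b| / ||w||.
w, b = np.array([1.0, -2.0]), 0.5
clf = lambda x: int(w @ x + b > 0)
x = np.array([2.0, 0.5])
print(margin_along(clf, x, -w))  # ≈ |w@x + b| / ||w|| ≈ 0.671
```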

Continuous Convolutional Neural Network for Nonuniform Time Series

no code implementations • 25 Sep 2019 • Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao

Convolutional neural networks (CNNs) for time series data implicitly assume that the data are uniformly sampled, whereas many event-based and multi-modal data are nonuniform or have heterogeneous sampling rates.

Time Series • Time Series Analysis
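
A sketch of the continuous-convolution idea that addresses this: parameterize the kernel as a function of the real-valued time offset so it can be evaluated at arbitrary timestamps, then aggregate neighbors within a window. The Gaussian-bump kernel parameterization below is an illustrative assumption, not the paper's construction.

```python
import numpy as np

def continuous_conv(times, values, kernel, window=1.0):
    """For each timestamp t_i, sum kernel(t_i - t_j) * x_j over
    neighbors t_j within `window` seconds of t_i."""
    out = np.zeros_like(values)
    for i, t in enumerate(times):
        mask = np.abs(times - t) <= window
        out[i] = np.sum(kernel(t - times[mask]) * values[mask])
    return out

# A smooth parametric kernel defined on continuous offsets.
centers = np.linspace(-1.0, 1.0, 5)
weights = np.array([0.1, 0.4, 1.0, 0.4, 0.1])
kernel = lambda dt: (weights * np.exp(-((dt[:, None] - centers) ** 2)
                                      / 0.1)).sum(axis=1)

times = np.sort(np.random.rand(50) * 10)   # nonuniform timestamps
values = np.sin(times)
smoothed = continuous_conv(times, values, kernel)
```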

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

no code implementations • CVPR 2023 • Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.
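
The "physics parameters" in such traditional pipelines are classically the modal parameters of the struck object, with the synthesizer a sum of exponentially damped sinusoids. A minimal modal-synthesis sketch of that traditional baseline (not the paper's diffusion model):

```python
import numpy as np

def modal_impact(freqs, dampings, amps, sr=16000, dur=0.5):
    """Impact sound as a sum of damped sinusoids.
    freqs (Hz), dampings (1/s), amps: one entry per vibration mode."""
    t = np.arange(int(sr * dur)) / sr
    sound = sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
                for f, d, a in zip(freqs, dampings, amps))
    return sound / np.max(np.abs(sound))

# E.g. a small metallic-sounding object with three modes (made-up values).
wav = modal_impact(freqs=[523.0, 1243.0, 2875.0],
                   dampings=[8.0, 15.0, 30.0],
                   amps=[1.0, 0.6, 0.3])
```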

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

no code implementations • 23 Jun 2023 • Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) the difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) the difficulty of achieving effective low-resource adaptation while avoiding over-fitting and catastrophic forgetting.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
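
A generic sketch of the modular pattern this framing suggests: a shared backbone with small per-language modules, so supporting a new language adds little storage and low-resource languages can be adapted without touching shared weights. This is the common adapter recipe, not necessarily Master-ASR's actual module design.

```python
import torch
import torch.nn as nn

class AdapterASR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=100, languages=("en",)):
        super().__init__()
        self.backbone = nn.GRU(feat_dim, hidden, batch_first=True)  # shared
        self.adapters = nn.ModuleDict({
            lang: nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                nn.Linear(64, hidden))
            for lang in languages})                                  # per-language
        self.head = nn.Linear(hidden, vocab)

    def forward(self, feats, lang):
        h, _ = self.backbone(feats)
        h = h + self.adapters[lang](h)   # residual per-language module
        return self.head(h)              # per-frame token logits (e.g. for CTC)

model = AdapterASR(languages=("en", "sw"))
logits = model(torch.randn(2, 200, 80), lang="sw")
# Low-resource adaptation: freeze the backbone, train only adapters["sw"].
```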
