Search Results for author: Wei Ping

Found 31 papers, 20 papers with code

Defending against Adversarial Audio via Diffusion Model

1 code implementation 2 Mar 2023 Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

In this paper, we propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models.

Evaluating Parameter Efficient Learning for Generation

no code implementations 25 Oct 2022 Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

2 code implementations 9 Jun 2022 Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.

Audio Generation, Inductive Bias, +2

Speech Denoising in the Waveform Domain with Self-Attention

1 code implementation 15 Feb 2022 Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

In this work, we present CleanUNet, a causal speech denoising model on the raw waveform.

Denoising, Speech Denoising

One TTS Alignment To Rule Them All

3 code implementations 23 Aug 2021 Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

Long-Short Transformer: Efficient Transformers for Language and Vision

3 code implementations NeurIPS 2021 Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

For instance, Transformer-LS achieves 0.97 test BPC on enwik8 using half the number of parameters of the previous method, while being faster and able to handle sequences 3x as long as its full-attention version on the same hardware.

Language Modelling

On Fast Sampling of Diffusion Probabilistic Models

1 code implementation ICML Workshop INNF 2021 Zhifeng Kong, Wei Ping

In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models.

Local Knowledge Powered Conversational Agents

1 code implementation 20 Oct 2020 Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.


DiffWave: A Versatile Diffusion Model for Audio Synthesis

11 code implementations ICLR 2021 Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.

Speech Synthesis

Parallel Neural Text-to-Speech

no code implementations ICLR 2020 Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

WaveFlow: A Compact Flow-based Model for Raw Audio

5 code implementations ICML 2020 Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.

Multi-Speaker End-to-End Speech Synthesis

no code implementations 9 Jul 2019 Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping

In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers.

Speech Synthesis

Non-Autoregressive Neural Text-to-Speech

2 code implementations ICML 2020 Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

Text-To-Speech Synthesis

Cancer Metastasis Detection With Neural Conditional Random Field

1 code implementation 19 Jun 2018 Yi Li, Wei Ping

Compared to the baseline method without considering spatial correlations, we show that the proposed NCRF framework obtains probability maps of patch predictions with better visual quality.

Cancer Metastasis Detection, whole slide images

Neural Voice Cloning with a Few Samples

2 code implementations NeurIPS 2018 Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.

Speech Synthesis, Voice Cloning

HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models

no code implementations ICLR 2018 Yanqi Zhou, Wei Ping, Sercan Arik, Kainan Peng, Greg Diamos

This paper introduces HybridNet, a hybrid neural network to speed-up autoregressive models for raw audio waveform generation.

Speech Synthesis

Topic Compositional Neural Language Model

no code implementations 28 Dec 2017 Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence.

Language Modelling

Learning Infinite RBMs with Frank-Wolfe

no code implementations NeurIPS 2016 Wei Ping, Qiang Liu, Alexander Ihler

In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization.

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

1 code implementation NeurIPS 2017 Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1, but is constructed with higher-performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1.

Speech Synthesis

Belief Propagation in Conditional RBMs for Structured Prediction

no code implementations 2 Mar 2017 Wei Ping, Alexander Ihler

We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.

Structured Prediction

Decomposition Bounds for Marginal MAP

no code implementations NeurIPS 2015 Wei Ping, Qiang Liu, Alexander Ihler

Marginal MAP inference involves making MAP predictions in systems defined with latent variables or missing information.

Marginal Structured SVM with Hidden Variables

no code implementations 4 Sep 2014 Wei Ping, Qiang Liu, Alexander Ihler

In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables.

Structured Prediction
