1 code implementation • 2 Mar 2023 • Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao
In this paper, we propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models.
no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar
Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.
no code implementations • 25 Oct 2022 • Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro
For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.
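For reference, here is a minimal PyTorch sketch of the Houlsby-style adapter being compared; the module names and bottleneck width are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (Houlsby et al., 2019): a small residual MLP
    inserted into a frozen pretrained transformer layer."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the pretrained representation
        # intact at initialization; only the adapter weights are trained.
        return x + self.up(self.act(self.down(x)))
```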
2 code implementations • 9 Jun 2022 • Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.
2 code implementations • 9 Jun 2022 • Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro
In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.
1 code implementation • Findings (ACL) 2022 • Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro
We propose a multi-stage prompting approach to generate knowledgeable responses from a single pretrained LM.
1 code implementation • 15 Feb 2022 • Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro
In this work, we present CleanUNet, a causal speech denoising model that operates on the raw waveform.
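Causality here means each output sample may depend only on current and past input samples, which a raw-waveform model can enforce with left-only convolution padding. A minimal sketch (the channel counts and kernel size are assumptions, not CleanUNet's actual architecture):

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past/present samples,
    as required for streaming (causal) speech denoising."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left padding only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); padding on the left ensures
        # output[t] never depends on input[t'] with t' > t.
        return self.conv(F.pad(x, (self.pad, 0)))
```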
1 code implementation • 8 Feb 2022 • Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro
In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models.
3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.
3 code implementations • NeurIPS 2021 • Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
For instance, Transformer-LS achieves 0.97 test BPC on enwik8 using half as many parameters as the previous method, while being faster and able to handle sequences 3x as long as its full-attention counterpart on the same hardware.
Ranked #1 on Language Modelling on enwik8 (dev)
1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.
1 code implementation • ICML Workshop INNF 2021 • Zhifeng Kong, Wei Ping
In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models.
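For intuition, fast samplers of this family run the reverse diffusion on a short subsequence of the original steps. The strided DDIM-style sketch below illustrates that idea; it assumes a noise-prediction network with signature `model(x, t)` and is not FastDPM's exact mapping between schedules.

```python
import torch

@torch.no_grad()
def fast_sample(model, alphas_cumprod, shape, num_steps=10):
    """DDIM-style deterministic sampling on a strided subset of the
    original T diffusion steps (illustrative sketch)."""
    T = len(alphas_cumprod)
    steps = torch.linspace(T - 1, 0, num_steps).long()  # e.g. 10 of 1000
    x = torch.randn(shape)
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[steps[i + 1]] if i + 1 < len(steps) else torch.tensor(1.0)
        eps = model(x, t)                                # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # implied clean signal
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # jump to next step
    return x
```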
2 code implementations • ACL 2021 • Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro
We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models.
1 code implementation • 20 Oct 2020 • Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.
11 code implementations • ICLR 2021 • Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.
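As background, diffusion models of this kind are typically trained with the standard noise-prediction (DDPM) objective. A minimal sketch follows; the `model(xt, t, conditioner)` signature and mel-spectrogram conditioning are assumptions for illustration, not DiffWave's exact interface.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, alphas_cumprod, conditioner=None):
    """Standard denoising-diffusion training step: corrupt clean audio x0
    with noise at a random timestep and train the network to predict
    that noise (illustrative sketch of the DDPM objective)."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward (noising) process
    pred = model(xt, t, conditioner)              # e.g. mel-spectrogram conditioning
    return F.mse_loss(pred, noise)
```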
no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
5 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
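The structural idea is to squeeze the 1-D waveform into a 2-D array of height h and be autoregressive only over that height: roughly, growing h toward the waveform length recovers a fully sequential WaveNet-like factorization, while shrinking it removes sequential dependencies as in parallel flows such as WaveGlow. A toy sketch of the squeeze (the paper's exact sample-to-row grouping may differ):

```python
import torch

# Illustrative sketch: a length-n waveform is reshaped to an h x (n/h)
# matrix, and the flow is autoregressive only over the h rows, so
# sampling takes h sequential steps instead of n.
n, h = 16000, 16
waveform = torch.randn(n)
squeezed = waveform.reshape(n // h, h).t()   # shape (h, n // h)
assert squeezed.shape == (h, n // h)
```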
no code implementations • 9 Jul 2019 • Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers.
2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • EMNLP 2018 • Jiaji Huang, Yi Li, Wei Ping, Liang Huang
We propose a large margin criterion for training neural language models.
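One simple instantiation of a large-margin criterion for language models is a sentence-level hinge loss that ranks the reference sentence above a negative candidate by a fixed margin. The sketch below is illustrative; the paper's construction of negatives and margins may differ.

```python
import torch

def large_margin_lm_loss(logp_gold, logp_neg, margin=1.0):
    """Illustrative sentence-level margin loss: push the model's
    log-likelihood of the reference sentence above that of a negative
    (e.g., corrupted or competing) sentence by at least `margin`."""
    # logp_gold, logp_neg: (batch,) total log-probabilities per sentence
    return torch.clamp(margin - (logp_gold - logp_neg), min=0.0).mean()
```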
4 code implementations • ICLR 2019 • Wei Ping, Kainan Peng, Jitong Chen
In this work, we propose a new solution for parallel wave generation by WaveNet.
1 code implementation • 19 Jun 2018 • Yi Li, Wei Ping
Compared to a baseline that does not consider spatial correlations, we show that the proposed NCRF framework obtains probability maps of patch predictions with better visual quality.
2 code implementations • NeurIPS 2018 • Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
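A common variant of this strategy freezes the base model and fine-tunes only the new speaker's embedding on the cloning samples. The sketch below illustrates that variant; `model.reconstruction_loss` is a hypothetical interface, not the paper's API.

```python
import torch

def adapt_speaker(model, speaker_embedding, cloning_batches, lr=1e-4):
    """Few-shot speaker adaptation sketch: freeze the multi-speaker
    generative model and fine-tune only the new speaker's embedding."""
    for p in model.parameters():
        p.requires_grad_(False)            # keep the base model fixed
    speaker_embedding.requires_grad_(True)
    opt = torch.optim.Adam([speaker_embedding], lr=lr)
    for text, audio in cloning_batches:    # a handful of cloning samples
        loss = model.reconstruction_loss(text, audio, speaker_embedding)
        opt.zero_grad()
        loss.backward()
        opt.step()
```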
no code implementations • ICLR 2018 • Yanqi Zhou, Wei Ping, Sercan Arik, Kainan Peng, Greg Diamos
This paper introduces HybridNet, a hybrid neural network that speeds up autoregressive models for raw audio waveform generation.
no code implementations • 28 Dec 2017 • Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin
The TCNLM learns the global semantic coherence of a document via a neural topic model. The probability of each learned latent topic is then used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that learns the local structure of the word sequence.
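A minimal sketch of the MoE composition described above, assuming mixing is done at the level of each expert's next-word distribution (the paper's exact parameter sharing may differ):

```python
import torch
import torch.nn.functional as F

def moe_next_word_logits(topic_probs, expert_logits):
    """Each expert (one per latent topic) proposes next-word logits,
    and the document's topic probabilities weight the experts."""
    # topic_probs: (batch, K); expert_logits: (batch, K, vocab)
    expert_probs = F.softmax(expert_logits, dim=-1)
    mixed = torch.einsum('bk,bkv->bv', topic_probs, expert_probs)
    return mixed.clamp_min(1e-12).log()   # mixture log-probabilities
```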
7 code implementations • ICLR 2018 • Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.
no code implementations • NeurIPS 2016 • Wei Ping, Qiang Liu, Alexander Ihler
In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization.
1 code implementation • NeurIPS 2017 • Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1 but is constructed with higher-performance building blocks and demonstrates a significant improvement in audio quality over Deep Voice 1.
no code implementations • 2 Mar 2017 • Wei Ping, Alexander Ihler
We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.
no code implementations • NeurIPS 2015 • Wei Ping, Qiang Liu, Alexander Ihler
Marginal MAP inference involves making MAP predictions in systems defined with latent variables or missing information.
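Concretely, with the variables split into a max set and a sum (latent) set, marginal MAP seeks

$$\hat{x}_{\max} = \arg\max_{x_{\max}} \sum_{x_{\text{sum}}} p(x_{\max}, x_{\text{sum}}),$$

which interleaves maximization and summation and is generally harder than either pure MAP or pure marginalization.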
no code implementations • 4 Sep 2014 • Wei Ping, Qiang Liu, Alexander Ihler
In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables.
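In rough form (notation assumed here for illustration, not copied from the paper), the MSSVM marginalizes rather than maximizes over the hidden variables $h$ inside a structured hinge loss:

$$\min_{w}\ \frac{1}{2}\|w\|^2 + C\sum_i\Big[\max_{y}\Big(\Delta(y_i,y) + \log\sum_{h}\exp\big(w^\top\phi(x_i,y,h)\big)\Big) - \log\sum_{h}\exp\big(w^\top\phi(x_i,y_i,h)\big)\Big]$$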