Search Results for author: Yuan Cao

Found 57 papers, 10 papers with code

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

no code implementations 9 Jan 2022 Aditya Siddhant, Ankur Bapna, Orhan Firat, Yuan Cao, Mia Xu Chen, Isaac Caswell, Xavier Garcia

While recent progress in massively multilingual MT is one step closer to reaching this goal, it is becoming evident that extending a multilingual MT system simply by training on more parallel data is unscalable, since the availability of labeled data for low-resource and non-English-centric language pairs is prohibitively limited.

Machine Translation, Self-Supervised Learning +1

Benign Overfitting in Adversarially Robust Linear Classification

no code implementations 31 Dec 2021 Jinghui Chen, Yuan Cao, Quanquan Gu

Our result suggests that under moderate perturbations, adversarially trained linear classifiers can achieve near-optimal standard and adversarial risks, despite overfitting the noisy training data.

Understanding How Encoder-Decoder Architectures Attend

no code implementations NeurIPS 2021 Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is not yet well understood.

SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

no code implementations 13 Oct 2021 Harrison Lee, Raghav Gupta, Abhinav Rastogi, Yuan Cao, Bin Zhang, Yonghui Wu

We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations, measured by joint goal accuracy and a novel metric for measuring schema sensitivity.

Data Augmentation, Dialogue State Tracking

Efficient and Private Federated Learning with Partially Trainable Networks

no code implementations 6 Oct 2021 Hakim Sidahmed, Zheng Xu, Ankush Garg, Yuan Cao, Mingqing Chen

Through extensive experiments, we empirically show that Federated learning of Partially Trainable neural networks (FedPT) can result in superior communication-accuracy trade-offs, with up to a $46\times$ reduction in communication cost at a small cost in accuracy.

Federated Learning
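As a rough illustration of the partially trainable idea in the FedPT entry above, the following PyTorch sketch freezes one randomly initialized block of a toy model and trains (and would communicate) only the remaining parameters; the model, the choice of frozen block, and the communication accounting are assumptions for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy model: freeze some layers at their random init (partially trainable),
# train and communicate only the rest. Illustrative only.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Assumption: freeze the middle block; the paper explores different splits.
for p in model[2].parameters():
    p.requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
total_params = sum(p.numel() for p in model.parameters())
sent_params = sum(p.numel() for p in trainable)
print(f"communicated fraction per round: {sent_params / total_params:.2f}")

# A local training step touches only the trainable parameters.
opt = torch.optim.SGD(trainable, lr=0.1)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```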

Towards Zero-Label Language Learning

no code implementations 19 Sep 2021 Zirui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao

This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data.

Data Augmentation

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

no code implementations 25 Aug 2021 Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired by image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization.

Image Classification

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

no code implementations 24 Aug 2021 Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.

Image Captioning, Language Modelling +2

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

no code implementations NeurIPS 2021 Yuan Cao, Quanquan Gu, Mikhail Belkin

In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems.

General Classification

Single-photon imaging over 200 km

no code implementations 10 Mar 2021 Zheng-Ping Li, Jun-Tian Ye, Xin Huang, Peng-Yu Jiang, Yuan Cao, Yu Hong, Chao Yu, Jun Zhang, Qiang Zhang, Cheng-Zhi Peng, Feihu Xu, Jian-Wei Pan

Long-range active imaging has widespread applications in remote sensing and target recognition.

Echo State Speech Recognition

no code implementations 18 Feb 2021 Harsh Shrivastava, Ankush Garg, Yuan Cao, Yu Zhang, Tara Sainath

We propose automatic speech recognition (ASR) models inspired by echo state network (ESN), in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained.

Speech Recognition
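As a rough sketch of the echo-state idea in the entry above: a PyTorch LSTM encoder is randomly initialized and left untrained while only a small readout layer is optimized; the layer sizes, data shapes, and frame-level loss are placeholder assumptions rather than the paper's ASR architecture.

```python
import torch
import torch.nn as nn

# Echo-state-style encoder: recurrent layers are random and frozen,
# only the output projection is trained. Sizes are illustrative.
rnn = nn.LSTM(input_size=80, hidden_size=256, num_layers=2, batch_first=True)
for p in rnn.parameters():
    p.requires_grad_(False)          # keep random init, never update

proj = nn.Linear(256, 40)            # trainable readout over output tokens

opt = torch.optim.Adam(proj.parameters(), lr=1e-3)
feats = torch.randn(8, 100, 80)       # (batch, frames, features) placeholder
targets = torch.randint(0, 40, (8, 100))

hidden, _ = rnn(feats)                # frozen "reservoir" features
logits = proj(hidden)
loss = nn.functional.cross_entropy(logits.reshape(-1, 40), targets.reshape(-1))
loss.backward()                        # gradients flow only into `proj`
opt.step()
```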

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

1 code implementation 4 Jan 2021 Spencer Frei, Yuan Cao, Quanquan Gu

We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization.

Towards NNGP-guided Neural Architecture Search

1 code implementation 11 Nov 2020 Daniel S. Park, Jaehoon Lee, Daiyi Peng, Yuan Cao, Jascha Sohl-Dickstein

Since NNGP inference provides a cheap measure of performance of a network architecture, we investigate its potential as a signal for neural architecture search (NAS).

Neural Architecture Search

The geometry of integration in text classification RNNs

1 code implementation ICLR 2021 Kyle Aitken, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks.

General Classification, Text Classification

Rapid Domain Adaptation for Machine Translation with Monolingual Data

no code implementations 23 Oct 2020 Mahdis Mahdieh, Mia Xu Chen, Yuan Cao, Orhan Firat

In this paper, we propose an approach that enables rapid domain adaptation from the perspective of unsupervised translation.

Domain Adaptation, Machine Translation +1

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

1 code implementation 21 Oct 2020 Jiaming Luo, Frederik Hartmann, Enrico Santus, Yuan Cao, Regina Barzilay

We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian).

Decipherment

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

no code implementations 1 Oct 2020 Spencer Frei, Yuan Cao, Quanquan Gu

We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces.

General Classification

Agnostic Learning of a Single Neuron with Gradient Descent

no code implementations NeurIPS 2020 Spencer Frei, Yuan Cao, Quanquan Gu

In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT})+\epsilon$ in polynomial time and sample complexity when $\sigma$ is strictly increasing.
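To make the setting above concrete, here is a toy NumPy sketch of gradient descent on a single neuron with a strictly increasing (sigmoid) activation under squared loss; the data model, loss, step size, and iteration count are illustrative assumptions, not the paper's exact agnostic PAC setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy agnostic data: labels are a noisy, imperfect function of x,
# so no weight vector fits them exactly (OPT > 0).
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d) / np.sqrt(d)
y = np.clip(sigmoid(X @ w_star) + 0.1 * rng.normal(size=n), 0.0, 1.0)

# Gradient descent on the empirical squared loss of a single sigmoid neuron.
w = np.zeros(d)
lr = 0.5
for _ in range(500):
    pred = sigmoid(X @ w)
    grad = (2.0 / n) * X.T @ ((pred - y) * pred * (1 - pred))
    w -= lr * grad

print("final empirical risk:", np.mean((sigmoid(X @ w) - y) ** 2))
```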

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

3 code implementations NeurIPS 2020 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score.

Image Generation
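A rough sketch of the latent-space sampling described in the entry above, assuming a standard Gaussian latent prior so the induced energy is E(z) = ||z||^2/2 - D(G(z)), here sampled with plain Langevin dynamics; the generator, discriminator, step size, and number of steps are placeholders (in practice G and D come from a trained GAN).

```python
import torch
import torch.nn as nn

# Placeholder GAN pieces; in practice G and D come from a trained GAN.
latent_dim = 16
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def energy(z):
    # -log p(z) - D(G(z)) up to constants, with a standard normal prior.
    return 0.5 * (z ** 2).sum(dim=1) - D(G(z)).squeeze(1)

def langevin_sample(n_samples=64, n_steps=100, step=0.01):
    z = torch.randn(n_samples, latent_dim)
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(z).sum(), z)[0]
        z = z - 0.5 * step * grad + (step ** 0.5) * torch.randn_like(z)
    return G(z.detach())

samples = langevin_sample()
print(samples.shape)  # (64, 2) toy samples pushed through the generator
```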

Echo State Neural Machine Translation

no code implementations 27 Feb 2020 Ankush Garg, Yuan Cao, Qi Ge

We present neural machine translation (NMT) models inspired by echo state networks (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated and then fixed throughout training.

Machine Translation, Translation

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

no code implementations NeurIPS 2020 Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.

Learning Theory

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

no code implementations 6 Feb 2020 Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu

This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model.

Speech Synthesis

Towards Understanding the Spectral Bias of Deep Learning

no code implementations 3 Dec 2019 Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu

An intriguing phenomenon observed during training neural networks is the spectral bias, which states that neural networks are biased towards learning less complex functions.

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

no code implementations ICLR 2021 Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high degree polynomial of the training sample size $n$ and the inverse of the target error $\epsilon^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees.

Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks

no code implementations NeurIPS 2019 Yuan Cao, Quanquan Gu

We study the sample complexity of learning one-hidden-layer convolutional neural networks (CNNs) with non-overlapping filters.

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

no code implementations NeurIPS 2019 Spencer Frei, Yuan Cao, Quanquan Gu

The skip-connections used in residual networks have become a standard architecture choice in deep learning due to their improved training stability and generalization performance, although there has been limited theoretical understanding of this improvement.

Generalization Bounds

Training Deep Neural Networks with Partially Adaptive Momentum

no code implementations 25 Sep 2019 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.

Precipitation Nowcasting with Star-Bridge Networks

no code implementations 18 Jul 2019 Yuan Cao, Qiuying Li, Hongming Shan, Zhizhong Huang, Lei Chen, Leiming Ma, Junping Zhang

Precipitation nowcasting, which aims to precisely predict the short-term rainfall intensity of a local region, is gaining increasing attention in the artificial intelligence community.

Video Prediction

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B

1 code implementation ACL 2019 Jiaming Luo, Yuan Cao, Regina Barzilay

In this paper we propose a novel neural approach for automatic decipherment of lost languages.

Decipherment

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

no code implementations NeurIPS 2019 Yuan Cao, Quanquan Gu

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points.

Generalization Bounds

Gmail Smart Compose: Real-Time Assisted Writing

no code implementations 17 May 2019 Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, Yonghui Wu

In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing emails by reducing repetitive typing.

Language Modelling, Model Selection

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

3 code implementations 21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-To-Sequence, Speech Recognition

Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks

no code implementations 4 Feb 2019 Yuan Cao, Quanquan Gu

However, existing generalization error bounds are unable to explain the good generalization performance of over-parameterized DNNs.

Generalization Bounds

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

no code implementations 21 Nov 2018 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data.

Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation

no code implementations 5 Nov 2018 Ye Jia, Melvin Johnson, Wolfgang Macherey, Ron J. Weiss, Yuan Cao, Chung-Cheng Chiu, Naveen Ari, Stella Laurenzo, Yonghui Wu

In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning.

Machine Translation, Multi-Task Learning +3

Hierarchical Generative Modeling for Controllable Speech Synthesis

2 code implementations ICLR 2019 Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang

This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions.

Speech Synthesis

Towards Decomposed Linguistic Representation with Holographic Reduced Representation

no code implementations 27 Sep 2018 Jiaming Luo, Yuan Cao, Yonghui Wu

The vast majority of neural models in Natural Language Processing adopt a form of structureless distributed representations.

High-Temperature Structure Detection in Ferromagnets

no code implementations 21 Sep 2018 Yuan Cao, Matey Neykov, Han Liu

The goal is to distinguish whether the underlying graph is empty, i.e., the model consists of independent Rademacher variables, versus the alternative that the underlying graph contains a subgraph of a certain structure.

Training Deeper Neural Machine Translation Models with Transparent Attention

no code implementations EMNLP 2018 Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, Yonghui Wu

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications.

Machine Translation, Translation

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

no code implementations 16 Aug 2018 Dongruo Zhou, Jinghui Chen, Yuan Cao, Yiqi Tang, Ziyan Yang, Quanquan Gu

In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad.

The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference

no code implementations ICML 2018 Hao Lu, Yuan Cao, Zhuoran Yang, Junwei Lu, Han Liu, Zhaoran Wang

We study the hypothesis testing problem of inferring the existence of combinatorial structures in undirected graphical models.

Two-sample testing

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

5 code implementations 18 Jun 2018 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks.
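A minimal NumPy sketch of a partially adaptive update in the spirit of the paper above, where an exponent p between 0 and 1/2 on the second-moment term interpolates between SGD with momentum and Adam/AMSGrad; the hyperparameters, the AMSGrad-style max, and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np

def padam_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999,
               p=0.125, eps=1e-8):
    """One partially adaptive step: p near 0 behaves like SGD with momentum,
    p = 1/2 recovers an AMSGrad-like update (sketch only)."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    state["v_hat"] = np.maximum(state["v_hat"], state["v"])  # AMSGrad-style max
    return theta - lr * state["m"] / (state["v_hat"] + eps) ** p

# Toy quadratic objective f(theta) = 0.5 * ||A theta - b||^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
theta = np.zeros(5)
state = {"m": np.zeros(5), "v": np.zeros(5), "v_hat": np.zeros(5)}

for _ in range(2000):
    grad = A.T @ (A @ theta - b)
    theta = padam_step(theta, grad, state)

print("final loss:", 0.5 * np.sum((A @ theta - b) ** 2))
```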

Training Conditional Random Fields with Natural Gradient Descent

no code implementations 10 Aug 2015 Yuan Cao

We propose a novel parameter estimation procedure that works efficiently for conditional random fields (CRFs).

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

no code implementations 9 Feb 2015 Quanquan Gu, Yuan Cao, Yang Ning, Han Liu

Due to the presence of unknown marginal transformations, we propose a pseudo likelihood based inferential approach.
