Search Results for author: Yuanzhi Li

Found 52 papers, 7 papers with code

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

no code implementations 25 Aug 2021 Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired from image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization.

Image Classification

LoRA: Low-Rank Adaptation of Large Language Models

1 code implementation 17 Jun 2021 Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Weizhu Chen

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling
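
The low-rank update described in the LoRA summary above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the class name `LoRALinear`, the helper `matmul`, and the toy dimensions are hypothetical. The zero initialization of `B` and the `alpha / rank` scaling are assumptions not stated in the summary.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


class LoRALinear:
    """A frozen weight matrix W plus a trainable low-rank update B @ A (illustrative)."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pre-trained weights: kept frozen, never updated
        # Trainable rank-decomposition factors. Assumption: A starts as small
        # Gaussian noise and B as zeros, so the update B @ A is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank  # assumed scaling convention

    def effective_weight(self):
        """Return W + scaling * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.W, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to an input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def num_trainable(self):
        """Count the entries of A and B only; W contributes none."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With a 4 x 6 weight matrix and rank 2, the layer has 20 trainable entries instead of 24. The saving grows quickly when the rank is much smaller than the layer dimensions.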

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

no code implementations 15 Jun 2021 Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces.

Atari Games

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions

no code implementations 4 Jun 2021 Zeyuan Allen-Zhu, Yuanzhi Li

Generative adversarial networks (GANs) are among the most successful models for learning high-complexity, real-world distributions.


Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

no code implementations 31 May 2021 Zixin Wen, Yuanzhi Li

We present an underlying principle called $\textbf{feature decoupling}$ to explain the effects of augmentations, where we theoretically characterize how augmentations can reduce the correlations of dense features between positive samples while keeping the correlations of sparse features intact, thereby forcing the neural networks to learn from the self-supervision of sparse features.

Contrastive Learning, Self-Supervised Learning

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

1 code implementation ICLR 2021 Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar

We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability.

When Is Generalizable Reinforcement Learning Tractable?

no code implementations 1 Jan 2021 Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar

Agents trained by reinforcement learning (RL) often fail to generalize beyond the environment they were trained in, even when presented with new scenarios that seem similar to the training environment.

Representation Learning

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

no code implementations 17 Dec 2020 Zeyuan Allen-Zhu, Yuanzhi Li

Our result sheds light on how ensembling works in deep learning in a way that is completely different from traditional theorems, and on how the "dark knowledge" that can be used in knowledge distillation is hidden in the outputs of the ensemble, as compared to the true data labels.

Knowledge Distillation, Learning Theory

A law of robustness for two-layers neural networks

no code implementations 30 Sep 2020 Sébastien Bubeck, Yuanzhi Li, Dheeraj Nagaraj

We make a precise conjecture that, for any Lipschitz activation function and for most datasets, any two-layer neural network with $k$ neurons that perfectly fits the data must have a Lipschitz constant larger (up to a constant) than $\sqrt{n/k}$, where $n$ is the number of datapoints.

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

no code implementations 9 Jul 2020 Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang

We consider the dynamics of gradient descent for learning a two-layer neural network.

Feature Purification: How Adversarial Training Performs Robust Deep Learning

no code implementations 20 May 2020 Zeyuan Allen-Zhu, Yuanzhi Li

Finally, we also prove a complexity lower bound, showing that low complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network, CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.

Making Method of Moments Great Again? -- How can GANs learn distributions

no code implementations 9 Mar 2020 Yuanzhi Li, Zehao Dou

In GANs, the training of the generator usually stops when the discriminator can no longer distinguish the generator's output from the set of training examples.

Backward Feature Correction: How Deep Learning Performs Deep Learning

no code implementations 13 Jan 2020 Zeyuan Allen-Zhu, Yuanzhi Li

On the technical side, we show for regression and even binary classification, for every input dimension $d>0$, there is a concept class of degree $\omega(1)$ polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time and sample complexity to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions.

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

2 code implementations NeurIPS 2019 Yuanzhi Li, Colin Wei, Tengyu Ma

This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing.

Complexity of Highly Parallel Non-Smooth Convex Optimization

no code implementations NeurIPS 2019 Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford

Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel.

What Can ResNet Learn Efficiently, Going Beyond Kernels?

no code implementations NeurIPS 2019 Zeyuan Allen-Zhu, Yuanzhi Li

Recently, an influential line of work has related neural networks to kernels in the over-parameterized regime, proving that they can learn certain concept classes that are also learnable by kernels with similar test error.

One-Shot Learning

Improved Path-length Regret Bounds for Bandits

no code implementations 29 Jan 2019 Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, Chen-Yu Wei

We study adaptive regret bounds in terms of the variation of the losses (the so-called path-length bounds) for both multi-armed bandit and more generally linear bandit.

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

no code implementations NeurIPS 2019 Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang

In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations.

Learning Theory

A Convergence Theory for Deep Learning via Over-Parameterization

no code implementations 9 Nov 2018 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).

On the Convergence Rate of Training Recurrent Neural Networks

no code implementations NeurIPS 2019 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

In this paper, we focus on recurrent neural networks (RNNs) which are multi-layer networks widely used in natural language processing.

The Well-Tempered Lasso

no code implementations ICML 2018 Yuanzhi Li, Yoram Singer

Every regression parameter in the Lasso changes linearly as a function of the regularization value.

The Well Tempered Lasso

no code implementations 8 Jun 2018 Yuanzhi Li, Yoram Singer

Every regression parameter in the Lasso changes linearly as a function of the regularization value.

Online Improper Learning with an Approximation Oracle

no code implementations NeurIPS 2018 Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li

We revisit the question of reducing online learning to approximate optimization of the offline problem.

Learning Mixtures of Linear Regressions with Nearly Optimal Complexity

no code implementations 22 Feb 2018 Yuanzhi Li, Yingyu Liang

Mixtures of Linear Regressions (MLR) is an important mixture model with many applications.

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

no code implementations ICML 2018 Zeyuan Allen-Zhu, Sébastien Bubeck, Yuanzhi Li

Regret bounds in online learning compare the player's performance to $L^*$, the optimal performance in hindsight with a fixed strategy.

Multi-Armed Bandits

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

no code implementations 26 Dec 2017 Yuanzhi Li, Tengyu Ma, Hongyang Zhang

We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

Neon2: Finding Local Minima via First-Order Oracles

no code implementations NeurIPS 2018 Zeyuan Allen-Zhu, Yuanzhi Li

We propose a reduction for non-convex optimization that can (1) turn a stationary-point finding algorithm into a local-minimum finding one, and (2) replace the Hessian-vector product computations with only gradient computations.

Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach

no code implementations 14 Nov 2017 Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, Yining Wang

The experimental design problem concerns the selection of k points from a potentially large design pool of p-dimensional vectors, so as to maximize the statistical efficiency regressed on the selected k design points.

Sparsity, variance and curvature in multi-armed bandits

no code implementations 3 Nov 2017 Sébastien Bubeck, Michael B. Cohen, Yuanzhi Li

In (online) learning theory the concepts of sparsity, variance and curvature are well-understood and are routinely used to obtain refined regret and generalization bounds.

Generalization Bounds, Learning Theory

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls

no code implementations NeurIPS 2017 Zeyuan Allen-Zhu, Elad Hazan, Wei Hu, Yuanzhi Li

We propose a rank-$k$ variant of the classical Frank-Wolfe algorithm to solve convex optimization over a trace-norm ball.

Near-Optimal Design of Experiments via Regret Minimization

no code implementations ICML 2017 Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, Yining Wang

We consider computationally tractable methods for the experimental design problem, where k out of n design points of dimension p are selected so that certain optimality criteria are approximately satisfied.

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model

no code implementations 25 Jul 2017 Xi Chen, Yuanzhi Li, Jieming Mao

We study the active learning problem of top-$k$ ranking from multi-wise comparisons under the popular multinomial logit model.

Active Learning

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

1 code implementation ICML 2017 Yuanzhi Li, Yingyu Liang

Non-negative matrix factorization is a basic tool for decomposing data into the feature and weight matrices under non-negativity constraints, and in practice is often solved in the alternating minimization framework.

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

no code implementations NeurIPS 2017 Yuanzhi Li, Yang Yuan

We also show that the identity mapping is necessary for convergence, as it moves the initial point to a better place for optimization.

Algorithms and matching lower bounds for approximately-convex optimization

no code implementations NeurIPS 2016 Andrej Risteski, Yuanzhi Li

In recent years, a rapidly increasing number of applications in practice requires solving non-convex objectives, like training neural networks, learning graphical models, maximum likelihood estimation etc.

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates

no code implementations NeurIPS 2016 Yuanzhi Li, Yingyu Liang, Andrej Risteski

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints.

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation

no code implementations ICML 2017 Zeyuan Allen-Zhu, Yuanzhi Li

We solve principal component regression (PCR), up to a multiplicative accuracy $1+\gamma$, by reducing the problem to $\tilde{O}(\gamma^{-1})$ black-box calls of ridge regression.

First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate

no code implementations 26 Jul 2016 Zeyuan Allen-Zhu, Yuanzhi Li

We provide $\textit{global}$ convergence for Oja's algorithm which is popularly used in practice but lacks theoretical understanding for $k>1$.

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition

no code implementations ICML 2017 Zeyuan Allen-Zhu, Yuanzhi Li

We study $k$-GenEV, the problem of finding the top $k$ generalized eigenvectors, and $k$-CCA, the problem of finding the top $k$ vectors in canonical-correlation analysis.

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain

no code implementations NeurIPS 2016 Zeyuan Allen-Zhu, Yuanzhi Li

In the $O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$ running-time regime, LazySVD outperforms [3] in certain parameter regimes without even using alternating minimization.

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods

no code implementations NeurIPS 2016 Yuanzhi Li, Andrej Risteski

The well known maximum-entropy principle due to Jaynes, which states that given mean parameters, the maximum entropy distribution matching them is in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation.

An optimal algorithm for bandit convex optimization

no code implementations 14 Mar 2016 Elad Hazan, Yuanzhi Li

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization.

Recovery guarantee of weighted low-rank approximation via alternating minimization

no code implementations 6 Feb 2016 Yuanzhi Li, Yingyu Liang, Andrej Risteski

We show that the properties only need to hold in an average sense and can be achieved by the clipping step.

Matrix Completion

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

1 code implementation TACL 2018 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

Information Retrieval, Word Embeddings

A Latent Variable Model Approach to PMI-based Word Embeddings

4 code implementations TACL 2016 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.

Word Embeddings

A Theoretical Analysis of NDCG Type Ranking Measures

no code implementations 24 Apr 2013 Yining Wang, Li-Wei Wang, Yuanzhi Li, Di He, Tie-Yan Liu, Wei Chen

We show that NDCG with logarithmic discount has consistent distinguishability although it converges to the same limit for all ranking functions.
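
For readers unfamiliar with the measure, NDCG with a logarithmic discount can be computed as in the sketch below. It assumes the common $1/\log_2(\text{position} + 1)$ discount and a linear gain (the relevance score itself); the paper may use a different gain function, and the function names here are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a 1 / log2(position + 1) discount."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Convention assumed here: a list with no relevant items scores 0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking that already lists items in decreasing order of relevance scores exactly 1, and any other ordering of distinct relevance scores gives a value below 1.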
