no code implementations • 14 Mar 2023 • Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora
We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.
no code implementations • 24 Feb 2023 • Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge
Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective and we prove that minimizing this new objective can recover the ground-truth dictionary.
no code implementations • 1 Feb 2023 • Mo Zhou, Rong Ge
In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators.
no code implementations • 24 Oct 2022 • Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge
Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels.
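For concreteness, here is a minimal NumPy sketch of the mixup combination described above; the Beta parameter, the toy arrays, and the function name are illustrative choices, not the exact training setup from the paper.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng(0)):
    """Form random convex combinations of examples and of their one-hot labels."""
    lam = rng.beta(alpha, alpha)                 # mixing coefficient lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))               # random pairing of examples within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

# toy usage: 4 examples with 3 features, 2 classes
x = np.arange(12, dtype=float).reshape(4, 3)
y = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix = mixup_batch(x, y)
```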
no code implementations • 7 Oct 2022 • Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge
Globally we observe that the training dynamics for our example has an interesting bifurcating behavior, which was also observed in the training of neural nets.
no code implementations • 3 Oct 2022 • Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge
Monotonic linear interpolation (MLI), the phenomenon that the loss and accuracy are monotonic along the line connecting a random initialization with the minimizer it converges to, is commonly observed in the training of neural networks.
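As an illustration of how the MLI property can be checked empirically, here is a minimal sketch that trains a tiny logistic-regression model and evaluates the loss along the segment from the random initialization to the minimizer it converges to; the model and constants are toy choices, not the networks studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # logistic predictions
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

w_init = rng.normal(size=5)                      # random initialization
w = w_init.copy()
for _ in range(500):                             # plain gradient descent to (near) a minimizer
    w -= 0.5 * grad(w)

# losses along the line connecting the initialization with the minimizer;
# MLI holds on this run if the list is (approximately) non-increasing
alphas = np.linspace(0.0, 1.0, 11)
line_losses = [loss((1 - a) * w_init + a * w) for a in alphas]
```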
no code implementations • ICML 2020 • Keerti Anand, Rong Ge, Debmalya Panigrahi
A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances.
no code implementations • NeurIPS 2021 • Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi
The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms.
no code implementations • 8 May 2022 • Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi
In this paper, we give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best predictor.
no code implementations • 2 Feb 2022 • Zeping Luo, Shiyou Wu, Cindy Weng, Mo Zhou, Rong Ge
Self-supervised learning has significantly improved the performance of many NLP tasks.
1 code implementation • ICLR 2022 • Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge
Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training.
no code implementations • 29 Sep 2021 • Zeping Luo, Cindy Weng, Shiyou Wu, Mo Zhou, Rong Ge
Self-supervised learning has significantly improved the performance of many NLP tasks.
1 code implementation • 23 Sep 2021 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.
no code implementations • NeurIPS 2021 • Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou
In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.
no code implementations • 4 Feb 2021 • Mo Zhou, Rong Ge, Chi Jin
We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0.
no code implementations • NeurIPS 2020 • Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge
We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.
no code implementations • 8 Oct 2020 • Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge
We can analyze the properties of these smaller matrices and prove the structure of the top eigenspace for random 2-layer networks.
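To make "top eigenspace" concrete, the sketch below numerically builds the loss Hessian of a tiny random 2-layer network (via finite differences, with a smooth tanh activation) and extracts its leading eigenvectors; this is an illustrative computation only, not the decomposition or the proof technique used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 4, 3, 50                                  # input dim, hidden width, sample count
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def loss(theta):
    W = theta[:h * d].reshape(h, d)                 # first-layer weights
    a = theta[h * d:]                               # second-layer weights
    hidden = np.tanh(X @ W.T)                       # smooth activation for clean derivatives
    return 0.5 * np.mean((hidden @ a - y) ** 2)

theta = rng.normal(size=h * d + h)
p, eps = len(theta), 1e-3

# full Hessian by central finite differences (fine for a network this small)
H = np.zeros((p, p))
for i in range(p):
    e_i = np.zeros(p); e_i[i] = eps
    for j in range(p):
        e_j = np.zeros(p); e_j[j] = eps
        H[i, j] = (loss(theta + e_i + e_j) - loss(theta + e_i - e_j)
                   - loss(theta - e_i + e_j) + loss(theta - e_i - e_j)) / (4 * eps ** 2)

eigvals, eigvecs = np.linalg.eigh((H + H.T) / 2)    # eigenvalues in ascending order
top_eigenspace = eigvecs[:, -3:]                    # basis for the top eigenspace
```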
no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski
We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
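The paper's contribution is an exact sampler; the sketch below is not that algorithm, only a generic random-walk Metropolis baseline on the sphere targeting the same unnormalized density $\exp(x^\top A x)$, included to make the target distribution concrete. The toy matrix and step size are illustrative.

```python
import numpy as np

def metropolis_bingham(A, n_steps=5000, step=0.3, rng=np.random.default_rng(0)):
    """Random-walk Metropolis on the unit sphere with target p(x) proportional to exp(x^T A x).

    This is a generic MCMC baseline for intuition, NOT the exact sampler from the paper.
    """
    d = A.shape[0]
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.normal(size=d)      # isotropic perturbation ...
        prop /= np.linalg.norm(prop)              # ... projected back onto the sphere
        log_ratio = prop @ A @ prop - x @ A @ x   # proposal is symmetric, so only the target appears
        if np.log(rng.uniform()) < log_ratio:
            x = prop
        samples.append(x.copy())
    return np.array(samples)

A = np.diag([3.0, 1.0, 0.0])                      # toy 3-dimensional example
samples = metropolis_bingham(A)
```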
1 code implementation • 30 Jun 2020 • Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge
Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective.
no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge
Finding a Tucker decomposition is a nonconvex optimization problem.
no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge
In this work we study a model where there is a hidden linear subspace in which the dynamics is linear.
1 code implementation • 12 May 2020 • Yu Wang, Rong Ge, Shuang Qiu
Unlike existing work on deep neural network (DNN) graph optimization for inference performance, we explore DNN graph optimization for energy awareness and savings on power- and resource-constrained machine learning devices.
no code implementations • ICML 2020 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi
We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.
no code implementations • 16 Apr 2020 • Majid Janzamin, Rong Ge, Jean Kossaifi, Anima Anandkumar
PCA and other spectral techniques applied to matrices have several limitations.
no code implementations • 8 Nov 2019 • Rong Ge, Holden Lee, Jianfeng Lu
Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics.
no code implementations • 26 Sep 2019 • Rong Ge, Runzhe Wang, Haoyu Zhao
It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data.
1 code implementation • NeurIPS 2019 • Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge
Mode connectivity is a surprising phenomenon in the loss landscape of deep nets.
no code implementations • 11 Jun 2019 • Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.
no code implementations • 1 May 2019 • Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang
Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.
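For reference, here is a minimal sketch of the vanilla SVRG update on a least-squares finite sum; the data, step size, and loop lengths are toy choices, and the paper studies variants beyond this basic form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):                 # gradient of the i-th term 0.5 * (a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):                 # gradient of the average of all n terms
    return A.T @ (A @ w - b) / n

w = np.zeros(d)
eta, n_epochs, m = 0.01, 20, n    # step size, outer loops, inner-loop length
for _ in range(n_epochs):
    w_snap = w.copy()             # snapshot point
    mu = full_grad(w_snap)        # full gradient computed once per epoch at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        g = grad_i(w, i) - grad_i(w_snap, i) + mu   # variance-reduced stochastic gradient
        w -= eta * g
```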
no code implementations • ICLR 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
One plausible explanation is that non-convex neural network training procedures are better suited to fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method, which more closely resembles an exponentially decaying schedule. This widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst-case) optimal for classes of convex optimization problems.
1 code implementation • NeurIPS 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
First, this work shows that even if the time horizon T (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).
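For concreteness, these are the two schedule families being contrasted in these papers, written as simple functions of the iteration count; the constants are purely illustrative.

```python
import numpy as np

def polynomial_decay(t, eta0=0.1, power=1.0):
    """Stochastic-approximation style schedule: eta_t = eta0 / (1 + t)^power."""
    return eta0 / (1.0 + t) ** power

def step_decay(t, eta0=0.1, drop=0.5, every=30):
    """'Cut the learning rate every constant number of epochs' (roughly exponential decay)."""
    return eta0 * drop ** (t // every)

ts = np.arange(100)
poly_schedule = np.array([polynomial_decay(t) for t in ts])
step_schedule = np.array([step_decay(t) for t in ts])
```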
no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.
no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.
1 code implementation • ICLR 2019 • Abraham Frandsen, Rong Ge
Word embedding is a powerful tool in natural language processing.
no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski
Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").
no code implementations • 23 Nov 2018 • Yu Cheng, Ilias Diakonikolas, Rong Ge
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.
no code implementations • ICLR 2019 • Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang
We give a new algorithm for learning a two-layer neural network under a general class of input distributions.
no code implementations • 28 Mar 2018 • Yu Cheng, Rong Ge
Matrix completion is a well-studied problem with many machine learning applications.
no code implementations • NeurIPS 2018 • Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan
Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $\nu$---which exist only in $f$.
no code implementations • ICML 2018 • Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang
Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.
no code implementations • ICML 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.
no code implementations • ICLR 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.
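As an illustration of point 1) (no explicit model knowledge is needed by the learner), here is a toy zeroth-order policy-gradient loop for a linear policy $u = -Kx$ on a small LQR instance: the gradient of the cost is estimated purely from cost evaluations of rollouts. The dynamics, horizon, and step sizes are made-up toy values, and this sketch is not the analysis or the exact estimator from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_u, horizon = 3, 2, 30
A = 0.9 * np.eye(d_x) + 0.05 * rng.normal(size=(d_x, d_x))   # dynamics, hidden from the learner
B = rng.normal(size=(d_x, d_u))
Q, R = np.eye(d_x), 0.1 * np.eye(d_u)
X0 = rng.normal(size=(20, d_x))                              # fixed evaluation initial states

def cost(K):
    """Average finite-horizon LQR cost of the policy u = -K x, measured from rollouts only."""
    total = 0.0
    for x0 in X0:
        x = x0.copy()
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / len(X0)

K = np.zeros((d_u, d_x))
radius, step_size = 0.05, 1e-4
for _ in range(50):
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)                     # random direction with unit Frobenius norm
    # two-point zeroth-order estimate of a descent direction (dimension scaling omitted)
    g_hat = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    K -= step_size * g_hat
```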
no code implementations • ICLR 2018 • Rong Ge, Jason D. Lee, Tengyu Ma
All global minima of $G$ correspond to the ground truth parameters.
no code implementations • NeurIPS 2018 • Rong Ge, Holden Lee, Andrej Risteski
We analyze this Markov chain for the canonical multi-modal distribution: a mixture of Gaussians (of equal variance).
no code implementations • NeurIPS 2017 • Rong Ge, Tengyu Ma
The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.
no code implementations • ICML 2017 • Rong Ge, Chi Jin, Yi Zheng
In this paper we develop a new framework that captures the common landscape underlying non-convex low-rank matrix problems, including matrix sensing, matrix completion, and robust PCA.
1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.
no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
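A minimal sketch of the perturbed-gradient-descent idea on a toy function with a saddle point at the origin: run gradient descent, and when the gradient is small, add a random perturbation drawn from a small ball. The objective, thresholds, and step sizes are illustrative and not the carefully tuned parameters from the paper.

```python
import numpy as np

def f(x):                        # toy function with a strict saddle at the origin
    return 0.5 * x[0] ** 2 - 0.25 * x[1] ** 2 + 0.25 * x[1] ** 4

def grad_f(x):
    return np.array([x[0], -0.5 * x[1] + x[1] ** 3])

rng = np.random.default_rng(0)
x = np.array([1.0, 1e-6])        # starts almost exactly on the saddle's attracting direction
eta, g_thresh, radius = 0.1, 1e-3, 0.05
for _ in range(300):
    g = grad_f(x)
    if np.linalg.norm(g) < g_thresh:
        # perturbation step: move to a uniformly random point in a small ball around x
        u = rng.normal(size=2)
        u *= radius * np.sqrt(rng.uniform()) / np.linalg.norm(u)
        x = x + u
    else:
        x = x - eta * g          # ordinary gradient step
# the random perturbation lets gradient descent escape the saddle region quickly
# and head toward one of the minima at (0, +/- 1/sqrt(2))
```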
no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora
We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.
no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (the coordinates of a given data point) are explained as a probabilistic function of some hidden variables.
no code implementations • 28 Oct 2016 • Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi
For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.
no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra
But designing provable algorithms for inference has proven to be more challenging.
no code implementations • NeurIPS 2016 • Rong Ge, Jason D. Lee, Tengyu Ma
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.
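To make the non-convex formulation concrete, the sketch below runs plain gradient descent on the standard factorized objective $f(U) = \sum_{(i,j)\ \text{observed}} ((UU^\top)_{ij} - M_{ij})^2$ for a symmetric toy instance; it is a generic sketch (it omits, e.g., the regularization terms used in the paper's analysis), with illustrative sizes and step size.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p_obs = 30, 2, 0.3
U_true = rng.normal(size=(n, r))
M = U_true @ U_true.T                              # ground-truth low-rank (PSD) matrix
mask = rng.uniform(size=(n, n)) < p_obs
mask = np.triu(mask) | np.triu(mask, 1).T          # symmetric set of observed entries

def loss_and_grad(U):
    """Objective restricted to observed entries, and its gradient with respect to U."""
    resid = (U @ U.T - M) * mask                   # residual on observed entries only
    return np.sum(resid ** 2), 4 * resid @ U       # gradient uses the symmetry of resid

U = rng.normal(size=(n, r))
eta = 5e-4
for _ in range(5000):
    f_val, g = loss_and_grad(U)
    U -= eta * g
# U @ U.T should now approximately match M on the observed entries
```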
no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.
no code implementations • 18 Feb 2016 • Anima Anandkumar, Rong Ge
Local search heuristics for non-convex optimization are popular in applied machine learning.
no code implementations • 14 Jul 2015 • Rong Ge, James Zou
In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.
no code implementations • 8 Jul 2015 • Rong Ge, James Zou
A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.
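One classic heuristic of this kind is the multiplicative-update rule of Lee and Seung, sketched below on a toy nonnegative matrix; it comes with no global guarantee and is not the method proposed in the paper, just a concrete example of the heuristics being discussed.

```python
import numpy as np

def nmf_multiplicative(M, r, n_iters=200, eps=1e-9, rng=np.random.default_rng(0)):
    """Classic multiplicative updates for M ~ W H with W, H >= 0 (Frobenius-norm objective)."""
    n, m = M.shape
    W = rng.uniform(size=(n, r))
    H = rng.uniform(size=(r, m))
    for _ in range(n_iters):
        H *= (W.T @ M) / (W.T @ W @ H + eps)       # update H with W fixed
        W *= (M @ H.T) / (W @ H @ H.T + eps)       # update W with H fixed
    return W, H

M = np.abs(np.random.default_rng(1).normal(size=(20, 15)))   # toy nonnegative data matrix
W, H = nmf_multiplicative(M, r=4)
```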
no code implementations • 24 Jun 2015 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford
We develop a family of accelerated stochastic algorithms that minimize sums of convex functions.
no code implementations • 21 Apr 2015 • Rong Ge, Tengyu Ma
We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.
1 code implementation • 6 Mar 2015 • Rong Ge, Furong Huang, Chi Jin, Yang Yuan
To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.
no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra
Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.
no code implementations • 2 Mar 2015 • Rong Ge, Qingqing Huang, Sham M. Kakade
Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.
no code implementations • 20 Dec 2014 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford
In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.
no code implementations • 13 Nov 2014 • Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh
Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).
no code implementations • 6 Nov 2014 • Anima Anandkumar, Rong Ge, Majid Janzamin
We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension.
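For orientation, here is a plain tensor power iteration on a symmetric third-order tensor, written with NumPy's einsum; for simplicity the toy tensor below has fewer components than dimensions, whereas the overcomplete regime analyzed in the paper has CP rank larger than the input dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4
components = rng.normal(size=(k, d))
components /= np.linalg.norm(components, axis=1, keepdims=True)
# symmetric rank-k third-order tensor T = sum_i a_i (x) a_i (x) a_i
T = np.einsum('ip,iq,ir->pqr', components, components, components)

x = rng.normal(size=d)
x /= np.linalg.norm(x)
for _ in range(100):
    # one power-iteration step: x <- T(I, x, x) / ||T(I, x, x)||
    x = np.einsum('pqr,q,r->p', T, x, x)
    x /= np.linalg.norm(x)

overlaps = components @ x   # an |overlap| near 1 indicates convergence to that component
```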
no code implementations • 3 Aug 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin
In the unsupervised setting, we use a simple initialization algorithm based on SVD of the tensor slices, and provide guarantees under the stricter condition that $k\le \beta d$ (where constant $\beta$ can be larger than $1$), where the tensor method recovers the components under a polynomial running time (and exponential in $\beta$).
no code implementations • 21 Feb 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin
In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition.
no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).
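For concreteness, the sketch below generates data exactly from the stated model ($y = Ax$ with an overcomplete $A$ and a random sparse $x$) and runs a few rounds of a simple alternating heuristic (hard-threshold sparse coding, then a least-squares dictionary update starting from a perturbed dictionary); this baseline is illustrative and is not the algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, n_samples = 20, 40, 3, 2000        # overcomplete: m > n; k = sparsity of x

A_true = rng.normal(size=(n, m))
A_true /= np.linalg.norm(A_true, axis=0)    # unit-norm dictionary columns

# samples y = A x with x an unknown random k-sparse vector
X_true = np.zeros((m, n_samples))
for j in range(n_samples):
    support = rng.choice(m, size=k, replace=False)
    X_true[support, j] = rng.normal(size=k)
Y = A_true @ X_true

# a simple alternating heuristic, starting from a slightly perturbed dictionary
A_hat = A_true + 0.1 * rng.normal(size=(n, m))
A_hat /= np.linalg.norm(A_hat, axis=0)
for _ in range(10):
    # sparse-coding step: keep the k largest correlations per sample (hard thresholding)
    C = A_hat.T @ Y
    X_hat = np.zeros_like(C)
    idx = np.argsort(-np.abs(C), axis=0)[:k]
    np.put_along_axis(X_hat, idx, np.take_along_axis(C, idx, axis=0), axis=0)
    # dictionary update: least-squares fit of A to (Y, X_hat), then renormalize columns
    A_hat = Y @ np.linalg.pinv(X_hat)
    A_hat /= np.linalg.norm(A_hat, axis=0) + 1e-12
```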
no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.
no code implementations • 28 Aug 2013 • Sanjeev Arora, Rong Ge, Ankur Moitra
In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.
no code implementations • 12 Feb 2013 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade
We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.
2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.
no code implementations • NeurIPS 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva
We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees.
no code implementations • 29 Oct 2012 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).
2 code implementations • 9 Apr 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra
Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.