no code implementations • ICML 2020 • Keerti Anand, Rong Ge, Debmalya Panigrahi

In this paper, we ask the complementary question: can we redesign ML algorithms to provide better predictions for online algorithms?

no code implementations • 11 Jun 2021 • Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.

no code implementations • 4 Feb 2021 • Mo Zhou, Rong Ge, Chi Jin

We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0.

no code implementations • NeurIPS 2020 • Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

no code implementations • 8 Oct 2020 • Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge

The Hessian captures important properties of the deep neural network loss landscape.

no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.

1 code implementation • 30 Jun 2020 • Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective.

no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge

Finding a Tucker decomposition is a nonconvex optimization problem.

no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge

In this work we study a model where there is a hidden linear subspace in which the dynamics is linear.

1 code implementation • 12 May 2020 • Yu Wang, Rong Ge, Shuang Qiu

Unlike existing work on deep neural network (DNN) graph optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.

no code implementations • ICML 2020 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

no code implementations • 16 Apr 2020 • Majid Janzamin, Rong Ge, Jean Kossaifi, Anima Anandkumar

PCA and other spectral techniques applied to matrices have several limitations.

no code implementations • 8 Nov 2019 • Rong Ge, Holden Lee, Jianfeng Lu

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics.

no code implementations • 26 Sep 2019 • Rong Ge, Runzhe Wang, Haoyu Zhao

It has been observed \citep{zhang2016understanding} that deep neural networks can memorize: they achieve 100% accuracy on training data.

1 code implementation • NeurIPS 2019 • Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge

Mode connectivity is a surprising phenomenon in the loss landscape of deep nets.

no code implementations • 11 Jun 2019 • Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.

no code implementations • ICLR 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to the use of fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying learning rate schedule); note that this widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst case) optimal for classes of convex optimization problems.
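As a rough illustration of the two families of schedules contrasted above, the following Python sketch compares a step-decay rule with a polynomial-decay rule. The function names, the halving factor, and the decay exponent are illustrative assumptions and are not taken from the paper.

```python
def step_decay(eta0, epoch, drop_every, factor=0.5):
    """Cut the learning rate by `factor` every `drop_every` epochs.

    `factor` and `drop_every` are hypothetical defaults for illustration.
    """
    return eta0 * factor ** (epoch // drop_every)


def exponential_decay(eta0, epoch, drop_every, factor=0.5):
    """Smooth exponential schedule that agrees with step_decay
    whenever `epoch` is a multiple of `drop_every`."""
    return eta0 * factor ** (epoch / drop_every)


def polynomial_decay(eta0, epoch, power=1.0):
    """Polynomially decaying schedule eta0 / (1 + epoch) ** power."""
    return eta0 / (1 + epoch) ** power
```

Evaluating `step_decay` and `exponential_decay` at multiples of `drop_every` gives the same values, which is the sense in which the step rule resembles an exponentially decaying schedule, while `polynomial_decay` shrinks far more slowly at large epochs.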

no code implementations • 1 May 2019 • Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.

1 code implementation • NeurIPS 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon T (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).

no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

1 code implementation • ICLR 2019 • Abraham Frandsen, Rong Ge

Word embedding is a powerful tool in natural language processing.

no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

no code implementations • 23 Nov 2018 • Yu Cheng, Ilias Diakonikolas, Rong Ge

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.

no code implementations • ICLR 2019 • Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

We give a new algorithm for learning a two-layer neural network under a general class of input distributions.

no code implementations • 28 Mar 2018 • Yu Cheng, Rong Ge

Matrix completion is a well-studied problem with many machine learning applications.

no code implementations • NeurIPS 2018 • Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $\nu$---which exist only in $f$.

no code implementations • ICML 2018 • Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

no code implementations • ICML 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

no code implementations • ICLR 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

no code implementations • ICLR 2018 • Rong Ge, Jason D. Lee, Tengyu Ma

All global minima of $G$ correspond to the ground truth parameters.

no code implementations • NeurIPS 2018 • Rong Ge, Holden Lee, Andrej Risteski

We analyze this Markov chain for the canonical multi-modal distribution: a mixture of Gaussians (of equal variance).

no code implementations • NeurIPS 2017 • Rong Ge, Tengyu Ma

The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.

no code implementations • ICML 2017 • Rong Ge, Chi Jin, Yi Zheng

In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA.

1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of generative adversarial networks (GANs) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
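The idea of perturbing gradient descent near stationary points can be sketched on a toy problem. The Python code below is a simplified illustration only, not the paper's algorithm or its parameter choices: the test function $f(x, y) = \tfrac{1}{2}x^2 + \tfrac{1}{4}y^4 - \tfrac{1}{2}y^2$ (saddle at the origin, minima at $(0, \pm 1)$), the step size, the perturbation radius, the gradient threshold, and the waiting period are all assumptions chosen for readability.

```python
import math
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2.
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=2000, perturb=False,
                     radius=1e-2, grad_tol=1e-3, wait=10, seed=0):
    """Gradient descent with an optional random perturbation.

    When `perturb` is True and the gradient norm is at most `grad_tol`,
    a point drawn uniformly from a disc of radius `radius` is added to
    the iterate, at most once every `wait` iterations. All defaults are
    hypothetical values for this sketch.
    """
    rng = random.Random(seed)
    last_perturb = -wait
    for t in range(iters):
        gx, gy = grad(x, y)
        small_gradient = math.hypot(gx, gy) <= grad_tol
        if perturb and small_gradient and t - last_perturb >= wait:
            theta = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())  # uniform over the disc
            x += r * math.cos(theta)
            y += r * math.sin(theta)
            last_perturb = t
            gx, gy = grad(x, y)
        x -= step * gx
        y -= step * gy
    return x, y
```

Started from `(1.0, 0.0)`, plain gradient descent (`perturb=False`) keeps `y` at exactly zero and approaches the saddle at the origin, whereas the perturbed variant leaves the saddle and settles near one of the two minima.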

no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

no code implementations • 28 Oct 2016 • Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.

no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

no code implementations • NeurIPS 2016 • Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.

no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.

no code implementations • 18 Feb 2016 • Anima Anandkumar, Rong Ge

Local search heuristics for non-convex optimizations are popular in applied machine learning.

no code implementations • 14 Jul 2015 • Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

no code implementations • 8 Jul 2015 • Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.

no code implementations • 24 Jun 2015 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

We develop a family of accelerated stochastic algorithms that minimize sums of convex functions.

no code implementations • 21 Apr 2015 • Rong Ge, Tengyu Ma

We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.

1 code implementation • 6 Mar 2015 • Rong Ge, Furong Huang, Chi Jin, Yang Yuan

To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.

no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

no code implementations • 2 Mar 2015 • Rong Ge, Qingqing Huang, Sham M. Kakade

Unfortunately, learning a mixture of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.

no code implementations • 20 Dec 2014 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

no code implementations • 13 Nov 2014 • Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).

no code implementations • 6 Nov 2014 • Anima Anandkumar, Rong Ge, Majid Janzamin

We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension.

no code implementations • 3 Aug 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin

In the unsupervised setting, we use a simple initialization algorithm based on SVD of the tensor slices, and provide guarantees under the stricter condition that $k\le \beta d$ (where constant $\beta$ can be larger than $1$), where the tensor method recovers the components under a polynomial running time (and exponential in $\beta$).

no code implementations • 21 Feb 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition.

no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

no code implementations • 28 Aug 2013 • Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

no code implementations • 12 Feb 2013 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

no code implementations • NeurIPS 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees.

no code implementations • 29 Oct 2012 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

2 code implementations • 9 Apr 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Papers With Code is a free resource with all data licensed under CC-BY-SA.