Search Results for author: Rong Ge

Found 63 papers, 9 papers with code

Customizing ML Predictions for Online Algorithms

no code implementations ICML 2020 Keerti Anand, Rong Ge, Debmalya Panigrahi

In this paper, we ask the complementary question: can we redesign ML algorithms to provide better predictions for online algorithms?

Understanding Deflation Process in Over-parametrized Tensor Decomposition

no code implementations 11 Jun 2021 Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.

Tensor Decomposition

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

no code implementations 4 Feb 2021 Mo Zhou, Rong Ge, Chi Jin

We show that as long as the loss is already lower than a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0.
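As a rough illustration of this teacher-student setting, here is a minimal numpy sketch; the sizes, initialization scale, and step size are assumptions for illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m, n = 10, 3, 20, 2000          # input dim, teacher width, student width (m > k), samples
relu = lambda z: np.maximum(z, 0.0)

# Teacher: k hidden ReLU neurons with unit output weights.
W_teacher = rng.normal(size=(k, d)) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = relu(X @ W_teacher.T).sum(axis=1)

# Over-parameterized student (m > k neurons) with small initialization.
W = 0.01 * rng.normal(size=(m, d))
a = np.ones(m) / m                    # fixed output layer, only W is trained
lr = 0.5
for step in range(2000):
    H = relu(X @ W.T)                 # (n, m) hidden activations
    resid = H @ a - y
    # Gradient of 0.5 * mean squared error w.r.t. W (ReLU subgradient).
    W -= lr * ((resid[:, None] * (X @ W.T > 0)) * a).T @ X / n
print("final loss:", 0.5 * np.mean(resid ** 2))
```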

Beyond Lazy Training for Over-parameterized Tensor Decomposition

no code implementations NeurIPS 2020 Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

Tensor Decomposition

Efficient sampling from the Bingham distribution

no code implementations 30 Sep 2020 Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
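For intuition only, a toy Metropolis-style sampler for this density is sketched below; it treats the perturb-and-renormalize proposal as approximately symmetric, so unlike the paper's algorithm it is neither exact nor comes with runtime guarantees:

```python
import numpy as np

def bingham_mh(A, n_samples=5000, step=0.2, seed=0):
    """Heuristic random-walk Metropolis for p(x) ~ exp(x^T A x) on the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[0])
    x /= np.linalg.norm(x)
    samples = []
    for _ in range(n_samples):
        prop = x + step * rng.normal(size=x.size)
        prop /= np.linalg.norm(prop)
        # Accept with the unnormalized density ratio (proposal treated as symmetric).
        if np.log(rng.uniform()) < prop @ A @ prop - x @ A @ x:
            x = prop
        samples.append(x.copy())
    return np.array(samples)

A = np.diag([3.0, 1.0, 0.0])          # mass concentrates near the first axis
xs = bingham_mh(A)
print("E[x_0^2] under the samples:", np.mean(xs[:, 0] ** 2))
```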

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

1 code implementation 30 Jun 2020 Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Solving this problem with a learning-to-learn approach -- running meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective.
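The mechanics can be seen on a one-dimensional quadratic, where the meta-gradient through the unrolled optimizer trajectory has a closed form; the constants below are assumptions for illustration:

```python
h, w0, t = 4.0, 1.0, 10           # curvature, inner initialization, inner GD steps
eta, meta_lr = 0.01, 1e-3         # step size being tuned, and the meta step size

for _ in range(200):
    # Inner problem f(w) = 0.5*h*w^2: t steps of GD give w_t = (1 - eta*h)^t * w0.
    rho = 1.0 - eta * h
    meta_loss = 0.5 * h * rho ** (2 * t) * w0 ** 2
    # Meta-gradient d(meta_loss)/d(eta), differentiated through the trajectory.
    meta_grad = 0.5 * h * w0 ** 2 * 2 * t * rho ** (2 * t - 1) * (-h)
    eta -= meta_lr * meta_grad

print(f"learned step size: {eta:.4f}, meta-loss: {meta_loss:.2e}")
```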

Optimization Landscape of Tucker Decomposition

no code implementations 29 Jun 2020 Abraham Frandsen, Rong Ge

Finding a Tucker decomposition is a nonconvex optimization problem.

Extracting Latent State Representations with Linear Dynamics from Rich Observations

no code implementations 29 Jun 2020 Abraham Frandsen, Rong Ge

In this work we study a model where there is a hidden linear subspace in which the dynamics is linear.

Energy-Aware DNN Graph Optimization

1 code implementation 12 May 2020 Yu Wang, Rong Ge, Shuang Qiu

Unlike existing work on deep neural network (DNN) graph optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.

High-Dimensional Robust Mean Estimation via Gradient Descent

no code implementations ICML 2020 Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds

no code implementations 8 Nov 2019 Rong Ge, Holden Lee, Jianfeng Lu

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics.

Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently

no code implementations 26 Sep 2019 Rong Ge, Runzhe Wang, Haoyu Zhao

It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data.

Faster Algorithms for High-Dimensional Robust Covariance Estimation

no code implementations 11 Jun 2019 Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.

Rethinking learning rate schedules for stochastic optimization

no code implementations ICLR 2019 Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to fundamentally different learning rate schedules, such as the widely used "cut the learning rate every constant number of epochs" method, which more closely resembles an exponentially decaying schedule. This is in stark contrast to the polynomially decaying schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst-case) optimal for classes of convex optimization problems.

Stochastic Optimization

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

no code implementations 1 May 2019 Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

1 code implementation NeurIPS 2019 Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon T (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).

Stochastic Optimization
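For concreteness, the step-decay and polynomial-decay schedules contrasted in the two learning-rate entries above look roughly like this (constants are illustrative):

```python
def step_decay(eta0, t, cut_every, factor=2.0):
    """Cut the learning rate by `factor` every `cut_every` iterations."""
    return eta0 / factor ** (t // cut_every)

def poly_decay(eta0, t, power=1.0):
    """Polynomially decaying schedule eta0 / (1 + t)^power."""
    return eta0 / (1.0 + t) ** power

T = 1000
for t in [0, 100, 400, 999]:
    print(t, step_decay(0.1, t, cut_every=T // 10), poly_decay(0.1, t))
```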

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

no code implementations 13 Feb 2019 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

no code implementations 11 Feb 2019 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.
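For context, the statements take roughly the following shape (a paraphrase with unspecified absolute constants; see the note for the precise versions). A random vector $X \in \mathbb{R}^d$ is norm-subGaussian with parameter $\sigma$ if

$$\Pr\big[\|X - \mathbb{E}X\| \ge t\big] \le 2\exp\Big(-\frac{t^2}{2\sigma^2}\Big) \quad \text{for all } t \ge 0,$$

and the Hoeffding-type conclusion is that for independent zero-mean norm-subGaussian $X_1, \dots, X_n$, with probability at least $1-\delta$,

$$\Big\|\sum_{i=1}^n X_i\Big\| \le c \sqrt{\sum_{i=1}^n \sigma_i^2 \log\frac{2d}{\delta}}$$

for an absolute constant $c$; the logarithmic factor is the slack referred to above.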

Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition

no code implementations 29 Nov 2018 Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

High-Dimensional Robust Mean Estimation in Nearly-Linear Time

no code implementations 23 Nov 2018 Yu Cheng, Ilias Diakonikolas, Rong Ge

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.

Learning Two-layer Neural Networks with Symmetric Inputs

no code implementations ICLR 2019 Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

We give a new algorithm for learning a two-layer neural network under a general class of input distributions.

Non-Convex Matrix Completion Against a Semi-Random Adversary

no code implementations 28 Mar 2018 Yu Cheng, Rong Ge

Matrix completion is a well-studied problem with many machine learning applications.

Matrix Completion

On the Local Minima of the Empirical Risk

no code implementations NeurIPS 2018 Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Our objective is to find the $\epsilon$-approximate local minima of the underlying (population) function $F$, given only access to an approximation $f$ that is pointwise $\nu$-close to $F$, while avoiding the shallow local minima that arise because of the tolerance $\nu$ and exist only in $f$.

Stronger generalization bounds for deep nets via a compression approach

no code implementations ICML 2018 Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

Generalization Bounds

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

no code implementations ICML 2018 Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods
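To make the model-free setting concrete, here is a toy zeroth-order policy gradient method on a scalar LQR instance; the system, costs, and step sizes are assumptions for illustration (the papers analyze general LQR with carefully controlled gradient estimates):

```python
a, b, q, r = 1.2, 1.0, 1.0, 0.1       # open-loop unstable scalar system

def cost(k, horizon=50):
    """Finite-horizon LQR cost from x0 = 1 under the linear policy u = -k*x."""
    x, c = 1.0, 0.0
    for _ in range(horizon):
        u = -k * x
        c += q * x * x + r * u * u
        x = a * x + b * u
    return c

# Zeroth-order policy gradient: estimate dC/dk from cost evaluations and descend.
k, lr, delta = 0.5, 2e-3, 0.05        # initial stabilizing gain, step, smoothing
for _ in range(500):
    g = (cost(k + delta) - cost(k - delta)) / (2 * delta)
    k -= lr * g
print(f"learned gain: {k:.3f}, cost: {cost(k):.3f}")
```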

Global Convergence of Policy Gradient Methods for Linearized Control Problems

no code implementations ICLR 2018 Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods

On the Optimization Landscape of Tensor Decompositions

no code implementations NeurIPS 2017 Rong Ge, Tengyu Ma

The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.

Latent Variable Models Tensor Decomposition

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

no code implementations ICML 2017 Rong Ge, Chi Jin, Yi Zheng

In this paper we develop a new framework that captures the common landscape underlying non-convex low-rank matrix problems, including matrix sensing, matrix completion, and robust PCA.

Matrix Completion
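A minimal sketch of one problem in this family: gradient descent on the factored (Burer-Monteiro) objective for symmetric matrix completion. Sizes, sampling rate, and step size below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 50, 3, 0.3                  # matrix size, rank, observation probability

# Ground-truth rank-r PSD matrix and a symmetric observation mask.
U_star = rng.normal(size=(n, r))
M = U_star @ U_star.T
upper = np.triu(rng.uniform(size=(n, n)) < p)
mask = upper | upper.T

# Gradient descent on f(U) = 0.5 * ||P_Omega(U U^T - M)||_F^2 from random init.
U = rng.normal(size=(n, r))
lr = 0.005
for step in range(1000):
    R = mask * (U @ U.T - M)          # residual on observed entries (symmetric)
    U -= lr * 2 * R @ U               # gradient of f
print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```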

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

1 code implementation ICML 2017 Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

How to Escape Saddle Points Efficiently

no code implementations ICML 2017 Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
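A minimal sketch of the idea, with untuned constants (the paper chooses them carefully to obtain the almost dimension-free rate): when the gradient is small and no perturbation was added recently, jump to a uniformly random point in a small ball and continue descending.

```python
import numpy as np

def perturbed_gd(grad, x0, lr=0.05, g_thresh=1e-3, radius=0.1,
                 t_noise=10, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last = -t_noise
    for t in range(n_steps):
        if np.linalg.norm(grad(x)) < g_thresh and t - last >= t_noise:
            xi = rng.normal(size=x.shape)
            # Uniform sample from the ball of radius `radius`.
            xi *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(xi)
            x, last = x + xi, t
        x = x - lr * grad(x)
    return x

# f(x, y) = x^2 + y^4/4 - y^2: strict saddle at the origin, minima at y = +/- sqrt(2).
grad = lambda v: np.array([2 * v[0], v[1] ** 3 - 2 * v[1]])
print(perturbed_gd(grad, [0.0, 0.0]))  # the perturbation lets GD leave the saddle
```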

On the ability of neural nets to express distributions

no code implementations 22 Feb 2017 Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

Provable learning of Noisy-or Networks

no code implementations 28 Dec 2016 Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (the coordinates of the given data point) are explained as a probabilistic function of some hidden variables.

Latent Variable Models Tensor Decomposition

Homotopy Analysis for Tensor PCA

no code implementations 28 Oct 2016 Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.

Global Optimization

Matrix Completion has No Spurious Local Minimum

no code implementations NeurIPS 2016 Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.

Matrix Completion Recommendation Systems

Efficient approaches for escaping higher order saddle points in non-convex optimization

no code implementations 18 Feb 2016 Anima Anandkumar, Rong Ge

Local search heuristics for non-convex optimization are popular in applied machine learning.

Rich Component Analysis

no code implementations 14 Jul 2015 Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

Latent Variable Models

Intersecting Faces: Non-negative Matrix Factorization With New Guarantees

no code implementations 8 Jul 2015 Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.

Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms

no code implementations 21 Apr 2015 Rong Ge, Tengyu Ma

We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.

Tensor Decomposition

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition

1 code implementation 6 Mar 2015 Rong Ge, Furong Huang, Chi Jin, Yang Yuan

To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.

Latent Variable Models Tensor Decomposition

Simple, Efficient, and Neural Algorithms for Sparse Coding

no code implementations 2 Mar 2015 Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.
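A toy version of such an alternating-minimization heuristic for the generative model $y = Ax$: alternate a hard-thresholded sparse-coding step with a least-squares dictionary update. Sizes, the thresholding rule, and the warm-start initialization are assumptions; this is not the paper's neurally plausible algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, N = 32, 64, 3, 4000          # signal dim, dictionary size, sparsity, samples

# Ground-truth dictionary with unit-norm columns, and k-sparse codes.
A_true = rng.normal(size=(n, m))
A_true /= np.linalg.norm(A_true, axis=0)
X = np.zeros((m, N))
for i in range(N):
    supp = rng.choice(m, size=k, replace=False)
    X[supp, i] = rng.normal(size=k)
Y = A_true @ X

# Alternating minimization from a perturbed (warm-start) initialization.
A = A_true + 0.3 * rng.normal(size=(n, m))
A /= np.linalg.norm(A, axis=0)
for it in range(10):
    C = A.T @ Y                       # correlations with the current dictionary
    kth = -np.sort(-np.abs(C), axis=0)[k - 1]
    Xh = np.where(np.abs(C) >= kth, C, 0.0)   # keep k largest entries per sample
    A = Y @ np.linalg.pinv(Xh)        # least-squares dictionary update
    A /= np.linalg.norm(A, axis=0)

# Each true atom should now be close to some learned column (up to sign).
print("worst column correlation:", np.abs(A.T @ A_true).max(axis=0).min())
```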

Learning Mixtures of Gaussians in High Dimensions

no code implementations 2 Mar 2015 Rong Ge, Qingqing Huang, Sham M. Kakade

Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.

Learning Theory

Competing with the Empirical Risk Minimizer in a Single Pass

no code implementations 20 Dec 2014 Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

Minimal Realization Problems for Hidden Markov Models

no code implementations 13 Nov 2014 Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).

Latent Variable Models Tensor Decomposition

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

no code implementations 6 Nov 2014 Anima Anandkumar, Rong Ge, Majid Janzamin

We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension.

Latent Variable Models

Sample Complexity Analysis for Learning Overcomplete Latent Variable Models through Tensor Methods

no code implementations 3 Aug 2014 Animashree Anandkumar, Rong Ge, Majid Janzamin

In the unsupervised setting, we use a simple initialization algorithm based on the SVD of the tensor slices, and provide guarantees under the stricter condition that $k\le \beta d$ (where the constant $\beta$ can be larger than $1$), under which the tensor method recovers the components in running time polynomial in the dimension (and exponential in $\beta$).

Latent Variable Models Tensor Decomposition

Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-$1$ Updates

no code implementations 21 Feb 2014 Animashree Anandkumar, Rong Ge, Majid Janzamin

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition.

Tensor Decomposition

More Algorithms for Provable Dictionary Learning

no code implementations 3 Jan 2014 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

Dictionary Learning

Provable Bounds for Learning Some Deep Representations

no code implementations 23 Oct 2013 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

New Algorithms for Learning Incoherent and Overcomplete Dictionaries

no code implementations 28 Aug 2013 Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

Dictionary Learning Edge Detection

A Tensor Approach to Learning Mixed Membership Community Models

no code implementations 12 Feb 2013 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

Community Detection Stochastic Block Model

A Practical Algorithm for Topic Modeling with Provable Guarantees

2 code implementations 19 Dec 2012 Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

Dimensionality Reduction Topic Models

Tensor decompositions for learning latent variable models

no code implementations 29 Oct 2012 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

Latent Variable Models
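At the core of these methods is the tensor power iteration $u \leftarrow T(I,u,u)/\|T(I,u,u)\|$ with deflation. A sketch for an exactly orthogonally decomposable third-order tensor (the idealized whitened, noise-free case) follows:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3

# Orthogonally decomposable tensor T = sum_i w_i * a_i (x) a_i (x) a_i.
A = np.linalg.qr(rng.normal(size=(d, d)))[0][:, :k]
w = np.array([3.0, 2.0, 1.0])
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)

def power_iteration(T, n_iter=100, seed=1):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('abc,b,c->a', T, u, u)      # u <- T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('abc,a,b,c->', T, u, u, u)    # recovered weight w_i
    return lam, u

# Recover the components one at a time, deflating after each; which component
# is found first depends on the random initialization.
for _ in range(k):
    lam, u = power_iteration(T)
    print(round(lam, 3), np.abs(A.T @ u).round(2))  # weight and match to a_i
    T = T - lam * np.einsum('a,b,c->abc', u, u, u)
```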

Learning Topic Models - Going beyond SVD

2 code implementations 9 Apr 2012 Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Topic Models
