no code implementations • NeurIPS 2023 • Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.

no code implementations • 21 Feb 2024 • Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge

Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step.

no code implementations • 14 Feb 2024 • Ziang Chen, Rong Ge

In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace.

no code implementations • 10 Feb 2024 • Muthu Chidambaram, Rong Ge

Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade.

2 code implementations • 9 Jan 2024 • Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul Hovland, Mary Hall, Rong Ge, Prasanna Balaprakash

We introduce the first generative transfer-learning (TL) autotuning approach, based on the Gaussian copula (GC), to model the high-performing regions of the search space from prior data and then generate high-performing configurations for new tasks.

1 code implementation • 12 Dec 2023 • Thomas Randall, Tyler Allen, Rong Ge

Word2Vec remains one of the most impactful innovations in Natural Language Processing (NLP), representing latent grammatical and syntactic information in human text with dense vectors in a low-dimensional space.

no code implementations • 4 Oct 2023 • Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood.

1 code implementation • 1 Jun 2023 • Muthu Chidambaram, Rong Ge

Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong.

no code implementations • 3 Apr 2023 • Yunwei Ren, Mo Zhou, Rong Ge

Depth separation -- why a deeper network is more powerful than a shallower one -- has been a major problem in deep learning theory.

no code implementations • 14 Mar 2023 • Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.

1 code implementation • 24 Feb 2023 • Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes.

no code implementations • 1 Feb 2023 • Mo Zhou, Rong Ge

In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators.

1 code implementation • 24 Oct 2022 • Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels.
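As a minimal sketch of that idea (illustrative pure-Python code, not the authors' implementation), two labeled examples can be mixed with a weight drawn from a Beta distribution:

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Return a random convex combination of two feature vectors and of
    their one-hot labels, as in Mixup: lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y
```

Since the labels are mixed with the same weight as the inputs, a mixed label still sums to 1 and can be used directly with cross-entropy-style losses.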

no code implementations • 7 Oct 2022 • Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge

Globally, we observe that the training dynamics for our example have an interesting bifurcating behavior, which has also been observed in the training of neural nets.

no code implementations • 3 Oct 2022 • Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge

Monotonic linear interpolation (MLI) is a phenomenon commonly observed in the training of neural networks: on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic.
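The interpolation path in the MLI phenomenon is straightforward to probe; the sketch below (a hypothetical helper, not code from the paper) evaluates a loss at evenly spaced points on the segment from an initialization `w0` to a minimizer `w1`:

```python
def loss_along_path(loss, w0, w1, num=11):
    """Evaluate `loss` at `num` evenly spaced points on the line segment
    from w0 (alpha = 0) to w1 (alpha = 1)."""
    values = []
    for k in range(num):
        a = k / (num - 1)
        w = [(1 - a) * u + a * v for u, v in zip(w0, w1)]
        values.append(loss(w))
    return values
```

MLI holds exactly when the resulting sequence of loss values is monotonically non-increasing.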

no code implementations • NeurIPS 2021 • Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms.

no code implementations • ICML 2020 • Keerti Anand, Rong Ge, Debmalya Panigrahi

A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances.

no code implementations • 8 May 2022 • Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

In this paper, we give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best predictor.

no code implementations • 2 Feb 2022 • Zeping Luo, Shiyou Wu, Cindy Weng, Mo Zhou, Rong Ge

Self-supervised learning has significantly improved the performance of many NLP tasks.

1 code implementation • ICLR 2022 • Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training.

no code implementations • 29 Sep 2021 • Zeping Luo, Cindy Weng, Shiyou Wu, Mo Zhou, Rong Ge

Self-supervised learning has significantly improved the performance of many NLP tasks.

1 code implementation • 23 Sep 2021 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.

no code implementations • NeurIPS 2021 • Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.

no code implementations • 4 Feb 2021 • Mo Zhou, Rong Ge, Chi Jin

We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0.

no code implementations • NeurIPS 2020 • Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

no code implementations • 8 Oct 2020 • Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge

We can analyze the properties of these smaller matrices and prove the structure of the top eigenspace for random 2-layer networks.

no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
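For intuition, the unnormalized Bingham density is simple to evaluate, and because $x$ has unit norm it is invariant (up to a constant factor) to shifting $A$ by a multiple of the identity. The paper's exact sampler is substantially more involved than this sketch:

```python
import math

def bingham_unnormalized(x, A):
    """Unnormalized Bingham density exp(x^T A x) for a unit-norm vector x
    and a symmetric matrix A (given as a list of rows)."""
    d = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(d) for j in range(d))
    return math.exp(quad)
```

Shifting $A$ to $A + cI$ multiplies every density value by $e^c$, which is why only the eigenvalue gap $\lambda_{\max}(A)-\lambda_{\min}(A)$ matters for the runtime.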

1 code implementation • 30 Jun 2020 • Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective.

no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge

In this work we study a model where there is a hidden linear subspace in which the dynamics is linear.

no code implementations • 29 Jun 2020 • Abraham Frandsen, Rong Ge

Finding a Tucker decomposition is a nonconvex optimization problem.

1 code implementation • 12 May 2020 • Yu Wang, Rong Ge, Shuang Qiu

Unlike existing work in deep neural network (DNN) graphs optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.

no code implementations • ICML 2020 • Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

no code implementations • 16 Apr 2020 • Majid Janzamin, Rong Ge, Jean Kossaifi, Anima Anandkumar

PCA and other spectral techniques applied to matrices have several limitations.

no code implementations • 8 Nov 2019 • Rong Ge, Holden Lee, Jianfeng Lu

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics.

no code implementations • 26 Sep 2019 • Rong Ge, Runzhe Wang, Haoyu Zhao

It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data.

1 code implementation • NeurIPS 2019 • Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge

Mode connectivity is a surprising phenomenon in the loss landscape of deep nets.

no code implementations • 11 Jun 2019 • Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.

no code implementations • 1 May 2019 • Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.

no code implementations • ICLR 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying learning rate schedule). Note that this widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst-case) optimal for classes of convex optimization problems.

1 code implementation • NeurIPS 2019 • Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon T (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).
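The two families of step-size schedules compared in this line of work can be sketched as follows (illustrative helper functions; parameter names and defaults are not from the paper):

```python
def poly_decay(eta0, t, power=0.5):
    """Polynomially decaying step size eta0 / (t + 1)^power, the classical
    stochastic-approximation prescription."""
    return eta0 / (t + 1) ** power

def step_decay(eta0, t, drop=0.5, every=30):
    """'Cut the learning rate every constant number of steps': multiply by
    `drop` once per `every` steps, i.e. geometric (exponential) decay."""
    return eta0 * drop ** (t // every)
```

The step-decay schedule shrinks geometrically, while the polynomial schedule shrinks much more slowly, which is the contrast at the heart of the final-iterate analysis.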

no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

1 code implementation • ICLR 2019 • Abraham Frandsen, Rong Ge

Word embedding is a powerful tool in natural language processing.

no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

no code implementations • 23 Nov 2018 • Yu Cheng, Ilias Diakonikolas, Rong Ge

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.
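As a toy one-dimensional illustration of the failure mode motivating this line of work (the paper's actual estimator is far more sophisticated and works in high dimensions), compare the sample mean and the median when an adversary replaces a constant fraction of the points:

```python
import statistics

def corrupt(sample, eps, value):
    """Adversarially replace an eps fraction of the sample with `value`."""
    k = int(eps * len(sample))
    return sample[k:] + [value] * k

inliers = [(-1) ** i * (i % 10) / 10 for i in range(1000)]  # centered near 0
data = corrupt(inliers, eps=0.1, value=1000.0)
mean_est = statistics.fmean(data)     # dragged far from the truth
median_est = statistics.median(data)  # stays near the inlier center
```

In one dimension the median achieves this robustness for free; the challenge addressed here is obtaining dimension-independent error guarantees efficiently in high dimensions.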

no code implementations • ICLR 2019 • Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

We give a new algorithm for learning a two-layer neural network under a general class of input distributions.

no code implementations • 28 Mar 2018 • Yu Cheng, Rong Ge

Matrix completion is a well-studied problem with many machine learning applications.

no code implementations • NeurIPS 2018 • Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima (which arise because of the tolerance $\nu$ and exist only in $f$).

no code implementations • ICML 2018 • Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

no code implementations • ICML 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest 3) they inherently allow for richly parameterized policies.

no code implementations • ICLR 2018 • Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

no code implementations • ICLR 2018 • Rong Ge, Jason D. Lee, Tengyu Ma

All global minima of $G$ correspond to the ground truth parameters.

no code implementations • NeurIPS 2018 • Rong Ge, Holden Lee, Andrej Risteski

We analyze this Markov chain for the canonical multi-modal distribution: a mixture of gaussians (of equal variance).

no code implementations • NeurIPS 2017 • Rong Ge, Tengyu Ma

The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.

no code implementations • ICML 2017 • Rong Ge, Chi Jin, Yi Zheng

In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA.

no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
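The core mechanism is easy to sketch: run gradient descent, and inject a small random perturbation whenever the gradient is tiny, so that strict saddle points are escaped. The code below is a simplified illustration of that idea (the step sizes, thresholds, and perturbation schedule in the paper are chosen far more carefully):

```python
import random

def perturbed_gd(grad, w, eta=0.1, radius=1e-3, steps=200, g_thresh=1e-6, rng=random):
    """Gradient descent with a small uniform perturbation whenever the
    gradient is (nearly) zero, so iterates can leave strict saddle points."""
    for _ in range(steps):
        g = grad(w)
        if max(abs(gi) for gi in g) < g_thresh:
            # Near a stationary point: jitter to fall off any strict saddle.
            w = [wi + rng.uniform(-radius, radius) for wi in w]
        else:
            w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w
```

On $f(x, y) = x^2 - y^2$, plain gradient descent started at the saddle $(0, 0)$ never moves, while the perturbed variant escapes along the $y$ direction.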

1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of generative adversarial networks (GANs) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

no code implementations • 28 Oct 2016 • Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.

no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

no code implementations • NeurIPS 2016 • Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.

no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.

no code implementations • 18 Feb 2016 • Anima Anandkumar, Rong Ge

Local search heuristics for non-convex optimizations are popular in applied machine learning.

no code implementations • 14 Jul 2015 • Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

no code implementations • 8 Jul 2015 • Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.

no code implementations • 24 Jun 2015 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

We develop a family of accelerated stochastic algorithms that minimize sums of convex functions.

no code implementations • 21 Apr 2015 • Rong Ge, Tengyu Ma

We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.

1 code implementation • 6 Mar 2015 • Rong Ge, Furong Huang, Chi Jin, Yang Yuan

To the best of our knowledge, this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.

no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

no code implementations • 2 Mar 2015 • Rong Ge, Qingqing Huang, Sham M. Kakade

Unfortunately, learning mixture of Gaussians is an information theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.

no code implementations • 20 Dec 2014 • Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

no code implementations • 13 Nov 2014 • Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).

no code implementations • 6 Nov 2014 • Anima Anandkumar, Rong Ge, Majid Janzamin

We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension.

no code implementations • 3 Aug 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin

In the unsupervised setting, we use a simple initialization algorithm based on SVD of the tensor slices, and provide guarantees under the stricter condition that $k\le \beta d$ (where the constant $\beta$ can be larger than $1$), in which case the tensor method recovers the components in polynomial running time (exponential in $\beta$).

no code implementations • 21 Feb 2014 • Animashree Anandkumar, Rong Ge, Majid Janzamin

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition.

no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).
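The generative model in this formulation is easy to simulate; the helpers below are illustrative (the sparsity pattern and magnitudes are arbitrary choices, not the paper's assumptions):

```python
import random

def sparse_code(m, k, rng=random):
    """A k-sparse code x in R^m: k random coordinates receive a random
    sign and magnitude; all other coordinates are zero."""
    x = [0.0] * m
    for i in rng.sample(range(m), k):
        x[i] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
    return x

def synthesize(A, x):
    """Observe y = A x, where A is an n x m dictionary (rows of length m);
    the overcomplete case has m > n."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in A]
```

The learner sees only samples `y`; both the dictionary `A` and the sparse codes `x` must be recovered.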

no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

no code implementations • 28 Aug 2013 • Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

no code implementations • 12 Feb 2013 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

no code implementations • NeurIPS 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees.

no code implementations • 29 Oct 2012 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

2 code implementations • 9 Apr 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.
