Search Results for author: Rong Ge

Found 84 papers, 16 papers with code

Learning Topic Models - Going beyond SVD

2 code implementations 9 Apr 2012 Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Topic Models

Tensor decompositions for learning latent variable models

no code implementations 29 Oct 2012 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).
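
As a concrete illustration (the simplest exchangeable case; the notation here is schematic, not the paper's full generality): for a single-topic model, or a spherical Gaussian mixture after suitable corrections, with mixing weights $w_i$ and component means $\mu_i$, the observable moments take the form $M_2 = \sum_{i=1}^k w_i\, \mu_i \otimes \mu_i$ and $M_3 = \sum_{i=1}^k w_i\, \mu_i \otimes \mu_i \otimes \mu_i$, and the components can be recovered by (whitened) tensor power iterations $u \leftarrow M_3(I, u, u)/\|M_3(I, u, u)\|$.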

A Practical Algorithm for Topic Modeling with Provable Guarantees

2 code implementations 19 Dec 2012 Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

Dimensionality Reduction Topic Models

A Tensor Approach to Learning Mixed Membership Community Models

no code implementations 12 Feb 2013 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

Community Detection Stochastic Block Model

New Algorithms for Learning Incoherent and Overcomplete Dictionaries

no code implementations 28 Aug 2013 Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.
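
For illustration, a minimal numpy sketch of this sparse recovery setup; the decoder below is plain orthogonal matching pursuit with known sparsity, not the algorithm analyzed in the paper, and all problem sizes are made up:

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 100, 5                          # signal dim, dictionary size, sparsity
A = rng.normal(size=(n, m))
A /= np.linalg.norm(A, axis=0)                # unit-norm, incoherent (w.h.p.) columns
x_true = np.zeros(m)
support = rng.choice(m, k, replace=False)
x_true[support] = rng.normal(size=k)
y = A @ x_true                                # observed vector of the form A x

# Orthogonal matching pursuit: greedily add the column most correlated with the
# residual, then refit the coefficients by least squares on the chosen support.
S, r = [], y.copy()
for _ in range(k):
    S.append(int(np.argmax(np.abs(A.T @ r))))
    coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
    r = y - A[:, S] @ coef
x_hat = np.zeros(m)
x_hat[S] = coef
print(set(S) == set(support.tolist()), np.linalg.norm(x_hat - x_true))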

Dictionary Learning Edge Detection +1

Provable Bounds for Learning Some Deep Representations

no code implementations 23 Oct 2013 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

More Algorithms for Provable Dictionary Learning

no code implementations 3 Jan 2014 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

Dictionary Learning

Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-$1$ Updates

no code implementations 21 Feb 2014 Animashree Anandkumar, Rong Ge, Majid Janzamin

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition.

Tensor Decomposition

Sample Complexity Analysis for Learning Overcomplete Latent Variable Models through Tensor Methods

no code implementations 3 Aug 2014 Animashree Anandkumar, Rong Ge, Majid Janzamin

In the unsupervised setting, we use a simple initialization algorithm based on SVD of the tensor slices, and provide guarantees under the stricter condition that $k\le \beta d$ (where the constant $\beta$ can be larger than $1$), under which the tensor method recovers the components in running time polynomial in the dimension (and exponential in $\beta$).

Tensor Decomposition

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

no code implementations 6 Nov 2014 Anima Anandkumar, Rong Ge, Majid Janzamin

We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension.
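
A toy numpy sketch of tensor power iteration in this overcomplete regime (CP rank k larger than the dimension d); sizes and iteration counts are illustrative and this is not the paper's analysis:

import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 15                                 # overcomplete: rank exceeds dimension
A = rng.normal(size=(d, k))
A /= np.linalg.norm(A, axis=0)                # random unit-norm components
T = np.einsum('ir,jr,kr->ijk', A, A, A)       # T = sum_r a_r (x) a_r (x) a_r

u = rng.normal(size=d)
u /= np.linalg.norm(u)
for _ in range(100):
    u = np.einsum('ijk,j,k->i', T, u, u)      # power update u <- T(I, u, u)
    u /= np.linalg.norm(u)
print(np.max(np.abs(A.T @ u)))                # close to 1 when u lands near a true component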

Minimal Realization Problems for Hidden Markov Models

no code implementations 13 Nov 2014 Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh

Consider a stationary discrete random process with alphabet size d, which is assumed to be the output process of an unknown stationary Hidden Markov Model (HMM).

Tensor Decomposition

Competing with the Empirical Risk Minimizer in a Single Pass

no code implementations 20 Dec 2014 Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

Simple, Efficient, and Neural Algorithms for Sparse Coding

no code implementations 2 Mar 2015 Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.
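
A toy sketch of one such alternating-minimization heuristic (hard-thresholding decoding plus a least-squares dictionary update). This is a generic variant for illustration only, not the paper's neurally plausible update rule, and it carries no recovery guarantee:

import numpy as np

rng = np.random.default_rng(1)
n, m, k, N = 20, 30, 3, 2000                  # signal dim, dictionary size, sparsity, samples
A_true = rng.normal(size=(n, m))
A_true /= np.linalg.norm(A_true, axis=0)
X = np.zeros((m, N))
for j in range(N):
    idx = rng.choice(m, k, replace=False)
    X[idx, j] = rng.normal(size=k)
Y = A_true @ X                                # observed samples y = A x with sparse x

A = Y[:, rng.choice(N, m, replace=False)]     # crude initialization from raw samples
A /= np.linalg.norm(A, axis=0)
for _ in range(30):
    C = A.T @ Y                               # decoding: keep the k largest correlations per sample
    mask = np.abs(C) >= -np.sort(-np.abs(C), axis=0)[k - 1]
    Xh = C * mask
    A = Y @ np.linalg.pinv(Xh)                # dictionary update: least-squares refit
    A /= np.linalg.norm(A, axis=0) + 1e-12    # renormalize columns
print(np.linalg.norm(Y - A @ Xh) / np.linalg.norm(Y))   # relative reconstruction error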

Learning Mixtures of Gaussians in High Dimensions

no code implementations 2 Mar 2015 Rong Ge, Qingqing Huang, Sham M. Kakade

Unfortunately, learning mixture of Gaussians is an information theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case.

Learning Theory

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition

1 code implementation 6 Mar 2015 Rong Ge, Furong Huang, Chi Jin, Yang Yuan

To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.
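
A toy illustration of the phenomenon on a simple two-dimensional function (not the tensor objective from the paper): noisy gradient steps escape a strict saddle point at which plain gradient descent, initialized exactly there, would stay stuck.

import numpy as np

rng = np.random.default_rng(0)

def grad(w):                                  # f(x, y) = (x^2 - 1)^2 + (y^2 - 1)^2
    return 4 * w * (w ** 2 - 1)               # gradient is separable per coordinate

w = np.array([1.0, 0.0])                      # strict saddle: zero gradient, indefinite Hessian
eta, sigma = 0.05, 0.01
for _ in range(500):
    w = w - eta * (grad(w) + sigma * rng.normal(size=2))   # noisy gradient step
print(w)                                      # ends near a minimum, e.g. (1, ±1)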

Tensor Decomposition

Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms

no code implementations 21 Apr 2015 Rong Ge, Tengyu Ma

We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.

Tensor Decomposition

Intersecting Faces: Non-negative Matrix Factorization With New Guarantees

no code implementations 8 Jul 2015 Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.

Rich Component Analysis

no code implementations 14 Jul 2015 Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

Efficient approaches for escaping higher order saddle points in non-convex optimization

no code implementations 18 Feb 2016 Anima Anandkumar, Rong Ge

Local search heuristics for non-convex optimizations are popular in applied machine learning.

Matrix Completion has No Spurious Local Minimum

no code implementations NeurIPS 2016 Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.
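
A minimal sketch of the non-convex formulation studied here, on synthetic data: gradient descent on the factorized objective $f(U) = \frac{1}{2}\|P_\Omega(UU^\top - M)\|_F^2$. Sizes, sampling rate, step size and iteration count are illustrative:

import numpy as np

rng = np.random.default_rng(0)
d, r, p = 40, 2, 0.3
U_star = rng.normal(size=(d, r))
M = U_star @ U_star.T                         # ground-truth rank-r PSD matrix
mask = rng.random((d, d)) < p                 # observed entries (the set Omega)

U = rng.normal(size=(d, r))                   # random initialization
eta = 0.01
for _ in range(3000):
    R = mask * (U @ U.T - M)                  # residual on observed entries only
    U -= eta * (R + R.T) @ U                  # gradient of f with respect to U
print(np.linalg.norm(mask * (U @ U.T - M)) / np.linalg.norm(mask * M))   # relative fit error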

Collaborative Filtering Matrix Completion +1

Provable Algorithms for Inference in Topic Models

no code implementations 27 May 2016 Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

Topic Models

Homotopy Analysis for Tensor PCA

no code implementations 28 Oct 2016 Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.

Provable learning of Noisy-or Networks

no code implementations 28 Dec 2016 Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

Tensor Decomposition Topic Models

On the ability of neural nets to express distributions

no code implementations 22 Feb 2017 Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

How to Escape Saddle Points Efficiently

no code implementations ICML 2017 Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
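
A schematic version of such a perturbed gradient descent on a toy function: ordinary gradient steps, plus a small random kick whenever the gradient is tiny. Thresholds are illustrative, the perturbation here is Gaussian rather than uniform over a ball, and the paper's stopping rule is omitted:

import numpy as np

rng = np.random.default_rng(1)

def grad(w):                                  # f(x, y) = (x^2 - 1)^2 + (y^2 - 1)^2
    return 4 * w * (w ** 2 - 1)

w = np.array([1.0, 0.0])                      # strict saddle point of f
eta, g_thres, radius, t_noise = 0.05, 1e-3, 0.1, -10**9
for t in range(1000):
    if np.linalg.norm(grad(w)) <= g_thres and t - t_noise > 50:
        w = w + radius * rng.normal(size=2)   # perturbation kicks the iterate off the saddle
        t_noise = t
    w = w - eta * grad(w)                     # plain gradient step otherwise
print(w)                                      # converges to a neighborhood of a minimum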

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

1 code implementation ICML 2017 Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

Generative Adversarial Network

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

no code implementations ICML 2017 Rong Ge, Chi Jin, Yi Zheng

In this paper we develop a new framework that captures the common landscape underlying non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA.

Matrix Completion

On the Optimization Landscape of Tensor Decompositions

no code implementations NeurIPS 2017 Rong Ge, Tengyu Ma

The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.

Tensor Decomposition

Global Convergence of Policy Gradient Methods for Linearized Control Problems

no code implementations ICLR 2018 Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

no code implementations ICML 2018 Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies.

Continuous Control Policy Gradient Methods

Stronger generalization bounds for deep nets via a compression approach

no code implementations ICML 2018 Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

Generalization Bounds

On the Local Minima of the Empirical Risk

no code implementations NeurIPS 2018 Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $\nu$---which exist only in $f$.

Non-Convex Matrix Completion Against a Semi-Random Adversary

no code implementations 28 Mar 2018 Yu Cheng, Rong Ge

Matrix completion is a well-studied problem with many machine learning applications.

Matrix Completion

Learning Two-layer Neural Networks with Symmetric Inputs

no code implementations ICLR 2019 Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

We give a new algorithm for learning a two-layer neural network under a general class of input distributions.

High-Dimensional Robust Mean Estimation in Nearly-Linear Time

no code implementations 23 Nov 2018 Yu Cheng, Ilias Diakonikolas, Rong Ge

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted.
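
For intuition, a naive spectral-filtering heuristic for this corruption model: repeatedly drop points with extreme projections onto the top eigenvector of the empirical covariance. This is not the nearly-linear-time algorithm of the paper, just an illustration of the setting (thresholds are ad hoc):

import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 20, 2000, 0.1
X = rng.normal(size=(n, d))                   # inliers drawn from N(0, I)
X[: int(eps * n)] += 5.0                      # a constant fraction of adversarial outliers

for _ in range(10):
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(C)
    if w[-1] < 1.5:                           # covariance close to identity: stop filtering
        break
    scores = np.abs((X - mu) @ V[:, -1])      # projections onto the top eigendirection
    X = X[scores < np.quantile(scores, 0.95)] # drop the most extreme 5%
print(np.linalg.norm(X.mean(axis=0)))         # distance of the filtered mean from the true mean 0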

Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition

no code implementations 29 Nov 2018 Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

no code implementations 11 Feb 2019 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

no code implementations 13 Feb 2019 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

1 code implementation NeurIPS 2019 Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

First, this work shows that even if the time horizon T (i.e., the number of iterations SGD is run for) is known in advance, SGD's final iterate behavior with any polynomially decaying learning rate scheme is highly sub-optimal compared to the minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case).
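
A toy sketch of the step decay ("halve the learning rate once per phase") schedule on a streaming least-squares problem; the phase lengths and constants are illustrative, not the paper's prescription:

import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 4096
w_star = rng.normal(size=d)

def stochastic_grad(w):
    x = rng.normal(size=d)                    # one fresh sample per step
    y = x @ w_star + 0.1 * rng.normal()
    return (x @ w - y) * x                    # gradient of 0.5 * (x.w - y)^2

w, eta0 = np.zeros(d), 0.05
phases = int(np.log2(T))                      # halve the step size once per phase
steps = T // phases
for i in range(phases):
    eta = eta0 / 2 ** i
    for _ in range(steps):
        w -= eta * stochastic_grad(w)
print(np.linalg.norm(w - w_star))             # small after a single pass over T samples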

Stochastic Optimization

Rethinking learning rate schedules for stochastic optimization

no code implementations ICLR 2019 Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

One plausible explanation is that non-convex neural network training procedures are better suited to the use of fundamentally different learning rate schedules, such as the "cut the learning rate every constant number of epochs" method (which more closely resembles an exponentially decaying learning rate schedule); note that this widely used schedule is in stark contrast to the polynomial decay schemes prescribed in the stochastic approximation literature, which are indeed shown to be (worst case) optimal for classes of convex optimization problems.

Stochastic Optimization

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

no code implementations 1 May 2019 Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.

Faster Algorithms for High-Dimensional Robust Covariance Estimation

no code implementations 11 Jun 2019 Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.

Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently

no code implementations 26 Sep 2019 Rong Ge, Runzhe Wang, Haoyu Zhao

It has been observed (Zhang et al., 2016) that deep neural networks can memorize: they achieve 100% accuracy on training data.

Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds

no code implementations 8 Nov 2019 Rong Ge, Holden Lee, Jianfeng Lu

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics.

High-Dimensional Robust Mean Estimation via Gradient Descent

no code implementations ICML 2020 Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

Energy-Aware DNN Graph Optimization

1 code implementation 12 May 2020 Yu Wang, Rong Ge, Shuang Qiu

Unlike existing work in deep neural network (DNN) graphs optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.

Extracting Latent State Representations with Linear Dynamics from Rich Observations

no code implementations 29 Jun 2020 Abraham Frandsen, Rong Ge

In this work we study a model where there is a hidden linear subspace in which the dynamics is linear.

Reinforcement Learning +1

Optimization Landscape of Tucker Decomposition

no code implementations 29 Jun 2020 Abraham Frandsen, Rong Ge

Finding a Tucker decomposition is a nonconvex optimization problem.

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

1 code implementation 30 Jun 2020 Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Solving this problem using a learning-to-learn approach -- using meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates -- was recently shown to be effective.

Efficient sampling from the Bingham distribution

no code implementations 30 Sep 2020 Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
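
For intuition only, a crude random-walk Metropolis baseline on the sphere for this density (the re-projected proposal is only approximately symmetric); the paper's contribution, an exact sampler with the stated polynomial expected runtime, is not what is sketched here:

import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d))
A = (A + A.T) / 2                             # symmetric parameter matrix

def log_p(x):                                 # unnormalized log density x^T A x
    return x @ A @ x

x = rng.normal(size=d)
x /= np.linalg.norm(x)
samples = []
for _ in range(20000):
    prop = x + 0.3 * rng.normal(size=d)       # random-walk proposal, projected back to the sphere
    prop /= np.linalg.norm(prop)
    if np.log(rng.random()) < log_p(prop) - log_p(x):
        x = prop
    samples.append(x)

v_top = np.linalg.eigh(A)[1][:, -1]           # the density concentrates near the top eigenvector of A
print(np.mean([abs(v_top @ s) for s in samples[-5000:]]))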

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

no code implementations 8 Oct 2020 Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge

We can analyze the properties of these smaller matrices and prove the structure of the top eigenspace for random 2-layer networks.

Generalization Bounds

Beyond Lazy Training for Over-parameterized Tensor Decomposition

no code implementations NeurIPS 2020 Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

Tensor Decomposition

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

no code implementations 4 Feb 2021 Mo Zhou, Rong Ge, Chi Jin

We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0.

Understanding Deflation Process in Over-parametrized Tensor Decomposition

no code implementations NeurIPS 2021 Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.

Tensor Decomposition

Outlier-Robust Sparse Estimation via Non-Convex Optimization

1 code implementation 23 Sep 2021 Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.

Towards Understanding the Data Dependency of Mixup-style Training

1 code implementation ICLR 2022 Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training.

Online Algorithms with Multiple Predictions

no code implementations 8 May 2022 Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

In this paper, we give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best predictor.

Customizing ML Predictions for Online Algorithms

no code implementations ICML 2020 Keerti Anand, Rong Ge, Debmalya Panigrahi

A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances.

A Regression Approach to Learning-Augmented Online Algorithms

no code implementations NeurIPS 2021 Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms.

regression Scheduling

Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks

no code implementations 3 Oct 2022 Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge

Monotonic linear interpolation (MLI), the phenomenon that the loss and accuracy vary monotonically along the line connecting a random initialization with the minimizer it converges to, is commonly observed in the training of neural networks.

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

no code implementations 7 Oct 2022 Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge

Globally we observe that the training dynamics for our example has an interesting bifurcating behavior, which was also observed in the training of neural nets.

Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

1 code implementation 24 Oct 2022 Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels.
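
A minimal numpy sketch of the Mixup batch construction (standard Mixup with a Beta-distributed mixing weight; the midpoint variant studied in this paper corresponds to fixing the weight at 1/2):

import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(X, Y, alpha=1.0):
    """Return convex combinations of a batch of inputs X and one-hot labels Y."""
    lam = rng.beta(alpha, alpha)              # mixing weight; lam = 0.5 gives midpoints
    perm = rng.permutation(len(X))
    return lam * X + (1 - lam) * X[perm], lam * Y + (1 - lam) * Y[perm]

X = rng.normal(size=(4, 8))                   # toy batch: 4 examples, 8 features
Y = np.eye(3)[rng.integers(0, 3, size=4)]     # one-hot labels over 3 classes
X_mix, Y_mix = mixup_batch(X, Y)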

Data Augmentation Image Classification

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

no code implementations 1 Feb 2023 Mo Zhou, Rong Ge

In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators.

regression

Hiding Data Helps: On the Benefits of Masking for Sparse Coding

1 code implementation 24 Feb 2023 Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes.

Dictionary Learning Self-Supervised Learning

Do Transformers Parse while Predicting the Masked Word?

no code implementations 14 Mar 2023 Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.

Constituency Parsing Language Modelling +1

Depth Separation with Multilayer Mean-Field Networks

no code implementations 3 Apr 2023 Yunwei Ren, Mo Zhou, Rong Ge

Depth separation -- why a deeper network is more powerful than a shallower one -- has been a major problem in deep learning theory.

Learning Theory

On the Limitations of Temperature Scaling for Distributions with Overlaps

1 code implementation 1 Jun 2023 Muthu Chidambaram, Rong Ge

Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong.

Data Augmentation Image Classification

The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

no code implementations 4 Oct 2023 Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood.

FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems

1 code implementation 12 Dec 2023 Thomas Randall, Tyler Allen, Rong Ge

Word2Vec remains one of the most impactful innovations in the field of Natural Language Processing (NLP), representing latent grammatical and syntactical information in human text with dense vectors in a low-dimensional space.

Transfer-Learning-Based Autotuning Using Gaussian Copula

2 code implementations 9 Jan 2024 Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul Hovland, Mary Hall, Rong Ge, Prasanna Balaprakash

We introduce the first generative TL-based autotuning approach based on the Gaussian copula (GC) to model the high-performing regions of the search space from prior data and then generate high-performing configurations for new tasks.

Transfer Learning

For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

no code implementations 10 Feb 2024 Muthu Chidambaram, Rong Ge

Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade.

Data Augmentation Image Classification

Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

no code implementations 14 Feb 2024 Ziang Chen, Rong Ge

In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace.

Linear Transformers are Versatile In-Context Learners

no code implementations 21 Feb 2024 Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge

Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step.
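
A minimal softmax-free (linear) attention layer of the kind this line of work studies; the shapes and the normalization are illustrative, not the paper's exact construction:

import numpy as np

rng = np.random.default_rng(0)

def linear_attention(X, Wq, Wk, Wv):
    """One linear-attention layer: (X Wq)(X Wk)^T (X Wv), with no softmax."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return (Q @ K.T) @ V / X.shape[0]

n, d = 6, 4                                   # sequence length, model dimension
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = linear_attention(X, Wq, Wk, Wv)         # shape (n, d)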

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

no code implementations NeurIPS 2023 Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

We introduce a general framework for efficiently finding an approximate second-order stationary point (SOSP) with dimension-independent accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.
