Search Results for author: Suvrit Sra

Found 91 papers, 14 papers with code

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

no code implementations ICML 2020 Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

no code implementations ICML 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Therefore, we introduce the notion of $(\delta, \epsilon)$-stationarity, a generalization that allows for a point to be within distance $\delta$ of an $\epsilon$-stationary point and reduces to $\epsilon$-stationarity for smooth functions.

On a class of geodesically convex optimization problems solved via Euclidean MM methods

no code implementations 22 Jun 2022 Suvrit Sra, Melanie Weber

We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions.

Riemannian optimization

Understanding the unstable convergence of gradient descent

no code implementations 3 Apr 2022 Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.

Sign and Basis Invariant Networks for Spectral Graph Representation Learning

1 code implementation 25 Feb 2022 Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, Stefanie Jegelka

Moreover, when used with Laplacian eigenvectors, our architectures are provably expressive for graph representation learning: they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message passing neural networks, and can provably simulate previously proposed graph positional encodings.

Graph Regression, Graph Representation Learning

Minimax in Geodesic Metric Spaces: Sion's Theorem and Algorithms

no code implementations 13 Feb 2022 Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

Determining whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.

Time varying regression with hidden linear dynamics

no code implementations 29 Dec 2021 Ali Jadbabaie, Horia Mania, Devavrat Shah, Suvrit Sra

We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.

Max-Margin Contrastive Learning

1 code implementation 21 Dec 2021 Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.

Contrastive Learning, Representation Learning +1

Understanding Riemannian Acceleration via a Proximal Extragradient Framework

no code implementations 4 Nov 2021 Jikai Jin, Suvrit Sra

We contribute to advancing the understanding of Riemannian accelerated gradient methods.

Riemannian optimization

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

no code implementations ICLR 2022 Chulhee Yun, Shashank Rajput, Suvrit Sra

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods.

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

no code implementations 12 Oct 2021 Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks.

Can contrastive learning avoid shortcut solutions?

1 code implementation NeurIPS 2021 Joshua Robinson, Li Sun, Ke Yu, Kayhan Batmanghelich, Stefanie Jegelka, Suvrit Sra

However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features.

Contrastive Learning

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

no code implementations 12 Mar 2021 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We propose matrix norm inequalities that extend the Recht-Ré (2012) conjecture on a noncommutative AM-GM inequality by supplementing it with another inequality that accounts for single-shuffle, a widely used without-replacement sampling scheme that shuffles only once at the beginning and is overlooked in the Recht-Ré conjecture.

Provably Efficient Algorithms for Multi-Objective Competitive RL

no code implementations 5 Feb 2021 Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games and our theoretical guarantees are near-optimal.

Stochastic Optimization with Non-stationary Noise: The Power of Moment Estimation

no code implementations 1 Jan 2021 Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses.

Stochastic Optimization

Why do classifier accuracies show linear trends under distribution shift?

no code implementations 31 Dec 2020 Horia Mania, Suvrit Sra

Recent studies of generalization in deep learning have observed a puzzling trend: accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution.

Online Learning in Unknown Markov Games

no code implementations 28 Oct 2020 Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable.

Multi-agent Reinforcement Learning, online learning

Coping with Label Shift via Distributionally Robust Optimisation

no code implementations ICLR 2021 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

no code implementations NeurIPS 2020 Yi Tian, Jian Qian, Suvrit Sra

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components.

reinforcement-learning

Understanding Nesterov's Acceleration via Proximal Point Method

no code implementations 17 May 2020 Kwangjun Ahn, Suvrit Sra

The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.

On Tight Convergence Rates of Without-replacement SGD

no code implementations 18 Apr 2020 Kwangjun Ahn, Suvrit Sra

For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD.

Strength from Weakness: Fast Learning Using Weak Supervision

no code implementations ICML 2020 Joshua Robinson, Stefanie Jegelka, Suvrit Sra

Our theoretical results are reflected empirically across a range of tasks and illustrate how weak labels speed up learning on the strong task.

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

no code implementations 10 Feb 2020 Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.

From Nesterov's Estimate Sequence to Riemannian Acceleration

no code implementations 24 Jan 2020 Kwangjun Ahn, Suvrit Sra

We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.

Why are Adaptive Methods Good for Attention Models?

no code implementations NeurIPS 2020 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

no code implementations 3 Dec 2019 Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

Projection-free nonconvex stochastic optimization on Riemannian manifolds

no code implementations 9 Oct 2019 Melanie Weber, Suvrit Sra

We present algorithms for both purely stochastic optimization and finite-sum problems.

Stochastic Optimization

Why ADAM Beats SGD for Attention Models

no code implementations 25 Sep 2019 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

Are deep ResNets provably better than linear predictors?

no code implementations NeurIPS 2019 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor.

Near Optimal Stratified Sampling

no code implementations 26 Jun 2019 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels.

Flexible Modeling of Diversity with Strongly Log-Concave Distributions

1 code implementation NeurIPS 2019 Joshua Robinson, Suvrit Sra, Stefanie Jegelka

We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, illustrating this via examples of practical importance.

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations 26 Jan 2019 Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Deep-RBF Networks Revisited: Robust Classification with Rejection

no code implementations 7 Dec 2018 Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

On the other hand, deep-RBF networks assign high confidence only to the regions containing enough feature points, but they have been discounted due to the widely-held belief that they have the vanishing gradient problem.

Adversarial Attack Classification +3

Exponentiated Strongly Rayleigh Distributions

no code implementations NeurIPS 2018 Zelda E. Mariet, Suvrit Sra, Stefanie Jegelka

Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set.

Point Processes

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

no code implementations NeurIPS 2019 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity.

Efficiently testing local optimality and escaping saddles for ReLU networks

no code implementations ICLR 2019 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

In the benign case, we solve one equality constrained QP, and we prove that projected gradient descent solves it exponentially fast.

Random Shuffling Beats SGD after Finite Epochs

no code implementations 26 Jun 2018 Jeff Z. HaoChen, Suvrit Sra

We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD.

Towards Riemannian Accelerated Gradient Methods

no code implementations 7 Jun 2018 Hongyi Zhang, Suvrit Sra

We propose a Riemannian version of Nesterov's Accelerated Gradient algorithm (RAGD), and show that for geodesically smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, RAGD converges to the minimizer with acceleration.

Direct Runge-Kutta Discretization Achieves Acceleration

no code implementations NeurIPS 2018 Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.

Non-Linear Temporal Subspace Representations for Activity Recognition

no code implementations CVPR 2018 Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which capture their temporal order.

Action Recognition, Riemannian optimization +1

Learning Determinantal Point Processes by Corrective Negative Sampling

no code implementations 15 Feb 2018 Zelda Mariet, Mike Gartrell, Suvrit Sra

To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model.

Language Modelling, Point Processes

Small nonlinearities in activation functions create bad local minima in neural networks

no code implementations ICLR 2019 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust.

Riemannian Optimization via Frank-Wolfe Methods

1 code implementation 30 Oct 2017 Melanie Weber, Suvrit Sra

Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest.

Riemannian optimization

A Generic Approach for Escaping Saddle points

no code implementations 5 Sep 2017 Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

Second-order methods

Global optimality conditions for deep neural networks

no code implementations ICLR 2018 Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We study the error landscape of deep linear and nonlinear neural networks with the squared error loss.

Distributional Adversarial Networks

1 code implementation ICLR 2018 Chengtao Li, David Alvarez-Melis, Keyulu Xu, Stefanie Jegelka, Suvrit Sra

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination.

Domain Adaptation

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

1 code implementation 10 Jun 2017 Reshad Hosseini, Suvrit Sra

This motivates us to take a closer look at the problem geometry, and derive a better formulation that is much more amenable to Riemannian optimization.

Density Estimation, Riemannian optimization

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

no code implementations 24 May 2017 Anoop Cherian, Suvrit Sra, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in an RKHS, projections of data onto which capture their temporal order.

Action Recognition, Riemannian optimization +1

Polynomial Time Algorithms for Dual Volume Sampling

no code implementations NeurIPS 2017 Chengtao Li, Stefanie Jegelka, Suvrit Sra

We study dual volume sampling, a method for selecting $k$ columns from an $n \times m$ short and wide matrix ($n \le k \le m$) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix.

Experimental Design

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization

no code implementations NeurIPS 2016 Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alexander J. Smola

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex.

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling

no code implementations NeurIPS 2016 Chengtao Li, Stefanie Jegelka, Suvrit Sra

We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them.

Point Processes

Stochastic Frank-Wolfe Methods for Nonconvex Optimization

no code implementations 27 Jul 2016 Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

Finally, we show that the faster convergence rates of our variance reduced methods also translate into improved convergence rates for the stochastic setting.

Kronecker Determinantal Point Processes

4 code implementations NeurIPS 2016 Zelda Mariet, Suvrit Sra

Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items.

Point Processes, Stochastic Optimization

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

no code implementations 23 May 2016 Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.

Directional Statistics in Machine Learning: a Brief Review

2 code implementations 1 May 2016 Suvrit Sra

The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, or more.

Fast DPP Sampling for Nyström with Application to Kernel Methods

no code implementations 19 Mar 2016 Chengtao Li, Stefanie Jegelka, Suvrit Sra

Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected.

Point Processes

Fast Incremental Method for Nonconvex Optimization

no code implementations 19 Mar 2016 Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$.

Stochastic Variance Reduction for Nonconvex Optimization

no code implementations 19 Mar 2016 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them.

First-order Methods for Geodesically Convex Optimization

no code implementations 19 Feb 2016 Hongyi Zhang, Suvrit Sra

Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces.

Gauss quadrature for matrix inverse forms with applications

no code implementations 7 Dec 2015 Chengtao Li, Suvrit Sra, Stefanie Jegelka

We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector.

Point Processes

Efficient Sampling for k-Determinantal Point Processes

no code implementations 4 Sep 2015 Chengtao Li, Stefanie Jegelka, Suvrit Sra

Our method takes advantage of the diversity property of subsets sampled from a DPP, and proceeds in two stages: first it constructs coresets for the ground set of items; thereafter, it efficiently samples subsets based on the constructed coresets.

Point Processes

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

no code implementations 20 Aug 2015 Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.

Fixed-point algorithms for learning determinantal point processes

no code implementations 4 Aug 2015 Zelda Mariet, Suvrit Sra

Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set.

Point Processes

Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices

no code implementations 10 Jul 2015 Anoop Cherian, Suvrit Sra

Inspired by the great success of dictionary learning and sparse coding for vector-valued data, our goal in this paper is to represent data in the form of SPD matrices as sparse conic combinations of SPD atoms from a learned dictionary via a Riemannian geometric approach.

Dictionary Learning, Riemannian optimization

Convex Optimization for Parallel Energy Minimization

no code implementations 5 Mar 2015 K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.

Denoising

Efficient Structured Matrix Rank Minimization

no code implementations NeurIPS 2014 Adams Wei Yu, Wanli Ma, YaoLiang Yu, Jaime Carbonell, Suvrit Sra

We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

no code implementations 22 Sep 2014 Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing

We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework.

Large-scale randomized-coordinate descent methods with non-separable linear constraints

no code implementations 9 Sep 2014 Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra

We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization.

Randomized Nonlinear Component Analysis

no code implementations 1 Feb 2014 David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale.

Geometric optimisation on positive definite matrices for elliptically contoured distributions

no code implementations NeurIPS 2013 Suvrit Sra, Reshad Hosseini

We exploit the remarkable structure of the convex cone of positive definite matrices which allows one to uncover hidden geodesic convexity of objective functions that are nonconvex in the ordinary Euclidean sense.

Riemannian optimization

Statistical estimation for optimization problems on graphs

no code implementations 29 Nov 2013 Mikhail Langovoy, Suvrit Sra

Large graphs abound in machine learning, data mining, and several related areas.

Combinatorial Optimization

Reflection methods for user-friendly submodular optimization

no code implementations NeurIPS 2013 Stefanie Jegelka, Francis Bach, Suvrit Sra

A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.

Semantic Segmentation

Scalable nonconvex inexact proximal splitting

no code implementations NeurIPS 2012 Suvrit Sra

To our knowledge, our framework is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we disregard the ability to handle nonvanishing errors.

A new metric on the manifold of kernel matrices with application to matrix geometric means

no code implementations NeurIPS 2012 Suvrit Sra

Symmetric positive definite (spd) matrices are remarkably pervasive in a multitude of scientific disciplines, including machine learning and optimization.

Positive definite matrices and the S-divergence

no code implementations 8 Oct 2011 Suvrit Sra

Positive definite matrices abound in a dazzling variety of applications.
