
no code implementations • ICML 2020 • Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Therefore, we introduce the notion of (delta, epsilon)-stationarity, a generalization that allows for a point to be within distance delta of an epsilon-stationary point and reduces to epsilon-stationarity for smooth functions.

no code implementations • ICML 2020 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.

no code implementations • 10 Jul 2023 • Adarsh Barik, Suvrit Sra, Jean Honorio

Invex programs are a special kind of non-convex problem in which every stationary point is a global minimum.

no code implementations • 25 May 2023 • Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

For general cost functions, we present a gradient-based algorithm that finds an approximate flat local minimum efficiently.

no code implementations • 24 Feb 2023 • David X. Wu, Chulhee Yun, Suvrit Sra

We uncover how SGD interacts with batch normalization and can exhibit undesirable training dynamics such as divergence.

no code implementations • 30 Dec 2022 • Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system.

no code implementations • 22 Jun 2022 • Melanie Weber, Suvrit Sra

We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions.

no code implementations • 3 Apr 2022 • Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.
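
The classical condition is easiest to see on a one-dimensional quadratic. The sketch below is our own illustration, not code from the paper. For f(x) = (L/2)x², every gradient step multiplies x by (1 - ηL), so the iterates converge when 0 < η < 2/L and diverge when η > 2/L. The function name `gd_on_quadratic` is hypothetical.

```python
def gd_on_quadratic(L: float, eta: float, x0: float = 1.0, steps: int = 100) -> float:
    """Gradient descent on f(x) = (L/2) * x**2, whose gradient is L * x.

    Each update x <- x - eta * L * x multiplies x by (1 - eta * L), so the
    iterates shrink toward the minimizer 0 exactly when |1 - eta * L| < 1.
    """
    x = x0
    for _ in range(steps):
        x = x - eta * (L * x)
    return x


if __name__ == "__main__":
    L = 4.0  # smoothness constant; the classical threshold is 2 / L = 0.5
    for eta in (0.1, 0.45, 0.55):
        print(f"eta={eta}: |x_100| = {abs(gd_on_quadratic(L, eta)):.3e}")
```

With L = 4, the step sizes 0.1 and 0.45 drive x toward 0. The step size 0.55, just above 2/L, makes |x| grow geometrically.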

2 code implementations • 25 Feb 2022 • Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, Stefanie Jegelka

We introduce SignNet and BasisNet -- new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if $v$ is an eigenvector then so is $-v$; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors.

Ranked #10 on Graph Regression on ZINC-500k
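
SignNet's sign invariance comes from processing both v and -v with the same network φ and summing the results; the sums are then passed to a second network ρ. The toy sketch below shows only the φ(v) + φ(-v) step. It uses a hypothetical fixed elementwise `phi` in place of a learned network, and it is not the SignNet implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def phi(v: Vector) -> Vector:
    """Hypothetical stand-in for a learned network phi (a fixed elementwise map)."""
    return [math.tanh(2.0 * x + 0.5) for x in v]


def sign_invariant_features(f: Callable[[Vector], Vector], v: Vector) -> Vector:
    """Return f(v) + f(-v); replacing v by -v only swaps the two summands."""
    neg_v = [-x for x in v]
    return [p + q for p, q in zip(f(v), f(neg_v))]


if __name__ == "__main__":
    v = [0.6, -0.8, 0.0]       # e.g. a unit-norm eigenvector
    minus_v = [-x for x in v]  # an equally valid eigenvector
    print(phi(v) == phi(minus_v))  # phi by itself is sensitive to the sign flip
    print(sign_invariant_features(phi, v) == sign_invariant_features(phi, minus_v))
```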

no code implementations • 13 Feb 2022 • Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.

no code implementations • 29 Dec 2021 • Ali Jadbabaie, Horia Mania, Devavrat Shah, Suvrit Sra

We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.

1 code implementation • 21 Dec 2021 • Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.

Ranked #98 on Self-Supervised Image Classification on ImageNet

no code implementations • 4 Nov 2021 • Jikai Jin, Suvrit Sra

We contribute to advancing the understanding of Riemannian accelerated gradient methods.

no code implementations • ICLR 2022 • Chulhee Yun, Shashank Rajput, Suvrit Sra

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods.

no code implementations • 12 Oct 2021 • Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks.

1 code implementation • NeurIPS 2021 • Joshua Robinson, Li Sun, Ke Yu, Kayhan Batmanghelich, Stefanie Jegelka, Suvrit Sra

However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features.

no code implementations • 12 Mar 2021 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We propose matrix norm inequalities that extend the Recht-Ré (2012) conjecture on a noncommutative AM-GM inequality. We supplement it with another inequality that accounts for single-shuffle, a widely used without-replacement sampling scheme that shuffles only once at the beginning and is overlooked in the Recht-Ré conjecture.
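
The sampling schemes compared in this entry differ only in the order in which an SGD loop visits the component functions of a finite sum. The sketch below generates per-epoch index orders for three schemes:

- with-replacement sampling;
- random reshuffling, which draws a fresh permutation every epoch;
- single-shuffle, which draws one permutation and reuses it in every epoch.

The function names are hypothetical and chosen for illustration.

```python
import random
from typing import List


def with_replacement(n: int, epochs: int, rng: random.Random) -> List[List[int]]:
    """Each epoch draws n indices independently and uniformly (classic SGD)."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]


def random_reshuffle(n: int, epochs: int, rng: random.Random) -> List[List[int]]:
    """Each epoch visits every index exactly once, in a fresh random permutation."""
    orders = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        orders.append(perm)
    return orders


def single_shuffle(n: int, epochs: int, rng: random.Random) -> List[List[int]]:
    """Shuffle once at the start, then reuse that same permutation in every epoch."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [list(perm) for _ in range(epochs)]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, scheme in [("with replacement", with_replacement),
                         ("random reshuffle", random_reshuffle),
                         ("single shuffle", single_shuffle)]:
        print(name, scheme(5, 3, rng))
```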

no code implementations • 5 Feb 2021 • Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games, and our theoretical guarantees are near-optimal.

no code implementations • 1 Jan 2021 • Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses.

no code implementations • 31 Dec 2020 • Horia Mania, Suvrit Sra

Recent studies of generalization in deep learning have observed a puzzling trend: accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution.

no code implementations • 28 Oct 2020 • Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable.

no code implementations • ICLR 2021 • Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.

1 code implementation • ICLR 2021 • Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka

How can you sample good negative examples for contrastive learning?

no code implementations • NeurIPS 2020 • Yi Tian, Jian Qian, Suvrit Sra

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components.

no code implementations • NeurIPS 2020 • Kwangjun Ahn, Chulhee Yun, Suvrit Sra

We study without-replacement SGD for solving finite-sum optimization problems.

no code implementations • 8 Jun 2020 • Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

We study oracle complexity of gradient based methods for stochastic approximation problems.

no code implementations • 17 May 2020 • Kwangjun Ahn, Suvrit Sra

The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.
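
Each PPM step sets x_{k+1} = argmin_x f(x) + (1/(2η))(x - x_k)². In general that subproblem is itself an optimization problem. The sketch below picks the quadratic f(x) = (a/2)(x - c)² only so that the step has a closed form; this choice is ours, not the paper's. Each step shrinks the distance to the minimizer c by a factor 1/(1 + ηa), even for step sizes at which plain gradient descent would diverge. The function names are hypothetical.

```python
def prox_quadratic(x_prev: float, eta: float, a: float, c: float) -> float:
    """One proximal-point step for f(x) = (a/2) * (x - c)**2.

    Solves argmin_x f(x) + (x - x_prev)**2 / (2 * eta). Setting the derivative
    a * (x - c) + (x - x_prev) / eta to zero gives the closed form below.
    """
    return (eta * a * c + x_prev) / (1.0 + eta * a)


def proximal_point(x0: float, eta: float, a: float, c: float, steps: int) -> float:
    """Iterate x_{k+1} = prox_{eta f}(x_k) and return the last iterate."""
    x = x0
    for _ in range(steps):
        x = prox_quadratic(x, eta, a, c)
    return x


if __name__ == "__main__":
    # eta = 100 is far above the 2 / a limit that gradient descent would need,
    # yet every proximal step still contracts the error x - c by 1 / (1 + eta * a).
    print(proximal_point(x0=0.0, eta=100.0, a=1.0, c=3.0, steps=10))
```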

no code implementations • 18 Apr 2020 • Kwangjun Ahn, Suvrit Sra

For solving finite-sum optimization problems, SGD with without-replacement sampling has been empirically shown to outperform standard (with-replacement) SGD.

no code implementations • ICML 2020 • Joshua Robinson, Stefanie Jegelka, Suvrit Sra

Our theoretical results are reflected empirically across a range of tasks and illustrate how weak labels speed up learning on the strong task.

no code implementations • 10 Feb 2020 • Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.

no code implementations • 24 Jan 2020 • Kwangjun Ahn, Suvrit Sra

We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.

no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.

no code implementations • 3 Dec 2019 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

no code implementations • 9 Oct 2019 • Melanie Weber, Suvrit Sra

We present algorithms for both purely stochastic optimization and finite-sum problems.

no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

no code implementations • 22 Jul 2019 • Tiancheng Yu, Suvrit Sra

A Markov Decision Process (MDP) is a popular model for reinforcement learning.
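
When the transitions and rewards of an MDP are known, one standard way to solve it is value iteration, which repeatedly applies the Bellman optimality update. The sketch below is a generic textbook version and is not the method of the paper in this entry. The dictionary encoding of the MDP and the toy example are hypothetical.

```python
from typing import Dict, List, Tuple

# mdp[state][action] = list of (probability, next_state, reward) outcomes
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def value_iteration(mdp: MDP, gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Apply the Bellman optimality update until values change by less than tol."""
    values = {s: 0.0 for s in mdp}
    while True:
        new_values = {
            s: max(
                sum(p * (r + gamma * values[s_next]) for p, s_next, r in outcomes)
                for outcomes in actions.values()
            )
            for s, actions in mdp.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(mdp: MDP, values: Dict[int, float], gamma: float) -> Dict[int, str]:
    """In each state, pick an action that maximizes the one-step lookahead value."""
    return {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * values[s_next]) for p, s_next, r in actions[a]),
        )
        for s, actions in mdp.items()
    }


if __name__ == "__main__":
    # Toy MDP: in state 0, "stay" earns reward 1 and remains in state 0,
    # while "leave" moves to the absorbing, zero-reward state 1.
    toy: MDP = {
        0: {"stay": [(1.0, 0, 1.0)], "leave": [(1.0, 1, 0.0)]},
        1: {"stay": [(1.0, 1, 0.0)]},
    }
    v = value_iteration(toy, gamma=0.9)
    print(v, greedy_policy(toy, v, gamma=0.9))
```

In the toy example, staying forever in state 0 is worth 1/(1 - 0.9) = 10.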

no code implementations • NeurIPS 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor.

no code implementations • 26 Jun 2019 • Tiancheng Yu, Xiyu Zhai, Suvrit Sra

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels.

1 code implementation • NeurIPS 2019 • Joshua Robinson, Suvrit Sra, Stefanie Jegelka

We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, illustrating this via examples of practical importance.

1 code implementation • ICLR 2020 • Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks.
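
In its common norm-based form, gradient clipping rescales the gradient whenever its Euclidean norm exceeds a threshold. The direction is kept and only the length is capped. The minimal sketch below is not the paper's code. The names `clip_by_norm`, `clipped_gd_step` and the threshold `max_norm` are hypothetical.

```python
import math
from typing import List


def clip_by_norm(grad: List[float], max_norm: float) -> List[float]:
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # already small enough: leave it unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]


def clipped_gd_step(x: List[float], grad: List[float], lr: float, max_norm: float) -> List[float]:
    """One gradient step that uses the clipped gradient."""
    g = clip_by_norm(grad, max_norm)
    return [xi - lr * gi for xi, gi in zip(x, g)]


if __name__ == "__main__":
    print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5 is scaled down to norm 1
    print(clipped_gd_step([0.0, 0.0], [30.0, 40.0], lr=0.1, max_norm=1.0))
```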

no code implementations • 26 Jan 2019 • Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.
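
For reference, the sketch below implements the standard Adam update (Kingma and Ba) on plain Python lists, with the usual default hyperparameters. Roughly speaking, RMSProp keeps only the running second-moment scaling, without the first-moment average and the bias correction. This is a textbook sketch, not code from the paper, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def adam_step(x: List[float], grad: List[float], m: List[float], v: List[float], t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; t is the 1-based step count used for bias correction."""
    # Exponential moving averages of the gradient and of its elementwise square
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction for the zero initialization of m and v
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Per-coordinate step, scaled by the root of the second-moment estimate
    x = [xi - lr * mh / (math.sqrt(vh) + eps) for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x, m, v


if __name__ == "__main__":
    x, m, v = adam_step([0.0, 0.0], [100.0, -0.01], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.1)
    print(x)
```

On the first step, m_hat = g and v_hat = g². Each coordinate therefore moves by about `lr` against the sign of its gradient, whatever the gradient's magnitude.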

no code implementations • 7 Dec 2018 • Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

On the other hand, deep-RBF networks assign high confidence only to the regions containing enough feature points, but they have been discounted due to the widely-held belief that they have the vanishing gradient problem.

no code implementations • NeurIPS 2018 • Zelda E. Mariet, Suvrit Sra, Stefanie Jegelka

Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set.

no code implementations • 10 Nov 2018 • Jingzhao Zhang, Hongyi Zhang, Suvrit Sra

We study smooth stochastic optimization problems on Riemannian manifolds.

no code implementations • NeurIPS 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity.

no code implementations • ICLR 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

In the benign case, we solve one equality constrained QP, and we prove that projected gradient descent solves it exponentially fast.

no code implementations • 26 Jun 2018 • Jeff Z. HaoChen, Suvrit Sra

We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD.

no code implementations • 7 Jun 2018 • Hongyi Zhang, Suvrit Sra

We propose a Riemannian version of Nesterov's Accelerated Gradient algorithm (RAGD), and show that for geodesically smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, RAGD converges to the minimizer with acceleration.

no code implementations • NeurIPS 2018 • Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.

no code implementations • CVPR 2018 • Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which capture their temporal order.

no code implementations • 15 Feb 2018 • Zelda Mariet, Mike Gartrell, Suvrit Sra

To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model.

no code implementations • ICLR 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust.

1 code implementation • 30 Oct 2017 • Melanie Weber, Suvrit Sra

Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest.

no code implementations • 5 Sep 2017 • Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

no code implementations • ICLR 2018 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We study the error landscape of deep linear and nonlinear neural networks with the squared error loss.

1 code implementation • ICLR 2018 • Chengtao Li, David Alvarez-Melis, Keyulu Xu, Stefanie Jegelka, Suvrit Sra

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination.

1 code implementation • 10 Jun 2017 • Reshad Hosseini, Suvrit Sra

This motivates us to take a closer look at the problem geometry, and derive a better formulation that is much more amenable to Riemannian optimization.

no code implementations • 24 May 2017 • Anoop Cherian, Suvrit Sra, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in an RKHS, projections of data onto which capture their temporal order.

no code implementations • NeurIPS 2017 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

We study dual volume sampling, a method for selecting $k$ columns from an $n \times m$ short and wide matrix ($n \le k \le m$) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix.
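
A brute-force sketch can make the selection rule concrete. It assumes the usual formalization P(S) ∝ det(A_S A_S^T), which measures the row volume of the n × k submatrix A_S in squared form. The sketch enumerates every k-subset of columns, so it is exponential and only suitable for checking tiny cases; the paper itself is about sampling efficiently. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def det(M: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result  # a row swap flips the sign
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result


def dual_volume_probabilities(A: Matrix, k: int) -> Dict[Tuple[int, ...], float]:
    """Brute-force P(S) proportional to det(A_S A_S^T) over all k-column subsets S."""
    n, m = len(A), len(A[0])
    assert n <= k <= m, "dual volume sampling needs n <= k <= m"
    weights = {}
    for S in combinations(range(m), k):
        sub = [[A[i][j] for j in S] for i in range(n)]  # the n x k submatrix A_S
        gram = [[sum(sub[i][t] * sub[l][t] for t in range(k)) for l in range(n)]
                for i in range(n)]
        weights[S] = det(gram)
    total = sum(weights.values())
    return {S: w / total for S, w in weights.items()}


if __name__ == "__main__":
    A = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 0.0]]
    print(dual_volume_probabilities(A, k=2))
```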

no code implementations • NeurIPS 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alexander J. Smola

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex.

no code implementations • NeurIPS 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them.

no code implementations • 27 Jul 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

Finally, we show that the faster convergence rates of our variance reduced methods also translate into improved convergence rates for the stochastic setting.

1 code implementation • 18 Jul 2016 • Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

We revisit the task of learning a Euclidean metric from data.

no code implementations • 13 Jul 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

In this note we consider sampling from (non-homogeneous) strongly Rayleigh probability measures.

4 code implementations • NeurIPS 2016 • Zelda Mariet, Suvrit Sra

Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items.
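
A common concrete form of a DPP is the L-ensemble. For a positive semidefinite kernel L, it assigns P(Y) = det(L_Y) / det(L + I), where L_Y is the principal submatrix indexed by Y and det of the empty matrix is 1. The brute-force sketch below is only feasible for tiny N. It uses the closed-form normalizer det(L + I) and lets you check that the probabilities sum to 1. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def det(M: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (det of [] is 1)."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result


def dpp_probabilities(L: Matrix) -> Dict[Tuple[int, ...], float]:
    """P(Y) = det(L_Y) / det(L + I) for every subset Y of {0, ..., N-1}."""
    N = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(N)] for i in range(N)]
    Z = det(L_plus_I)  # equals the sum of det(L_Y) over all subsets Y
    probs = {}
    for size in range(N + 1):
        for Y in combinations(range(N), size):
            L_Y = [[L[i][j] for j in Y] for i in Y]
            probs[Y] = det(L_Y) / Z
    return probs


if __name__ == "__main__":
    for Y, p in dpp_probabilities([[2.0, 1.0], [1.0, 2.0]]).items():
        print(Y, round(p, 4))
```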

no code implementations • NeurIPS 2016 • Hongyi Zhang, Sashank J. Reddi, Suvrit Sra

We study optimization of finite sums of geodesically smooth functions on Riemannian manifolds.

no code implementations • 23 May 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.

2 code implementations • 1 May 2016 • Suvrit Sra

The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, or more.

no code implementations • 7 Apr 2016 • Ke Jiang, Suvrit Sra, Brian Kulis

Topic models have emerged as fundamental tools in unsupervised machine learning.

no code implementations • 19 Mar 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected.

no code implementations • 19 Mar 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$.

no code implementations • 19 Mar 2016 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them.
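
The core of SVRG is its gradient estimate ∇f_i(x) - ∇f_i(x̃) + ∇F(x̃). Here x̃ is a snapshot point whose full gradient is recomputed once per epoch, and F = (1/n) Σ f_i. The sketch below applies it to a convex one-dimensional least-squares finite sum. That problem, the step size, and the epoch lengths are our own assumptions, chosen so the answer is checkable; the paper analyzes nonconvex f_i. For the data in the usage line, the exact minimizer is Σ a_i b_i / Σ a_i² = 11/14. The function name is hypothetical.

```python
import random
from typing import List


def svrg_least_squares(a: List[float], b: List[float], lr: float = 0.02,
                       epochs: int = 30, inner_steps: int = 50, seed: int = 0) -> float:
    """SVRG for F(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 with scalar x."""
    n = len(a)
    rng = random.Random(seed)

    def grad_i(i: int, x: float) -> float:
        # Gradient of the i-th component f_i(x) = 0.5 * (a[i] * x - b[i])**2
        return a[i] * (a[i] * x - b[i])

    x = 0.0
    for _ in range(epochs):
        snapshot = x  # the reference point x~ for this epoch
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Unbiased estimate of grad F(x) whose variance shrinks near the optimum
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x -= lr * g
    return x


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 2.0]
    print(svrg_least_squares(a, b), 11 / 14)
```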

no code implementations • 19 Feb 2016 • Hongyi Zhang, Suvrit Sra

Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces.

no code implementations • 7 Dec 2015 • Chengtao Li, Suvrit Sra, Stefanie Jegelka

We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector.
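
The sketch below is not the paper's acceleration framework. It is only the plain baseline such methods improve on: it never forms A^{-1}, solves A x = u with the conjugate gradient method, and returns u^T x. The helper names are hypothetical, and A must be symmetric positive definite.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(p * q for p, q in zip(x, y))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-12, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A without forming A^{-1}."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x at x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]  # A-conjugate direction
        rs_old = rs_new
    return x


def bilinear_inverse_form(A: Matrix, u: Vector) -> float:
    """Compute u^T A^{-1} u as u^T x, where x solves A x = u."""
    return dot(u, conjugate_gradient(A, u))


if __name__ == "__main__":
    print(bilinear_inverse_form([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

For the 2 × 2 example, A^{-1}u = (1/11)(1, 7), so u^T A^{-1} u = 15/11.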

no code implementations • NeurIPS 2015 • Reshad Hosseini, Suvrit Sra

We take a new look at parameter estimation for Gaussian Mixture Models (GMMs).

2 code implementations • 16 Nov 2015 • Zelda Mariet, Suvrit Sra

We introduce Divnet, a flexible technique for learning networks with diverse neurons.

no code implementations • 4 Sep 2015 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

Our method takes advantage of the diversity property of subsets sampled from a DPP, and proceeds in two stages: first it constructs coresets for the ground set of items; thereafter, it efficiently samples subsets based on the constructed coresets.

no code implementations • 20 Aug 2015 • Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.

no code implementations • 4 Aug 2015 • Zelda Mariet, Suvrit Sra

Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set.

no code implementations • 10 Jul 2015 • Anoop Cherian, Suvrit Sra

Inspired by the great success of dictionary learning and sparse coding for vector-valued data, our goal in this paper is to represent data in the form of SPD matrices as sparse conic combinations of SPD atoms from a learned dictionary via a Riemannian geometric approach.

no code implementations • 25 Jun 2015 • Reshad Hosseini, Suvrit Sra

We take a new look at parameter estimation for Gaussian Mixture Models (GMMs).

no code implementations • NeurIPS 2015 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

We demonstrate the empirical performance of our method through a concrete realization of asynchronous SVRG.

no code implementations • 5 Mar 2015 • K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.

no code implementations • NeurIPS 2014 • Adams Wei Yu, Wanli Ma, YaoLiang Yu, Jaime Carbonell, Suvrit Sra

We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.

3 code implementations • 3 Nov 2014 • Álvaro Barbero, Suvrit Sra

We study TV regularization, a widely used technique for eliciting structured sparsity.

Ranked #1 on Microarray Classification on ArrayCGH
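
The TV regularization in this entry appears in its simplest one-dimensional denoising form as the objective ½‖x - y‖² + λ Σ_i |x_{i+1} - x_i|, where y is the noisy signal. The paper treats more general settings. The tiny sketch below only evaluates that 1D objective. It shows how the penalty favors piecewise-constant estimates: with λ = 1, flattening the two-sample signal lowers the objective from 2 to 1. The function name is hypothetical.

```python
from typing import List


def tv_denoising_objective(x: List[float], y: List[float], lam: float) -> float:
    """0.5 * ||x - y||^2 + lam * sum_i |x[i+1] - x[i]| for 1D signals (x: estimate, y: data)."""
    fidelity = 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    total_variation = sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
    return fidelity + lam * total_variation


if __name__ == "__main__":
    y = [0.0, 2.0]
    print(tv_denoising_objective(y, y, lam=1.0))           # keep the data: pays for the jump
    print(tv_denoising_objective([1.0, 1.0], y, lam=1.0))  # flat estimate: pays only in fidelity
```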

no code implementations • 17 Oct 2014 • Reshad Hosseini, Suvrit Sra, Lucas Theis, Matthias Bethge

We study modeling and inference with the Elliptical Gamma Distribution (EGD).

no code implementations • 22 Sep 2014 • Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing

We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework.
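
Parallel and distributed variants build on the basic serial Frank-Wolfe iteration, which is sketched below on the probability simplex. There the linear subproblem simply picks the vertex with the smallest gradient entry. The objective ‖x - y‖², the starting point, and the step rule 2/(k + 2) are standard illustrative choices, not taken from the paper. The function name is hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def frank_wolfe_simplex(grad: Callable[[Vector], Vector], x0: Vector, iters: int) -> Vector:
    """Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}."""
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: <g, s> over the simplex is minimized
        # at the vertex e_j whose gradient entry g[j] is smallest.
        j = min(range(len(x)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2)  # standard open-loop step size
        # The convex combination (1 - gamma) * x + gamma * e_j stays feasible.
        x = [(1.0 - gamma) * xi for xi in x]
        x[j] += gamma
    return x


if __name__ == "__main__":
    y = [0.2, 0.3, 0.5]  # inside the simplex, so y minimizes ||x - y||^2 there
    x = frank_wolfe_simplex(lambda z: [2.0 * (zi - yi) for zi, yi in zip(z, y)],
                            [1.0, 0.0, 0.0], 2000)
    print([round(xi, 3) for xi in x])
```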

no code implementations • 9 Sep 2014 • Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra

We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization.
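
One common way to keep a linear constraint satisfied during coordinate descent is to update two coordinates at a time. The sketch below does this for min Σ_i f_i(x_i) subject to Σ_i x_i = c. Each step moves along e_i - e_j for a random pair (i, j), so every iterate stays feasible. We assume quadratic f_i(x_i) = (a_i/2)(x_i - b_i)² with a_i > 0 so that the line search is exact. This single-constraint special case is our simplification; the paper's methods are more general. At the optimum, a_i(x_i - b_i) takes the same value ν for every i. The function name is hypothetical.

```python
import random
from typing import List


def pairwise_cd(a: List[float], b: List[float], c: float, iters: int, seed: int = 0) -> List[float]:
    """Randomized pairwise coordinate descent for
    min sum_i (a[i]/2) * (x[i] - b[i])**2  subject to  sum_i x[i] == c.
    """
    n = len(a)
    rng = random.Random(seed)
    x = [c / n] * n  # feasible starting point
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)  # two distinct coordinates
        # Exact minimizer of the objective along x + delta * (e_i - e_j)
        delta = (a[j] * (x[j] - b[j]) - a[i] * (x[i] - b[i])) / (a[i] + a[j])
        x[i] += delta
        x[j] -= delta  # the sum of x is unchanged
    return x


if __name__ == "__main__":
    print(pairwise_cd([1.0, 2.0, 4.0], [0.0, 1.0, 2.0], c=6.0, iters=2000))
```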

no code implementations • 1 Feb 2014 • David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

Although nonlinear variants of PCA and CCA have been proposed, they are computationally prohibitive at large scale.

no code implementations • NeurIPS 2013 • Suvrit Sra, Reshad Hosseini

We exploit the remarkable structure of the convex cone of positive definite matrices which allows one to uncover hidden geodesic convexity of objective functions that are nonconvex in the ordinary Euclidean sense.

no code implementations • 29 Nov 2013 • Mikhail Langovoy, Suvrit Sra

Large graphs abound in machine learning, data mining, and several related areas.

no code implementations • NeurIPS 2013 • Stefanie Jegelka, Francis Bach, Suvrit Sra

A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.

no code implementations • NeurIPS 2012 • Suvrit Sra

Symmetric positive definite (SPD) matrices are remarkably pervasive in a multitude of scientific disciplines, including machine learning and optimization.

no code implementations • NeurIPS 2012 • Suvrit Sra

To our knowledge, our framework is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we disregard the ability to handle nonvanishing errors.

no code implementations • 8 Oct 2011 • Suvrit Sra

Positive definite matrices abound in a dazzling variety of applications.

Papers With Code is a free resource with all data licensed under CC-BY-SA.