Search Results for author: Suvrit Sra

Found 98 papers, 16 papers with code

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

no code implementations • ICML 2020 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

Paper
Add Code

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

no code implementations • ICML 2020 • Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Therefore, we introduce the notion of (delta, epsilon)-stationarity, a generalization that allows for a point to be within distance delta of an epsilon-stationary point and reduces to epsilon-stationarity for smooth functions.

Paper
Add Code

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

no code implementations • 15 Feb 2024 • Xiang Cheng, Jingzhao Zhang, Suvrit Sra

We study the task of efficiently sampling from a Gibbs distribution $d \pi^* = e^{-h} d {vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice.

Paper
Add Code

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

no code implementations • 11 Dec 2023 • Xiang Cheng, Yuxin Chen, Suvrit Sra

Many neural network architectures are known to be Turing Complete, and can thus, in principle implement arbitrary algorithms.

In-Context Learning

Paper
Add Code

Linear attention is (maybe) all you need (to understand transformer optimization)

1 code implementation • 2 Oct 2023 • Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics.

Paper
Code

Invex Programs: First Order Algorithms and Their Convergence

no code implementations • 10 Jul 2023 • Adarsh Barik, Suvrit Sra, Jean Honorio

Invex programs are a special kind of non-convex problems which attain global minima at every stationary point.

Paper
Add Code

How to escape sharp minima with random perturbations

no code implementations • 25 May 2023 • Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

Under this notion, we then analyze algorithms that find approximate flat minima efficiently.

Paper
Add Code

On the Training Instability of Shuffling SGD with Batch Normalization

no code implementations • 24 Feb 2023 • David X. Wu, Chulhee Yun, Suvrit Sra

We uncover how SGD interacts with batch normalization and can exhibit undesirable training dynamics such as divergence.

regression

Paper
Add Code

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

no code implementations • 30 Dec 2022 • Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system.

Representation Learning

Paper
Add Code

On a class of geodesically convex optimization problems solved via Euclidean MM methods

no code implementations • 22 Jun 2022 • Melanie Weber, Suvrit Sra

We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions.

BIG-bench Machine Learning Riemannian optimization

Paper
Add Code

Understanding the unstable convergence of gradient descent

no code implementations • 3 Apr 2022 • Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$.

Paper
Add Code

Sign and Basis Invariant Networks for Spectral Graph Representation Learning

2 code implementations • 25 Feb 2022 • Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, Stefanie Jegelka

We introduce SignNet and BasisNet -- new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if $v$ is an eigenvector then so is $-v$; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors.

Ranked #10 on Graph Regression on ZINC-500k

Graph Regression Graph Representation Learning

Paper
Code

Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm

no code implementations • 13 Feb 2022 • Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.

Paper
Add Code

Time varying regression with hidden linear dynamics

no code implementations • 29 Dec 2021 • Ali Jadbabaie, Horia Mania, Devavrat Shah, Suvrit Sra

We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.

regression

Paper
Add Code

Max-Margin Contrastive Learning

1 code implementation • 21 Dec 2021 • Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.

Ranked #108 on Self-Supervised Image Classification on ImageNet

Contrastive Learning Representation Learning +1

Paper
Code

Understanding Riemannian Acceleration via a Proximal Extragradient Framework

no code implementations • 4 Nov 2021 • Jikai Jin, Suvrit Sra

We contribute to advancing the understanding of Riemannian accelerated gradient methods.

Riemannian optimization

Paper
Add Code

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

no code implementations • ICLR 2022 • Chulhee Yun, Shashank Rajput, Suvrit Sra

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods.

Paper
Add Code

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

no code implementations • 12 Oct 2021 • Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks.

Paper
Add Code

Can contrastive learning avoid shortcut solutions?

1 code implementation • NeurIPS 2021 • Joshua Robinson, Li Sun, Ke Yu, Kayhan Batmanghelich, Stefanie Jegelka, Suvrit Sra

However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i. e., by inadvertently suppressing important predictive features.

Contrastive Learning

Paper
Code

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

no code implementations • 12 Mar 2021 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We propose matrix norm inequalities that extend the Recht-R\'e (2012) conjecture on a noncommutative AM-GM inequality by supplementing it with another inequality that accounts for single-shuffle, which is a widely used without-replacement sampling scheme that shuffles only once in the beginning and is overlooked in the Recht-R\'e conjecture.

Paper
Add Code

Provably Efficient Algorithms for Multi-Objective Competitive RL

no code implementations • 5 Feb 2021 • Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games and our theoretical guarantees are near-optimal.

Multi-Objective Reinforcement Learning

Paper
Add Code

Stochastic Optimization with Non-stationary Noise: The Power of Moment Estimation

no code implementations • 1 Jan 2021 • Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

In particular, standard results on optimal convergence rates for stochastic optimization assume either there exists a uniform bound on the moments of the gradient noise, or that the noise decays as the algorithm progresses.

Stochastic Optimization

Paper
Add Code

Why do classifier accuracies show linear trends under distribution shift?

no code implementations • 31 Dec 2020 • Horia Mania, Suvrit Sra

Recent studies of generalization in deep learning have observed a puzzling trend: accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution.

Paper
Add Code

Online Learning in Unknown Markov Games

no code implementations • 28 Oct 2020 • Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable.

Multi-agent Reinforcement Learning

Paper
Add Code

Coping with Label Shift via Distributionally Robust Optimisation

1 code implementation • ICLR 2021 • Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.

Paper
Code

Contrastive Learning with Hard Negative Samples

1 code implementation • ICLR 2021 • Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka

How can you sample good negative examples for contrastive learning?

Contrastive Learning Metric Learning

237

Paper
Code

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

no code implementations • NeurIPS 2020 • Yi Tian, Jian Qian, Suvrit Sra

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

SGD with shuffling: optimal rates without component convexity and large epoch requirements

no code implementations • NeurIPS 2020 • Kwangjun Ahn, Chulhee Yun, Suvrit Sra

We study without-replacement SGD for solving finite-sum optimization problems.

Paper
Add Code

Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity

no code implementations • 8 Jun 2020 • Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

We study oracle complexity of gradient based methods for stochastic approximation problems.

Stochastic Optimization

Paper
Add Code

Understanding Nesterov's Acceleration via Proximal Point Method

no code implementations • 17 May 2020 • Kwangjun Ahn, Suvrit Sra

The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms.

Paper
Add Code

On Tight Convergence Rates of Without-replacement SGD

no code implementations • 18 Apr 2020 • Kwangjun Ahn, Suvrit Sra

For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD.

Paper
Add Code

Strength from Weakness: Fast Learning Using Weak Supervision

no code implementations • ICML 2020 • Joshua Robinson, Stefanie Jegelka, Suvrit Sra

Our theoretical results are reflected empirically across a range of tasks and illustrate how weak labels speed up learning on the strong task.

Weakly-supervised Learning

Paper
Add Code

Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

no code implementations • 10 Feb 2020 • Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra

In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.

Paper
Add Code

From Nesterov's Estimate Sequence to Riemannian Acceleration

no code implementations • 24 Jan 2020 • Kwangjun Ahn, Suvrit Sra

We control this distortion by developing a novel geometric inequality, which permits us to propose and analyze a Riemannian counterpart to Nesterov's accelerated gradient method.

Paper
Add Code

Why are Adaptive Methods Good for Attention Models?

no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the \emph{de facto} algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.

Paper
Add Code

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

no code implementations • 3 Dec 2019 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

Paper
Add Code

Projection-free nonconvex stochastic optimization on Riemannian manifolds

no code implementations • 9 Oct 2019 • Melanie Weber, Suvrit Sra

We present algorithms for both purely stochastic optimization and finite-sum problems.

Stochastic Optimization

Paper
Add Code

Why ADAM Beats SGD for Attention Models

no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

Paper
Add Code

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

no code implementations • 22 Jul 2019 • Tiancheng Yu, Suvrit Sra

A Markov Decision Process (MDP) is a popular model for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Are deep ResNets provably better than linear predictors?

no code implementations • NeurIPS 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor.

Paper
Add Code

Near Optimal Stratified Sampling

no code implementations • 26 Jun 2019 • Tiancheng Yu, Xiyu Zhai, Suvrit Sra

The performance of a machine learning system is usually evaluated by using i. i. d.\ observations with true labels.

Paper
Add Code

Flexible Modeling of Diversity with Strongly Log-Concave Distributions

1 code implementation • NeurIPS 2019 • Joshua Robinson, Suvrit Sra, Stefanie Jegelka

We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, illustrating this via examples of practical importance.

Paper
Code

Why gradient clipping accelerates training: A theoretical justification for adaptivity

1 code implementation • ICLR 2020 • Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks.

General Classification Image Classification +1

Paper
Code

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations • 26 Jan 2019 • Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Paper
Add Code

Deep-RBF Networks Revisited: Robust Classification with Rejection

no code implementations • 7 Dec 2018 • Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

On the other hand, deep-RBF networks assign high confidence only to the regions containing enough feature points, but they have been discounted due to the widely-held belief that they have the vanishing gradient problem.

Adversarial Attack Classification +3

Paper
Add Code

Exponentiated Strongly Rayleigh Distributions

no code implementations • NeurIPS 2018 • Zelda E. Mariet, Suvrit Sra, Stefanie Jegelka

Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set.

Point Processes

Paper
Add Code

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate

no code implementations • 10 Nov 2018 • Jingzhao Zhang, Hongyi Zhang, Suvrit Sra

We study smooth stochastic optimization problems on Riemannian manifolds.

Stochastic Optimization

Paper
Add Code

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

no code implementations • NeurIPS 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity.

Memorization

Paper
Add Code

Efficiently testing local optimality and escaping saddles for ReLU networks

no code implementations • ICLR 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

In the benign case, we solve one equality constrained QP, and we prove that projected gradient descent solves it exponentially fast.

Paper
Add Code

Random Shuffling Beats SGD after Finite Epochs

no code implementations • 26 Jun 2018 • Jeff Z. HaoChen, Suvrit Sra

We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD.

Paper
Add Code

Towards Riemannian Accelerated Gradient Methods

no code implementations • 7 Jun 2018 • Hongyi Zhang, Suvrit Sra

We propose a Riemannian version of Nesterov's Accelerated Gradient algorithm (RAGD), and show that for geodesically smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, RAGD converges to the minimizer with acceleration.

Paper
Add Code

Direct Runge-Kutta Discretization Achieves Acceleration

no code implementations • NeurIPS 2018 • Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.

Paper
Add Code

Non-Linear Temporal Subspace Representations for Activity Recognition

no code implementations • CVPR 2018 • Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which captures their temporal order.

Action Recognition Riemannian optimization +3

Paper
Add Code

Learning Determinantal Point Processes by Corrective Negative Sampling

no code implementations • 15 Feb 2018 • Zelda Mariet, Mike Gartrell, Suvrit Sra

To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model.

Language Modelling Point Processes

Paper
Add Code

Small nonlinearities in activation functions create bad local minima in neural networks

no code implementations • ICLR 2019 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust.

Paper
Add Code

Riemannian Optimization via Frank-Wolfe Methods

1 code implementation • 30 Oct 2017 • Melanie Weber, Suvrit Sra

Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest.

Riemannian optimization

Paper
Code

A Generic Approach for Escaping Saddle points

no code implementations • 5 Sep 2017 • Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

Second-order methods

Paper
Add Code

Global optimality conditions for deep neural networks

no code implementations • ICLR 2018 • Chulhee Yun, Suvrit Sra, Ali Jadbabaie

We study the error landscape of deep linear and nonlinear neural networks with the squared error loss.

Paper
Add Code

Distributional Adversarial Networks

1 code implementation • ICLR 2018 • Chengtao Li, David Alvarez-Melis, Keyulu Xu, Stefanie Jegelka, Suvrit Sra

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination.

Domain Adaptation

Paper
Code

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

1 code implementation • 10 Jun 2017 • Reshad Hosseini, Suvrit Sra

This motivates us to take a closer look at the problem geometry, and derive a better formulation that is much more amenable to Riemannian optimization.

Density Estimation Riemannian optimization

Paper
Code

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

no code implementations • 24 May 2017 • Anoop Cherian, Suvrit Sra, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in an RKHS, projections of data onto which captures their temporal order.

Action Recognition Riemannian optimization +3

Paper
Add Code

Polynomial Time Algorithms for Dual Volume Sampling

no code implementations • NeurIPS 2017 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

We study dual volume sampling, a method for selecting k columns from an n x m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix.

Experimental Design

Paper
Add Code

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization

no code implementations • NeurIPS 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alexander J. Smola

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex.

Paper
Add Code

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling

no code implementations • NeurIPS 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them.

Point Processes

Paper
Add Code

Stochastic Frank-Wolfe Methods for Nonconvex Optimization

no code implementations • 27 Jul 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

Finally, we show that the faster convergence rates of our variance reduced methods also translate into improved convergence rates for the stochastic setting.

Paper
Add Code

Geometric Mean Metric Learning

1 code implementation • 18 Jul 2016 • Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

We revisit the task of learning a Euclidean metric from data.

General Classification Metric Learning +1

Paper
Code

Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes

no code implementations • 13 Jul 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

In this note we consider sampling from (non-homogeneous) strongly Rayleigh probability measures.

Point Processes

Paper
Add Code

Kronecker Determinantal Point Processes

4 code implementations • NeurIPS 2016 • Zelda Mariet, Suvrit Sra

Determinantal Point Processes (DPPs) are probabilistic models over all subsets a ground set of $N$ items.

Point Processes Stochastic Optimization

Paper
Code

Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds

no code implementations • NeurIPS 2016 • Hongyi Zhang, Sashank J. Reddi, Suvrit Sra

We study optimization of finite sums of geodesically smooth functions on Riemannian manifolds.

Riemannian optimization Stochastic Optimization

Paper
Add Code

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

no code implementations • 23 May 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.

Paper
Add Code

Directional Statistics in Machine Learning: a Brief Review

2 code implementations • 1 May 2016 • Suvrit Sra

The modern data analyst must cope with data encoded in various forms, vectors, matrices, strings, graphs, or more.

BIG-bench Machine Learning

325

Paper
Code

Combinatorial Topic Models using Small-Variance Asymptotics

no code implementations • 7 Apr 2016 • Ke Jiang, Suvrit Sra, Brian Kulis

Topic models have emerged as fundamental tools in unsupervised machine learning.

Combinatorial Optimization Topic Models

Paper
Add Code

Fast DPP Sampling for Nyström with Application to Kernel Methods

no code implementations • 19 Mar 2016 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected.

Point Processes regression

Paper
Add Code

Fast Incremental Method for Nonconvex Optimization

no code implementations • 19 Mar 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$.

Paper
Add Code

Stochastic Variance Reduction for Nonconvex Optimization

no code implementations • 19 Mar 2016 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them.

Paper
Add Code

First-order Methods for Geodesically Convex Optimization

no code implementations • 19 Feb 2016 • Hongyi Zhang, Suvrit Sra

Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces.

Paper
Add Code

Gauss quadrature for matrix inverse forms with applications

no code implementations • 7 Dec 2015 • Chengtao Li, Suvrit Sra, Stefanie Jegelka

We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector.

BIG-bench Machine Learning Point Processes

Paper
Add Code

Matrix Manifold Optimization for Gaussian Mixtures

no code implementations • NeurIPS 2015 • Reshad Hosseini, Suvrit Sra

We take a new look at parameter estimation for Gaussian Mixture Model (GMMs).

Density Estimation Riemannian optimization

Paper
Add Code

Diversity Networks: Neural Network Compression Using Determinantal Point Processes

2 code implementations • 16 Nov 2015 • Zelda Mariet, Suvrit Sra

We introduce Divnet, a flexible technique for learning networks with diverse neurons.

Neural Network Compression Point Processes

Paper
Code

Efficient Sampling for k-Determinantal Point Processes

no code implementations • 4 Sep 2015 • Chengtao Li, Stefanie Jegelka, Suvrit Sra

Our method takes advantage of the diversity property of subsets sampled from a DPP, and proceeds in two stages: first it constructs coresets for the ground set of items; thereafter, it efficiently samples subsets based on the constructed coresets.

Point Processes

Paper
Add Code

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

no code implementations • 20 Aug 2015 • Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.

Paper
Add Code

Fixed-point algorithms for learning determinantal point processes

no code implementations • 4 Aug 2015 • Zelda Mariet, Suvrit Sra

Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set.

Point Processes

Paper
Add Code

Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices

no code implementations • 10 Jul 2015 • Anoop Cherian, Suvrit Sra

Inspired by the great success of dictionary learning and sparse coding for vector-valued data, our goal in this paper is to represent data in the form of SPD matrices as sparse conic combinations of SPD atoms from a learned dictionary via a Riemannian geometric approach.

BIG-bench Machine Learning Dictionary Learning +2

Paper
Add Code

Manifold Optimization for Gaussian Mixture Models

no code implementations • 25 Jun 2015 • Reshad Hosseini, Suvrit Sra

We take a new look at parameter estimation for Gaussian Mixture Models (GMMs).

Density Estimation Riemannian optimization

Paper
Add Code

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

no code implementations • NeurIPS 2015 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

We demonstrate the empirical performance of our method through a concrete realization of asynchronous SVRG.

Paper
Add Code

Convex Optimization for Parallel Energy Minimization

no code implementations • 5 Mar 2015 • K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.

Denoising

Paper
Add Code

Efficient Structured Matrix Rank Minimization

no code implementations • NeurIPS 2014 • Adams Wei Yu, Wanli Ma, YaoLiang Yu, Jaime Carbonell, Suvrit Sra

We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.

Paper
Add Code

Modular proximal optimization for multidimensional total-variation regularization

3 code implementations • 3 Nov 2014 • Álvaro Barbero, Suvrit Sra

We study \emph{TV regularization}, a widely used technique for eliciting structured sparsity.

Ranked #1 on Microarray Classification on ArrayCGH

Image Deconvolution Image Denoising +1

210

Paper
Code

Inference and Mixture Modeling with the Elliptical Gamma Distribution

no code implementations • 17 Oct 2014 • Reshad Hosseini, Suvrit Sra, Lucas Theis, Matthias Bethge

We study modeling and inference with the Elliptical Gamma Distribution (EGD).

Density Estimation

Paper
Add Code

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

no code implementations • 22 Sep 2014 • Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing

We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework.

Paper
Add Code

Large-scale randomized-coordinate descent methods with non-separable linear constraints

no code implementations • 9 Sep 2014 • Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra

We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization.

Paper
Add Code

Randomized Nonlinear Component Analysis

no code implementations • 1 Feb 2014 • David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale.

Clustering

Paper
Add Code

Geometric optimisation on positive definite matrices for elliptically contoured distributions

no code implementations • NeurIPS 2013 • Suvrit Sra, Reshad Hosseini

We exploit the remarkable structure of the convex cone of positive definite matrices which allows one to uncover hidden geodesic convexity of objective functions that are nonconvex in the ordinary Euclidean sense.

Riemannian optimization

Paper
Add Code

Statistical estimation for optimization problems on graphs

no code implementations • 29 Nov 2013 • Mikhail Langovoy, Suvrit Sra

Large graphs abound in machine learning, data mining, and several related areas.

Combinatorial Optimization Position

Paper
Add Code

Reflection methods for user-friendly submodular optimization

no code implementations • NeurIPS 2013 • Stefanie Jegelka, Francis Bach, Suvrit Sra

A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.

Image Segmentation Semantic Segmentation

Paper
Add Code

A new metric on the manifold of kernel matrices with application to matrix geometric means

no code implementations • NeurIPS 2012 • Suvrit Sra

Symmetric positive definite (spd) matrices are remarkably pervasive in a multitude of scientific disciplines, including machine learning and optimization.

Paper
Add Code

Scalable nonconvex inexact proximal splitting

no code implementations • NeurIPS 2012 • Suvrit Sra

To our knowledge, our framework is first to develop and analyze incremental \emph{nonconvex} proximal-splitting algorithms, even if we disregard the ability to handle nonvanishing errors.

Paper
Add Code

Positive definite matrices and the S-divergence

no code implementations • 8 Oct 2011 • Suvrit Sra

Positive definite matrices abound in a dazzling variety of applications.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.