Search Results for author: Aryan Mokhtari

Found 63 papers, 8 papers with code

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness

no code implementations 18 Feb 2024 Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai

We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks.

In-Context Learning
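
For intuition, a single softmax attention head with in-context examples $(x_i, y_i)_{i=1}^n$, query $x$, and a learned inverse-temperature (window) parameter $\gamma$ predicts (a standard attention formula written in our notation, not the paper's exact model):

$$\hat{y}(x) = \sum_{i=1}^{n} \frac{\exp(\gamma \langle x, x_i \rangle)}{\sum_{j=1}^{n} \exp(\gamma \langle x, x_j \rangle)} \, y_i .$$

A large $\gamma$ concentrates the weights on the nearest examples (a hard nearest-neighbors rule), while a small $\gamma$ averages over a wider window, which is the sense in which the learned window can adapt to the landscape of the pretraining tasks.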

An Accelerated Gradient Method for Simple Bilevel Optimization with Convex Lower-level Problem

no code implementations 12 Feb 2024 Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem.

Bilevel Optimization

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

no code implementations 5 Jan 2024 Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems.

Second-order methods
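
For context, the classical cubic regularized Newton step (Nesterov-Polyak) solves

$$x_{k+1} = x_k + \arg\min_{s} \; \langle \nabla f(x_k), s \rangle + \tfrac{1}{2} \langle \nabla^2 f(x_k)\, s, s \rangle + \tfrac{M}{6} \|s\|^3 ,$$

where $M$ bounds the Lipschitz constant of the Hessian. The method above restricts this subproblem to a low-dimensional Krylov subspace; reading $m$ in the rate as the subspace dimension is our interpretation of the title and rate, not a verbatim statement of the paper's update.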

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

no code implementations 13 Jul 2023 Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network.

Binary Classification Multi-Task Learning +1

Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

no code implementations 27 Jun 2023 Zhan Gao, Aryan Mokhtari, Alec Koppel

Interestingly, our established non-asymptotic superlinear convergence rate demonstrates an explicit trade-off between the convergence speed and the memory requirement, which, to our knowledge, is the first of its kind.

Online Learning Guided Curvature Approximation: A Quasi-Newton Method with Global Non-Asymptotic Superlinear Convergence

no code implementations 16 Feb 2023 Ruichen Jiang, Qiujiang Jin, Aryan Mokhtari

Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property.

InfoNCE Loss Provably Learns Cluster-Preserving Representations

no code implementations 15 Feb 2023 Advait Parulekar, Liam Collins, Karthikeyan Shanmugam, Aryan Mokhtari, Sanjay Shakkottai

The goal of contrastive learning is to learn a representation that preserves underlying clusters by keeping samples with similar content, e.g., the "dogness" of a dog, close to each other in the space generated by the representation.

Network Adaptive Federated Learning: Congestion and Lossy Compression

no code implementations 11 Jan 2023 Parikshit Hegde, Gustavo de Veciana, Aryan Mokhtari

In order to achieve the dual goals of privacy and learning across distributed data, Federated Learning (FL) systems rely on frequent exchanges of large files (model updates) between a set of clients and the server.

Federated Learning

Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

no code implementations 2 Sep 2022 Mao Ye, Ruichen Jiang, Haoxiang Wang, Dhruv Choudhary, Xiaocong Du, Bhargav Bhushanam, Aryan Mokhtari, Arun Kejariwal, Qiang Liu

One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence a domain generalization error.

Domain Generalization Recommendation Systems

A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem

1 code implementation 17 Jun 2022 Ruichen Jiang, Nazanin Abolfazli, Aryan Mokhtari, Erfan Yazdandoost Hamedani

To the best of our knowledge, our method achieves the best-known iteration complexity for the considered class of bilevel problems.

Bilevel Optimization

Straggler-Resilient Personalized Federated Learning

1 code implementation 5 Jun 2022 Isidoros Tziotis, Zebang Shen, Ramtin Pedarsani, Hamed Hassani, Aryan Mokhtari

Federated Learning is an emerging learning paradigm that allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.

Learning Theory Personalized Federated Learning +1

FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

no code implementations 27 May 2022 Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks by leveraging the diversity among client data distributions via local updates.

Federated Learning Image Classification +1

Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

no code implementations 19 Feb 2022 Ruichen Jiang, Aryan Mokhtari

In this paper, we follow this approach and distill the underlying idea of optimism to propose a generalized optimistic method, which includes the optimistic gradient method as a special case.

Second-order methods

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

no code implementations 11 Feb 2022 Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel Ward

We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives.
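
For reference, the AdaGrad-Norm iteration analyzed in this line of work takes the standard form, with stochastic gradient $g_k = \nabla F(x_k, \xi_k)$, scalar $b_0 > 0$, and base step size $\eta$:

$$b_{k+1}^2 = b_k^2 + \|g_k\|^2, \qquad x_{k+1} = x_k - \frac{\eta}{b_{k+1}} \, g_k ,$$

so the effective step size shrinks automatically as gradient mass accumulates, without knowledge of the smoothness or noise parameters.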

MAML and ANIL Provably Learn Representations

no code implementations 7 Feb 2022 Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai

Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks.

Few-Shot Learning Representation Learning

Minimax Optimization: The Case of Convex-Submodular

no code implementations 1 Nov 2021 Arman Adibi, Aryan Mokhtari, Hamed Hassani

Prior literature has thus far mainly focused on studying such problems in the continuous domain; e.g., convex-concave minimax optimization is now understood to a significant extent.

Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach

no code implementations NeurIPS 2021 Qiujiang Jin, Aryan Mokhtari

In this paper, we use an adaptive sample size scheme that exploits the superlinear convergence of quasi-Newton methods globally and throughout the entire learning process.

Exploiting Shared Representations for Personalized Federated Learning

3 code implementations 14 Feb 2021 Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

Based on this intuition, we propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.

Meta-Learning Multi-Task Learning +2
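
A schematic way to write the objective described above, with a representation $\phi$ shared across clients and a lightweight local head $h_i$ per client (notation is ours, not necessarily the paper's):

$$\min_{\phi, \, h_1, \dots, h_n} \; \frac{1}{n} \sum_{i=1}^{n} f_i\big(h_i \circ \phi\big),$$

where $f_i$ is client $i$'s loss; the shared part $\phi$ is learned collaboratively while each head $h_i$ stays local and personalized.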

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

no code implementations NeurIPS 2021 Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems.

Generalization Bounds Meta-Learning

Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity

no code implementations 28 Dec 2020 Amirhossein Reisizadeh, Isidoros Tziotis, Hamed Hassani, Aryan Mokhtari, Ramtin Pedarsani

Federated Learning is a novel paradigm that involves learning from data samples distributed across a large network of clients while the data remains local.

Federated Learning

Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach

2 code implementations NeurIPS 2020 Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data.

Meta-Learning Personalized Federated Learning
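
Written as a one-step instance of the objective described above (in our notation), the goal is an initialization $w$ that performs well after each user's local gradient step:

$$\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} f_i\big(w - \alpha \nabla f_i(w)\big),$$

where $f_i$ is user $i$'s loss and $\alpha$ is the local adaptation step size, mirroring the MAML objective in the federated setting.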

Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient Tracking

no code implementations NeurIPS 2020 Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari

In this paper we study the problem of escaping from saddle points and achieving second-order optimality in a decentralized setting where a group of agents collaborate to minimize their aggregate objective function.

How Does the Task Landscape Affect MAML Performance?

no code implementations 27 Oct 2020 Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps.

Few-Shot Image Classification Meta-Learning

Submodular Meta-Learning

1 code implementation NeurIPS 2020 Arman Adibi, Aryan Mokhtari, Hamed Hassani

Motivated by this terminology, we propose a novel meta-learning framework in the discrete domain where each task is equivalent to maximizing a set function under a cardinality constraint.

Meta-Learning

Federated Learning with Compression: Unified Analysis and Sharp Guarantees

1 code implementation 2 Jul 2020 Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi

In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions.

Distributed Optimization Federated Learning

Safe Learning under Uncertain Objectives and Constraints

no code implementations 23 Jun 2020 Mohammad Fereydounian, Zebang Shen, Aryan Mokhtari, Amin Karbasi, Hamed Hassani

More precisely, by assuming that Reliable-FW has access to a (stochastic) gradient oracle of the objective function and a noisy feasibility oracle of the safety polytope, it finds an $\epsilon$-approximate first-order stationary point with the optimal ${\mathcal{O}}({1}/{\epsilon^2})$ gradient oracle complexity.

Hybrid Model for Anomaly Detection on Call Detail Records by Time Series Forecasting

no code implementations 7 Jun 2020 Aryan Mokhtari, Leyla Sadighi, Behnam Bahrak, Mojtaba Eshghie

In this paper, a new hybrid method is proposed based on various anomaly detection methods, such as GARCH, K-means, and neural networks, to detect anomalous data.

Anomaly Detection Time Series +1

Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods

no code implementations 30 Mar 2020 Qiujiang Jin, Aryan Mokhtari

In this paper, we provide a finite-time (non-asymptotic) convergence analysis for Broyden quasi-Newton algorithms under the assumptions that the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous at the optimal solution.
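
For context, the BFGS member of the Broyden class analyzed here iterates (standard form, possibly with a line-search step size)

$$x_{k+1} = x_k - B_k^{-1} \nabla f(x_k), \qquad B_{k+1} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k},$$

with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$; the contribution here is a finite-time rate for such updates rather than the classical asymptotic superlinear guarantee.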

Quantized Decentralized Stochastic Learning over Directed Graphs

no code implementations ICML 2020 Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph.

Quantization

Personalized Federated Learning: A Meta-Learning Approach

no code implementations 19 Feb 2020 Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data.

Meta-Learning Personalized Federated Learning

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

1 code implementation NeurIPS 2021 Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP.

Meta-Learning Meta Reinforcement Learning +3

Stochastic Continuous Greedy ++: When Upper and Lower Bounds Match

no code implementations NeurIPS 2019 Amin Karbasi, Hamed Hassani, Aryan Mokhtari, Zebang Shen

Concretely, for a monotone and continuous DR-submodular function, SCG++ achieves a tight $[(1-1/e)\text{OPT} -\epsilon]$ solution while using $O(1/\epsilon^2)$ stochastic gradients and $O(1/\epsilon)$ calls to the linear optimization oracle.

A Decentralized Proximal Point-type Method for Saddle Point Problems

no code implementations 31 Oct 2019 Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network.

Vocal Bursts Type Prediction

One Sample Stochastic Frank-Wolfe

no code implementations 10 Oct 2019 Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its widespread use in many machine learning applications.

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

no code implementations 27 Aug 2019 Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions.

Meta-Learning

Robust and Communication-Efficient Collaborative Learning

1 code implementation NeurIPS 2019 Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

We consider a decentralized learning problem, where a set of computing nodes aims to solve a non-convex optimization problem collaboratively.

Quantization

Stochastic Conditional Gradient++

no code implementations 19 Feb 2019 Hamed Hassani, Amin Karbasi, Aryan Mokhtari, Zebang Shen

It is known that this rate is optimal in terms of stochastic gradient evaluations.

Stochastic Optimization

A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach

no code implementations 24 Jan 2019 Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods.
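
With $z = (x, y)$, the saddle point operator $F(z) = (\nabla_x f(x, y), -\nabla_y f(x, y))$, and step size $\eta$, the two methods take the standard forms

$$\text{EG:}\;\; z_{k+1/2} = z_k - \eta F(z_k), \;\; z_{k+1} = z_k - \eta F(z_{k+1/2}); \qquad \text{OGDA:}\;\; z_{k+1} = z_k - 2\eta F(z_k) + \eta F(z_{k-1}),$$

and the unifying view is that both approximate the proximal point update $z_{k+1} = z_k - \eta F(z_{k+1})$.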

Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

no code implementations 26 Oct 2018 Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takáč

In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which sample size is gradually increasing to quickly obtain a solution whose empirical loss is under satisfactory statistical accuracy.

Escaping Saddle Points in Constrained Optimization

no code implementations NeurIPS 2018 Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie

We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function.

An Exact Quantized Decentralized Gradient Descent Algorithm

no code implementations 29 Jun 2018 Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

We consider the problem of decentralized consensus optimization, where the sum of $n$ smooth and strongly convex functions is minimized over $n$ distributed agents that form a connected network.

Distributed Optimization Quantization
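
The underlying consensus formulation can be written in the standard way (our notation), with each agent $i$ holding a local copy $x_i$ and the network's edge set $\mathcal{E}$ enforcing agreement:

$$\min_{x_1, \dots, x_n} \; \sum_{i=1}^{n} f_i(x_i) \quad \text{subject to} \quad x_i = x_j \;\; \text{for all } (i, j) \in \mathcal{E},$$

which is equivalent to minimizing $\sum_i f_i(x)$ over a common $x$ when the graph is connected; the method above additionally restricts the messages exchanged between neighbors to quantized values.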

Direct Runge-Kutta Discretization Achieves Acceleration

no code implementations NeurIPS 2018 Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.

Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization

no code implementations 24 Apr 2018 Aryan Mokhtari, Hamed Hassani, Amin Karbasi

Further, for a monotone and continuous DR-submodular function and subject to a general convex body constraint, we prove that our proposed method achieves a $((1-1/e)\text{OPT}-\epsilon)$ guarantee with $O(1/\epsilon^3)$ stochastic gradient computations.

Stochastic Optimization
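
A sketch of the stochastic continuous greedy template to which such $(1-1/e)$ guarantees typically refer (the averaging weights $\rho_t$ and step $1/T$ are the usual choices, not necessarily this paper's exact parameters): with unbiased gradient estimates $\tilde{\nabla} F(x_t)$ and convex body $\mathcal{C}$,

$$d_t = (1 - \rho_t)\, d_{t-1} + \rho_t\, \tilde{\nabla} F(x_t), \qquad v_t = \arg\max_{v \in \mathcal{C}} \langle v, d_t \rangle, \qquad x_{t+1} = x_t + \tfrac{1}{T}\, v_t ,$$

where the momentum average $d_t$ tames the variance of the stochastic gradients so that the conditional gradient (Frank-Wolfe type) step remains reliable.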

Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap

no code implementations 5 Nov 2017 Aryan Mokhtari, Hamed Hassani, Amin Karbasi

More precisely, for a monotone and continuous DR-submodular function and subject to a general convex body constraint, we prove that the proposed method achieves a $[(1-1/e)\text{OPT} -\epsilon]$ guarantee (in expectation) with $\mathcal{O}(1/\epsilon^3)$ stochastic gradient computations.

First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

no code implementations NeurIPS 2017 Aryan Mokhtari, Alejandro Ribeiro

Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods.

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

no code implementations 22 May 2017 Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro

In this paper, we propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition.

Second-order methods

IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate

no code implementations 2 Feb 2017 Aryan Mokhtari, Mark Eisen, Alejandro Ribeiro

This makes their computational cost per iteration independent of the number of objective functions $n$.

Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

no code implementations 1 Nov 2016 Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro

We prove not only that the proposed DIAG method converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over GD.

Stochastic Averaging for Constrained Optimization with Application to Online Resource Allocation

no code implementations 7 Oct 2016 Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis

Existing approaches to resource allocation for today's stochastic networks are challenged to meet fast convergence and tolerable delay requirements.

A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning

no code implementations 15 Jun 2016 Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro

Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set.

Image Classification

A Decentralized Quasi-Newton Method for Dual Formulations of Consensus Optimization

no code implementations 23 Mar 2016 Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro

The resulting dual D-BFGS method is a fully decentralized algorithm in which nodes approximate curvature information of themselves and their neighbors through the satisfaction of a secant condition.

Second-order methods

Doubly Random Parallel Stochastic Methods for Large Scale Learning

no code implementations 22 Mar 2016 Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro

Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set.

DSA: Decentralized Double Stochastic Averaging Gradient Algorithm

no code implementations 13 Jun 2015 Aryan Mokhtari, Alejandro Ribeiro

The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as a solution alternative that relies on: (i) The use of local stochastic averaging gradients.

Optimization and Control

Global Convergence of Online Limited Memory BFGS

no code implementations 6 Sep 2014 Aryan Mokhtari, Alejandro Ribeiro

We establish the global convergence of an online (stochastic) limited-memory version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method for solving optimization problems with stochastic objectives that arise in large-scale machine learning.

A Quasi-Newton Method for Large Scale Support Vector Machines

no code implementations 20 Feb 2014 Aryan Mokhtari, Alejandro Ribeiro

This paper adapts a recently developed regularized stochastic version of the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) quasi-Newton method for the solution of support vector machine classification problems.

General Classification

RES: Regularized Stochastic BFGS Algorithm

no code implementations 29 Jan 2014 Aryan Mokhtari, Alejandro Ribeiro

Numerical experiments showcase reductions in convergence time relative to stochastic gradient descent algorithms and non-regularized stochastic versions of BFGS.

Second-order methods
