1 code implementation • 31 Aug 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer products of token pairs.
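Schematically, such a separating program has the flavor of the following hard-margin SVM; the notation below ($\alpha$ for the optimal token index, $\boldsymbol{z}$ for the query-side vector, $\boldsymbol{W}$ for the combined key-query weights) is an illustrative paraphrase, not the paper's exact statement:

```latex
\min_{\boldsymbol{W}} \; \|\boldsymbol{W}\|_F
\quad \text{s.t.} \quad
(\boldsymbol{x}_{\alpha} - \boldsymbol{x}_{t})^\top \boldsymbol{W}\, \boldsymbol{z} \;\ge\; 1
\qquad \text{for all non-optimal tokens } t \neq \alpha .
```

Each constraint is linear in $\boldsymbol{W}$, with coefficients given by the outer products $(\boldsymbol{x}_{\alpha} - \boldsymbol{x}_{t})\,\boldsymbol{z}^\top$, matching the description above.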
1 code implementation • NeurIPS 2023 • Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$.
1 code implementation • 2 Jun 2023 • Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, Samet Oymak
Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods).
no code implementations • 8 Nov 2022 • Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, George Michailidis
A key feature of the proposed algorithm is that it estimates the hypergradient of the penalty function via decentralized computation of matrix-vector products and a few vector communications; this estimate is then integrated into an alternating algorithm, for which we establish finite-time convergence guarantees under different convexity assumptions.
1 code implementation • 6 Jul 2022 • Davoud Ataee Tarzanagh, Parvin Nazari, BoJian Hou, Li Shen, Laura Balzano
This paper introduces \textit{online bilevel optimization} in which a sequence of time-varying bilevel problems is revealed one after the other.
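A single-loop scheme of this flavor can be sketched in a few lines: at each round, take one gradient step on the revealed inner problem, then one approximate hypergradient step on the outer problem. The quadratic losses and step sizes below are illustrative assumptions, not the paper's algorithm:

```python
def online_single_loop_bilevel(targets, eta_y=0.5, eta_x=0.1):
    """One inner step on g_t(x, y) = (y - x)^2, then one outer
    hypergradient step on f_t(x, y) = (y - c_t)^2 per round.
    Illustrative losses and step sizes, not the paper's method."""
    x, y = 0.0, 0.0          # outer and inner variables
    losses = []
    for c in targets:         # a new bilevel problem arrives each round
        y -= eta_y * 2.0 * (y - x)        # inner gradient step
        x -= eta_x * 2.0 * (y - c) * 1.0  # hypergradient: dy*/dx = 1 here
        losses.append((y - c) ** 2)
    return x, losses
```

With a fixed target the outer loss shrinks geometrically, illustrating how the outer variable tracks the time-varying inner solutions.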
3 code implementations • 4 May 2022 • Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet Oymak
Standard federated optimization methods successfully apply to stochastic problems with single-level structure.
no code implementations • 9 Dec 2021 • Davoud Ataee Tarzanagh, Laura Balzano, Alfred O. Hero
In particular, we assume there is some community or clustering structure in the true underlying graph, and we seek to learn a sparse undirected graph and its communities from the data such that demographic groups are fairly represented within the communities.
no code implementations • 13 Nov 2021 • Yahya Sattar, Zhe Du, Davoud Ataee Tarzanagh, Laura Balzano, Necmiye Ozay, Samet Oymak
Combining our sample complexity results with recent perturbation results for certainty-equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system.
no code implementations • 26 May 2021 • Zhe Du, Yahya Sattar, Davoud Ataee Tarzanagh, Laura Balzano, Samet Oymak, Necmiye Ozay
Real-world control applications often involve complex dynamics subject to abrupt changes or variations.
no code implementations • 26 Apr 2021 • Babak Barazandeh, Davoud Ataee Tarzanagh, George Michailidis
Adaptive momentum methods have recently attracted significant attention for training deep neural networks.
no code implementations • 19 May 2020 • Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis
In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems.
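The basic adaptive subgradient template can be sketched on a scalar; the nonsmooth objective $|x - 2|$ and step sizes below are illustrative, and the paper's methods are more general:

```python
import math

def adagrad_subgradient(subgrad, x0, eta=1.0, steps=200, eps=1e-8):
    """AdaGrad-style update x_{t+1} = x_t - eta * g_t / sqrt(sum_s g_s^2).
    Minimal scalar sketch, not the paper's (more general) methods."""
    x, acc = x0, 0.0
    for _ in range(steps):
        g = subgrad(x)       # any subgradient at x
        acc += g * g         # accumulate squared subgradients
        x -= eta * g / (math.sqrt(acc) + eps)
    return x

# minimize the nonsmooth f(x) = |x - 2|
x_star = adagrad_subgradient(lambda x: 1.0 if x > 2 else -1.0, x0=0.0)
```

The accumulated squared subgradients give the $1/\sqrt{t}$ step-size decay that such analyses typically rely on for nonsmooth problems.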
1 code implementation • 30 Jan 2020 • Kyle Gilman, Davoud Ataee Tarzanagh, Laura Balzano
We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework.
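The t-SVD factorization underlying this framework is standard: a third-order tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ factors under the t-product $*$ as

```latex
\mathcal{A} \;=\; \mathcal{U} * \mathcal{S} * \mathcal{V}^\top ,
\qquad
\hat{\mathcal{A}}^{(k)} \;=\; \hat{\mathcal{U}}^{(k)} \, \hat{\mathcal{S}}^{(k)} \, \bigl(\hat{\mathcal{V}}^{(k)}\bigr)^{\top},
\quad k = 1, \dots, n_3,
```

where $\hat{\,\cdot\,}^{(k)}$ denotes the $k$-th frontal slice after a DFT along the third mode, so the factorization reduces to ordinary matrix SVDs in the Fourier domain; the tubal rank counts the nonzero singular tubes of $\mathcal{S}$.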
1 code implementation • 24 Nov 2019 • Davoud Ataee Tarzanagh, George Michailidis
We introduce a general tensor model suitable for data analytic tasks for {\em heterogeneous} datasets, wherein there are joint low-rank structures within groups of observations, but also discriminative structures across different groups.
no code implementations • 17 May 2019 • Davoud Ataee Tarzanagh, Mohamad Kazem Shirani Faradonbeh, George Michailidis
Principal components analysis (PCA) is a widely used dimension reduction technique with an extensive range of applications.
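The core PCA computation can be sketched via power iteration on a covariance matrix; the helper below is a hypothetical illustration, not the paper's estimator:

```python
def power_iteration(A, iters=100):
    """Leading eigenvector of a symmetric matrix A (list of lists),
    i.e. the first principal component when A is a covariance matrix.
    Illustrative helper, not the paper's method."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        # multiply A @ v, then renormalize
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# covariance with dominant variance along the first coordinate
v1 = power_iteration([[4.0, 0.0], [0.0, 1.0]])
```

Repeated multiplication amplifies the dominant eigendirection, which here aligns with the first coordinate axis.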
1 code implementation • ICLR 2019 • Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis
Adaptive gradient-based optimization methods such as \textsc{Adagrad}, \textsc{Rmsprop}, and \textsc{Adam} are widely used in solving large-scale machine learning problems including deep learning.
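For reference, the textbook Adam update (with bias-corrected moment estimates) can be written in a few lines; this is a generic scalar sketch, not tied to the paper's analysis:

```python
import math

def adam_step(x, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter (textbook form)."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# minimize f(x) = x^2 starting from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
```

Dividing the momentum term by the running second-moment estimate is what makes the step sizes per-coordinate adaptive, the shared trait of the methods named above.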