Search Results for author: Khashayar Gatmiry

Found 21 papers, 0 papers with code

On the Role of Depth and Looping for In-Context Learning with Task Diversity

no code implementations29 Oct 2024 Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar

By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers' abilities to simulate learning algorithms like gradient descent.

Diversity, In-Context Learning, +1
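
To make that premise concrete, here is a minimal, hypothetical sketch (not from the paper) of in-context linear regression solved by a single gradient-descent step over the context examples, the kind of learning algorithm a Transformer is argued to simulate; the dimensions, step size, and Gaussian data model are illustrative choices.

```python
import numpy as np

# Hypothetical sketch of in-context linear regression via one gradient-descent
# step, the kind of learning algorithm Transformers are argued to simulate.
rng = np.random.default_rng(0)
d, n_context = 8, 32
w_star = rng.normal(size=d)

X = rng.normal(size=(n_context, d))          # in-context inputs (Gaussian)
y = X @ w_star                               # noiseless in-context labels
x_query = rng.normal(size=d)                 # query point

# One gradient-descent step from w = 0 on the squared loss
# (1/2n) * ||Xw - y||^2; the step size eta is an illustrative choice.
eta = 1.0 / np.linalg.eigvalsh(X.T @ X / n_context).max()
w_one_step = eta * (X.T @ y) / n_context

print("one-step prediction:", x_query @ w_one_step)
print("target prediction:  ", x_query @ w_star)
```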

Computing Optimal Regularizers for Online Linear Optimization

no code implementations22 Oct 2024 Khashayar Gatmiry, Jon Schneider, Stefanie Jegelka

Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO) that guarantee sub-linear regret, but the choice of regularizer can significantly impact dimension-dependent factors in the regret bound.
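
As a point of reference for the setting, below is a toy FTRL sketch on the probability simplex with an entropic regularizer (an arbitrary choice made for illustration; the paper is about computing the optimal regularizer, which this sketch does not attempt).

```python
import numpy as np

def ftrl_simplex(loss_vectors, reg_strength=1.0):
    """Follow-the-Regularized-Leader on the probability simplex with an
    entropic regularizer; a toy sketch, not the paper's method."""
    d = loss_vectors.shape[1]
    cumulative = np.zeros(d)
    total_loss = 0.0
    for loss in loss_vectors:
        # argmin_x <cumulative, x> + reg_strength * sum_i x_i log x_i over the
        # simplex has the closed form of a softmax of the cumulative losses.
        logits = -cumulative / reg_strength
        x = np.exp(logits - logits.max())
        x /= x.sum()
        total_loss += float(loss @ x)
        cumulative += loss
    return total_loss

rng = np.random.default_rng(1)
losses = rng.uniform(size=(500, 5))
print("FTRL cumulative loss:", ftrl_simplex(losses, reg_strength=np.sqrt(len(losses))))
print("best fixed action:   ", losses.sum(axis=0).min())
```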

Simplicity Bias via Global Convergence of Sharpness Minimization

no code implementations21 Oct 2024 Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka

To obtain this result, our main technical contribution is to show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks.
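
For readers unfamiliar with the training procedure being analyzed, here is a hypothetical sketch of label-noise SGD on a small two-layer network: plain SGD where each sampled label is perturbed by Gaussian noise of scale `noise_std`. It only illustrates the dynamics whose implicit bias the paper studies.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, width = 64, 5, 32
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                          # toy regression targets

W1 = rng.normal(size=(d, width)) / np.sqrt(d)
w2 = rng.normal(size=width) / np.sqrt(width)

def forward(X, W1, w2):
    h = np.tanh(X @ W1)
    return h @ w2, h

lr, noise_std = 0.05, 0.1
for step in range(2000):
    i = rng.integers(n)
    xi, yi = X[i], y[i] + noise_std * rng.normal()   # label noise
    pred, h = forward(xi[None, :], W1, w2)
    err = pred[0] - yi
    # Backpropagation of the squared loss on this single noisy example.
    grad_w2 = err * h[0]
    grad_W1 = err * np.outer(xi, w2 * (1 - h[0] ** 2))
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

pred, _ = forward(X, W1, w2)
print("train MSE after label-noise SGD:", np.mean((pred - y) ** 2))
```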

What does guidance do? A fine-grained analysis in a simple setting

no code implementations19 Sep 2024 Muthu Chidambaram, Khashayar Gatmiry, Sitan Chen, Holden Lee, Jianfeng Lu

The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power.
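
A small self-contained check of that premise in a 1-D Gaussian toy model (hypothetical parameters, not from the paper): when both the unconditional and conditional densities are Gaussian, the guidance-modified score $s_{\text{uncond}} + w\,(s_{\text{cond}} - s_{\text{uncond}})$ coincides with the score of the density tilted by the conditional likelihood raised to the power $w$.

```python
import numpy as np

# Illustrative 1-D check of the guidance premise at noise level zero, where
# p(x) = N(0, 1) and p(x | y) = N(mu, s2) are both Gaussian (hypothetical
# choices, not taken from the paper).
mu, s2, w = 2.0, 0.5, 3.0
x = np.linspace(-4, 6, 7)

score_uncond = -x                      # d/dx log N(x; 0, 1)
score_cond = -(x - mu) / s2            # d/dx log N(x; mu, s2)
score_guided = score_uncond + w * (score_cond - score_uncond)

# Score of the tilted density p(x) * (p(x|y)/p(x))^w, which is again Gaussian.
prec_tilted = (1 - w) * 1.0 + w / s2   # tilted precision
mean_tilted = (w * mu / s2) / prec_tilted
score_tilted = -prec_tilted * (x - mean_tilted)

print(np.allclose(score_guided, score_tilted))   # True: the premise holds here
```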

Adversarial Online Learning with Temporal Feedback Graphs

no code implementations30 Jun 2024 Khashayar Gatmiry, Jon Schneider

We study a variant of prediction with expert advice where the learner's action at round $t$ may depend only on the losses from a specific subset of the rounds (which rounds' losses are visible at time $t$ is specified by a directed "feedback graph" known to the learner).
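
A toy sketch of this protocol, assuming a particularly simple feedback graph in which the losses of round $s$ become visible at round $t$ only when $s \le t - \text{delay}$; the multiplicative-weights learner below updates only on the visible rounds (illustrative, not the paper's algorithm).

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_experts, delay = 200, 4, 5
losses = rng.uniform(size=(T, n_experts))
eta = np.sqrt(np.log(n_experts) / T)

total_loss = 0.0
for t in range(T):
    # Hypothetical temporal feedback graph: at round t, only the losses of
    # rounds s <= t - delay are visible to the learner.
    visible = losses[: max(0, t - delay + 1)]
    weights = np.exp(-eta * visible.sum(axis=0))   # multiplicative weights
    p = weights / weights.sum()
    total_loss += float(losses[t] @ p)

print("learner loss:", total_loss, "  best expert:", losses.sum(axis=0).min())
```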

Learning Mixtures of Gaussians Using Diffusion Models

no code implementations29 Apr 2024 Khashayar Gatmiry, Jonathan Kelner, Holden Lee

We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption.

Image Generation
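
As background for why Gaussian mixtures pair naturally with diffusion/score-based methods, the score of an identity-covariance mixture has the closed form below: a responsibility-weighted average of the component scores. This is a standard fact shown for illustration, not the paper's learning algorithm.

```python
import numpy as np

def gmm_score(x, means, weights):
    """Score (gradient of the log-density) of a mixture of identity-covariance
    Gaussians at the point x; a closed-form background fact."""
    # log of w_i * N(x; mu_i, I), up to a constant shared across components
    log_comp = -0.5 * np.sum((x - means) ** 2, axis=1) + np.log(weights)
    log_comp -= log_comp.max()                       # numerical stability
    post = np.exp(log_comp)
    post /= post.sum()                               # responsibilities p(i | x)
    return (post[:, None] * (means - x)).sum(axis=0)

rng = np.random.default_rng(4)
k, n = 3, 6
means = rng.normal(scale=4.0, size=(k, n))
weights = np.ones(k) / k
x = rng.normal(size=n)
print("score at x:", gmm_score(x, means, weights))
```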

EM for Mixture of Linear Regression with Clustered Data

no code implementations22 Aug 2023 Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar

In many settings, however, heterogeneous data may be generated in clusters with shared structure, as in applications such as federated learning, where a common latent variable governs the distribution of all the samples generated by a client.

Federated Learning, regression

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

no code implementations24 Jun 2023 Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

However, the implicit regularization of different algorithms is confined to either a specific geometry or a particular class of learning problems, indicating the lack of a general approach for controlling implicit regularization.

Classification, regression
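
A minimal illustration of how the choice of mirror map controls which solution an algorithm converges to, using quadratic potentials $\psi(w) = \frac{1}{2} w^\top D w$ on an under-determined least-squares problem (a hypothetical instance; the paper treats general potentials and problem classes). Each run interpolates the data but returns the minimum-norm solution in its own geometry.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 5, 20                                # under-determined linear regression
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def mirror_descent_quadratic(A, b, D, eta=0.01, steps=50_000):
    """Mirror descent with the quadratic potential psi(w) = 0.5 * w^T D w,
    started at w = 0; a hypothetical instance used only for illustration."""
    w = np.zeros(A.shape[1])
    D_inv = 1.0 / D
    for _ in range(steps):
        grad = A.T @ (A @ w - b)
        w = w - eta * D_inv * grad          # mirror step = preconditioned GD
    return w

D1 = np.ones(d)                             # Euclidean potential
D2 = np.linspace(1.0, 50.0, d)              # a different geometry
w1 = mirror_descent_quadratic(A, b, D1)
w2 = mirror_descent_quadratic(A, b, D2)

# Both interpolate the data, but they converge to different solutions: each is
# the minimum-norm interpolator in the geometry induced by its potential.
print(np.linalg.norm(A @ w1 - b), np.linalg.norm(A @ w2 - b))
print(np.linalg.norm(w1 - w2))
```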

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

no code implementations22 Jun 2023 Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.

Inductive Bias
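
The trace of the Hessian (the sharpness measure referenced above) can be estimated with Hutchinson's estimator using only Hessian-vector products; below is a small sketch on an explicit PSD matrix standing in for a loss Hessian (illustrative, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(6)
d = 50
M = rng.normal(size=(d, d))
H = M @ M.T / d                               # a fixed PSD "Hessian"

def hessian_vector_product(v):
    # Stand-in for an autodiff Hessian-vector product of a training loss.
    return H @ v

def hutchinson_trace(hvp, dim, n_samples=2000, rng=rng):
    """Hutchinson's estimator: E[v^T H v] = tr(H) for Rademacher vectors v."""
    est = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        est += v @ hvp(v)
    return est / n_samples

print("Hutchinson estimate:", hutchinson_trace(hessian_vector_product, d))
print("exact trace:        ", np.trace(H))
```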

When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?

no code implementations10 Apr 2023 Yuansi Chen, Khashayar Gatmiry

We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry.
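
For concreteness, a compact sketch of Metropolized HMC with the leapfrog integrator, run on a standard Gaussian target with arbitrary step size and path length; the paper's analysis concerns general smooth, isoperimetric log-densities, which this toy example does not capture.

```python
import numpy as np

rng = np.random.default_rng(7)
d, step, n_leapfrog, n_iters = 10, 0.2, 10, 2000

def log_density(x):               # toy target: standard Gaussian
    return -0.5 * x @ x

def grad_log_density(x):
    return -x

x = rng.normal(size=d)
accepts = 0
for _ in range(n_iters):
    p = rng.normal(size=d)                       # resample the momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * step * grad_log_density(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step * p_new
        p_new += step * grad_log_density(x_new)
    x_new += step * p_new
    p_new += 0.5 * step * grad_log_density(x_new)
    # Metropolis accept/reject corrects the discretization error.
    log_alpha = (log_density(x_new) - 0.5 * p_new @ p_new) \
              - (log_density(x) - 0.5 * p @ p)
    if np.log(rng.uniform()) < log_alpha:
        x, accepts = x_new, accepts + 1

print("acceptance rate:", accepts / n_iters)
```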

A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry

no code implementations8 Apr 2023 Yuansi Chen, Khashayar Gatmiry

We study the mixing time of Metropolis-Adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$.
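
For comparison with the HMC sketch above, a minimal MALA sketch on the same kind of toy target: the proposal is one step of discretized Langevin dynamics, corrected by a Metropolis accept/reject step (parameters are illustrative).

```python
import numpy as np

rng = np.random.default_rng(8)
d, step, n_iters = 10, 0.1, 5000

def log_density(x):                       # toy target: standard Gaussian
    return -0.5 * x @ x

def grad_log_density(x):
    return -x

def log_q(x_to, x_from):                  # log-density of the Langevin proposal
    diff = x_to - x_from - step * grad_log_density(x_from)
    return -(diff @ diff) / (4 * step)

x = rng.normal(size=d)
accepts = 0
for _ in range(n_iters):
    prop = x + step * grad_log_density(x) + np.sqrt(2 * step) * rng.normal(size=d)
    log_alpha = (log_density(prop) + log_q(x, prop)) \
              - (log_density(x) + log_q(prop, x))
    if np.log(rng.uniform()) < log_alpha:
        x, accepts = prop, accepts + 1

print("MALA acceptance rate:", accepts / n_iters)
```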

Sampling with Barriers: Faster Mixing via Lewis Weights

no code implementations1 Mar 2023 Khashayar Gatmiry, Jonathan Kelner, Santosh S. Vempala

We introduce a hybrid of the Lewis weights barrier and the standard logarithmic barrier and prove that the mixing rate for the corresponding RHMC is bounded by $\tilde O(m^{1/3}n^{4/3})$, improving on the previous best bound of $\tilde O(mn^{2/3})$ (based on the log barrier).

Optimal algorithms for group distributionally robust optimization and beyond

no code implementations28 Dec 2022 Tasuku Soma, Khashayar Gatmiry, Stefanie Jegelka

Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods.

Fairness

Bandit Algorithms for Prophet Inequality and Pandora's Box

no code implementations16 Nov 2022 Khashayar Gatmiry, Thomas Kesselheim, Sahil Singla, Yifan Wang

The goal is to minimize the regret, which is the difference, over $T$ rounds, between the total value of the optimal algorithm that knows the distributions and the total value of our algorithm, which learns the distributions from the partial feedback.

Multi-Armed Bandits, Stochastic Optimization

Quasi-Newton Steps for Efficient Online Exp-Concave Optimization

no code implementations2 Nov 2022 Zakaria Mhammedi, Khashayar Gatmiry

Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee an $O(d\ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set.

Open-Ended Question Answering
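
A bare-bones ONS sketch on a stream of logistic losses (a hypothetical exp-concave family chosen for illustration). The real algorithm projects each iterate back onto the feasible set with an $A$-weighted projection; this unconstrained sketch omits that step.

```python
import numpy as np

rng = np.random.default_rng(9)
T, d, gamma, eps = 1000, 5, 0.5, 1.0
theta_true = rng.normal(size=d)
A = eps * np.eye(d)                       # regularized second-order statistics
x = np.zeros(d)
cumulative_loss = 0.0

for t in range(T):
    z = rng.normal(size=d)
    y = np.sign(z @ theta_true)
    # Logistic loss and gradient at the current iterate (an illustrative
    # exp-concave loss family, not tied to the paper).
    margin = y * (z @ x)
    cumulative_loss += np.log1p(np.exp(-margin))
    g = -y * z / (1.0 + np.exp(margin))
    # Online Newton Step: rank-one update of A, then a Newton-like step.
    A += np.outer(g, g)
    x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    # NOTE: the full ONS would now project x back onto the feasible set with
    # the A-weighted projection; this unconstrained sketch skips it.

print("average loss of the ONS iterates:", cumulative_loss / T)
```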

On the generalization of learning algorithms that do not converge

no code implementations16 Aug 2022 Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points.

Learning Theory

Convergence of the Riemannian Langevin Algorithm

no code implementations22 Apr 2022 Khashayar Gatmiry, Santosh S. Vempala

We study the Riemannian Langevin Algorithm for the problem of sampling from a distribution with density $\nu$ with respect to the natural measure on a manifold with metric $g$.
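
A heavily simplified sketch: Langevin dynamics preconditioned by a constant metric $g$. With a position-dependent metric, the actual Riemannian Langevin Algorithm also involves correction terms coming from the derivatives of $g$, which are omitted here, so this only conveys the shape of the update.

```python
import numpy as np

rng = np.random.default_rng(10)
d, step, n_iters = 5, 0.01, 20000

# Constant metric g (a hypothetical simplification); a position-dependent
# metric would require additional derivative-of-metric correction terms.
M = rng.normal(size=(d, d))
g = M @ M.T + np.eye(d)
g_inv = np.linalg.inv(g)
g_inv_sqrt = np.linalg.cholesky(g_inv)

def grad_log_density(x):                  # toy target: standard Gaussian
    return -x

x = np.zeros(d)
samples = []
for _ in range(n_iters):
    noise = g_inv_sqrt @ rng.normal(size=d)      # noise with covariance g^{-1}
    x = x + step * g_inv @ grad_log_density(x) + np.sqrt(2 * step) * noise
    samples.append(x.copy())

print("empirical-mean norm (target mean is 0):",
      np.linalg.norm(np.mean(samples, axis=0)))
```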

Optimization and Adaptive Generalization of Three layer Neural Networks

no code implementations ICLR 2022 Khashayar Gatmiry, Stefanie Jegelka, Jonathan Kelner

While there has been substantial recent work studying the generalization of neural networks, the ability of deep nets to automate feature extraction still evades a thorough mathematical understanding.

Generalization Bounds

Non-submodular Function Maximization subject to a Matroid Constraint, with Applications

no code implementations19 Nov 2018 Khashayar Gatmiry, Manuel Gomez-Rodriguez

Then, we show that the same greedy algorithm offers a constant approximation factor of $(1 + 1/(1-\alpha))^{-1}$, where $\alpha$ is the generalized curvature of the function.

Point Processes
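
A minimal greedy sketch under a cardinality constraint (the simplest matroid), using a coverage-style objective that happens to be submodular; for non-submodular objectives, the same greedy procedure is what the paper analyzes through the generalized curvature $\alpha$, which this sketch does not compute.

```python
import numpy as np

rng = np.random.default_rng(11)
n_items, budget = 30, 5
# A toy monotone objective: each item covers 10 features with random scores,
# and a set's value is the sum of the best score per feature.
scores = rng.uniform(size=(n_items, 10))

def value(S):
    if not S:
        return 0.0
    return float(np.max(scores[list(S)], axis=0).sum())

selected = set()
for _ in range(budget):
    # Greedy: add the feasible element with the largest marginal gain
    # (feasibility here is just the cardinality budget).
    gains = {i: value(selected | {i}) - value(selected)
             for i in range(n_items) if i not in selected}
    best = max(gains, key=gains.get)
    selected.add(best)

print("greedy set:", sorted(selected), " value:", round(value(selected), 3))
```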
