no code implementations • 22 Aug 2023 • Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar
In many settings, however, heterogeneous data may be generated in clusters with shared structure, as in applications such as federated learning, where a common latent variable governs the distribution of all the samples generated by a client.
no code implementations • 24 Jun 2023 • Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan
However, existing characterizations of implicit regularization are confined to either a specific geometry or a particular class of learning problems, leaving a gap toward a general approach for controlling implicit regularization.
no code implementations • 22 Jun 2023 • Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.
no code implementations • 10 Apr 2023 • Yuansi Chen, Khashayar Gatmiry
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry.
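The sampler analyzed above can be sketched in a few lines: simulate Hamiltonian dynamics with the leapfrog integrator, then apply a Metropolis filter to correct the discretization error. This is a minimal illustration, not the paper's analysis; the standard-Gaussian target and the step-size settings are assumptions for the demo.

```python
import numpy as np

def leapfrog(x, p, grad_log_pi, step, n_steps):
    # Standard leapfrog: half momentum step, alternating full steps,
    # closing half momentum step.
    p = p + 0.5 * step * grad_log_pi(x)
    for _ in range(n_steps - 1):
        x = x + step * p
        p = p + step * grad_log_pi(x)
    x = x + step * p
    p = p + 0.5 * step * grad_log_pi(x)
    return x, p

def mhmc_step(x, log_pi, grad_log_pi, step, n_steps, rng):
    # Resample momentum, integrate, then Metropolis-accept using the
    # change in the Hamiltonian H(x, p) = -log_pi(x) + |p|^2 / 2.
    p = rng.standard_normal(x.shape)
    x_new, p_new = leapfrog(x, p, grad_log_pi, step, n_steps)
    log_accept = (log_pi(x_new) - 0.5 * p_new @ p_new) - (log_pi(x) - 0.5 * p @ p)
    if np.log(rng.uniform()) < log_accept:
        return x_new
    return x
```

With an exact integrator the acceptance probability would be one; the Metropolis filter is what makes the leapfrog-discretized chain exactly invariant for the target.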
no code implementations • 8 Apr 2023 • Yuansi Chen, Khashayar Gatmiry
We study the mixing time of Metropolis-Adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$.
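One MALA transition, as a minimal sketch: propose a Langevin step (gradient drift plus Gaussian noise), then Metropolis-correct with the asymmetric proposal densities. The target and step size in the demo are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, step, rng):
    # Langevin proposal: drift along the gradient plus Gaussian noise.
    mean_x = x + step * grad_log_pi(x)
    y = mean_x + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    mean_y = y + step * grad_log_pi(y)
    # Metropolis correction; the proposal is asymmetric, so both
    # forward and reverse Gaussian densities enter the ratio.
    log_q_xy = -np.sum((y - mean_x) ** 2) / (4 * step)
    log_q_yx = -np.sum((x - mean_y) ** 2) / (4 * step)
    log_accept = log_pi(y) + log_q_yx - log_pi(x) - log_q_xy
    if np.log(rng.uniform()) < log_accept:
        return y
    return x
```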
no code implementations • 1 Mar 2023 • Khashayar Gatmiry, Jonathan Kelner, Santosh S. Vempala
We introduce a hybrid of the Lewis weights barrier and the standard logarithmic barrier and prove that the mixing rate for the corresponding RHMC is bounded by $\tilde O(m^{1/3}n^{4/3})$, improving on the previous best bound of $\tilde O(mn^{2/3})$ (based on the log barrier).
no code implementations • 28 Dec 2022 • Tasuku Soma, Khashayar Gatmiry, Stefanie Jegelka
Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods.
no code implementations • 16 Nov 2022 • Khashayar Gatmiry, Thomas Kesselheim, Sahil Singla, Yifan Wang
The goal is to minimize the regret: the difference, over $T$ rounds, between the total value of the optimal algorithm that knows the distributions and the total value of our algorithm, which must learn the distributions from partial feedback.
no code implementations • 2 Nov 2022 • Zakaria Mhammedi, Khashayar Gatmiry
Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee a $O(d\ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set.
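The ONS update can be sketched as follows: accumulate outer products of gradients into a matrix $A$ and take a Newton-style step preconditioned by $A^{-1}$. This is a minimal sketch that omits the projection onto the feasible set used by the full algorithm; the constants `gamma` and `eps` are illustrative assumptions.

```python
import numpy as np

class OnlineNewtonStep:
    def __init__(self, d, gamma=0.5, eps=1.0):
        self.w = np.zeros(d)
        self.A = eps * np.eye(d)   # running second-moment matrix
        self.gamma = gamma

    def update(self, grad):
        # Accumulate the gradient outer product, then take a step
        # preconditioned by A^{-1} (solved, not explicitly inverted).
        self.A += np.outer(grad, grad)
        self.w -= (1.0 / self.gamma) * np.linalg.solve(self.A, grad)
        return self.w
```

The $O(d \ln T)$ regret bound for exp-concave losses comes from this adaptive preconditioning: directions that have seen large gradients get progressively smaller steps.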
no code implementations • 16 Aug 2022 • Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka
To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points.
no code implementations • 22 Apr 2022 • Khashayar Gatmiry, Santosh S. Vempala
We study the Riemannian Langevin Algorithm for the problem of sampling from a distribution with density $\nu$ with respect to the natural measure on a manifold with metric $g$.
no code implementations • ICLR 2022 • Khashayar Gatmiry, Stefanie Jegelka, Jonathan Kelner
While there has been substantial recent work studying generalization of neural networks, the ability of deep nets to automate feature extraction still evades a thorough mathematical understanding.
no code implementations • NeurIPS 2020 • Khashayar Gatmiry, Maryam Aliakbarpour, Stefanie Jegelka
Determinantal point processes (DPPs) are popular probabilistic models of diversity.
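The diversity property can be seen in a toy computation: a DPP assigns each subset $S$ a probability proportional to $\det(L_S)$, the determinant of the kernel restricted to $S$, which shrinks when the selected items are similar. The feature vectors and kernel below are hypothetical.

```python
import numpy as np

# Three items; the first two are nearly identical, the third is distinct.
feats = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [0.0, 1.0]])
L = feats @ feats.T   # similarity kernel (Gram matrix)

def dpp_weight(S):
    # Unnormalized DPP probability of subset S: det of the principal
    # submatrix of L indexed by S.
    return np.linalg.det(L[np.ix_(S, S)])
```

Here `dpp_weight([0, 1])` (two similar items) is far smaller than `dpp_weight([0, 2])` (two diverse items), which is exactly why DPPs favor diverse subsets.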
no code implementations • 19 Nov 2018 • Khashayar Gatmiry, Manuel Gomez-Rodriguez
Then, we show that the same greedy algorithm offers a constant approximation factor of $(1 + 1/(1-\alpha))^{-1}$, where $\alpha$ is the generalized curvature of the function.
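The greedy algorithm the guarantee refers to is the usual marginal-gain heuristic: under a cardinality constraint, repeatedly add the element with the largest marginal gain. A minimal sketch on a toy coverage objective (the sets and constraint are illustrative, not from the paper):

```python
def greedy(ground_set, f, k):
    # Pick k elements, each time adding the one with the largest
    # marginal gain f(S + [e]) - f(S).
    S = []
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S + [e]) - f(S))
        S.append(best)
    return S
```

The generalized curvature $\alpha$ measures how far $f$ is from modular; the approximation factor $(1 + 1/(1-\alpha))^{-1}$ degrades gracefully as $\alpha \to 1$.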