no code implementations • 4 Mar 2024 • Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yildirim, Lingjiong Zhu
Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.
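For intuition, here is a minimal NumPy sketch of the idea (an illustration, not the paper's specific scheme): plain SGD iterates perturbed by symmetric alpha-stable noise generated with the Chambers-Mallows-Stuck method. The gradient oracle `grad`, stepsize `eta`, stability index `alpha`, and noise scale are placeholders chosen for the example.

```python
import numpy as np

def alpha_stable(alpha, shape, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, shape)
    w = rng.exponential(1.0, shape)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def sgd_heavy_tailed(grad, x0, eta=1e-2, alpha=1.8, scale=1e-3, n_iters=1000, seed=0):
    """Plain SGD with heavy-tailed (alpha-stable) noise injected into each iterate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * grad(x) + scale * alpha_stable(alpha, x.shape, rng)
    return x

# Example: quadratic objective f(x) = 0.5 * ||x||^2, so grad(x) = x.
x_final = sgd_heavy_tailed(lambda x: x, x0=np.ones(5))
```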
no code implementations • 10 Feb 2023 • Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu
Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior.
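As a point of reference (generic illustrations, not the paper's exact constructions), cyclic and randomized stepsize schedules can be sketched as follows; `eta_min`, `eta_max`, and `cycle_len` are hypothetical parameters.

```python
import numpy as np

def cyclic_stepsize(t, eta_min=1e-3, eta_max=1e-1, cycle_len=100):
    """Triangular cyclic schedule: sweeps from eta_min up to eta_max and back."""
    phase = (t % cycle_len) / cycle_len          # position within the cycle, in [0, 1)
    return eta_min + (eta_max - eta_min) * (1.0 - abs(2.0 * phase - 1.0))

def randomized_stepsize(rng, eta_min=1e-3, eta_max=1e-1):
    """Stepsize drawn uniformly at random from [eta_min, eta_max] at every iteration."""
    return rng.uniform(eta_min, eta_max)
```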
no code implementations • 27 Jan 2023 • Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli
Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, a relationship that is more consistent with the reported empirical observations.
no code implementations • 29 Nov 2022 • Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu
When $f$ is smooth and gradients are available, we get $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error where the error is measured in the TV distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors.
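For context, guarantees of this kind build on Langevin-type updates; below is a minimal sketch of the vanilla unadjusted Langevin algorithm (ULA), not the paper's PLD variant, with `grad_f`, `eta`, and the iteration count as assumed inputs.

```python
import numpy as np

def ula(grad_f, x0, eta=1e-3, n_iters=10_000, seed=0):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k - eta * grad f(x_k) + sqrt(2 * eta) * N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta) * rng.standard_normal(x.shape)
    return x

# Example: f(x) = 0.5 * ||x||^2, so the target exp(-f) is a standard Gaussian.
sample = ula(lambda x: x, x0=np.zeros(5))
```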
no code implementations • 2 Jun 2022 • Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli
Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails is linked to the generalization error.
no code implementations • NeurIPS 2021 • Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu
As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
1 code implementation • 7 Jun 2021 • Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi
This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees.
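The preconditioning idea behind natural gradient methods can be sketched in a few lines (this is the generic damped NGD step, not TENGraD's time-efficient update); `grad`, `fisher`, and `damping` are placeholders for the example.

```python
import numpy as np

def natural_gradient_step(grad, fisher, theta, eta=1e-1, damping=1e-3):
    """One natural gradient descent step: precondition the gradient with the
    (damped) inverse Fisher information matrix."""
    F = fisher(theta) + damping * np.eye(theta.size)    # damped Fisher matrix
    return theta - eta * np.linalg.solve(F, grad(theta))
```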
no code implementations • NeurIPS 2021 • Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu
In this paper, we provide convergence guarantees for SGD under state-dependent and heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives.
1 code implementation • 13 Feb 2021 • Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli
In this paper, we focus on the so-called 'implicit effect' of Gaussian noise injections (GNIs), which is the effect of the injected noise on the dynamics of SGD.
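As a concrete illustration of a GNI, noise can be added to the hidden activations of a network during the forward pass; the two-layer network, noise level `sigma`, and weight matrices below are assumptions made for the sketch, not the paper's setup.

```python
import numpy as np

def noisy_forward(x, W1, W2, sigma=0.1, rng=None):
    """Forward pass of a tiny two-layer network with Gaussian noise injected
    into the hidden activations (one common form of GNI)."""
    if rng is None:
        rng = np.random.default_rng()
    h = np.tanh(x @ W1)
    h = h + sigma * rng.standard_normal(h.shape)   # Gaussian noise injection
    return h @ W2
```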
no code implementations • 1 Jul 2020 • Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu
Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters.
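A minimal sketch of the SGLD update, assuming `grad_est(theta, batch)` returns an unbiased minibatch estimate of the gradient of the negative log-posterior and `data` is an array of observations (both placeholders for the example):

```python
import numpy as np

def sgld(grad_est, theta0, data, eta=1e-4, n_epochs=10, batch_size=32, seed=0):
    """Stochastic gradient Langevin dynamics: a Langevin step driven by a
    minibatch gradient estimate instead of the full-data gradient."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    n = len(data)
    for _ in range(n_epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
            g = grad_est(theta, data[idx])                      # unbiased gradient estimate
            noise = np.sqrt(eta) * rng.standard_normal(theta.shape)
            theta = theta - 0.5 * eta * g + noise
    return theta
```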
no code implementations • NeurIPS 2020 • Yossi Arjevani, Joan Bruna, Bugra Can, Mert Gürbüzbalaban, Stefanie Jegelka, Hongzhou Lin
We introduce a framework for designing primal methods in the decentralized optimization setting, where local functions are smooth and strongly convex.
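For orientation, the classical baseline in this setting is decentralized gradient descent, sketched below (the standard baseline, not the framework introduced in the paper); the mixing matrix `W` is assumed to be doubly stochastic and `grads` holds each node's local gradient oracle.

```python
import numpy as np

def decentralized_gd(grads, W, X0, eta=1e-2, n_iters=500):
    """Decentralized gradient descent: each node averages with its neighbors
    through the mixing matrix W, then takes a local gradient step."""
    X = np.array(X0, dtype=float)                       # row i is node i's local iterate
    for _ in range(n_iters):
        G = np.array([g(x) for g, x in zip(grads, X)])  # local gradients
        X = W @ X - eta * G
    return X
```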
no code implementations • 8 Jun 2020 • Mert Gürbüzbalaban, Andrzej Ruszczyński, Landi Zhu
We consider a distributionally robust formulation of stochastic optimization problems arising in statistical learning, where robustness is with respect to uncertainty in the underlying data distribution.
1 code implementation • ICML 2020 • Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban
Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning.
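One standard heavy-ball form of SGDm can be written in a few lines (an illustrative sketch, not necessarily the exact parametrization analyzed in the paper); `grad` stands for a possibly stochastic gradient oracle and the momentum parameter `beta` is arbitrary.

```python
import numpy as np

def sgdm(grad, x0, eta=1e-2, beta=0.9, n_iters=1000):
    """SGD with (heavy-ball) momentum:
    v_{k+1} = beta * v_k - eta * grad(x_k);  x_{k+1} = x_k + v_{k+1}."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_iters):
        v = beta * v - eta * grad(x)
        x = x + v
    return x
```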
no code implementations • 29 Nov 2019 • Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun
This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.
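The Brownian-driven SDE proxy for SGD is typically simulated with an Euler-Maruyama discretization; a minimal sketch, with the drift `grad_f`, diffusion coefficient `sigma`, and time step `dt` chosen only for illustration:

```python
import numpy as np

def euler_maruyama(grad_f, x0, sigma=0.1, dt=1e-3, n_steps=10_000, seed=0):
    """Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sigma dB_t,
    the Brownian-motion-driven SDE often used as a continuous-time model of SGD."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - dt * grad_f(x) + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```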
1 code implementation • NeurIPS 2019 • Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard
We show that the behaviors of the two systems are indeed similar for small step-sizes, and we identify how the error depends on the algorithm and problem parameters.
no code implementations • 12 Sep 2018 • Xuefeng Gao, Mert Gürbüzbalaban, Lingjiong Zhu
We provide finite-time performance bounds for the global convergence of both SGHMC variants for solving stochastic non-convex optimization problems with explicit constants.
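For reference, a simplified SGHMC update (omitting the gradient-noise correction term of the full algorithm) looks as follows; `grad_est`, the friction `alpha`, and the stepsize `eta` are placeholders for the example.

```python
import numpy as np

def sghmc(grad_est, theta0, data, eta=1e-4, alpha=0.1, n_epochs=10, batch_size=32, seed=0):
    """Simplified SGHMC: theta <- theta + v;
    v <- (1 - alpha) * v - eta * grad + N(0, 2 * alpha * eta)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    n = len(data)
    for _ in range(n_epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
            theta = theta + v
            g = grad_est(theta, data[idx])              # minibatch gradient estimate
            v = ((1.0 - alpha) * v - eta * g
                 + np.sqrt(2.0 * alpha * eta) * rng.standard_normal(theta.shape))
    return theta
```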
no code implementations • 1 Nov 2016 • Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro
We prove not only that the proposed DIAG method converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over GD.
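The generic incremental aggregated gradient template behind such methods (a sketch of the classical IAG idea, not DIAG itself) keeps the most recent gradient of every component and steps along their average; the component gradients `grads` and stepsize `eta` are assumed for the example.

```python
import numpy as np

def iag(grads, x0, eta=1e-2, n_iters=1000, seed=0):
    """Incremental aggregated gradient: store the latest gradient of each
    component f_i and move along the average of the stored gradients."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    table = np.array([g(x) for g in grads])   # memory of component gradients
    for _ in range(n_iters):
        i = rng.integers(len(grads))          # refresh one component's gradient
        table[i] = grads[i](x)
        x = x - eta * table.mean(axis=0)
    return x

# Example: f_i(x) = 0.5 * ||x - a_i||^2, so grad_i(x) = x - a_i and the
# minimizer of the sum is the mean of the anchors a_i.
anchors = [np.full(3, c) for c in (0.0, 1.0, 2.0)]
x_opt = iag([lambda x, a=a: x - a for a in anchors], x0=np.zeros(3))
```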