Search Results for author: Yuanhan Hu

Found 6 papers, 0 papers with code

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

no code implementations • 10 Feb 2023 • Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

Our results bring a new understanding of the benefits of cyclic and randomized stepsizes over a constant stepsize in terms of the tail behavior of the SGD iterates.

Scheduling
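To make the contrast concrete, here is a minimal sketch (not the authors' code; the schedule values, noise level, and the quadratic objective are illustrative assumptions) comparing constant, cyclic, and i.i.d.-random stepsize schedules for SGD on a one-dimensional quadratic, using extreme quantiles of the iterates as a crude proxy for tail heaviness:

```python
# Minimal sketch (illustrative parameters, not the authors' code) of the three
# stepsize policies compared in the paper, run on SGD for f(x) = x^2 / 2.
import numpy as np

rng = np.random.default_rng(0)

def stepsize(k, policy, base_lr=0.1):
    if policy == "constant":
        return base_lr
    if policy == "cyclic":                    # cycle deterministically through a fixed list
        cycle = [0.5 * base_lr, base_lr, 1.5 * base_lr]
        return cycle[k % len(cycle)]
    if policy == "random":                    # i.i.d. uniform draws around base_lr
        return rng.uniform(0.5 * base_lr, 1.5 * base_lr)
    raise ValueError(policy)

def run_sgd(policy, n_iter=500, noise=1.0):
    """SGD on f(x) = x^2 / 2 with additive Gaussian gradient noise."""
    x = 0.0
    for k in range(n_iter):
        grad = x + noise * rng.standard_normal()   # stochastic gradient
        x -= stepsize(k, policy) * grad
    return x

# Extreme quantiles of the final iterates as a crude proxy for tail heaviness.
for policy in ("constant", "cyclic", "random"):
    finals = np.array([run_sgd(policy) for _ in range(2000)])
    print(policy, np.quantile(np.abs(finals), 0.999))
```

Under settings like these, the cyclic and random schedules would typically produce larger extreme quantiles than the constant one, consistent with the heavier-tail phenomenon the paper analyzes.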

Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

no code implementations • 29 Nov 2022 • Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu

When $f$ is smooth and gradients are available, we obtain an iteration complexity of $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ for PLD to sample the target up to an $\varepsilon$-error, where the error is measured in the total variation (TV) distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors.
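As a rough illustration of the penalized approach, the sketch below (a Gaussian-like target, a unit-ball constraint, and the quadratic distance penalty are all assumptions of this example) runs the overdamped recursion with the hard constraint replaced by a penalty term $\frac{\delta}{2}\,\mathrm{dist}(x, C)^2$ added to $f$:

```python
# Minimal sketch of a penalized overdamped Langevin (PLD-style) recursion.
# Assumptions: target ~ exp(-f) with f(x) = ||x||^2 / 2, unit-ball constraint,
# quadratic distance penalty with strength delta.
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):                    # gradient of f(x) = ||x||^2 / 2
    return x

def grad_penalty(x):              # gradient of dist(x, B)^2 / 2 for the unit ball B
    norm = np.linalg.norm(x)
    return x - x / norm if norm > 1.0 else np.zeros_like(x)

def pld(x0, eta=1e-3, delta=100.0, n_iter=50_000):
    x = x0.copy()
    for _ in range(n_iter):
        drift = grad_f(x) + delta * grad_penalty(x)
        x = x - eta * drift + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
    return x

print(np.linalg.norm(pld(np.zeros(5))))   # final iterate stays on or near the unit ball
```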

Heavy-Tail Phenomenon in Decentralized SGD

no code implementations • 13 May 2022 • Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

To obtain more explicit control over the tail exponent, we then consider the case where the loss at each node is quadratic, and show that the tail index can be estimated as a function of the step size, the batch size, and the topological properties of the network of computational nodes.

Stochastic Optimization
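For intuition, here is a minimal sketch of the decentralized SGD recursion the paper studies (the ring topology, mixing weights, and quadratic per-node loss are illustrative assumptions): each node averages with its neighbors through a doubly stochastic matrix $W$, then takes a local stochastic gradient step.

```python
# Minimal sketch of decentralized SGD with a ring topology (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, eta = 5, 3, 0.1

# Doubly stochastic mixing matrix: equal weight on self and ring neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, [i, (i - 1) % n_nodes, (i + 1) % n_nodes]] = 1.0 / 3.0

X = rng.standard_normal((n_nodes, dim))          # row i holds node i's iterate

for _ in range(1000):
    X = W @ X                                    # gossip / consensus step
    grads = X + rng.standard_normal(X.shape)     # noisy gradient of ||x||^2 / 2 per node
    X -= eta * grads                             # local SGD step

print(X)                                         # node iterates fluctuate around 0
```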

Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo

no code implementations • 1 Jul 2020 • Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu

Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters.

Bayesian Inference, Regression
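The decentralized SGLD recursion has the same gossip-then-gradient structure as decentralized SGD, plus injected Gaussian noise so that nodes sample rather than optimize. A minimal sketch (ring topology, standard-normal target, and the noise scaling are assumptions of this example; see the paper for the exact scaling and guarantees):

```python
# Minimal sketch of a decentralized SGLD iteration (illustrative parameters).
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim, eta = 5, 2, 0.05

W = np.zeros((n_nodes, n_nodes))                 # ring-topology mixing matrix
for i in range(n_nodes):
    W[i, [i, (i - 1) % n_nodes, (i + 1) % n_nodes]] = 1.0 / 3.0

def grad_neg_log_post(x):                        # -grad log posterior; standard normal here
    return x

X = rng.standard_normal((n_nodes, dim))
for _ in range(20_000):
    noise = np.sqrt(2 * eta) * rng.standard_normal(X.shape)
    X = W @ X - eta * grad_neg_log_post(X) + noise   # gossip + gradient + injected noise

print(X.mean(axis=0), X.std())                   # consensus statistics across nodes
```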

Fractional moment-preserving initialization schemes for training deep neural networks

no code implementations • 25 May 2020 • Mert Gurbuzbalaban, Yuanhan Hu

We prove that the logarithm of the norm of the network outputs, if properly scaled, converges to a Gaussian distribution with an explicit mean and variance that we can compute, depending on the activation used, the chosen value of $s$, and the network width.
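A rough sketch of the idea (the numerical calibration below is a hypothetical stand-in for the paper's closed-form, activation-dependent scale): choose the Gaussian weight scale $\sigma$ so that the $s$-th moment of the layer output norm matches that of the input, exploiting the positive homogeneity of ReLU; at $s = 2$ this recovers the He-initialization scale $\sqrt{2/n}$ for fan-in $n$.

```python
# Minimal sketch: calibrate sigma so E||relu(W x)||^s = E||x||^s for Gaussian
# inputs (a hypothetical stand-in for the paper's explicit formula).
import numpy as np

rng = np.random.default_rng(0)

def calibrate_sigma(fan_in, fan_out, s, n_mc=20_000):
    x = rng.standard_normal((n_mc, fan_in))
    W0 = rng.standard_normal((fan_in, fan_out))          # unit-scale weights
    target = np.mean(np.linalg.norm(x, axis=1) ** s)
    base = np.mean(np.linalg.norm(np.maximum(x @ W0, 0.0), axis=1) ** s)
    # ReLU is positively homogeneous, so E||out||^s = sigma^s * base; solve for sigma.
    return (target / base) ** (1.0 / s)

sigma = calibrate_sigma(fan_in=256, fan_out=256, s=1.0)
print(sigma, np.sqrt(2 / 256))    # close to the He-init scale at equal widths
```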

Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics

no code implementations • 6 Apr 2020 • Yuanhan Hu, Xiaoyu Wang, Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu

In this paper, we study non-reversible Stochastic Gradient Langevin Dynamics (NSGLD), which is based on the discretization of the non-reversible Langevin diffusion.

Stochastic Optimization
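One NSGLD recursion can be sketched as follows (the 2-D quadratic objective, noise level, and the particular antisymmetric matrix $J$ are illustrative assumptions): the usual SGLD drift is skewed by $(I + J)$ with $J^\top = -J$, which breaks the reversibility of the underlying diffusion without changing its stationary distribution.

```python
# Minimal sketch of non-reversible SGLD (illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
eta, beta = 1e-2, 1.0
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])                      # antisymmetric: J^T = -J
A = np.eye(2) + J

def stoch_grad(x):                               # noisy gradient of f(x) = ||x||^2 / 2
    return x + 0.1 * rng.standard_normal(2)

x = np.zeros(2)
for _ in range(50_000):
    x = x - eta * (A @ stoch_grad(x)) + np.sqrt(2 * eta / beta) * rng.standard_normal(2)

print(x)   # approximately a draw from exp(-beta * f), as with reversible SGLD
```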
