Search Results for author: Umut Şimşekli

Found 43 papers, 18 papers with code

Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

no code implementations 4 Mar 2024 Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yildirim, Lingjiong Zhu

Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.

Learning Theory
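
For readers unfamiliar with the setup, a minimal sketch of such a heavy-tailed perturbation is given below: SGD on a least-squares toy problem with symmetric α-stable noise added to each iterate. The quadratic objective, step size, noise scale, and stability index are illustrative assumptions, not the configuration analysed in the paper.

```python
# Minimal sketch (illustrative, not the paper's setup): SGD on a least-squares
# problem with symmetric alpha-stable (heavy-tailed) noise injected into the iterates.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

def grad(theta):
    # Gradient of the least-squares loss (1/2n) * ||A @ theta - b||^2.
    return A.T @ (A @ theta - b) / len(b)

alpha, step, scale = 1.8, 0.05, 0.01   # alpha < 2 gives infinite-variance, heavy-tailed noise
theta = np.zeros(5)
for _ in range(1000):
    noise = levy_stable.rvs(alpha, beta=0.0, scale=scale, size=theta.shape, random_state=rng)
    theta = theta - step * grad(theta) + noise

print(theta)
```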

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

no code implementations 12 Feb 2024 Benjamin Dupuis, Umut Şimşekli

Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years.

Generalization Bounds Stochastic Optimization

Tighter Generalisation Bounds via Interpolation

no code implementations 7 Feb 2024 Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

We also instantiate our bounds as training objectives, yielding non-trivial guarantees and practical performances.

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

no code implementations 10 Feb 2023 Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior.

Scheduling

Generalization Bounds with Data-dependent Fractal Dimensions

1 code implementation 6 Feb 2023 Benjamin Dupuis, George Deligiannidis, Umut Şimşekli

To achieve this goal, we build up on a classical covering argument in learning theory and introduce a data-dependent fractal dimension.

Generalization Bounds Learning Theory +1

Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

no code implementations 27 Jan 2023 Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations.

Generalization Bounds

Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

no code implementations 2 Jun 2022 Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli

Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error.

Stochastic Optimization

Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent

1 code implementation 23 May 2022 Soon Hoe Lim, Yijun Wan, Umut Şimşekli

Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior.

Generalization Bounds

Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

no code implementations 4 Mar 2022 Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli

Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory.

Generalization Bounds Learning Theory

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

2 code implementations NeurIPS 2021 Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli

Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters.

Learning Theory Topological Data Analysis

Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

no code implementations 2 Aug 2021 Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood.

Generalization Bounds Stochastic Optimization

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

2 code implementations NeurIPS 2021 Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits.
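
For background, the standard way to estimate SW is to average one-dimensional Wasserstein distances over random projection directions; the sketch below implements this plain Monte Carlo estimator, not the concentration-of-random-projections approximation proposed in the paper, and all sample sizes and the number of projections are arbitrary choices.

```python
# Plain Monte Carlo estimator of the Sliced-Wasserstein-2 distance between two
# equal-size point clouds (background only, not the paper's fast approximation).
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=200, rng=None):
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    thetas = rng.standard_normal((n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    sw2 = 0.0
    for theta in thetas:
        # For equal-size empirical measures, the 1D W2^2 is the mean squared
        # difference of the sorted projections.
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        sw2 += np.mean((px - py) ** 2)
    return np.sqrt(sw2 / n_projections)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + 1.0
print(sliced_wasserstein2(X, Y))
```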

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

no code implementations NeurIPS 2021 Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.

Generalization Bounds Learning Theory +1

Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

1 code implementation NeurIPS 2021 Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli

Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks.

Generalization Bounds Neural Network Compression

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

no code implementations NeurIPS 2021 Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu

In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives.

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

1 code implementation 13 Feb 2021 Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

In this paper we focus on the so-called 'implicit effect' of Gaussian noise injections (GNIs), which is the effect of the injected noise on the dynamics of SGD.

Self-Supervised VQ-VAE for One-Shot Music Style Transfer

1 code implementation 10 Feb 2021 Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot' capability of classical image style transfer algorithms.

Music Style Transfer Self-Supervised Learning +1

Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

1 code implementation NeurIPS 2020 Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu

Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge.

Generalization Bounds

The Heavy-Tail Phenomenon in SGD

1 code implementation 8 Jun 2020 Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

We claim that, depending on the structure of the Hessian of the loss at the minimum and the choices of the algorithm parameters $\eta$ (step-size) and $b$ (batch-size), the SGD iterates will converge to a heavy-tailed stationary distribution.

Synchronizing Probability Measures on Rotations via Optimal Transport

no code implementations CVPR 2020 Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas

We introduce a new paradigm, measure synchronization, for synchronizing graphs with measure-valued edges.

Pose Estimation

Statistical and Topological Properties of Sliced Probability Divergences

1 code implementation NeurIPS 2020 Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, Umut Şimşekli

The idea of slicing divergences has proven successful for comparing two probability measures in various machine learning applications, including generative modeling; it consists of computing the expected value of a 'base divergence' between one-dimensional random projections of the two measures.
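
A minimal sketch of this slicing construction is given below, with the base divergence left as a user-supplied callable; SciPy's one-dimensional Wasserstein distance is used only as an example base divergence, and the toy point clouds are illustrative assumptions.

```python
# Sketch of the generic slicing construction: average a user-supplied 1D "base
# divergence" over random projection directions. The base divergence shown here
# (SciPy's 1D Wasserstein distance) is only an example choice.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_divergence(X, Y, base_divergence, n_projections=100, rng=None):
    rng = np.random.default_rng(rng)
    thetas = rng.standard_normal((n_projections, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Expected base divergence between the 1D projections of the two samples.
    return np.mean([base_divergence(X @ t, Y @ t) for t in thetas])

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 2))
Y = 0.5 * rng.standard_normal((400, 2)) + 2.0
print(sliced_divergence(X, Y, wasserstein_distance))
```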

On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

no code implementations 29 Nov 2019 Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.
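
For context, the Gaussian-noise assumption leads to the familiar Brownian-driven SDE view of SGD, whereas the heavy-tailed theory replaces the Brownian motion with an α-stable Lévy process; schematically (standard notation, not the paper's exact scaling):

```latex
% Brownian-driven SDE view of SGD under the Gaussian-noise assumption (left),
% versus the alpha-stable Levy-driven counterpart used in heavy-tailed analyses (right).
\[
  \mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t
  \qquad \text{vs.} \qquad
  \mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}L^{\alpha}_t
\]
```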

Approximate Bayesian Computation with the Sliced-Wasserstein Distance

1 code implementation 28 Oct 2019 Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, Umut Şimşekli

Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood.

Image Denoising
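
As background, ABC in its simplest rejection form keeps parameter draws whose simulated data lie within a tolerance of the observed data under some discrepancy; the sketch below uses a Gaussian toy model and a crude summary-statistic distance, whereas the paper's point is to use the Sliced-Wasserstein distance between simulated and observed samples instead. The model, prior range, and tolerance here are illustrative assumptions.

```python
# Minimal rejection-ABC sketch: keep parameters whose simulated data fall within
# a tolerance of the observed data under some discrepancy. Toy Gaussian model;
# the paper replaces the summary-statistic distance with the Sliced-Wasserstein distance.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=200)   # stand-in for real data

def simulate(mu, size=200):
    return rng.normal(loc=mu, scale=1.0, size=size)

accepted = []
for _ in range(5000):
    mu = rng.uniform(-5.0, 5.0)                              # draw from the prior
    distance = abs(simulate(mu).mean() - observed.mean())    # discrepancy on a summary statistic
    if distance < 0.1:                                       # tolerance epsilon
        accepted.append(mu)

print(np.mean(accepted), np.std(accepted))   # approximate posterior over mu
```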

Bayesian Model Selection for Identifying Markov Equivalent Causal Graphs

no code implementations AABI Symposium 2019 Mehmet Burak Kurutmaz, Melih Barsbey, Ali Taylan Cemgil, Sinan Yildirim, Umut Şimşekli

We believe that the Bayesian approach to causal discovery both allows the rich methodology of Bayesian inference to be used in various difficult aspects of this problem and provides a unifying framework to causal discovery research.

Bayesian Inference Causal Discovery +1

Supervised Symbolic Music Style Translation Using Synthetic Data

1 code implementation 4 Jul 2019 Ondřej Cífka, Umut Şimşekli, Gaël Richard

Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style.

Music Genre Transfer Style Transfer +2

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

1 code implementation NeurIPS 2019 Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters.

Computational Efficiency

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

1 code implementation NeurIPS 2019 Kimia Nadjahi, Alain Durmus, Umut Şimşekli, Roland Badeau

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g., Wasserstein generative adversarial networks, Wasserstein autoencoders).
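
A toy sketch of the MEDE idea is given below: pick the parameter whose simulated samples minimise an approximated expected distance to the data, here via a crude grid search with the one-dimensional Wasserstein distance. The Gaussian model, grid, and distance are illustrative assumptions; the paper analyses such estimators built on the Sliced-Wasserstein distance.

```python
# Toy minimum expected distance estimation (MEDE) sketch: grid search for the
# parameter whose simulations are closest to the data in expected 1D Wasserstein
# distance. Illustrative only; the paper studies Sliced-Wasserstein-based estimators.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=1000)

def expected_distance(mu, n_repeats=10, n_samples=1000):
    # Average over several simulated datasets to approximate the expectation.
    return np.mean([
        wasserstein_distance(rng.normal(loc=mu, scale=1.0, size=n_samples), data)
        for _ in range(n_repeats)
    ])

grid = np.linspace(-3.0, 3.0, 61)
mu_hat = grid[np.argmin([expected_distance(mu) for mu in grid])]
print(mu_hat)   # should land close to the true location, 1.5
```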

Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope

no code implementations CVPR 2019 Tolga Birdal, Umut Şimşekli

We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images.

Graph Matching

Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization

no code implementations 22 Jan 2019 Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard

Recent studies on diffusion-based sampling methods have shown that Langevin Monte Carlo (LMC) algorithms can be beneficial for non-convex optimization, and rigorous theoretical guarantees have been proven for both asymptotic and finite-time regimes.

Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions

1 code implementation 21 Jun 2018 Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter

To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees.

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

no code implementations ICML 2018 Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil

The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.

Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC

no code implementations NeurIPS 2018 Tolga Birdal, Umut Şimşekli, M. Onur Eken, Slobodan Ilic

We introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping).

Simultaneous Localization and Mapping

Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC

no code implementations ICML 2017 Umut Şimşekli

These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms.

Efficient Exploration
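
As background for the snippet above, a minimal sketch of the standard (Brownian-driven) unadjusted Langevin algorithm is shown below; its increments are Gaussian, which is exactly what the fractional variant replaces with α-stable increments. The one-dimensional Gaussian target, step size, and iteration count are illustrative assumptions.

```python
# Sketch of the standard unadjusted Langevin algorithm (ULA): a discretized
# diffusion driven by Brownian motion, hence Gaussian increments/proposals.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_target(x):
    # Target: standard normal, so grad log pi(x) = -x.
    return -x

step, n_iters = 0.1, 5000
x, samples = 0.0, []
for _ in range(n_iters):
    x = x + step * grad_log_target(x) + np.sqrt(2 * step) * rng.standard_normal()
    samples.append(x)

print(np.mean(samples), np.var(samples))   # roughly 0 and 1 for this target
```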

Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo

no code implementations 12 Jun 2017 Umut Şimşekli

These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms.

Efficient Exploration

Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding

no code implementations NeurIPS 2017 Mainak Jas, Tom Dupré La Tour, Umut Şimşekli, Alexandre Gramfort

Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research.

Time Series Time Series Analysis

Stochastic Quasi-Newton Langevin Monte Carlo

no code implementations 10 Feb 2016 Umut Şimşekli, Roland Badeau, A. Taylan Cemgil, Gaël Richard

These second-order methods directly approximate the inverse Hessian using a limited history of samples and their gradients.

Second-order methods
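
To illustrate the "limited history of samples and their gradients" idea, here is a sketch of the classical L-BFGS two-loop recursion, which builds an inverse-Hessian-vector product from stored parameter and gradient differences. This is background on the quasi-Newton machinery, not the stochastic Langevin sampler proposed in the paper; the quadratic toy check is an assumption for demonstration only.

```python
# Classical L-BFGS two-loop recursion: approximate H^{-1} @ v from a limited
# history of curvature pairs (s_k, y_k). Background only, not the paper's sampler.
import numpy as np

def lbfgs_direction(v, s_history, y_history):
    q = np.array(v, dtype=float)
    stack = []
    # Backward pass over the history, newest pair first.
    for s, y in zip(reversed(s_history), reversed(y_history)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        stack.append((rho, alpha, s, y))
    # Initial Hessian scaling taken from the newest pair.
    s, y = s_history[-1], y_history[-1]
    q *= (s @ y) / (y @ y)
    # Forward pass, oldest pair first.
    for rho, alpha, s, y in reversed(stack):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q   # approximates H^{-1} @ v

# Toy check on a quadratic with Hessian A, so that y_k = A @ s_k exactly.
A = np.diag([1.0, 4.0, 9.0])
rng = np.random.default_rng(0)
S = [rng.standard_normal(3) for _ in range(10)]
Y = [A @ s for s in S]
g = np.ones(3)
print(lbfgs_direction(g, S, Y))   # rough approximation of the exact solve below
print(np.linalg.solve(A, g))
```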

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

no code implementations 5 Sep 2015 Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk

We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second-order incremental algorithm for solving large-scale partially separable optimization problems.

Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models

no code implementations 3 Jun 2015 Umut Şimşekli, Hazal Koptagel, Hakan Güldaş, A. Taylan Cemgil, Figen Öztoprak, Ş. İlker Birbil

For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD).
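
A single-chain SGLD sketch is shown below for orientation: a Langevin step driven by a rescaled minibatch gradient of the log-posterior. The Gaussian-mean toy model, minibatch size, and step size are illustrative assumptions, and the sketch does not attempt the distributed, block-parallel scheme (PSGLD) developed in the paper.

```python
# Single-chain SGLD sketch (toy model, not the paper's distributed
# matrix-factorisation setting): Langevin steps with a rescaled minibatch gradient.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=10_000)
N, batch_size, step = len(data), 100, 1e-5

def grad_log_posterior_estimate(theta, batch):
    # Prior: theta ~ N(0, 10^2); likelihood: x_i ~ N(theta, 1).
    grad_log_prior = -theta / 100.0
    grad_log_lik = (N / len(batch)) * np.sum(batch - theta)  # minibatch gradient, rescaled to full data
    return grad_log_prior + grad_log_lik

theta, samples = 0.0, []
for _ in range(5000):
    batch = rng.choice(data, size=batch_size, replace=False)
    drift = 0.5 * step * grad_log_posterior_estimate(theta, batch)
    theta = theta + drift + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)

print(np.mean(samples[1000:]))   # should settle near the posterior mean, ~3.0
```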
