Search Results for author: Umut Şimşekli

Found 43 papers, 18 papers with code

Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

no code implementations 4 Mar 2024 Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yildirim, Lingjiong Zhu

Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.

Learning Theory
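
For readers unfamiliar with the setup, a minimal sketch of such a heavy-tailed perturbation is given below: SGD on a least-squares toy problem with symmetric α-stable noise added to each iterate. The quadratic objective, step size, noise scale, and stability index are illustrative assumptions, not the configuration analysed in the paper.

```python
# Minimal sketch (illustrative, not the paper's setup): SGD on a least-squares
# problem with symmetric alpha-stable (heavy-tailed) noise injected into the iterates.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

def grad(theta):
    # Gradient of the least-squares loss (1/2n) * ||A @ theta - b||^2.
    return A.T @ (A @ theta - b) / len(b)

alpha, step, scale = 1.8, 0.05, 0.01   # alpha < 2 gives infinite-variance, heavy-tailed noise
theta = np.zeros(5)
for _ in range(1000):
    noise = levy_stable.rvs(alpha, beta=0.0, scale=scale, size=theta.shape, random_state=rng)
    theta = theta - step * grad(theta) + noise

print(theta)
```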

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

no code implementations 12 Feb 2024 Benjamin Dupuis, Umut Şimşekli

Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years.

Generalization Bounds Stochastic Optimization

Tighter Generalisation Bounds via Interpolation

no code implementations 7 Feb 2024 Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

We also instantiate our bounds as training objectives, yielding non-trivial guarantees and practical performances.

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

no code implementations 10 Feb 2023 Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior.

Scheduling

Generalization Bounds with Data-dependent Fractal Dimensions

1 code implementation 6 Feb 2023 Benjamin Dupuis, George Deligiannidis, Umut Şimşekli

To achieve this goal, we build up on a classical covering argument in learning theory and introduce a data-dependent fractal dimension.

Generalization Bounds Learning Theory +1

Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

no code implementations 27 Jan 2023 Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations.

Generalization Bounds

Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

no code implementations 2 Jun 2022 Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli

Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error.

Stochastic Optimization

Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent

1 code implementation 23 May 2022 Soon Hoe Lim, Yijun Wan, Umut Şimşekli

Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior.

Generalization Bounds

Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

no code implementations 4 Mar 2022 Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli

Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory.

Generalization Bounds Learning Theory

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

2 code implementations NeurIPS 2021 Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli

Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters.

Learning Theory Topological Data Analysis

Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

no code implementations 2 Aug 2021 Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood.

Generalization Bounds Stochastic Optimization

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

2 code implementations NeurIPS 2021 Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits.
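
For background, the standard way to estimate SW is to average one-dimensional Wasserstein distances over random projection directions; the sketch below implements this plain Monte Carlo estimator, not the concentration-of-random-projections approximation proposed in the paper, and all sample sizes and the number of projections are arbitrary choices.

```python
# Plain Monte Carlo estimator of the Sliced-Wasserstein-2 distance between two
# equal-size point clouds (background only, not the paper's fast approximation).
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=200, rng=None):
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    thetas = rng.standard_normal((n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    sw2 = 0.0
    for theta in thetas:
        # For equal-size empirical measures, the 1D W2^2 is the mean squared
        # difference of the sorted projections.
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        sw2 += np.mean((px - py) ** 2)
    return np.sqrt(sw2 / n_projections)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + 1.0
print(sliced_wasserstein2(X, Y))
```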

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

no code implementations NeurIPS 2021 Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.

Generalization Bounds Learning Theory +1

Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

1 code implementation NeurIPS 2021 Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli

Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks.

Generalization Bounds Neural Network Compression

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

no code implementations NeurIPS 2021 Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu

In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives.

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

1 code implementation 13 Feb 2021 Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

In this paper we focus on the so-called 'implicit effect' of Gaussian noise injections (GNIs), which is the effect of the injected noise on the dynamics of SGD.

Self-Supervised VQ-VAE for One-Shot Music Style Transfer

1 code implementation 10 Feb 2021 Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot' capability of classical image style transfer algorithms.

Music Style Transfer Self-Supervised Learning +1

Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

1 code implementation NeurIPS 2020 Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu

Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge.

Generalization Bounds

The Heavy-Tail Phenomenon in SGD

1 code implementation 8 Jun 2020 Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

We claim that, depending on the structure of the Hessian of the loss at the minimum and the choices of the algorithm parameters $\eta$ (step-size) and $b$ (batch-size), the SGD iterates will converge to a heavy-tailed stationary distribution.

Synchronizing Probability Measures on Rotations via Optimal Transport

no code implementations CVPR 2020 Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas

We introduce a new paradigm, measure synchronization, for synchronizing graphs with measure-valued edges.

Pose Estimation

Statistical and Topological Properties of Sliced Probability Divergences

1 code implementation NeurIPS 2020 Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, Umut Şimşekli

The idea of slicing divergences has proven successful for comparing two probability measures in various machine learning applications, including generative modeling; it consists of computing the expected value of a 'base divergence' between one-dimensional random projections of the two measures.
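
A minimal sketch of this slicing construction is given below, with the base divergence left as a user-supplied callable; SciPy's one-dimensional Wasserstein distance is used only as an example base divergence, and the toy point clouds are illustrative assumptions.

```python
# Sketch of the generic slicing construction: average a user-supplied 1D "base
# divergence" over random projection directions. The base divergence shown here
# (SciPy's 1D Wasserstein distance) is only an example choice.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_divergence(X, Y, base_divergence, n_projections=100, rng=None):
    rng = np.random.default_rng(rng)
    thetas = rng.standard_normal((n_projections, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Expected base divergence between the 1D projections of the two samples.
    return np.mean([base_divergence(X @ t, Y @ t) for t in thetas])

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 2))
Y = 0.5 * rng.standard_normal((400, 2)) + 2.0
print(sliced_divergence(X, Y, wasserstein_distance))
```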

On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

no code implementations 29 Nov 2019 Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.
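
For context, the Gaussian-noise assumption leads to the familiar Brownian-driven SDE view of SGD, whereas the heavy-tailed theory replaces the Brownian motion with an α-stable Lévy process; schematically (standard notation, not the paper's exact scaling):

```latex
% Brownian-driven SDE view of SGD under the Gaussian-noise assumption (left),
% versus the alpha-stable Levy-driven counterpart used in heavy-tailed analyses (right).
\[
  \mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t
  \qquad \text{vs.} \qquad
  \mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}L^{\alpha}_t
\]
```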

Approximate Bayesian Computation with the Sliced-Wasserstein Distance

1 code implementation 28 Oct 2019 Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, Umut Şimşekli

Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood.

Image Denoising
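
As background, ABC in its simplest rejection form keeps parameter draws whose simulated data lie within a tolerance of the observed data under some discrepancy; the sketch below uses a Gaussian toy model and a crude summary-statistic distance, whereas the paper's point is to use the Sliced-Wasserstein distance between simulated and observed samples instead. The model, prior range, and tolerance here are illustrative assumptions.

```python
# Minimal rejection-ABC sketch: keep parameters whose simulated data fall within
# a tolerance of the observed data under some discrepancy. Toy Gaussian model;
# the paper replaces the summary-statistic distance with the Sliced-Wasserstein distance.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=200)   # stand-in for real data

def simulate(mu, size=200):
    return rng.normal(loc=mu, scale=1.0, size=size)

accepted = []
for _ in range(5000):
    mu = rng.uniform(-5.0, 5.0)                              # draw from the prior
    distance = abs(simulate(mu).mean() - observed.mean())    # discrepancy on a summary statistic
    if distance < 0.1:                                       # tolerance epsilon
        accepted.append(mu)

print(np.mean(accepted), np.std(accepted))   # approximate posterior over mu
```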

Bayesian Model Selection for Identifying Markov Equivalent Causal Graphs

no code implementations AABI Symposium 2019 Mehmet Burak Kurutmaz, Melih Barsbey, Ali Taylan Cemgil, Sinan Yildirim, Umut Şimşekli

We believe that the Bayesian approach to causal discovery both allows the rich methodology of Bayesian inference to be used in various difficult aspects of this problem and provides a unifying framework to causal discovery research.

Bayesian Inference Causal Discovery +1

Supervised Symbolic Music Style Translation Using Synthetic Data

1 code implementation 4 Jul 2019 Ondřej Cífka, Umut Şimşekli, Gaël Richard

Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style.

Music Genre Transfer Style Transfer +2

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

1 code implementation NeurIPS 2019 Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters.

Computational Efficiency

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

1 code implementation NeurIPS 2019 Kimia Nadjahi, Alain Durmus, Umut Şimşekli, Roland Badeau

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g., Wasserstein generative adversarial networks, Wasserstein autoencoders).
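
A toy sketch of the MEDE idea is given below: pick the parameter whose simulated samples minimise an approximated expected distance to the data, here via a crude grid search with the one-dimensional Wasserstein distance. The Gaussian model, grid, and distance are illustrative assumptions; the paper analyses such estimators built on the Sliced-Wasserstein distance.

```python
# Toy minimum expected distance estimation (MEDE) sketch: grid search for the
# parameter whose simulations are closest to the data in expected 1D Wasserstein
# distance. Illustrative only; the paper studies Sliced-Wasserstein-based estimators.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=1000)

def expected_distance(mu, n_repeats=10, n_samples=1000):
    # Average over several simulated datasets to approximate the expectation.
    return np.mean([
        wasserstein_distance(rng.normal(loc=mu, scale=1.0, size=n_samples), data)
        for _ in range(n_repeats)
    ])

grid = np.linspace(-3.0, 3.0, 61)
mu_hat = grid[np.argmin([expected_distance(mu) for mu in grid])]
print(mu_hat)   # should land close to the true location, 1.5
```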

Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope

no code implementations CVPR 2019 Tolga Birdal, Umut Şimşekli

We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images.

Graph Matching

Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization

no code implementations 22 Jan 2019 Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard

Recent studies on diffusion-based sampling methods have shown that Langevin Monte Carlo (LMC) algorithms can be beneficial for non-convex optimization, and rigorous theoretical guarantees have been proven for both asymptotic and finite-time regimes.

Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions

1 code implementation 21 Jun 2018 Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter

To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees.

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

no code implementations ICML 2018 Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil

The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.

Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC

no code implementations NeurIPS 2018 Tolga Birdal, Umut Şimşekli, M. Onur Eken, Slobodan Ilic

We introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping).

Simultaneous Localization and Mapping

Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC

no code implementations ICML 2017 Umut Şimşekli

These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms.

Efficient Exploration
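
As background for the snippet above, a minimal sketch of the standard (Brownian-driven) unadjusted Langevin algorithm is shown below; its increments are Gaussian, which is exactly what the fractional variant replaces with α-stable increments. The one-dimensional Gaussian target, step size, and iteration count are illustrative assumptions.

```python
# Sketch of the standard unadjusted Langevin algorithm (ULA): a discretized
# diffusion driven by Brownian motion, hence Gaussian increments/proposals.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_target(x):
    # Target: standard normal, so grad log pi(x) = -x.
    return -x

step, n_iters = 0.1, 5000
x, samples = 0.0, []
for _ in range(n_iters):
    x = x + step * grad_log_target(x) + np.sqrt(2 * step) * rng.standard_normal()
    samples.append(x)

print(np.mean(samples), np.var(samples))   # roughly 0 and 1 for this target
```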

Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo

no code implementations 12 Jun 2017 Umut Şimşekli

These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms.

Efficient Exploration

Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding

no code implementations NeurIPS 2017 Mainak Jas, Tom Dupré La Tour, Umut Şimşekli, Alexandre Gramfort

Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research.

Time Series Time Series Analysis

Stochastic Quasi-Newton Langevin Monte Carlo

no code implementations 10 Feb 2016 Umut Şimşekli, Roland Badeau, A. Taylan Cemgil, Gaël Richard

These second-order methods directly approximate the inverse Hessian using a limited history of samples and their gradients.

Second-order methods
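
To illustrate the "limited history of samples and their gradients" idea, here is a sketch of the classical L-BFGS two-loop recursion, which builds an inverse-Hessian-vector product from stored parameter and gradient differences. This is background on the quasi-Newton machinery, not the stochastic Langevin sampler proposed in the paper; the quadratic toy check is an assumption for demonstration only.

```python
# Classical L-BFGS two-loop recursion: approximate H^{-1} @ v from a limited
# history of curvature pairs (s_k, y_k). Background only, not the paper's sampler.
import numpy as np

def lbfgs_direction(v, s_history, y_history):
    q = np.array(v, dtype=float)
    stack = []
    # Backward pass over the history, newest pair first.
    for s, y in zip(reversed(s_history), reversed(y_history)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        stack.append((rho, alpha, s, y))
    # Initial Hessian scaling taken from the newest pair.
    s, y = s_history[-1], y_history[-1]
    q *= (s @ y) / (y @ y)
    # Forward pass, oldest pair first.
    for rho, alpha, s, y in reversed(stack):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q   # approximates H^{-1} @ v

# Toy check on a quadratic with Hessian A, so that y_k = A @ s_k exactly.
A = np.diag([1.0, 4.0, 9.0])
rng = np.random.default_rng(0)
S = [rng.standard_normal(3) for _ in range(10)]
Y = [A @ s for s in S]
g = np.ones(3)
print(lbfgs_direction(g, S, Y))   # rough approximation of the exact solve below
print(np.linalg.solve(A, g))
```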

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

no code implementations 5 Sep 2015 Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk

We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second-order incremental algorithm for solving large-scale partially separable optimization problems.

Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models

no code implementations 3 Jun 2015 Umut Şimşekli, Hazal Koptagel, Hakan Güldaş, A. Taylan Cemgil, Figen Öztoprak, Ş. İlker Birbil

For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD).
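
A single-chain SGLD sketch is shown below for orientation: a Langevin step driven by a rescaled minibatch gradient of the log-posterior. The Gaussian-mean toy model, minibatch size, and step size are illustrative assumptions, and the sketch does not attempt the distributed, block-parallel scheme (PSGLD) developed in the paper.

```python
# Single-chain SGLD sketch (toy model, not the paper's distributed
# matrix-factorisation setting): Langevin steps with a rescaled minibatch gradient.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=10_000)
N, batch_size, step = len(data), 100, 1e-5

def grad_log_posterior_estimate(theta, batch):
    # Prior: theta ~ N(0, 10^2); likelihood: x_i ~ N(theta, 1).
    grad_log_prior = -theta / 100.0
    grad_log_lik = (N / len(batch)) * np.sum(batch - theta)  # minibatch gradient, rescaled to full data
    return grad_log_prior + grad_log_lik

theta, samples = 0.0, []
for _ in range(5000):
    batch = rng.choice(data, size=batch_size, replace=False)
    drift = 0.5 * step * grad_log_posterior_estimate(theta, batch)
    theta = theta + drift + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)

print(np.mean(samples[1000:]))   # should settle near the posterior mean, ~3.0
```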
