no code implementations • 11 Jul 2024 • Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Şimşekli
Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks.
no code implementations • 4 Mar 2024 • Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yildirim, Lingjiong Zhu
Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.
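As a rough illustration of the idea, and not the algorithm analyzed in the paper, the sketch below perturbs each SGD step with symmetric alpha-stable noise drawn via scipy; the step size, noise scale, and tail index alpha are arbitrary toy values.

```python
# Illustrative sketch only: SGD with injected symmetric alpha-stable noise.
# alpha < 2 gives heavy tails; alpha = 2 recovers Gaussian perturbations.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def noisy_sgd_step(w, grad, lr=1e-2, alpha=1.8, scale=1e-3):
    """One SGD step plus heavy-tailed (alpha-stable) noise."""
    noise = levy_stable.rvs(alpha, 0.0, scale=scale, size=w.shape, random_state=rng)
    return w - lr * grad + noise

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = rng.standard_normal(10)
for _ in range(1_000):
    w = noisy_sgd_step(w, grad=w)
print(np.linalg.norm(w))
```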
1 code implementation • 12 Feb 2024 • Benjamin Dupuis, Umut Şimşekli
Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years.
no code implementations • 7 Feb 2024 • Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj
We also instantiate our bounds as training objectives, yielding non-trivial guarantees and good practical performance.
no code implementations • 10 Feb 2023 • Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu
Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to a constant stepsize in terms of the tail behavior.
1 code implementation • 6 Feb 2023 • Benjamin Dupuis, George Deligiannidis, Umut Şimşekli
To achieve this goal, we build on a classical covering argument in learning theory and introduce a data-dependent fractal dimension.
no code implementations • 27 Jan 2023 • Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli
Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more consistent with the reported empirical observations.
no code implementations • 19 Sep 2022 • Sejun Park, Umut Şimşekli, Murat A. Erdogdu
In this paper, we propose a new covering technique localized for the trajectories of SGD.
no code implementations • 2 Jun 2022 • Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli
Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails is linked to the generalization error.
1 code implementation • 23 May 2022 • Soon Hoe Lim, Yijun Wan, Umut Şimşekli
Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior.
no code implementations • 4 Mar 2022 • Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli
Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory.
2 code implementations • NeurIPS 2021 • Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli
Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters.
no code implementations • 2 Aug 2021 • Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood.
2 code implementations • NeurIPS 2021 • Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits.
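For reference, here is a minimal Monte Carlo sketch of the Sliced-Wasserstein distance between two equally sized point clouds (a toy implementation, not the paper's code): average the closed-form one-dimensional Wasserstein distance over random projection directions.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, seed=0):
    """Monte Carlo SW_p between two same-size samples X, Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random directions on the sphere
    px = np.sort(X @ theta.T, axis=0)                      # sorted 1-d projections
    py = np.sort(Y @ theta.T, axis=0)                      # (sorting = 1-d optimal coupling)
    return np.mean(np.abs(px - py) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + 1.0                    # shifted copy
print(sliced_wasserstein(X, Y))                            # positive, reflecting the mean shift
```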
no code implementations • NeurIPS 2021 • Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu
As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
1 code implementation • NeurIPS 2021 • Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli
Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks.
1 code implementation • 18 May 2021 • Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard
Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity.
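As a rough sketch of how linear complexity in the sequence length can be obtained (a generic kernelized "linear attention" recipe, not necessarily the exact mechanism of this paper), the attention matrix is never materialized; instead, key-value summaries whose size is independent of the sequence length are accumulated.

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a common positive feature map used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    Qf, Kf = phi(Q), phi(K)               # (T, d) feature maps
    KV = Kf.T @ V                         # (d, d_v) summary, cost O(T * d * d_v)
    Z = Qf @ Kf.sum(axis=0)               # (T,) normalizers
    return (Qf @ KV) / Z[:, None]         # never forms the (T, T) attention matrix

rng = np.random.default_rng(0)
T, d = 1024, 64
Q, K, V = rng.standard_normal((3, T, d))
print(linear_attention(Q, K, V).shape)    # (1024, 64)
```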
no code implementations • NeurIPS 2021 • Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu
In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives.
1 code implementation • 13 Feb 2021 • Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli
In this paper we focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD.
1 code implementation • 10 Feb 2021 • Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard
While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot' capability of classical image style transfer algorithms.
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2020 • Ondřej Cífka, Umut Şimşekli, Gaël Richard
Style transfer is the process of changing the style of an image, video, audio clip or musical piece so as to match the style of a given example.
no code implementations • NeurIPS 2020 • Alexander Camuto, Matthew Willetts, Umut Şimşekli, Stephen Roberts, Chris Holmes
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs).
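A minimal sketch of what a Gaussian noise injection looks like in practice (illustrative only; the layer sizes and noise scale sigma are arbitrary choices): zero-mean Gaussian noise is added to the hidden activations during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, W2, sigma=0.1, train=True):
    h = np.maximum(x @ W1, 0.0)                           # ReLU hidden layer
    if train:
        h = h + sigma * rng.standard_normal(h.shape)      # Gaussian noise injection
    return h @ W2

x = rng.standard_normal((32, 8))                          # a batch of 32 inputs
W1 = 0.1 * rng.standard_normal((8, 16))
W2 = 0.1 * rng.standard_normal((16, 1))
print(mlp_forward(x, W1, W2, train=True).shape)           # (32, 1)
```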
1 code implementation • NeurIPS 2020 • Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu
Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge.
1 code implementation • 8 Jun 2020 • Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu
We claim that depending on the structure of the Hessian of the loss at the minimum, and the choices of the algorithm parameters $\eta$ and $b$, the SGD iterates will converge to a heavy-tailed stationary distribution.
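The flavor of this claim can be probed numerically; the toy experiment below (a sketch, not the paper's setup) runs constant-stepsize SGD on a one-dimensional least-squares problem and applies a Hill estimator to the stationary iterates. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, b, n_iter, burn = 0.5, 1, 200_000, 50_000          # stepsize, batch size (toy values)

w, dists = 0.0, []
for t in range(n_iter):
    x = rng.standard_normal(b)                          # minibatch of 1-d inputs
    y = 2.0 * x + 0.1 * rng.standard_normal(b)          # data from y = 2x + noise
    w -= eta * np.mean(x * (x * w - y))                 # minibatch least-squares step
    if t >= burn:
        dists.append(abs(w - 2.0))                      # distance to the minimizer

a = np.sort(np.asarray(dists))[::-1]
k = 2_000                                               # number of upper order statistics
hill = 1.0 / np.mean(np.log(a[:k] / a[k]))              # Hill tail-index estimator
print(f"estimated tail index: {hill:.2f} (smaller = heavier tails)")
```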
no code implementations • CVPR 2020 • Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas
We introduce a new paradigm, measure synchronization, for synchronizing graphs with measure-valued edges.
1 code implementation • NeurIPS 2020 • Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, Umut Şimşekli
Slicing divergences, which consists of computing the expected value of a 'base divergence' between one-dimensional random projections of two probability measures, has proven successful in various machine learning applications, including generative modeling.
1 code implementation • ICML 2020 • Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban
Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning.
no code implementations • 29 Nov 2019 • Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun
This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion.
1 code implementation • 28 Oct 2019 • Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, Umut Şimşekli
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood.
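For orientation, here is the simplest rejection-ABC loop (a generic textbook sketch, not the method developed in the paper): draw parameters from the prior, simulate data, and keep the draws whose summary statistic lands within a tolerance eps of the observed one.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(1.5, 1.0, size=100)                 # "observed" data
s_obs = y_obs.mean()                                   # summary statistic

accepted = []
for _ in range(20_000):
    theta = rng.normal(0.0, 5.0)                       # draw from the prior
    y_sim = rng.normal(theta, 1.0, size=100)           # simulate from the model
    if abs(y_sim.mean() - s_obs) < 0.1:                # tolerance eps = 0.1
        accepted.append(theta)

print(len(accepted), np.mean(accepted))                # crude posterior mean estimate
```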
no code implementations • AABI Symposium 2019 • Mehmet Burak Kurutmaz, Melih Barsbey, Ali Taylan Cemgil, Sinan Yildirim, Umut Şimşekli
We believe that the Bayesian approach to causal discovery both allows the rich methodology of Bayesian inference to be used in various difficult aspects of this problem and provides a unifying framework for causal discovery research.
1 code implementation • 4 Jul 2019 • Ondřej Cífka, Umut Şimşekli, Gaël Richard
Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style.
1 code implementation • NeurIPS 2019 • Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard
We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters.
1 code implementation • NeurIPS 2019 • Kimia Nadjahi, Alain Durmus, Umut Şimşekli, Roland Badeau
Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions, and they have become increasingly popular due to their use in implicit generative modeling (e.g., Wasserstein generative adversarial networks, Wasserstein autoencoders).
no code implementations • CVPR 2019 • Tolga Birdal, Umut Şimşekli
We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images.
no code implementations • 22 Jan 2019 • Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard
Recent studies on diffusion-based sampling methods have shown that Langevin Monte Carlo (LMC) algorithms can be beneficial for non-convex optimization, and rigorous theoretical guarantees have been proven for both asymptotic and finite-time regimes.
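As a reminder of what the basic LMC iteration looks like (the unadjusted Langevin algorithm on a toy double-well potential; the step size and horizon are illustrative), each update is a gradient step plus suitably scaled Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    return 4.0 * x**3 - 4.0 * x                 # f(x) = (x^2 - 1)^2, a double well

eta, n_iter = 1e-2, 50_000
x, samples = 0.0, []
for _ in range(n_iter):
    x = x - eta * grad_f(x) + np.sqrt(2.0 * eta) * rng.standard_normal()
    samples.append(x)

# samples approximately follow exp(-f); mass concentrates near the wells at +/-1
print(np.mean(np.abs(samples)))
```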
1 code implementation • 21 Jun 2018 • Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter
To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees.
no code implementations • ICML 2018 • Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil
The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.
no code implementations • NeurIPS 2018 • Tolga Birdal, Umut Şimşekli, M. Onur Eken, Slobodan Ilic
We introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping).
no code implementations • ICML 2017 • Umut Şimşekli
These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms.
no code implementations • NeurIPS 2017 • Mainak Jas, Tom Dupré La Tour, Umut Şimşekli, Alexandre Gramfort
Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research.
no code implementations • 10 Feb 2016 • Umut Şimşekli, Roland Badeau, A. Taylan Cemgil, Gaël Richard
These second-order methods directly approximate the inverse Hessian by using a limited history of samples and their gradients.
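A standard way to realize this idea is the L-BFGS two-loop recursion (the textbook recipe, sketched below in numpy; not necessarily the exact update used in this work): it applies an approximate inverse Hessian to a gradient using only stored parameter and gradient differences.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Approximate H^{-1} g from histories s_i = x_{i+1}-x_i, y_i = g_{i+1}-g_i."""
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_hist, y_hist)]
    q, alphas = g.copy(), []
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):   # newest -> oldest
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    gamma = np.dot(s_hist[-1], y_hist[-1]) / np.dot(y_hist[-1], y_hist[-1])
    r = gamma * q                                                 # scaled initial H^{-1}
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):  # oldest -> newest
        r = r + (a - rho * np.dot(y, r)) * s
    return r

# Toy check on a quadratic f(w) = 0.5 * w^T A w, where the Newton step equals w itself.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
w0, w1, w2 = rng.standard_normal((3, 2))
s_hist = [w1 - w0, w2 - w1]
y_hist = [A @ s for s in s_hist]
print(lbfgs_direction(A @ w2, s_hist, y_hist), w2)   # the two should be close
```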
no code implementations • 5 Sep 2015 • Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk
We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second-order incremental algorithm for solving large-scale partially separable optimization problems.
no code implementations • 3 Jun 2015 • Umut Şimşekli, Hazal Koptagel, Hakan Güldaş, A. Taylan Cemgil, Figen Öztoprak, Ş. İlker Birbil
For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD).
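For concreteness, a single-chain (non-distributed) SGLD sketch on a toy Gaussian-mean model is given below; the prior, constants, and model are illustrative, and this omits the parallelization that PSGLD adds.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, eps = 10_000, 100, 1e-4                     # data size, minibatch size, step size
data = rng.normal(3.0, 1.0, size=N)               # observations from N(theta, 1)

theta, samples = 0.0, []
for t in range(20_000):
    batch = rng.choice(data, size=n, replace=False)
    grad_log_prior = -theta / 10.0                        # N(0, 10) prior on theta
    grad_log_lik = (N / n) * np.sum(batch - theta)        # rescaled minibatch gradient
    theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + np.sqrt(eps) * rng.normal()
    samples.append(theta)

print(np.mean(samples[5_000:]))                   # close to the posterior mean (~3.0)
```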