no code implementations • 12 Feb 2024 • Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term.
no code implementations • 3 Oct 2023 • Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot
We discover that prior knowledge of the attacker, i.e., access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API.
1 code implementation • NeurIPS 2023 • Tyler Kastner, Murat A. Erdogdu, Amir-Massoud Farahmand
We consider the problem of learning models for risk-sensitive reinforcement learning.
no code implementations • 7 Mar 2023 • Alireza Mousavi-Hosseini, Tyler Farghly, Ye He, Krishnakumar Balasubramanian, Murat A. Erdogdu
We do so by establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities that are satisfied by a large class of densities, including polynomially decaying heavy-tailed densities (i.e., Cauchy-type).
no code implementations • 1 Mar 2023 • Ye He, Tyler Farghly, Krishnakumar Balasubramanian, Murat A. Erdogdu
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
no code implementations • 16 Feb 2023 • Matthew Zhang, Sinho Chewi, Mufan Bill Li, Krishnakumar Balasubramanian, Murat A. Erdogdu
As a byproduct, we also obtain the first KL divergence guarantees for ULMC without Hessian smoothness under strong log-concavity, which is based on a new result on the log-Sobolev constant along the underdamped Langevin diffusion.
no code implementations • 29 Sep 2022 • Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu
We further demonstrate that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise.
no code implementations • 19 Sep 2022 • Sejun Park, Umut Şimşekli, Murat A. Erdogdu
In this paper, we propose a new covering technique localized for the trajectories of SGD.
no code implementations • 25 Jul 2022 • Adam Dziedzic, Stephan Rabanser, Mohammad Yaghini, Armin Ale, Murat A. Erdogdu, Nicolas Papernot
We introduce $p$-DkNN, a novel inference procedure that takes a trained deep neural network and analyzes the similarity structures of its intermediate hidden representations to compute $p$-values associated with the end-to-end model prediction.
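The abstract does not spell out how the $p$-values are computed, so what follows is only a generic, conformal-style sketch of turning kNN agreement in hidden representations into a $p$-value, not the $p$-DkNN procedure itself. It assumes the nearest-neighbour search has already been run in each inspected layer, scores a candidate label by how many pooled neighbours carry a different label, and compares that score against scores of the true labels on a held-out calibration set. All function names are hypothetical.

```python
def knn_nonconformity(neighbor_labels, candidate):
    """Count retrieved neighbours whose training label differs from `candidate`.

    `neighbor_labels` pools the labels of the k nearest training points found
    in each inspected hidden layer into one list.
    """
    return sum(1 for label in neighbor_labels if label != candidate)


def empirical_p_value(test_score, calibration_scores):
    """(1 + #{calibration scores >= test score}) / (1 + #calibration scores)."""
    count = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + count) / (1 + len(calibration_scores))


def label_p_values(neighbor_labels, labels, calibration_scores):
    """p-value for every candidate label; a small value flags a label the
    hidden representations do not support."""
    return {
        lab: empirical_p_value(knn_nonconformity(neighbor_labels, lab), calibration_scores)
        for lab in labels
    }
```

The `+1` in numerator and denominator is the usual finite-sample correction, which keeps every $p$-value strictly positive.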
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
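A minimal pure-Python sketch of that first step, assuming $\sigma = \tanh$ (the abstract leaves $\sigma$ generic), holding $\boldsymbol{a}$ fixed, and using an illustrative initialization $W_{kj}\sim\mathcal{N}(0,1/d)$, $a_j\in\{-1,+1\}$. Column $j$ of $\boldsymbol{W}$ is the weight vector $\boldsymbol{w}_j$ of hidden unit $j$, so the gradient entry is $\frac{2}{n\sqrt{N}}\sum_i (f(\boldsymbol{x}_i)-y_i)\,a_j\,\sigma'(\langle\boldsymbol{w}_j,\boldsymbol{x}_i\rangle)\,x_{ik}$. The names `forward`, `mse_loss`, `grad_W`, `first_gd_step` and the learning rate `eta` are hypothetical.

```python
import math
import random


def random_init(d, N, rng=None):
    """Illustrative initialization (an assumption): W_kj ~ N(0, 1/d), a_j uniform on {-1, +1}."""
    if rng is None:
        rng = random.Random(0)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(N)] for _ in range(d)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(N)]
    return W, a


def hidden_preactivations(W, x):
    """<w_j, x> for every column j of W (W is stored as a d x N list of rows)."""
    d, N = len(W), len(W[0])
    return [sum(W[k][j] * x[k] for k in range(d)) for j in range(N)]


def forward(W, a, x):
    """f(x) = (1/sqrt(N)) * a^T tanh(W^T x)."""
    z = hidden_preactivations(W, x)
    return sum(a_j * math.tanh(z_j) for a_j, z_j in zip(a, z)) / math.sqrt(len(a))


def mse_loss(W, a, X, y):
    """(1/n) * sum_i (f(x_i) - y_i)^2."""
    return sum((forward(W, a, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def grad_W(W, a, X, y):
    """Gradient of the empirical MSE loss with respect to W, with a held fixed."""
    d, N, n = len(W), len(a), len(X)
    G = [[0.0] * N for _ in range(d)]
    for x, t in zip(X, y):
        z = hidden_preactivations(W, x)
        residual = sum(a_j * math.tanh(z_j) for a_j, z_j in zip(a, z)) / math.sqrt(N) - t
        for j in range(N):
            # d tanh(u)/du = 1 - tanh(u)^2
            coef = 2.0 * residual * a[j] * (1.0 - math.tanh(z[j]) ** 2) / (n * math.sqrt(N))
            for k in range(d):
                G[k][j] += coef * x[k]
    return G


def first_gd_step(W, a, X, y, eta):
    """One gradient-descent step on the first layer only: W <- W - eta * grad_W."""
    G = grad_W(W, a, X, y)
    return [[W[k][j] - eta * G[k][j] for j in range(len(a))] for k in range(len(W))]
```

Every column update is a linear combination of the inputs $\boldsymbol{x}_i$ with weights set by the residuals, which is the structure the single step analysis works with.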
no code implementations • 23 Feb 2022 • Nuri Mert Vural, Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat A. Erdogdu
We study stochastic convex optimization under infinite noise variance.
no code implementations • 10 Feb 2022 • Krishnakumar Balasubramanian, Sinho Chewi, Murat A. Erdogdu, Adil Salim, Matthew Zhang
For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations.
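A minimal sketch of averaged LMC, assuming "averaged" means the output is an iterate whose index is drawn uniformly from $\{1,\dots,K\}$, so that its law is the average of the laws of the iterates. Each step is $x_{k+1} = x_k - h\nabla V(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I_d)$. The function name and arguments are illustrative.

```python
import math
import random


def averaged_lmc(grad_V, x0, step, n_iters, rng=None):
    """Averaged Langevin Monte Carlo for pi proportional to exp(-V) on R^d.

    Draws J uniformly from {1, ..., n_iters}, runs J LMC steps
        x <- x - step * grad_V(x) + sqrt(2 * step) * N(0, I),
    and returns x_J. Stopping at a pre-drawn J has the same law as running all
    n_iters steps and picking one stored iterate uniformly at random, without
    keeping the whole trajectory in memory.
    """
    if rng is None:
        rng = random.Random()
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(rng.randint(1, n_iters)):
        g = grad_V(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    return x
```

For the standard Gaussian target, `grad_V = lambda x: x`; with step $h$ the chain's stationary variance is $2/(2-h)$ per coordinate rather than 1, a small discretization bias.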
no code implementations • 20 Jan 2022 • Ye He, Krishnakumar Balasubramanian, Murat A. Erdogdu
We analyze the oracle complexity of sampling from polynomially decaying heavy-tailed target densities based on running the Unadjusted Langevin Algorithm on certain transformed versions of the target density.
no code implementations • 23 Dec 2021 • Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li, Ruoqi Shen, Matthew Zhang
Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality.
no code implementations • 30 Oct 2021 • Matthew S. Zhang, Murat A. Erdogdu, Animesh Garg
Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions.
no code implementations • NeurIPS 2021 • Abhishek Roy, Krishnakumar Balasubramanian, Murat A. Erdogdu
In this work, we establish risk bounds for Empirical Risk Minimization (ERM) with both dependent and heavy-tailed data-generating processes.
no code implementations • NeurIPS 2021 • Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu
As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the "complexity" of the fractal structure that underlies its invariant measure.
1 code implementation • NeurIPS 2021 • Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli
Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks.
1 code implementation • NeurIPS 2021 • Ilia Shumailov, Zakhar Shumaylov, Dmitry Kazhdan, Yiren Zhao, Nicolas Papernot, Murat A. Erdogdu, Ross Anderson
Machine learning is vulnerable to a wide variety of attacks.
no code implementations • NeurIPS 2021 • Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu
In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives.
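A minimal simulation sketch of that setting under simplifying assumptions: the objective is the one-dimensional strongly convex quadratic $f(x) = \frac{\mu}{2}(x-x^\star)^2$, the gradient noise is additive Student-$t$ with $1 < \nu \le 2$ degrees of freedom (finite mean, infinite variance) rather than state-dependent, and the step size is $1/(\mu(k+1))$. Names and defaults are illustrative.

```python
import math
import random


def student_t(df, rng):
    """Student-t sample; for 1 < df <= 2 the mean exists but the variance is infinite."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def sgd_heavy_tailed(mu, x_star, x0, n_iters, df=1.5, noise_scale=1.0, rng=None):
    """SGD on f(x) = (mu/2)(x - x_star)^2 with additive Student-t gradient noise
    and step size 1 / (mu * (k + 1)) at iteration k."""
    if rng is None:
        rng = random.Random()
    x = x0
    for k in range(n_iters):
        noise = noise_scale * student_t(df, rng) if noise_scale else 0.0
        grad = mu * (x - x_star) + noise
        x -= grad / (mu * (k + 1))
    return x
```

With this step size the recursion unrolls to $x_K = x^\star - \frac{1}{\mu K}\sum_{k<K}\xi_k$, so the error is a scaled sample mean of the noise; for tail index $\nu = 1.5$ it shrinks roughly like $K^{-1/3}$ (generalized central limit theorem) instead of the familiar $K^{-1/2}$.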
no code implementations • NeurIPS 2020 • Ye He, Krishnakumar Balasubramanian, Murat A. Erdogdu
The randomized midpoint method, proposed by [SL19], has emerged as an optimal discretization procedure for simulating continuous-time Langevin diffusions.
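A minimal sketch of one randomized-midpoint step, written here for the overdamped diffusion $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t$ for brevity (the underdamped variant is not shown). The drift integral $\int_0^h \nabla V(X_s)\,ds$ is replaced by $h\,\nabla V$ evaluated at a uniformly random time $\alpha h$, using an Euler prediction of $X_{\alpha h}$ that shares its Brownian path with the full step. Function names are illustrative.

```python
import math
import random


def randomized_midpoint_step(grad_V, x, h, rng):
    """One randomized-midpoint step for dX = -grad V(X) dt + sqrt(2) dB over [0, h]."""
    alpha = rng.random()  # random evaluation time alpha * h, alpha ~ Uniform[0, 1)
    # Brownian increments on [0, alpha*h] and [alpha*h, h]; they are independent.
    b_first = [math.sqrt(alpha * h) * rng.gauss(0.0, 1.0) for _ in x]
    b_rest = [math.sqrt((1.0 - alpha) * h) * rng.gauss(0.0, 1.0) for _ in x]
    # Euler prediction of X at time alpha*h, driven by the same Brownian path.
    g0 = grad_V(x)
    x_mid = [xi - alpha * h * gi + math.sqrt(2.0) * b1 for xi, gi, b1 in zip(x, g0, b_first)]
    # Full step: integral of grad V over [0, h] estimated by h * grad V(x_mid),
    # with noise sqrt(2) * B(h) = sqrt(2) * (first increment + remaining increment).
    g_mid = grad_V(x_mid)
    return [xi - h * gm + math.sqrt(2.0) * (b1 + b2)
            for xi, gm, b1, b2 in zip(x, g_mid, b_first, b_rest)]


def run_randomized_midpoint(grad_V, x0, h, n_iters, rng=None):
    """Iterate the randomized-midpoint step n_iters times from x0."""
    if rng is None:
        rng = random.Random()
    x = list(x0)
    for _ in range(n_iters):
        x = randomized_midpoint_step(grad_V, x, h, rng)
    return x
```

On the Gaussian target ($\nabla V(x)=x$) with $h = 0.1$, a short calculation gives a stationary variance of about 1.0002 for this scheme, versus $2/(2-h)\approx 1.053$ for plain LMC with the same step.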
no code implementations • 21 Oct 2020 • Mufan Bill Li, Murat A. Erdogdu
We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres.
no code implementations • 22 Jul 2020 • Murat A. Erdogdu, Rasa Hosseinzadeh, Matthew S. Zhang
We prove that, initialized with a Gaussian random vector that has sufficiently small variance, iterating the LMC algorithm for $\widetilde{\mathcal{O}}(\lambda^2 d\epsilon^{-1})$ steps is sufficient to reach an $\epsilon$-neighborhood of the target in both chi-squared and Rényi divergence, where $\lambda$ is the logarithmic Sobolev constant of $\nu_*$.
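To make the two divergences concrete, a worked example (not from the paper): for centered one-dimensional Gaussians $P=\mathcal{N}(0,v_P)$ and $Q=\mathcal{N}(0,v_Q)$, the Rényi divergence of order $q>1$ is $D_q(P\|Q) = \tfrac12\log\frac{v_Q}{v_P} + \frac{1}{2(q-1)}\log\frac{v_Q}{q\,v_Q + (1-q)\,v_P}$ (infinite when the denominator is not positive), and $\chi^2(P\|Q) = e^{D_2(P\|Q)}-1$. The sketch evaluates these for LMC with step $h$ on the assumed target $\nu_* = \mathcal{N}(0,1)$, whose update $x \leftarrow (1-h)x + \sqrt{2h}\,\xi$ has stationary law $\mathcal{N}(0, 2/(2-h))$. Function names are illustrative.

```python
import math


def renyi_gaussian_1d(var_p, var_q, order):
    """Renyi divergence D_order(N(0, var_p) || N(0, var_q)) for order > 1."""
    if order <= 1.0:
        raise ValueError("this sketch only handles orders q > 1")
    mixed = order * var_q + (1.0 - order) * var_p
    if mixed <= 0.0:
        return math.inf  # the defining integral diverges
    return 0.5 * math.log(var_q / var_p) + math.log(var_q / mixed) / (2.0 * (order - 1.0))


def chi_squared_gaussian_1d(var_p, var_q):
    """Chi-squared divergence chi^2(N(0, var_p) || N(0, var_q)) = exp(D_2) - 1."""
    return math.expm1(renyi_gaussian_1d(var_p, var_q, 2.0))


def lmc_stationary_variance(step):
    """Stationary variance of x <- (1 - step) x + sqrt(2 step) xi, i.e. LMC on N(0, 1)."""
    return 2.0 / (2.0 - step)
```

For example, `chi_squared_gaussian_1d(lmc_stationary_variance(0.1), 1.0)` is about 1.4e-3: even at stationarity, fixed-step LMC stays a small but nonzero distance from the target, which is why guarantees of this kind typically tie the step size to $\epsilon$.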
1 code implementation • NeurIPS 2020 • Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu
Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge.
no code implementations • NeurIPS 2021 • Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat A. Erdogdu
Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning.
no code implementations • 27 May 2020 • Murat A. Erdogdu, Rasa Hosseinzadeh
This convergence rate, in terms of $\epsilon$ dependency, is not directly influenced by the tail growth rate $\alpha$ of the potential function as long as its growth is at least linear, and it only relies on the order of smoothness $\beta$.
no code implementations • Approximate Inference AABI Symposium 2019 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang
Particle-based inference algorithms are a promising way to efficiently generate samples from an intractable target distribution by iteratively updating a set of particles.
no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu
In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
no code implementations • 3 Apr 2019 • Andreas Anastasiou, Krishnakumar Balasubramanian, Murat A. Erdogdu
A crucial intermediate step is proving a non-asymptotic martingale central limit theorem (CLT), i.e., establishing the rates of convergence of a multivariate martingale difference sequence to a normal random vector, which might be of independent interest.
no code implementations • NeurIPS 2018 • Murat A. Erdogdu, Lester Mackey, Ohad Shamir
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.
no code implementations • 12 Jul 2018 • Murat A. Erdogdu, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Denizcan Vanli
Furthermore, by incorporating the Lanczos method into block-coordinate maximization, we propose an algorithm that is guaranteed to return a solution providing a $1-O(1/r)$ approximation to the original SDP without any assumptions, where $r$ is the rank of the factorization.
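A minimal sketch of the block-coordinate maximization (BCM) update for the factorized problem $\max \langle C, \sigma\sigma^\top\rangle$ over matrices $\sigma$ with unit-norm rows $\sigma_i\in\mathbb{R}^r$, assuming a symmetric cost matrix $C$ (for example, a negated weighted adjacency matrix in a Max-Cut relaxation). With the other rows fixed, the objective on the unit sphere is affine in $\sigma_i$, so the exact block maximizer is $\sigma_i \leftarrow g_i/\|g_i\|$ with $g_i=\sum_{j\neq i}C_{ij}\sigma_j$. The Lanczos refinement mentioned above is not included, and all names are illustrative.

```python
import math
import random


def objective(C, sigma):
    """<C, sigma sigma^T> = sum_{i,j} C[i][j] * <sigma_i, sigma_j>."""
    n = len(C)
    return sum(C[i][j] * sum(u * v for u, v in zip(sigma[i], sigma[j]))
               for i in range(n) for j in range(n))


def bcm_sweep(C, sigma):
    """One cyclic sweep of exact block updates sigma_i <- g_i / ||g_i|| (in place)."""
    n, r = len(sigma), len(sigma[0])
    for i in range(n):
        g = [sum(C[i][j] * sigma[j][k] for j in range(n) if j != i) for k in range(r)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:  # if g_i = 0, every unit vector is optimal; keep sigma_i
            sigma[i] = [v / norm for v in g]
    return sigma


def random_unit_rows(n, r, rng=None):
    """n random rows, each normalized to unit Euclidean norm in R^r."""
    if rng is None:
        rng = random.Random()
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(r)]
        s = math.sqrt(sum(t * t for t in v))
        rows.append([t / s for t in v])
    return rows
```

Because each block update is an exact maximization, the objective never decreases across sweeps, and every row stays on the unit sphere.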
no code implementations • NeurIPS 2017 • Hakan Inan, Murat A. Erdogdu, Mark Schnitzer
We use our proposed robust loss in a matrix factorization framework to extract the neurons and their temporal activity in calcium imaging datasets.
no code implementations • NeurIPS 2017 • Murat A. Erdogdu, Yash Deshpande, Andrea Montanari
We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms belief propagation (BP) and generalized belief propagation (GBP) on practical problems such as image denoising and Ising spin glasses.
no code implementations • NeurIPS 2016 • Murat A. Erdogdu, Lee H. Dicker, Mohsen Bayati
We study the problem of efficiently estimating the coefficients of generalized linear models (GLMs) in the large-scale setting where the number of observations $n$ is much larger than the number of predictors $p$, i.e., $n\gg p \gg 1$.
no code implementations • 21 Nov 2016 • Murat A. Erdogdu, Mohsen Bayati, Lee H. Dicker
Using this relation, we design an algorithm that achieves the same accuracy as the empirical risk minimizer through iterations that attain up to a cubic convergence rate, and that are cheaper than any batch optimization algorithm by at least a factor of $\mathcal{O}(p)$.
no code implementations • NeurIPS 2015 • Murat A. Erdogdu
We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients ($n \gg p \gg 1$).
no code implementations • 28 Nov 2015 • Murat A. Erdogdu
We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients ($n \gg p \gg 1$).
no code implementations • NeurIPS 2015 • Murat A. Erdogdu, Andrea Montanari
In this regime, algorithms that use sub-sampling techniques are known to be effective.
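A minimal sketch of plain sub-sampled Newton for logistic regression (a GLM), assuming uniform row sub-sampling for the Hessian and the exact full-data gradient; it is not necessarily the exact algorithm analyzed in the paper, which may further modify the sub-sampled Hessian. Because the gradient is exact, the fixed point is still the full-data maximum likelihood estimator; sub-sampling typically trades Newton's quadratic convergence for a fast linear rate, while each Hessian estimate costs $O(sp^2)$ instead of $O(np^2)$ for a subsample of size $s$. All names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_gradient(X, y, beta):
    """Gradient of the average negative log-likelihood over all n rows."""
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        r = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
        for j in range(p):
            grad[j] += r * xi[j] / n
    return grad


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def subsampled_newton(X, y, n_iters, sample_size, rng=None):
    """Newton iterations with the exact gradient and a Hessian estimated from
    `sample_size` rows drawn uniformly without replacement at every iteration."""
    if rng is None:
        rng = random.Random()
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        grad = logistic_gradient(X, y, beta)
        H = [[0.0] * p for _ in range(p)]
        for i in rng.sample(range(n), sample_size):
            s = sigmoid(sum(b * v for b, v in zip(beta, X[i])))
            w = s * (1.0 - s)  # per-row curvature of the logistic loss
            for j in range(p):
                for k in range(p):
                    H[j][k] += w * X[i][j] * X[i][k] / sample_size
        step = solve(H, grad)
        beta = [b - d for b, d in zip(beta, step)]
    return beta
```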
1 code implementation • 8 Jun 2015 • Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, Jure Leskovec
Social networking websites allow users to create and share content.
no code implementations • NeurIPS 2013 • Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
In this context, we develop new estimators for the $\ell_2$ estimation risk $\|\hat{\theta}-\theta_0\|_2$ and the variance of the noise.