no code implementations • 14 Oct 2024 • Siddharth Mitra, Andre Wibisono
We study the mixing time of two popular discrete-time Markov chains in continuous space, the unadjusted Langevin algorithm and the proximal sampler, which are discretizations of the Langevin dynamics.
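For reference, a minimal sketch of the unadjusted Langevin algorithm (ULA) update, which discretizes the Langevin dynamics with step size $h$; the quadratic potential in the example is only illustrative, not the paper's setting:

```python
import numpy as np

def ula(grad_f, x0, step, n_iters, rng=np.random.default_rng(0)):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretization of
    the Langevin dynamics dX_t = -grad f(X_t) dt + sqrt(2) dB_t."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_f(x) + np.sqrt(2 * step) * noise
    return x

# Example: sample (approximately) from N(0, I), i.e. f(x) = ||x||^2 / 2.
sample = ula(grad_f=lambda x: x, x0=np.zeros(2), step=0.01, n_iters=5000)
```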
no code implementations • 6 May 2024 • Jonas Katona, Xiuyuan Wang, Andre Wibisono
We derive new error bounds on the MH when truncated at certain orders in the stepsize, expressed in terms of the number of iterations $K$, and use these bounds to show an improved $\mathcal{O}(K^{1/5})$ total regret bound and an $\mathcal{O}(K^{-4/5})$ duality gap for the average iterates of AMD.
no code implementations • 26 Feb 2024 • Jiaming Liang, Siddharth Mitra, Andre Wibisono
We study the rate at which the initial and current random variables become independent along a Markov chain, focusing on the Langevin diffusion in continuous time and the Unadjusted Langevin Algorithm (ULA) in discrete time.
no code implementations • 12 Feb 2024 • Andre Wibisono, Yihong Wu, Kaylee Yingxi Yang
We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions.
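As a point of reference (not the estimator analyzed in the paper), the score of a Gaussian kernel density estimate built from the $n$ samples has a closed form; a minimal sketch with an assumed bandwidth `sigma`:

```python
import numpy as np

def kde_score(x, samples, sigma):
    """Score (gradient of log-density) of a Gaussian KDE with bandwidth sigma,
    evaluated at a single query point x. samples has shape (n, d)."""
    diffs = samples - x                      # (n, d)
    log_w = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                             # softmax weights over the samples
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 2))     # i.i.d. draws from N(0, I)
print(kde_score(np.array([0.5, -0.5]), samples, sigma=0.3))
```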
1 code implementation • 14 Dec 2023 • Vishwak Srinivasan, Andre Wibisono, Ashia Wilson
This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics.
no code implementations • 25 Sep 2023 • Zihao Hu, Guanghui Wang, Xi Wang, Andre Wibisono, Jacob Abernethy, Molei Tao
In the context of Euclidean space, it is established that the last-iterates of both the extragradient (EG) and past extragradient (PEG) methods converge to the solution of monotone variational inequality problems at a rate of $O\left(\frac{1}{\sqrt{T}}\right)$ (Cai et al., 2022).
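A minimal sketch of the extragradient (EG) update for an unconstrained monotone operator $F$, instantiated on a toy bilinear game (the constrained case would add a projection after each step):

```python
import numpy as np

def extragradient(F, z0, step, n_iters):
    """Extragradient: extrapolate with F(z_k), then update with F evaluated
    at the extrapolated point. For constrained problems, project each step."""
    z = np.array(z0, dtype=float)
    for _ in range(n_iters):
        z_half = z - step * F(z)     # extrapolation step
        z = z - step * F(z_half)     # update step
    return z

# Bilinear game min_x max_y x*y: F(x, y) = (y, -x); unique solution (0, 0).
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, z0=np.array([1.0, 1.0]), step=0.1, n_iters=200))
```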
no code implementations • 26 May 2023 • Ketaki Joshi, Raghavendra Pradyumna Pothukuchi, Andre Wibisono, Abhishek Bhattacharjee
Compared to state-of-the-art weight regularization methods to mitigate catastrophic forgetting, our approach is simple, effective, and enables faster learning.
no code implementations • 15 Feb 2023 • Jun-Kun Wang, Andre Wibisono
Quasar convexity is a condition that allows some first-order methods to efficiently minimize a function even when the optimization landscape is non-convex.
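For concreteness, one standard formulation from the literature (the paper's precise assumptions may differ): $f$ is $\gamma$-quasar convex with respect to a minimizer $x^*$, for some $\gamma \in (0, 1]$, if for all $x$

$$ f(x^*) \ge f(x) + \frac{1}{\gamma} \langle \nabla f(x), x^* - x \rangle, $$

which reduces to star-convexity with respect to $x^*$ when $\gamma = 1$.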
no code implementations • 2 Nov 2022 • Kaylee Yingxi Yang, Andre Wibisono
We study the Inexact Langevin Dynamics (ILD), Inexact Langevin Algorithm (ILA), and Score-based Generative Modeling (SGM) when utilizing estimated score functions for sampling.
no code implementations • 28 Oct 2022 • Ryan Yang, Haizhou Du, Andre Wibisono, Patrick Baker
Distributed machine learning (DML) can be an important capability for a modern military, enabling it to take advantage of data and devices distributed across multiple vantage points in order to adapt and learn.
no code implementations • 18 Oct 2022 • Jun-Kun Wang, Andre Wibisono
We consider a setting in which a model needs to adapt to a new domain under distribution shifts, given that only unlabeled test samples from the new domain are accessible at test time.
no code implementations • 5 Jul 2022 • Jun-Kun Wang, Andre Wibisono
When the potential $f$ is $L$-smooth and $m$-strongly convex, i.e., for sampling from a log-smooth and strongly log-concave target distribution $\pi$, it is known that under a constant integration time, the number of iterations that ideal HMC takes to reach $\epsilon$ Wasserstein-2 distance to the target $\pi$ is $O( \kappa \log \frac{1}{\epsilon} )$, where $\kappa := \frac{L}{m}$ is the condition number.
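Ideal HMC integrates the Hamiltonian dynamics exactly for a fixed time $T$ before each momentum refreshment; in practice the flow is approximated, e.g. by a leapfrog integrator. A minimal leapfrog-based sketch (the integrator and its parameters are illustrative assumptions, not part of the ideal HMC analysis):

```python
import numpy as np

def hmc_step(grad_f, x, T, n_leapfrog, rng):
    """One HMC iteration with constant integration time T: refresh the
    momentum, then approximate the Hamiltonian flow with leapfrog steps."""
    eps = T / n_leapfrog
    v = rng.standard_normal(x.shape)        # momentum refreshment
    v = v - 0.5 * eps * grad_f(x)           # initial half step for momentum
    for _ in range(n_leapfrog - 1):
        x = x + eps * v
        v = v - eps * grad_f(x)
    x = x + eps * v
    v = v - 0.5 * eps * grad_f(x)           # final half step for momentum
    return x

rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):                       # target N(0, I): f(x) = ||x||^2 / 2
    x = hmc_step(lambda y: y, x, T=1.0, n_leapfrog=20, rng=rng)
```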
no code implementations • 22 Jun 2022 • Jun-Kun Wang, Chi-Heng Lin, Andre Wibisono, Bin Hu
The acceleration result for HB beyond quadratics in this work requires an additional condition, which naturally holds when the dimension is one or, more broadly, when the Hessian is diagonal.
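For reference, the heavy ball (HB) update with step size $\eta$ and momentum parameter $\beta$ is

$$ x_{k+1} = x_k - \eta \nabla f(x_k) + \beta (x_k - x_{k-1}). $$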
no code implementations • 8 Jun 2022 • Andre Wibisono, Molei Tao, Georgios Piliouras
In this paper we study two-player bilinear zero-sum games with constrained strategy spaces.
no code implementations • 13 Feb 2022 • Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono
We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity.
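Schematically, each iteration of the proximal sampler alternates a Gaussian forward step with a draw from the restricted Gaussian oracle (RGO); the sketch below assumes a user-supplied `sample_rgo` routine (a hypothetical helper; implementing the RGO is itself nontrivial in general):

```python
import numpy as np

def proximal_sampler(sample_rgo, x0, eta, n_iters, rng=np.random.default_rng(0)):
    """Proximal sampler: y_k ~ N(x_k, eta I), then x_{k+1} ~ pi(x | y_k),
    where pi(x | y) is proportional to exp(-f(x) - ||x - y||^2 / (2 eta))."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(x.shape)  # forward (noising) step
        x = sample_rgo(y, eta, rng)                           # RGO (backward) step
    return x

# Exact RGO when f(x) = ||x||^2 / 2 (target N(0, I)): the conditional is Gaussian
# with mean y / (1 + eta) and variance eta / (1 + eta).
gauss_rgo = lambda y, eta, rng: y / (1 + eta) + np.sqrt(eta / (1 + eta)) * rng.standard_normal(y.shape)
print(proximal_sampler(gauss_rgo, x0=np.zeros(2), eta=0.5, n_iters=1000))
```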
no code implementations • 29 Jan 2022 • Haizhou Du, Ryan Yang, Yijian Chen, Qiao Xiang, Andre Wibisono, Wei Huang
In this paper, we analyze properties of the WPM and rigorously prove convergence properties of our aggregation mechanism.
no code implementations • 24 Sep 2021 • Ruilin Li, Molei Tao, Santosh S. Vempala, Andre Wibisono
The Mirror Langevin Diffusion (MLD) is a sampling analogue of mirror flow in continuous time, and it has nice convergence properties under log-Sobolev or Poincaré inequalities relative to the Hessian metric, as shown by Chewi et al. (2020).
no code implementations • 4 Nov 2019 • Andre Wibisono
We study the Proximal Langevin Algorithm (PLA) for sampling from a probability distribution $\nu = e^{-f}$ on $\mathbb{R}^n$ under isoperimetry.
no code implementations • ICLR 2020 • Jacob Abernethy, Kevin A. Lai, Andre Wibisono
While classic work in convex-concave min-max optimization relies on average-iterate convergence results, the emergence of nonconvex applications such as training Generative Adversarial Networks has led to renewed interest in last-iterate convergence guarantees.
no code implementations • NeurIPS 2019 • Santosh S. Vempala, Andre Wibisono
We also prove convergence guarantees in Rényi divergence of order $q > 1$ assuming the limit of ULA satisfies either the log-Sobolev or Poincaré inequality.
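Recall the Rényi divergence of order $q > 1$:

$$ R_q(\mu \,\|\, \nu) = \frac{1}{q - 1} \log \int \left( \frac{d\mu}{d\nu} \right)^q d\nu, $$

which recovers the KL divergence in the limit $q \to 1$.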
1 code implementation • NeurIPS 2019 • Ashia Wilson, Lester Mackey, Andre Wibisono
We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth -- a natural generalization of the standard smoothness assumption on the objective function.
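As a rough sketch of the idea (the exact norm and normalization are as specified in the paper; the order-$p$ form below is one common presentation), rescaled gradient descent replaces the fixed step of gradient descent with a step whose length adapts to the gradient norm:

$$ x_{k+1} = x_k - \eta \, \frac{\nabla f(x_k)}{\|\nabla f(x_k)\|^{(p-2)/(p-1)}}, $$

which reduces to ordinary gradient descent when $p = 2$.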
no code implementations • 22 Feb 2018 • Andre Wibisono
We show that SLA is in fact consistent for a Gaussian target measure, whereas ULA is not.
no code implementations • 14 Mar 2016 • Andre Wibisono, Ashia C. Wilson, Michael I. Jordan
We show that there is a Lagrangian functional that we call the \emph{Bregman Lagrangian} which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods.
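For reference, the Bregman Lagrangian takes the form

$$ \mathcal{L}(X, V, t) = e^{\alpha_t + \gamma_t} \left( D_h\!\left(X + e^{-\alpha_t} V, \, X\right) - e^{\beta_t} f(X) \right), $$

where $D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle$ is the Bregman divergence of a convex distance-generating function $h$, and $\alpha_t, \beta_t, \gamma_t$ are time-dependent scaling functions; different choices of these scalings (subject to the ideal scaling conditions) generate the different accelerated methods.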
no code implementations • NeurIPS 2014 • Po-Ling Loh, Andre Wibisono
We establish sufficient conditions for the concavity of our reweighted objective function in terms of weight assignments in the Kikuchi expansion, and show that a reweighted version of the sum-product algorithm applied to the Kikuchi region graph will produce global optima of the Kikuchi approximation whenever the algorithm converges.
no code implementations • 7 Dec 2013 • John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono
We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients.
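A standard building block in this setting is a two-point gradient estimate formed from function values only; a minimal sketch (the smoothing direction and scaling here are illustrative, not the exact construction analyzed in the paper):

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate: probe f along a random
    direction u and difference the two function values."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)                      # uniform direction on the sphere
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Zeroth-order gradient descent on f(x) = ||x||^2.
rng = np.random.default_rng(0)
x = np.ones(5)
for _ in range(2000):
    x -= 0.05 * two_point_grad(lambda z: np.sum(z**2), x, delta=1e-3, rng=rng)
print(x)
```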
no code implementations • NeurIPS 2013 • Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono
We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor.
2 code implementations • NeurIPS 2013 • Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan
We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior.
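The streaming primitive underlying such frameworks is treating the current posterior as the prior for the next minibatch; a minimal exact-conjugate illustration (Beta-Bernoulli), not the variational approximations used in SDA-Bayes itself:

```python
# Streaming Bayesian updating: the posterior after batch t becomes the prior
# for batch t+1.  Exact conjugate Beta-Bernoulli case, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 1.0                        # Beta(1, 1) prior on the coin bias
for _ in range(10):                           # ten streaming minibatches
    batch = rng.binomial(1, 0.7, size=100)    # data arriving over time
    alpha += batch.sum()                      # conjugate update: successes
    beta += len(batch) - batch.sum()          # conjugate update: failures
print("posterior mean:", alpha / (alpha + beta))
```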
no code implementations • NeurIPS 2012 • Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi
We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates.