no code implementations • 15 Feb 2023 • Jun-Kun Wang, Andre Wibisono
Quasar convexity is a condition that allows some first-order methods to efficiently minimize a function even when the optimization landscape is non-convex.
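For reference, a common formulation of quasar convexity (the exact variant assumed in the paper may differ): a differentiable $f$ with minimizer $x^*$ is $\gamma$-quasar convex for some $\gamma \in (0, 1]$ if $f(x^*) \ge f(x) + \frac{1}{\gamma} \langle \nabla f(x), x^* - x \rangle$ for all $x$; taking $\gamma = 1$ recovers star convexity.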
no code implementations • 18 Oct 2022 • Jun-Kun Wang, Andre Wibisono
We consider a setting in which a model needs to adapt to a new domain under distribution shift, given that only unlabeled test samples from the new domain are accessible at test time.
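As one concrete (and purely illustrative) instance of this setting, the sketch below adapts a model on unlabeled test batches by minimizing the entropy of its own predictions, a common test-time objective; it is not necessarily the procedure analyzed in the paper, and `model` and `test_loader` are hypothetical.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, test_loader, lr=1e-4, steps=1):
    """Generic test-time adaptation: update the model on unlabeled test
    batches from the new domain by minimizing prediction entropy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for x in test_loader:                      # unlabeled batches only
        for _ in range(steps):
            probs = F.softmax(model(x), dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean()
            opt.zero_grad()
            entropy.backward()
            opt.step()
    return model
```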
no code implementations • 5 Jul 2022 • Jun-Kun Wang, Andre Wibisono
When the potential $f$ is $L$-smooth and $m$-strongly convex, i.e., for sampling from a log-smooth and strongly log-concave target distribution $\pi$, it is known that under a constant integration time, the number of iterations that ideal HMC takes to reach $\epsilon$ Wasserstein-2 distance to the target $\pi$ is $O( \kappa \log \frac{1}{\epsilon} )$, where $\kappa := \frac{L}{m}$ is the condition number.
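Ideal HMC integrates the Hamiltonian flow exactly, which is generally not implementable; in practice one discretizes it, e.g. with a leapfrog integrator. A minimal sketch follows, where the step size, integration length, and the omission of a Metropolis correction are illustrative choices rather than the paper's setting.

```python
import numpy as np

def hmc_step(x, grad_f, step_size=0.1, n_leapfrog=10, rng=np.random):
    """One HMC iteration for a target density proportional to exp(-f(x)):
    resample the momentum, then approximate the Hamiltonian flow with
    leapfrog integration."""
    v = rng.standard_normal(x.shape)           # fresh Gaussian momentum
    v = v - 0.5 * step_size * grad_f(x)        # half step for momentum
    for _ in range(n_leapfrog - 1):
        x = x + step_size * v                  # full step for position
        v = v - step_size * grad_f(x)          # full step for momentum
    x = x + step_size * v
    v = v - 0.5 * step_size * grad_f(x)        # closing half step
    return x
```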
no code implementations • 22 Jun 2022 • Jun-Kun Wang, Chi-Heng Lin, Andre Wibisono, Bin Hu
For the acceleration result of HB beyond quadratics in this work, an additional condition needs to be satisfied; it holds naturally when the dimension is one or, more broadly, when the Hessian is diagonal.
no code implementations • 22 Nov 2021 • Jun-Kun Wang, Jacob Abernethy, Kfir Y. Levy
We develop an algorithmic framework for solving convex optimization problems using no-regret game dynamics.
no code implementations • 23 Jun 2021 • Jun-Kun Wang
In the first part of this dissertation research, we develop a modular framework that can serve as a recipe for constructing and analyzing iterative algorithms for convex optimization.
no code implementations • ICLR 2020 • Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy
At the same time, a widely observed empirical phenomenon is that, when training deep networks, stochastic momentum appears to significantly improve convergence time, and variants of it have flourished in the development of other popular update methods, e.g. ADAM [KB15] and AMSGrad [RKK18].
no code implementations • 4 Oct 2020 • Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy
Our result shows that, with an appropriate choice of parameters, Polyak's momentum achieves a convergence rate of $(1-\Theta(\frac{1}{\sqrt{\kappa'}}))^t$.
no code implementations • 4 Oct 2020 • Jun-Kun Wang, Jacob Abernethy
Over-parametrization has become a popular technique in deep learning.
no code implementations • 4 Oct 2020 • Jun-Kun Wang, Jacob Abernethy
The Heavy Ball Method, proposed by Polyak over five decades ago, is a first-order method for optimizing continuous functions.
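For reference, a minimal sketch of the Heavy Ball update, $x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t - x_{t-1})$; the step size and momentum parameter below are placeholders.

```python
def heavy_ball(grad_f, x0, alpha=0.01, beta=0.9, n_iters=1000):
    """Polyak's Heavy Ball method: a gradient step plus a momentum term
    proportional to the previous displacement."""
    x_prev, x = x0, x0
    for _ in range(n_iters):
        x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    return x
```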
no code implementations • 25 Sep 2019 • Jun-Kun Wang, Xiaoyun Li, Ping Li
Perhaps the only methods that enjoy convergence guarantees are the ones that sample the perturbed points uniformly from a unit sphere or from a multivariate Gaussian distribution with an isotropic covariance.
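For concreteness, a standard two-point zeroth-order gradient estimator with the perturbation drawn uniformly from the unit sphere (a textbook construction, not the specific estimator studied in the paper):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, rng=np.random):
    """Two-point zeroth-order gradient estimate along a random direction
    sampled uniformly from the unit sphere."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)                     # uniform direction on the sphere
    d = x.size
    return d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
```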
no code implementations • ICLR 2019 • Jun-Kun Wang, Xiaoyun Li, Ping Li
We consider new variants of optimization algorithms.
no code implementations • ICLR 2020 • Jun-Kun Wang, Xiaoyun Li, Belhal Karimi, Ping Li
We propose a new variant of AMSGrad, a popular adaptive gradient based optimization algorithm widely used for training deep neural networks.
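For context, a minimal sketch of the standard AMSGrad update that such variants build on (the paper's modification is not reproduced here):

```python
import numpy as np

def amsgrad(grad_f, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n_iters=1000):
    """Standard AMSGrad: Adam-style moment estimates, but the step uses the
    running maximum of the second-moment estimate, which keeps the effective
    step size non-increasing."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    for _ in range(n_iters):
        g = grad_f(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x
```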
no code implementations • 14 Nov 2018 • Jarrid Rector-Brooks, Jun-Kun Wang, Barzan Mozafari
We also show that, for the general case of (smooth) non-convex functions, FW with line search converges with high probability to a stationary point at a rate of $O\left(\frac{1}{t}\right)$, as long as the constraint set is strongly convex -- one of the fastest convergence rates in non-convex optimization.
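As a reminder of the algorithm under analysis, a generic Frank-Wolfe loop with exact line search; `lmo` is a hypothetical linear minimization oracle for the constraint set, and the stopping rule is omitted.

```python
from scipy.optimize import minimize_scalar

def frank_wolfe(f, grad_f, lmo, x0, n_iters=100):
    """Frank-Wolfe with line search: call the linear minimization oracle on
    the current gradient, then choose the step size in [0, 1] that minimizes
    f along the resulting segment."""
    x = x0
    for _ in range(n_iters):
        s = lmo(grad_f(x))                     # argmin over the constraint set
        d = s - x
        res = minimize_scalar(lambda gamma: f(x + gamma * d),
                              bounds=(0.0, 1.0), method="bounded")
        x = x + res.x * d
    return x
```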
no code implementations • NeurIPS 2018 • Jun-Kun Wang, Jacob Abernethy
In this paper, we show that the technique can be enhanced to a rate of $O(1/T^2)$ by extending recent work [RS13, SALS15] that leverages optimistic learning to speed up equilibrium computation.
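For reference, one common instantiation of the optimistic idea uses the previous gradient as a hint for the next one, e.g. the optimistic online gradient step $x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \eta (2 g_t - g_{t-1})\big)$; this is illustrative and not necessarily the exact update analyzed in the paper.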
no code implementations • 17 May 2018 • Jacob Abernethy, Kevin A. Lai, Kfir Y. Levy, Jun-Kun Wang
We consider the use of no-regret algorithms to compute equilibria for particular classes of convex-concave games.
no code implementations • NeurIPS 2017 • Jacob D. Abernethy, Jun-Kun Wang
We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero sum game.
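Concretely, for a closed convex $f$ with Fenchel conjugate $f^*$, one natural way to write such a game is $\min_{x \in \mathcal{X}} f(x) = \min_{x \in \mathcal{X}} \max_{y} \big[ \langle x, y \rangle - f^*(y) \big]$, where the equality follows from the biconjugation identity $f = f^{**}$; the FW iterates then arise from suitable strategies for the two players.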