1 code implementation • 12 Jul 2024 • Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor
There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress.
3 code implementations • 17 Jan 2024 • Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor
We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems.
no code implementations • 26 Nov 2023 • Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu
The goal of the teacher is to teach a realizable target policy to the learner using a minimum number of state demonstrations.
no code implementations • 4 Sep 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using a total of $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles in the deterministic and noisy settings, respectively.
no code implementations • 8 Feb 2023 • Like Hui, Mikhail Belkin, Stephen Wright
We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy loss and the rescaled square loss in terms of classification accuracy.
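As a concrete reading of the loss, the sketch below implements one plausible form of squentropy in NumPy: the usual cross-entropy term plus the mean squared logit over the incorrect classes. The exact scaling of the square term is an assumption here, not taken from the paper.

    import numpy as np

    def squentropy_loss(logits, labels):
        """Cross-entropy plus the mean squared logit over the incorrect classes.
        The scaling of the square term is an illustrative assumption."""
        n, C = logits.shape
        shifted = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        ce = -log_probs[np.arange(n), labels].mean()
        mask = np.ones_like(logits, dtype=bool)
        mask[np.arange(n), labels] = False                            # drop the true-class logit
        sq = (logits[mask].reshape(n, C - 1) ** 2).mean()
        return ce + sq

    rng = np.random.default_rng(0)
    print(squentropy_loss(rng.normal(size=(3, 4)), np.array([0, 2, 1])))  # 3 samples, 4 classes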
no code implementations • 26 Jan 2023 • Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}$, $\epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noises are in both level objectives, only in the upper-level objective, and not present (deterministic settings), respectively.
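The toy sketch below is a deterministic caricature of a fully first-order penalty scheme in the spirit of F2SA on a quadratic bilevel problem: one lower-level iterate tracks the lower-level minimizer, a second tracks the penalized problem, and the upper-level variable is updated using only gradients of the two objectives while the penalty weight grows slowly. The step sizes and penalty schedule are illustrative assumptions, not the schedule analyzed in the paper.

    # Toy bilevel problem (all quadratic, so the answer is easy to check):
    #   upper level  f(x, y) = 0.5*(x - 1)^2 + 0.5*(y - 2)^2
    #   lower level  y*(x) = argmin_y g(x, y),  g(x, y) = 0.5*(y - x)^2
    # The implicit objective f(x, y*(x)) is minimized at x = 1.5.
    df_dx = lambda x, y: x - 1.0
    df_dy = lambda x, y: y - 2.0
    dg_dx = lambda x, y: -(y - x)
    dg_dy = lambda x, y: y - x

    x, y, z = 0.0, 0.0, 0.0        # z tracks argmin_y g, y tracks the penalized problem
    lam, dlam = 1.0, 0.05          # penalty weight and its (assumed) growth rate
    alpha, beta = 0.05, 0.5        # illustrative step sizes

    for t in range(2000):
        z -= beta * dg_dy(x, z)                                      # lower-level step on g
        y -= beta / (lam + 1.0) * (lam * dg_dy(x, y) + df_dy(x, y))  # step on lam*g + f
        grad_x = df_dx(x, y) + lam * (dg_dx(x, y) - dg_dx(x, z))     # first-order info only
        x -= alpha * grad_x
        lam += dlam

    print(f"x = {x:.3f}  (implicit optimum is 1.5)")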
1 code implementation • 19 Sep 2022 • Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning.
no code implementations • 6 Oct 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright
Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime.
no code implementations • 30 May 2021 • Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations.
no code implementations • 12 May 2020 • Sinong Geng, Zhaobin Kuang, Jie Liu, Stephen Wright, David Page
We study the $L_1$-regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference.
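The estimator itself involves approximate inference for the MRF, but the role of the L1 penalty is easy to illustrate in isolation. The sketch below runs generic proximal gradient (ISTA) with soft-thresholding on an L1-regularized least-squares surrogate; it is only a stand-in for the regularized likelihood studied in the paper.

    import numpy as np

    def soft_threshold(v, tau):
        """Proximal operator of tau*||.||_1 (soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def ista(A, b, lam, step, iters=1000):
        """Proximal gradient for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - step * grad, step * lam)
        return x

    rng = np.random.default_rng(1)
    A = rng.normal(size=(50, 100))
    x_true = np.zeros(100); x_true[:5] = 1.0
    b = A @ x_true
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1/L for the smooth part
    x_hat = ista(A, b, lam=0.1, step=step)
    print("estimated support:", np.flatnonzero(np.abs(x_hat) > 1e-3))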
1 code implementation • 18 Aug 2019 • Rahul Mazumder, Stephen Wright, Andrew Zheng
We consider a class of linear-programming based estimators in reconstructing a sparse signal from linear measurements.
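One classical member of this family is basis pursuit, min ||x||_1 subject to Ax = b, which becomes a linear program after splitting x = u - v with u, v >= 0. The sketch below solves it with SciPy's linprog on synthetic data; the paper studies a broader class of LP-based estimators and their recovery properties.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, d, k = 40, 100, 4
    A = rng.normal(size=(n, d))
    x_true = np.zeros(d)
    x_true[rng.choice(d, k, replace=False)] = rng.normal(size=k)
    b = A @ x_true

    c = np.ones(2 * d)                          # minimize sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                   # A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * d), method="highs")
    x_hat = res.x[:d] - res.x[d:]
    print("recovery error:", np.linalg.norm(x_hat - x_true))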
no code implementations • 22 May 2019 • Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos
Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at an $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
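For a linear classifier with logistic loss and an l2 perturbation budget eps, the inner maximization of adversarial training has a closed form, so each iteration is a (sub)gradient step on the robust loss log(1 + exp(-(y*w@x - eps*||w||))). The sketch below runs that loop on synthetic separable data; the data, budget, and step size are illustrative assumptions rather than the setting analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps, lr = 200, 5, 0.1, 0.5
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    m0 = X @ w_star / np.linalg.norm(w_star)
    X, y = X[np.abs(m0) > 0.3], np.sign(m0[np.abs(m0) > 0.3])    # keep a positive geometric margin

    w = np.zeros(d)
    for t in range(2000):
        margins = y * (X @ w) - eps * np.linalg.norm(w)          # worst-case (robust) margins
        p = 1.0 / (1.0 + np.exp(margins))                        # sigmoid(-margin)
        grad = -(p * y) @ X / len(y)                             # data-fit part of the gradient
        if np.linalg.norm(w) > 0:
            grad += eps * p.mean() * w / np.linalg.norm(w)       # from the -eps*||w|| term
        w -= lr * grad

    print("minimum robust margin:", (y * (X @ w) - eps * np.linalg.norm(w)).min())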
no code implementations • 8 Jan 2019 • Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak
We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms.
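In this model, pulling the arm pair (x, z) yields reward x^T Theta z plus noise, where Theta is (approximately) low rank. The sketch below only illustrates the estimation side: least squares over vec(Theta) from uniformly explored pairs, followed by a rank-r SVD truncation. It is a simplification of the subspace-exploration stage, not the regret-optimal algorithm of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d1, d2, r = 8, 6, 2
    U = rng.normal(size=(d1, r)); V = rng.normal(size=(d2, r))
    Theta = U @ V.T                                     # unknown low-rank parameter

    T = 4000
    X = rng.normal(size=(T, d1)); Z = rng.normal(size=(T, d2))
    rewards = np.einsum("ti,ij,tj->t", X, Theta, Z) + 0.1 * rng.normal(size=T)

    feats = np.einsum("ti,tj->tij", X, Z).reshape(T, d1 * d2)   # features are x z^T, vectorized
    theta_hat = np.linalg.lstsq(feats, rewards, rcond=None)[0]
    M = theta_hat.reshape(d1, d2)
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    M_r = (u[:, :r] * s[:r]) @ vt[:r]                           # rank-r truncation
    print("relative error:", np.linalg.norm(M_r - Theta) / np.linalg.norm(Theta))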
1 code implementation • NeurIPS 2018 • Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos
We present ATOMO, a general framework for atomic sparsification of stochastic gradients.
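A minimal coordinate-wise instance of the idea, assuming the atoms are standard basis vectors: keep each atom independently with probability proportional to its coefficient magnitude (capped at 1 and scaled to a sparsity budget) and rescale kept coefficients by 1/p_i, which leaves the sparsified gradient unbiased. ATOMO itself covers general atomic decompositions (e.g. singular vectors), and the budget handling below is simplified.

    import numpy as np

    def atomic_sparsify(coeffs, budget, rng):
        """Unbiased sparsification of g = sum_i coeffs[i] * a_i: keep atom i with
        probability p_i ~ |coeffs[i]| (capped at 1), rescale kept entries by 1/p_i."""
        abs_c = np.abs(coeffs)
        p = np.minimum(1.0, budget * abs_c / abs_c.sum())
        keep = rng.random(coeffs.shape) < p
        out = np.zeros_like(coeffs)
        out[keep] = coeffs[keep] / p[keep]
        return out

    rng = np.random.default_rng(0)
    g = rng.normal(size=1000) * (rng.random(1000) < 0.2)        # a gradient-like sparse-ish vector
    avg_nnz = np.mean([np.count_nonzero(atomic_sparsify(g, 50, rng)) for _ in range(200)])
    mc_mean = np.mean([atomic_sparsify(g, 50, rng) for _ in range(2000)], axis=0)
    print("avg nonzeros per sparsified vector:", avg_nnz)
    print("Monte-Carlo bias estimate (shrinks with more samples):", np.abs(mc_mean - g).max())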
no code implementations • ICML 2018 • Bin Hu, Stephen Wright, Laurent Lessard
Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.
2 code implementations • 18 May 2018 • Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright
We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P. It combines the Frank-Wolfe (conditional gradient) algorithm with gradient-based steps that differ from away steps and pairwise steps, yet still achieves linear convergence for strongly convex functions along with good practical performance.
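The building block is the plain Frank-Wolfe step, shown below over the probability simplex, where the linear-minimization oracle simply picks the vertex with the most negative gradient entry. The blended method of the paper interleaves such steps with gradient-based steps over the vertices already discovered; that blending logic is omitted here.

    import numpy as np

    # Plain Frank-Wolfe for min_x 0.5*||Ax - b||^2 over the probability simplex.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 10))
    b = A @ rng.dirichlet(np.ones(10))              # target lies in the simplex

    x = np.zeros(10); x[0] = 1.0                    # start at a vertex
    for t in range(1, 501):
        grad = A.T @ (A @ x - b)
        v = np.zeros(10); v[np.argmin(grad)] = 1.0  # LP oracle over the simplex
        gamma = 2.0 / (t + 2.0)                     # standard FW step size
        x = (1.0 - gamma) * x + gamma * v
    print("objective:", 0.5 * np.sum((A @ x - b) ** 2))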
no code implementations • NeurIPS 2017 • Cong Han Lim, Stephen Wright
We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups.
no code implementations • 6 Nov 2017 • Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett
A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.
no code implementations • 14 Oct 2016 • Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright
This paper describes a new parameter-free online learning algorithm for changing environments.
no code implementations • NeurIPS 2014 • Cong Han Lim, Stephen Wright
Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice.
no code implementations • 23 Apr 2014 • Nikhil Rao, Parikshit Shah, Stephen Wright
CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or "truncation") step that exploits the quadratic nature of the objective to reduce the basis size.
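A toy rendition of that forward-backward pattern, using coordinate atoms and a least-squares objective: the forward step adds the atom most aligned with the negative gradient (the conditional-gradient oracle for an l1 ball), and the backward (truncation) step drops an atom whenever refitting without it barely changes the quadratic objective. The tolerance and the refitting scheme are illustrative assumptions, not the exact CoGEnT updates.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(60, 120))
    x_true = np.zeros(120); x_true[:6] = rng.normal(size=6) + 2.0
    b = A @ x_true

    def refit(support):
        x = np.zeros(120)
        cols = sorted(support)
        if cols:
            x[cols] = np.linalg.lstsq(A[:, cols], b, rcond=None)[0]
        return x

    def obj(x):
        return 0.5 * np.sum((A @ x - b) ** 2)

    support, x = set(), np.zeros(120)
    for _ in range(20):
        grad = A.T @ (A @ x - b)
        support.add(int(np.argmax(np.abs(grad))))            # forward: best coordinate atom
        x = refit(support)
        for j in list(support):                              # backward: truncation pass
            if len(support) > 1 and obj(refit(support - {j})) < obj(x) + 1e-8:
                support.remove(j); x = refit(support)
    print("recovered support:", sorted(support), " objective:", round(obj(x), 6))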
no code implementations • NeurIPS 2013 • Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang
Many problems in machine learning can be solved by rounding the solution of an appropriate linear program.
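The classic example of this pattern is minimum vertex cover: solve the LP relaxation with constraints x_u + x_v >= 1 for every edge and 0 <= x <= 1, then round every x_i >= 0.5 up to 1, which yields a 2-approximate cover. The paper is about solving such LPs approximately and at scale; the SciPy sketch below only illustrates the relax-and-round recipe on a small random graph.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n_nodes = 12
    edges = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes) if rng.random() < 0.3]

    A_ub = np.zeros((len(edges), n_nodes))
    for i, (u, v) in enumerate(edges):          # x_u + x_v >= 1  becomes  -x_u - x_v <= -1
        A_ub[i, u] = A_ub[i, v] = -1.0
    res = linprog(np.ones(n_nodes), A_ub=A_ub, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n_nodes, method="highs")

    cover = set(np.flatnonzero(res.x >= 0.5).tolist())      # round fractional solution
    assert all(u in cover or v in cover for u, v in edges)  # rounded set covers every edge
    print("LP value:", round(res.fun, 3), " rounded cover size:", len(cover))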
no code implementations • NeurIPS 2011 • Benjamin Recht, Christopher Re, Stephen Wright, Feng Niu
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks.