no code implementations • 24 Jan 2012 • Tianbao Yang, Mehrdad Mahdavi, Rong Jin, Shenghuo Zhu
We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions.
no code implementations • 27 Jun 2012 • Ming Ji, Tianbao Yang, Binbin Lin, Rong Jin, Jiawei Han
In this work, we develop a simple algorithm for semi-supervised regression.
no code implementations • 13 Nov 2012 • Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, Shenghuo Zhu
Random projection has been widely used in data classification.
no code implementations • 26 Nov 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
We first propose a projection-based algorithm which attains an $O(T^{-1/3})$ convergence rate.
no code implementations • NeurIPS 2012 • Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, Anil K. Jain
One difficulty in learning the pairwise similarity measure is that there is a significant amount of noise and inter-worker variations in the manual annotations obtained via crowdsourcing.
no code implementations • NeurIPS 2012 • Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou
Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning.
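For readers unfamiliar with the two approximations being compared, the following is a minimal NumPy sketch of random Fourier features and the Nyström method for an RBF kernel $k(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$; it is illustrative only and not tied to the paper's analysis.

```python
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, rng=np.random.default_rng(0)):
    """Map X (n x d) to D-dimensional random Fourier features."""
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0, 2 * np.pi, size=D)            # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)      # phi(x)^T phi(y) ~= k(x, y)

def nystrom_features(X, m=200, sigma=1.0, rng=np.random.default_rng(0)):
    """Nystrom approximation using m randomly sampled landmark points."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    L = X[idx]                                        # landmark points
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    K_mm, K_nm = rbf(L, L), rbf(X, L)
    U, s, _ = np.linalg.svd(K_mm)
    K_mm_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return K_nm @ K_mm_inv_sqrt                       # phi(x)^T phi(y) ~= k(x, y)
```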
no code implementations • NeurIPS 2012 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jin-Feng Yi
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at {\it each} iteration to ensure that the obtained solution stays within the feasible domain.
no code implementations • 2 Apr 2013 • Lijun Zhang, Tianbao Yang, Rong Jin, Xiaofei He
Traditional algorithms for stochastic optimization require projecting the solution at each iteration into a given domain to ensure its feasibility.
no code implementations • 19 Apr 2013 • Jianhui Chen, Tianbao Yang, Qihang Lin, Lijun Zhang, Yi Chang
We consider stochastic strongly convex optimization with a complex inequality constraint.
no code implementations • NeurIPS 2013 • Mehrdad Mahdavi, Tianbao Yang, Rong Jin
It leverages the theory of the Lagrangian method in constrained optimization and attains the optimal convergence rate of $O(1/\sqrt{T})$ with high probability for general Lipschitz continuous objectives.
no code implementations • NeurIPS 2013 • Tianbao Yang
We make progress along this line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication.
no code implementations • 4 Dec 2013 • Tianbao Yang, Shenghuo Zhu, Rong Jin, Yuanqing Lin
Extraordinary performance has been observed and reported for the well-motivated updates, referred to as the practical updates, compared to the naive updates.
no code implementations • 13 Aug 2014 • Tianbao Yang, Rong Jin, Shenghuo Zhu, Qihang Lin
In this work, we study data preconditioning, a well-known and long-existing technique, for boosting the convergence of first-order methods for regularized loss minimization.
no code implementations • NeurIPS 2014 • Tianbao Yang, Rong Jin
In this work, we study the problem of transductive pairwise classification from pairwise similarities~\footnote{The pairwise similarities are usually derived from some side information instead of the underlying class labels.}.
no code implementations • 10 Dec 2014 • Xiaoyu Wang, Tianbao Yang, Guobin Chen, Yuanqing Lin
In contrast, this paper proposes an \emph{object-centric sampling} (OCS) scheme that samples image windows based on the object location information.
no code implementations • 15 Apr 2015 • Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu
In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification.
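As a toy illustration of the randomized-reduction idea (not the paper's recovery procedure), one can project features with a Gaussian random matrix and fit a linear model in the reduced space; the classifier and data names below are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_project(X, k=256, seed=0):
    """Project n x d features onto a k-dimensional random subspace."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)   # Gaussian random projection
    return X @ R, R

# Hypothetical usage:
#   X_low, R = random_project(X_train)
#   clf = LogisticRegression(max_iter=1000).fit(X_low, y_train)
#   preds = clf.predict(X_test @ R)
```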
no code implementations • 26 Apr 2015 • Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou
To the best of our knowledge, this is the first time such a relative bound has been proved for the regularized formulation of matrix completion.
no code implementations • 4 May 2015 • Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu
In this paper, we consider the problem of column subset selection.
no code implementations • CVPR 2015 • Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin
We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
no code implementations • 18 Jul 2015 • Tianbao Yang, Lijun Zhang, Qihang Lin, Rong Jin
In this paper, we study a fast approximation method for {\it large-scale high-dimensional} sparse least-squares regression problem by exploiting the Johnson-Lindenstrauss (JL) transforms, which embed a set of high-dimensional vectors into a low-dimensional space.
no code implementations • 27 Jul 2015 • Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang
We also prove a lower bound for the number of rounds of communication for a broad class of distributed first-order methods including the proposed algorithms in this paper.
no code implementations • 14 Aug 2015 • Adams Wei Yu, Qihang Lin, Tianbao Yang
We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem.
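For context, the bilinear saddle-point form referred to here is the standard conjugate reformulation of regularized ERM, with $\phi_i^*$ the convex conjugate of the $i$-th loss $\phi_i$ and $\mathbf a_i$ the $i$-th data vector:

$$\min_{\mathbf w}\ \frac{1}{n}\sum_{i=1}^n \phi_i(\mathbf a_i^\top \mathbf w) + r(\mathbf w)\;=\;\min_{\mathbf w}\max_{\boldsymbol\alpha}\ \frac{1}{n}\sum_{i=1}^n \big(\alpha_i\,\mathbf a_i^\top \mathbf w - \phi_i^*(\alpha_i)\big) + r(\mathbf w).$$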
no code implementations • 25 Sep 2015 • Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou
In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round.
no code implementations • 6 Oct 2015 • Tianbao Yang, Qihang Lin
In this paper, we show that simple {Stochastic} subGradient Descent methods with multiple Restarting, named {\bf RSGD}, can achieve a \textit{linear convergence rate} for a class of non-smooth and non-strongly convex optimization problems where the epigraph of the objective function is a polyhedron, which we refer to as {\bf polyhedral convex optimization}.
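A minimal sketch of the restarting scheme (assuming a step size halved across stages and a restart from the averaged iterate; the paper's exact constants and stage lengths are not reproduced here):

```python
import numpy as np

def rsgd(subgrad, w0, eta0=1.0, t_per_stage=1000, n_stages=10):
    """subgrad(w) returns a stochastic subgradient of the objective at w."""
    w_bar, eta = np.array(w0, dtype=float), eta0
    for _ in range(n_stages):
        w, w_sum = w_bar.copy(), np.zeros_like(w_bar)
        for _ in range(t_per_stage):
            w -= eta * subgrad(w)        # plain stochastic subgradient step
            w_sum += w
        w_bar = w_sum / t_per_stage      # restart from the averaged solution
        eta *= 0.5                       # geometrically decrease the step size
    return w_bar
```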
no code implementations • 5 Nov 2015 • Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou
In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size $m \times n$.
no code implementations • 12 Nov 2015 • Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou
In this paper, we develop a randomized algorithm and theory for learning a sparse model from large-scale and high-dimensional data, which is usually formulated as an empirical risk minimization problem with a sparsity-inducing regularizer.
no code implementations • 9 Dec 2015 • Tianbao Yang, Qihang Lin
We show that, when applied to a broad class of convex optimization problems, the RSG method can find an $\epsilon$-optimal solution with a lower complexity than the SG method.
no code implementations • NeurIPS 2016 • Zhe Li, Boqing Gong, Tianbao Yang
To exhibit the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish a risk bound for stochastic optimization.
no code implementations • 12 Apr 2016 • Tianbao Yang, Qihang Lin, Zhe Li
This paper fills the gap between practice and theory by developing a basic convergence analysis of two stochastic momentum methods, namely stochastic heavy-ball method and the stochastic variant of Nesterov's accelerated gradient method.
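For reference, minimal sketches of the two update rules being analyzed, with `g(w)` standing in for a stochastic gradient oracle (step size and momentum parameter are illustrative):

```python
import numpy as np

def stochastic_heavy_ball(g, w0, eta=0.01, beta=0.9, T=1000):
    w, w_prev = np.array(w0, dtype=float), np.array(w0, dtype=float)
    for _ in range(T):
        w_next = w - eta * g(w) + beta * (w - w_prev)   # heavy-ball momentum term
        w_prev, w = w, w_next
    return w

def stochastic_nesterov(g, w0, eta=0.01, beta=0.9, T=1000):
    w, w_prev = np.array(w0, dtype=float), np.array(w0, dtype=float)
    for _ in range(T):
        y = w + beta * (w - w_prev)                     # extrapolation (look-ahead) point
        w_prev, w = w, y - eta * g(y)                   # gradient evaluated at y
    return w
```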
no code implementations • CVPR 2016 • Chuang Gan, Tianbao Yang, Boqing Gong
Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.
no code implementations • 16 May 2016 • Tianbao Yang, Lijun Zhang, Rong Jin, Jin-Feng Yi
Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback.
no code implementations • 4 Jul 2016 • Yi Xu, Qihang Lin, Tianbao Yang
In particular, if the objective function $F(\mathbf w)$ in the $\epsilon$-sublevel set grows as fast as $\|\mathbf w - \mathbf w_*\|_2^{1/\theta}$, where $\mathbf w_*$ represents the closest optimal solution to $\mathbf w$ and $\theta\in(0, 1]$ quantifies the local growth rate, the iteration complexity of first-order stochastic optimization for achieving an $\epsilon$-optimal solution can be $\widetilde O(1/\epsilon^{2(1-\theta)})$, which is optimal up to at most a logarithmic factor.
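Plugging concrete growth rates into the stated bound makes the interpolation explicit, e.g., quadratic growth ($\theta=1/2$) and linear (sharp) growth ($\theta=1$):

$$\widetilde O\!\big(1/\epsilon^{2(1-\theta)}\big):\qquad \theta=\tfrac{1}{2}\ \Rightarrow\ \widetilde O(1/\epsilon),\qquad \theta=1\ \Rightarrow\ \widetilde O(1),\qquad \theta\to 0\ \Rightarrow\ \text{approaches the standard } \widetilde O(1/\epsilon^{2}).$$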
no code implementations • NeurIPS 2016 • Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang
In this work, we will show that the proposed HOPS achieves a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.}
no code implementations • ICML 2017 • Tianbao Yang, Qihang Lin, Lijun Zhang
In this paper, we develop projection-reduced optimization algorithms for both smooth and non-smooth optimization with improved convergence rates under a certain regularity condition of the constraint function.
no code implementations • NeurIPS 2017 • Lijun Zhang, Tianbao Yang, Jin-Feng Yi, Rong Jin, Zhi-Hua Zhou
When multiple gradients are accessible to the learner, we first demonstrate that the dynamic regret of strongly convex functions can be upper bounded by the minimum of the path-length and the squared path-length.
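For a comparator sequence $\mathbf u_1, \ldots, \mathbf u_T$, the two quantities referred to above are the standard path-length and squared path-length,

$$P_T=\sum_{t=2}^{T}\|\mathbf u_t-\mathbf u_{t-1}\|_2,\qquad S_T=\sum_{t=2}^{T}\|\mathbf u_t-\mathbf u_{t-1}\|_2^{2},$$

so the claimed upper bound on the dynamic regret is of order $O(\min\{P_T, S_T\})$.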
no code implementations • 23 Nov 2016 • Mingrui Liu, Tianbao Yang
Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a condition weaker than strong convexity, namely a quadratic growth condition (QGC).
no code implementations • NeurIPS 2016 • Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang
To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption.
no code implementations • 6 Dec 2016 • Yi Xu, Haiqin Yang, Lijun Zhang, Tianbao Yang
Previously, oblivious random projection based approaches that project high-dimensional features onto a random subspace have been used in practice for tackling the high-dimensionality challenge in machine learning.
no code implementations • ICML 2018 • Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou
To cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret independently.
no code implementations • 7 Feb 2017 • Lijun Zhang, Tianbao Yang, Rong Jin
First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk.
1 code implementation • 27 Apr 2017 • Yaohui Zeng, Tianbao Yang, Patrick Breheny
However, with the ultrahigh-dimensional, large-scale data sets now collected in many real-world applications, it is important to develop algorithms to solve the lasso that efficiently scale up to problems of this size.
no code implementations • 13 Jun 2017 • Zhe Li, Xiaoyu Wang, Xutao Lv, Tianbao Yang
By doing this, we show that previous deep CNNs such as GoogLeNet and Inception-type Nets can be compressed dramatically with a marginal drop in performance.
no code implementations • ICML 2017 • Yi Xu, Qihang Lin, Tianbao Yang
In this paper, a new theory is developed for first-order stochastic convex optimization, showing that the global convergence rate is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions.
no code implementations • 9 Sep 2017 • Tianbao Yang, Zhe Li, Lijun Zhang
In this paper, we present a simple analysis of {\bf fast rates} with {\it high probability} of {\bf empirical minimization} for {\it stochastic composite optimization} over a finite-dimensional bounded convex set with exponential concave loss functions and an arbitrary convex regularization.
no code implementations • 25 Sep 2017 • Mingrui Liu, Tianbao Yang
To the best of our knowledge, the proposed stochastic algorithm is the first one that converges to a second-order stationary point in {\it high probability} with a time complexity independent of the sample size and almost linear in dimensionality.
no code implementations • 25 Oct 2017 • Mingrui Liu, Tianbao Yang
In this paper, we study stochastic non-convex optimization with non-convex random functions.
no code implementations • NeurIPS 2018 • Yi Xu, Rong Jin, Tianbao Yang
Two classes of methods have been proposed for escaping from saddle points: one uses the second-order information carried by the Hessian, and the other adds noise to the first-order information.
no code implementations • NeurIPS 2017 • Yi Xu, Qihang Lin, Tianbao Yang
The most studied error bound is the quadratic error bound, which generalizes strong convexity and is satisfied by a large family of machine learning problems.
no code implementations • NeurIPS 2017 • Mingrui Liu, Tianbao Yang
Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a condition weaker than strong convexity, namely a quadratic growth condition (QGC).
no code implementations • NeurIPS 2017 • Yi Xu, Mingrui Liu, Qihang Lin, Tianbao Yang
The novelty of the proposed scheme lies in its adaptivity to a local sharpness property of the objective function, which marks the key difference from previous adaptive schemes that adjust the penalty parameter per iteration based on certain conditions on the iterates.
no code implementations • 4 Dec 2017 • Yi Xu, Rong Jin, Tianbao Yang
Accelerated gradient (AG) methods are breakthroughs in convex optimization, improving the convergence rate of the gradient descent method for optimization with smooth functions.
no code implementations • NeurIPS 2018 • Mingrui Liu, Xiaoxuan Zhang, Lijun Zhang, Rong Jin, Tianbao Yang
Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set.
no code implementations • 21 May 2018 • Yi Xu, Shenghuo Zhu, Sen Yang, Chi Zhang, Rong Jin, Tianbao Yang
Learning with a {\it convex loss} function has been a dominating paradigm for many years.
no code implementations • 3 Jun 2018 • Zhe Li, Xuehan Xiong, Zhou Ren, Ning Zhang, Xiaoyu Wang, Tianbao Yang
In this paper, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources yet without imposing strong restrictions on the search space.
no code implementations • CVPR 2019 • Jian Ren, Zhe Li, Jianchao Yang, Ning Xu, Tianbao Yang, David J. Foran
In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of succession, extinction, mimicry, and gene duplication to search for a neural network structure from scratch, starting with a poorly initialized simple network and imposing few constraints during the evolution, as we assume no prior knowledge about the task domain.
no code implementations • ICML 2018 • Qihang Lin, Runchao Ma, Tianbao Yang
To update the level parameter towards the optimality, both methods require an oracle that generates upper and lower bounds as well as an affine-minorant of the level function.
no code implementations • ICML 2018 • Mingrui Liu, Xiaoxuan Zhang, Zaiyi Chen, Xiaoyu Wang, Tianbao Yang
In this paper, we consider statistical learning with AUC (area under ROC curve) maximization in the classical stochastic setting, where one random data point drawn from an unknown distribution is revealed at each iteration for updating the model.
no code implementations • ICML 2018 • Zaiyi Chen, Yi Xu, Enhong Chen, Tianbao Yang
Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition, their iteration complexities have an explicit linear dependence on the dimensionality of the problem.
no code implementations • ECCV 2018 • Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong
The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity.
no code implementations • ECCV 2018 • Aidean Sharghi, Ali Borji, Chengtao Li, Tianbao Yang, Boqing Gong
In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary.
no code implementations • ICLR 2019 • Zaiyi Chen, Zhuoning Yuan, Jin-Feng Yi, Bo-Wen Zhou, Enhong Chen, Tianbao Yang
For example, there is still a lack of convergence theory for SGD and its variants that use a stagewise step size and return an averaged solution in practice.
no code implementations • 30 Aug 2018 • Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang
However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored.
no code implementations • ICLR 2019 • Pingbo Pan, Yan Yan, Tianbao Yang, Yi Yang
In this work, we propose to refine the predictions of structured prediction models by effectively integrating discriminative models into the prediction.
no code implementations • 4 Oct 2018 • Hassan Rafique, Mingrui Liu, Qihang Lin, Tianbao Yang
Min-max problems have broad applications in machine learning, including learning with non-decomposable loss and learning with robustness to data distribution.
no code implementations • 24 Oct 2018 • Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang
In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly concave in the variables of maximization.
no code implementations • 28 Nov 2018 • Yi Xu, Qi Qi, Qihang Lin, Rong Jin, Tianbao Yang
In this paper, we propose new stochastic optimization algorithms and study their first-order convergence theories for solving a broad family of DC functions.
no code implementations • NeurIPS 2018 • Xiaoxuan Zhang, Mingrui Liu, Xun Zhou, Tianbao Yang
To advance OFO, we propose an efficient online algorithm based on simultaneously learning a posterior probability of class and learning an optimal threshold by minimizing a stochastic strongly convex function with unknown strong convexity parameter.
no code implementations • NeurIPS 2018 • Mingrui Liu, Zhe Li, Xiaoyu Wang, Jin-Feng Yi, Tianbao Yang
Negative curvature descent (NCD) method has been utilized to design deterministic or stochastic algorithms for non-convex optimization aiming at finding second-order stationary points or local minima.
no code implementations • NeurIPS 2019 • Zhuoning Yuan, Yan Yan, Rong Jin, Tianbao Yang
For convex loss functions and two classes of "nicely-behaved" non-convex objectives that are close to a convex function, we establish faster convergence of stagewise training than the vanilla SGD under the PL condition on both training error and testing error.
no code implementations • 23 Apr 2019 • Yan Yan, Yi Xu, Qihang Lin, Lijun Zhang, Tianbao Yang
The main contribution of this paper is the design and analysis of new stochastic primal-dual algorithms that use a mixture of stochastic gradient updates and a logarithmic number of deterministic dual updates for solving a family of convex-concave problems with no bilinear structure assumed.
no code implementations • 7 Aug 2019 • Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang
We design a stochastic feasible level set method (SFLS) for SOECs that has low data complexity and emphasizes feasibility before convergence.
no code implementations • ICML 2020 • Yan Yan, Yi Xu, Lijun Zhang, Xiaoyu Wang, Tianbao Yang
In this paper, we study a family of non-convex and possibly non-smooth inf-projection minimization problems, where the target objective function is equal to the minimum of a joint function over another variable.
no code implementations • ICLR 2020 • Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang
In this paper, we consider stochastic AUC maximization problem with a deep neural network as the predictive model.
1 code implementation • NeurIPS 2020 • Yunhui Guo, Mingrui Liu, Tianbao Yang, Tajana Rosing
This view leads to two improved schemes for episodic memory based lifelong learning, called MEGA-I and MEGA-II.
no code implementations • 25 Sep 2019 • Yunhui Guo, Mingrui Liu, Tianbao Yang, Tajana Rosing
In this paper, we introduce a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), which allows deep neural networks to acquire the ability of retaining performance on old tasks while learning new tasks.
no code implementations • NeurIPS 2020 • Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das
Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.
1 code implementation • ECCV 2020 • Qi Qi, Yan Yan, Xiaoyu Wang, Tianbao Yang
To tackle this issue, we propose a simple and effective framework to sample pairs in a batch of data for updating the model.
no code implementations • ICLR 2020 • Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei zhang, Xiaodong Cui, Payel Das, Tianbao Yang
Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.
no code implementations • ICLR 2020 • Yunhui Guo, Mingrui Liu, Yandong Li, Liqiang Wang, Tianbao Yang, Tajana Rosing
We evaluate the effectiveness of traditional attack methods such as FGSM and PGD. The results show that A-GEM still possesses strong continual learning ability in the presence of adversarial examples in the memory and simple defense techniques such as label smoothing can further alleviate the adversarial effects.
no code implementations • 6 Feb 2020 • Lijun Zhang, Shiyin Lu, Tianbao Yang
To address this limitation, new performance measures, including dynamic regret and adaptive regret have been proposed to guide the design of online algorithms.
no code implementations • NeurIPS 2020 • Yan Yan, Yi Xu, Qihang Lin, Wei Liu, Tianbao Yang
In this paper, we bridge this gap by providing a sharp analysis of epoch-wise stochastic gradient descent ascent method (referred to as Epoch-GDA) for solving strongly convex strongly concave (SCSC) min-max problems, without imposing any additional assumption about smoothness or the function's structure.
no code implementations • 9 Mar 2020 • Zhishuai Guo, Yan Yan, Tianbao Yang
It remains unclear how these averaging schemes affect the convergence of {\it both optimization error and generalization error} (two equally important components of testing error) for {\bf non-strongly convex objectives, including non-convex problems}.
1 code implementation • ICML 2020 • Zhishuai Guo, Mingrui Liu, Zhuoning Yuan, Li Shen, Wei Liu, Tianbao Yang
In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as a predictive model.
no code implementations • 12 Jun 2020 • Zhishuai Guo, Yan Yan, Zhuoning Yuan, Tianbao Yang
However, most of the existing algorithms are slow in practice, and their analysis revolves around the convergence to a nearly stationary point. We consider leveraging the Polyak-Lojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantee.
no code implementations • 17 Jun 2020 • Yan Yan, Xin Man, Tianbao Yang
In this paper, we propose robust stochastic algorithms for solving convex compositional problems of the form $f(\mathbb{E}_{\xi} g(\cdot; \xi)) + r(\cdot)$ by establishing {\bf sub-Gaussian confidence bounds} under weak assumptions about the tails of the noise distribution, i.e., {\bf heavy-tailed noise} with bounded second-order moments.
1 code implementation • NeurIPS 2021 • Qi Qi, Zhishuai Guo, Yi Xu, Rong Jin, Tianbao Yang
In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks.
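As a rough, self-contained illustration of the kind of DRO objective such online methods target (not the paper's compositional algorithm itself), assume the uncertainty set over sample weights is regularized by a KL divergence toward the uniform distribution; the worst-case weighted loss then has a log-sum-exp closed form:

```python
import numpy as np

def kl_dro_loss(per_sample_losses, lam=1.0):
    """lambda * log(mean(exp(loss / lambda))), computed with log-sum-exp stabilization."""
    l = np.asarray(per_sample_losses, dtype=float)
    m = l.max()
    return lam * np.log(np.mean(np.exp((l - m) / lam))) + m

def dro_sample_weights(per_sample_losses, lam=1.0):
    """Worst-case distributional weights: harder examples get larger weights."""
    l = np.asarray(per_sample_losses, dtype=float)
    w = np.exp((l - l.max()) / lam)
    return w / w.sum()
```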
no code implementations • 14 Sep 2020 • Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.
no code implementations • 24 Nov 2020 • Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang
As a result, Adam$^+$ requires little parameter tuning, like Adam, but enjoys a provable convergence guarantee.
1 code implementation • 2 Dec 2020 • Zhengyang Wang, Meng Liu, Youzhi Luo, Zhao Xu, Yaochen Xie, Limei Wang, Lei Cai, Qi Qi, Zhuoning Yuan, Tianbao Yang, Shuiwang Ji
Here we develop a suite of comprehensive machine learning methods and tools spanning different computational models, molecular representations, and loss functions for molecular property prediction and drug discovery.
4 code implementations • ICCV 2021 • Zhuoning Yuan, Yan Yan, Milan Sonka, Tianbao Yang
Our studies demonstrate that the proposed DAM method improves the performance of optimizing cross-entropy loss by a large margin, and also achieves better performance than optimizing the existing AUC square loss on these medical image classification tasks.
Ranked #2 on Multi-Label Classification on CheXpert
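The following is a minimal NumPy sketch of the pairwise square-loss surrogate for AUC mentioned above: for scores and labels in {0, 1}, it penalizes positive-negative pairs whose score gap falls short of a margin. (The papers above optimize end-to-end min-max/margin reformulations of such surrogates; this only illustrates the surrogate itself.)

```python
import numpy as np

def auc_square_surrogate(scores, labels, margin=1.0):
    """Average square loss over all positive-negative score pairs."""
    s_pos = scores[labels == 1]
    s_neg = scores[labels == 0]
    diffs = s_pos[:, None] - s_neg[None, :]       # all positive-negative score gaps
    return np.mean((margin - diffs) ** 2)

# Example: auc_square_surrogate(np.array([0.9, 0.2, 0.7]), np.array([1, 0, 1]))
```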
1 code implementation • 13 Dec 2020 • Qi Qi, Yi Xu, Rong Jin, Wotao Yin, Tianbao Yang
In this paper, we present a simple yet effective provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
1 code implementation • 9 Feb 2021 • Zhuoning Yuan, Zhishuai Guo, Yi Xu, Yiming Ying, Tianbao Yang
Deep AUC (area under the ROC curve) Maximization (DAM) has attracted much attention recently due to its great potential for imbalanced data classification.
no code implementations • NeurIPS 2021 • Lijun Zhang, Wei Jiang, Shiyin Lu, Tianbao Yang
Moreover, when the hitting cost is both convex and $\lambda$-quadratic growth, we reduce the competitive ratio to $1 + \frac{2}{\sqrt{\lambda}}$ by minimizing the weighted sum of the hitting cost and the switching cost.
no code implementations • NeurIPS 2021 • Guanghui Wang, Yuanyu Wan, Tianbao Yang, Lijun Zhang
To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the \emph{overall} switching cost.
1 code implementation • NeurIPS 2021 • Qi Qi, Youzhi Luo, Zhao Xu, Shuiwang Ji, Tianbao Yang
Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets.
no code implementations • 30 Apr 2021 • Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang
Our analysis exhibits that an increasing or large enough "momentum" parameter for the first-order moment used in practice is sufficient to ensure Adam and its many variants converge under a mild boundedness condition on the adaptive scaling factor of the step size.
no code implementations • 5 May 2021 • Zhishuai Guo, Quanqi Hu, Lijun Zhang, Tianbao Yang
Although numerous studies have proposed stochastic algorithms for solving these problems, they are limited in two perspectives: (i) their sample complexities are high, which do not match the state-of-the-art result for non-convex stochastic optimization; (ii) their algorithms are tailored to problems with only one lower-level problem.
1 code implementation • 8 May 2021 • Yunwen Lei, Zhenhuan Yang, Tianbao Yang, Yiming Ying
In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability.
no code implementations • 8 May 2021 • Lijun Zhang, Guanghui Wang, JinFeng Yi, Tianbao Yang
In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations.
1 code implementation • 9 Jun 2021 • Bokun Wang, Zhuoning Yuan, Yiming Ying, Tianbao Yang
The proposed algorithms require sampling a constant number of tasks and data samples per iteration, making them suitable for the continual learning scenario.
no code implementations • 2 Jul 2021 • Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang
In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity, which enjoy faster convergence in practice.
no code implementations • ICLR 2022 • Zhuoning Yuan, Zhishuai Guo, Nitesh Chawla, Tianbao Yang
The key idea of compositional training is to minimize a compositional objective function, where the outer function corresponds to an AUC loss and the inner function represents a gradient descent step for minimizing a traditional loss, e.g., the cross-entropy (CE) loss.
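In symbols, with $\alpha$ a hypothetical inner step size, the compositional objective described above takes the form

$$\min_{\mathbf w}\; L_{\mathrm{AUC}}\big(\mathbf w - \alpha \nabla L_{\mathrm{CE}}(\mathbf w)\big),$$

i.e., the AUC loss is evaluated at the point reached by one gradient descent step on the CE loss.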
no code implementations • 1 Nov 2021 • Tianbao Yang
In this extended abstract, we will present and discuss opportunities and challenges brought about by a new deep learning method by AUC maximization (aka \underline{\bf D}eep \underline{\bf A}UC \underline{\bf M}aximization or {\bf DAM}) for medical image classification.
1 code implementation • 23 Nov 2021 • Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying
A popular approach to handle streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a buffering set of previous instances with a sufficiently large size and therefore suffers from a scalability issue.
no code implementations • NeurIPS 2021 • Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying
A popular approach to handle streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a buffering set of previous instances with a sufficiently large size and therefore suffers from a scalability issue.
no code implementations • 7 Dec 2021 • Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang
Although rigorous convergence analysis exists for Adam, they impose specific requirements on the update of the adaptive step size, which are not generic enough to cover many other variants of Adam.
1 code implementation • 30 Dec 2021 • Dixian Zhu, Yiming Ying, Tianbao Yang
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from a distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information is modeled and captured by taking the worst case over distributional weights.
no code implementations • 15 Feb 2022 • Wei Jiang, Bokun Wang, Yibo Wang, Lijun Zhang, Tianbao Yang
To address these limitations, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of $\mathcal{O}\left(1 / \epsilon^{3}\right)$ to find an $\epsilon$-stationary point for non-convex objectives.
no code implementations • 24 Feb 2022 • Bokun Wang, Tianbao Yang
This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index.
1 code implementation • 24 Feb 2022 • Zi-Hao Qiu, Quanqi Hu, Yongjian Zhong, Lijun Zhang, Tianbao Yang
To the best of our knowledge, this is the first time that stochastic algorithms are proposed to optimize NDCG with a provable convergence guarantee.
1 code implementation • 24 Feb 2022 • Zhuoning Yuan, Yuexin Wu, Zi-Hao Qiu, Xianzhi Du, Lijun Zhang, Denny Zhou, Tianbao Yang
In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors.
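As a rough sketch of how a method might avoid depending on a large batch or dictionary (an illustration of the general idea, not this paper's exact estimator), the per-anchor normalization term of the contrastive loss, which in principle sums exp-similarities over all negatives, can be tracked by a moving average updated from whatever negatives appear in the current mini-batch:

```python
import numpy as np

def update_denominator_estimates(u, anchor_ids, sims_to_batch_negs, gamma=0.9):
    """u: dict anchor_id -> running estimate of E[exp(sim(anchor, negative) / tau)].

    anchor_ids: ids of anchors in the current mini-batch.
    sims_to_batch_negs: list of arrays of (already temperature-scaled) similarities
    from each anchor to the negatives present in this mini-batch.
    """
    for i, sims in zip(anchor_ids, sims_to_batch_negs):
        batch_est = np.mean(np.exp(sims))                 # mini-batch estimate of the mean
        u[i] = gamma * u.get(i, batch_est) + (1 - gamma) * batch_est
    return u
```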
no code implementations • 1 Mar 2022 • Dixian Zhu, Gang Li, Bokun Wang, Xiaodong Wu, Tianbao Yang
In this paper, we propose systematic and efficient gradient-based methods for both one-way and two-way partial AUC (pAUC) maximization that are applicable to deep learning.
no code implementations • 3 Mar 2022 • Yao Yao, Qihang Lin, Tianbao Yang
The partial AUC, as a generalization of the AUC, summarizes only the TPRs over a specific range of the FPRs and is thus a more suitable performance measure in many real-world situations.
no code implementations • 27 Mar 2022 • Dixian Zhu, Xiaodong Wu, Tianbao Yang
(i) We benchmark a variety of loss functions with different algorithmic choices for the deep AUROC optimization problem.
no code implementations • 28 Mar 2022 • Tianbao Yang, Yiming Ying
We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
no code implementations • 2 May 2022 • Lijun Zhang, Wei Jiang, JinFeng Yi, Tianbao Yang
In this paper, we investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO), in which the learner needs to minimize not only the hitting cost but also the switching cost.
no code implementations • 1 Jun 2022 • Tianbao Yang
This manuscript introduces a new optimization framework for machine learning and AI, named {\bf empirical X-risk minimization (EXM)}.
no code implementations • 1 Jun 2022 • Quanqi Hu, Yongjian Zhong, Tianbao Yang
To tackle this challenge, we present a single-loop randomized stochastic algorithm, which requires updates for only a constant number of blocks at each iteration.
1 code implementation • 14 Jun 2022 • Haiyang Yu, Limei Wang, Bokun Wang, Meng Liu, Tianbao Yang, Shuiwang Ji
GraphFM-IB applies FM to in-batch sampled data, while GraphFM-OB applies FM to out-of-batch data that are in the 1-hop neighborhood of in-batch data.
no code implementations • 18 Jul 2022 • Wei Jiang, Gang Li, Yibo Wang, Lijun Zhang, Tianbao Yang
The key issue is to track and estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and it is only allowed to probe $\mathcal{O}(1)$ blocks to attain their stochastic values and Jacobians.
no code implementations • 11 Oct 2022 • Qi Qi, Jiameng Lyu, Kung sik Chan, Er Wei Bai, Tianbao Yang
Distributionally Robust Optimization (DRO), as a popular method to train robust models against distribution shift between training and test sets, has received tremendous attention in recent years.
no code implementations • 12 Oct 2022 • Qi Qi, Shervin Ardeshir, Yi Xu, Tianbao Yang
Improving fairness between privileged and less-privileged sensitive-attribute groups (e.g., race, gender) has attracted a lot of attention.
1 code implementation • 26 Oct 2022 • Zhishuai Guo, Rong Jin, Jiebo Luo, Tianbao Yang
To this end, we propose an active-passive decomposition framework that decouples the gradient's components into two types, namely active parts and passive parts, where the active parts depend on local data and are computed with the local model, and the passive parts depend on other machines and are communicated/computed based on historical models and samples.
no code implementations • 23 Dec 2022 • Yao Yao, Qihang Lin, Tianbao Yang
In this work, we formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
no code implementations • NeurIPS 2023 • Lijun Zhang, Peng Zhao, Zhen-Hua Zhuang, Tianbao Yang, Zhi-Hua Zhou
First, we formulate GDRO as a stochastic convex-concave saddle-point problem, and demonstrate that stochastic mirror descent (SMD), using $m$ samples in each iteration, achieves an $O(m (\log m)/\epsilon^2)$ sample complexity for finding an $\epsilon$-optimal solution, which matches the $\Omega(m/\epsilon^2)$ lower bound up to a logarithmic factor.
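A minimal sketch of the saddle-point approach described above: stochastic mirror descent draws one sample per group each iteration, takes a gradient step on $\mathbf w$ and an exponentiated-gradient (entropic mirror) step on the group weights $q$ over the simplex. Step sizes and sampling details are illustrative, not the paper's exact choices.

```python
import numpy as np

def gdro_smd(sample_loss_and_grad, m, w0, T=1000, eta_w=0.01, eta_q=0.01):
    """sample_loss_and_grad(i, w) -> (stochastic loss, stochastic gradient) for group i."""
    w = np.array(w0, dtype=float)
    q = np.ones(m) / m                                        # weights over the m groups
    for _ in range(T):
        losses, grads = zip(*[sample_loss_and_grad(i, w) for i in range(m)])
        w -= eta_w * sum(q[i] * grads[i] for i in range(m))   # descent step on w
        q *= np.exp(eta_q * np.array(losses))                 # exponentiated ascent on q
        q /= q.sum()                                          # project back to the simplex
    return w, q
```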
no code implementations • 24 Feb 2023 • Yunwen Lei, Tianbao Yang, Yiming Ying, Ding-Xuan Zhou
For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds which imply fast rates in a low noise condition.
1 code implementation • 14 May 2023 • Dixian Zhu, Bokun Wang, Zhi Chen, Yaxing Wang, Milan Sonka, Xiaodong Wu, Tianbao Yang
This paper considers a novel application of deep AUC maximization (DAM) for multi-instance learning (MIL), in which a single class label is assigned to a bag of instances (e.g., multiple 2D slices of a CT scan for a patient).
1 code implementation • 19 May 2023 • Zi-Hao Qiu, Quanqi Hu, Zhuoning Yuan, Denny Zhou, Lijun Zhang, Tianbao Yang
In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled and systematic manner for self-supervised learning.
1 code implementation • 30 May 2023 • Quanqi Hu, Zi-Hao Qiu, Zhishuai Guo, Lijun Zhang, Tianbao Yang
In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower level problems and have important applications in machine learning.
1 code implementation • 5 Jun 2023 • Zhuoning Yuan, Dixian Zhu, Zi-Hao Qiu, Gang Li, Xuanhui Wang, Tianbao Yang
This paper introduces the award-winning deep learning (DL) library called LibAUC for implementing state-of-the-art algorithms towards optimizing a family of risk functions named X-risks.
no code implementations • 13 Jun 2023 • Wei Jiang, Jiayu Qin, Lingyu Wu, Changyou Chen, Tianbao Yang, Lijun Zhang
Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function.
no code implementations • 7 Jul 2023 • Ming Yang, Xiyuan Wei, Tianbao Yang, Yiming Ying
Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC.
no code implementations • 29 Aug 2023 • Haoran Liu, Bokun Wang, Jianling Wang, Xiangjue Dong, Tianbao Yang, James Caverlee
As powerful tools for representation learning on graphs, graph neural networks (GNNs) have played an important role in applications including social networks, recommendation systems, and online web services.
no code implementations • NeurIPS 2023 • Quanqi Hu, Dixian Zhu, Tianbao Yang
This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO).
no code implementations • 18 Oct 2023 • Jianzhi Xv, Gang Li, Tianbao Yang
While deep AUC maximization (DAM) has shown remarkable success on imbalanced medical tasks, e.g., chest X-ray classification and skin lesion classification, it could suffer from severe overfitting when applied to small datasets due to its aggressive nature of pushing the prediction scores of positive data away from those of negative data.
no code implementations • 4 Dec 2023 • Bokun Wang, Tianbao Yang
This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), learning with imbalanced data, reinforcement learning, and learning to rank.
1 code implementation • 11 Dec 2023 • Ryan King, Tianbao Yang, Bobak Mortazavi
In downstream tasks, including in-hospital mortality prediction and phenotyping, our pretrained model outperforms baselines in settings where only a fraction of the data is labeled, emphasizing its ability to enhance ICU data analysis.
1 code implementation • 6 Apr 2024 • Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang
In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.
no code implementations • ECCV 2020 • Zhuoning Yuan, Zhishuai Guo, Xiaotian Yu, Xiaoyu Wang, Tianbao Yang
In our experiment, we demonstrate that the proposed framework is able to train deep learning models with millions of classes and achieves more than a 10× speedup compared to existing approaches.