Search Results for author: Mingrui Liu

Found 36 papers, 7 papers with code

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms

no code implementations 1 Mar 2024 Toki Tahmid Inan, Mingrui Liu, Amarda Shehu

Our investigation encompasses a wide array of techniques, including SGD and its variants, flat-minima optimizers, and new algorithms we propose under the Basin Hopping framework.

Benchmarking Stochastic Optimization

Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis

1 code implementation 17 Jan 2024 Jie Hao, Xiaochuan Gong, Mingrui Liu

When the upper-level problem is nonconvex and unbounded smooth, and the lower-level problem is strongly convex, we prove that our algorithm requires $\widetilde{\mathcal{O}}(1/\epsilon^4)$ iterations to find an $\epsilon$-stationary point in the stochastic setting, where each iteration involves calling a stochastic gradient or Hessian-vector product oracle.

Bilevel Optimization · Hyperparameter Optimization +3
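
The snippet refers to the standard stochastic bilevel setup; a minimal statement of that setup, in generic notation not taken from the paper, is

$$\min_{x} \; \Phi(x) := f\big(x, y^*(x)\big) \quad \text{s.t.} \quad y^*(x) = \arg\min_{y} g(x, y),$$

where $f$ is the (possibly nonconvex, unbounded-smooth) upper-level objective, $g(x, \cdot)$ is strongly convex so that $y^*(x)$ is unique, and an $\epsilon$-stationary point is an $x$ with $\|\nabla \Phi(x)\| \le \epsilon$ (in expectation in the stochastic setting). The $\widetilde{\mathcal{O}}(1/\epsilon^4)$ figure counts calls to the stochastic gradient and Hessian-vector product oracles.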

Stability and Generalization for Minibatch SGD and Local SGD

no code implementations 2 Oct 2023 Yunwen Lei, Tao Sun, Mingrui Liu

We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
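
For readers unfamiliar with the two schemes being compared, a minimal generic sketch of one minibatch-SGD step versus one local-SGD communication round follows (illustrative notation only, not the paper's setup or code):

```python
import numpy as np

def minibatch_sgd_step(w, grad_fn, worker_batches, lr):
    """Minibatch SGD: each worker computes a stochastic gradient on its batch,
    the gradients are averaged, and a single update is applied to the model."""
    g = np.mean([grad_fn(w, b) for b in worker_batches], axis=0)
    return w - lr * g

def local_sgd_round(w, grad_fn, worker_batch_lists, lr):
    """Local SGD: each worker runs several local updates without communicating,
    then the local models are averaged once (periodic model averaging)."""
    local_models = []
    for batches in worker_batch_lists:      # one list of minibatches per worker
        w_k = w.copy()
        for b in batches:
            w_k = w_k - lr * grad_fn(w_k, b)
        local_models.append(w_k)
    return np.mean(local_models, axis=0)
```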

EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data

1 code implementation 14 Feb 2023 Michael Crawshaw, Yajie Bao, Mingrui Liu

In this paper, we design EPISODE, the very first algorithm to solve FL problems with heterogeneous data in the nonconvex and relaxed smoothness setting.

Federated Learning

On Inferring User Socioeconomic Status with Mobility Records

1 code implementation 15 Nov 2022 Zheng Wang, Mingrui Liu, Cheng Long, Qianru Zhang, Jiangneng Li, Chunyan Miao

The DeepSEI model incorporates two networks, a deep network and a recurrent network, which extract features of the mobility records from three aspects, namely spatiality, temporality, and activity, one at a coarse level and the other at a detailed level.

Management
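
As a purely illustrative sketch of the coarse/detailed two-network design described above (module names, layer sizes, and the exact feature split are assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class TwoBranchSEI(nn.Module):
    """Illustrative two-branch model: a feed-forward 'deep network' over coarse
    aggregate features and a recurrent network over the detailed record sequence,
    fused into a single socioeconomic-status prediction head."""
    def __init__(self, coarse_dim, record_dim, hidden=64, num_classes=3):
        super().__init__()
        self.deep = nn.Sequential(nn.Linear(coarse_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.recurrent = nn.LSTM(record_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, coarse_feats, record_seq):
        h_deep = self.deep(coarse_feats)            # (B, hidden) coarse-level view
        _, (h_rec, _) = self.recurrent(record_seq)  # h_rec: (1, B, hidden) detailed view
        return self.head(torch.cat([h_deep, h_rec[-1]], dim=-1))
```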

Robustness to Unbounded Smoothness of Generalized SignSGD

no code implementations 23 Aug 2022 Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang

We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating the others.

Fast Composite Optimization and Statistical Recovery in Federated Learning

no code implementations 17 Jul 2022 Yajie Bao, Michael Crawshaw, Shan Luo, Mingrui Liu

This paper investigates a class of composite optimization and statistical recovery problems in the FL setting, whose loss function consists of a data-dependent smooth loss and a non-smooth regularizer.

Federated Learning
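
Concretely, the composite objective described in the snippet has the standard form (generic notation)

$$\min_{w} \; F(w) := f(w) + r(w),$$

with $f$ a smooth, data-dependent loss and $r$ a non-smooth regularizer such as $\lambda\|w\|_1$. The usual centralized building block is the proximal gradient step $w \leftarrow \mathrm{prox}_{\eta r}\big(w - \eta \nabla f(w)\big)$, where $\mathrm{prox}_{\eta r}(v) = \arg\min_u \{ r(u) + \tfrac{1}{2\eta}\|u - v\|^2 \}$; running such steps efficiently across federated clients is what the paper studies.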

Will Bilevel Optimizers Benefit from Loops

no code implementations 27 May 2022 Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations.

Bilevel Optimization · Computational Efficiency

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

1 code implementation 10 May 2022 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Gradient clipping is usually employed to address this issue in the single machine setting, but exploring this technique in the distributed setting is still in its infancy: it remains mysterious whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup.

Federated Learning
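
For reference, a naive way to combine gradient clipping with multiple machines is sketched below; this is a generic one-communication-per-step baseline, not the paper's communication-efficient algorithm:

```python
import numpy as np

def clip(g, max_norm):
    """Standard gradient clipping: rescale g when its norm exceeds max_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def naive_distributed_clipped_step(w, worker_grads, lr, max_norm):
    """One step of a naive distributed scheme: average the workers' stochastic
    gradients (one communication round per step), then apply a clipped update."""
    g_avg = np.mean(worker_grads, axis=0)
    return w - lr * clip(g_avg, max_norm)
```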

Understanding AdamW through Proximal Methods and Scale-Freeness

1 code implementation 31 Jan 2022 Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
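
The distinction the snippet draws can be written out explicitly (standard AdamW notation, where $\hat m_t$ and $\hat v_t$ are the bias-corrected moment estimates and $\lambda$ is the weight-decay coefficient). In Adam-$\ell_2$ the regularizer only contributes its gradient, which then passes through the adaptive preconditioner:

$$w_{t+1} = w_t - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}, \qquad \hat m_t, \hat v_t \text{ computed from } g_t + \lambda w_t .$$

In AdamW the decay is decoupled and applied outside the preconditioner, which is the closed-form, proximal-style treatment of the $\ell_2$ regularizer referred to above:

$$w_{t+1} = w_t - \eta\left(\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda w_t\right), \qquad \hat m_t, \hat v_t \text{ computed from } g_t \text{ only}.$$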

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

no code implementations 2 Dec 2021 Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu

We conduct extensive studies over 18 state-of-the-art DL models/tasks and demonstrate that DPSGD often converges in cases where SSGD diverges for large learning rates in the large batch setting.

Automatic Speech Recognition (ASR) +1

Generalization Guarantee of SGD for Pairwise Learning

no code implementations NeurIPS 2021 Yunwen Lei, Mingrui Liu, Yiming Ying

We develop a novel high-probability generalization bound for uniformly-stable algorithms to incorporate the variance information for better generalization, based on which we establish the first nonsmooth learning algorithm to achieve almost optimal high-probability and dimension-independent generalization bounds in linear time.

Generalization Bounds · Metric Learning
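
For context, a standard formulation of the uniform stability notion referenced above: a (randomized) algorithm $A$ is $\gamma$-uniformly stable if, for any two training sets $S, S'$ differing in a single example,

$$\sup_{z} \; \mathbb{E}_{A}\big[\ell(A(S); z) - \ell(A(S'); z)\big] \le \gamma .$$

Classical results turn such stability into in-expectation generalization bounds; in pairwise learning the loss depends on pairs of examples (as in metric learning or AUC maximization) and the definition is adapted accordingly, with the paper's contribution being the variance-dependent high-probability version described in the snippet.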

Asynchronous Decentralized Distributed Training of Acoustic Models

no code implementations 21 Oct 2021 Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung

Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme.

Automatic Speech Recognition (ASR) +1

On the Initialization for Convex-Concave Min-max Problems

no code implementations 27 Feb 2021 Mingrui Liu, Francesco Orabona

This means that the convergence speed does not improve even if the algorithm starts from the optimal solution and hence is oblivious to the initialization.

On the Last Iterate Convergence of Momentum Methods

no code implementations 13 Feb 2021 Xiaoyu Li, Mingrui Liu, Francesco Orabona

In this paper, we focus on the convergence rate of the last iterate of SGDM.

Stochastic Optimization
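
For concreteness, one common parameterization of SGDM (the stochastic heavy-ball method) is

$$m_t = \beta\, m_{t-1} + g_t, \qquad x_{t+1} = x_t - \eta\, m_t, \qquad m_0 = 0,$$

where $g_t$ is a stochastic gradient at $x_t$ and $\beta \in [0, 1)$ is the momentum parameter; "last iterate" convergence means bounding the suboptimality of $x_T$ itself rather than of a (weighted) average of the iterates.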

Why Does Decentralized Training Outperform Synchronous Training In The Large Batch Setting?

no code implementations 1 Jan 2021 Wei Zhang, Mingrui Liu, Yu Feng, Brian Kingsbury, Yuhai Tu

We conduct extensive studies over 12 state-of-the-art DL models/tasks and demonstrate that DPSGD consistently outperforms SSGD in the large batch setting; and DPSGD converges in cases where SSGD diverges for large learning rates.

Automatic Speech Recognition (ASR) +1

Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

1 code implementation ICML 2020 Zhishuai Guo, Mingrui Liu, Zhuoning Yuan, Li Shen, Wei Liu, Tianbao Yang

In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as a predictive model.

Distributed Optimization

Improving Efficiency in Large-Scale Decentralized Distributed Training

no code implementations 4 Feb 2020 Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Decentralized Parallel SGD (AD-PSGD), form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.

Speech Recognition

Attacking Lifelong Learning Models with Gradient Reversion

no code implementations ICLR 2020 Yunhui Guo, Mingrui Liu, Yandong Li, Liqiang Wang, Tianbao Yang, Tajana Rosing

We evaluate the effectiveness of traditional attack methods such as FGSM and PGD. The results show that A-GEM still possesses strong continual learning ability in the presence of adversarial examples in the memory, and that simple defense techniques such as label smoothing can further alleviate the adversarial effects.

Continual Learning
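
FGSM, one of the traditional attacks mentioned in the snippet, has a short standard implementation; the sketch below is the generic attack, not the Gradient Reversion attack proposed in the paper:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps):
    """Standard FGSM: take one step of size eps in the direction of the sign
    of the input gradient of the loss, then project back to a valid range."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs normalized to [0, 1]
```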

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

no code implementations ICLR 2020 Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an improved adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.

A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

no code implementations NeurIPS 2020 Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das

Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.

Improved Schemes for Episodic Memory-based Lifelong Learning

1 code implementation NeurIPS 2020 Yunhui Guo, Mingrui Liu, Tianbao Yang, Tajana Rosing

This view leads to two improved schemes for episodic memory based lifelong learning, called MEGA-I and MEGA-II.

Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient

no code implementations 25 Sep 2019 Yunhui Guo, Mingrui Liu, Tianbao Yang, Tajana Rosing

In this paper, we introduce a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), which allows deep neural networks to retain performance on old tasks while learning new ones.

Stochastic AUC Maximization with Deep Neural Networks

no code implementations ICLR 2020 Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang

In this paper, we consider stochastic AUC maximization problem with a deep neural network as the predictive model.

Adaptive Negative Curvature Descent with Applications in Non-convex Optimization

no code implementations NeurIPS 2018 Mingrui Liu, Zhe Li, Xiaoyu Wang, Jin-Feng Yi, Tianbao Yang

Negative curvature descent (NCD) method has been utilized to design deterministic or stochastic algorithms for non-convex optimization aiming at finding second-order stationary points or local minima.
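
In generic terms, an NCD step moves along an (approximate) eigenvector $v$ associated with the most negative eigenvalue of the Hessian: when $\lambda_{\min}\big(\nabla^2 f(x)\big) < 0$, take

$$x^{+} = x - \eta\, \mathrm{sign}\big(\langle \nabla f(x), v \rangle\big)\, v,$$

with the sign chosen so the step does not increase $f$ to first order; how accurately (and hence how cheaply) such a direction must be computed is where the adaptivity in the title comes in, and those details are not reproduced here.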

Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization

no code implementations NeurIPS 2018 Xiaoxuan Zhang, Mingrui Liu, Xun Zhou, Tianbao Yang

To advance OFO, we propose an efficient online algorithm that simultaneously learns the posterior class probability and an optimal threshold, the latter by minimizing a stochastic strongly convex function with an unknown strong convexity parameter.
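
As a reminder of the quantity being optimized, the F-measure is the harmonic mean of precision $P$ and recall $R$:

$$F_1 = \frac{2PR}{P + R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} .$$

Predictions are obtained by thresholding an estimated posterior class probability $\hat\eta(x) \approx \Pr(y = 1 \mid x)$, and the difficulty in the online setting is that the F-measure-optimal threshold is itself unknown and data-dependent, which is why the algorithm learns the probability estimate and the threshold simultaneously.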

First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems

no code implementations 24 Oct 2018 Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang

In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly concave in the variables of maximization.
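
The weak convexity/concavity notions in the title are standard: $f(\cdot, y)$ is $\rho$-weakly convex if

$$x \;\mapsto\; f(x, y) + \frac{\rho}{2}\|x\|^2 \quad \text{is convex},$$

and $f(x, \cdot)$ is weakly concave if $-f(x, \cdot)$ is weakly convex. The problem class is $\min_x \max_y f(x, y)$ under these conditions, which in particular covers smooth objectives with Lipschitz gradients that are neither convex in $x$ nor concave in $y$.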

Weakly-Convex Concave Min-Max Optimization: Provable Algorithms and Applications in Machine Learning

no code implementations 4 Oct 2018 Hassan Rafique, Mingrui Liu, Qihang Lin, Tianbao Yang

Min-max problems have broad applications in machine learning, including learning with non-decomposable loss and learning with robustness to data distribution.

BIG-bench Machine Learning

Fast Stochastic AUC Maximization with $O(1/n)$-Convergence Rate

no code implementations ICML 2018 Mingrui Liu, Xiaoxuan Zhang, Zaiyi Chen, Xiaoyu Wang, Tianbao Yang

In this paper, we consider statistical learning with AUC (area under ROC curve) maximization in the classical stochastic setting, where one random data point drawn from an unknown distribution is revealed at each iteration for updating the model.
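
For reference, the AUC of a score function $f$ is the probability that a random positive example is ranked above a random negative one,

$$\mathrm{AUC}(f) = \Pr\big(f(x^{+}) > f(x^{-})\big), \qquad x^{+} \sim \mathcal{P}_{+},\; x^{-} \sim \mathcal{P}_{-},$$

and it is typically optimized through a pairwise surrogate such as the squared loss $\mathbb{E}\big[(1 - f(x^{+}) + f(x^{-}))^2\big]$. The challenge in the one-example-per-iteration stochastic setting is that each update sees a single example rather than a positive-negative pair.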

Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions

no code implementations NeurIPS 2018 Mingrui Liu, Xiaoxuan Zhang, Lijun Zhang, Rong Jin, Tianbao Yang

Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set.
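
A common way to write such an error bound condition, up to constants and the exact exponent, is

$$\mathrm{dist}(x, \mathcal{X}_*) \;\le\; c\,\big(F(x) - F_*\big)^{\theta} \quad \text{for all } x \text{ in a level set},$$

where $\mathcal{X}_*$ is the optimal set, $F_*$ the optimal value, $c > 0$, and $\theta \in (0, 1]$; a larger $\theta$ (sharper growth away from the optimal set) translates into faster rates, which is the adaptivity the title refers to.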

Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition

no code implementations NeurIPS 2017 Mingrui Liu, Tianbao Yang

Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a weaker condition than strong convexity, namely a quadratic growth condition (QGC).
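
The quadratic growth condition mentioned here states that the objective grows at least quadratically with the distance to the optimal set:

$$F(x) - F_* \;\ge\; \frac{\alpha}{2}\,\mathrm{dist}(x, \mathcal{X}_*)^2 \quad \text{for some } \alpha > 0 \text{ and all } x \text{ in a level set}.$$

It is implied by $\alpha$-strong convexity but is strictly weaker; for example, it holds for the convex least-squares objective $F(x) = \|Ax - b\|^2$ even when $A$ is rank-deficient and $F$ is not strongly convex.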

ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization

no code implementations NeurIPS 2017 Yi Xu, Mingrui Liu, Qihang Lin, Tianbao Yang

The novelty of the proposed scheme lies in its adaptivity to a local sharpness property of the objective function, which marks the key difference from previous adaptive schemes that adjust the penalty parameter per iteration based on certain conditions on the iterates.

Stochastic Optimization
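
For context, the penalty parameter in question is the $\rho$ in the standard scaled-form ADMM iterations for $\min_{x, z}\, f(x) + g(z)$ subject to $Ax + Bz = c$:

$$x^{k+1} = \arg\min_x \Big\{ f(x) + \tfrac{\rho}{2}\|Ax + Bz^k - c + u^k\|^2 \Big\}, \qquad z^{k+1} = \arg\min_z \Big\{ g(z) + \tfrac{\rho}{2}\|Ax^{k+1} + Bz - c + u^k\|^2 \Big\}, \qquad u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c .$$

The proposed scheme adapts $\rho$ to a local sharpness property of the objective rather than fixing it or adjusting it per iteration from conditions on the iterates.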

On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

no code implementations 25 Sep 2017 Mingrui Liu, Tianbao Yang

To the best of our knowledge, the proposed stochastic algorithm is the first one that converges to a second-order stationary point in high probability with a time complexity independent of the sample size and almost linear in dimensionality.
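
The second-order stationarity notion referred to here is, up to constants and the exact scaling of the curvature tolerance, the usual one:

$$\|\nabla f(x)\| \le \epsilon \quad \text{and} \quad \lambda_{\min}\big(\nabla^2 f(x)\big) \ge -\sqrt{\epsilon},$$

i.e., the gradient is small and the Hessian has no strongly negative curvature, which rules out approximate strict saddle points.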

Adaptive Accelerated Gradient Converging Methods under Holderian Error Bound Condition

no code implementations 23 Nov 2016 Mingrui Liu, Tianbao Yang

Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a weaker condition than strong convexity, namely a quadratic growth condition (QGC).
