no code implementations • 4 Dec 2014 • Bicheng Ying, Ali H. Sayed
The paper examines the learning mechanism of adaptive agents over weakly-connected graphs and reveals interesting behavior in how information flows through such topologies.
no code implementations • 24 Nov 2015 • Bicheng Ying, Ali H. Sayed
In this work and the supporting Part II, we examine the performance of stochastic sub-gradient learning strategies under weaker conditions than usually considered in the literature.
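As a point of reference, here is a minimal sketch of one stochastic subgradient step for a non-smooth regularized hinge loss, the kind of subgradient learning strategy analyzed in this line of work; the data arrays X, y and the step-size value are purely illustrative, not taken from the paper.

    import numpy as np

    def hinge_subgradient_step(w, x_i, y_i, mu=0.01, rho=1e-3):
        """One stochastic subgradient step for
        rho/2 * ||w||^2 + max(0, 1 - y_i * x_i^T w)."""
        margin = y_i * np.dot(x_i, w)
        # Subgradient of the hinge term (0 is a valid choice at the kink).
        sub = -y_i * x_i if margin < 1 else np.zeros_like(w)
        return w - mu * (rho * w + sub)

    # Usage with hypothetical data X (n x d), y (n,):
    # w = np.zeros(d)
    # for t in range(num_iters):
    #     i = np.random.randint(n)
    #     w = hinge_subgradient_step(w, X[i], y[i])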
no code implementations • 24 Feb 2016 • Bicheng Ying, Kun Yuan, Ali H. Sayed
The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees.
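For concreteness, a minimal SDCA sketch for L2-regularized least squares, using the standard closed-form dual coordinate update; this is a generic illustration under those assumptions, not necessarily the exact variant studied in the paper.

    import numpy as np

    def sdca_least_squares(X, y, lam=0.1, epochs=10):
        """SDCA for min_w (1/n) sum_i 0.5*(x_i^T w - y_i)^2 + (lam/2)*||w||^2."""
        n, d = X.shape
        alpha = np.zeros(n)              # dual variables, one per sample
        w = X.T @ alpha / (lam * n)      # primal iterate maintained from the duals
        for _ in range(epochs):
            for i in np.random.permutation(n):
                # Closed-form maximizer of the i-th dual coordinate.
                delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
                alpha[i] += delta
                w += delta * X[i] / (lam * n)
        return w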
no code implementations • 14 Mar 2016 • Kun Yuan, Bicheng Ying, Ali H. Sayed
The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime.
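A minimal sketch of the heavy-ball (momentum) stochastic gradient recursion in the constant step-size regime; grad_fn stands for a stochastic gradient oracle and the parameter values are illustrative only.

    import numpy as np

    def momentum_sgd(grad_fn, w0, mu=0.01, beta=0.9, num_iters=1000):
        """Heavy-ball momentum SGD with constant step-size mu and
        momentum parameter beta: v <- beta*v + g_t,  w <- w - mu*v."""
        w, v = w0.copy(), np.zeros_like(w0)
        for _ in range(num_iters):
            g = grad_fn(w)        # stochastic gradient at the current iterate
            v = beta * v + g
            w = w - mu * v
        return w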
no code implementations • 20 Apr 2017 • Bicheng Ying, Ali H. Sayed
The analysis in Part I revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization when gradient noise is present.
no code implementations • 4 Aug 2017 • Kun Yuan, Bicheng Ying, Jiageng Liu, Ali H. Sayed
For such situations, the balanced gradient computation property of AVRG becomes a real advantage: it reduces the idle time caused by the unbalanced local data storage requirements that are characteristic of other variance-reduced gradient algorithms.
no code implementations • 4 Aug 2017 • Bicheng Ying, Kun Yuan, Ali H. Sayed
First, it resolves this open issue and provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA; the argument is also adaptable to other variance-reduced algorithms.
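To illustrate the setting, here is a minimal SAGA sketch in which the sample index is drawn by per-epoch random reshuffling rather than uniform sampling; grad_fn(i, w) is a hypothetical per-sample gradient oracle and the sketch is not the paper's exact analysis setup.

    import numpy as np

    def saga_random_reshuffling(grad_fn, w0, n, mu=0.01, epochs=10):
        """SAGA with per-epoch random reshuffling of the sample order."""
        w = w0.copy()
        table = np.array([grad_fn(i, w) for i in range(n)])  # stored gradients
        avg = table.mean(axis=0)
        for _ in range(epochs):
            for i in np.random.permutation(n):                # reshuffled pass
                g = grad_fn(i, w)
                w = w - mu * (g - table[i] + avg)             # variance-reduced step
                avg += (g - table[i]) / n                     # running average of the table
                table[i] = g
        return w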
no code implementations • 21 Mar 2018 • Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed
In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly.
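The two sampling schemes being compared differ only in how the index sequence is generated, as in the following sketch (grad_fn is a hypothetical per-sample gradient oracle):

    import numpy as np

    def sgd_uniform(grad_fn, w, n, mu, num_epochs):
        # Uniform sampling: indices drawn with replacement at every step.
        for _ in range(num_epochs * n):
            i = np.random.randint(n)
            w = w - mu * grad_fn(i, w)
        return w

    def sgd_random_reshuffling(grad_fn, w, n, mu, num_epochs):
        # Random reshuffling: each epoch visits every sample exactly once
        # in a freshly permuted order (sampling without replacement).
        for _ in range(num_epochs):
            for i in np.random.permutation(n):
                w = w - mu * grad_fn(i, w)
        return w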
no code implementations • 29 May 2018 • Bicheng Ying, Kun Yuan, Ali H. Sayed
This work studies the problem of learning in settings that involve both large datasets and high-dimensional feature spaces.
no code implementations • 26 Mar 2019 • Kun Yuan, Sulaiman A. Alghunaim, Bicheng Ying, Ali H. Sayed
It is still unknown whether, when, and why these bias-correction methods can outperform their traditional counterparts (such as consensus and diffusion) with noisy gradients and constant step-sizes.
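For reference, a sketch contrasting a standard adapt-then-combine diffusion step with a bias-corrected, exact-diffusion-style step over a network; w and grads are (num_agents, dim) arrays, W is a doubly stochastic combination matrix, and the exact update rules and combination weights in the paper may differ in detail.

    import numpy as np

    def diffusion_step(W, w, grads, mu):
        """Standard diffusion (adapt-then-combine): every agent takes a local
        gradient step, then averages with its neighbors through W."""
        psi = w - mu * grads          # adapt (row-wise, one agent per row)
        return W @ psi                # combine

    def exact_diffusion_step(W, w, grads, psi_prev, mu):
        """Bias-corrected step in the adapt-correct-combine pattern; the
        correction term removes the steady-state bias of plain diffusion.
        Initialize psi_prev = w at the first iteration."""
        psi = w - mu * grads          # adapt
        phi = psi + w - psi_prev      # correct
        return W @ phi, psi           # combine, and return psi for the next step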
no code implementations • 29 Sep 2021 • Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Yingya Zhang, Pan Pan, Wotao Yin
Decentralized adaptive gradient methods, in which each node averages only with its neighbors, are critical for saving communication and wall-clock training time in deep learning tasks.
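A rough sketch of the neighbor-averaging structure these methods share, here wrapped around a local Adam-style update; this is purely illustrative under those assumptions and is not the specific algorithm proposed in the paper.

    import numpy as np

    def decentralized_adaptive_step(W, w, grads, m, v, mu=1e-3,
                                    beta1=0.9, beta2=0.999, eps=1e-8):
        """Each node updates its own adaptive moments from its local gradient,
        then averages its model only with its neighbors (one row of W per node).
        w, grads, m, v are (num_nodes, dim) arrays; W is the mixing matrix."""
        m = beta1 * m + (1 - beta1) * grads            # first moment, per node
        v = beta2 * v + (1 - beta2) * grads**2         # second moment, per node
        w_local = w - mu * m / (np.sqrt(v) + eps)      # local adaptive step
        w_next = W @ w_local                           # neighbor averaging (gossip)
        return w_next, m, v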
2 code implementations • NeurIPS 2021 • Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin
Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs promises both fast and high-quality training.
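A sketch of the one-peer exponential graph pattern underlying these topologies: at iteration t, node i sends to node (i + 2^(t mod log2 n)) mod n, so each node talks to a single peer per round while the union of rounds mixes information quickly. This follows the usual description of exponential graphs; the paper's exact weighting and scheduling may differ.

    import numpy as np

    def one_peer_exponential_neighbor(i, t, n):
        """Peer of node i at iteration t on a one-peer exponential graph
        (n is assumed to be a power of two)."""
        k = int(np.log2(n))
        hop = 2 ** (t % k)
        return (i + hop) % n

    # Example with 8 nodes: node 0 talks to 1, 2, 4, 1, 2, 4, ... over iterations.
    # print([one_peer_exponential_neighbor(0, t, 8) for t in range(6)])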
2 code implementations • 8 Nov 2021 • Bicheng Ying, Kun Yuan, Hanbin Hu, Yiming Chen, Wotao Yin
On mainstream DNN training tasks, BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package based on Ring-Allreduce.
1 code implementation • 1 Jun 2023 • Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin
Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates.
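A minimal sketch of one such exchange: each worker replaces its model with a weighted average of its neighbors' models, with the weights given by a (typically doubly stochastic) gossip matrix W. This illustrates the general gossip mechanism, not the paper's specific protocol.

    import numpy as np

    def gossip_average(W, models):
        """One gossip step: models is a (num_workers, dim) array and W a
        doubly stochastic gossip weight matrix; worker i receives
        sum_j W[i, j] * models[j], with W[i, j] = 0 for non-neighbors."""
        return W @ models

    # Example: ring of 4 workers, each averaging with its two neighbors.
    # W = np.array([[0.5, 0.25, 0.0, 0.25],
    #               [0.25, 0.5, 0.25, 0.0],
    #               [0.0, 0.25, 0.5, 0.25],
    #               [0.25, 0.0, 0.25, 0.5]])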