1 code implementation • 19 Oct 2024 • Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Hsinyu Tsai, Kaoutar El Maghraoui, Tianyi Chen
Analog in-memory computing (AIMC) accelerators have emerged as a solution with immense potential for accelerating the training of large deep neural network (DNN) models in an energy-efficient way.
no code implementations • 17 Oct 2024 • Yue Huang, Zhaoxian Wu, Shiqian Ma, Qing Ling
Stochastic approximation (SA) that involves multiple coupled sequences, known as multiple-sequence SA (MSSA), finds diverse applications in the fields of signal processing and machine learning.
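As a rough illustration of what "multiple coupled sequences" means here, the sketch below runs two interacting stochastic-approximation recursions on a toy quadratic problem; the step-size schedules, noise level, and update rules are illustrative choices, not the setting analyzed in the paper.

```python
import numpy as np

# Toy two-sequence stochastic approximation: the auxiliary sequence y_k tracks
# x_k on a faster time scale, while the main sequence x_k descends the gradient
# of f(x) = x^2 / 2 using y_k as a noisy surrogate. All constants are arbitrary.
rng = np.random.default_rng(0)
x, y = 5.0, 0.0
for k in range(1, 20001):
    alpha = 1.0 / k            # slow step size for the main sequence
    beta = 1.0 / k ** (2 / 3)  # faster step size for the auxiliary sequence
    y += beta * (x - y + 0.1 * rng.standard_normal())  # auxiliary update, coupled to x
    x -= alpha * (y + 0.1 * rng.standard_normal())     # main update, coupled to y
print(f"x ≈ {x:.3f}, y ≈ {y:.3f}")  # both sequences approach the fixed point 0
```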
no code implementations • 28 Jun 2024 • Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed
This paper proposes a theoretical framework to evaluate and compare the performance of gradient-descent algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments.
no code implementations • 18 Jun 2024 • Zhaoxian Wu, Tayfun Gokmen, Malte J. Rasch, Tianyi Chen
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI.
1 code implementation • 16 Jul 2023 • Xingrong Dong, Zhaoxian Wu, Qing Ling, Zhi Tian
We prove, however, that even with a class of state-of-the-art robust aggregation rules, distributed online gradient descent in an adversarial environment with Byzantine participants can only achieve a linear adversarial regret bound, and that this bound is tight.
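For intuition, the snippet below sketches one round of distributed online gradient descent with a robust aggregation rule; the coordinate-wise median and the scaled sign-flipping attack are generic stand-ins, not necessarily the aggregation rules or adversary considered in the paper.

```python
import numpy as np

# One round of distributed online gradient descent with robust aggregation.
# Honest workers report noisy gradients of the round-t loss f_t(x) = ||x - a_t||^2 / 2;
# Byzantine workers report scaled, sign-flipped vectors. Coordinate-wise median
# serves as a generic robust aggregator.
rng = np.random.default_rng(1)
d, n_honest, n_byz = 5, 8, 2
x = np.zeros(d)                                   # current model at the server
a_t = rng.standard_normal(d)                      # round-t data revealed online
honest = [x - a_t + 0.1 * rng.standard_normal(d) for _ in range(n_honest)]
byzantine = [-10.0 * g for g in honest[:n_byz]]   # adversarial reports
reports = np.stack(honest + byzantine)
robust_grad = np.median(reports, axis=0)          # coordinate-wise median aggregation
x -= 0.1 * robust_grad                            # server update for this round
```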
2 code implementations • 17 Sep 2020 • Jie Peng, Zhaoxian Wu, Qing Ling, Tianyi Chen
We prove that the proposed method reaches a neighborhood of the optimal solution at a linear convergence rate, and that the learning error is determined by the number of Byzantine workers.
no code implementations • 29 Dec 2019 • Zhaoxian Wu, Qing Ling, Tianyi Chen, Georgios B. Giannakis
This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks.
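The snippet below is a minimal sketch of the variance-reduction idea, using a SAGA-style corrected gradient at a single worker on a toy least-squares problem: as the iterates converge, the variance of the corrected gradient vanishes, which limits how much Byzantine reports can hide inside ordinary gradient noise. The problem data, step size, and single-worker setting are illustrative simplifications, not the paper's exact method.

```python
import numpy as np

# SAGA-style variance-reduced stochastic gradient at one worker for
# min_x (1/2n) ||A x - b||^2. All constants are illustrative.
rng = np.random.default_rng(2)
n, d = 50, 3
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
x = np.zeros(d)
table = A * (A @ x - b)[:, None]      # stored per-sample gradients a_i (a_i^T x - b_i)
avg = table.mean(axis=0)              # running average of the table
for _ in range(2000):
    i = rng.integers(n)
    g_i = A[i] * (A[i] @ x - b[i])    # fresh gradient of the sampled component
    v = g_i - table[i] + avg          # variance-reduced gradient estimate
    avg += (g_i - table[i]) / n       # keep the average consistent with the table
    table[i] = g_i                    # refresh the stored gradient for sample i
    x -= 0.02 * v
```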
1 code implementation • 9 Sep 2019 • Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling
Specifically, in CSGD, the latest mini-batch stochastic gradient at a worker will be transmitted to the server if and only if it is sufficiently informative.
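A minimal sketch of such a censoring rule, assuming a simple norm-difference threshold as the "informativeness" test (the paper's actual criterion may differ):

```python
import numpy as np

# Worker-side censoring: upload the fresh gradient only if it differs enough
# from the last transmitted one; otherwise the server reuses the stale copy.
# The fixed norm threshold is a simplified stand-in for CSGD's criterion.
def maybe_transmit(fresh_grad, last_sent_grad, threshold=0.1):
    """Return (gradient for the server to use, whether an upload happened)."""
    if np.linalg.norm(fresh_grad - last_sent_grad) >= threshold:
        return fresh_grad, True        # informative: transmit and refresh the copy
    return last_sent_grad, False       # censored: skip communication this round
```

In a full training loop, each worker would keep `last_sent_grad` as local state, and the threshold would typically shrink over time so that censoring does not stall convergence.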