Search Results for author: Yushun Zhang

Found 8 papers, 3 papers with code

Why Transformers Need Adam: A Hessian Perspective

1 code implementation • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
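
For context, the claim contrasts two standard update rules; in textbook form (generic notation, not taken from this paper), SGD follows the raw gradient while Adam rescales each coordinate by running moment estimates controlled by $(\beta_1, \beta_2)$:

```latex
% Textbook SGD and Adam updates (generic notation, not this paper's)
\begin{aligned}
\text{SGD:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\text{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
                    v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
                  & \hat m_t = m_t / (1-\beta_1^t), \qquad
                    \hat v_t = v_t / (1-\beta_2^t), \\
                  & \theta_{t+1} = \theta_t - \eta\, \hat m_t / (\sqrt{\hat v_t} + \epsilon)
\end{aligned}
```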

Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks

no code implementations • 28 Nov 2023 • Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang

In this paper, to improve the communication efficiency of federated learning in complex networks, we study communication-efficiency optimization of federated learning for the computing and network convergence of 6G networks: methods that make decisions about the training process based on the network conditions and the computing power of the devices participating in federated learning.

Federated Learning
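
As generic background for the entry above (a standard FedAvg round, not this paper's method): each communication round ships full model weights between clients and server, which is why communication efficiency dominates the cost of federated learning.

```python
# Generic FedAvg aggregation step (background sketch, not this paper's method).
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (one communication round)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy example: three clients, each holding a flat parameter vector.
rng = np.random.default_rng(0)
clients = [rng.normal(size=5) for _ in range(3)]
sizes = [100, 50, 150]
print(fedavg(clients, sizes))
```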

Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs

no code implementations • 24 Aug 2023 • Hua Wang, Yuqiong Wu, Yushun Zhang, Fuqiang Lai, Zhou Feng, Bing Xie, Ailin Zhao

Using SHAP, an explainable machine learning method, we calculate the importance of each input log to the predicted results, as well as the coupling relationships among the input logs.

Ensemble Learning
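
As a minimal illustration of the SHAP-based importance analysis mentioned above, the sketch below trains a stand-in regressor and computes global feature importances; the data, feature roles, and model choice are illustrative assumptions, not the paper's setup.

```python
# Hypothetical sketch: SHAP feature importances for a log-reconstruction
# regressor. Feature roles and model choice are illustrative assumptions.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # stand-ins for input well logs
y = X @ np.array([0.5, -1.0, 0.3, 0.8]) + 0.1 * rng.normal(size=500)  # stand-in target log

model = GradientBoostingRegressor().fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean |SHAP| per feature is a common global importance summary.
print(np.abs(shap_values).mean(axis=0))
```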

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

no code implementations • NeurIPS 2021 • Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
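
For reference, the KKT conditions invoked by this result are the standard ones for a generic constrained problem $\min_x f(x)$ subject to $g_i(x) \le 0$ (the paper's specific feasible region is not reproduced here):

```latex
% Standard KKT conditions for  min_x f(x)  s.t.  g_i(x) <= 0  (generic form)
\begin{aligned}
\nabla f(x^\star) + \textstyle\sum_i \lambda_i \nabla g_i(x^\star) &= 0
  && \text{(stationarity)} \\
g_i(x^\star) &\le 0 && \text{(primal feasibility)} \\
\lambda_i &\ge 0 && \text{(dual feasibility)} \\
\lambda_i\, g_i(x^\star) &= 0 && \text{(complementary slackness)}
\end{aligned}
```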

Provable Adaptivity in Adam

no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen

In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.

Attribute

Adam Can Converge Without Any Modification On Update Rules

no code implementations • 20 Aug 2022 • Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.
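
Below is a minimal sketch of the practical ordering described above, i.e., fix the problem first and then tune $(\beta_1, \beta_2)$; the model, data, and learning rate are illustrative assumptions, not the paper's experiments.

```python
# Minimal sketch of the practical protocol: fix the problem first, then tune
# Adam's (beta1, beta2). Model, data, and learning rate are illustrative.
import torch

torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randn(256, 1)
loss_fn = torch.nn.MSELoss()

def final_loss(betas, steps=200):
    model = torch.nn.Linear(10, 1)  # the fixed problem
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, betas=betas)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Tune (beta1, beta2) on the fixed problem, as practitioners do.
for betas in [(0.9, 0.999), (0.9, 0.99), (0.5, 0.999)]:
    print(betas, final_loss(betas))
```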

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

1 code implementation • ICLR 2022 • Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo

However, it is limited to the case where 1) a good feature is known in advance and 2) this feature is fixed during training; otherwise, RLSVI suffers an unbearable computational burden to obtain the posterior samples of the parameter in the $Q$-value function.

Efficient Exploration • reinforcement-learning +1
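
To illustrate the RLSVI ingredient referenced above: with a fixed, known feature map, posterior samples of the $Q$-value parameters come from a cheap Bayesian linear-regression posterior; if the features must be learned, maintaining this posterior becomes the computational burden noted in the snippet. All quantities below are toy stand-ins.

```python
# Illustrative sketch of RLSVI-style posterior sampling with fixed features.
# All quantities are toy stand-ins, not HyperDQN's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(1000, 16))  # fixed, known features phi(s, a)
targets = Phi @ rng.normal(size=16) + 0.1 * rng.normal(size=1000)  # regression targets

sigma2, lam = 0.1, 1.0  # noise variance, prior precision
A = Phi.T @ Phi / sigma2 + lam * np.eye(16)
cov = np.linalg.inv(A)
mean = cov @ Phi.T @ targets / sigma2

# One posterior sample of the Q-weights drives exploration: act greedily
# w.r.t. the sampled parameters. If the features were learned instead of
# fixed, maintaining this posterior would become the burden noted above.
w_sample = rng.multivariate_normal(mean, cov)
```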
