Search Results for author: Yushun Zhang

Found 10 papers, 4 papers with code

Adam-mini: Use Fewer Learning Rates To Gain More

1 code implementation • 24 Jun 2024 • Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

Adam-mini reduces memory by cutting down the learning-rate resources in Adam (i.e., $1/\sqrt{v}$).
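To illustrate the idea, here is a minimal sketch of a block-wise Adam-style update in PyTorch: one second-moment scalar per parameter block instead of one per parameter. This is a simplification for illustration, not the paper's partitioning scheme or implementation.

```python
import torch

def adam_mini_style_update(param, grad, state, lr=1e-3,
                           beta1=0.9, beta2=0.999, eps=1e-8):
    # Sketch only: Adam keeps one second-moment entry v (hence one
    # effective learning rate 1/sqrt(v)) per parameter; the memory
    # saving here comes from keeping a single scalar v per block.
    # Bias correction is omitted for brevity.
    state.setdefault("m", torch.zeros_like(grad))
    state.setdefault("v", 0.0)  # one scalar per block, not per element

    state["m"].mul_(beta1).add_(grad, alpha=1 - beta1)
    # Block-wise mean of the squared gradient replaces element-wise v.
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad.pow(2).mean().item()

    with torch.no_grad():
        param.add_(state["m"], alpha=-lr / (state["v"] ** 0.5 + eps))
```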

Why Transformers Need Adam: A Hessian Perspective

2 code implementations • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks

no code implementations • 28 Nov 2023 • Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang

In this paper, to improve the communication efficiency of federated learning in complex networks, we study communication-efficiency optimization of federated learning for the computing and network convergence of 6G networks: methods that decide how the training process should proceed under different network conditions and under the computing power of the devices participating in federated learning.

Federated Learning

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

2 code implementations • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo

ReMax saves about 46% of GPU memory compared with PPO when training a 7B model, and enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO requires.

General Reinforcement Learning · reinforcement-learning
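The memory saving comes from ReMax's REINFORCE-style update with a greedy-decoding baseline, which removes PPO's value model. A minimal sketch with hypothetical `policy`/`reward_fn` interfaces (not the authors' code):

```python
import torch

def remax_loss(policy, reward_fn, prompt):
    # Sample a completion and score it with the reward model.
    sample, log_prob = policy.sample(prompt)   # tokens + summed log-probs
    reward = reward_fn(prompt, sample)

    # The baseline is the reward of the greedy completion, so no value
    # network is trained -- the source of the memory savings vs. PPO.
    with torch.no_grad():
        greedy = policy.greedy(prompt)
        baseline = reward_fn(prompt, greedy)

    advantage = reward - baseline
    return -advantage * log_prob               # minimizing this ascends the policy gradient
```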

Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs

no code implementations • 24 Aug 2023 • Hua Wang, Yuqiong Wu, Yushun Zhang, Fuqiang Lai, Zhou Feng, Bing Xie, Ailin Zhao

Using SHAP, an explainable machine learning method, we calculate the importance of each input log to the predicted results, as well as the coupling relationships among input logs.

Ensemble Learning
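A minimal, self-contained sketch of this SHAP workflow on synthetic stand-in data (the real well logs and model are not reproduced here):

```python
import numpy as np
import shap
import xgboost

# Synthetic stand-in for well-log features and a sonic-slowness target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.Explainer(model)   # unified SHAP API
shap_values = explainer(X)          # per-sample, per-feature attributions

# Global importance of each input = mean |SHAP value| over samples.
print(np.abs(shap_values.values).mean(axis=0))
```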

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

no code implementations • NeurIPS 2021 • Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
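For reference, the textbook KKT conditions for $\min_x f(x)$ subject to $g_i(x) \le 0$ (generic notation, not the paper's; the paper's feasible region is its "nice local region"): a point $x^*$ with multipliers $\lambda_i$ is a KKT point if

$\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) = 0, \quad \lambda_i \ge 0, \quad \lambda_i g_i(x^*) = 0, \quad g_i(x^*) \le 0.$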

Provable Adaptivity of Adam under Non-uniform Smoothness

no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Tie-Yan Liu, Zhi-Quan Luo, Wei Chen

We present the first convergence analysis of randomly reshuffled Adam (RR Adam) without the bounded-smoothness assumption.

Attribute

Adam Can Converge Without Any Modification On Update Rules

no code implementations • 20 Aug 2022 • Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.
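For reference, the standard (textbook) Adam update with hyperparameters $(\beta_1, \beta_2)$, bias correction omitted for brevity:

$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$
$\theta_t = \theta_{t-1} - \alpha \, m_t / (\sqrt{v_t} + \epsilon)$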

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

1 code implementation • ICLR 2022 • Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo

However, it is limited to the case where 1) a good feature is known in advance and 2) this feature is fixed during training; otherwise, RLSVI suffers an unbearable computational burden to obtain posterior samples of the parameters of the $Q$-value function.

Deep Reinforcement Learning · Efficient Exploration · +2
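For context, a minimal sketch of RLSVI-style randomized exploration with a fixed linear feature map (illustrative names; this is not HyperDQN itself, which sidesteps this costly posterior):

```python
import numpy as np

def rlsvi_style_action(phi, mean, cov, rng):
    # phi: (num_actions, d) features of the current state under a *fixed*
    # feature map; (mean, cov) parameterize a Gaussian posterior over the
    # linear Q-head weights. When the feature map is learned and changes
    # during training, maintaining this posterior becomes the burden the
    # snippet above describes.
    w = rng.multivariate_normal(mean, cov)  # one posterior sample
    return int(np.argmax(phi @ w))          # act greedily under the sample
```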
