1 code implementation • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
no code implementations • 23 Oct 2023 • Tao Sun, Congliang Chen, Peng Qiao, Li Shen, Xinwang Liu, Dongsheng Li
Sign-based stochastic methods have gained attention due to their ability to achieve robust performance despite using only the sign information for parameter updates.
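As a minimal sketch of what such an update looks like (illustrative only, not the specific algorithm analyzed in the paper): only the sign of each stochastic gradient coordinate enters the step, its magnitude is discarded.

import numpy as np

def sign_sgd_step(params, stoch_grad, lr=0.01):
    # Sign-based step: keep only coordinate-wise signs of the stochastic gradient.
    return params - lr * np.sign(stoch_grad)

# Toy usage on f(x) = 0.5 * ||x||^2 with a noisy gradient estimate.
x = np.array([2.0, -1.5, 0.3])
for _ in range(50):
    g = x + 0.1 * np.random.randn(3)   # stochastic gradient of the quadratic
    x = sign_sgd_step(x, g)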
no code implementations • 20 Aug 2022 • Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.
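The order of quantifiers is the crux; a rough sketch of the two settings (our paraphrase, with the usual condition $\beta_1 < \sqrt{\beta_2}$ assumed for the divergence result, not a formal statement from either paper): the theory of Reddi et al. (2018) says that for every $(\beta_1, \beta_2)$ with $\beta_1 < \sqrt{\beta_2}$ there exists a problem $f$ on which Adam diverges, whereas in practice one fixes $f$ first and then asks for which region of $(\beta_1, \beta_2)$ Adam converges on that $f$.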
no code implementations • 28 May 2022 • Congliang Chen, Li Shen, Wei Liu, Zhi-Quan Luo
Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models.
no code implementations • 14 Jan 2021 • Congliang Chen, Li Shen, Fangyu Zou, Wei Liu
Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to diverge even in simple convex settings via a few simple counterexamples.
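One counterexample of this flavor (a sketch in the spirit of Reddi et al., 2018; the exact constants and hyperparameter conditions are omitted here) is the online convex problem
$$f_t(x) = \begin{cases} C\,x, & t \bmod 3 = 1,\\ -x, & \text{otherwise}, \end{cases} \qquad x \in [-1, 1],\ C > 2,$$
whose long-run optimum is $x = -1$, yet Adam with suitable constant $(\beta_1, \beta_2)$ can be driven toward $x = +1$.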
no code implementations • 29 Apr 2020 • Congliang Chen, Li Shen, Hao-Zhi Huang, Wei Liu
In this paper, we present a distributed variant of an adaptive stochastic gradient method for training deep neural networks in the parameter-server model.
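A minimal simulation of the parameter-server pattern (our sketch, with an Adam-style server-side update; the worker count, loss, and constants are illustrative, not the specific method or proof setting of the paper):

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adaptive (Adam-style) update applied by the server to the shared parameters.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
target = np.array([1.0, 2.0, 3.0, 4.0])
for t in range(1, 201):
    # Each "worker" computes a noisy gradient of 0.5 * ||w - target||^2 on its data shard.
    worker_grads = [(w - target) + 0.1 * np.random.randn(4) for _ in range(8)]
    g = np.mean(worker_grads, axis=0)        # server aggregates the workers' gradients
    w, m, v = adam_step(w, g, m, v, t)       # server updates and broadcasts the parameters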
no code implementations • 10 Aug 2018 • Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu
Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc.
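A hedged sketch of the shared template behind this class (a momentum buffer combined with a coordinate-wise adaptive step size; the function name, modes, and constants are illustrative, not the paper's exact scheme):

import numpy as np

def adaptive_momentum_step(w, g, m, acc, mode="rmsprop",
                           lr=0.01, beta=0.9, rho=0.99, eps=1e-8):
    # Momentum buffer shared by the whole family of methods.
    m = beta * m + g
    # The second-moment accumulator is what distinguishes the members:
    if mode == "adagrad":
        acc = acc + g ** 2                       # AdaGrad: sum of all squared gradients
    else:
        acc = rho * acc + (1 - rho) * g ** 2     # RMSProp/Adam-style moving average
    return w - lr * m / (np.sqrt(acc) + eps), m, acc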
1 code implementation • CVPR 2018 • Shuyang Gu, Congliang Chen, Jing Liao, Lu Yuan
We theoretically prove that our new reshuffle-based style loss connects the global and local style losses used, respectively, by most parametric and non-parametric neural style transfer methods.