Search Results for author: Shuo Xie

Found 3 papers, 1 with code

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization

No code implementations. 5 Apr 2024. Shuo Xie, Zhiyuan Li

Adam with decoupled weight decay, also known as AdamW, is widely acclaimed for its superior performance in language modeling tasks, surpassing Adam with $\ell_2$ regularization in terms of generalization and optimization.
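The contrast between decoupled weight decay and ℓ2 regularization is easiest to see in a single update step. The sketch below is a minimal NumPy illustration under stated assumptions, not the paper's code. It assumes the textbook Adam update with bias correction and PyTorch-style decoupled decay scaled by the learning rate. The function names `adam_l2_step` and `adamw_step` and all default hyperparameters are hypothetical.

```python
import numpy as np

def adam_l2_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    """One step of Adam with l2 regularization (hypothetical helper).

    The penalty gradient wd * x is added to the loss gradient, so it is
    rescaled by the adaptive denominator sqrt(v_hat) like any other gradient.
    """
    g = grad + wd * x                      # coupled: decay enters the gradient
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias corrections (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

def adamw_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW step (hypothetical helper, PyTorch-style decay lr * wd * x).

    Weight decay acts directly on the parameters and bypasses the adaptive
    denominator, i.e. it is decoupled from the gradient-based update.
    """
    m = beta1 * m + (1 - beta1) * grad     # moments see only the loss gradient
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * x)
    return x, m, v

if __name__ == "__main__":
    # Zero loss gradient, so only the weight-decay term moves the parameters.
    x0 = np.array([1.0, 100.0])
    zeros = np.zeros(2)
    x_l2, _, _ = adam_l2_step(x0, zeros, zeros, zeros, 1, lr=0.1, wd=0.01)
    x_w, _, _ = adamw_step(x0, zeros, zeros, zeros, 1, lr=0.1, wd=0.01)
    print(x_l2)  # approximately [0.9, 99.9]: each coordinate drops by about lr
    print(x_w)   # approximately [0.999, 99.9]: each coordinate scaled by 1 - lr * wd
```

In this toy setting the ℓ2 penalty, once normalized by Adam's second-moment estimate, shrinks every coordinate by roughly the same absolute amount. Decoupled decay instead shrinks each coordinate in proportion to its size. This only illustrates the mechanical difference between the two update rules and says nothing on its own about the generalization or optimization gap reported above.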

Language Modelling
