no code implementations • 14 Oct 2024 • Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan
We find that sharpness-aware minimization (SAM) efficiently selects flatter minima late in training.
no code implementations • 7 Oct 2024 • Bingrui Li, Wei Huang, Andi Han, Zhanpeng Zhou, Taiji Suzuki, Jun Zhu, Jianfei Chen
We also show that Adam behaves similarly to SignGD in terms of both optimization and generalization in this setting.