no code implementations • 8 Nov 2022 • Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabás Póczos
Despite their popularity in deep learning and machine learning in general, the theoretical properties of adaptive optimizers such as Adagrad, RMSProp, Adam or AdamW are not yet fully understood.
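For reference, a minimal NumPy sketch of the standard Adam update (Kingma & Ba, 2015) is given below; the per-coordinate scaling by the second-moment estimate is what makes these optimizers "adaptive". Variable names are illustrative and not taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update.

    theta: parameters; grad: gradient at theta;
    m, v: running first/second moment estimates; t: step count (1-indexed).
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for initialization at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return theta, m, v
```

AdamW modifies this update only by applying weight decay directly to the parameters, decoupled from the gradient-based step.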
no code implementations • 20 Feb 2020 • Ilqar Ramazanli, Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabás Póczos
It often leads to a dependence of the convergence rate on the maximum Lipschitz constant of the gradients across the devices.
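To illustrate why this dependence matters (a hedged sketch of typical bound shapes, not the paper's exact statement): with n devices whose local gradients are L_i-Lipschitz, many distributed analyses force the step size to scale with the worst-case smoothness, so the rate is governed by the maximum rather than the average constant:

```latex
% Illustrative bound shapes only; constants and stochastic terms omitted.
\[
  \mathbb{E}\bigl[\|\nabla f(x_T)\|^2\bigr]
  \;\lesssim\; \frac{\max_{1 \le i \le n} L_i}{T}
  \qquad\text{vs.}\qquad
  \frac{\frac{1}{n}\sum_{i=1}^{n} L_i}{T}.
\]
```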
no code implementations • 25 Sep 2019 • Tung-Long Vuong, Han Nguyen, Hai Pham, Kenneth Tran
Under this framework, the objective function can be represented end-to-end as a single computational graph, which allows seamless policy-gradient computation via backpropagation through the models.
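As a concrete illustration, here is a minimal PyTorch sketch of this idea: a policy is unrolled through learned, differentiable dynamics and reward models so that the return forms a single computational graph and the policy gradient arrives via ordinary backpropagation. All module shapes and names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 8, 2, 10

# Illustrative networks; the paper's actual models may differ.
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, state_dim))
reward = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))

opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def rollout_return(s0):
    """Unroll policy + learned models; the return is one differentiable graph."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        sa = torch.cat([s, a], dim=-1)
        total = total + reward(sa).sum()
        s = dynamics(sa)             # gradients flow back through the model
    return total

s0 = torch.randn(32, state_dim)      # batch of initial states
loss = -rollout_return(s0)           # maximize return = minimize its negative
opt.zero_grad()
loss.backward()                      # policy gradient via backprop through the models
opt.step()
```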