no code implementations • ICLR 2019 • Huizhuo Yuan, Chris Junchi Li, Yuhao Tang, Yuren Zhou
In this paper, we propose the StochAstic Recursive grAdient Policy Optimization (SARAPO) algorithm, a novel variance-reduction method built on Trust Region Policy Optimization (TRPO).
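The "stochastic recursive gradient" in the name appears to refer to the SARAH estimator (Nguyen et al., 2017); as a reference point, not taken from this paper (notation mine), the SARAH recursion updates a gradient estimate as

    v_t = \nabla f(\theta_t; \xi_t) - \nabla f(\theta_{t-1}; \xi_t) + v_{t-1},

reusing the same sample \xi_t at two consecutive iterates so that the estimator's variance shrinks as the iterates stabilize.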
1 code implementation • 11 Jan 2025 • Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference.
no code implementations • 27 Dec 2024 • Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu
In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees.
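For context, here is a minimal sketch of the standard AdaGrad step that AdaGrad++ builds on; the function name and defaults are mine, and eta is the hand-tuned base step size that a parameter-free variant aims to eliminate.

    import numpy as np

    def adagrad_step(theta, grad, accum, eta=0.01, eps=1e-8):
        # Accumulate per-coordinate squared gradients.
        accum = accum + grad ** 2
        # eta is the base step size a parameter-free variant removes;
        # this is plain AdaGrad, not the paper's AdaGrad++.
        theta = theta - eta * grad / (np.sqrt(accum) + eps)
        return theta, accum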
2 code implementations • 15 Nov 2024 • Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu
Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models.
no code implementations • 8 Oct 2024 • Jiafan He, Huizhuo Yuan, Quanquan Gu
Theoretically, we demonstrate that APO can achieve a faster convergence rate than the standard iterative preference optimization methods, including DPO and Self-Play Preference Optimization (SPPO).
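For reference, the DPO baseline mentioned here minimizes a logistic loss on scaled log-likelihood ratios between the chosen response y_w and the rejected response y_l (standard formulation, not specific to this paper):

    \mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\Big[\log \sigma\Big(\beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big].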
1 code implementation • 1 May 2024 • Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu
In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy.
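The constant-sum game can be written as a minimax problem over policies whose value is the preference probability of one response beating the other (a standard formalization consistent with the abstract; notation mine):

    \max_{\pi} \min_{\pi'} \; \mathbb{E}_{x}\, \mathbb{E}_{y \sim \pi(\cdot \mid x),\; y' \sim \pi'(\cdot \mid x)} \big[\mathbb{P}(y \succ y' \mid x)\big],

and the Nash equilibrium of this game is the target policy.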
1 code implementation • 21 Mar 2024 • Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes.
no code implementations • 15 Feb 2024 • Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs).
2 code implementations • 2 Jan 2024 • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data.
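The self-play mechanism (SPIN) has the current model generate responses to prompts from the existing supervised dataset, then trains the next iterate to prefer the human response y over the self-generated response y'. One common instantiation with a logistic loss, sketched from the paper's setup rather than quoted exactly (lambda and notation mine):

    \mathcal{L}_t(\theta) = -\,\mathbb{E}_{x,\, y \sim p_{\mathrm{data}},\; y' \sim \pi_{\theta_t}(\cdot \mid x)}\Big[\log \sigma\Big(\lambda \log \tfrac{\pi_\theta(y \mid x)}{\pi_{\theta_t}(y \mid x)} - \lambda \log \tfrac{\pi_\theta(y' \mid x)}{\pi_{\theta_t}(y' \mid x)}\Big)\Big].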
no code implementations • 14 Dec 2023 • Zixiang Chen, Huizhuo Yuan, YongQian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu
Discrete diffusion models have emerged as powerful tools for high-quality data generation.
no code implementations • 9 Mar 2020 • Huizhuo Yuan, Xiangru Lian, Ji Liu, Yuren Zhou
In this paper, we propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG), which maintains a SARAH-type stochastic recursive variance-reduced policy-gradient estimator in an exponential-moving-average fashion.
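A minimal sketch of the STORM-style recursive momentum update the name points to, in its generic stochastic-optimization form (Cutkosky & Orabona, 2019) rather than the paper's policy-gradient specifics:

    import numpy as np  # gradients passed as NumPy arrays

    def storm_update(d_prev, grad_new, grad_prev_point, a=0.1):
        # d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t)),
        # where both gradients are evaluated on the SAME fresh sample xi_t.
        # a = 1 recovers plain SGD; a = 0 recovers the SARAH recursion.
        return grad_new + (1.0 - a) * (d_prev - grad_prev_point)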
no code implementations • 7 Mar 2020 • Xiang Zhou, Huizhuo Yuan, Chris Junchi Li, Qingyun Sun
In this work, we put different variants of stochastic ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach.
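For reference, the ADMM template these variants instantiate, in scaled form with over-relaxation parameter \alpha (standard formulation, notation mine): for \min f(x) + g(z) subject to Ax + Bz = c,

    x_{k+1} = \arg\min_x \; f(x) + \tfrac{\rho}{2}\|Ax + Bz_k - c + u_k\|^2,
    \hat{h}_{k+1} = \alpha A x_{k+1} - (1 - \alpha)(B z_k - c),
    z_{k+1} = \arg\min_z \; g(z) + \tfrac{\rho}{2}\|\hat{h}_{k+1} + Bz - c + u_k\|^2,
    u_{k+1} = u_k + \hat{h}_{k+1} + B z_{k+1} - c;

the linearized and gradient-based variants replace the exact x-minimization with a linearized or single-gradient step.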
no code implementations • 31 Dec 2019 • Huizhuo Yuan, Xiangru Lian, Ji Liu
This complexity is the best known among IFO complexity results for non-convex stochastic compositional optimization and is believed to be optimal.
no code implementations • NeurIPS 2019 • Huizhuo Yuan, Xiangru Lian, Chris Junchi Li, Ji Liu, Wenqing Hu
Stochastic compositional optimization arises in many important machine learning tasks such as reinforcement learning and portfolio management.
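Concretely, the problem class is a nested expectation (standard formulation, notation mine)

    \min_x \; F(x) = f\big(g(x)\big) = \mathbb{E}_v\Big[f_v\big(\mathbb{E}_w[g_w(x)]\big)\Big],

where the inner expectation cannot be sampled unbiasedly inside f_v, which is what separates this setting from ordinary stochastic optimization.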