Search Results for author: Huizhuo Yuan

Found 14 papers, 5 papers with code

Policy Optimization via Stochastic Recursive Gradient Algorithm

no code implementations • ICLR 2019 • Huizhuo Yuan, Chris Junchi Li, Yuhao Tang, Yuren Zhou

In this paper, we propose the StochAstic Recursive grAdient Policy Optimization (SARAPO) algorithm, a novel variance-reduction method built on Trust Region Policy Optimization (TRPO).
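SARAPO builds on a SARAH-type recursive gradient estimator. Below is a minimal sketch of that recursion on a toy quadratic; the TRPO trust-region machinery and policy parameterization are omitted, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective f(x) = 0.5 * mean_i ||x - a_i||^2, so the stochastic
# gradient on a minibatch B is simply x - mean(B).
A = rng.normal(size=(1000, 5))

def stoch_grad(x, batch):
    return x - batch.mean(axis=0)

x = np.zeros(5)
v = stoch_grad(x, A)          # anchor: full-batch gradient
lr = 0.1

for t in range(200):
    x_prev, x = x, x - lr * v
    batch = A[rng.choice(len(A), size=32)]
    # SARAH recursion: correct the running estimate with the gradient
    # difference measured on the *same* minibatch at both iterates.
    v = stoch_grad(x, batch) - stoch_grad(x_prev, batch) + v
```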

Tensor Product Attention Is All You Need

1 code implementation • 11 Jan 2025 • Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference.

Language Modeling
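To make the KV-cache overhead concrete, here is a back-of-envelope estimate for a hypothetical 32-layer fp16 decoder; the figures are illustrative, not taken from the paper:

```python
# Illustrative KV-cache sizing for a hypothetical decoder (fp16).
layers, heads, head_dim = 32, 32, 128
seq_len, batch, bytes_per = 32_768, 1, 2   # fp16 = 2 bytes

# K and V are both cached, per layer, per head, per position.
kv_bytes = 2 * layers * heads * head_dim * seq_len * batch * bytes_per
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 16.0 GiB at 32k tokens
```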

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

no code implementations • 27 Dec 2024 • Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees.
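For reference, a standard Adam step is sketched below; the paper's parameter-free variants keep this structure but set the step size adaptively rather than through a hand-tuned `lr` (their exact schedule is not reproduced here):

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam step; Adam++ keeps this shape but chooses
    the step size adaptively instead of via a hand-tuned `lr`."""
    m = b1 * m + (1 - b1) * g            # first-moment EMA
    v = b2 * v + (1 - b2) * g * g        # second-moment EMA
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```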

MARS: Unleashing the Power of Variance Reduction for Training Large Models

2 code implementations • 15 Nov 2024 • Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu

Despite a decade of variance reduction algorithms aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models.

Stochastic Optimization
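A hedged sketch of the scaled gradient-correction idea behind variance-reduced momentum methods of this kind; the coefficients and clipping below are illustrative, and the paper's full preconditioned updates are not reproduced:

```python
import numpy as np

def mars_style_correction(g_t, g_prev, beta=0.95, gamma=0.025):
    """Variance-reduced gradient c_t = g_t + scale * (g_t - g_prev),
    where both gradients are evaluated on the *same* minibatch at
    consecutive iterates (a STORM/SARAH-flavored correction)."""
    c = g_t + gamma * (beta / (1 - beta)) * (g_t - g_prev)
    norm = np.linalg.norm(c)
    if norm > 1.0:            # clip to keep the correction stable
        c = c / norm
    return c                  # then fed into an Adam-style update
```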

Accelerated Preference Optimization for Large Language Model Alignment

no code implementations • 8 Oct 2024 • Jiafan He, Huizhuo Yuan, Quanquan Gu

Theoretically, we demonstrate that APO can achieve a faster convergence rate than the standard iterative preference optimization methods, including DPO and Self-Play Preference Optimization (SPPO).

Language Modeling +1
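A hedged sketch of the acceleration idea: momentum-style extrapolation of model parameters between preference-optimization rounds. The coefficient and function name are illustrative, not the paper's exact scheme:

```python
def apo_extrapolate(theta_t, theta_prev, alpha=0.3):
    """Extrapolate between preference-optimization rounds:
    theta_{t+1} starts from theta_t pushed along the most recent
    round-to-round direction. alpha is an illustrative coefficient;
    theta_* are flat parameter vectors (e.g., numpy arrays)."""
    return theta_t + alpha * (theta_t - theta_prev)
```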

Self-Play Preference Optimization for Language Model Alignment

1 code implementation • 1 May 2024 • Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu

In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy.

Language Modeling +1
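A hedged sketch of an SPPO-style square loss, which pushes the log density ratio toward a target derived from the estimated win probability; `eta` and the argument names are illustrative:

```python
import torch

def sppo_loss(logp_theta, logp_ref, win_prob, eta=1e3):
    """SPPO-style square loss (hedged sketch): drive the log density
    ratio toward eta * (win probability - 1/2), approximating a
    multiplicative-weights update toward the Nash equilibrium.

    logp_theta, logp_ref: summed token log-probs of a response y
    win_prob: estimated P(y beats the current policy), in [0, 1]
    """
    target = eta * (win_prob - 0.5)
    return ((logp_theta - logp_ref) - target).pow(2).mean()
```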

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

1 code implementation • 21 Mar 2024 • Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes.

Diversity

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

no code implementations • 15 Feb 2024 • Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs).

Reinforcement Learning (RL) Text-to-Image Generation

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

2 code implementations • 2 Jan 2024 • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu

In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data.
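A hedged sketch of the self-play objective: a DPO-shaped logistic loss in which the "rejected" response is the model's own generation from the previous round, so no new human annotations are needed. Names and `lam` are illustrative:

```python
import torch
import torch.nn.functional as F

def spin_style_loss(logp_human, logp_synth, ref_human, ref_synth, lam=0.1):
    """SPIN-style objective (hedged sketch): prefer the human response
    over the model's own previous-round generation, measured by log
    density ratios against a frozen reference model."""
    margin = (logp_human - ref_human) - (logp_synth - ref_synth)
    return -F.logsigmoid(lam * margin).mean()
```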

Stochastic Recursive Momentum for Policy Gradient Methods

no code implementations • 9 Mar 2020 • Huizhuo Yuan, Xiangru Lian, Ji Liu, Yuren Zhou

In this paper, we propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG), which maintains a SARAH-type stochastic recursive variance-reduced policy gradient estimator in an exponential moving average fashion.

Policy Gradient Methods
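The STORM recursion described in the abstract, as a minimal sketch; the importance-sampling correction that a policy-gradient version needs for the shifted trajectory distribution is omitted here:

```python
def storm_update(d_prev, g_t, g_prev_same_batch, a=0.1):
    """STORM-style recursive momentum (sketch):
    d_t = g_t + (1 - a) * (d_prev - g_prev_same_batch),
    i.e. an exponential moving average of the gradient plus a
    SARAH-type correction term, with both gradients evaluated on
    the same minibatch/trajectories at consecutive iterates."""
    return g_t + (1.0 - a) * (d_prev - g_prev_same_batch)
```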

Stochastic Modified Equations for Continuous Limit of Stochastic ADMM

no code implementations • 7 Mar 2020 • Xiang Zhou, Huizhuo Yuan, Chris Junchi Li, Qingyun Sun

In this work, we put different variants of stochastic ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach.
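As an illustration of the continuous-time modeling approach (shown here for a generic noisy first-order iteration rather than the paper's ADMM-specific equations), an Euler–Maruyama simulation of a stochastic modified equation:

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_maruyama(drift, sigma, x0, eta, n_steps):
    """Simulate dX = drift(X) dt + sqrt(eta) * sigma dW, the kind of
    stochastic modified equation used as a continuous-time model of
    a noisy iteration with step size eta (time step dt = eta)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(eta), size=x.shape)  # dW ~ N(0, dt)
        x = x + eta * drift(x) + np.sqrt(eta) * sigma * dw
    return x

# Example: SDE model of SGD on f(x) = 0.5 * ||x||^2, so drift = -x.
x_T = euler_maruyama(lambda x: -x, sigma=0.5, x0=[1.0, 1.0],
                     eta=0.01, n_steps=1000)
```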

Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization

no code implementations • 31 Dec 2019 • Huizhuo Yuan, Xiangru Lian, Ji Liu

This complexity is the best known among IFO complexity results for non-convex stochastic compositional optimization and is believed to be optimal.

Management Reinforcement Learning +1
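A hedged sketch of why the compositional setting is harder than ordinary stochastic optimization: the chain-rule gradient of F(x) = f(g(x)) becomes biased once the inner value g(x) is estimated from a minibatch, which is what recursive variance-reduced tracking of g(x) addresses. All function names below are placeholders supplied by the caller:

```python
def compositional_grad(x, g_fn, g_jac, f_grad, inner_batch, outer_batch):
    """Stochastic gradient of F(x) = f(g(x)) (sketch): the chain rule
    gives J_g(x)^T @ grad_f(g(x)). Plugging a minibatch estimate of
    g(x) into the nonlinear f makes the estimator biased, unlike in
    standard stochastic optimization.

    g_fn(x, batch)  -> estimate of g(x), shape (m,)
    g_jac(x, batch) -> estimate of the Jacobian J_g(x), shape (m, d)
    f_grad(y)       -> gradient of f at y, shape (m,)
    """
    g_est = g_fn(x, inner_batch)          # noisy inner estimate of g(x)
    return g_jac(x, outer_batch).T @ f_grad(g_est)
```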
