Search Results for author: Hanyang Zhao

Found 8 papers, 1 paper with code

Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

no code implementations • 3 Feb 2025 • Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with the input prompt, has become a crucial step in building reliable generative AI models.

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

no code implementations • 5 Oct 2024 • Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David D. Yao, Wenpin Tang, Sambit Sahu

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family.

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

no code implementations • 12 Sep 2024 • Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

Reinforcement Learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and has also been explored in recent works for aligning diffusion generative models.

Reinforcement Learning +1

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

no code implementations • 23 May 2024 • Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLMs).

Diversity
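
For reference, a minimal sketch of the standard DPO objective that MallowsPO and the other preference-optimization papers above extend (this is not the papers' code; tensor names are illustrative):

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

        Inputs are per-example sums of token log-probabilities for the preferred
        (chosen) and dispreferred (rejected) responses under the trainable policy
        and the frozen reference model; beta controls the strength of the implicit
        KL-style regularization toward the reference model.
        """
        policy_logratio = policy_chosen_logps - policy_rejected_logps
        ref_logratio = ref_chosen_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

MallowsPO's preference-dispersion idea and the component combinations surveyed in RainbowPO build on this objective; see the papers themselves for their exact formulations.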

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

no code implementations • 12 Feb 2024 • Wenpin Tang, Hanyang Zhao

This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDEs).

Reinforcement Learning
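
For orientation, the SDE formulation such a tutorial covers is the standard score-based one; a minimal sketch, where f is the drift, g the diffusion coefficient, and \nabla_x \log p_t the score:

    dX_t = f(X_t, t)\,dt + g(t)\,dW_t                                                       % forward (noising) SDE
    d\bar{X}_t = \big[ f(\bar{X}_t, t) - g(t)^2 \nabla_x \log p_t(\bar{X}_t) \big]\,dt + g(t)\,d\bar{W}_t   % reverse-time (sampling) SDE

In practice, sampling replaces the unknown score \nabla_x \log p_t with a learned score network s_\theta(x, t).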

Contractive Diffusion Probabilistic Models

no code implementations • 23 Jan 2024 • Wenpin Tang, Hanyang Zhao

In view of possibly unguaranteed score matching, we propose a new criterion -- the contraction of backward sampling -- in the design of DPMs, leading to a novel class of contractive DPMs (CDPMs).
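
As a reading aid rather than the paper's exact definition: in the usual sense, requiring each backward (sampling) map T_t to be a contraction means

    \| T_t(x) - T_t(y) \| \le L_t \, \| x - y \|, \qquad L_t < 1,

so that errors, e.g. from imperfect score matching, shrink rather than accumulate as backward sampling proceeds.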
