no code implementations • 13 Mar 2025 • Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, ZiYu Huang, David D. Yao, Wenpin Tang
We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
no code implementations • 3 Feb 2025 • Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang
Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with the input prompt, has become a crucial step in building reliable generative AI models.
no code implementations • 5 Oct 2024 • Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David D. Yao, Wenpin Tang, Sambit Sahu
Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family.
no code implementations • 17 Sep 2024 • Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
Preference tuning is a crucial process for aligning deep generative models with human preferences.
no code implementations • 12 Sep 2024 • Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang
Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and recent works have also explored it for aligning diffusion generative models.
no code implementations • 26 Jul 2022 • Wenpin Tang, David D. Yao
In particular, we show that when a participant is risk-neutral or risk-seeking, corresponding to the risk-adjusted valuation being a martingale or a sub-martingale respectively, the optimal strategy is either to buy all the time, sell all the time, or first buy and then sell, with both buying and selling executed at full capacity.
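For reference, the martingale and sub-martingale conditions invoked here are the standard ones; a brief sketch in the usual notation, where $V_t$ denotes the risk-adjusted valuation and $\mathcal{F}_t$ the information available at time $t$ (symbols chosen for illustration, not the paper's):

```latex
% Standard definitions; V_t and \mathcal{F}_t are illustrative notation.
\mathbb{E}[V_{t+1} \mid \mathcal{F}_t] = V_t
  \quad \text{(martingale, risk-neutral participant)}
\mathbb{E}[V_{t+1} \mid \mathcal{F}_t] \ge V_t
  \quad \text{(sub-martingale, risk-seeking participant)}
```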