no code implementations • 26 Feb 2024 • Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani
It is natural to frame this task as a reinforcement learning (RL) problem, in which the objective is to fine-tune a diffusion model to maximize a reward function that corresponds to some property of interest.
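The reward-maximization idea can be cartooned without any diffusion machinery: treat sampling from a parameterized distribution as a one-step policy and push its parameters up the reward with REINFORCE. This is only an illustrative sketch of reward fine-tuning, not the paper's algorithm; the Gaussian sampler, reward, and hyperparameters here are all made up.

```python
import numpy as np

def reinforce_finetune(reward, mu=0.0, sigma=1.0, lr=0.05, steps=2000,
                       batch=64, seed=0):
    """Toy REINFORCE loop: sampling x ~ N(mu, sigma^2) plays the role of the
    generative model, and mu is adjusted to maximize the expected reward."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        x = rng.normal(mu, sigma, size=batch)  # draw a batch of samples
        r = reward(x)
        r = r - r.mean()                       # baseline to reduce variance
        # grad of log N(x; mu, sigma) w.r.t. mu is (x - mu) / sigma^2
        mu += lr * np.mean(r * (x - mu) / sigma**2)
    return mu

# Reward peaks at x = 2, so the fine-tuned mean should drift toward 2.
tuned = reinforce_finetune(lambda x: -(x - 2.0) ** 2)
```

In the actual setting the "policy" is the full chain of denoising steps, but the gradient estimator has the same score-function shape.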
no code implementations • 23 Feb 2024 • Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
no code implementations • 20 Nov 2023 • Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee
We study CVaR RL in low-rank MDPs with nonlinear function approximation.
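For readers unfamiliar with the objective: CVaR at level $\alpha$ is the expected return over the worst $\alpha$-fraction of outcomes, which a risk-sensitive agent maximizes instead of the plain mean. A minimal empirical estimator (my own sketch, not code from the paper):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical conditional value-at-risk: the average of the worst
    alpha-fraction of the observed returns."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return returns[:k].mean()

# The worst 25% of [1, 2, 3, 4] is just [1], so CVaR_0.25 = 1.0.
print(cvar([1, 2, 3, 4], alpha=0.25))  # → 1.0
```

As $\alpha \to 1$ this recovers the ordinary mean, so risk-neutral RL is the limiting special case.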
1 code implementation • 8 May 2023 • Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee
Motivated by this observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
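The "vanilla PPO" update each agent applies is the standard clipped surrogate objective. A minimal sketch of that loss (assumed standard form, not the paper's multi-agent implementation):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate (to be maximized): ratio is the new/old policy
    probability ratio and advantage the local advantage estimate; each
    agent applies this to its own policy."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Taking the minimum removes the incentive to move the ratio past 1±eps.
    return np.minimum(unclipped, clipped).mean()

# A ratio of 1.5 with positive advantage is clipped at 1 + eps = 1.2:
print(ppo_clip_loss([1.5], [1.0]))  # → 1.2
```

In the multi-agent setting the subtlety lies in which advantage estimate each agent plugs in, not in the surrogate itself.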
no code implementations • 7 Sep 2022 • Yulai Zhao, Jianshu Chen, Simon S. Du
Here, $n$ is the number of pre-training samples and $m$ is the number of samples in the downstream task, and typically $n \gg m$.
no code implementations • 2 Sep 2022 • Yulai Zhao
The core difficulty of using the performative risk as an optimization objective is that the data distribution itself depends on the model parameters.
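That parameter-dependent distribution is easiest to see in a toy fixed-point iteration: deploying a model shifts the data, and retraining on the shifted data moves the model again. The sketch below is a made-up linear example (the constants `a`, `b` are illustrative, not from the paper):

```python
def repeated_risk_minimization(a=1.0, b=0.5, theta=0.0, iters=50):
    """Toy performative prediction: deploying theta shifts the data mean to
    a + b * theta, and retraining sets theta to that new mean.  For |b| < 1
    the iteration contracts to the stable point a / (1 - b)."""
    for _ in range(iters):
        theta = a + b * theta  # retrain on the distribution theta induced
    return theta

print(repeated_risk_minimization())  # converges to 1 / (1 - 0.5) = 2.0
```

The stable point generally differs from the optimum of the performative risk itself, which is exactly why the objective is hard to optimize directly.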
no code implementations • 17 Feb 2021 • Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du
Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces.