no code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
no code implementations • 16 Feb 2024 • Zhaolin Gao, Kianté Brantley, Thorsten Joachims
In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.
1 code implementation • 28 Oct 2019 • Wan-Yu Lin, Zhaolin Gao, Baochun Li
More specifically, we address the problem of graph-based semi-supervised learning in the presence of severely limited labeled samples, and propose a new framework, called Shoestring, that improves the learning performance through semantic transfer from these very few labeled samples to large numbers of unlabeled samples.