no code implementations • 28 Aug 2022 • Ju-Seung Byun, Andrew Perrault
Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean.
1 code implementation • ICLR 2022 • Ju-Seung Byun, Andrew Perrault
We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy.
no code implementations • 20 Oct 2020 • Ju-Seung Byun, Byungmoon Kim, Huamin Wang
In this paper, we propose a new algorithm, PPG (Proximal Policy Gradient), which is close to both VPG (vanilla policy gradient) and PPO (proximal policy optimization).