Multi-Critic Actor Learning: Teaching RL Policies to Act with Style

Using a single value function (critic) shared across multiple tasks in Actor-Critic multi-task reinforcement learning (MTRL) can result in negative interference between tasks, which can compromise learning performance. Multi-Critic Actor Learning (MultiCriticAL) instead proposes maintaining a separate value-function estimator, i.e., critic, for each task being trained. This relaxes the assumption of continuity between task values and avoids interference between task-value estimates; explicitly distinguishing between tasks also eliminates the need for the critics to learn to do so. MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL in which agents are trained to behave with multiple distinct behavior styles. It yields up to 45% performance gains over single-critic baselines and successfully learns behavior styles in cases where single-critic approaches may simply fail to learn. As a further test of MultiCriticAL's utility, it is evaluated on a simulation of EA's UFC game, where it enables a single policy function to learn and smoothly transition between multiple fighting styles.
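
To make the core idea concrete, the sketch below shows one possible shape of a multi-critic actor-critic agent: a single policy network conditioned on a style index, paired with a separate critic per style so value estimates for different styles never share parameters. This is a minimal illustration under assumptions of our own, not the authors' implementation; names such as `MultiCriticAgent`, `obs_dim`, and `num_styles` are hypothetical.

```python
# Minimal sketch of a multi-critic actor-critic setup (assumption: PyTorch,
# discrete actions; identifiers here are illustrative, not from the paper).
import torch
import torch.nn as nn

class MultiCriticAgent(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, num_styles: int, hidden: int = 128):
        super().__init__()
        self.num_styles = num_styles
        # One shared policy (actor), conditioned on the active style via a one-hot index.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + num_styles, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # A separate value-function estimator (critic) per style/task, so the
        # value estimates for different tasks cannot interfere with one another.
        self.critics = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
            for _ in range(num_styles)
        ])

    def forward(self, obs: torch.Tensor, style: torch.Tensor):
        # obs: (batch, obs_dim) observations; style: (batch,) integer style indices.
        style_onehot = nn.functional.one_hot(style, self.num_styles).float()
        logits = self.actor(torch.cat([obs, style_onehot], dim=-1))
        # Evaluate all critics, then select the value from each sample's own critic.
        values = torch.stack([c(obs).squeeze(-1) for c in self.critics], dim=-1)
        value = values.gather(-1, style.unsqueeze(-1)).squeeze(-1)
        return logits, value

# Example usage: two samples, each trained against the critic of its own style.
agent = MultiCriticAgent(obs_dim=8, act_dim=4, num_styles=3)
logits, value = agent(torch.randn(2, 8), torch.tensor([0, 2]))
```

In this sketch the per-style `value` would feed the advantage estimate for the corresponding sample during policy updates, while only the selected critic receives a value-loss gradient for that sample.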
