Multi-Critic Actor Learning: Teaching RL Policies to Act with Style
Using a single value function (critic) shared across multiple tasks in Actor-Critic multi-task reinforcement learning (MTRL) can cause negative interference between tasks, which can compromise learning performance. Multi-Critic Actor Learning (MultiCriticAL) instead maintains a separate value-function estimator, i.e. a critic, for each task being trained. This relaxes the assumption of continuity between task values and avoids interference between task-value estimates. Explicitly distinguishing between tasks also removes the need for the critics to learn to do so. MultiCriticAL is evaluated in the context of multi-style learning, a special case of MTRL in which agents are trained to behave with distinct behavior styles. It yields performance gains of up to 45% over single-critic baselines and successfully learns behavior styles even in cases where single-critic approaches may fail to learn at all. As a further test of its utility, MultiCriticAL is applied to a simulation of EA’s UFC game, where our method enables a single policy function to learn and smoothly transition between multiple fighting styles.
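As a rough illustration of the per-task critic idea, the following minimal sketch keeps one linear TD(0) value estimator per task and compares it with a single shared estimator on a toy problem. Everything here is an assumption made for illustration and is not taken from the paper: the class names `LinearCritic` and `MultiCritic`, the function `run_demo`, the linear value model, the learning rate, and the two-task toy rewards. The shared critic in this toy receives no task input at all, which shows interference in its starkest form; a real single-critic baseline would typically be conditioned on the task. The actor, which remains a single policy in MultiCriticAL, is omitted.

```python
import random


class LinearCritic:
    """Linear state-value estimator V(s) = w . s, trained with TD(0)."""

    def __init__(self, state_dim, lr=0.1):
        self.w = [0.0] * state_dim
        self.lr = lr

    def value(self, state):
        return sum(wi * si for wi, si in zip(self.w, state))

    def td_update(self, state, reward, next_state, done, gamma=0.99):
        # Bootstrapped target; terminal transitions use the reward alone.
        target = reward + (0.0 if done else gamma * self.value(next_state))
        td_error = target - self.value(state)
        self.w = [wi + self.lr * td_error * si
                  for wi, si in zip(self.w, state)]
        return td_error


class MultiCritic:
    """One independent critic per task (hypothetical helper, not the paper's code)."""

    def __init__(self, num_tasks, state_dim, lr=0.1):
        self.critics = [LinearCritic(state_dim, lr) for _ in range(num_tasks)]

    def value(self, task_id, state):
        return self.critics[task_id].value(state)

    def td_update(self, task_id, state, reward, next_state, done, gamma=0.99):
        # Only the critic that belongs to this task is updated.
        return self.critics[task_id].td_update(
            state, reward, next_state, done, gamma)


def run_demo(steps=200, seed=0):
    """Toy setup (assumed): the same state is worth +1 in task 0 and -1 in task 1."""
    rng = random.Random(seed)
    state = [1.0, 0.0]
    rewards = {0: 1.0, 1: -1.0}
    multi = MultiCritic(num_tasks=2, state_dim=2)
    shared = LinearCritic(state_dim=2)  # single critic with no task input
    for _ in range(steps):
        task_id = rng.randrange(2)
        reward = rewards[task_id]
        multi.td_update(task_id, state, reward, state, done=True)
        shared.td_update(state, reward, state, done=True)
    return multi.value(0, state), multi.value(1, state), shared.value(state)


if __name__ == "__main__":
    v_task0, v_task1, v_shared = run_demo()
    print("per-task critics:", v_task0, v_task1)
    print("shared critic:", v_shared)
```

In this toy setting each per-task critic moves toward its own task's return, while the shared estimator is pulled back and forth between the two conflicting targets and can match neither. The sketch says nothing about the size of the gains reported above, which depend on the actual environments and network architectures.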