Distributed Methods

IMPALA, the Importance Weighted Actor-Learner Architecture, is an off-policy actor-critic framework that decouples acting from learning and learns from experience trajectories using V-trace. Unlike popular A3C-based agents, in which workers send gradients with respect to the policy parameters to a central parameter server, IMPALA actors send trajectories of experience (sequences of states, actions, and rewards) to a centralized learner. Because the learner has access to full trajectories, it can use a GPU to perform updates on mini-batches of trajectories while aggressively parallelising all time-independent operations.
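The actor-to-learner data flow described above can be illustrated with a toy sketch. This is not the IMPALA implementation: it uses Python threads and an in-process queue instead of distributed workers, random data instead of an environment, and a shared version counter instead of network weights. All names (`actor`, `learner`, `run`, `policy_version`) are hypothetical and chosen for illustration only.

```python
import queue
import threading

import numpy as np


def actor(actor_id, params, traj_queue, num_trajectories, unroll_length):
    """Generate dummy trajectories and send them to the learner's queue."""
    rng = np.random.default_rng(actor_id)
    for _ in range(num_trajectories):
        # Snapshot of the policy version the actor is acting with.
        version = params["version"]
        trajectory = {
            "actor_id": actor_id,
            "policy_version": version,
            "observations": rng.normal(size=(unroll_length, 4)),
            "actions": rng.integers(0, 2, size=unroll_length),
            "rewards": rng.normal(size=unroll_length),
        }
        # Actors ship whole trajectories, not gradients.
        traj_queue.put(trajectory)


def learner(params, traj_queue, num_updates, batch_size):
    """Consume mini-batches of trajectories and record the policy lag."""
    lags = []
    for _ in range(num_updates):
        batch = [traj_queue.get() for _ in range(batch_size)]
        # A real learner would compute a V-trace-corrected update here.
        lags.extend(params["version"] - t["policy_version"] for t in batch)
        params["version"] += 1  # stands in for a parameter update
    return lags


def run(num_actors=4, trajectories_per_actor=6, unroll_length=5, batch_size=3):
    params = {"version": 0}
    traj_queue = queue.Queue(maxsize=8)
    threads = [
        threading.Thread(
            target=actor,
            args=(i, params, traj_queue, trajectories_per_actor, unroll_length),
        )
        for i in range(num_actors)
    ]
    for th in threads:
        th.start()
    num_updates = num_actors * trajectories_per_actor // batch_size
    lags = learner(params, traj_queue, num_updates, batch_size)
    for th in threads:
        th.join()
    return params["version"], lags
```

Calling `run()` returns the final version counter and, for every consumed trajectory, how many learner updates happened between the actor reading the policy and the learner using the data. Any non-zero lag means that trajectory was generated by an older policy than the one being updated.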

This type of decoupled architecture can achieve very high throughput. However, because the policy used to generate a trajectory can lag behind the policy on the learner by several updates at the time of gradient calculation, learning becomes off-policy. The V-trace off-policy actor-critic algorithm is used to correct for this harmful discrepancy.
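As a rough illustration of the correction, the sketch below computes V-trace value targets for a single trajectory with NumPy, following the recursive form given in the IMPALA paper: truncated importance weights rho_t = min(rho_bar, pi/mu) scale the temporal-difference errors, and c_t = min(c_bar, pi/mu) control how far corrections propagate back in time. The function name, argument names, and defaults are assumptions made for this example, and episode termination inside the trajectory is not handled.

```python
import numpy as np


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Return V-trace targets and policy-gradient advantages for one trajectory.

    behaviour_logp: log mu(a_t | x_t) under the actor's (behaviour) policy.
    target_logp:    log pi(a_t | x_t) under the learner's (target) policy.
    values:         V(x_t) for t = 0..T-1; bootstrap_value is V(x_T).
    """
    behaviour_logp = np.asarray(behaviour_logp, dtype=float)
    target_logp = np.asarray(target_logp, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    # Truncated importance weights.
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(rho_bar, ratios)
    cs = np.minimum(c_bar, ratios)

    # Importance-weighted temporal-difference errors.
    values_next = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_next - values)

    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})).
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    vs = values + corrections

    # Advantages used for the policy-gradient term.
    vs_next = np.append(vs[1:], bootstrap_value)
    pg_advantages = rhos * (rewards + gamma * vs_next - values)
    return vs, pg_advantages
```

When the behaviour and target policies agree and both truncation levels are at least 1, the targets reduce to ordinary n-step bootstrapped returns. As the target policy assigns less probability to the actions the actor took, the weights shrink and the targets move back toward the current value estimates.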

Source: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures


Task                    Papers  Share
Reinforcement Learning  6       42.86%
Continuous Control      2       14.29%
Atari Games             2       14.29%
OpenAI Gym              2       14.29%
Image Captioning        1       7.14%
Edge-computing          1       7.14%