Policy Gradient Methods

ACER, or Actor-Critic with Experience Replay, is a deep reinforcement learning agent that combines an actor-critic architecture with experience replay. It can be seen as an off-policy extension of A3C, in which off-policy estimation is made feasible by:

  • Using Retrace Q-value estimation.
  • Using truncated importance sampling with bias correction.
  • Using a trust region policy optimization method.
  • Using a stochastic dueling network architecture.
Source: Sample Efficient Actor-Critic with Experience Replay
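
The first two ingredients in the list above can be illustrated with a minimal sketch. It assumes discrete actions and a single replayed trajectory, and it follows the backward recursion commonly used for Retrace targets. The function names, argument names, and default values (`gamma`, `c`) are hypothetical choices for illustration and are not taken from the source.

```python
def retrace_targets(rewards, q_taken, values, rhos, bootstrap_value,
                    gamma=0.99, c=1.0):
    """Retrace Q-value targets for one trajectory, computed backwards.

    rewards[t]      reward received after acting at step t
    q_taken[t]      critic estimate Q(x_t, a_t) for the action actually taken
    values[t]       critic estimate V(x_t)
    rhos[t]         importance weight pi(a_t | x_t) / mu(a_t | x_t)
    bootstrap_value V of the state after the last step (0.0 if terminal)
    """
    targets = [0.0] * len(rewards)
    q_ret = bootstrap_value
    for t in reversed(range(len(rewards))):
        # One-step backup from the running Retrace estimate.
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        # Truncate the importance weight at c, then carry the corrected
        # estimate back to the previous time step.
        rho_bar = min(c, rhos[t])
        q_ret = rho_bar * (q_ret - q_taken[t]) + values[t]
    return targets


def truncation_terms(rho, c=10.0):
    """Split one importance weight into a truncated part and a correction.

    Returns (min(c, rho), max(0, 1 - c / rho)). The first factor scales the
    gradient term for the replayed action; the second scales the bias
    correction term, which in the paper's formulation is evaluated for
    actions drawn from the current policy. Assumes rho >= 0.
    """
    if rho <= c:
        # No truncation happened, so no correction is needed.
        return rho, 0.0
    return c, 1.0 - c / rho
```

With all importance weights equal to 1 and `q_taken` equal to `values`, the targets reduce to ordinary discounted returns. With all weights equal to 0, each target falls back to the one-step estimate `rewards[t] + gamma * V(next state)`. The correction factor is nonzero only when the weight exceeds `c`, so the bias correction is active only where truncation actually discarded part of the weight.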

