1 code implementation • 29 Mar 2023 • Thibault Lahire
Stochastic gradient descent samples the training set uniformly to build an unbiased gradient estimate from a limited number of samples.
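The uniform-sampling property described above can be sketched in a few lines. This is an illustrative toy (linear regression with a hypothetical dataset, not the paper's setup): averaging many uniformly sampled mini-batch gradients recovers the full-batch gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1-D linear regression with loss L(w) = mean_i (x_i w - y_i)^2 / 2
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)

def full_gradient(w):
    # Gradient of the loss over the entire training set
    return np.mean((X * w - y) * X)

def sgd_gradient(w, batch_size=10):
    # Uniform sampling of a mini-batch yields an unbiased gradient estimate
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return np.mean((X[idx] * w - y[idx]) * X[idx])

# Averaging many stochastic estimates approaches the full gradient
estimates = np.array([sgd_gradient(0.0) for _ in range(5000)])
print(estimates.mean(), full_gradient(0.0))
```

Each mini-batch gradient is noisy, but its expectation equals the full-batch gradient, which is what makes SGD converge under standard step-size conditions.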
no code implementations • 31 Dec 2021 • Thibault Lahire
This technical report explains how the actor loss of Soft Actor-Critic (SAC) is obtained, as well as its associated gradient estimate.
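The gradient estimate in question relies on the reparameterization trick. Below is a minimal sketch, under toy assumptions (a quadratic stand-in Q-function, a scalar Gaussian policy without the tanh squashing, and an arbitrary entropy weight `alpha`), of a pathwise estimate of the actor-loss gradient with respect to the policy mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins, not the report's exact setup
alpha = 0.2
def q_value(a):
    return -(a - 1.0) ** 2  # toy Q-function peaked at a = 1

# Gaussian policy, reparameterized as a = mu + sigma * eps, eps ~ N(0, 1)
mu, sigma = 0.0, 0.5

def actor_loss_grad_mu(n_samples=100_000):
    # Actor loss: E[alpha * log pi(a) - Q(a)]. For this reparameterized
    # Gaussian, log pi(a) = -eps^2/2 + const does not depend on mu, so the
    # pathwise gradient reduces to -E[Q'(a) * da/dmu] = -E[Q'(a)]
    eps = rng.normal(size=n_samples)
    a = mu + sigma * eps
    dq_da = -2.0 * (a - 1.0)
    return np.mean(-dq_da)

# Closed form for comparison: -E[Q'(a)] = 2 * (mu - 1) = -2 at mu = 0
print(actor_loss_grad_mu())
```

In practice SAC backpropagates through the sampled action with automatic differentiation; this sketch only makes the underlying Monte Carlo estimate explicit.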
1 code implementation • 4 Oct 2021 • Thibault Lahire, Matthieu Geist, Emmanuel Rachelson
Since the optimal sampling distribution is intractable, we make several approximations that give good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer.
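As a rough sketch of the sampling scheme the abstract describes: draw a large batch uniformly from the replay buffer, score it with a priority surrogate (here |TD error|, a hypothetical stand-in), and down-sample the training mini-batch proportionally to those priorities. The exact priority and weighting scheme are given in the paper; this is an assumption-laden illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def laber_sample(td_errors, mini_batch_size):
    """Down-sample a uniformly drawn large batch proportionally to a
    priority surrogate (|TD error| here, as an assumed stand-in)."""
    p = np.abs(td_errors) + 1e-8            # priorities on the large batch
    probs = p / p.sum()
    idx = rng.choice(len(p), size=mini_batch_size, p=probs)
    # Importance weights so the weighted update stays close to unbiased
    weights = p.mean() / p[idx]
    return idx, weights

# Usage: first draw a large batch uniformly from the replay buffer,
# compute its TD errors, then down-sample to the training mini-batch
large_batch_td = rng.normal(size=256)       # stand-in TD errors
idx, w = laber_sample(large_batch_td, mini_batch_size=32)
print(idx.shape, w.shape)
```

Because priorities are only computed on the large batch (not the whole buffer), the per-step cost stays close to uniform sampling while concentrating updates on high-error transitions.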