Decorrelated Double Q-learning

Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, overestimation bias arises from applying the maximum operator to noisy estimates, and it is exaggerated by using the estimate of a subsequent state...
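To make the overestimation claim concrete, here is a minimal sketch, not the paper's method. It assumes a toy setting in which every action has a true value of 0 and each estimate carries independent zero-mean Gaussian noise. The function name `max_bias_demo` and all of its parameters are hypothetical, chosen only for illustration. The sketch compares taking the maximum over a single set of noisy estimates with the standard double-estimator idea of selecting the action with one set and evaluating it with an independent second set.

```python
import random
import statistics


def max_bias_demo(n_actions=10, noise_std=1.0, n_trials=20000, seed=0):
    """Estimate the bias of two value estimators when all true values are 0.

    Illustrative assumption: estimates are the true value (0) plus
    independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        # Two independent sets of noisy estimates of the same true values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]

        # Single estimator: the max over noisy estimates picks up positive noise.
        single.append(max(q_a))

        # Double estimator: choose the action with q_a, evaluate it with q_b.
        best = max(range(n_actions), key=q_a.__getitem__)
        double.append(q_b[best])

    # The true maximum is 0, so each mean is an estimate of the bias.
    return statistics.fmean(single), statistics.fmean(double)


single_bias, double_bias = max_bias_demo()
print(f"single-estimator bias: {single_bias:.3f}")
print(f"double-estimator bias: {double_bias:.3f}")
```

In this toy setting the single-estimator average comes out clearly positive even though no action is actually better than another, while the double-estimator average stays close to zero. In Q-learning the biased maximum enters the bootstrapped target, which is how the error is carried from a subsequent state back to earlier ones.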
