no code implementations • 9 Feb 2018 • Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan
Results show that, with an intermediate value of $\sigma$, $Q(\sigma ,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than the extreme end ($\sigma=0$, or $1$).