no code implementations • 4 Sep 2019 • Mahammad Humayoo, Xue-Qi Cheng
The reason is that ordered regularization can reject irrelevant variables and yield accurate estimates of the parameters.
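The snippet does not define the ordered regularizer. One well-known member of that family is the sorted-L1 (SLOPE / OWL-style) penalty, J(β) = Σ_i λ_i |β|_(i), where the magnitudes |β|_(i) are sorted in decreasing order and the weights λ_1 ≥ λ_2 ≥ … ≥ 0. The sketch below assumes that form; it is not the authors' method. It shows how such a penalty "rejects irrelevant variables": its proximal operator sets small coefficients exactly to zero. The function names and weight values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sorted_l1_penalty(beta, lam):
    """Assumed ordered penalty J(beta) = sum_i lam_i * |beta|_(i), magnitudes sorted descending."""
    abs_sorted = np.sort(np.abs(beta))[::-1]
    return float(np.dot(lam, abs_sorted))

def _isotonic_nonincreasing(z):
    """Pool-adjacent-violators: least-squares fit of a nonincreasing sequence to z."""
    sums, counts = [], []
    for value in z:
        sums.append(float(value))
        counts.append(1)
        # Merge adjacent blocks while the earlier block's mean is below the later one's.
        while len(sums) > 1 and sums[-2] / counts[-2] < sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

def prox_sorted_l1(v, lam):
    """Proximal operator of the sorted-L1 penalty (lam nonnegative and nonincreasing)."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(-np.abs(v))          # indices of |v| in descending order
    z = np.abs(v)[order] - lam              # shift the sorted magnitudes by the weights
    x_sorted = np.maximum(_isotonic_nonincreasing(z), 0.0)  # monotone fit, then clip at 0
    x = np.empty_like(v)
    x[order] = x_sorted                     # undo the sort
    return np.sign(v) * x                   # restore the original signs

# Illustrative usage with hypothetical weights.
lam = np.array([1.0, 0.8, 0.6, 0.4])
v = np.array([3.0, -0.2, 1.0, 0.1])
print(sorted_l1_penalty(v, lam), prox_sorted_l1(v, lam))
```

With these values the two smallest-magnitude entries (-0.2 and 0.1) are mapped exactly to zero, while the larger ones are shrunk but kept. Larger coefficients are paired with larger weights, which is what distinguishes this penalty from a plain L1 (lasso) term with a single weight.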
no code implementations • 30 Oct 2018 • Mahammad Humayoo, Xue-Qi Cheng
One reason for the instability of off-policy learning is a discrepancy between the target ($\pi$) and behavior ($b$) policy distributions.
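The snippet does not say how the paper handles this discrepancy. A standard way it shows up is through the importance-sampling ratio ρ = π(a|s) / b(a|s), which reweights data collected under $b$ so that it estimates quantities under $\pi$. The sketch below is a minimal illustration for tabular policies; the function names are hypothetical. Under $b$ the ratio has mean 1, but its second moment Σ_a π(a|s)² / b(a|s) grows as the two distributions diverge, and trajectory-level weights multiply these ratios, so their variance can grow rapidly with trajectory length.

```python
import numpy as np

def importance_ratios(pi, b, states, actions):
    """Per-step ratios rho_t = pi(a_t|s_t) / b(a_t|s_t) for tabular policies.

    pi, b: arrays of shape (n_states, n_actions); each row is an action distribution.
    """
    pi = np.asarray(pi, dtype=float)
    b = np.asarray(b, dtype=float)
    s = np.asarray(states)
    a = np.asarray(actions)
    if np.any(b[s, a] == 0):
        raise ValueError("behavior policy must give nonzero probability to every logged action")
    return pi[s, a] / b[s, a]

def trajectory_weight(ratios):
    """Ordinary importance sampling weights a whole trajectory by the product of its ratios."""
    return float(np.prod(ratios))

def ratio_second_moment(pi_row, b_row):
    """E_b[rho^2] = sum_a pi(a)^2 / b(a); it equals 1 only when pi == b."""
    pi_row = np.asarray(pi_row, dtype=float)
    b_row = np.asarray(b_row, dtype=float)
    return float(np.sum(pi_row ** 2 / b_row))

# Illustrative one-state example: target favors action 0, behavior is uniform.
pi = [[0.9, 0.1]]
b = [[0.5, 0.5]]
rho = importance_ratios(pi, b, states=[0, 0, 0], actions=[0, 0, 0])
print(rho, trajectory_weight(rho), ratio_second_moment(pi[0], b[0]))
```

Here each step contributes a ratio of 1.8, so a three-step trajectory of action 0 gets weight 1.8³ ≈ 5.83, while a trajectory of action 1 would get 0.2³ = 0.008. The per-step second moment is 1.64 rather than 1, which is one concrete way a policy mismatch turns into high-variance, unstable updates.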