8 Mar 2021 • Yiting Kong, Yang Guan, Jingliang Duan, Shengbo Eben Li, Qi Sun, Bingbing Nie
In this paper, we propose the Shielded Distributional Soft Actor-critic (SDSAC), a reinforcement learning (RL) based end-to-end decision-making method that operates under a framework of offline training and online correction.
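
The sentence above does not say how the online correction works. As a rough illustration only, the sketch below shows a generic action-shielding pattern: a policy trained offline proposes an action, and a safety check applied online replaces it with a fallback when the proposal is judged unsafe. The function `shielded_action`, the toy state layout (gap to the lead vehicle, ego speed), the one-second time-gap rule, and the braking fallback are all hypothetical stand-ins, not the authors' design.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shielded_action(
    state: State,
    policy: Callable[[State], float],
    is_safe: Callable[[State, float], bool],
    fallback: Callable[[State], float],
) -> float:
    """Return the policy's action if it passes the safety check, else the fallback."""
    proposed = policy(state)
    if is_safe(state, proposed):
        return proposed
    return fallback(state)


# Hypothetical toy stand-ins. State = (gap_to_lead_vehicle_m, ego_speed_mps);
# the action is a longitudinal acceleration in m/s^2.
def toy_policy(state: State) -> float:
    # Stands in for an offline-trained policy; here it always accelerates.
    return 2.0


def toy_is_safe(state: State, accel: float) -> bool:
    gap, speed = state
    # Assumed rule: accelerating is unsafe when the time gap is below 1 s.
    return accel <= 0.0 or gap / max(speed, 1e-6) >= 1.0


def toy_fallback(state: State) -> float:
    # Assumed conservative action: brake.
    return -3.0


far = shielded_action((50.0, 10.0), toy_policy, toy_is_safe, toy_fallback)
near = shielded_action((5.0, 10.0), toy_policy, toy_is_safe, toy_fallback)
```

With a 50 m gap the proposed acceleration passes the check and is returned unchanged; with a 5 m gap the check fails and the braking fallback is used instead. Keeping the check outside the policy means the learned component can be retrained without touching the safety rule.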