1 code implementation • NeurIPS 2023 • Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravan Rajmohan, Thomas Moscibroda, Dongmei Zhang
In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns a conservative V-function by directly imposing a penalty on out-of-distribution (OOD) states.
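This summary does not give CSVE's training objective. The toy sketch below illustrates the general idea of a state-level conservative penalty under stated assumptions. It uses a tabular V-function and a CQL-style loss: a TD term on dataset transitions, plus `alpha * (mean V over OOD states - mean V over dataset states)`. The function names, the OOD state set, the toy MDP, and this exact loss are hypothetical and for illustration only. They are not the paper's actual formulation or code.

```python
from typing import List, Sequence, Tuple

# (state, reward, next_state) tuples drawn from the offline dataset D.
Transition = Tuple[int, float, int]


def conservative_v_step(
    v: List[float],
    transitions: Sequence[Transition],
    ood_states: Sequence[int],
    alpha: float,
    gamma: float = 0.9,
    lr: float = 0.1,
) -> List[float]:
    """One gradient step on a hypothetical CQL-style conservative V loss.

    loss = 0.5 * mean_D (V(s) - [r + gamma * V(s')])^2         (TD term, target held fixed)
         + alpha * (mean_{s in OOD} V(s) - mean_{s in D} V(s))  (state-level penalty)
    """
    grad = [0.0] * len(v)
    n = len(transitions)
    for s, r, s_next in transitions:
        target = r + gamma * v[s_next]  # semi-gradient: no gradient flows through V(s')
        grad[s] += (v[s] - target) / n
        grad[s] -= alpha / n  # penalty term pushes values up on dataset states
    for s in ood_states:
        grad[s] += alpha / len(ood_states)  # penalty term pushes values down on OOD states
    return [vi - lr * gi for vi, gi in zip(v, grad)]


def train(alpha: float, steps: int = 100) -> List[float]:
    # Hypothetical toy MDP: states 0 and 1 appear in D as start states;
    # state 2 is only reached as a next state and is treated as OOD.
    transitions = [(0, 1.0, 1), (1, 0.0, 2)]
    ood_states = [2]
    v = [0.0, 0.0, 0.0]
    for _ in range(steps):
        v = conservative_v_step(v, transitions, ood_states, alpha)
    return v


if __name__ == "__main__":
    print("plain TD:    ", [round(x, 2) for x in train(alpha=0.0)])
    print("conservative:", [round(x, 2) for x in train(alpha=1.0)])
```

With `alpha=0`, the unseen state 2 keeps its initial value. With `alpha>0`, its value is pushed down, and the lower estimate propagates back to state 1 through the TD target. State 2 has no TD term of its own, so the penalty is unbounded in this tabular toy. For that reason the sketch runs only a fixed number of steps.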