1 code implementation • 26 May 2021 • Chang Tian, An Liu, Guang Huang, Wu Luo
We propose an off-policy optimization algorithm based on successive convex approximation (SCAOPO) to solve the general constrained reinforcement learning problem, formulated as a constrained Markov decision process (CMDP) under the average-cost criterion.
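To make the average-cost CMDP setting concrete, the sketch below evaluates the quantities such a problem is built on. For a fixed stationary policy on a small, fully known, ergodic MDP, it computes the long-run average of one objective cost and one constraint cost. This is only an illustration of the problem formulation, not SCAOPO itself. SCAOPO is off-policy and uses successive convex approximation, and neither is shown here. The function names `stationary_distribution` and `average_costs`, the array layout, and the toy numbers are all hypothetical choices made for this example.

```python
import numpy as np


def stationary_distribution(P_pi):
    """Solve d P_pi = d with sum(d) = 1, assuming the chain is ergodic."""
    n = P_pi.shape[0]
    # Stack the balance equations (P_pi^T - I) d = 0 with the row 1^T d = 1.
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def average_costs(P, costs, policy):
    """Long-run average of each cost signal under a stationary policy.

    P:      transition probabilities, shape (S, A, S)
    costs:  shape (K, S, A); index 0 is the objective, 1..K-1 are constraint costs
    policy: action probabilities pi(a|s), shape (S, A)
    """
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", policy, P)
    d = stationary_distribution(P_pi)
    # J_k = sum_s d(s) sum_a pi(a|s) c_k(s, a)
    return np.einsum("s,sa,ksa->k", d, policy, costs)


# Hypothetical toy CMDP: 2 states, 2 actions. Action 0 stays, action 1 switches.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # from state 0
    [[0.0, 1.0], [1.0, 0.0]],   # from state 1
])
costs = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # objective cost c_0(s, a)
    [[0.0, 2.0], [2.0, 0.0]],   # constraint cost c_1(s, a)
])
policy = np.full((2, 2), 0.5)   # uniform random policy
limits = np.array([1.2])        # hypothetical constraint bound on J_1

J = average_costs(P, costs, policy)
print("objective J_0:", J[0])
print("constraint J_1:", J[1], "feasible:", bool(np.all(J[1:] <= limits)))
```

With the uniform policy both states are visited equally often, so the average objective cost is 0.5 and the average constraint cost is 1.0, which satisfies the illustrative bound of 1.2. A constrained RL method would search over policies to minimize `J[0]` while keeping `J[1:]` within their limits, typically without access to `P`.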