Safe Exploration in Linear Equality Constraint
With extensive research and application, some shortcomings of reinforcement learning methods have gradually been revealed. A significant one is that it is difficult for reinforcement learning methods to strictly satisfy constraints. In this paper, a non-training method based on Singular Value Decomposition (SVD), called 'Action Decomposition Regular', is proposed to achieve safe exploration. By adopting a linear dynamics model, our method decomposes the action space into a constraint dimension and a free dimension that are controlled separately, so that the policy strictly satisfies the linear equality constraint without limiting the exploration region. In addition, we show how our method should be used when the action space is bounded and convex, which makes it more suitable for real-world scenarios. Finally, we demonstrate the effectiveness of our method in a physically-based environment and show that it succeeds where reward shaping fails.
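The idea of splitting the action space with SVD can be illustrated with a generic null-space parameterization. The sketch below assumes the constraint has already been written directly on the action as `C a = d` (for example, by substituting a linear dynamics model into a constraint on the next state). The functions `decompose_action_space` and `project_action` are hypothetical names for illustration, and this is a standard linear-algebra construction, not necessarily the authors' exact procedure. It also ignores the bounded, convex action-space case the abstract mentions.

```python
import numpy as np


def decompose_action_space(C, tol=1e-10):
    """Split R^n into constraint and free directions using the SVD of C.

    Assumption: the linear equality constraint is expressed on the action as C a = d.
    Rows of Vt beyond the numerical rank span the null space of C (free dimension).
    """
    U, S, Vt = np.linalg.svd(C)          # full_matrices=True: Vt is n x n
    rank = int(np.sum(S > tol))          # numerical rank of C
    constraint_basis = Vt[:rank].T       # n x r, spans the row space of C
    free_basis = Vt[rank:].T             # n x (n - r), spans the null space of C
    return U, S, rank, constraint_basis, free_basis


def project_action(C, d, z, tol=1e-10):
    """Map free-dimension coordinates z to an action a satisfying C a = d.

    The constraint part is fixed by a pseudo-inverse solution built from the SVD;
    z (e.g. a policy output) moves the action only along directions C cannot see.
    If d is not in the range of C, a_c is the least-squares solution instead.
    """
    U, S, rank, Vr, Vn = decompose_action_space(C, tol)
    a_c = Vr @ ((U[:, :rank].T @ d) / S[:rank])   # particular solution
    return a_c + Vn @ z                            # add free-dimension component
```

For instance, with `C = [[1, 1, 0], [0, 0, 1]]` and `d = [1, 2]` there is one free dimension, and every value of `z` yields a different action that still satisfies `C a = d` exactly, so exploration along the free direction never violates the constraint.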