Deep Constrained Q-learning

In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of different objectives in the reward signal, or Lagrangian methods, including constraints in the loss function, have no guarantees that the agent satisfies the constraints at all points in time and can lead to undesired behavior... (read more)

PDF Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper