no code implementations • 28 Feb 2023 • Vincent Corlay, Jean-Christophe Sibel
Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain.
Q-Learning reinforcement-learning +1