Expected Sarsa is like Q-learning, but instead of taking the maximum over next state–action pairs it uses the expected value, taking into account how likely each action is under the current policy:
$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma\sum_{a}\pi\left(a\mid S_{t+1}\right)Q\left(S_{t+1}, a\right) - Q\left(S_{t}, A_{t}\right)\right]$$
Except for this change to the update rule, the algorithm follows the scheme of Q-learning. Expected Sarsa is more computationally expensive than Sarsa, but it eliminates the variance due to the random selection of $A_{t+1}$.
Source: Sutton and Barto, Reinforcement Learning: An Introduction, 2nd Edition
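As a minimal sketch, the update above can be implemented for a tabular $Q$ with an $\epsilon$-greedy policy derived from $Q$; the function and parameter names here are illustrative, not from the source:

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next,
                          alpha=0.1, gamma=0.99, epsilon=0.1):
    """One Expected Sarsa update on a tabular Q of shape [n_states, n_actions].

    The target uses the expectation of Q(s', .) under an epsilon-greedy
    policy derived from Q, rather than Sarsa's sampled A_{t+1} or
    Q-learning's max over actions.
    """
    n_actions = Q.shape[1]
    # pi(a' | s') for an epsilon-greedy policy: epsilon spread uniformly,
    # plus the remaining 1 - epsilon on the greedy action.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    # Expected value of Q(S_{t+1}, a) under the current policy.
    expected_q = np.dot(probs, Q[s_next])
    # TD update toward the expected target.
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
    return Q
```

Because the expectation is computed exactly over all actions in the next state, the sampled $A_{t+1}$ never enters the target, which is where the variance reduction relative to Sarsa comes from.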
PAPER | DATE
Chrome Dino Run using Reinforcement Learning | 2020-08-15
Model-free Reinforcement Learning for Stochastic Stackelberg Security Games | 2020-05-24
The Concept of Criticality in Reinforcement Learning | 2018-10-16
Multi-step Reinforcement Learning: A Unifying Algorithm | 2017-03-03