Sarsa is an on-policy TD control algorithm:
$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma Q\left(S_{t+1}, A_{t+1}\right) - Q\left(S_{t}, A_{t}\right)\right]$$
This update is done after every transition from a nonterminal state $S_{t}$. If $S_{t+1}$ is terminal, then $Q\left(S_{t+1}, A_{t+1}\right)$ is defined as zero.
To design an on-policy control algorithm with Sarsa, we estimate $q_{\pi}$ for the behaviour policy $\pi$ and simultaneously change $\pi$ towards greediness with respect to $q_{\pi}$.
Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
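The update rule above can be sketched as a tabular agent. This is a minimal illustration, not a definitive implementation: the 1-D corridor environment, constants, and all function names here are hypothetical, chosen only to show the on-policy update $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]$ in code.

```python
import random
from collections import defaultdict

# Hypothetical environment: states 0..4 on a corridor; action 0 moves left,
# action 1 moves right; reaching state 4 gives reward +1 and ends the episode.
N_STATES = 5
ACTIONS = [0, 1]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def epsilon_greedy(Q, state):
    """Behaviour policy pi: explore with probability EPSILON, else act greedily."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa(episodes=500, seed=0):
    random.seed(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s)          # A_t chosen by the policy being improved
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2)    # A_{t+1} also chosen by pi (on-policy)
            # Q(S_{t+1}, A_{t+1}) is defined as zero when S_{t+1} is terminal
            target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa()
```

Because the action $A_{t+1}$ used in the target is the one the $\varepsilon$-greedy policy actually takes, the same policy both generates behaviour and is improved, which is what makes Sarsa on-policy; improving $\pi$ towards greediness happens implicitly each time `epsilon_greedy` reads the updated $Q$.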
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 30 | 53.57% |
| Continuous Control | 3 | 5.36% |
| Combinatorial Optimization | 2 | 3.57% |
| Test | 2 | 3.57% |
| Decision Making | 2 | 3.57% |
| OpenAI Gym | 2 | 3.57% |
| Management | 2 | 3.57% |
| Autonomous Driving | 1 | 1.79% |
| Board Games | 1 | 1.79% |