Sarsa

Sarsa is an on-policy temporal-difference (TD) control algorithm. Its update rule is:

$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma{Q}\left(S_{t+1}, A_{t+1}\right) - Q\left(S_{t}, A_{t}\right)\right] $$

This update is done after every transition from a nonterminal state $S_{t}$. If $S_{t+1}$ is terminal, then $Q\left(S_{t+1}, A_{t+1}\right)$ is defined as zero.
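As a rough illustration of the update rule above, the following Python sketch applies it to a tabular action-value estimate. The function name `sarsa_update`, the dictionary-based table, the state and action labels, and the default values of `alpha` and `gamma` are all assumptions made for this example, not part of the source.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, terminal=False):
    """Apply one Sarsa update to the table Q in place and return the TD error."""
    # Q(S_{t+1}, A_{t+1}) is defined as zero when S_{t+1} is terminal.
    next_value = 0.0 if terminal else Q[(s_next, a_next)]
    td_error = r + gamma * next_value - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

# Hypothetical two-step example with made-up states, actions, and rewards.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0

# Nonterminal transition: target = 1.0 + 0.9 * 2.0 = 2.8, so Q goes from 0 to 1.4.
sarsa_update(Q, "s0", "right", 1.0, "s1", "right")

# Transition into a terminal state: target = 5.0, so Q goes from 2.0 to 3.5.
sarsa_update(Q, "s1", "right", 5.0, None, None, terminal=True)
```

The `terminal` flag is one possible way to encode the convention that the value of a terminal state-action pair is zero; storing explicit zeros in the table would work equally well.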

To design an on-policy control algorithm using Sarsa, we continually estimate $q_{\pi}$ for the behaviour policy $\pi$ and, at the same time, change $\pi$ towards greediness with respect to $q_{\pi}$.
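The control scheme described above can be sketched as a short training loop. This is a minimal sketch under stated assumptions: the five-state corridor environment (`step`), the use of an epsilon-greedy policy as the way of moving towards greediness, and all hyperparameter values are hypothetical choices for illustration and do not come from the source.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right

def step(state, action):
    """Hypothetical environment: reward -1 per move, episode ends at the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0, next_state == N_STATES - 1

def epsilon_greedy(Q, state, epsilon, rng):
    """Act greedily with respect to Q, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

def sarsa_control(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # On-policy: the next action is drawn from the same policy being improved.
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # The value of a terminal state-action pair is defined as zero.
            next_value = 0.0 if done else Q[next_state][next_action]
            Q[state][action] += alpha * (reward + gamma * next_value - Q[state][action])
            state, action = next_state, next_action
    return Q

Q = sarsa_control()
# Greedy action in each nonterminal state under the learned estimates.
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

Because the policy that selects `next_action` is derived from the same `Q` that is being updated, estimation and policy improvement happen together rather than in separate phases. In this toy corridor one would expect the greedy policy to favour moving right, although the exact values depend on the seed and hyperparameters.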

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Latest Papers

| Paper | Authors | Date |
| --- | --- | --- |
| Chrome Dino Run using Reinforcement Learning | Divyanshu Marwah, Sneha Srivastava, Anusha Gupta, Shruti Verma | 2020-08-15 |
| Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution | Zahi M. Kakish, Karthik Elamvazhuthi, Spring Berman | 2020-06-29 |
| Model-free Reinforcement Learning for Stochastic Stackelberg Security Games | Deepanshu Vasal | 2020-05-24 |
| FlapAI Bird: Training an Agent to Play Flappy Bird Using Reinforcement Learning Techniques | Tai Vu, Leon Tran | 2020-03-21 |
| Enhancing the Monte Carlo Tree Search Algorithm for Video Game Testing | Sinan Ariyurek, Aysu Betin-Can, Elif Surer | 2020-03-17 |
| A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry | Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish | 2019-06-21 |
| Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment | Jivitesh Sharma, Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin | 2019-05-23 |
| Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning | Andrei Claudiu Roibu | 2019-05-10 |
| Finite-Sample Analysis for SARSA with Linear Function Approximation | Shaofeng Zou, Tengyu Xu, Yingbin Liang | 2019-02-06 |
| Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target | J. Fernando Hernandez-Garcia, Richard S. Sutton | 2019-01-22 |
| Recursive Sparse Pseudo-input Gaussian Process SARSA | John Martin, Brendan Englot | 2018-11-17 |
| The Concept of Criticality in Reinforcement Learning | Yitzhak Spielberg, Amos Azaria | 2018-10-16 |
| Smoothed Action Value Functions for Learning Gaussian Policies | Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans | 2018-03-06 |
| Reactive Reinforcement Learning in Asynchronous Environments | Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski | 2018-02-16 |
| A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning | Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan | 2018-02-09 |
| Learning Gaussian Policies from Smoothed Action Value Functions | Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans | 2018-01-01 |
| Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation | Christopher Tegho, Paweł Budzianowski, Milica Gašić | 2017-11-30 |
| Double Q($σ$) and Q($σ, λ$): Unifying Reinforcement Learning Control Algorithms | Markus Dumke | 2017-11-05 |
| A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling | Hamid Arabnejad, Claus Pahl, Pooyan Jamshidi, Giovani Estrada | 2017-05-19 |
| Multi-step Reinforcement Learning: A Unifying Algorithm | Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton | 2017-03-03 |
| Extending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS and Gazebo | Iker Zamora, Nestor Gonzalez Lopez, Victor Mayoral Vilches, Alejandro Hernandez Cordero | 2016-08-19 |
| Online Transfer Learning in Reinforcement Learning Domains | Yusen Zhan, Matthew E. Taylor | 2015-07-02 |
| Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation | Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári | 2009-12-01 |

Tasks

| Task | Papers | Share |
| --- | --- | --- |
| Decision Making | 2 | 28.57% |
| Continuous Control | 2 | 28.57% |
| Recommendation Systems | 1 | 14.29% |
| Dialogue Management | 1 | 14.29% |
| Efficient Exploration | 1 | 14.29% |


Categories