Q-Learning
405 papers with code • 0 benchmarks • 2 datasets
Q-learning is a model-free reinforcement learning algorithm that learns an action-value function Q(s, a) and derives from it a policy, which tells an agent which action to take in each state.
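As a minimal sketch of how such a policy can be learned, the example below runs tabular Q-learning on a small chain environment. The environment (a 1-D chain with a reward of 1 at the last state), the function names, and all hyperparameter values are hypothetical choices made for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts at state 0 and receives reward 1 on reaching the last state.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # Q-learning target: reward plus discounted best next value
            # (no bootstrapping from the terminal state).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

q_table = q_learning()
policy = greedy_policy(q_table)
```

In this toy setting the greedy policy for every non-terminal state should be "move right", since that is the only way to reach the rewarding state.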
Most implemented papers
Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games
Previous studies in the image classification domain demonstrated that standard NNs (with ReLU nonlinearity) trained using supervised learning can be converted to SNNs with negligible deterioration in performance.
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates.
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.
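The effect described here can be illustrated with a toy Bellman backup. The sketch below is not the paper's method; it only shows how a max over all actions can pick up an erroneous estimate for an action that never appears in the data, and how restricting the max to seen actions avoids it. The Q-values and the `allowed` parameter are made up for illustration.

```python
def bellman_target(q_next, reward, gamma, allowed=None):
    """One-step Bellman backup target: r + gamma * max_a Q(s', a).

    If `allowed` is given, the max is taken only over those action indices.
    """
    actions = range(len(q_next)) if allowed is None else allowed
    return reward + gamma * max(q_next[a] for a in actions)

# Hypothetical Q estimates at the next state. Action 2 never occurs in the
# dataset, so its large value is an unchecked extrapolation error.
q_next = [1.0, 0.5, 9.0]
seen_actions = [0, 1]

naive = bellman_target(q_next, reward=0.0, gamma=0.99)
restricted = bellman_target(q_next, reward=0.0, gamma=0.99, allowed=seen_actions)
```

Here the naive target bootstraps from the out-of-distribution action and inherits its error, which then propagates to earlier states through further backups.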
Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past
The Emphasizing Recent Experience (ERE) algorithm samples more aggressively from recent experience, and orders the updates to ensure that updates from old data do not overwrite updates from new data.
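One way to picture this kind of sampling is a window over the replay buffer that shrinks toward the newest transitions as a round of updates proceeds, so later updates draw from more recent data. The sketch below assumes a simple geometric shrinking schedule; the schedule, the function names, and the parameters `eta` and `c_min` are illustrative assumptions and not necessarily the paper's exact formulation.

```python
import random

def recency_window_sizes(buffer_len, num_updates, eta=0.5, c_min=2):
    """Window size for each update k = 1..num_updates (hypothetical schedule).

    The window shrinks geometrically from near the full buffer down to
    eta * buffer_len, but never below c_min.
    """
    return [
        max(int(buffer_len * eta ** (k / num_updates)), c_min)
        for k in range(1, num_updates + 1)
    ]

def sample_recent_batches(buffer, num_updates, batch_size, eta=0.5, c_min=2, seed=0):
    """Sample one mini-batch per update, uniformly from the most recent c_k items."""
    rng = random.Random(seed)
    batches = []
    for c_k in recency_window_sizes(len(buffer), num_updates, eta, c_min):
        window = buffer[-c_k:]            # the newest c_k transitions
        batches.append([rng.choice(window) for _ in range(batch_size)])
    return batches

# Transitions are stood in for by their insertion index (larger = newer).
replay_buffer = list(range(100))
sizes = recency_window_sizes(len(replay_buffer), num_updates=4)
batches = sample_recent_batches(replay_buffer, num_updates=4, batch_size=8)
```

Because the window only shrinks, the final updates in a round use the newest data, which matches the ordering property described above.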
Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
This paper makes one step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms.
FACMAC: Factored Multi-Agent Centralised Policy Gradients
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function.
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.
Mildly Conservative Q-Learning for Offline Reinforcement Learning
The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of the conditional diffusion model, which results in a loss that seeks optimal actions that are near the behavior policy.
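A loss of the kind described can be sketched as a behavior-cloning diffusion term plus a weighted action-value term. The notation below is assumed for illustration and is not taken from the excerpt: $\theta$ parameterizes the diffusion policy $\pi_\theta$, $Q_\phi$ is the learned action-value function, $\mathcal{D}$ is the offline dataset, and $\alpha > 0$ is a trade-off weight.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{diffusion}}(\theta)
  - \alpha \, \mathbb{E}_{s \sim \mathcal{D},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ Q_\phi(s, a) \right]
```

Minimizing the first term keeps sampled actions close to the behavior policy, while the second term pushes them toward higher estimated value.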