Q-Learning
386 papers with code • 0 benchmarks • 2 datasets
The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
(Image credit: Playing Atari with Deep Reinforcement Learning)
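Q-learning obtains the policy indirectly. It learns an action-value function $Q(s, a)$ with the temporal-difference update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s,a)]$ and then acts greedily with respect to it. The sketch below is a minimal tabular version on a hypothetical five-state corridor. The environment, the names `step`, `q_learning` and `greedy_policy`, and all hyperparameter values are illustrative assumptions and are not taken from any paper listed here.

```python
import random

# Hypothetical corridor: states 0..4, start in state 0. State 4 is terminal,
# and entering it yields reward 1. Every other transition yields 0.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Deterministic transition. Walls clamp the agent to [0, N_STATES - 1]."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy. Ties among greedy actions are broken at random.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = step(s, ACTIONS[a])
            # The TD target bootstraps from the greedy value of s_next (no bootstrap at terminal).
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the action (as a step direction) with the highest Q-value in each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)] for row in q]
```

With these settings, the learned greedy policy moves right (+1) in every non-terminal state. The target bootstraps from the greedy value of the next state rather than from the action that the ε-greedy behaviour policy takes next, which is what makes Q-learning off-policy.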
Benchmarks
These leaderboards are used to track progress in Q-Learning
Libraries
Use these libraries to find Q-Learning models and implementations
Latest papers
RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway
However, the lack of an efficient reward function and an effective exploration strategy has hindered its use in developing radiation-aware autonomous unmanned aerial vehicles (UAVs) that achieve maximum radiation protection.
VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability
This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments.
Decision Making in Non-Stationary Environments with Policy-Augmented Search
In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
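A convex blend of the two scores at the root is one simple way to picture "combining action-value estimates from an out-of-date policy with an online search". The sketch below assumes a hypothetical weight `alpha` and precomputed per-action values from each source. It only illustrates the idea, and the paper's exact selection rule may differ.

```python
from typing import Dict, Hashable, Iterable


def blended_action(
    actions: Iterable[Hashable],
    q_old: Dict[Hashable, float],
    search_value: Dict[Hashable, float],
    alpha: float,
) -> Hashable:
    """Return the action maximising alpha * q_old[a] + (1 - alpha) * search_value[a].

    q_old: action values from the (possibly stale) learned policy.
    search_value: action values from online search with the current model.
    alpha = 1 trusts only the stale estimates; alpha = 0 trusts only the search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return max(actions, key=lambda a: alpha * q_old[a] + (1.0 - alpha) * search_value[a])
```

A single weight makes the trade-off explicit. When the environment has drifted far from what the old policy was trained on, a smaller `alpha` shifts trust toward the search.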
SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Alleviating overestimation bias is a critical challenge for deep reinforcement learning to achieve successful performance on more complex tasks or offline datasets containing out-of-distribution data.
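Overestimation bias arises because the Q-learning target takes a max over noisy estimates, and $\mathbb{E}[\max_a \hat{Q}(a)] \ge \max_a \mathbb{E}[\hat{Q}(a)]$. The simulation below makes this concrete. It assumes that every true action value is 0 and that the estimates carry Gaussian noise. It then compares the single-estimator max with a double estimator, which selects the action with one set of estimates and evaluates it with an independent set (the idea behind Double Q-learning). It does not implement SPQR's spiked-random-model method.

```python
import random
import statistics


def overestimation_demo(n_actions=10, noise=1.0, trials=20000, seed=0):
    """All true action values are 0, so the true maximum is 0.

    Returns the average over trials of:
      (a) the max of one noisy estimate per action (single estimator), and
      (b) the value, under an independent second estimate, of the action
          that the first estimate ranks highest (double estimator).
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        est_a = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        est_b = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        single.append(max(est_a))
        best = max(range(n_actions), key=est_a.__getitem__)
        double.append(est_b[best])
    return statistics.fmean(single), statistics.fmean(double)
```

With 10 actions and unit noise, the single-estimator average lands near the expected maximum of 10 standard normals (about 1.5), even though every true value is 0. The double-estimator average stays close to 0.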
Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge
In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system.
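The sketch below shows what "optimistic" Q-learning means in the finite-horizon setting. It follows the well-known UCB-Hoeffding variant (Jin et al., 2018). Q-values start at the maximum return $H$, the step size is $\alpha_t = (H+1)/(H+t)$, and each update adds a bonus $c\sqrt{H^3 \iota / t}$ that shrinks with the visit count $t$. This is not the algorithm of the paper above, which additionally exploits knowledge of $f$. The constants `c` and `iota` are placeholders, and the class name is hypothetical.

```python
import math
from collections import defaultdict


class OptimisticQLearning:
    """Tabular finite-horizon Q-learning with a Hoeffding-style exploration bonus.

    Steps are indexed h = 0, ..., H - 1. The value after the last step is 0.
    """

    def __init__(self, n_actions, horizon, c=1.0, iota=1.0):
        self.n_actions = n_actions
        self.H = horizon
        self.c = c
        self.iota = iota  # stands in for the log factor in the analysis
        # Optimistic initialisation: every Q_h(s, a) starts at the max return H.
        self.q = defaultdict(lambda: [float(horizon)] * n_actions)  # key: (h, s)
        self.counts = defaultdict(int)  # key: (h, s, a)

    def value(self, h, s):
        """V_h(s) = min(H, max_a Q_h(s, a)), with V_H = 0."""
        if h >= self.H:
            return 0.0
        return min(float(self.H), max(self.q[(h, s)]))

    def act(self, h, s):
        """Greedy action with respect to the optimistic Q-values."""
        row = self.q[(h, s)]
        return max(range(self.n_actions), key=row.__getitem__)

    def update(self, h, s, a, r, s_next):
        self.counts[(h, s, a)] += 1
        t = self.counts[(h, s, a)]
        step_size = (self.H + 1) / (self.H + t)
        bonus = self.c * math.sqrt(self.H ** 3 * self.iota / t)
        target = r + self.value(h + 1, s_next) + bonus
        row = self.q[(h, s)]
        row[a] = (1 - step_size) * row[a] + step_size * target
```

Acting greedily on inflated values drives the agent toward rarely tried actions. The bonus decays as $1/\sqrt{t}$, so the inflation fades once a state-action pair has been visited often.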
Investigating the Performance and Reliability, of the Q-Learning Algorithm in Various Unknown Environments
As previously indicated, most of this study's conclusions about how computation cost relates to the environment, and about reliability, can be transferred to more sophisticated temporal-difference-based algorithms, because all of these methods are iterative.
I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives
In order to simulate the impact of opening streets, we first compare models for predicting vehicle collisions given network and temporal data.
Synthesis of Temporally-Robust Policies for Signal Temporal Logic Tasks using Reinforcement Learning
The second objective is to maximize the worst-case spatial robustness value within a bounded time shift.
Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization
The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods on 4 Fetch tasks from Robotics.
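REDQ (Randomized Ensembled Double Q-learning) keeps an ensemble of $N$ Q-functions. It computes its Bellman target from the minimum over a random subset of $M$ of them, which curbs overestimation while allowing a high update-to-data ratio. The sketch below shows only that target computation for a single transition. It omits the entropy term of the underlying SAC-style update for brevity. The function name and defaults are illustrative assumptions, not the modified REDQ from the paper above.

```python
import random
from typing import Optional, Sequence


def redq_target(
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    subset_size: int = 2,
    done: bool = False,
    rng: Optional[random.Random] = None,
) -> float:
    """Bellman target from the min over a random subset of ensemble estimates.

    next_q_values holds the N target-network estimates Q_i(s', a') for the
    next state s' and a next action a' (e.g. sampled from the current policy).
    """
    if done:
        # No bootstrapping past a terminal transition.
        return reward
    rng = rng or random.Random()
    # Draw M distinct ensemble members and take their pessimistic minimum.
    subset = rng.sample(list(next_q_values), subset_size)
    return reward + gamma * min(subset)
```

With an ensemble of 10 critics and `subset_size=2`, each update sees a different pair of critics. Compared with a fixed min over all members, this keeps the target less pessimistic while still damping overestimation.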
Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator
Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions.