Q-Learning
380 papers with code • 0 benchmarks • 2 datasets
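The goal of Q-learning is to learn a policy that tells an agent which action to take in each situation. It does this by estimating an action-value function Q(s, a), the expected discounted return of taking action a in state s and behaving optimally afterwards; the policy then picks the action with the highest Q-value.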
(Image credit: Playing Atari with Deep Reinforcement Learning)
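A minimal tabular sketch of the idea follows. It assumes a hypothetical five-state chain environment: the `step` function, the state layout, and the hyperparameters `alpha`, `gamma` and `epsilon` are illustrative choices, not taken from any paper listed here. Each update moves Q(s, a) toward the target r + γ·max over a' of Q(s', a').

```python
import random

# Hypothetical toy environment (illustrative only): a 1-D chain of states 0..N_STATES-1.
# Action 0 moves left, action 1 moves right; reaching the last state gives
# reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy; ties between equal Q-values are broken randomly.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the greedy (max) value of the next state;
            # a terminal transition contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    # The learned policy: in each state, pick the action with the highest Q-value.
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q = q_learning()
    print(greedy_policy(q)[:-1])  # actions for the non-terminal states
```

Because the target uses the max over next-state actions while the agent acts epsilon-greedily, Q-learning is off-policy: the learned values approximate those of the greedy policy, not of the exploratory one. On this toy chain the greedy policy should end up moving right in every non-terminal state.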
Latest papers
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding
We first propose a selective communication block to gather richer information for better agent coordination within multi-agent environments and train the model with a Q-learning-based algorithm.
Scalable Online Exploration via Coverability
We propose exploration objectives (policy optimization objectives that enable downstream maximization of any reward function) as a conceptual framework to systematize the study of exploration.
Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations
Existing solutions either introduce a regularization term to make the trained policy smoother under perturbations or alternately train the agent's policy and the attacker's policy.
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning
To address this, we introduce Efficient episodic Memory Utilization (EMU) for MARL, with two primary objectives: (a) accelerating reinforcement learning by leveraging semantically coherent memory from an episodic buffer and (b) selectively promoting desirable transitions to prevent local convergence.
Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks
Herein, a novel ensemble Q-learning algorithm that addresses the performance and complexity challenges of the traditional Q-learning algorithm for optimizing wireless networks is presented.
Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments.
RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway
However, the lack of an efficient reward function and an effective exploration strategy has hindered its use in developing radiation-aware autonomous unmanned aerial vehicles (UAVs) that achieve maximum radiation protection.
VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability
This work empirically studies the performance and trainability of Deep Q-Learning models based on variational quantum circuits (VQCs) in classic control benchmark environments.
Decision Making in Non-Stationary Environments with Policy-Augmented Search
In this paper, we introduce Policy-Augmented Monte Carlo Tree Search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Alleviating overestimation bias is a critical challenge for deep reinforcement learning to perform well on more complex tasks and on offline datasets containing out-of-distribution data.
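Overestimation bias arises because the Q-learning target takes a max over noisy value estimates. The sketch below is a generic illustration under stated assumptions, not SPQR's method: every true action value is assumed to be 0, the estimates are hypothetical Gaussian noise, and the ensemble correction shown is the clipped (minimum-of-two) target used, for example, in TD3. The `independent` flag is an illustrative parameter that makes the second estimate an exact copy of the first.

```python
import random
import statistics


def overestimation_demo(n_actions=10, n_trials=20000, noise=1.0,
                        independent=True, seed=0):
    """Return (mean single-estimate target, mean clipped-ensemble target).

    Hypothetical setup: every true Q(s, a) is 0, so any positive mean is bias.
    """
    rng = random.Random(seed)
    single, clipped = [], []
    for _ in range(n_trials):
        # First noisy estimate of Q(s, a) for each action.
        q1 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        # Second estimate: independent noise, or an exact copy of q1.
        if independent:
            q2 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        else:
            q2 = list(q1)
        # Standard target: max over a single estimate (biased upward).
        single.append(max(q1))
        # Clipped target: per-action minimum over the two estimates, then max.
        clipped.append(max(min(a, b) for a, b in zip(q1, q2)))
    return statistics.mean(single), statistics.mean(clipped)


if __name__ == "__main__":
    print(overestimation_demo())
    print(overestimation_demo(independent=False))
```

With independent estimates the clipped target shrinks the bias (in this setup it does not remove it). With `independent=False` both estimates are identical, so the minimum changes nothing; this is one way to see why the independence of ensemble members matters.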