Q-Learning

91 papers with code · Methodology

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

(Image credit: Playing Atari with Deep Reinforcement Learning)
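
To make the description above concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state chain invented for illustration (it does not come from any paper listed here): action 0 moves left, action 1 moves right, and reaching the rightmost state yields a reward of 1 and ends the episode. The function names and hyperparameter values are likewise assumptions.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment."""
    rng = random.Random(seed)
    # q[s][a] estimates the discounted return of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([act for act in (0, 1) if q[s][act] == best])

            # Deterministic transition: 0 = left (clamped at 0), 1 = right.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best next-state estimate.
            bootstrap = 0.0 if s_next == goal else max(q[s_next])
            target = reward + gamma * bootstrap
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max((0, 1), key=lambda act: row[act]) for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learned table `q` is the "policy" in the sense used above: in each state the agent takes the action with the largest estimated value, which in this toy chain should be "move right" everywhere.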

Leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers without code

Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

24 Feb 2020

In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\beta)$, where $\beta < 1$ is the discount factor.

Q-LEARNING

Periodic Q-Learning

23 Feb 2020

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.

Q-LEARNING

Anypath Routing Protocol Design via Q-Learning for Underwater Sensor Networks

22 Feb 2020

As a promising technology in the Internet of Underwater Things, underwater sensor networks have drawn widespread attention from both academia and industry.

Q-LEARNING

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

17 Feb 2020

In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-LEARNING

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

ICLR 2020

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.

Q-LEARNING
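
The overestimation effect described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's method: every action is assumed to have a true value of 0, and each estimate is that true value plus independent zero-mean Gaussian noise. The function name and parameters are hypothetical.

```python
import random


def mean_max_estimate(n_actions=10, n_trials=10000, noise_std=1.0, seed=0):
    """Average of max_a Q_hat(a) when every true action value is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Each estimate is unbiased on its own: true value 0 plus noise.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Taking the max over noisy estimates favours lucky overestimates.
        total += max(estimates)
    return total / n_trials


if __name__ == "__main__":
    print("10 actions:", mean_max_estimate(n_actions=10))
    print(" 1 action: ", mean_max_estimate(n_actions=1))
```

The true maximum action value is 0 in both cases, yet with ten actions the average of the maximum estimate comes out well above 0, while with a single action it stays near 0. That gap is the bias the abstract refers to.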

Listwise Learning to Rank with Deep Q-Networks

13 Feb 2020

Learning to Rank is the problem of ranking a sequence of documents by their relevance to a given query.

DECISION MAKING · LEARNING-TO-RANK · Q-LEARNING

Regret Bounds for Discounted MDPs

12 Feb 2020

In this paper, we prove a regret lower bound of $\Omega\left(\frac{\sqrt{SAT}}{1 - \gamma} - \frac{1}{(1 - \gamma)^2}\right)$ when $T\geq SA$ on any learning algorithm for infinite-horizon discounted Markov decision processes (MDP), where $S$ and $A$ are the numbers of states and actions, $T$ is the number of actions taken, and $\gamma$ is the discounting factor.

Q-LEARNING

Q-Learning for Mean-Field Controls

10 Feb 2020

Multi-agent reinforcement learning (MARL) has been applied to many challenging problems, including two-team computer games, autonomous driving, and real-time bidding.

MULTI-AGENT REINFORCEMENT LEARNING · Q-LEARNING

Fast Detection of Maximum Common Subgraph via Deep Q-Learning

8 Feb 2020

Detecting the Maximum Common Subgraph (MCS) between two input graphs is fundamental for applications in biomedical analysis, malware detection, cloud computing, etc.

GRAPH EMBEDDING · GRAPH MATCHING · MALWARE DETECTION · Q-LEARNING

Safe Wasserstein Constrained Deep Q-Learning

7 Feb 2020

This paper presents a distributionally robust Q-Learning algorithm (DrQ) which leverages Wasserstein ambiguity sets to provide probabilistic out-of-sample safety guarantees during online learning.

Q-LEARNING