Methodology

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
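A minimal sketch of tabular Q-learning can make this concrete. The toy two-state MDP below is invented purely for illustration (it is not from any of the papers listed here): the agent learns action values Q(s, a) by bootstrapping from the greedy next-state value, and the policy is then read off as the argmax action in each state.

```python
import random

# Toy deterministic MDP (hypothetical, for illustration): from state 0,
# action 1 moves to state 1; in state 1, action 1 yields reward 1 and
# ends the episode; all other transitions give reward 0.
def step(state, action):
    """Returns (next_state, reward, done)."""
    if state == 0:
        return (1, 0.0, False) if action == 1 else (0, 0.0, False)
    return (1, 1.0, True) if action == 1 else (0, 0.0, False)

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 20:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the max next-state value
            target = r + gamma * (0.0 if done else max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s, steps = s2, steps + 1
    return Q

Q = q_learning()
# The learned greedy policy tells the agent what to do in each state:
policy = [max((0, 1), key=lambda x: Q[s][x]) for s in (0, 1)]
print(policy)  # expected [1, 1]: head to state 1, then take the exit action
```

The extracted policy is exactly the "what action to take under what circumstances" mapping: for each state, the action with the highest learned Q-value.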

(Image credit: Playing Atari with Deep Reinforcement Learning)


In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\beta)$, where $\beta < 1$ is the discount factor.

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.

As a promising technology in the Internet of Underwater Things, underwater sensor networks have drawn widespread attention from both academia and industry.

2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
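This bias is easy to demonstrate numerically. In the hypothetical setup below (not taken from any paper listed here), every true action value is 0, but each estimate carries zero-mean noise; because E[max_a Q̂(a)] ≥ max_a E[Q̂(a)], the maximum estimated action value overestimates the true maximum. A double-estimator variant, which selects the argmax with one set of estimates and evaluates it with an independent second set, avoids the bias:

```python
import random

# Assumed toy setup: 10 actions, all with true value 0, estimates
# corrupted by independent standard-normal noise.
rng = random.Random(42)
n_actions, n_trials = 10, 10_000

single_max = 0.0   # mean of max over one set of noisy estimates
double_est = 0.0   # double estimator: argmax on set A, evaluate on set B

for _ in range(n_trials):
    qhat_a = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
    qhat_b = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
    single_max += max(qhat_a)
    a_star = max(range(n_actions), key=lambda a: qhat_a[a])
    double_est += qhat_b[a_star]

single_max /= n_trials
double_est /= n_trials
print(f"max of estimates: {single_max:+.3f}")   # clearly positive: biased up
print(f"double estimator: {double_est:+.3f}")   # close to zero: unbiased
```

The first average is well above the true maximum of 0, while the second hovers near 0, which is the intuition behind Double Q-learning.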

Learning to Rank is the problem of ordering a list of documents by their relevance to a given query.

In this paper, we prove a regret lower bound of $\Omega\left(\frac{\sqrt{SAT}}{1 - \gamma} - \frac{1}{(1 - \gamma)^2}\right)$ when $T\geq SA$ on any learning algorithm for infinite-horizon discounted Markov decision processes (MDP), where $S$ and $A$ are the numbers of states and actions, $T$ is the number of actions taken, and $\gamma$ is the discount factor.

Multi-agent reinforcement learning (MARL) has been applied to many challenging problems, including two-team computer games, autonomous driving, and real-time bidding.

Detecting the Maximum Common Subgraph (MCS) between two input graphs is fundamental for applications in biomedical analysis, malware detection, cloud computing, etc.

This paper presents a distributionally robust Q-Learning algorithm (DrQ) which leverages Wasserstein ambiguity sets to provide probabilistic out-of-sample safety guarantees during online learning.