
The goal of Q-learning is to learn a policy that tells an agent which action to take in each state.

(Image credit: Playing Atari with Deep Reinforcement Learning)
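As a concrete illustration of the rule behind the papers below, here is a minimal sketch of one tabular Q-learning update (the toy MDP and variable names are illustrative, not from any listed paper):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy example: 2 states, 2 actions, all values initialized to zero.
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Repeating this update while the agent explores drives Q toward the optimal action values, from which the greedy policy is read off.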

Greatest papers with code

Bridging the Gap Between Value and Policy Based Reinforcement Learning

tensorflow/models NeurIPS 2017

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.
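The softmax consistency this abstract refers to can be sketched numerically; the entropy-regularized state value and the corresponding softmax policy are (function names here are illustrative, not the paper's code):

```python
import numpy as np

def softmax_value(q, tau=1.0):
    """Entropy-regularized (softmax) state value:
    V(s) = tau * log sum_a exp(Q(s,a) / tau)."""
    return tau * np.log(np.sum(np.exp(q / tau)))

def softmax_policy(q, tau=1.0):
    """Optimal policy under entropy regularization:
    pi(a|s) = exp((Q(s,a) - V(s)) / tau)."""
    return np.exp((q - softmax_value(q, tau)) / tau)

q = np.array([1.0, 2.0, 3.0])
pi = softmax_policy(q, tau=1.0)  # a proper distribution favoring higher-Q actions
```

As the temperature tau goes to zero, the softmax value approaches max_a Q(s,a) and the policy approaches the greedy one, connecting the value-based and policy-based views.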


Revisiting Fundamentals of Experience Replay

google-research/google-research ICML 2020

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset · Q-Learning
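The experience replay studied above can be sketched as a uniform buffer (a minimal version; real implementations add prioritization, n-step returns, and other refinements the paper examines):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay for off-policy RL."""

    def __init__(self, capacity):
        # Oldest transitions are evicted once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks temporal correlations in the training data.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Choices such as capacity and the ratio of gradient steps to environment steps are exactly the kinds of replay hyperparameters whose effects the paper revisits.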

Optimization of Molecules via Deep Reinforcement Learning

google-research/google-research 19 Oct 2018

We present a framework, which we call Molecule Deep $Q$-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double $Q$-learning and randomized value functions).

 Ranked #1 on Molecular Graph Generation on ZINC (QED Top-3 metric)

Molecular Graph Generation · Q-Learning

Deep Reinforcement Learning with Double Q-learning

tensorpack/tensorpack 22 Sep 2015

The popular Q-learning algorithm is known to overestimate action values under certain conditions.

Atari Games · Q-Learning
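The fix proposed by double Q-learning is to decouple action selection from action evaluation. A minimal sketch of the two targets (toy values, illustrative names):

```python
import numpy as np

def dqn_target(q_target, r, gamma=0.99):
    """Standard DQN target: select and evaluate with the same (target)
    network, which tends to overestimate under estimation noise."""
    return r + gamma * np.max(q_target)

def double_dqn_target(q_online, q_target, r, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    a_star = int(np.argmax(q_online))
    return r + gamma * q_target[a_star]

q_online = np.array([1.0, 2.0])   # online net prefers action 1
q_target = np.array([1.5, 0.5])   # target net disagrees about its value
y_double = double_dqn_target(q_online, q_target, r=0.0)
y_single = dqn_target(q_target, r=0.0)
```

When the two networks disagree, the double estimator bootstraps from a lower, less biased value than the single-network max.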

Playing Atari with Deep Reinforcement Learning

tensorpack/tensorpack 19 Dec 2013

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Atari Games · Q-Learning

Increasing the Action Gap: New Operators for Reinforcement Learning

janhuenermann/neurojs 15 Dec 2015

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.

Atari Games · Q-Learning
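One member of this operator family is advantage learning, which subtracts a fraction of the action gap from the Bellman target. The sketch below follows the classic advantage-learning form and is not necessarily the paper's consistent Bellman operator:

```python
import numpy as np

def advantage_learning_backup(Q, s, a, r, s_next, alpha=0.5, gamma=0.99):
    """Advantage-learning style backup:
    (T'Q)(s,a) = (TQ)(s,a) - alpha * (max_a' Q(s,a') - Q(s,a)),
    which widens the gap between the greedy and non-greedy actions."""
    bellman = r + gamma * np.max(Q[s_next])          # standard Bellman target
    gap_penalty = alpha * (np.max(Q[s]) - Q[s, a])   # zero for the greedy action
    return bellman - gap_penalty
```

Non-greedy actions are pushed down relative to the greedy one, so small estimation errors are less likely to flip the induced policy.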

Addressing Function Approximation Error in Actor-Critic Methods

hill-a/stable-baselines ICML 2018

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.

OpenAI Gym · Q-Learning
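One remedy this line of work popularized is the clipped double-Q target: bootstrap from the minimum of two critics. A minimal sketch (scalar values for illustration):

```python
def clipped_double_q_target(q1_next, q2_next, r, gamma=0.99):
    """Clipped double-Q target, TD3-style: taking the minimum of two
    independent critics gives a pessimistic estimate that curbs the
    overestimation bias of a single max-based target."""
    return r + gamma * min(q1_next, q2_next)

# Two critics disagree about the next-state value; the target uses the lower one.
y = clipped_double_q_target(q1_next=2.0, q2_next=1.5, r=1.0)
```

The pessimism introduces some underestimation, but underestimated values are not propagated through the policy update the way overestimated ones are.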

Continuous control with deep reinforcement learning

hill-a/stable-baselines 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

Continuous Control · Q-Learning
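The core obstacle this paper addresses: with continuous actions, the max over actions in the Q-learning target is intractable, so a deterministic actor replaces the argmax. A toy sketch with hand-written actor and critic (illustrative, not the paper's implementation):

```python
import numpy as np

def ddpg_style_target(q_func, mu, s_next, r, gamma=0.99):
    """Bootstrap through the actor's action instead of an argmax:
    y = r + gamma * Q(s', mu(s'))."""
    return r + gamma * q_func(s_next, mu(s_next))

# Toy actor and critic for a 1-D state/action space.
mu = lambda s: np.tanh(s)              # actor maps state -> action in [-1, 1]
q = lambda s, a: -(a - 0.5) ** 2 + s   # toy critic peaking at a = 0.5
y = ddpg_style_target(q, mu, s_next=0.0, r=1.0)
```

In the full algorithm the actor is then trained by gradient ascent on Q(s, mu(s)), which is only possible because both networks are differentiable.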

Breaking the Deadly Triad with a Target Network

ShangtongZhang/DeepRL 21 Jan 2021

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
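The target network at the center of this analysis is typically maintained by Polyak averaging; a minimal sketch (parameter lists stand in for network weights):

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Polyak-averaged target-network update:
    theta_target <- (1 - tau) * theta_target + tau * theta_online.
    Slowing the bootstrap target's drift is the standard stabilizer
    for the off-policy + function approximation + bootstrapping combination."""
    return [(1 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]

target = [np.zeros(2)]
online = [np.ones(2)]
target = soft_update(target, online, tau=0.1)  # moves 10% toward the online net
```

With tau = 1 this degenerates to the periodic hard copy used in the original DQN; smaller tau gives a smoother, more stable target.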