274 papers with code • 0 benchmarks • 2 datasets

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

( Image credit: Playing Atari with Deep Reinforcement Learning )


Use these libraries to find Q-Learning models and implementations
5 papers
4 papers
See all 15 libraries.

Most implemented papers

Continuous control with deep reinforcement learning

ray-project/ray 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

Playing Atari with Deep Reinforcement Learning

ray-project/ray 19 Dec 2013

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Deep Reinforcement Learning with Double Q-learning

labmlai/annotated_deep_learning_paper_implementations 22 Sep 2015

The popular Q-learning algorithm is known to overestimate action values under certain conditions.

Addressing Function Approximation Error in Actor-Critic Methods

sfujim/TD3 ICML 2018

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

ray-project/ray 10 Mar 2017

We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients.

A disembodied developmental robotic agent called Samu Bátfai

nbatfai/isaac 9 Nov 2015

The basic objective of this paper is to reach the same results using reinforcement learning with general function approximators that can be achieved by using the classical Q lookup table on small input samples.

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

uber-research/deep-neuroevolution 18 Dec 2017

Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion.

Conservative Q-Learning for Offline Reinforcement Learning

aviralkumar2907/CQL NeurIPS 2020

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.