Q-Learning

405 papers with code • 0 benchmarks • 2 datasets

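Q-learning is a model-free reinforcement learning algorithm that learns an action-value function Q(s, a), an estimate of the expected return from taking action a in state s. The goal is to learn a policy, which tells an agent what action to take under what circumstances, by selecting the action with the highest estimated value in each state.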
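As a minimal sketch of how such a policy can be learned, the example below runs tabular Q-learning on a toy chain environment. The environment, the function names (`train_q_table`, `greedy_policy`), and all hyperparameter values are assumptions made for illustration; none of them come from the papers listed on this page.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Action 0 moves left, action 1 moves right. Reaching the last state
    yields reward 1 and ends the episode; every other step yields 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: bootstrap from the best next-state value.
            # The goal row is never updated, so its value stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [0 if row[0] > row[1] else 1 for row in q]
```

Calling `greedy_policy(train_q_table())` returns one action index per state; on this chain the learned policy should move right from every non-terminal state.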

(Image credit: Playing Atari with Deep Reinforcement Learning)

Benchmarks

Leaderboards track progress in Q-Learning. No evaluation results have been submitted yet.

Libraries

Use these libraries to find Q-Learning models and implementations:

opendilab/DI-engine • 6 papers • 2,684 stars
zzmtsvv/rl_task • 6 papers
hill-a/stable-baselines • 5 papers • 4,077 stars
toni-sm/skrl • 5 papers • 427 stars

See all 29 libraries.

Datasets

Most implemented papers

Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games

Hananel-Hazan/bindsnet • 26 Mar 2019

Previous studies in the image classification domain demonstrated that standard NNs (with ReLU nonlinearity) trained using supervised learning can be converted to SNNs with negligible deterioration in performance.

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

ray-project/ray • 29 May 2019

(i) We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates.

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

takuseno/d3rlpy • NeurIPS 2019

Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.
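To make the idea concrete, the sketch below compares a standard Bellman target, which maximizes over all actions, with one that maximizes only over actions observed in the dataset for the next state. This is an illustrative simplification and not the paper's algorithm; the function names, the dictionary-based Q-table, and the toy numbers are all assumptions.

```python
def actions_seen(dataset):
    """Map each state to the set of actions the dataset contains for it.

    `dataset` is a list of (state, action, reward, next_state) tuples.
    """
    seen = {}
    for state, action, _reward, _next_state in dataset:
        seen.setdefault(state, set()).add(action)
    return seen


def bellman_target(q_table, actions, reward, next_state, gamma=0.9,
                   allowed=None):
    """Compute r + gamma * max_a Q(next_state, a) for a non-terminal step.

    If `allowed` is given, the max only ranges over those actions, so the
    target never bootstraps from an action outside the training data.
    """
    candidates = [a for a in actions if allowed is None or a in allowed]
    if not candidates:
        # No in-distribution action is known for next_state: fall back to
        # the immediate reward (a choice made for this sketch only).
        return reward
    return reward + gamma * max(q_table[(next_state, a)] for a in candidates)


# Toy example: action "b" was never taken in state "s1", yet its Q-value
# happens to be large, for instance because of function approximation error.
ACTIONS = ("a", "b")
DATASET = [("s0", "a", 0.0, "s1"), ("s1", "a", 1.0, "s2")]
Q_TABLE = {("s1", "a"): 1.0, ("s1", "b"): 10.0}

seen = actions_seen(DATASET)
naive = bellman_target(Q_TABLE, ACTIONS, 0.0, "s1")
restricted = bellman_target(Q_TABLE, ACTIONS, 0.0, "s1", allowed=seen["s1"])
```

Here `naive` inherits the unverified value of the unseen action, while `restricted` bootstraps only from the action that the data supports.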

Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past

BY571/Soft-Actor-Critic-and-Extensions • 10 Jun 2019

The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data.
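One way to picture this sampling scheme is a window over the most recent transitions that shrinks from one update to the next, so that early updates draw from older data and later updates draw from newer data. The sketch below is a simplified illustration under that assumption; the decay schedule, the parameter names (`eta`, `min_size`), and their default values are chosen for the example and should not be read as the paper's exact settings.

```python
import random


def recent_window_sizes(buffer_len, num_updates, eta=0.996, min_size=5):
    """Return one window size per update, shrinking geometrically.

    Update k samples only from the most recent `sizes[k]` transitions.
    Assumes buffer_len >= 1.
    """
    floor = min(min_size, buffer_len)
    sizes = []
    for k in range(1, num_updates + 1):
        size = int(buffer_len * eta ** (k * 1000 / num_updates))
        sizes.append(max(min(size, buffer_len), floor))
    return sizes


def sample_recent_batches(buffer, num_updates, batch_size, eta=0.996,
                          min_size=5, seed=0):
    """Draw one mini-batch per update from a progressively newer window.

    `buffer` is a list ordered from oldest to newest transition.
    """
    rng = random.Random(seed)
    batches = []
    for size in recent_window_sizes(len(buffer), num_updates, eta, min_size):
        window = buffer[-size:]  # the `size` most recent transitions
        batches.append([rng.choice(window) for _ in range(batch_size)])
    return batches
```

Because the window sizes never increase, the final updates in a round are computed only from the newest data, which matches the ordering property described above.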

Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

paintception/Deep-Quality-Value-Family- • 1 Sep 2019

This paper makes one step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms.

FACMAC: Factored Multi-Agent Centralised Policy Gradients

schroederdewitt/multiagent_mujoco • NeurIPS 2021

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

ku2482/rljax • NeurIPS 2020

We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function.

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

nicklashansen/dmcontrol-generalization-benchmark • NeurIPS 2021

Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.

Mildly Conservative Q-Learning for Offline Reinforcement Learning

dmksjfl/mcq • 9 Jun 2022

The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

zhendong-wang/diffusion-policies-for-offline-rl • 12 Aug 2022

In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of the conditional diffusion model, which results in a loss that seeks optimal actions that are near the behavior policy.
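The trade-off in such a loss can be illustrated with a one-dimensional toy problem. The sketch below swaps the diffusion model's training loss for a plain squared-error term that pulls the action toward a dataset action, then subtracts a weighted action-value term. That substitution, the weight `eta`, the grid search, and the function names are all assumptions made for illustration and are not the paper's implementation.

```python
def regularized_policy_loss(action, behavior_action, q_value_fn, eta):
    """Behavior-cloning term minus a weighted action-value term.

    The squared error stands in for the diffusion loss (an assumption of
    this sketch); `eta` controls how strongly high Q-values are favored.
    """
    bc_term = (action - behavior_action) ** 2
    return bc_term - eta * q_value_fn(action)


def best_action_on_grid(behavior_action, q_value_fn, eta,
                        low=-1.0, high=1.0, steps=2001):
    """Return the grid action that minimizes the combined loss."""
    grid = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return min(
        grid,
        key=lambda a: regularized_policy_loss(a, behavior_action,
                                              q_value_fn, eta),
    )
```

With a hypothetical value function `q(a) = a`, a dataset action of 0, and `eta = 0.4`, the loss `a**2 - 0.4 * a` is minimized at `a = 0.2`: the selected action moves toward higher value but stays close to the behavior action. Setting `eta = 0` recovers the dataset action itself.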
