Q-Learning

405 papers with code • 0 benchmarks • 2 datasets

The goal of Q-learning is to learn a policy that tells an agent which action to take in each state, by estimating the value Q(s, a) of taking action a in state s.

(Image credit: Playing Atari with Deep Reinforcement Learning)
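Concretely, Q-learning maintains an estimate Q(s, a) and repeatedly updates it toward the bootstrapped target r + γ max_a′ Q(s′, a′). A minimal tabular sketch, assuming a gym-style environment interface and illustrative hyperparameters:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes a gym-style interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Update toward the bootstrapped target r + gamma * max_a' Q(s', a').
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```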

Libraries

Use these libraries to find Q-Learning models and implementations
See all 29 libraries.

Most implemented papers

Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games

Hananel-Hazan/bindsnet 26 Mar 2019

Previous studies in the image classification domain demonstrated that standard NNs (with ReLU nonlinearity) trained using supervised learning can be converted to SNNs with negligible deterioration in performance.
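The rate-coding intuition behind such conversions fits in a few lines: a non-leaky integrate-and-fire neuron driven by a constant input fires at a rate approximating the ReLU of that input. The neuron model and constants below are illustrative assumptions, not bindsnet's API:

```python
def if_neuron_rate(x, threshold=1.0, timesteps=1000):
    """Non-leaky integrate-and-fire neuron under constant input x.
    Its firing rate over the window approximates ReLU(x) (saturating
    at 1.0), which is the intuition behind rate-based conversion."""
    v, spikes = 0.0, 0
    for _ in range(timesteps):
        v = max(v + x, 0.0)          # integrate; clamp at rest
        if v >= threshold:
            spikes += 1
            v -= threshold           # reset-by-subtraction preserves rates
    return spikes / timesteps

for x in [-0.5, 0.0, 0.25, 0.5, 1.0]:
    print(x, if_neuron_rate(x), max(x, 0.0))   # rate vs. ReLU(x)
```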

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

ray-project/ray 29 May 2019

We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates.
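The core idea of the decomposition: the value of a slate is a choice-model-weighted sum of item-level Q-values, so only item-level values need to be learned. A schematic sketch with a softmax choice model (an illustrative assumption; the paper also handles a no-click alternative):

```python
import numpy as np

def slate_q_value(item_scores, item_q_values):
    """Value of a slate as a choice-model-weighted sum of item-level
    Q-values. The softmax choice model here is an illustrative
    assumption; SLATEQ also supports a no-click option."""
    click_probs = np.exp(item_scores) / np.exp(item_scores).sum()
    return float((click_probs * item_q_values).sum())

# Hypothetical 3-item slate: per-item user scores and learned item Q-values.
print(slate_q_value(np.array([1.0, 0.5, -0.2]),
                    np.array([2.0, 1.5, 0.3])))
```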

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

takuseno/d3rlpy NeurIPS 2019

Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.
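A toy numerical illustration of the failure mode (not the paper's algorithm): when the Bellman target maximizes over all actions, an out-of-distribution action with a spuriously high estimate gets selected and its error propagates through subsequent backups; restricting the max to in-support actions avoids this. All values below are made up:

```python
import numpy as np

q_next = np.array([0.5, 0.7, 0.4, 0.1, 0.2, 3.5, 0.3, 0.0, 0.6, 0.2])
in_support = {0, 1, 2}   # actions actually present in the offline data

# Standard backup: the max over ALL actions picks action 5, whose value
# was never grounded by data, and the error compounds through backups.
naive_target = q_next.max()

# Support-constrained backup (the spirit of the paper's fix): restrict
# the max to actions with support under the behavior distribution.
constrained_target = max(q_next[a] for a in in_support)

print(naive_target, constrained_target)   # 3.5 vs. 0.7
```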

Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past

BY571/Soft-Actor-Critic-and-Extensions 10 Jun 2019

The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data.
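The sampling schedule can be sketched as below, following the formula given in the ERE paper, c_k = max(N · η^(k·1000/K), c_min); the defaults η = 0.996 and c_min = 5000 are the paper's suggestions and are assumed here:

```python
import numpy as np

def ere_window(N, k, K, eta=0.996, c_min=5000):
    """Size of the sampling window for the k-th of K updates:
    c_k = max(N * eta**(k * 1000 / K), c_min), clipped to the buffer."""
    c_k = int(N * eta ** (k * 1000.0 / K))
    return min(max(c_k, c_min), N)

N, K = 100_000, 64                       # buffer size, updates this phase
for k in (1, 32, 64):
    c_k = ere_window(N, k, K)
    idx = np.random.randint(N - c_k, N, size=4)  # uniform over newest c_k
    print(k, c_k, idx)  # later updates draw from ever more recent data
```

Because the window shrinks as k grows, updates on older data come first and updates on recent data come last, which is how the ordering keeps new-data updates from being overwritten.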

Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

paintception/Deep-Quality-Value-Family- 1 Sep 2019

This paper takes a step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms.

FACMAC: Factored Multi-Agent Centralised Policy Gradients

schroederdewitt/multiagent_mujoco NeurIPS 2021

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

ku2482/rljax NeurIPS 2020

We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function.
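DisCor's remedy is to reweight the Bellman regression by the estimated accumulated error of each bootstrap target. A schematic sketch; in the paper the error estimate Delta is learned by an auxiliary network, whereas here it is supplied directly (an assumption):

```python
import numpy as np

def discor_weights(delta_next, gamma=0.99, tau=10.0):
    """Per-transition weights w proportional to
    exp(-gamma * Delta(s', a') / tau), where Delta estimates the
    accumulated error of the bootstrap target. In the paper Delta is
    learned by an auxiliary network; here it is passed in directly."""
    w = np.exp(-gamma * np.asarray(delta_next, dtype=float) / tau)
    return w / w.sum()

# Transitions whose targets depend on high-error successors get downweighted.
print(discor_weights([0.1, 5.0, 50.0]))
```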

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

nicklashansen/dmcontrol-generalization-benchmark NeurIPS 2021

Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.

Mildly Conservative Q-Learning for Offline Reinforcement Learning

dmksjfl/mcq 9 Jun 2022

The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.
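One way to picture "mild" conservatism is to train out-of-distribution actions toward at most the best in-support value rather than their own inflated estimates. The sketch below illustrates that idea; it is not the paper's exact MCQ operator:

```python
import numpy as np

def mildly_conservative_targets(q_values, in_support):
    """Cap the training target of each out-of-distribution action at the
    best in-support value, so OOD actions are never trained toward
    values exceeding what the data can justify."""
    best_supported = max(q_values[a] for a in in_support)
    targets = np.array(q_values, dtype=float)
    for a in range(len(targets)):
        if a not in in_support:
            targets[a] = min(targets[a], best_supported)
    return targets

q = [1.0, 4.0, 9.0, 2.0]     # action 2 is OOD and clearly overestimated
print(mildly_conservative_targets(q, in_support={0, 1, 3}))  # caps it at 4.0
```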

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

zhendong-wang/diffusion-policies-for-offline-rl 12 Aug 2022

In our approach, we learn an action-value function and add a term maximizing action-values to the training loss of the conditional diffusion model, resulting in a loss that seeks optimal actions close to the behavior policy.
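Schematically, the objective combines the diffusion model's behavior-cloning (denoising) loss with a Q-maximization term on actions sampled from the policy. The pieces below (ToyCritic, the precomputed bc loss, the sampled actions) are stand-ins for illustration, not the repository's API:

```python
import torch
import torch.nn as nn

class ToyCritic(nn.Module):
    """Stand-in Q(s, a) network (the repository uses twin critics)."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def diffusion_ql_loss(bc_loss, critic, states, sampled_actions, eta=1.0):
    """Diffusion behavior-cloning loss plus a term that maximizes Q on
    actions sampled from the diffusion policy."""
    q_loss = -critic(states, sampled_actions).mean()
    return bc_loss + eta * q_loss

s = torch.randn(8, 4)                        # toy batch of states
a = torch.randn(8, 2, requires_grad=True)    # stands in for policy samples
bc = torch.tensor(0.3)                       # stands in for the denoising loss
loss = diffusion_ql_loss(bc, ToyCritic(4, 2), s, a)
loss.backward()                              # gradients reach the sampler
```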