# Q-Learning

71 papers with code · Methodology

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
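As a rough illustration of how such a policy can be learned, the sketch below runs tabular Q-learning on a small corridor environment. The environment, the function names (`q_learning`, `greedy_policy`), and the hyperparameter values are all assumptions made for this example and do not come from any paper listed here.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (bounded at state 0),
    action 1 moves right. Entering the rightmost state yields reward 1
    and ends the episode; every other step yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state contributes no future value.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learned policy is read off the table by taking the highest-valued action in each state. The update is off-policy: it bootstraps from the best next action regardless of which action the epsilon-greedy behaviour actually takes next.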

# ModelicaGym: Applying Reinforcement Learning to Modelica Models

18 Sep 2019

This paper presents the ModelicaGym toolbox, which was developed to employ Reinforcement Learning (RL) for solving optimization and control tasks in Modelica models.

# Split Deep Q-Learning for Robust Object Singulation

17 Sep 2019

To achieve the above goal, we employ reinforcement learning, and particularly Deep Q-learning (DQN), to learn optimal push policies by trial and error.

# ISL: Optimal Policy Learning With Optimal Exploration-Exploitation Trade-Off

13 Sep 2019

The algorithm is of the gradient type (and therefore has good convergence properties even when used in conjunction with function approximators such as neural networks); it is off-policy; and it specifies both the update equations and the strategy to address the exploration-exploitation dilemma.

# SQLR: Short Term Memory Q-Learning for Elastic Provisioning

12 Sep 2019

As more and more application providers transition to the cloud and deliver their services on a Software as a Service (SaaS) basis, cloud providers need to make their provisioning systems agile enough to deliver on Service Level Agreements.

# Joint Inference of Reward Machines and Policies for Reinforcement Learning

12 Sep 2019

The experiments show that learning high-level knowledge in the form of reward machines can lead to fast convergence to optimal policies in RL, while standard RL methods such as Q-learning and hierarchical RL methods fail to converge to optimal policies after a substantial number of training steps in many tasks.

# A Deep Learning Approach to Grasping the Invisible

11 Sep 2019

We introduce a new problem named "grasping the invisible", where a robot is tasked to grasp an initially invisible target object via a sequence of non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions.

# Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning

11 Sep 2019

While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla SQL in RL for high-dimensional domains with discrete actions and function approximators.

# A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

10 Sep 2019

Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in reinforcement learning, this paper studies a class of biased stochastic approximation (SA) procedures under a mild "ergodic-like" assumption on the underlying stochastic noise sequence.

# Q-Learning Based Aerial Base Station Placement for Fairness Enhancement in Mobile Networks

10 Sep 2019

In this paper, we use an aerial base station (aerial-BS) to enhance fairness in a dynamic environment with user mobility.

# Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

9 Sep 2019

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a *fixed* number of future time steps.
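To make the idea of a fixed-horizon value function concrete, here is a minimal sketch. It assumes one value table per horizon `h`, where the horizon-`h` estimate bootstraps from the horizon-`h-1` estimate at the next state and the horizon-0 table is fixed at zero. The function name, the sample format, and the two-state example are hypothetical and are not taken from the paper.

```python
def fixed_horizon_td(transitions, n_states, horizon, alpha=0.5, sweeps=200):
    """Estimate V_h(s), the expected sum of the next h rewards from state s.

    transitions: list of (state, reward, next_state) samples.
    Returns a table v where v[h][s] approximates V_h(s) for h = 0..horizon.
    """
    # v[0] stays at zero: with no steps remaining, no reward can be collected.
    v = [[0.0] * n_states for _ in range(horizon + 1)]
    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            for h in range(1, horizon + 1):
                # Horizon h bootstraps from horizon h-1 at the next state,
                # so no estimate ever bootstraps from itself.
                target = reward + v[h - 1][next_state]
                v[h][state] += alpha * (target - v[h][state])
    return v


if __name__ == "__main__":
    # Hypothetical two-state cycle: 0 -> 1 yields reward 1, 1 -> 0 yields reward 0.
    samples = [(0, 1.0, 1), (1, 0.0, 0)]
    values = fixed_horizon_td(samples, n_states=2, horizon=3)
    for h, row in enumerate(values):
        print(h, [round(x, 3) for x in row])
```

Because each horizon depends only on the one below it, the estimates form a chain anchored at the all-zero horizon-0 table. This sketch covers prediction for a fixed stream of samples only; it does not attempt control or function approximation.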