The beam search refines these policies on the fly by pruning branches that a discriminator evaluates unfavourably.
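As a concrete illustration of this pruning step, here is a minimal, generic sketch: `expand` and `discriminator_score` are hypothetical placeholders (the actual policy generator and discriminator are not reproduced here), and higher scores are taken to be more favourable.

```python
import heapq

def beam_search(initial_state, expand, discriminator_score, beam_width=5, depth=10):
    """Discriminator-pruned beam search (illustrative sketch).

    `expand` maps a state to its candidate successors; `discriminator_score`
    is a hypothetical stand-in for the discriminator, higher meaning better.
    """
    beam = [initial_state]
    for _ in range(depth):
        candidates = [succ for state in beam for succ in expand(state)]
        if not candidates:
            break
        # Pruning: keep only the beam_width branches the discriminator favours.
        beam = heapq.nlargest(beam_width, candidates, key=discriminator_score)
    return max(beam, key=discriminator_score)
```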
We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.
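Minimum entropy coupling is NP-hard in general, so practical systems rely on approximations. The sketch below shows a standard greedy heuristic (in the spirit of Kocaoglu et al., 2017) that builds a low-entropy joint distribution with prescribed marginals; it illustrates the generic primitive, not MEME's own procedure.

```python
import numpy as np

def greedy_min_entropy_coupling(p, q, tol=1e-12):
    """Greedy approximation to minimum entropy coupling: repeatedly pair the
    largest remaining masses of the two marginals. Returns a joint matrix M
    whose row sums recover p and whose column sums recover q."""
    p = np.asarray(p, dtype=float).copy()
    q = np.asarray(q, dtype=float).copy()
    M = np.zeros((len(p), len(q)))
    while p.sum() > tol:
        i, j = p.argmax(), q.argmax()
        m = min(p[i], q[j])          # couple as much mass as possible at once
        M[i, j] += m
        p[i] -= m
        q[j] -= m
    return M

# e.g. greedy_min_entropy_coupling([0.6, 0.4], [0.5, 0.3, 0.2])
# has row sums (0.6, 0.4) and column sums (0.5, 0.3, 0.2).
```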
Recent research has shown that graph neural networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020).
They also allow practitioners to inject biases encoded in the structure of the input graph.
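A minimal sketch of such a policy, assuming a morphology graph with one node per joint and plain NumPy message passing; the weight matrices, feature sizes, and two propagation steps are illustrative assumptions, not the architecture of the cited works.

```python
import numpy as np

def gnn_policy(node_feats, adj, w_msg, w_upd, w_out, steps=2):
    """Message-passing policy over a robot's morphology graph (sketch).

    node_feats: (N, D) per-joint observations; adj: (N, N) adjacency matrix
    encoding the morphology; w_msg, w_upd: (D, D) and w_out: (D, 1) are
    placeholder weights a practitioner would learn with RL."""
    h = node_feats
    for _ in range(steps):
        messages = adj @ (h @ w_msg)       # aggregate neighbouring joints
        h = np.tanh(h @ w_upd + messages)  # update per-joint embeddings
    return (h @ w_out).squeeze(-1)         # one action (e.g. torque) per joint
```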
To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep.
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We discuss those differences and propose modifications to existing regularization techniques to better adapt them to RL.
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning.
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference.
Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown.
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to learning an inference network, because they reduce the signal-to-noise ratio of the gradient estimator.
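The effect can be reproduced in a toy model. The sketch below measures the signal-to-noise ratio (|mean| / std) of the reparameterised gradient of the K-sample importance-weighted bound with respect to an inference-network parameter, for p(z) = N(0, 1), p(x|z) = N(z, 1) and q(z|x) = N(mu, 1); all concrete values are assumptions made for illustration.

```python
import math
import torch

def iwae_grad_snr(mu, x, K, n_reps=1000):
    """Signal-to-noise ratio of the reparameterised IWAE gradient w.r.t. the
    inference-network mean `mu` in a 1-D Gaussian model (toy sketch)."""
    grads = []
    for _ in range(n_reps):
        m = torch.tensor(mu, requires_grad=True)
        z = m + torch.randn(K)  # reparameterised samples from q(z|x) = N(m, 1)
        # log p(z) + log p(x|z) - log q(z|x); additive constants cancel.
        log_w = -0.5 * z**2 - 0.5 * (x - z)**2 + 0.5 * (z - m)**2
        elbo = torch.logsumexp(log_w, 0) - math.log(K)  # K-sample bound
        elbo.backward()
        grads.append(m.grad.item())
    g = torch.tensor(grads)
    return (g.mean().abs() / g.std()).item()

# The SNR of the inference-network gradient tends to fall as K grows:
for K in (1, 8, 64):
    print(K, iwae_grad_snr(mu=-1.0, x=1.0, K=K))
```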
To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.
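A minimal sketch of the recursion, assuming a latent-space model with one learned linear transition per discrete action; layer shapes and nonlinearities are placeholders rather than the paper's exact design. Because every operation is differentiable, the whole tree can be trained end-to-end like any other Q-network.

```python
import torch
import torch.nn as nn

class TreeValueSketch(nn.Module):
    """TreeQN-style recursive value computation (illustrative sketch)."""

    def __init__(self, state_dim, n_actions):
        super().__init__()
        # One learned latent transition per discrete action.
        self.trans = nn.ModuleList(
            [nn.Linear(state_dim, state_dim) for _ in range(n_actions)])
        self.reward = nn.Linear(state_dim, 1)  # predicted immediate reward
        self.value = nn.Linear(state_dim, 1)   # leaf value estimate

    def q_values(self, s, depth=2, gamma=0.99):
        qs = []
        for trans in self.trans:
            s_next = torch.tanh(trans(s))      # predicted latent next state
            r = self.reward(s_next)
            if depth == 1:
                backup = self.value(s_next)    # bottom out at the leaves
            else:                              # recurse and back up with max
                backup = self.q_values(s_next, depth - 1,
                                       gamma).max(-1, keepdim=True).values
            qs.append(r + gamma * backup)
        return torch.cat(qs, dim=-1)           # (batch, n_actions)

# q = TreeValueSketch(16, 4).q_values(torch.randn(2, 16), depth=2)
```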
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing a lower bound on the log marginal likelihood in a broad family of structured probabilistic models.
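For a toy linear-Gaussian state-space model, the bound can be sketched directly: the log of the SMC marginal-likelihood estimate is a lower bound on log p(y_{1:T}) and is differentiable in both the model parameter `a` and the proposal parameter `b` (both placeholders). As in AESMC, the multinomial resampling indices carry no gradient; gradients flow only through the reparameterised draws and the weights.

```python
import math
import torch

def smc_elbo(ys, a, b, K=32):
    """SMC lower bound for x_t ~ N(a*x_{t-1}, 1), y_t ~ N(x_t, 1), with a crude
    learned proposal q(x_t|x_{t-1}, y_t) = N(a*x_{t-1} + b*(y_t - a*x_{t-1}), 1).
    Toy sketch of the objective AESMC maximises, not the paper's implementation."""
    normal = torch.distributions.Normal
    x = torch.zeros(K)                         # K particles, x_0 = 0 assumed
    elbo = 0.0
    for y in ys:
        mean_q = a * x + b * (y - a * x)       # proposal mean, shifted toward y
        x_new = mean_q + torch.randn(K)        # reparameterised proposal draw
        log_w = (normal(a * x, 1.0).log_prob(x_new)      # transition density
                 + normal(x_new, 1.0).log_prob(y)        # observation density
                 - normal(mean_q, 1.0).log_prob(x_new))  # proposal density
        elbo = elbo + torch.logsumexp(log_w, 0) - math.log(K)
        idx = torch.multinomial(torch.softmax(log_w.detach(), 0), K,
                                replacement=True)
        x = x_new[idx]                         # multinomial resampling
    return elbo                                # differentiable in a and b

# ys = torch.randn(10)
# a = torch.tensor(0.9, requires_grad=True); b = torch.tensor(0.5, requires_grad=True)
# (-smc_elbo(ys, a, b)).backward()  # ascend the bound in a and b
```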