At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).
End-to-end models for goal-orientated dialogue are challenging to train, because linguistic and strategic aspects are entangled in latent state vectors.
Semantic segmentation is an important tool for visual scene understanding, and a meaningful measure of uncertainty is essential for decision making.
Such architectural design and abstractions enable researchers and developers to extend the toolkit with their new algorithms and improvements, and to use it for performance benchmarking.
In this survey, we consider seq2seq problems from the RL point of view and provide a formulation that combines the decision-making power of RL methods with the ability of sequence-to-sequence models to retain long-term memory.