We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.
Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings.
Unlike prior work, which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function that are already being learned in most model-based reinforcement learning algorithms.
Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options.
Scaling issues are mundane yet irritating for practitioners of reinforcement learning.
Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence.
Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning.
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.
In this paper we extend the SFs & GPI framework in two ways.
We focus on one aspect in particular, namely the ability to generalise to unseen tasks.
Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.
We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space.
We consider the problem of modelling noisy but highly symmetric shapes that can be viewed as hierarchies of whole-part relationships in which higher-level objects are composed of transformed collections of lower-level objects.