In this paper, we propose the Correlational Object Search POMDP (COS-POMDP), which can be solved to produce search strategies that use correlational information.
In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL.
In robotic domains, learning and planning are complicated by continuous state spaces, continuous action spaces, and long task horizons.
We then propose a bottom-up relational learning method for operator learning and show how the learned operators can be used for planning in a TAMP system.
The problem of planning for a robot that operates in environments containing a large number of objects, taking actions to move itself through the world as well as to change the state of the objects, is known as task and motion planning (TAMP).
We conclude that learning to predict a sufficient set of objects for a planning problem is a simple, powerful, and general mechanism for planning in large instances.
A general meta-planning strategy is to learn to impose constraints on the states considered and actions taken by the agent.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
We address the problem of efficient exploration for transition model learning in the relational model-based reinforcement learning setting without extrinsic goals or rewards.
We address the problem of approximate model minimization for MDPs in which the state is partitioned into endogenous and (much larger) exogenous components.
Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner.
Multi-object manipulation problems in continuous state and action spaces can be solved by planners that search over sampled values for the continuous parameters of operators.
We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment.
In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections.
In partially observed environments, it can be useful for a human to provide the robot with declarative information that represents probabilistic relational constraints on properties of objects in the world, augmenting the robot's sensory observations.