In this paper, we focus on the problem of assistive teaching of motor control tasks such as parking a car or landing an aircraft.
To this end, we first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use these forms of human feedback to infer a reward function, which may be parametric or non-parametric.
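Pairwise comparisons, the simplest of these feedback forms, are commonly modeled with a Bradley-Terry (logistic) likelihood. The sketch below is a minimal illustration of that idea, not the paper's exact formulation: it fits a linear reward over trajectory features by gradient ascent on the preference likelihood (all names and hyperparameters are placeholders).

```python
import numpy as np

def fit_pairwise_reward(phi_a, phi_b, prefs, lr=0.1, steps=500):
    """Fit a linear reward weight vector w from pairwise comparisons.

    Bradley-Terry model: P(A preferred) = sigmoid(w . (phi_a - phi_b)),
    where phi_a[i], phi_b[i] are feature vectors of the two trajectories
    in query i, and prefs[i] = 1 if trajectory A was preferred, else 0.
    """
    w = np.zeros(phi_a.shape[1])
    diff = phi_a - phi_b
    for _ in range(steps):
        p_a = 1.0 / (1.0 + np.exp(-(diff @ w)))   # predicted P(A preferred)
        # Gradient of the log-likelihood: (label - prediction) * feature diff
        w += lr * (diff.T @ (prefs - p_a)) / len(prefs)
    return w
```

With enough comparisons, the learned weights recover the direction of the underlying reward; only the direction is identifiable, since the likelihood is invariant to positive scaling of w.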
To forecast a reasonable future trajectory at any given time, each agent needs to attend to interactions with only a small group of the most relevant agents, rather than unnecessarily attending to all other agents.
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human-subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
Reward learning is a fundamental problem in human-robot interaction: it enables robots to operate in alignment with what their human users want.
Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game.
In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway.
ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters.
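ROIAL itself is a region-of-interest, preference-based Gaussian process method; as a generic illustration of maintaining a Bayesian posterior over a utility landscape, the sketch below shows plain GP regression with an RBF kernel (kernel choice and hyperparameters are placeholder assumptions, not the paper's):

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """Posterior mean and variance of a GP with an RBF kernel.

    X_train: (n, d) observed gait-parameter settings; y_train: (n,) utilities.
    Returns the posterior mean and pointwise variance at X_test.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    K = k(X_train, X_train) + noise * np.eye(len(X_train))  # noisy Gram matrix
    Ks = k(X_test, X_train)
    Kss = k(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                                   # posterior mean
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)           # posterior covariance
    return mean, np.diag(cov)
```

The posterior variance is what lets an algorithm decide which gait parameters to query next while avoiding regions the user is unlikely to tolerate.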
Multi-agent safe systems have become an increasingly important area of study, as multiple AI-powered systems now routinely operate alongside one another.
To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
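The hierarchical structure described above can be sketched as follows; this is a minimal illustration of the control flow (the mode policies, state encoding, and class names are all hypothetical), where the high-level RL policy selects a discrete driving mode and the corresponding low-level IL policy produces the action:

```python
from dataclasses import dataclass
from typing import Callable, List

State = List[float]
Action = List[float]

@dataclass
class HReILAgent:
    """High-level policy picks a driving mode; low-level policies act."""
    low_level: List[Callable[[State], Action]]  # one IL policy per mode
    high_level: Callable[[State], int]          # RL policy -> mode index

    def act(self, state: State) -> Action:
        mode = self.high_level(state)           # e.g., timid vs. aggressive
        return self.low_level[mode](state)      # delegate to that mode's policy
```

Keeping the modes discrete lets the RL problem stay small (a choice among a few modes) while the continuous control is handled entirely by imitation-learned policies.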
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers.
Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
Overall, we extend existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI.
While active learning methods attempt to tackle this issue by labeling only the most informative data samples, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel.
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.
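A core idea in safe exploration for deterministic MDPs is to grow a set of certifiably safe states: a new state is added only if it is reachable from the current safe set and the agent can return from it, so exploration never gets stranded. The sketch below illustrates that fixed-point expansion under stated assumptions; `successors` and `is_returnable` are hypothetical helpers, not the paper's algorithm.

```python
def expand_safe_set(safe_seed, successors, is_returnable):
    """Grow a safe set to its fixed point in a deterministic MDP.

    safe_seed: iterable of states known to be safe a priori.
    successors(s): states reachable from s in one step (deterministic).
    is_returnable(s, safe): whether the agent can get from s back into safe.
    """
    safe = set(safe_seed)
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            for nxt in successors(s):
                # Add a state only if we can both reach it and come back.
                if nxt not in safe and is_returnable(nxt, safe):
                    safe.add(nxt)
                    changed = True
    return safe
```

Because the dynamics are deterministic, reachability and returnability are exact set computations here; with unknown transitions, the same structure would operate on a learned model's confident predictions instead.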