We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well.
However, current VLMs are limited in their understanding of the physical concepts (e. g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects.
To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.
In the typing domain, we leverage backspaces as feedback that the interface did not perform the desired action.
Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e. g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural 'default' interface.
In the typing domain, we leverage backspaces as implicit feedback that the interface did not perform the desired action.