In this paper, we introduce a framework that can generate plausible human grasping motions suitable for training the robot.
In real-world experiments, AnyTeleop achieves a higher success rate than a previous system designed for specific robot hardware, using the same robot.
The two encoders are used to compute prototypes of image classes for classification.
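Prototype-based classification of this kind typically averages the encoder embeddings of each class's support images and assigns a query image to the nearest prototype. A minimal sketch of that idea (function names and the distance metric are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

def compute_prototypes(embeddings, labels, num_classes):
    """Compute one prototype per class as the mean support embedding."""
    dim = embeddings.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = embeddings[labels == c].mean(axis=0)
    return protos

def classify(queries, protos):
    """Assign each query embedding to its nearest prototype (Euclidean)."""
    dists = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

With a good encoder, a handful of support images per class is enough to form usable prototypes, which is what makes this setup suitable for few-shot recognition.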
In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct).
Ranked #2 on Robot Manipulation on RLBench
We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction.
The policy learned from our dataset can generalize well on unseen object poses in both simulation and the real world.
We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object.
We analyze the performance of a set of baselines and show a correlation with a real-world evaluation.
However, how to responsively generate smooth motions to take an object from a human is still an open question.
Accurate object rearrangement from vision is a crucial problem for a wide variety of real-world robotics applications in unstructured environments.
Second, we treat this low-dimensional concept as an automatic labeler to synthesize a large-scale high-dimensional dataset with the simulator.
We introduce DexYCB, a new dataset for capturing hand grasping of objects.
We demonstrate the generalizability, usability, and robustness of our approach on a novel benchmark set of 26 diverse household objects, a user study with naive users (N=6) handing over a subset of 15 objects, and a systematic evaluation examining different ways of handing objects.
We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.
Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks.
We experimentally demonstrate the strength of our approach over different non-hierarchical and hierarchical baselines.
no code implementations • 7 Aug 2018 • Parker Hill, Babak Zamirai, Shengshuo Lu, Yu-Wei Chao, Michael Laurenzano, Mehrzad Samadi, Marios Papaefthymiou, Scott Mahlke, Thomas Wenisch, Jia Deng, Lingjia Tang, Jason Mars
With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency.
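One way to study the effect of numeric precision on model weights is to quantize them to a given bit width and measure the resulting error. A minimal uniform-quantization illustration (not the paper's actual method):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniformly quantize an array to 2**bits levels over its value range,
    then map back to the original scale (dequantize)."""
    lo, hi = w.min(), w.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if levels else 1.0
    q = np.round((w - lo) / scale)
    return q * scale + lo
```

Comparing `w` against `quantize_uniform(w, bits)` for decreasing `bits` gives a quick picture of how much representational precision a layer's weights can lose before accuracy degrades.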
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.
Ranked #22 on Temporal Action Localization on THUMOS’14
We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them.
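Each HOI detection is thus a triplet: a human box, an object box, and the interaction class linking them. A small sketch of that output structure (field names and the helper are hypothetical, for illustration only):

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class HOIDetection:
    """One human-object interaction triplet with a confidence score."""
    human_box: Box
    object_box: Box
    interaction: str  # e.g. "ride", "hold"
    score: float

def filter_by_score(dets: List[HOIDetection], threshold: float) -> List[HOIDetection]:
    """Keep only triplets whose confidence meets the threshold."""
    return [d for d in dets if d.score >= threshold]
```

Evaluation then matches predicted triplets against ground truth, requiring both boxes to overlap their targets and the interaction label to agree.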
We introduce a new benchmark "Humans Interacting with Common Objects" (HICO) for recognizing human-object interactions (HOI).
In this paper, we introduce the new problem of mining the knowledge of semantic affordance: given an object, determining whether an action can be performed on it.
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification.
Ranked #7 on Room Layout Estimation on SUN RGB-D