We analyze the performance of a set of baselines and show a correlation with a real-world evaluation.
no code implementations • 7 May 2022 • Yashraj Narang, Kier Storey, Iretiayo Akinola, Miles Macklin, Philipp Reist, Lukasz Wawrzyniak, Yunrong Guo, Adam Moravanszky, Gavriel State, Michelle Lu, Ankur Handa, Dieter Fox
We aim for Factory to open the doors to using simulation for robotic assembly, as well as many other contact-rich applications in robotics.
In this paper, we explore natural language as an expressive and flexible tool for robot correction.
However, how to responsively generate smooth motions to take an object from a human is still an open question.
In this work, we present DiSECt: the first differentiable simulator for cutting soft materials.
Accurate object rearrangement from vision is a crucial problem for a wide variety of real-world robotics applications in unstructured environments.
Deformable object manipulation remains a challenging task in robotics research.
To address this challenge, we propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space.
no code implementations • 28 Oct 2021 • Nicholas Roy, Ingmar Posner, Tim Barfoot, Philippe Beaudoin, Yoshua Bengio, Jeannette Bohg, Oliver Brock, Isabelle Depatie, Dieter Fox, Dan Koditschek, Tomas Lozano-Perez, Vikash Mansinghka, Christopher Pal, Blake Richards, Dorsa Sadigh, Stefan Schaal, Gaurav Sukhatme, Denis Therien, Marc Toussaint, Michiel Van de Panne
Machine learning has long since become a keystone technology, accelerating science and applications in a broad range of domains.
Geometric organization of objects into semantically meaningful arrangements pervades the built world.
Especially for continuous control, solving this differential equation and its extension the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a give task.
In this paper, we consider the problem of continuous control for various robot manipulation tasks with an explicit representation that promotes skill reuse while learning multiple tasks, related through the reward function.
We even learn one multi-task policy for 10 simulated and 9 real-world tasks that is better or comparable to single-task policies.
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial.
We further demonstrate the ability of our planner to generate and execute diverse manipulation plans through a set of real-world experiments with a variety of objects.
Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents.
This paper outlines BayesSimIG: a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym.
Specifically, given a reconstructed 3D scene in the form of point clouds with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates.
We postulate that a network architecture that encodes relations between objects at a high-level can be beneficial.
We propose NeRP (Neural Rearrangement Planning), a deep learning based approach for multi-step neural object rearrangement planning which works with never-before-seen objects, that is trained on simulation data, and generalizes to the real world.
The adversarial perturbations encourage a optimal policy that is robust to changes in the dynamics.
We introduce DexYCB, a new dataset for capturing hand grasping of objects.
Key to our approach is a local implicit neural representation built on ray-voxel pairs that allows our method to generalize to unseen objects and achieve fast inference speed.
Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts.
The policy structure provides the user an interface to 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned.
no code implementations • 7 Dec 2020 • Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White
This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and System" conference.
The learned model outperforms both traditional pipelines and learned ablations by 9. 8% in accuracy on a dataset of simulated collision queries and is 75x faster than the best-performing baseline.
We demonstrate the generalizability, usability, and robustness of our approach on a novel benchmark set of 26 diverse household objects, a user study with naive users (N=6) handing over a subset of 15 objects, and a systematic evaluation examining different ways of handing objects.
Zero-shot execution of unseen robotic tasks is important to allowing robots to perform a wide variety of tasks in human environments, but collecting the amounts of data necessary to train end-to-end policies in the real-world is often infeasible.
By casting MPC as a Bayesian inference problem, we employ variational methods for posterior computation, naturally encoding the complexity and multi-modality of the decision making problem.
Our key insight is that the geometric structure of the intersection and the incentive of agents to move efficiently and avoid collisions (rationality) reduces the space of likely behaviors, effectively relaxing the problem of trajectory prediction.
We demonstrate that our learned policy can be integrated into a tabletop 6D grasping system and a human-robot handover system to improve the grasping performance of unseen objects.
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
In this work, we propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
We also show that our method can segment unseen objects for robot grasping.
We assume access to different configurations and environmental conditions, i. e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions.
The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphism, is equivalent to a simple, hand-specified dynamical system.
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.
In this paper, we propose an approach for human-to-robot handovers in which the robot meets the human halfway, by classifying the human's grasp of the object and quickly planning a trajectory accordingly to take the object from the human's hand according to their intent.
On the other hand, symbolic planning methods such as STRIPS have long been able to solve new problems given only a domain definition and a symbolic goal, but these approaches often struggle on the real world robotic tasks due to the challenges of grounding these symbols from sensor data in a partially-observable world.
Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately.
With the increasing speed and quality of physics simulations, generating large-scale grasping data sets that feed learning algorithms is becoming more and more popular.
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.
We evaluate the performance of our method for unseen object pose estimation on MOPED as well as the ModelNet and LINEMOD datasets.
We show experimental results for three different camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is better than that of classic off-line hand-eye calibration using multiple frames.
We propose a Grasping Objects Approach for Tactile (GOAT) robotic hands, learning to overcome the reality gap problem.
We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.
For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task.
To solve multi-step manipulation tasks in the real world, an autonomous robot must take actions to observe its environment and react to unexpected observations.
In this work, we bridge the gap between recent pose estimation and tracking work to develop a powerful method for robots to track objects in their surroundings.
Widespread adoption of self-driving cars will depend not only on their safety but largely on their ability to interact with human users.
Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks.
RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment.
In this way, our system is able to continuously collect data and improve its pose estimation modules.
While recent work has shown direct estimation techniques can be quite powerful, geometric tracking methods using point clouds can provide a very high level of 3D accuracy which is necessary for many robotic applications.
We show that our method, trained on this dataset, can produce sharp and accurate masks, outperforming state-of-the-art methods on unseen object instance segmentation.
We introduce BayesSim, a framework for robotics simulations allowing a full Bayesian treatment for the parameters of the simulator.
In this work, we formulate the 6D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3D rotation and the 3D translation of an object are decoupled.
Using a dataset of contact demonstrations from humans grasping diverse household objects, we synthesize functional grasps for three hand models and two functional intents.
In this work we propose Hierarchical Planning and Reinforcement Learning (HIP-RL), a method for merging the benefits and capabilities of Symbolic Planning with the learning abilities of Deep Reinforcement Learning.
Building perceptual systems for robotics which perform well under tight computational budgets requires novel architectures which rethink the traditional computer vision pipeline.
We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs).
Robotics Systems and Control
In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world.
Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks.
Using synthetic data generated in this manner, we introduce a one-shot deep neural network that is able to perform competitively against a state-of-the-art network trained on a combination of real and synthetic data.
no code implementations • 18 Apr 2018 • Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, Peter Corke
In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning.
Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality.
Ranked #1 on 6D Pose Estimation using RGB on YCB-Video
Our experiments show that our proposed model outperforms popular single controller based methods on IQUAD V1.
The last several years have seen significant progress in using depth cameras for tracking articulated objects such as human bodies, hands, and robotic manipulators.
Understanding procedural language requires anticipating the causal effects of actions, even when they are not explicitly stated.
We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input.
Ranked #3 on 6D Pose Estimation using RGBD on YCB-Video
A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.
Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time.
Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.
Recent advances in AI and robotics have claimed many incredible results with deep learning, yet no work to date has applied deep learning to the problem of liquid perception and reasoning.
We present the first dense SLAM system capable of reconstructing non-rigidly deforming scenes in real-time, by fusing together RGBD scans captured from commodity sensors.
Autonomous learning has been a promising direction in control and robotics for more than a decade since data-driven learning allows to reduce the amount of engineering knowledge, which is otherwise required.
Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics.
Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel.
We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches.