1 code implementation • 24 Aug 2023 • Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine
By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods.
In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible.
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.
In the same way that the computer vision (CV) and natural language processing (NLP) communities have developed self-supervised methods, reinforcement learning (RL) can be cast as a self-supervised problem: learning to reach any goal, without requiring human-specified rewards or labels.
ATR selects suitable tasks, which consist of an initial environment state and manipulation goal, for learning robust skills by balancing the diversity and feasibility of the tasks.
The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields.
Our experimental results show that PTP can generate feasible sequences of subgoals that enable the policy to efficiently solve the target tasks.
To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks.
The experimental results in simulation and on the real robot have demonstrated that the use of implicit neural representations and joint learning of grasp affordance and 3D reconstruction have led to state-of-the-art grasping results.
Search engine has become a fundamental component in various web and mobile applications.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
Recently, there are a few methods have been proposed which focused on mining information across ranking candidates list for further improvements, such as learning multivariant scoring function or learning contextual embedding.
The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal.
Many robotic applications require the agent to perform long-horizon tasks in partially observable environments.
We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering.
Watching expert demonstrations is an important way for humans and robots to reason about affordances of unseen objects.
Ranked #2 on Video-to-image Affordance Grounding on OPRA (28x28)
The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory.
Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels.
We consider the problem of estimating the spatial layout of an indoor scene from a monocular RGB image, modeled as the projection of a 3D cuboid.