Robot Manipulation Generalization

18 papers with code • 2 benchmarks • 2 datasets

Robot manipulation generalization studies how robots acquire manipulation skills that transfer beyond their training conditions, for example to new objects, tasks, scenes, or language instructions. The most-implemented papers below span pretrained visual representations, promptable segmentation models, and multi-task manipulation policies.

Most implemented papers

Segment Anything

facebookresearch/segment-anything ICCV 2023

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
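
As a rough illustration of how SAM's promptable interface is used to mask a target object (a common building block in manipulation pipelines), here is a minimal Python sketch assuming the segment-anything package and a separately downloaded ViT-H checkpoint; the image path and point prompt are placeholders.

```python
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Load SAM with a ViT-H backbone; the checkpoint must be downloaded beforehand.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Embed the image once; subsequent prompts reuse this embedding.
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point and keep the highest-scoring mask.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # (x, y) placeholder coordinates
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,
)
best_mask = masks[scores.argmax()]
```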

SAM 2: Segment Anything in Images and Videos

facebookresearch/sam2 1 Aug 2024

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.
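
A minimal sketch of SAM 2's single-image predictor, assuming the sam2 package and its published Hugging Face checkpoint name; the prompt convention mirrors SAM's point/box interface.

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint identifier is an assumption based on the repo's released models.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("frame.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[320, 240]]),  # placeholder (x, y) prompt
        point_labels=np.array([1]),           # 1 = foreground
        multimask_output=True,
    )
best_mask = masks[scores.argmax()]
```

The repo also provides a video predictor that propagates masks across frames, which is the mode most relevant to tracking objects over a manipulation episode.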

Instruction-driven history-aware policies for robotic manipulations

vlc-robot/polarnet 11 Sep 2022

In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions.

Segment Anything for Videos: A Systematic Survey

liliu-avril/Awesome-Segment-Anything 31 Jul 2024

This work conducts a systematic review of SAM for videos in the era of foundation models.

Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts

aswahd/sam2radiology 10 Sep 2024

SAM and its variants often fail to segment structures in ultrasound (US) images due to domain shift.

Masked Visual Pre-training for Motor Control

ir413/mvp 11 Mar 2022

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
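
A minimal sketch of using the pre-trained MVP encoder as a frozen vision backbone, assuming the repo's mvp package; the model name is taken from its pre-trained catalog, and the forward-pass convention is an assumption.

```python
import torch
import mvp

# Model identifier is an assumption based on the repo's pre-trained models.
model = mvp.load("vitb-mae-egosoup")
model.freeze()  # the paper evaluates frozen encoders for motor control

# Assumed forward pass: a 224x224 RGB batch in, a feature embedding out.
image = torch.zeros(1, 3, 224, 224)
with torch.no_grad():
    features = model(image)
```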

R3M: A Universal Visual Representation for Robot Manipulation

facebookresearch/r3m 23 Mar 2022

We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks.
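
A minimal sketch of extracting a frozen R3M embedding for a downstream policy, assuming the load_r3m entry point from the repo; note that R3M expects 224x224 RGB inputs with pixel values in [0, 255].

```python
import torch
from r3m import load_r3m

device = "cuda" if torch.cuda.is_available() else "cpu"
r3m = load_r3m("resnet50")  # "resnet18" and "resnet34" are also available
r3m.eval()
r3m.to(device)

# Placeholder batch: R3M expects (B, 3, 224, 224) images with values in [0, 255].
image = torch.randint(0, 256, (1, 3, 224, 224), dtype=torch.float32, device=device)
with torch.no_grad():
    embedding = r3m(image)  # (1, 2048) for the ResNet-50 backbone
```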

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

peract/peract 12 Sep 2022

We train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task.

RVT: Robotic View Transformer for 3D Object Manipulation

NVlabs/RVT 26 Jun 2023

In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct).

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

vlc-robot/polarnet 27 Sep 2023

The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics.