TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK	REMOVE
Zero-shot Generalization	CALVIN	3D Diffuser Actor	Avg. sequence length	3.27	# 1
Robot Manipulation	RLBench	3D Diffuser Actor	Succ. Rate (18 tasks, 100 demo/task)	81.3	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-diffuser-actor-policy-diffusion-with-3d/zero-shot-generalization-on-calvin)](https://paperswithcode.com/sota/zero-shot-generalization-on-calvin?p=3d-diffuser-actor-policy-diffusion-with-3d)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-diffuser-actor-policy-diffusion-with-3d/robot-manipulation-on-rlbench)](https://paperswithcode.com/sota/robot-manipulation-on-rlbench?p=3d-diffuser-actor-policy-diffusion-with-3d)`

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

18 Feb 2024 · Tsung-Wei Ke*, Nikolaos Gkanatsios*, and Katerina Fragkiadaki ·

We marry diffusion policies and 3D scene representations for robot manipulation. Diffusion policies learn the action distribution conditioned on the robot and environment state using conditional diffusion models. They have recently shown to outperform both deterministic and alternative state-conditioned action distribution learning methods. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy architecture that, given a language instruction, builds a 3D representation of the visual scene and conditions on it to iteratively denoise 3D rotations and translations for the robot's end-effector. At each denoising iteration, our model represents end-effector pose estimates as 3D scene tokens and predicts the 3D translation and rotation error for each of them, by featurizing them using 3D relative attention to other 3D visual and language tokens. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 16.3% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it outperforms the current SOTA in the setting of zero-shot unseen scene generalization by being able to successfully run 0.2 more tasks, a 7% relative increase. It also works in the real world from a handful of demonstrations. We ablate our model's architectural design choices, such as 3D scene featurization and 3D relative attentions, and show they all help generalization. Our results suggest that 3D scene representations and powerful generative modeling are keys to efficient robot learning from demonstrations.

PDF Abstract

Code

Add Remove Mark official

nickgkan/3d_diffuser_actor

115

Tasks

Add Remove

Denoising

Robot Manipulation

Zero-shot Generalization

Datasets

RLBench

CALVIN

Results from the Paper

Add Remove

Ranked #1 on Zero-shot Generalization on CALVIN

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Zero-shot Generalization	CALVIN	3D Diffuser Actor	Avg. sequence length	3.27	# 1	Compare
Robot Manipulation	RLBench	3D Diffuser Actor	Succ. Rate (18 tasks, 100 demo/task)	81.3	# 1	Compare

Methods

Add Remove

Diffusion

Edit Social Preview

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove