Search Results for author: Mohit Shridhar

Found 11 papers, 6 papers with code

GenSim: Generating Robotic Simulation Tasks via Large Language Models

1 code implementation · 2 Oct 2023 · Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang

Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data.

Code Generation
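
The gist of the GenSim pipeline is prompting a language model for structured task definitions that a simulator can then instantiate. The sketch below illustrates that loop only; the prompt format, field names, and the call_llm() stub are assumptions rather than GenSim's actual interface.

```python
# Hedged sketch of GenSim-style task generation: ask a language model for a
# structured simulation task spec, then validate it before registering it with
# a simulator. Prompt, field names, and call_llm() are illustrative assumptions.
import json

PROMPT = """Propose a new tabletop manipulation task as JSON with keys:
"task_name", "assets" (list of object names), "goal" (one sentence)."""

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. an API or local model client).
    return json.dumps({
        "task_name": "stack-two-blocks",
        "assets": ["red_block", "blue_block"],
        "goal": "Place the red block on top of the blue block.",
    })

def generate_task_spec() -> dict:
    raw = call_llm(PROMPT)
    spec = json.loads(raw)                       # reject non-JSON output early
    assert {"task_name", "assets", "goal"} <= set(spec)
    return spec

if __name__ == "__main__":
    print(generate_task_spec())
```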

AR2-D2: Training a Robot Without a Robot

no code implementations · 23 Jun 2023 · Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot.

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

1 code implementation · 12 Sep 2022 · Mohit Shridhar, Lucas Manuelli, Dieter Fox

With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task.

Robot Manipulation
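
Perceiver-Actor's headline result is a single policy trained across many tasks from only a few demonstrations each. A minimal multi-task behaviour-cloning sketch of that setup follows; PerAct itself uses a Perceiver Transformer over voxelised RGB-D observations and language goals, which is not reproduced here, and all dimensions below are arbitrary.

```python
# Minimal multi-task behaviour-cloning sketch: one policy conditioned on a task
# embedding, trained on demonstrations pooled from several tasks. This shows
# the "one model, many tasks" setup only, not PerAct's architecture.
import torch
import torch.nn as nn

NUM_TASKS, OBS_DIM, ACT_DIM = 18, 64, 7      # illustrative sizes

class MultiTaskPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.task_emb = nn.Embedding(NUM_TASKS, 32)
        self.net = nn.Sequential(nn.Linear(OBS_DIM + 32, 128), nn.ReLU(),
                                 nn.Linear(128, ACT_DIM))

    def forward(self, obs, task_id):
        return self.net(torch.cat([obs, self.task_emb(task_id)], dim=-1))

policy = MultiTaskPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

for step in range(100):                      # dummy pooled demonstrations
    obs = torch.randn(32, OBS_DIM)
    task_id = torch.randint(0, NUM_TASKS, (32,))
    expert_action = torch.randn(32, ACT_DIM)
    loss = nn.functional.mse_loss(policy(obs, task_id), expert_action)
    opt.zero_grad(); loss.backward(); opt.step()
```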

CLIPort: What and Where Pathways for Robotic Manipulation

1 code implementation · 24 Sep 2021 · Mohit Shridhar, Lucas Manuelli, Dieter Fox

We even learn one multi-task policy for 10 simulated and 9 real-world tasks that is better or comparable to single-task policies.

Imitation Learning · Robotic Grasping
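
The "what and where pathways" in the title refer to fusing a semantic (vision-language) stream with a spatial (fully convolutional) stream to score manipulation locations. The sketch below is only a schematic of that two-stream idea with invented layer sizes and fusion-by-addition; it is not CLIPort's actual architecture.

```python
# Schematic of a two-stream "what/where" model: a semantic stream (assumed to
# take precomputed CLIP-like feature maps) is fused with a spatial stream over
# RGB to produce a per-pixel pick heatmap. Sizes and fusion are illustrative.
import torch
import torch.nn as nn

class TwoStreamPickModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = nn.Conv2d(3, 16, 3, padding=1)     # "where" pathway
        self.semantic = nn.Conv2d(512, 16, 1)             # "what" pathway
        self.head = nn.Conv2d(16, 1, 1)                   # per-pixel pick score

    def forward(self, rgb, sem_feat):
        # sem_feat is assumed to be a feature map upsampled to the image size
        fused = self.spatial(rgb) + self.semantic(sem_feat)
        return self.head(torch.relu(fused))               # (B, 1, H, W)

heatmap = TwoStreamPickModel()(torch.randn(1, 3, 64, 64),
                               torch.randn(1, 512, 64, 64))
```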

Language Grounding with 3D Objects

2 code implementations · 26 Jul 2021 · Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer

We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation.
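
A minimal version of the CLIP-based grounding baseline the abstract mentions: embed the language expression and each candidate object image with an off-the-shelf CLIP model and keep the highest-scoring candidate. The checkpoint name and the single-view setup are assumptions; the paper also studies 3D-aware, multi-view variants.

```python
# Hedged sketch: score candidate object images against a referring expression
# with an off-the-shelf CLIP model and return the best match.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ground(expression: str, candidate_images: list[Image.Image]) -> int:
    """Return the index of the candidate image that best matches the expression."""
    inputs = processor(text=[expression], images=candidate_images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_text     # shape (1, num_candidates)
    return int(logits.argmax(dim=-1))
```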

BUTLER: Building Understanding in TextWorld via Language for Embodied Reasoning

no code implementations · ICLR 2021 · Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Scene Understanding

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

1 code implementation · 8 Oct 2020 · Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Natural Language Visual Grounding · Scene Understanding
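
The claim shared by the BUTLER and ALFWorld entries above is that abstract text actions learned in TextWorld line up with concrete, visually grounded actions. The toy sketch below illustrates that correspondence only; the action names and expansion rules are invented for illustration and are not ALFWorld's API.

```python
# Illustrative text/embodied alignment: a high-level TextWorld-style action is
# expanded into a sequence of low-level embodied primitives. All names here
# are made up for the sketch.
ABSTRACT_TO_CONCRETE = {
    "take mug from countertop": ["navigate countertop", "pickup mug"],
    "put mug in microwave":     ["navigate microwave", "open microwave",
                                 "place mug", "close microwave"],
}

def expand(text_action: str) -> list[str]:
    """Map one abstract text action to visually grounded primitives."""
    return ABSTRACT_TO_CONCRETE.get(text_action, [text_action])

plan = ["take mug from countertop", "put mug in microwave"]
low_level = [step for action in plan for step in expand(action)]
print(low_level)
```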

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

7 code implementations · CVPR 2020 · Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding
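
ALFRED pairs a language goal and step-by-step instructions with an expert sequence of low-level actions over egocentric frames. The sketch below shows roughly what such an example involves; the field names are illustrative rather than ALFRED's actual trajectory JSON schema, and the success check is a toy stand-in for the benchmark's goal-condition metric.

```python
# Hedged sketch of an ALFRED-style example: goal + step instructions paired
# with an expert low-level action sequence. Field names are illustrative.
example = {
    "goal": "Put a clean sponge on the metal rack.",
    "step_instructions": [
        "Walk to the sink and pick up the sponge.",
        "Rinse the sponge under the faucet.",
        "Place the sponge on the rack to the right of the sink.",
    ],
    "expert_actions": ["MoveAhead", "PickupObject", "ToggleObjectOn",
                       "ToggleObjectOff", "PutObject"],
}

def success(predicted: list[str], expert: list[str]) -> bool:
    # Toy check only: the real metric is goal-condition success measured in
    # the simulator, not exact action-sequence matching.
    return predicted == expert

print(success(example["expert_actions"], example["expert_actions"]))
```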

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

no code implementations · 11 Jun 2018 · Mohit Shridhar, David Hsu

The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects.

Question Generation +1
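
The first stage described in the abstract generates a visual description per detected object, compares it with the input expression, and keeps the best-matching candidates. The sketch below mimics that flow with a stubbed captioner and word-overlap similarity standing in for the paper's learned networks.

```python
# Hedged sketch of a describe-then-compare grounding stage. describe() is a
# stand-in for a learned captioning network; word overlap stands in for a
# learned comparison between descriptions and the referring expression.
def describe(obj_id: int) -> str:
    # Stand-in captions; in the real system these come from a neural network.
    return {0: "red cup on the table", 1: "blue cup near the edge",
            2: "green bottle"}[obj_id]

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def candidate_objects(expression: str, object_ids: list[int], top_k: int = 2):
    scored = sorted(object_ids, key=lambda i: overlap(describe(i), expression),
                    reverse=True)
    return scored[:top_k]   # ambiguity among these would trigger a question

print(candidate_objects("the red cup", [0, 1, 2]))
```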

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

no code implementations · 18 Jul 2017 · Mohit Shridhar, David Hsu

A core issue for the system is semantic and spatial grounding, which is to infer objects and their spatial relationships from images and natural language expressions.

Object
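
A toy version of the spatial half of the grounding problem the abstract describes: decide a coarse relation between two detected objects from their 2D bounding boxes. The relation vocabulary and the centre-comparison rule are simplifications chosen for illustration, not the paper's method.

```python
# Hedged sketch of coarse spatial-relation grounding from 2D bounding boxes.
def centre(box):                      # box = (x_min, y_min, x_max, y_max)
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def spatial_relation(a, b) -> str:
    """Relation of box a with respect to box b in image coordinates."""
    (ax, ay), (bx, by) = centre(a), centre(b)
    if abs(ax - bx) > abs(ay - by):
        return "left of" if ax < bx else "right of"
    return "above" if ay < by else "below"

# "the cup to the left of the bottle": check the relation for each candidate
print(spatial_relation((10, 40, 30, 60), (50, 40, 70, 60)))   # -> "left of"
```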
