Search Results for author: Mohit Shridhar

Found 9 papers, 5 papers with code

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

1 code implementation 12 Sep 2022 Mohit Shridhar, Lucas Manuelli, Dieter Fox

With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task.

CLIPort: What and Where Pathways for Robotic Manipulation

1 code implementation 24 Sep 2021 Mohit Shridhar, Lucas Manuelli, Dieter Fox

We even learn one multi-task policy for 10 simulated and 9 real-world tasks that is better or comparable to single-task policies.

Imitation Learning Robotic Grasping

Language Grounding with 3D Objects

2 code implementations 26 Jul 2021 Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer

We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation.

BUTLER: Building Understanding in TextWorld via Language for Embodied Reasoning

no code implementations ICLR 2021 Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Scene Understanding

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

1 code implementation 8 Oct 2020 Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Natural Language Visual Grounding Scene Understanding

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

5 code implementations CVPR 2020 Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

no code implementations 11 Jun 2018 Mohit Shridhar, David Hsu

The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects.

Question Generation

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

no code implementations 18 Jul 2017 Mohit Shridhar, David Hsu

A core issue for the system is semantic and spatial grounding, which is to infer objects and their spatial relationships from images and natural language expressions.
