Search Results for author: Roozbeh Mottaghi

Found 67 papers, 33 papers with code

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations • 23 Jun 2020 Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Object
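The episode structure described above (random spawn, navigate to any instance of a goal category) can be sketched with a toy loop. The gridworld, object layout, and greedy policy below are illustrative stand-ins, not the ObjectNav evaluation code.

```python
import random

# Toy stand-in for an ObjectNav episode: the agent spawns at a random
# cell and succeeds by reaching any instance of the goal category.
GRID = 5
OBJECTS = {"chair": [(0, 4), (4, 4)], "table": [(2, 2)]}

def run_episode(policy, goal="chair", max_steps=50, seed=0):
    rng = random.Random(seed)
    pos = (rng.randrange(GRID), rng.randrange(GRID))  # random initial pose
    for step in range(max_steps):
        if pos in OBJECTS[goal]:  # success: reached an instance of the goal
            return True, step
        dx, dy = policy(pos, goal)
        pos = (min(GRID - 1, max(0, pos[0] + dx)),
               min(GRID - 1, max(0, pos[1] + dy)))
    return False, max_steps

def greedy_policy(pos, goal):
    # Step toward the nearest goal instance (an oracle shortest-path stand-in).
    tx, ty = min(OBJECTS[goal],
                 key=lambda p: abs(p[0] - pos[0]) + abs(p[1] - pos[1]))
    return ((tx > pos[0]) - (tx < pos[0]), (ty > pos[1]) - (ty < pos[1]))

success, steps = run_episode(greedy_policy)
```

A real agent sees only egocentric observations rather than the map; the oracle policy here just makes the episode loop concrete.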

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

1 code implementation CVPR 2020 Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems.

CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents

2 code implementations • 19 Oct 2021 Sam Powers, Eliot Xing, Eric Kolve, Roozbeh Mottaghi, Abhinav Gupta

In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package.

NetHack reinforcement-learning +1

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

7 code implementations CVPR 2020 Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding

AllenAct: A Framework for Embodied AI Research

1 code implementation • 28 Aug 2020 Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.

Embodied Question Answering Instruction Following +1

Visual Room Rearrangement

2 code implementations CVPR 2021 Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi

We particularly focus on the task of Room Rearrangement: an agent begins by exploring a room and recording objects' initial configurations.

Navigate
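The two-phase structure described above (record initial configurations, then restore them after a shuffle) suggests a simple success metric: the fraction of displaced objects returned to their recorded pose. The poses and tolerance below are invented for illustration and do not reproduce the benchmark's actual metrics.

```python
def restored_fraction(recorded, shuffled, final, tol=0.05):
    """Fraction of objects the shuffle displaced that ended back within
    `tol` of their recorded pose (2D positions for simplicity)."""
    moved = [o for o in recorded if recorded[o] != shuffled[o]]
    if not moved:
        return 1.0
    ok = sum(1 for o in moved
             if abs(final[o][0] - recorded[o][0]) <= tol
             and abs(final[o][1] - recorded[o][1]) <= tol)
    return ok / len(moved)

# Fabricated example: the shuffle moved the mug and the lamp; the agent
# restored the mug (within tolerance) but left the lamp where it was.
recorded = {"mug": (1.0, 2.0), "book": (0.5, 0.5), "lamp": (3.0, 1.0)}
shuffled = {"mug": (2.0, 2.0), "book": (0.5, 0.5), "lamp": (1.0, 1.0)}
final    = {"mug": (1.02, 2.0), "book": (0.5, 0.5), "lamp": (1.0, 1.0)}
score = restored_fraction(recorded, shuffled, final)
```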

Simple but Effective: CLIP Embeddings for Embodied AI

2 code implementations CVPR 2022 Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation.

Image Manipulation Navigate

ManipulaTHOR: A Framework for Visual Object Manipulation

1 code implementation CVPR 2021 Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is an established research domain within the robotics community and poses several challenges including manipulator motion, grasping and long-horizon planning, particularly when dealing with oft-overlooked practical setups involving visually rich and complex scenes, manipulation using mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects.

Object

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

1 code implementation CVPR 2019 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.

object-detection Object Detection +3

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

1 code implementation CVPR 2023 Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander

Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects: navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location.

Reinforcement Learning (RL)
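The navigate/pick/navigate/place sequence described above can be sketched as a chain of skills over a toy world model. The skill functions and world representation here are hypothetical, not Galactic's API.

```python
# Hypothetical skill-chaining sketch of the rearrangement episode described
# above; the "world" is just a dict mapping entities to locations.
def rearrange(world, obj, target):
    log = []
    def navigate(dest):
        world["agent"] = dest
        log.append(f"navigate:{dest}")
    def pick(o):
        # Precondition: the agent must be at the object's location.
        assert world["agent"] == world[o], "must be at the object to pick it"
        world["held"] = o
        log.append(f"pick:{o}")
    def place(dest):
        world[world.pop("held")] = dest
        log.append(f"place:{dest}")
    # The fixed skill sequence: navigate -> pick -> navigate -> place.
    navigate(world[obj]); pick(obj); navigate(target); place(target)
    return log

world = {"agent": "hall", "mug": "kitchen"}
trace = rearrange(world, "mug", "shelf")
```

In the paper's setting this loop runs end-to-end under RL at very high throughput; the sketch only fixes the skill ordering the abstract names.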

SeGAN: Segmenting and Generating the Invisible

1 code implementation CVPR 2018 Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation.

Depth Estimation Scene Understanding +1

Container: Context Aggregation Network

4 code implementations • 2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification Inductive Bias +5

Container: Context Aggregation Networks

2 code implementations NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Inductive Bias Instance Segmentation +4

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

1 code implementation • 3 Jun 2022 Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi

In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image.

Question Answering Visual Question Answering +1

Interactron: Embodied Adaptive Object Detection

1 code implementation CVPR 2022 Klemen Kotar, Roozbeh Mottaghi

Our adaptive object detection model provides a 7.2 point improvement in AP (and 12.7 points in AP50) over DETR, a recent, high-performance object detector.

Object object-detection +1

What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

1 code implementation • 16 Oct 2020 Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

Factorizing Perception and Policy for Interactive Instruction Following

1 code implementation ICCV 2021 Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents.

Instruction Following Navigate

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?

Navigate

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation CVPR 2021 Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Navigate Visual Navigation

Multi-Modal Answer Validation for Knowledge-Based VQA

1 code implementation • 23 Mar 2021 Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Question Answering Retrieval +1
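The validation idea described above (extract knowledge per candidate, weight sources by trust, validate the candidate against them) can be reduced to a toy scoring rule. The source names, trust weights, and support scores below are fabricated for illustration and are not MAVEx's actual model.

```python
# Toy sketch of answer-candidate validation: each candidate answer is scored
# by its best trust-weighted support across knowledge sources.
def validate(candidates, support, trust):
    scores = {}
    for cand in candidates:
        # Weight each source's support for this candidate by how much we
        # trust that source, and keep the strongest supporting source.
        scores[cand] = max(trust[s] * support[s].get(cand, 0.0) for s in trust)
    return max(scores, key=scores.get), scores

candidates = ["hydrant", "mailbox"]
support = {
    "wikipedia": {"hydrant": 0.9},                   # fabricated support scores
    "concepts":  {"mailbox": 0.6, "hydrant": 0.2},
}
trust = {"wikipedia": 0.8, "concepts": 0.5}          # fabricated trust weights
best, scores = validate(candidates, support, trust)
```

The real system learns these trust and support signals jointly rather than taking them as given.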

RobustNav: Towards Benchmarking Robustness in Embodied Navigation

1 code implementation ICCV 2021 Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi

As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics).

Benchmarking Data Augmentation +1

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

1 code implementation • 18 Nov 2022 Kunal Pratap Singh, Luca Weihs, Alvaro Herrasti, Jonghyun Choi, Aniruddha Kembhavi, Roozbeh Mottaghi

Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing applications.

What do navigation agents learn about their environment?

1 code implementation CVPR 2022 Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, Roozbeh Mottaghi

We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.

Visual Navigation

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation CVPR 2020 Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations • 16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose AI2-THOR framework, which provides an environment with high-quality 3D scenes and physics engine.

3D Reconstruction Feature Engineering +3

Neural Priming for Sample-Efficient Adaptation

1 code implementation NeurIPS 2023 Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi

Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks.

Transfer Learning

Neural Radiance Field Codebooks

1 code implementation • 10 Jan 2023 Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi

Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks.

Object Representation Learning +1

See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content

no code implementations ICCV 2017 Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi

Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.

"What happens if..." Learning to Predict the Effect of Forces in Images

no code implementations • 17 Mar 2016 Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

To build a dataset of forces in scenes, we reconstructed all images in SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

no code implementations • 12 Nov 2015 Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.

Object

A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition

no code implementations CVPR 2015 Roozbeh Mottaghi, Yu Xiang, Silvio Savarese

Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters.

3D Pose Estimation Object +2

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

no code implementations • 16 Jun 2014 Roozbeh Mottaghi, Sanja Fidler, Alan Yuille, Raquel Urtasun, Devi Parikh

Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers.

Object object-detection +4

Bottom-Up Segmentation for Top-Down Detection

no code implementations CVPR 2013 Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP.

Clustering object-detection +3

Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

no code implementations CVPR 2013 Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning.

Image Segmentation object-detection +5

A Task-Oriented Approach for Cost-Sensitive Recognition

no code implementations CVPR 2016 Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi

With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications.

Scene Understanding

AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence

no code implementations • 6 Mar 2019 Marwan Mattar, Roozbeh Mottaghi, Julian Togelius, Danny Lange

This volume represents the accepted submissions from the AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence held on January 29, 2019 in Honolulu, Hawaii, USA.

Complexity of Representation and Inference in Compositional Models with Part Sharing

no code implementations • 16 Jan 2013 Alan L. Yuille, Roozbeh Mottaghi

This paper describes serial and parallel compositional models of multiple objects with part sharing.

Learning Generalizable Visual Representations via Interactive Gameplay

no code implementations • 17 Dec 2019 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization.

Decision Making Representation Learning

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

no code implementations ECCV 2020 Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi

In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.

Visual Commonsense Reasoning

Learning About Objects by Learning to Interact with Them

no code implementations NeurIPS 2020 Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi

Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks.

Learning Flexible Visual Representations via Interactive Gameplay

no code implementations ICLR 2021 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making and socialization.

Decision Making Representation Learning

Learning Visual Representation from Human Interactions

no code implementations ICLR 2021 Kiana Ehsani, Daniel Gordon, Thomas Hai Dang Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

Hierarchical Modular Framework for Long Horizon Instruction Following

no code implementations • 29 Sep 2021 Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

To address such composite tasks, we propose a hierarchical modular approach to learn agents that navigate and manipulate objects in a divide-and-conquer manner for the diverse nature of the entailing tasks.

Instruction Following Navigate

ASC me to Do Anything: Multi-task Training for Embodied AI

no code implementations • 14 Feb 2022 Jiasen Lu, Jordi Salvador, Roozbeh Mottaghi, Aniruddha Kembhavi

We propose Atomic Skill Completion (ASC), an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.
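The composition idea described above (a shared set of atomic skills composed into tasks) can be made concrete with a toy sketch. The skill and task names below are invented; this is not the ASC implementation.

```python
# Hypothetical sketch of composing shared atomic skills into tasks. Each
# skill is a small state transformer; tasks are sequences of (skill, arg).
SKILLS = {
    "goto": lambda state, arg: {**state, "at": arg},
    "grab": lambda state, arg: {**state, "holding": arg},
    "open": lambda state, arg: {**state, "opened": arg},
}

TASKS = {  # the same atomic skills are shared across different tasks
    "fetch_mug":   [("goto", "kitchen"), ("grab", "mug")],
    "open_fridge": [("goto", "kitchen"), ("open", "fridge")],
}

def run_task(task, state=None):
    state = state or {}
    for skill, arg in TASKS[task]:
        state = SKILLS[skill](state, arg)
    return state

state = run_task("fetch_mug")
```

The point the abstract makes is reuse: adding a new task means writing a new composition, not a new set of skills.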

Object Manipulation via Visual Target Localization

no code implementations • 15 Mar 2022 Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them.

Object object-detection +1

Continuous Scene Representations for Embodied AI

no code implementations CVPR 2022 Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

no code implementations • 17 Jun 2022 Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing.

Depth Estimation Image Generation +12

ENTL: Embodied Navigation Trajectory Learner

no code implementations ICCV 2023 Klemen Kotar, Aaron Walsman, Roozbeh Mottaghi

ENTL's generic architecture enables sharing of the spatio-temporal sequence encoder for multiple challenging embodied tasks.

Imitation Learning

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

no code implementations • 24 Apr 2023 Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.

Visual Navigation
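The assumption the abstract questions (a fixed, stable displacement per action) can be contrasted with estimating each action's actual impact from observed transitions. The displacement data below is fabricated to illustrate the idea; it is not the paper's method.

```python
import statistics

# Toy illustration: rather than assuming "move ahead" always advances the
# agent a fixed distance, estimate each action's impact from the forward
# displacements actually observed after executing it (values are made up).
transitions = {
    "move_ahead":       [0.24, 0.26, 0.25, 0.25],  # nominal dynamics (m)
    "move_ahead_muddy": [0.11, 0.09, 0.10],        # same command, new dynamics
}
impact = {action: statistics.mean(d) for action, d in transitions.items()}
```

The same command yields a very different estimated impact under changed dynamics, which is exactly why a semantics-only action representation breaks down.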

Controllable Human-Object Interaction Synthesis

no code implementations • 6 Dec 2023 Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu

Naively applying a diffusion model fails to predict object motion aligned with the input waypoints and cannot ensure the realism of interactions that require precise hand-object contact and appropriate contact grounded by the floor.

Human-Object Interaction Detection Object

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

no code implementations • 9 Apr 2024 Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.

Navigate Visual Navigation
