Search Results for author: Roozbeh Mottaghi

Found 67 papers, 33 papers with code

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations • 23 Jun 2020 Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Object
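The episode structure described above (random spawn, navigate to any instance of a goal category) can be sketched with a toy loop. The gridworld, object layout, and greedy policy below are illustrative stand-ins, not the ObjectNav evaluation code.

```python
import random

# Toy stand-in for an ObjectNav episode: the agent spawns at a random
# cell and succeeds by reaching any instance of the goal category.
GRID = 5
OBJECTS = {"chair": [(0, 4), (4, 4)], "table": [(2, 2)]}

def run_episode(policy, goal="chair", max_steps=50, seed=0):
    rng = random.Random(seed)
    pos = (rng.randrange(GRID), rng.randrange(GRID))  # random initial pose
    for step in range(max_steps):
        if pos in OBJECTS[goal]:  # success: reached an instance of the goal
            return True, step
        dx, dy = policy(pos, goal)
        pos = (min(GRID - 1, max(0, pos[0] + dx)),
               min(GRID - 1, max(0, pos[1] + dy)))
    return False, max_steps

def greedy_policy(pos, goal):
    # Step toward the nearest goal instance (an oracle shortest-path stand-in).
    tx, ty = min(OBJECTS[goal],
                 key=lambda p: abs(p[0] - pos[0]) + abs(p[1] - pos[1]))
    return ((tx > pos[0]) - (tx < pos[0]), (ty > pos[1]) - (ty < pos[1]))

success, steps = run_episode(greedy_policy)
```

A real agent sees only egocentric observations rather than the map; the oracle policy here just makes the episode loop concrete.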

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

1 code implementation CVPR 2020 Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems.

CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents

2 code implementations • 19 Oct 2021 Sam Powers, Eliot Xing, Eric Kolve, Roozbeh Mottaghi, Abhinav Gupta

In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package.

NetHack reinforcement-learning +1

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

7 code implementations CVPR 2020 Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding

AllenAct: A Framework for Embodied AI Research

1 code implementation • 28 Aug 2020 Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.

Embodied Question Answering Instruction Following +1

Visual Room Rearrangement

2 code implementations CVPR 2021 Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi

We particularly focus on the task of Room Rearrangement: an agent begins by exploring a room and recording objects' initial configurations.

Navigate
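The two-phase structure described above (record initial configurations, then restore them after a shuffle) suggests a simple success metric: the fraction of displaced objects returned to their recorded pose. The poses and tolerance below are invented for illustration and do not reproduce the benchmark's actual metrics.

```python
def restored_fraction(recorded, shuffled, final, tol=0.05):
    """Fraction of objects the shuffle displaced that ended back within
    `tol` of their recorded pose (2D positions for simplicity)."""
    moved = [o for o in recorded if recorded[o] != shuffled[o]]
    if not moved:
        return 1.0
    ok = sum(1 for o in moved
             if abs(final[o][0] - recorded[o][0]) <= tol
             and abs(final[o][1] - recorded[o][1]) <= tol)
    return ok / len(moved)

# Fabricated example: the shuffle moved the mug and the lamp; the agent
# restored the mug (within tolerance) but left the lamp where it was.
recorded = {"mug": (1.0, 2.0), "book": (0.5, 0.5), "lamp": (3.0, 1.0)}
shuffled = {"mug": (2.0, 2.0), "book": (0.5, 0.5), "lamp": (1.0, 1.0)}
final    = {"mug": (1.02, 2.0), "book": (0.5, 0.5), "lamp": (1.0, 1.0)}
score = restored_fraction(recorded, shuffled, final)
```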

Simple but Effective: CLIP Embeddings for Embodied AI

2 code implementations CVPR 2022 Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation.

Image Manipulation Navigate

ManipulaTHOR: A Framework for Visual Object Manipulation

1 code implementation CVPR 2021 Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is an established research domain within the robotics community and poses several challenges including manipulator motion, grasping and long-horizon planning, particularly when dealing with oft-overlooked practical setups involving visually rich and complex scenes, manipulation using mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects.

Object

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

1 code implementation CVPR 2019 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.

object-detection Object Detection +3

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

1 code implementation CVPR 2023 Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander

Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects: navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location.

Reinforcement Learning (RL)
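The navigate/pick/navigate/place sequence described above can be sketched as a chain of skills over a toy world model. The skill functions and world representation here are hypothetical, not Galactic's API.

```python
# Hypothetical skill-chaining sketch of the rearrangement episode described
# above; the "world" is just a dict mapping entities to locations.
def rearrange(world, obj, target):
    log = []
    def navigate(dest):
        world["agent"] = dest
        log.append(f"navigate:{dest}")
    def pick(o):
        # Precondition: the agent must be at the object's location.
        assert world["agent"] == world[o], "must be at the object to pick it"
        world["held"] = o
        log.append(f"pick:{o}")
    def place(dest):
        world[world.pop("held")] = dest
        log.append(f"place:{dest}")
    # The fixed skill sequence: navigate -> pick -> navigate -> place.
    navigate(world[obj]); pick(obj); navigate(target); place(target)
    return log

world = {"agent": "hall", "mug": "kitchen"}
trace = rearrange(world, "mug", "shelf")
```

In the paper's setting this loop runs end-to-end under RL at very high throughput; the sketch only fixes the skill ordering the abstract names.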

SeGAN: Segmenting and Generating the Invisible

1 code implementation CVPR 2018 Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation.

Depth Estimation Scene Understanding +1

Container: Context Aggregation Network

4 code implementations • 2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification Inductive Bias +5

Container: Context Aggregation Networks

2 code implementations NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Inductive Bias Instance Segmentation +4

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

1 code implementation • 3 Jun 2022 Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi

In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image.

Question Answering Visual Question Answering +1

Interactron: Embodied Adaptive Object Detection

1 code implementation CVPR 2022 Klemen Kotar, Roozbeh Mottaghi

Our adaptive object detection model provides a 7.2 point improvement in AP (and 12.7 points in AP50) over DETR, a recent, high-performance object detector.

Object object-detection +1

What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

1 code implementation • 16 Oct 2020 Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

Factorizing Perception and Policy for Interactive Instruction Following

1 code implementation ICCV 2021 Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents.

Instruction Following Navigate

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?

Navigate

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation CVPR 2021 Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Navigate Visual Navigation

Multi-Modal Answer Validation for Knowledge-Based VQA

1 code implementation • 23 Mar 2021 Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Question Answering Retrieval +1
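The validation idea described above (extract knowledge per candidate, weight sources by trust, validate the candidate against them) can be reduced to a toy scoring rule. The source names, trust weights, and support scores below are fabricated for illustration and are not MAVEx's actual model.

```python
# Toy sketch of answer-candidate validation: each candidate answer is scored
# by its best trust-weighted support across knowledge sources.
def validate(candidates, support, trust):
    scores = {}
    for cand in candidates:
        # Weight each source's support for this candidate by how much we
        # trust that source, and keep the strongest supporting source.
        scores[cand] = max(trust[s] * support[s].get(cand, 0.0) for s in trust)
    return max(scores, key=scores.get), scores

candidates = ["hydrant", "mailbox"]
support = {
    "wikipedia": {"hydrant": 0.9},                   # fabricated support scores
    "concepts":  {"mailbox": 0.6, "hydrant": 0.2},
}
trust = {"wikipedia": 0.8, "concepts": 0.5}          # fabricated trust weights
best, scores = validate(candidates, support, trust)
```

The real system learns these trust and support signals jointly rather than taking them as given.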

RobustNav: Towards Benchmarking Robustness in Embodied Navigation

1 code implementation ICCV 2021 Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi

As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics).

Benchmarking Data Augmentation +1

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

1 code implementation • 18 Nov 2022 Kunal Pratap Singh, Luca Weihs, Alvaro Herrasti, Jonghyun Choi, Aniruddha Kembhavi, Roozbeh Mottaghi

Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing applications.

What do navigation agents learn about their environment?

1 code implementation CVPR 2022 Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, Roozbeh Mottaghi

We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.

Visual Navigation

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation CVPR 2020 Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations • 16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose AI2-THOR framework, which provides an environment with high-quality 3D scenes and physics engine.

3D Reconstruction Feature Engineering +3

Neural Priming for Sample-Efficient Adaptation

1 code implementation NeurIPS 2023 Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi

Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks.

Transfer Learning

Neural Radiance Field Codebooks

1 code implementation • 10 Jan 2023 Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi

Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks.

Object Representation Learning +1

See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content

no code implementations ICCV 2017 Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi

Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.

"What happens if..." Learning to Predict the Effect of Forces in Images

no code implementations • 17 Mar 2016 Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

To build a dataset of forces in scenes, we reconstructed all images in SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

no code implementations • 12 Nov 2015 Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.

Object

A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition

no code implementations CVPR 2015 Roozbeh Mottaghi, Yu Xiang, Silvio Savarese

Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters.

3D Pose Estimation Object +2

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

no code implementations • 16 Jun 2014 Roozbeh Mottaghi, Sanja Fidler, Alan Yuille, Raquel Urtasun, Devi Parikh

Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers.

Object object-detection +4

Bottom-Up Segmentation for Top-Down Detection

no code implementations CVPR 2013 Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP.

Clustering object-detection +3

Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

no code implementations CVPR 2013 Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning.

Image Segmentation object-detection +5

A Task-Oriented Approach for Cost-Sensitive Recognition

no code implementations CVPR 2016 Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi

With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications.

Scene Understanding

AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence

no code implementations • 6 Mar 2019 Marwan Mattar, Roozbeh Mottaghi, Julian Togelius, Danny Lange

This volume represents the accepted submissions from the AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence held on January 29, 2019 in Honolulu, Hawaii, USA.

Complexity of Representation and Inference in Compositional Models with Part Sharing

no code implementations • 16 Jan 2013 Alan L. Yuille, Roozbeh Mottaghi

This paper describes serial and parallel compositional models of multiple objects with part sharing.

Learning Generalizable Visual Representations via Interactive Gameplay

no code implementations • 17 Dec 2019 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization.

Decision Making Representation Learning

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

no code implementations ECCV 2020 Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi

In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.

Visual Commonsense Reasoning

Learning About Objects by Learning to Interact with Them

no code implementations NeurIPS 2020 Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi

Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks.

Learning Flexible Visual Representations via Interactive Gameplay

no code implementations ICLR 2021 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making and socialization.

Decision Making Representation Learning

Learning Visual Representation from Human Interactions

no code implementations ICLR 2021 Kiana Ehsani, Daniel Gordon, Thomas Hai Dang Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

Hierarchical Modular Framework for Long Horizon Instruction Following

no code implementations • 29 Sep 2021 Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

To address such composite tasks, we propose a hierarchical modular approach to learn agents that navigate and manipulate objects in a divide-and-conquer manner for the diverse nature of the entailing tasks.

Instruction Following Navigate

ASC me to Do Anything: Multi-task Training for Embodied AI

no code implementations • 14 Feb 2022 Jiasen Lu, Jordi Salvador, Roozbeh Mottaghi, Aniruddha Kembhavi

We propose Atomic Skill Completion (ASC), an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.
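The composition idea described above (a shared set of atomic skills composed into tasks) can be made concrete with a toy sketch. The skill and task names below are invented; this is not the ASC implementation.

```python
# Hypothetical sketch of composing shared atomic skills into tasks. Each
# skill is a small state transformer; tasks are sequences of (skill, arg).
SKILLS = {
    "goto": lambda state, arg: {**state, "at": arg},
    "grab": lambda state, arg: {**state, "holding": arg},
    "open": lambda state, arg: {**state, "opened": arg},
}

TASKS = {  # the same atomic skills are shared across different tasks
    "fetch_mug":   [("goto", "kitchen"), ("grab", "mug")],
    "open_fridge": [("goto", "kitchen"), ("open", "fridge")],
}

def run_task(task, state=None):
    state = state or {}
    for skill, arg in TASKS[task]:
        state = SKILLS[skill](state, arg)
    return state

state = run_task("fetch_mug")
```

The point the abstract makes is reuse: adding a new task means writing a new composition, not a new set of skills.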

Object Manipulation via Visual Target Localization

no code implementations • 15 Mar 2022 Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them.

Object object-detection +1

Continuous Scene Representations for Embodied AI

no code implementations CVPR 2022 Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

no code implementations • 17 Jun 2022 Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing.

Depth Estimation Image Generation +12

ENTL: Embodied Navigation Trajectory Learner

no code implementations ICCV 2023 Klemen Kotar, Aaron Walsman, Roozbeh Mottaghi

ENTL's generic architecture enables sharing of the spatio-temporal sequence encoder for multiple challenging embodied tasks.

Imitation Learning

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

no code implementations • 24 Apr 2023 Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.

Visual Navigation
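The assumption the abstract questions (a fixed, stable displacement per action) can be contrasted with estimating each action's actual impact from observed transitions. The displacement data below is fabricated to illustrate the idea; it is not the paper's method.

```python
import statistics

# Toy illustration: rather than assuming "move ahead" always advances the
# agent a fixed distance, estimate each action's impact from the forward
# displacements actually observed after executing it (values are made up).
transitions = {
    "move_ahead":       [0.24, 0.26, 0.25, 0.25],  # nominal dynamics (m)
    "move_ahead_muddy": [0.11, 0.09, 0.10],        # same command, new dynamics
}
impact = {action: statistics.mean(d) for action, d in transitions.items()}
```

The same command yields a very different estimated impact under changed dynamics, which is exactly why a semantics-only action representation breaks down.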

Controllable Human-Object Interaction Synthesis

no code implementations • 6 Dec 2023 Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu

Naively applying a diffusion model fails to predict object motion aligned with the input waypoints and cannot ensure the realism of interactions that require precise hand-object contact and appropriate contact grounded by the floor.

Human-Object Interaction Detection Object

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

no code implementations • 9 Apr 2024 Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.

Navigate Visual Navigation
