Search Results for author: Roozbeh Mottaghi

Found 51 papers, 26 papers with code

Continuous Scene Representations for Embodied AI

no code implementations31 Mar 2022 Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.
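
The on-the-fly graph construction described above can be illustrated with a small sketch (all class and function names here are hypothetical, not the authors' code): per-object features become nodes, pairwise relationship features become edges, and both are updated as new observations arrive.

```python
# Illustrative sketch of composing per-object features into a scene graph
# on the fly. All names are hypothetical; this is not the authors' code.
from itertools import combinations

import numpy as np


class ContinuousSceneGraph:
    def __init__(self):
        self.nodes = {}  # object_id -> feature vector
        self.edges = {}  # (id_a, id_b) -> relationship feature

    def update(self, detections):
        """detections: dict mapping object_id -> feature vector."""
        for obj_id, feat in detections.items():
            # A running average keeps node features stable across views.
            if obj_id in self.nodes:
                self.nodes[obj_id] = 0.5 * (self.nodes[obj_id] + feat)
            else:
                self.nodes[obj_id] = feat
        for a, b in combinations(sorted(detections), 2):
            # A simple assumed pairwise relationship feature: the
            # difference of the two object embeddings.
            self.edges[(a, b)] = self.nodes[a] - self.nodes[b]
```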

Object Manipulation via Visual Target Localization

no code implementations15 Mar 2022 Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them.

Object Detection

ASC me to Do Anything: Multi-task Training for Embodied AI

no code implementations14 Feb 2022 Jiasen Lu, Jordi Salvador, Roozbeh Mottaghi, Aniruddha Kembhavi

We propose Atomic Skill Completion (ASC), an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.
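
A toy sketch of the composition idea, under the simplifying assumption that skills are state-to-state functions (the names below are hypothetical, not the paper's API):

```python
# Hypothetical sketch of composing shared atomic skills into tasks.
from typing import Callable, Dict, List

Skill = Callable[[dict], dict]  # maps an environment state to a new state


def compose(skills: List[Skill]) -> Skill:
    """Chain atomic skills into a single task-level policy."""
    def task(state: dict) -> dict:
        for skill in skills:
            state = skill(state)
        return state
    return task


# Example: two tasks reuse the same atomic skills in different orders.
SKILLS: Dict[str, Skill] = {
    "goto": lambda s: {**s, "at_target": True},
    "pickup": lambda s: {**s, "holding": s.get("target")},
    "put": lambda s: {**s, "holding": None},
}
fetch = compose([SKILLS["goto"], SKILLS["pickup"]])
deliver = compose([SKILLS["goto"], SKILLS["pickup"], SKILLS["goto"], SKILLS["put"]])
```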

Interactron: Embodied Adaptive Object Detection

1 code implementation1 Feb 2022 Klemen Kotar, Roozbeh Mottaghi

Moreover, we show that our object detection model adapts to environments with completely different appearance characteristics, and its performance is on par with a model trained with full supervision for those environments.

Object Detection

Container: Context Aggregation Networks

1 code implementation NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Instance Segmentation, Object Detection, +2 more

Simple but Effective: CLIP Embeddings for Embodied AI

2 code implementations18 Nov 2021 Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation.

Image Manipulation
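
A minimal sketch of the general recipe using OpenAI's clip package; the frozen-backbone-plus-small-policy-head setup shown here is an assumption for illustration, not the paper's exact architecture.

```python
# Minimal sketch: a frozen CLIP visual encoder as an embodied agent's
# perception backbone. Requires the `clip` package
# (pip install git+https://github.com/openai/CLIP.git);
# the policy head is a hypothetical placeholder.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

policy_head = torch.nn.Linear(1024, 6).to(device)  # 6 discrete nav actions


def act(frame: Image.Image) -> int:
    with torch.no_grad():  # the CLIP backbone stays frozen
        feats = model.encode_image(preprocess(frame).unsqueeze(0).to(device))
    return policy_head(feats.float()).argmax(dim=-1).item()
```

Keeping the backbone frozen means only the small head needs training, which is part of what makes off-the-shelf CLIP features attractive for embodied tasks.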

CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents

2 code implementations19 Oct 2021 Sam Powers, Eliot Xing, Eric Kolve, Roozbeh Mottaghi, Abhinav Gupta

In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package.

NetHack, reinforcement-learning

Hierarchical Modular Framework for Long Horizon Instruction Following

no code implementations29 Sep 2021 Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

To address such composite tasks, we propose a hierarchical modular approach in which agents learn to navigate and manipulate objects in a divide-and-conquer manner suited to the diverse nature of the entailed tasks.

RobustNav: Towards Benchmarking Robustness in Embodied Navigation

1 code implementation ICCV 2021 Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi

As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics).

Data Augmentation, Visual Navigation
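
As a concrete illustration of a visual corruption in this spirit, the sketch below degrades the agent's RGB observation before the policy sees it; the specific corruption and severity scale are assumptions, not the benchmark's exact settings.

```python
# Illustrative visual corruption: additive Gaussian noise on the RGB frame.
import numpy as np


def gaussian_noise(rgb: np.ndarray, severity: int = 3) -> np.ndarray:
    """rgb: HxWx3 uint8 frame; returns a corrupted copy.

    The severity-to-sigma mapping below is an assumed example scale.
    """
    sigma = [8, 16, 24, 32, 48][severity - 1]
    noisy = rgb.astype(np.float32) + np.random.normal(0.0, sigma, rgb.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```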

Container: Context Aggregation Network

2 code implementations2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification, Instance Segmentation, +3 more

PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

no code implementations ACL 2021 Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language.

Language Modelling

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation CVPR 2021 Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Visual Navigation

ManipulaTHOR: A Framework for Visual Object Manipulation

1 code implementation CVPR 2021 Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is an established research domain within the robotics community, posing several challenges including manipulator motion, grasping, and long-horizon planning. These challenges become particularly acute in oft-overlooked practical setups involving visually rich and complex scenes, manipulation by mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects.

Visual Room Rearrangement

2 code implementations CVPR 2021 Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi

We particularly focus on the task of Room Rearrangement: an agent begins by exploring a room and recording objects' initial configurations.
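
A toy sketch of the bookkeeping this task implies: record each object's pose during the walkthrough, then check which objects ended up out of place. The pose format and tolerance below are assumed for illustration, not the benchmark's evaluation code.

```python
# Hypothetical rearrangement bookkeeping sketch.
import numpy as np


def misplaced(initial_poses, final_poses, tol=0.05):
    """Poses: dict object_id -> (x, y, z). Returns ids still out of place."""
    return [
        obj for obj, p0 in initial_poses.items()
        if np.linalg.norm(np.asarray(final_poses[obj]) - np.asarray(p0)) > tol
    ]
```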

Multi-Modal Answer Validation for Knowledge-Based VQA

1 code implementation23 Mar 2021 Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Question Answering, Visual Question Answering, +1 more
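
Schematically, the validation loop described above might look as follows; every function here is a hypothetical placeholder, not the paper's implementation.

```python
# Schematic of answer validation: generate answer candidates, retrieve
# candidate-specific evidence from several knowledge sources, and keep the
# candidate best supported by its most trusted source.
def mavex_style_answer(question, image, candidates, sources, support_score):
    best_answer, best_score = None, float("-inf")
    for answer in candidates:
        # Score how well each source's retrieved facts support this candidate.
        scores = [
            support_score(question, image, answer, src.retrieve(question, answer))
            for src in sources
        ]
        score = max(scores)  # trust the single most supportive source
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer
```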

Learning Visual Representation from Human Interactions

no code implementations ICLR 2021 Kiana Ehsani, Daniel Gordon, Thomas Hai Dang Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition, Depth Estimation, +2 more

Learning Flexible Visual Representations via Interactive Gameplay

no code implementations ICLR 2021 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making and socialization.

Decision Making, Representation Learning

Factorizing Perception and Policy for Interactive Instruction Following

1 code implementation ICCV 2021 Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, Roozbeh Mottaghi, Jonghyun Choi

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents.

What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

1 code implementation16 Oct 2020 Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition, Depth Estimation, +2 more

AllenAct: A Framework for Embodied AI Research

1 code implementation28 Aug 2020 Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.

Embodied Question Answering, Question Answering

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations23 Jun 2020 Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Learning About Objects by Learning to Interact with Them

no code implementations NeurIPS 2020 Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi

Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks.

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

no code implementations ECCV 2020 Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi

In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.

Frame, Visual Commonsense Reasoning

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

1 code implementation CVPR 2020 Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems.

Learning Generalizable Visual Representations via Interactive Gameplay

no code implementations17 Dec 2019 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization.

Decision Making, Representation Learning

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation CVPR 2020 Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

5 code implementations CVPR 2020 Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

1 code implementation CVPR 2019 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.

Object Detection, Question Answering, +3 more

AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence

no code implementations6 Mar 2019 Marwan Mattar, Roozbeh Mottaghi, Julian Togelius, Danny Lange

This volume represents the accepted submissions from the AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence held on January 29, 2019 in Honolulu, Hawaii, USA.

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over the years to efficiently search and navigate?

reinforcement-learning

On Evaluation of Embodied Navigation Agents

9 code implementations18 Jul 2018 Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.
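
This paper is best known for proposing SPL (Success weighted by Path Length) as a standard metric for navigation agents. A minimal reference computation, following the formula in the paper:

```python
# SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is the success
# indicator, l_i the shortest-path distance from start to goal, and p_i the
# length of the path the agent actually took.
def spl(successes, shortest_paths, agent_paths):
    """Each argument is a per-episode sequence of equal length."""
    episodes = zip(successes, shortest_paths, agent_paths)
    return sum(s * l / max(p, l) for s, l, p in episodes) / len(successes)
```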

Visual Semantic Planning using Deep Successor Representations

no code implementations ICCV 2017 Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.

Imitation Learning, reinforcement-learning

SeGAN: Segmenting and Generating the Invisible

1 code implementation CVPR 2018 Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction, and manipulation.

Depth Estimation, Scene Understanding

See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content

no code implementations ICCV 2017 Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi

Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose AI2-THOR framework, which provides an environment with high-quality 3D scenes and physics engine.

3D Reconstruction, Feature Engineering, +2 more
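
A minimal interaction loop with the ai2thor package (pip install ai2thor) looks roughly as follows; exact constructor arguments vary across releases, so treat this as a sketch rather than a definitive usage guide.

```python
# Sketch of stepping an agent through an AI2-THOR scene.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")
for _ in range(10):
    event = controller.step(action="MoveAhead")
    rgb = event.frame  # egocentric RGB observation (H x W x 3)
    if not event.metadata["lastActionSuccess"]:
        # Simple assumed recovery behavior: turn when blocked.
        event = controller.step(action="RotateRight")
controller.stop()
```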

A Task-Oriented Approach for Cost-Sensitive Recognition

no code implementations CVPR 2016 Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi

With the recent progress in visual recognition, we have already started to see a surge of vision-related real-world applications.

Scene Understanding

"What happens if..." Learning to Predict the Effect of Forces in Images

no code implementations17 Mar 2016 Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

To build a dataset of forces in scenes, we reconstructed all images in SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

no code implementations12 Nov 2015 Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.

A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition

no code implementations CVPR 2015 Roozbeh Mottaghi, Yu Xiang, Silvio Savarese

Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from each other because of the huge space of parameters.

3D Pose Estimation, Object Detection

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

no code implementations16 Jun 2014 Roozbeh Mottaghi, Sanja Fidler, Alan Yuille, Raquel Urtasun, Devi Parikh

Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers.

Object Detection, Scene Recognition, +2 more

Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

no code implementations CVPR 2013 Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning.

Object Detection, Scene Recognition, +2 more

Bottom-Up Segmentation for Top-Down Detection

no code implementations CVPR 2013 Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP.

Object Detection, +1 more

Complexity of Representation and Inference in Compositional Models with Part Sharing

no code implementations16 Jan 2013 Alan L. Yuille, Roozbeh Mottaghi

This paper describes serial and parallel compositional models of multiple objects with part sharing.
