Search Results for author: Arjun Majumdar

Found 14 papers, 8 papers with code

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

1 code implementation • CVPR 2018 • David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

Recently, modular networks have been shown to be an effective framework for performing visual reasoning tasks.

Ranked #4 on Visual Question Answering (VQA) on CLEVR

Question Answering, Visual Question Answering, +1



Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

3 code implementations • ECCV 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation



Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation • ECCV 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Ranked #6 on Vision and Language Navigation on VLN Challenge

Vision and Language Navigation


Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations • ICML Workshop LaReL 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation


Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments – Extended Abstract

no code implementations • ICML Workshop LaReL 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation


Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation • 7 Nov 2020 • Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation


SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

no code implementations • NeurIPS 2021 • Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location.

Object, Scene Classification, +2


Offline Visual Representation Learning for Embodied Navigation

1 code implementation • 27 Apr 2022 • Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.

Representation Learning, Self-Supervised Learning


ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings

1 code implementation • 24 Jun 2022 • Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman, Dhruv Batra

We present a scalable approach for learning open-world object-goal navigation (ObjectNav) -- the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., "find a sink").


OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

no code implementations • 14 Mar 2023 • Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.

Object Detection, +3


Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

no code implementations • NeurIPS 2023 • Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average).


Masked Trajectory Models for Prediction, Representation, and Control

1 code implementation • 4 May 2023 • Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran

We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making.

Continuous Control, Decision Making, +2



Behavioral Analysis of Vision-and-Language Navigation Agents

1 code implementation • CVPR 2023 • Zijiao Yang, Arjun Majumdar, Stefan Lee

To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings.

Vision and Language Navigation


What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

no code implementations • 3 Oct 2023 • Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks.

Data Augmentation

