Search Results for author: Arjun Majumdar

Found 15 papers, 8 papers with code

Behavioral Analysis of Vision-and-Language Navigation Agents

1 code implementation CVPR 2023 Zijiao Yang, Arjun Majumdar, Stefan Lee

To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings.

Vision and Language Navigation

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

no code implementations14 Mar 2023 Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.

object-detection Object Detection +3

ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings

1 code implementation24 Jun 2022 Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman, Dhruv Batra

We present a scalable approach for learning open-world object-goal navigation (ObjectNav) -- the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e. g., "find a sink").

Offline Visual Representation Learning for Embodied Navigation

1 code implementation27 Apr 2022 Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.

Representation Learning Self-Supervised Learning

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

no code implementations NeurIPS 2021 Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Natural language instructions for visual navigation often use scene descriptions (e. g., "bedroom") and object references (e. g., "green chairs") to provide a breadcrumb trail to a goal location.

Object Scene Classification +2

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation7 Nov 2020 Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations ICML Workshop LaReL 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e. g.'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments – Extended Abstract

no code implementations ICML Workshop LaReL 2020 Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation ECCV 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e. g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

3 code implementations ECCV 2020 Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Cannot find the paper you are looking for? You can Submit a new open access paper.