Search Results for author: Dhruv Batra

Found 172 papers, 82 papers with code

Talk The Walk: Navigating Grids in New York City through Grounded Dialogue

no code implementations • ICLR 2019 • Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela

We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception.

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

no code implementations • 9 Apr 2024 • Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images.

Navigate Visual Navigation

Seeing the Unseen: Visual Common Sense for Semantic Placement

no code implementations • 15 Jan 2024 • Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs

Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps are straightforward for objects not present in the image.

Common Sense Reasoning Object

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

no code implementations • 14 Dec 2023 • Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like.

VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

no code implementations • 6 Dec 2023 • Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors.

Language Modelling Navigate

Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

3 code implementations • 19 Oct 2023 • Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments.

Social Navigation

2,362

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

no code implementations • 3 Oct 2023 • Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks.

Data Augmentation

Skill Transformer: A Monolithic Policy for Mobile Manipulation

no code implementations • ICCV 2023 • Xiaoyu Huang, Dhruv Batra, Akshara Rai, Andrew Szot

We present Skill Transformer, an approach for solving long-horizon robotic tasks by combining conditional sequence modeling and skill modularity.

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

no code implementations • 7 Aug 2023 • Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions.

Offline RL reinforcement-learning +1

HomeRobot: Open-Vocabulary Mobile Manipulation

no code implementations • 20 Jun 2023 • Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks.

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

no code implementations • 20 Jun 2023 • Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva

Surprisingly, we observe that agents trained on just 122 scenes from our dataset outperform agents trained on 10,000 scenes from the ProcTHOR-10K dataset in terms of zero-shot generalization in real-world scanned environments.

Navigate Zero-shot Generalization

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

1 code implementation • CVPR 2023 • Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander

Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects - by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location.

Reinforcement Learning (RL)

Adaptive Coordination in Social Embodied Rearrangement

no code implementations • 31 May 2023 • Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, Akshara Rai

We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment.

AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

1 code implementation • 21 Apr 2023 • Pierre Marza, Laetitia Matignon, Olivier Simonin, Dhruv Batra, Christian Wolf, Devendra Singh Chaplot

Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models outperform other classical and end-to-end baselines.

Novel View Synthesis

Navigating to Objects Specified by Images

no code implementations • ICCV 2023 • Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.

Navigate Visual Reasoning

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

no code implementations • NeurIPS 2023 • Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average).

BC-IRL: Learning Generalizable Reward Functions from Demonstrations

no code implementations • 28 Mar 2023 • Andrew Szot, Amy Zhang, Dhruv Batra, Zsolt Kira, Franziska Meier

How well do reward functions learned with inverse reinforcement learning (IRL) generalize?

reinforcement-learning

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

no code implementations • 14 Mar 2023 • Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.

object-detection Object Detection +3

Emergence of Maps in the Memories of Blind Navigation Agents

no code implementations • 30 Jan 2023 • Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial.

Inductive Bias PointGoal Navigation

PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav

1 code implementation • CVPR 2023 • Ram Ramrakhya, Dhruv Batra, Erik Wijmans, Abhishek Das

We find that BC$\rightarrow$RL on human demonstrations outperforms BC$\rightarrow$RL on SP and FE trajectories, even when controlled for same BC-pretraining success on train, and even on a subset of val episodes where BC-pretraining success favors the SP or FE policies.

Imitation Learning Navigate +1

Cross-Domain Transfer via Semantic Skill Imitation

no code implementations • 14 Dec 2022 • Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai

We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen.

Reinforcement Learning (RL) Robot Manipulation

Navigating to Objects in the Real World

no code implementations • 2 Dec 2022 • Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality.

Navigate Visual Navigation

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

no code implementations • 29 Nov 2022 • Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.

Visual Navigation

ViNL: Visual Navigation and Locomotion Over Obstacles

1 code implementation • 26 Oct 2022 • Simar Kareer, Naoki Yokoyama, Dhruv Batra, Sehoon Ha, Joanne Truong

ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guides the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following provided velocity commands.

Navigate Visual Navigation

Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement

1 code implementation • 11 Oct 2022 • Erik Wijmans, Irfan Essa, Dhruv Batra

Specifically, the Pick skill involves a robot picking an object from a table.

Navigate Out-of-Distribution Generalization +1

1,708

Habitat-Matterport 3D Semantics Dataset

2 code implementations • CVPR 2023 • Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot

The scale, quality, and diversity of object annotations far exceed those of prior datasets.

Object

1,708

ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings

1 code implementation • 24 Jun 2022 • Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman, Dhruv Batra

We present a scalable approach for learning open-world object-goal navigation (ObjectNav) -- the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., "find a sink").

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

2 code implementations • 16 Jun 2022 • Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman

We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

311

Is Mapping Necessary for Realistic PointGoal Navigation?

1 code implementation • CVPR 2022 • Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv Batra, Oleksandr Maksymets

However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question; one we tackle in this paper.

Data Augmentation Navigate +3

Housekeep: Tidying Virtual Households using Commonsense Reasoning

1 code implementation • 22 May 2022 • Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal

Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house.

Language Modelling Large Language Model

Episodic Memory Question Answering

no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.

Question Answering

Offline Visual Representation Learning for Embodied Navigation

1 code implementation • 27 Apr 2022 • Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.

Representation Learning Self-Supervised Learning

Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

no code implementations • CVPR 2022 • Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das

We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments -- (1) ObjectGoal Navigation (e.g. 'find & go to a chair') and (2) Pick&Place (e.g. 'find mug, pick mug, find counter, place mug on counter').

Imitation Learning Reinforcement Learning (RL)

Simple and Effective Synthesis of Indoor 3D Scenes

1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

We study the problem of synthesizing immersive 3D indoor scenes from one or more images.

Data Augmentation Vision and Language Navigation

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

no code implementations • NeurIPS 2021 • Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location.

Object Scene Classification +2

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

5,040

Waypoint Models for Instruction-guided Navigation in Continuous Environments

1 code implementation • ICCV 2021 • Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets

Little inquiry has explicitly addressed the role of action spaces in language-guided visual navigation -- either in terms of its effect on navigation success or the efficiency with which a robotic agent could execute the resulting trajectory.

Instruction Following Visual Navigation

216

Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge

no code implementations • 22 Sep 2021 • Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha

Recent advances in deep reinforcement learning and scalable photorealistic simulation have led to increasingly mature embodied AI for various visual tasks, including navigation.

Benchmarking Image Augmentation +4

Realistic PointGoal Navigation via Auxiliary Losses and Information Bottleneck

1 code implementation • 17 Sep 2021 • Guillermo Grande, Dhruv Batra, Erik Wijmans

Under this setting, the agent incurs a penalty for using this privileged information, encouraging the agent to only leverage this information when it is crucial to learning.

PointGoal Navigation

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

2 code implementations • 16 Sep 2021 • Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra

When compared to existing photorealistic 3D datasets such as Replica, MP3D, Gibson, and ScanNet, images rendered from HM3D have 20-85% higher visual fidelity w.r.t.

PointGoal Navigation Surface Reconstruction

293

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

no code implementations • ICCV 2021 • Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing

It is fundamental for personal robots to reliably navigate to a specified goal.

Navigate PointGoal Navigation +1

Habitat 2.0: Training Home Assistants to Rearrange their Habitat

6 code implementations • NeurIPS 2021 • Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios.

Reinforcement Learning (RL) Single Particle Analysis

2,362

Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice

1 code implementation • 26 Jun 2021 • Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan

This work shows that value-aware model learning, known for its numerous theoretical benefits, is also practically viable for solving challenging continuous control tasks in prevalent model-based reinforcement learning algorithms.

Continuous Control Model-based Reinforcement Learning

Auxiliary Tasks and Exploration Enable ObjectNav

1 code implementation • 8 Apr 2021 • Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans

We instead re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.

Ranked #2 on Robot Navigation on Habitat 2020 Object Nav test-std

Auxiliary Learning Navigate +1

Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation

no code implementations • 14 Mar 2021 • Naoki Yokoyama, Sehoon Ha, Dhruv Batra

Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent makes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics.

Navigate

Large Batch Simulation for Deep Reinforcement Learning

1 code implementation • ICLR 2021 • Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.

PointGoal Navigation reinforcement-learning +1

Memory-Augmented Reinforcement Learning for Image-Goal Navigation

1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari

In this work, we present a memory-augmented approach for image-goal navigation.

Data Augmentation Navigate +2

Auxiliary Tasks and Exploration Enable ObjectGoal Navigation

no code implementations • ICCV 2021 • Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans

We instead re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.

Auxiliary Learning Navigate

THDA: Treasure Hunt Data Augmentation for Semantic Navigation

no code implementations • ICCV 2021 • Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra

We show that this is a natural consequence of optimizing for the task metric (which in fact penalizes exploration), is enabled by powerful observation encoders, and is possible due to the finite set of training environment configurations.

Data Augmentation Navigate +2

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget

no code implementations • 11 Dec 2020 • Erik Wijmans, Irfan Essa, Dhruv Batra

PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge.

PointGoal Navigation

Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents

no code implementations • 24 Nov 2020 • Joanne Truong, Sonia Chernova, Dhruv Batra

Simulation offers the ability to train large numbers of robots in parallel, and offers an abundance of data.

Domain Adaptation PointGoal Navigation Robotics

Where Are You? Localization from Embodied Dialog

2 code implementations • EMNLP 2020 • Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Navigate Visual Dialog

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation • 7 Nov 2020 • Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

Rearrangement: A Challenge for Embodied AI

no code implementations • 3 Nov 2020 • Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su

In the rearrangement task, the goal is to bring a given physical environment into a specified state.

Benchmarking

SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

1 code implementation • NAACL 2021 • Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju

Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong.

Question Answering Visual Grounding +1

Contrast and Classify: Training Robust VQA Models

1 code implementation • ICCV 2021 • Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.

Contrastive Learning Data Augmentation +4

Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

1 code implementation • 2 Oct 2020 • Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra

We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?")

Representation Learning

Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

no code implementations • 7 Sep 2020 • Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh

This enables a seamless adaption to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training of the navigation policy.

Ranked #5 on Robot Navigation on Habitat 2020 Point Nav test-std

Navigate Robot Navigation +1

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

1 code implementation • NeurIPS 2020 • Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?

Visual Dialog Visual Question Answering (VQA)

Spatially Aware Multimodal Transformers for TextVQA

1 code implementation • ECCV 2020 • Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

Further, each head in our multi-head self-attention layer focuses on a different subset of relations.

Optical Character Recognition (OCR) Visual Grounding +1

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh

We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.

Navigate

Auxiliary Tasks Speed Up Learning PointGoal Navigation

1 code implementation • 9 Jul 2020 • Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das

PointGoal Navigation is an embodied task that requires agents to navigate to a specified point in an unseen environment.

Navigate PointGoal Navigation

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations • 23 Jun 2020 • Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Object

1,709

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations • ICML Workshop LaReL 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Bridging Worlds in Reinforcement Learning with Model-Advantage

no code implementations • ICML Workshop LifelongML 2020 • Nirbhay Modhe, Harish K Kamath, Dhruv Batra, Ashwin Kalyan

Despite the breakthroughs achieved by Reinforcement Learning (RL) in recent years, RL agents often fail to perform well in unseen environments.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments – Extended Abstract

no code implementations • ICML Workshop LaReL 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Paper
Add Code

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation • ECCV 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Ranked #6 on Vision and Language Navigation on VLN Challenge

Vision and Language Navigation

Paper
Code

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

3 code implementations • ECCV 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

216

Paper
Code

Analyzing Visual Representations in Embodied Navigation Tasks

no code implementations • 12 Mar 2020 • Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos

Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often over-specialized to the target task.

Reinforcement Learning (RL)

Paper
Add Code

Insights on Visual Representations for Embodied Navigation Tasks

no code implementations • ICLR 2020 • Erik Wijmans, Julian Straub, Irfan Essa, Dhruv Batra, Judy Hoffman, Ari Morcos

Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.

Paper
Add Code

Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

3 code implementations • 13 Dec 2019 • Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra

Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation.

PointGoal Navigation Visual Navigation

1,709

Paper
Code

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

2 code implementations • ECCV 2020 • Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!

Language Modelling Representation Learning +2

Paper
Code

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

8 code implementations • ICLR 2020 • Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

Ranked #1 on PointGoal Navigation on Gibson PointGoal Navigation

Autonomous Navigation Navigate +2

31,092

Paper
Code
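
The figures quoted in the DD-PPO summary above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names, the 30.44-day average month, and the 365.25-day year are assumptions made here, not values from the paper, and the 3-day figure is used as an upper bound because the summary says "under 3 days".

```python
# Back-of-the-envelope check of the DD-PPO figures quoted above.
# All helper names and calendar constants are illustrative assumptions.

DAYS_PER_MONTH = 30.44                   # assumed average month length
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed year length in seconds


def gpu_time_months(num_gpus: int, wall_clock_days: float) -> float:
    """Total GPU-time, in months, for num_gpus running for wall_clock_days."""
    return num_gpus * wall_clock_days / DAYS_PER_MONTH


def steps_per_second_of_experience(steps: float, years: float) -> float:
    """Steps per second if `steps` are spread evenly over `years` of experience."""
    return steps / (years * SECONDS_PER_YEAR)


gpu_days = 64 * 3                                  # 64 GPUs, 3 days upper bound
gpu_months = gpu_time_months(64, 3)
rate = steps_per_second_of_experience(2.5e9, 80)   # 2.5 billion steps, 80 years

print(f"{gpu_days} GPU-days is about {gpu_months:.1f} GPU-months")
print(f"about {rate:.2f} steps per second of experience")
```

Under these assumptions, 64 GPUs for 3 days comes to 192 GPU-days, a little over 6 months, and 2.5 billion steps over 80 years works out to roughly one step per second, which is consistent with the quoted figures.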

DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL

no code implementations • 25 Sep 2019 • Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam

We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment.

Paper
Add Code

Improving Generative Visual Dialog by Answering Diverse Questions

1 code implementation • IJCNLP 2019 • Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das

Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task.

Representation Learning Visual Dialog

Paper
Code

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

no code implementations • ICCV 2019 • Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future.

Image Captioning Language Modelling +1

Paper
Add Code

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

11 code implementations • NeurIPS 2019 • Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.

Ranked #5 on Referring Expression Comprehension on Talk2Car

Image Retrieval Question Answering +5

790

Paper
Code

IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL

no code implementations • 24 Jul 2019 • Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam

We propose a novel framework to identify sub-goals useful for exploration in sequential decision making tasks under partial observability.

Decision Making Hierarchical Reinforcement Learning

Paper
Add Code

Chasing Ghosts: Instruction Following as Bayesian State Tracking

1 code implementation • NeurIPS 2019 • Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.

Instruction Following Vision and Language Navigation

Paper
Code

The Replica Dataset: A Digital Replica of Indoor Spaces

2 code implementations • 13 Jun 2019 • Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, Richard Newcombe

We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale.

3D Scene Reconstruction Instruction Following +2

905

Paper
Code

SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation

1 code implementation • ICCV 2019 • Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra

We propose SplitNet, a method for decoupling visual perception and policy learning.

Visual Navigation

Paper
Code

Cross-Task Knowledge Transfer for Visually-Grounded Navigation

no code implementations • ICLR 2019 • Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for two different tasks: learning to follow navigational instructions and embodied question answering.

Disentanglement Embodied Question Answering +3

Paper
Add Code

Modeling the Long Term Future in Model-Based Reinforcement Learning

no code implementations • ICLR 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.

Imitation Learning Model-based Reinforcement Learning +4

Paper
Add Code

Emergence of Compositional Language with Deep Generational Transmission

1 code implementation • ICLR 2020 • Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra

In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.

Reinforcement Learning (RL)

Paper
Code

Towards VQA Models That Can Read

7 code implementations • CVPR 2019 • Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach

We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.

Ranked #3 on Visual Question Answering (VQA) on VizWiz 2018

Visual Question Answering (VQA)

5,416

Paper
Code

Counterfactual Visual Explanations

1 code implementation • 16 Apr 2019 • Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee

In this work, we develop a technique to produce counterfactual visual explanations.

counterfactual General Classification +1

Paper
Code

Multi-Target Embodied Question Answering

1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra

To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.

Embodied Question Answering Navigate +1

287

Paper
Code

Embodied Visual Recognition

no code implementations • 9 Apr 2019 • Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.

Object Object Localization +1

Paper
Add Code

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D).

Embodied Question Answering Question Answering

Paper
Add Code

Habitat: A Platform for Embodied AI Research

13 code implementations • ICCV 2019 • Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

We present Habitat, a platform for research in embodied artificial intelligence (AI).

Ranked #2 on PointGoal Navigation on Gibson PointGoal Navigation

Benchmarking Instruction Following +2

2,362

Paper
Code

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

1 code implementation • NAACL 2019 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.

coreference-resolution Visual Dialog

Paper
Code

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

no code implementations • 5 Mar 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.

Imitation Learning Model-based Reinforcement Learning +4

Paper
Add Code

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

no code implementations • ICLR 2019 • Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh

We propose a new class of probabilistic neural-symbolic models, that have symbolic functional programs as a latent, stochastic variable.

counterfactual Question Answering +1

Paper
Add Code

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

no code implementations • ICCV 2019 • Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh

Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.

Image Captioning Question Answering +2

Paper
Add Code

EvalAI: Towards Better Evaluation Systems for AI Agents

3 code implementations • 10 Feb 2019 • Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale.

Benchmarking BIG-bench Machine Learning

1,683

Paper
Code

Embodied Multimodal Multitask Learning

no code implementations • 4 Feb 2019 • Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks.

Disentanglement Embodied Question Answering +3

Paper
Add Code

Audio-Visual Scene-Aware Dialog

2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

We introduce the task of scene-aware dialog.

Scene-Aware Dialogue

Paper
Code

Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)

no code implementations • 16 Jan 2019 • Abhishek Das, Devi Parikh, Dhruv Batra

In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation.

Visual Dialog

Paper
Add Code

Dialog System Technology Challenge 7

no code implementations • 11 Jan 2019 • Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems.

Sentence

Paper
Add Code

nocaps: novel object captioning at scale

2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning Object +2

Paper
Code

Fabrik: An Online Collaborative Neural Network Editor

no code implementations • 27 Oct 2018 • Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra

We present Fabrik, an online neural network editor that provides tools to visualize, edit, and share neural networks from within a browser.

Paper
Add Code

TarMAC: Targeted Multi-Agent Communication

no code implementations • ICLR 2019 • Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments.

Multi-agent Reinforcement Learning

Paper
Add Code

Neural Modular Control for Embodied Question Answering

2 code implementations • 26 Oct 2018 • Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning.

Embodied Question Answering Imitation Learning +3

1,178

Paper
Code

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

no code implementations • 1 Oct 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

Our question generation policy generalizes to new environments and a new pair of eyes, i.e., new visual system.

Question Generation Question-Generation

Paper
Add Code

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

1 code implementation • ECCV 2018 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context.

Ranked #1 on Common Sense Reasoning on Visual Dialog v0.9

Common Sense Reasoning coreference-resolution +3

Paper
Code

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning

Paper
Code

Graph R-CNN for Scene Graph Generation

3 code implementations • ECCV 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images.

Ranked #12 on Scene Graph Generation on Visual Genome

Graph Generation Scene Graph Generation

721

Paper
Code

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

9 code implementations • 26 Jul 2018 • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on VQA v2.0 dataset -- from 65.67% to 70.22%.

Ranked #10 on Visual Question Answering (VQA) on A-OKVQA

Data Augmentation Visual Question Answering (VQA)

5,416

Paper
Code

Talk the Walk: Navigating New York City through Grounded Dialogue

1 code implementation • 9 Jul 2018 • Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela

We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception.

Navigate

113

Paper
Code

End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features

2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh

We introduce a new dataset of dialogs about videos of human behaviors.

Question Answering Video Description +1

Paper
Code

Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

no code implementations • ICML 2018 • Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra

Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input - e.g. there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences).

Multi-Label Classification Question Generation +3

Paper
Add Code

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

4 code implementations • 1 Jun 2018 • Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

Video Description Visual Dialog

Paper
Code

Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples

no code implementations • ICLR 2018 • Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani

In this work, we propose Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models.

Program Synthesis

Paper
Add Code

Neural Baby Talk

1 code implementation • CVPR 2018 • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image.

Image Captioning Object +3

523

Paper
Code

CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication

2 code implementations • ACL 2019 • Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh

The game involves two players: a Teller and a Drawer.

Imitation Learning

Paper
Code

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

1 code implementation • CVPR 2018 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi

Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).

Question Answering Visual Question Answering

Paper
Code

Embodied Question Answering

4 code implementations • CVPR 2018 • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").

Embodied Question Answering Navigate +3

1,178

Paper
Code

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

1 code implementation • EMNLP 2017 • Satwik Kottur, José Moura, Stefan Lee, Dhruv Batra

A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, learned without any human supervision!

Slot Filling

Paper
Code

Deal or No Deal? End-to-End Learning of Negotiation Dialogues

no code implementations • EMNLP 2017 • Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

Paper
Add Code

Evaluating Visual Conversational Agents via Cooperative Human-AI Games

no code implementations • 17 Aug 2017 • Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh

This suggests a mismatch between benchmarking of AI in isolation and in the context of human-AI teams.

Benchmarking

Paper
Add Code

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

3 code implementations • 26 Jun 2017 • Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

105

Paper
Code

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

2 code implementations • 16 Jun 2017 • Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

1,375

Paper
Code

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

1 code implementation • NeurIPS 2017 • Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.

Ranked #8 on Visual Dialog on VisDial v0.9 val

Informativeness Metric Learning +2

110

Paper
Code

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

no code implementations • CVPR 2017 • Qing Sun, Stefan Lee, Dhruv Batra

We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.

Image Captioning Sentence

Paper
Add Code

ParlAI: A Dialog Research Software Platform

22 code implementations • EMNLP 2017 • Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston

We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.

reinforcement-learning Reinforcement Learning (RL) +1

10,427

Paper
Code

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

1 code implementation • EMNLP 2017 • Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee

In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions.

Question Answering Visual Question Answering

Paper
Code

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

no code implementations • 26 Apr 2017 • Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh

Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.

Question Answering Visual Question Answering

Paper
Add Code

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

7 code implementations • ICCV 2017 • Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

reinforcement-learning Reinforcement Learning (RL) +2

190

Paper
Code

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation

1 code implementation • 5 Mar 2017 • Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh

We present LR-GAN: an adversarial image generation model which takes scene structure and context into account.

Ranked #4 on Image Generation on Stanford Cars

Image Generation

151

Paper
Code

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

7 code implementations • CVPR 2017 • Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh

We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter!

Ranked #3 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 2.0 open ended

Visual Question Answering

105

Paper
Code

Visual Dialog

11 code implementations • CVPR 2017 • Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

Ranked #15 on Visual Dialog on VisDial v0.9 val

Chatbot Retrieval +1

10,427

Paper
Code

Grad-CAM: Why did you say that?

2 code implementations • 22 Nov 2016 • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra

We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.

Image Captioning Visual Question Answering

9,444

Paper
Code

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra

Common Sense Reasoning Prepositional Phrase Attachment +1

Paper
Add Code

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

25 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra

We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

Image Captioning Machine Translation +4

29,257

Paper
Code

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

124 code implementations • ICCV 2017 • Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra

For captioning and VQA, we show that even non-attention based models can localize inputs.

General Classification Image Classification +2

9,444

Paper
Code

Measuring Machine Intelligence Through Visual Question Answering

no code implementations • 31 Aug 2016 • C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence.

Image Captioning Question Answering +1

Paper
Add Code

Towards Transparent AI Systems: Interpreting Visual Question Answering Models

no code implementations • 31 Aug 2016 • Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra

In this paper, we address the problem of interpreting Visual Question Answering (VQA) models.

Question Answering Visual Question Answering

Paper
Add Code

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles

no code implementations • NeurIPS 2016 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra

Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.

Multiple-choice

Paper
Add Code

Analyzing the Behavior of Visual Question Answering Models

1 code implementation • EMNLP 2016 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).

Question Answering Visual Question Answering

218

Paper
Code

Sort Story: Sorting Jumbled Images and Captions into Stories

no code implementations • EMNLP 2016 • Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit Bansal

Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication.

Common Sense Reasoning Document Summarization +2

Paper
Add Code

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

no code implementations • EMNLP 2016 • Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh

We introduce the novel problem of determining the relevance of questions to images in VQA.

Question Answering Question Similarity +1

Paper
Add Code

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

no code implementations • 17 Jun 2016 • Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.

Question Answering Visual Question Answering

Paper
Add Code

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

no code implementations • EMNLP 2016 • Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.

Question Answering Visual Question Answering

Paper
Add Code

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

no code implementations • NAACL 2016 • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen

Question Answering Text Summarization

Paper
Add Code

Hierarchical Question-Image Co-Attention for Visual Question Answering

9 code implementations • NeurIPS 2016 • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via novel 1-dimensional convolutional neural networks (CNNs).

Ranked #3 on Visual Question Answering (VQA) on VQA v1 test-std

Visual Dialog Visual Question Answering

344

Paper
Code

Radio Transformer Networks: Attention Models for Learning to Synchronize in Wireless Systems

no code implementations • 3 May 2016 • Timothy J. O'Shea, Latha Pemula, Dhruv Batra, T. Charles Clancy

This attention model allows the network to learn a localization network capable of synchronizing and normalizing a radio signal blindly, with zero knowledge of the signal's structure, based on optimization of the network for classification accuracy, sparse representation, and regularization.

General Classification

Paper
Add Code

Visual Storytelling

1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell

We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.

Descriptive Visual Storytelling

Paper
Code

Joint Unsupervised Learning of Deep Representations and Image Clusters

3 code implementations • CVPR 2016 • Jianwei Yang, Devi Parikh, Dhruv Batra

In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters.

Ranked #1 on Image Clustering on Coil-20

Clustering Image Clustering +1

285

Paper
Code

Counting Everyday Objects in Everyday Scenes

1 code implementation • CVPR 2017 • Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh

In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes.

Ranked #1 on Object Counting on COCO count-test

Object Object Counting +4

Paper
Code

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra

Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair.

Common Sense Reasoning Prepositional Phrase Attachment +3

Paper
Add Code

A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories

no code implementations • 6 Apr 2016 • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen

We created a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation.

Cloze Test Sentence +1

Paper
Add Code

We Are Humor Beings: Understanding and Predicting Visual Humor

no code implementations • CVPR 2016 • Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

We collect two datasets of abstract scenes that facilitate the study of humor at both the scene-level and the object-level.

Paper
Add Code

Optimizing Expected Intersection-Over-Union With Candidate-Constrained CRFs

no code implementations • ICCV 2015 • Faruk Ahmed, Daniel Tarlow, Dhruv Batra

Currently, there are two dominant approaches: the first approximates the Expected-IoU (EIoU) score as Expected-Intersection-over-Expected-Union (EIoEU); and the second approach is to compute exact EIoU but only over a small set of high-quality candidate solutions.

Image Segmentation Segmentation +1

Paper
Add Code

SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals

no code implementations • NeurIPS 2015 • Qing Sun, Dhruv Batra

This paper formulates the search for a set of bounding boxes (as needed in object proposal generation) as a monotone submodular maximization problem over the space of all possible bounding boxes in an image.

Object Proposal Generation

Paper
Add Code

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

no code implementations • 19 Nov 2015 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra

Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.

Paper
Add Code

Reducing Overfitting in Deep Networks by Decorrelating Representations

no code implementations • 19 Nov 2015 • Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra

One major challenge in training Deep Neural Networks is preventing overfitting.

Data Augmentation

Paper
Add Code

Yin and Yang: Balancing and Answering Binary Visual Questions

no code implementations • CVPR 2016 • Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh

If the concept can be found in the image, the answer to the question is "yes", and otherwise "no".

Question Answering Visual Question Answering

Paper
Add Code

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

no code implementations • 12 Jun 2015 • Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra

We are witnessing a proliferation of massive visual data.

Distributed Computing

Paper
Add Code

Active Learning for Structured Probabilistic Models With Histogram Approximation

no code implementations • CVPR 2015 • Qing Sun, Ankit Laddha, Dhruv Batra

Active Learning General Classification +2

Paper
Add Code

Object-Proposal Evaluation Protocol is 'Gameable'

1 code implementation • CVPR 2016 • Neelima Chavali, Harsh Agrawal, Aroma Mahendru, Dhruv Batra

Finally, we plan to release an easy-to-use toolbox that combines various publicly available implementations of object proposal algorithms and standardizes proposal generation and evaluation, so that new methods can be added and evaluated on different datasets.

Object object-detection +2

173

Paper
Code

VQA: Visual Question Answering

21 code implementations • ICCV 2015 • Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh

Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.

Ranked #1 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 2.0 open ended

Image Captioning Multiple-choice +1

1,425

Paper
Code

VIP: Finding Important People in Images

no code implementations • CVPR 2015 • Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

We address two specific questions -- Given an image, who are the most important individuals in it?

Paper
Add Code

Combining the Best of Graphical Models and ConvNets for Semantic Segmentation

no code implementations • 14 Dec 2014 • Michael Cogswell, Xiao Lin, Senthil Purushwalkam, Dhruv Batra

We present a two-module approach to semantic segmentation that incorporates Convolutional Networks (CNNs) and Graphical Models.

Segmentation Semantic Segmentation

Paper
Add Code

Candidate Constrained CRFs for Loss-Aware Structured Prediction

no code implementations • 10 Dec 2014 • Faruk Ahmed, Daniel Tarlow, Dhruv Batra

The result is that we can use loss-aware prediction methodology to improve performance of the highly tuned pipeline system.

Image Segmentation Semantic Segmentation +1

Paper
Add Code

Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets

no code implementations • NeurIPS 2014 • Adarsh Prasad, Stefanie Jegelka, Dhruv Batra

To cope with the high level of ambiguity faced in domains such as Computer Vision or Natural Language Processing, robust prediction methods often search for a diverse set of high-quality candidate solutions or proposals.

Sentence Structured Prediction

Paper
Add Code

Empirical Minimum Bayes Risk Prediction: How to Extract an Extra Few % Performance from Vision Models with Just Three More Parameters

no code implementations • CVPR 2014 • Vittal Premachandran, Daniel Tarlow, Dhruv Batra

When building vision systems that predict structured objects such as image segmentations or human poses, a crucial concern is performance under task-specific evaluation measures (e.g., Jaccard Index or Average Precision).

Paper
Add Code

Multimodal Learning in Loosely-organized Web Images

no code implementations • CVPR 2014 • Kun Duan, David J. Crandall, Dhruv Batra

Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images.

Metric Learning

Paper
Add Code

A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems

no code implementations • 2 Apr 2014 • Jörg H. Kappes, Bjoern Andres, Fred A. Hamprecht, Christoph Schnörr, Sebastian Nowozin, Dhruv Batra, Sungwoong Kim, Bernhard X. Kausler, Thorben Kröger, Jan Lellmann, Nikos Komodakis, Bogdan Savchynskyy, Carsten Rother

However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.

Paper
Add Code

A Systematic Exploration of Diversity in Machine Translation

no code implementations • EMNLP 2013 • Kevin Gimpel, Dhruv Batra, Chris Dyer, Gregory Shakhnarovich

Machine Translation Translation

Paper
Add Code

Discriminative Re-ranking of Diverse Segmentations

no code implementations • CVPR 2013 • Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich

This paper introduces a two-stage approach to semantic image segmentation.

Image Segmentation Re-Ranking +2

Paper
Add Code

Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

no code implementations • NeurIPS 2012 • Abner Guzmán-Rivera, Dhruv Batra, Pushmeet Kohli

The paper addresses the problem of generating multiple hypotheses for prediction tasks that involve interaction with users or successive components in a cascade.

Multiple-choice

Paper
Add Code
