Search Results for author: Dhruv Batra

Found 138 papers, 63 papers with code

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

no code implementations NeurIPS 2021 Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location.

Scene Classification Vision and Language Navigation

Waypoint Models for Instruction-guided Navigation in Continuous Environments

1 code implementation ICCV 2021 Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets

Little inquiry has explicitly addressed the role of action spaces in language-guided visual navigation -- either in terms of its effect on navigation success or the efficiency with which a robotic agent could execute the resulting trajectory.

Visual Navigation

Learning Robust Agents for Visual Navigation in Dynamic Environments: The Winning Entry of iGibson Challenge 2021

no code implementations 22 Sep 2021 Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha

Lastly, we achieve sim-to-sim transfer from Habitat to the iGibson simulator, and demonstrate that our proposed methods allow us to train robust agents in dynamic environments with interactive objects or moving humans.

Data Augmentation PointGoal Navigation

Realistic PointGoal Navigation via Auxiliary Losses and Information Bottleneck

1 code implementation 17 Sep 2021 Guillermo Grande, Dhruv Batra, Erik Wijmans

Under this setting, the agent incurs a penalty for using this privileged information, encouraging the agent to only leverage this information when it is crucial to learning.

PointGoal Navigation

Model-Advantage Optimization for Model-Based Reinforcement Learning

no code implementations 26 Jun 2021 Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan

In this work, we propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.

Continuous Control Model-based Reinforcement Learning

Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation

no code implementations 14 Mar 2021 Naoki Yokoyama, Sehoon Ha, Dhruv Batra

Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent makes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics.
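
SPL, the metric this abstract critiques, has a standard definition: SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i marks success, l_i is the shortest-path distance to the goal and p_i is the length of the path the agent took. The minimal Python sketch below implements that definition; the function and argument names are illustrative assumptions, not taken from Habitat or from the paper's code, and the paper's own replacement metric is not reproduced. The usage lines show the limitation the abstract names: SPL depends only on path length, not on how long an agent needs to execute the path.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length over N episodes.

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is 1 if episode i
    succeeded, l_i is the shortest-path distance from start to goal, and p_i is
    the length of the path the agent actually took.
    """
    n = len(successes)
    if n == 0 or not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("expected one success flag, l_i and p_i per episode")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if l <= 0:
            raise ValueError("shortest-path distance must be positive")
        # Failed episodes add 0; successful ones are discounted by detour length.
        total += (l / max(p, l)) if s else 0.0
    return total / n


# Two agents trace the same 8 m path to a goal 4 m away. SPL scores them
# identically, no matter how long each one took to drive that path.
fast_agent = spl([True], [4.0], [8.0])
slow_agent = spl([True], [4.0], [8.0])
print(fast_agent, slow_agent)  # 0.5 0.5
```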

Large Batch Simulation for Deep Reinforcement Learning

1 code implementation ICLR 2021 Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.

PointGoal Navigation

THDA: Treasure Hunt Data Augmentation for Semantic Navigation

no code implementations ICCV 2021 Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra

We show that this is a natural consequence of optimizing for the task metric (which in fact penalizes exploration), is enabled by powerful observation encoders, and is possible due to the finite set of training environment configurations.

Data Augmentation PointGoal Navigation

Auxiliary Tasks and Exploration Enable ObjectGoal Navigation

no code implementations ICCV 2021 Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans

We instead re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.

Auxiliary Learning

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget

no code implementations 11 Dec 2020 Erik Wijmans, Irfan Essa, Dhruv Batra

PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge.

PointGoal Navigation

Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents

no code implementations 24 Nov 2020 Joanne Truong, Sonia Chernova, Dhruv Batra

Simulation offers the ability to train large numbers of robots in parallel, and offers an abundance of data.

Domain Adaptation PointGoal Navigation Robotics

Where Are You? Localization from Embodied Dialog

2 code implementations EMNLP 2020 Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Visual Dialog

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation 7 Nov 2020 Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

1 code implementation NAACL 2021 Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju

Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong.

Question Answering Visual Grounding

Contrast and Classify: Training Robust VQA Models

1 code implementation ICCV 2021 Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.

Contrastive Learning Data Augmentation

Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

1 code implementation 2 Oct 2020 Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra

We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?")

Representation Learning

Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

no code implementations 7 Sep 2020 Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh

This enables a seamless adaption to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training of the navigation policy.

Robot Navigation Visual Odometry

Auxiliary Tasks Speed Up Learning PointGoal Navigation

1 code implementation 9 Jul 2020 Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das

PointGoal Navigation is an embodied task that requires agents to navigate to a specified point in an unseen environment.

PointGoal Navigation

ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

3 code implementations 23 Jun 2020 Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.

Bridging Worlds in Reinforcement Learning with Model-Advantage

no code implementations ICML Workshop LifelongML 2020 Nirbhay Modhe, Harish K Kamath, Dhruv Batra, Ashwin Kalyan

Despite the breakthroughs achieved by Reinforcement Learning (RL) in recent years, RL agents often fail to perform well in unseen environments.

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments – Extended Abstract

no code implementations ICML Workshop LaReL 2020 Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations ICML Workshop LaReL 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation ECCV 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Fine-tuning Vision and Language Navigation

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

3 code implementations ECCV 2020 Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Analyzing Visual Representations in Embodied Navigation Tasks

no code implementations 12 Mar 2020 Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos

Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often overspecialized to the target task.

Insights on Visual Representations for Embodied Navigation Tasks

no code implementations ICLR 2020 Erik Wijmans, Julian Straub, Irfan Essa, Dhruv Batra, Judy Hoffman, Ari Morcos

Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

1 code implementation ECCV 2020 Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!

Language Modelling Representation Learning

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

6 code implementations ICLR 2020 Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

Autonomous Navigation Fine-tuning
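
The DD-PPO figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract (2.5 billion frames, 64 GPUs, under 3 days); the "one frame per second of human experience" conversion is an assumption made here to test the "80 years" figure, not something the snippet states.

```python
# Back-of-the-envelope check of the figures quoted in the DD-PPO abstract.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25

frames = 2.5e9        # steps of experience (abstract)
num_gpus = 64         # GPUs used (abstract)
wall_clock_days = 3   # abstract says "under 3 days", so this is an upper bound

# GPU-time: at most 64 GPUs x 3 days = 192 GPU-days, roughly 6.3 months,
# which is consistent with "over 6 months of GPU-time".
gpu_days = num_gpus * wall_clock_days
gpu_months = gpu_days / (DAYS_PER_YEAR / 12)

# Throughput: because wall-clock time was under 3 days, these are lower bounds.
frames_per_second = frames / (wall_clock_days * SECONDS_PER_DAY)
frames_per_second_per_gpu = frames_per_second / num_gpus

# Assumption: one frame corresponds to one second of human experience.
human_seconds = 80 * DAYS_PER_YEAR * SECONDS_PER_DAY
frames_per_human_second = frames / human_seconds

print(f"{gpu_days} GPU-days ~ {gpu_months:.1f} months")
print(f">= {frames_per_second:,.0f} frames/s total, >= {frames_per_second_per_gpu:,.0f} per GPU")
print(f"{frames_per_human_second:.2f} frames per second of human experience")
```

Under that assumption the "80 years" claim works out to almost exactly one frame per second, and the aggregate rate of roughly 9,600 frames per second is spread over 64 GPUs at roughly 150 frames per second each.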

DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL

no code implementations 25 Sep 2019 Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam

We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment.

Improving Generative Visual Dialog by Answering Diverse Questions

1 code implementation IJCNLP 2019 Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das

Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task.

Representation Learning Visual Dialog

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

no code implementations ICCV 2019 Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future.

Image Captioning Language Modelling

Chasing Ghosts: Instruction Following as Bayesian State Tracking

1 code implementation NeurIPS 2019 Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.

Vision and Language Navigation

Cross-Task Knowledge Transfer for Visually-Grounded Navigation

no code implementations ICLR 2019 Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for two different tasks: learning to follow navigational instructions and embodied question answering.

Embodied Question Answering Question Answering

Emergence of Compositional Language with Deep Generational Transmission

1 code implementation ICLR 2020 Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra

In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.

Embodied Visual Recognition

no code implementations 9 Apr 2019 Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.

Object Classification Object Localization

Multi-Target Embodied Question Answering

1 code implementation CVPR 2019 Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra

To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.

Embodied Question Answering Question Answering

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

no code implementations CVPR 2019 Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D).

Embodied Question Answering Question Answering

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

no code implementations ICCV 2019 Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh

Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.

Image Captioning Question Answering

EvalAI: Towards Better Evaluation Systems for AI Agents

3 code implementations 10 Feb 2019 Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale.

Embodied Multimodal Multitask Learning

no code implementations 4 Feb 2019 Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks.

Embodied Question Answering Question Answering

Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)

no code implementations 16 Jan 2019 Abhishek Das, Devi Parikh, Dhruv Batra

In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation.

Visual Dialog

Dialog System Technology Challenge 7

no code implementations 11 Jan 2019 Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems.

nocaps: novel object captioning at scale

2 code implementations ICCV 2019 Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning Object Detection

Fabrik: An Online Collaborative Neural Network Editor

no code implementations 27 Oct 2018 Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra

We present Fabrik, an online neural network editor that provides tools to visualize, edit, and share neural networks from within a browser.

TarMAC: Targeted Multi-Agent Communication

no code implementations ICLR 2019 Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments.

Multi-agent Reinforcement Learning

Neural Modular Control for Embodied Question Answering

2 code implementations 26 Oct 2018 Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning.

Embodied Question Answering Imitation Learning

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

no code implementations 1 Oct 2018 Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

Our question generation policy generalizes to new environments and a new pair of eyes, i.e., new visual system.

Question Generation

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation ECCV 2018 Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning

Graph R-CNN for Scene Graph Generation

3 code implementations ECCV 2018 Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images.

Graph Generation Scene Graph Generation

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

7 code implementations 26 Jul 2018 Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.

Data Augmentation Fine-tuning

Talk the Walk: Navigating New York City through Grounded Dialogue

1 code implementation 9 Jul 2018 Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela

We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception.

Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

no code implementations ICML 2018 Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra

Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input - e.g. there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences).

Multi-Label Classification Question Generation

Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples

no code implementations ICLR 2018 Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani

In this work, we propose Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models.

Program Synthesis

Neural Baby Talk

1 code implementation CVPR 2018 Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image.

Image Captioning Slot Filling

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

1 code implementation CVPR 2018 Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi

Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).

Question Answering Visual Question Answering

Embodied Question Answering

4 code implementations CVPR 2018 Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").

Embodied Question Answering Language understanding

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

1 code implementation EMNLP 2017 Satwik Kottur, José Moura, Stefan Lee, Dhruv Batra

A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, learned without any human supervision!

Slot Filling

Deal or No Deal? End-to-End Learning of Negotiation Dialogues

no code implementations EMNLP 2017 Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

3 code implementations 26 Jun 2017 Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision!

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

1 code implementation 16 Jun 2017 Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

1 code implementation NeurIPS 2017 Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra

In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.

Metric Learning Transfer Learning

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

no code implementations CVPR 2017 Qing Sun, Stefan Lee, Dhruv Batra

We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.

Image Captioning

ParlAI: A Dialog Research Software Platform

18 code implementations EMNLP 2017 Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston

We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.

Visual Question Answering

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

1 code implementation EMNLP 2017 Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee

In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions.

Question Answering Visual Question Answering

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

no code implementations 26 Apr 2017 Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh

Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.

Question Answering Visual Question Answering

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

6 code implementations ICCV 2017 Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

Visual Dialog Visual Question Answering

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation

1 code implementation 5 Mar 2017 Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh

We present LR-GAN: an adversarial image generation model which takes scene structure and context into account.

Image Generation

Visual Dialog

10 code implementations CVPR 2017 Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

Chatbot Visual Dialog

Grad-CAM: Why did you say that?

1 code implementation 22 Nov 2016 Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra

We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.

Image Captioning Visual Question Answering
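
The snippet above does not spell out how the "important" regions are found. In the published Grad-CAM formulation, each feature map A^k of a convolutional layer gets a weight alpha_k^c equal to the spatial average of the gradient of the class score y^c with respect to that map, and the heatmap is ReLU(sum_k alpha_k^c A^k). The framework-free sketch below covers only that final combination step. It assumes activations and gradients have already been extracted (for example with an autodiff framework) as nested lists; the function name and data layout are illustrative choices, not the paper's code.

```python
from typing import List

FeatureMap = List[List[float]]  # H x W grid of values


def grad_cam(activations: List[FeatureMap],
             gradients: List[FeatureMap]) -> FeatureMap:
    """Grad-CAM heatmap for one class c from K feature maps of size H x W.

    activations[k][i][j] holds A^k_ij; gradients[k][i][j] holds dy^c / dA^k_ij.
    """
    if not activations or len(activations) != len(gradients):
        raise ValueError("need one gradient map per activation map")
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global average pooling of the gradients of map k.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the maps, then ReLU to keep only positive evidence for c.
    heatmap = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += alpha * a[i][j]
    return [[max(0.0, v) for v in row] for row in heatmap]


# Map 0 raises the class score (positive gradients), map 1 lowers it.
acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

In practice the resulting coarse map is upsampled to the input resolution before being overlaid on the image; that step is omitted here.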

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

17 code implementations 7 Oct 2016 Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra

We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

Image Captioning Machine Translation

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles

no code implementations NeurIPS 2016 Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra

Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.

Analyzing the Behavior of Visual Question Answering Models

1 code implementation EMNLP 2016 Aishwarya Agrawal, Dhruv Batra, Devi Parikh

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).

Question Answering Visual Question Answering

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

no code implementations 17 Jun 2016 Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.

Question Answering Visual Question Answering

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

no code implementations EMNLP 2016 Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.

Question Answering Visual Question Answering

Hierarchical Question-Image Co-Attention for Visual Question Answering

9 code implementations NeurIPS 2016 Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).

Visual Dialog Visual Question Answering

Radio Transformer Networks: Attention Models for Learning to Synchronize in Wireless Systems

no code implementations 3 May 2016 Timothy J. O'Shea, Latha Pemula, Dhruv Batra, T. Charles Clancy

This attention model allows the network to learn a localization network capable of synchronizing and normalizing a radio signal blindly with zero knowledge of the signal's structure based on optimization of the network for classification accuracy, sparse representation, and regularization.

General Classification

Joint Unsupervised Learning of Deep Representations and Image Clusters

2 code implementations CVPR 2016 Jianwei Yang, Devi Parikh, Dhruv Batra

In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters.

Image Clustering Representation Learning

Counting Everyday Objects in Everyday Scenes

no code implementations CVPR 2017 Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh

In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes.

Object Detection Question Answering

SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals

no code implementations NeurIPS 2015 Qing Sun, Dhruv Batra

This paper formulates the search for a set of bounding boxes (as needed in object proposal generation) as a monotone submodular maximization problem over the space of all possible bounding boxes in an image.

Object Proposal Generation
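For a monotone submodular objective under a cardinality constraint, the standard greedy algorithm (repeatedly add the element with the largest marginal gain) is guaranteed to reach at least a (1 - 1/e) fraction of the optimum, which is what makes this formulation attractive. The sketch below runs plain greedy over a small explicit list of candidate boxes. The objective (a relevance term plus a facility-location coverage term based on IoU), the weight `lam`, and all function names are hypothetical stand-ins; the paper's efficient search over the space of all possible boxes is not reproduced.

```python
def iou(a, b):
    """Boxes are (x1, y1, x2, y2); returns intersection-over-union."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def objective(selected, boxes, scores, lam=1.0):
    """Hypothetical monotone submodular objective: a modular relevance term plus a
    facility-location coverage term over all candidate boxes."""
    if not selected:
        return 0.0
    relevance = sum(scores[i] for i in selected)
    coverage = sum(max(iou(boxes[i], boxes[j]) for i in selected)
                   for j in range(len(boxes)))
    return relevance + lam * coverage

def greedy_select(boxes, scores, k, lam=1.0):
    """Greedily add the box with the largest marginal gain, k times."""
    selected = []
    for _ in range(min(k, len(boxes))):
        current = objective(selected, boxes, scores, lam)
        best, best_gain = None, float("-inf")
        for i in range(len(boxes)):
            if i in selected:
                continue
            gain = objective(selected + [i], boxes, scores, lam) - current
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.85, 0.5]
print(greedy_select(boxes, scores, 2))  # [0, 2]
```

The first two boxes overlap heavily, so after picking the highest-scoring box the greedy step prefers the distant, lower-scoring box: the coverage term rewards diversity, not just high scores.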

Optimizing Expected Intersection-Over-Union With Candidate-Constrained CRFs

no code implementations ICCV 2015 Faruk Ahmed, Dany Tarlow, Dhruv Batra

Currently, there are two dominant approaches: the first approximates the Expected-IoU (EIoU) score as Expected-Intersection-over-Expected-Union (EIoEU), and the second computes exact EIoU but only over a small set of high-quality candidate solutions.

Semantic Segmentation
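To make the two approaches concrete, the sketch below evaluates both criteria for a single binary foreground mask, assuming the model's distribution over segmentations is represented by a small set of candidate masks with probabilities. The single-class, single-image setup, the function names, and the toy masks are illustrative assumptions. EIoEU is computed from per-pixel marginals, which gives the same expected intersection and expected union as averaging over the candidates.

```python
import numpy as np

def iou(pred, gt):
    """IoU of two binary masks; two empty masks count as a perfect match (a convention)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def exact_eiou(pred, candidates, probs):
    """Exact expected IoU when the distribution is restricted to a candidate set."""
    return sum(p * iou(pred, c) for c, p in zip(candidates, probs))

def eioeu(pred, candidates, probs):
    """Expected-Intersection over Expected-Union, computed from per-pixel marginals."""
    marginals = sum(p * c.astype(float) for c, p in zip(candidates, probs))
    pred = pred.astype(float)
    e_inter = (pred * marginals).sum()
    e_union = pred.sum() + marginals.sum() - e_inter
    return 1.0 if e_union == 0 else e_inter / e_union

candidates = [np.array(m, dtype=bool) for m in ([1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0])]
probs = [0.5, 0.3, 0.2]
best_exact = max(range(3), key=lambda i: exact_eiou(candidates[i], candidates, probs))
best_approx = max(range(3), key=lambda i: eioeu(candidates[i], candidates, probs))
print(best_exact, best_approx)  # 0 1
```

On this toy distribution the two criteria disagree: exact EIoU prefers candidate 0 (0.65 versus 0.6), while EIoEU prefers candidate 1 (0.6 versus about 0.571).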

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

no code implementations 19 Nov 2015 Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra

Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.

Object-Proposal Evaluation Protocol is 'Gameable'

no code implementations CVPR 2016 Neelima Chavali, Harsh Agrawal, Aroma Mahendru, Dhruv Batra

Finally, we plan to release an easy-to-use toolbox that combines various publicly available implementations of object proposal algorithms and standardizes proposal generation and evaluation, so that new methods can be added and evaluated on different datasets.

Object Detection Object Discovery

VIP: Finding Important People in Images

no code implementations CVPR 2015 Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

We address two specific questions -- Given an image, who are the most important individuals in it?

Combining the Best of Graphical Models and ConvNets for Semantic Segmentation

no code implementations 14 Dec 2014 Michael Cogswell, Xiao Lin, Senthil Purushwalkam, Dhruv Batra

We present a two-module approach to semantic segmentation that incorporates Convolutional Networks (CNNs) and Graphical Models.

Semantic Segmentation

Candidate Constrained CRFs for Loss-Aware Structured Prediction

no code implementations 10 Dec 2014 Faruk Ahmed, Daniel Tarlow, Dhruv Batra

The result is that we can use loss-aware prediction methodology to improve performance of the highly tuned pipeline system.

Semantic Segmentation Structured Prediction

Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets

no code implementations NeurIPS 2014 Adarsh Prasad, Stefanie Jegelka, Dhruv Batra

To cope with the high level of ambiguity faced in domains such as Computer Vision or Natural Language Processing, robust prediction methods often search for a diverse set of high-quality candidate solutions or proposals.

Structured Prediction

Multimodal Learning in Loosely-organized Web Images

no code implementations CVPR 2014 Kun Duan, David J. Crandall, Dhruv Batra

Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images.

Metric Learning

Empirical Minimum Bayes Risk Prediction: How to Extract an Extra Few % Performance from Vision Models with Just Three More Parameters

no code implementations CVPR 2014 Vittal Premachandran, Daniel Tarlow, Dhruv Batra

When building vision systems that predict structured objects such as image segmentations or human poses, a crucial concern is performance under task-specific evaluation measures (e.g., Jaccard Index or Average Precision).
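One way to read minimum Bayes risk prediction over a model's candidate outputs: treat the candidates themselves as samples of the unknown ground truth, weight them by a softmax of the model scores, and return the candidate with the highest expected task score (here the Jaccard index). The sketch below is an illustration under those assumptions. The single `temperature` parameter is a stand-in for the paper's calibration parameters, whose exact form is not given here, and the set-of-pixels representation and function names are hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard index between two sets of labelled pixels (hypothetical representation)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)

def mbr_select(candidates, scores, temperature=1.0, gain=jaccard):
    """Return the index of the candidate with the highest expected gain, assuming the
    ground truth is one of the candidates, drawn with probability softmax(scores / T)."""
    m = max(s / temperature for s in scores)
    weights = [math.exp(s / temperature - m) for s in scores]  # numerically stable softmax
    z = sum(weights)
    probs = [w / z for w in weights]
    expected = [
        sum(p * gain(c, other) for other, p in zip(candidates, probs))
        for c in candidates
    ]
    return max(range(len(candidates)), key=expected.__getitem__)

candidates = [{1}, {2, 3}, {2, 3, 4}, {2, 3, 5}]
scores = [1.0, 0.9, 0.9, 0.9]
print(mbr_select(candidates, scores))  # 1
```

The highest-scoring candidate is `{1}`, but the three mutually overlapping candidates together carry most of the probability mass, so `mbr_select` returns index 1 (`{2, 3}`), the candidate that agrees best with them on average.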

A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems

no code implementations 2 Apr 2014 Jörg H. Kappes, Bjoern Andres, Fred A. Hamprecht, Christoph Schnörr, Sebastian Nowozin, Dhruv Batra, Sungwoong Kim, Bernhard X. Kausler, Thorben Kröger, Jan Lellmann, Nikos Komodakis, Bogdan Savchynskyy, Carsten Rother

However, on new and challenging types of models, our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.

Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

no code implementations NeurIPS 2012 Abner Guzmán-Rivera, Dhruv Batra, Pushmeet Kohli

The paper addresses the problem of generating multiple hypotheses for prediction tasks that involve interaction with users or successive components in a cascade.
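The core of multiple choice learning is the oracle (set) loss: a set of M hypotheses is charged only for its best member. A coordinate-descent scheme of this kind alternates between assigning each example to the predictor that currently does best on it and refitting each predictor on its assigned examples. The sketch below illustrates that alternation with M = 2 one-dimensional linear regressors and squared loss, chosen only to keep it self-contained; the paper addresses structured outputs, and the function names, initialization, and toy data are hypothetical.

```python
import numpy as np

def oracle_loss(preds, y):
    """Per-example oracle loss: squared error of the best of the M hypotheses.
    preds: (M, N) outputs of M predictors on N examples; y: (N,) targets."""
    return ((preds - y) ** 2).min(axis=0)

def train_mcl(x, y, init, iters=10):
    """Alternate between assigning each example to its lowest-loss predictor and
    refitting each predictor (here a 1-D line) on its assigned examples."""
    params = [np.asarray(p, dtype=float) for p in init]  # each is (slope, intercept)
    for _ in range(iters):
        preds = np.stack([np.polyval(p, x) for p in params])
        assign = ((preds - y) ** 2).argmin(axis=0)
        for m in range(len(params)):
            mask = assign == m
            if mask.sum() >= 2:  # keep the old line if too few examples were assigned
                params[m] = np.polyfit(x[mask], y[mask], 1)
    preds = np.stack([np.polyval(p, x) for p in params])
    return params, oracle_loss(preds, y)

# Each input has two plausible targets: y = x and y = -x.
x = np.array([1, 2, 3, 4, 5] * 2, dtype=float)
y = np.concatenate([x[:5], -x[5:]])
params, loss = train_mcl(x, y, init=[(0.5, 0.0), (-0.5, 0.0)])
```

After one assignment step each line captures one branch, so the oracle loss drops to (numerically) zero even though no single line could fit both branches.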
