no code implementations • ICLR 2019 • Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela
We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception.
no code implementations • 14 Mar 2023 • Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra
We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-the-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.
no code implementations • 30 Jan 2023 • Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial.
no code implementations • 18 Jan 2023 • Ram Ramrakhya, Dhruv Batra, Erik Wijmans, Abhishek Das
We present a two-stage learning scheme for IL pretraining on human demonstrations followed by RL-finetuning.
no code implementations • 14 Dec 2022 • Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g., human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g., a robotic manipulator in a simulated kitchen.
no code implementations • 2 Dec 2022 • Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot
In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality.
1 code implementation • 29 Nov 2022 • Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot
We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.
1 code implementation • 26 Oct 2022 • Simar Kareer, Naoki Yokoyama, Dhruv Batra, Sehoon Ha, Joanne Truong
ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guide the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following provided velocity commands.
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
1 code implementation • 11 Oct 2022 • Erik Wijmans, Irfan Essa, Dhruv Batra
Specifically, the Pick skill involves a robot picking an object from a table.
2 code implementations • 11 Oct 2022 • Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot
The scale, quality, and diversity of object annotations far exceed those of prior datasets.
no code implementations • 24 Jun 2022 • Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman, Dhruv Batra
We present a scalable approach for learning open-world object-goal navigation (ObjectNav) -- the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., "find a sink").
2 code implementations • 16 Jun 2022 • Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.
1 code implementation • CVPR 2022 • Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv Batra, Oleksandr Maksymets
However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question; one we tackle in this paper.
1 code implementation • 22 May 2022 • Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal
Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house.
no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh
Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.
no code implementations • 27 Apr 2022 • Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets
In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.
no code implementations • CVPR 2022 • Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das
We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments -- (1) ObjectGoal Navigation (e.g., 'find & go to a chair') and (2) Pick&Place (e.g., 'find mug, pick mug, find counter, place mug on counter').
1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
no code implementations • NeurIPS 2021 • Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location.
3 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • ICCV 2021 • Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets
Little inquiry has explicitly addressed the role of action spaces in language-guided visual navigation -- either in terms of its effect on navigation success or the efficiency with which a robotic agent could execute the resulting trajectory.
no code implementations • 22 Sep 2021 • Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha
Recent advances in deep reinforcement learning and scalable photorealistic simulation have led to increasingly mature embodied AI for various visual tasks, including navigation.
1 code implementation • 17 Sep 2021 • Guillermo Grande, Dhruv Batra, Erik Wijmans
Under this setting, the agent incurs a penalty for using this privileged information, encouraging the agent to only leverage this information when it is crucial to learning.
2 code implementations • 16 Sep 2021 • Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra
When compared to existing photorealistic 3D datasets such as Replica, MP3D, Gibson, and ScanNet, images rendered from HM3D have 20-85% higher visual fidelity w.r.t.
no code implementations • ICCV 2021 • Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing
It is fundamental for personal robots to reliably navigate to a specified goal.
6 code implementations • NeurIPS 2021 • Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios.
1 code implementation • 26 Jun 2021 • Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan
This work shows that value-aware model learning, known for its numerous theoretical benefits, is also practically viable for solving challenging continuous control tasks in prevalent model-based reinforcement learning algorithms.
1 code implementation • 8 Apr 2021 • Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans
We instead re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.
Ranked #2 on Robot Navigation on Habitat 2020 Object Nav test-std
no code implementations • 14 Mar 2021 • Naoki Yokoyama, Sehoon Ha, Dhruv Batra
Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent makes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics.
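For context, SPL as introduced by Anderson et al. (2018) averages, over N episodes, the success indicator S_i weighted by l_i / max(p_i, l_i), where l_i is the shortest-path distance from start to goal and p_i is the length of the path the agent actually took. The sketch below computes that standard definition only; the function name and argument layout are illustrative assumptions, and this is not the alternative metric the entry above proposes.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by (normalized inverse) Path Length, averaged over episodes.

    successes[i]        -- whether episode i ended in success (S_i)
    shortest_lengths[i] -- shortest-path distance from start to goal (l_i), assumed > 0
    path_lengths[i]     -- length of the path the agent actually took (p_i)
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # Failed episodes contribute 0. Successful ones are scaled by l / max(p, l),
        # so detours are penalized and no episode can score above 1.
        if s:
            total += l / max(p, l)
    return total / len(successes)
```

For example, `spl([True, True, False], [10.0, 5.0, 8.0], [10.0, 10.0, 3.0])` returns 0.5: the first episode scores 1, the second took twice the shortest path and scores 0.5, and the failure scores 0. Only distance traveled enters the score, so nothing in the formula reflects how costly a path is to execute for a particular robot's dynamics.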
1 code implementation • ICLR 2021 • Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari
In this work, we present a memory-augmented approach for image-goal navigation.
no code implementations • ICCV 2021 • Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra
We show that this is a natural consequence of optimizing for the task metric (which in fact penalizes exploration), is enabled by powerful observation encoders, and is possible due to the finite set of training environment configurations.
no code implementations • ICCV 2021 • Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans
We instead re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.
no code implementations • 11 Dec 2020 • Erik Wijmans, Irfan Essa, Dhruv Batra
PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge.
no code implementations • 24 Nov 2020 • Joanne Truong, Sonia Chernova, Dhruv Batra
Simulation offers the ability to train large numbers of robots in parallel, and offers an abundance of data.
2 code implementations • EMNLP 2020 • Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson
In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.
1 code implementation • 7 Nov 2020 • Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee
We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.
no code implementations • 3 Nov 2020 • Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su
In the rearrangement task, the goal is to bring a given physical environment into a specified state.
1 code implementation • NAACL 2021 • Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju
Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong.
1 code implementation • ICCV 2021 • Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions.
1 code implementation • 2 Oct 2020 • Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra
We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?")
no code implementations • 7 Sep 2020 • Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh
This enables a seamless adaptation to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training the navigation policy.
Ranked #5 on Robot Navigation on Habitat 2020 Point Nav test-std
1 code implementation • NeurIPS 2020 • Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?
1 code implementation • ECCV 2020 • Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
Further, each head in our multi-head self-attention layer focuses on a different subset of relations.
no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh
We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.
1 code implementation • 9 Jul 2020 • Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das
PointGoal Navigation is an embodied task that requires agents to navigate to a specified point in an unseen environment.
2 code implementations • 23 Jun 2020 • Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans
In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it.
no code implementations • ICML Workshop LifelongML 2020 • Nirbhay Modhe, Harish K Kamath, Dhruv Batra, Ashwin Kalyan
Despite the breakthroughs achieved by Reinforcement Learning (RL) in recent years, RL agents often fail to perform well in unseen environments.
no code implementations • ICML Workshop LaReL 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.
no code implementations • ICML Workshop LaReL 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra
Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
1 code implementation • ECCV 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra
Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
Ranked #6 on Vision and Language Navigation on VLN Challenge
3 code implementations • ECCV 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee
We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.
no code implementations • 12 Mar 2020 • Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos
Recent advances in deep reinforcement learning require a large amount of training data and often result in representations that are over-specialized to the target task.
no code implementations • ICLR 2020 • Erik Wijmans, Julian Straub, Irfan Essa, Dhruv Batra, Judy Hoffman, Ari Morcos
Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.
3 code implementations • 13 Dec 2019 • Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra
Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation.
2 code implementations • ECCV 2020 • Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das
Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!
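NDCG and MRR can move in opposite directions because they reward different things: MRR looks only at the rank of the single ground-truth answer, while NDCG scores the whole ranking against graded (here, "dense") relevance. The sketch below implements the generic textbook definitions, assuming 1-based ranks and a log2 position discount; it is not the official VisDial evaluation code, which may differ in details such as how the cutoff k is chosen.

```python
import math
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Average of 1 / rank of the ground-truth answer over questions (ranks are 1-based)."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def ndcg_at_k(relevances_in_ranked_order: Sequence[float], k: int) -> float:
    """Normalized discounted cumulative gain for one ranked list of answer options.

    relevances_in_ranked_order[i] is the graded relevance of the option the
    model placed at position i + 1.
    """
    def dcg(rels: Sequence[float]) -> float:
        # The option at 0-based position i is discounted by log2(i + 2) = log2(rank + 1).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    # The ideal ranking sorts options by relevance, best first.
    ideal = dcg(sorted(relevances_in_ranked_order, reverse=True))
    return dcg(relevances_in_ranked_order) / ideal if ideal > 0 else 0.0
```

As an illustration, suppose a model ranks a different but equally relevant answer first and the ground-truth answer second: `ndcg_at_k([1.0, 1.0, 0.0], k=2)` is 1.0, while `mean_reciprocal_rank([2])` is only 0.5. Training toward dense relevance can therefore raise NDCG while lowering MRR.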
8 code implementations • ICLR 2020 • Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
Ranked #1 on PointGoal Navigation on Gibson PointGoal Navigation
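The comparisons in the entry above can be checked with back-of-the-envelope arithmetic. The sketch assumes that one simulation step corresponds to about one second of human experience and that a month is about 30.44 days; both are assumptions made here for illustration, not figures stated in the entry.

```python
# Assumption: one step of experience ~= one second of human experience.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7 seconds
DAYS_PER_MONTH = 30.44                 # average month length, an approximation

steps = 2.5e9
years_of_experience = steps / SECONDS_PER_YEAR  # ~79.2, i.e. roughly "80 years"

gpus = 64
wall_clock_days = 3                      # upper bound: the entry says "under 3 days"
gpu_days = gpus * wall_clock_days        # at most 192 GPU-days
gpu_months = gpu_days / DAYS_PER_MONTH   # ~6.3 months of GPU time

print(f"{years_of_experience:.1f} years, {gpu_days} GPU-days, {gpu_months:.1f} GPU-months")
```

Since 64 GPUs for just under 3 days gives at most about 6.3 GPU-months, the two time figures in the entry are mutually consistent.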
no code implementations • 25 Sep 2019 • Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam
We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment.
1 code implementation • IJCNLP 2019 • Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das
Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task.
no code implementations • ICCV 2019 • Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing
We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future.
12 code implementations • NeurIPS 2019 • Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
Ranked #4 on Referring Expression Comprehension on Talk2Car
no code implementations • 24 Jul 2019 • Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam
We propose a novel framework to identify sub-goals useful for exploration in sequential decision making tasks under partial observability.
1 code implementation • NeurIPS 2019 • Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee
Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.
2 code implementations • 13 Jun 2019 • Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, Richard Newcombe
We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale.
1 code implementation • ICCV 2019 • Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra
We propose SplitNet, a method for decoupling visual perception and policy learning.
no code implementations • ICLR 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra
This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.
no code implementations • ICLR 2019 • Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra
Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for two different tasks: learning to follow navigational instructions and embodied question answering.
1 code implementation • ICLR 2020 • Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra
In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.
7 code implementations • CVPR 2019 • Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach
We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.
Ranked #3 on Visual Question Answering (VQA) on VizWiz 2018
1 code implementation • 16 Apr 2019 • Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee
In this work, we develop a technique to produce counterfactual visual explanations.
1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.
no code implementations • 9 Apr 2019 • Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra
Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.
no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D).
12 code implementations • ICCV 2019 • Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
We present Habitat, a platform for research in embodied artificial intelligence (AI).
Ranked #2 on PointGoal Navigation on Gibson PointGoal Navigation
1 code implementation • NAACL 2019 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.
no code implementations • 5 Mar 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra
This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.
no code implementations • ICLR 2019 • Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh
We propose a new class of probabilistic neural-symbolic models that have symbolic functional programs as a latent, stochastic variable.
no code implementations • ICCV 2019 • Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.
3 code implementations • 10 Feb 2019 • Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra
We introduce EvalAI, an open-source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale.
no code implementations • 4 Feb 2019 • Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra
In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks.
2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh
We introduce the task of scene-aware dialog.
no code implementations • 16 Jan 2019 • Abhishek Das, Devi Parikh, Dhruv Batra
In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation.
no code implementations • 11 Jan 2019 • Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra
This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems.
2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.
no code implementations • 27 Oct 2018 • Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra
We present Fabrik, an online neural network editor that provides tools to visualize, edit, and share neural networks from within a browser.
no code implementations • ICLR 2019 • Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau
We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments.
2 code implementations • 26 Oct 2018 • Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, and then continue training with reinforcement learning.
no code implementations • 1 Oct 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh
Our question generation policy generalizes to new environments and a new pair of eyes, i.e., a new visual system.
1 code implementation • ECCV 2018 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Visual dialog entails answering a series of questions grounded in an image, using dialog history as context.
Ranked #1 on Common Sense Reasoning on Visual Dialog v0.9
1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee
Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.
3 code implementations • ECCV 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh
We propose a novel scene graph generation model called Graph R-CNN that is both effective and efficient at detecting objects and their relations in images.
Ranked #10 on Scene Graph Generation on Visual Genome
9 code implementations • 26 Jul 2018 • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh
We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.
Ranked #8 on Visual Question Answering (VQA) on A-OKVQA
1 code implementation • 9 Jul 2018 • Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela
We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception.
2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
We introduce a new dataset of dialogs about videos of human behaviors.
no code implementations • ICML 2018 • Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra
Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input, e.g., there are many ways of describing an image and multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g., all English sentences).
4 code implementations • 1 Jun 2018 • Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori
Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.
no code implementations • ICLR 2018 • Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani
In this work, we propose Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the strengths of symbolic logic techniques and statistical models.
1 code implementation • CVPR 2018 • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh
We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image.
2 code implementations • ACL 2019 • Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
The game involves two players: a Teller and a Drawer.
1 code implementation • CVPR 2018 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).
4 code implementations • CVPR 2018 • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").
no code implementations • EMNLP 2017 • Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.
1 code implementation • EMNLP 2017 • Satwik Kottur, José Moura, Stefan Lee, Dhruv Batra
A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, learned without any human supervision!
no code implementations • 17 Aug 2017 • Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh
This suggests a mismatch between benchmarking AI in isolation and benchmarking it in the context of human-AI teams.
3 code implementations • 26 Jun 2017 • Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision!
1 code implementation • 16 Jun 2017 • Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.
1 code implementation • NeurIPS 2017 • Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra
In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses.
Ranked #8 on Visual Dialog on VisDial v0.9 val
no code implementations • CVPR 2017 • Qing Sun, Stefan Lee, Dhruv Batra
We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.
21 code implementations • EMNLP 2017 • Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston
We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.
1 code implementation • EMNLP 2017 • Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee
In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions.
no code implementations • 26 Apr 2017 • Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Finally, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting.
6 code implementations • ICCV 2017 • Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.
1 code implementation • 5 Mar 2017 • Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh
We present LR-GAN: an adversarial image generation model which takes scene structure and context into account.
Ranked #4 on Image Generation on Stanford Dogs
7 code implementations • CVPR 2017 • Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter!
11 code implementations • CVPR 2017 • Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.
Ranked #15 on Visual Dialog on VisDial v0.9 val
2 code implementations • 22 Nov 2016 • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.
121 code implementations • ICCV 2017 • Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
For captioning and VQA, we show that even non-attention-based models can localize inputs.
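The ICCV 2017 entry above is Grad-CAM. As a rough illustration of how a gradient-weighted class activation map can be computed, here is a minimal pure-Python sketch. It assumes the feature maps of one convolutional layer, and the gradients of the target class score with respect to them, have already been obtained from some framework. The function name `grad_cam`, the nested-list representation, and the toy inputs are hypothetical choices for this example; upsampling the map to image resolution is omitted.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Sketch of a Grad-CAM-style heatmap (hypothetical helper).

    activations[k] is feature map A^k (H x W) from one conv layer;
    gradients[k] holds d(class score)/dA^k with the same shape.
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight alpha_k: the gradient averaged over all spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        # Accumulate alpha_k * A^k into the heatmap.
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in heatmap]


# Toy example: channel 0 supports the class (alpha = +1), channel 1 opposes it (alpha = -1).
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

Here the weighted sum is A^0 - A^1 = [[1, -3], [-1, 2]], and the ReLU zeroes the negative entries.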
24 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
We observe that our method consistently outperforms beam search (BS) and previously proposed techniques for diverse decoding from neural sequence models.
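The 7 Oct 2016 entry corresponds to Diverse Beam Search. The sketch below illustrates the general idea under simplifying assumptions. The beam budget is split into groups that are decoded one after another at each time step, and each group's candidates are penalized by a Hamming-style count of tokens that earlier groups already chose at that step. The toy model `toy_model`, the function name `diverse_beam_search`, and the choice to store the unpenalized log-probability as each beam's score are illustrative assumptions, not the paper's reference implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
# Hypothetical next-token model: maps a prefix to a dict of token log-probabilities.
LogProbFn = Callable[[Seq], Dict[str, float]]


def diverse_beam_search(log_probs: LogProbFn, num_groups: int, beam_per_group: int,
                        steps: int, diversity_strength: float) -> List[List[Tuple[Seq, float]]]:
    """Sketch of group-wise beam search with a Hamming diversity penalty."""
    # Every group starts from the empty prefix with score 0.
    groups: List[List[Tuple[Seq, float]]] = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(steps):
        # How often each token was chosen by earlier groups at this time step.
        used: Dict[str, int] = {}
        for g in range(num_groups):
            candidates = []
            for seq, score in groups[g]:
                for tok, lp in log_probs(seq).items():
                    penalty = diversity_strength * used.get(tok, 0)
                    # (extended sequence, model score, diversity-augmented score)
                    candidates.append((seq + (tok,), score + lp, score + lp - penalty))
            # Rank by the augmented score, but keep the plain model score.
            candidates.sort(key=lambda c: c[2], reverse=True)
            groups[g] = [(s, sc) for s, sc, _ in candidates[:beam_per_group]]
            for s, _ in groups[g]:
                used[s[-1]] = used.get(s[-1], 0) + 1
    return groups


def toy_model(prefix: Seq) -> Dict[str, float]:
    # Same distribution at every step, for illustration only.
    return {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)}


result = diverse_beam_search(toy_model, num_groups=2, beam_per_group=1,
                             steps=1, diversity_strength=10.0)
```

With `diversity_strength=0.0` both groups would pick `a`; the penalty pushes the second group to the next-best token `b`.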
no code implementations • 31 Aug 2016 • Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra
In this paper, we address the problem of interpreting Visual Question Answering (VQA) models.
no code implementations • 31 Aug 2016 • C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh
As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence.
no code implementations • NeurIPS 2016 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra
Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.
1 code implementation • EMNLP 2016 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh
Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).
no code implementations • EMNLP 2016 • Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit Bansal
Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication.
no code implementations • EMNLP 2016 • Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh
We introduce the novel problem of determining the relevance of questions to images in VQA.
no code implementations • 17 Jun 2016 • Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.
no code implementations • EMNLP 2016 • Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images.
9 code implementations • NeurIPS 2016 • Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh
In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).
Ranked #3 on Visual Question Answering (VQA) on VQA v1 test-std
no code implementations • 3 May 2016 • Timothy J. O'Shea, Latha Pemula, Dhruv Batra, T. Charles Clancy
This attention model allows the network to learn a localization network capable of synchronizing and normalizing a radio signal blindly, with zero knowledge of the signal's structure, based on optimization of the network for classification accuracy, sparse representation, and regularization.
1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
2 code implementations • CVPR 2016 • Jianwei Yang, Devi Parikh, Dhruv Batra
In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters.
Ranked #1 on Image Clustering on Coil-20
1 code implementation • CVPR 2017 • Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh
In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes.
Ranked #1 on Object Counting on COCO count-test
no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra
Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair.
no code implementations • 6 Apr 2016 • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen
We created a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation.
no code implementations • CVPR 2016 • Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
We collect two datasets of abstract scenes that facilitate the study of humor at both the scene-level and the object-level.
no code implementations • ICCV 2015 • Faruk Ahmed, Dany Tarlow, Dhruv Batra
Currently, there are two dominant approaches: the first approximates the Expected-IoU (EIoU) score as Expected-Intersection-over-Expected-Union (EIoEU), and the second computes exact EIoU but only over a small set of high-quality candidate solutions.
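To make the difference between the two approaches concrete, the following sketch compares exact Expected-IoU over a small weighted candidate set with the EIoEU ratio. It uses a simplified binary-segmentation setting where each mask is a set of foreground pixel indices. The function names, the set representation, and the convention that two empty masks have IoU 1 are assumptions for illustration only.

```python
from typing import FrozenSet, Sequence

Mask = FrozenSet[int]  # indices of pixels labelled foreground


def iou(a: Mask, b: Mask) -> float:
    union = len(a | b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else len(a & b) / union


def expected_iou(pred: Mask, candidates: Sequence[Mask], probs: Sequence[float]) -> float:
    # Exact EIoU restricted to a candidate set: sum_m p_m * IoU(pred, y_m).
    return sum(p * iou(pred, y) for y, p in zip(candidates, probs))


def eioeu(pred: Mask, candidates: Sequence[Mask], probs: Sequence[float]) -> float:
    # EIoEU approximation: E[|pred & y|] / E[|pred | y|].
    exp_inter = sum(p * len(pred & y) for y, p in zip(candidates, probs))
    exp_union = sum(p * len(pred | y) for y, p in zip(candidates, probs))
    return 1.0 if exp_union == 0 else exp_inter / exp_union


# Two equally likely candidate masks over pixels 0..3.
cands = [frozenset({0}), frozenset({0, 1, 2, 3})]
probs = [0.5, 0.5]
pred = frozenset({0})
exact = expected_iou(pred, cands, probs)  # 0.5 * 1.0 + 0.5 * 0.25 = 0.625
approx = eioeu(pred, cands, probs)        # 1.0 / 2.5 = 0.4
```

In this toy case exact EIoU is 0.625 while EIoEU gives 0.4, because averaging the intersection and union separately is not the same as averaging their ratio.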
no code implementations • NeurIPS 2015 • Qing Sun, Dhruv Batra
This paper formulates the search for a set of bounding boxes (as needed in object proposal generation) as a monotone submodular maximization problem over the space of all possible bounding boxes in an image.
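A standard way to approximately maximize a monotone submodular function under a cardinality constraint is the greedy algorithm, which repeatedly adds the element with the largest marginal gain and is known to achieve a (1 - 1/e) approximation guarantee. The sketch below applies plain greedy selection to a handful of boxes, using pixel coverage as a stand-in objective. The paper's actual objective and the way it searches the space of all boxes are not reproduced here, and the names (`coverage`, `greedy_select`) are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with half-open pixel ranges


def box_pixels(box: Box) -> Set[Tuple[int, int]]:
    x0, y0, x1, y1 = box
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def coverage(selected: Sequence[Box]) -> int:
    # Stand-in monotone submodular objective: number of distinct pixels covered.
    covered: Set[Tuple[int, int]] = set()
    for box in selected:
        covered |= box_pixels(box)
    return len(covered)


def greedy_select(candidates: Sequence[Box], k: int,
                  objective: Callable[[Sequence[Box]], int] = coverage) -> List[Box]:
    selected: List[Box] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = objective(selected)
        # Add the candidate with the largest marginal gain f(S + {b}) - f(S).
        best = max(remaining, key=lambda b: objective(selected + [b]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


boxes = [(0, 0, 4, 4), (0, 0, 3, 3), (4, 0, 6, 2)]
chosen = greedy_select(boxes, k=2)
```

Once the 4x4 box is chosen, the 3x3 box inside it adds no new pixels, so greedy prefers the smaller but disjoint third box, covering 20 pixels in total.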
no code implementations • 19 Nov 2015 • Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
One major challenge in training Deep Neural Networks is preventing overfitting.
no code implementations • 19 Nov 2015 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra
Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.
no code implementations • CVPR 2016 • Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
If the concept can be found in the image, the answer to the question is "yes", and otherwise "no".
no code implementations • 12 Jun 2015 • Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra
We are witnessing a proliferation of massive visual data.
no code implementations • CVPR 2016 • Neelima Chavali, Harsh Agrawal, Aroma Mahendru, Dhruv Batra
Finally, we plan to release an easy-to-use toolbox that combines various publicly available implementations of object proposal algorithms and standardizes proposal generation and evaluation, so that new methods can be added and evaluated on different datasets.
20 code implementations • ICCV 2015 • Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
no code implementations • CVPR 2015 • Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra
We address two specific questions -- Given an image, who are the most important individuals in it?
no code implementations • 14 Dec 2014 • Michael Cogswell, Xiao Lin, Senthil Purushwalkam, Dhruv Batra
We present a two-module approach to semantic segmentation that incorporates Convolutional Networks (CNNs) and Graphical Models.
no code implementations • 10 Dec 2014 • Faruk Ahmed, Daniel Tarlow, Dhruv Batra
The result is that we can use loss-aware prediction methodology to improve performance of the highly tuned pipeline system.
no code implementations • NeurIPS 2014 • Adarsh Prasad, Stefanie Jegelka, Dhruv Batra
To cope with the high level of ambiguity faced in domains such as Computer Vision or Natural Language Processing, robust prediction methods often search for a diverse set of high-quality candidate solutions or proposals.
no code implementations • CVPR 2014 • Kun Duan, David J. Crandall, Dhruv Batra
Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images.
no code implementations • CVPR 2014 • Vittal Premachandran, Daniel Tarlow, Dhruv Batra
When building vision systems that predict structured objects such as image segmentations or human poses, a crucial concern is performance under task-specific evaluation measures (e.g., Jaccard Index or Average Precision).
no code implementations • 2 Apr 2014 • Jörg H. Kappes, Bjoern Andres, Fred A. Hamprecht, Christoph Schnörr, Sebastian Nowozin, Dhruv Batra, Sungwoong Kim, Bernhard X. Kausler, Thorben Kröger, Jan Lellmann, Nikos Komodakis, Bogdan Savchynskyy, Carsten Rother
However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.