Search Results for author: Abhinav Gupta

Found 154 papers, 62 papers with code

Beyond Games: Bringing Exploration to Robots in Real-world

no code implementations ICLR 2019 Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta

But most importantly, we are able to implement an exploration policy on a robot which learns to interact with objects completely from scratch just using data collected via the differentiable exploration module.

Efficient Exploration

Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction

no code implementations 7 Apr 2022 Kalyan Vasudev Alwala, Abhinav Gupta, Shubham Tulsiani

Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories and we empirically show that increasing the number of training categories improves the reconstruction quality.

3D Reconstruction Single-View 3D Reconstruction

The Challenges of Continuous Self-Supervised Learning

no code implementations 23 Mar 2022 Senthil Purushwalkam, Pedro Morgado, Abhinav Gupta

As a result, SSL holds the promise to learn representations from data in-the-wild, i.e., without the need for finite and static datasets.

Representation Learning Self-Supervised Learning

R3M: A Universal Visual Representation for Robot Manipulation

1 code implementation 23 Mar 2022 Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta

We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks.

Contrastive Learning

The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

no code implementations 7 Mar 2022 Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta

In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets.

Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

2 code implementations NeurIPS 2021 Simone Parisi, Victoria Dean, Deepak Pathak, Abhinav Gupta

In this setup, the agent first learns to explore across many environments without any extrinsic goal in a task-agnostic manner.

A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation

1 code implementation 9 Nov 2021 Bernardo Aceituno, Alberto Rodriguez, Shubham Tulsiani, Abhinav Gupta, Mustafa Mukadam

Specifying tasks with videos is a powerful technique towards acquiring novel and general robot skills.

ReSkin: versatile, replaceable, lasting tactile skins

no code implementations 29 Oct 2021 Raunaq Bhirangi, Tess Hellebrekers, Carmel Majidi, Abhinav Gupta

Soft sensors continue to attract growing interest in robotics, due to their ability to enable both passive conformal contact from the material properties and active contact data from the sensor properties.

Self-Supervised Learning

Dynamic population-based meta-learning for multi-agent communication with natural language

no code implementations NeurIPS 2021 Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou

In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language.

Meta-Learning Text Generation

CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents

2 code implementations 19 Oct 2021 Sam Powers, Eliot Xing, Eric Kolve, Roozbeh Mottaghi, Abhinav Gupta

In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package.

NetHack reinforcement-learning

No RL, No Simulation: Learning to Navigate without Navigating

1 code implementation NeurIPS 2021 Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta

Most prior methods for learning navigation policies require access to simulation environments, as they need online policy interaction and rely on ground-truth maps for rewards.


Learning Multi-Objective Curricula for Deep Reinforcement Learning

no code implementations 6 Oct 2021 Jikun Kang, Miao Liu, Abhinav Gupta, Chris Pal, Xue Liu, Jie Fu

Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).


The Functional Correspondence Problem

no code implementations ICCV 2021 Zihang Lai, Senthil Purushwalkam, Abhinav Gupta

For example, what are the correspondences between a bottle and a shoe for the task of pounding, or for the task of pouring?

Wanderlust: Online Continual Object Detection in the Real World

1 code implementation ICCV 2021 Jianren Wang, Xin Wang, Yue Shang-Guan, Abhinav Gupta

To bridge the gap, we present a new online continual object detection benchmark with an egocentric video dataset, Objects Around Krishna (OAK).

Continual Learning Object Detection

Hierarchical Neural Dynamic Policies

no code implementations 12 Jul 2021 Shikhar Bahl, Abhinav Gupta, Deepak Pathak

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input.

Digital-Twin-Based Improvements to Diagnosis, Prognosis, Strategy Assessment, and Discrepancy Checking in a Nearly Autonomous Management and Control System

no code implementations 23 May 2021 Linyu Lin, Paridhi Athe, Pascal Rouxelin, Maria Avramova, Abhinav Gupta, Robert Youngblood, Nam Dinh

The Nearly Autonomous Management and Control System (NAMAC) is a comprehensive control system that assists plant operations by furnishing control recommendations to operators in a broad class of situations.

Decision Making

PixelTransformer: Sample Conditioned Signal Generation

no code implementations 29 Mar 2021 Shubham Tulsiani, Abhinav Gupta

We propose a generative model that can infer a distribution for the underlying spatial signal conditioned on sparse samples, e.g., plausible images given a few observed pixels.

Learn-to-Race: A Multimodal Control Environment for Autonomous Racing

2 code implementations ICCV 2021 James Herman, Jonathan Francis, Siddha Ganju, Bingqing Chen, Anirudh Koul, Abhinav Gupta, Alexey Skabelkin, Ivan Zhukov, Max Kumskoy, Eric Nyberg

Existing research on autonomous driving primarily focuses on urban driving, which is insufficient for characterising the complex driving behaviour underlying high-speed racing.

Autonomous Driving Trajectory Prediction

Meta Learning for Multi-agent Communication

no code implementations ICLR Workshop Learning_to_Learn 2021 Abhinav Gupta, Angeliki Lazaridou, Marc Lanctot

Recent works have shown remarkable progress in training artificial agents to understand natural language but are focused on using large amounts of raw data involving huge compute requirements.

Meta-Learning Meta Reinforcement Learning

Shelf-Supervised Mesh Prediction in the Wild

1 code implementation CVPR 2021 Yufei Ye, Shubham Tulsiani, Abhinav Gupta

We first infer a volumetric representation in a canonical frame, along with the camera pose.


droidlet: modular, heterogenous, multi-modal agents

1 code implementation 25 Jan 2021 Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.

Where2Act: From Pixels to Actions for Articulated 3D Objects

1 code implementation ICCV 2021 Kaichun Mo, Leonidas Guibas, Mustafa Mukadam, Abhinav Gupta, Shubham Tulsiani

One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment.

Self-Activating Neural Ensembles for Continual Reinforcement Learning

no code implementations 1 Jan 2021 Sam Powers, Abhinav Gupta

The ability for an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of generally intelligent agents.


Neural Closure Models for Dynamical Systems

1 code implementation 27 Dec 2020 Abhinav Gupta, Pierre F. J. Lermusiaux

The new "neural closure models" augment low-fidelity models with neural delay differential equations (nDDEs), motivated by the Mori-Zwanzig formulation and the inherent delays in complex dynamical systems.

A 55-line code for large-scale parallel topology optimization in 2D and 3D

1 code implementation 15 Dec 2020 Abhinav Gupta, Rajib Chowdhury, Anupam Chakrabarti, Timon Rabczuk

This paper presents a 55-line code written in Python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software (FEniCS), equipped with various finite element tools and solvers.

Mathematical Software Computational Engineering, Finance, and Science Optimization and Control

Neural Dynamic Policies for End-to-End Sensorimotor Learning

no code implementations NeurIPS 2020 Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak

We show that NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks for both imitation and reinforcement learning setups.

Imitation Learning reinforcement-learning

Transformers for One-Shot Visual Imitation

no code implementations 11 Nov 2020 Sudeep Dasari, Abhinav Gupta

Humans are able to seamlessly imitate others visually by inferring their intentions and using past experience to achieve the same end goal.

Imitation Learning

Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning

1 code implementation ICLR 2021 Valerie Chen, Abhinav Gupta, Kenneth Marino

We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations.

Multi-Task Learning reinforcement-learning

Visual Imitation Made Easy

no code implementations 11 Aug 2020 Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.

Imitation Learning

Implicit Mesh Reconstruction from Unannotated Image Collections

no code implementations 16 Jul 2020 Shubham Tulsiani, Nilesh Kulkarni, Abhinav Gupta

We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision.

Aligning Videos in Space and Time

no code implementations ECCV 2020 Senthil Purushwalkam, Tian Ye, Saurabh Gupta, Abhinav Gupta

During training, given a pair of videos, we compute cycles that connect patches in a given frame in the first video by matching through frames in the second video.


Swoosh! Rattle! Thump! -- Actions that Sound

no code implementations 3 Jul 2020 Dhiraj Gandhi, Abhinav Gupta, Lerrel Pinto

In this work, we perform the first large-scale study of the interactions between sound and robotic action.

Compositionality and Capacity in Emergent Languages

no code implementations WS 2020 Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai, Kyunghyun Cho

Our hypothesis is that there should be a specific range of model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization.

Systematic Generalization

Object Goal Navigation using Goal-Oriented Semantic Exploration

2 code implementations NeurIPS 2020 Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov

We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category.

Robot Navigation

Learning Robot Skills with Temporal Variational Inference

no code implementations ICML 2020 Tanmay Shankar, Abhinav Gupta

In this paper, we address the discovery of robotic options from demonstrations in an unsupervised manner.

Frame Variational Inference

Semantic Curiosity for Active Visual Learning

no code implementations ECCV 2020 Devendra Singh Chaplot, Helen Jiang, Saurabh Gupta, Abhinav Gupta

Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity.

Object Detection

Neural Topological SLAM for Visual Navigation

no code implementations CVPR 2020 Devendra Singh Chaplot, Ruslan Salakhutdinov, Abhinav Gupta, Saurabh Gupta

This paper studies the problem of image-goal navigation, which involves navigating to the location indicated by a goal image in a novel, previously unseen environment.

Visual Navigation

Learning to Explore using Active Neural SLAM

2 code implementations ICLR 2020 Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov

The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies).

PointGoal Navigation

Articulation-aware Canonical Surface Mapping

1 code implementation CVPR 2020 Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani

We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image.

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

1 code implementation CVPR 2020 Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta

Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few shot examples.

Human-Object Interaction Detection

Beyond the Camera: Neural Networks in World Coordinates

no code implementations 12 Mar 2020 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari

Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.

Action Recognition Frame +2

Intrinsic Motivation for Encouraging Synergistic Behavior

no code implementations ICLR 2020 Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.

On the interaction between supervision and self-play in emergent communication

1 code implementation ICLR 2020 Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training.

Structural Inductive Biases in Emergent Communication

no code implementations 4 Feb 2020 Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden, Christopher Pal

In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence.

Representation Learning

Towards Graph Representation Learning in Emergent Communication

no code implementations 24 Jan 2020 Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden

Recent findings in neuroscience suggest that the human brain represents information in a geometric structure (for instance, through conceptual spaces).

Graph Representation Learning

Discovering Motor Programs by Recomposing Demonstrations

no code implementations ICLR 2020 Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta

In this paper, we present an approach to learn recomposable motor primitives across large-scale and diverse manipulation demonstrations.

Hierarchical Reinforcement Learning

ClusterFit: Improving Generalization of Visual Representations

1 code implementation CVPR 2020 Xueting Yan, Ishan Misra, Abhinav Gupta, Deepti Ghadiyaram, Dhruv Mahajan

Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks.

Action Classification Image Classification +1

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

1 code implementation NeurIPS 2019 Pratyusha Sharma, Deepak Pathak, Abhinav Gupta

We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective.

Imitation Learning

Seeded self-play for language learning

no code implementations WS 2019 Abhinav Gupta, Ryan Lowe, Jakob Foerster, Douwe Kiela, Joelle Pineau

Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language.

Imitation Learning Meta-Learning

Capacity, Bandwidth, and Compositionality in Emergent Language Learning

1 code implementation 24 Oct 2019 Cinjon Resnick, Abhinav Gupta, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho

In this paper, we investigate the learning biases that affect the efficacy and compositionality of emergent languages.

Systematic Generalization

Object-centric Forward Modeling for Model Predictive Control

1 code implementation 8 Oct 2019 Yufei Ye, Dhiraj Gandhi, Abhinav Gupta, Shubham Tulsiani

We present an approach to learn an object-centric forward model, and show that this allows us to plan for sequences of actions to achieve distant desired goals.

Efficient Bimanual Manipulation Using Learned Task Schemas

no code implementations 30 Sep 2019 Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner.

Swoosh! Rattle! Thump! - Actions that Sound

no code implementations 25 Sep 2019 Dhiraj Gandhi, Abhinav Gupta, Lerrel Pinto

In this work, we perform the first large-scale study of the interactions between sound and robotic action.

Agent as Scientist: Learning to Verify Hypotheses

no code implementations 25 Sep 2019 Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta

In order to train the agents, we exploit the underlying structure in the majority of hypotheses: they can be formulated as triplets (pre-condition, action sequence, post-condition).


Dynamics-aware Embeddings

2 code implementations ICLR 2020 William Whitney, Rajat Agarwal, Kyunghyun Cho, Abhinav Gupta

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL).

Continuous Control reinforcement-learning +1

Environment Probing Interaction Policies

1 code implementation ICLR 2019 Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta

A key challenge in reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment.

Task-Driven Modular Networks for Zero-Shot Compositional Learning

1 code implementation ICCV 2019 Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato

When extending the evaluation to the generalized setting, which also accounts for pairs seen during training, we discover that naive baseline methods perform similarly to or better than current approaches.

Zero-Shot Learning

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

no code implementations ICLR 2019 Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam

The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

no code implementations ICLR 2019 Senthil Purushwalkam, Abhinav Gupta, Danny M. Kaufman, Bryan Russell

To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices.

Beyond Grids: Learning Graph Representations for Visual Recognition

no code implementations NeurIPS 2018 Yin Li, Abhinav Gupta

Our method further learns to propagate information across all vertices on the graph, and is able to project the learned graph representation back into 2D grids.

Instance Segmentation Object Detection +1

Hardware Conditioned Policies for Multi-Robot Transfer Learning

1 code implementation NeurIPS 2018 Tao Chen, Adithyavairavan Murali, Abhinav Gupta

In tasks where knowing the agent dynamics is important for success, we learn an embedding for robot hardware and show that policies conditioned on the encoding of hardware tend to generalize and transfer well.

Industrial Robots Transfer Reinforcement Learning

Multiple Interactions Made Easy (MIME): Large Scale Demonstrations Data for Imitation

no code implementations 16 Oct 2018 Pratyusha Sharma, Lekha Mohan, Lerrel Pinto, Abhinav Gupta

In order to make progress and capture the space of manipulation, we would need to collect a large-scale dataset of diverse tasks such as pouring, opening bottles, stacking objects, etc.

Trajectory Prediction

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?


BOLD5000: A public fMRI dataset of 5000 images

3 code implementations 5 Sep 2018 Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff

Vision science, particularly machine vision, has been revolutionized by introducing large-scale image datasets and statistical learning approaches.

Scene Understanding

Interpretable Intuitive Physics Model

no code implementations ECCV 2018 Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta

In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, sphere, etc.).

Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias

no code implementations NeurIPS 2018 Abhinav Gupta, Adithyavairavan Murali, Dhiraj Gandhi, Lerrel Pinto

The models trained with our home dataset showed a marked improvement of 43.7% over a baseline model trained with data collected in lab.

Robotic Grasping

Videos as Space-Time Region Graphs

no code implementations ECCV 2018 Xiaolong Wang, Abhinav Gupta

These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects.

Ranked #27 on Action Classification on Charades (using extra training data)

Action Classification Action Recognition

Learning to Grasp Without Seeing

no code implementations 10 May 2018 Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

Object Localization

Actor and Observer: Joint Modeling of First and Third-Person Videos

1 code implementation CVPR 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).

Action Recognition

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

no code implementations 25 Apr 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.

General Classification Video Classification +1

Binge Watching: Scaling Affordance Learning from Sitcoms

no code implementations CVPR 2017 Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

In this paper, we tackle the challenge of creating one of the biggest datasets for learning affordances.

Iterative Visual Reasoning Beyond Convolutions

no code implementations CVPR 2018 Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta

The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module.

Visual Reasoning

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

3 code implementations CVPR 2018 Xiaolong Wang, Yufei Ye, Abhinav Gupta

Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing visual category).

Knowledge Graphs Zero-Shot Learning

Learning by Asking Questions

no code implementations CVPR 2018 Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten

We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test time distributions.

Question Answering Visual Question Answering +1

Sentiment Classification using Images and Label Embeddings

no code implementations 3 Dec 2017 Laura Graesser, Abhinav Gupta, Lakshay Sharma, Evelina Bakhturina

In this project we analysed how much semantic information images carry, and how much value image data can add to sentiment analysis of the text associated with the images.

Classification General Classification +1

Visual Features for Context-Aware Speech Recognition

no code implementations 1 Dec 2017 Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze

We are working on a corpus of "how-to" videos from the web, and the idea is that an object that can be seen ("car"), or a scene that is being detected ("kitchen") can be used to condition both models on the "context" of the recording, thereby reducing perplexity and improving transcription.

Speech Recognition

Non-local Neural Networks

25 code implementations CVPR 2018 Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #7 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +3

Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations

1 code implementation 24 Aug 2017 Xinchen Yan, Jasmine Hsu, Mohi Khansari, Yunfei Bai, Arkanath Pathak, Abhinav Gupta, James Davidson, Honglak Lee

Our contributions are fourfold: (1) To the best of our knowledge, we are presenting for the first time a method to learn a 6-DOF grasping net from RGBD input; (2) We build a grasping dataset from demonstrations in virtual reality with rich sensory and interaction annotations.

3D Geometry Prediction 3D Shape Modeling +1

Transitive Invariance for Self-supervised Visual Representation Learning

no code implementations ICCV 2017 Xiaolong Wang, Kaiming He, Abhinav Gupta

The objects are connected by two types of edges which correspond to two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance".

Multi-Task Learning Object Detection +2

What Actions are Needed for Understanding Human Actions in Videos?

1 code implementation ICCV 2017 Gunnar A. Sigurdsson, Olga Russakovsky, Abhinav Gupta

We present the many kinds of information that will be needed to achieve substantial gains in activity understanding: objects, verbs, intent, and sequential reasoning.

CASSL: Curriculum Accelerated Self-Supervised Learning

no code implementations 4 Aug 2017 Adithyavairavan Murali, Lerrel Pinto, Dhiraj Gandhi, Abhinav Gupta

Recent self-supervised learning approaches focus on using a few thousand data points to learn policies for high-level, low-dimensional action spaces.

Self-Supervised Learning

Combining Keystroke Dynamics and Face Recognition for User Verification

no code implementations 2 Aug 2017 Abhinav Gupta, Agrim Khanna, Anmol Jagetia, Devansh Sharma, Sanchit Alekh, Vaibhav Choudhary

Keystroke Dynamics is a novel biometric technique; it is not only unobtrusive, but also transparent and inexpensive.

Face Recognition

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

no code implementations ICCV 2017 Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-yan Yeung, Abhinav Gupta

A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as "missing label".

Object Recognition Video Object Detection +1

From Red Wine to Red Tomato: Composition With Context

no code implementations CVPR 2017 Ishan Misra, Abhinav Gupta, Martial Hebert

In this paper, we present a simple method that respects contextuality in order to compose classifiers of known visual concepts.

Visual Semantic Planning using Deep Successor Representations

no code implementations ICCV 2017 Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.

Imitation Learning reinforcement-learning

WebVision Challenge: Visual Learning and Understanding With Web Data

no code implementations 16 May 2017 Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc van Gool

The 2017 WebVision challenge consists of two tracks: the image classification task on the WebVision test set, and the transfer learning task on the PASCAL VOC 2012 dataset.

Image Classification Transfer Learning

The Pose Knows: Video Forecasting by Generating Pose Futures

1 code implementation ICCV 2017 Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert

First, we explicitly model the high-level structure of active objects in the scene (humans) and use a VAE to model the possible future movements of humans in the pose space.

Video Prediction

Spatial Memory for Context Reasoning in Object Detection

35 code implementations ICCV 2017 Xinlei Chen, Abhinav Gupta

On the other hand, modeling object-object relationships requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.

Object Detection

ActionVLAD: Learning spatio-temporal aggregation for action classification

no code implementations CVPR 2017 Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video.

Action Classification Classification +2

Robust Adversarial Reinforcement Learning

6 code implementations ICML 2017 Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta

Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL).


PixelNet: Representation of the pixels, by the pixels, and for the pixels

1 code implementation 21 Feb 2017 Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Semantic Segmentation

An Implementation of Faster RCNN with Study for Region Sampling

46 code implementations 7 Feb 2017 Xinlei Chen, Abhinav Gupta

We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection.

General Classification Object Detection

From Images to 3D Shape Attributes

no code implementations 20 Dec 2016 David F. Fouhey, Abhinav Gupta, Andrew Zisserman

Our first objective is to infer these 3D shape attributes from a single image.

The More You Know: Using Knowledge Graphs for Image Classification

no code implementations CVPR 2017 Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta

One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world.

Classification General Classification +3

Supervision via Competition: Robot Adversaries for Learning Tasks

1 code implementation 5 Oct 2016 Lerrel Pinto, James Davidson, Abhinav Gupta

Due to the large number of experiences required for training, most of these approaches use a self-supervised paradigm: using sensors to measure success/failure.

Learning to Push by Grasping: Using multiple tasks for effective learning

no code implementations 28 Sep 2016 Lerrel Pinto, Abhinav Gupta

The argument about the difficulty of scaling to multiple tasks is well founded, since training these tasks often requires hundreds or thousands of examples.

Multi-Task Learning

PixelNet: Towards a General Pixel-level Architecture

no code implementations 21 Sep 2016 Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Semantic Segmentation

Pose from Action: Unsupervised Learning of Pose Features based on Motion

no code implementations 18 Sep 2016 Senthil Purushwalkam, Abhinav Gupta

We propose an unsupervised method to learn pose features from videos that exploits a signal which is complementary to appearance and can be used as supervision: motion.

Action Recognition Action Recognition In Videos +3

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations 16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

3D Reconstruction Feature Engineering +2

Much Ado About Time: Exhaustive Annotation of Temporal Data

no code implementations 25 Jul 2016 Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

no code implementations 25 Jun 2016 Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert

We show that our method is able to successfully predict events in a wide variety of scenes and can produce multiple different predictions when the future is ambiguous.

3D Shape Attributes

no code implementations CVPR 2016 David F. Fouhey, Abhinav Gupta, Andrew Zisserman

In this paper we investigate 3D attributes as a means to understand the shape of an object in a single image.

Training Region-based Object Detectors with Online Hard Example Mining

5 code implementations CVPR 2016 Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

Our motivation is the same as it has always been: detection datasets contain an overwhelming number of easy examples and a small number of hard examples.

Object Detection

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations 6 Apr 2016 Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

no code implementations CVPR 2016 Aayush Bansal, Bryan Russell, Abhinav Gupta

We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library.

Pose Prediction

The Curious Robot: Learning Visual Representations via Physical Interactions

no code implementations 5 Apr 2016 Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong-Lae Park, Abhinav Gupta

We argue that biological agents use physical interactions with the world to learn visual representations, unlike current vision systems, which use only passive observations (images and videos downloaded from the web).

Image Classification Representation Learning

Learning a Predictable and Generative Vector Representation for Objects

2 code implementations 29 Mar 2016 Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, Abhinav Gupta

The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable.

"What happens if..." Learning to Predict the Effect of Forces in Images

no code implementations 17 Mar 2016 Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

To build a dataset of forces in scenes, we reconstructed all images in the SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.

Generative Image Modeling using Style and Structure Adversarial Networks

no code implementations 17 Mar 2016 Xiaolong Wang, Abhinav Gupta

Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution.

Image Generation

Actions ~ Transformations

1 code implementation CVPR 2016 Xiaolong Wang, Ali Farhadi, Abhinav Gupta

In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).

Action Recognition

Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours

no code implementations 23 Sep 2015 Lerrel Pinto, Abhinav Gupta

Our experiments clearly show the benefit of using large-scale datasets (and multi-stage training) for the task of grasping.

Sense Discovery via Co-Clustering on Images and Text

no code implementations CVPR 2015 Xinlei Chen, Alan Ritter, Abhinav Gupta, Tom Mitchell

We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP).

Unsupervised Visual Representation Learning by Context Prediction

2 code implementations ICCV 2015 Carl Doersch, Abhinav Gupta, Alexei A. Efros

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation.

Representation Learning

Webly Supervised Learning of Convolutional Networks

no code implementations ICCV 2015 Xinlei Chen, Abhinav Gupta

Inspired specifically by curriculum learning, we present a two-step approach for CNN training.

Image Retrieval

In Defense of the Direct Perception of Affordances

no code implementations 5 May 2015 David F. Fouhey, Xiaolong Wang, Abhinav Gupta

The field of functional recognition or affordance estimation from images has seen a revival in recent years.

Dense Optical Flow Prediction from a Static Image

no code implementations ICCV 2015 Jacob Walker, Abhinav Gupta, Martial Hebert

Because our CNN model makes no assumptions about the underlying scene, it can predict future optical flow on a diverse set of scenarios.

motion prediction Optical Flow Estimation

Mid-level Elements for Object Detection

no code implementations 27 Apr 2015 Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta

Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection that performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data).

Object Detection

Transferring Rich Feature Hierarchies for Robust Visual Tracking

no code implementations 19 Jan 2015 Naiyan Wang, Siyi Li, Abhinav Gupta, Dit-Yan Yeung

To fit the characteristics of object tracking, we first pre-train the CNN to recognize what an object is, and then propose to generate a probability map instead of producing a simple class label.

Frame Image Classification +3

Designing Deep Networks for Surface Normal Estimation

no code implementations CVPR 2015 Xiaolong Wang, David F. Fouhey, Abhinav Gupta

We show that incorporating several constraints (man-made, Manhattan world) and meaningful intermediate representations (room layout, edge labels) into the architecture leads to state-of-the-art performance on surface normal estimation.

Scene Understanding

Patch to the Future: Unsupervised Visual Prediction

no code implementations CVPR 2014 Jacob Walker, Abhinav Gupta, Martial Hebert

In this paper we present a conceptually simple but surprisingly powerful method for visual prediction which combines the effectiveness of mid-level visual elements with temporal modeling.

Enriching Visual Knowledge Bases via Object Discovery and Segmentation

no code implementations CVPR 2014 Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta

In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images.

Object Discovery

Mid-level Visual Element Discovery as Discriminative Mode Seeking

no code implementations NeurIPS 2013 Carl Doersch, Abhinav Gupta, Alexei A. Efros

We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset.

Scene Classification

A "Shape Aware" Model for semi-supervised Learning of Objects and its Context

no code implementations NeurIPS 2008 Abhinav Gupta, Jianbo Shi, Larry S. Davis

Using analogous reasoning, we present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context.

Object Recognition
