Search Results for author: Rohit Girdhar

Found 23 papers, 12 papers with code

Detecting Twenty-thousand Classes using Image-level Supervision

1 code implementation 7 Jan 2022 Xingyi Zhou, Rohit Girdhar, Armand Joulin, Phillip Krähenbühl, Ishan Misra

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning.

Image Classification

Mask2Former for Video Instance Segmentation

1 code implementation 20 Dec 2021 Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.

Instance Segmentation Panoptic Segmentation +3

Anticipative Video Transformer

1 code implementation ICCV 2021 Rohit Girdhar, Kristen Grauman

We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions.

Ranked #1 on Action Anticipation on EPIC-KITCHENS-100 (test) (using extra training data)

Action Anticipation

3D Spatial Recognition without Spatially Labeled 3D

no code implementations CVPR 2021 Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar

We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision.

3D Object Detection Multiple Instance Learning +2

Physical Reasoning Using Dynamics-Aware Models

1 code implementation 20 Feb 2021 Eltayeb Ahmed, Anton Bakhtin, Laurens van der Maaten, Rohit Girdhar

A common approach to solving physical reasoning tasks is to train a value learner on example tasks.

Visual Reasoning

Self-Supervised Pretraining of 3D Features on any Point-Cloud

1 code implementation ICCV 2021 Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra

Pretraining on large labeled datasets is a prerequisite to achieve good performance in many computer vision tasks like 2D object recognition, video classification etc.

Object Detection Object Recognition +2

Forward Prediction for Physical Reasoning

1 code implementation 18 Jun 2020 Rohit Girdhar, Laura Gustafson, Aaron Adcock, Laurens van der Maaten

Physical reasoning requires forward prediction: the ability to forecast what will happen next given some initial world state.

Visual Reasoning

Video Understanding as Machine Translation

no code implementations 12 Jun 2020 Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations.

Machine Translation Metric Learning +5

CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning

no code implementations ICLR 2020 Rohit Girdhar, Deva Ramanan

In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved.

Video Understanding

Are we asking the right questions in MovieQA?

no code implementations 8 Nov 2019 Bhavan Jasani, Rohit Girdhar, Deva Ramanan

Joint vision and language tasks like visual question answering are fascinating because they explore high-level understanding, but at the same time, can be more prone to language biases.

Question Answering Visual Question Answering

MetaPix: Few-Shot Video Retargeting

no code implementations ICLR 2020 Jessica Lee, Deva Ramanan, Rohit Girdhar

We address the task of unsupervised retargeting of human actions from one video to another.

Meta-Learning

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

1 code implementation 10 Oct 2019 Rohit Girdhar, Deva Ramanan

In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved.

Video Understanding

DistInit: Learning Video Representations Without a Single Labeled Video

no code implementations ICCV 2019 Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

In this work, we propose an alternative approach to learning video representations that requires no semantically labeled videos and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

Ranked #62 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition Video Recognition

Binge Watching: Scaling Affordance Learning from Sitcoms

no code implementations CVPR 2017 Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

In this paper, we tackle the challenge of creating one of the biggest datasets for learning affordances.

Attentional Pooling for Action Recognition

1 code implementation NeurIPS 2017 Rohit Girdhar, Deva Ramanan

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks.

Action Recognition Human-Object Interaction Detection

ActionVLAD: Learning spatio-temporal aggregation for action classification

no code implementations CVPR 2017 Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video.

Action Classification General Classification +1

Learning a Predictable and Generative Vector Representation for Objects

2 code implementations 29 Mar 2016 Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, Abhinav Gupta

The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable.
