no code implementations • 23 Dec 2021 • Eduard Gabriel Bazavan, Andrei Zanfir, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu
We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, as well as parametric variations in body shape (for a total of 1,600 different humans), in order to generate an initial dataset of over 1 million frames.
Ranked #1 on 3D Human Pose Estimation on HSPACE
1 code implementation • ICLR 2022 • Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition.
Ranked #3 on Domain Generalization on Stylized-ImageNet
no code implementations • ICCV 2021 • Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu
We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images.
Ranked #185 on 3D Human Pose Estimation on Human3.6M
1 code implementation • 2 Oct 2020 • Marius Leordeanu, Mihai Pirvu, Dragos Costea, Alina Marcu, Emil Slusanschi, Rahul Sukthankar
The unsupervised learning process is repeated over several generations, in which each edge becomes a "student" and also part of different ensemble "teachers" for training other students.
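As a hedged illustration of the generational student/teacher scheme described above (plain NumPy, not the authors' code; `train_edge` and `ensemble_teacher` are hypothetical names, and the real edges are deep networks over a graph of representations rather than linear maps):

```python
# Sketch of one self-supervised generation: every edge predictor is retrained on
# pseudo-labels produced by an ensemble of the remaining edges.
import numpy as np

def train_edge(inputs, targets):
    """Stand-in for fitting one edge (here a least-squares regressor) on pseudo-labelled data."""
    w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return lambda x: x @ w

def ensemble_teacher(predictors, inputs):
    """Pseudo-label = consensus (here: mean) of the other edges' predictions."""
    return np.mean([p(inputs) for p in predictors], axis=0)

def one_generation(edges, unlabeled_inputs):
    new_edges = []
    for i, _ in enumerate(edges):
        teachers = [e for j, e in enumerate(edges) if j != i]   # ensemble excludes the student
        pseudo = ensemble_teacher(teachers, unlabeled_inputs)   # consensus pseudo-labels
        new_edges.append(train_edge(unlabeled_inputs, pseudo))  # retrain the student edge
    return new_edges
```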
no code implementations • CVPR 2021 • Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image.
Ranked #27 on 3D Human Pose Estimation on 3DPW (MPJPE metric)
1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao
This report summarizes the results of the first edition of the challenge together with the findings of the participants.
no code implementations • 29 Jul 2020 • Jonathan C. Stroud, Zhichao Lu, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid, David A. Ross
Based on this observation, we propose to use text as a method for learning video representations.
no code implementations • CVPR 2020 • Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman
We train a BERT-based Speech2Action classifier on over a thousand movie screenplays, to predict action labels from transcribed speech segments.
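A minimal sketch of a BERT-based speech-to-action classifier in the spirit of the entry above, assuming the HuggingFace `transformers` package; the checkpoint and number of action classes are placeholders, not the authors' released configuration:

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

NUM_ACTIONS = 18  # placeholder: number of action classes mined from screenplays

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_ACTIONS
)

def predict_action(transcribed_speech: str) -> int:
    """Map a transcribed speech segment to its most likely action label."""
    inputs = tokenizer(transcribed_speech, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```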
no code implementations • ECCV 2020 • Andrei Zanfir, Eduard Gabriel Bazavan, Hongyi Xu, Bill Freeman, Rahul Sukthankar, Cristian Sminchisescu
Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty of acquiring training data for large-scale supervised learning in complex visual scenes.
Ranked #25 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)
no code implementations • 14 Sep 2019 • Saif Alabachi, Gita Sukthankar, Rahul Sukthankar
This paper describes two key innovations required to deploy deep reinforcement learning models on a real robot: 1) an abstract state representation for transferring learning from simulation to the hardware platform, and 2) reward shaping and staging paradigms for training the controller.
no code implementations • 21 Jul 2019 • Rui Hou, Chen Chen, Rahul Sukthankar, Mubarak Shah
Convolutional Neural Network (CNN) based image segmentation has made great progress in recent years.
Ranked #62 on Semi-Supervised Video Object Segmentation on DAVIS 2016
no code implementations • CVPR 2019 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid
This paper focuses on multi-person action forecasting in videos.
1 code implementation • 19 Dec 2018 • Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input.
Ranked #10 on Action Recognition on AVA v2.1
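The two-stream ensemble described in this entry can be summarized with a small hedged sketch (PyTorch; `spatial_net` and `temporal_net` stand in for whatever backbones are used, and late score averaging is one common fusion choice, not necessarily the paper's):

```python
# One network scores RGB frames, another scores optical flow, and their
# per-class logits are averaged at test time.
import torch
import torch.nn as nn

class TwoStreamEnsemble(nn.Module):
    def __init__(self, spatial_net: nn.Module, temporal_net: nn.Module):
        super().__init__()
        self.spatial_net = spatial_net    # takes RGB frames
        self.temporal_net = temporal_net  # takes stacked optical-flow fields

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Late fusion: average the per-class logits of the two streams.
        return 0.5 * (self.spatial_net(rgb) + self.temporal_net(flow))
```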
no code implementations • 30 Nov 2018 • Alexander Pashevich, Danijar Hafner, James Davidson, Rahul Sukthankar, Cordelia Schmid
To achieve this, we study different modulation signals and exploration for hierarchical controllers.
1 code implementation • ECCV 2018 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid
A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Ranked #14 on Action Recognition on AVA v2.1
no code implementations • CVPR 2018 • Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.
Ranked #18 on Temporal Action Localization on THUMOS’14
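A hedged sketch of the 1D analogue of Faster R-CNN anchors that TAL-Net-style temporal localization builds on; the scales and stride are illustrative placeholders, not the paper's configuration:

```python
import numpy as np

def temporal_anchors(num_positions: int, stride: float, scales=(1, 2, 4, 8)) -> np.ndarray:
    """Return (start, end) segments centred on a regular grid of time steps."""
    centers = (np.arange(num_positions) + 0.5) * stride
    anchors = [(c - s * stride / 2, c + s * stride / 2) for c in centers for s in scales]
    return np.array(anchors)

def temporal_iou(seg_a, seg_b) -> float:
    """Intersection-over-union of two [start, end] segments, used to label anchors."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0
```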
1 code implementation • 26 Jan 2018 • Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar
We consider the problem of retrieving objects from image data and learning to classify them into meaningful semantic categories with minimal supervision.
6 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #5 on Action Detection on UCF101-24
no code implementations • 16 May 2017 • Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc van Gool
The 2017 WebVision challenge consists of two tracks, the image classification task on WebVision test set, and the transfer learning task on PASCAL VOC 2012 dataset.
no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar
In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.
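A minimal sketch of the core idea in this entry, assuming a simple reparameterized Gaussian latent: a prediction is produced by pushing a random variable through a learned non-linear transformation, so repeated forward passes yield multiple plausible futures (illustrative, not the authors' architecture):

```python
import torch
import torch.nn as nn

class StochasticPredictor(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, out_dim: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, 2 * latent_dim)  # predicts mean and log-variance
        self.decoder = nn.Sequential(nn.Linear(obs_dim + latent_dim, 128),
                                     nn.ReLU(),
                                     nn.Linear(128, out_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.encoder(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        return self.decoder(torch.cat([obs, z], dim=-1))      # one sampled future
```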
no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.
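A hedged sketch of the geometric decomposition SfM-Net relies on: given per-pixel depth and a camera motion (R, t), the induced rigid optical flow follows from back-projecting pixels to 3D, transforming them, and re-projecting (NumPy; the intrinsics and shapes are illustrative assumptions, and the paper additionally handles independently moving objects):

```python
import numpy as np

def rigid_flow(depth: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """depth: (H, W); K: 3x3 intrinsics; R, t: camera rotation/translation. Returns (H, W, 2) flow."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW homogeneous pixels
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                # back-project to 3D
    pts2 = R @ pts + t.reshape(3, 1)                                   # apply camera motion
    proj = K @ pts2
    uv2 = (proj[:2] / proj[2:]).T.reshape(H, W, 2)                     # re-project to pixels
    return uv2 - np.stack([u, v], axis=-1)                             # flow = pixel displacement
```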
6 code implementations • ICML 2017 • Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL).
6 code implementations • CVPR 2017 • Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
The accumulated belief of the world enables the agent to track visited regions of the environment.
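A toy, hedged illustration of the accumulated-belief idea in this entry: observations are integrated into a persistent top-down map so visited cells can be tracked over time (a hand-written grid, not the paper's learned mapper):

```python
import numpy as np

class BeliefMap:
    def __init__(self, size: int = 64, cell: float = 0.25):
        self.visited = np.zeros((size, size), dtype=bool)
        self.cell = cell              # metres per grid cell
        self.origin = size // 2       # agent starts at the map centre

    def update(self, x: float, y: float) -> None:
        """Mark the cell containing the agent's current (x, y) position as visited."""
        i = int(round(y / self.cell)) + self.origin
        j = int(round(x / self.cell)) + self.origin
        if 0 <= i < self.visited.shape[0] and 0 <= j < self.visited.shape[1]:
            self.visited[i, j] = True
```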
no code implementations • 3 Feb 2017 • Shumeet Baluja, Michele Covell, Rahul Sukthankar
Real-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions and improving air quality.
1 code implementation • 20 Dec 2016 • Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, Abhinav Gupta
But most of these fine details are lost in the early convolutional layers.
Ranked #217 on Object Detection on COCO test-dev
no code implementations • CVPR 2016 • Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
We propose a motion-based method to discover the physical parts of an articulated object class (e.g. head/torso/leg of a horse) from multiple videos.
no code implementations • 21 Apr 2016 • Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah
Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.
no code implementations • 1 Dec 2015 • Marius Leordeanu, Alexandra Radu, Shumeet Baluja, Rahul Sukthankar
Our method works both as a feature selection mechanism and as a fully competitive classifier.
no code implementations • 30 Nov 2015 • Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
On behavior discovery, we outperform the state-of-the-art Improved DTF descriptor.
no code implementations • 19 Nov 2015 • Abhimanyu Dubey, Nikhil Naik, Dan Raviv, Rahul Sukthankar, Ramesh Raskar
We propose a method for learning from streaming visual data using a compact, constant size representation of all the data that was seen until a given moment.
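As a hedged illustration of the constant-size constraint discussed above (not the paper's actual representation), reservoir sampling keeps a uniform fixed-size sample of everything seen so far:

```python
import random

class Reservoir:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)   # keep new item with probability capacity/seen
            if j < self.capacity:
                self.items[j] = item
```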
1 code implementation • 19 Nov 2015 • George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar
A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements.
no code implementations • CVPR 2015 • Zhengyang Wu, Fuxin Li, Rahul Sukthankar, James M. Rehg
We propose a robust algorithm to generate video segment proposals.
2 code implementations • CVPR 2015 • Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, Alexander C. Berg
We perform a comprehensive set of experiments on standard datasets to carefully study the contributions of each aspect of MatchNet, with direct comparisons to established methods.
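A hedged sketch of the MatchNet structure referred to above: a shared convolutional feature tower embeds each patch, and a small fully connected metric network scores whether two embeddings match (PyTorch; the layer sizes are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class PatchMatcher(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.tower = nn.Sequential(                        # shared feature tower
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.metric = nn.Sequential(                       # metric network on concatenated features
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),                             # match / non-match logits
        )

    def forward(self, patch_a: torch.Tensor, patch_b: torch.Tensor) -> torch.Tensor:
        fa, fb = self.tower(patch_a), self.tower(patch_b)
        return self.metric(torch.cat([fa, fb], dim=-1))
```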
1 code implementation • 4 Apr 2015 • Chen Sun, Sanketh Shetty, Rahul Sukthankar, Ram Nevatia
To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.
no code implementations • 1 Dec 2014 • Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
Given unstructured videos of deformable objects, we automatically recover spatiotemporal correspondences to map one object to another (such as animals in the wild).
no code implementations • CVPR 2015 • Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
We propose an unsupervised approach for discovering characteristic motion patterns in videos of highly articulated objects performing natural, unscripted behaviors, such as tigers in the wild.
no code implementations • 27 Nov 2014 • Marius Leordeanu, Alexandra Radu, Rahul Sukthankar
Feature selection is an essential problem in computer vision, important for category learning and recognition.
no code implementations • CVPR 2014 • Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3%, up from 43.9%).
Ranked #9 on Action Recognition on Sports-1M
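A minimal sketch of the transfer-learning step described above: freeze a pretrained backbone and retrain only the top classification layers on UCF-101 (`pretrained_backbone` and `feat_dim` are placeholders, and the backbone is assumed to output a flat feature vector):

```python
import torch.nn as nn

def build_finetune_model(pretrained_backbone: nn.Module, feat_dim: int, num_classes: int = 101) -> nn.Module:
    for p in pretrained_backbone.parameters():
        p.requires_grad = False                  # keep the lower layers fixed
    head = nn.Linear(feat_dim, num_classes)      # new top layer for the UCF-101 classes
    return nn.Sequential(pretrained_backbone, head)
```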
no code implementations • CVPR 2014 • Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah
While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate.
no code implementations • 2 Apr 2014 • Marius Leordeanu, Rahul Sukthankar
In this manner we can learn and grow both a deep, complex graph of classifiers and a rich pool of features at different levels of abstraction and interpretation.
no code implementations • CVPR 2013 • Yicong Tian, Rahul Sukthankar, Mubarak Shah
Deformable part models have achieved impressive performance for object detection, even on difficult image datasets.
no code implementations • CVPR 2013 • Kevin Tang, Rahul Sukthankar, Jay Yagnik, Li Fei-Fei
Second, we ensure that CRANE is robust to label noise, both from tagged videos that fail to contain the concept and from occasional negative videos that do.
no code implementations • NeurIPS 2009 • Marius Leordeanu, Martial Hebert, Rahul Sukthankar
When applied to MAP inference, the algorithm is a parallel extension of Iterated Conditional Modes (ICM) with climbing and convergence properties that make it a compelling alternative to the sequential ICM.
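For reference, a hedged sketch of plain sequential ICM, the baseline this entry extends: each node is greedily set to the label that maximizes its local score given its neighbours' current labels, until no node changes (written here as score maximization; the energy-minimization form is symmetric):

```python
import numpy as np

def icm(unary, pairwise, edges, max_iters=50):
    """unary: (N, L) scores; pairwise: (L, L) compatibility; edges: list of (i, j) pairs."""
    N, L = unary.shape
    labels = unary.argmax(axis=1)
    nbrs = [[] for _ in range(N)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(max_iters):
        changed = False
        for i in range(N):
            scores = unary[i].astype(float)
            for j in nbrs[i]:
                scores += pairwise[:, labels[j]]   # compatibility with each neighbour's label
            best = scores.argmax()
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```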
no code implementations • NeurIPS 2008 • Liu Yang, Rong Jin, Rahul Sukthankar
For empirical evaluation, we present a direct comparison with a number of state-of-the-art methods for inductive semi-supervised learning and text categorization; we show that SSLW results in a significant improvement in categorization accuracy when equipped with a small training set and an unlabeled resource that is only weakly related to the test beds.