1 code implementation • 1 Jun 2023 • Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.
Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)
1 code implementation • 31 May 2023 • Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik
To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.
Ranked #3 on Pose Tracking on PoseTrack2018
no code implementations • 3 Apr 2023 • Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot
Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.
no code implementations • CVPR 2023 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik
Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets.
Ranked #3 on Action Recognition on AVA v2.2 (using extra training data)
no code implementations • 31 Mar 2023 • Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI.
no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath
We present a sim-to-real learning-based approach for real-world humanoid locomotion.
1 code implementation • CVPR 2023 • Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa
Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack.
1 code implementation • 15 Feb 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, John Canny, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
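The snippet names the framework but not the mechanism; the core idea of big-little decoding is that a small model generates tokens on its own and defers to the large model only when its confidence drops. A minimal sketch with toy stand-in models (`little`, `big`, `VOCAB`, and the threshold are illustrative assumptions, not BiLD's actual policies):

```python
import numpy as np

def bild_decode(little, big, prompt, max_len=8, threshold=0.6):
    """Big-little fallback decoding: the little model proposes each token;
    when its top probability falls below `threshold`, the big model is
    invoked for that step instead."""
    tokens = list(prompt)
    for _ in range(max_len):
        probs = little(tokens)
        if probs.max() >= threshold:       # little model is confident
            tokens.append(int(probs.argmax()))
        else:                              # fall back to the big model
            tokens.append(int(big(tokens).argmax()))
    return tokens

# Toy stand-ins: the little model is confident only at even lengths.
VOCAB = 4

def little(toks):
    p = np.full(VOCAB, 0.1)
    p[len(toks) % VOCAB] = 0.7 if len(toks) % 2 == 0 else 0.3
    return p / p.sum()

def big(toks):
    p = np.full(VOCAB, 0.05)
    p[(len(toks) + 1) % VOCAB] = 0.85
    return p / p.sum()

out = bild_decode(little, big, prompt=[0], max_len=4)
print(out)  # [0, 2, 2, 0, 0]: steps at odd lengths were handled by `big`
```

Latency improves because the expensive model runs only on the uncertain steps rather than on every token.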
4 code implementations • CVPR 2022 • Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik
Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware-resource-limited training regimes.
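The memory savings come from reversible blocks: a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be cached for backpropagation. A numpy sketch of the two-stream coupling (the sub-functions `F` and `G` are illustrative stand-ins for the attention/MLP sub-blocks):

```python
import numpy as np

def F(x):  # stand-in for one sub-block (e.g. attention)
    return np.tanh(x * 0.5)

def G(x):  # stand-in for the other sub-block (e.g. MLP)
    return np.tanh(x * 0.25)

def rev_forward(x1, x2):
    # Two-stream reversible coupling: outputs fully determine inputs.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Recompute the inputs from the outputs: no activation caching needed.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)  # exact round trip
```

During the backward pass, each block re-derives its inputs on the fly, trading a little recomputation for activation memory that no longer grows with depth.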
1 code implementation • CVPR 2023 • Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
no code implementations • 5 Jan 2023 • Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay
We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.
no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Are extralinguistic signals such as image pixels crucial for inducing constituency grammars?
no code implementations • 15 Dec 2022 • Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer
We present Masked Audio-Video Learners (MAViL) to train audio-visual representations.
Ranked #1 on Audio Classification on VGGSound (using extra training data)
no code implementations • 2 Dec 2022 • Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot
In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality.
1 code implementation • 29 Nov 2022 • Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot
We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.
no code implementations • 23 Nov 2022 • Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik
In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning.
no code implementations • 14 Nov 2022 • Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak
Animals are capable of precise and agile locomotion using vision.
no code implementations • 7 Nov 2022 • Antonio Loquercio, Ashish Kumar, Jitendra Malik
In this work, we show how to learn a visual walking policy that only uses a monocular RGB camera and proprioception.
1 code implementation • 10 Oct 2022 • Haozhi Qi, Ashish Kumar, Roberto Calandra, Yi Ma, Jitendra Malik
Generalized in-hand manipulation has long been an unsolved challenge of robotics.
1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.
1 code implementation • 26 Sep 2022 • William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik
We explore a data-driven approach for learning to optimize neural networks.
no code implementations • 19 Sep 2022 • Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller
This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime.
no code implementations • 6 Sep 2022 • Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik
To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks.
3 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Ranked #27 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition (ASR)
no code implementations • 30 May 2022 • Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik
In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots.
1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran
From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.
1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
1 code implementation • 10 Feb 2022 • Anastasios N Angelopoulos, Amit P Kohli, Stephen Bates, Michael I Jordan, Jitendra Malik, Thayer Alshaabi, Srigokul Upadhyayula, Yaniv Romano
Image-to-image regression is an important learning task, used frequently in biological imaging.
no code implementations • CVPR 2022 • Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman
We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?'
1 code implementation • CVPR 2022 • Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
Ranked #2 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)
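The online caching idea above can be sketched as attention over the current clip's tokens plus keys/values carried over from previously processed clips (a simplified stand-in for the paper's compressed memory; the shapes and the two-clip cache cap are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend_with_memory(q, k, v, mem_k, mem_v):
    """Attention over the current clip's keys/values concatenated with a
    cache of keys/values from earlier iterations."""
    K = np.concatenate([mem_k, k], axis=0)
    V = np.concatenate([mem_v, v], axis=0)
    w = softmax(q @ K.T / np.sqrt(q.shape[-1]))
    return w @ V

d, t = 8, 4
rng = np.random.default_rng(0)
mem_k = np.empty((0, d))
mem_v = np.empty((0, d))
for clip in range(3):  # process the video online, clip by clip
    q = rng.normal(size=(t, d))
    k = rng.normal(size=(t, d))
    v = rng.normal(size=(t, d))
    out = attend_with_memory(q, k, v, mem_k, mem_v)
    # cache (optionally compressed) keys/values for the next iteration,
    # keeping at most the last two clips' worth
    mem_k = np.concatenate([mem_k, k])[-2 * t:]
    mem_v = np.concatenate([mem_v, v])[-2 * t:]
```

Because only the bounded cache is carried forward, per-iteration cost stays constant while the effective temporal context grows with the video.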
no code implementations • CVPR 2022 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.
no code implementations • 8 Dec 2021 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.
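The probabilistic similarity can be illustrated with a Gaussian log-likelihood between a tracklet's predicted state and each single-frame detection (the full model fuses appearance, location, and pose cues; this isotropic-Gaussian score over a 3D location is a simplified stand-in):

```python
import numpy as np

def association_scores(pred_state, detections, sigma=1.0):
    """Probabilistic similarity between a tracklet's predicted state and
    each single-frame detection: isotropic Gaussian log-likelihood (up to
    a constant) of the detection under the prediction."""
    d2 = ((detections - pred_state) ** 2).sum(axis=1)
    return -d2 / (2 * sigma ** 2)

pred = np.array([1.0, 2.0, 0.5])            # e.g. predicted 3D location
dets = np.array([[5.0, 5.0, 5.0],
                 [1.1, 2.1, 0.4],
                 [-3.0, 0.0, 2.0]])
scores = association_scores(pred, dets)
best = int(scores.argmax())
print(best)  # 1: the detection closest to the prediction wins
```

Associating tracklets to detections by maximizing such scores makes the matching robust to occasional noisy frames, since unlikely detections receive low likelihood.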
no code implementations • CVPR 2022 • Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak
A safety advisor module adds sensed unexpected obstacles to the occupancy map and environment-determined speed limits to the velocity command generator.
6 code implementations • CVPR 2022 • Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.
Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)
no code implementations • NeurIPS 2021 • Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Ruslan Salakhutdinov
The observations gathered by this exploration policy are labelled using 3D consistency and used to improve the perception model.
no code implementations • 2 Dec 2021 • Devendra Singh Chaplot, Deepak Pathak, Jitendra Malik
We consider the problem of spatial path planning.
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
1 code implementation • NeurIPS 2021 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.
no code implementations • 25 Oct 2021 • Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak
We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.
3 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • CVPR 2022 • Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik
ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically-based materials that correspond to real, household objects.
1 code implementation • CVPR 2022 • Shubham Goel, Georgia Gkioxari, Jitendra Malik
We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.
1 code implementation • ICCV 2021 • Ainaz Eftekhar, Alexander Sax, Roman Bachmann, Jitendra Malik, Amir Zamir
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
2 code implementations • NeurIPS 2021 • Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal
In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.
no code implementations • 8 Jul 2021 • Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik
Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear.
6 code implementations • NeurIPS 2021 • Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios.
6 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer
We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.
Ranked #13 on Action Classification on Charades
2 code implementations • 7 Jan 2021 • Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan
While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making.
no code implementations • CVPR 2022 • Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa
The tools we develop open the door to processing and analyzing in 3D content from a large library of edited media, which could be helpful for many downstream applications.
no code implementations • ICCV 2021 • Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik
In this work we explore reconstructing hand-object interactions in the wild.
2 code implementations • ICCV 2021 • Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik
Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent and the model, such as the intent of other agents and irreducible randomness in decisions.
Ranked #2 on Trajectory Prediction on ETH/UCY
no code implementations • 26 Nov 2020 • Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik
In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories.
no code implementations • 13 Nov 2020 • Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto
Vision-based robotics often separates the control loop into one module for perception and a separate module for control.
no code implementations • 3 Nov 2020 • Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su
In the rearrangement task, the goal is to bring a given physical environment into a specified state.
no code implementations • 7 Oct 2020 • Jonathan T. Barron, Jitendra Malik
A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world.
2 code implementations • ICLR 2021 • Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael. I. Jordan
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.
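One standard way to quantify that uncertainty, in the spirit of this work, is a split-conformal prediction set: calibrate a score threshold on held-out data, then return every class whose score clears it (a minimal sketch of the basic procedure, not the paper's regularized variant; the random "classifier" outputs are illustrative):

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal calibration: score each example as 1 minus the
    probability assigned to its true class, then take the
    ceil((n+1)(1-alpha))-th smallest score as the threshold."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    k = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    return scores[k]

def prediction_set(probs, qhat):
    # Keep every class whose score 1 - p_y is within the threshold.
    return np.where(1.0 - probs <= qhat)[0]

rng = np.random.default_rng(1)
logits = rng.normal(size=(500, 10))               # stand-in classifier
cal_probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
cal_labels = rng.integers(0, 10, size=500)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

test_probs = np.exp(rng.normal(size=10))
test_probs /= test_probs.sum()
pset = prediction_set(test_probs, qhat)
```

The guarantee is distribution-free: over exchangeable data, the returned set contains the true class with probability at least 1 - alpha, regardless of how good the underlying classifier is.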
1 code implementation • ICLR 2021 • Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik
Learning long-term dynamics models is the key to understanding physical common sense.
Ranked #1 on Visual Reasoning on PHYRE-1B-Within
1 code implementation • ECCV 2020 • Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.
3D Human Pose Estimation • 3D Shape Reconstruction From A Single 2D Image
no code implementations • ECCV 2020 • Shubham Goel, Angjoo Kanazawa, Jitendra Malik
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.
1 code implementation • ECCV 2020 • Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik
Human movement is goal-directed and influenced by the spatial layout of the objects in the scene.
1 code implementation • NeurIPS 2020 • Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal
When a toddler is presented a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.
1 code implementation • ICML 2020 • Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik
Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.
1 code implementation • 7 Jun 2020 • Amir Zamir, Alexander Sax, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas
Visual perception entails solving a wide set of tasks, e.g., object detection, depth estimation, etc.
1 code implementation • ECCV 2020 • Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz
Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.
no code implementations • 7 Apr 2020 • Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik
To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.
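The recipe in this snippet can be sketched end to end with a toy point-mass world: fit an inverse dynamics model on the agent's own transitions, then use it to label a state-only demonstration with actions (the linear dynamics and least-squares model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown to the learner) dynamics: s' = s + a in a 2-D point mass.
def step(s, a):
    return s + a

# 1) Collect the agent's own experience: states, actions, next states.
S = rng.normal(size=(200, 2))
A = rng.normal(size=(200, 2))
S2 = step(S, A)

# 2) Fit an inverse dynamics model a ≈ [s, s'] @ W by least squares.
X = np.hstack([S, S2])
W, *_ = np.linalg.lstsq(X, A, rcond=None)

# 3) Label a state-only demonstration with predicted actions.
demo = np.array([[0.0, 0.0], [1.0, 0.5], [1.5, 1.5]])
pred_actions = np.hstack([demo[:-1], demo[1:]]) @ W
print(pred_actions)  # recovers the state differences [[1, .5], [.5, 1]]
```

Once the demonstration carries predicted actions, it can be imitated with ordinary behavior cloning, even though no actions were ever observed.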
2 code implementations • 7 Apr 2020 • Ke Li, Shichong Peng, Tianhao Zhang, Jitendra Malik
Many tasks in computer vision and graphics fall within the framework of conditional image synthesis.
3 code implementations • ECCV 2020 • Karttikeya Mangalam, Harshayu Girase, Shreyas Agarwal, Kuan-Hui Lee, Ehsan Adeli, Jitendra Malik, Adrien Gaidon
In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.
Ranked #1 on Multi Future Trajectory Prediction on ETH/UCY
1 code implementation • 23 Jan 2020 • Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.
2 code implementations • ECCV 2020 • Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.
1 code implementation • 23 Dec 2019 • Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik
How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)?
1 code implementation • NeurIPS 2019 • Ke Li, Tianhao Zhang, Jitendra Malik
Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude.
1 code implementation • ICCV 2019 • Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese
Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)
1 code implementation • ICCV 2019 • Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik
In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.
2 code implementations • CVPR 2019 • Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion.
Ranked #4 on Gesture Generation on BEAT
6 code implementations • ICCV 2019 • Georgia Gkioxari, Jitendra Malik, Justin Johnson
We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.
Ranked #1 on 3D Shape Modeling on Pix3D S2
no code implementations • 29 May 2019 • Ashish Kumar, Saurabh Gupta, Jitendra Malik
We demonstrate our proposed approach in context of navigation, and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos.
1 code implementation • ICML 2020 • Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
Many computer vision applications require solving multiple tasks in real-time.
no code implementations • ICLR 2019 • Ke Li, Jitendra Malik
Extensive work on compressed sensing has yielded a rich collection of sparse recovery algorithms, each making different tradeoffs between recovery condition and computational efficiency.
1 code implementation • ICCV 2019 • Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin
However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.
12 code implementations • ICCV 2019 • Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
We present Habitat, a platform for research in embodied artificial intelligence (AI).
Ranked #2 on PointGoal Navigation on Gibson PointGoal Navigation
no code implementations • 6 Mar 2019 • Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin
Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories.
no code implementations • 24 Jan 2019 • Jianqiao Wangni, Ke Li, Jianbo Shi, Jitendra Malik
Recently, researchers have proposed various low-precision gradient compression schemes for efficient communication in large-scale distributed optimization.
no code implementations • CVPR 2019 • Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik
Unlike prior learning based work which has focused on predicting dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object instance specific 3D scene flow maps and instance masks from which we are able to derive the motion direction and speed for each object instance.
1 code implementation • 31 Dec 2018 • Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik
This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images.
1 code implementation • CVPR 2019 • Yedid Hoshen, Jitendra Malik
GLANN combines the strengths of IMLE and GLO in a way that overcomes the main drawbacks of each method.
11 code implementations • ICCV 2019 • Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
We present SlowFast networks for video recognition.
Ranked #3 on Action Recognition on AVA v2.1
1 code implementation • CVPR 2019 • Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik
We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.
Ranked #8 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric)
no code implementations • NeurIPS 2018 • Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik
Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environment.
no code implementations • 30 Nov 2018 • Kailas Vodrahalli, Ke Li, Jitendra Malik
Modern computer vision algorithms often rely on very large training datasets.
no code implementations • 29 Nov 2018 • Ke Li, Jitendra Malik
Generative adversarial nets (GANs) have generated a lot of excitement.
1 code implementation • ICCV 2019 • Ke Li, Tianhao Zhang, Jitendra Malik
Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images.
1 code implementation • 8 Oct 2018 • Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine
In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).
no code implementations • 2 Oct 2018 • Ke Li, Shichong Peng, Jitendra Malik
Single-image super-resolution (SISR) is a canonical problem with diverse applications.
no code implementations • 27 Sep 2018 • Ke Li*, Tianhao Zhang*, Jitendra Malik
In recent years, various studies have focused on the robustness of neural nets.
1 code implementation • ICLR 2019 • Ke Li, Jitendra Malik
Implicit probabilistic models are models defined naturally in terms of a sampling procedure, and they often induce a likelihood function that cannot be expressed explicitly.
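For such models, IMLE trains the sampler by matching each data point to its nearest generated sample and pulling that sample toward the data, so no likelihood is ever evaluated and every data point gets covered. A toy sketch with a translation-only generator (`z + theta`; the generator, step size, and sample count are illustrative assumptions):

```python
import numpy as np

def imle_step(theta, data, n_samples=64, lr=0.1, rng=None):
    """One IMLE step for a toy generator z ~ N(0, I) -> z + theta: for
    each data point, find the nearest generated sample and nudge theta
    so that sample moves closer to the data point."""
    rng = rng or np.random.default_rng()
    z = rng.normal(size=(n_samples, data.shape[1]))
    samples = z + theta                            # generator output
    grad = np.zeros_like(theta)
    for x in data:
        j = ((samples - x) ** 2).sum(1).argmin()   # nearest sample to x
        grad += 2 * (samples[j] - x)               # d/dtheta ||s_j - x||^2
    return theta - lr * grad / len(data)

rng = np.random.default_rng(0)
data = rng.normal(loc=[3.0, -2.0], size=(128, 2))
theta = np.zeros(2)
for _ in range(100):
    theta = imle_step(theta, data, rng=rng)
print(theta)  # drifts toward the data's center, roughly [3, -2]
```

Because the matching runs from data to samples (not the reverse), no data point can be ignored, which is why IMLE avoids the mode dropping seen in adversarial training.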
no code implementations • 8 Sep 2018 • Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik
Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets.
5 code implementations • CVPR 2018 • Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese
Developing visual perception models for active agents and sensorimotor control in the physical world is cumbersome, as existing algorithms are too slow to learn efficiently in real-time and robots are fragile and costly.
9 code implementations • 18 Jul 2018 • Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.
1 code implementation • 21 Jun 2018 • Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik
The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.
no code implementations • 8 Jun 2018 • Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik
This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans.
no code implementations • 28 May 2018 • Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine
This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.
1 code implementation • CVPR 2018 • Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
The product is a computational taxonomic map for task transfer learning.
1 code implementation • ICLR 2018 • Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell
In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference.
no code implementations • ECCV 2018 • Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik
The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.
no code implementations • CVPR 2018 • Shubham Tulsiani, Alexei A. Efros, Jitendra Malik
We present a framework for learning single-view shape and pose prediction without using direct supervision for either.
no code implementations • 21 Dec 2017 • Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik
This work presents a formulation for visual navigation that unifies map-based spatial reasoning and path planning with landmark-based robust plan execution in noisy environments.
7 code implementations • CVPR 2018 • Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik
The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations.
Ranked #1 on
Weakly-supervised 3D Human Pose Estimation
on Human3.6M
(3D Annotations metric)
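The reprojection objective mentioned above can be sketched in a few lines: project predicted 3D joints with a simple camera model and penalize the distance to annotated 2D keypoints. This is a hedged illustration assuming a weak-perspective camera with hypothetical `scale`/`trans` parameters, not HMR's actual implementation.

```python
def reproject(joints_3d, scale, trans):
    """Weak-perspective projection: (x, y, z) -> (s*x + tx, s*y + ty)."""
    return [(scale * x + trans[0], scale * y + trans[1]) for x, y, _ in joints_3d]

def reprojection_loss(joints_3d, joints_2d, scale, trans):
    """Mean L1 distance between projected 3D joints and 2D annotations."""
    proj = reproject(joints_3d, scale, trans)
    return sum(abs(px - gx) + abs(py - gy)
               for (px, py), (gx, gy) in zip(proj, joints_2d)) / len(joints_2d)

# Two joints whose projections match the 2D labels exactly give zero loss.
joints_3d = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
joints_2d = [(0.0, 0.0), (2.0, 2.0)]
print(reprojection_loss(joints_3d, joints_2d, scale=2.0, trans=(0.0, 0.0)))  # 0.0
```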
no code implementations • CVPR 2018 • David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik
A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data.
no code implementations • CVPR 2018 • Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik
The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.
1 code implementation • 23 Oct 2017 • Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese
Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited.
1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
We introduce a large-scale 3D shape understanding benchmark using data and annotations from the ShapeNet 3D object database.
no code implementations • ICCV 2017 • Panna Felsen, Pulkit Agrawal, Jitendra Malik
A large number of very popular team sports involve the act of one team trying to score a goal against the other.
1 code implementation • NeurIPS 2017 • Abhishek Kar, Christian Häne, Jitendra Malik
We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.
6 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #3 on
Temporal Action Localization
on UCF101-24
no code implementations • CVPR 2017 • Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik
We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.
1 code implementation • 3 Apr 2017 • Christian Häne, Shubham Tulsiani, Jitendra Malik
A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.
no code implementations • 6 Mar 2017 • Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine
Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.
2 code implementations • ICML 2017 • Ke Li, Jitendra Malik
Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality.
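For context on what an "exact method" means here, a brute-force k-nearest-neighbour search can be sketched as below. This linear scan is the baseline that smarter exact indexes (e.g. k-d trees) degrade toward as dimensionality grows; it is a minimal illustration, not the paper's proposed algorithm.

```python
def knn(query, points, k):
    """Return the k points closest to `query` by squared Euclidean distance.

    Scans every point, so the query time is linear in the dataset size
    regardless of dimensionality.
    """
    def sqdist(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return sorted(points, key=sqdist)[:k]

points = [(0, 0), (1, 1), (5, 5), (2, 2)]
print(knn((0, 0), points, k=2))  # [(0, 0), (1, 1)]
```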
no code implementations • ICLR 2018 • Ke Li, Jitendra Malik
Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning.
6 code implementations • CVPR 2017 • Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
The accumulated belief of the world enables the agent to track visited regions of the environment.
1 code implementation • CVPR 2017 • Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese
Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.
1 code implementation • 20 Dec 2016 • Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, Abhinav Gupta
But most of these fine details are lost in the early convolutional layers.
Ranked #217 on
Object Detection
on COCO test-dev
3 code implementations • CVPR 2017 • Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik
We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.
1 code implementation • NeurIPS 2016 • Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine
We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics.
no code implementations • 2016 • Ke Li, Jitendra Malik
Algorithm design is a laborious process and often requires many iterations of ideation and validation.
4 code implementations • 11 May 2016 • Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros
We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.
no code implementations • 27 Apr 2016 • Ke Li, Jitendra Malik
We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object.
1 code implementation • 1 Dec 2015 • Ke Li, Jitendra Malik
Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality.
no code implementations • CVPR 2016 • Ke Li, Bharath Hariharan, Jitendra Malik
Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible.
no code implementations • 25 Nov 2015 • Saurabh Gupta, Bharath Hariharan, Jitendra Malik
In this paper we explore two ways of using context for object detection.
no code implementations • 24 Nov 2015 • Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik
Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.
no code implementations • 23 Nov 2015 • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.
no code implementations • ICCV 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.
no code implementations • 22 Sep 2015 • Ke Li, Jitendra Malik
The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.
no code implementations • ICCV 2015 • Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.
Ranked #8 on
Human Pose Forecasting
on Human3.6M
(MAR, walking, 1,000ms metric)
1 code implementation • CVPR 2016 • Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.
Ranked #43 on
Pose Estimation
on MPII Human Pose
1 code implementation • CVPR 2016 • Saurabh Gupta, Judy Hoffman, Jitendra Malik
In this work we propose a technique that transfers supervision between images from different modalities.
no code implementations • CVPR 2015 • Michael W. Tao, Pratul P. Srinivasan, Jitendra Malik, Szymon Rusinkiewicz, Ravi Ramamoorthi
Using shading information is essential to improve the shape estimation.
no code implementations • CVPR 2015 • Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik
The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.
1 code implementation • 17 May 2015 • Saurabh Gupta, Jitendra Malik
In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction.
1 code implementation • ICCV 2015 • Wei-cheng Kuo, Bharath Hariharan, Jitendra Malik
Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high level construct.
no code implementations • ICCV 2015 • Pulkit Agrawal, Joao Carreira, Jitendra Malik
We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class labels as supervision on the visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.
2 code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.
Ranked #4 on
Weakly Supervised Object Detection
on HICO-DET
1 code implementation • ICCV 2015 • Shubham Tulsiani, João Carreira, Jitendra Malik
We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.
1 code implementation • 3 Mar 2015 • Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T. Barron, Ferran Marques, Jitendra Malik
We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG).
no code implementations • 16 Feb 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik
The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.
no code implementations • CVPR 2015 • Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik
We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.
no code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
We investigate the importance of parts for the tasks of action and attribute classification.
no code implementations • NeurIPS 2014 • Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik
Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.
no code implementations • CVPR 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.
no code implementations • CVPR 2015 • João Carreira, Abhishek Kar, Shubham Tulsiani, Jitendra Malik
All that structure from motion algorithms "see" are sets of 2D points.
no code implementations • CVPR 2015 • Shubham Tulsiani, Jitendra Malik
We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.
Ranked #3 on
Keypoint Detection
on Pascal3D+
1 code implementation • CVPR 2015 • Georgia Gkioxari, Jitendra Malik
We address the problem of action detection in videos.
Ranked #10 on
Skeleton Based Action Recognition
on J-HMDB
6 code implementations • CVPR 2015 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.
no code implementations • 22 Sep 2014 • Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik
Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.
1 code implementation • CVPR 2015 • Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik
Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.
Ranked #27 on
Object Detection
on PASCAL VOC 2007
no code implementations • 22 Jul 2014 • Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik
In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.
Ranked #6 on
Object Detection In Indoor Scenes
on SUN RGB-D
no code implementations • 18 Jul 2014 • Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant
We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images.
1 code implementation • 7 Jul 2014 • Pulkit Agrawal, Ross Girshick, Jitendra Malik
In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.
no code implementations • 7 Jul 2014 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
Unlike classical semantic segmentation, we require individual object instances.
Ranked #4 on
Object Detection
on PASCAL VOC 2012
no code implementations • 19 Jun 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.
no code implementations • CVPR 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.
no code implementations • CVPR 2014 • Pablo Arbelaez, Jordi Pont-Tuset, Jonathan T. Barron, Ferran Marques, Jitendra Malik
We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG).