Search Results for author: Jitendra Malik

Found 167 papers, 86 papers with code

Navigating to Objects Specified by Images

no code implementations · 3 Apr 2023 · Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.

Navigate Visual Reasoning

Learning Humanoid Locomotion with Transformers

no code implementations · 6 Mar 2023 · Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

We present a sim-to-real learning-based approach for real-world humanoid locomotion.

Decoupling Human and Camera Motion from Videos in the Wild

1 code implementation · CVPR 2023 · Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack.

Big Little Transformer Decoder

1 code implementation · 15 Feb 2023 · Sehoon Kim, Karttikeya Mangalam, Suhong Moon, John Canny, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.

Language Modelling Machine Translation +1

Reversible Vision Transformers

4 code implementations · CVPR 2022 · Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware-resource-limited training regimes.

Image Classification object-detection +2

Multiview Compressive Coding for 3D Reconstruction

1 code implementation · CVPR 2023 · Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari

We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.

3D Reconstruction Self-Supervised Learning +1

CA²T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

no code implementations · 5 Jan 2023 · Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay

We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.

Navigating to Objects in the Real World

no code implementations · 2 Dec 2022 · Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

In contrast, end-to-end learning does not work well in the real world: its success rate drops from 77% in simulation to 23% in the real world, due to a large image domain gap between simulation and reality.

Navigate Visual Navigation

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

1 code implementation · 29 Nov 2022 · Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.

Visual Navigation

Learning to Imitate Object Interactions from Internet Videos

no code implementations · 23 Nov 2022 · Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning.

Learning Visual Locomotion with Cross-Modal Supervision

no code implementations · 7 Nov 2022 · Antonio Loquercio, Ashish Kumar, Jitendra Malik

In this work, we show how to learn a visual walking policy that only uses a monocular RGB camera and proprioception.

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation · 6 Oct 2022 · Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M-parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and clearly demonstrate the benefits of scaling visual pre-training for robot learning.

Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

no code implementations · 19 Sep 2022 · Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime.

Drone Controller

Multi-skill Mobile Manipulation for Object Rearrangement

no code implementations · 6 Sep 2022 · Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks.

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

3 code implementations · 2 Jun 2022 · Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Automatic Speech Recognition Automatic Speech Recognition (ASR)

Adapting Rapid Motor Adaptation for Bipedal Robots

no code implementations · 30 May 2022 · Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots.

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation · CVPR 2022 · Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From the learned pairwise affinity (PA), we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks, we train Generic Grouping Networks (GGNs) and significantly outperform the state of the art on open-world instance segmentation on various benchmarks, including COCO, LVIS, ADE20K, and UVO.

Instance Segmentation Semantic Segmentation

Masked Visual Pre-training for Motor Control

1 code implementation · 11 Mar 2022 · Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

no code implementations · CVPR 2022 · Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?'


MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation · CVPR 2022 · Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #2 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2
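The MeMViT snippet describes processing a video online and caching "memory" at each iteration. A minimal sketch of that caching idea follows, assuming single-head attention in NumPy and a fixed-length FIFO cache of past keys and values. The class name MemoryAttention and the max_memory parameter are hypothetical, and the rest of the actual architecture (multiscale pooling, any memory compression, training details) is deliberately left out.

```python
import numpy as np
from collections import deque


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MemoryAttention:
    """Toy single-head attention whose keys/values include cached past clips."""

    def __init__(self, max_memory: int):
        # Holds (keys, values) from at most `max_memory` previous iterations;
        # the deque drops the oldest entry automatically once it is full.
        self.memory = deque(maxlen=max_memory)

    def __call__(self, q, k, v):
        # Queries from the current clip attend over cached keys/values plus current ones.
        keys = np.concatenate([m[0] for m in self.memory] + [k], axis=0)
        values = np.concatenate([m[1] for m in self.memory] + [v], axis=0)
        weights = softmax(q @ keys.T / np.sqrt(q.shape[-1]))
        out = weights @ values
        # Cache this clip's keys/values for the next iteration.
        self.memory.append((k.copy(), v.copy()))
        return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    att = MemoryAttention(max_memory=2)
    for step in range(4):
        q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
        out = att(q, k, v)
        print(step, out.shape, len(att.memory))
```

Each call attends over at most max_memory earlier clips plus the current one, so the per-clip cost stays bounded while the temporal context extends beyond a single clip.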

Tracking People by Predicting 3D Appearance, Location and Pose

no code implementations · CVPR 2022 · Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.

Tracking People by Predicting 3D Appearance, Location & Pose

no code implementations · 8 Dec 2021 · Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.

Coupling Vision and Proprioception for Navigation of Legged Robots

no code implementations · CVPR 2022 · Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

A safety advisor module adds sensed unexpected obstacles to the occupancy map and environment-determined speed limits to the velocity command generator.

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

6 code implementations · CVPR 2022 · Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +5

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation · 18 Nov 2021 · Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding

Tracking People with 3D Representations

1 code implementation · NeurIPS 2021 · Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

no code implementations · 25 Oct 2021 · Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.

Ego4D: Around the World in 3,000 Hours of Egocentric Video

3 code implementations · CVPR 2022 · Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

1 code implementation · CVPR 2022 · Shubham Goel, Georgia Gkioxari, Jitendra Malik

We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.

Active 3D Shape Reconstruction from Vision and Touch

2 code implementations · NeurIPS 2021 · Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high-spatial-resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.

3D Reconstruction 3D Shape Reconstruction

RMA: Rapid Motor Adaptation for Legged Robots

no code implementations · 8 Jul 2021 · Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots would require them to adapt in real time to unseen scenarios such as changing terrains, changing payloads, and wear and tear.

Multiscale Vision Transformers

6 code implementations · ICCV 2021 · Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2

Distribution-Free, Risk-Controlling Prediction Sets

2 code implementations · 7 Jan 2021 · Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making.

BIG-bench Machine Learning Classification +8

Human Mesh Recovery from Multiple Shots

no code implementations · CVPR 2022 · Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

The tools we develop open the door to processing and analyzing, in 3D, content from a large library of edited media, which could be helpful for many downstream applications.

3D Reconstruction Human Mesh Recovery

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

2 code implementations · ICCV 2021 · Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent and the model, such as the intent of other agents and irreducible randomness in decisions.

Trajectory Forecasting

Better Knowledge Retention through Metric Learning

no code implementations · 26 Nov 2020 · Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik

In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories.

Continual Learning Metric Learning

Shape, Illumination, and Reflectance from Shading

no code implementations · 7 Oct 2020 · Jonathan T. Barron, Jitendra Malik

A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world.

Color Constancy

Uncertainty Sets for Image Classifiers using Conformal Prediction

2 code implementations · ICLR 2021 · Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.

Conformal Prediction
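Split conformal prediction, the basic recipe behind the entry above, can be sketched as follows. This is the plain version using the score 1 - p(true class), not the regularized procedure (RAPS) the paper proposes. The function names are hypothetical, the classifier is assumed to already output softmax probabilities, and `method="higher"` in `np.quantile` requires NumPy 1.22 or later.

```python
import numpy as np


def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float) -> float:
    """Split-conformal threshold for the nonconformity score 1 - p(true label)."""
    n = len(cal_labels)
    # Score each calibration example by one minus the probability of its true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, capped at 1 for very small n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    # "higher" returns an actual calibration score at or above that level.
    return float(np.quantile(scores, level, method="higher"))


def prediction_set(probs: np.ndarray, qhat: float) -> np.ndarray:
    """Indices of every class whose score 1 - p is at most the threshold."""
    return np.flatnonzero(1.0 - probs <= qhat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(500, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 10, size=500)
    qhat = conformal_threshold(probs, labels, alpha=0.1)
    print(qhat, prediction_set(probs[0], qhat))
```

Assuming calibration and test examples are exchangeable, sets built this way contain the true label with probability at least 1 - alpha, whatever the underlying classifier.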

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

1 code implementation · ECCV 2020 · Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.

3D Human Pose Estimation 3D Shape Reconstruction From A Single 2D Image +2

Shape and Viewpoint without Keypoints

no code implementations · ECCV 2020 · Shubham Goel, Angjoo Kanazawa, Jitendra Malik

We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.

3D Shape Reconstruction from Vision and Touch

1 code implementation · NeurIPS 2020 · Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

When a toddler is presented with a new toy, their instinctual behaviour is to pick it up and inspect it with their hands and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.

3D Shape Reconstruction

Deep Isometric Learning for Visual Recognition

1 code implementation · ICML 2020 · Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

Inclusive GAN: Improving Data and Minority Coverage in Generative Models

1 code implementation · ECCV 2020 · Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz

Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.

State-Only Imitation Learning for Dexterous Manipulation

no code implementations · 7 Apr 2020 · Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.

Imitation Learning
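The idea in the entry above, an inverse dynamics model used to fill in the missing actions of state-only demonstrations, can be illustrated on a toy problem. This sketch assumes a linear system s' = A s + B a with a square, invertible B, and a linear least-squares model stands in for whatever learned model is used in practice. All function names are hypothetical.

```python
import numpy as np


def fit_inverse_dynamics(states, actions, next_states):
    """Least-squares fit of a linear inverse model: a_t ~ [s_t, s_{t+1}] @ W."""
    X = np.hstack([states, next_states])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W


def label_demonstration(W, demo_states):
    """Predict the action taken between each pair of consecutive demo states."""
    X = np.hstack([demo_states[:-1], demo_states[1:]])
    return X @ W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.3 * rng.normal(size=(3, 3))  # toy state-transition matrix
    B = rng.normal(size=(3, 3))        # square and almost surely invertible
    # The agent's own interaction data, where actions are observed.
    s = rng.normal(size=(1000, 3))
    a = rng.normal(size=(1000, 3))
    W = fit_inverse_dynamics(s, a, s @ A.T + a @ B.T)
    # A state-only demonstration: the learner sees the states but not the actions.
    hidden_actions = rng.normal(size=(20, 3))
    states = [rng.normal(size=3)]
    for act in hidden_actions:
        states.append(A @ states[-1] + B @ act)
    inferred = label_demonstration(W, np.array(states))
    print("max action error:", np.abs(inferred - hidden_actions).max())
```

In this noiseless linear toy the inferred actions match the hidden ones up to floating-point error. The labelled demonstrations could then feed a standard imitation learning method.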

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

2 code implementations · ECCV 2020 · Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.

Imitation Learning Incremental Learning +3

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation · 23 Dec 2019 · Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g., the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g., navigating a complex environment)?

Navigate reinforcement-learning +2

Approximate Feature Collisions in Neural Nets

1 code implementation · NeurIPS 2019 · Ke Li, Tianhao Zhang, Jitendra Malik

Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude.

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

1 code implementation · ICCV 2019 · Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)

Predicting 3D Human Dynamics from Video

1 code implementation · ICCV 2019 · Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik

In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.

3D Human Dynamics 3D Human Pose Estimation +2

Mesh R-CNN

6 code implementations · ICCV 2019 · Georgia Gkioxari, Jitendra Malik, Justin Johnson

We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.

3D Shape Modeling

Learning Navigation Subroutines from Egocentric Videos

no code implementations · 29 May 2019 · Ashish Kumar, Saurabh Gupta, Jitendra Malik

We demonstrate our proposed approach in the context of navigation, and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos.

Pseudo Label

Accelerated Sparse Recovery Under Structured Measurements

no code implementations · ICLR 2019 · Ke Li, Jitendra Malik

Extensive work on compressed sensing has yielded a rich collection of sparse recovery algorithms, each making different tradeoffs between recovery condition and computational efficiency.

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation · ICCV 2019 · Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Semantic Segmentation

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

no code implementations · 6 Mar 2019 · Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories.

Robot Navigation Visual Navigation

Trajectory Normalized Gradients for Distributed Optimization

no code implementations · 24 Jan 2019 · Jianqiao Wangni, Ke Li, Jianbo Shi, Jitendra Malik

Researchers have recently proposed various low-precision gradient compression schemes for efficient communication in large-scale distributed optimization.

Benchmarking Distributed Optimization

Learning Independent Object Motion from Unlabelled Stereoscopic Videos

no code implementations · CVPR 2019 · Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik

Unlike prior learning-based work, which has focused on predicting a dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object-instance-specific 3D scene flow maps and instance masks, from which we can derive the motion direction and speed of each object instance.

Optical Flow Estimation

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

1 code implementation · 31 Dec 2018 · Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images.

Object Detection

Learning 3D Human Dynamics from Video

1 code implementation · CVPR 2019 · Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.

Ranked #8 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric)

3D Human Dynamics 3D Human Pose Estimation

Visual Memory for Robust Path Following

no code implementations · NeurIPS 2018 · Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environment.

Are All Training Examples Created Equal? An Empirical Study

no code implementations · 30 Nov 2018 · Kailas Vodrahalli, Ke Li, Jitendra Malik

Modern computer vision algorithms often rely on very large training datasets.

Active Learning

On the Implicit Assumptions of GANs

no code implementations · 29 Nov 2018 · Ke Li, Jitendra Malik

Generative adversarial nets (GANs) have generated a lot of excitement.

Diverse Image Synthesis from Semantic Layouts via Conditional IMLE

1 code implementation · ICCV 2019 · Ke Li, Tianhao Zhang, Jitendra Malik

Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images.

Image Generation Semantic Segmentation

SFV: Reinforcement Learning of Physical Skills from Videos

1 code implementation · 8 Oct 2018 · Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).

Pose Estimation reinforcement-learning +1

Super-Resolution via Conditional Implicit Maximum Likelihood Estimation

no code implementations · 2 Oct 2018 · Ke Li, Shichong Peng, Jitendra Malik

Single-image super-resolution (SISR) is a canonical problem with diverse applications.

Image Super-Resolution

A Study of Robustness of Neural Nets Using Approximate Feature Collisions

no code implementations · 27 Sep 2018 · Ke Li*, Tianhao Zhang*, Jitendra Malik

In recent years, various studies have focused on the robustness of neural nets.

Implicit Maximum Likelihood Estimation

1 code implementation · ICLR 2019 · Ke Li, Jitendra Malik

Implicit probabilistic models are models defined naturally in terms of a sampling procedure, and they often induce a likelihood function that cannot be expressed explicitly.

Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection

no code implementations · 8 Sep 2018 · Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets.

Active Learning Computed Tomography (CT)

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations · CVPR 2018 · Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control is cumbersome in the physical world, as existing algorithms are too slow to learn efficiently in real time and robots are fragile and costly.

Domain Adaptation General Reinforcement Learning +1

Learning Instance Segmentation by Interaction

1 code implementation · 21 Jun 2018 · Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Semantic Segmentation

PatchFCN for Intracranial Hemorrhage Detection

no code implementations · 8 Jun 2018 · Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans.

Computed Tomography (CT) object-detection +2

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations · 28 May 2018 · Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.

Robotic Grasping

Learning Category-Specific Mesh Reconstruction from Image Collections

no code implementations · ECCV 2018 · Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

no code implementations · CVPR 2018 · Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

We present a framework for learning single-view shape and pose prediction without using direct supervision for either.

Pose Prediction

Unifying Map and Landmark Based Representations for Visual Navigation

no code implementations · 21 Dec 2017 · Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

This work presents a formulation for visual navigation that unifies map-based spatial reasoning and path planning with landmark-based robust plan execution in noisy environments.

Navigate Visual Navigation

End-to-end Recovery of Human Shape and Pose

7 code implementations · CVPR 2018 · Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik

The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground-truth 2D annotations.

3D Hand Pose Estimation 3D Human Shape Estimation +4
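The keypoint reprojection objective mentioned in the entry above can be sketched as follows, assuming a weak-perspective camera (rotation, scale, 2D translation) and an L1 penalty restricted to visible keypoints. The function names are hypothetical, and the body model that would produce the 3D joints from shape and pose parameters is omitted.

```python
import numpy as np


def weak_perspective_project(joints_3d, rotation, scale, trans):
    """Rotate 3D joints, drop depth (orthographic projection), then scale and shift in 2D."""
    rotated = joints_3d @ rotation.T
    return scale * rotated[:, :2] + trans


def keypoint_reprojection_loss(joints_3d, kp_2d, visible, rotation, scale, trans):
    """Sum of L1 errors between projected and annotated keypoints, visible ones only."""
    projected = weak_perspective_project(joints_3d, rotation, scale, trans)
    # Keypoints with visible == 0 contribute nothing to the loss.
    return float(np.sum(visible[:, None] * np.abs(kp_2d - projected)))


if __name__ == "__main__":
    joints = np.array([[0.0, 0.0, 5.0], [1.0, 2.0, 3.0]])
    kp = np.array([[1.0, 1.0], [4.0, 5.0]])
    vis = np.array([1.0, 1.0])
    print(keypoint_reprojection_loss(joints, kp, vis, np.eye(3), 2.0, np.array([1.0, 1.0])))
```

Because the loss needs only 2D keypoints and a visibility flag per joint, it can be evaluated on images that have no 3D ground truth.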

From Lifestyle Vlogs to Everyday Interactions

no code implementations · CVPR 2018 · David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is the lack of good training data.

Future prediction

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

no code implementations · CVPR 2018 · Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.

Generic 3D Representation via Pose Estimation and Matching

1 code implementation · 23 Oct 2017 · Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

Though a large body of computer vision research has investigated developing generic semantic representations, efforts toward developing a similar representation for 3D have been limited.

Pose Estimation Surface Normal Estimation

What Will Happen Next? Forecasting Player Moves in Sports Videos

no code implementations · ICCV 2017 · Panna Felsen, Pulkit Agrawal, Jitendra Malik

A large number of very popular team sports involve the act of one team trying to score a goal against the other.

Learning a Multi-View Stereo Machine

1 code implementation · NeurIPS 2017 · Abhishek Kar, Christian Häne, Jitendra Malik

We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning-based methods.

3D Reconstruction

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

6 code implementations · CVPR 2018 · Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels, with multiple labels per person occurring frequently.

Action Recognition Temporal Action Localization +1

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

no code implementations · CVPR 2017 · Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.

Hierarchical Surface Prediction for 3D Object Reconstruction

1 code implementation · 3 Apr 2017 · Christian Häne, Shubham Tulsiani, Jitendra Malik

A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.

3D Geometry Prediction 3D Object Reconstruction

Fast k-Nearest Neighbour Search via Prioritized DCI

2 code implementations ICML 2017 Ke Li, Jitendra Malik

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality.

Learning to Optimize Neural Nets

no code implementations ICLR 2018 Ke Li, Jitendra Malik

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning.

reinforcement-learning Reinforcement Learning (RL) +1

Feedback Networks

1 code implementation CVPR 2017 Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.

Learning Shape Abstractions by Assembling Volumetric Primitives

3 code implementations CVPR 2017 Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.

Learning to Optimize

no code implementations 2016 Ke Li, Jitendra Malik

Algorithm design is a laborious process and often requires many iterations of ideation and validation.

reinforcement-learning Reinforcement Learning (RL)

View Synthesis by Appearance Flow

4 code implementations 11 May 2016 Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.

Novel View Synthesis

Amodal Instance Segmentation

no code implementations 27 Apr 2016 Ke Li, Jitendra Malik

We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object.

Amodal Instance Segmentation Semantic Segmentation

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

1 code implementation 1 Dec 2015 Ke Li, Jitendra Malik

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality.

Iterative Instance Segmentation

no code implementations CVPR 2016 Ke Li, Bharath Hariharan, Jitendra Malik

Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible.

Instance Segmentation Semantic Segmentation +1

Shape and Symmetry Induction for 3D Objects

no code implementations 24 Nov 2015 Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik

Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.

General Classification

Learning Visual Predictive Models of Physics for Playing Billiards

no code implementations 23 Nov 2015 Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik

The ability to plan and execute goal-specific actions in varied, unexpected settings is a central requirement of intelligent agents.

Amodal Completion and Size Constancy in Natural Scenes

no code implementations ICCV 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.

object-detection Object Detection +2

Bandit Label Inference for Weakly Supervised Learning

no code implementations 22 Sep 2015 Ke Li, Jitendra Malik

The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.

Weakly-supervised Learning

Recurrent Network Models for Human Dynamics

no code implementations ICCV 2015 Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.

Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Dynamics Human Pose Forecasting +2

Human Pose Estimation with Iterative Error Feedback

1 code implementation CVPR 2016 Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.

Pose Estimation Semantic Segmentation

Cross Modal Distillation for Supervision Transfer

1 code implementation CVPR 2016 Saurabh Gupta, Judy Hoffman, Jitendra Malik

In this work we propose a technique that transfers supervision between images from different modalities.

Optical Flow Estimation

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations CVPR 2015 Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Visual Semantic Role Labeling

1 code implementation 17 May 2015 Saurabh Gupta, Jitendra Malik

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction.

Action Classification Action Recognition +2

DeepBox: Learning Objectness with Convolutional Networks

1 code implementation ICCV 2015 Wei-cheng Kuo, Bharath Hariharan, Jitendra Malik

Existing object proposal approaches primarily use bottom-up cues to rank proposals, while we believe that objectness is in fact a high-level construct.

Learning to See by Moving

no code implementations ICCV 2015 Pulkit Agrawal, Joao Carreira, Jitendra Malik

We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.

Object Recognition Scene Recognition +1

Contextual Action Recognition with R*CNN

2 code implementations ICCV 2015 Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Action Recognition General Classification +2

Pose Induction for Novel Object Categories

1 code implementation ICCV 2015 Shubham Tulsiani, João Carreira, Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.

Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation

1 code implementation 3 Mar 2015 Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG).

Image Segmentation Object Proposal Generation +1

Inferring 3D Object Pose in RGB-D Images

no code implementations 16 Feb 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Learning to Segment Moving Objects in Videos

no code implementations CVPR 2015 Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.

Video Segmentation Video Semantic Segmentation

Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction

no code implementations NeurIPS 2014 Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik

Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.

3D Reconstruction Optical Flow Estimation +3

Category-Specific Object Reconstruction from a Single Image

no code implementations CVPR 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.

object-detection Object Detection +1

Viewpoints and Keypoints

no code implementations CVPR 2015 Shubham Tulsiani, Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.

Keypoint Detection

Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations CVPR 2015 Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Semantic Segmentation

Detecting People in Cubist Art

no code implementations 22 Sep 2014 Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik

Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.

object-detection Object Detection

Pixels to Voxels: Modeling Visual Representation in the Human Brain

no code implementations 18 Jul 2014 Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant

We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images.

BIG-bench Machine Learning Object Recognition

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

1 code implementation 7 Jul 2014 Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

R-CNNs for Pose Estimation and Action Detection

no code implementations 19 Jun 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations CVPR 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Multiscale Combinatorial Grouping

no code implementations CVPR 2014 Pablo Arbelaez, Jordi Pont-Tuset, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG).

Image Segmentation Semantic Segmentation