Search Results for author: Jitendra Malik

Found 136 papers, 65 papers with code

Tracking People by Predicting 3D Appearance, Location & Pose

no code implementations 8 Dec 2021 Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.

Coupling Vision and Proprioception for Navigation of Legged Robots

no code implementations 3 Dec 2021 Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

We exploit the complementary strengths of vision and proprioception to achieve point goal navigation in a legged robot.

Legged Robots

Improved Multiscale Vision Transformers for Classification and Detection

no code implementations 2 Dec 2021 Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViT) as a unified architecture for image and video classification, as well as object detection.

Ranked #2 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +4

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation 18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding

Tracking People with 3D Representations

no code implementations NeurIPS 2021 Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

no code implementations 25 Oct 2021 Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.

Legged Robots

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

no code implementations ICCV 2021 Ainaz Eftekhar, Alexander Sax, Roman Bachmann, Jitendra Malik, Amir Zamir

This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.

Depth Estimation

Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

no code implementations 11 Oct 2021 Shubham Goel, Georgia Gkioxari, Jitendra Malik

We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.

Active 3D Shape Reconstruction from Vision and Touch

1 code implementation NeurIPS 2021 Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.

3D Reconstruction 3D Shape Reconstruction

RMA: Rapid Motor Adaptation for Legged Robots

no code implementations 8 Jul 2021 Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear.

Legged Robots

Multiscale Vision Transformers

2 code implementations ICCV 2021 Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2

Distribution-Free, Risk-Controlling Prediction Sets

3 code implementations 7 Jan 2021 Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making.

Decision Making General Classification +4

Human Mesh Recovery from Multiple Shots

no code implementations 17 Dec 2020 Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

The tools we develop open the door to processing and analyzing in 3D content from a large library of edited media, which could be helpful for many downstream applications.

3D Reconstruction

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

1 code implementation ICCV 2021 Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals and (b) sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness in decisions.

Trajectory Forecasting

Better Knowledge Retention through Metric Learning

no code implementations 26 Nov 2020 Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik

In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories.

Continual Learning Metric Learning

Shape, Illumination, and Reflectance from Shading

no code implementations 7 Oct 2020 Jonathan T. Barron, Jitendra Malik

A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world.

Color Constancy

Uncertainty Sets for Image Classifiers using Conformal Prediction

2 code implementations ICLR 2021 Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

1 code implementation ECCV 2020 Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.

3D Human Pose Estimation 3D Shape Reconstruction From A Single 2D Image +2

Shape and Viewpoint without Keypoints

no code implementations ECCV 2020 Shubham Goel, Angjoo Kanazawa, Jitendra Malik

We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.

3D Shape Reconstruction from Vision and Touch

1 code implementation NeurIPS 2020 Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

When a toddler is presented a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.

3D Shape Reconstruction

Deep Isometric Learning for Visual Recognition

1 code implementation ICML 2020 Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

State-Only Imitation Learning for Dexterous Manipulation

no code implementations 7 Apr 2020 Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.

Imitation Learning

Inclusive GAN: Improving Data and Minority Coverage in Generative Models

1 code implementation ECCV 2020 Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz

Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

1 code implementation ECCV 2020 Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.

Imitation Learning Incremental Learning +3

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation 23 Dec 2019 Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)?

Representation Learning

Approximate Feature Collisions in Neural Nets

1 code implementation NeurIPS 2019 Ke Li, Tianhao Zhang, Jitendra Malik

Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude.

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

1 code implementation ICCV 2019 Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)

Predicting 3D Human Dynamics from Video

1 code implementation ICCV 2019 Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik

In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.

3D Human Dynamics 3D Human Pose Estimation +2

Learning Individual Styles of Conversational Gesture

1 code implementation CVPR 2019 Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik

Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion.

Speech-to-Gesture Translation Translation

Mesh R-CNN

5 code implementations ICCV 2019 Georgia Gkioxari, Jitendra Malik, Justin Johnson

We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.

3D Shape Modeling

Learning Navigation Subroutines from Egocentric Videos

no code implementations 29 May 2019 Ashish Kumar, Saurabh Gupta, Jitendra Malik

We demonstrate our proposed approach in the context of navigation, and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos.

Accelerated Sparse Recovery Under Structured Measurements

no code implementations ICLR 2019 Ke Li, Jitendra Malik

Extensive work on compressed sensing has yielded a rich collection of sparse recovery algorithms, each making different tradeoffs between recovery condition and computational efficiency.

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation ICCV 2019 Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Semantic Segmentation

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

no code implementations 6 Mar 2019 Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories.

Robot Navigation Visual Navigation

Trajectory Normalized Gradients for Distributed Optimization

no code implementations 24 Jan 2019 Jianqiao Wangni, Ke Li, Jianbo Shi, Jitendra Malik

Recently, researchers have proposed various low-precision gradient compression schemes for efficient communication in large-scale distributed optimization.

Distributed Optimization

Learning Independent Object Motion from Unlabelled Stereoscopic Videos

no code implementations CVPR 2019 Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik

Unlike prior learning based work which has focused on predicting dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object instance specific 3D scene flow maps and instance masks from which we are able to derive the motion direction and speed for each object instance.

Optical Flow Estimation

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

1 code implementation 31 Dec 2018 Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images.

Object Detection

Learning 3D Human Dynamics from Video

1 code implementation CVPR 2019 Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.

3D Human Dynamics

Visual Memory for Robust Path Following

no code implementations NeurIPS 2018 Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environment.

Are All Training Examples Created Equal? An Empirical Study

no code implementations 30 Nov 2018 Kailas Vodrahalli, Ke Li, Jitendra Malik

Modern computer vision algorithms often rely on very large training datasets.

Active Learning

On the Implicit Assumptions of GANs

no code implementations 29 Nov 2018 Ke Li, Jitendra Malik

Generative adversarial nets (GANs) have generated a lot of excitement.

Diverse Image Synthesis from Semantic Layouts via Conditional IMLE

1 code implementation ICCV 2019 Ke Li, Tianhao Zhang, Jitendra Malik

Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images.

Image Generation Semantic Segmentation

SFV: Reinforcement Learning of Physical Skills from Videos

1 code implementation 8 Oct 2018 Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).

Pose Estimation

Super-Resolution via Conditional Implicit Maximum Likelihood Estimation

no code implementations 2 Oct 2018 Ke Li, Shichong Peng, Jitendra Malik

Single-image super-resolution (SISR) is a canonical problem with diverse applications.

Image Super-Resolution

A Study of Robustness of Neural Nets Using Approximate Feature Collisions

no code implementations 27 Sep 2018 Ke Li*, Tianhao Zhang*, Jitendra Malik

In recent years, various studies have focused on the robustness of neural nets.

Implicit Maximum Likelihood Estimation

1 code implementation ICLR 2019 Ke Li, Jitendra Malik

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly.

Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection

no code implementations 8 Sep 2018 Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets.

Active Learning Computed Tomography (CT)

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations CVPR 2018 Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control are cumbersome to be done in the physical world, as existing algorithms are too slow to efficiently learn in real-time and robots are fragile and costly.

Domain Adaptation General Reinforcement Learning +1

On Evaluation of Embodied Navigation Agents

9 code implementations 18 Jul 2018 Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.

Learning Instance Segmentation by Interaction

1 code implementation 21 Jun 2018 Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Semantic Segmentation

PatchFCN for Intracranial Hemorrhage Detection

no code implementations 8 Jun 2018 Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans.

Computed Tomography (CT) Object Detection +1

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations 28 May 2018 Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.

Robotic Grasping

Learning Category-Specific Mesh Reconstruction from Image Collections

no code implementations ECCV 2018 Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

no code implementations CVPR 2018 Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

We present a framework for learning single-view shape and pose prediction without using direct supervision for either.

Pose Prediction

Unifying Map and Landmark Based Representations for Visual Navigation

no code implementations 21 Dec 2017 Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

This work presents a formulation for visual navigation that unifies map based spatial reasoning and path planning, with landmark based robust plan execution in noisy environments.

Visual Navigation

End-to-end Recovery of Human Shape and Pose

5 code implementations CVPR 2018 Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik

The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using images in-the-wild that only have ground truth 2D annotations.

3D Multi-Person Pose Estimation Monocular 3D Human Pose Estimation +1

From Lifestyle Vlogs to Everyday Interactions

no code implementations CVPR 2018 David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data.

Future prediction

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

no code implementations CVPR 2018 Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.

Generic 3D Representation via Pose Estimation and Matching

1 code implementation 23 Oct 2017 Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited.

Pose Estimation Rectification

What Will Happen Next? Forecasting Player Moves in Sports Videos

no code implementations ICCV 2017 Panna Felsen, Pulkit Agrawal, Jitendra Malik

A large number of very popular team sports involve the act of one team trying to score a goal against the other.

Learning a Multi-View Stereo Machine

1 code implementation NeurIPS 2017 Abhishek Kar, Christian Häne, Jitendra Malik

We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

3D Reconstruction

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

3 code implementations CVPR 2018 Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Recognition Video Understanding

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

no code implementations CVPR 2017 Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.

Hierarchical Surface Prediction for 3D Object Reconstruction

1 code implementation 3 Apr 2017 Christian Häne, Shubham Tulsiani, Jitendra Malik

A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.

3D Geometry Prediction 3D Object Reconstruction

Fast k-Nearest Neighbour Search via Prioritized DCI

2 code implementations ICML 2017 Ke Li, Jitendra Malik

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality.

Learning to Optimize Neural Nets

no code implementations ICLR 2018 Ke Li, Jitendra Malik

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning.

Stochastic Optimization

Feedback Networks

1 code implementation CVPR 2017 Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.

Curriculum Learning

Learning Shape Abstractions by Assembling Volumetric Primitives

3 code implementations CVPR 2017 Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.

Learning to Optimize

no code implementations 2016 Ke Li, Jitendra Malik

Algorithm design is a laborious process and often requires many iterations of ideation and validation.

View Synthesis by Appearance Flow

4 code implementations 11 May 2016 Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.

Novel View Synthesis

Amodal Instance Segmentation

no code implementations 27 Apr 2016 Ke Li, Jitendra Malik

We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object.

Amodal Instance Segmentation Semantic Segmentation

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

1 code implementation 1 Dec 2015 Ke Li, Jitendra Malik

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality.

Iterative Instance Segmentation

no code implementations CVPR 2016 Ke Li, Bharath Hariharan, Jitendra Malik

Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible.

Instance Segmentation Semantic Segmentation +1

Shape and Symmetry Induction for 3D Objects

no code implementations 24 Nov 2015 Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik

Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.

General Classification

Learning Visual Predictive Models of Physics for Playing Billiards

no code implementations 23 Nov 2015 Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik

The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.

Amodal Completion and Size Constancy in Natural Scenes

no code implementations ICCV 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.

Object Detection Object Recognition +1

Bandit Label Inference for Weakly Supervised Learning

no code implementations 22 Sep 2015 Ke Li, Jitendra Malik

The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.

Recurrent Network Models for Human Dynamics

no code implementations ICCV 2015 Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.

Human Dynamics Human Pose Forecasting +2

Human Pose Estimation with Iterative Error Feedback

1 code implementation CVPR 2016 Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.

Pose Estimation Semantic Segmentation

Cross Modal Distillation for Supervision Transfer

1 code implementation CVPR 2016 Saurabh Gupta, Judy Hoffman, Jitendra Malik

In this work we propose a technique that transfers supervision between images from different modalities.

Optical Flow Estimation

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations CVPR 2015 Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Visual Semantic Role Labeling

1 code implementation 17 May 2015 Saurabh Gupta, Jitendra Malik

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction.

Action Classification Action Recognition +1

DeepBox: Learning Objectness with Convolutional Networks

1 code implementation ICCV 2015 Wei-cheng Kuo, Bharath Hariharan, Jitendra Malik

Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high level construct.

Learning to See by Moving

no code implementations ICCV 2015 Pulkit Agrawal, Joao Carreira, Jitendra Malik

We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.

Object Recognition Scene Recognition +1

Contextual Action Recognition with R*CNN

2 code implementations ICCV 2015 Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Action Recognition General Classification +1

Pose Induction for Novel Object Categories

1 code implementation ICCV 2015 Shubham Tulsiani, João Carreira, Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.

Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation

1 code implementation 3 Mar 2015 Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG).

BSDS500 Object Proposal Generation +1

Inferring 3D Object Pose in RGB-D Images

no code implementations 16 Feb 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Learning to Segment Moving Objects in Videos

no code implementations CVPR 2015 Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.

Video Segmentation Video Semantic Segmentation

Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction

no code implementations NeurIPS 2014 Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik

Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding" etc.

3D Reconstruction Optical Flow Estimation +2

Viewpoints and Keypoints

no code implementations CVPR 2015 Shubham Tulsiani, Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.

Keypoint Detection

Category-Specific Object Reconstruction from a Single Image

no code implementations CVPR 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.

Object Detection Object Reconstruction

Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations CVPR 2015 Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Semantic Segmentation

Detecting People in Cubist Art

no code implementations 22 Sep 2014 Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik

Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.

Object Detection

Deformable Part Models are Convolutional Neural Networks

1 code implementation CVPR 2015 Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

no code implementations 22 Jul 2014 Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.

Instance Segmentation Object Detection +1

Pixels to Voxels: Modeling Visual Representation in the Human Brain

no code implementations 18 Jul 2014 Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant

We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images.

Object Recognition

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

1 code implementation 7 Jul 2014 Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

R-CNNs for Pose Estimation and Action Detection

no code implementations 19 Jun 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Multiscale Combinatorial Grouping

no code implementations CVPR 2014 Pablo Arbelaez, Jordi Pont-Tuset, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG).

BSDS500 Semantic Segmentation

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations CVPR 2014 Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Articulated Pose Estimation Using Discriminative Armlet Classifiers

no code implementations CVPR 2013 Georgia Gkioxari, Pablo Arbelaez, Lubomir Bourdev, Jitendra Malik

We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image.

Pose Estimation

Intrinsic Scene Properties from a Single RGB-D Image

no code implementations CVPR 2013 Jonathan T. Barron, Jitendra Malik

Our model takes as input a single RGB-D image and produces as output an improved depth map, a set of surface normals, a reflectance image, a shading image, and a spatially varying model of illumination.
