Search Results for author: Jitendra Malik

Found 217 papers, 104 papers with code

Whole-Body Conditioned Egocentric Video Prediction

no code implementations 26 Jun 2025 Yutong Bai, Danny Tran, Amir Bar, Yann LeCun, Trevor Darrell, Jitendra Malik

We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose.

Prediction Video Prediction

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

no code implementations 30 May 2025 Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang

$\alpha$1 first introduces $\alpha$ moment, which represents the scaled thinking phase with a universal parameter $\alpha$.

Answer Generation

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

1 code implementation16 May 2025 Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong

To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO).

Reinforcement Learning (RL)

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

no code implementations27 Feb 2025 Toru Lin, Kartik Sachdev, Linxi Fan, Jitendra Malik, Yuke Zhu

This work investigates the key challenges in applying reinforcement learning to solve a collection of contact-rich manipulation tasks on a humanoid embodiment.

Contact-rich Manipulation reinforcement-learning +1

Poly-Autoregressive Prediction for Modeling Interactions

no code implementations CVPR 2025 Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik

At its core, PAR represents the behavior of all agents as a sequence of tokens, each representing an agent's state at a specific timestep.
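
To make the token-sequence idea concrete, here is a minimal sketch of interleaving per-agent, per-timestep states into one discrete sequence; the uniform quantizer, value range, and agent-major ordering are illustrative assumptions, not the tokenization used in the paper.

```python
import numpy as np

def tokenize_interactions(states, n_bins=256, low=-10.0, high=10.0):
    """Flatten per-agent, per-timestep states into one token sequence.

    states: array of shape (T, A, D) -- T timesteps, A agents, D state dims.
    Each scalar is uniformly quantized into n_bins discrete tokens; tokens are
    emitted agent-by-agent within each timestep (an assumed ordering).
    """
    states = np.clip(np.asarray(states, dtype=np.float64), low, high)
    ids = np.floor((states - low) / (high - low) * (n_bins - 1)).astype(np.int64)
    T, A, D = ids.shape
    return ids.reshape(T * A * D)  # sequence fed to an autoregressive model

# toy example: 5 timesteps, 3 agents, 4-dim state (e.g. x, y, vx, vy)
toy = np.random.randn(5, 3, 4)
tokens = tokenize_interactions(toy)
print(tokens.shape)  # (60,)
```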

Autonomous Vehicles Prediction +1

From Simple to Complex Skills: The Case of In-Hand Object Reorientation

no code implementations9 Jan 2025 Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

This hierarchical policy learns to select which low-level skill to execute based on feedback from both the environment and the low-level skill policies themselves.

Object

Gaussian Masked Autoencoders

no code implementations6 Jan 2025 Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer, Jitendra Malik, Shiry Ginosar

Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly.

Edge Detection Representation Learning +2

Reconstructing People, Places, and Cameras

1 code implementation CVPR 2025 Lea Müller, Hongsuk Choi, Anthony Zhang, Brent Yi, Jitendra Malik, Angjoo Kanazawa

We present "Humans and Structure from Motion" (HSfM), a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system from a sparse set of uncalibrated multi-view images featuring people.

Camera Pose Estimation Pose Estimation

Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment

no code implementations6 Dec 2024 Ran Tian, Yilin Wu, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, Andrea Bajcsy

While reinforcement learning from human feedback (RLHF) has become the predominant mechanism for alignment in non-embodied domains like large language models, it has not seen the same success in aligning visuomotor policies due to the prohibitive amount of human feedback required to learn visual reward functions.

Scaling Properties of Diffusion Models for Perceptual Tasks

no code implementations CVPR 2025 Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks.

Depth Estimation Image-to-Image Translation +1

Learning Humanoid Locomotion over Challenging Terrain

no code implementations4 Oct 2024 Ilija Radosavovic, Sarthak Kamat, Trevor Darrell, Jitendra Malik

Humanoid robots can, in principle, use their legs to go almost anywhere.

Synergy and Synchrony in Couple Dances

no code implementations6 Sep 2024 Vongani Maluleke, Lea Müller, Jathushan Rajasegaran, Georgios Pavlakos, Shiry Ginosar, Angjoo Kanazawa, Jitendra Malik

Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild, couple dance video dataset to enable future research in this direction.

motion prediction Prediction

Lessons from Learning to Spin "Pens"

no code implementations26 Jul 2024 Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world.

Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing

no code implementations10 Jul 2024 Jessica Yin, Haozhi Qi, Jitendra Malik, James Pikul, Mark Yim, Tess Hellebrekers

We introduce a sensor model for tactile skin that enables zero-shot sim-to-real transfer of ternary shear and binary normal forces.
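
As a rough illustration of what "ternary shear and binary normal forces" means at the signal level, the sketch below thresholds simulated continuous forces into those discrete readings; the thresholds and axis conventions are placeholder assumptions rather than the paper's calibrated sensor model.

```python
import numpy as np

def discretize_taxel(shear_xy, normal_z, shear_thresh=0.05, contact_thresh=0.1):
    """Map simulated continuous forces to the coarse signals the skin reports.

    shear_xy: (..., 2) tangential force components (N).
    normal_z: (...,)   normal force magnitude (N).
    Returns ternary shear per axis in {-1, 0, +1} and binary contact in {0, 1}.
    Thresholds here are made-up placeholders.
    """
    shear_xy = np.asarray(shear_xy, dtype=np.float64)
    normal_z = np.asarray(normal_z, dtype=np.float64)
    ternary = np.zeros_like(shear_xy, dtype=np.int64)
    ternary[shear_xy > shear_thresh] = 1
    ternary[shear_xy < -shear_thresh] = -1
    binary = (normal_z > contact_thresh).astype(np.int64)
    return ternary, binary

print(discretize_taxel([0.2, -0.01], 0.3))  # ternary shear [1, 0], contact 1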

Reinforcement Learning (RL)

Learning Visuotactile Skills with Two Multifingered Hands

1 code implementation25 Apr 2024 Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik

Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing.

Reconstructing Hand-Held Objects in 3D from Images and Videos

no code implementations9 Apr 2024 Jane Wu, Georgios Pavlakos, Georgia Gkioxari, Jitendra Malik

In order to obtain the best performing single frame model, we first present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image and inferred 3D hand as inputs.

Object Object Reconstruction +1

AutoEval Done Right: Using Synthetic Data for Model Evaluation

1 code implementation9 Mar 2024 Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming.

xT: Nested Tokenization for Larger Context in Large Images

1 code implementation4 Mar 2024 Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

There are many downstream applications in which global context matters as much as high frequency details, such as in real-world satellite imagery; in such cases researchers have to make the uncomfortable choice of which information to discard.

Twisting Lids Off with Two Hands

no code implementations4 Mar 2024 Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik

Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, due to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system.

Deep Reinforcement Learning reinforcement-learning +1

Humanoid Locomotion as Next Token Prediction

no code implementations29 Feb 2024 Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.

Humanoid Control Prediction

Synthesizing Moving People with 3D Control

no code implementations19 Jan 2024 Boyi Li, Junming Chen, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in the 3D pose and to the input image in terms of visual similarity.

Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

1 code implementation8 Jan 2024 Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.
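
One way to picture the two-coefficient idea is as a coupled residual pair that stays exactly invertible, so intermediate activations can be recomputed instead of stored. The block below is a generic sketch of that construction with coefficients alpha and beta on the two residual paths; the exact coupling, backbone modules, and the dynamic schedule that shifts the coefficients during finetuning differ in the actual Dr$^2$Net.

```python
import torch
import torch.nn as nn

class DualResidualBlock(nn.Module):
    """Invertible pair of residual couplings with scaling coefficients.

    Forward:  y1 = alpha * x1 + F(x2)
              y2 = beta  * x2 + G(y1)
    Inverse:  x2 = (y2 - G(y1)) / beta
              x1 = (y1 - F(x2)) / alpha
    As long as alpha, beta != 0, inputs can be recomputed from outputs,
    so intermediate activations do not have to be cached for backprop.
    """
    def __init__(self, dim, alpha=1.0, beta=0.1):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.G = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.alpha, self.beta = alpha, beta

    def forward(self, x1, x2):
        y1 = self.alpha * x1 + self.F(x2)
        y2 = self.beta * x2 + self.G(y1)
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1, y2):
        x2 = (y2 - self.G(y1)) / self.beta
        x1 = (y1 - self.F(x2)) / self.alpha
        return x1, x2

block = DualResidualBlock(8)
a, b = torch.randn(2, 8), torch.randn(2, 8)
y1, y2 = block(a, b)
r1, r2 = block.inverse(y1, y2)
print(torch.allclose(a, r1, atol=1e-5), torch.allclose(b, r2, atol=1e-5))
```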

object-detection Small Object Detection +1

Adaptive Human Trajectory Prediction via Latent Corridors

no code implementations11 Dec 2023 Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik

We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors.

Prediction Trajectory Prediction +1

Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation CVPR 2024 Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.

Diversity

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

2 code implementations CVPR 2024 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts

no code implementations2 Nov 2023 Huang Huang, Satvik Sharma, Antonio Loquercio, Anastasios Angelopoulos, Ken Goldberg, Jitendra Malik

The key idea is the design of switching policies that can take conformal quantiles as input, which we define as conformal policy learning, that allows robots to detect distribution shifts with formal statistical guarantees.
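
A minimal sketch of the switching recipe described here, using the standard split-conformal quantile of calibration nonconformity scores as the trigger; the score function, calibration data, and the two policies are placeholder assumptions, and the paper's construction is more general.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    """Finite-sample-valid (1 - alpha) quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_scores, level, method="higher")

def switching_policy(obs, score_fn, q_hat, nominal_policy, safe_policy):
    """Run the nominal policy unless the observation looks out-of-distribution."""
    return safe_policy(obs) if score_fn(obs) > q_hat else nominal_policy(obs)

# toy usage with made-up components
rng = np.random.default_rng(0)
cal_scores = rng.normal(size=500) ** 2          # placeholder nonconformity scores
q_hat = conformal_quantile(cal_scores, alpha=0.1)
score_fn = lambda obs: float(np.sum(obs ** 2))  # placeholder score
nominal = lambda obs: "go"                      # placeholder policies
safe = lambda obs: "slow_down"
print(q_hat, switching_policy(rng.normal(size=2), score_fn, q_hat, nominal, safe))
```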

Autonomous Driving Conformal Prediction

Interactive Task Planning with Language Models

no code implementations16 Oct 2023 Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik

To tackle this, we propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution through function calling, leveraging pretrained vision models to ground the scene in language.

Language Modeling Language Modelling +3

Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions

no code implementations9 Oct 2023 Jordan Lekeufack, Anastasios N. Angelopoulos, Andrea Bajcsy, Michael I. Jordan, Jitendra Malik

We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions.

Conformal Prediction Motion Planning

General In-Hand Object Rotation with Vision and Touch

no code implementations18 Sep 2023 Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs.

Object

Learning Vision-based Pursuit-Evasion Robot Policies

no code implementations30 Aug 2023 Andrea Bajcsy, Antonio Loquercio, Ashish Kumar, Jitendra Malik

We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy.

Diversity

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

1 code implementation NeurIPS 2023 Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik

We introduce EgoSchema, a very long-form video question-answering dataset, and benchmark to evaluate long video understanding capabilities of modern vision and language systems.

Diagnostic EgoSchema +4

Learning Space-Time Semantic Correspondences

no code implementations16 Jun 2023 Du Tran, Jitendra Malik

We propose a new task of space-time semantic correspondence prediction in videos.

Imitation Learning Semantic correspondence +1

Real-World Humanoid Locomotion with Reinforcement Learning

no code implementations6 Mar 2023 Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets.

reinforcement-learning Reinforcement Learning

Decoupling Human and Camera Motion from Videos in the Wild

1 code implementation CVPR 2023 Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack.

Speculative Decoding with Big Little Decoder

1 code implementation NeurIPS 2023 Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
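
A simplified sketch of the general big-little pattern: the small model drafts a few tokens, the large model checks the whole draft in one pass, and a disagreement triggers a rollback. The toy models, draft length, and exact fallback rule are assumptions; BiLD's actual fallback and rollback policies are confidence-based and differ in detail.

```python
from typing import Callable, List

def big_little_decode(prompt: List[int],
                      little_next: Callable[[List[int]], int],
                      big_verify: Callable[[List[int], List[int]], List[int]],
                      max_new: int = 32,
                      draft_len: int = 4) -> List[int]:
    """Greedy big-little decoding sketch.

    The little model proposes `draft_len` tokens; the big model re-scores the
    whole draft in a single call (this is where the latency saving comes from)
    and the first disagreement triggers a rollback to the big model's token.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        draft, ctx = [], list(seq)
        for _ in range(draft_len):
            t = little_next(ctx)
            draft.append(t)
            ctx.append(t)
        big_tokens = big_verify(seq, draft)          # one batched big-model pass
        for proposed, verified in zip(draft, big_tokens):
            seq.append(verified)                     # big model's token is always kept
            if verified != proposed:                 # mismatch: discard rest of draft
                break
    return seq[:len(prompt) + max_new]

# toy "models" over integer tokens: the little model is an imperfect copy of the big one
big_step = lambda s: (s[-1] + 1) % 10
def big_verify(seq, draft):
    out, ctx = [], list(seq)
    for d in draft:
        out.append(big_step(ctx))
        ctx.append(d)
    return out
little_next = lambda s: 0 if s[-1] == 6 else (s[-1] + 1) % 10   # wrong after a 6
print(big_little_decode([3], little_next, big_verify, max_new=8))  # [3, 4, 5, 6, 7, 8, 9, 0, 1]
```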

Decoder de-en +2

Reversible Vision Transformers

4 code implementations CVPR 2022 Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes.

image-classification Image Classification +3

Multiview Compressive Coding for 3D Reconstruction

1 code implementation CVPR 2023 Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari

We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.

3D Reconstruction Decoder +2

CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

no code implementations5 Jan 2023 Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay

We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.

Object

Navigating to Objects in the Real World

no code implementations2 Dec 2022 Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality.

Navigate Visual Navigation

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

no code implementations29 Nov 2022 Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.

Visual Navigation

Learning to Imitate Object Interactions from Internet Videos

no code implementations23 Nov 2022 Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning.

Object

Learning Visual Locomotion with Cross-Modal Supervision

no code implementations7 Nov 2022 Antonio Loquercio, Ashish Kumar, Jitendra Malik

In this work, we show how to learn a visual walking policy that only uses a monocular RGB camera and proprioception.

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation6 Oct 2022 Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

no code implementations19 Sep 2022 Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime.

Drone Controller Position

Multi-skill Mobile Manipulation for Object Rearrangement

1 code implementation6 Sep 2022 Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks.

Object Object Rearrangement

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

4 code implementations2 Jun 2022 Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Automatic Speech Recognition Automatic Speech Recognition (ASR)

Adapting Rapid Motor Adaptation for Bipedal Robots

no code implementations30 May 2022 Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots.

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation CVPR 2022 Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.

Diversity Open-World Instance Segmentation +1

Masked Visual Pre-training for Motor Control

1 code implementation11 Mar 2022 Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

Robot Manipulation Generalization State Estimation

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

1 code implementation CVPR 2022 Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?'

Navigate ObjectGoal Navigation

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation CVPR 2022 Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #6 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2

Tracking People by Predicting 3D Appearance, Location and Pose

no code implementations CVPR 2022 Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.
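
A generic sketch of scoring predicted tracklet states against per-frame observations with a Gaussian log-likelihood and solving the resulting assignment; the feature space, covariance, and gating threshold are placeholders, whereas the paper's similarity combines predicted 3D appearance, location, and pose.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(pred_states, detections, sigma=1.0, min_logp=-8.0):
    """Match tracklet predictions to per-frame observations.

    pred_states: (N, D) predicted state per tracklet (e.g. location features).
    detections:  (M, D) observed states in the current frame.
    Uses an isotropic Gaussian log-likelihood as the similarity and the
    Hungarian algorithm for the assignment; pairs below `min_logp` are dropped.
    """
    diff = pred_states[:, None, :] - detections[None, :, :]       # (N, M, D)
    logp = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2         # up to a constant
    rows, cols = linear_sum_assignment(-logp)                     # maximize log-likelihood
    return [(r, c) for r, c in zip(rows, cols) if logp[r, c] > min_logp]

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])
dets = np.array([[5.2, 4.9], [0.1, -0.2], [9.0, 9.0]])
print(associate(tracks, dets))   # [(0, 1), (1, 0)]
```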

Coupling Vision and Proprioception for Navigation of Legged Robots

no code implementations CVPR 2022 Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

A safety advisor module adds sensed unexpected obstacles to the occupancy map and environment-determined speed limits to the velocity command generator.

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

9 code implementations CVPR 2022 Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

 Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +6

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Deep Learning Self-Supervised Learning +1

Tracking People with 3D Representations

1 code implementation NeurIPS 2021 Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.

3D geometry

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

no code implementations25 Oct 2021 Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.

Ego4D: Around the World in 3,000 Hours of Egocentric Video

8 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

1 code implementation CVPR 2022 Shubham Goel, Georgia Gkioxari, Jitendra Malik

We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.

Active 3D Shape Reconstruction from Vision and Touch

2 code implementations NeurIPS 2021 Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.

3D Reconstruction 3D Shape Reconstruction

RMA: Rapid Motor Adaptation for Legged Robots

1 code implementation8 Jul 2021 Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear.

Sand

Multiscale Vision Transformers

8 code implementations ICCV 2021 Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +3

Distribution-Free, Risk-Controlling Prediction Sets

3 code implementations7 Jan 2021 Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making.

BIG-bench Machine Learning Classification +11

Human Mesh Recovery from Multiple Shots

1 code implementation CVPR 2022 Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

The tools we develop open the door to processing and analyzing in 3D content from a large library of edited media, which could be helpful for many downstream applications.

3D Reconstruction Human Mesh Recovery

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

2 code implementations ICCV 2021 Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals, and (b) sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness in decisions.

Prediction Trajectory Forecasting

Better Knowledge Retention through Metric Learning

no code implementations26 Nov 2020 Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik

In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories.

Continual Learning Metric Learning

Shape, Illumination, and Reflectance from Shading

no code implementations7 Oct 2020 Jonathan T. Barron, Jitendra Malik

A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world.

Color Constancy

Uncertainty Sets for Image Classifiers using Conformal Prediction

5 code implementations ICLR 2021 Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.
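
For context, a minimal split-conformal recipe with the simple score 1 - p(true class); the calibration data and classifier below are toys, and the paper's procedure builds adaptive, regularized prediction sets on top of this basic idea.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with the score s = 1 - p_model(true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q_hat):
    """All classes whose score 1 - p is below the calibrated threshold."""
    return np.where(1.0 - probs <= q_hat)[0]

# toy usage with a fake 4-class "classifier"
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=200)
cal_labels = np.array([np.argmax(p) for p in cal_probs])     # placeholder labels
q_hat = calibrate_threshold(cal_probs, cal_labels, alpha=0.1)
print(q_hat, prediction_set(rng.dirichlet(np.ones(4)), q_hat))
```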

Conformal Prediction Prediction +1

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

2 code implementations ECCV 2020 Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.

3D Human Pose Estimation 3D Human Reconstruction +5

Shape and Viewpoint without Keypoints

no code implementations ECCV 2020 Shubham Goel, Angjoo Kanazawa, Jitendra Malik

We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.

3D Shape Reconstruction from Vision and Touch

1 code implementation NeurIPS 2020 Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

When a toddler is presented a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.

3D Shape Reconstruction

Deep Isometric Learning for Visual Recognition

1 code implementation ICML 2020 Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

Inclusive GAN: Improving Data and Minority Coverage in Generative Models

1 code implementation ECCV 2020 Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz

Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.

State-Only Imitation Learning for Dexterous Manipulation

no code implementations7 Apr 2020 Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.
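
A small sketch of the recipe in this sentence: fit an inverse dynamics model a_t ≈ f(s_t, s_{t+1}) on the agent's own transitions, then use it to annotate state-only demonstrations with pseudo-actions. The network, data, and training loop below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    """Predict the action that moved the system from s_t to s_{t+1}."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))

# placeholder interaction data (s_t, a_t, s_{t+1}) collected by the agent itself
state_dim, action_dim = 8, 3
s = torch.randn(1024, state_dim)
a = torch.randn(1024, action_dim)
s_next = torch.randn(1024, state_dim)

model = InverseDynamics(state_dim, action_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                         # short toy training loop
    loss = nn.functional.mse_loss(model(s, s_next), a)
    opt.zero_grad(); loss.backward(); opt.step()

# label a state-only demonstration with pseudo-actions
demo_states = torch.randn(50, state_dim)
with torch.no_grad():
    pseudo_actions = model(demo_states[:-1], demo_states[1:])
print(pseudo_actions.shape)                  # torch.Size([49, 3])
```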

Imitation Learning

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

2 code implementations ECCV 2020 Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.

Imitation Learning Incremental Learning +4

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation23 Dec 2019 Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)?

Navigate reinforcement-learning +3

Approximate Feature Collisions in Neural Nets

1 code implementation NeurIPS 2019 Ke Li, Tianhao Zhang, Jitendra Malik

Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude.

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

1 code implementation ICCV 2019 Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)
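
A bare-bones sketch of the building, room, and object hierarchy with attribute dictionaries that this sentence describes; the field names and the example query are illustrative assumptions, and the released 3D Scene Graph format carries many more attributes and relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Any

@dataclass
class ObjectNode:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)   # class, material, ...

@dataclass
class RoomNode:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)   # scene category, volume, ...
    objects: List[ObjectNode] = field(default_factory=list)

@dataclass
class BuildingGraph:
    name: str
    rooms: List[RoomNode] = field(default_factory=list)

    def find_objects(self, cls: str) -> List[str]:
        """Example query: all objects of a given class, with their room."""
        return [f"{o.name} in {r.name}" for r in self.rooms
                for o in r.objects if o.attributes.get("class") == cls]

# toy instance
house = BuildingGraph("toy_building", rooms=[
    RoomNode("kitchen", {"scene_category": "kitchen", "volume_m3": 40.0},
             [ObjectNode("chair_1", {"class": "chair", "material": "wood"})]),
    RoomNode("living_room", {"scene_category": "living room", "volume_m3": 65.0},
             [ObjectNode("chair_2", {"class": "chair", "material": "fabric"})]),
])
print(house.find_objects("chair"))
```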

Predicting 3D Human Dynamics from Video

1 code implementation ICCV 2019 Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik

In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.

3D Human Dynamics 3D Human Pose Estimation +3

Mesh R-CNN

7 code implementations ICCV 2019 Georgia Gkioxari, Jitendra Malik, Justin Johnson

We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.

3D Shape Modeling Prediction

Learning Navigation Subroutines from Egocentric Videos

no code implementations29 May 2019 Ashish Kumar, Saurabh Gupta, Jitendra Malik

We demonstrate our proposed approach in context of navigation, and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos.

Computational Efficiency Pseudo Label +1

Accelerated Sparse Recovery Under Structured Measurements

no code implementations ICLR 2019 Ke Li, Jitendra Malik

Extensive work on compressed sensing has yielded a rich collection of sparse recovery algorithms, each making different tradeoffs between recovery condition and computational efficiency.

compressed sensing Computational Efficiency

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation ICCV 2019 Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Object +1

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

no code implementations6 Mar 2019 Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories.

Robot Navigation Visual Navigation

Trajectory Normalized Gradients for Distributed Optimization

no code implementations24 Jan 2019 Jianqiao Wangni, Ke Li, Jianbo Shi, Jitendra Malik

Recently, researchers have proposed various low-precision gradient compression schemes for efficient communication in large-scale distributed optimization.

Benchmarking Distributed Optimization

Learning Independent Object Motion from Unlabelled Stereoscopic Videos

no code implementations CVPR 2019 Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik

Unlike prior learning based work which has focused on predicting dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object instance specific 3D scene flow maps and instance masks from which we are able to derive the motion direction and speed for each object instance.

3D geometry Object +1

Learning 3D Human Dynamics from Video

1 code implementation CVPR 2019 Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.

Ranked #17 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric)

3D Human Dynamics 3D Human Pose Estimation

Visual Memory for Robust Path Following

no code implementations NeurIPS 2018 Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environment.

On the Implicit Assumptions of GANs

no code implementations29 Nov 2018 Ke Li, Jitendra Malik

Generative adversarial nets (GANs) have generated a lot of excitement.

Diverse Image Synthesis from Semantic Layouts via Conditional IMLE

1 code implementation ICCV 2019 Ke Li, Tianhao Zhang, Jitendra Malik

Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images.

Image Generation Semantic Segmentation

SFV: Reinforcement Learning of Physical Skills from Videos

1 code implementation8 Oct 2018 Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).

Deep Reinforcement Learning Pose Estimation +2

Super-Resolution via Conditional Implicit Maximum Likelihood Estimation

no code implementations2 Oct 2018 Ke Li, Shichong Peng, Jitendra Malik

Single-image super-resolution (SISR) is a canonical problem with diverse applications.

Image Super-Resolution

A Study of Robustness of Neural Nets Using Approximate Feature Collisions

no code implementations27 Sep 2018 Ke Li*, Tianhao Zhang*, Jitendra Malik

In recent years, various studies have focused on the robustness of neural nets.

Implicit Maximum Likelihood Estimation

1 code implementation ICLR 2019 Ke Li, Jitendra Malik

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly.
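
A tiny sketch of the IMLE training idea as it is usually summarized: for every data point, draw several samples from the implicit model and pull the nearest one toward that point, which sidesteps evaluating an explicit likelihood. The generator, latent dimension, and distance below are placeholder choices.

```python
import torch
import torch.nn as nn

# implicit model: samples are produced by pushing latent noise through a network
gen = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

data = torch.randn(256, 2) * 0.5 + torch.tensor([2.0, -1.0])    # toy 2-D dataset
m = 16                                                          # samples per data point

for step in range(200):
    z = torch.randn(data.shape[0], m, 4)                        # m latent draws per example
    samples = gen(z)                                            # (N, m, 2)
    dists = torch.cdist(data.unsqueeze(1), samples).squeeze(1)  # (N, m) distances
    nearest = dists.min(dim=1).values                           # nearest sample per data point
    loss = (nearest ** 2).mean()                                # pull the nearest sample closer
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    print(gen(torch.randn(5, 4)))   # samples should drift toward the data mode
```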

Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection

no code implementations8 Sep 2018 Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets.

Active Learning Computed Tomography (CT)

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations CVPR 2018 Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control is cumbersome to do in the physical world, as existing algorithms are too slow to efficiently learn in real-time and robots are fragile and costly.

Domain Adaptation General Reinforcement Learning +1

Learning Instance Segmentation by Interaction

1 code implementation21 Jun 2018 Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Segmentation +1

PatchFCN for Intracranial Hemorrhage Detection

no code implementations8 Jun 2018 Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans.

Computed Tomography (CT) Diversity +4

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations28 May 2018 Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.
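
A generic sketch of the "predict the outcome of a candidate adjustment, keep the most promising one, repeat" loop; the outcome model, action parameterization, and sampling scheme are placeholders, whereas the real system scores visuotactile inputs with a deep multimodal network.

```python
import numpy as np

def regrasp_loop(initial_action, outcome_model, n_candidates=64, n_rounds=5, noise=0.05, rng=None):
    """Iteratively refine a grasp adjustment by sampling candidates and
    keeping the one the learned model scores as most likely to succeed."""
    rng = rng or np.random.default_rng(0)
    action = np.asarray(initial_action, dtype=np.float64)
    for _ in range(n_rounds):
        candidates = action + noise * rng.standard_normal((n_candidates, action.shape[0]))
        scores = np.array([outcome_model(c) for c in candidates])   # predicted P(grasp success)
        action = candidates[np.argmax(scores)]
    return action

# placeholder "success predictor": prefers adjustments near a hidden optimum
optimum = np.array([0.3, -0.1, 0.05])
outcome_model = lambda a: float(np.exp(-np.sum((a - optimum) ** 2)))
print(regrasp_loop(np.zeros(3), outcome_model))
```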

Robotic Grasping

Learning Category-Specific Mesh Reconstruction from Image Collections

no code implementations ECCV 2018 Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.

Prediction

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

no code implementations CVPR 2018 Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

We present a framework for learning single-view shape and pose prediction without using direct supervision for either.

Pose Prediction

Unifying Map and Landmark Based Representations for Visual Navigation

no code implementations21 Dec 2017 Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

This work presents a formulation for visual navigation that unifies map-based spatial reasoning and path planning with landmark-based robust plan execution in noisy environments.

Navigate Spatial Reasoning +1

End-to-end Recovery of Human Shape and Pose

10 code implementations CVPR 2018 Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik

The main objective is to minimize the reprojection loss of keypoints, which allow our model to be trained using images in-the-wild that only have ground truth 2D annotations.
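
A minimal sketch of a keypoint reprojection objective with a weak-perspective camera, which is the standard way this loss is written; the scale/translation camera, joint count, and toy numbers are assumptions, and the full method additionally uses an adversarial prior on the body-model parameters.

```python
import numpy as np

def weak_perspective_project(joints_3d, scale, trans):
    """Project 3D joints (J, 3) to the image plane: x_2d = s * x_xy + t."""
    return scale * joints_3d[:, :2] + trans

def reprojection_loss(joints_3d, keypoints_2d, visibility, scale, trans):
    """L1 reprojection error over visible ground-truth 2D keypoints."""
    pred_2d = weak_perspective_project(joints_3d, scale, trans)
    err = np.abs(pred_2d - keypoints_2d).sum(axis=1)        # per-joint L1
    return float((visibility * err).sum() / max(visibility.sum(), 1))

# toy example: 4 joints, one of them unannotated
joints_3d = np.array([[0.0, 0.0, 2.0], [0.1, 0.2, 2.0], [-0.1, 0.3, 2.1], [0.0, -0.2, 2.2]])
kp_2d = np.array([[128.0, 128.0], [150.0, 170.0], [110.0, 190.0], [128.0, 90.0]])
vis = np.array([1.0, 1.0, 1.0, 0.0])
print(reprojection_loss(joints_3d, kp_2d, vis, scale=200.0, trans=np.array([128.0, 128.0])))
```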

3D Hand Pose Estimation 3D Human Shape Estimation +5

From Lifestyle Vlogs to Everyday Interactions

no code implementations CVPR 2018 David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data.

Diversity Future prediction

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

no code implementations CVPR 2018 Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.

Generic 3D Representation via Pose Estimation and Matching

1 code implementation 23 Oct 2017 Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited.

Camera Pose Estimation Object +2

What Will Happen Next? Forecasting Player Moves in Sports Videos

no code implementations ICCV 2017 Panna Felsen, Pulkit Agrawal, Jitendra Malik

A large number of very popular team sports involve the act of one team trying to score a goal against the other.

Learning a Multi-View Stereo Machine

1 code implementation NeurIPS 2017 Abhishek Kar, Christian Häne, Jitendra Malik

We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

3D geometry 3D Reconstruction

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

9 code implementations CVPR 2018 Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Actin Detection Action Detection +3

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

no code implementations CVPR 2017 Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.

Hierarchical Surface Prediction for 3D Object Reconstruction

1 code implementation3 Apr 2017 Christian Häne, Shubham Tulsiani, Jitendra Malik

A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.

3D geometry 3D Geometry Prediction +3

Learning to Optimize Neural Nets

no code implementations ICLR 2018 Ke Li, Jitendra Malik

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning.

reinforcement-learning Reinforcement Learning +2

Fast k-Nearest Neighbour Search via Prioritized DCI

2 code implementations ICML 2017 Ke Li, Jitendra Malik

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality.

Feedback Networks

1 code implementation CVPR 2017 Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.

Learning Shape Abstractions by Assembling Volumetric Primitives

4 code implementations CVPR 2017 Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.

Learning to Optimize

no code implementations 2016 Ke Li, Jitendra Malik

Algorithm design is a laborious process and often requires many iterations of ideation and validation.

reinforcement-learning Reinforcement Learning +1

View Synthesis by Appearance Flow

4 code implementations11 May 2016 Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.

Novel View Synthesis

Amodal Instance Segmentation

no code implementations27 Apr 2016 Ke Li, Jitendra Malik

We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object.

Amodal Instance Segmentation Segmentation +1

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

1 code implementation1 Dec 2015 Ke Li, Jitendra Malik

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality.

Iterative Instance Segmentation

no code implementations CVPR 2016 Ke Li, Bharath Hariharan, Jitendra Malik

Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible.

Instance Segmentation Prediction +3

Shape and Symmetry Induction for 3D Objects

no code implementations24 Nov 2015 Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik

Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.

General Classification Object

Learning Visual Predictive Models of Physics for Playing Billiards

no code implementations23 Nov 2015 Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik

The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.

Amodal Completion and Size Constancy in Natural Scenes

no code implementations ICCV 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.

Object object-detection +3

Bandit Label Inference for Weakly Supervised Learning

no code implementations22 Sep 2015 Ke Li, Jitendra Malik

The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.

Weakly-supervised Learning

Recurrent Network Models for Human Dynamics

no code implementations ICCV 2015 Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.

Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Decoder Human Dynamics +3

Human Pose Estimation with Iterative Error Feedback

1 code implementation CVPR 2016 Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.

Pose Estimation Semantic Segmentation

Cross Modal Distillation for Supervision Transfer

1 code implementation CVPR 2016 Saurabh Gupta, Judy Hoffman, Jitendra Malik

In this work we propose a technique that transfers supervision between images from different modalities.

Optical Flow Estimation

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations CVPR 2015 Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Visual Semantic Role Labeling

2 code implementations17 May 2015 Saurabh Gupta, Jitendra Malik

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction.

16k Action Classification +3

DeepBox: Learning Objectness with Convolutional Networks

1 code implementation ICCV 2015 Wei-cheng Kuo, Bharath Hariharan, Jitendra Malik

Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high level construct.

Learning to See by Moving

no code implementations ICCV 2015 Pulkit Agrawal, Joao Carreira, Jitendra Malik

We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.

Object Recognition Scene Recognition +1

Contextual Action Recognition with R*CNN

2 code implementations ICCV 2015 Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Action Recognition Attribute +3

Pose Induction for Novel Object Categories

1 code implementation ICCV 2015 Shubham Tulsiani, João Carreira, Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.

Object

Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation

1 code implementation3 Mar 2015 Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG).

Image Segmentation Object +2

Inferring 3D Object Pose in RGB-D Images

no code implementations16 Feb 2015 Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Object

Learning to Segment Moving Objects in Videos

no code implementations CVPR 2015 Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.

Segmentation Video Segmentation +1

Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction

no code implementations NeurIPS 2014 Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik

Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.

3D Reconstruction Clustering +5

Viewpoints and Keypoints

no code implementations CVPR 2015 Shubham Tulsiani, Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.

Keypoint Detection
