Search Results for author: Jitendra Malik

Found 189 papers, 94 papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

8 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Ranked #6 on Action Detection on UCF101-24

Actin Detection Action Detection +3

76,579

Cognitive Mapping and Planning for Visual Navigation

6 code implementations • CVPR 2017 • Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

The accumulated belief of the world enables the agent to track visited regions of the environment.

Visual Navigation

76,579

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

7 code implementations • CVPR 2022 • Yanghao Li, Chao-yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

Ranked #1 on Action Classification on Kinetics-600 (GFLOPs metric)

Action Classification Action Recognition +6

29,671

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

4 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Ranked #30 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR)

10,005

Mesh R-CNN

6 code implementations • ICCV 2019 • Georgia Gkioxari, Jitendra Malik, Justin Johnson

We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.

Ranked #1 on 3D Shape Modeling on Pix3D S2

3D Shape Modeling

8,265

SlowFast Networks for Video Recognition

15 code implementations • ICCV 2019 • Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He

We present SlowFast networks for video recognition.

Ranked #4 on Action Recognition on AVA v2.1

Action Classification Action Detection +3

6,264

Audiovisual SlowFast Networks for Video Recognition

3 code implementations • 23 Jan 2020 • Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer

We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.

Action Classification Video Recognition

6,264

Multiscale Vision Transformers

7 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Ranked #14 on Action Classification on Charades

Action Classification Action Recognition +2

6,264

Reversible Vision Transformers

4 code implementations • CVPR 2022 • Karttikeya Mangalam, Haoqi Fan, Yanghao Li, Chao-yuan Wu, Bo Xiong, Christoph Feichtenhofer, Jitendra Malik

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes.

Image Classification object-detection +2

6,264

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation • ICCV 2019 • Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Object +1

5,176

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

4,978

End-to-end Recovery of Human Shape and Pose

9 code implementations • CVPR 2018 • Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik

The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using images in-the-wild that only have ground truth 2D annotations.

Ranked #1 on Weakly-supervised 3D Human Pose Estimation on Human3.6M (3D Annotations metric)

3D Hand Pose Estimation 3D Human Shape Estimation +4

4,966

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Self-Supervised Learning Video Understanding

3,176

Habitat: A Platform for Embodied AI Research

13 code implementations • ICCV 2019 • Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

We present Habitat, a platform for research in embodied artificial intelligence (AI).

Ranked #2 on PointGoal Navigation on Gibson PointGoal Navigation

Benchmarking Instruction Following +2

2,350

Habitat 2.0: Training Home Assistants to Rearrange their Habitat

6 code implementations • NeurIPS 2021 • Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios.

Reinforcement Learning (RL) Single Particle Analysis

2,350

Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

3 code implementations • 19 Oct 2023 • Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments.

Social Navigation

2,350

Rich feature hierarchies for accurate object detection and semantic segmentation

29 code implementations • CVPR 2014 • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.

Ranked #27 on Object Detection on PASCAL VOC 2007 (using extra training data)

Object Detection Semantic Segmentation

2,339

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh Lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas

We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database.

3D Part Segmentation 3D Reconstruction +1

1,989

On Evaluation of Embodied Navigation Agents

9 code implementations • 18 Jul 2018 • Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.

Benchmarking

1,698

Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation • 1 Dec 2023 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.

1,586

Distribution-Free, Risk-Controlling Prediction Sets

3 code implementations • 7 Jan 2021 • Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making.

BIG-bench Machine Learning Classification +9

1,149

Humans in 4D: Reconstructing and Tracking Humans with Transformers

1 code implementation • ICCV 2023 • Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik

To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.

Ranked #3 on Pose Tracking on PoseTrack2018

3D Human Pose Estimation Action Recognition +2

1,018

Taskonomy: Disentangling Task Transfer Learning

1 code implementation • CVPR 2018 • Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

The product is a computational taxonomic map for task transfer learning.

Multi-Task Learning

829

Gibson Env: Real-World Perception for Embodied Agents

5 code implementations • CVPR 2018 • Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

Developing visual perception models for active agents and sensorimotor control is cumbersome in the physical world, as existing algorithms are too slow to learn efficiently in real time and robots are fragile and costly.

Domain Adaptation General Reinforcement Learning +1

823

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

2 code implementations • 1 Jun 2023 • Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

Ranked #1 on Image Classification on iNaturalist 2019 (using extra training data)

Action Classification Action Recognition In Videos +4

691

Learning 3D Human Dynamics from Video

1 code implementation • CVPR 2019 • Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.

Ranked #15 on 3D Human Pose Estimation on 3DPW (Acceleration Error metric)

3D Human Dynamics 3D Human Pose Estimation

625

Multiview Compressive Coding for 3D Reconstruction

1 code implementation • CVPR 2023 • Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari

We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.

Ranked #2 on Single-View 3D Reconstruction on Common Objects in 3D

3D Reconstruction Self-Supervised Learning +1

602

View Synthesis by Appearance Flow

4 code implementations • 11 May 2016 • Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.

Novel View Synthesis

557

Generic 3D Representation via Pose Estimation and Matching

1 code implementation • 23 Oct 2017 • Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited.

Object Pose Estimation +1

423

Decoupling Human and Camera Motion from Videos in the Wild

1 code implementation • CVPR 2023 • Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack.

421

Learning Individual Styles of Conversational Gesture

2 code implementations • CVPR 2019 • Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik

Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion.

Ranked #4 on Gesture Generation on BEAT

Gesture Generation Speech-to-Gesture Translation +1

356

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

1 code implementation • ICCV 2021 • Ainaz Eftekhar, Alexander Sax, Roman Bachmann, Jitendra Malik, Amir Zamir

This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.

Depth Estimation Surface Normal Estimation

353

It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

3 code implementations • ECCV 2020 • Karttikeya Mangalam, Harshayu Girase, Shreyas Agarwal, Kuan-Hui Lee, Ehsan Adeli, Jitendra Malik, Adrien Gaidon

In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.

Ranked #1 on Multi Future Trajectory Prediction on ETH/UCY

Autonomous Navigation Multi-future Trajectory Prediction +3

325

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

2 code implementations • ICCV 2021 • Karttikeya Mangalam, Yang An, Harshayu Girase, Jitendra Malik

Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent and the model, such as the intent of other agents and irreducible randomness in decisions.

Ranked #3 on Trajectory Prediction on ETH/UCY

Trajectory Forecasting

325

Learning to Learn with Generative Models of Neural Network Checkpoints

1 code implementation • 26 Sep 2022 • William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

We explore a data-driven approach for learning to optimize neural networks.

322

SFV: Reinforcement Learning of Physical Skills from Videos

1 code implementation • 8 Oct 2018 • Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).

Pose Estimation reinforcement-learning +1

305

Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation

1 code implementation • 3 Mar 2015 • Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object proposal generation for recognition, called Multiscale Combinatorial Grouping (MCG).

Image Segmentation Object +2

304

Learning a Multi-View Stereo Machine

1 code implementation • NeurIPS 2017 • Abhishek Kar, Christian Häne, Jitendra Malik

We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

3D Reconstruction

249

Long-term Human Motion Prediction with Scene Context

1 code implementation • ECCV 2020 • Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik

Human movement is goal-directed and influenced by the spatial layout of the objects in the scene.

Human motion prediction motion prediction +1

240

On the Benefits of 3D Pose and Tracking for Human Action Recognition

1 code implementation • CVPR 2023 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik

Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets.

Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)

Action Recognition Temporal Action Localization

218

Uncertainty Sets for Image Classifiers using Conformal Prediction

5 code implementations • ICLR 2021 • Anastasios Angelopoulos, Stephen Bates, Jitendra Malik, Michael I. Jordan

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings.

Conformal Prediction Uncertainty Quantification

210

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

1 code implementation • ICCV 2019 • Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.)

208

Zero-Shot Visual Imitation

1 code implementation • ICLR 2018 • Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell

In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference.

Imitation Learning

203

Masked Visual Pre-training for Motor Control

1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

191

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

191

Robust Learning Through Cross-Task Consistency

1 code implementation • 7 Jun 2020 • Amir Zamir, Alexander Sax, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas

Visual perception entails solving a wide set of tasks, e.g., object detection, depth estimation, etc.

3D Reconstruction Depth Estimation +4

176

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

1 code implementation • ECCV 2020 • Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.

Ranked #3 on 3D Object Reconstruction on BEHAVE

3D Human Pose Estimation 3D Human Reconstruction +5

173

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

1 code implementation • 22 Jul 2014 • Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features.

Ranked #6 on Object Detection In Indoor Scenes on SUN RGB-D

Instance Segmentation Object +3

164

Hypercolumns for Object Segmentation and Fine-grained Localization

6 code implementations • CVPR 2015 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation.

Object Semantic Segmentation

156

Learning Shape Abstractions by Assembling Volumetric Primitives

4 code implementations • CVPR 2017 • Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.

147

Visual Semantic Role Labeling

1 code implementation • 17 May 2015 • Saurabh Gupta, Jitendra Malik

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction.

16k Action Classification +3

144

Deep Isometric Learning for Visual Recognition

1 code implementation • ICML 2020 • Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

142

Multimodal Image Synthesis with Conditional Implicit Maximum Likelihood Estimation

2 code implementations • 7 Apr 2020 • Ke Li, Shichong Peng, Tianhao Zhang, Jitendra Malik

Many tasks in computer vision and graphics fall within the framework of conditional image synthesis.

Image Generation Image Super-Resolution

139

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

1 code implementation • CVPR 2022 • Chao-yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)

Action Anticipation Action Classification +2

135

Contextual Action Recognition with R*CNN

2 code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik

In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.

Ranked #4 on Weakly Supervised Object Detection on HICO-DET

Action Recognition Attribute +3

132

DeepBox: Learning Objectness with Convolutional Networks

1 code implementation • ICCV 2015 • Wei-cheng Kuo, Bharath Hariharan, Jitendra Malik

Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high level construct.

128

Deformable Part Models are Convolutional Neural Networks

1 code implementation • CVPR 2015 • Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition.

Ranked #28 on Object Detection on PASCAL VOC 2007

Object Detection Rolling Shutter Correction

128

Beyond Skip Connections: Top-Down Modulation for Object Detection

1 code implementation • 20 Dec 2016 • Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, Abhinav Gupta

But most of these fine details are lost in the early convolutional layers.

Ranked #214 on Object Detection on COCO test-dev

Object object-detection +1

111

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.

Open-World Instance Segmentation Semantic Segmentation

111

Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

1 code implementation • ICLR 2021 • Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik

Learning long-term dynamics models is the key to understanding physical common sense.

Ranked #1 on Visual Reasoning on PHYRE-1B-Within

Common Sense Reasoning Object +2

110

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

1 code implementation • 31 Dec 2018 • Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images.

Object Detection

107

Learning to Navigate Using Mid-Level Visual Priors

1 code implementation • 23 Dec 2019 • Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)?

Navigate reinforcement-learning +2

107

Cross Modal Distillation for Supervision Transfer

1 code implementation • CVPR 2016 • Saurabh Gupta, Judy Hoffman, Jitendra Malik

In this work we propose a technique that transfers supervision between images from different modalities.

Optical Flow Estimation

Feedback Networks

1 code implementation • CVPR 2017 • Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, Silvio Savarese

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer.

Which Tasks Should Be Learned Together in Multi-task Learning?

1 code implementation • ICML 2020 • Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

Many computer vision applications require solving multiple tasks in real-time.

Multi-Task Learning

In-Hand Object Rotation via Rapid Motor Adaptation

1 code implementation • 10 Oct 2022 • Haozhi Qi, Ashish Kumar, Roberto Calandra, Yi Ma, Jitendra Malik

Generalized in-hand manipulation has long been an unsolved challenge of robotics.

Object Reinforcement Learning (RL)

Predicting 3D Human Dynamics from Video

1 code implementation • ICCV 2019 • Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik

In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.

3D Human Dynamics 3D Human Pose Estimation +2

Tracking People with 3D Representations

1 code implementation • NeurIPS 2021 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.

Speculative Decoding with Big Little Decoder

1 code implementation • NeurIPS 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.

Machine Translation Text Generation

Learning Instance Segmentation by Interaction

1 code implementation • 21 Jun 2018 • Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Segmentation +1

Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

2 code implementations • 10 Feb 2022 • Anastasios N Angelopoulos, Amit P Kohli, Stephen Bates, Michael I Jordan, Jitendra Malik, Thayer Alshaabi, Srigokul Upadhyayula, Yaniv Romano

Image-to-image regression is an important learning task, used frequently in biological imaging.

Conformal Prediction Prediction Intervals +3

Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

1 code implementation • CVPR 2022 • Shubham Goel, Georgia Gkioxari, Jitendra Malik

We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.

Learning to Poke by Poking: Experiential Learning of Intuitive Physics

1 code implementation • NeurIPS 2016 • Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine

We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics.

Decision Making

Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

2 code implementations • ECCV 2020 • Jeffrey O. Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights.

Imitation Learning Incremental Learning +3

Paper
Code

3D Shape Reconstruction from Vision and Touch

1 code implementation • NeurIPS 2020 • Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

When a toddler is presented with a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.

3D Shape Reconstruction

Paper
Code

Finding Action Tubes

1 code implementation • CVPR 2015 • Georgia Gkioxari, Jitendra Malik

We address the problem of action detection in videos.

Ranked #4 on Action Detection on UCF Sports

object-detection Object Detection +1

Paper
Code

Human Pose Estimation with Iterative Error Feedback

1 code implementation • CVPR 2016 • Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.

Ranked #43 on Pose Estimation on MPII Human Pose

Pose Estimation Semantic Segmentation

Paper
Code

Diverse Image Synthesis from Semantic Layouts via Conditional IMLE

1 code implementation • ICCV 2019 • Ke Li, Tianhao Zhang, Jitendra Malik

Most existing methods for conditional image synthesis are only able to generate a single plausible image for any given input, or at best a fixed number of plausible images.

Image Generation Semantic Segmentation

Paper
Code

xT: Nested Tokenization for Larger Context in Large Images

1 code implementation • 4 Mar 2024 • Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam

Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping.

Paper
Code

Fast k-Nearest Neighbour Search via Prioritized DCI

2 code implementations • ICML 2017 • Ke Li, Jitendra Malik

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality.

Paper
Code

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

1 code implementation • 1 Dec 2015 • Ke Li, Jitendra Malik

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality.

Paper
Code

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

1 code implementation • NeurIPS 2023 • Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark to evaluate the long video understanding capabilities of modern vision and language systems.

Multiple-choice Question Answering +2

Paper
Code

ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

1 code implementation • CVPR 2022 • Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik

ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically-based materials that correspond to real, household objects.

3D Reconstruction Metric Learning +3

Paper
Code

MAViL: Masked Audio-Video Learners

1 code implementation • NeurIPS 2023 • Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer

We present Masked Audio-Video Learners (MAViL) to train audio-visual representations.

Contrastive Learning Retrieval

Paper
Code

Implicit Maximum Likelihood Estimation

1 code implementation • ICLR 2019 • Ke Li, Jitendra Malik

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly.

Paper
Code

Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors

1 code implementation • CVPR 2019 • Yedid Hoshen, Jitendra Malik

GLANN combines the strengths of IMLE and GLO in a way that overcomes the main drawbacks of each method.

Image Generation Unconditional Image Generation

Paper
Code

Inclusive GAN: Improving Data and Minority Coverage in Generative Models

1 code implementation • ECCV 2020 • Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz

Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.

Paper
Code

Active 3D Shape Reconstruction from Vision and Touch

2 code implementations • NeurIPS 2021 • Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.

3D Reconstruction 3D Shape Reconstruction

Paper
Code

Hierarchical Surface Prediction for 3D Object Reconstruction

1 code implementation • 3 Apr 2017 • Christian Häne, Shubham Tulsiani, Jitendra Malik

A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.

3D Geometry Prediction 3D Object Reconstruction +1

Paper
Code

Pose Induction for Novel Object Categories

1 code implementation • ICCV 2015 • Shubham Tulsiani, João Carreira, Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.

Object

Paper
Code

RMA: Rapid Motor Adaptation for Legged Robots

1 code implementation • 8 Jul 2021 • Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots would require them to adapt in real time to unseen scenarios like changing terrains, changing payloads, and wear and tear.

Paper
Code

Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients, one on each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.

object-detection Small Object Detection +1

Paper
Code

Approximate Feature Collisions in Neural Nets

1 code implementation • NeurIPS 2019 • Ke Li, Tianhao Zhang, Jitendra Malik

Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude.

Paper
Code

AutoEval Done Right: Using Synthetic Data for Model Evaluation

1 code implementation • 9 Mar 2024 • Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming.

Paper
Code

PatchFCN for Intracranial Hemorrhage Detection

no code implementations • 8 Jun 2018 • Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

This paper studies the problem of detecting and segmenting acute intracranial hemorrhage on head computed tomography (CT) scans.

Computed Tomography (CT) object-detection +3

Paper
Add Code

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations • 28 May 2018 • Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.

Robotic Grasping

Paper
Add Code

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

no code implementations • CVPR 2018 • Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.

Paper
Add Code

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

no code implementations • CVPR 2018 • Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

We present a framework for learning single-view shape and pose prediction without using direct supervision for either.

Pose Prediction

Paper
Add Code

Learning Category-Specific Mesh Reconstruction from Image Collections

no code implementations • ECCV 2018 • Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.

Paper
Add Code

Unifying Map and Landmark Based Representations for Visual Navigation

no code implementations • 21 Dec 2017 • Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

This work presents a formulation for visual navigation that unifies map-based spatial reasoning and path planning with landmark-based robust plan execution in noisy environments.

Navigate Visual Navigation

Paper
Add Code

From Lifestyle Vlogs to Everyday Interactions

no code implementations • CVPR 2018 • David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data.

Future prediction

Paper
Add Code

Learning to Optimize Neural Nets

no code implementations • ICLR 2018 • Ke Li, Jitendra Malik

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

no code implementations • CVPR 2017 • Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.

Paper
Add Code

Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation

no code implementations • 6 Mar 2017 • Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine

Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.

Self-Supervised Learning

Paper
Add Code

Amodal Instance Segmentation

no code implementations • 27 Apr 2016 • Ke Li, Jitendra Malik

We consider the problem of amodal instance segmentation, the objective of which is to predict the region encompassing both visible and occluded parts of each object.

Amodal Instance Segmentation Segmentation +1

Paper
Add Code

Iterative Instance Segmentation

no code implementations • CVPR 2016 • Ke Li, Bharath Hariharan, Jitendra Malik

Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible.

Instance Segmentation Segmentation +2

Paper
Add Code

Learning to Optimize

no code implementations • 2016 • Ke Li, Jitendra Malik

Algorithm design is a laborious process and often requires many iterations of ideation and validation.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning Visual Predictive Models of Physics for Playing Billiards

no code implementations • 23 Nov 2015 • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik

The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.

Paper
Add Code

Exploring Person Context and Local Scene Context for Object Detection

no code implementations • 25 Nov 2015 • Saurabh Gupta, Bharath Hariharan, Jitendra Malik

In this paper we explore two ways of using context for object detection.

Object object-detection +1

Paper
Add Code

Shape and Symmetry Induction for 3D Objects

no code implementations • 24 Nov 2015 • Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik

Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.

General Classification Object

Paper
Add Code

Amodal Completion and Size Constancy in Natural Scenes

no code implementations • ICCV 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.

Object object-detection +3

Paper
Add Code

Recurrent Network Models for Human Dynamics

no code implementations • ICCV 2015 • Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.

Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Dynamics Human Pose Forecasting +2

Paper
Add Code

Bandit Label Inference for Weakly Supervised Learning

no code implementations • 22 Sep 2015 • Ke Li, Jitendra Malik

The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.

Weakly-supervised Learning

Paper
Add Code

Learning to See by Moving

no code implementations • ICCV 2015 • Pulkit Agrawal, Joao Carreira, Jitendra Malik

We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.

Object Recognition Scene Recognition +1

Paper
Add Code

Learning to Segment Moving Objects in Videos

no code implementations • CVPR 2015 • Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.

Segmentation Video Segmentation +1

Paper
Add Code

Category-Specific Object Reconstruction from a Single Image

no code implementations • CVPR 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.

Object object-detection +2

Paper
Add Code

Actions and Attributes from Wholes and Parts

no code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik

We investigate the importance of parts for the tasks of action and attribute classification.

Attribute Classification +2

Paper
Add Code

Viewpoints and Keypoints

no code implementations • CVPR 2015 • Shubham Tulsiani, Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.

Ranked #3 on Keypoint Detection on Pascal3D+

Keypoint Detection

Paper
Add Code

Inferring 3D Object Pose in RGB-D Images

no code implementations • 16 Feb 2015 • Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library.

Object

Paper
Add Code

Virtual View Networks for Object Reconstruction

no code implementations • CVPR 2015 • João Carreira, Abhishek Kar, Shubham Tulsiani, Jitendra Malik

All that structure from motion algorithms "see" are sets of 2D points.

Object Object Reconstruction +1

Paper
Add Code

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

no code implementations • 7 Jul 2014 • Pulkit Agrawal, Ross Girshick, Jitendra Malik

In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.

Object Recognition

Paper
Add Code

Detecting People in Cubist Art

no code implementations • 22 Sep 2014 • Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik

Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.

Object object-detection +1

Paper
Add Code

Pixels to Voxels: Modeling Visual Representation in the Human Brain

no code implementations • 18 Jul 2014 • Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant

We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images.

BIG-bench Machine Learning Object Recognition

Paper
Add Code

Simultaneous Detection and Segmentation

no code implementations • 7 Jul 2014 • Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Unlike classical semantic segmentation, we require individual object instances.

Ranked #5 on Object Detection on PASCAL VOC 2012

object-detection Object Detection +2

Paper
Add Code

R-CNNs for Pose Estimation and Action Detection

no code implementations • 19 Jun 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.

Action Classification Action Detection +3

Paper
Add Code

Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection

no code implementations • 8 Sep 2018 • Wei-cheng Kuo, Christian Häne, Esther Yuh, Pratik Mukherjee, Jitendra Malik

Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets.

Active Learning Computed Tomography (CT)

Paper
Add Code

Super-Resolution via Conditional Implicit Maximum Likelihood Estimation

no code implementations • 2 Oct 2018 • Ke Li, Shichong Peng, Jitendra Malik

Single-image super-resolution (SISR) is a canonical problem with diverse applications.

Image Super-Resolution

Paper
Add Code

On the Implicit Assumptions of GANs

no code implementations • 29 Nov 2018 • Ke Li, Jitendra Malik

Generative adversarial nets (GANs) have generated a lot of excitement.

Paper
Add Code

Are All Training Examples Created Equal? An Empirical Study

no code implementations • 30 Nov 2018 • Kailas Vodrahalli, Ke Li, Jitendra Malik

Modern computer vision algorithms often rely on very large training datasets.

Active Learning

Paper
Add Code

Visual Memory for Robust Path Following

no code implementations • NeurIPS 2018 • Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environment.

Paper
Add Code

Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction

no code implementations • NeurIPS 2014 • Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik

Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.

3D Reconstruction Clustering +5

Paper
Add Code

Accelerated Sparse Recovery Under Structured Measurements

no code implementations • ICLR 2019 • Ke Li, Jitendra Malik

Extensive work on compressed sensing has yielded a rich collection of sparse recovery algorithms, each making different tradeoffs between recovery condition and computational efficiency.

Computational Efficiency

Paper
Add Code

Learning Independent Object Motion from Unlabelled Stereoscopic Videos

no code implementations • CVPR 2019 • Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik

Unlike prior learning-based work, which has focused on predicting a dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object-instance-specific 3D scene flow maps and instance masks, from which we are able to derive the motion direction and speed for each object instance.

Object Optical Flow Estimation

Paper
Add Code

Trajectory Normalized Gradients for Distributed Optimization

no code implementations • 24 Jan 2019 • Jianqiao Wangni, Ke Li, Jianbo Shi, Jitendra Malik

Recently, researchers have proposed various low-precision gradient compression schemes for efficient communication in large-scale distributed optimization.

Benchmarking Distributed Optimization

Paper
Add Code

Intrinsic Scene Properties from a Single RGB-D Image

no code implementations • CVPR 2013 • Jonathan T. Barron, Jitendra Malik

Our model takes as input a single RGB-D image and produces as output an improved depth map, a set of surface normals, a reflectance image, a shading image, and a spatially varying model of illumination.

Paper
Add Code

Articulated Pose Estimation Using Discriminative Armlet Classifiers

no code implementations • CVPR 2013 • Georgia Gkioxari, Pablo Arbelaez, Lubomir Bourdev, Jitendra Malik

We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image.

Pose Estimation

Paper
Add Code

Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

no code implementations • CVPR 2013 • Saurabh Gupta, Pablo Arbelaez, Jitendra Malik

We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data.

Boundary Detection Contour Detection +6

Paper
Add Code

Multiscale Combinatorial Grouping

no code implementations • CVPR 2014 • Pablo Arbelaez, Jordi Pont-Tuset, Jonathan T. Barron, Ferran Marques, Jitendra Malik

We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG).

Image Segmentation Object +2

Paper
Add Code

Using k-Poselets for Detecting People and Localizing Their Keypoints

no code implementations • CVPR 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

A k-poselet is a deformable part model (DPM) with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground-truth annotations.

Human Detection

Paper
Add Code

Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence

no code implementations • CVPR 2015 • Michael W. Tao, Pratul P. Srinivasan, Jitendra Malik, Szymon Rusinkiewicz, Ravi Ramamoorthi

Using shading information is essential to improve the shape estimation.

Depth Estimation

Paper
Add Code

Aligning 3D Models to RGB-D Images of Cluttered Scenes

no code implementations • CVPR 2015 • Saurabh Gupta, Pablo Arbelaez, Ross Girshick, Jitendra Malik

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library.

Paper
Add Code

What Will Happen Next? Forecasting Player Moves in Sports Videos

no code implementations • ICCV 2017 • Panna Felsen, Pulkit Agrawal, Jitendra Malik

A large number of very popular team sports involve the act of one team trying to score a goal against the other.

Paper
Add Code

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

no code implementations • 6 Mar 2019 • Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories.

Robot Navigation Visual Navigation

Paper
Add Code

Learning Navigation Subroutines from Egocentric Videos

no code implementations • 29 May 2019 • Ashish Kumar, Saurabh Gupta, Jitendra Malik

We demonstrate our proposed approach in the context of navigation, and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos.

Computational Efficiency Pseudo Label

Paper
Add Code

State-Only Imitation Learning for Dexterous Manipulation

no code implementations • 7 Apr 2020 • Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.

Imitation Learning

Paper
Add Code

Shape and Viewpoint without Keypoints

no code implementations • ECCV 2020 • Shubham Goel, Angjoo Kanazawa, Jitendra Malik

We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.

Paper
Add Code

Shape, Illumination, and Reflectance from Shading

no code implementations • 7 Oct 2020 • Jonathan T. Barron, Jitendra Malik

A fundamental problem in computer vision is that of inferring the intrinsic, 3D structure of the world from flat, 2D images of that world.

Color Constancy

Paper
Add Code

Rearrangement: A Challenge for Embodied AI

no code implementations • 3 Nov 2020 • Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su

In the rearrangement task, the goal is to bring a given physical environment into a specified state.

Benchmarking

Paper
Add Code

Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

no code implementations • 13 Nov 2020 • Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto

Vision-based robotics often separates the control loop into one module for perception and a separate module for control.

Paper
Add Code

Better Knowledge Retention through Metric Learning

no code implementations • 26 Nov 2020 • Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik

In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories.

Continual Learning Metric Learning

Paper
Add Code

Human Mesh Recovery from Multiple Shots

no code implementations • CVPR 2022 • Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

The tools we develop open the door to processing and analyzing in 3D content from a large library of edited media, which could be helpful for many downstream applications.

3D Reconstruction Human Mesh Recovery

Paper
Add Code

Reconstructing Hand-Object Interactions in the Wild

no code implementations • ICCV 2021 • Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik

In this work we explore reconstructing hand-object interactions in the wild.

3D Reconstruction Object

Paper
Add Code

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

no code implementations • 25 Oct 2021 • Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.

Paper
Add Code

A Study of Robustness of Neural Nets Using Approximate Feature Collisions

no code implementations • 27 Sep 2018 • Ke Li, Tianhao Zhang, Jitendra Malik

In recent years, various studies have focused on the robustness of neural nets.

Paper
Add Code

Differentiable Spatial Planning using Transformers

no code implementations • 2 Dec 2021 • Devendra Singh Chaplot, Deepak Pathak, Jitendra Malik

We consider the problem of spatial path planning.

Paper
Add Code

SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

no code implementations • NeurIPS 2021 • Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Ruslan Salakhutdinov

The observations gathered by this exploration policy are labelled using 3D consistency and used to improve the perception model.

Active Learning Instance Segmentation +3

Paper
Add Code

Coupling Vision and Proprioception for Navigation of Legged Robots

no code implementations • CVPR 2022 • Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

A safety advisor module adds sensed unexpected obstacles to the occupancy map and environment-determined speed limits to the velocity command generator.

Paper
Add Code

Tracking People by Predicting 3D Appearance, Location & Pose

no code implementations • 8 Dec 2021 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.

Paper
Add Code

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

no code implementations • CVPR 2022 • Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?'

Navigate

Paper
Add Code

Adapting Rapid Motor Adaptation for Bipedal Robots

no code implementations • 30 May 2022 • Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots.

Paper
Add Code

Tracking People by Predicting 3D Appearance, Location and Pose

no code implementations • CVPR 2022 • Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner.

Paper
Add Code

Multi-skill Mobile Manipulation for Object Rearrangement

no code implementations • 6 Sep 2022 • Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks.

Object

Paper
Add Code

Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

no code implementations • 19 Sep 2022 • Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime.

Drone Controller Position

Paper
Add Code

Learning Visual Locomotion with Cross-Modal Supervision

no code implementations • 7 Nov 2022 • Antonio Loquercio, Ashish Kumar, Jitendra Malik

In this work, we show how to learn a visual walking policy that only uses a monocular RGB camera and proprioception.

Paper
Add Code

Legged Locomotion in Challenging Terrains using Egocentric Vision

no code implementations • 14 Nov 2022 • Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak

Animals are capable of precise and agile locomotion using vision.

Paper
Add Code

Learning to Imitate Object Interactions from Internet Videos

no code implementations • 23 Nov 2022 • Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning.

Object

Paper
Add Code

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

no code implementations • 29 Nov 2022 • Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.

Visual Navigation

Paper
Add Code

Navigating to Objects in the Real World

no code implementations • 2 Dec 2022 • Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality.

Navigate Visual Navigation

Paper
Add Code

Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

no code implementations • 20 Dec 2022 • Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein

Are multimodal inputs necessary for grammar induction?

Constituency Parsing

Paper
Add Code

CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

no code implementations • 5 Jan 2023 • Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay

We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.

Object

Paper
Add Code

Real-World Humanoid Locomotion with Reinforcement Learning

no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist the elderly at home, and colonize new planets.

reinforcement-learning

Paper
Add Code

Navigating to Objects Specified by Images

no code implementations • ICCV 2023 • Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.

Navigate Visual Reasoning

Paper
Add Code

Robot Learning with Sensorimotor Pre-training

no code implementations • 16 Jun 2023 • Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik

We present a self-supervised sensorimotor pre-training approach for robotics.

Motion Planning

Paper
Add Code

Learning Space-Time Semantic Correspondences

no code implementations • 16 Jun 2023 • Du Tran, Jitendra Malik

We propose a new task of space-time semantic correspondence prediction in videos.

Imitation Learning Semantic correspondence +1

Paper
Add Code

Learning Vision-based Pursuit-Evasion Robot Policies

no code implementations • 30 Aug 2023 • Andrea Bajcsy, Antonio Loquercio, Ashish Kumar, Jitendra Malik

We find that the quality of the supervision signal for the partially-observable pursuer policy depends on two key factors: the balance of diversity and optimality of the evader's behavior and the strength of the modeling assumptions in the fully-observable policy.

Paper
Add Code

General In-Hand Object Rotation with Vision and Touch

no code implementations • 18 Sep 2023 • Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs.

Object

Paper
Add Code

Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions

no code implementations • 9 Oct 2023 • Jordan Lekeufack, Anastasios N. Angelopoulos, Andrea Bajcsy, Michael I. Jordan, Jitendra Malik

We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions.

Conformal Prediction Motion Planning

Paper
Add Code

What Matters to You? Towards Visual Representation Alignment for Robot Learning

no code implementations • 11 Oct 2023 • Ran Tian, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, Andrea Bajcsy

When operating in service of people, robots need to optimize rewards aligned with end-user preferences.

Zero-shot Generalization

Paper
Add Code

Interactive Task Planning with Language Models

no code implementations • 16 Oct 2023 • Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik

An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution.

Language Modelling Large Language Model +1

Paper
Add Code

Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts

no code implementations • 2 Nov 2023 • Huang Huang, Satvik Sharma, Antonio Loquercio, Anastasios Angelopoulos, Ken Goldberg, Jitendra Malik

The key idea is to design switching policies that take conformal quantiles as input, an approach we define as conformal policy learning, which allows robots to detect distribution shifts with formal statistical guarantees.

Autonomous Driving Conformal Prediction

Paper
Add Code

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Paper
Add Code

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

no code implementations • NeurIPS 2023 • Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average).

Paper
Add Code

Reconstructing Hands in 3D with Transformers

no code implementations • 8 Dec 2023 • Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand reconstruction.

Paper
Add Code

Adaptive Human Trajectory Prediction via Latent Corridors

no code implementations • 11 Dec 2023 • Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik

We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors.

Trajectory Prediction Zero-shot Generalization

Paper
Add Code

Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

no code implementations • 20 Dec 2023 • Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam

Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity.

Benchmarking

Paper
Add Code

Synthesizing Moving People with 3D Control

no code implementations • 19 Jan 2024 • Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in terms of 3D pose and to the input image in terms of visual similarity.

Paper
Add Code

Humanoid Locomotion as Next Token Prediction

no code implementations • 29 Feb 2024 • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.

Humanoid Control

Paper
Add Code

Twisting Lids Off with Two Hands

no code implementations • 4 Mar 2024 • Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik

Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, attributed to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system.

reinforcement-learning

Paper
Add Code

Reconstructing Hand-Held Objects in 3D

no code implementations • 9 Apr 2024 • Jane Wu, Georgios Pavlakos, Georgia Gkioxari, Jitendra Malik

At the same time, two strong anchors emerge in this setting: (1) estimated 3D hands help disambiguate the location and scale of the object, and (2) the set of manipulanda is small relative to all possible objects.

Object Object Reconstruction

Paper
Add Code
