Search Results for author: Deepak Pathak

Found 80 papers, 43 papers with code

Beyond Games: Bringing Exploration to Robots in Real-world

no code implementations ICLR 2019 Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta

But most importantly, we are able to implement an exploration policy on a robot which learns to interact with objects completely from scratch just using data collected via the differentiable exploration module.

Efficient Exploration

Learning General-Purpose Controllers via Locally Communicating Sensorimotor Modules

no code implementations ICML 2020 Wenlong Huang, Igor Mordatch, Deepak Pathak

We observe a wide variety of drastically diverse locomotion styles across morphologies as well as centralized coordination emerging via message passing between decentralized modules purely from the reinforcement learning objective.

Reinforcement Learning (RL)

Evaluating Text-to-Visual Generation with Image-to-Text Generation

2 code implementations 1 Apr 2024 Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations.

Question Answering Text Generation +2

Adaptive Mobile Manipulation for Articulated Objects In the Open World

no code implementations 25 Jan 2024 Haoyu Xiong, Russell Mendonca, Kenneth Shaw, Deepak Pathak

We also develop a low-cost mobile manipulation hardware platform capable of safe and autonomous online adaptation in unstructured environments with a cost of around 20,000 USD.

PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play

no code implementations 7 Dec 2023 Lili Chen, Shikhar Bahl, Deepak Pathak

To make diffusion models more useful for skill learning, we encourage robotic agents to acquire a vocabulary of skills by introducing discrete bottlenecks into the conditional behavior generation process.

Denoising

Dexterous Functional Grasping

no code implementations 5 Dec 2023 Ananye Agarwal, Shagun Uppal, Kenneth Shaw, Deepak Pathak

However, this task requires both a complex understanding of functional affordances as well as precise low-level control.

Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback

1 code implementation 27 Nov 2023 Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki

Our method, Diffusion-TTA, adapts pre-trained discriminative models such as image classifiers, segmenters and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model.

Test-time Adaptation

DEFT: Dexterous Fine-Tuning for Real-World Hand Policies

1 code implementation 30 Oct 2023 Aditya Kannan, Kenneth Shaw, Shikhar Bahl, Pragna Mannam, Deepak Pathak

In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks.

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

1 code implementation 5 Oct 2023 Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki

Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult.

Denoising Image Generation

Extreme Parkour with Legged Robots

no code implementations 25 Sep 2023 Xuxin Cheng, Kexin Shi, Ananye Agarwal, Deepak Pathak

In this paper, we take a similar approach to developing robot parkour on a small low-cost robot with imprecise actuation and a single front-facing depth camera for perception which is low-frequency, jittery, and prone to artifacts.

LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

no code implementations 12 Sep 2023 Kenneth Shaw, Ananye Agarwal, Deepak Pathak

We show that LEAP Hand can be used to perform several manipulation tasks in the real world -- from visual teleoperation to learning from passive video data and sim2real.

Language Models as Black-Box Optimizers for Vision-Language Models

1 code implementation 12 Sep 2023 Shihong Liu, Zhiqiu Lin, Samuel Yu, Ryan Lee, Tiffany Ling, Deepak Pathak, Deva Ramanan

We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search.

Few-Shot Image Classification

Efficient RL via Disentangled Environment and Agent Representations

no code implementations 5 Sep 2023 Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak

Agents that are aware of the separation between themselves and their environments can leverage this understanding to form effective representations of visual input.

Structured World Models from Human Videos

no code implementations 21 Aug 2023 Russell Mendonca, Shikhar Bahl, Deepak Pathak

We propose an approach for robots to efficiently learn manipulation skills using only a handful of real-world interaction trajectories from many different settings.

Revisiting the Role of Language Priors in Vision-Language Models

1 code implementation 2 Jun 2023 Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan

Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image.

Image-text matching Language Modelling +6

Affordances from Human Videos as a Versatile Representation for Robotics

no code implementations CVPR 2023 Shikhar Bahl, Russell Mendonca, Lili Chen, Unnat Jain, Deepak Pathak

Utilizing internet videos of human behavior, we train a visual affordance model that estimates where and how in the scene a human is likely to interact.

Imitation Learning

Your Diffusion Model is Secretly a Zero-Shot Classifier

2 code implementations ICCV 2023 Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models.

Domain Generalization Fine-Grained Image Classification +5

Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion

no code implementations 20 Mar 2023 Xuxin Cheng, Ashish Kumar, Deepak Pathak

Locomotion has seen dramatic progress for walking or running across challenging terrains.

Internet Explorer: Targeted Representation Learning on the Open Web

1 code implementation 27 Feb 2023 Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak

Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets.

Classification Representation Learning +1

ALAN: Autonomously Exploring Robotic Agents in the Real World

no code implementations 13 Feb 2023 Russell Mendonca, Shikhar Bahl, Deepak Pathak

Robotic agents that operate autonomously in the real world need to continuously explore their environment and learn from the data collected, with minimal human supervision.

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

1 code implementation CVPR 2023 Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

By repurposing class names as additional one-shot training samples, we achieve SOTA results with an embarrassingly simple linear classifier for vision-language adaptation.

Audio Classification Few-Shot Learning

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration

no code implementations 8 Dec 2022 Xingyu Liu, Deepak Pathak, Kris M. Kitani

The ability to learn from human demonstration endows robots with the ability to automate various tasks.

VideoDex: Learning Dexterity from Internet Videos

no code implementations 8 Dec 2022 Kenneth Shaw, Shikhar Bahl, Deepak Pathak

We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior.

Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

no code implementations 18 Oct 2022 Zipeng Fu, Xuxin Cheng, Deepak Pathak

The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into that of manipulation and locomotion.

Continual Learning with Evolving Class Ontologies

no code implementations 10 Oct 2022 Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong

LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous ${\tt dog}$).

Class Incremental Learning Image Classification +3

Understanding Collapse in Non-Contrastive Siamese Representation Learning

1 code implementation 29 Sep 2022 Alexander C. Li, Alexei A. Efros, Deepak Pathak

We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size.

Continual Learning Contrastive Learning +1

Human-to-Robot Imitation in the Wild

no code implementations 19 Jul 2022 Shikhar Bahl, Abhinav Gupta, Deepak Pathak

We approach the problem of learning by watching humans in the wild.

Adapting Rapid Motor Adaptation for Bipedal Robots

no code implementations 30 May 2022 Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots.

Topologically-Aware Deformation Fields for Single-View 3D Reconstruction

1 code implementation CVPR 2022 Shivam Duggal, Deepak Pathak

The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner solely from unaligned image collections and their poses without any 3D supervision.

3D Reconstruction Object +1

Test-time Adaptation with Slot-Centric Models

1 code implementation 21 Mar 2022 Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki

In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases.

Image Classification Image Segmentation +7

Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans on Youtube

no code implementations 21 Feb 2022 Aravind Sivakumar, Kenneth Shaw, Deepak Pathak

Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem.

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

1 code implementation 10 Feb 2022 Xingyu Liu, Deepak Pathak, Kris M. Kitani

We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters.

Imitation Learning

The CLEAR Benchmark: Continual LEArning on Real-World Imagery

1 code implementation 17 Jan 2022 Zhiqiu Lin, Jia Shi, Deepak Pathak, Deva Ramanan

The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.

Continual Learning Image Classification +2

Functional Regularization for Reinforcement Learning via Learned Fourier Features

1 code implementation NeurIPS 2021 Alexander C. Li, Deepak Pathak

We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL.

Reinforcement Learning (RL)

Coupling Vision and Proprioception for Navigation of Legged Robots

no code implementations CVPR 2022 Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

A safety advisor module adds sensed unexpected obstacles to the occupancy map and environment-determined speed limits to the velocity command generator.

Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

1 code implementation NeurIPS 2021 Simone Parisi, Victoria Dean, Deepak Pathak, Abhinav Gupta

In this setup, the agent first learns to explore across many environments without any extrinsic goal in a task-agnostic manner.

Object

Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning

no code implementations 4 Nov 2021 Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak

We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size.

Multi-Task Learning Object +2

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

no code implementations 25 Oct 2021 Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots.

Discovering and Achieving Goals via World Models

2 code implementations NeurIPS 2021 Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak

How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision?

Zero-Shot Reward Specification via Grounded Natural Language

no code implementations 29 Sep 2021 Parsa Mahmoudieh, Sayna Ebrahimi, Deepak Pathak, Trevor Darrell

Reward signals in reinforcement learning can be expensive to obtain in many tasks and often require access to direct state.

Reinforcement Learning (RL)

How to Adapt Your Large-Scale Vision-and-Language Model

no code implementations 29 Sep 2021 Konwoo Kim, Michael Laskin, Igor Mordatch, Deepak Pathak

Finally, we provide an empirical analysis and recommend general recipes for efficient transfer learning of vision and language models.

Image Classification Language Modelling +1

Hierarchical Neural Dynamic Policies

no code implementations 12 Jul 2021 Shikhar Bahl, Abhinav Gupta, Deepak Pathak

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input.

RMA: Rapid Motor Adaptation for Legged Robots

1 code implementation 8 Jul 2021 Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear.

Unsupervised Learning of Visual 3D Keypoints for Control

1 code implementation 14 Jun 2021 Boyuan Chen, Pieter Abbeel, Deepak Pathak

Prior works show that structured latent space such as visual keypoints often outperforms unstructured representations for robotic control.

Discovering and Achieving Goals with World Models

no code implementations ICML Workshop URL 2021 Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak

How can an artificial agent learn to solve a wide range of tasks in a complex visual environment in the absence of external supervision?

Auto-Tuned Sim-to-Real Transfer

1 code implementation 15 Apr 2021 Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

1 code implementation 15 Dec 2020 Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran

A majority of methods for video frame interpolation compute bidirectional optical flow between adjacent frames of a video, followed by a suitable warping algorithm to generate the output frames.

Action Recognition Motion Magnification +2

Neural Dynamic Policies for End-to-End Sensorimotor Learning

no code implementations NeurIPS 2020 Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak

We show that NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks for both imitation and reinforcement learning setups.

Imitation Learning reinforcement-learning +1

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

2 code implementations ICML 2020 Wenlong Huang, Igor Mordatch, Deepak Pathak

We observe that a wide variety of drastically diverse locomotion styles across morphologies as well as centralized coordination emerges via message passing between decentralized modules purely from the reinforcement learning objective.

Reinforcement Learning (RL)

Locally Masked Convolution for Autoregressive Models

1 code implementation 22 Jun 2020 Ajay Jain, Pieter Abbeel, Deepak Pathak

For tasks such as image completion, these models are unable to use much of the observed context.

Anomaly Detection Density Estimation +2

Planning to Explore via Self-Supervised World Models

4 code implementations 12 May 2020 Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific and the sample efficiency remains a challenge.

Reinforcement Learning (RL)

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

1 code implementation 6 May 2020 Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.

Sparse Graphical Memory for Robust Planning

1 code implementation NeurIPS 2020 Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning Visual Navigation

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

1 code implementation NeurIPS 2019 Pratyusha Sharma, Deepak Pathak, Abhinav Gupta

We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective.

Imitation Learning

Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills

no code implementations 25 Sep 2019 Parsa Mahmoudieh, Trevor Darrell, Deepak Pathak

Instead of direct manual supervision which is tedious and prone to bias, in this work, our goal is to extract reusable skills from a collection of human demonstrations collected directly for several end-tasks.

Multiple Instance Learning Segmentation

Compositional GAN (Extended Abstract): Learning Image-Conditional Binary Composition

no code implementations ICLR Workshop DeepGenStruct 2019 Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Large-Scale Study of Curiosity-Driven Learning

4 code implementations ICLR 2019 Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent.

Atari Games SNES Games

Compositional GAN: Learning Image-Conditional Binary Composition

1 code implementation 19 Jul 2018 Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene.

Learning Instance Segmentation by Interaction

1 code implementation 21 Jun 2018 Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.

Instance Segmentation Segmentation +1

Learning Features by Watching Objects Move

1 code implementation CVPR 2017 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.

Object Detection +1

Context Encoders: Feature Learning by Inpainting

11 code implementations CVPR 2016 Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s).

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

1 code implementation ICCV 2015 Deepak Pathak, Philipp Krähenbühl, Trevor Darrell

We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e., predicted label distribution) of a CNN.

Image Segmentation Semantic Segmentation +2

Fully Convolutional Multi-Class Multiple Instance Learning

1 code implementation 22 Dec 2014 Deepak Pathak, Evan Shelhamer, Jonathan Long, Trevor Darrell

We propose a novel MIL formulation of multi-class semantic segmentation learning by a fully convolutional network.

Multiple Instance Learning Segmentation +1

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

no code implementations CVPR 2015 Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko

We develop methods for detector learning which exploit joint training over both weak and strong labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks.

Multiple Instance Learning Representation Learning +1
