Search Results for author: Debidatta Dwibedi

Despite these strong priors, we show that deep trackers often default to tracking by saliency detection, without relying on the object instance representation.

Object, Saliency Detection

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

no code implementations • CVPR 2020 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

We present an approach for estimating the period with which an action is repeated in a video.

With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

4 code implementations • ICCV 2021 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

On semi-supervised learning benchmarks we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%.

Ranked #1 on Image Classification on PASCAL VOC 2007

Contrastive Learning, Fine-Grained Image Classification

XIRL: Cross-embodiment Inverse Reinforcement Learning

1 code implementation • 7 Jun 2021 • Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi

We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments: shape, actions, end-effector dynamics, etc.

reinforcement-learning, Reinforcement Learning (RL)

Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

no code implementations • 12 May 2022 • Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi

Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction.

Object, Object Localization

Q-Match: Self-Supervised Learning by Matching Distributions Induced by a Queue

1 code implementation • 10 Feb 2023 • Thomas Mulc, Debidatta Dwibedi

In semi-supervised learning, student-teacher distribution matching has been successful in improving performance of models using unlabeled data in conjunction with few labeled samples.

Self-Supervised Learning

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

no code implementations • 23 Jan 2024 • Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu

We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.

Instruction Following, Scene Understanding

RT-H: Action Hierarchies Using Language

no code implementations • 4 Mar 2024 • Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh

Predicting these language motions as an intermediate step between tasks and actions forces the policy to learn the shared structure of low-level motions across seemingly disparate tasks.

Imitation Learning

FlexCap: Generating Rich, Localized, and Flexible Captions in Images

no code implementations • 18 Mar 2024 • Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar

The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions.

Attribute, Dense Captioning

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

no code implementations • 19 Mar 2024 • Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi

Given a video demonstration of a manipulation task and current visual observations, Vid2Robot directly produces robot actions.