no code implementations • 24 Jul 2024 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Andrew Zisserman
The model is trained and evaluated on the OVR dataset, and its performance is assessed with and without text specifying the target class to count.
no code implementations • 19 Mar 2024 • Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi
Vid2Robot uses cross-attention transformer layers between video features and the current robot state to produce the actions and perform the same task as shown in the video.
no code implementations • 18 Mar 2024 • Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar
The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions.
no code implementations • 4 Mar 2024 • Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh
Predicting these language motions as an intermediate step between tasks and actions forces the policy to learn the shared structure of low-level motions across seemingly disparate tasks.
no code implementations • 7 Feb 2024 • ALOHA 2 Team, Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka, Tony Z. Zhao
Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation.
no code implementations • 23 Jan 2024 • Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction-following data collection robots that can align to human preferences.
1 code implementation • 10 Feb 2023 • Thomas Mulc, Debidatta Dwibedi
In semi-supervised learning, student-teacher distribution matching has been successful in improving the performance of models using unlabeled data in conjunction with a few labeled samples.
no code implementations • 12 May 2022 • Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi
Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction.
1 code implementation • 7 Jun 2021 • Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi
We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments -- shape, actions, end-effector dynamics, etc.
4 code implementations • ICCV 2021 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
On semi-supervised learning benchmarks we improve performance significantly when only 1% ImageNet labels are available, from 53.8% to 56.5%.
Ranked #1 on Image Classification on PASCAL VOC 2007
2 code implementations • CVPR 2020 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
We present an approach for estimating the period with which an action is repeated in a video.
Ranked #1 on Repetitive Action Counting on Countix
no code implementations • 8 Jan 2020 • Ross Goroshin, Jonathan Tompson, Debidatta Dwibedi
Despite these strong priors, we show that deep trackers often default to tracking by saliency detection, without relying on the object instance representation.
no code implementations • 25 Sep 2019 • Ross Goroshin, Jonathan Tompson, Debidatta Dwibedi
Fully convolutional deep correlation networks are integral components of state-of-the-art approaches to single object visual tracking.
2 code implementations • CVPR 2019 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.
Ranked #1 on Video Alignment on UPenn Action
3 code implementations • ICLR 2019 • Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson
We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework.
no code implementations • 2 Aug 2018 • Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet
In this work we explore a new approach for robots to teach themselves about the world simply by observing it.
6 code implementations • ICCV 2017 • Debidatta Dwibedi, Ishan Misra, Martial Hebert
In this paper, we propose a simple approach to generate large annotated instance datasets with minimal effort.
1 code implementation • 30 Nov 2016 • Debidatta Dwibedi, Tomasz Malisiewicz, Vijay Badrinarayanan, Andrew Rabinovich
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects).