2 code implementations • 6 Mar 2023 • Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
Large language models excel at a wide range of complex tasks.
Ranked #1 on Visual Question Answering (VQA) on OK-VQA (using extra training data)
no code implementations • 21 Nov 2022 • Ted Xiao, Harris Chan, Pierre Sermanet, Ayzaan Wahid, Anthony Brohan, Karol Hausman, Sergey Levine, Jonathan Tompson
To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we leverage the semantic understanding of CLIP to propagate semi-supervised language labels onto large datasets of unlabelled demonstration data, then train language-conditioned policies on the augmented datasets.
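The excerpt leaves the relabelling step implicit; as a rough sketch of the idea, one can score a pool of candidate instructions against a demonstration's final frame with off-the-shelf CLIP (the `relabel` helper and the candidate pool are illustrative assumptions, not DIAL's exact pipeline):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")

def relabel(final_frame_path, candidate_instructions, top_k=3):
    """Keep the candidate instructions CLIP scores as best matching the
    demonstration's final frame; these become pseudo-labels for training."""
    image = preprocess(Image.open(final_frame_path)).unsqueeze(0)
    tokens = clip.tokenize(candidate_instructions)
    with torch.no_grad():
        image_feat = model.encode_image(image)   # (1, D)
        text_feat = model.encode_text(tokens)    # (N, D)
    sims = torch.cosine_similarity(image_feat, text_feat)  # (N,)
    best = sims.topk(min(top_k, len(candidate_instructions))).indices
    return [candidate_instructions[i] for i in best]
```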
no code implementations • 12 Jul 2022 • Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter
We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction.
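A minimal sketch of such a closed-loop feedback prompt; every function name here is a hypothetical stand-in for the paper's components:

```python
def inner_monologue(instruction, llm_plan, execute, detect_success,
                    describe_scene, max_steps=10):
    """After each executed skill, textual feedback (success detection,
    scene description) is appended to the prompt before replanning."""
    prompt = f"Task: {instruction}\n"
    for _ in range(max_steps):
        action = llm_plan(prompt)
        if action == "done":
            break
        execute(action)
        prompt += (f"Robot: {action}\n"
                   f"Success: {detect_success()}\n"
                   f"Scene: {describe_scene()}\n")
    return prompt  # the full monologue transcript
```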
no code implementations • 12 May 2022 • Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi
Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two settings: (i) policy learning and (ii) object location prediction.
2 code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.
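A minimal sketch of that decision rule, with `llm_score` and `value_fn` as hypothetical stand-ins for the language model's usefulness score ("say") and the skill's learned affordance value ("can"):

```python
def select_next_skill(instruction, state, skills, llm_score, value_fn):
    """Pick the skill whose usefulness under the LLM, weighted by its
    probability of succeeding from the current state, is highest."""
    scored = [(llm_score(instruction, skill.description)
               * value_fn(state, skill), skill)
              for skill in skills]
    return max(scored, key=lambda pair: pair[0])[1]
```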
4 code implementations • ICCV 2021 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
On semi-supervised learning benchmarks, we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%.
Ranked #1 on Image Classification on PASCAL VOC 2007
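The method's core change to standard contrastive learning is small enough to sketch: each embedding's positive is not its own second view but its nearest neighbour in a support set of past embeddings (the support-set bookkeeping is omitted here):

```python
import torch
import torch.nn.functional as F

def nnclr_loss(z1, z2, support_set, temperature=0.1):
    """z1, z2: (B, D) embeddings of two augmented views; support_set: (Q, D)
    queue of past embeddings. View 1's nearest neighbour in the support set
    replaces it as the positive for view 2."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    support = F.normalize(support_set, dim=1)
    nn1 = support[(z1 @ support.t()).argmax(dim=1)]   # (B, D) positives
    logits = nn1 @ z2.t() / temperature               # (B, B)
    labels = torch.arange(z1.shape[0])
    return F.cross_entropy(logits, labels)
```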
no code implementations • 13 Oct 2020 • Brian Ichter, Pierre Sermanet, Corey Lynch
This task space can be quite general and abstract; its only requirements are that it be sampleable and that it cover the space of useful tasks well.
no code implementations • CVPR 2020 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
We present an approach for estimating the period with which an action is repeated in a video.
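The approach is built on a temporal self-similarity matrix of per-frame embeddings; a sketch of that core quantity (the period predictor trained on top is omitted, and the softmax temperature is a free hyper-parameter):

```python
import torch

def temporal_self_similarity(frame_emb, temperature=1.0):
    """frame_emb: (T, D) per-frame embeddings of one video. Periodic motion
    appears as diagonal stripes in the row-softmaxed similarity matrix,
    which a downstream predictor can turn into a period estimate."""
    dists = torch.cdist(frame_emb, frame_emb).pow(2)  # (T, T) squared L2
    return torch.softmax(-dists / temperature, dim=1)
```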
no code implementations • 11 Jun 2020 • Rostam Dinyari, Pierre Sermanet, Corey Lynch
Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions.
no code implementations • 31 May 2020 • Ajay Kumar Tanwani, Pierre Sermanet, Andy Yan, Raghav Anand, Mariano Phielipp, Ken Goldberg
We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset.
no code implementations • 15 May 2020 • Corey Lynch, Pierre Sermanet
Prior work in imitation learning typically requires that each task be specified with a task id or goal image, something that is often impractical in open-world environments.
no code implementations • 10 Jun 2019 • Sören Pirk, Mohi Khansari, Yunfei Bai, Corey Lynch, Pierre Sermanet
We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics.
2 code implementations • CVPR 2019 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.
Ranked #1 on Video Alignment on UPenn Action
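A sketch of the cycle-consistency idea in its classification form: each frame of one video finds a soft nearest neighbour in the other, and cycling that neighbour back should land on the original frame:

```python
import torch
import torch.nn.functional as F

def cycle_back_loss(emb_u, emb_v):
    """emb_u: (Tu, D), emb_v: (Tv, D) per-frame embeddings of two videos."""
    alpha = torch.softmax(-torch.cdist(emb_u, emb_v).pow(2), dim=1)  # (Tu, Tv)
    soft_nn = alpha @ emb_v                                          # (Tu, D)
    logits = -torch.cdist(soft_nn, emb_u).pow(2)                     # (Tu, Tu)
    labels = torch.arange(emb_u.shape[0])  # frame i should cycle back to i
    return F.cross_entropy(logits, labels)
```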
no code implementations • NeurIPS 2019 • Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning, obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning.
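For context, the standard InfoNCE estimator behind much of this line of work (whose limitations the paper analyzes before proposing a Wasserstein-based alternative) can be sketched as:

```python
import math
import torch
import torch.nn.functional as F

def infonce_lower_bound(critic_scores):
    """critic_scores[i, j] = f(x_i, y_j) for a learned critic f; true pairs
    lie on the diagonal. InfoNCE: I(X; Y) >= log(B) - cross-entropy."""
    B = critic_scores.shape[0]
    ce = F.cross_entropy(critic_scores, torch.arange(B))
    return math.log(B) - ce
```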
1 code implementation • 5 Mar 2019 • Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet
Learning from play (LfP) offers three main advantages: 1) It is cheap.
Robotics
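Part of why play data is cheap is that it needs no task labels at all; a sketch of the hindsight relabelling that turns raw play into goal-conditioned training examples (representing the play stream as a list of (observation, action) pairs is an assumption):

```python
import random

def sample_play_window(play_log, window=32):
    """Any window of the teleoperated play stream becomes a
    (states, actions, goal) example whose goal is its final observation."""
    t = random.randrange(len(play_log) - window)
    segment = play_log[t:t + window]
    states = [obs for obs, act in segment]
    actions = [act for obs, act in segment]
    return states, actions, states[-1]  # the window's last state is the goal
```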
no code implementations • 2 Aug 2018 • Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet
In this work we explore a new approach for robots to teach themselves about the world simply by observing it.
6 code implementations • 23 Apr 2017 • Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.
Ranked #3 on Video Alignment on UPenn Action
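A sketch of the multi-view time-contrastive triplet loss at the heart of the method: the anchor and positive are the same instant seen from two cameras, the negative a temporally distant frame from the anchor's own view:

```python
import torch
import torch.nn.functional as F

def time_contrastive_triplet_loss(emb_view1, emb_view2, margin=0.2):
    """emb_view1, emb_view2: (T, D) embeddings of synchronized frames from
    two viewpoints. Pulling same-instant views together while pushing
    other instants apart makes the embedding encode state, not viewpoint."""
    T = emb_view1.shape[0]
    anchor, positive = emb_view1, emb_view2
    negative = emb_view1.roll(T // 2, dims=0)   # temporally distant frames
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```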
no code implementations • 20 Dec 2016 • Pierre Sermanet, Kelvin Xu, Sergey Levine
We present a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically identify the most discriminative features for identifying these steps.
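One way to turn such discovered steps into a dense reward, sketched with hypothetical per-step classifiers (an sklearn-style interface is assumed; this is not the paper's exact formulation):

```python
import numpy as np

def perceptual_reward(frame_feature, step_classifiers):
    """step_classifiers: one binary classifier per discovered intermediate
    step. The reward for a frame is the mean probability that those
    steps have been reached."""
    probs = [clf.predict_proba(frame_feature[None, :])[0, 1]
             for clf in step_classifiers]
    return float(np.mean(probs))
```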
no code implementations • 22 Dec 2014 • Pierre Sermanet, Andrea Frome, Esteban Real
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs dataset.
79 code implementations • CVPR 2015 • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
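The architecture's building block is easy to sketch: parallel 1x1, 3x3, and 5x5 convolutions plus a pooling branch, with 1x1 bottlenecks for dimensionality reduction, concatenated along the channel axis. A PyTorch rendering (channel counts follow GoogLeNet's first Inception block; not the original implementation):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, c_in=192, c1=64, c3r=96, c3=128, c5r=16, c5=32, cp=32):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(c_in, c1, 1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU())
        self.b5 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU())
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1), nn.ReLU())

    def forward(self, x):
        # Concatenate all four branches along the channel axis.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)
```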
4 code implementations • 21 Dec 2013 • Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun
This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classifications tasks.
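The integration rests on a fully-convolutional trick: expressing the classifier's fully connected layers as convolutions lets one network score every sliding window of a larger image in a single pass. A sketch with illustrative layer sizes (not the paper's exact architecture):

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, 7, stride=2), nn.ReLU(), nn.MaxPool2d(3, stride=3),
)
classifier = nn.Sequential(
    nn.Conv2d(96, 256, 5), nn.ReLU(),  # plays the role of a Linear layer
    nn.Conv2d(256, 10, 1),             # 1x1 conv = per-window class scores
)
x = torch.randn(1, 3, 231, 231)        # larger than the training crop
score_map = classifier(features(x))    # (1, 10, 33, 33): one score per window
```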
no code implementations • CVPR 2013 • Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun
Pedestrian detection is a problem of considerable practical interest.
2 code implementations • 18 Apr 2012 • Pierre Sermanet, Soumith Chintala, Yann LeCun
We classify digits of real-world house numbers using convolutional neural networks (ConvNets).
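A compact ConvNet sketch for 32x32 house-number crops; the paper's actual multi-stage architecture (with Lp pooling and multi-scale features) differs, but the conv / pool / classify pipeline is the same:

```python
import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
    nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                                    # ten digits
)
```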
no code implementations • NeurIPS 2010 • Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, Yann LeCun
We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features.
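For reference, a sketch of one ISTA step for the underlying convolutional sparse-coding problem; in the predictive-sparse-decomposition spirit of this work, feed-forward encoders are trained so that this iterative inference is not needed at test time (the solver shown is a generic one, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def ista_step(z, x, D, alpha=0.1, lr=0.1):
    """One ISTA step for convolutional sparse coding:
    minimize 0.5 * ||x - D (*) z||^2 + alpha * ||z||_1 over codes z.
    Shapes: z (N, K, h, w), D (K, C, kh, kw), x (N, C, h+kh-1, w+kw-1)."""
    recon = F.conv_transpose2d(z, D)   # decode codes back to image space
    grad = F.conv2d(recon - x, D)      # gradient of the data term w.r.t. z
    z = z - lr * grad
    return z.sign() * F.relu(z.abs() - lr * alpha)  # soft-thresholding
```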