Search Results for author: Pierre Sermanet

Found 30 papers, 10 papers with code

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

no code implementations • 19 Mar 2024 • Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi

Given a video demonstration of a manipulation task and current visual observations, Vid2Robot directly produces robot actions.

Paper
Add Code

RT-H: Action Hierarchies Using Language

no code implementations • 4 Mar 2024 • Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh

Predicting these language motions as an intermediate step between tasks and actions forces the policy to learn the shared structure of low-level motions across seemingly disparate tasks.

Imitation Learning

Paper
Add Code

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

no code implementations • 23 Jan 2024 • Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu

We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.

Instruction Following Scene Understanding

Paper
Add Code

Video Language Planning

no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Paper
Add Code

Robotic Table Tennis: A Case Study into a High Speed Learning System

no code implementations • 6 Sep 2023 • David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, Barney J Reed, Krista Reymann, Pannag R. Sanketi, Anish Shankar, Pierre Sermanet, Vikas Sindhwani, Avi Singh, Vincent Vanhoucke, Grace Vesom, Peng Xu

We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets.

Paper
Add Code

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

1 code implementation • 28 Jul 2023 • Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich

Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.

Object Question Answering +1

266

Paper
Code

PaLM-E: An Embodied Multimodal Language Model

2 code implementations • 6 Mar 2023 • Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

Large language models excel at a wide range of complex tasks.

Ranked #2 on Visual Question Answering (VQA) on OK-VQA

Language Modelling Large Language Model +2

201

Paper
Code

Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

no code implementations • 21 Nov 2022 • Ted Xiao, Harris Chan, Pierre Sermanet, Ayzaan Wahid, Anthony Brohan, Karol Hausman, Sergey Levine, Jonathan Tompson

To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets.

Imitation Learning

Paper
Add Code

Inner Monologue: Embodied Reasoning through Planning with Language Models

no code implementations • 12 Jul 2022 • Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction.

Paper
Add Code

Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

no code implementations • 12 May 2022 • Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi

Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and is queried in two different settings: (i) policy learning and (ii) object location prediction.

Object Object Localization +2

Paper
Add Code

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

3 code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng

We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.

Decision Making Language Modelling +1

160

Paper
Code

With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

4 code implementations • ICCV 2021 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

On semi-supervised learning benchmarks we improve performance significantly when only 1% ImageNet labels are available, from 53. 8% to 56. 5%.

Ranked #1 on Image Classification on PASCAL VOC 2007

Contrastive Learning Fine-Grained Image Classification +4

2,745

Paper
Code

Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

no code implementations • 13 Oct 2020 • Brian Ichter, Pierre Sermanet, Corey Lynch

This task space can be quite general and abstract; its only requirements are to be sampleable and to well-cover the space of useful tasks.

Motion Planning

Paper
Add Code

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

no code implementations • CVPR 2020 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

We present an approach for estimating the period with which an action is repeated in a video.

Paper
Add Code

Learning to Play by Imitating Humans

no code implementations • 11 Jun 2020 • Rostam Dinyari, Pierre Sermanet, Corey Lynch

Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions.

Paper
Add Code

Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

no code implementations • 31 May 2020 • Ajay Kumar Tanwani, Pierre Sermanet, Andy Yan, Raghav Anand, Mariano Phielipp, Ken Goldberg

We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset.

Action Segmentation Metric Learning +1

Paper
Add Code

Language Conditioned Imitation Learning over Unstructured Data

no code implementations • 15 May 2020 • Corey Lynch, Pierre Sermanet

Prior work in imitation learning typically requires each task be specified with a task id or goal image -- something that is often impractical in open-world environments.

Continuous Control Imitation Learning +2

Paper
Add Code

Online Object Representations with Contrastive Learning

no code implementations • 10 Jun 2019 • Sören Pirk, Mohi Khansari, Yunfei Bai, Corey Lynch, Pierre Sermanet

We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics.

Contrastive Learning Object

Paper
Add Code

Temporal Cycle-Consistency Learning

2 code implementations • CVPR 2019 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.

Ranked #1 on Video Alignment on UPenn Action

Anomaly Detection Representation Learning +2

32,796

Paper
Code

Wasserstein Dependency Measure for Representation Learning

no code implementations • NeurIPS 2019 • Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet

Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning.

Object Recognition reinforcement-learning +5

Paper
Add Code

Learning Latent Plans from Play

1 code implementation • 5 Mar 2019 • Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet

Learning from play (LfP) offers three main advantages: 1) It is cheap.

Robotics

Paper
Code

Learning Actionable Representations from Visual Observations

no code implementations • 2 Aug 2018 • Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet

In this work we explore a new approach for robots to teach themselves about the world simply by observing it.

Continuous Control

Paper
Add Code

Time-Contrastive Networks: Self-Supervised Learning from Video

7 code implementations • 23 Apr 2017 • Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.

Ranked #3 on Video Alignment on UPenn Action

Metric Learning reinforcement-learning +3

76,591

Paper
Code

Unsupervised Perceptual Rewards for Imitation Learning

no code implementations • 20 Dec 2016 • Pierre Sermanet, Kelvin Xu, Sergey Levine

We present a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically identify the most discriminative features for identifying these steps.

Imitation Learning Reinforcement Learning (RL)

Paper
Add Code

Attention for Fine-Grained Categorization

no code implementations • 22 Dec 2014 • Pierre Sermanet, Andrea Frome, Esteban Real

This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set.

Paper
Add Code

Going Deeper with Convolutions

79 code implementations • CVPR 2015 • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).

General Classification Image Classification +2

76,590

Paper
Code

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

4 code implementations • 21 Dec 2013 • Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann Lecun

This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classifications tasks.

General Classification Image Classification +2