Search Results for author: Jonathan Tompson

Found 39 papers, 16 papers with code

FlexCap: Generating Rich, Localized, and Flexible Captions in Images

no code implementations • 18 Mar 2024 • Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar

The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions.

Attribute • Dense Captioning • +8

RT-H: Action Hierarchies Using Language

no code implementations • 4 Mar 2024 • Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh

Predicting these language motions as an intermediate step between tasks and actions forces the policy to learn the shared structure of low-level motions across seemingly disparate tasks.

Imitation Learning

Video Language Planning

no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Learning Interactive Real-World Simulators

no code implementations • 9 Oct 2023 • Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

Video Captioning

PaLM-E: An Embodied Multimodal Language Model

2 code implementations • 6 Mar 2023 • Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

Large language models excel at a wide range of complex tasks.

Ranked #2 on Visual Question Answering (VQA) on OK-VQA

Language Modelling • Large Language Model • +2

Scaling Robot Learning with Semantically Imagined Experience

no code implementations • 22 Feb 2023 • Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan, Su Wang, Jaspiar Singh, Clayton Tan, Dee M, Jodilyn Peralta, Brian Ichter, Karol Hausman, Fei Xia

Specifically, we make use of state-of-the-art text-to-image diffusion models and perform aggressive data augmentation on top of our existing robotic manipulation datasets via inpainting various unseen objects for manipulation, backgrounds, and distractors with text guidance.

Data Augmentation

Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

no code implementations • 21 Nov 2022 • Ted Xiao, Harris Chan, Pierre Sermanet, Ayzaan Wahid, Anthony Brohan, Karol Hausman, Sergey Levine, Jonathan Tompson

To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets.

Imitation Learning

Contrastive Value Learning: Implicit Models for Simple Offline RL

no code implementations • 3 Nov 2022 • Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson

In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.

Continuous Control • Model-based Reinforcement Learning • +2

Interactive Language: Talking to Robots in Real Time

1 code implementation • 12 Oct 2022 • Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, Pete Florence

We present a framework for building interactive, real-time, natural language-instructable robots in the real world, and we open source related assets (dataset, environment, benchmark, and policies).

Inner Monologue: Embodied Reasoning through Planning with Language Models

no code implementations • 12 Jul 2022 • Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction.

Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

no code implementations • 12 May 2022 • Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi

Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction.

Object • Object Localization • +2

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

no code implementations • 29 Nov 2021 • Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.

Contrastive Learning • Decision Making • +5

Implicit Behavioral Cloning

4 code implementations • 1 Sep 2021 • Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models.

D4RL

XIRL: Cross-embodiment Inverse Reinforcement Learning

1 code implementation • 7 Jun 2021 • Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi

We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments -- shape, actions, end-effector dynamics, etc.

reinforcement-learning • Reinforcement Learning (RL)

With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

4 code implementations • ICCV 2021 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

On semi-supervised learning benchmarks we improve performance significantly when only 1% ImageNet labels are available, from 53.8% to 56.5%.

Ranked #1 on Image Classification on PASCAL VOC 2007

Contrastive Learning • Fine-Grained Image Classification • +4

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

2 code implementations • 14 Mar 2021 • Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data.

Offline RL • reinforcement-learning • +1

Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks

no code implementations • 6 Dec 2020 • Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng

Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag".

ADAIL: Adaptive Adversarial Imitation Learning

no code implementations • 23 Aug 2020 • Yiren Lu, Jonathan Tompson

We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics, by imitating a small number of demonstrations collected from a single source domain.

Imitation Learning

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

no code implementations • CVPR 2020 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

We present an approach for estimating the period with which an action is repeated in a video.

An Analysis of Object Representations in Deep Visual Trackers

no code implementations • 8 Jan 2020 • Ross Goroshin, Jonathan Tompson, Debidatta Dwibedi

Despite these strong priors, we show that deep trackers often default to tracking by saliency detection - without relying on the object instance representation.

Object • Saliency Detection • +1

Imitation Learning via Off-Policy Distribution Matching

3 code implementations • ICLR 2020 • Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective.

Imitation Learning • Reinforcement Learning (RL)

Adaptive Adversarial Imitation Learning

no code implementations • 25 Sep 2019 • Yiren Lu, Jonathan Tompson, Sergey Levine

Imitation Learning

Inducing Stronger Object Representations in Deep Visual Trackers

no code implementations • 25 Sep 2019 • Ross Goroshin, Jonathan Tompson, Debidatta Dwibedi

Fully convolutional deep correlation networks are integral components of state-of-the-art approaches to single object visual tracking.

Object • Saliency Detection • +1

Beyond Photo Realism for Domain Adaptation from Synthetic Data

no code implementations • 4 Sep 2019 • Kristofer Schlachter, Connor DeFanti, Sebastian Herscher, Ken Perlin, Jonathan Tompson

As synthetic imagery is used more frequently in training deep models, it is important to understand how different synthesis techniques impact the performance of such models.

Denoising • Domain Adaptation • +1

Temporal Cycle-Consistency Learning

2 code implementations • CVPR 2019 • Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.

Ranked #1 on Video Alignment on UPenn Action

Anomaly Detection • Representation Learning • +2

Learning Latent Plans from Play

1 code implementation • 5 Mar 2019 • Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet

Learning from play (LfP) offers three main advantages: 1) It is cheap.

Robotics

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

3 code implementations • ICLR 2019 • Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework.

Imitation Learning

Learning Actionable Representations from Visual Observations

no code implementations • 2 Aug 2018 • Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet

In this work we explore a new approach for robots to teach themselves about the world simply by observing it.

Continuous Control

Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning

1 code implementation • NeurIPS 2018 • Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi

We demonstrate this framework on 3D pose estimation by proposing a differentiable objective that seeks the optimal set of keypoints for recovering the relative pose between two views of an object.

3D Pose Estimation

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

3 code implementations • ECCV 2018 • George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy

We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model.

Ranked #8 on Multi-Person Pose Estimation on COCO test-dev

Instance Segmentation • Keypoint Detection • +2

Towards Accurate Multi-person Pose Estimation in the Wild

no code implementations • CVPR 2017 • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

Trained on COCO data alone, our final system achieves average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods.

Ranked #6 on Keypoint Detection on COCO test-challenge

Human Detection • Keypoint Detection • +1

Accelerating Eulerian Fluid Simulation With Convolutional Networks

1 code implementation • ICML 2017 • Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, Ken Perlin

Efficient simulation of the Navier-Stokes equations for fluid flow is a long-standing problem in applied mathematics, for which state-of-the-art methods require large compute resources.

Efficient ConvNet-Based Marker-Less Motion Capture in General Scenes With a Low Number of Cameras

no code implementations • CVPR 2015 • Ahmed Elhayek, Edilson de Aguiar, Arjun Jain, Jonathan Tompson, Leonid Pishchulin, Micha Andriluka, Chris Bregler, Bernt Schiele, Christian Theobalt

Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy.

Pose Estimation

Unsupervised Feature Learning from Temporal Data

no code implementations • 9 Apr 2015 • Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun

Current state-of-the-art classification and detection algorithms rely on supervised training.

General Classification • Metric Learning

Unsupervised Learning of Spatiotemporally Coherent Metrics

no code implementations • ICCV 2015 • Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun

Current state-of-the-art classification and detection algorithms rely on supervised training.

General Classification • Metric Learning

Efficient Object Localization Using Convolutional Networks

2 code implementations • CVPR 2015 • Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christopher Bregler

Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets).

Ranked #42 on Pose Estimation on MPII Human Pose

Object • Object Localization • +2

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

no code implementations • 28 Sep 2014 • Arjun Jain, Jonathan Tompson, Yann LeCun, Christoph Bregler

In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features.

2D Human Pose Estimation • Pose Estimation

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

1 code implementation • NeurIPS 2014 • Jonathan Tompson, Arjun Jain, Yann LeCun, Christoph Bregler

This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field.

Pose Estimation

Learning Human Pose Estimation Features with Convolutional Networks

1 code implementation • 27 Dec 2013 • Arjun Jain, Jonathan Tompson, Mykhaylo Andriluka, Graham W. Taylor, Christoph Bregler

This paper introduces a new architecture for human pose estimation using a multi-layer convolutional network architecture and a modified learning technique that learns low-level features and higher-level weak spatial models.

Object Recognition • Pose Estimation • +2
