2 code implementations • 29 Oct 2019 • Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel
Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points.
no code implementations • 25 Nov 2019 • Wilson Yan, Jonathan Ho, Pieter Abbeel
Deep autoregressive models are among the most powerful generative models available today, achieving state-of-the-art bits per dim.
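Bits per dim is the standard likelihood metric referenced here: the model's negative log-likelihood, converted from nats to bits and averaged over the data dimensions (e.g. pixels × channels). A minimal sketch of that conversion, with the example image size being an illustrative assumption rather than anything from this paper:

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats to bits per dimension.

    nll_nats: total NLL of one example, in nats.
    num_dims: number of data dimensions (e.g. H * W * C for an image).
    """
    return nll_nats / (num_dims * math.log(2))

# Illustrative only: a 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions.
print(bits_per_dim(6600.0, 32 * 32 * 3))
```

Lower is better: a model assigning NLL of exactly `ln(2)` nats per dimension scores 1.0 bits per dim.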
1 code implementation • 11 Mar 2020 • Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto
Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models.
no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas
We present VideoGen: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
3 code implementations • 20 Apr 2021 • Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
1 code implementation • 8 Jun 2022 • Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.
1 code implementation • 5 Oct 2022 • Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world.
1 code implementation • NeurIPS 2023 • Hao Liu, Wilson Yan, Pieter Abbeel
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.
no code implementations • 16 Jun 2023 • Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel
In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.
no code implementations • 30 Nov 2023 • Wilson Yan, Andrew Brown, Pieter Abbeel, Rohit Girdhar, Samaneh Azadi
We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing.
1 code implementation • 13 Feb 2024 • Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel
To address these challenges, we curate a large dataset of diverse videos and books, utilize the Blockwise RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens.
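The snippet describes growing the training context from 4K to 1M tokens in stages. The exact schedule is not given here; a hypothetical doubling schedule, purely as a sketch of what "gradually increase context size" could look like:

```python
def context_schedule(start: int = 4096, end: int = 1 << 20) -> list[int]:
    """Hypothetical context-length curriculum: double the context each stage.

    Walks from `start` (4K tokens) up to `end` (1M tokens); the doubling
    step is an assumption, not the schedule used in the paper.
    """
    sizes = []
    size = start
    while size <= end:
        sizes.append(size)
        size *= 2
    return sizes

print(context_schedule())  # 4096, 8192, ..., 1048576
```

Doubling is a common choice because each stage roughly doubles attention cost, keeping the compute jump per stage bounded; the paper's actual stage boundaries may differ.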