Search Results for author: Wilson Yan

Found 11 papers, 7 papers with code

World Model on Million-Length Video And Language With Blockwise RingAttention

1 code implementation • 13 Feb 2024 • Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel

To address these challenges, we curate a large dataset of diverse videos and books, utilize the Blockwise RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens.

Video Understanding
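
For readers unfamiliar with the blockwise-attention idea underlying Blockwise RingAttention, the sketch below shows attention computed one key/value block at a time with a running (online) softmax, so the full score matrix is never materialized. It is a single-device NumPy toy with assumed shapes and block sizes; the paper's method additionally shards these blocks across a ring of devices, which is not shown here.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size=64):
    """Softmax attention computed over key/value blocks with a running
    (online) softmax; never materializes the full score matrix."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    m = np.full(q.shape[0], -np.inf)   # running row-wise max of scores
    l = np.zeros(q.shape[0])           # running softmax denominator
    acc = np.zeros_like(q)             # running numerator (weighted values)
    for start in range(0, k.shape[0], block_size):
        kb, vb = k[start:start + block_size], v[start:start + block_size]
        s = (q @ kb.T) * scale
        m_new = np.maximum(m, s.max(axis=-1))
        corr = np.exp(m - m_new)                 # rescale previous partials
        p = np.exp(s - m_new[:, None])
        l = l * corr + p.sum(axis=-1)
        acc = acc * corr[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

# Sanity check against dense attention on random data.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 256, 16))
scores = (q @ k.T) / np.sqrt(16)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
ref = (weights / weights.sum(axis=-1, keepdims=True)) @ v
print(np.allclose(blockwise_attention(q, k, v), ref))  # True
```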

ALP: Action-Aware Embodied Learning for Perception

no code implementations • 16 Jun 2023 • Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel

In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.

Benchmarking • Object Detection +3

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

1 code implementation • NeurIPS 2023 • Hao Liu, Wilson Yan, Pieter Abbeel

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.

Attribute • Few-Shot Image Classification +3

Temporally Consistent Transformers for Video Generation

1 code implementation • 5 Oct 2022 • Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel

To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world.

Video Generation • Video Prediction

Patch-based Object-centric Transformers for Efficient Video Generation

1 code implementation • 8 Jun 2022 • Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel

In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.

Object • Video Editing +2

VideoGPT: Video Generation using VQ-VAE and Transformers

3 code implementations • 20 Apr 2021 • Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position • Video Generation
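
As the title indicates, VideoGPT combines a VQ-VAE, which compresses video into discrete latent codes, with a transformer that models those codes autoregressively. The PyTorch sketch below covers only the vector-quantization bottleneck with a straight-through gradient; the 3D-convolutional encoder/decoder and the GPT-style prior are omitted, and the codebook size, latent dimension, and commitment weight are illustrative values rather than the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal VQ bottleneck: map each latent vector to its nearest codebook
    entry, with a straight-through gradient. Hyperparameters are illustrative."""

    def __init__(self, num_codes: int = 1024, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                       # z: (batch, ..., dim)
        flat = z.reshape(-1, z.shape[-1])       # (N, dim)
        # Squared distance from each latent to every codebook vector.
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)
        quantized = self.codebook(idx).view_as(z)
        # Standard VQ-VAE codebook and commitment loss terms.
        loss = (F.mse_loss(quantized, z.detach())
                + self.beta * F.mse_loss(z, quantized.detach()))
        # Straight-through estimator: copy gradients from quantized to z.
        quantized = z + (quantized - z).detach()
        return quantized, idx.view(z.shape[:-1]), loss
```

In a full pipeline of this kind, the returned integer indices are the tokens a GPT-style transformer would be trained to predict; sampling codes from that prior and decoding them yields new videos.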

VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers

no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas

We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position • Video Generation

Learning Predictive Representations for Deformable Objects Using Contrastive Estimation

1 code implementation • 11 Mar 2020 • Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto

Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models.

Deformable Object Manipulation
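
The entry above pairs representation learning with a latent dynamics model trained by contrastive estimation. Below is a generic InfoNCE-style sketch of that pairing, with linear stand-ins for the real encoder and dynamics networks; it illustrates scoring the predicted next latent against the true next observation within a batch, and is not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_forward_loss(encoder, dynamics, obs, action, next_obs):
    """InfoNCE-style loss: the predicted next latent should match its true
    next observation more closely than other observations in the batch.
    `encoder` and `dynamics` are assumed user-supplied modules."""
    z = encoder(obs)                               # (B, D) current latents
    z_next = encoder(next_obs)                     # (B, D) true next latents
    z_pred = dynamics(torch.cat([z, action], -1))  # (B, D) predicted next latents
    logits = z_pred @ z_next.t()                   # (B, B) similarity matrix
    labels = torch.arange(obs.shape[0])            # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random data and linear stand-ins for the real networks.
enc = nn.Linear(32, 16)
dyn = nn.Linear(16 + 4, 16)
obs, next_obs = torch.randn(8, 32), torch.randn(8, 32)
action = torch.randn(8, 4)
print(contrastive_forward_loss(enc, dyn, obs, action, next_obs))
```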

Natural Image Manipulation for Autoregressive Models Using Fisher Scores

no code implementations • 25 Nov 2019 • Wilson Yan, Jonathan Ho, Pieter Abbeel

Deep autoregressive models are among the most powerful generative models available today, achieving state-of-the-art bits per dim.

Image Manipulation
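
"Bits per dim" in the excerpt above is the model's negative log-likelihood rescaled from nats to bits and averaged over data dimensions; the small helper below makes the conversion explicit (the example numbers are made up).

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats to bits per dimension,
    e.g. num_dims = 3 * 32 * 32 for a CIFAR-10 image."""
    return nll_nats / (num_dims * math.log(2))

# Example: a 32x32 RGB image assigned a total NLL of 7000 nats.
print(bits_per_dim(7000.0, 3 * 32 * 32))  # roughly 3.29 bits/dim
```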

Learning to Manipulate Deformable Objects without Demonstrations

2 code implementations • 29 Oct 2019 • Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel

Second, instead of jointly learning both the pick and the place locations, we explicitly learn only the placing policy, conditioned on random pick points.

Deformable Object Manipulation • Object +1
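
The excerpt above describes a factorization in which pick points are sampled at random and only a placing policy is learned, conditioned on the pick. The toy PyTorch sketch below mirrors that conditioning; the observation encoding, network sizes, and two-dimensional action format are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PlacePolicy(nn.Module):
    """Toy policy that outputs a place location conditioned on an observation
    embedding and a (randomly sampled) pick point, mirroring the factorization
    described above. Sizes and architecture are illustrative only."""

    def __init__(self, obs_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, 2),           # (x, y) place location
        )

    def forward(self, obs_embedding, pick_xy):
        return self.net(torch.cat([obs_embedding, pick_xy], dim=-1))

obs = torch.randn(4, 64)            # stand-in for encoded cloth observations
pick = torch.rand(4, 2)             # pick points sampled uniformly at random
place = PlacePolicy()(obs, pick)    # only the placing policy is learned
print(place.shape)                  # torch.Size([4, 2])
```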
