Search Results for author: Yunzhi Zhang

Found 20 papers, 8 papers with code

Birth and Death of a Rose

no code implementations 6 Dec 2024 Chen Geng, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

We study the problem of generating temporal object intrinsics -- temporally evolving sequences of object geometry, reflectance, and texture, such as a blooming rose -- from pre-trained 2D foundation models.

Object Self-Supervised Learning

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

no code implementations 27 Nov 2024 Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model.

Image Generation Language Modeling +1

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

no code implementations 22 Oct 2024 Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu

We introduce the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes.

Scene Generation

3D Congealing: 3D-Aware Image Alignment in the Wild

no code implementations 2 Apr 2024 Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani

The framework optimizes for the canonical representation together with the pose for each input image, and a per-image coordinate map that warps 2D pixel coordinates to the 3D canonical frame to account for the shape matching.

Pose Estimation

Learning the 3D Fauna of the Web

no code implementations CVPR 2024 Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu

We show that prior category-specific attempts fail to generalize to rare species with limited training images.

Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos

no code implementations 21 Dec 2023 Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu, Shangzhe Wu

We introduce Ponymation, a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.

Motion Synthesis

Language-Informed Visual Concept Learning

1 code implementation 6 Dec 2023 Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

To encourage better disentanglement of different concept encoders, we anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.

Disentanglement Novel Concepts +2

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

1 code implementation CVPR 2024 Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Diversity Novel View Synthesis

Seeing a Rose in Five Thousand Ways

1 code implementation CVPR 2023 Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu

These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination.

Image Generation Intrinsic Image Decomposition +1

Translating a Visual LEGO Manual to a Machine-Executable Plan

no code implementations 25 Jul 2022 Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.

3D Pose Estimation Keypoint Detection +1

MaskViT: Masked Visual Pre-Training for Video Prediction

no code implementations 23 Jun 2022 Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei

This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.

Scheduling Video Prediction

Video Extrapolation in Space and Time

no code implementations 4 May 2022 Yunzhi Zhang, Jiajun Wu

Novel view synthesis (NVS) and video prediction (VP) are typically considered disjoint tasks in computer vision.

Novel View Synthesis Video Prediction

VideoGPT: Video Generation using VQ-VAE and Transformers

3 code implementations 20 Apr 2021 Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position Video Generation

VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers

no code implementations 1 Jan 2021 Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas

We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position Video Generation

Automatic Curriculum Learning through Value Disagreement

1 code implementation NeurIPS 2020 Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto

Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals.

Reinforcement Learning (RL)

Asynchronous Methods for Model-Based Reinforcement Learning

1 code implementation 28 Oct 2019 Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel

In this work, we propose an asynchronous framework for model-based reinforcement learning methods that brings down the run time of these algorithms to be just the data collection time.

Model-based Reinforcement Learning reinforcement-learning +2
