no code implementations • 6 Dec 2024 • Chen Geng, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
We study the problem of generating temporal object intrinsics -- temporally evolving sequences of object geometry, reflectance, and texture, such as a blooming rose -- from pre-trained 2D foundation models.
no code implementations • 27 Nov 2024 • Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Vision-Language Model.
no code implementations • 22 Oct 2024 • Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu
We introduce the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes.
no code implementations • 2 Apr 2024 • Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani
The framework jointly optimizes the canonical representation, a pose for each input image, and a per-image coordinate map that warps 2D pixel coordinates into the 3D canonical frame to account for shape variation across images.
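As a rough illustration of what such a per-image coordinate map could look like (a minimal sketch under assumed shapes and names, not the paper's implementation):

```python
# Hypothetical sketch: an MLP warps 2D pixel coordinates, conditioned on a
# learnable per-image code, into a shared 3D canonical frame.
import torch
import torch.nn as nn

class CoordinateMap(nn.Module):
    def __init__(self, num_images: int, code_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.codes = nn.Embedding(num_images, code_dim)  # one latent per image
        self.mlp = nn.Sequential(
            nn.Linear(2 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # a point in the 3D canonical frame
        )

    def forward(self, uv: torch.Tensor, image_idx: torch.Tensor) -> torch.Tensor:
        # uv: (N, 2) pixel coordinates in [0, 1]; image_idx: (N,) image indices
        code = self.codes(image_idx)
        return self.mlp(torch.cat([uv, code], dim=-1))

warp = CoordinateMap(num_images=24)
uv = torch.rand(4096, 2)
idx = torch.randint(0, 24, (4096,))
xyz = warp(uv, idx)  # (4096, 3) canonical coordinates, optimized jointly
```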
no code implementations • CVPR 2024 • Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin-Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani
We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background.
no code implementations • CVPR 2024 • Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
no code implementations • 21 Dec 2023 • Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu, Shangzhe Wu
We introduce Ponymation, a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.
1 code implementation • 6 Dec 2023 • Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
To encourage better disentanglement of different concept encoders, we anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.
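A minimal sketch of this anchoring idea, assuming CLIP-style embeddings of dimension 512 (illustrative names and shapes, not the paper's code):

```python
# Toy sketch: keep each learned concept embedding close to a frozen text
# embedding (e.g., the embedding of a VQA-derived answer), so that different
# concept axes stay disentangled.
import torch
import torch.nn.functional as F

def anchor_loss(concept_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """concept_emb: (B, D) trainable; text_emb: (B, D) frozen anchors."""
    return (1.0 - F.cosine_similarity(concept_emb, text_emb.detach(), dim=-1)).mean()

concept = torch.randn(8, 512, requires_grad=True)   # learned concept embeddings
anchors = torch.randn(8, 512)                       # e.g., frozen text embeddings
loss = anchor_loss(concept, anchors)
loss.backward()  # gradients pull each concept toward its textual anchor
```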
1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang
The stunning qualitative improvement of recent text-to-image models has brought them widespread attention and adoption.
1 code implementation • CVPR 2024 • Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu
Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.
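For context, vanilla SDS injects noise into the current render, queries a frozen diffusion model, and uses the predicted-minus-injected noise as a gradient on the image. A toy sketch is below; the paper's "SDS anchoring" modification is not reproduced here, and `toy_denoiser` is a stand-in for a pretrained diffusion model:

```python
# Toy sketch of vanilla Score Distillation Sampling (SDS).
import torch

def toy_denoiser(x_t, t):
    # Stand-in for a frozen pretrained noise-prediction network.
    return 0.1 * x_t

@torch.no_grad()
def sds_step(image, lr=0.01):
    t = torch.rand(())                              # random timestep in [0, 1)
    alpha = 1.0 - t                                 # toy noise schedule
    noise = torch.randn_like(image)
    x_t = alpha.sqrt() * image + (1 - alpha).sqrt() * noise
    eps_pred = toy_denoiser(x_t, t)
    image -= lr * (eps_pred - noise)                # SDS update (w(t) omitted)

image = torch.randn(3, 64, 64)
for _ in range(100):
    sds_step(image)
```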
1 code implementation • NeurIPS 2023 • Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu
We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark.
no code implementations • 3 Feb 2023 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu
Human-designed visual manuals are crucial components in shape assembly activities.
1 code implementation • CVPR 2023 • Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination.
no code implementations • 25 Jul 2022 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
no code implementations • 23 Jun 2022 • Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.
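A toy sketch of a masked visual modeling objective over discrete video tokens (hypothetical sizes; not the paper's architecture):

```python
# Toy sketch: randomly replace video-patch tokens with a [MASK] token and
# train a transformer to reconstruct only the masked positions.
import torch
import torch.nn as nn

class MaskedVideoModel(nn.Module):
    def __init__(self, vocab=1024, dim=256, layers=4, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab + 1, dim)   # last index = [MASK]
        self.mask_id = vocab
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, mask_ratio=0.5):
        # tokens: (B, L) discrete video-patch codes
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
        corrupted = tokens.masked_fill(mask, self.mask_id)
        logits = self.head(self.encoder(self.embed(corrupted)))
        # Reconstruction loss only on the masked positions.
        return nn.functional.cross_entropy(logits[mask], tokens[mask])

model = MaskedVideoModel()
tokens = torch.randint(0, 1024, (2, 64))
loss = model(tokens)
loss.backward()
```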
no code implementations • 4 May 2022 • Yunzhi Zhang, Jiajun Wu
Novel view synthesis (NVS) and video prediction (VP) are typically considered disjoint tasks in computer vision.
3 code implementations • 20 Apr 2021 • Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
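The recipe is two-stage: a VQ-VAE compresses video into discrete latents, and a GPT-like transformer models those latents autoregressively. A toy sketch of the second stage (illustrative sizes, not the released code):

```python
# Toy sketch of a GPT-style autoregressive prior over flattened VQ-VAE codes.
import torch
import torch.nn as nn

class LatentPrior(nn.Module):
    def __init__(self, vocab=512, dim=256, layers=4, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, codes):
        # codes: (B, L) indices from a pretrained video VQ-VAE
        causal = nn.Transformer.generate_square_subsequent_mask(codes.size(1))
        h = self.blocks(self.embed(codes), mask=causal)
        return self.head(h)  # next-token logits over the codebook

prior = LatentPrior()
codes = torch.randint(0, 512, (2, 128))
logits = prior(codes[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 512), codes[:, 1:].reshape(-1)
)
```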
no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas
We present VideoGen: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
1 code implementation • NeurIPS 2020 • Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto
Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals.
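One way to operationalize "frontier" is epistemic uncertainty: prefer goals where an ensemble of value estimators disagrees most. A toy sketch of that idea (illustrative only; not necessarily the paper's exact criterion):

```python
# Toy sketch: sample training goals in proportion to value-ensemble
# disagreement, a proxy for goals at the edge of the agent's competence.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_values(goals: np.ndarray, n_members: int = 5) -> np.ndarray:
    # Stand-in for K trained value networks evaluated on each goal: (K, N)
    return rng.normal(loc=np.linalg.norm(goals, axis=1), scale=0.3,
                      size=(n_members, goals.shape[0]))

def sample_frontier_goals(candidates: np.ndarray, k: int = 8) -> np.ndarray:
    values = ensemble_values(candidates)
    disagreement = values.std(axis=0)            # high = uncertain = frontier
    probs = disagreement / disagreement.sum()    # sample proportional to it
    idx = rng.choice(len(candidates), size=k, replace=False, p=probs)
    return candidates[idx]

candidates = rng.uniform(-1, 1, size=(256, 2))   # candidate goal positions
training_goals = sample_frontier_goals(candidates)
```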
1 code implementation • 28 Oct 2019 • Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel
In this work, we propose an asynchronous framework for model-based reinforcement learning methods that brings down the run time of these algorithms to be just the data collection time.
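A toy sketch of the asynchronous pattern, with threads and stand-in workloads replacing real environment steps and gradient updates (not the actual framework):

```python
# Toy sketch: data collection and model training run concurrently instead of
# alternating in a sequential loop, so wall-clock time tracks collection.
import queue
import random
import threading
import time

replay = queue.Queue()
stop = threading.Event()

def collect():
    # Acts in the environment continuously; never blocks on model training.
    while not stop.is_set():
        replay.put(("transition", random.random()))  # stand-in for one env step
        time.sleep(0.001)

def train_model():
    # Fits the dynamics model on whatever data has arrived so far.
    while not stop.is_set():
        try:
            batch = replay.get(timeout=0.01)  # no synchronization barrier
        except queue.Empty:
            continue
        time.sleep(0.005)  # stand-in for one gradient step on the model

threads = [threading.Thread(target=collect), threading.Thread(target=train_model)]
for t in threads:
    t.start()
time.sleep(0.1)
stop.set()
for t in threads:
    t.join()
```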