no code implementations • 21 Mar 2024 • Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang
We introduce bounded generation as a generalized task for controlling video generation, synthesizing arbitrary camera and subject motion based only on a given start and end frame.
no code implementations • CVPR 2024 • Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang
In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
no code implementations • CVPR 2024 • Zheng Ding, Xuaner Zhang, Zhuowen Tu, Zhihao Xia
We propose a method to adapt a pretrained diffusion model for image restoration by simply adding noise to the input image to be restored and then denoising it.
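The general add-noise-then-denoise recipe can be sketched with an off-the-shelf pretrained pipeline; the checkpoint name, noise strength, and scheduler below are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch: restore a degraded image by partially noising it and
# running the reverse diffusion process from that intermediate timestep.
# Checkpoint and strength are placeholder assumptions, not the paper's setup.
import torch
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")  # assumed checkpoint
unet, scheduler = pipe.unet, pipe.scheduler
scheduler.set_timesteps(1000)

def restore(degraded: torch.Tensor, strength: float = 0.3) -> torch.Tensor:
    """degraded: image tensor in [-1, 1] of shape (1, 3, 256, 256)."""
    timesteps = scheduler.timesteps                      # descending, e.g. 999 ... 0
    start_idx = int((1.0 - strength) * len(timesteps))   # larger strength -> more noise
    t_start = timesteps[start_idx]
    noise = torch.randn_like(degraded)
    x = scheduler.add_noise(degraded, noise, t_start)    # diffuse the input part-way
    for t in timesteps[start_idx:]:                      # denoise back to t = 0
        with torch.no_grad():
            eps = unet(x, t).sample
        x = scheduler.step(eps, t, x).prev_sample
    return x.clamp(-1, 1)
```

The strength parameter trades fidelity to the degraded input against how much the model is allowed to hallucinate clean detail.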
1 code implementation • CVPR 2024 • ZiRui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu
We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images.
no code implementations • 25 Oct 2023 • Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhizhou Sha, Zhuowen Tu
In this paper, we introduce a novel generative model, Diffusion Layout Transformers without Autoencoder (Dolfin), which significantly improves modeling capability while reducing complexity compared to existing methods.
1 code implementation • 2 Aug 2023 • Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu
Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.
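A toy version of the collage step, under the assumption that patch features are square maps and the shifted patch is offset by half a patch in each direction, could look like the following; the function name and quadrant layout are illustrative, not the paper's implementation.

```python
# Toy sketch of a feature-collage step: build the feature map of a patch that is
# shifted by half a patch, by stitching together quadrants of its four neighbors.
# The half-patch shift and quadrant layout are illustrative assumptions.
import torch

def collage_shifted_feature(tl, tr, bl, br):
    """tl, tr, bl, br: (C, H, W) feature maps of four neighboring patches
    arranged in a 2x2 grid (top-left, top-right, bottom-left, bottom-right)."""
    C, H, W = tl.shape
    h, w = H // 2, W // 2
    top = torch.cat([tl[:, h:, w:],    # bottom-right quadrant of the top-left patch
                     tr[:, h:, :w]],   # bottom-left quadrant of the top-right patch
                    dim=2)
    bottom = torch.cat([bl[:, :h, w:],   # top-right quadrant of the bottom-left patch
                        br[:, :h, :w]],  # top-left quadrant of the bottom-right patch
                       dim=2)
    return torch.cat([top, bottom], dim=1)  # (C, H, W) feature for the shifted patch

# Example: four 16x16 patch features with 64 channels -> one collaged feature.
feats = [torch.randn(64, 16, 16) for _ in range(4)]
shifted = collage_shifted_feature(*feats)
assert shifted.shape == (64, 16, 16)
```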
1 code implementation • CVPR 2023 • Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
At a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person.
1 code implementation • ICCV 2023 • Xin Xu, Tianyi Xiong, Zheng Ding, Zhuowen Tu
We present a new method for open-vocabulary universal image segmentation, which is capable of performing instance, semantic, and panoptic segmentation under a unified framework.
Ranked #1 on Panoptic Segmentation on ADE20K
no code implementations • 5 Oct 2022 • Zheng Ding, James Hou, Zhuowen Tu
In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition.
1 code implementation • 18 Aug 2022 • Zheng Ding, Jieke Wang, Zhuowen Tu
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary text-based category descriptions at inference time.
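Generically, the open-vocabulary part boils down to scoring each predicted mask's visual embedding against text embeddings of user-provided category names; a minimal sketch of that scoring step with OpenAI's CLIP is shown below, where the source of the per-mask embeddings and the category list are placeholders rather than the paper's pipeline.

```python
# Minimal sketch of open-vocabulary mask classification: compare each mask's
# visual embedding with CLIP text embeddings of arbitrary category names.
# How the per-mask embeddings are produced is left abstract here.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def classify_masks(mask_embeddings: torch.Tensor, class_names: list[str]) -> torch.Tensor:
    """mask_embeddings: (num_masks, 512) visual embeddings, one per predicted mask.
    Returns the index of the best-matching class name for each mask."""
    tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        text_emb = model.encode_text(tokens).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    vis_emb = mask_embeddings / mask_embeddings.norm(dim=-1, keepdim=True)
    logits = vis_emb @ text_emb.T          # cosine similarity, (num_masks, num_classes)
    return logits.argmax(dim=-1)

# Usage with made-up embeddings and an arbitrary, user-specified vocabulary.
labels = classify_masks(torch.randn(5, 512).to(device), ["zebra", "picnic table", "sky"])
```

Because the vocabulary is supplied as text at inference time, the same scoring step works for categories never seen during training.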
no code implementations • CVPR 2020 • Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, Zhuowen Tu
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.