no code implementations • 10 Jan 2025 • Yi He, Shengqi Dang, Long Ling, Ziqing Qian, Nanxuan Zhao, Nan Cao
In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values.
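Conditioning on continuous Valence-Arousal values (rather than discrete emotion labels) requires embedding two scalars into a form that can accompany the text-prompt conditioning. A minimal illustrative sketch of one common way to do this — sinusoidal features, as used for diffusion timestep embeddings — with all names and dimensions hypothetical, not the authors' code:

```python
import numpy as np

def encode_va(valence, arousal, dim=16):
    """Hypothetical sketch: lift continuous Valence-Arousal scalars into a
    sinusoidal feature vector (analogous to diffusion timestep embeddings)
    that could be fed alongside text conditioning."""
    freqs = 2.0 ** np.arange(dim // 4)   # geometric frequency ladder
    feats = []
    for v in (valence, arousal):
        feats.append(np.sin(v * freqs))  # sin features for this scalar
        feats.append(np.cos(v * freqs))  # cos features for this scalar
    return np.concatenate(feats)         # shape (dim,)

emb = encode_va(0.8, -0.2)
print(emb.shape)  # (16,)
```

Because the encoding is smooth in the inputs, nearby Valence-Arousal values produce nearby embeddings, which is what makes *continuous* emotional control plausible.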
no code implementations • 17 Dec 2024 • Maham Tanveer, Yang Zhou, Simon Niklaus, Ali Mahdavi Amiri, Hao Zhang, Krishna Kumar Singh, Nanxuan Zhao
By generating plausible and smooth transitions between two image frames, video inbetweening is an essential tool for video editing and long video synthesis.
no code implementations • 13 Dec 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun
Compared to previous methods, SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating the effectiveness of our proposed method.
no code implementations • 10 Dec 2024 • Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
We introduce UniReal, a unified framework designed to address various image generation and editing tasks.
no code implementations • 5 Dec 2024 • Yusuf Dalva, Yijun Li, Qing Liu, Nanxuan Zhao, Jianming Zhang, Zhe Lin, Pinar Yanardag
In this paper, we propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers: a foreground layer (RGBA) with transparency information and a background layer (RGB).
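The two generated layers combine with the standard "over" compositing operator: the foreground's alpha channel blends its RGB over the background. A small self-contained illustration (the operator itself, not the paper's pipeline):

```python
import numpy as np

def composite(fg_rgba, bg_rgb):
    """Alpha-composite an RGBA foreground layer over an RGB background
    layer using the standard 'over' operator."""
    fg_rgb = fg_rgba[..., :3]
    alpha = fg_rgba[..., 3:4]   # keep trailing axis for broadcasting
    return alpha * fg_rgb + (1.0 - alpha) * bg_rgb

fg = np.zeros((2, 2, 4))
fg[..., 0] = 1.0   # red foreground
fg[..., 3] = 0.5   # half opaque
bg = np.ones((2, 2, 3))   # white background
out = composite(fg, bg)
print(out[0, 0])  # [1.  0.5 0.5]
```

Generating the layers separately means the transparency information survives, so the foreground can later be re-composited over any other background.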
1 code implementation • 22 Sep 2024 • Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu
The training data engine covers the diverse needs of group portrait editing.
no code implementations • 13 Jun 2024 • Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
no code implementations • CVPR 2024 • Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac
The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics.
no code implementations • 16 May 2024 • Peiying Zhang, Nanxuan Zhao, Jing Liao
By optimizing the combination of neural paths, we can incorporate geometric constraints while preserving expressivity in generated SVGs.
no code implementations • 30 Apr 2024 • Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu
We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU.
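The model's output is a set of 3D Gaussian primitives in the parameterization popularized by 3D Gaussian Splatting. A sketch of that per-primitive data layout (the standard 3DGS parameterization, not GS-LRM's actual code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """One 3D Gaussian primitive in the usual 3DGS parameterization
    (illustrative of the output format, not the GS-LRM implementation)."""
    mean: np.ndarray       # (3,) center position in world space
    scale: np.ndarray      # (3,) per-axis extent
    rotation: np.ndarray   # (4,) unit quaternion orienting the ellipsoid
    opacity: float         # alpha in [0, 1]
    sh_coeffs: np.ndarray  # (k, 3) spherical-harmonic color coefficients

g = Gaussian3D(mean=np.zeros(3),
               scale=np.full(3, 0.01),
               rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
               opacity=0.9,
               sh_coeffs=np.zeros((1, 3)))
print(g.opacity)  # 0.9
```

Predicting primitives like these directly, rather than optimizing them per scene, is what makes sub-second feed-forward reconstruction possible.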
no code implementations • 8 Apr 2024 • Jing Gu, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang, Xin Eric Wang
Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.
no code implementations • CVPR 2024 • Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W. H. Lau
Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis.
no code implementations • 5 Feb 2024 • Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang
We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies, applied to raw, unannotated videos of articulated characters.
no code implementations • CVPR 2024 • Mohammad Amin Shabani, Zhaowen Wang, Difan Liu, Nanxuan Zhao, Jimei Yang, Yasutaka Furukawa
This paper proposes an image-vector dual diffusion model for generative layout design.
no code implementations • 20 Oct 2023 • Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha
We adapt Causal Mediation Analysis for text-to-image models and trace knowledge about distinct visual attributes to various (causal) components in the (i) UNet and (ii) text-encoder of the diffusion model.
no code implementations • 21 Sep 2023 • Peiying Zhang, Nanxuan Zhao, Jing Liao
In this paper, we propose a novel pipeline that generates high-quality customized vector graphics based on textual prompts while preserving the properties and layer-wise information of a given exemplar SVG.
1 code implementation • 23 Aug 2023 • Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He
Non-photorealistic videos are in demand with the rise of the metaverse, yet remain under-studied.
no code implementations • 6 Aug 2023 • Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau
We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training.
1 code implementation • 8 May 2023 • Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, Xiaoguang Han
Virtual try-on attracts increasing research attention as a promising way to enhance the user experience of online clothes shopping.
no code implementations • ICCV 2023 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin
Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice.
no code implementations • CVPR 2023 • Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin
An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions.
1 code implementation • CVPR 2023 • Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W. H. Lau
In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, large memory requirements, and slow style switching.
1 code implementation • ICCV 2023 • Nanxuan Zhao, Shengqi Dang, Hexun Lin, Yang Shi, Nan Cao
Face editing has advanced rapidly since the advent of StyleGAN.
no code implementations • 22 Sep 2022 • Zhitong Huang, Nanxuan Zhao, Jing Liao
In the first stage, multi-modal conditions are converted into a common representation of hint points.
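The appeal of a hint-point representation is that heterogeneous inputs (strokes, palettes, text) all reduce to the same sparse form: grid locations paired with a color. An illustrative sketch of reducing one modality, a user stroke, to hint points — the function name, grid stride, and color format are hypothetical, not the paper's code:

```python
import numpy as np

def stroke_to_hints(mask, color, stride=16):
    """Hypothetical sketch: reduce a user stroke (binary mask + RGB color)
    to sparse (row, col, color) hint points sampled on a coarse grid."""
    ys, xs = np.nonzero(mask[::stride, ::stride])  # sample the mask coarsely
    return [(int(y) * stride, int(x) * stride, color) for y, x in zip(ys, xs)]

mask = np.zeros((64, 64), dtype=bool)
mask[0:32, 0:32] = True   # stroke covers the top-left quadrant
hints = stroke_to_hints(mask, (255, 0, 0))
print(len(hints))  # 4
```

Once every modality is expressed as hint points, a single downstream colorization model can consume them uniformly.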
1 code implementation • CVPR 2022 • Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.
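The ranking formulation can be pictured as follows: for a clip, the score assigned to the transformation actually applied should exceed every other candidate's score by a margin. A minimal sketch of such a margin-based ranking loss (illustrative of the formulation, not the exact TransRank objective):

```python
import numpy as np

def ranking_loss(scores, applied_idx, margin=1.0):
    """Hypothetical sketch of a ranking loss for transformation
    recognition: the applied transformation's score should beat every
    other candidate's score by at least `margin`."""
    pos = scores[applied_idx]
    losses = [max(0.0, margin - (pos - s))          # hinge per negative
              for i, s in enumerate(scores) if i != applied_idx]
    return sum(losses) / len(losses)

scores = np.array([0.2, 2.0, 0.5, -0.1])  # one score per candidate transformation
print(ranking_loss(scores, applied_idx=1))  # 0.0
```

Unlike a plain classification loss, this only asks for *relative* ordering among candidates on the same clip, which is the sense in which recognition is recast as ranking.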
no code implementations • 10 Dec 2021 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin
The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance field and biases it towards compromised results.
no code implementations • ICCV 2021 • Ailing Zeng, Xiao Sun, Lei Yang, Nanxuan Zhao, Minhao Liu, Qiang Xu
While the average prediction accuracy has been improved significantly over the years, the performance on hard poses with depth ambiguity, self-occlusion, and complex or rare poses is still far from satisfactory.
1 code implementation • 5 Aug 2021 • Sucheng Ren, Qiang Wen, Nanxuan Zhao, Guoqiang Han, Shengfeng He
In this paper, we introduce a new attention-based encoder, the vision transformer, into salient object detection, ensuring globally contextualized representations from shallow to deep layers.
1 code implementation • CVPR 2021 • Haoxin Chen, Hanjie Wu, Nanxuan Zhao, Sucheng Ren, Shengfeng He
The key is to model the relationship between the query videos and the support images for propagating the object information.
no code implementations • ICCV 2021 • Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin
City modeling is the foundation for computational urban planning, navigation, and entertainment.
no code implementations • ICLR 2021 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress.
no code implementations • 14 Apr 2020 • Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin
To address this problem, we propose a data-driven approach for learning invariance to backgrounds.