Search Results for author: Nanxuan Zhao

Found 32 papers, 8 papers with code

EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

no code implementations 10 Jan 2025 Yi He, Shengqi Dang, Long Ling, Ziqing Qian, Nanxuan Zhao, Nan Cao

In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values.

Emotion Recognition, Image Generation

MotionBridge: Dynamic Video Inbetweening with Flexible Controls

no code implementations 17 Dec 2024 Maham Tanveer, Yang Zhou, Simon Niklaus, Ali Mahdavi Amiri, Hao Zhang, Krishna Kumar Singh, Nanxuan Zhao

By generating plausible and smooth transitions between two image frames, video inbetweening is an essential tool for video editing and long video synthesis.

Video Editing, Video Generation

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

no code implementations 13 Dec 2024 Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun

Compared to previous methods, SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating the effectiveness of our proposed method.

LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors

no code implementations 5 Dec 2024 Yusuf Dalva, Yijun Li, Qing Liu, Nanxuan Zhao, Jianming Zhang, Zhe Lin, Pinar Yanardag

In this paper, we propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers: a foreground layer (RGBA) with transparency information and a background layer (RGB).

Text-to-Image Generation

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

no code implementations CVPR 2024 Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac

The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics.

Denoising, Vector Graphics

Text-to-Vector Generation with Neural Path Representation

no code implementations 16 May 2024 Peiying Zhang, Nanxuan Zhao, Jing Liao

By optimizing the combination of neural paths, we can incorporate geometric constraints while preserving expressivity in generated SVGs.

Vector Graphics

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

no code implementations 30 Apr 2024 Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU.

3D Generation

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

no code implementations 8 Apr 2024 Jing Gu, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang, Xin Eric Wang

Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.

Image Generation, Object

AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising

no code implementations 5 Feb 2024 Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang

We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters.

Denoising, Optical Flow Estimation

Localizing and Editing Knowledge in Text-to-Image Generative Models

no code implementations 20 Oct 2023 Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha

We adapt Causal Mediation Analysis for text-to-image models and trace knowledge about distinct visual attributes to various (causal) components in the (i) UNet and (ii) text-encoder of the diffusion model.

Attribute, Image Generation, +1

Text-Guided Vector Graphics Customization

no code implementations 21 Sep 2023 Peiying Zhang, Nanxuan Zhao, Jing Liao

In this paper, we propose a novel pipeline that generates high-quality customized vector graphics based on textual prompts while preserving the properties and layer-wise information of a given exemplar SVG.

Vector Graphics

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

1 code implementation 23 Aug 2023 Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He

Non-photorealistic videos are in demand with the wave of the metaverse, but lack sufficient research studies.

Saliency Detection

Language-based Photo Color Adjustment for Graphic Designs

no code implementations 6 Aug 2023 Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau

We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training.

FashionTex: Controllable Virtual Try-on with Text and Texture

1 code implementation 8 May 2023 Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, Xiaoguang Han

Virtual try-on attracts increasing research attention as a promising way to enhance the user experience of online clothes shopping.

Virtual Try-on

AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation

no code implementations ICCV 2023 Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin

Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice.

Novel View Synthesis, Object

Grid-guided Neural Radiance Fields for Large Urban Scenes

no code implementations CVPR 2023 Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin

An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions.

Neural Preset for Color Style Transfer

1 code implementation CVPR 2023 Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W. H. Lau

In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed.

4k, Color Normalization, +4

Bring Clipart to Life

1 code implementation ICCV 2023 Nanxuan Zhao, Shengqi Dang, Hexun Lin, Yang Shi, Nan Cao

The development of face editing has been boosted since the birth of StyleGAN.

UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

no code implementations 22 Sep 2022 Zhitong Huang, Nanxuan Zhao, Jing Liao

In the first stage, multi-modal conditions are converted into a common representation of hint points.

Colorization

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering

no code implementations 10 Dec 2021 Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin

The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance field and biases it towards compromised results.

Minecraft

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

no code implementations ICCV 2021 Ailing Zeng, Xiao Sun, Lei Yang, Nanxuan Zhao, Minhao Liu, Qiang Xu

While the average prediction accuracy has been improved significantly over the years, the performance on hard poses with depth ambiguity, self-occlusion, and complex or rare poses is still far from satisfactory.

3D Human Pose Estimation, 3D Pose Estimation, +3

Unifying Global-Local Representations in Salient Object Detection with Transformer

1 code implementation 5 Aug 2021 Sucheng Ren, Qiang Wen, Nanxuan Zhao, Guoqiang Han, Shengfeng He

In this paper, we introduce a new attention-based encoder, vision transformer, into salient object detection to ensure the globalization of the representations from shallow to deep layers.

Decoder, object-detection, +2
