MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

no code implementations 28 Jun 2024 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement.

VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing

no code implementations 16 May 2024 Binghui Chen, Chongyang Zhong, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Due to the significant advances in large-scale text-to-image generation with diffusion models (DMs), controllable human image generation has been attracting much attention recently.

Human-Object Interaction Detection Marketing

RobustMVS: Single Domain Generalized Deep Multi-view Stereo

1 code implementation 15 May 2024 Hongbin Xu, Weitao Chen, Baigui Sun, Xuansong Xie, Wenxiong Kang

To evaluate the generalization results, we build a novel MVS domain generalization benchmark including synthetic and real-world datasets.

Domain Generalization

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

1 code implementation 9 Apr 2024 Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, AnCong Wu, Wei-Shi Zheng

Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed.

3D Generation Text to 3D

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

1 code implementation 9 Apr 2024 Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, WangMeng Zuo

The key idea of our SmartControl is to relax the visual condition in the areas that conflict with the text prompts.

Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation

no code implementations 7 Apr 2024 Youze Xue, Binghui Chen, Yifeng Geng, Xuansong Xie, Jiansheng Chen, Hongbing Ma

Customized generative text-to-image models have the ability to produce images that closely resemble a given subject.

Image Generation

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

no code implementations 7 Apr 2024 Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, WangMeng Zuo

Specifically, we propose a shoe-wearing system, called Shoe-Model, to generate plausible images of human legs interacting with the given shoes.

Image Generation Marketing

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

1 code implementation 8 Mar 2024 Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, WangMeng Zuo

Different from conventional T2V sampling (i.e., temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating.

Video Generation

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

no code implementations 27 Feb 2024 Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao

We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt.


WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations 3 Jan 2024 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images

no code implementations CVPR 2024 Yifang Men, Hanxi Liu, Yuan YAO, Miaomiao Cui, Xuansong Xie, Zhouhui Lian

In this paper we make a connection between the two and tackle the challenging task of 3D portrait stylization - modeling high-fidelity 3D stylized avatars from captured 2D portrait images.

Style Transfer

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

no code implementations CVPR 2024 Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks, outperforming existing methods in terms of generation quality and efficiency.

3D Generation Domain Adaptation

DreaMoving: A Human Video Generation Framework based on Diffusion Models

no code implementations 8 Dec 2023 Mengyang Feng, Jinlin Liu, Kai Yu, Yuan YAO, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie

In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos.

Video Generation

Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

1 code implementation 22 Nov 2023 Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie

This is a technical report on the 360-degree panoramic image generation task based on diffusion models.

Denoising Image Generation

Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning

no code implementations 22 Nov 2023 Kai Yu, Jinlin Liu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

After the progressive training, the LoRA learns the 3D information of the generated object and eventually turns into an object-level 3D prior.

3D Generation Image to 3D +1

FMViT: A multiple-frequency mixing Vision Transformer

no code implementations 9 Nov 2023 Wei Tan, Yifeng Geng, Xuansong Xie

On CoreML, FMViT outperforms MobileOne by 2.6% in top-1 accuracy on the ImageNet dataset (78.5% vs. 75.9%), with inference latency comparable to MobileOne.

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation 6 Nov 2023 Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on the AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR) Text Generation

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

1 code implementation 28 Aug 2023 Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang

Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.

Image Enhancement Image Generation +3

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation 28 Aug 2023 Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan YAO, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and to generate truthful personalized portraits with only a handful of portrait images as input.

Attribute Personalized Image Generation +2

TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective

1 code implementation ICCV 2023 Jun Dan, Yang Liu, Haoyu Xie, Jiankang Deng, Haoran Xie, Xuansong Xie, Baigui Sun

We investigate the reasons for this phenomenon and discover that the existing data augmentation approach and hard sample mining strategy are incompatible with ViT-based FR backbones, because they lack tailored consideration for preserving face structural information and for leveraging the information in each local token.

Data Augmentation Diversity +1

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation 19 May 2023 Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition Skeleton Based Action Recognition
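The idea of encoding bone connectivity through graph distances, mentioned in the entry above, can be illustrated with a minimal sketch. It assumes that "graph distance" means the hop count between joints in the skeleton graph; the five-joint toy skeleton and the function name `hop_distances` are hypothetical and are not taken from the paper.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """Return a matrix of shortest hop counts between all pairs of joints.

    `bones` is a list of undirected edges (joint_a, joint_b).
    Unreachable pairs keep the value -1.
    """
    adjacency = [[] for _ in range(num_joints)]
    for a, b in bones:
        adjacency[a].append(b)
        adjacency[b].append(a)

    dist = [[-1] * num_joints for _ in range(num_joints)]
    for source in range(num_joints):
        # Breadth-first search from each joint gives unweighted distances.
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hypothetical toy skeleton: joint 1 is a hub connected to joints 0, 2 and 3,
# and joint 4 hangs off joint 3.
bones = [(0, 1), (1, 2), (1, 3), (3, 4)]
D = hop_distances(5, bones)
```

A distance matrix of this kind could then serve as a relative positional signal between joints; how the paper actually uses it is not stated in the snippet above.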

CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

no code implementations 17 May 2023 Weitao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie

The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions.

PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering

1 code implementation 18 Apr 2023 Zisheng Chen, Hongbin Xu, Weitao Chen, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

Semantic segmentation of point clouds usually requires exhaustive human annotation effort, so the challenging topic of learning from unlabeled data or weaker forms of annotation has attracted wide attention.

Clustering Segmentation +1

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation 30 Mar 2023 Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

RSFNet: A White-Box Image Retouching Approach using Region-Specific Color Filters

1 code implementation ICCV 2023 Wenqi Ouyang, Yi Dong, Xiaoyang Kang, Peiran Ren, Xin Xu, Xuansong Xie

Therefore, there is a need for white-box approaches that produce satisfying results and enable users to conveniently edit their images simultaneously.

Ranked #3 on Image Enhancement on MIT-Adobe 5k (PSNR on proRGB metric)

Image Enhancement Image Retouching +1

Synthesizing Realistic Image Restoration Training Pairs: A Diffusion Approach

no code implementations 13 Mar 2023 Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

In supervised image restoration tasks, one key issue is how to obtain the aligned high-quality (HQ) and low-quality (LQ) training image pairs.

Denoising Image Restoration +1

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

1 code implementation 8 Mar 2023 Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.

Semantic Segmentation

PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-Modal Distillation and Super-Voxel Clustering

1 code implementation ICCV 2023 Zisheng Chen, Hongbin Xu, Weitao Chen, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

Semantic segmentation of point clouds usually requires exhaustive human annotation effort, so the challenging topic of learning from unlabeled data or weaker forms of annotation has attracted wide attention.

Clustering Segmentation +1

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations CVPR 2023 Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model that discards the Non-Maximum Suppression (NMS) post-processing step.

Pedestrian Detection
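For context on the post-processing step that end-to-end detectors aim to remove, the following is a minimal sketch of standard greedy NMS. It is not the method of the paper above; the box format `(x1, y1, x2, y2)`, the threshold of 0.5, and the example boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A candidate survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119, which exceeds the threshold, so only indices 0 and 2 are kept.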

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations 27 Oct 2022 Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

Semi-supervised Deep Multi-view Stereo

no code implementations 24 Jul 2022 Hongbin Xu, Weitao Chen, Yang Liu, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

For the more troublesome case in which the basic assumption does not hold for MVS data, we propose a novel style consistency loss to alleviate the negative effect caused by the distribution gap.

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

3 code implementations 6 Jul 2022 Yifang Men, Yuan YAO, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization.

Few-Shot Learning Style Transfer +1

Towards Counterfactual Image Manipulation via CLIP

1 code implementation 6 Jul 2022 Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao

In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing.

counterfactual Image Manipulation

Improving Nighttime Driving-Scene Segmentation via Dual Image-adaptive Learnable Filters

2 code implementations 4 Jul 2022 Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang

With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.

Autonomous Driving Scene Segmentation +1

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations 11 May 2022 Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that improved efficiency, measured by several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution
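The PSNR threshold in the entry above refers to the standard peak signal-to-noise ratio, PSNR = 10 * log10(MAX^2 / MSE). The sketch below computes it for flat lists of 8-bit pixel values; the function name and the two-pixel example are hypothetical, and the challenge's exact evaluation protocol (color space, border cropping) is not specified here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel images that differ by 10 gray levels in one pixel.
reference = [0, 255]
estimate = [0, 245]
value = psnr(reference, estimate)
```

With a mean squared error of 50, this example evaluates 10 * log10(65025 / 50).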

Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition

1 code implementation 18 Mar 2022 Tao Yang, Peiran Ren, Xuansong Xie, Xiansheng Hua, Lei Zhang

Most existing deep-learning-based VFI methods adopt off-the-shelf optical flow algorithms to estimate the bidirectional flows and interpolate the missing frames accordingly.

Image Generation Image Morphing +3

ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo

1 code implementation CVPR 2022 Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang

The network is mainly composed of two components: a context-aware local retouching layer (LRL) and an adaptive blend pyramid layer (BPL).

4k Photo Retouching

Unpaired Cartoon Image Synthesis via Gated Cycle Mapping

no code implementations CVPR 2022 Yifang Men, Yuan YAO, Miaomiao Cui, Zhouhui Lian, Xuansong Xie, Xian-Sheng Hua

Experimental results demonstrate the superiority of the proposed method over the state of the art and validate its effectiveness in the brand-new task of general cartoon image synthesis.

Image Generation Video Generation

Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering

no code implementations 3 Aug 2021 Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao

Noisy labels are commonly found in real-world data and degrade the performance of deep neural networks.

Metric Learning

WaveFill: A Wavelet-based Generation Network for Image Inpainting

1 code implementation ICCV 2021 Yingchen Yu, Fangneng Zhan, Shijian Lu, Jianxiong Pan, Feiying Ma, Xuansong Xie, Chunyan Miao

This paper presents WaveFill, a wavelet-based inpainting network that decomposes images into multiple frequency bands and fills the missing regions in each frequency band separately and explicitly.

Image Inpainting

Sparse Needlets for Lighting Estimation with Spherical Transport Loss

no code implementations ICCV 2021 Fangneng Zhan, Changgong Zhang, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

Accurate lighting estimation is challenging yet critical to many computer vision and computer graphics tasks such as high-dynamic-range (HDR) relighting.

Lighting Estimation

Attention-guided Temporally Coherent Video Object Matting

1 code implementation 24 May 2021 Yunke Zhang, Chi Wang, Miaomiao Cui, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Hujun Bao, QiXing Huang, Weiwei Xu

Experimental results show that our method can generate high-quality alpha mattes for various videos featuring appearance change, occlusion, and fast motion.

Image Matting Object +4

PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency

1 code implementation CVPR 2021 Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang

HRP requires that more attention should be paid to human regions, while GLC requires that a group of portrait photos should be retouched to a consistent tone.

Photo Retouching

GAN Prior Embedded Network for Blind Face Restoration in the Wild

3 code implementations CVPR 2021 Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

The proposed GAN prior embedded network (GPEN) is easy to implement, and it can generate visually photo-realistic results.

Blind Face Restoration Decoder +2

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

no code implementations 26 Apr 2021 Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jianxiong Pan, Kaiwen Cui, Shijian Lu, Feiying Ma, Xuansong Xie, Chunyan Miao

With image-level attention, transformers can model long-range dependencies and generate diverse content through autoregressive modeling of pixel-sequence distributions.

Diversity Image Inpainting +1

GMLight: Lighting Estimation via Geometric Distribution Approximation

1 code implementation 20 Feb 2021 Fangneng Zhan, Yingchen Yu, Changgong Zhang, Rongliang Wu, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

This paper presents Geometric Mover's Light (GMLight), a lighting estimation framework that employs a regression network and a generative projector for effective illumination estimation.

Lighting Estimation regression

EMLight: Lighting Estimation via Spherical Distribution Approximation

no code implementations 21 Dec 2020 Fangneng Zhan, Changgong Zhang, Yingchen Yu, Yuan Chang, Shijian Lu, Feiying Ma, Xuansong Xie

Motivated by the Earth Mover's distance, we design a novel spherical mover's loss that guides accurate regression of light distribution parameters by taking advantage of the subtleties of spherical distribution.

Lighting Estimation regression

Adversarial Image Composition with Auxiliary Illumination

no code implementations 17 Sep 2020 Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma, Xuansong Xie

State-of-the-art methods strive to harmonize the composed image by adapting the style of foreground objects to be compatible with the background image, whereas the potential shadow of foreground objects within the composed image, which is critical to composition realism, is largely neglected.

Towards Realistic 3D Embedding via View Alignment

no code implementations 14 Jul 2020 Changgong Zhang, Fangneng Zhan, Shijian Lu, Feiying Ma, Xuansong Xie

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition, which generates new images by automatically embedding foreground objects of interest into background images.

Boosting Semantic Human Matting with Coarse Annotations

1 code implementation CVPR 2020 Jinlin Liu, Yuan YAO, Wendi Hou, Miaomiao Cui, Xuansong Xie, Chang-Shui Zhang, Xian-Sheng Hua

In this paper, we propose to use coarsely annotated data coupled with finely annotated data to boost end-to-end semantic human matting without trimaps as extra input.

Image Matting Semantic Segmentation

Automated Segmentation of Pulmonary Lobes using Coordination-Guided Deep Neural Networks

2 code implementations 19 Apr 2019 Wenjia Wang, Junxuan Chen, Jie Zhao, Ying Chi, Xuansong Xie, Li Zhang, Xian-Sheng Hua

The proposed model is trained and evaluated on a few publicly available datasets and has achieved state-of-the-art accuracy with a mean Dice coefficient of 0.947 ± 0.044.
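The Dice coefficient reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a reference segmentation mask. A minimal sketch for flat binary masks follows; the function name and the six-element masks are hypothetical, and the paper's per-lobe averaging is not reproduced here.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly; Dice is defined as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical masks: each has three foreground voxels, two of which coincide.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
```

In this example the intersection is 2 and the mask sizes sum to 6, so the score is 4/6.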


Attention-aware Multi-stroke Style Transfer

1 code implementation CVPR 2019 Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, Jun Wang

Neural style transfer has drawn considerable attention from both the academic and industrial fields.

Style Transfer

