Search Results for author: Xuansong Xie

Found 72 papers, 45 papers with code

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation • 28 Aug 2023 • Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan YAO, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and to generate truthful personalized portraits, with only a handful of portrait images as input.

Attribute Portrait Generation +1

8,282

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

3 code implementations • 6 Jul 2022 • Yifang Men, Yuan YAO, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization.

Few-Shot Learning Style Transfer +1

6,005

AnyText: Multilingual Visual Text Generation And Editing

1 code implementation • 6 Nov 2023 • Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

Based on AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality.

Optical Character Recognition (OCR) Text Generation

3,723

GAN Prior Embedded Network for Blind Face Restoration in the Wild

3 code implementations • CVPR 2021 • Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

The proposed GAN prior embedded network (GPEN) is easy-to-implement, and it can generate visually photo-realistic results.

Ranked #1 on Blind Face Restoration on CelebA-HQ

Blind Face Restoration Generative Adversarial Network +1

2,288

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

1 code implementation • 28 Aug 2023 • Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang

Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks.

Image Enhancement Image Generation +3

788

DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders

1 code implementation • ICCV 2023 • Xiaoyang Kang, Tao Yang, Wenqi Ouyang, Peiran Ren, Lingzhi Li, Xuansong Xie

Image colorization is a challenging problem due to multi-modal uncertainty and high ill-posedness.

Colorization Image Colorization

779

Improving Nighttime Driving-Scene Segmentation via Dual Image-adaptive Learnable Filters

2 code implementations • 4 Jul 2022 • Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang

With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.

Autonomous Driving Scene Segmentation +1

463

Boosting Semantic Human Matting with Coarse Annotations

1 code implementation • CVPR 2020 • Jinlin Liu, Yuan YAO, Wendi Hou, Miaomiao Cui, Xuansong Xie, Chang-Shui Zhang, Xian-Sheng Hua

In this paper, we propose to use coarse annotated data coupled with fine annotated data to boost end-to-end semantic human matting without trimaps as extra input.

Ranked #9 on Image Matting on AM-2K

Image Matting Semantic Segmentation

374

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

1 code implementation • CVPR 2023 • Biwen Lei, Jianqiang Ren, Mengyang Feng, Miaomiao Cui, Xuansong Xie

Meanwhile, 3D priors of facial details are incorporated to enhance the accuracy and authenticity of the reconstruction results.

Ranked #3 on 3D Face Reconstruction on REALY (side-view)

3D Face Reconstruction Disentanglement

374

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

1 code implementation • 8 Mar 2023 • Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.

Semantic Segmentation

267

PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency

1 code implementation • CVPR 2021 • Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang

HRP requires that more attention should be paid to human regions, while GLC requires that a group of portrait photos should be retouched to a consistent tone.

Photo Retouching

242

GMLight: Lighting Estimation via Geometric Distribution Approximation

1 code implementation • 20 Feb 2021 • Fangneng Zhan, Yingchen Yu, Changgong Zhang, Rongliang Wu, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

This paper presents Geometric Mover's Light (GMLight), a lighting estimation framework that employs a regression network and a generative projector for effective illumination estimation.

Lighting Estimation regression

160

FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation

1 code implementation • CVPR 2023 • Junjie He, Pengyu Li, Yifeng Geng, Xuansong Xie

In this paper, we show the strong potential of query-based models on efficient instance segmentation algorithm designs.

Real-time Instance Segmentation Segmentation +1

156

Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

1 code implementation • 22 Nov 2023 • Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie

This is a technical report on the 360-degree panoramic image generation task based on diffusion models.

Denoising Image Generation

147

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

2 code implementations • 9 Apr 2024 • Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, AnCong Wu, Wei-Shi Zheng

Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed.

3D Generation Text to 3D

132

Structure-Aware Flow Generation for Human Body Reshaping

1 code implementation • CVPR 2022 • Jianqiang Ren, Yuan YAO, Biwen Lei, Miaomiao Cui, Xuansong Xie

Body reshaping is an important procedure in portrait photo retouching.

Photo Retouching

129

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on DIV2K validation set.

Image Super-Resolution

116

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

1 code implementation • 8 Mar 2024 • Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, WangMeng Zuo

Different from conventional T2V sampling (i.e., temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating.

Video Generation

111

Attention-aware Multi-stroke Style Transfer

1 code implementation • CVPR 2019 • Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, Jun Wang

Neural style transfer has drawn considerable attention from both the academic and industrial fields.

Style Transfer

Attention-guided Temporally Coherent Video Object Matting

1 code implementation • 24 May 2021 • Yunke Zhang, Chi Wang, Miaomiao Cui, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Hujun Bao, QiXing Huang, Weiwei Xu

Experimental results show that our method can generate high-quality alpha mattes for various videos featuring appearance change, occlusion, and fast motion.

Image Matting Object +4

Active Boundary Loss for Semantic Segmentation

1 code implementation • 4 Feb 2021 • Chi Wang, Yunke Zhang, Miaomiao Cui, Peiran Ren, Yin Yang, Xuansong Xie, Xiansheng Hua, Hujun Bao, Weiwei Xu

This paper proposes a novel active boundary loss for semantic segmentation.

Segmentation Semantic Segmentation +2

Hypergraph Transformer for Skeleton-based Action Recognition

1 code implementation • 17 Nov 2022 • Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper

Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections.

Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D 120

Action Recognition Skeleton Based Action Recognition

ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo

1 code implementation • CVPR 2022 • Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang

The network is mainly composed of two components: a context-aware local retouching layer (LRL) and an adaptive blend pyramid layer (BPL).

4k Photo Retouching

WaveFill: A Wavelet-based Generation Network for Image Inpainting

1 code implementation • ICCV 2021 • Yingchen Yu, Fangneng Zhan, Shijian Lu, Jianxiong Pan, Feiying Ma, Xuansong Xie, Chunyan Miao

This paper presents WaveFill, a wavelet-based inpainting network that decomposes images into multiple frequency bands and fills the missing regions in each frequency band separately and explicitly.

Image Inpainting

Tracking with Human-Intent Reasoning

1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie

The perception component then generates the tracking results based on the embeddings.

Language Modelling Object +4

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

1 code implementation • 9 Apr 2024 • Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, WangMeng Zuo

The key idea of our SmartControl is to relax the visual condition on the areas that conflict with text prompts.

Towards Counterfactual Image Manipulation via CLIP

1 code implementation • 6 Jul 2022 • Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao

In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing.

counterfactual Image Manipulation

Noise-resistant Deep Metric Learning with Ranking-based Instance Selection

1 code implementation • CVPR 2021 • Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao

The existence of noisy labels in real-world data negatively impacts the performance of deep learning models.

Metric Learning

TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective

1 code implementation • ICCV 2023 • Jun Dan, Yang Liu, Haoyu Xie, Jiankang Deng, Haoran Xie, Xuansong Xie, Baigui Sun

We investigate the reasons for this phenomenon and discover that the existing data augmentation approach and hard sample mining strategy are incompatible with ViTs-based FR backbone due to the lack of tailored consideration on preserving face structural information and leveraging each local token information.

Data Augmentation Face Recognition

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

1 code implementation • 3 Feb 2023 • Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Human pose estimation is a challenging task due to its structured data sequence nature.

Ranked #74 on 3D Human Pose Estimation on Human3.6M

3D Human Pose Estimation 3D Pose Estimation +1

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

1 code implementation • 19 May 2023 • Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie

As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.

Action Recognition Skeleton Based Action Recognition

RSFNet: A White-Box Image Retouching Approach using Region-Specific Color Filters

1 code implementation • ICCV 2023 • Wenqi Ouyang, Yi Dong, Xiaoyang Kang, Peiran Ren, Xin Xu, Xuansong Xie

Therefore, there is a need for white-box approaches that produce satisfying results and enable users to conveniently edit their images simultaneously.

Ranked #3 on Image Enhancement on MIT-Adobe 5k (PSNR on proRGB metric)

Image Enhancement Image Retouching +1

Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition

1 code implementation • 18 Mar 2022 • Tao Yang, Peiran Ren, Xuansong Xie, Xiansheng Hua, Lei Zhang

Most of the existing deep learning based VFI methods adopt off-the-shelf optical flow algorithms to estimate the bidirectional flows and interpolate the missing frames accordingly.

Image Generation Image Morphing +3

ProContEXT: Exploring Progressive Context Transformer for Tracking

2 code implementations • 27 Oct 2022 • Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template.

Object Visual Object Tracking

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation Domain Adaptation

Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning

1 code implementation • ICCV 2023 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Yifeng Geng, Xuansong Xie

Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.

Depth Estimation Panoptic Segmentation +1

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

2 code implementations • 27 Oct 2022 • Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system.

Autonomous Driving

Region-adaptive Texture Enhancement for Detailed Person Image Synthesis

1 code implementation • 26 May 2020 • Lingbo Yang, Pan Wang, Xinfeng Zhang, Shanshe Wang, Zhanning Gao, Peiran Ren, Xuansong Xie, Siwei Ma, Wen Gao

The ability to produce convincing textural details is essential for the fidelity of synthesized person images.

Ranked #4 on Pose Transfer on Deep-Fashion

Pose Transfer

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

1 code implementation • 25 May 2023 • Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie

By spearheading the integration of Multilateration with facial analysis, KeyPosS marks a paradigm shift in facial landmark detection.

Benchmarking Face Recognition +3

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

1 code implementation • 4 Sep 2023 • Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie

Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture.

3D Human Pose Estimation

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

1 code implementation • 30 Mar 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research.

Autonomous Driving

PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering

1 code implementation • 18 Apr 2023 • Zisheng Chen, Hongbin Xu, Weitao Chen, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

Semantic segmentation of point clouds usually requires exhausting efforts of human annotations, hence it attracts wide attention to the challenging topic of learning from unlabeled or weaker forms of annotations.

Clustering Segmentation +1

PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-Modal Distillation and Super-Voxel Clustering

1 code implementation • ICCV 2023 • Zisheng Chen, Hongbin Xu, Weitao Chen, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

Semantic segmentation of point clouds usually requires exhausting efforts of human annotations, hence it attracts wide attention to a challenging topic of learning from unlabeled or weaker form of annotations.

Clustering Segmentation +1

Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All in One Classifier

1 code implementation • ICCV 2023 • Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li

The SCL loss weakens the adverse effects of the data augmentation view-noise problem which is amplified in domain transfer tasks.

Ranked #3 on Universal Domain Adaptation on Office-31

Contrastive Learning Data Augmentation +2

Automated Segmentation of Pulmonary Lobes using Coordination-Guided Deep Neural Networks

2 code implementations • 19 Apr 2019 • Wenjia Wang, Junxuan Chen, Jie Zhao, Ying Chi, Xuansong Xie, Li Zhang, Xian-Sheng Hua

The proposed model is trained and evaluated on a few publicly available datasets and has achieved the state-of-the-art accuracy with a mean Dice coefficient index of 0.947 ± 0.044.

Segmentation

Generating Persuasive Visual Storylines for Promotional Videos

no code implementations • 30 Aug 2019 • Chang Liu, Yi Dong, Han Yu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao

Video contents have become a critical tool for promoting products in E-commerce.

Clustering Persuasiveness +1

A Multi-Task Learning Framework for Extracting Bacteria Biotope Information

no code implementations • WS 2019 • Qi Zhang, Chao Liu, Ying Chi, Xuansong Xie, Xian-Sheng Hua

This paper presents a novel transfer multi-task learning method for Bacteria Biotope rel+ner task at BioNLP-OST 2019.

Multi-Task Learning NER +2

Towards Realistic 3D Embedding via View Alignment

no code implementations • 14 Jul 2020 • Changgong Zhang, Fangneng Zhan, Shijian Lu, Feiying Ma, Xuansong Xie

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition that generates new images by embedding interested foreground objects into background images automatically.

Adversarial Image Composition with Auxiliary Illumination

no code implementations • 17 Sep 2020 • Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma, Xuansong Xie

State-of-the-art methods strive to harmonize the composed image by adapting the style of foreground objects to be compatible with the background image, whereas the potential shadow of foreground objects within the composed image which is critical to the composition realism is largely neglected.

EMLight: Lighting Estimation via Spherical Distribution Approximation

no code implementations • 21 Dec 2020 • Fangneng Zhan, Changgong Zhang, Yingchen Yu, Yuan Chang, Shijian Lu, Feiying Ma, Xuansong Xie

Motivated by the Earth Mover distance, we design a novel spherical mover's loss that guides to regress light distribution parameters accurately by taking advantage of the subtleties of spherical distribution.

Lighting Estimation regression

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

no code implementations • 26 Apr 2021 • Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jianxiong Pan, Kaiwen Cui, Shijian Lu, Feiying Ma, Xuansong Xie, Chunyan Miao

With image-level attention, transformers can model long-range dependencies and generate diverse contents with autoregressive modeling of pixel-sequence distributions.

Image Inpainting Language Modelling

Unbalanced Feature Transport for Exemplar-based Image Translation

no code implementations • CVPR 2021 • Fangneng Zhan, Yingchen Yu, Kaiwen Cui, Gongjie Zhang, Shijian Lu, Jianxiong Pan, Changgong Zhang, Feiying Ma, Xuansong Xie, Chunyan Miao

In addition, we design a semantic-activation normalization scheme that injects style features of exemplars into the image translation process successfully.

Image-to-Image Translation Semantic Segmentation +1

Sparse Needlets for Lighting Estimation with Spherical Transport Loss

no code implementations • ICCV 2021 • Fangneng Zhan, Changgong Zhang, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao

Accurate lighting estimation is challenging yet critical to many computer vision and computer graphics tasks such as high-dynamic-range (HDR) relighting.

Lighting Estimation

Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering

no code implementations • 3 Aug 2021 • Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao

Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.

Metric Learning

Unpaired Cartoon Image Synthesis via Gated Cycle Mapping

no code implementations • CVPR 2022 • Yifang Men, Yuan YAO, Miaomiao Cui, Zhouhui Lian, Xuansong Xie, Xian-Sheng Hua

Experimental results demonstrate the superiority of the proposed method over the state of the art and validate its effectiveness in the brand-new task of general cartoon image synthesis.

Image Generation Video Generation

Semi-supervised Deep Multi-view Stereo

no code implementations • 24 Jul 2022 • Hongbin Xu, Weitao Chen, Yang Liu, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang

For the more troublesome case in which the basic assumption is violated in MVS data, we propose a novel style consistency loss to alleviate the negative effect caused by the distribution gap.

Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier

no code implementations • 21 Nov 2022 • Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li

The SCL loss weakens the adverse effects of the data augmentation view-noise problem which is amplified in domain transfer tasks.

Contrastive Learning Data Augmentation +3

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

no code implementations • 2 Dec 2022 • Lei Shang, Mouxiao Huang, Wu Shi, Yuchen Liu, Yang Liu, Fei Wang, Baigui Sun, Xuansong Xie, Yu Qiao

Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples.

Face Recognition Out of Distribution (OOD) Detection

Synthesizing Realistic Image Restoration Training Pairs: A Diffusion Approach

no code implementations • 13 Mar 2023 • Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang

In supervised image restoration tasks, one key issue is how to obtain the aligned high-quality (HQ) and low-quality (LQ) training image pairs.

Denoising Image Restoration +1

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection

no code implementations • CVPR 2023 • Xiaolin Song, Binghui Chen, Pengyu Li, Jun-Yan He, Biao Wang, Yifeng Geng, Xuansong Xie, Honggang Zhang

End-to-end pedestrian detection focuses on training a pedestrian detection model via discarding the Non-Maximum Suppression (NMS) post-processing.

Pedestrian Detection

CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

no code implementations • 17 May 2023 • Weitao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie

The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions.

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

no code implementations • 20 Oct 2023 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM).

Language Modelling Large Language Model

FMViT: A multiple-frequency mixing Vision Transformer

no code implementations • 9 Nov 2023 • Wei Tan, Yifeng Geng, Xuansong Xie

On CoreML, FMViT outperforms MobileOne by 2.6% in top-1 accuracy on the ImageNet dataset, with inference latency comparable to MobileOne (78.5% vs. 75.9%).

Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning

no code implementations • 22 Nov 2023 • Kai Yu, Jinlin Liu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

After the progressive training, the LoRA learns the 3D information of the generated object and eventually turns to an object-level 3D prior.

3D Generation Image to 3D +1

DreaMoving: A Human Video Generation Framework based on Diffusion Models

no code implementations • 8 Dec 2023 • Mengyang Feng, Jinlin Liu, Kai Yu, Yuan YAO, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie

In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos.

Video Generation

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

no code implementations • 28 Dec 2023 • Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks, outperforming existing methods in terms of generation quality and efficiency.

3D Generation Domain Adaptation

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

no code implementations • 2 Jan 2024 • Yifang Men, Biwen Lei, Yuan YAO, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars.

Anatomy

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

no code implementations • 3 Jan 2024 • Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Yusen Hu, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou

This paper introduces the WordArt Designer API, a novel framework for user-driven artistic typography synthesis utilizing Large Language Models (LLMs) on ModelScope.

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

no code implementations • 27 Feb 2024 • Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao

We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt.

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

no code implementations • 5 Mar 2024 • Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie

Multimodal Large Language Models (MLLMs) leverage Large Language Models as a cognitive framework for diverse visual-language tasks.

Language Modelling Large Language Model +2

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

no code implementations • 7 Apr 2024 • Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, WangMeng Zuo

Specifically, we propose a shoe-wearing system, called Shoe-Model, to generate plausible images of human legs interacting with the given shoes.

Image Generation Marketing

Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation

no code implementations • 7 Apr 2024 • Youze Xue, Binghui Chen, Yifeng Geng, Xuansong Xie, Jiansheng Chen, Hongbing Ma

Customized generative text-to-image models have the ability to produce images that closely resemble a given subject.

Image Generation
