Search Results for author: Pengfei Wan

Found 29 papers, 14 papers with code

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

no code implementations • 15 Apr 2024 • Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.

Language Modelling Large Language Model

Motion Inversion for Video Customization

no code implementations • 29 Mar 2024 • Luozhou Wang, Guibao Shen, Yixun Liang, Xin Tao, Pengfei Wan, Di Zhang, Yijun Li, Yingcong Chen

In this research, we present a novel approach to motion customization in video generation, addressing the widespread gap in the thorough exploration of motion representation within video generative models.

Video Generation

VRMM: A Volumetric Relightable Morphable Head Model

no code implementations • 6 Feb 2024 • Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang

In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling.

3D Face Reconstruction Self-Supervised Learning

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

no code implementations • 5 Feb 2024 • Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao

In practice, users often desire the ability to control object motion and camera movement independently for customized video creation.

Object Video Generation

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

no code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma

I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.

Video Generation

A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

no code implementations • 24 Dec 2023 • Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao Huang

The Stable Diffusion Model (SDM) is a popular and efficient text-to-image (t2i) generation and image-to-image (i2i) generation model.

Quantization

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

1 code implementation • 20 Dec 2023 • Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).

Ranked #1 on Video Semantic Segmentation on VSPW

Contrastive Learning Denoising +6

Stable Segment Anything Model

1 code implementation • 27 Nov 2023 • Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08M) and fast adaptation (by 1 training epoch).

Segmentation

Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery

no code implementations • 16 Nov 2023 • Ming Chen, Yan Zhou, Weihua Jian, Pengfei Wan, Zhongyuan Wang

Though significant progress in human pose and shape recovery from monocular RGB images has been made in recent years, obtaining 3D human motion with high accuracy and temporal consistency from videos remains challenging.

TAR

Towards Practical Capture of High-Fidelity Relightable Avatars

no code implementations • 8 Sep 2023 • Haotian Yang, Mingwu Zheng, Wanquan Feng, Haibin Huang, Yu-Kun Lai, Pengfei Wan, Zhongyuan Wang, Chongyang Ma

Specifically, TRAvatar is trained with dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation for avatars in diverse scenes.

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

1 code implementation • 28 Aug 2023 • Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving.

Autonomous Driving Denoising +6

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

1 code implementation • 7 Jun 2023 • Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation.

Autonomous Driving Segmentation +2

DVIS: Decoupled Video Instance Segmentation Framework

1 code implementation • ICCV 2023 • Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement.

Ranked #3 on Video Panoptic Segmentation on VIPSeg

Autonomous Driving Instance Segmentation +5

Multi-Modal Face Stylization with a Generative Prior

no code implementations • 29 May 2023 • Mengtian Li, Yi Dong, Minxuan Lin, Haibin Huang, Pengfei Wan, Chongyang Ma

We also introduce a two-stage training strategy, where we train the encoder in the first stage to align the feature maps with StyleGAN and enable a faithful reconstruction of input faces.

Face Generation

Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

no code implementations • 10 Oct 2022 • Wanfeng Zheng, Qiang Li, Xiaoyan Guo, Pengfei Wan, Zhongyuan Wang

More specifically, our efforts consist of three parts: 1) a data-free training strategy to train latent mappers to bridge the latent space of CLIP and StyleGAN; 2) for more precise mapping, temporal relative consistency is proposed to address the knowledge distribution bias problem among different latent spaces; 3) to refine the mapped latent in s space, adaptive style mixing is also proposed.

Image Manipulation Language Modelling +1

Augmentation-Aware Self-Supervision for Data-Efficient GAN Training

1 code implementation • NeurIPS 2023 • Liang Hou, Qi Cao, Yige Yuan, Songtao Zhao, Chongyang Ma, Siyuan Pan, Pengfei Wan, Zhongyuan Wang, HuaWei Shen, Xueqi Cheng

Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.

Data Augmentation Representation Learning

ITTR: Unpaired Image-to-Image Translation with Transformers

no code implementations • 30 Mar 2022 • Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

Unpaired image-to-image translation is to translate an image from a source domain to a target domain without paired training data.

Image-to-Image Translation Translation

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

no code implementations • CVPR 2022 • Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

Instead of directly distilling the generated images of teachers, wavelet knowledge distillation first decomposes the images into different frequency bands with discrete wavelet transformation and then only distills the high frequency bands.

Image-to-Image Translation Knowledge Distillation +1

PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-step Point Moving Paths

1 code implementation • 19 Feb 2022 • Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of point moving paths (PMPs) should be the shortest.

Ranked #1 on Point Cloud Completion on Completion3D

Point Cloud Completion Representation Learning

Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer

1 code implementation • 18 Feb 2022 • Peng Xiang, Xin Wen, Yu-Shen Liu, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Zhizhong Han

Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that can best fit the local regions.

Ranked #5 on Point Cloud Completion on ShapeNet

Image Reconstruction Point Cloud Completion

Debiased Self-Training for Semi-Supervised Learning

1 code implementation • 15 Feb 2022 • Baixu Chen, Junguang Jiang, Ximei Wang, Pengfei Wan, Jianmin Wang, Mingsheng Long

Yet these datasets are time-consuming and labor-exhaustive to obtain on realistic tasks.

Object Recognition Scene Classification +2

Assessing a Single Image in Reference-Guided Image Synthesis

no code implementations • 8 Dec 2021 • Jiayi Guo, Chaoqun Du, Jiangshan Wang, Huijuan Huang, Pengfei Wan, Gao Huang

For Reference-guided Image Synthesis (RIS) tasks, i.e., rendering a source image in the style of another reference image, where assessing the quality of a single generated image is crucial, these metrics are not applicable.

Image Generation

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

3 code implementations • NeurIPS 2021 • Mingcong Liu, Qiang Li, Zekui Qin, Guoxin Zhang, Pengfei Wan, Wen Zheng

Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles.

Face Generation

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

2 code implementations • ICCV 2021 • Peng Xiang, Xin Wen, Yu-Shen Liu, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Zhizhong Han

However, previous methods usually suffered from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it hard to reveal fine local geometric details on the complete shape.

Point Cloud Completion

Exploring Set Similarity for Dense Self-supervised Representation Learning

no code implementations • CVPR 2022 • Zhaoqing Wang, Qiang Li, Guoxin Zhang, Pengfei Wan, Wen Zheng, Nannan Wang, Mingming Gong, Tongliang Liu

By considering the spatial correspondence, dense self-supervised representation learning has achieved superior performance on various dense prediction tasks.

Instance Segmentation Keypoint Detection +5

Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding

1 code implementation • CVPR 2021 • Xin Wen, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

We provide a comprehensive evaluation in experiments, which shows that our model with the learned bidirectional geometry correspondence outperforms state-of-the-art unpaired completion methods.

Point Cloud Completion

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

1 code implementation • CVPR 2021 • Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, Wen Zheng

In the root-relative mesh recovery task, we exploit semantic relations among joints to generate a 3D mesh from the extracted 2D cues.

Position

PMP-Net: Point Cloud Completion by Learning Multi-step Point Moving Paths

1 code implementation • CVPR 2021 • Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

As a result, the network learns a strict and unique correspondence on point-level, which can capture the detailed topology and structure relationships between the incomplete shape and the complete target, and thus improves the quality of the predicted complete shape.

Point Cloud Completion

Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps

no code implementations • 25 Feb 2014 • Pengfei Wan, Gene Cheung, Philip A. Chou, Dinei Florencio, Cha Zhang, Oscar C. Au

In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm.

Quantization
