Search Results for author: Pengfei Wan

Found 29 papers, 14 papers with code

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

no code implementations 15 Apr 2024 Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.

Language Modelling, Large Language Model

Motion Inversion for Video Customization

no code implementations 29 Mar 2024 Luozhou Wang, Guibao Shen, Yixun Liang, Xin Tao, Pengfei Wan, Di Zhang, Yijun Li, Yingcong Chen

In this research, we present a novel approach to motion customization in video generation, addressing the largely unexplored problem of motion representation within video generative models.

Video Generation

VRMM: A Volumetric Relightable Morphable Head Model

no code implementations 6 Feb 2024 Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang

In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling.

3D Face Reconstruction, Self-Supervised Learning

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

no code implementations 5 Feb 2024 Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao

In practice, users often desire the ability to control object motion and camera movement independently for customized video creation.

Object, Video Generation

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

no code implementations 27 Dec 2023 Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma

I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.
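The cross-frame attention idea described above can be sketched as follows. This is a minimal, single-head NumPy illustration of the general mechanism, not the paper's implementation; the function names, token shapes, and residual wiring are all assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(frame_tokens, first_frame_tokens):
    """Tokens of a noised frame (queries) attend to tokens of the
    unnoised input image (keys/values), so identity information from
    the first frame propagates to later frames.
    frame_tokens: (N, C), first_frame_tokens: (M, C)."""
    scale = frame_tokens.shape[-1] ** -0.5
    weights = softmax(frame_tokens @ first_frame_tokens.T * scale, axis=-1)
    out = weights @ first_frame_tokens
    # residual connection leaves the pretrained pathway intact
    return frame_tokens + out
```

Because the output is added as a residual, a zero-initialized projection on `out` would leave the pretrained T2V model's behavior unchanged at the start of training.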

Video Generation

A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

no code implementations 24 Dec 2023 Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao Huang

The Stable Diffusion Model (SDM) is a popular and efficient model for text-to-image (T2I) and image-to-image (I2I) generation.


DVIS++: Improved Decoupled Framework for Universal Video Segmentation

1 code implementation 20 Dec 2023 Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).

Contrastive Learning, Denoising +6

Stable Segment Anything Model

1 code implementation 27 Nov 2023 Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved segmentation stability across a wide range of prompt qualities, 2) retention of SAM's powerful promptable segmentation efficiency and generality, and 3) minimal learnable parameters (0.08M) with fast adaptation (one training epoch).


Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery

no code implementations 16 Nov 2023 Ming Chen, Yan Zhou, Weihua Jian, Pengfei Wan, Zhongyuan Wang

Though significant progress in human pose and shape recovery from monocular RGB images has been made in recent years, obtaining 3D human motion with high accuracy and temporal consistency from videos remains challenging.


Towards Practical Capture of High-Fidelity Relightable Avatars

no code implementations 8 Sep 2023 Haotian Yang, Mingwu Zheng, Wanquan Feng, Haibin Huang, Yu-Kun Lai, Pengfei Wan, Zhongyuan Wang, Chongyang Ma

Specifically, TRAvatar is trained with dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation for avatars in diverse scenes.

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

1 code implementation 28 Aug 2023 Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving.

Autonomous Driving, Denoising +6

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

1 code implementation 7 Jun 2023 Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation.

Autonomous Driving, Segmentation +2

DVIS: Decoupled Video Instance Segmentation Framework

1 code implementation ICCV 2023 Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement.
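The frame-by-frame association step can be illustrated with a toy greedy matcher on per-instance embeddings. This is only a sketch of the idea; DVIS's actual tracker is more involved, and `associate_frames` is a hypothetical helper:

```python
import numpy as np

def associate_frames(prev_emb, curr_emb):
    """Match current-frame instance embeddings to previous-frame ones
    by cosine similarity, greedily picking the best remaining pair.
    prev_emb, curr_emb: (N, D) arrays. Returns, for each current
    instance, the index of its matched previous instance (-1 if none)."""
    p = prev_emb / np.linalg.norm(prev_emb, axis=1, keepdims=True)
    c = curr_emb / np.linalg.norm(curr_emb, axis=1, keepdims=True)
    sim = c @ p.T                           # (N_curr, N_prev) similarities
    assignment = -np.ones(len(c), dtype=int)
    for _ in range(min(len(c), len(p))):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        assignment[i] = j
        sim[i, :] = -np.inf                 # each instance matched once
        sim[:, j] = -np.inf
    return assignment
```

Chaining such per-frame assignments over a clip yields the long-term alignment that the refinement stage then consumes.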

Autonomous Driving, Instance Segmentation +5

Multi-Modal Face Stylization with a Generative Prior

no code implementations 29 May 2023 Mengtian Li, Yi Dong, Minxuan Lin, Haibin Huang, Pengfei Wan, Chongyang Ma

We also introduce a two-stage training strategy, where we train the encoder in the first stage to align the feature maps with StyleGAN and enable a faithful reconstruction of input faces.

Decoder, Face Generation

Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

no code implementations 10 Oct 2022 Wanfeng Zheng, Qiang Li, Xiaoyan Guo, Pengfei Wan, Zhongyuan Wang

More specifically, our efforts consist of three parts: 1) a data-free training strategy to train latent mappers that bridge the latent spaces of CLIP and StyleGAN; 2) for more precise mapping, temporal relative consistency, proposed to address the knowledge distribution bias among different latent spaces; and 3) to refine the mapped latent code in the S space, adaptive style mixing.

Image Manipulation, Language Modelling +1

ITTR: Unpaired Image-to-Image Translation with Transformers

no code implementations 30 Mar 2022 Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

Unpaired image-to-image translation translates an image from a source domain to a target domain without paired training data.

Image-to-Image Translation, Translation

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

no code implementations CVPR 2022 Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

Instead of directly distilling the teacher's generated images, wavelet knowledge distillation first decomposes the images into different frequency bands with a discrete wavelet transform and then distills only the high-frequency bands.
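The band-splitting step can be sketched with a one-level 2D Haar transform. This is a minimal illustration of the idea; the paper's choice of wavelet, loss, and normalization may differ, and both function names are assumptions:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of an image x with even height and width.
    Returns the low-frequency band LL and high-frequency bands (LH, HL, HH)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, (lh, hl, hh)

def wavelet_distillation_loss(student_img, teacher_img):
    """L1 distance on the high-frequency bands only; LL is ignored,
    so the student is supervised on textures and edges rather than
    on low-frequency content."""
    _, s_high = haar_dwt2(student_img)
    _, t_high = haar_dwt2(teacher_img)
    return sum(np.abs(s - t).mean() for s, t in zip(s_high, t_high))
```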

Image-to-Image Translation, Knowledge Distillation +1

PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-step Point Moving Paths

1 code implementation 19 Feb 2022 Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

It moves each point of the incomplete input to obtain a complete point cloud, such that the total distance of the point moving paths (PMPs) is the shortest.
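The point-moving formulation can be sketched as follows. This is an illustrative toy, not PMP-Net's network; in the actual method the per-step displacements are predicted by a learned model, and `move_points` is a hypothetical helper:

```python
import numpy as np

def move_points(points, displacement_steps):
    """Move each point of an incomplete cloud along a multi-step path.
    points: (N, 3); displacement_steps: list of (N, 3) per-step offsets.
    Returns the final point positions and the total path length, which
    the method encourages to be as short as possible."""
    total_length = 0.0
    for delta in displacement_steps:
        total_length += np.linalg.norm(delta, axis=-1).sum()
        points = points + delta
    return points, total_length
```

Penalizing `total_length` during training pushes each point toward the nearest plausible position on the complete shape instead of wandering.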

Point Cloud Completion, Representation Learning

Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer

1 code implementation 18 Feb 2022 Peng Xiang, Xin Wen, Yu-Shen Liu, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Zhizhong Han

Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that can best fit the local regions.

Image Reconstruction, Point Cloud Completion

Assessing a Single Image in Reference-Guided Image Synthesis

no code implementations 8 Dec 2021 Jiayi Guo, Chaoqun Du, Jiangshan Wang, Huijuan Huang, Pengfei Wan, Gao Huang

For Reference-guided Image Synthesis (RIS) tasks, i.e., rendering a source image in the style of another reference image, where assessing the quality of a single generated image is crucial, these metrics are not applicable.

Image Generation

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

3 code implementations NeurIPS 2021 Mingcong Liu, Qiang Li, Zekui Qin, Guoxin Zhang, Pengfei Wan, Wen Zheng

Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles.

Face Generation

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

2 code implementations ICCV 2021 Peng Xiang, Xin Wen, Yu-Shen Liu, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Zhizhong Han

However, previous methods usually suffered from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it hard to reveal fine local geometric details on the complete shape.

Point Cloud Completion

Exploring Set Similarity for Dense Self-supervised Representation Learning

no code implementations CVPR 2022 Zhaoqing Wang, Qiang Li, Guoxin Zhang, Pengfei Wan, Wen Zheng, Nannan Wang, Mingming Gong, Tongliang Liu

By considering the spatial correspondence, dense self-supervised representation learning has achieved superior performance on various dense prediction tasks.

Instance Segmentation, Keypoint Detection +5

Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding

1 code implementation CVPR 2021 Xin Wen, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

We provide a comprehensive evaluation in experiments, which shows that our model with the learned bidirectional geometry correspondence outperforms state-of-the-art unpaired completion methods.

Point Cloud Completion

PMP-Net: Point Cloud Completion by Learning Multi-step Point Moving Paths

1 code implementation CVPR 2021 Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu

As a result, the network learns a strict and unique point-level correspondence, which can capture the detailed topology and structure relationships between the incomplete shape and the complete target, and thus improves the quality of the predicted complete shape.

Point Cloud Completion

Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps

no code implementations 25 Feb 2014 Pengfei Wan, Gene Cheung, Philip A. Chou, Dinei Florencio, Cha Zhang, Oscar C. Au

In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm.
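The classical transform coding / coefficient quantization pipeline mentioned here can be sketched for a single square block. This is a generic illustration of the paradigm, not the codecs studied in the paper; the function names are assumptions:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)  # DC row scaling for orthonormality
    return c

def code_depth_block(block, step):
    """Lossy transform coding of a square depth-map block:
    forward DCT, uniform coefficient quantization, inverse DCT.
    Returns the reconstructed (lossy) block."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T                    # forward transform
    quantized = np.round(coeffs / step) * step  # coefficient quantization
    return c.T @ quantized @ c                  # inverse transform
```

Because the transform is orthonormal, the reconstruction error is bounded by the quantization step; larger steps give smaller bitrates but noisier depth values, which is exactly the distortion the paper seeks to suppress by fusing multiple compressed views.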

