Search Results for author: Wayne Wu

Found 56 papers, 37 papers with code

Look at Boundary: A Boundary-Aware Face Alignment Algorithm

2 code implementations CVPR 2018 Wayne Wu, Chen Qian, Shuo Yang, Quan Wang, Yici Cai, Qiang Zhou

By utilising the boundary information of the 300-W dataset, our method achieves a 3.92% mean error with a 0.39% failure rate on the COFW dataset, and a 1.25% mean error on the AFLW-Full dataset.

Ranked #4 on Face Alignment on AFLW-19 (using extra training data)

Face Alignment Facial Landmark Detection
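The mean error and failure rate quoted above are standard face-alignment benchmark metrics. As a rough sketch (not the paper's code), the normalized mean error (NME) and failure rate are typically computed as below; the normalization factor (e.g. inter-ocular distance) and the failure threshold are common conventions, not details taken from this listing:

```python
import numpy as np

def nme(pred, gt, norm):
    """Normalized mean error for one face: mean per-landmark Euclidean distance
    divided by a normalization factor (e.g. inter-ocular distance), as a percent."""
    per_point = np.linalg.norm(pred - gt, axis=1)  # distance per landmark
    return per_point.mean() / norm * 100.0

def evaluate(preds, gts, norms, fail_thresh=10.0):
    """Dataset-level mean error, plus failure rate: the percentage of images
    whose NME exceeds fail_thresh percent (10% is a common convention)."""
    errs = np.array([nme(p, g, n) for p, g, n in zip(preds, gts, norms)])
    return errs.mean(), (errs > fail_thresh).mean() * 100.0
```

Under this convention, a 0.39% failure rate means only 0.39% of test images have an NME above the threshold.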

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

4 code implementations 25 Apr 2022 Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, Ziwei Liu

In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community.

Image Generation

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation CVPR 2021 Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

MotionBERT: A Unified Perspective on Learning Human Motion Representations

1 code implementation ICCV 2023 Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang

We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Pose Estimation Action Recognition +3

Text2Human: Text-Driven Controllable Human Image Generation

2 code implementations 31 May 2022 Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation.

Human Parsing Image Generation

CelebV-Text: A Large-Scale Facial Text-Video Dataset

1 code implementation CVPR 2023 Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne Wu

This paper presents CelebV-Text, a large-scale, diverse, and high-quality dataset of facial text-video pairs, to facilitate research on facial text-to-video generation tasks.

Text Generation Text-to-Video Generation +1

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

1 code implementation 25 Jul 2022 Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, Chen Change Loy

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.

Attribute Face Generation +1

Text2Performer: Text-Driven Human Video Generation

1 code implementation ICCV 2023 Yuming Jiang, Shuai Yang, Tong Liang Koh, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts.

Video Generation

Audio-Driven Emotional Video Portraits

1 code implementation CVPR 2021 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.

Disentanglement Face Generation

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

2 code implementations NeurIPS 2021 Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy

Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.

3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping

1 code implementation ICCV 2023 Zhuoqian Yang, Shikai Li, Wayne Wu, Bo Dai

We present 3DHumanGAN, a 3D-aware generative adversarial network that synthesizes photorealistic images of full-body humans with consistent appearances under different view angles and body poses.

Generative Adversarial Network Image Generation

OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs

1 code implementation ICCV 2023 Honglin He, Zhuoqian Yang, Shikai Li, Bo Dai, Wayne Wu

We present a new method for generating realistic and view-consistent images with fine geometry from 2D image collections.

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

1 code implementation NeurIPS 2023 Dongwei Pan, Long Zhuo, Jingtan Piao, Huiwen Luo, Wei Cheng, Yuxin Wang, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Kwan-Yee Lin

It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees.

2k Image Matting +2

TAM: Temporal Adaptive Module for Video Recognition

2 code implementations ICCV 2021 Zhao-Yang Liu, Li-Min Wang, Wayne Wu, Chen Qian, Tong Lu

Video data exhibits complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities.

Action Recognition Video Recognition

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis

1 code implementation 25 Apr 2022 Wei Cheng, Su Xu, Jingtan Piao, Chen Qian, Wayne Wu, Kwan-Yee Lin, Hongsheng Li

Specifically, we compress the light fields for novel view human rendering as conditional implicit neural radiance fields from both geometry and appearance aspects.

Novel View Synthesis

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

1 code implementation 11 Jul 2022 Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

In this paper, we present a spatial-temporal compression framework, Fast-Vid2Vid, which focuses on data aspects of generative models.

Knowledge Distillation Motion Compensation +1

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

1 code implementation 16 Aug 2022 Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, Ziwei Liu

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

Image Generation Video Generation

MonoHuman: Animatable Human Neural Field from Monocular Video

1 code implementation CVPR 2023 Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, Kwan-Yee Lin

Recent works propose to graft a deformation network into the NeRF to further model the dynamics of the human neural field for animating vivid human motions.

ReliTalk: Relightable Talking Portrait Generation from a Single Video

1 code implementation 5 Sep 2023 Haonan Qiu, Zhaoxi Chen, Yuming Jiang, Hang Zhou, Xiangyu Fan, Lei Yang, Wayne Wu, Ziwei Liu

Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.

Single-Image Portrait Relighting

Everything's Talkin': Pareidolia Face Reenactment

1 code implementation CVPR 2021 Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He

We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.

Face Reenactment Texture Synthesis

Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection

3 code implementations CVPR 2022 Jiaqi Tang, Zhaoyang Liu, Chen Qian, Wayne Wu, LiMin Wang

Generic event boundary detection is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries.

Boundary Detection Generic Event Boundary Detection +1

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

2 code implementations 25 Apr 2022 Haoyue Cheng, Zhaoyang Liu, Hang Zhou, Chen Qian, Wayne Wu, LiMin Wang

This paper focuses on the weakly-supervised audio-visual video parsing task, which aims to recognize all events belonging to each modality and localize their temporal boundaries.

Denoising

VLG: General Video Recognition with Web Textual Knowledge

1 code implementation 3 Dec 2022 Jintao Lin, Zhaoyang Liu, Wenhai Wang, Wayne Wu, LiMin Wang

Our VLG is first pre-trained on video and language datasets to learn a shared feature space, and then devises a flexible bi-modal attention head to collaborate high-level semantic concepts under different settings.

Video Recognition

Disentangling Content and Style via Unsupervised Geometry Distillation

1 code implementation ICLR Workshop DeepGenStruct 2019 Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, Chen Change Loy

It is challenging to disentangle an object into two orthogonal spaces of content and style since each can influence the visual observation differently and unpredictably.

Disentanglement

TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation

no code implementations CVPR 2019 Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, Chen Change Loy

Extensive experiments demonstrate the superior performance of our method over other state-of-the-art approaches, especially on the challenging near-rigid and non-rigid object translation tasks.

Translation Unsupervised Image-To-Image Translation

Make a Face: Towards Arbitrary High Fidelity Face Manipulation

no code implementations ICCV 2019 Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He

Recent studies have shown remarkable success in the face manipulation task with the advance of GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity.

Clustering Disentanglement +1

Everybody's Talkin': Let Me Talk as You Want

no code implementations 15 Jan 2020 Linsen Song, Wayne Wu, Chen Qian, Ran He, Chen Change Loy

The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio.

3D Face Reconstruction

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

no code implementations CVPR 2020 Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy

We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person.

motion retargeting

AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection

no code implementations NeurIPS 2020 Hao Zhu, Chaoyou Fu, Qianyi Wu, Wayne Wu, Chen Qian, Ran He

However, due to the lack of Deepfakes datasets with large variance in appearance, which can hardly be produced by recent identity swapping methods, the detection algorithm may fail in this situation.

Unsupervised Disentangling Structure and Appearance

no code implementations 27 Sep 2018 Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, Chen Change Loy

It is challenging to disentangle an object into two orthogonal spaces of structure and appearance since each can influence the visual observation in a different and unpredictable way.

Disentanglement

MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks

no code implementations 19 Dec 2021 Wentao Zhu, Zhuoqian Yang, Ziang Di, Wayne Wu, Yizhou Wang, Chen Change Loy

Trained with the canonicalization operations and the derived regularizations, our method learns to factorize a skeleton sequence into three independent semantic subspaces, i.e., motion, structure, and view angle.

3D Reconstruction Action Analysis +2

Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation

no code implementations 19 Jan 2022 Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, Bolei Zhou

Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.

EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

no code implementations 30 May 2022 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao

Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects.

Talking Face Generation

Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach

no code implementations 30 Jun 2022 Jiaqi Tang, Zhaoyang Liu, Jing Tan, Chen Qian, Wayne Wu, LiMin Wang

Local context modeling sub-network is proposed to perceive diverse patterns of generic event boundaries, and it generates powerful video representations and reliable boundary confidence.

Boundary Detection Generic Event Boundary Detection +1

Audio-Driven Co-Speech Gesture Video Generation

no code implementations 5 Dec 2022 Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.

Video Generation

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

no code implementations 19 Apr 2023 Zhuo Chen, Xudong Xu, Yichao Yan, Ye Pan, Wenhan Zhu, Wayne Wu, Bo Dai, Xiaokang Yang

While the use of 3D-aware GANs bypasses the requirement of 3D data, we further alleviate the necessity of style images with the CLIP model being the stylization guidance.

Attribute

Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis

no code implementations ICCV 2023 Yuxin Wang, Wayne Wu, Dan Xu

State-of-the-art methods in this direction typically consider building separate networks for these two tasks (i.e., view synthesis and editing).

Novel View Synthesis

Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis

no code implementations 31 Aug 2023 Linsen Song, Wayne Wu, Chaoyou Fu, Chen Change Loy, Ran He

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping.

Parameterization-driven Neural Implicit Surfaces Editing

no code implementations 9 Oct 2023 Baixin Xu, Jiangbei Hu, Fei Hou, Kwan-Yee Lin, Wayne Wu, Chen Qian, Ying He

In this paper, we present a novel neural algorithm to parameterize neural implicit surfaces to simple parametric domains, such as spheres, cubes, or polycubes, thereby facilitating visualization and various editing tasks.

Neural Rendering

PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via Denoised Score Distillation

no code implementations 14 Oct 2023 Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne Wu

We first propose a novel score function, Denoised Score Distillation (DSD), which directly modifies the SDS by introducing negative gradient components to iteratively correct the gradient direction and generate high-quality textures.

Text to 3D text-to-3d-human +1

CosmicMan: A Text-to-Image Foundation Model for Humans

no code implementations 1 Apr 2024 Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu

We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images.
