Search Results for author: Hang Zhou

Found 91 papers, 40 papers with code

BOOTPLACE: Bootstrapped Object Placement with Detection Transformers

1 code implementation 27 Mar 2025 Hang Zhou, Xinxin Zuo, Rui Ma, Li Cheng

In this paper, we tackle the copy-paste image-to-image composition problem with a focus on object placement learning.

Data Augmentation Object

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

no code implementations 25 Mar 2025 Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu, Quanwei Yang, Yasheng Sun, Shengyi He, Borong Liang, Yukang Cao, YingYing Li, Haocheng Feng, Errui Ding, Jingdong Wang, Youjian Zhao, Hang Zhou, Ziwei Liu

Despite the recent progress of audio-driven video generation, existing methods mostly focus on driving facial movements, leading to non-coherent head and body dynamics.

Video Generation

Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model

no code implementations 21 Mar 2025 Yingying Fan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, YingYing Li, Haocheng Feng, Errui Ding, Yu Wu, Jingdong Wang

Current digital human studies focusing on lip-syncing and body movement are no longer sufficient to meet the growing industrial demand, while human video generation techniques that support interacting with real-world environments (e.g., objects) have not been well investigated.

Disentanglement Human-Object Interaction Detection +2

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers

no code implementations 13 Mar 2025 Yasheng Sun, Zhiliang Xu, Hang Zhou, Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Borong Liang, YingYing Li, Haocheng Feng, Jingdong Wang, Ziwei Liu, Koike Hideki

Co-speech gesture video synthesis is a challenging task that requires both probabilistic modeling of human gestures and the synthesis of realistic images that align with the rhythmic nuances of speech.

MDE: Modality Discrimination Enhancement for Multi-modal Recommendation

no code implementations 8 Feb 2025 Hang Zhou, Yucheng Wang, Huijing Zhan

Multi-modal recommendation systems aim to enhance performance by integrating an item's content features across various modalities with user behavior data.

cross-modal alignment Multi-modal Recommendation

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

1 code implementation 4 Feb 2025 Huakun Luo, Haixu Wu, Hang Zhou, Lanxiang Xing, Yichen Di, Jianmin Wang, Mingsheng Long

Although deep models have been widely explored in solving partial differential equations (PDEs), previous works are primarily limited to data only with up to tens of thousands of mesh points, far from the million-point scale required by industrial simulations that involve complex geometries.

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model

1 code implementation 12 Dec 2024 Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song, Wei Yang

To address this, we introduce innovative motion and appearance conditions that are seamlessly integrated into our patch diffusion model.

Anomaly Detection Video Anomaly Detection

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

1 code implementation 1 Dec 2024 Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu

Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.

Image Animation Portrait Animation

Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning

1 code implementation 21 Nov 2024 Hang Zhou, Yehui Tang, Haochen Qin, Yujie Yang, Renren Jin, Deyi Xiong, Kai Han, Yunhe Wang

Our empirical studies, including instruction tuning experiments with models such as Pythia and LLaMA, demonstrate the effectiveness of the proposed framework.

GhostRNN: Reducing State Redundancy in RNN with Cheap Operations

no code implementations 20 Nov 2024 Hang Zhou, Xiaoxu Zheng, Yunhe Wang, Michael Bi Mi, Deyi Xiong, Kai Han

Recurrent neural networks (RNNs) that are capable of modeling long-distance dependencies are widely used in various speech tasks, e.g., keyword spotting (KWS) and speech enhancement (SE).

Keyword Spotting Speech Enhancement

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

1 code implementation 10 Oct 2024 Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, Jingdong Wang

To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts.

4k Image Animation +2

Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis

no code implementations 7 Jul 2024 Qi Sun, Hang Zhou, Wengang Zhou, Li Li, Houqiang Li

Synthesizing realistic 3D indoor scenes is a challenging task that traditionally relies on manual arrangement and annotation by expert designers.

Indoor Scene Synthesis Scene Generation

Hybrid Alignment Training for Large Language Models

1 code implementation 21 Jun 2024 Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences.

Instruction Following

Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving

no code implementations 18 Jun 2024 Hang Zhou, Dan Xu, Yiding Ji

Recent advancements in autonomous driving technologies involve the capability to effectively process and learn from extensive real-world driving data.

Autonomous Driving Imitation Learning +2

DiffPop: Plausibility-Guided Object Placement Diffusion for Image Composition

no code implementations 12 Jun 2024 Jiacheng Liu, Hang Zhou, Shida Wei, Rui Ma

In this paper, we address the problem of plausible object placement for the challenging task of realistic image composition.

Data Augmentation Denoising +1

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

no code implementations 28 May 2024 Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, Wei Yang

However, fusing multiple modalities is challenging for SSMs due to their hardware-aware parallelism designs.

Mamba State Space Models

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

no code implementations 27 May 2024 Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long

This limits the generalization of neural solvers to diverse PDEs, impeding them from being practical surrogate models for numerical solvers.

Prior Constraints-based Reward Model Training for Aligning Large Language Models

1 code implementation 1 Apr 2024 Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.

reinforcement-learning Reinforcement Learning

Attacking Transformers with Feature Diversity Adversarial Perturbation

no code implementations 10 Mar 2024 Chenxing Gao, Hang Zhou, Junqing Yu, Yuteng Ye, Jiale Cai, Junle Wang, Wei Yang

Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications.

Diversity

Fine-grained Appearance Transfer with Diffusion Models

1 code implementation 27 Nov 2023 Yuteng Ye, Guanwen Li, Hang Zhou, Cai Jiale, Junqing Yu, Yawei Luo, Zikai Song, Qilong Xing, Youjia Zhang, Wei Yang

A pivotal aspect of our approach is the strategic use of the predicted $x_0$ space by diffusion models within the latent space of diffusion processes.

Appearance Transfer Image-to-Image Translation

DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

1 code implementation 22 Nov 2023 Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang

To accommodate structural variations in the collection, our network composes each shape by a selected subset of template parts which are affine-transformed.

Decoder

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

1 code implementation 28 Sep 2023 Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.

3D Generation

Progressive Text-to-Image Diffusion with Soft Latent Direction

1 code implementation 18 Sep 2023 Yuteng Ye, Jiale Cai, Hang Zhou, Guanwen Li, Youjia Zhang, Zikai Song, Chenxing Gao, Junqing Yu, Wei Yang

In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges.

Language Modelling Large Language Model +1

ReliTalk: Relightable Talking Portrait Generation from a Single Video

1 code implementation 5 Sep 2023 Haonan Qiu, Zhaoxi Chen, Yuming Jiang, Hang Zhou, Xiangyu Fan, Lei Yang, Wayne Wu, Ziwei Liu

Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.

Single-Image Portrait Relighting

Learning Evaluation Models from Large Language Models for Sequence Generation

1 code implementation 8 Aug 2023 Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Yue Zhang, Jingbo Zhu

Automatic evaluation of sequence generation, traditionally reliant on metrics like BLEU and ROUGE, often fails to capture the semantic accuracy of generated text sequences due to their emphasis on n-gram overlap.

Machine Translation Style Transfer +1

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

2 code implementations 4 Aug 2023 Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modeling +7

ShaDDR: Interactive Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering

1 code implementation 8 Jun 2023 Qimin Chen, Zhiqin Chen, Hang Zhou, Hao Zhang

Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos.

Texture Synthesis

Detecting Errors in a Numerical Response via any Regression Model

2 code implementations 26 May 2023 Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei

Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates.

regression

Building an Invisible Shield for Your Portrait against Deepfakes

no code implementations 22 May 2023 Jiazhi Guan, Tianshu Hu, Hang Zhou, Zhizhi Guo, Lirui Deng, Chengbin Quan, Errui Ding, Youjian Zhao

Unlike authentic images, where the hidden messages can be extracted with precision, manipulating the facial attributes through deepfake techniques can disrupt the decoding process.

Face Swapping

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations CVPR 2023 Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection

1 code implementation 10 Feb 2023 Hang Zhou, Junqing Yu, Wei Yang

To address this issue, we propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data.

Anomaly Detection Weakly-supervised Video Anomaly Detection

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

no code implementations 9 Dec 2022 Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike

This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.

Audio-Driven Co-Speech Gesture Video Generation

no code implementations 5 Dec 2022 Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.

Video Generation

Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion

no code implementations 29 Nov 2022 Kui Zhang, Hang Zhou, Jie Zhang, Qidong Huang, Weiming Zhang, Nenghai Yu

Deep 3D point cloud models are sensitive to adversarial attacks, which poses threats to safety-critical applications such as autonomous driving.

Autonomous Driving Denoising

Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification

1 code implementation 27 Nov 2022 Yuteng Ye, Hang Zhou, Jiale Cai, Chenxing Gao, Youjia Zhang, Junle Wang, Qiang Hu, Junqing Yu, Wei Yang

The framework mainly consists of a sparse encoder, a multi-view feature matching module, and a feature consolidation decoder.

Decoder Occluded Person Re-Identification

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

2 code implementations 22 Nov 2022 Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage.

NeRF Talking Face Generation

Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation

1 code implementation 16 Nov 2022 Fan Li, Hang Zhou, Huafeng Li, Yafei Zhang, Zhengtao Yu

Specifically, we improve the interpretability of text features by providing them with consistent semantic information with image features to achieve the alignment of text and describe image region features. To address the challenges posed by the diversity of text and the corresponding person images, we treat the variation caused by diversity to features as caused by perturbation information and propose a novel adversarial attack and defense method to solve it.

Adversarial Attack Diversity +2

Learning to Immunize Images for Tamper Localization and Self-Recovery

no code implementations 28 Oct 2022 Qichao Ying, Hang Zhou, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Image immunization (Imuge) is a technology of protecting the images by introducing trivial perturbation, so that the protected images are immune to the viruses in that the tampered contents can be auto-recovered.

TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

3 code implementations 5 Oct 2022 Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long

TimesBlock can discover the multi-periodicity adaptively and extract the complex temporal variations from transformed 2D tensors by a parameter-efficient inception block.

Action Recognition Anomaly Detection +4

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations 27 Sep 2022 Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, thus the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

no code implementations 16 Sep 2022 Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Kui Zhang, Gang Hua, Nenghai Yu

Notwithstanding the prominent performance achieved in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations.

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

1 code implementation 16 Aug 2022 Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, Ziwei Liu

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

Image Generation Video Generation

Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption

no code implementations 21 Jul 2022 Jiazhi Guan, Hang Zhou, Mingming Gong, Errui Ding, Jingdong Wang, Youjian Zhao

Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator and create a wide range of pseudo-fake videos for training.

DeepFake Detection Face Swapping

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

1 code implementation 18 Jul 2022 Jihao Liu, Boxiao Liu, Hang Zhou, Hongsheng Li, Yu Liu

In this paper, we propose a novel data augmentation technique TokenMix to improve the performance of vision transformers.

Data Augmentation

Delving into Sequential Patches for Deepfake Detection

no code implementations 6 Jul 2022 Jiazhi Guan, Hang Zhou, Zhibin Hong, Errui Ding, Jingdong Wang, Chengbin Quan, Youjian Zhao

Recent advances in face forgery techniques produce nearly visually untraceable deepfake videos, which could be leveraged with malicious intentions.

DeepFake Detection Face Swapping

Agriculture-Vision Challenge 2022 -- The Runner-Up Solution for Agricultural Pattern Recognition via Transformer-based Models

no code implementations 23 Jun 2022 Zhicheng Yang, Jui-Hsin Lai, Jun Zhou, Hang Zhou, Chen Du, Zhongcheng Lai

The Agriculture-Vision Challenge in CVPR is one of the most famous and competitive challenges for global researchers to break the boundary between computer vision and agriculture sectors, aiming at agricultural pattern recognition from aerial images.

Data Augmentation

MultiEarth 2022 -- The Champion Solution for Image-to-Image Translation Challenge via Generation Models

no code implementations 17 Jun 2022 Yuchuan Gou, Bo Peng, Hongchen Liu, Hang Zhou, Jui-Hsin Lai

The MultiEarth 2022 Image-to-Image Translation challenge provides a well-constrained test bed for generating the corresponding RGB Sentinel-2 imagery with the given Sentinel-1 VV & VH imagery.

Image-to-Image Translation Translation

MultiEarth 2022 -- The Champion Solution for the Matrix Completion Challenge via Multimodal Regression and Generation

no code implementations 17 Jun 2022 Bo Peng, Hongchen Liu, Hang Zhou, Yuchuan Gou, Jui-Hsin Lai

Earth observation satellites have been continuously monitoring the earth environment for years at different locations and spectral bands with different modalities.

Earth Observation Matrix Completion +2

Image Protection for Robust Cropping Localization and Recovery

no code implementations 6 Jun 2022 Qichao Ying, Hang Zhou, Xiaoxiao Hu, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Existing image cropping detection schemes ignore that recovering the cropped-out contents can unveil the purpose of the behaved cropping attack.

Image Cropping

EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

no code implementations 30 May 2022 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao

Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects.

Talking Face Generation

Few-Shot Head Swapping in the Wild

no code implementations CVPR 2022 Changyong Shu, Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.

Face Swapping

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

2 code implementations 25 Apr 2022 Haoyue Cheng, Zhaoyang Liu, Hang Zhou, Chen Qian, Wayne Wu, LiMin Wang

This paper focuses on the weakly-supervised audio-visual video parsing task, which aims to recognize all events belonging to each modality and localize their temporal boundaries.

Denoising valid

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

no code implementations 25 Mar 2022 Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

Recent years have witnessed the success of deep learning on the visual sound separation task.

Shape-invariant 3D Adversarial Point Clouds

1 code implementation CVPR 2022 Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Nenghai Yu

In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations.

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

1 code implementation 13 Feb 2022 Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou

Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.

Sound Source Localization

Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation

no code implementations 19 Jan 2022 Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, Bolei Zhou

Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.

NeRF

SAC-GAN: Structure-Aware Image Composition

1 code implementation 13 Dec 2021 Hang Zhou, Rui Ma, Ling-Xiao Zhang, Lin Gao, Ali Mahdavi-Amiri, Hao Zhang

Specifically, our network takes the semantic layout features from the input scene image, features encoded from the edges and silhouette in the input object patch, as well as a latent code as inputs, and generates a 2D spatial affine transform defining the translation and scaling of the object patch.

Image Augmentation Object

Hyperspectral Mixed Noise Removal via Subspace Representation and Weighted Low-rank Tensor Regularization

no code implementations 13 Nov 2021 Hang Zhou, Yanchi Su, Zhanshan Li

Recently, the low-rank property of different components extracted from the image has been considered in many hyperspectral image denoising methods.

Hyperspectral Image Denoising Image Denoising

From Image to Imuge: Immunized Image Generation

1 code implementation 27 Oct 2021 Qichao Ying, Zhenxing Qian, Hang Zhou, Haisheng Xu, Xinpeng Zhang, Siyi Li

At the recipient's side, the verifying network localizes the malicious modifications, and the original content can be approximately recovered by the decoder, despite the presence of the attacks.

Decoder Image Cropping +1

Hiding Images into Images with Real-world Robustness

no code implementations 12 Oct 2021 Qichao Ying, Hang Zhou, Xianhan Zeng, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang

The existing image embedding networks are basically vulnerable to malicious attacks such as JPEG compression and noise adding, not applicable for real-world copyright protection tasks.

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation CVPR 2021 Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

Audio-Driven Emotional Video Portraits

1 code implementation CVPR 2021 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audios.

Disentanglement Face Generation

Visually Informed Binaural Audio Generation without Binaural Audios

no code implementations CVPR 2021 Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin

Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.

Audio Generation

Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication

no code implementations 9 Apr 2021 Xiquan Guan, Huamin Feng, Weiming Zhang, Hang Zhou, Jie Zhang, Nenghai Yu

Specifically, we present the reversible watermarking problem of deep convolutional neural networks and utilize the pruning theory of model compression technology to construct a host sequence used for embedding watermarking information by histogram shift.

Model Compression

Adversarial Examples Detection beyond Image Space

1 code implementation 23 Feb 2021 Kejiang Chen, Yuefeng Chen, Hang Zhou, Chuan Qin, Xiaofeng Mao, Weiming Zhang, Nenghai Yu

To detect both few-perturbation attacks and large-perturbation attacks, we propose a method beyond image space by a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts.

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud-based Deep Networks

no code implementations 1 Nov 2020 Hang Zhou, Dongdong Chen, Jing Liao, Weiming Zhang, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Gang Hua, Nenghai Yu

To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack.

Discriminability Distillation in Group Representation Learning

no code implementations ECCV 2020 Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu

We show the discriminability knowledge has good properties that can be distilled by a light-weight distillation network and can be generalized on the unseen target set.

Representation Learning

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks

no code implementations CVPR 2020 Hang Zhou, Dongdong Chen, Jing Liao, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Weiming Zhang, Gang Hua, Nenghai Yu

To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack.

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

1 code implementation CVPR 2020 Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang

Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.

3D Face Modelling Data Augmentation +1

Powerful Speaker Embedding Training Framework by Adversarially Disentangled Identity Representation

no code implementations 27 Nov 2019 Jianwei Tai, Hang Zhou, Qingjia Huang, Xiaoqi Jia

The main challenge of speaker verification in the wild is the interference caused by irrelevant information in speech and the lack of speaker labels in speech datasets.

Speaker Verification

Self-supervised Adversarial Training

1 code implementation 15 Nov 2019 Kejiang Chen, Hang Zhou, Yuefeng Chen, Xiaofeng Mao, Yuhong Li, Yuan He, Hui Xue, Weiming Zhang, Nenghai Yu

Recent work has demonstrated that neural networks are vulnerable to adversarial examples.

Self-Supervised Learning

A Graph-Based Framework to Bridge Movies and Synopses

no code implementations ICCV 2019 Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin

On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs.

Vision-Infused Deep Audio Inpainting

no code implementations ICCV 2019 Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang

Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.

Audio inpainting Image Inpainting

ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks

no code implementations 27 May 2019 Xiaoqi Jia, Jianwei Tai, Hang Zhou, Yakai Li, Weijuan Zhang, Haichao Du, Qingjia Huang

Despite the remarkable progress made in synthesizing emotional speech from text, it is still challenging to provide emotion information to existing speech segments.

Domain Adaptation Generative Adversarial Network +2

DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense

1 code implementation ICCV 2019 Hang Zhou, Kejiang Chen, Weiming Zhang, Han Fang, Wenbo Zhou, Nenghai Yu

We propose a Denoiser and UPsampler Network (DUP-Net) structure as defenses for 3D adversarial point cloud classification, where the two modules reconstruct surface smoothness by dropping or adding points.

Denoising Point Cloud Classification

Hierarchical Neural Network Architecture In Keyword Spotting

no code implementations 6 Nov 2018 Yixiao Qu, Sihao Xue, Zhenyi Ying, Hang Zhou, Jue Sun

Keyword Spotting (KWS) provides the start signal for the ASR problem, and thus it is essential to ensure a high recall rate.

Keyword Spotting speech-recognition +1

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

1 code implementation 20 Jul 2018 Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.

Lip Reading Retrieval +2
