no code implementations • ECCV 2020 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun
To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information.
no code implementations • 12 Feb 2025 • Qianrui Teng, Xing Cui, Xuannan Liu, Peipei Li, Zekun Li, Huaibo Huang, Ran He
Personalized text-to-image models allow users to generate images of new concepts from several reference photos, thereby leading to critical concerns regarding civil privacy.
no code implementations • 7 Feb 2025 • Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He
Despite significant progress in this field, there remains a gap in the literature regarding a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods.
no code implementations • 26 Jan 2025 • Nan Gao, Jia Li, Huaibo Huang, Ke Shang, Ran He
Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of data degradation patterns.
1 code implementation • 14 Nov 2024 • Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, Ran He
Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications.
1 code implementation • 12 Nov 2024 • Qihang Fan, Huaibo Huang, Ran He
The Softmax attention mechanism in Transformer models is notoriously computationally expensive, particularly due to its quadratic complexity, posing significant challenges in vision applications.
1 code implementation • 24 Oct 2024 • Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, Hongxia Yang
Our second contribution, DreamClear, is a DiT-based image restoration model.
1 code implementation • 20 Oct 2024 • Yuang Ai, Huaibo Huang, Ran He
In the pre-training stage, we enhance the pre-trained CLIP model by introducing a simple mechanism that scales it to higher resolutions, allowing us to extract robust degradation representations that adaptively guide the IR network.
no code implementations • 22 Sep 2024 • Lixia Ma, Puning Yang, Yuting Xu, Ziming Yang, Peipei Li, Huaibo Huang
This paper presents a comprehensive survey of recent deep learning-based approaches for facial forgery detection.
no code implementations • 19 Sep 2024 • Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You
Pre-training on large-scale, high-quality datasets is crucial for enhancing the reasoning capabilities of Large Language Models (LLMs), especially in specialized domains such as mathematics.
1 code implementation • 10 Aug 2024 • Jin Liu, Huaibo Huang, Jie Cao, Ran He
To blend the Consistency Features extracted from both content and style images, we introduce a Style Enhancement Attention Control technique that meticulously merges content and style features within the attention space of the target image.
no code implementations • 13 Jun 2024 • Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He
Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist.
1 code implementation • 28 May 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang
In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs).
no code implementations • 22 May 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He
In recent years, Transformers have achieved remarkable progress in computer vision tasks.
no code implementations • 22 May 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess.
no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.
no code implementations • 15 Mar 2024 • Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He
Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings.
no code implementations • 4 Mar 2024 • Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He
The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors.
no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.
Ranked #118 on Visual Question Answering on MM-Vet
no code implementations • CVPR 2024 • Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He
Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.
no code implementations • 5 Dec 2023 • Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He
Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.
1 code implementation • 3 Dec 2023 • Jin Liu, Huaibo Huang, Chao Jin, Ran He
Face stylization refers to the transformation of a face into a specific portrait style.
no code implementations • 28 Nov 2023 • Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He
First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance.
1 code implementation • 25 Nov 2023 • Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Xuannan Liu, Zhaofeng He
We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise.
1 code implementation • 8 Oct 2023 • Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang
Finally, we benchmarked a wide range of current video-language models on DeVAn, and we aim for DeVAn to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.
no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
1 code implementation • CVPR 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He
To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes.
2 code implementations • NeurIPS 2023 • Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He
Consequently, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment.
1 code implementation • NeurIPS 2023 • Qihang Fan, Huaibo Huang, Xiaoqiang Zhou, Ran He
This paper proposes a Fully Adaptive Self-Attention (FASA) mechanism for vision transformers to model local and global information, as well as the bidirectional interaction between them, in context-aware ways.
1 code implementation • 31 Mar 2023 • Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He
In CloFormer, the combination of AttnConv and vanilla attention, which uses pooling to reduce FLOPs, enables the model to perceive both high-frequency and low-frequency information.
Ranked #621 on Image Classification on ImageNet
1 code implementation • CVPR 2024 • Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR) by accessing both the source and target data.
no code implementations • ICCV 2023 • Peipei Li, Rui Wang, Huaibo Huang, Ran He, Zhaofeng He
Face aging is an ill-posed problem because multiple plausible aging patterns may correspond to a given input.
no code implementations • ICCV 2023 • Xiaoqiang Zhou, Huaibo Huang, Ran He, Zilei Wang, Jie Hu, Tieniu Tan
In particular, self-attention with cross-scale matching and convolution filters with different kernel sizes are designed to exploit the multi-scale features in images.
1 code implementation • CVPR 2023 • Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, Tieniu Tan
STA decomposes vanilla global attention into multiplications of a sparse association map and a low-dimensional attention, leading to high efficiency in capturing global dependencies.
1 code implementation • 11 Oct 2022 • Zi Wang, Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He
To alleviate these two issues, we propose a simple yet effective method with Parallel Augmentation and Dual Enhancement (PADE), which is robust on both occluded and non-occluded data and does not require any auxiliary clues.
1 code implementation • CVPR 2022 • Xin Xie, Yi Li, Huaibo Huang, Haiyan Fu, Wanwan Wang, Yanqing Guo
Style transfer has been well studied in recent years, with excellent performance achieved.
no code implementations • CVPR 2022 • Gengyun Jia, Huaibo Huang, Chaoyou Fu, Ran He
In this paper, we regard image cropping as a set prediction problem.
no code implementations • 20 Dec 2021 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei
This multi-scale architecture helps the decoder translate discriminative representations learned by the encoders into images.
no code implementations • 20 Oct 2021 • Jianze Wei, Huaibo Huang, Muyi Sun, Yunlong Wang, Min Ren, Ran He, Zhenan Sun
To make further efforts on accurate and reliable iris segmentation, we propose a bilateral self-attention module and design Bilateral Transformer (BiTrans) with hierarchical architecture by exploring spatial and visual relationships.
no code implementations • 4 Oct 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He
Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically.
no code implementations • 3 Oct 2021 • Jia Li, Huaibo Huang, Xiaofei Jia, Ran He
Blind face restoration (BFR) is a challenging problem because of the uncertainty of the degradation patterns.
no code implementations • 29 Sep 2021 • Chenyu Liu, Jia Li, Junxian Duan, Huaibo Huang
The first is that capturing the general clue of artifacts is difficult.
1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.
no code implementations • CVPR 2021 • Huaibo Huang, Aijing Yu, Ran He
To address this issue, we propose a memory-oriented semi-supervised (MOSS) method which enables the network to explore and exploit the properties of rain streaks from both synthetic and real data.
no code implementations • 29 Oct 2020 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He
It is difficult for encoders to capture such powerful representations under this complex situation.
1 code implementation • 20 Sep 2020 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises.
no code implementations • 20 Apr 2020 • Yi Li, Huaibo Huang, Junchi Yu, Ran He, Tieniu Tan
Face verification aims at determining whether a pair of face images belongs to the same identity.
no code implementations • ECCV 2020 • Jie Cao, Huaibo Huang, Yi Li, Ran He, Zhenan Sun
The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models.
no code implementations • 21 Dec 2019 • Xin Ma, Yi Li, Huaibo Huang, Mandi Luo, Ran He
Real-world image super-resolution (SR) is a challenging image translation problem.
no code implementations • 17 Dec 2019 • Aijing Yu, Haoxue Wu, Huaibo Huang, Zhen Lei, Ran He
A spectral conditional attention module is introduced to reduce the domain gap between NIR and VIS data and then improve the performance of NIR-VIS heterogeneous face recognition on various databases including the LAMP-HQ.
no code implementations • NeurIPS 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.
no code implementations • 14 Apr 2019 • Jie Cao, Huaibo Huang, Yi Li, Jingtuo Liu, Ran He, Zhenan Sun
In this work, we present a novel training framework for GANs, namely biphasic learning, to achieve image-to-image translation in multiple visual domains at $1024^2$ resolution.
no code implementations • 30 Mar 2019 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun
UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework.
1 code implementation • 25 Mar 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Then, to ensure the identity consistency of the generated paired heterogeneous images, we impose distribution alignment in the latent space and pairwise identity preservation in the image space.
Ranked #1 on Face Verification on CASIA NIR-VIS 2.0
no code implementations • 26 Dec 2018 • Xin Zheng, Yanqing Guo, Huaibo Huang, Yi Li, Ran He
Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes.
no code implementations • 17 Dec 2018 • Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He
Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image.
no code implementations • 6 Sep 2018 • Xiang Wu, Huaibo Huang, Vishal M. Patel, Ran He, Zhenan Sun
Visible (VIS) to near-infrared (NIR) face matching is a challenging problem due to the significant discrepancy between the two domains and the lack of sufficient data for training cross-modal matching algorithms.
Ranked #2 on Face Verification on CASIA NIR-VIS 2.0
3 code implementations • NeurIPS 2018 • Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan
On the other hand, the inference model is encouraged to classify between the generated and real samples, while the generator tries to fool it, as in GANs.
no code implementations • 11 Jul 2018 • Huaibo Huang, Lingxiao Song, Ran He, Zhenan Sun, Tieniu Tan
Variational capsules model an image as a composition of entities in a probabilistic model.
no code implementations • ICCV 2017 • Huaibo Huang, Ran He, Zhenan Sun, Tieniu Tan
Most modern face super-resolution methods resort to convolutional neural networks (CNN) to infer high-resolution (HR) face images.
Ranked #3 on Face Hallucination on FFHQ 512 x 512 - 16x upscaling