Search Results for author: Huaibo Huang

Found 60 papers, 21 papers with code

Hierarchical Face Aging through Disentangled Latent Characteristics

no code implementations ECCV 2020 Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun

To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information.

Age Estimation MORPH

ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation

no code implementations 12 Feb 2025 Qianrui Teng, Xing Cui, Xuannan Liu, Peipei Li, Zekun Li, Huaibo Huang, Ran He

Personalized text-to-image models allow users to generate images of new concepts from several reference photos, thereby leading to critical concerns regarding civil privacy.

Text-to-Image Generation

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

no code implementations 7 Feb 2025 Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He

Despite significant progress in this field, there remains a gap in literature regarding a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods.

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck

no code implementations 26 Jan 2025 Nan Gao, Jia Li, Huaibo Huang, Ke Shang, Ran He

Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of data degradation patterns.

Attribute Blind Face Restoration

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

1 code implementation 14 Nov 2024 Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, Ran He

Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications.

Breaking the Low-Rank Dilemma of Linear Attention

1 code implementation 12 Nov 2024 Qihang Fan, Huaibo Huang, Ran He

The Softmax attention mechanism in Transformer models is notoriously computationally expensive, particularly due to its quadratic complexity, posing significant challenges in vision applications.

LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration

1 code implementation 20 Oct 2024 Yuang Ai, Huaibo Huang, Ran He

In the pre-training stage, we enhance the pre-trained CLIP model by introducing a simple mechanism that scales it to higher resolutions, allowing us to extract robust degradation representations that adaptively guide the IR network.

All Computational Efficiency +2

Deep Learning Technology for Face Forgery Detection: A Survey

no code implementations 22 Sep 2024 Lixia Ma, Puning Yang, Yuting Xu, Ziming Yang, Peipei Li, Huaibo Huang

This paper presents a comprehensive survey of recent deep learning-based approaches for facial forgery detection.

DeepFake Detection Deep Learning +3

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

no code implementations 19 Sep 2024 Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You

Pre-training on large-scale, high-quality datasets is crucial for enhancing the reasoning capabilities of Large Language Models (LLMs), especially in specialized domains such as mathematics.

Math Mathematical Reasoning

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

1 code implementation 10 Aug 2024 Jin Liu, Huaibo Huang, Jie Cao, Ran He

To blend the Consistency Features extracted from both content and style images, we introduce a Style Enhancement Attention Control technique that meticulously merges content and style features within the attention space of the target image.

Text-to-Image Generation

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

no code implementations 13 Jun 2024 Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He

Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist.

Misinformation

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

1 code implementation 28 May 2024 Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang

In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs).

Language Modeling Language Modelling +2

Vision Transformer with Sparse Scan Prior

no code implementations 22 May 2024 Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He

In recent years, Transformers have achieved remarkable progress in computer vision tasks.

Instance Segmentation object-detection +2

ViTAR: Vision Transformer with Any Resolution

no code implementations 27 Mar 2024 Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.

Self-Supervised Learning Semantic Segmentation

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration

no code implementations 15 Mar 2024 Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He

Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings.

Attribute Blind Face Restoration +1

FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs

no code implementations 4 Mar 2024 Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He

The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors.

Fake News Detection Image Manipulation +2

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

no code implementations 5 Dec 2023 Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He

Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.

All Decoder +1

Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting

1 code implementation 3 Dec 2023 Jin Liu, Huaibo Huang, Chao Jin, Ran He

Face stylization refers to the transformation of a face into a specific portrait style.

Image Reconstruction

Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance

no code implementations 28 Nov 2023 Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He

First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance.

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser

1 code implementation 25 Nov 2023 Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Xuannan Liu, Zhaofeng He

We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise.

Text-to-Image Generation

DeVAn: Dense Video Annotation for Video-Language Models

1 code implementation 8 Oct 2023 Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang

Finally, we benchmarked a wide range of current video-language models on DeVAn, and we aim for DeVAn to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.

Retrieval Sentence +1

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

no code implementations 8 Oct 2023 Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.

Text Generation Video Summarization

RMT: Retentive Networks Meet Vision Transformers

1 code implementation CVPR 2024 Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He

To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes.

Instance Segmentation object-detection +2

Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

2 code implementations NeurIPS 2023 Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He

Consequently, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment.

Age Estimation Classification +2

Lightweight Vision Transformer with Bidirectional Interaction

1 code implementation NeurIPS 2023 Qihang Fan, Huaibo Huang, Xiaoqiang Zhou, Ran He

This paper proposes a Fully Adaptive Self-Attention (FASA) mechanism for vision transformer to model the local and global information as well as the bidirectional interaction between them in context-aware ways.

Rethinking Local Perception in Lightweight Vision Transformer

1 code implementation 31 Mar 2023 Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He

The combination of the AttnConv and vanilla attention which uses pooling to reduce FLOPs in CloFormer enables the model to perceive high-frequency and low-frequency information.

Image Classification object-detection +2

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

1 code implementation CVPR 2024 Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He

Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR) by accessing both the source and target data.

Image Super-Resolution Source-Free Domain Adaptation +1

Pluralistic Aging Diffusion Autoencoder

no code implementations ICCV 2023 Peipei Li, Rui Wang, Huaibo Huang, Ran He, Zhaofeng He

Face aging is an ill-posed problem because multiple plausible aging patterns may correspond to a given input.

Denoising Diversity

MSRA-SR: Image Super-resolution Transformer with Multi-scale Shared Representation Acquisition

no code implementations ICCV 2023 Xiaoqiang Zhou, Huaibo Huang, Ran He, Zilei Wang, Jie Hu, Tieniu Tan

In particular, self-attention with cross-scale matching and convolution filters with different kernel sizes are designed to exploit the multi-scale features in images.

Image Super-Resolution

Vision Transformer with Super Token Sampling

1 code implementation CVPR 2023 Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, Tieniu Tan

STA decomposes vanilla global attention into multiplications of a sparse association map and a low-dimensional attention, leading to high efficiency in capturing global dependencies.

Semantic Segmentation Superpixels

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification

1 code implementation 11 Oct 2022 Zi Wang, Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He

To alleviate these two issues, we propose a simple yet effective method with Parallel Augmentation and Dual Enhancement (PADE), which is robust on both occluded and non-occluded data and does not require any auxiliary clues.

Occluded Person Re-Identification

Contrastive Attention Network with Dense Field Estimation for Face Completion

no code implementations 20 Dec 2021 Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei

This multi-scale architecture is beneficial for the decoder to utilize discriminative representations learned from encoders into images.

Decoder Face Recognition +1

Toward Accurate and Reliable Iris Segmentation Using Uncertainty Learning

no code implementations 20 Oct 2021 Jianze Wei, Huaibo Huang, Muyi Sun, Yunlong Wang, Min Ren, Ran He, Zhenan Sun

To make further efforts on accurate and reliable iris segmentation, we propose a bilateral self-attention module and design Bilateral Transformer (BiTrans) with hierarchical architecture by exploring spatial and visual relationships.

Iris Recognition Iris Segmentation +1

Causal Representation Learning for Context-Aware Face Transfer

no code implementations 4 Oct 2021 Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He

Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically.

counterfactual Counterfactual Inference +4

Universal Face Restoration With Memorized Modulation

no code implementations 3 Oct 2021 Jia Li, Huaibo Huang, Xiaofei Jia, Ran He

Blind face restoration (BFR) is a challenging problem because of the uncertainty of the degradation patterns.

Blind Face Restoration

Information Bottleneck Disentanglement for Identity Swapping

1 code implementation CVPR 2021 Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He

In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.

Disentanglement Face Recognition +1

Memory Oriented Transfer Learning for Semi-Supervised Image Deraining

no code implementations CVPR 2021 Huaibo Huang, Aijing Yu, Ran He

To address this issue, we propose a memory-oriented semi-supervised (MOSS) method which enables the network to explore and exploit the properties of rain streaks from both synthetic and real data.

Rain Removal Transfer Learning

Free-Form Image Inpainting via Contrastive Attention Network

no code implementations 29 Oct 2020 Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

It is difficult for encoders to capture such powerful representations under this complex situation.

Decoder Form +1

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition

1 code implementation 20 Sep 2020 Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises.

Contrastive Learning Diversity +2

Cosmetic-Aware Makeup Cleanser

no code implementations 20 Apr 2020 Yi Li, Huaibo Huang, Junchi Yu, Ran He, Tieniu Tan

Face verification aims at determining whether a pair of face images belongs to the same identity.

Face Parsing Face Verification +1

Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

no code implementations ECCV 2020 Jie Cao, Huaibo Huang, Yi Li, Ran He, Zhenan Sun

The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models.

Image-to-Image Translation Informativeness +1

LAMP-HQ: A Large-Scale Multi-Pose High-Quality Database and Benchmark for NIR-VIS Face Recognition

no code implementations 17 Dec 2019 Aijing Yu, Haoxue Wu, Huaibo Huang, Zhen Lei, Ran He

A spectral conditional attention module is introduced to reduce the domain gap between NIR and VIS data and then improve the performance of NIR-VIS heterogeneous face recognition on various databases including the LAMP-HQ.

Attribute Face Recognition +1

Dual Variational Generation for Low Shot Heterogeneous Face Recognition

no code implementations NeurIPS 2019 Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.

Face Recognition Heterogeneous Face Recognition

Biphasic Learning of GANs for High-Resolution Image-to-Image Translation

no code implementations 14 Apr 2019 Jie Cao, Huaibo Huang, Yi Li, Jingtuo Liu, Ran He, Zhenan Sun

In this work, we present a novel training framework for GANs, namely biphasic learning, to achieve image-to-image translation in multiple visual domains at $1024^2$ resolution.

Image-to-Image Translation Mutual Information Estimation +2

UVA: A Universal Variational Framework for Continuous Age Analysis

no code implementations 30 Mar 2019 Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun

UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework.

Age Estimation MORPH +1

Dual Variational Generation for Low-Shot Heterogeneous Face Recognition

1 code implementation 25 Mar 2019 Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Then, in order to ensure the identity consistency of the generated paired heterogeneous images, we impose a distribution alignment in the latent space and a pairwise identity preserving in the image space.

Face Recognition Heterogeneous Face Recognition

A Survey of Deep Facial Attribute Analysis

no code implementations 26 Dec 2018 Xin Zheng, Yanqing Guo, Huaibo Huang, Yi Li, Ran He

Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes.

Attribute Survey

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

no code implementations 17 Dec 2018 Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He

Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image.

Talking Face Generation

Disentangled Variational Representation for Heterogeneous Face Recognition

no code implementations 6 Sep 2018 Xiang Wu, Huaibo Huang, Vishal M. Patel, Ran He, Zhenan Sun

Visible (VIS) to near infrared (NIR) face matching is a challenging problem due to the significant domain discrepancy between the domains and a lack of sufficient data for training cross-modal matching algorithms.

Face Recognition Heterogeneous Face Recognition

IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

3 code implementations NeurIPS 2018 Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan

On the other hand, the inference model is encouraged to classify between the generated and real samples while the generator tries to fool it, as in GANs.

Image Generation

Variational Capsules for Image Analysis and Synthesis

no code implementations 11 Jul 2018 Huaibo Huang, Lingxiao Song, Ran He, Zhenan Sun, Tieniu Tan

Variational capsules model an image as a composition of entities in a probabilistic model.

Attribute Diversity +3
