Search Results for author: Yanbin Hao

Found 33 papers, 22 papers with code

SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models

1 code implementation • 10 Mar 2025 • Ouxiang Li, YuAn Wang, Xinting Hu, Houcheng Jiang, Tao Liang, Yanbin Hao, Guojun Ma, Fuli Feng

Erasing concepts from large-scale text-to-image (T2I) diffusion models has become increasingly crucial due to the growing concerns over copyright infringement, offensive content, and privacy violations.

Model Editing

Accelerating Diffusion Transformer via Error-Optimized Cache

no code implementations • 31 Jan 2025 • Junxiang Qiu, Shuo Wang, Jinda Lu, Lin Liu, Houcheng Jiang, Yanbin Hao

Existing caching methods accelerate generation by reusing DiT features from the previous time step and skipping calculations in the next, but they tend to locate and cache low-error modules without focusing on reducing the errors that caching itself introduces, so the quality of generated content declines sharply as caching intensity increases.
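
To make the mechanism concrete, here is a minimal sketch of step-wise feature caching in a DiT-style sampling loop. This is not the paper's error-optimized cache; the wrapper class and the reuse schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a transformer block and reuses its previous output on cached steps.

    A minimal sketch of step-wise feature caching, not the paper's EOC method.
    """
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.cache = None  # output saved from the previous denoising step

    def forward(self, x: torch.Tensor, reuse: bool = False) -> torch.Tensor:
        if reuse and self.cache is not None:
            return self.cache          # skip computation, replay cached features
        out = self.block(x)            # full computation
        self.cache = out.detach()      # store for possible reuse next step
        return out
```

In a sampling loop, passing reuse=True on, say, alternating steps trades quality for speed; the paper's point is that where and when to cache should be chosen to minimize the error caching induces.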

CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion

no code implementations • 15 Jan 2025 • YuAn Wang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, Yi Tan, Xiang Wang

These prompts encompass text prompts (representing cooking steps), image prompts (corresponding to cooking images), and multi-modal prompts (mixing cooking steps and images), ensuring the consistent generation of cooking procedural images.

Text-to-Image Generation

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

1 code implementation • 9 Dec 2024 • YuAn Wang, Ouxiang Li, Tingting Mu, Yanbin Hao, Kuien Liu, Xiang Wang, Xiangnan He

The success of text-to-image generation enabled by diffusion models has imposed an urgent need to erase unwanted concepts, e.g., copyrighted, offensive, and unsafe ones, from the pre-trained models in a precise, timely, and low-cost manner.

Text-to-Image Generation

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

no code implementations • 25 Oct 2024 • Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models, such as CLIP, have shown impressive generalization capacities when using appropriate text descriptions.

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

no code implementations • 28 Aug 2024 • Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao

The training of Hand1000 is divided into three stages, with the first stage aiming to enhance the model's understanding of hand anatomy by using a pre-trained hand gesture recognition model to extract gesture representations.

Anatomy • Hand Gesture Recognition +3
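
As a rough illustration of that first stage, the sketch below freezes a pre-trained network and uses it purely as a feature extractor. The torchvision ResNet is only a stand-in for the paper's hand gesture recognition model, and the input size and feature dimension are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Stand-in backbone: the paper uses a pre-trained hand gesture recognition
# model; an ImageNet ResNet here only illustrates the frozen-extractor pattern.
backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()          # drop the classifier head, keep features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False          # frozen: used purely as a feature extractor

@torch.no_grad()
def gesture_embedding(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of hand images (B, 3, 224, 224) to gesture features (B, 512)."""
    return backbone(images)
```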

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective

1 code implementation • 13 Aug 2024 • Ouxiang Li, Jiayin Cai, Yanbin Hao, XiaoLong Jiang, Yao Hu, Fuli Feng

In this paper, we re-examine the SID problem and identify two prevalent biases in current training paradigms, i.e., weakened artifact features and overfitted artifact features.

Image Generation • Synthetic Image Detection

Selective Vision-Language Subspace Projection for Few-shot CLIP

1 code implementation • 24 Jul 2024 • Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models such as CLIP are capable of mapping the different modality data into a unified feature space, enabling zero/few-shot inference by measuring the similarity of given images and texts.

Few-Shot Learning
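
The zero/few-shot inference the abstract describes, mapping images and texts into one feature space and comparing similarities, can be sketched with the Hugging Face CLIP API; the checkpoint and prompt template here are assumptions, not the paper's setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot(image: Image.Image, class_names: list[str]) -> str:
    """Classify an image by comparing it against text prompts in CLIP space."""
    prompts = [f"a photo of a {c}" for c in class_names]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: image-text cosine similarities scaled by CLIP's temperature
    probs = out.logits_per_image.softmax(dim=-1)
    return class_names[probs.argmax().item()]
```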

Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation

1 code implementation • 19 Jul 2024 • Jinda Lu, Shuo Wang, Yanbin Hao, Haifeng Liu, Xiang Wang, Meng Wang

However, these adaptation methods usually operate on the global view of an input image, and thus have a biased perception of the image's local details.

Transfer Learning

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

1 code implementation • 16 Jul 2024 • Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

To alleviate these issues, leveraging diffusion models' remarkable synthesis capabilities, we propose Diffusion-based Model Inversion (Diff-MI) attacks.

Image Reconstruction

PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

1 code implementation • 3 Jul 2024 • Yanbin Hao, Diansong Zhou, Zhicai Wang, Chong-Wah Ngo, Meng Wang

In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks.

Position • Video Recognition

A Sanity Check for AI-generated Image Detection

1 code implementation • 27 Jun 2024 • Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, XiaoLong Jiang, Yao Hu, Weidi Xie

This effectively enables the model to discern AI-generated images based on semantics or contextual information. Secondly, we select the highest-frequency and lowest-frequency patches in the image and compute low-level patchwise features, aiming to detect AI-generated images by low-level artifacts, for example, noise patterns and anti-aliasing.
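
A minimal sketch of the frequency-based patch selection, assuming a simple spectral score (mean FFT magnitude with the DC component removed); the paper's exact scoring and patch size may differ.

```python
import torch

def rank_patches_by_frequency(img: torch.Tensor, patch: int = 32):
    """Split an image (C, H, W) into patches and pick the highest- and
    lowest-frequency ones by spectral energy.

    The energy measure (mean FFT magnitude away from DC) is an assumption.
    """
    C, H, W = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)     # C, nH, nW, p, p
    patches = patches.reshape(C, -1, patch, patch).permute(1, 0, 2, 3)  # N, C, p, p
    spec = torch.fft.fft2(patches).abs()
    spec[..., 0, 0] = 0.0                      # discard the DC component
    energy = spec.mean(dim=(1, 2, 3))          # one frequency score per patch
    order = energy.argsort(descending=True)
    return patches[order[0]], patches[order[-1]]  # highest- and lowest-frequency patch
```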

Hierarchical Space-Time Attention for Micro-Expression Recognition

1 code implementation • 6 May 2024 • Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

Specifically, we first process ME video frames and special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas.

Micro Expression Recognition • Micro-Expression Recognition +1

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

1 code implementation • CVPR 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.

Data Augmentation • Diversity +1

Boosting Few-Shot Learning via Attentive Feature Regularization

no code implementations • 23 Mar 2024 • Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor.

Few-Shot Learning
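
The blending operation the abstract refers to is standard mixup; a minimal sketch follows, with the Beta-distribution parameter as an assumption. The paper's attentive feature regularization builds on top of this and is not shown here.

```python
import torch

def mixup(x1: torch.Tensor, y1: torch.Tensor,
          x2: torch.Tensor, y2: torch.Tensor, alpha: float = 0.2):
    """Blend two labelled samples with a Beta-distributed factor (standard mixup)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x1 + (1 - lam) * x2              # mixed input
    y = lam * y1 + (1 - lam) * y2              # mixed (one-hot) label
    return x, y
```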

A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

no code implementations • 30 Jan 2024 • Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming.

Video Generation • Video Understanding

Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective

1 code implementation • 8 Dec 2023 • Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang

Leveraging the remarkable capabilities of foundation models (i.e., Llama2 and SAM), we propose to augment recipes and food images by extracting alignable information related to their counterparts.

Cross-Modal Retrieval • Data Augmentation +2

Selective Volume Mixup for Video Action Recognition

1 code implementation • 18 Sep 2023 • Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Tao Mei

In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.

Action Recognition • Image Augmentation +1

CgT-GAN: CLIP-guided Text GAN for Image Captioning

1 code implementation • 23 Aug 2023 • Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He

Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance.

Image Captioning
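
A sketch of the CLIP-based reward idea: score a generated caption by its cosine similarity to the image in CLIP's embedding space. The checkpoint and the raw-similarity reward are assumptions, not CgT-GAN's exact formulation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_reward(image: Image.Image, caption: str) -> float:
    """Cosine similarity between image and caption embeddings, usable as a reward."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)   # unit-normalize both embeddings
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum().item()
```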

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

1 code implementation • 15 May 2023 • Fangwen Wu, Jingxuan He, Yufei Yin, Yanbin Hao, Gang Huang, Lechao Cheng

This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation.

Contrastive Learning • Weakly supervised Semantic Segmentation +1

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

1 code implementation • CVPR 2023 • Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match.

Zero-Shot Learning

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of the human body into consideration.

3D Human Pose Estimation

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation • 15 Jul 2022 • Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
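
To make the contrast with self-attention concrete, here is a minimal token-mixing layer: an MLP applied across the token dimension rather than the channel dimension. The paper's contribution, parameterizing these mixing weights with relative positional encodings, is deliberately omitted from this sketch.

```python
import torch
import torch.nn as nn

class TokenMixing(nn.Module):
    """MLP-Mixer-style token mixing over a fixed number of tokens."""
    def __init__(self, num_tokens: int, hidden: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Linear(num_tokens, hidden),   # mixes across tokens, not channels
            nn.GELU(),
            nn.Linear(hidden, num_tokens),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C); transpose so the Linear layers act on the token axis
        return x + self.mix(x.transpose(1, 2)).transpose(1, 2)
```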

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation • 12 Jul 2022 • Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing a vanilla 2D attention with the LAPS, we can adapt a static transformer into a video one with zero extra parameters and negligible computation overhead (~2.6%).

Video Classification

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

1 code implementation • 20 Apr 2022 • Yanbin Hao, Shuo Wang, Pei Cao, Xinjian Gao, Tong Xu, Jinmeng Wu, Xiangnan He

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts.

Video Classification

Group Contextualization for Video Recognition

1 code implementation • CVPR 2022 • Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed features with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Action Recognition • Egocentric Activity Recognition +1

Token Shift Transformer for Video Classification

3 code implementations • 5 Aug 2021 • Hao Zhang, Yanbin Hao, Chong-Wah Ngo

It is worth noting that our TokShift transformer is a pure, convolution-free video transformer pilot with computational efficiency for video understanding.

Classification • Computational Efficiency +2
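
A sketch of the zero-parameter temporal shift mechanic behind this line of work. Note that TokShift itself shifts the class token across frames; the sketch below instead shifts a fraction of every token's channels (in the style of TSM) purely to illustrate the idea, and the tensor layout and shift fraction are assumptions.

```python
import torch

def token_shift(x: torch.Tensor, frac: float = 0.25) -> torch.Tensor:
    """Exchange a fraction of channels between neighbouring frames; zero parameters.

    x: video tokens laid out as (B, T, N, C).
    """
    B, T, N, C = x.shape
    k = int(C * frac)
    out = x.clone()
    out[:, 1:, :, :k] = x[:, :-1, :, :k]             # shift first k channels forward in time
    out[:, :-1, :, k:2 * k] = x[:, 1:, :, k:2 * k]   # shift next k channels backward in time
    return out
```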

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction

no code implementations • 17 Mar 2021 • Zhenguang Liu, Kedi Lyu, Shuang Wu, Haipeng Chen, Yanbin Hao, Shouling Ji

Our method is compelling in that it enables manipulable motion prediction across activity types and allows customization of the human movement in a variety of fine-grained ways.

Human motion prediction • motion prediction +1

Motion Prediction Using Trajectory Cues

1 code implementation • ICCV 2021 • Zhenguang Liu, Pengxiang Su, Shuang Wu, Xuanjing Shen, Haipeng Chen, Yanbin Hao, Meng Wang

Predicting human motion from a historical pose sequence is at the core of many applications in computer vision.

motion prediction • Prediction

Cross-sentence Pre-trained Model for Interactive QA matching

no code implementations • LREC 2020 • Jinmeng Wu, Yanbin Hao

In addition to the context information captured at each word position, we incorporate a new quantity, the context information jump, to facilitate the attention weight formulation.

Language Modeling • Language Modelling +1
