Search Results for author: Qihang Fan

Found 9 papers, 6 papers with code

Vision Transformer with Sparse Scan Prior

1 code implementation • 22 May 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He

In recent years, Transformers have achieved remarkable progress in computer vision tasks.

Instance Segmentation, Object Detection, +2 more

Band-Attention Modulated RetNet for Face Forgery Detection

no code implementations • 9 Apr 2024 • Zhida Zhang, Jie Cao, Wenkui Yang, Qihang Fan, Kai Zhou, Ran He

Transformer networks are widely used in face forgery detection because they scale well to large datasets. Despite their success, transformers struggle to balance capturing global context, which is crucial for unveiling forgery clues, against computational complexity. To mitigate this issue, we introduce Band-Attention modulated RetNet (BAR-Net), a lightweight network designed to efficiently process extensive visual contexts while avoiding catastrophic forgetting. Our approach enables the target token to perceive global information by assigning differential attention levels to tokens at varying distances.

ViTAR: Vision Transformer with Any Resolution

no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.

Self-Supervised Learning, Semantic Segmentation

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.

Text Generation, Video Summarization

DeVAn: Dense Video Annotation for Video-Language Models

1 code implementation • 8 Oct 2023 • Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang

Finally, we benchmarked a wide range of current video-language models on DeVAn, and we aim for DeVAn to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.

Retrieval, Sentence, +1 more

RMT: Retentive Networks Meet Vision Transformers

1 code implementation • CVPR 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He

To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes.

Instance Segmentation, Object Detection, +2 more

Lightweight Vision Transformer with Bidirectional Interaction

1 code implementation • NeurIPS 2023 • Qihang Fan, Huaibo Huang, Xiaoqiang Zhou, Ran He

This paper proposes a Fully Adaptive Self-Attention (FASA) mechanism for vision transformers that models local and global information, as well as the bidirectional interaction between them, in a context-aware way.

Rethinking Local Perception in Lightweight Vision Transformer

1 code implementation • 31 Mar 2023 • Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He

In CloFormer, combining AttnConv with vanilla attention, which uses pooling to reduce FLOPs, enables the model to perceive both high-frequency and low-frequency information.

Image Classification, Object Detection, +2 more
