Search Results for author: Dongchen Han

Found 12 papers, 8 papers with code

Bridging the Divide: Reconsidering Softmax and Linear Attention

1 code implementation · 9 Dec 2024 · Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs.
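To make the cost contrast concrete, here is a minimal NumPy sketch (not the authors' code) of the two attention forms the paper reconsiders; the feature map `phi` is a generic placeholder, not the mapping this paper proposes.

```python
# Minimal sketch contrasting the two attention forms. N tokens, d channels.
import numpy as np

def softmax_attention(Q, K, V):
    """O(N^2 d): materializes the full N x N attention matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (N, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """O(N d^2): reorders to phi(Q) (phi(K)^T V); never forms N x N."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                    # (d, d) summary
    z = Qp @ Kp.sum(axis=0)                          # (N,) normalizer
    return (Qp @ kv) / z[:, None]

N, d = 4096, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The only difference is the order of the matrix products: softmax attention must form the N x N weight matrix, while linear attention exploits associativity to build a d x d summary first, dropping the cost from O(N^2 d) to O(N d^2) at the price of removing the softmax.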

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

1 code implementation · 11 Aug 2024 · Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li

In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.

Tasks: Denoising
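As a rough illustration of the mechanism the abstract describes, the sketch below routes attention through a small set of mediator tokens that engage with keys and queries separately; how the mediators are produced, and how their number is adjusted across denoising steps, are details omitted here rather than the paper's design.

```python
# Hypothetical sketch of attention routed through m mediator tokens (m << N).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mediator_attention(Q, K, V, M):
    d = Q.shape[-1]
    k_side = softmax(M @ K.T / np.sqrt(d)) @ V  # mediators gather from keys: (m, d)
    q_side = softmax(Q @ M.T / np.sqrt(d))      # queries read mediators: (N, m)
    return q_side @ k_side                      # (N, d), O(N m d) overall

N, m, d = 1024, 16, 64                          # m could vary per diffusion step
Q, K, V = (np.random.randn(N, d) for _ in range(3))
M = np.random.randn(m, d)                       # e.g. learned mediator tokens
print(mediator_attention(Q, K, V, M).shape)
```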

Demystify Mamba in Vision: A Linear Attention Perspective

1 code implementation · 26 May 2024 · Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

By exploring the similarities and disparities between the effective Mamba and subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success.

Tasks: Image Classification · Mamba
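The linear-attention lens on Mamba can be made concrete with a recurrent sketch: causal linear attention maintains a running d x d state, much as a state-space model does. The per-step gate below stands in for Mamba-style input-dependent decay; it is an illustrative assumption, not the paper's exact formulation.

```python
# Sketch of the recurrent view connecting linear attention to SSMs like Mamba.
import numpy as np

def causal_linear_attention(Q, K, V, gates=None):
    N, d = Q.shape
    S = np.zeros((d, d))          # running sum of k_t v_t^T (the "state")
    z = np.zeros(d)               # running sum of k_t (normalizer)
    out = np.empty_like(V)
    for t in range(N):
        if gates is not None:     # decay the state, Mamba-style
            S *= gates[t]
            z *= gates[t]
        S += np.outer(K[t], V[t])
        z += K[t]
        out[t] = (Q[t] @ S) / (Q[t] @ z + 1e-6)
    return out

N, d = 128, 32
Q, K, V = (np.abs(np.random.randn(N, d)) for _ in range(3))  # nonneg features
print(causal_linear_attention(Q, K, V, gates=np.full(N, 0.9)).shape)
```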

GSVA: Generalized Segmentation via Multimodal Large Language Models

1 code implementation · CVPR 2024 · Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or to identify empty targets absent from the image.

Ranked #1 on Generalized Referring Expression Segmentation on gRefCOCO (using extra training data)

Tasks: Decoder · Generalized Referring Expression Segmentation · +2

Agent Attention: On the Integration of Softmax and Linear Attention

2 code implementations · 14 Dec 2023 · Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang

Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.

Tasks: Computational Efficiency · Image Classification · +4
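A minimal sketch of that quadruple: the agent tokens A first aggregate information from K and V, then broadcast it back to the queries Q, so no N x N matrix is ever formed. The paper's additional components (e.g., how the agents are obtained and its diversity-restoration branch) are omitted here.

```python
# Minimal sketch of the agent attention quadruple (Q, A, K, V).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, A, K, V):
    d = Q.shape[-1]
    agent_v = softmax(A @ K.T / np.sqrt(d)) @ V      # agent aggregation: (n_agents, d)
    return softmax(Q @ A.T / np.sqrt(d)) @ agent_v   # agent broadcast: (N, d)

N, n_agents, d = 1024, 49, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
A = Q[np.linspace(0, N - 1, n_agents, dtype=int)]    # e.g. subsampled queries as agents
print(agent_attention(Q, A, K, V).shape)
```

With n_agents fixed and much smaller than N, both stages cost O(N · n_agents · d), giving linear rather than quadratic scaling in the token count.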

OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization

no code implementations · 7 Dec 2023 · Dongchen Han, Xiaojun Jia, Yang Bai, Jindong Gu, Yang Liu, Xiaochun Cao

Investigating the generation of high-transferability adversarial examples is crucial for uncovering VLP models' vulnerabilities in practical scenarios.

Tasks: Adversarial Attack · Data Augmentation · +2

FLatten Transformer: Vision Transformer using Focused Linear Attention

1 code implementation · ICCV 2023 · Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.

Tasks: Diversity
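In that spirit, here is a sketch of linear attention with a sharpening, norm-preserving ReLU feature map, loosely modeled on the paper's focused function; treat the exact form (the power p, the rescaling) as an approximation rather than a faithful reimplementation.

```python
# Sketch of linear attention with a "focused"-style feature map: ReLU
# features raised to a power p, rescaled to keep their norm, which
# sharpens the resulting attention distribution.
import numpy as np

def focused_map(x, p=3, eps=1e-6):
    xp = np.maximum(x, 0) + eps                  # nonnegative features
    powered = xp ** p
    scale = np.linalg.norm(xp, axis=-1, keepdims=True) / \
            (np.linalg.norm(powered, axis=-1, keepdims=True) + eps)
    return powered * scale                       # same norm, sharper direction

def focused_linear_attention(Q, K, V, p=3):
    Qf, Kf = focused_map(Q, p), focused_map(K, p)
    kv = Kf.T @ V                                # (d, d), O(N d^2) total
    z = Qf @ Kf.sum(axis=0) + 1e-6
    return (Qf @ kv) / z[:, None]

N, d = 4096, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(focused_linear_attention(Q, K, V).shape)
```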

Dynamic Perceiver for Efficient Visual Recognition

1 code implementation · ICCV 2023 · Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.

Tasks: Action Recognition · Classification · +4
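The inference pattern behind early exits can be sketched in a few lines: run the network stage by stage and stop at the first classifier whose confidence clears a threshold. The stages, heads, and the 0.9 threshold below are illustrative stand-ins, not the Dynamic Perceiver architecture.

```python
# Sketch of confidence-thresholded early-exit inference.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def early_exit_predict(x, stages, classifiers, threshold=0.9):
    """Run stages in order; stop at the first confident classifier."""
    feat = x
    for stage, clf in zip(stages, classifiers):
        feat = stage(feat)
        probs = softmax(clf(feat))
        if probs.max() >= threshold:             # confident: exit early
            return probs.argmax(), probs.max()
    return probs.argmax(), probs.max()           # fall through to last exit

# Toy usage with random linear stages/heads (d features, 10 classes).
rng = np.random.default_rng(0)
d, n_cls = 64, 10
stages = [lambda f, W=rng.normal(size=(d, d)) / np.sqrt(d): np.tanh(f @ W)
          for _ in range(3)]
classifiers = [lambda f, W=rng.normal(size=(d, n_cls)): f @ W for _ in range(3)]
print(early_exit_predict(rng.normal(size=d), stages, classifiers))
```

Easy inputs exit from a shallow head and skip the remaining stages entirely, which is where the efficiency gain comes from.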

Contrastive Language-Image Pre-Training with Knowledge Graphs

no code implementations · 17 Oct 2022 · Xuran Pan, Tianzhu Ye, Dongchen Han, Shiji Song, Gao Huang

Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performance when transferred to downstream tasks.

Tasks: Knowledge Graphs

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

1 code implementation · CVPR 2022 · Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang

Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.

Tasks: Language Modelling · Natural Language Queries · +1
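A toy sketch of the pseudo-query idea: detector outputs are slotted into language templates to form query-object training pairs. The `Detection` fields and templates here are hypothetical; the paper's generation module is richer (attributes, spatial relations, filtering).

```python
# Hypothetical sketch of template-based pseudo-query generation.
from dataclasses import dataclass

@dataclass
class Detection:
    name: str        # class label from an off-the-shelf detector
    attribute: str   # e.g. a color from an attribute classifier (assumed)
    position: str    # coarse location derived from the box, e.g. "left"

TEMPLATES = [
    "{name}",
    "{attribute} {name}",
    "{name} on the {position}",
    "{attribute} {name} on the {position}",
]

def pseudo_queries(dets):
    """One (query, object) training pair per detection and template."""
    return [(t.format(name=d.name, attribute=d.attribute,
                      position=d.position), d)
            for d in dets for t in TEMPLATES]

dets = [Detection("dog", "brown", "left"), Detection("car", "red", "right")]
for q, _ in pseudo_queries(dets)[:4]:
    print(q)
```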
