no code implementations • 10 Mar 2025 • Zihao He, Liu Liu, Dongchen Han, Kai Gao, Lei Dong, Dechao Bu, Peipei Huo, Zhihao Wang, Wenxin Deng, Jingjia Liu, Jin-cheng Guo, Yi Zhao, Yang Wu
We merged networks from experimental sources to generate a pre-integrated multi-layer human network for evaluation.
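This kind of layer-preserving merge can be prototyped in a few lines. The sketch below is a minimal illustration assuming edge-list inputs per data source; the helper `build_multilayer` and the toy layer names are hypothetical, not the paper's actual pipeline.

```python
import networkx as nx

# A minimal sketch of multi-layer merging (hypothetical helper and toy edge
# lists, not the paper's actual data): keep each experimental source as its
# own layer over a shared node set.
def build_multilayer(edge_lists):
    """edge_lists: dict mapping layer name -> iterable of (u, v) edges."""
    return {name: nx.Graph(edges) for name, edges in edge_lists.items()}

layers = build_multilayer({
    "ppi":          [("TP53", "MDM2"), ("TP53", "EP300")],
    "coexpression": [("TP53", "CDKN1A")],
})
# The integrated node set is the union across layers.
nodes = set().union(*(g.nodes for g in layers.values()))
print(sorted(nodes))
```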
1 code implementation • 9 Dec 2024 • Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang
Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs.
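The quadratic cost is easy to see in code. The toy below illustrates the bottleneck, not this paper's proposed module: vanilla softmax attention materializes the full N x N score matrix.

```python
import torch

# A toy illustration of the bottleneck (not this paper's proposed module):
# vanilla softmax attention materializes an N x N score matrix, so compute
# and memory grow quadratically with the number of tokens N.
def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (N, N)
    return torch.softmax(scores, dim=-1) @ v

N, d = 1024, 64          # 1024 tokens, e.g., a 32 x 32 feature map
q, k, v = (torch.randn(N, d) for _ in range(3))
out = softmax_attention(q, k, v)   # O(N^2 * d) time, O(N^2) memory
```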
1 code implementation • 11 Aug 2024 • Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li
In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
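A minimal sketch of the mediator idea follows, under the assumption that a small set of m learned mediator tokens first aggregates the keys/values and the queries then attend only to the mediators; the function name and shapes are illustrative, and the paper's diffusion-specific details are omitted.

```python
import torch

# Illustrative sketch (assumed shapes; not the paper's full module): m
# mediator tokens bridge queries and keys so no N x N attention map is formed.
def mediator_attention(q, k, v, med):
    d = q.shape[-1]
    # Mediators summarize all keys/values: an m x N attention map.
    summary = torch.softmax(med @ k.transpose(-2, -1) / d**0.5, dim=-1) @ v
    # Queries read from the m summaries: an N x m attention map.
    return torch.softmax(q @ med.transpose(-2, -1) / d**0.5, dim=-1) @ summary

N, m, d = 4096, 64, 32                        # m << N
q, k, v = (torch.randn(N, d) for _ in range(3))
out = mediator_attention(q, k, v, torch.randn(m, d))  # (N, d)
```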
1 code implementation • 26 May 2024 • Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang
By exploring the similarities and disparities between the effective Mamba and the subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success.
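One structural link of this kind can be sketched directly: linear attention admits a recurrent form with a rank-1 state update, and adding a data-dependent forget gate makes the update resemble a selective state-space (Mamba-style) transition. The toy recurrence below is an illustration of that connection, not the paper's derivation.

```python
import torch

# A toy recurrence illustrating the structural link (an illustration, not
# the paper's derivation): linear attention maintains a state S via rank-1
# updates, and a data-dependent forget gate makes the update resemble a
# selective state-space (Mamba-style) transition.
def linear_attention_recurrent(q, k, v, forget=None):
    N, d = q.shape
    S = torch.zeros(d, v.shape[-1])            # running key-value state
    outs = []
    for t in range(N):
        if forget is not None:
            S = forget[t] * S                  # selective decay of old state
        S = S + k[t].unsqueeze(-1) @ v[t].unsqueeze(0)  # rank-1 update
        outs.append(q[t] @ S)                  # readout with the query
    return torch.stack(outs)

N, d = 16, 8
q, k, v = (torch.randn(N, d) for _ in range(3))
gate = torch.sigmoid(torch.randn(N))           # toy data-dependent gates
out = linear_attention_recurrent(q, k, v, forget=gate)  # (N, d)
```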
no code implementations • 21 Feb 2024 • Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao
Nevertheless, the frozen visual encoder in autoregressive VLMs imposes constraints on the learning of conventional image triggers.
1 code implementation • CVPR 2024 • Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or to identify empty targets that are absent from the image.
Ranked #1 on Generalized Referring Expression Segmentation on gRefCOCO (using extra training data)
2 code implementations • 14 Dec 2023 • Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang
Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.
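Ignoring the paper's additional components (e.g., agent bias terms and convolutional diversity restoration), the core computation can be sketched as two chained softmax attentions: the n agent tokens first aggregate from (K, V), then the queries attend to the agents, reducing the cost from O(N^2) to O(Nn) when n << N.

```python
import torch

# A sketch of the quadruple (Q, A, K, V) described above, omitting the
# paper's extra components: n agent tokens aggregate from (K, V), then the
# queries attend to the agents instead of the full key set.
def agent_attention(q, a, k, v):
    d = q.shape[-1]
    # Agent aggregation: softmax(A K^T) V, an n x N map.
    agent_feats = torch.softmax(a @ k.transpose(-2, -1) / d**0.5, dim=-1) @ v
    # Agent broadcast: softmax(Q A^T) applied to the aggregated features.
    return torch.softmax(q @ a.transpose(-2, -1) / d**0.5, dim=-1) @ agent_feats

N, n, d = 4096, 49, 64                         # n << N agents
q, k, v = (torch.randn(N, d) for _ in range(3))
a = torch.randn(n, d)    # in the paper, agents are pooled from the queries
out = agent_attention(q, a, k, v)              # (N, d) at O(N * n) cost
```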
no code implementations • 7 Dec 2023 • Dongchen Han, Xiaojun Jia, Yang Bai, Jindong Gu, Yang Liu, Xiaochun Cao
Investigating the generation of high-transferability adversarial examples is crucial for uncovering VLP models' vulnerabilities in practical scenarios.
1 code implementation • ICCV 2023 • Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.
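For context, the generic linear-attention workaround replaces softmax with a kernel feature map phi so the product reorders as phi(Q) @ (phi(K)^T @ V), avoiding the N x N matrix entirely. The sketch below uses the common phi = elu + 1 choice as an assumption; it is not this paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# A generic linear-attention sketch (a common workaround, not this paper's
# exact formulation): a kernel feature map phi replaces softmax, letting the
# product reorder as phi(Q) @ (phi(K)^T @ V) with no N x N matrix.
def linear_attention(q, k, v, eps=1e-6):
    phi_q, phi_k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = phi_k.transpose(-2, -1) @ v                  # (d, d_v), built in O(N)
    z = phi_q @ phi_k.sum(dim=0).unsqueeze(-1) + eps  # per-row normalizer
    return (phi_q @ kv) / z

N, d = 4096, 64
q, k, v = (torch.randn(N, d) for _ in range(3))
out = linear_attention(q, k, v)                       # (N, d) at O(N * d^2) cost
```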
1 code implementation • ICCV 2023 • Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
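A generic early-exit inference loop illustrates the mechanism; the toy modules and threshold below are assumptions for illustration, not the paper's architecture. Each stage feeds a classifier head, and prediction stops at the first head whose confidence clears the threshold.

```python
import torch
import torch.nn as nn

# A generic early-exit loop (a sketch with assumed toy modules, not the
# paper's architecture): each stage feeds a classifier head, and inference
# stops at the first head whose softmax confidence clears a threshold.
def early_exit_forward(x, stages, heads, threshold=0.9):
    for stage, head in zip(stages, heads):
        x = stage(x)
        probs = torch.softmax(head(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:       # confident enough: exit early
            return pred.item()
    return pred.item()                     # otherwise use the final head

stages = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
heads = [nn.Linear(32, 10) for _ in range(3)]
print(early_exit_forward(torch.randn(1, 32), stages, heads))
```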
no code implementations • 17 Oct 2022 • Xuran Pan, Tianzhu Ye, Dongchen Han, Shiji Song, Gao Huang
Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performance when transferred to downstream tasks.
1 code implementation • CVPR 2022 • Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang
Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.
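The pseudo-query step can be schematized as template filling over detector outputs. Everything below (the `generate_pseudo_queries` helper, the spatial template, and the toy detections) is a hypothetical illustration, not the paper's module.

```python
# A schematic of the pseudo-query idea (names and templates are illustrative
# assumptions): turn each detector output on an unlabeled image into a
# template-based language query paired with its box.
def generate_pseudo_queries(detections):
    """detections: list of dicts like {"label": str, "box": (x1, y1, x2, y2)},
    with coordinates normalized to [0, 1]."""
    queries = []
    for det in detections:
        x1, _, x2, _ = det["box"]
        position = "left" if (x1 + x2) / 2 < 0.5 else "right"
        queries.append((f"the {det['label']} on the {position}", det["box"]))
    return queries

# Toy detector output with normalized coordinates.
dets = [{"label": "dog", "box": (0.1, 0.3, 0.4, 0.9)},
        {"label": "ball", "box": (0.6, 0.5, 0.8, 0.7)}]
print(generate_pseudo_queries(dets))
```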