Search Results for author: Yizeng Han

Found 20 papers, 14 papers with code

Demystify Mamba in Vision: A Linear Attention Perspective

1 code implementation • 26 May 2024 • Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

By exploring the similarities and disparities between the effective Mamba and the subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success.

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

1 code implementation • 14 May 2024 • Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Gao Huang

These patterns, when observed in the frequency and spatial domains, comprise lower-frequency components and natural image content without distortion or data augmentation.

Data Augmentation, Self-Supervised Learning

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

1 code implementation • 18 Mar 2024 • Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success in adapting vision transformers (ViTs) by improving parameter efficiency.

Semantic Segmentation, Video Recognition

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention

no code implementations • 17 Mar 2024 • Jiangshan Wang, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li, Gao Huang

GRA comprises two key components, Group-wise Rotating and Group-wise Attention, and can adaptively capture fine-grained features of objects with diverse orientations.

Object, object-detection +2

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

1 code implementation • 21 Feb 2024 • Chaoqun Du, Yizeng Han, Gao Huang

Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched.

Mask Grounding for Referring Image Segmentation

no code implementations • 19 Dec 2023 • Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang

To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features, by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects.

Ranked #2 on Referring Expression Segmentation on RefCOCO testB

Image Segmentation, Referring Expression Segmentation +4

GSVA: Generalized Segmentation via Multimodal Large Language Models

no code implementations • 15 Dec 2023 • Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to expressions that refer to multiple objects or to empty targets that are absent from the image.

Decoder, Generalized Referring Expression Segmentation +2

Agent Attention: On the Integration of Softmax and Linear Attention

2 code implementations • 14 Dec 2023 • Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.

Computational Efficiency, Image Classification +4

Fine-grained Recognition with Learnable Semantic Data Augmentation

1 code implementation • 1 Sep 2023 • Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang

Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories.

Data Augmentation, Fine-Grained Image Recognition +2

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

1 code implementation • 30 Aug 2023 • Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks.

Scheduling

Computation-efficient Deep Learning for Computer Vision: A Survey

no code implementations • 27 Aug 2023 • Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang

Over the past decade, deep learning models have exhibited considerable advancements, reaching or even exceeding human-level performance in a range of visual perception tasks.

Autonomous Vehicles, Edge-computing +1

FLatten Transformer: Vision Transformer using Focused Linear Attention

1 code implementation • ICCV 2023 • Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.

Dynamic Perceiver for Efficient Visual Recognition

1 code implementation • ICCV 2023 • Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.

Action Recognition, Classification +4

Adaptive Rotated Convolution for Rotated Object Detection

1 code implementation • ICCV 2023 • Yifan Pu, Yiru Wang, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang

In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image.

Ranked #3 on Object Detection In Aerial Images on DOTA (using extra training data)

Object, object-detection +2

Latency-aware Spatial-wise Dynamic Networks

2 code implementations • 12 Oct 2022 • Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties.

Image Classification, Instance Segmentation +4

Learning to Weight Samples for Dynamic Early-exiting Networks

1 code implementation • 17 Sep 2022 • Yizeng Han, Yifan Pu, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang

Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers.

Meta-Learning

CAM-loss: Towards Learning Spatially Discriminative Feature Representations

no code implementations • ICCV 2021 • Chaofei Wang, Jiayu Xiao, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang

The backbone of a traditional CNN classifier is generally considered a feature extractor, followed by a linear layer that performs the classification.

Few-Shot Learning, Image Classification +2