Search Results for author: Yizeng Han

Found 32 papers, 26 papers with code

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

1 code implementation 9 Apr 2025 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

Our investigations reveal that these costs primarily stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions.

Text-to-Image Generation +1

Bridging the Divide: Reconsidering Softmax and Linear Attention

1 code implementation 9 Dec 2024 Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs.

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

1 code implementation CVPR 2025 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You

Vision-language models (VLMs) have shown remarkable success across various multi-modal tasks, yet large VLMs encounter significant efficiency challenges due to processing numerous visual tokens.

Visual Question Answering

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

1 code implementation 11 Nov 2024 Zanlin Ni, Yulin Wang, Renping Zhou, Yizeng Han, Jiayi Guo, Zhiyuan Liu, Yuan Yao, Gao Huang

At the spatial level, we disentangle the computations of visible and mask tokens by encoding visible tokens independently, while decoding mask tokens conditioned on the fully encoded visible tokens.

Image Generation

Exploring contextual modeling with linear complexity for point cloud segmentation

no code implementations 28 Oct 2024 Yong Xien Chng, Xuchong Qiu, Yizeng Han, Yifan Pu, Jiewei Cao, Gao Huang

Recently, Mamba has emerged as a promising alternative, offering efficient long-range contextual modeling capabilities without the quadratic complexity associated with Transformer's attention mechanisms.

Mamba Point Cloud Segmentation

Dynamic Diffusion Transformer

2 code implementations 4 Oct 2024 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You

In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations.

Image Generation

Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation

no code implementations 24 Sep 2024 Yong Xien Chng, Xuchong Qiu, Yizeng Han, Kai Ding, Wan Ding, Gao Huang

Building on the observation that VLMs pre-trained on global-pooled image-text features often fail to capture fine-grained semantics necessary for effective mask classification, we propose a novel Fine-grained Semantic Adaptation (FISA) method to address this limitation.

Language Modeling +2

OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength

1 code implementation 22 Sep 2024 Le Yang, Ziwei Zheng, Yizeng Han, Shiji Song, Gao Huang, Fan Li

Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search, and it mainly contains two steps to find the high-performance architecture: First, the DARTS supernet that consists of mixed operations will be optimized via gradient descent.

Attribute Neural Architecture Search

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

1 code implementation 11 Aug 2024 Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li

In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.

Denoising

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

1 code implementation 29 Jul 2024 Chaoqun Du, Yulin Wang, Jiayi Guo, Yizeng Han, Jie Zhou, Gao Huang

To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable.

Test-time Adaptation

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

1 code implementation 3 Jul 2024 Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li

Based on DFA, the proposed dynamic encoder layer aggregates the temporal features within the action time ranges and guarantees the discriminability of the extracted representations.

Action Detection Dynamic neural networks +1

Demystify Mamba in Vision: A Linear Attention Perspective

1 code implementation 26 May 2024 Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

By exploring the similarities and disparities between the effective Mamba and subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success.

Image Classification +1

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

1 code implementation 14 May 2024 Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Gao Huang

These patterns, when observed through frequency and spatial domains, incorporate lower-frequency components, and the natural image contents without distortion or data augmentation.

Data Augmentation Self-Supervised Learning

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

1 code implementation 18 Mar 2024 Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency.

Mixture-of-Experts parameter-efficient fine-tuning +2

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention

no code implementations 17 Mar 2024 Jiangshan Wang, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li, Gao Huang

GRA can adaptively capture fine-grained features of objects with diverse orientations, comprising two key components: Group-wise Rotating and Group-wise Attention.

Object object-detection +2

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

1 code implementation 21 Feb 2024 Chaoqun Du, Yizeng Han, Gao Huang

Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched.

Mask Grounding for Referring Image Segmentation

1 code implementation CVPR 2024 Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang

To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features, by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects.

cross-modal alignment Image Segmentation +5

GSVA: Generalized Segmentation via Multimodal Large Language Models

1 code implementation CVPR 2024 Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image.

Ranked #1 on Generalized Referring Expression Segmentation on gRefCOCO (using extra training data)

Decoder Generalized Referring Expression Segmentation +2

Agent Attention: On the Integration of Softmax and Linear Attention

2 code implementations 14 Dec 2023 Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang

Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.

Computational Efficiency image-classification +5

Fine-grained Recognition with Learnable Semantic Data Augmentation

1 code implementation 1 Sep 2023 Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang

Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories.

Data Augmentation Fine-Grained Image Recognition +3

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

1 code implementation 30 Aug 2023 Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks.

Scheduling

Computation-efficient Deep Learning for Computer Vision: A Survey

no code implementations 27 Aug 2023 Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang

Over the past decade, deep learning models have exhibited considerable advancements, reaching or even exceeding human-level performance in a range of visual perception tasks.

Autonomous Vehicles Deep Learning +3

FLatten Transformer: Vision Transformer using Focused Linear Attention

1 code implementation ICCV 2023 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.

Diversity

Dynamic Perceiver for Efficient Visual Recognition

1 code implementation ICCV 2023 Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.

Action Recognition Classification +5

Adaptive Rotated Convolution for Rotated Object Detection

1 code implementation ICCV 2023 Yifan Pu, Yiru Wang, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang

In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image.

ARC Object +3

Latency-aware Spatial-wise Dynamic Networks

2 code implementations 12 Oct 2022 Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties.

Image Classification +5

Learning to Weight Samples for Dynamic Early-exiting Networks

1 code implementation 17 Sep 2022 Yizeng Han, Yifan Pu, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang

Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers.

Meta-Learning

CAM-loss: Towards Learning Spatially Discriminative Feature Representations

no code implementations ICCV 2021 Chaofei Wang, Jiayu Xiao, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang

The backbone of a traditional CNN classifier is generally considered a feature extractor, followed by a linear layer that performs the classification.

Few-Shot Learning image-classification +3

Adaptive Focus for Efficient Video Recognition

2 code implementations ICCV 2021 Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, Gao Huang

In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency.

Computational Efficiency Video Recognition

Resolution Adaptive Networks for Efficient Inference

2 code implementations CVPR 2020 Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang

Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks.
