1 code implementation • 13 Aug 2024 • Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang
Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of redundant tokens, e.g., pruning inattentive tokens or merging similar tokens.
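As a rough illustration of the pruning variant mentioned above, the sketch below keeps only the tokens with the highest attention scores. It is a minimal NumPy example under stated assumptions, not the method of this paper: the function name `prune_inattentive_tokens`, the per-token `scores` vector, and the `keep` parameter are all hypothetical.

```python
import numpy as np

def prune_inattentive_tokens(tokens: np.ndarray, scores: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` tokens with the highest scores, preserving their original order.

    tokens: (N, D) array of token embeddings.
    scores: (N,) array, assumed to measure how much attention each token receives.
    """
    top = np.argsort(scores)[-keep:]   # indices of the `keep` largest scores
    return tokens[np.sort(top)]        # restore the original token order

tokens = np.arange(8, dtype=np.float32).reshape(4, 2)
scores = np.array([0.10, 0.50, 0.05, 0.35])
kept = prune_inattentive_tokens(tokens, scores, keep=2)
```

Here tokens 1 and 3 survive, so later layers would process two tokens instead of four.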
no code implementations • 14 Jul 2024 • Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang
At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy.
1 code implementation • 17 Jun 2024 • Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang
For instance, we achieve approximately $70\times$ compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks.
3 code implementations • 19 May 2024 • Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
However, replacing LayerNorm with the more efficient BatchNorm in Transformers often leads to inferior performance and training collapse.
3 code implementations • 14 May 2024 • Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang
In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, termed SqueezeTime, for mobile video understanding.
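The core reshaping idea can be sketched in a few lines of NumPy. This only illustrates folding the time axis into channels; it is not the network itself, and the `(C, T, H, W)` layout and the function name are assumptions made for the example.

```python
import numpy as np

def squeeze_time_into_channels(video: np.ndarray) -> np.ndarray:
    """Fold the time axis of a (C, T, H, W) clip into the channel axis.

    The result has shape (C * T, H, W), so a 2D network can consume it
    as an ordinary multi-channel image.
    """
    c, t, h, w = video.shape
    return video.reshape(c * t, h, w)

# A toy clip: 3 channels, 4 frames, 2x2 pixels.
clip = np.arange(3 * 4 * 2 * 2, dtype=np.float32).reshape(3, 4, 2, 2)
squeezed = squeeze_time_into_channels(clip)
```

With this layout, output channel `c * T + t` holds frame `t` of input channel `c`.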
1 code implementation • 13 May 2024 • Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang
Speculative decoding emerges as a pivotal technique for enhancing the inference speed of Large Language Models (LLMs).
1 code implementation • 10 May 2024 • ZhenLiang Ni, Xinghao Chen, Yingjie Zhai, Yehui Tang, Yunhe Wang
A shape self-calibration function is designed to make the key areas closer to foreground objects.
1 code implementation • 9 May 2024 • Shibo Jie, Yehui Tang, Ning Ding, Zhi-Hong Deng, Kai Han, Yunhe Wang
Current solutions for efficiently constructing large vision-language (VL) models follow a two-step paradigm: projecting the output of pre-trained vision encoders to the input space of pre-trained language models as visual prompts; and then transferring the models to downstream VL tasks via end-to-end parameter-efficient fine-tuning (PEFT).
1 code implementation • 29 Apr 2024 • Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang
It is noteworthy that the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies to increase the token acceptance rate while minimizing the drafting steps of the small model.
1 code implementation • 17 Apr 2024 • Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang
In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models.
1 code implementation • 27 Feb 2024 • Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang
In this paper, we propose the SAM-DiffSR model, which can utilize the fine-grained structure information from SAM in the process of sampling noise to improve the image quality without additional computational cost during inference.
1 code implementation • 26 Feb 2024 • Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture.
1 code implementation • 7 Feb 2024 • Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu
Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.
1 code implementation • 5 Feb 2024 • Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang
Several design formulas are empirically shown to be especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance, and multiple-round training.
no code implementations • 5 Feb 2024 • Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, DaCheng Tao
Model compression methods reduce the memory and computational cost of Transformer, which is a necessary step to implement large language/vision models on practical devices.
no code implementations • CVPR 2024 • Hao Xiong, Yehui Tang, Xinyu Ye, Junchi Yan
However, it remains unclear how QIP should be embodied in quantum circuits (QC), let alone how to thoroughly evaluate QIP circuits, especially in a practical NISQ-era context where QIP is applied to ML via hybrid quantum-classical pipelines.
no code implementations • 27 Dec 2023 • Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, DaCheng Tao
We then demonstrate through carefully designed ablations that the proposed approach is significantly effective for enhancing model nonlinearity; thus, we present a new efficient model architecture, namely PanGu-$\pi$.
2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen
Extensive experiments on various zero-shot transfer tasks demonstrate the significant advantage of our TinySAM over counterpart methods.
no code implementations • 13 Dec 2023 • Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.
no code implementations • 1 Dec 2023 • Ying Nie, Wei He, Kai Han, Yehui Tang, Tianyu Guo, Fanyi Du, Yunhe Wang
Moreover, based on the observation that the accuracy of the CLIP model does not increase correspondingly as the parameters of the text encoder increase, an extra objective of masked language modeling (MLM) is leveraged to maximize the potential of the shortened text encoder.
1 code implementation • NeurIPS 2023 • Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
1 code implementation • 1 Jun 2023 • Ning Ding, Yehui Tang, Zhongqian Fu, Chao Xu, Kai Han, Yunhe Wang
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNN and ViT learn enhanced representations and achieve better performance.
1 code implementation • CVPR 2023 • Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han
The lower layers are not explicitly guided and the interaction among their patches is only used for calculating new activations.
1 code implementation • CVPR 2023 • Ning Ding, Yehui Tang, Kai Han, Chao Xu, Yunhe Wang
Recently, the sizes of deep neural networks and training datasets have both increased drastically in pursuit of better practical performance.
1 code implementation • 13 Dec 2022 • Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Yunhe Wang, Chang Xu
This paper presents FastMIM, a simple and generic framework for expediting masked image modeling with the following two steps: (i) pre-training vision backbones with low-resolution input images; and (ii) reconstructing Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.
11 code implementations • 23 Nov 2022 • Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang
The convolutional operation can only capture local information in a window region, which prevents performance from being further improved.
1 code implementation • International Conference on Machine Learning 2022 • Yanxi Li, Xinghao Chen, Minjing Dong, Yehui Tang, Yunhe Wang, Chang Xu
Recently, neural architectures with all Multi-layer Perceptrons (MLPs) have attracted great research interest from the computer vision community.
Ranked #500 on Image Classification on ImageNet
9 code implementations • 1 Jun 2022 • Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu
In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Ranked #370 on Image Classification on ImageNet
1 code implementation • CVPR 2022 • Ning Ding, Yixing Xu, Yehui Tang, Chao Xu, Yunhe Wang, DaCheng Tao
Domain Adaptation aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain whose data distributions are different.
no code implementations • 19 Feb 2022 • Yehui Tang, Junchi Yan, Hancock Edwin
Quantum computing (QC) is a new computational paradigm whose foundations relate to quantum physics.
1 code implementation • 4 Jan 2022 • Kai Han, Jianyuan Guo, Yehui Tang, Yunhe Wang
We hope this new baseline will be helpful for further research on, and application of, vision transformers.
8 code implementations • CVPR 2022 • Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang
To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase.
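The amplitude-and-phase idea can be illustrated with Euler's formula, where a token with amplitude $|z|$ and phase $\theta$ unfolds into a real part $|z|\cos\theta$ and an imaginary part $|z|\sin\theta$. The sketch below is a toy NumPy example under that assumption, not the paper's implementation; `wave_token` and `aggregate` are hypothetical names.

```python
import numpy as np

def wave_token(amplitude: np.ndarray, phase: np.ndarray):
    """Unfold a wave-like token into real and imaginary parts via Euler's formula."""
    return amplitude * np.cos(phase), amplitude * np.sin(phase)

def aggregate(tokens):
    """Sum a list of (real, imag) tokens and return the amplitude of the result."""
    real = sum(r for r, _ in tokens)
    imag = sum(i for _, i in tokens)
    return np.sqrt(real ** 2 + imag ** 2)

amp = np.ones(3)
in_phase = aggregate([wave_token(amp, np.zeros(3)), wave_token(amp, np.zeros(3))])
opposite = aggregate([wave_token(amp, np.zeros(3)), wave_token(amp, np.full(3, np.pi))])
```

Tokens with equal phase reinforce each other (amplitude 2), while tokens with opposite phase cancel (amplitude close to 0), which is why the phase can modulate how tokens are aggregated.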
10 code implementations • CVPR 2022 • Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, making them inflexible for different input sizes and hard to capture spatial information.
1 code implementation • ICCV 2021 • Yuqiao Liu, Yehui Tang, Yanan Sun
Specifically, a homogeneous architecture augmentation algorithm is proposed in HAAP to generate sufficient training data by making use of a homogeneous representation.
14 code implementations • CVPR 2022 • Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.
1 code implementation • 3 Jul 2021 • Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang
Specifically, we train a tiny student model to match a pre-trained teacher model in the patch-level manifold space.
4 code implementations • NeurIPS 2021 • Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang
Transformer models have achieved great progress on computer vision tasks recently.
7 code implementations • CVPR 2021 • Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu
An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS).
no code implementations • CVPR 2022 • Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, DaCheng Tao
We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers.
Ranked #8 on Efficient ViTs on ImageNet-1K (with DeiT-T)
2 code implementations • 17 Apr 2021 • Mingjian Zhu, Yehui Tang, Kai Han
Vision transformers have achieved competitive performance on a variety of computer vision applications.
7 code implementations • CVPR 2021 • Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, DaCheng Tao, Chang Xu
Then, the manifold relationship between instances and the pruned sub-networks will be aligned in the training procedure.
3 code implementations • NeurIPS 2021 • Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
Binary neural networks (BNNs) represent the original full-precision weights and activations as 1-bit values using the sign function.
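A minimal NumPy sketch of sign binarization is shown below. It covers only the forward mapping to {-1, +1}; mapping exact zeros to +1 is an assumption made for the example, and training details such as gradient estimation are omitted.

```python
import numpy as np

def binarize(x: np.ndarray) -> np.ndarray:
    """Map full-precision values to {-1, +1} with the sign function.

    Zeros are mapped to +1 here (an assumption) so that every value
    fits in a single bit.
    """
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

weights = np.array([-0.3, 0.0, 2.5], dtype=np.float32)
binary_weights = binarize(weights)
```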
no code implementations • 23 Dec 2020 • Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.
4 code implementations • NeurIPS 2020 • Yehui Tang, Yunhe Wang, Yixing Xu, DaCheng Tao, Chunjing Xu, Chao Xu, Chang Xu
To increase the reliability of the results, we prefer to have a more rigorous research design by including a scientific control group as an essential part to minimize the effect of all factors except the association between the filter and expected network output.
no code implementations • CVPR 2020 • Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu
A graph convolutional neural network is introduced to predict the performance of architectures based on the learned representations and their relation modeled by the graph.
2 code implementations • 23 Feb 2020 • Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu
On one hand, massive trainable parameters significantly enhance the performance of these deep networks.
4 code implementations • 30 Sep 2019 • Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu
An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS).
no code implementations • 13 Jul 2019 • Yehui Tang, Shan You, Chang Xu, Boxin Shi, Chao Xu
Specifically, we exploit the unlabeled data to mimic the classification characteristics of giant networks, so that the original capacity can be preserved nicely.