Search Results for author: Yuhui Xu

Found 12 papers, 9 papers with code

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

1 code implementation22 Feb 2024 Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li

Specifically, we propose, for the first time to the best of our knowledge, post-training approaches for task-agnostic and task-specific expert pruning and skipping of MoE LLMs, tailored to improve deployment efficiency while maintaining model performance across a wide range of tasks.
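
A minimal sketch of the general idea behind post-training expert pruning (not the paper's exact criterion): score each expert in an MoE layer, for example by how much routing mass it receives on a calibration set, and drop the lowest-scoring experts. The routing-frequency score and the stand-in data below are assumptions for illustration.

```python
import torch

def prune_experts(gate_logits, experts, keep_k):
    """Keep the `keep_k` experts that receive the most routing mass on a
    calibration batch and drop the rest (illustrative criterion only).

    gate_logits: [num_tokens, num_experts] router outputs on calibration data
    experts:     list of expert modules (e.g., FFN blocks), one per expert
    """
    # Average routing probability per expert over the calibration tokens.
    probs = torch.softmax(gate_logits, dim=-1)        # [tokens, experts]
    importance = probs.mean(dim=0)                     # [experts]

    # Indices of the experts to keep, sorted for a stable layer layout.
    keep = torch.topk(importance, keep_k).indices.sort().values

    pruned_experts = [experts[i] for i in keep.tolist()]
    return pruned_experts, keep  # `keep` is also needed to re-index the router


# Usage: an 8-expert layer reduced to 6 experts with random stand-in data.
experts = [torch.nn.Linear(16, 16) for _ in range(8)]
gate_logits = torch.randn(1024, 8)                     # stand-in calibration logits
pruned, kept_ids = prune_experts(gate_logits, experts, keep_k=6)
print(len(pruned), kept_ids.tolist())
```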

Batch Normalization with Enhanced Linear Transformation

1 code implementation28 Nov 2020 Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks; its built-in linear transformation module was designed to improve BN's flexibility in fitting complex data distributions.
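
The title suggests enhancing BN's simple per-channel scale-and-shift with a richer learnable linear mapping. One plausible instantiation, sketched below under the assumption that a depthwise convolution plays that role, is not necessarily the authors' exact module.

```python
import torch
import torch.nn as nn

class BNWithLinearTransform(nn.Module):
    """Sketch: batch normalization whose per-channel affine transform is
    replaced by a richer learnable linear transform (here, a depthwise
    k x k convolution). Illustrative only."""

    def __init__(self, channels, k=3):
        super().__init__()
        # Normalize without the usual per-channel affine parameters ...
        self.bn = nn.BatchNorm2d(channels, affine=False)
        # ... and learn the linear transform with a depthwise convolution,
        # which mixes each channel with its spatial neighborhood.
        self.transform = nn.Conv2d(channels, channels, kernel_size=k,
                                   padding=k // 2, groups=channels, bias=True)

    def forward(self, x):
        return self.transform(self.bn(x))


# Usage on a dummy feature map.
layer = BNWithLinearTransform(channels=32)
out = layer(torch.randn(4, 32, 56, 56))
print(out.shape)  # torch.Size([4, 32, 56, 56])
```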

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

1 code implementation30 Apr 2020 Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong

The TRP-trained network inherently has a low-rank structure and can be approximated with negligible performance loss, eliminating the fine-tuning process after low-rank decomposition.

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

no code implementations17 Apr 2020 Xin Chen, Lingxi Xie, Jun Wu, Longhui Wei, Yuhui Xu, Qi Tian

We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal.

Neural Architecture Search
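
A toy sketch of the predictor idea described above: encode each sampled sub-network as a graph (node features plus adjacency) and fit a small graph convolutional network to regress its measured performance. The two-layer GCN and the feature encoding below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGCNPredictor(nn.Module):
    """Two-layer GCN that maps an architecture graph to a scalar score."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # x:   [num_nodes, in_dim]    node features (e.g., one-hot op types)
        # adj: [num_nodes, num_nodes] normalized adjacency of the cell graph
        h = F.relu(adj @ self.w1(x))      # message passing, layer 1
        h = F.relu(adj @ self.w2(h))      # message passing, layer 2
        return self.head(h.mean(dim=0))   # pool nodes, predict performance

# Fit the predictor on (architecture, measured accuracy) pairs, then rank
# unseen sub-networks by the predicted score instead of evaluating each one.
```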

Latency-Aware Differentiable Neural Architecture Search

1 code implementation17 Jan 2020 Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong

However, these methods have difficulty optimizing the network, so the searched architecture is often unfriendly to hardware.

Neural Architecture Search
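
The snippet does not spell out the method, but a common way to make differentiable search latency-aware is to add the expected latency, a softmax-weighted sum of per-operation latencies from a lookup table, to the training objective. The sketch below shows that general idea; the weighting and trade-off coefficient are assumptions, not necessarily the paper's formulation.

```python
import torch

def latency_aware_loss(task_loss, alpha, op_latency, lam=0.1):
    """Sketch of a latency-aware objective for differentiable NAS.

    task_loss:  ordinary task loss (e.g., cross-entropy) of the super-network
    alpha:      [num_edges, num_ops] architecture parameters
    op_latency: [num_ops] measured or looked-up latency of each candidate op
    lam:        accuracy/latency trade-off coefficient (assumption)
    """
    # Expected latency is the softmax-weighted sum of per-op latencies,
    # so it stays differentiable with respect to alpha.
    probs = torch.softmax(alpha, dim=-1)                  # [edges, ops]
    expected_latency = (probs * op_latency).sum()
    return task_loss + lam * expected_latency


# Usage with dummy values: 14 edges, 8 candidate operations.
alpha = torch.zeros(14, 8, requires_grad=True)
op_latency = torch.rand(8)                                # per-op latency table (ms)
loss = latency_aware_loss(torch.tensor(1.3), alpha, op_latency)
loss.backward()
```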

Trained Rank Pruning for Efficient Deep Neural Networks

1 code implementation9 Oct 2019 Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Wenrui Dai, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong

To accelerate DNN inference, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations.

PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

8 code implementations ICLR 2020 Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

Differentiable architecture search (DARTS) provided a fast solution for finding effective network architectures, but suffers from large memory and computing overheads when jointly training a super-network and searching for an optimal architecture.

Neural Architecture Search
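
A rough sketch of the partial-channel idea: only a 1/k fraction of the channels on an edge is sent through the weighted mixture of candidate operations, while the remaining channels bypass the mixture and are concatenated back. This is a simplification; the paper's version includes further details (such as edge normalization) that the sketch omits.

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    """Mixed operation applied to only 1/k of the input channels (sketch)."""

    def __init__(self, candidate_ops, k=4):
        super().__init__()
        self.k = k
        self.ops = nn.ModuleList(candidate_ops)       # each op maps C/k -> C/k channels
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))

    def forward(self, x):
        c = x.shape[1] // self.k
        x_sel, x_skip = x[:, :c], x[:, c:]            # process 1/k of the channels, bypass the rest
        weights = torch.softmax(self.alpha, dim=-1)
        mixed = sum(w * op(x_sel) for w, op in zip(weights, self.ops))
        return torch.cat([mixed, x_skip], dim=1)


# Usage: two toy candidate ops over a quarter of 64 channels.
ops = [nn.Conv2d(16, 16, 3, padding=1), nn.MaxPool2d(3, stride=1, padding=1)]
edge = PartialChannelMixedOp(ops, k=4)
print(edge(torch.randn(2, 64, 32, 32)).shape)         # torch.Size([2, 64, 32, 32])
```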

Trained Rank Pruning for Efficient Deep Neural Networks

1 code implementation6 Dec 2018 Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong

We propose Trained Rank Pruning (TRP), which iterates between low-rank approximation and training.

Quantization
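
A minimal sketch of the alternating scheme the abstract describes: periodically replace each weight matrix by its truncated-SVD reconstruction during training, so the converged weights are already close to low rank and can be decomposed without fine-tuning. The fixed rank and projection schedule below are illustrative assumptions, not the paper's settings.

```python
import torch

@torch.no_grad()
def project_low_rank_(linear, rank):
    """In-place truncated-SVD projection of a Linear layer's weight (sketch)."""
    W = linear.weight
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    W.copy_(U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank])


def train_with_rank_projection(model, loss_fn, data, optimizer,
                               rank=32, project_every=100):
    """Alternate ordinary SGD steps with low-rank projection of the weights."""
    for step, (x, y) in enumerate(data):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        if step % project_every == 0:                 # periodic low-rank projection
            for m in model.modules():
                if isinstance(m, torch.nn.Linear):
                    project_low_rank_(m, rank)
    # After training, each Linear can be split into two smaller factors
    # (out x rank and rank x in) with little loss, skipping fine-tuning.
```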

DNQ: Dynamic Network Quantization

no code implementations6 Dec 2018 Yuhui Xu, Shuai Zhang, Yingyong Qi, Jiaxian Guo, Weiyao Lin, Hongkai Xiong

Network quantization is an effective method for the deployment of neural networks on memory- and energy-constrained mobile devices.

Quantization

Deep Neural Network Compression with Single and Multiple Level Quantization

1 code implementation6 Mar 2018 Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong

In this paper, we propose two novel network quantization approaches: single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit (ternary) quantization. We are the first to consider network quantization from both the width and the depth levels.

Neural Network Compression, Quantization
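
A toy stand-in for the single-level idea: cluster a layer's weights into a small number of shared values and replace each weight by its centroid. The cluster count and the k-means-style grouping below are assumptions; the paper's incremental SLQ/MLQ procedure across width and depth is not reproduced here.

```python
import torch

def cluster_quantize(weight, num_centroids=16, iters=20):
    """Toy weight quantization: k-means over the weight values, then replace
    each weight by its nearest centroid (a stand-in for shared-value quantization)."""
    w = weight.flatten()
    # Initialize centroids evenly across the weight range.
    centroids = torch.linspace(w.min().item(), w.max().item(), num_centroids)
    for _ in range(iters):
        assign = (w[:, None] - centroids[None, :]).abs().argmin(dim=1)
        for c in range(num_centroids):
            mask = assign == c
            if mask.any():
                centroids[c] = w[mask].mean()
    quantized = centroids[assign].reshape(weight.shape)
    return quantized, centroids


# Usage: quantize a Linear layer's weights to at most 16 shared values.
layer = torch.nn.Linear(128, 128)
q, centers = cluster_quantize(layer.weight.data)
print(torch.unique(q).numel())  # <= 16 distinct values
```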
