Search Results for author: Yuchao Li

Found 16 papers, 11 papers with code

Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms

no code implementations · 19 Mar 2024 · Yuchao Li, Dimitri Bertsekas

We consider methods for computing word sequences that are highly likely under the next-word probabilities defined by these models.
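
The excerpt does not spell out the rollout scheme, but the general idea from the rollout literature can be sketched as follows: at each step, every candidate next word is evaluated by completing the sequence with a greedy base heuristic and scoring the completed sequence's log-probability. The `next_probs` function and all names below are hypothetical placeholders, not the paper's code.

```python
import math

def greedy_complete(prefix, next_probs, horizon):
    """Base heuristic: extend the prefix greedily until the horizon is reached,
    returning the completion and its accumulated log-probability."""
    seq, logp = list(prefix), 0.0
    while len(seq) < horizon:
        probs = next_probs(seq)                      # dict: token -> P(token | seq)
        tok, p = max(probs.items(), key=lambda kv: kv[1])
        seq.append(tok)
        logp += math.log(p)
    return seq, logp

def rollout_decode(start, next_probs, horizon):
    """One-step lookahead with the greedy base policy (rollout)."""
    seq = list(start)
    while len(seq) < horizon:
        best_tok, best_score = None, -math.inf
        for tok, p in next_probs(seq).items():
            _, tail_logp = greedy_complete(seq + [tok], next_probs, horizon)
            score = math.log(p) + tail_logp          # log-prob of the full completion
            if score > best_score:
                best_tok, best_score = tok, score
        seq.append(best_tok)
    return seq
```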

Distributed Charging Coordination of Electric Trucks with Limited Charging Resources

1 code implementation · 12 Nov 2023 · Ting Bai, Yuchao Li, Karl Henrik Johansson, Jonas Mårtensson

Electric trucks usually need to charge their batteries during long-range delivery missions, and the charging times are often nontrivial.

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

1 code implementation · 19 Sep 2023 · Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song

Therefore, we propose Flash-LLM to enable low-cost and highly efficient large generative model inference with sophisticated support for unstructured sparsity on high-performance but highly restrictive Tensor Cores.
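
As a rough illustration of keeping weights sparse in memory while doing the arithmetic densely, the NumPy sketch below stores a weight matrix in CSR form and densifies one row at a time before multiplying. The real system is a CUDA kernel targeting Tensor Cores; this toy only conveys the memory/compute split, and every name here is hypothetical.

```python
import numpy as np

def to_csr(w):
    """Store an unstructured-sparse weight matrix as (values, column indices, row pointers)."""
    vals, cols, ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        vals.extend(row[nz]); cols.extend(nz); ptr.append(len(vals))
    return np.array(vals), np.array(cols), np.array(ptr)

def sparse_matmul_load_as_sparse(vals, cols, ptr, x):
    """Decompress one row at a time to dense, then reuse a dense multiply,
    mimicking a 'load sparse from memory, compute dense' pattern. x is 2-D."""
    n_rows, n_cols = len(ptr) - 1, x.shape[0]
    out = np.zeros((n_rows, x.shape[1]))
    for i in range(n_rows):
        dense_row = np.zeros(n_cols)
        s, e = ptr[i], ptr[i + 1]
        dense_row[cols[s:e]] = vals[s:e]     # densify this row of the weight matrix
        out[i] = dense_row @ x               # dense compute
    return out
```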

Rollout-Based Charging Strategy for Electric Trucks with Hours-of-Service Regulations (Extended Version)

no code implementations · 15 Mar 2023 · Ting Bai, Yuchao Li, Karl H. Johansson, Jonas Mårtensson

We assume that a collection of charging and rest stations is given along a pre-planned route with known detours and that the problem data are deterministic.
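
The paper itself uses a rollout-based strategy; as a toy illustration of the deterministic setup described above (a fixed sequence of stations with known energy needs per leg), the sketch below instead solves a discretised version exactly with dynamic programming. The integer energy units and single charging rate are simplifying assumptions.

```python
import math

def plan_charging(legs, battery0, capacity, charge_rate):
    """Toy exact DP over a deterministic, discretised problem: legs[i] is the
    integer energy needed to drive from station i to station i+1, battery
    levels are integers in [0, capacity], and charging time is linear in the
    energy added. best[i][b] = least charging time to reach station i with level b."""
    n = len(legs)
    INF = math.inf
    best = [{b: INF for b in range(capacity + 1)} for _ in range(n + 1)]
    best[0][battery0] = 0.0
    for i in range(n):
        for b, t in best[i].items():
            if t == INF:
                continue
            for target in range(b, capacity + 1):        # charge up to 'target' at station i
                charge_time = (target - b) / charge_rate
                left = target - legs[i]                   # battery after driving leg i
                if left >= 0:
                    best[i + 1][left] = min(best[i + 1][left], t + charge_time)
    return min(best[n].values())                          # inf if the trip is infeasible
```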

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation · ICCV 2023 · Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.

Network Pruning
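
The paper's exact HSIC-Lasso-under-IB formulation is not reproduced in the excerpt, but the standard empirical HSIC estimator it builds on, HSIC = tr(KHLH)/(n-1)^2, can be used to score channels by their dependence on the labels. The sketch below is such a generic scoring routine; the function names and kernel choices are assumptions.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for a batch of (flattened) feature vectors."""
    sq = np.sum(x**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * x @ x.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(K, L):
    """Biased empirical HSIC estimator: tr(KHLH) / (n-1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def channel_scores(features, labels_onehot, sigma=1.0):
    """Score each channel by its HSIC dependence on the labels.
    features: (n, channels, d) activations; labels_onehot: (n, classes)."""
    L = labels_onehot @ labels_onehot.T          # linear kernel on labels
    return [hsic(rbf_kernel(features[:, c, :], sigma), L)
            for c in range(features.shape[1])]
```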

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

2 code implementations · 23 May 2022 · Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research attention as a way to compress and accelerate such models.

An Information Theory-inspired Strategy for Automatic Network Pruning

1 code implementation · 19 Aug 2021 · Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji

This practically limits the application of model compression when the model needs to be deployed on a wide range of devices.

AutoML, Model Compression +1

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

1 code implementation · 4 Jun 2021 · Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.

AutoML, Model Compression

1xN Pattern for Pruning Convolutional Neural Networks

1 code implementation · 31 May 2021 · Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji

We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.

Network Pruning
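
A minimal sketch of the two ingredients mentioned in the excerpt above: (i) permuting a layer's output filters by an importance proxy and applying the same permutation to the next layer's input channels so the composed function is unchanged, and (ii) zeroing whole blocks of N consecutive output channels. The L1-norm importance proxy and block scoring below are stand-ins, not the paper's exact criteria; the number of output channels is assumed divisible by the block size.

```python
import numpy as np

def rearrange_filters(w_this, w_next):
    """Sort output filters of one layer by an L1-norm importance proxy and apply
    the same permutation to the next layer's input channels."""
    importance = np.abs(w_this.reshape(w_this.shape[0], -1)).sum(axis=1)
    order = np.argsort(-importance)              # most important filters first
    return w_this[order], w_next[:, order], order

def prune_1xn_blocks(w, block=4, keep_ratio=0.5):
    """Zero out whole blocks of N consecutive output channels per weight column,
    keeping the blocks with the largest L1 norm."""
    out_c = w.shape[0]
    flat = w.reshape(out_c, -1)                  # (out_channels, in_channels*k*k)
    mask = np.zeros_like(flat)
    for col in range(flat.shape[1]):
        col_vals = flat[:, col].reshape(-1, block)           # group rows into 1xN blocks
        scores = np.abs(col_vals).sum(axis=1)
        keep = np.argsort(-scores)[: int(len(scores) * keep_ratio)]
        for b in keep:
            mask[b * block:(b + 1) * block, col] = 1.0
    return (flat * mask).reshape(w.shape)
```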

PAMS: Quantized Super-Resolution via Parameterized Max Scale

1 code implementation · ECCV 2020 · Huixia Li, Chenqian Yan, Shaohui Lin, Xiawu Zheng, Yuchao Li, Baochang Zhang, Fan Yang, Rongrong Ji

Specifically, most state-of-the-art SR models without batch normalization have a large dynamic quantization range, which is another cause of the performance drop.

Quantization, Super-Resolution +1
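
A minimal PyTorch sketch of quantization with a learnable (parameterized) maximum scale: the clipping bound is a trainable parameter, and a straight-through estimator passes gradients through the rounding step. The bit-width and initialization below are arbitrary; this is not the paper's implementation.

```python
import torch
import torch.nn as nn

class ParamMaxScaleQuant(nn.Module):
    """Uniform quantization with a trainable clipping bound (max scale)."""
    def __init__(self, bits=4, init_scale=6.0):
        super().__init__()
        self.levels = 2 ** (bits - 1) - 1
        self.alpha = nn.Parameter(torch.tensor(init_scale))   # learnable max scale

    def forward(self, x):
        alpha = torch.abs(self.alpha)
        x_c = torch.max(torch.min(x, alpha), -alpha)           # clip to the learned range
        step = alpha / self.levels
        q = torch.round(x_c / step) * step                     # uniform quantization
        return x_c + (q - x_c).detach()                        # straight-through estimator
```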

Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence

no code implementations · L4DC 2020 · Yuchao Li, Karl Henrik Johansson, Jonas Mårtensson

The operator is known to be well-posed for problems with finite states, but our analysis shows that it is also well-defined for the contractive models with infinite states studied here.
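
For reference, the multistep operator underlying λ-policy iteration, written in standard abstract-DP notation (the paper analyses when this operator is well defined for contractive models with infinitely many states and policies); the notation below follows the usual Bertsekas framework and is not copied from the paper:

```latex
% lambda-policy iteration: policy improvement followed by a multistep evaluation,
% with the lambda operator defined by a geometrically weighted sum of powers of T_mu.
T_{\mu_k} J_k = T J_k, \qquad
J_{k+1} = T^{(\lambda)}_{\mu_k} J_k, \qquad
T^{(\lambda)}_{\mu} J = (1-\lambda)\sum_{\ell=0}^{\infty}\lambda^{\ell}\, T_{\mu}^{\ell+1} J,
\qquad 0 < \lambda < 1 .
```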

Interpretable Neural Network Decoupling

no code implementations · ECCV 2020 · Yuchao Li, Rongrong Ji, Shaohui Lin, Baochang Zhang, Chenqian Yan, Yongjian Wu, Feiyue Huang, Ling Shao

More specifically, we introduce a novel architecture-controlling module in each layer that encodes the network architecture as a vector.

Network Interpretation
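
A hypothetical sketch of what such a per-layer architecture-controlling module could look like: a lightweight head that maps the layer input to a vector gating the convolution's output channels, so that each input activates its own sub-architecture. The module names and sizes below are invented for illustration and are not the paper's design.

```python
import torch
import torch.nn as nn

class ArchitectureGate(nn.Module):
    """Conv layer plus a small head that produces a per-input gate vector
    over output channels (a soft 'architecture vector')."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, out_channels),
        )

    def forward(self, x):
        g = torch.sigmoid(self.gate(x))            # architecture vector in [0, 1]^out_channels
        y = self.conv(x)
        return y * g.unsqueeze(-1).unsqueeze(-1)   # soft selection of output channels
```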

Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning

1 code implementation · 23 Jan 2019 · Shaohui Lin, Rongrong Ji, Yuchao Li, Cheng Deng, Xuelong Li

In this paper, we propose a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speedup the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries.

Domain Adaptation, object-detection +2
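
A generic stand-in for a structured-sparsity regularizer over whole filters, of the kind such a scheme adds to the training loss: a group-lasso term that pushes entire output filters toward zero so they can be pruned afterwards. The weighting and norm choice are assumptions, not the paper's exact SSR terms.

```python
import torch

def structured_sparsity_penalty(conv_weights, lam=1e-4):
    """Group-sparsity regularizer: sum of L2 norms of whole output filters."""
    penalty = 0.0
    for w in conv_weights:                       # each w: (out_c, in_c, k, k)
        filter_norms = w.flatten(1).norm(p=2, dim=1)
        penalty = penalty + filter_norms.sum()
    return lam * penalty

# usage sketch: loss = task_loss + structured_sparsity_penalty([m.weight for m in conv_layers])
```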

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

1 code implementation · CVPR 2019 · Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji

The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance in a feature-agnostic manner to guide model compression.

Clustering, Model Compression
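
A simplified illustration of a kernel-sparsity-and-entropy style indicator: for each input feature map, combine the total magnitude of the 2D kernels that consume it with the entropy of how that magnitude is spread across kernels. The exact KSE formula from the paper is not reproduced here; the combination rule below is an assumption for illustration only.

```python
import numpy as np

def kse_style_scores(w, alpha=1.0, eps=1e-12):
    """Score input feature maps of a conv layer from its weights alone.
    w: conv weights of shape (out_c, in_c, k, k)."""
    out_c, in_c = w.shape[0], w.shape[1]
    scores = np.zeros(in_c)
    for c in range(in_c):
        norms = np.abs(w[:, c]).reshape(out_c, -1).sum(axis=1)   # per-kernel L1 norms
        sparsity = norms.sum()                                   # total kernel magnitude
        p = norms / (norms.sum() + eps)
        entropy = -(p * np.log(p + eps)).sum()                   # diversity across kernels
        scores[c] = np.sqrt(sparsity / (1.0 + alpha * entropy))
    return scores
```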
