Search Results for author: Qingru Zhang

Found 11 papers, 8 papers with code

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

1 code implementation • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Key-value (KV) caching has become the de facto standard for accelerating generation in large language model (LLM) inference.

Quantization

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

no code implementations • 19 Oct 2023 • Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao

These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence.

8k Computational Efficiency +1

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.

Model Compression Natural Language Understanding +2

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., low-rank increments.

Question Answering Text Generation

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling Model Compression

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao

Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.

Image Classification Natural Language Understanding +1

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen

We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.

Knowledge Distillation Natural Language Understanding +1

A Biased Graph Neural Network Sampler with Near-Optimal Regret

1 code implementation • NeurIPS 2021 • Qingru Zhang, David Wipf, Quan Gan, Le Song

Graph neural networks (GNNs) have recently emerged as a vehicle for applying deep network architectures to graph and relational data.

A Non-asymptotic comparison of SVRG and SGD: tradeoffs between compute and speed

no code implementations • 25 Sep 2019 • Qingru Zhang, Yuhuai Wu, Fartash Faghri, Tianzong Zhang, Jimmy Ba

In this paper, we present a non-asymptotic analysis of SVRG under a noisy least squares regression problem.

Computational Efficiency regression +1

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods

3 code implementations • ICLR 2019 • Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Wei-Nan Zhang, Yong Yu

Adam has been shown to fail to converge to the optimal solution in certain cases.
