Search Results for author: Qingru Zhang

Found 13 papers, 8 papers with code

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

no code implementations • 16 Sep 2024 • Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng

However, LLMs often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.

Robust Reinforcement Learning from Corrupted Human Feedback

no code implementations • 21 Jun 2024 • Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao

To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted preference label as sparse outliers.

reinforcement-learning • Reinforcement Learning +1
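Sparse-outlier estimates like the ones $R^3M$ posits are typically induced with an $\ell_1$ penalty, whose proximal operator is soft-thresholding. A minimal sketch of that step (the function below is an illustrative assumption, not the paper's code, which couples this with reward-model training):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrinks each entry toward zero
    by lam and zeroes anything smaller -- this is what makes the
    estimated outlier vector sparse."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

Entries whose magnitude is below the threshold are treated as clean; only large residuals survive as outlier corrections.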

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

1 code implementation • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Key-value (KV) caching has become the de facto method for accelerating generation in large language model (LLM) inference.

Quantization
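As a rough illustration of the quantization component, here is a uniform-quantization sketch for a KV-cache block; GEAR's actual recipe additionally corrects the quantization residual with low-rank and sparse terms, which this sketch omits (function name and parameters are assumptions):

```python
import numpy as np

def quantize_kv(x, bits=4):
    """Uniform per-tensor quantization of a KV-cache block.
    Maps values in [min, max] onto 2**bits levels and returns both the
    integer codes and the dequantized reconstruction."""
    qmax = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)   # integer codes
    dequant = q * scale + lo                          # reconstruction
    return q, dequant
```

Rounding bounds the per-entry reconstruction error by half a quantization step, which is the error a near-lossless recipe then has to recover.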

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
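One simple way such emphasis can be imposed post hoc is to downweight attention to non-emphasized tokens and renormalize. A minimal NumPy sketch, where `alpha` is a hypothetical scaling coefficient rather than the paper's exact steering rule:

```python
import numpy as np

def steer_attention(scores, emphasized_idx, alpha=0.1):
    """Softmax the raw attention scores, scale attention to all
    non-emphasized tokens by alpha, then renormalize so each row is
    again a probability distribution."""
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    mask = np.full(probs.shape[-1], alpha)
    mask[emphasized_idx] = 1.0          # emphasized tokens keep full weight
    steered = probs * mask
    return steered / steered.sum(axis=-1, keepdims=True)
```

Because only non-emphasized columns are scaled down, the emphasized tokens' share of attention rises without retraining the model.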

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.

Diversity • Model Compression +3
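The low-rank-plus-sparse idea in the snippet can be sketched as a truncated SVD plus a residual that keeps only its largest-magnitude entries; `rank` and `keep` below are illustrative knobs, not the paper's pruning schedule:

```python
import numpy as np

def losparse_approx(W, rank=2, keep=0.05):
    """Approximate W as low_rank + sparse: the low-rank part comes from
    a truncated SVD, and the sparse part retains the largest-magnitude
    entries of the residual W - low_rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    resid = W - low_rank
    k = max(1, int(keep * resid.size))
    thresh = np.partition(np.abs(resid).ravel(), -k)[-k]
    sparse = np.where(np.abs(resid) >= thresh, resid, 0.0)
    return low_rank, sparse
```

The sparse term captures the few large residual entries a pure low-rank factorization would smear away, which is the complementarity the abstract describes.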

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., as low-rank increments.

parameter-efficient fine-tuning • Question Answering +1
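A minimal sketch of an SVD-style low-rank increment with budget pruning, in the spirit of the snippet; the importance scoring AdaLoRA uses to allocate budget across layers is omitted, and the class below is an illustrative assumption, not the released code:

```python
import numpy as np

class LowRankUpdate:
    """Incremental update delta(W) = P @ diag(lam) @ Q.  Zero-initialized
    lam means the update starts as a no-op; pruning lam entries shrinks
    the effective rank to fit a parameter budget."""
    def __init__(self, d_out, d_in, r, rng):
        self.P = rng.standard_normal((d_out, r)) * 0.01
        self.lam = np.zeros(r)                    # "singular values"
        self.Q = rng.standard_normal((r, d_in)) * 0.01

    def delta(self):
        return (self.P * self.lam) @ self.Q

    def prune(self, budget):
        """Keep only the `budget` largest-magnitude lam entries."""
        keep = np.argsort(np.abs(self.lam))[-budget:]
        mask = np.zeros_like(self.lam)
        mask[keep] = 1.0
        self.lam *= mask
```

Factoring the update through explicit diagonal entries makes rank reduction a cheap masking operation instead of a refactorization.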

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student better fit the target task.

Language Modeling • Language Modelling +1
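The layer-wise matching described above can be sketched as an MSE between linearly filtered hidden states; in TED the filters are task-aware (trained on the target task before distillation), which this sketch omits, and the function below is an assumption rather than the released implementation:

```python
import numpy as np

def layerwise_distill_loss(h_student, h_teacher, W_s, W_t):
    """Project student and teacher hidden states through their own
    linear filters, then penalize the mean squared difference between
    the filtered representations at this layer."""
    fs = h_student @ W_s
    ft = h_teacher @ W_t
    return float(np.mean((fs - ft) ** 2))
```

Matching filtered rather than raw hidden states is what lets the loss focus on task-relevant features instead of forcing the student to copy everything the teacher encodes.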

A Biased Graph Neural Network Sampler with Near-Optimal Regret

1 code implementation • NeurIPS 2021 • Qingru Zhang, David Wipf, Quan Gan, Le Song

Graph neural networks (GNNs) have recently emerged as a vehicle for applying deep network architectures to graph and relational data.

Graph Neural Network
