no code implementations • 16 Sep 2024 • Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng
However, large language models (LLMs) often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
no code implementations • 21 Jun 2024 • Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao
To tackle this challenge, we propose a robust RLHF approach, $R^3M$, which models potentially corrupted preference labels as sparse outliers.
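A minimal numpy sketch of the general idea: fit a Bradley-Terry reward model in which every preference pair gets an L1-penalized slack variable, so that only a few pairs, the suspected corrupted labels, absorb one. The data, variable names, and hyperparameters below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Toy robust preference fitting: linear reward r(x) = w @ x, plus a sparse
# per-pair slack o_i that soaks up pairs whose label disagrees with the model.
rng = np.random.default_rng(0)
n, d = 300, 5
w_true = rng.normal(size=d)
A, B = rng.normal(size=(n, d)), rng.normal(size=(n, d))
label = (A @ w_true > B @ w_true).astype(float)  # 1 means "A preferred"
label[:15] = 1.0 - label[:15]                    # corrupt the first 15 labels
sign = 2 * label - 1                             # +1 / -1

w, o = np.zeros(d), np.zeros(n)
lr, lam = 0.1, 0.08
for _ in range(2000):
    margin = sign * ((A - B) @ w) + o       # slack-shifted reward gap
    p = 1 / (1 + np.exp(-margin))           # P(labelled winner is better)
    g = p - 1                               # gradient of -log p wrt margin
    w -= lr * (sign[:, None] * (A - B)).T @ g / n
    o -= lr * g                             # per-pair slack update
    o = np.sign(o) * np.maximum(np.abs(o) - lr * lam, 0)  # L1 prox (soft-threshold)

# Should (approximately) recover the 15 flipped pairs, indices 0..14:
print("flagged pairs:", np.sort(np.argsort(-o)[:15]))
```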
1 code implementation • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.
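For context, here is a minimal single-head numpy sketch of the caching mechanism itself; it is a generic illustration of cached autoregressive decoding, not the paper's method:

```python
import numpy as np

# During decoding, keys/values of past tokens are cached so each new step
# attends over the cache instead of recomputing them from scratch.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x):                      # x: embedding of the newest token
    q = x @ Wq
    k_cache.append(x @ Wk)               # grow the cache by one entry
    v_cache.append(x @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)          # attend over all cached positions
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V

for _ in range(4):
    out = decode_step(rng.normal(size=d))
print("cache length:", len(k_cache), "output shape:", out.shape)
```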
1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao
In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
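A hedged sketch of the underlying idea, steering attention toward user-marked spans by downweighting the attention weights of unmarked tokens and renormalizing; the scaling factor `alpha` and the toy dimensions are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Post-hoc attention steering on one toy head: unmarked tokens' attention
# weights are scaled by alpha < 1, then each row is renormalized, so the
# emphasized span receives a larger share of attention.
rng = np.random.default_rng(0)
T, d = 6, 8
Q, K = rng.normal(size=(T, d)), rng.normal(size=(T, d))
emphasized = np.array([0, 0, 1, 1, 0, 0], dtype=bool)  # user-marked span
alpha = 0.1

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights[:, ~emphasized] *= alpha                # downweight unmarked tokens
weights /= weights.sum(axis=-1, keepdims=True)  # renormalize each row
print(weights.round(2))
```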
no code implementations • 19 Oct 2023 • Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao
These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence.
no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.
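A toy numpy sketch of the combination, approximating a weight matrix as a low-rank product plus a sparse residual, $W \approx UV + S$; the rank and sparsity level are arbitrary illustrative choices:

```python
import numpy as np

# Low-rank-plus-sparse factorization: the low-rank part captures shared
# structure, and a sparse residual keeps a few important individual entries.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
rank, keep = 8, 200

U_, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_[:, :rank] * s[:rank]            # low-rank factor (64 x rank)
V = Vt[:rank]                          # low-rank factor (rank x 64)

R = W - U @ V                          # residual after the low-rank part
thresh = np.sort(np.abs(R).ravel())[-keep]
S = np.where(np.abs(R) >= thresh, R, 0.0)  # keep only the largest entries

err = np.linalg.norm(W - (U @ V + S)) / np.linalg.norm(W)
print(f"relative error with rank-{rank} + {keep}-sparse: {err:.3f}")
```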
2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao
Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., as low-rank increments.
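A minimal sketch of such a low-rank increment, in the LoRA style $W = W^{(0)} + BA$ with the pre-trained weight frozen; the paper's method additionally adapts the rank budget during fine-tuning, which this sketch omits:

```python
import numpy as np

# Only the rank-r factors A and B are trainable: 2*d*r parameters instead
# of d*d, with the frozen pre-trained weight W0 left untouched.
rng = np.random.default_rng(0)
d, r = 128, 4
W0 = rng.normal(size=(d, d))     # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))             # zero init => the increment starts at 0

def forward(x):
    return x @ W0.T + x @ (B @ A).T   # W = W0 + B @ A

x = rng.normal(size=(2, d))
print(forward(x).shape)          # (2, 128); trainable params: A and B only
```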
1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
As such, TED reduces the knowledge gap between the two models and helps the student fit the target task better.
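A toy sketch of layer-wise distillation through trainable filters, which is the general mechanism at play; TED's task-aware training of these filters is omitted here:

```python
import numpy as np

# Each model's hidden state passes through a small projection ("filter"),
# and the student is trained to match the teacher in that filtered space.
rng = np.random.default_rng(0)
d_t, d_s, d_f = 16, 8, 4
h_teacher = rng.normal(size=d_t)         # teacher hidden state
h_student = rng.normal(size=d_s)         # student hidden state
F_t = rng.normal(size=(d_f, d_t)) * 0.1  # teacher-side filter
F_s = rng.normal(size=(d_f, d_s)) * 0.1  # student-side filter

diff = F_t @ h_teacher - F_s @ h_student
layer_loss = 0.5 * diff @ diff           # squared error in filtered space
print(f"layer-wise distillation loss: {layer_loss:.4f}")
```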
1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.
1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
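A toy top-1-routed Mixture-of-Experts layer showing the routing mechanics; this is an illustrative sketch, not MoEBERT's exact design:

```python
import numpy as np

# A router picks one expert per token, so capacity grows with the number
# of experts while each token still pays the cost of a single expert.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
W_router = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):                       # x: (tokens, d)
    logits = x @ W_router.T               # routing scores per token
    choice = logits.argmax(axis=-1)       # top-1 expert per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        out[mask] = x[mask] @ experts[e]  # each token runs one expert only
    return out

print(moe_forward(rng.normal(size=(5, d))).shape)
```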
1 code implementation • NeurIPS 2021 • Qingru Zhang, David Wipf, Quan Gan, Le Song
Graph neural networks (GNNs) have recently emerged as a vehicle for applying deep network architectures to graph and relational data.
no code implementations • 25 Sep 2019 • Qingru Zhang, Yuhuai Wu, Fartash Faghri, Tianzong Zhang, Jimmy Ba
In this paper, we present a non-asymptotic analysis of SVRG on a noisy least-squares regression problem.
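A minimal SVRG loop on a least-squares toy problem; the paper's contribution is the analysis, and the code below only shows the variance-reduced update being analyzed:

```python
import numpy as np

# SVRG: once per epoch, compute the full gradient at a snapshot; inner
# steps then use variance-reduced stochastic gradients.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):                         # gradient of 0.5*(x_i @ w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

w, lr = np.zeros(d), 0.05
for epoch in range(20):
    w_snap = w.copy()
    full_grad = X.T @ (X @ w_snap - y) / n    # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)

print(f"final loss: {0.5 * np.mean((X @ w - y) ** 2):.4f}")
```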
3 code implementations • ICLR 2019 • Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Wei-Nan Zhang, Yong Yu
Adam has been shown to fail to converge to the optimal solution in certain cases.
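The classic online counterexample (Reddi et al., 2018) can be reproduced in a few lines: a rare large gradient is damped by Adam's second-moment estimate, so the iterate drifts away from the optimum:

```python
import math

# Gradient is +1010 every 101st step and -10 otherwise, with the iterate
# projected onto [-1, 1]. The average gradient is positive, so the optimum
# is x = -1, yet Adam drifts toward +1 because the rare large gradient is
# suppressed by the second-moment estimate v.
beta1, beta2, lr, eps = 0.9, 0.999, 0.01, 1e-8
x = m = v = 0.0
for t in range(1, 1_000_001):
    g = 1010.0 if t % 101 == 1 else -10.0
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    x = max(-1.0, min(1.0, x))               # project onto the feasible set
print(f"Adam ends at x = {x:.2f}; the optimum is x = -1")
```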