1 code implementation • 3 Oct 2024 • Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen
There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks.
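For context, a needle-in-a-haystack test usually hides a short "needle" sentence at a chosen depth inside long filler text and asks the model to recall it. The sketch below is a minimal illustration of how such a synthetic example can be built; the question, needle, and filler source are made-up placeholders, not the paper's benchmark code.

```python
import random

def build_niah_example(haystack_paragraphs, needle, depth_fraction, tokenizer, context_len):
    """Insert a 'needle' sentence at a relative depth inside filler text.

    haystack_paragraphs: list of distractor paragraphs used as filler.
    depth_fraction: 0.0 places the needle at the start, 1.0 at the end.
    """
    # Concatenate filler paragraphs until we roughly reach the target context length.
    filler, n_tokens = [], 0
    while n_tokens < context_len:
        p = random.choice(haystack_paragraphs)
        filler.append(p)
        n_tokens += len(tokenizer.encode(p))

    # Place the needle at the requested depth (by paragraph index).
    insert_at = int(depth_fraction * len(filler))
    filler.insert(insert_at, needle)

    context = "\n\n".join(filler)
    question = "What is the magic number mentioned in the text?"
    return {"context": context, "question": question}

# Hypothetical usage: the needle carries the answer the model must recall.
# example = build_niah_example(wiki_paragraphs, "The magic number is 42.",
#                              depth_fraction=0.5, tokenizer=tok, context_len=32_000)
```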
1 code implementation • 3 Oct 2024 • Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
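A common ingredient in continued training on long data is packing tokenized documents into fixed-length sequences before training. The helper below is a minimal, hypothetical sketch of that step only; the sequence length, EOS id, and data source are assumptions rather than the paper's actual configuration.

```python
def pack_documents(token_lists, seq_len, eos_id):
    """Greedily pack tokenized documents into fixed-length training sequences.

    Long-context continued training commonly concatenates documents
    (separated by EOS) and slices the stream into seq_len-sized chunks.
    """
    stream = []
    for toks in token_lists:
        stream.extend(toks + [eos_id])
    # Drop the trailing partial chunk so every sequence has exactly seq_len tokens.
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

# e.g. sequences = pack_documents(tokenized_docs, seq_len=65_536, eos_id=2)
```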
no code implementations • 16 Jul 2024 • Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
1 code implementation • 26 Feb 2024 • Howard Yen, Tianyu Gao, Danqi Chen
However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window.
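To see why the cost grows quickly, note that naive self-attention materializes a score matrix that scales quadratically with sequence length. The back-of-the-envelope estimate below illustrates that scaling; the head count and dtype are assumptions, and real implementations (e.g. FlashAttention) avoid storing this matrix in full.

```python
def attention_matrix_bytes(seq_len, n_heads, bytes_per_elem=2):
    """Memory for the full attention score matrix (per layer, batch size 1),
    assuming the naive implementation that materializes all scores."""
    return n_heads * seq_len * seq_len * bytes_per_elem

# Quadrupling the context length multiplies this term by 16x.
for n in (8_192, 32_768, 131_072):
    gib = attention_matrix_bytes(n, n_heads=32) / 2**30
    print(f"{n:>7} tokens -> {gib:,.1f} GiB of attention scores per layer")
```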
1 code implementation • 1 Nov 2023 • Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna
How do we communicate with others to achieve our goals?
1 code implementation • 24 May 2023 • Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
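Citation evaluation of this kind is often framed as an entailment check: does the cited passage actually support the generated statement? The sketch below uses an off-the-shelf NLI model as a stand-in evaluator; the model name and threshold are assumptions, not the benchmark's exact setup.

```python
from transformers import pipeline

# Hypothetical stand-in evaluator: an off-the-shelf NLI model.
nli = pipeline("text-classification", model="roberta-large-mnli")

def citation_supported(statement, cited_passage, threshold=0.5):
    """Return True if the cited passage entails the generated statement."""
    out = nli({"text": cited_passage, "text_pair": statement}, top_k=None)
    scores = {d["label"]: d["score"] for d in out}
    return scores.get("ENTAILMENT", 0.0) > threshold

# e.g. citation_supported("The Eiffel Tower is in Paris.",
#                         "The Eiffel Tower, located in Paris, was completed in 1889.")
```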