no code implementations • 16 Dec 2024 • Michael Bendersky, Donald Metzler, Marc Najork, Xuanhui Wang
This article describes the history of information retrieval on personal document collections.
no code implementations • 7 Nov 2024 • Xinyu Zhang, Jing Lu, Vinh Q. Tran, Tal Schuster, Donald Metzler, Jimmy Lin
Results show that shared general semantics alone can take the models a long way in making predictions on mLMs with different tokenizers and model sizes.
no code implementations • 2 Oct 2024 • Dara Bahri, John Wieting, Dana Alon, Donald Metzler
Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs).
no code implementations • 15 Apr 2024 • Yang Gao, Dana Alon, Donald Metzler
A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values.
no code implementations • 8 Apr 2024 • Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler
Reinforcement Learning from Human Feedback (RLHF) is a popular method for aligning Language Models (LM) with human values and preferences.
2 code implementations • 8 Nov 2023 • Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler
Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of QuoteSum for developing and studying such consolidation capabilities.
no code implementations • 22 Oct 2023 • Andrew Drozdov, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler, Kai Hui
Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first stage retrieval method, such as BM25, are rated and reordered to improve relevance.
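A minimal sketch of this pointwise re-ranking setup, assuming an LLM is prompted to rate each (query, passage) pair and the first-stage candidates are then re-ordered by the ratings. The `rate_with_llm` function is a hypothetical stand-in for a real LLM call; the toy term-overlap scorer inside it exists only so the sketch runs.

```python
def rate_with_llm(prompt: str) -> float:
    # Hypothetical stand-in for an actual LLM call that returns a relevance
    # rating; the term-overlap heuristic below only makes the sketch runnable.
    query_part, _, passage = prompt.partition("\nPassage: ")
    query = query_part.removeprefix("Query: ")
    return float(sum(term in passage.lower() for term in query.lower().split()))

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Re-order first-stage results (e.g., from BM25) by an LLM-assigned rating."""
    scored = []
    for passage in candidates:
        prompt = f"Query: {query}\nPassage: {passage}\nRate the relevance from 0 to 10:"
        scored.append((rate_with_llm(prompt), passage))
    return [p for _, p in sorted(scored, key=lambda pair: pair[0], reverse=True)]

print(rerank("efficient transformers", [
    "Recipes for baking sourdough bread at home.",
    "A survey of efficient transformers for long inputs.",
]))
```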
no code implementations • 30 Jun 2023 • Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky
Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem.
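A hedged sketch of one pairwise flavour of this idea: the prompt holds the query and two candidate documents, the model is asked which is more relevant, and the pairwise judgments drive the sort. The `prefers_first` function is a hypothetical placeholder for the LLM call, with a toy heuristic inside so the sketch runs; a real comparator of this kind need not be transitive, which a production sort would have to account for.

```python
from functools import cmp_to_key

def prefers_first(query: str, doc_a: str, doc_b: str) -> bool:
    # Hypothetical placeholder for prompting an LLM with the query and both
    # candidates ("Which passage answers the query better, A or B?").
    # Toy heuristic so the sketch runs: prefer the doc with more query-term hits.
    hits = lambda doc: sum(term in doc.lower() for term in query.lower().split())
    return hits(doc_a) >= hits(doc_b)

def pairwise_rank(query: str, docs: list[str]) -> list[str]:
    """Sort candidates using pairwise preference calls as the comparator."""
    compare = lambda a, b: -1 if prefers_first(query, a, b) else 1
    return sorted(docs, key=cmp_to_key(compare))

print(pairwise_rank("transformer scaling laws",
                    ["A guide to indoor gardening.",
                     "Scaling laws for neural language models."]))
```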
no code implementations • 5 Jun 2023 • Gabriel Bénédict, Ruqing Zhang, Donald Metzler
Generative information retrieval (IR) has experienced substantial growth across multiple research communities (e.g., information retrieval, computer vision, natural language processing, and machine learning), and has been highly visible in the popular press.
no code implementations • 31 May 2023 • Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to a quadratic increase in compute with input length.
no code implementations • 19 May 2023 • Ronak Pradeep, Kai Hui, Jai Gupta, Adam D. Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q. Tran
Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer.
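Because generative retrieval decodes a document identifier token by token, the decoder must be restricted to identifiers that actually exist in the corpus. Below is a model-free sketch of that constraint: a trie over docid strings filters the legal next characters at each step. The `pick` callable stands in for the model's next-token choice and is purely illustrative.

```python
def build_trie(docids):
    """Trie over docid strings; '<eos>' marks a complete identifier."""
    trie = {}
    for docid in docids:
        node = trie
        for ch in docid:
            node = node.setdefault(ch, {})
        node["<eos>"] = {}
    return trie

def constrained_decode(trie, pick):
    """Greedily decode one identifier, only ever offering legal continuations."""
    out, node = "", trie
    while True:
        allowed = sorted(node.keys())
        choice = pick(out, allowed)   # a real model would score `allowed` here
        if choice == "<eos>":
            return out
        out, node = out + choice, node[choice]

trie = build_trie(["d101", "d102", "d205"])
# Toy policy: always take the first legal continuation.
print(constrained_decode(trie, lambda prefix, allowed: allowed[0]))  # -> d101
```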
no code implementations • 19 Dec 2022 • Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler
In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents.
no code implementations • 16 Dec 2022 • Jai Gupta, Yi Tay, Chaitanya Kamath, Vinh Q. Tran, Donald Metzler, Shailesh Bavadekar, Mimi Sun, Evgeniy Gabrilovich
With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic.
1 code implementation • 15 Dec 2022 • Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster
We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development.
no code implementations • 20 Oct 2022 • Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani
This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute.
Ranked #2 on Cross-Lingual Question Answering on TyDiQA-GoldP
no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
There has been a lot of interest in the scaling properties of Transformer models.
no code implementations • 14 Jul 2022 • Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.
no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
2 code implementations • 10 May 2022 • Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
Ranked #1 on Long-range modeling on SCROLLS (CNLI metric)
no code implementations • 2 May 2022 • Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky
Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models.
1 code implementation • 15 Apr 2022 • Tal Schuster, Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Donald Metzler
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
no code implementations • 1 Mar 2022 • Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way.
no code implementations • 22 Feb 2022 • Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, Lucy Vasserman
As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles.
1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.
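A hedged sketch of how the two kinds of training examples in this setup can be laid out: indexing examples map document text to a docid string, and retrieval examples map queries to the same docid string, so a single seq2seq model absorbs the corpus into its parameters. The field names, task prefixes, and toy corpus below are illustrative, not the paper's actual data format.

```python
# Hedged sketch of training-pair construction for a DSI-style model:
# one seq2seq model is trained on both "index" pairs (doc text -> docid)
# and "retrieve" pairs (query -> docid). Names and format are illustrative.

corpus = {
    "doc-17": "The Differentiable Search Index encodes a corpus in model parameters.",
    "doc-42": "BM25 is a classic lexical ranking function used in first-stage retrieval.",
}
queries = [
    ("what is the differentiable search index", "doc-17"),
    ("which ranking function does first-stage retrieval use", "doc-42"),
]

def make_training_pairs(corpus, queries):
    pairs = []
    for docid, text in corpus.items():
        pairs.append({"input": f"index: {text}", "target": docid})
    for query, docid in queries:
        pairs.append({"input": f"retrieve: {query}", "target": docid})
    return pairs

for pair in make_training_pairs(corpus, queries):
    print(pair)
```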
no code implementations • 5 Jan 2022 • John Alex, Keith Hall, Donald Metzler
We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space.
4 code implementations • ICLR 2022 • Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.
no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.
3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.
1 code implementation • ACL 2021 • Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler
In the context of language models, are convolutional models competitive with Transformers when pre-trained?
no code implementations • 14 Jul 2021 • Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.
1 code implementation • ICLR 2022 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.
2 code implementations • ICLR 2022 • Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler
In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
Ranked #3 on Paraphrase Identification on Quora Question Pairs
no code implementations • Findings (ACL) 2021 • Vamsi Aribandi, Yi Tay, Donald Metzler
In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU.
1 code implementation • 7 May 2021 • Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler
In the context of language models, are convolutional models competitive with Transformers when pre-trained?
no code implementations • 5 May 2021 • Donald Metzler, Yi Tay, Dara Bahri, Marc Najork
When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead.
1 code implementation • 1 Mar 2021 • Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler
In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.
Ranked #1 on Machine Translation on WMT2017 Russian-English
no code implementations • 9 Feb 2021 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
Detecting out-of-distribution (OOD) examples is critical in many applications.
Out-of-Distribution Detection
no code implementations • 1 Jan 2021 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
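For reference, the dot-product self-attention being examined here is the standard scaled operation below; the sketch is a plain NumPy restatement of the textbook formula, not code from the paper.

```python
import numpy as np

def dot_product_self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention over a sequence x of shape (L, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (L, L) pairwise token interactions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (L, d) contextualized outputs

rng = np.random.default_rng(0)
L, d = 6, 8
x = rng.normal(size=(L, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(dot_product_self_attention(x, w_q, w_k, w_v).shape)  # (6, 8)
```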
no code implementations • ICLR 2021 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.
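A hedged sketch of the general idea as described above: small task-conditioned row and column vectors are combined into a coarse grid that gates regions of a shared weight matrix. The shapes, the Kronecker-product tiling, and the tanh nonlinearity are illustrative assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def grid_gate(task_embedding, proj_r, proj_c, out_dim, in_dim):
    """Hypothetical grid-wise gate: outer product of small task-conditioned
    vectors, tiled up to the full weight shape (illustrative shapes only)."""
    r = np.tanh(proj_r @ task_embedding)          # (blocks,)
    c = np.tanh(proj_c @ task_embedding)          # (blocks,)
    grid = np.outer(r, c)                         # coarse (blocks, blocks) grid
    reps = (out_dim // grid.shape[0], in_dim // grid.shape[1])
    return np.kron(grid, np.ones(reps))           # tile blocks to (out_dim, in_dim)

rng = np.random.default_rng(0)
out_dim, in_dim, task_dim, blocks = 16, 32, 4, 4
w_shared = rng.normal(size=(out_dim, in_dim))     # weight shared across tasks
proj_r = rng.normal(size=(blocks, task_dim))
proj_c = rng.normal(size=(blocks, task_dim))

task_emb = rng.normal(size=(task_dim,))
w_task = w_shared * grid_gate(task_emb, proj_r, proj_c, out_dim, in_dim)
print(w_task.shape)  # (16, 32): each task gates different regions of w_shared
```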
no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.
2 code implementations • ACL 2021 • Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville
There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponding words.
5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.
Ranked #4 on ListOps on ListOps
no code implementations • 19 Oct 2020 • Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins
Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.
no code implementations • 14 Sep 2020 • Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.
no code implementations • 17 Aug 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins
Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.
no code implementations • 12 Jul 2020 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.
1 code implementation • 2 May 2020 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)
no code implementations • 26 Apr 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins
Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.
no code implementations • ACL 2020 • Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins
This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models.
1 code implementation • ICML 2020 • Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.
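The core primitive named in the title is Sinkhorn normalization: repeatedly normalizing the rows and columns of a score matrix so it approaches a doubly stochastic (soft permutation) matrix that can re-order blocks before attending. The sketch below shows only that primitive, computed in log space; it is not the full attention mechanism from the paper.

```python
import numpy as np

def sinkhorn(log_scores, n_iters=20):
    """Iterative row/column normalization in log space; returns a matrix whose
    rows and columns each sum to ~1 (a relaxed permutation over blocks)."""
    z = log_scores.copy()
    for _ in range(n_iters):
        z -= np.log(np.exp(z).sum(axis=1, keepdims=True))  # normalize rows
        z -= np.log(np.exp(z).sum(axis=0, keepdims=True))  # normalize columns
    return np.exp(z)

rng = np.random.default_rng(0)
p = sinkhorn(rng.normal(size=(4, 4)))
print(np.round(p.sum(axis=0), 3), np.round(p.sum(axis=1), 3))  # both ~[1 1 1 1]
```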
no code implementations • 21 Nov 2019 • Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler
In personal email search, user queries often impose different requirements on different aspects of the retrieved emails.
no code implementations • 19 Jun 2019 • Brandon Tran, Maryam Karimzadehgan, Rama Kumar Pasumarthi, Michael Bendersky, Donald Metzler
To address this data challenge, in this paper we propose a domain adaptation approach that fine-tunes the global model to each individual enterprise.
no code implementations • 15 Sep 2018 • Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler
In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.