Search Results for author: Donald Metzler

Found 51 papers, 15 papers with code

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

no code implementations15 Sep 2018 Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler

In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.

Clustering, Multi-Task Learning, +1
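
For the email search entry above, here is a generic illustration of deriving an unsupervised "query type" by clustering query representations and exposing the cluster id as a ranking signal. This is only a sketch of the general idea, not the paper's model; the TF-IDF features and the choice of three clusters are arbitrary assumptions for the example.

```python
# Illustrative only: cluster query vectors to obtain an unsupervised "query
# type", then expose the cluster id as an extra feature for a query-dependent
# ranker.  Generic sketch; not the paper's method.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

queries = [
    "lunch reservation confirmation",
    "flight itinerary update",
    "q3 budget spreadsheet",
    "team offsite agenda",
    "hotel booking receipt",
    "quarterly revenue report",
]

# Embed queries (simple TF-IDF vectors stand in for learned embeddings).
vectors = TfidfVectorizer().fit_transform(queries)

# Unsupervised query types = cluster assignments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

for query, cluster_id in zip(queries, kmeans.labels_):
    # In a query-dependent ranker, cluster_id could be fed as a categorical
    # feature or used to select a per-type scoring head.
    print(cluster_id, query)
```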

Domain Adaptation for Enterprise Email Search

no code implementations19 Jun 2019 Brandon Tran, Maryam Karimzadehgan, Rama Kumar Pasumarthi, Michael Bendersky, Donald Metzler

To address this data challenge, in this paper we propose a domain adaptation approach that fine-tunes the global model to each individual enterprise.

Domain Adaptation, Information Retrieval, +1

Separate and Attend in Personal Email Search

no code implementations21 Nov 2019 Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler

In personal email search, user queries often impose different requirements on different aspects of the retrieved emails.

Learning-To-Rank

Sparse Sinkhorn Attention

1 code implementation ICML 2020 Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.

Document Classification, Image Generation, +2
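
The core operation behind the Sparse Sinkhorn Attention entry above is a differentiable (Sinkhorn-normalized) soft permutation over sequence blocks, which is then used for sparse local attention. Below is a minimal numpy sketch of the Sinkhorn normalization step only, as an illustration of the idea rather than the released implementation; block sorting and the subsequent attention are omitted.

```python
# Minimal sketch of Sinkhorn normalization in log space: iterate row and
# column normalizations to turn raw block-to-block scores into a (soft)
# doubly-stochastic permutation matrix.  Toy illustration only.
import numpy as np

def log_sinkhorn(log_scores, n_iters=20):
    """Return an approximately doubly-stochastic matrix from raw scores."""
    z = log_scores.copy()
    for _ in range(n_iters):
        # Row normalization, then column normalization, both in log space.
        z = z - np.logaddexp.reduce(z, axis=1, keepdims=True)
        z = z - np.logaddexp.reduce(z, axis=0, keepdims=True)
    return np.exp(z)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))      # scores between 4 sequence blocks
perm = log_sinkhorn(scores)

print(np.round(perm, 3))
print("row sums:", perm.sum(axis=1))  # ~1.0: each row is a soft assignment
print("col sums:", perm.sum(axis=0))  # ~1.0: columns too
```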

Reverse Engineering Configurations of Neural Text Generation Models

no code implementations ACL 2020 Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models.

Text Generation

Choppy: Cut Transformer For Ranked List Truncation

no code implementations26 Apr 2020 Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval, Retrieval
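
Choppy frames this as a ranked-list truncation problem: decide where to cut the list rather than only how to order it. As a generic illustration of the objective (not the paper's Transformer model), the sketch below enumerates cut positions and picks the one that maximizes an approximate expected F1 given per-document relevance probabilities; plugging expected counts into the F1 formula is a simplification of the true expectation.

```python
# Generic ranked-list truncation sketch (not Choppy itself): pick the cut-off
# k that maximizes an approximate expected F1, given estimated relevance
# probabilities for an already-ranked list.
def best_cutoff(relevance_probs):
    total_expected_relevant = sum(relevance_probs)
    best_k, best_f1 = 0, 0.0
    expected_hits = 0.0
    for k, p in enumerate(relevance_probs, start=1):
        expected_hits += p
        precision = expected_hits / k
        recall = expected_hits / total_expected_relevant
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k, best_f1

# Relevance probabilities from a ranker, already sorted by score.
probs = [0.95, 0.9, 0.7, 0.4, 0.15, 0.1, 0.05]
k, f1 = best_cutoff(probs)
print(f"truncate after position {k} (approx. expected F1 = {f1:.3f})")
```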

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation2 May 2020 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

 Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization, Dialogue Generation, +6
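
The Synthesizer entry above questions whether token-token dot products are needed at all; one of its variants synthesizes the attention matrix directly from each token via a small MLP. The numpy sketch below illustrates that "dense synthesizer" idea under a fixed maximum length; it is an illustration of the idea, not the authors' model.

```python
# Sketch of a dense-synthesizer layer: each token is mapped by an MLP to a
# full row of the L x L attention matrix, with no query-key dot products;
# values are then mixed exactly as in standard attention.  Toy illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

L, d, d_hidden = 6, 16, 32
rng = np.random.default_rng(0)

X = rng.normal(size=(L, d))                  # token representations
W1 = rng.normal(size=(d, d_hidden)) * 0.1
W2 = rng.normal(size=(d_hidden, L)) * 0.1    # maps each token to L attention logits
V = rng.normal(size=(d, d)) * 0.1

logits = np.maximum(X @ W1, 0.0) @ W2        # (L, L) logits, one row per token
attn = softmax(logits, axis=-1)              # synthesized attention weights
output = attn @ (X @ V)                      # standard value mixing

print(output.shape)  # (6, 16)
```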

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations12 Jul 2020 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning, Natural Language Understanding
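
For the HyperGrid entry above, here is a rough sketch of the grid-wise gating idea as read from the abstract: a small task-conditioned gate, formed as the outer product of two vectors, is expanded block-wise over a shared weight matrix so that different regions can specialize per task. The shapes and the sigmoid gate are assumptions made for the illustration; this is not the published implementation.

```python
# Rough sketch: a (grid_rows x grid_cols) task gate, one scalar per block of a
# shared weight matrix, built from an outer product and expanded with np.kron.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_in, d_out = 8, 12          # shared weight matrix is (d_in, d_out)
grid_rows, grid_cols = 4, 3  # view the weight as a 4 x 3 grid of blocks

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(d_in, d_out)) * 0.1

def task_gated_weight(task_row_vec, task_col_vec):
    # One gate scalar per block of the grid.
    block_gate = sigmoid(np.outer(task_row_vec, task_col_vec))
    # Expand each block scalar to the size of its block in the weight matrix.
    full_gate = np.kron(block_gate, np.ones((d_in // grid_rows, d_out // grid_cols)))
    return W_shared * full_gate

# One (row, column) vector pair per task plays the role of the hypernetwork output.
task_a = task_gated_weight(rng.normal(size=grid_rows), rng.normal(size=grid_cols))
task_b = task_gated_weight(rng.normal(size=grid_rows), rng.normal(size=grid_cols))

print(task_a.shape, float(np.abs(task_a - task_b).mean()))  # same shape, different specialization
```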

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

no code implementations17 Aug 2020 Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.

Efficient Transformers: A Survey

no code implementations14 Sep 2020 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Navigate, reinforcement-learning, +1

Surprise: Result List Truncation via Extreme Value Theory

no code implementations19 Oct 2020 Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval, Retrieval, +1

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)

16k, Benchmarking, +1

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2 code implementations ACL 2021 Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponding words.

Constituency Parsing, Language Modelling, +2

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations1 Jan 2021 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling, Machine Translation, +2

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations ICLR 2021 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning, Natural Language Understanding

OmniNet: Omnidirectional Representations from Transformers

1 code implementation1 Mar 2021 Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

Few-Shot Learning, Language Modelling, +2
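
The OmniNet entry above lets each token attend to all tokens at all layers of the network rather than only within the current layer. The toy numpy sketch below illustrates that omnidirectional pooling of hidden states; the paper's efficient meta-learner for doing this cheaply is not reproduced here.

```python
# Sketch of omnidirectional attention: queries from the top layer attend over
# the hidden states of every token at every layer.  Toy illustration only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

n_layers, seq_len, d = 3, 5, 8
rng = np.random.default_rng(0)

# Hidden states for every layer of a (pretend) Transformer: (n_layers, seq_len, d).
hidden = rng.normal(size=(n_layers, seq_len, d))

# Pool all layer/position states into one attention memory of size n_layers * seq_len.
memory = hidden.reshape(n_layers * seq_len, d)

queries = hidden[-1]                           # queries from the top layer
scores = queries @ memory.T / np.sqrt(d)       # (seq_len, n_layers * seq_len)
omni_out = softmax(scores, axis=-1) @ memory   # each token mixes info from the whole network

print(omni_out.shape)  # (5, 8)
```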

Rethinking Search: Making Domain Experts out of Dilettantes

no code implementations5 May 2021 Donald Metzler, Yi Tay, Dara Bahri, Marc Najork

When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead.

Information Retrieval, Question Answering, +1

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation7 May 2021 Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

How Reliable are Model Diagnostics?

no code implementations Findings (ACL) 2021 Vamsi Aribandi, Yi Tay, Donald Metzler

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU.

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

no code implementations ICLR 2022 Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.

Contrastive Learning, Representation Learning, +1
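
The corruption scheme suggested by the SCARF title is: for each example, replace a random subset of features with values drawn from that feature's empirical marginal, and treat the corrupted view as a positive pair for contrastive learning. The sketch below shows only the corruption step; the encoder and InfoNCE loss are omitted, and this is an illustration rather than the paper's code.

```python
# Minimal sketch of SCARF-style random feature corruption for tabular data.
import numpy as np

def scarf_corrupt(batch, corruption_rate=0.6, rng=None):
    rng = rng or np.random.default_rng()
    n, d = batch.shape
    corrupted = batch.copy()
    # Mask of features to corrupt, drawn independently per cell.
    mask = rng.random((n, d)) < corruption_rate
    # Replacement values: resample each corrupted cell from the same feature's
    # empirical marginal, i.e. a random row is picked per column independently.
    random_rows = rng.integers(0, n, size=(n, d))
    replacements = batch[random_rows, np.arange(d)]
    corrupted[mask] = replacements[mask]
    return corrupted

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))          # a toy tabular batch
x_corrupted = scarf_corrupt(x, rng=rng)
print((x != x_corrupted).mean())     # roughly the corruption rate
```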

The Benchmark Lottery

no code implementations14 Jul 2021 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking, BIG-bench Machine Learning, +3

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation ACL 2021 Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations22 Sep 2021 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations ICLR 2022 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

3 code implementations ICLR 2022 Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.

Denoising, Multi-Task Learning

Atomized Search Length: Beyond User Models

no code implementations5 Jan 2022 John Alex, Keith Hall, Donald Metzler

We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space.

Transformer Memory as a Differentiable Search Index

1 code implementation14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval, Retrieval
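
The Differentiable Search Index entry above trains one seq2seq model to do two things: "indexing" (map document text to its docid string) and "retrieval" (map a query to the docid string of a relevant document), so the corpus lives entirely in the model parameters. The sketch below only shows how such training pairs could be constructed; the documents, queries, and docid scheme are made up for the illustration, and model training is omitted.

```python
# Sketch of DSI-style training data: both kinds of examples are plain
# (source text -> target docid string) pairs for a single seq2seq model.
corpus = {
    "doc_17": "Transformers use self-attention to mix information across tokens.",
    "doc_42": "Sinkhorn iterations produce doubly stochastic matrices.",
}
labeled_queries = [
    ("what is self-attention", "doc_17"),
    ("doubly stochastic matrix algorithm", "doc_42"),
]

# Indexing examples: the model memorizes which text belongs to which docid.
indexing_examples = [(text, docid) for docid, text in corpus.items()]

# Retrieval examples: the model learns to decode the right docid for a query.
retrieval_examples = list(labeled_queries)

training_pairs = indexing_examples + retrieval_examples
for source, target in training_pairs:
    print(f"{source!r}  ->  {target!r}")

# At inference time the model generates a docid string for a new query,
# typically with decoding constrained to the set of valid docids.
```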

Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters

1 code implementation15 Apr 2022 Tal Schuster, Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Donald Metzler

Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.

Natural Language Inference, Retrieval, +1

Retrieval-Enhanced Machine Learning

no code implementations2 May 2022 Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models.

BIG-bench Machine Learning, Information Retrieval, +1

UL2: Unifying Language Learning Paradigms

1 code implementation10 May 2022 Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

 Ranked #1 on Long-range modeling on SCROLLS (CNLI metric)

Arithmetic Reasoning, Common Sense Reasoning, +11

Confident Adaptive Language Modeling

no code implementations14 Jul 2022 Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.

Language Modelling, Text Generation

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

no code implementations16 Dec 2022 Jai Gupta, Yi Tay, Chaitanya Kamath, Vinh Q. Tran, Donald Metzler, Shailesh Bavadekar, Mimi Sun, Evgeniy Gabrilovich

With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic.

Natural Language Understanding

DSI++: Updating Transformer Memory with New Documents

no code implementations19 Dec 2022 Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents.

Continual Learning, Natural Questions, +1

How Does Generative Retrieval Scale to Millions of Passages?

no code implementations19 May 2023 Ronak Pradeep, Kai Hui, Jai Gupta, Adam D. Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q. Tran

Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer.

Information Retrieval, Passage Ranking, +1

LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

no code implementations31 May 2023 Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to a quadratic increase in compute effort with the input length.
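
The layer-adjustable interaction suggested by the LAIT title is to encode each segment independently for some number of early layers and only let later layers attend across the concatenated segments. The back-of-the-envelope cost model below just counts attention score computations under that assumed split; it is a sketch, not the paper's architecture or numbers.

```python
# Count attention score computations when the first `independent_layers`
# layers encode segments separately and the remaining layers attend jointly.
def attention_pairs(segment_lengths, total_layers, independent_layers):
    joint_len = sum(segment_lengths)
    independent_cost = independent_layers * sum(n * n for n in segment_lengths)
    joint_cost = (total_layers - independent_layers) * joint_len * joint_len
    return independent_cost + joint_cost

segments = [128, 128, 128, 128]   # e.g. four text segments of 128 tokens each
full = attention_pairs(segments, total_layers=12, independent_layers=0)
split = attention_pairs(segments, total_layers=12, independent_layers=8)
print(f"fully joint encoder: {full:,} score computations")
print(f"8 independent + 4 joint layers: {split:,} ({split / full:.0%} of the cost)")
```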

Gen-IR @ SIGIR 2023: The First Workshop on Generative Information Retrieval

no code implementations5 Jun 2023 Gabriel Bénédict, Ruqing Zhang, Donald Metzler

Generative information retrieval (IR) has experienced substantial growth across multiple research communities (e.g., information retrieval, computer vision, natural language processing, and machine learning), and has been highly visible in the popular press.

Answer Generation, Information Retrieval, +2

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

no code implementations30 Jun 2023 Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky

Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem.
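
The pairwise ranking prompting described in the entry above shows the LLM the query plus two candidate passages and asks which is more relevant, then aggregates the pairwise preferences into a ranking. The sketch below uses a hypothetical `ask_llm` stand-in for a real model call and aggregates by counting wins over all pairs, which is one simple aggregation among several; it is an illustration, not the paper's implementation.

```python
# Pairwise ranking prompting sketch: compare passages two at a time with an
# LLM prompt, then rank by number of pairwise wins.
from itertools import combinations

PROMPT = (
    "Query: {query}\n"
    "Passage A: {a}\n"
    "Passage B: {b}\n"
    "Which passage is more relevant to the query? Answer 'A' or 'B'."
)

def ask_llm(prompt):
    # Placeholder: a real system would call an LLM here and parse its answer.
    return "A" if len(prompt) % 2 else "B"

def pairwise_rank(query, passages):
    wins = {i: 0 for i in range(len(passages))}
    for i, j in combinations(range(len(passages)), 2):
        answer = ask_llm(PROMPT.format(query=query, a=passages[i], b=passages[j]))
        wins[i if answer == "A" else j] += 1
    # Higher win count = ranked higher.
    return sorted(range(len(passages)), key=lambda i: wins[i], reverse=True)

docs = ["passage about ranking", "passage about cooking", "passage about BM25"]
print(pairwise_rank("how do search engines rank documents", docs))
```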

PaRaDe: Passage Ranking using Demonstrations with Large Language Models

no code implementations22 Oct 2023 Andrew Drozdov, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler, Kai Hui

Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first stage retrieval method, such as BM25, are rated and reordered to improve relevance.

Passage Ranking, Passage Re-Ranking, +6

SEMQA: Semi-Extractive Multi-Source Question Answering

1 code implementation8 Nov 2023 Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler

Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of QuoteSum for developing and studying such consolidation capabilities.

Attribute, Long Form Question Answering, +1

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data

no code implementations8 Apr 2024 Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler

Reinforcement Learning from Human Feedback (RLHF) is a popular method for aligning Language Models (LM) with human values and preferences.

Impact of Preference Noise on the Alignment Performance of Generative Language Models

no code implementations15 Apr 2024 Yang Gao, Dana Alon, Donald Metzler

A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values.

Dialogue Generation
