Search Results for author: Donald Metzler

Found 29 papers, 7 papers with code

Atomized Search Length: Beyond User Models

no code implementations · 5 Jan 2022 · John Alex, Keith Hall, Donald Metzler

We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

2 code implementations · 22 Nov 2021 · Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.

Denoising · Multi-Task Learning

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

no code implementations · 22 Sep 2021 · Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; and (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

The Benchmark Lottery

no code implementations · 14 Jul 2021 · Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Information Retrieval · Recommendation Systems

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

no code implementations · 29 Jun 2021 · Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.

Contrastive Learning · Representation Learning
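
As a rough illustration of the "random feature corruption" named in the title (a sketch, not the paper's exact procedure): a corrupted view of a tabular row is assumed here to be made by selecting a random subset of feature columns and replacing each selected value with a draw from that column's empirical marginal. The function name, the corruption rate, and the toy data are illustrative choices.

import numpy as np

def corrupt(batch, corruption_rate=0.6, seed=0):
    # batch: (n_rows, n_features) tabular data.
    # Pick a random subset of cells per row and replace each chosen value with
    # one drawn from that column's empirical marginal (i.e. a value taken from
    # a random other row of the same column).
    rng = np.random.default_rng(seed)
    n, d = batch.shape
    mask = rng.random((n, d)) < corruption_rate        # cells to corrupt
    donor_rows = rng.integers(0, n, size=(n, d))       # column-wise resampling
    marginal_draws = batch[donor_rows, np.arange(d)]
    return np.where(mask, marginal_draws, batch)

x = np.arange(20, dtype=float).reshape(4, 5)
view_a, view_b = corrupt(x, seed=0), corrupt(x, seed=1)
# Each row's two corrupted views would then be encoded and pulled together by
# a contrastive loss (e.g. InfoNCE), with other rows serving as negatives.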

How Reliable are Model Diagnostics?

no code implementations · Findings (ACL) 2021 · Vamsi Aribandi, Yi Tay, Donald Metzler

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU.

Are Pre-trained Convolutions Better than Pre-trained Transformers?

no code implementations · 7 May 2021 · Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

Rethinking Search: Making Domain Experts out of Dilettantes

no code implementations · 5 May 2021 · Donald Metzler, Yi Tay, Dara Bahri, Marc Najork

When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead.

Information Retrieval · Question Answering

OmniNet: Omnidirectional Representations from Transformers

1 code implementation · 1 Mar 2021 · Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

Few-Shot Learning · Language Modelling +2
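
A minimal sketch of one reading of the sentence above: collect the token representations produced by every layer, flatten them into a single memory, and let each query token attend over all of them with plain softmax attention. The shapes, the single-head dot-product attention, and the omission of the paper's efficiency tricks and partitioning scheme are assumptions for illustration.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def omnidirectional_attention(layer_states, queries):
    # layer_states: list of L arrays, each (seq_len, d), i.e. the token
    # representations produced by every Transformer layer.
    # queries: (seq_len, d), e.g. the final layer's tokens.
    # Instead of attending only within one layer ("horizontally"),
    # each query attends over the tokens of all layers at once.
    memory = np.concatenate(layer_states, axis=0)      # (L * seq_len, d)
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)           # (seq_len, L * seq_len)
    return softmax(scores) @ memory                    # (seq_len, d)

rng = np.random.default_rng(0)
states = [rng.standard_normal((8, 16)) for _ in range(4)]   # 4 layers, 8 tokens
out = omnidirectional_attention(states, states[-1])
print(out.shape)   # (8, 16)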

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations · 1 Jan 2021 · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling · Machine Translation +2

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations · ICLR 2021 · Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning · Natural Language Understanding
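
A hedged sketch of the grid-wise idea described above, under the assumption that the hypernetwork maps a task embedding to row and column gating vectors whose outer product forms a coarse grid, which is then expanded to the full weight shape and multiplies a shared weight matrix elementwise. The sizes, the sigmoid gates, and the expansion by block-wise repetition are illustrative choices, not the paper's exact construction.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 256
rows, cols = 8, 16                                # coarse grid over the weight matrix
W = rng.standard_normal((d_in, d_out)) * 0.02     # shared base weight

def task_gated_weight(task_emb, U_r, U_c):
    # Hypernetwork: task embedding -> row gates (rows,) and column gates (cols,).
    r = 1.0 / (1.0 + np.exp(-(U_r @ task_emb)))   # sigmoid gates
    c = 1.0 / (1.0 + np.exp(-(U_c @ task_emb)))
    grid = np.outer(r, c)                         # (rows, cols) grid
    # Expand the coarse grid to the full weight shape by repeating each cell
    # over a (d_in/rows) x (d_out/cols) block, then gate the shared weights,
    # so different regions of W specialise to different tasks.
    full = np.repeat(np.repeat(grid, d_in // rows, 0), d_out // cols, 1)
    return W * full

task_emb = rng.standard_normal(32)
U_r, U_c = rng.standard_normal((rows, 32)), rng.standard_normal((cols, 32))
x = rng.standard_normal(d_in)
h = x @ task_gated_weight(task_emb, U_r, U_c)     # (d_out,)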

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

1 code implementation · ACL 2021 · Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar: the dependency grammar, which models one-to-one correspondences between words, and the constituency grammar, which models the assembly of one or several corresponding words.

Constituency Parsing · Language Modelling +1
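
For readers unfamiliar with the two grammar classes mentioned above, a small generic example (the sentence and the analyses are textbook-style illustrations, not taken from the paper): a dependency parse assigns each word a single head word, while a constituency parse assembles words into nested phrases.

# Sentence used for illustration: "the cat sat on the mat"
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Dependency grammar: one head word per word (index into `tokens`, -1 = root),
# i.e. one-to-one correspondences between words. This follows one common
# convention (the preposition attaches to its noun); conventions differ.
heads = [1, 2, -1, 5, 5, 2]
# "the"->"cat", "cat"->"sat", "sat"->ROOT, "on"->"mat", "the"->"mat", "mat"->"sat"

# Constituency grammar: words assembled into nested phrases (constituents).
constituency = (
    "S",
    ("NP", "the", "cat"),
    ("VP", "sat",
        ("PP", "on",
            ("NP", "the", "mat"))),
)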

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations · 8 Nov 2020 · Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming model quality superior or comparable to that of vanilla Transformer models.

Long-range modeling

Surprise: Result List Truncation via Extreme Value Theory

no code implementations · 19 Oct 2020 · Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval · Two-sample testing

Efficient Transformers: A Survey

no code implementations · 14 Sep 2020 · Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

no code implementations · 17 Aug 2020 · Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations · 12 Jul 2020 · Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning · Natural Language Understanding

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation · 2 May 2020 · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

 Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric)

Abstractive Text Summarization · Dialogue Generation +6
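
A hedged sketch of the alternative the two Synthesizer entries above explore: synthesizing attention weights without any query-key dot product between token pairs, either from each token on its own (a "dense" variant) or as a freely learned matrix (a "random" variant). The layer sizes, the two-layer projection, and the single-head setup below are illustrative choices rather than the paper's exact configuration.

import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 8, 16

def dense_synthesizer(X, W1, W2, Wv):
    # X: (seq_len, d). Attention logits are predicted from each token on its
    # own; no dot product between token pairs is ever computed.
    logits = np.maximum(X @ W1, 0.0) @ W2      # (seq_len, seq_len)
    A = softmax(logits)                        # synthesized attention weights
    return A @ (X @ Wv)                        # weighted sum of values

X  = rng.standard_normal((seq_len, d))
W1 = rng.standard_normal((d, d)) * 0.1
W2 = rng.standard_normal((d, seq_len)) * 0.1   # maps each token to seq_len logits
Wv = rng.standard_normal((d, d)) * 0.1
out_dense = dense_synthesizer(X, W1, W2, Wv)   # (seq_len, d)

# "Random" variant: the logits are simply a learned (seq_len, seq_len)
# parameter matrix, shared across all inputs.
R = rng.standard_normal((seq_len, seq_len))
out_random = softmax(R) @ (X @ Wv)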

Choppy: Cut Transformer For Ranked List Truncation

no code implementations · 26 Apr 2020 · Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval

Reverse Engineering Configurations of Neural Text Generation Models

no code implementations · ACL 2020 · Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models.

Text Generation

Sparse Sinkhorn Attention

1 code implementation · ICML 2020 · Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.

Document Classification · Image Generation +2
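
The "Sinkhorn" in the title refers to Sinkhorn normalization: repeatedly normalizing the rows and columns of a score matrix so that it approaches a doubly stochastic, soft-permutation matrix, which the method uses to learn an ordering over blocks of the sequence before attending locally. The sketch below shows only that normalization step; the block partitioning, the sorting network, and the attention itself are omitted, and the iteration count is arbitrary.

import numpy as np

def sinkhorn(log_scores, n_iters=20):
    # Alternately normalise rows and columns in log space; the result
    # approaches a doubly stochastic matrix (a relaxed permutation).
    Z = log_scores.copy()
    for _ in range(n_iters):
        Z = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))  # row normalise
        Z = Z - np.log(np.exp(Z).sum(axis=0, keepdims=True))  # column normalise
    return np.exp(Z)

rng = np.random.default_rng(0)
P = sinkhorn(rng.standard_normal((4, 4)))
print(P.sum(axis=0), P.sum(axis=1))   # both close to all-ones: a soft permutation
# In Sparse Sinkhorn Attention such a matrix would be produced by a small
# network over sequence blocks and used to reorder them, so that each block
# only needs to attend locally after sorting.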

Separate and Attend in Personal Email Search

no code implementations · 21 Nov 2019 · Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler

In personal email search, user queries often impose different requirements on different aspects of the retrieved emails.

Learning-To-Rank

Domain Adaptation for Enterprise Email Search

no code implementations · 19 Jun 2019 · Brandon Tran, Maryam Karimzadehgan, Rama Kumar Pasumarthi, Michael Bendersky, Donald Metzler

To address this data challenge, in this paper we propose a domain adaptation approach that fine-tunes the global model to each individual enterprise.

Domain Adaptation · Information Retrieval

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

no code implementations · 15 Sep 2018 · Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler

In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.

Multi-Task Learning
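
A rough sketch of one way to read "query type in an unsupervised fashion" together with the title's auxiliary query clustering: cluster query representations (k-means here) and treat the cluster id as an auxiliary prediction target next to the main ranking objective. The featurization, the choice of k-means, and the multi-task weighting are assumptions for illustration, not the paper's recipe.

import numpy as np

def kmeans(X, k, n_iters=10, seed=0):
    # Plain k-means over query feature vectors; the cluster ids act as
    # unsupervised "query type" pseudo-labels.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign, centers

rng = np.random.default_rng(1)
query_feats = rng.standard_normal((100, 32))      # stand-in query embeddings
query_type, _ = kmeans(query_feats, k=5)
# A multi-task ranker would then optimise something like
#   loss = ranking_loss + alpha * type_classification_loss(query_type)
# so the shared encoder also learns to predict the unsupervised query type.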
