Search Results for author: Mostafa Dehghani

Found 44 papers, 22 papers with code

Unifying Language Learning Paradigms

1 code implementation 10 May 2022 Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler

Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

Information Retrieval Long-range modeling +3

Retrieval-Enhanced Machine Learning

no code implementations 2 May 2022 Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models.

Information Retrieval

Transformer Memory as a Differentiable Search Index

1 code implementation 14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval
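The excerpt above describes the Differentiable Search Index setup, in which retrieval is cast as sequence-to-sequence prediction of document identifiers. A minimal sketch of how such training examples could be laid out follows; the toy corpus, queries, and helper name are hypothetical, and the actual paper trains a Transformer on these pairs rather than merely constructing them.

```python
# Sketch of the two training tasks behind a DSI-style retriever:
# (1) indexing: map document text to its docid, and
# (2) retrieval: map a query to the docid of a relevant document.
# Both become plain seq2seq (input, target) pairs that a single
# Transformer could be trained on. All data here is illustrative.

corpus = {
    "d1": "transformers for information retrieval",
    "d2": "convolutional networks for images",
}
relevance = {"retrieval with transformers": "d1"}  # query -> relevant docid

def make_training_pairs(corpus, relevance):
    pairs = []
    for docid, text in corpus.items():
        pairs.append((text, docid))   # indexing task: doc text -> docid
    for query, docid in relevance.items():
        pairs.append((query, docid))  # retrieval task: query -> docid
    return pairs

pairs = make_training_pairs(corpus, relevance)
```

At inference time, the model would decode a docid string directly from the query, so the corpus lives entirely in the model parameters.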

VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling

no code implementations 10 Dec 2021 Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, Alexey Gritsenko

Our model consists of a multimodal Transformer encoder that jointly encodes UI images and structures, and performs UI object detection when the UI structures are absent from the input.

Object Detection +2

TokenLearner: Adaptive Space-Time Tokenization for Videos

no code implementations NeurIPS 2021 Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks.

Representation Learning Video Recognition +1

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations 25 Nov 2021 Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

The Efficiency Misnomer

no code implementations ICLR 2022 Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

We further present suggestions to improve reporting of efficiency metrics.

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation CVPR 2022 Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

Exploring the Limits of Large Scale Pre-training

no code implementations ICLR 2022 Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks.

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

no code implementations 29 Sep 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It is shown that, under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.

Domain Adaptation

VUT: Versatile UI Transformer for Multimodal Multi-Task User Interface Modeling

no code implementations 29 Sep 2021 Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, Alexey A. Gritsenko

Our model consists of a multimodal Transformer encoder that jointly encodes UI images and structures, and performs UI object detection when the UI structures are absent from the input.

Object Detection +2

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations ICLR 2022 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; and (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

2 code implementations 22 Sep 2021 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; and (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation ACL 2021 Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

The Benchmark Lottery

no code implementations 14 Jul 2021 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Information Retrieval Natural Language Processing +1

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

3 code implementations 21 Jun 2021 Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks.

Action Classification Image Classification +3
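The "handful of adaptively learned tokens" in the excerpt above could be sketched as attention-weighted pooling of a feature map into a few tokens. This is a simplified illustration, not the paper's implementation: the shapes, the normalization, and the helper name are assumptions, and in the real model the attention maps are produced by a learned network rather than given.

```python
# Sketch of the TokenLearner idea: instead of passing a fixed grid of
# patch tokens to later layers, learn a small number of spatial
# attention maps and use each one to pool the feature map into a single
# token, so only a handful of adaptive tokens flow onward.

def token_learner(features, attention_maps):
    """features: list of P per-position vectors.
    attention_maps: list of S lists of P non-negative scores.
    Returns S pooled tokens (one per attention map)."""
    tokens = []
    for scores in attention_maps:
        total = sum(scores) or 1.0
        dim = len(features[0])
        tokens.append([
            sum(s * f[d] for s, f in zip(scores, features)) / total
            for d in range(dim)
        ])
    return tokens
```

With S much smaller than P (the paper uses as few as 8 tokens), the cost of all downstream attention layers drops accordingly.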

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

1 code implementation 10 Jun 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It has been shown that, under two assumptions, (a) access to samples from intermediate distributions and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.

Domain Adaptation
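The gradual self-training procedure described in the excerpt above could be sketched as follows. This is a schematic, not the paper's method in full: `fit` and `predict` are hypothetical stand-ins for an actual learner, and the paper's contribution concerns what to do when the intermediate sets are absent and must be synthesized.

```python
# Sketch of gradual self-training: move from source toward target via a
# sequence of intermediate sample sets (ordered by amount of shift),
# pseudo-labeling each set with the current model and retraining on
# those pseudo-labels.

def gradual_self_train(model, fit, predict, intermediate_sets):
    for samples in intermediate_sets:        # ordered by shift from source
        pseudo_labels = [predict(model, x) for x in samples]
        model = fit(samples, pseudo_labels)  # adapt to the shifted data
    return model
```

Each round assumes the previous model is still accurate enough on the next, slightly shifted distribution for its pseudo-labels to be trustworthy; that is why the gradual ordering matters.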

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation 7 May 2021 Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

ViViT: A Video Vision Transformer

4 code implementations ICCV 2021 Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid

We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification.

Ranked #6 on Action Classification on Moments in Time (Top 5 Accuracy metric, using extra training data)

Action Classification Action Recognition +3

OmniNet: Omnidirectional Representations from Transformers

1 code implementation 1 Mar 2021 Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

Few-Shot Learning Language Modelling +2
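The excerpt above says each OmniNet token may attend to all tokens in the entire network, i.e. across every layer rather than only within one. A simplified dot-product-attention sketch of that idea follows; the flattening over a (layer, position) axis and the helper names are illustrative assumptions, and the paper uses efficient attention variants to keep this tractable.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def omni_attend(query, all_layer_states):
    """query: one token's vector. all_layer_states: list over layers of
    lists of token vectors. The query attends over every token from
    every layer, not just its own layer."""
    flat = [tok for layer in all_layer_states for tok in layer]
    scores = softmax([sum(q * t for q, t in zip(query, tok)) for tok in flat])
    dim = len(query)
    return [sum(w * tok[d] for w, tok in zip(scores, flat)) for d in range(dim)]
```

The receptive field of each token thus spans the whole network's hidden states, which is what the paper calls an omnidirectional representation.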

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations 8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Long-range modeling

Efficient Transformers: A Survey

no code implementations 14 Sep 2020 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Natural Language Processing Reinforcement Learning

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression

no code implementations ICLR 2021 Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

Furthermore, we zoom in on the effect of gradient bias due to the straight-through estimator in integer discrete flows, and demonstrate that its influence is highly dependent on architecture choices and less prominent than previously thought.

Quantization

Transferring Inductive Biases through Knowledge Distillation

1 code implementation 31 May 2020 Samira Abnar, Mostafa Dehghani, Willem Zuidema

Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time.

Knowledge Distillation

Learning from Samples of Variable Quality

no code implementations ICLR Workshop LLD 2019 Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

Training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing.

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents

1 code implementation 12 Oct 2018 Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

For measuring the topical diversity of text documents, our HiTR approach improves over the state of the art on the PubMed dataset.

Topic Models

Universal Transformers

7 code implementations ICLR 2019 Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.

Inductive Bias Language Modelling +3
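The Universal Transformer's core mechanism, applying one shared block recurrently in depth rather than stacking distinct layers, can be sketched as follows. This is a structural illustration only: the stand-in block is not a real attention layer, and the paper additionally supports dynamic per-position halting via Adaptive Computation Time rather than a fixed step count.

```python
# Minimal sketch of the Universal Transformer's recurrence in depth:
# the SAME block (same parameters) is applied at every step, combining
# the parallelism of Transformers with the recurrent inductive bias of
# RNNs over the depth dimension.

def universal_transformer(state, shared_block, num_steps):
    for step in range(num_steps):
        state = shared_block(state, step)  # shared weights every step
    return state

# Toy stand-in block: each position just records which steps it saw,
# demonstrating that every position passes through every shared step.
result = universal_transformer(
    state=[[] for _ in range(3)],          # 3 positions, empty history
    shared_block=lambda st, t: [h + [t] for h in st],
    num_steps=4,
)
```

Because the block is shared across steps, the parameter count is independent of the effective depth, which is what makes variable-depth computation per position possible.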

Learning to Rank from Samples of Variable Quality

no code implementations 21 Jun 2018 Mostafa Dehghani, Jaap Kamps

To this end, we introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data.

Document Ranking Learning-To-Rank

Learning to Learn from Weak Supervision by Full Supervision

1 code implementation 30 Nov 2017 Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels.

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

1 code implementation 15 Nov 2017 Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

We propose an approach for detecting semantic shifts between different viewpoints, where a viewpoint is broadly defined as a set of texts that share a specific metadata feature, which can be a time period but also a social entity such as a political party.

Fidelity-Weighted Learning

no code implementations ICLR 2018 Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data.

Ad-Hoc Information Retrieval Information Retrieval +1
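The fidelity-weighted learning described in the excerpt above can be sketched as a teacher, trained on the small set of true labels, assigning a confidence to each weak label, which then scales the student's per-example update. The exponential weighting rule and the plain-SGD update below are illustrative choices, not the paper's exact schedule (which uses a Gaussian process teacher and a particular learning-rate modulation).

```python
import math

def fidelity_weights(teacher_uncertainties, beta=1.0):
    """Map teacher uncertainty (e.g. GP predictive variance) to a
    per-example weight in (0, 1]: confident examples get weight near 1,
    uncertain weak labels are heavily down-weighted."""
    return [math.exp(-beta * u) for u in teacher_uncertainties]

def weighted_sgd_step(params, grads, weights, lr=0.1):
    """One SGD step on the student where each example's gradient is
    scaled by the teacher's confidence in its (possibly weak) label."""
    avg = [sum(w * g[i] for w, g in zip(weights, grads)) / len(grads)
           for i in range(len(params))]
    return [p - lr * a for p, a in zip(params, avg)]
```

The effect is that clean examples drive learning at full strength while noisy weak labels contribute only in proportion to the teacher's trust in them.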

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

no code implementations 11 Aug 2017 Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury

The results suggest that our model outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.

Neural Networks for Information Retrieval

no code implementations 13 Jul 2017 Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them.

Information Retrieval

Neural Ranking Models with Weak Supervision

1 code implementation 28 Apr 2017 Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

Our experiments indicate that employing proper objective functions and letting the networks learn the input representation from weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and ClueWeb collections, respectively.

Ad-Hoc Information Retrieval Information Retrieval

On Horizontal and Vertical Separation in Hierarchical Text Classification

no code implementations 2 Sep 2016 Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy.

Classification General Classification +1
