Search Results for author: Mostafa Dehghani

Found 64 papers, 32 papers with code

Parameter-Efficient Multilingual Summarisation: An Empirical Study

no code implementations14 Nov 2023 Chenxi Whitehouse, Fantine Huot, Jasmijn Bastings, Mostafa Dehghani, Chu-Cheng Lin, Mirella Lapata

With the increasing prevalence of Large Language Models, traditional full fine-tuning approaches face growing challenges, especially in memory-intensive tasks.

Cross-Lingual Transfer

Group Membership Bias

no code implementations5 Aug 2023 Ali Vardasbi, Maarten de Rijke, Fernando Diaz, Mostafa Dehghani

With group bias, the utility of the sensitive groups is under-estimated; hence, without correcting for this bias, a supposedly fair ranking is not truly fair.

Fairness Learning-To-Rank +1

PaLM 2 Technical Report

1 code implementation17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language tasks, as well as reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Ranked #1 on Question Answering on TriviaQA (using extra training data)

Language Modelling Question Answering

End-to-End Spatio-Temporal Action Localisation with Video Transformers

no code implementations24 Apr 2023 Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab

The most performant spatio-temporal action localisation models use external person proposals and complex external memory banks.

Ranked #1 on Action Recognition on AVA v2.1 (using extra training data)

Action Detection Action Recognition +1

Dual PatchNorm

7 code implementations2 Feb 2023 Manoj Kumar, Mostafa Dehghani, Neil Houlsby

We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers.
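
A minimal sketch of the idea as described above: LayerNorm applied both before and after the ViT patch-embedding projection. The module name, patch extraction, and dimensions are illustrative assumptions, not the authors' reference code.

```python
import torch
import torch.nn as nn

class DualPatchNormEmbedding(nn.Module):
    def __init__(self, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        patch_dim = in_channels * patch_size * patch_size
        self.patch_size = patch_size
        self.pre_norm = nn.LayerNorm(patch_dim)      # LayerNorm on raw flattened patches
        self.proj = nn.Linear(patch_dim, embed_dim)  # the patch embedding layer itself
        self.post_norm = nn.LayerNorm(embed_dim)     # LayerNorm on embedded patches

    def forward(self, images):
        # images: (batch, channels, height, width)
        b, c, h, w = images.shape
        p = self.patch_size
        # Split the image into non-overlapping p x p patches and flatten each one.
        patches = images.unfold(2, p, p).unfold(3, p, p)           # (b, c, h/p, w/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        return self.post_norm(self.proj(self.pre_norm(patches)))

tokens = DualPatchNormEmbedding()(torch.randn(2, 3, 224, 224))  # (2, 196, 768)
```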

Adaptive Computation with Elastic Input Sequence

1 code implementation30 Jan 2023 Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.

Inductive Bias

DSI++: Updating Transformer Memory with New Documents

no code implementations19 Dec 2022 Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents.

Continual Learning Natural Questions +1

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation9 Dec 2022 Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
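
A hedged sketch of the core initialization step described above: every expert in a new Mixture-of-Experts feed-forward block starts as a copy of the dense checkpoint's feed-forward weights, while the router is freshly initialized. Function and module names are assumptions for illustration only.

```python
import copy
import torch.nn as nn

def upcycle_ffn_to_moe(dense_ffn: nn.Module, num_experts: int, d_model: int):
    """Build an MoE block whose experts are clones of a dense FFN checkpoint."""
    experts = nn.ModuleList([copy.deepcopy(dense_ffn) for _ in range(num_experts)])
    router = nn.Linear(d_model, num_experts)   # fresh (not upcycled) routing weights
    return nn.ModuleDict({"router": router, "experts": experts})

# Usage sketch: start from a trained dense FFN and reuse its sunk training cost.
dense_ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
moe_block = upcycle_ffn_to_moe(dense_ffn, num_experts=8, d_model=512)
```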

Karyotype AI for Precision Oncology

no code implementations20 Nov 2022 Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations.

Few-Shot Learning Inductive Bias

Intersection of Parallels as an Early Stopping Criterion

1 code implementation19 Aug 2022 Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani

Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting.
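
A minimal sketch of the signal described above: two copies of a linear model are trained from different random seeds and the gap between their weight vectors is tracked every epoch. Treating the point where the gap stops shrinking as the stopping signal is an assumption about how the criterion would be wired up, not the paper's exact procedure.

```python
import numpy as np

def train_pair_and_track_gap(X, y, lr=0.01, epochs=200):
    rng_a, rng_b = np.random.default_rng(0), np.random.default_rng(1)
    w_a = rng_a.normal(size=X.shape[1])
    w_b = rng_b.normal(size=X.shape[1])
    gaps = []
    for _ in range(epochs):
        for w in (w_a, w_b):
            grad = X.T @ (X @ w - y) / len(y)    # squared-error gradient
            w -= lr * grad                        # in-place update of each instance
        gaps.append(np.linalg.norm(w_a - w_b))   # distance between the two instances
    return gaps

X, y = np.random.randn(100, 5), np.random.randn(100)
gaps = train_pair_and_track_gap(X, y)
stop_epoch = int(np.argmin(gaps))  # candidate early-stopping point
```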

counterfactual Learning-To-Rank

Confident Adaptive Language Modeling

no code implementations14 Jul 2022 Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.

Language Modelling Text Generation

UL2: Unifying Language Learning Paradigms

1 code implementation10 May 2022 Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

Information Retrieval Long-range modeling +4

Retrieval-Enhanced Machine Learning

no code implementations2 May 2022 Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models.

BIG-bench Machine Learning Information Retrieval +1

Transformer Memory as a Differentiable Search Index

1 code implementation14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.
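
A hedged sketch of the differentiable-search-index idea described above: a single seq2seq Transformer is trained to map a query string directly to a document identifier string, so the "index" lives in the model parameters. The generic T5 checkpoint and docid format below are purely illustrative assumptions, not the authors' setup.

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Indexing/training step: (query or document text) -> docid string.
inputs = tokenizer("what is the capital of france", return_tensors="pt")
labels = tokenizer("doc_042", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss   # optimize this over the whole corpus

# Retrieval step: decode a docid directly from the query.
pred_ids = model.generate(**inputs, max_new_tokens=8)
predicted_docid = tokenizer.decode(pred_ids[0], skip_special_tokens=True)
```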

Information Retrieval Retrieval

VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling

no code implementations10 Dec 2021 Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, Alexey Gritsenko

Our model consists of a multimodal Transformer encoder that jointly encodes UI images and structures, and performs UI object detection when the UI structures are absent in the input.

object-detection Object Detection +2

TokenLearner: Adaptive Space-Time Tokenization for Videos

1 code implementation NeurIPS 2021 Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning approach which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.
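
A minimal sketch of the adaptive tokenization idea described above: a small number of tokens is produced by learning per-token spatial attention maps and pooling the feature map with them. The layer choices here are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TokenLearnerSketch(nn.Module):
    def __init__(self, in_dim=256, num_tokens=8):
        super().__init__()
        # One attention map per output token, predicted from the input features.
        self.attn = nn.Sequential(nn.LayerNorm(in_dim), nn.Linear(in_dim, num_tokens))

    def forward(self, x):
        # x: (batch, num_positions, channels), e.g. a flattened image/video feature map
        weights = self.attn(x).softmax(dim=1)               # (batch, positions, num_tokens)
        tokens = torch.einsum("bpk,bpc->bkc", weights, x)   # (batch, num_tokens, channels)
        return tokens

tokens = TokenLearnerSketch()(torch.randn(2, 14 * 14, 256))  # 196 positions -> 8 tokens
```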

Representation Learning Video Recognition +1

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations25 Nov 2021 Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

The Efficiency Misnomer

no code implementations ICLR 2022 Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

We further present suggestions to improve reporting of efficiency metrics.

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation CVPR 2022 Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

Exploring the Limits of Large Scale Pre-training

no code implementations ICLR 2022 Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks.

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations ICLR 2022 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

no code implementations29 Sep 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It is shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.

Domain Adaptation

VUT: Versatile UI Transformer for Multimodal Multi-Task User Interface Modeling

no code implementations29 Sep 2021 Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, Alexey A. Gritsenko

Our model consists of a multimodal Transformer encoder that jointly encodes UI images and structures, and performs UI object detection when the UI structures are absent in the input.

object-detection Object Detection +2

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations22 Sep 2021 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, and (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation ACL 2021 Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

The Benchmark Lottery

no code implementations14 Jul 2021 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking BIG-bench Machine Learning +3

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

4 code implementations21 Jun 2021 Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning approach which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Action Classification Image Classification +3

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

1 code implementation10 Jun 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution.
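
A hedged sketch of the self-training loop referred to above: the model pseudo-labels each successive (gradually shifted) unlabeled domain and is refit on those pseudo-labels, walking from the source toward the target distribution. The classifier choice and the stand-in shifted data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(model, intermediate_domains):
    for X_domain in intermediate_domains:        # ordered from source toward target
        pseudo_labels = model.predict(X_domain)  # label the shifted samples ourselves
        model = LogisticRegression(max_iter=1000).fit(X_domain, pseudo_labels)
    return model

# Usage sketch: fit on labeled source data, then adapt through shifted domains.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 10))
y_src = (X_src[:, 0] > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_src, y_src)
domains = [rng.normal(size=(200, 10)) * scale for scale in (1.2, 1.5, 2.0)]
adapted = gradual_self_train(model, domains)
```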

Domain Adaptation

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation7 May 2021 Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

ViViT: A Video Vision Transformer

7 code implementations ICCV 2021 Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid

We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification.

Ranked #8 on Action Classification on Moments in Time (Top 5 Accuracy metric, using extra training data)

Action Classification Action Recognition +4

OmniNet: Omnidirectional Representations from Transformers

1 code implementation1 Mar 2021 Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.
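
A minimal sketch of the "omnidirectional" attention described above: token representations from every layer are pooled into one long sequence, and a final attention block lets each token attend across all of them. The two-layer toy encoder and the single meta-attention block are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OmniAttentionSketch(nn.Module):
    def __init__(self, d_model=64, num_layers=2, nhead=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        ])
        self.omni_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, x):
        all_states = []
        for layer in self.layers:
            x = layer(x)
            all_states.append(x)                       # keep every layer's tokens
        memory = torch.cat(all_states, dim=1)          # (batch, layers * seq, d_model)
        out, _ = self.omni_attn(x, memory, memory)     # attend over the whole network
        return out

out = OmniAttentionSketch()(torch.randn(2, 16, 64))    # (2, 16, 64)
```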

Few-Shot Learning Language Modelling +2

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Benchmarking Long-range modeling

Efficient Transformers: A Survey

no code implementations14 Sep 2020 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Navigate reinforcement-learning +1

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression

no code implementations ICLR 2021 Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

Furthermore, we zoom in on the effect of gradient bias due to the straight-through estimator in integer discrete flows, and demonstrate that its influence is highly dependent on architecture choices and less prominent than previously thought.
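
For context on the gradient bias mentioned above, here is the usual straight-through estimator trick for a rounding operation: the forward pass rounds to integers, while the backward pass pretends rounding is the identity, which is exactly what makes the gradient a biased estimate. This is a generic illustration, not code from IDF++.

```python
import torch

def round_ste(x: torch.Tensor) -> torch.Tensor:
    # Forward: round(x). Backward: gradient flows through x unchanged (biased estimate).
    return x + (torch.round(x) - x).detach()

x = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
y = round_ste(x).sum()
y.backward()
print(x.grad)   # tensor([1., 1., 1.]) even though round() has zero gradient almost everywhere
```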

Quantization

Transferring Inductive Biases through Knowledge Distillation

1 code implementation31 May 2020 Samira Abnar, Mostafa Dehghani, Willem Zuidema

Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time.
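
A hedged sketch of vanilla knowledge distillation, the mechanism named in the title above (the paper studies what it transfers beyond accuracy): the student is trained on a temperature-softened copy of the teacher's output distribution alongside the hard labels. The loss weighting and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # match the teacher's softened distribution
    hard = F.cross_entropy(student_logits, labels)  # still fit the true labels
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
```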

Knowledge Distillation

Learning from Samples of Variable Quality

no code implementations ICLR Workshop LLD 2019 Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

Training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing.

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents

1 code implementation12 Oct 2018 Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

For measuring topical diversity of text documents, our HiTR approach improves over the state of the art on the PubMed dataset.

Topic Models

Universal Transformers

8 code implementations ICLR 2019 Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.

Inductive Bias LAMBADA +4

Learning to Rank from Samples of Variable Quality

no code implementations21 Jun 2018 Mostafa Dehghani, Jaap Kamps

To this end, we introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data.

Document Ranking Learning-To-Rank

Learning to Learn from Weak Supervision by Full Supervision

1 code implementation30 Nov 2017 Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels.

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

1 code implementation15 Nov 2017 Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

We propose an approach for detecting semantic shifts between different viewpoints--broadly defined as a set of texts that share a specific metadata feature, which can be a time-period, but also a social entity such as a political party.

Fidelity-Weighted Learning

no code implementations ICLR 2018 Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data.
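
A minimal, hedged sketch of the fidelity-weighting idea: a teacher fit on the small strongly-labeled set relabels the weak data and supplies a per-sample confidence, which down-weights the student's loss on low-fidelity examples. The Gaussian-process teacher, the exponential weighting, and the least-squares "student" below are assumptions for illustration, not the paper's models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X_strong, y_strong = rng.normal(size=(30, 4)), rng.normal(size=30)
X_weak = rng.normal(size=(500, 4))                      # weakly labeled pool

teacher = GaussianProcessRegressor().fit(X_strong, y_strong)
y_teacher, y_std = teacher.predict(X_weak, return_std=True)
fidelity = np.exp(-y_std)                               # high uncertainty -> low fidelity weight

# Student update: a weighted least-squares fit standing in for a neural student.
sw = np.sqrt(fidelity)
w = np.linalg.lstsq(X_weak * sw[:, None], y_teacher * sw, rcond=None)[0]
```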

Ad-Hoc Information Retrieval Information Retrieval +1

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

no code implementations11 Aug 2017 Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury

The results suggest that our model outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.

Neural Networks for Information Retrieval

no code implementations13 Jul 2017 Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them.

Information Retrieval Retrieval

Neural Ranking Models with Weak Supervision

1 code implementation28 Apr 2017 Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections.
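
A hedged sketch of the weak-supervision setup described above: an unsupervised ranker (e.g. BM25) scores query-document pairs, and a neural ranker is trained with a pairwise objective to agree with those weak preferences. The tiny scoring network and margin loss are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# One training step on a batch of (query, doc) feature pairs where BM25 prefers doc_a.
feats_a, feats_b = torch.randn(16, 32), torch.randn(16, 32)   # stand-in joint features
target = torch.ones(16)                          # weak label: doc_a should outrank doc_b
loss = loss_fn(scorer(feats_a).squeeze(-1), scorer(feats_b).squeeze(-1), target)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```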

Ad-Hoc Information Retrieval Information Retrieval +1

On Horizontal and Vertical Separation in Hierarchical Text Classification

no code implementations2 Sep 2016 Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy.

General Classification text-classification +1
