Search Results for author: Sebastian Ruder

Found 59 papers, 38 papers with code

Multi-Domain Multilingual Question Answering

1 code implementation EMNLP (ACL) 2021 Sebastian Ruder, Avi Sil

Question answering (QA) is one of the most challenging and impactful tasks in natural language processing.

Cross-Lingual Transfer Domain Adaptation +2

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

no code implementations Findings (EMNLP) 2021 Alan Ansell, Edoardo Maria Ponti, Jonas Pfeiffer, Sebastian Ruder, Goran Glavaš, Ivan Vulić, Anna Korhonen

While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board.

Dependency Parsing Fine-tuning +4

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation

2 code implementations ICML 2020 Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing.

Zero-Shot Cross-Lingual Transfer

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

2 code implementations 22 Nov 2021 Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.

Denoising Multi-Task Learning

Balancing Average and Worst-case Accuracy in Multitask Learning

no code implementations 12 Oct 2021 Paul Michel, Sebastian Ruder, Dani Yogatama

When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e., the performance on the task with the lowest accuracy).
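
To make the distinction concrete, here is a minimal sketch that computes both aggregates over a set of hypothetical per-task accuracies (task names and numbers are illustrative only):

```python
# Hypothetical per-task accuracies; the average hides the weak outlier task.
task_accuracy = {
    "image_classification": 0.91,
    "language_modelling": 0.88,
    "ner": 0.91,
    "qa": 0.52,  # the hard or under-resourced task
}

average_acc = sum(task_accuracy.values()) / len(task_accuracy)
worst_case_acc = min(task_accuracy.values())

print(f"average accuracy:    {average_acc:.3f}")    # 0.805
print(f"worst-case accuracy: {worst_case_acc:.3f}")  # 0.520
```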

Image Classification Language Modelling

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

1 code implementation NeurIPS 2021 Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder

In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work.
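
The excerpt leaves the mechanism implicit: Compacter composes adapter weights from Kronecker products of small shared matrices and low-rank factors. The toy layer below sketches only that composition; dimensions, initialisation, and naming are illustrative and not taken from the paper's code:

```python
import torch
import torch.nn as nn

class LowRankKroneckerLinear(nn.Module):
    """Toy layer: W = sum_i A_i kron (s_i @ t_i), so only small factors are trained."""
    def __init__(self, in_dim=768, out_dim=768, n=4, rank=1):
        super().__init__()
        assert in_dim % n == 0 and out_dim % n == 0
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.02)               # n small shared matrices
        self.s = nn.Parameter(torch.randn(n, in_dim // n, rank) * 0.02)  # low-rank factors
        self.t = nn.Parameter(torch.randn(n, rank, out_dim // n) * 0.02)

    def forward(self, x):
        # Assemble the full (in_dim x out_dim) weight from Kronecker products of small factors.
        W = sum(torch.kron(self.A[i], self.s[i] @ self.t[i]) for i in range(self.A.shape[0]))
        return x @ W

layer = LowRankKroneckerLinear()
print(sum(p.numel() for p in layer.parameters()))  # ~1.6k trainable parameters vs. ~590k for a dense 768x768 weight
```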

Fine-tuning

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

1 code implementation ACL 2021 Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson

State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model.
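
For readers unfamiliar with such adapter modules, a minimal bottleneck adapter in PyTorch (dimensions are illustrative; this is the generic building block, not the paper's hypernetwork-based variant):

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted after a Transformer sub-layer; only these
    parameters are trained while the pretrained model's weights stay frozen."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```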

Domain Generalization Fine-tuning +2

BERT memorisation and pitfalls in low-resource scenarios

no code implementations 16 Apr 2021 Michael Tänzer, Sebastian Ruder, Marek Rei

State-of-the-art pre-trained models have been shown to memorise facts and perform well with limited amounts of training data.

Few-Shot Learning Low Resource Named Entity Recognition +1

Multi-view Subword Regularization

1 code implementation NAACL 2021 Xinyi Wang, Sebastian Ruder, Graham Neubig

Multilingual pretrained representations generally rely on subword segmentation algorithms to create a shared multilingual vocabulary.
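
The multi-view idea relies on the fact that a subword tokenizer can segment the same string in several ways. A hedged illustration with the sentencepiece package follows; the model file is hypothetical and the sampling arguments may differ across library versions:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="multilingual.model")  # hypothetical model file

text = "unbelievable"
# Deterministic best segmentation.
print(sp.encode(text, out_type=str))
# Sampled segmentations: each call may split the word differently.
for _ in range(3):
    print(sp.encode(text, out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1))
```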

Cross-Lingual Transfer Fine-tuning

Mind the Gap: Assessing Temporal Generalization in Neural Language Models

1 code implementation NeurIPS 2021 Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

Hence, given the compilation of ever-larger language modelling datasets, combined with the growing list of language-model-based NLP applications that require up-to-date factual knowledge about the world, we argue that now is the right time to rethink the static way in which we currently train and evaluate our language models, and develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world.

Language Modelling

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

1 code implementation ACL 2021 Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych

In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance.

UNKs Everywhere: Adapting Multilingual Language Models to New Scripts

1 code implementation EMNLP 2021 Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder

The ultimate challenge is dealing with under-resourced languages not covered at all by the models and written in scripts unseen during pretraining.

Cross-Lingual Transfer

Morphologically Aware Word-Level Translation

no code implementations COLING 2020 Paula Czarnowska, Sebastian Ruder, Ryan Cotterell, Ann Copestake

We propose a novel morphologically aware probability model for bilingual lexicon induction, which jointly models lexeme translation and inflectional morphology in a structured way.

Bilingual Lexicon Induction Translation

Long Range Arena: A Benchmark for Efficient Transformers

4 code implementations 8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Long-range modeling

AdapterHub: A Framework for Adapting Transformers

3 code implementations EMNLP 2020 Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych

We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages.
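
In practice, stitching in a pre-trained adapter takes only a few lines with the accompanying adapter-transformers library. The snippet below follows the general pattern of the library's quickstart, but the adapter identifier is illustrative and class/method names have changed across library versions, so treat it as a sketch:

```python
from transformers import AutoModelWithHeads, AutoTokenizer  # adapter-transformers fork of Transformers

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

# Download a task adapter from the hub and activate it; the base model stays unchanged.
adapter_name = model.load_adapter("sentiment/sst-2@ukp")  # illustrative adapter identifier
model.set_active_adapters(adapter_name)

inputs = tokenizer("AdapterHub makes adapter reuse easy.", return_tensors="pt")
outputs = model(**inputs)
```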

Fine-tuning

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

1 code implementation EMNLP 2020 Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder

The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer.

Ranked #2 on Cross-Lingual Transfer on XCOPA (using extra training data)

Cross-Lingual Transfer Named Entity Recognition +1

A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations ACL 2020 Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Translation Unsupervised Machine Translation +1

Are All Good Word Vector Spaces Isomorphic?

1 code implementation EMNLP 2020 Ivan Vulić, Sebastian Ruder, Anders Søgaard

Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic.

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

2 code implementations 24 Mar 2020 Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing.

Cross-Lingual Transfer

On the Cross-lingual Transferability of Monolingual Representations

5 code implementations ACL 2020 Mikel Artetxe, Sebastian Ruder, Dani Yogatama

This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.

Cross-Lingual Question Answering Language Modelling

What do Deep Networks Like to Read?

no code implementations 10 Sep 2019 Jonas Pfeiffer, Aishwarya Kamath, Iryna Gurevych, Sebastian Ruder

Recent research towards understanding neural networks probes models in a top-down manner, but is only able to identify model tendencies that are known a priori.

Fine-tuning

Unsupervised Cross-Lingual Representation Learning

no code implementations ACL 2019 Sebastian Ruder, Anders Søgaard, Ivan Vulić

In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations.

Representation Learning Structured Prediction

Episodic Memory in Lifelong Language Learning

1 code implementation NeurIPS 2019 Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.

Continual Learning General Classification +2

Transfer Learning in Natural Language Processing

no code implementations NAACL 2019 Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf

The classic supervised machine learning paradigm is based on learning in isolation, a single predictive model for a task using a single dataset.

Transfer Learning Word Embeddings

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

1 code implementation WS 2019 Matthew E. Peters, Sebastian Ruder, Noah A. Smith

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task.
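
The two regimes the title alludes to, feature extraction (no tuning of the pretrained weights) versus full fine-tuning, differ only in which parameters receive gradient updates. A minimal sketch with a generic Hugging Face encoder (model name and flag are illustrative):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

FEATURE_EXTRACTION = True  # flip to False for full fine-tuning

if FEATURE_EXTRACTION:
    # "Not to tune": freeze the pretrained encoder and train only a task head on top of it.
    for param in model.parameters():
        param.requires_grad = False
# "To tune": leave requires_grad=True everywhere and update all weights on the target task.
```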

Fine-tuning Transfer Learning

A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

1 code implementation 14 Nov 2018 Victor Sanh, Thomas Wolf, Sebastian Ruder

The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low level tasks at the bottom layers of the model and more complex tasks at the top layers of the model.
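
A toy rendering of that hierarchical supervision idea, with a low-level tagging head reading shallow layers and a more complex head reading deeper ones; the layer types, task heads, and pooling are illustrative, not the paper's architecture:

```python
import torch.nn as nn

class HierarchicalMTL(nn.Module):
    """Low-level task supervised at the bottom of the stack, complex task at the top."""
    def __init__(self, emb_dim=128, hidden=128, n_ner_tags=9, n_rel_labels=6):
        super().__init__()
        self.lower = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, n_ner_tags)    # low-level task, bottom layer
        self.rel_head = nn.Linear(2 * hidden, n_rel_labels)  # complex task, top layer

    def forward(self, embeddings):          # embeddings: (batch, seq_len, emb_dim)
        low, _ = self.lower(embeddings)
        high, _ = self.upper(low)
        return self.ner_head(low), self.rel_head(high.mean(dim=1))
```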

Ranked #9 on Relation Extraction on ACE 2005 (using extra training data)

Multi-Task Learning Named Entity Recognition +1

Off-the-Shelf Unsupervised NMT

no code implementations 6 Nov 2018 Chris Hokamp, Sebastian Ruder, John Glover

We frame unsupervised machine translation (MT) in the context of multi-task learning (MTL), combining insights from both directions.

Multi-Task Learning Translation +1

Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction

1 code implementation CONLL 2018 Yova Kementchedjhieva, Sebastian Ruder, Ryan Cotterell, Anders Søgaard

Most recent approaches to bilingual dictionary induction find a linear alignment between the word vector spaces of two languages.
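
The linear alignment used by most of these approaches is the orthogonal Procrustes solution, which has a closed form via SVD. A minimal NumPy sketch; the embedding matrices here are random placeholders standing in for the vectors of seed-dictionary word pairs:

```python
import numpy as np

# X: source-language embeddings, Y: target-language embeddings for the same
# seed-dictionary word pairs (both num_pairs x dim); random placeholders here.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 300))
Y = rng.standard_normal((5000, 300))

# Orthogonal Procrustes: minimise ||XW - Y||_F over orthogonal W, solved by the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

mapped = X @ W  # source vectors mapped into the target space
```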

360° Stance Detection

no code implementations NAACL 2018 Sebastian Ruder, John Glover, Afshin Mehrabani, Parsa Ghaffari

To ameliorate this, we propose 360° Stance Detection, a tool that aggregates news with multiple perspectives on a topic.

Stance Detection

On the Limitations of Unsupervised Bilingual Dictionary Induction

no code implementations ACL 2018 Anders Søgaard, Sebastian Ruder, Ivan Vulić

Unsupervised machine translation -- i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora -- seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model.

Graph Similarity Translation +1

Strong Baselines for Neural Semi-supervised Learning under Domain Shift

2 code implementations ACL 2018 Sebastian Ruder, Barbara Plank

In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training.
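
For context, classic tri-training keeps three classifiers and lets each one learn from unlabelled examples on which the other two agree. The sketch below is a simplified version of that loop (scikit-learn classifiers as stand-ins, no error-rate checks), not the paper's multi-task variant:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def tri_train(X_labeled, y_labeled, X_unlabeled, rounds=5):
    """Simplified tri-training over NumPy feature arrays."""
    # Three classifiers, each initialised on a different bootstrap sample.
    models = []
    for seed in range(3):
        Xb, yb = resample(X_labeled, y_labeled, random_state=seed)
        models.append(LogisticRegression(max_iter=1000).fit(Xb, yb))

    for _ in range(rounds):
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            pred_j = models[j].predict(X_unlabeled)
            pred_k = models[k].predict(X_unlabeled)
            agree = pred_j == pred_k
            if not agree.any():
                continue
            # Retrain model i on the labelled data plus the pseudo-labelled agreements.
            X_aug = np.concatenate([X_labeled, X_unlabeled[agree]])
            y_aug = np.concatenate([y_labeled, pred_j[agree]])
            models[i] = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return models
```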

Domain Adaptation Multi-Task Learning +2

360° Stance Detection

no code implementations 3 Apr 2018 Sebastian Ruder, John Glover, Afshin Mehrabani, Parsa Ghaffari

To ameliorate this, we propose 360° Stance Detection, a tool that aggregates news with multiple perspectives on a topic.

Stance Detection

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

1 code implementation NAACL 2018 Isabelle Augenstein, Sebastian Ruder, Anders Søgaard

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets.

General Classification Multi-Task Learning +1

Universal Language Model Fine-tuning for Text Classification

64 code implementations ACL 2018 Jeremy Howard, Sebastian Ruder

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Classification Fine-tuning +5

Learning to select data for transfer learning with Bayesian Optimization

1 code implementation EMNLP 2017 Sebastian Ruder, Barbara Plank

Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks.
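
One family of such measures compares term distributions between a source and a target domain. Below is a hedged sketch of Jensen-Shannon divergence over unigram distributions, one commonly used measure of this kind (tokenisation and smoothing choices are illustrative):

```python
from collections import Counter
import numpy as np

def unigram_dist(texts, vocab):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tok for text in texts for tok in text.split())
    freqs = np.array([counts[w] for w in vocab], dtype=float) + 1e-9
    return freqs / freqs.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence: lower means the two domains look more similar."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

source = ["the film was great", "a dull movie"]
target = ["battery life is great", "the screen is dull"]
vocab = sorted({tok for text in source + target for tok in text.split()})
print(js_divergence(unigram_dist(source, vocab), unigram_dist(target, vocab)))
```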

Curriculum Learning Part-Of-Speech Tagging +2

A Survey Of Cross-lingual Word Embedding Models

no code implementations 15 Jun 2017 Sebastian Ruder, Ivan Vulić, Anders Søgaard

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages.

Cross-Lingual Transfer Word Embeddings

An Overview of Multi-Task Learning in Deep Neural Networks

1 code implementation 15 Jun 2017 Sebastian Ruder

Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery.

Drug Discovery Multi-Task Learning +1

Latent Multi-task Architecture Learning

2 code implementations 23 May 2017 Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard

In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses.
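
Rather than fixing those choices by hand, the paper learns them. As a toy sketch of the underlying idea, here are two task-specific layers whose outputs are mixed by learnable weights (illustrative only, not the paper's sluice architecture):

```python
import torch
import torch.nn as nn

class SharedPair(nn.Module):
    """Two task-specific layers whose outputs are mixed by learnable alpha weights,
    so the amount of sharing is itself learned rather than hand-designed."""
    def __init__(self, dim=128):
        super().__init__()
        self.layer_a = nn.Linear(dim, dim)
        self.layer_b = nn.Linear(dim, dim)
        self.alpha = nn.Parameter(torch.eye(2))  # 2x2 mixing weights between the task columns

    def forward(self, x_a, x_b):
        h_a = torch.relu(self.layer_a(x_a))
        h_b = torch.relu(self.layer_b(x_b))
        out_a = self.alpha[0, 0] * h_a + self.alpha[0, 1] * h_b
        out_b = self.alpha[1, 0] * h_a + self.alpha[1, 1] * h_b
        return out_a, out_b
```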

Multi-Task Learning

Data Selection Strategies for Multi-Domain Sentiment Analysis

1 code implementation 8 Feb 2017 Sebastian Ruder, Parsa Ghaffari, John G. Breslin

However, the selection of appropriate training data is as important as the choice of algorithm.

Domain Adaptation Sentiment Analysis

Knowledge Adaptation: Teaching to Adapt

no code implementations 7 Feb 2017 Sebastian Ruder, Parsa Ghaffari, John G. Breslin

Domain adaptation is crucial in many real-world applications where the distribution of the training data differs from the distribution of the test data.

Knowledge Distillation Sentiment Analysis +1

Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution

3 code implementations 21 Sep 2016 Sebastian Ruder, Parsa Ghaffari, John G. Breslin

Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision.

Sentence Classification

An overview of gradient descent optimization algorithms

22 code implementations 15 Sep 2016 Sebastian Ruder

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by.
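
As a concrete instance of the algorithms the overview covers, here is the Adam update rule in a few lines of NumPy, using the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8); the toy usage below is illustrative:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, then the parameter step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimise f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)  # larger lr for this toy problem
print(theta)  # driven close to 0
```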
