XLM-R

91 papers with code • 0 benchmarks • 1 dataset

XLM-R (XLM-RoBERTa) is a transformer-based multilingual masked language model from Conneau et al. (2020), pre-trained on filtered CommonCrawl data covering roughly 100 languages and widely used as a backbone for cross-lingual transfer.

Most implemented papers

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

PaddlePaddle/PaddleNLP Findings (ACL) 2021

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.
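
As a rough illustration of what "self-attention relation distillation" means, the sketch below computes scaled dot-product relations within queries (or keys, or values) for a teacher and a student and matches their distributions with a KL term. Tensor shapes and the relation-head count here are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def relation_kl(teacher_x, student_x, num_relation_heads):
    """KL divergence between self-relation distributions (Q-Q, K-K or V-V)."""
    def relations(x):
        # x: [batch, seq_len, hidden] -> relation matrix [batch, heads, seq_len, seq_len]
        b, s, h = x.shape
        d = h // num_relation_heads
        x = x.view(b, s, num_relation_heads, d).transpose(1, 2)
        return (x @ x.transpose(-1, -2)) / d ** 0.5

    teacher_rel = F.softmax(relations(teacher_x), dim=-1)
    student_rel = F.log_softmax(relations(student_x), dim=-1)
    return F.kl_div(student_rel, teacher_rel, reduction="batchmean")
```

Because the relation matrices are seq_len × seq_len, teacher and student can have different hidden sizes; the full objective would sum the Q-Q, K-K and V-V terms for a chosen teacher/student layer pair.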

XeroAlign: Zero-Shot Cross-lingual Transformer Alignment

huawei-noah/noah-research Findings (ACL) 2021

The introduction of pretrained cross-lingual language models brought decisive improvements to multilingual NLP tasks.

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

microsoft/DeBERTa 18 Nov 2021

We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.
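
A simplified reading of "gradient-disentangled embedding sharing" is sketched below: the discriminator reuses the generator's token embedding through a stop-gradient plus its own learned delta, so discriminator gradients never reach the shared table. The module and method names are my own, not identifiers from the DeBERTa codebase.

```python
import torch
import torch.nn as nn

class DisentangledSharedEmbedding(nn.Module):
    def __init__(self, vocab_size, hidden):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, hidden)              # updated by the generator's MLM loss
        self.delta = nn.Parameter(torch.zeros(vocab_size, hidden))  # updated by the discriminator's RTD loss

    def generator_embed(self, input_ids):
        return self.shared(input_ids)

    def discriminator_embed(self, input_ids):
        # detach() keeps discriminator gradients out of the shared table,
        # avoiding the tug-of-war between the MLM and RTD objectives
        weight = self.shared.weight.detach() + self.delta
        return weight[input_ids]
```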

X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

zengyan-97/x2-vlm 22 Nov 2022

Vision language pre-training aims to learn alignments between vision and language from a large amount of data.
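
For intuition about what "learning alignments" means, here is a generic image-text contrastive loss of the kind used in vision-language pre-training. X²-VLM itself learns multi-grained alignments, so treat this as an illustration of the broad idea, not the paper's training objective.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: [batch, dim], paired row by row
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature           # [batch, batch]
    targets = torch.arange(logits.size(0), device=logits.device)
    # symmetric InfoNCE: match each image to its caption and vice versa
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```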

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

facebook/xlm-v-base 25 Jan 2023

Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.
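
A quick back-of-the-envelope calculation shows why a shared vocabulary becomes a bottleneck as it is stretched across many languages. The hidden size, vocabulary sizes, and the even split across 100 languages below are approximations for illustration only.

```python
hidden = 768                               # base-model hidden size (illustrative)
num_languages = 100
for vocab_size in (250_000, 1_000_000):    # roughly XLM-R-scale vs XLM-V-scale vocabularies
    embedding_params = vocab_size * hidden
    print(f"vocab={vocab_size:>9,}  "
          f"embedding params ≈ {embedding_params / 1e6:.0f}M  "
          f"tokens per language if split evenly ≈ {vocab_size // num_languages:,}")
```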

GreekBART: The First Pretrained Greek Sequence-to-Sequence Model

iakovosevdaimon/greekbart 3 Apr 2023

We also examine its performance on two NLG tasks from GreekSUM, a newly introduced summarization dataset for the Greek language.

DUMB: A Benchmark for Smart Evaluation of Dutch Models

wietsedv/dumb 22 May 2023

The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks.

FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models

konstantinjdobler/focus 23 May 2023

If we want to use a new tokenizer specialized for the target language, however, we cannot directly transfer the source model's embedding matrix.
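
The sketch below illustrates the general idea under assumed names: tokens shared by the old and new tokenizer copy their embeddings directly, and each new token is initialized as a similarity-weighted combination of the shared tokens. `aux_sim` is a hypothetical helper standing in for the auxiliary similarity computation, and the softmax normalization is illustrative rather than the paper's exact choice.

```python
import torch

def init_new_embeddings(old_emb, old_vocab, new_vocab, aux_sim):
    """old_emb: [len(old_vocab), hidden]; *_vocab: token -> row index;
    aux_sim(token, shared_tokens) -> similarity scores over shared tokens (assumed helper)."""
    hidden = old_emb.size(1)
    new_emb = torch.empty(len(new_vocab), hidden)
    shared = [t for t in new_vocab if t in old_vocab]
    shared_rows = torch.stack([old_emb[old_vocab[t]] for t in shared])
    for token, idx in new_vocab.items():
        if token in old_vocab:
            new_emb[idx] = old_emb[old_vocab[token]]                # overlapping token: copy directly
        else:
            weights = torch.softmax(aux_sim(token, shared), dim=0)  # normalization here is illustrative
            new_emb[idx] = weights @ shared_rows                    # similarity-weighted combination
    return new_emb
```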

PhoBERT: Pre-trained language models for Vietnamese

VinAIResearch/PhoBERT Findings (EMNLP) 2020

We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese.
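
A minimal usage sketch with Hugging Face transformers, assuming the published vinai/phobert-base checkpoint; PhoBERT expects word-segmented Vietnamese input, so this is a sketch rather than a full pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModel.from_pretrained("vinai/phobert-base")

# Input should already be word-segmented (e.g. with VnCoreNLP),
# so multi-syllable words are joined with underscores as in "sinh_viên".
inputs = tokenizer("Tôi là sinh_viên", return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state   # [1, seq_len, 768]
```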

Inducing Language-Agnostic Multilingual Representations

AIPHES/Language-Agnostic-Contextualized-Encoders Joint Conference on Lexical and Computational Semantics 2021

Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
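
One simple operation discussed in this line of work is removing the language-specific mean from contextual embeddings so that representations from different languages become more directly comparable. The snippet below is a generic sketch of that re-centering step, not the paper's full method.

```python
import torch

def center_per_language(embeddings):
    """embeddings: dict mapping a language code to a [num_sentences, dim] tensor."""
    return {lang: x - x.mean(dim=0, keepdim=True) for lang, x in embeddings.items()}
```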