Transfer Learning

1183 papers with code • 4 benchmarks • 6 datasets

Transfer learning is a methodology in which the weights of a model trained on one task are reused on another task, either (a) to construct a fixed feature extractor or (b) as a weight initialization that is subsequently fine-tuned.

(Image credit: Subodh Malgonde)
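
A minimal sketch of the two regimes named above, assuming a torchvision ResNet-18 pretrained on ImageNet; the layer names and the 10-class downstream task are illustrative, not tied to any paper on this page.

```python
# Sketch of transfer learning with a pretrained torchvision ResNet-18
# (assumes torchvision >= 0.13, where string weight ids are accepted).
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical downstream task size

# (a) Fixed feature extractor: freeze the pretrained backbone and
#     train only a newly added classification head.
feature_extractor = models.resnet18(weights="IMAGENET1K_V1")
for param in feature_extractor.parameters():
    param.requires_grad = False  # freeze pretrained weights
feature_extractor.fc = nn.Linear(feature_extractor.fc.in_features, num_classes)

# (b) Weight initialization / fine-tuning: start from the pretrained weights
#     and update every layer on the downstream task (often with a lower LR).
finetuned = models.resnet18(weights="IMAGENET1K_V1")
finetuned.fc = nn.Linear(finetuned.fc.in_features, num_classes)
# all parameters keep requires_grad=True, so the whole network is updated
```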

Greatest papers with code

Talking-Heads Attention

tensorflow/models 5 Mar 2020

We introduce "talking-heads attention" - a variation on multi-head attention which includes linearprojections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additionalcomputation, talking-heads attention leads to better perplexities on masked language modeling tasks, aswell as better quality when transfer-learning to language comprehension and question answering tasks.

Language Modelling Question Answering +1
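
A hedged sketch of the idea described in the abstract: multi-head attention with learned mixing across the heads dimension immediately before and after the softmax. Shapes and names are illustrative, and the simplification of using the same number of heads throughout is mine, not the paper's exact formulation.

```python
import torch

def talking_heads_attention(q, k, v, pre_softmax_proj, post_softmax_proj):
    """q, k, v: [batch, heads, seq, dim_per_head];
    pre_softmax_proj, post_softmax_proj: [heads, heads] mixing matrices."""
    d = q.size(-1)
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) / d ** 0.5
    # "talking" step 1: mix attention logits across heads before the softmax
    logits = torch.einsum("bhqk,hg->bgqk", logits, pre_softmax_proj)
    weights = logits.softmax(dim=-1)
    # "talking" step 2: mix attention weights across heads after the softmax
    weights = torch.einsum("bgqk,gh->bhqk", weights, post_softmax_proj)
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)

# toy usage
b, h, s, d = 2, 4, 8, 16
q = torch.randn(b, h, s, d); k = torch.randn(b, h, s, d); v = torch.randn(b, h, s, d)
out = talking_heads_attention(q, k, v, torch.randn(h, h), torch.randn(h, h))
print(out.shape)  # torch.Size([2, 4, 8, 16])
```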

BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

huggingface/transformers 23 Oct 2020

We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.

 Ranked #1 on Text Summarization on OrangeSum (using extra training data)

Natural Language Understanding Self-Supervised Learning +2
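
A hedged usage sketch for running BARThez through the transformers library for French abstractive summarization; the Hub checkpoint identifier below is an assumption and should be replaced with the name actually published by the authors.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "moussaKam/barthez-orangesum-abstract"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "Le Premier ministre a annoncé de nouvelles mesures économiques."
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=60, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```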

TAPAS: Weakly Supervised Table Parsing via Pre-training

huggingface/transformers ACL 2020

In this paper, we present TAPAS, an approach to question answering over tables without generating logical forms.

Question Answering Semantic Parsing +1
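
A hedged sketch of table question answering with a TAPAS checkpoint via the transformers pipeline: the model answers by selecting table cells (and optionally an aggregation operator) directly, with no intermediate logical form. The toy table is made up for illustration.

```python
from transformers import pipeline

table = {
    "Model": ["CamemBERT", "FlauBERT", "BARThez"],
    "Year": ["2019", "2020", "2020"],
}
qa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
print(qa(table=table, query="Which model was released in 2019?"))
```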

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

huggingface/transformers arXiv 2019

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

Common Sense Reasoning Question Answering +3
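
A minimal sketch of the unified text-to-text setup with a public T5 checkpoint: every task is framed as text in, text out, selected by a short task prefix. The prefixes shown are standard T5 conventions; prompts are illustrative.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The same pretrained model handles different tasks via different prefixes.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]
for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```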

HuggingFace's Transformers: State-of-the-art Natural Language Processing

huggingface/transformers 9 Oct 2019

Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.

Text Generation Transfer Learning
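
A minimal sketch of the library's high-level API: a pipeline downloads a pretrained checkpoint and applies it to a task in a few lines, which is how pretrained capacity gets reused across tasks in practice.

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # uses a default pretrained model
print(classifier("Transfer learning makes this model reusable across tasks."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```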

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

huggingface/transformers NeurIPS 2019

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.

Hate Speech Detection Knowledge Distillation +7
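
A generic knowledge-distillation loss sketch, not DistilBERT's exact training recipe (which also adds a masked-LM term and a cosine-embedding term): the student matches the teacher's temperature-softened output distribution while still fitting the hard labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft-target term: KL between softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to account for the temperature
    # hard-target term: ordinary cross entropy against the gold labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```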

Movement Pruning: Adaptive Sparsity by Fine-Tuning

huggingface/transformers NeurIPS 2020

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.

Network Pruning Transfer Learning
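
A hedged, simplified sketch of movement pruning on a single linear layer: importance scores are learned jointly with the weights during fine-tuning, only the top-scoring weights are kept, and gradients reach the scores through a straight-through estimator, so a weight's score grows when it moves away from zero. The class and hyperparameters here are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: pass the gradient to the scores unchanged
        return grad_output, None

class MovementPrunedLinear(nn.Module):
    def __init__(self, in_features, out_features, keep_ratio=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scores = nn.Parameter(torch.zeros(out_features, in_features))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.keep_ratio)
        return nn.functional.linear(x, self.weight * mask, self.bias)
```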