BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

EMNLP 2021 · Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis

Inductive transfer learning has taken the entire NLP field by storm, with models such as BERT and BART setting new state of the art on countless NLU tasks. However, most of the available models and research have been conducted for English. In this work, we introduce BARThez, the first large-scale pretrained seq2seq model for French. Being based on BART, BARThez is particularly well-suited for generative tasks. We evaluate BARThez on five discriminative tasks from the FLUE benchmark and two generative tasks from a novel summarization dataset, OrangeSum, that we created for this research. We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT. We also continue the pretraining of a multilingual BART on BARThez' corpus, and show our resulting model, mBARThez, to significantly boost BARThez' generative performance. Code, data and models are publicly available.

Code

moussaKam/BARThez official

huggingface/transformers (125,725 stars)

Tixierae/OrangeSum

moussaKam/OrangeSum

Tasks

FLUE

Natural Language Understanding

OrangeSum

Self-Supervised Learning

Text Summarization

Transfer Learning

Datasets

Introduced in the Paper:

OrangeSum

Used in the Paper:

GLUE

FLUE

Results from the Paper

Ranked #1 on Text Summarization on OrangeSum (using extra training data)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Text Summarization	OrangeSum	mBARThez (OrangeSum abstract)	ROUGE-1	32.67	# 1
Text Summarization	OrangeSum	BARThez (OrangeSum abstract)	ROUGE-1	31.44	# 2
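
The results above are ROUGE-1 scores, which measure unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that idea, not the scoring tool used in the paper: it assumes lowercased whitespace tokenization with no stemming, and the function name `rouge1` and the example sentences are hypothetical. Official ROUGE implementations differ in tokenization and preprocessing, so this sketch will not reproduce the numbers in the table.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token,
    # so a word repeated in the candidate is not credited more often
    # than it appears in the reference.
    overlap = sum((ref_counts & cand_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical French reference/candidate pair, for illustration only.
    scores = rouge1("le chat dort sur le canapé", "le chat dort")
    print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5. Scores on leaderboards such as the one above are typically reported on a 0 to 100 scale.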

Methods

Adam • Attention Dropout • BART • BERT • BPE • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • LSTM • mBARThez • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Seq2Seq • Sigmoid Activation • Softmax • Tanh Activation • Weight Decay • WordPiece
