Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models, in all languages except English, very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. We show that the use of web-crawled data is preferable to the use of Wikipedia data. More surprisingly, we show that a relatively small web-crawled dataset (4 GB) leads to results that are as good as those obtained using larger datasets (130+ GB). Our best performing model, CamemBERT, reaches or improves the state of the art in all four downstream tasks.
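As a quick illustration of the pretrained masked language model the abstract describes, the sketch below queries it through the Hugging Face `transformers` library, which hosts the publicly released `camembert-base` checkpoint. This is a minimal usage example, not the paper's own training or evaluation code.

```python
# Minimal sketch: querying the pretrained CamemBERT masked language model
# via the Hugging Face `transformers` pipeline API (assumes the publicly
# released `camembert-base` checkpoint; not the paper's own code).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="camembert-base")

# CamemBERT is a RoBERTa-style model, so masked positions use the <mask> token.
for prediction in fill_mask("Le camembert est <mask> :)"):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```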


Datasets


Introduced in the Paper:

French Wikipedia

Used in the Paper:

MultiNLI, XNLI, CCNet
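For reference, the French split of XNLI (used for the NLI evaluation below) is available through the Hugging Face `datasets` library; the sketch below assumes that package and the Hub-hosted `xnli` dataset, and is not the paper's own data pipeline.

```python
# Hedged sketch: loading the French XNLI split via Hugging Face `datasets`
# (assumes the `datasets` package; not the paper's own pipeline).
from datasets import load_dataset

xnli_fr = load_dataset("xnli", "fr")
example = xnli_fr["test"][0]
# Each example is a premise/hypothesis pair with an entailment label (0/1/2).
print(example["premise"], example["hypothesis"], example["label"])
```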

Results from the Paper


Task | Dataset | Model | Metric | Value | Global Rank
Dependency Parsing | French GSD | CamemBERT | LAS | 92.47 | #1
Dependency Parsing | French GSD | CamemBERT | UAS | 94.82 | #1
Part-Of-Speech Tagging | French GSD | CamemBERT | UPOS | 98.19 | #1
Named Entity Recognition (NER) | French Treebank | CamemBERT (subword masking) | F1 | 87.93 | #1
Named Entity Recognition (NER) | French Treebank | CamemBERT (subword masking) | Precision | 88.35 | #1
Named Entity Recognition (NER) | French Treebank | CamemBERT (subword masking) | Recall | 87.46 | #1
Dependency Parsing | ParTUT | CamemBERT | LAS | 92.9 | #1
Dependency Parsing | ParTUT | CamemBERT | UAS | 95.21 | #1
Part-Of-Speech Tagging | ParTUT | CamemBERT | UPOS | 97.63 | #1
Part-Of-Speech Tagging | Sequoia Treebank | CamemBERT | UPOS | 99.21 | #1
Dependency Parsing | Sequoia Treebank | CamemBERT | LAS | 94.39 | #1
Dependency Parsing | Sequoia Treebank | CamemBERT | UAS | 95.56 | #1
Part-Of-Speech Tagging | Spoken Corpus | CamemBERT | UPOS | 96.68 | #1
Dependency Parsing | Spoken Corpus | CamemBERT | LAS | 81.37 | #1
Dependency Parsing | Spoken Corpus | CamemBERT | UAS | 86.05 | #1
Natural Language Inference | XNLI French | CamemBERT | Accuracy | 81.2 | #2
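The downstream results above come from fine-tuning the pretrained model on each task. As a hedged sketch of that setup for the XNLI row, the snippet below attaches a three-way sequence classification head to `camembert-base` with `transformers`; the head is freshly initialized here (it would need fine-tuning to reproduce the reported accuracy), and nothing about the paper's actual hyperparameters is implied.

```python
# Hedged sketch: CamemBERT set up for XNLI-style natural language inference
# (entailment / neutral / contradiction). The classification head is newly
# initialized, so outputs are meaningless until fine-tuned; this only shows
# the model wiring, not the paper's training configuration.
import torch
from transformers import CamembertTokenizer, CamembertForSequenceClassification

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=3
)

premise = "Le chat dort sur le canapé."
hypothesis = "Un animal se repose."
# Premise/hypothesis pairs are encoded as a single sequence with separators.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (near-uniform before fine-tuning)
```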

Methods


No methods listed for this paper.