TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Sentiment Analysis	TweetEval	RoBERTa-Base	Emoji	30.9	# 3
Sentiment Analysis	TweetEval	RoBERTa-Base	Emotion	76.1	# 3
Sentiment Analysis	TweetEval	RoBERTa-Base	Hate	46.6	# 5
Sentiment Analysis	TweetEval	RoBERTa-Base	Irony	59.7	# 7
Sentiment Analysis	TweetEval	RoBERTa-Base	Offensive	79.5	# 2
Sentiment Analysis	TweetEval	RoBERTa-Base	Sentiment	71.3	# 3
Sentiment Analysis	TweetEval	RoBERTa-Base	Stance	68	# 3
Sentiment Analysis	TweetEval	RoBERTa-Base	ALL	61.3	# 3
Sentiment Analysis	TweetEval	SVM	Emoji	29.3	# 4
Sentiment Analysis	TweetEval	SVM	Emotion	64.7	# 7
Sentiment Analysis	TweetEval	SVM	Hate	36.7	# 6
Sentiment Analysis	TweetEval	SVM	Irony	61.7	# 5
Sentiment Analysis	TweetEval	SVM	Offensive	52.3	# 7
Sentiment Analysis	TweetEval	SVM	Sentiment	62.9	# 5
Sentiment Analysis	TweetEval	SVM	Stance	67.3	# 4
Sentiment Analysis	TweetEval	SVM	ALL	53.5	# 7
Sentiment Analysis	TweetEval	LSTM	Emoji	24.7	# 7
Sentiment Analysis	TweetEval	LSTM	Emotion	66.0	# 5
Sentiment Analysis	TweetEval	LSTM	Hate	52.6	# 1
Sentiment Analysis	TweetEval	LSTM	Irony	62.8	# 4
Sentiment Analysis	TweetEval	LSTM	Offensive	71.7	# 6
Sentiment Analysis	TweetEval	LSTM	Sentiment	58.3	# 7
Sentiment Analysis	TweetEval	LSTM	Stance	59.4	# 7
Sentiment Analysis	TweetEval	LSTM	ALL	56.5	# 6
Sentiment Analysis	TweetEval	FastText	Emoji	25.8	# 6
Sentiment Analysis	TweetEval	FastText	Emotion	65.2	# 6
Sentiment Analysis	TweetEval	FastText	Hate	50.6	# 3
Sentiment Analysis	TweetEval	FastText	Irony	63.1	# 3
Sentiment Analysis	TweetEval	FastText	Offensive	73.4	# 5
Sentiment Analysis	TweetEval	FastText	Sentiment	62.9	# 5
Sentiment Analysis	TweetEval	FastText	Stance	65.4	# 6
Sentiment Analysis	TweetEval	FastText	ALL	58.1	# 5
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Emoji	29.3	# 4
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Emotion	72.0	# 4
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Hate	49.9	# 4
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Irony	65.4	# 2
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Offensive	77.1	# 4
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Sentiment	69.1	# 4
Sentiment Analysis	TweetEval	RoBERTa-Twitter	Stance	66.7	# 5
Sentiment Analysis	TweetEval	RoBERTa-Twitter	ALL	61.0	# 4
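
The table above is easier to compare once it is pivoted by metric. The following is a minimal sketch, not part of the paper or its repository: it copies three of the eight metric columns from the rows above and reports the best-scoring model among the five listed here. The names `ROWS`, `pivot`, and `best_per_task` are hypothetical helpers introduced for illustration. The "Global Rank" column is computed against a wider leaderboard, so the local winner found here need not be ranked # 1 globally.

```python
from collections import defaultdict

# (model, metric name, metric value) triples copied from the table above.
# Only Hate, Irony and Offensive are included to keep the sketch short.
ROWS = [
    ("RoBERTa-Base", "Hate", 46.6),
    ("RoBERTa-Base", "Irony", 59.7),
    ("RoBERTa-Base", "Offensive", 79.5),
    ("SVM", "Hate", 36.7),
    ("SVM", "Irony", 61.7),
    ("SVM", "Offensive", 52.3),
    ("LSTM", "Hate", 52.6),
    ("LSTM", "Irony", 62.8),
    ("LSTM", "Offensive", 71.7),
    ("FastText", "Hate", 50.6),
    ("FastText", "Irony", 63.1),
    ("FastText", "Offensive", 73.4),
    ("RoBERTa-Twitter", "Hate", 49.9),
    ("RoBERTa-Twitter", "Irony", 65.4),
    ("RoBERTa-Twitter", "Offensive", 77.1),
]


def pivot(rows):
    """Group scores as {metric name: {model: value}}."""
    table = defaultdict(dict)
    for model, metric, value in rows:
        table[metric][model] = value
    return dict(table)


def best_per_task(rows):
    """Return {metric name: (best model, its value)} among the given rows."""
    return {
        metric: max(scores.items(), key=lambda item: item[1])
        for metric, scores in pivot(rows).items()
    }


if __name__ == "__main__":
    for metric, (model, value) in best_per_task(ROWS).items():
        print(f"{metric}: {model} ({value})")
```

On these rows, no single model wins every metric: the LSTM leads on Hate, RoBERTa-Twitter on Irony, and RoBERTa-Base on Offensive.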

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tweeteval-unified-benchmark-and-comparative/sentiment-analysis-on-tweeteval)](https://paperswithcode.com/sota/sentiment-analysis-on-tweeteval?p=tweeteval-unified-benchmark-and-comparative)`

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Findings of the Association for Computational Linguistics 2020 · Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. As a result, it is unclear what the current state of the art is, as there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.

Code

cardiffnlp/tweeteval (official, 341 stars)

jinhxu/how-much-hate-with-china

Tasks

Classification

General Classification

Language Modelling

Sentiment Analysis

Datasets

Introduced in the Paper:

TweetEval

Used in the Paper:

GLUE

SuperGLUE

Results from the Paper

Ranked #3 on Sentiment Analysis on TweetEval
