TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. As a result, it is unclear what the current state of the art is, since there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.

Findings of 2020

Datasets


Introduced in the Paper:

TweetEval

Used in the Paper:

GLUE
SuperGLUE

Results from the Paper


Task                Dataset    Model            Metric Name  Metric Value  Global Rank
Sentiment Analysis  TweetEval  RoBERTa-Base     Emoji        30.9          # 3
                                                Emotion      76.1          # 3
                                                Hate         46.6          # 5
                                                Irony        59.7          # 7
                                                Offensive    79.5          # 2
                                                Sentiment    71.3          # 3
                                                Stance       68            # 3
                                                ALL          61.3          # 3
Sentiment Analysis  TweetEval  SVM              Emoji        29.3          # 4
                                                Emotion      64.7          # 7
                                                Hate         36.7          # 6
                                                Irony        61.7          # 5
                                                Offensive    52.3          # 7
                                                Sentiment    62.9          # 5
                                                Stance       67.3          # 4
                                                ALL          53.5          # 7
Sentiment Analysis  TweetEval  LSTM             Emoji        24.7          # 7
                                                Emotion      66.0          # 5
                                                Hate         52.6          # 1
                                                Irony        62.8          # 4
                                                Offensive    71.7          # 6
                                                Sentiment    58.3          # 7
                                                Stance       59.4          # 7
                                                ALL          56.5          # 6
Sentiment Analysis  TweetEval  FastText         Emoji        25.8          # 6
                                                Emotion      65.2          # 6
                                                Hate         50.6          # 3
                                                Irony        63.1          # 3
                                                Offensive    73.4          # 5
                                                Sentiment    62.9          # 5
                                                Stance       65.4          # 6
                                                ALL          58.1          # 5
Sentiment Analysis  TweetEval  RoBERTa-Twitter  Emoji        29.3          # 4
                                                Emotion      72.0          # 4
                                                Hate         49.9          # 4
                                                Irony        65.4          # 2
                                                Offensive    77.1          # 4
                                                Sentiment    69.1          # 4
                                                Stance       66.7          # 5
                                                ALL          61.0          # 4
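
To make the comparison across the five listed models easier to read, the scores above can be loaded into a small lookup and queried per metric. This is only an illustrative sketch: the names SCORES and best_per_metric are hypothetical and do not come from the paper, and the code assumes that a higher metric value is better. It compares only the five models listed here, so its answers need not match the Global Rank column, which appears to rank against a wider leaderboard.

```python
# Metric values copied from the results listed above.
# SCORES and best_per_metric are illustrative names, not part of TweetEval.
SCORES = {
    "RoBERTa-Base": {
        "Emoji": 30.9, "Emotion": 76.1, "Hate": 46.6, "Irony": 59.7,
        "Offensive": 79.5, "Sentiment": 71.3, "Stance": 68.0, "ALL": 61.3,
    },
    "SVM": {
        "Emoji": 29.3, "Emotion": 64.7, "Hate": 36.7, "Irony": 61.7,
        "Offensive": 52.3, "Sentiment": 62.9, "Stance": 67.3, "ALL": 53.5,
    },
    "LSTM": {
        "Emoji": 24.7, "Emotion": 66.0, "Hate": 52.6, "Irony": 62.8,
        "Offensive": 71.7, "Sentiment": 58.3, "Stance": 59.4, "ALL": 56.5,
    },
    "FastText": {
        "Emoji": 25.8, "Emotion": 65.2, "Hate": 50.6, "Irony": 63.1,
        "Offensive": 73.4, "Sentiment": 62.9, "Stance": 65.4, "ALL": 58.1,
    },
    "RoBERTa-Twitter": {
        "Emoji": 29.3, "Emotion": 72.0, "Hate": 49.9, "Irony": 65.4,
        "Offensive": 77.1, "Sentiment": 69.1, "Stance": 66.7, "ALL": 61.0,
    },
}


def best_per_metric(scores):
    """Return, for each metric, the listed model with the highest value.

    Assumes higher is better for every metric (an assumption of this sketch).
    """
    metrics = next(iter(scores.values())).keys()
    return {
        metric: max(scores, key=lambda model: scores[model][metric])
        for metric in metrics
    }


if __name__ == "__main__":
    for metric, model in best_per_metric(SCORES).items():
        print(f"{metric}: {model} ({SCORES[model][metric]})")
```

Among these five models, RoBERTa-Base has the highest value on most metrics, while LSTM leads on Hate and RoBERTa-Twitter leads on Irony.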
