TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Spam detection	Traditional and Context-specific Spam Twitter	BERT	Avg F1	0.8553	# 1
Context-specific Spam Detection	Traditional and Context-specific Spam Twitter	BERT	Avg F1	0.8408	# 1
Traditional Spam Detection	Traditional and Context-specific Spam Twitter	BERT	Avg F1	0.9079	# 1

Traditional and context-specific spam detection in low resource settings

Machine Learning 2022 · Kornraphop Kawintiranon, Lisa Singh

Social media data has a mix of high- and low-quality content. One commonly studied form of low-quality content is spam. Most studies assume that spam is context-neutral. We show on different Twitter data sets that context-specific spam exists and is identifiable. We then compare multiple traditional machine learning models and a neural network model that uses a pre-trained BERT language model to capture contextual features for identifying spam, both traditional and context-specific, using only content-based features. The neural network model outperforms the traditional models with an F1 score of 0.91. Because spam training data sets are notoriously imbalanced, we also investigate the impact of this imbalance. We show that simple Bag-of-Words models are best under extreme imbalance, while a neural model fine-tuned from language models trained on other domains significantly improves the F1 score, though not to the level of domain-specific neural models. This suggests that the best strategy may vary with the level of imbalance in the data set, the amount of data available in a low-resource setting, and the prevalence of context-specific spam vs. traditional spam. Finally, we make our data sets available for use by the research community.
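
The abstract reports F1 rather than accuracy and stresses that spam data sets are imbalanced. The sketch below illustrates why that matters. It is not the authors' evaluation code: the labels, the class ratio, and the helper names `f1_for_class` and `accuracy` are all made up for illustration, and the page does not say how its "Avg F1" metric is averaged.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 95 non-spam ("ham") tweets and 5 spam tweets.
y_true = ["ham"] * 95 + ["spam"] * 5

# A degenerate classifier that always predicts the majority class.
always_ham = ["ham"] * 100

# A classifier that raises 2 false alarms but catches 4 of the 5 spam tweets.
better = ["ham"] * 93 + ["spam"] * 2 + ["spam"] * 4 + ["ham"] * 1

for name, y_pred in [("always_ham", always_ham), ("better", better)]:
    print(name,
          "accuracy:", round(accuracy(y_true, y_pred), 4),
          "spam F1:", round(f1_for_class(y_true, y_pred, "spam"), 4))
```

In this toy setup the majority-class predictor scores high accuracy while its spam F1 is zero, which is the kind of gap that makes F1 the more informative metric when spam is rare.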

Code

GU-DataLab/context-spam

Tasks

Context-specific Spam Detection

Language Modelling

Spam detection

Traditional Spam Detection

Datasets

Introduced in the Paper:

Traditional and Context-specific Spam Twitter

Results from the Paper

Ranked #1 on Spam detection on Traditional and Context-specific Spam Twitter (using extra training data)

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
