Yelp-Fraud (Multi-relational Graph Dataset for Yelp Spam Review Detection)

Introduced by Dou et al. in Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset. It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

  • Dataset Statistics

# Nodes     % Fraud Nodes (Class=1)
45,954      14.5

Relation    # Edges
R-U-R       49,315
R-T-R       573,616
R-S-R       3,402,743
All         3,846,979

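As a quick sanity check on the statistics above, the figures can be combined arithmetically. The sketch below only uses the numbers from the table; the variable names are illustrative. The fraud node count it derives is approximate, because the percentage is reported to one decimal place. The per-relation counts add up to more than the "All" row; one possible reading, which the source does not state, is that an edge appearing under several relations is counted once in "All".

```python
# Figures copied from the Dataset Statistics table.
num_nodes = 45_954
fraud_pct = 14.5
edges = {"R-U-R": 49_315, "R-T-R": 573_616, "R-S-R": 3_402_743}
all_edges = 3_846_979

# Approximate only: the percentage is rounded to one decimal place.
approx_fraud_nodes = round(num_nodes * fraud_pct / 100)

# Sum of the per-relation counts versus the reported "All" row.
relation_sum = sum(edges.values())
difference = relation_sum - all_edges

print(approx_fraud_nodes)  # roughly 6.7k fraud nodes
print(relation_sum, difference)
```
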
  • Graph Construction

The Yelp spam review dataset includes hotel and restaurant reviews that Yelp has either filtered (spam) or recommended (legitimate). We conduct spam review detection on the Yelp-Fraud dataset as a binary classification task. We take the 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Previous studies show that opinion fraudsters are connected through user, product, review text, and time, so we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects reviews posted by the same user; 2) R-S-R: it connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R: it connects two reviews under the same product posted in the same month.
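The three relations can be illustrated with a minimal sketch. This is not the authors' construction code: the function name `build_relations`, the record keys (`user`, `product`, `stars`, `month`), and the toy reviews are all hypothetical, and "same month" is assumed here to mean the same year-month string. Each relation groups reviews by a key and connects every pair of reviews within a group.

```python
from collections import defaultdict
from itertools import combinations


def build_relations(reviews):
    """Return {relation name: set of (i, j) review-index pairs with i < j}.

    `reviews` is a list of dicts with hypothetical keys:
    user, product, stars, month.
    """
    # Grouping key per relation, following the definitions in the text.
    key_funcs = {
        "R-U-R": lambda r: r["user"],                   # same user
        "R-S-R": lambda r: (r["product"], r["stars"]),  # same product, same rating
        "R-T-R": lambda r: (r["product"], r["month"]),  # same product, same month
    }
    relations = {}
    for name, key in key_funcs.items():
        groups = defaultdict(list)
        for idx, review in enumerate(reviews):
            groups[key(review)].append(idx)
        pairs = set()
        for members in groups.values():
            # Connect every pair of reviews that share the key.
            pairs.update(combinations(members, 2))
        relations[name] = pairs
    return relations


# Toy data, invented purely for illustration.
reviews = [
    {"user": "u1", "product": "p1", "stars": 5, "month": "2012-03"},
    {"user": "u1", "product": "p2", "stars": 1, "month": "2012-04"},
    {"user": "u2", "product": "p1", "stars": 5, "month": "2012-07"},
    {"user": "u3", "product": "p1", "stars": 2, "month": "2012-03"},
]
relations = build_relations(reviews)
for name, pairs in relations.items():
    print(name, sorted(pairs))
```

Connecting all pairs within a group grows quadratically with group size, so popular products with many reviews sharing a rating contribute a large number of pairs under this scheme.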

To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.