Yelp-Fraud Dataset | Papers With Code

Yelp-Fraud is a multi-relational graph dataset built upon the [Yelp spam review dataset](http://odds.cs.stonybrook.edu/yelpchi-dataset/). It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

- **Dataset Statistics**

| # Nodes | % Fraud Nodes (Class=1) |
|---------|-------------------------|
| 45,954  | 14.5                    |

| Relation | # Edges   |
|----------|-----------|
| R-U-R    | 49,315    |
| R-T-R    | 573,616   |
| R-S-R    | 3,402,743 |
| All      | 3,846,979 |
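
As a quick sanity check on the tables above, the following sketch derives two quantities from the published numbers. It rests on two assumptions that the description does not state explicitly: that the fraud percentage is rounded (so the node count is only approximate), and that the "All" row counts each edge once even when it appears under several relations.

```python
# Figures copied from the statistics tables.
num_nodes = 45_954
fraud_pct = 14.5  # rounded percentage, so the derived count is approximate

edges = {"R-U-R": 49_315, "R-T-R": 573_616, "R-S-R": 3_402_743}
all_edges = 3_846_979

# Approximate number of fraud (Class=1) nodes.
approx_fraud_nodes = round(num_nodes * fraud_pct / 100)

# Sum of the per-relation counts versus the "All" row.
relation_sum = sum(edges.values())

# Assumption: the gap is made up of edges shared by more than one relation.
overlap = relation_sum - all_edges

print(approx_fraud_nodes, relation_sum, overlap)
```

The per-relation counts add up to more than the "All" row, which is consistent with some review pairs being linked under more than one relation.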

- **Graph Construction**

The Yelp spam review dataset includes hotel and restaurant reviews that Yelp has either filtered (spam) or recommended (legitimate). Spam review detection on Yelp-Fraud is therefore a binary classification task. We take the 32 handcrafted features from the [SpEagle](http://shebuti.com/wp-content/uploads/2016/06/15-kdd-collectiveopinionspam.pdf) paper as the raw node features for Yelp-Fraud. Previous studies show that opinion fraudsters are connected through shared users, products, review text, and posting time, so we take reviews as nodes in the graph and design three relations: **1) R-U-R:** connects reviews posted by the same user; **2) R-S-R:** connects reviews of the same product with the same star rating (1-5 stars); **3) R-T-R:** connects two reviews of the same product posted in the same month.
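
The three relations can be illustrated with a minimal sketch. This is not the authors' construction code: the `Review` record, its field names, and the toy reviews are hypothetical, and "same month" is assumed to mean the same calendar month of the same year.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Review:
    # Hypothetical record layout; the raw Yelp data may use different fields.
    idx: int      # node index of the review
    user: str
    product: str
    stars: int    # star rating, 1-5
    year: int
    month: int


def edges_by_key(reviews, key):
    """Connect every pair of reviews that share the same grouping key."""
    groups = defaultdict(list)
    for r in reviews:
        groups[key(r)].append(r.idx)
    edges = set()
    for members in groups.values():
        # Undirected edges stored once as (smaller index, larger index).
        edges.update(combinations(sorted(members), 2))
    return edges


def build_relations(reviews):
    return {
        # Same user.
        "R-U-R": edges_by_key(reviews, lambda r: r.user),
        # Same product and same star rating.
        "R-S-R": edges_by_key(reviews, lambda r: (r.product, r.stars)),
        # Same product and same month (assumed: same year as well).
        "R-T-R": edges_by_key(reviews, lambda r: (r.product, r.year, r.month)),
    }


# Toy data, invented for illustration only.
reviews = [
    Review(0, "alice", "hotel_a", 5, 2012, 3),
    Review(1, "alice", "cafe_b", 1, 2012, 4),
    Review(2, "bob", "hotel_a", 5, 2012, 3),
    Review(3, "carol", "hotel_a", 2, 2012, 3),
]
relations = build_relations(reviews)
all_edges = set().union(*relations.values())
```

In this toy example, reviews 0 and 2 are linked under both R-S-R and R-T-R, so the union of all relations holds fewer edges than the per-relation counts added together. Pairwise grouping is quadratic in group size, so a real implementation over large groups would likely build sparse adjacency matrices instead.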

To download the dataset, please visit [this](https://github.com/YingtongDou/CARE-GNN) GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.


---

| Task | Dataset Variant | Best Model |
|------|-----------------|------------|
| Node Classification | Yelp-Fraud | GTAN |
| Fraud Detection | Yelp-Fraud | GTAN |

Yelp-Fraud (Multi-relational Graph Dataset for Yelp Spam Review Detection)


Similar Datasets

Yelp

FDCompCN

Amazon-Fraud
