Amazon-Fraud Dataset | Papers With Code

Amazon-Fraud is a multi-relational graph dataset built upon the [Amazon review dataset](https://jmcauley.ucsd.edu/data/amazon/), which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

- **Dataset Statistics**

| # Nodes | % Fraud Nodes (Class=1) |
|---------|-------------------------|
| 11,944  | 9.5                     |

| Relation | # Edges   |
|----------|-----------|
| U-P-U    | 175,608   |
| U-S-U    | 3,566,479 |
| U-V-U    | 1,036,737 |
| All      | 4,398,392 |

- **Graph Construction**

The Amazon dataset includes product reviews under the Musical Instruments category. Following this [paper](https://arxiv.org/abs/2005.10150), we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. Fraudulent user detection on the Amazon-Fraud dataset is therefore a binary classification task. We take the 25 handcrafted features from the same [paper](https://arxiv.org/abs/2005.10150) as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: **1) U-P-U:** it connects users who have reviewed at least one product in common; **2) U-S-U:** it connects users who have given at least one identical star rating within one week; **3) U-V-U:** it connects users whose mutual review text similarity (measured by TF-IDF) is in the top 5% among all users.
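The labeling rule and the first two relations can be sketched on toy data as follows. This is a minimal illustration, not the authors' preprocessing code: the review tuples, the vote counts, the function names, the choice to leave users with no votes unlabeled, and the reading of "within one week" as at most 7 days apart are all assumptions made for the example. The U-V-U relation is omitted because it needs a TF-IDF similarity computation over review text.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical toy reviews: (user, product, stars, review_date).
REVIEWS = [
    ("u1", "p1", 5, date(2014, 1, 1)),
    ("u2", "p1", 4, date(2014, 1, 4)),
    ("u3", "p2", 5, date(2014, 1, 20)),
    ("u3", "p3", 1, date(2014, 2, 1)),
    ("u4", "p4", 1, date(2014, 2, 5)),
]

# Hypothetical helpfulness votes per user: (helpful_votes, total_votes).
VOTES = {"u1": (9, 10), "u2": (1, 10), "u3": (5, 10), "u4": (0, 0)}


def label_user(helpful, total, benign_above=0.8, fraud_below=0.2):
    """Return 0 (benign), 1 (fraudulent), or None (no label assigned)."""
    if total == 0:
        return None  # assumption: users without votes stay unlabeled
    ratio = helpful / total
    if ratio > benign_above:
        return 0
    if ratio < fraud_below:
        return 1
    return None  # between the two thresholds: no label


def upu_edges(reviews):
    """U-P-U: connect users who reviewed at least one product in common."""
    users_by_product = defaultdict(set)
    for user, product, _stars, _day in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        edges.update(combinations(sorted(users), 2))
    return edges


def usu_edges(reviews, window_days=7):
    """U-S-U: connect users who gave the same star rating within the window."""
    edges = set()
    for (ua, _pa, sa, da), (ub, _pb, sb, db) in combinations(reviews, 2):
        if ua != ub and sa == sb and abs((da - db).days) <= window_days:
            edges.add(tuple(sorted((ua, ub))))
    return edges


labels = {user: label_user(*votes) for user, votes in VOTES.items()}
print(labels)
print(upu_edges(REVIEWS))
print(usu_edges(REVIEWS))
```

Edges are stored as sorted user pairs so that each undirected edge appears once. The pairwise loop in `usu_edges` is quadratic in the number of reviews, which is fine for a toy example but would need bucketing by star rating and date for a dataset of this size.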

To download the dataset, please visit [this GitHub repo](https://github.com/YingtongDou/CARE-GNN). For any other questions, please email ytongdou(AT)gmail.com.

---

| Task | Dataset Variant | Best Model |
|------|-----------------|------------|
| Node Classification | Amazon-Fraud | GTAN |
| Fraud Detection | Amazon-Fraud | GTAN |

Amazon-Fraud (Multi-relational Graph Dataset for Amazon Fraudulent Account Detection)

Similar Datasets

Yelp

Yelp-Fraud
