Amazon-Fraud (Multi-relational Graph Dataset for Amazon Fraudulent Account Detection)

Introduced by Dou et al. in Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset. It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

  • Dataset Statistics
# Nodes     % Fraud Nodes (Class=1)
11,944      9.5

Relation    # Edges
U-P-U       175,608
U-S-U       3,566,479
U-V-U       1,036,737
All         4,398,392
  • Graph Construction

The Amazon dataset includes product reviews under the Musical Instruments category. Following prior work, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from prior work as the raw node features for Amazon-Fraud.

We take users as nodes in the graph and design three relations:
  1) U-P-U: it connects users reviewing at least one same product;
  2) U-S-U: it connects users having at least one same star rating within one week;
  3) U-V-U: it connects users with top 5% mutual review text similarities (measured by TF-IDF) among all users.
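The labeling rule and the first two relations can be illustrated with a minimal Python sketch. This is not the authors' construction code. The function names, the `(user, product, stars, day)` review tuple layout, and the reading of "within one week" as a gap of at most 7 days are all assumptions made for illustration. The U-V-U relation is left out because it needs a TF-IDF similarity step.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 0 (benign), 1 (fraudulent), or None (unlabeled).

    Hypothetical helper: applies the 80% / 20% helpful-vote thresholds
    described in the text. Users in between are left unlabeled here.
    """
    if total_votes == 0:
        return None
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0
    if ratio < 0.2:
        return 1
    return None


def build_upu_edges(reviews):
    """U-P-U: connect users who reviewed at least one common product."""
    users_by_product = defaultdict(set)
    for user, product, _stars, _day in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        # Every unordered pair of users sharing this product is an edge.
        edges.update(combinations(sorted(users), 2))
    return edges


def build_usu_edges(reviews, window_days=7):
    """U-S-U: connect users who gave the same star rating within a week.

    Assumption: "within one week" means the two review days differ by
    at most `window_days`.
    """
    by_stars = defaultdict(list)
    for user, _product, stars, day in reviews:
        by_stars[stars].append((day, user))
    edges = set()
    for entries in by_stars.values():
        entries.sort()  # sort by day so the inner scan can stop early
        for i, (day_i, user_i) in enumerate(entries):
            for day_j, user_j in entries[i + 1:]:
                if day_j - day_i > window_days:
                    break
                if user_i != user_j:
                    edges.add(tuple(sorted((user_i, user_j))))
    return edges


# Toy reviews: (user, product, stars, day). Made-up data, not from Amazon-Fraud.
sample_reviews = [
    ("u1", "p1", 5, 0),
    ("u2", "p1", 5, 3),
    ("u3", "p2", 5, 20),
    ("u4", "p2", 1, 21),
]
upu_edges = build_upu_edges(sample_reviews)
usu_edges = build_usu_edges(sample_reviews)
```

In the toy data, u1 and u2 are linked under both relations, while u3 and u4 share a product but gave different star ratings, so they are linked only under U-P-U.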

To download the dataset, please visit the project's GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.
