A Comparison Study of Credit Card Fraud Detection: Supervised versus Unsupervised

24 Apr 2019 · Xuetong Niu, Li Wang, Xulei Yang

Credit cards have become a popular mode of payment for both online and offline purchases, which has led to a growing number of fraudulent transactions every day. An efficient fraud detection methodology is therefore essential to maintain the reliability of the payment system. In this study, we compare supervised and unsupervised approaches to credit card fraud detection. Specifically, we explore 6 supervised classification models, i.e., Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGB), as well as 4 unsupervised anomaly detection models, i.e., One-Class SVM (OCSVM), Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), and Generative Adversarial Networks (GAN). We train all these models on a public credit card transaction dataset from the Kaggle website, which contains 492 frauds out of 284,807 transactions. Transaction labels are used only when training the supervised models. Each model is evaluated with 5-fold cross-validation using the Area Under the Receiver Operating Characteristic curve (AUROC). Among the supervised approaches, XGB and RF perform best, with AUROC = 0.989 and AUROC = 0.988, respectively. Among the unsupervised approaches, RBM performs best with AUROC = 0.961, followed by GAN with AUROC = 0.954. The experimental results show that supervised models perform slightly better than unsupervised models in this study. Nevertheless, unsupervised approaches remain promising for detecting fraudulent credit card transactions, given insufficient annotation and severe data imbalance in real-world applications.
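The evaluation protocol above (5-fold cross-validation scored by AUROC, with labels withheld from the unsupervised models) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions, not the authors' setup: the data is synthetic Gaussian noise standing in for the Kaggle dataset, the folds are stratified (the abstract only says 5-fold), and the anomaly scorer is a simple per-feature z-score distance, not OCSVM, AE, RBM, or GAN. All function names are hypothetical. AUROC is computed directly as the probability that a randomly chosen fraud scores higher than a randomly chosen legitimate transaction.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC = P(score of a random fraud > score of a random normal); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, spreading each class evenly (assumed stratification)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_zscore_scorer(train_rows: Sequence[Row]) -> Callable[[Row], float]:
    """Hypothetical stand-in anomaly detector: fitted WITHOUT labels.

    The score is the sum of squared per-feature z-scores, so rows far
    from the training mean get higher (more anomalous) scores.
    """
    cols = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return lambda row: sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds))


def cross_validated_auroc(rows: Sequence[Row], labels: Sequence[int],
                          k: int = 5, seed: int = 0) -> List[float]:
    """Per-fold AUROC; labels are used only to score the held-out fold."""
    results = []
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in test_set]
        score = fit_zscore_scorer(train_rows)
        results.append(auroc([labels[i] for i in test_idx],
                             [score(rows[i]) for i in test_idx]))
    return results


def make_synthetic(n_normal: int = 1000, n_fraud: int = 20, dim: int = 3,
                   seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Synthetic, heavily imbalanced data: frauds are shifted away from normals."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_normal)]
    rows += [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(n_fraud)]
    return rows, [0] * n_normal + [1] * n_fraud


if __name__ == "__main__":
    rows, labels = make_synthetic()
    fold_aurocs = cross_validated_auroc(rows, labels)
    print([round(a, 3) for a in fold_aurocs], round(statistics.fmean(fold_aurocs), 3))
```

The pairwise AUROC loop costs O(P·N) and is meant only to make the definition explicit. On a dataset the size of the Kaggle one, a rank-based computation such as scikit-learn's `roc_auc_score` would be the practical choice. Stratifying the folds keeps a few frauds in every held-out fold, which matters when frauds are under 0.2% of the data.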
