Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings

25 Oct 2022 · Ajay Chawda, Stefanie Grimm, Marius Kloft

In this paper, we introduce the Vehicle Claims dataset, consisting of fraudulent insurance claims for automotive repairs. The data belongs to the broader category of Auditing data, which also includes Journals and Network Intrusion data. Insurance claim data differ from other auditing data (such as network intrusion data) in their high number of categorical attributes. We tackle the common problem of missing benchmark datasets for anomaly detection: datasets are mostly confidential, and the public tabular datasets do not contain relevant and sufficient categorical attributes. Therefore, a large dataset is created for this purpose and referred to as the Vehicle Claims (VC) dataset. The dataset is evaluated on shallow and deep learning methods. Due to the introduction of categorical attributes, we encounter the challenge of encoding them for the large dataset. As One Hot encoding of high-cardinality data invokes the "curse of dimensionality", we experiment with GEL encoding and an embedding layer for representing categorical attributes. Our work compares competitive learning, reconstruction-error, density estimation and contrastive learning approaches for Label, One Hot, GEL encoding and embedding layer to handle categorical values.
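The dimensionality concern raised in the abstract can be illustrated with a minimal sketch in plain Python. It contrasts Label encoding (one integer per value) with One Hot encoding (one column per distinct category). The column `makers` and the helper names `label_encode` and `one_hot_encode` are hypothetical and are not taken from the paper or the Vehicle Claims dataset; GEL encoding and embedding layers are not shown.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per distinct category."""
    encoded, codes = label_encode(values)
    width = len(codes)
    return [[1 if position == code else 0 for position in range(width)]
            for code in encoded]


# Hypothetical toy column; a real claims dataset may have hundreds of categories.
makers = ["Audi", "BMW", "Audi", "Ford", "Volvo", "BMW"]

labels, mapping = label_encode(makers)
one_hot = one_hot_encode(makers)

label_width = 1                   # Label encoding keeps a single column
one_hot_width = len(one_hot[0])   # One Hot adds one column per distinct category

print("label codes:", labels)
print("columns after Label encoding:", label_width)
print("columns after One Hot encoding:", one_hot_width)
```

In this sketch the One Hot width grows linearly with the number of distinct categories, which is the effect that motivates the alternative encodings mentioned in the abstract. Label encoding keeps the width fixed but imposes an arbitrary ordering on the categories.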

Datasets


Introduced in the Paper:

Vehicle Claims

Results from the Paper


Task                            Dataset         Model                              Metric Name  Metric Value  Global Rank
Anomaly Detection               Vehicle Claims  Random Forest                      AUC          98.65         # 1
Anomaly Detection               Vehicle Claims  Gradient Boosting                  AUC          95.88         # 2
Unsupervised Anomaly Detection  Vehicle Claims  Isolation Forest                   AUC          59.42         # 2
Unsupervised Anomaly Detection  Vehicle Claims  One Class Support Vector Machines  AUC          51.68         # 8
Unsupervised Anomaly Detection  Vehicle Claims  Local Outlier Factor               AUC          52.86         # 7
Unsupervised Anomaly Detection  Vehicle Claims  Latent Outlier Exposure            AUC          58.59         # 3
Unsupervised Anomaly Detection  Vehicle Claims  NeuTraL-AD                         AUC          57.03         # 4
Unsupervised Anomaly Detection  Vehicle Claims  SOM                                AUC          65.43         # 1
Unsupervised Anomaly Detection  Vehicle Claims  SOM-DAGMM                          AUC          53.82         # 6
Unsupervised Anomaly Detection  Vehicle Claims  DAGMM                              AUC          51.22         # 9
Unsupervised Anomaly Detection  Vehicle Claims  RSRAE                              AUC          55.38         # 5
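
The AUC metric in the results above can be read as the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal record. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical, not the paper's evaluation code or data, and the conversion to a percentage assumes the table reports AUC on a 0 to 100 scale.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomaly, 0 for normal. scores: higher means more anomalous.
    Tied pairs count as half a correct ranking.
    """
    anomalies = [s for y, s in zip(labels, scores) if y == 1]
    normals = [s for y, s in zip(labels, scores) if y == 0]
    if not anomalies or not normals:
        raise ValueError("AUC needs at least one anomaly and one normal record")

    correct = 0.0
    for a in anomalies:
        for n in normals:
            if a > n:
                correct += 1.0
            elif a == n:
                correct += 0.5
    return correct / (len(anomalies) * len(normals))


# Hypothetical toy data: four normal records, two anomalies.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = roc_auc(labels, scores)
auc_percent = 100 * auc  # assumed to match the scale used in the table

print(f"AUC = {auc:.3f} ({auc_percent:.1f}%)")
```

Under this reading, a score near 50 corresponds to a ranking no better than chance. The quadratic loop is only meant to make the definition explicit; library implementations typically use a sort-based computation instead.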

Methods