Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection

25 Jan 2016 • Markus Schneider • Wolfgang Ertel • Fabio Ramos

We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution.
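
To make the inner-product formulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian RBF kernel approximated with random Fourier features, so that the mean embedding of the regular data becomes an ordinary vector and the score of a query point is a dot product with it. The names `make_feature_map` and `expose_score`, the parameters `gamma` and `n_features`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def make_feature_map(dim, n_features=2000, gamma=0.5, seed=0):
    """Return phi with phi(x) . phi(y) ~= exp(-gamma * ||x - y||^2).

    Assumption: random Fourier features for a Gaussian RBF kernel;
    the abstract does not fix a kernel or an approximation scheme.
    """
    rng = np.random.default_rng(seed)
    # Frequencies are drawn from the kernel's spectral density N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        X = np.atleast_2d(X)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return phi


# Hypothetical "regular" data: 5000 points from a 2-D standard normal.
rng = np.random.default_rng(1)
X_regular = rng.normal(size=(5000, 2))

phi = make_feature_map(dim=2)

# Empirical mean embedding of the regular data: the average feature vector.
mu = phi(X_regular).mean(axis=0)


def expose_score(z):
    """Inner product of phi(z) with the mean embedding; low values suggest anomalies."""
    return phi(z) @ mu


score_typical = expose_score(np.zeros(2))[0]      # query at the centre of the data
score_outlier = expose_score(np.full(2, 6.0))[0]  # query far from the data

print(f"typical: {score_typical:.3f}, outlier: {score_outlier:.3f}")
```

In this sketch, scoring a query costs one feature evaluation and one dot product, independent of the number of training points. Because `mu` is a plain average of feature vectors, it could also be maintained as a running mean as new points arrive, which is one way to read the streaming setting mentioned in the title.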

