Robust Feature Selection using Sparse Centroid-Encoder

29 Sep 2021 · Tomojit Ghosh, Michael Kirby

We develop a sparse optimization problem for determining the total set of features that discriminate two or more classes. This is a sparse implementation of the centroid-encoder for nonlinear data reduction and visualization, called Sparse Centroid-Encoder (SCE). We also provide an iterative feature selection algorithm that first ranks each feature by its occurrence; the optimal number of features is then chosen using a validation set. The algorithm is applied to a wide variety of data sets, including single-cell biological data, high-dimensional infectious disease data, hyperspectral data, image data, and GIS data. We compared our method to various state-of-the-art feature selection techniques, including three neural network-based models (DFS, SG-L1-NN, G-L1-NN), Sparse SVM, and Random Forest. We empirically showed that SCE features produced better classification accuracy on unseen test data, often with fewer features.
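
The two-stage selection idea described in the abstract (rank features by occurrence, then pick the number of features on a validation set) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sparse model itself is omitted, and the names `rank_by_occurrence`, `choose_num_features`, `runs`, and `toy_score` are hypothetical, as are the toy feature subsets and the toy scoring function. It assumes that "occurrence" means how many repeated runs selected a feature.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def rank_by_occurrence(selected_sets: Iterable[Iterable[int]]) -> List[int]:
    """Rank features by how many runs selected them (most frequent first)."""
    counts: Counter = Counter()
    for selected in selected_sets:
        counts.update(set(selected))  # count each feature at most once per run
    # Sort by descending count; break ties by feature index for determinism.
    return sorted(counts, key=lambda f: (-counts[f], f))


def choose_num_features(
    ranking: Sequence[int],
    validation_score: Callable[[Sequence[int]], float],
) -> Tuple[int, float]:
    """Return (k, score) for the top-k prefix with the best validation score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = validation_score(ranking[:k])
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical feature subsets returned by three runs of some sparse model.
runs = [[0, 2, 5], [2, 5, 7], [2, 3, 5]]
ranking = rank_by_occurrence(runs)


def toy_score(features: Sequence[int]) -> float:
    """Toy stand-in for validation accuracy: features 2 and 5 help, others hurt."""
    informative = {2, 5}
    return sum(1.0 if f in informative else -0.1 for f in features)


k, score = choose_num_features(ranking, toy_score)
```

In practice, `validation_score` would train a classifier on the chosen features and return its accuracy on held-out validation data; the strict comparison is a design choice that prefers the smaller feature set when two prefixes score equally.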
