An Integrated Inverse Space Sparse Representation Framework for Tumor Classification

9 Mar 2018 · Xiaohui Yang, Wen-Ming Wu, Yun-Mei Chen, Xianqi Li, Juan Zhang, Dan Long, Li-Jun Yang ·

Microarray gene expression data-based tumor classification is an active and challenging issue. In this paper, an integrated tumor classification framework is presented, which aims to exploit information in existing available samples, and focuses on the small sample problem and unbalanced classification problem. Firstly, an inverse space sparse representation based classification (ISSRC) model is proposed by considering the characteristics of gene-based tumor data, such as sparsity and a small number of training samples. A decision information factors (DIF)-based gene selection method is constructed to enhance the representation ability of the ISSRC. It is worth noting that the DIF is established from reducing clinical misdiagnosis rate and dimension of small sample data. For further improving the representation ability and classification stability of the ISSRC, feature learning is conducted on the selected gene subset. The feature learning method is constructed by complementing the advantages of non-negative matrix factorization (NMF) and deep learning. Without confusion, the ISSRC combined with gene selection and feature learning is called the integrated ISSRC, whose stability, optimization and the corresponding convergence are analyzed. Extensive experiments on six public microarray gene expression datasets show the integrated ISSRC-based tumor classification framework is superior to classical and state-of-the-art methods. There are significant improvements in classification accuracy, specificity and sensitivity, whether there is a tumor in the early diagnosis, what kind of tumor, or whether metastasis occurs after tumor surgery.

PDF Abstract