Key Gene Mining in Transcriptional Regulation for Specific Biological Processes with Small Sample Sizes Using Multi-network pipeline Transformer

Gene mining is an important topic in the life sciences, but traditional machine learning methods cannot account for the regulatory relationships between genes, and deep learning methods perform poorly with small sample sizes. This study proposes a deep learning method, TransGeneSelector, that mines critical regulatory genes involved in a given life process from a small-sample transcriptome dataset. The method combines a WGAN-GP data augmentation network, a sample filtering network, and a Transformer classifier network. It successfully classified the state (germinating or dry) of Arabidopsis thaliana seeds in a dataset of 79 samples, with performance comparable to that of Random Forests. Using the SHapley Additive exPlanations (SHAP) method, TransGeneSelector then mined genes involved in seed germination. Gene regulatory network construction, KEGG enrichment analysis, and RT-qPCR quantitative analysis confirmed that these genes sit at a more upstream regulatory level than those mined by Random Forests. The top 11 genes mined uniquely by TransGeneSelector were found to be related to the KAI2 signaling pathway, which is of great regulatory importance for germination-related genes. This study provides a practical tool for life science researchers to mine key genes from transcriptome data.
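To illustrate the attribution step, the following is a minimal sketch of how Shapley values can rank input genes by their contribution to a classifier score. It is not the authors' implementation: the scoring function, the gene names, and the all-zero baseline are hypothetical, and exact Shapley values are computed by enumerating every feature ordering, which is only feasible for a handful of features. In practice the SHAP method relies on approximations for models with many inputs.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline.

    Averages each feature's marginal contribution over all orderings in
    which features are switched from their baseline value to their value
    in x. Cost grows factorially, so this suits toy inputs only.
    """
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev_score = model(current)
        for i in order:
            current[i] = x[i]              # reveal feature i
            new_score = model(current)
            totals[i] += new_score - prev_score
            prev_score = new_score
    return [t / len(orders) for t in totals]


def toy_classifier_score(expr):
    """Hypothetical stand-in for a trained classifier's output score."""
    # gene_A acts alone; gene_B and gene_C only matter together.
    return 2.0 * expr[0] + expr[1] * expr[2]


genes = ["gene_A", "gene_B", "gene_C"]     # hypothetical gene names
sample = [1.0, 1.0, 1.0]                   # expression of one sample
baseline = [0.0, 0.0, 0.0]                 # assumed reference expression

phi = shapley_values(toy_classifier_score, sample, baseline)

# Rank genes by absolute contribution, largest first.
ranking = sorted(zip(genes, phi), key=lambda pair: abs(pair[1]), reverse=True)
for gene, value in ranking:
    print(f"{gene}: {value:.2f}")
```

The attributions sum to the difference between the score of the sample and the score of the baseline, which is the property that makes the resulting ranking interpretable as a decomposition of the classifier output.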
