A Hybrid Classification Approach using Topic Modeling and Graph Convolution Networks

Text classification has become a key operation in various natural language processing tasks. The effectiveness of most classification algorithms depends predominantly on the quality of the input features. In this work, we propose a novel multi-class text classification technique that combines features from two distinct feature extraction methods. First, a structured heterogeneous text graph, built from document-word relations and word co-occurrences, is processed by a Graph Convolution Network (GCN). Second, a topic model is fit to the documents, and the resulting document-topic scores are used as features in the classification model. The graph is constructed using Point-Wise Mutual Information (PMI) between pairs of co-occurring words for word-word edges and the Term Frequency-Inverse Document Frequency (TF-IDF) score of words in documents for document-word edges. Experiments show that our text classification model outperforms existing techniques on five benchmark text classification data sets.
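
The graph construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_text_graph`, the `doc_i` node labels, whitespace tokenization, the sliding-window size, the `+ 1.0` IDF smoothing, and the choice to keep only positive PMI values are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window_size=3):
    """Return a dict mapping (node_a, node_b) -> edge weight.

    Hypothetical helper: document nodes are labeled "doc_<index>",
    word nodes are the lowercased tokens themselves.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    edges = {}

    # Document-word edges, weighted by TF-IDF.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    for i, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        for word, count in term_freq.items():
            # The "+ 1.0" smoothing term is an assumption of this sketch.
            idf = math.log(n_docs / doc_freq[word]) + 1.0
            edges[(f"doc_{i}", word)] = (count / len(tokens)) * idf

    # Word-word edges, weighted by PMI over sliding windows.
    windows = []
    for tokens in tokenized:
        if not tokens:
            continue
        if len(tokens) <= window_size:
            windows.append(tokens)
        else:
            windows.extend(
                tokens[j:j + window_size]
                for j in range(len(tokens) - window_size + 1)
            )
    n_windows = len(windows)
    word_count = Counter()
    pair_count = Counter()
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    for (a, b), count in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), probabilities over windows.
        pmi = math.log(count * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # assumption: drop non-positive PMI edges
            edges[(a, b)] = pmi
    return edges


docs = ["graph convolution network", "topic model features", "graph topic"]
edges = build_text_graph(docs)
```

The resulting weights could populate the adjacency matrix fed to a GCN, while document-topic scores from a separately fitted topic model would be supplied as additional features; how the two feature sets are combined is not specified in the abstract.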
