Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia.
The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset.
Another limitation of GCN when used on graph-based text representation tasks is that, GCNs do not consider the order information of nodes in graph.
Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures.
Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words(GoW) model in which each document is represented by a graph that encodes relationships between the different terms.
We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.