Text categorization is the task of automatically assigning predefined categories to documents written in natural language. Several types of text categorization have been studied, each dealing with different kinds of documents and categories: topic categorization, which detects the topics discussed (e.g., sports, politics); spam detection; and sentiment classification, which determines the sentiment expressed, typically in product or movie reviews.
The Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia.
We describe pke, an open-source, Python-based keyphrase extraction toolkit.
A recently introduced text classifier, called SS3, has obtained state-of-the-art performance on the CLEF's eRisk tasks.
SS3 was designed to handle early risk detection (ERD) problems naturally, since it supports incremental training and classification over text streams and can visually explain its rationale.
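SS3's own scoring function is not described here, so the sketch below does not reproduce it. It is a hypothetical, deliberately simplified stream classifier (the class, method names, and word-share score are our own assumptions) that only illustrates the two properties mentioned above: training is incremental, because each call to `learn` merely updates word counts, and classification consumes a stream of text chunks, emitting a current decision plus the per-word evidence behind it after every chunk.

```python
class StreamingWordClassifier:
    """Hypothetical, simplified incremental classifier (NOT SS3's actual formula).

    Illustrates incremental training, chunk-by-chunk classification of a text
    stream, and per-word evidence that could be shown to a user as an explanation.
    """

    def __init__(self):
        # counts[category][word] -> frequency observed during training
        self.counts = {}

    def learn(self, text, category):
        # Incremental training: update counts in place, no retraining pass.
        bucket = self.counts.setdefault(category, {})
        for word in text.lower().split():
            bucket[word] = bucket.get(word, 0) + 1

    def word_score(self, word, category):
        # Assumed score: share of this word's training occurrences that fall in `category`.
        total = sum(bucket.get(word, 0) for bucket in self.counts.values())
        return self.counts[category].get(word, 0) / total if total else 0.0

    def classify_stream(self, chunks):
        # Accumulate evidence across chunks; yield the best guess after each one.
        totals = {}
        for chunk in chunks:
            evidence = []
            for word in chunk.lower().split():
                for category in self.counts:
                    score = self.word_score(word, category)
                    totals[category] = totals.get(category, 0.0) + score
                    if score > 0:
                        evidence.append((word, category, score))
            best = max(totals, key=totals.get) if totals else None
            yield best, evidence


clf = StreamingWordClassifier()
clf.learn("I feel sad and hopeless", "depressed")
clf.learn("Great game with friends today", "control")
for guess, evidence in clf.classify_stream(["Today was", "so sad and hopeless"]):
    print(guess, evidence)
```

With this toy training data the guess is `control` after the first chunk and switches to `depressed` once the second chunk arrives, which is the kind of early, revisable decision an ERD setting calls for.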
Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers, and SVMs.
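Term weighting can be made concrete with TF-IDF, one of the most common schemes. The sketch below weights terms by tf · log(N/df) (one of several TF-IDF variants, and not the scheme studied in the source) and feeds the weighted vectors to a simple centroid-based classifier, which assigns a document to the class whose mean vector is most cosine-similar. Function names and toy documents are our own illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term by tf * log(N / df); returns sparse vectors and the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized], idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroids(vectors, labels):
    """Average the weighted vectors of each class (the centroid-based classifier)."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vec.items():
            acc[t] += w
    return {c: {t: w / counts[c] for t, w in acc.items()} for c, acc in sums.items()}


def classify(text, cents, idf):
    tf = Counter(text.lower().split())
    vec = {t: tf[t] * idf[t] for t in tf if t in idf}  # terms unseen in training are dropped
    return max(cents, key=lambda c: cosine(vec, cents[c]))


docs = ["the team won the match", "the striker scored a goal",
        "the senate passed the bill", "the minister proposed a law"]
labels = ["sports", "sports", "politics", "politics"]
vecs, idf = tfidf_vectors(docs)
cents = centroids(vecs, labels)
print(classify("a late goal won the match", cents, idf))  # sports
```

Because "the" occurs in every training document, its IDF is log(4/4) = 0 and it contributes nothing to either centroid. Swapping in a different weighting function changes the centroids and therefore the decisions, which is why the choice of scheme can matter so much.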
The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset.
Another limitation of GCNs when applied to graph-based text representation is that they ignore the order information of the nodes in the graph.
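The order problem can be seen in a toy example. The sketch below builds an undirected word co-occurrence graph (adjacent words are linked) and applies one GCN layer with the symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 H W) commonly used in GCNs. Because a node's new feature is a weighted sum over its neighbours, and "dog bites man" and "man bites dog" yield the same graph, both sentences get identical node representations. The graph construction, embeddings, and weights are hypothetical values chosen only for illustration.

```python
import math


def cooccurrence_graph(tokens):
    """Undirected word graph: one node per distinct word, edges between adjacent words."""
    nodes = sorted(set(tokens))  # node indices come from the vocabulary, not from positions
    index = {w: i for i, w in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for a, b in zip(tokens, tokens[1:]):
        i, j = index[a], index[b]
        if i != j:
            adj[i][j] = adj[j][i] = 1.0
    return nodes, adj


def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Weighted sum over neighbours: a sum carries no notion of which neighbour came first.
    agg = [[sum(a_hat[i][k] / math.sqrt(deg[i] * deg[k]) * features[k][f] for k in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


# Hypothetical 2-d word embeddings and layer weights.
EMB = {"dog": [1.0, 0.0], "bites": [0.5, 0.5], "man": [0.0, 1.0]}
W = [[1.0, -0.5], [0.25, 1.0]]


def encode(sentence):
    nodes, adj = cooccurrence_graph(sentence.split())
    return dict(zip(nodes, gcn_layer(adj, [EMB[w] for w in nodes], W)))


print(encode("dog bites man") == encode("man bites dog"))  # True
```

A sentence whose adjacency actually differs, such as "dog man bites", does produce different outputs, so the layer is sensitive to graph structure but not to word order that the graph fails to encode.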
Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures.
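To make "invariable filters that aggregate local inputs" concrete, here is a toy 1-D convolution in plain Python (our own illustration; the signal and kernel values are arbitrary). Like most deep-learning libraries, it computes cross-correlation: one kernel is reused unchanged at every position, and each output depends only on `len(kernel)` consecutive inputs.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep-learning libraries).

    The same kernel weights slide over every position (fixed filter), and each
    output only sees a window of len(kernel) neighbouring inputs (local aggregation).
    """
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
            for i in range(len(sequence) - k + 1)]


signal = [0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0]
kernel = [1.0, -1.0]  # a single filter shared across all positions
print(conv1d(signal, kernel))  # [-1.0, -1.0, 2.0, 0.0, -1.0, -1.0]
```

Because the weights are shared and the window is fixed, the pattern 0, 1, 2 yields the same responses at positions 0 and 4; the filter neither adapts to its input nor looks beyond its window.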