39 papers with code • 0 benchmarks • 6 datasets
Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural languages. Several types of Text Categorization have been studied, each of which deals with different types of documents and categories, such as topic categorization to detect discussed topics (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment typically in product or movie reviews.
These leaderboards are used to track progress in Text Categorization
LibrariesUse these libraries to find Text Categorization models and implementations
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs.
LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label.
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual.
We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.