35 papers with code • 0 benchmarks • 4 datasets
Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural language. Several types of Text Categorization have been studied, each dealing with different types of documents and categories: topic categorization to detect the topics discussed (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment expressed, typically in product or movie reviews.
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
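To make the preprocessing decisions named above concrete, here is a minimal sketch of three of them (tokenizing, lowercasing, and multiword grouping) as switchable steps. It is not the paper's pipeline: the regex tokenizer, the `MULTIWORDS` list, and all function names are assumptions chosen for illustration, and lemmatizing is left out because it needs an external lexicon or tagger.

```python
import re

# Hypothetical multiword expressions to merge into single tokens (illustration only).
MULTIWORDS = {("new", "york"), ("machine", "learning")}


def tokenize(text):
    """Split text into word tokens with a simple regex (an assumed tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def group_multiwords(tokens, multiwords=MULTIWORDS):
    """Merge adjacent token pairs found in `multiwords` into one token joined by '_'."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in multiwords:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def preprocess(text, lower=True, multiword=True):
    """Apply the selected preprocessing steps in a fixed order."""
    tokens = tokenize(text)
    if lower:
        tokens = lowercase(tokens)
    if multiword:
        tokens = group_multiwords(tokens)
    return tokens
```

Toggling `lower` and `multiword` gives the kind of variants such a study would compare. Note that in this sketch the multiword list is lowercase, so grouping only fires when lowercasing is also enabled.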
Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs.
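As an illustration of what a term weighting scheme computes, the sketch below implements one common variant of TF-IDF (length-normalized term frequency times the log of inverse document frequency). This is an assumed, textbook formulation rather than the scheme proposed in the paper, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term in each document.

    docs: list of token lists.
    Returns one dict per document mapping term -> weight, where
    weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights
```

A term that occurs in every document receives weight zero under this variant, which is why classifiers such as kNN or centroid-based methods, which compare weighted vectors directly, are sensitive to the choice of scheme.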
We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.
The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.
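For readers unfamiliar with subword n-grams, the following sketch shows only the feature extraction step: splitting a word into character n-grams with boundary markers. The contrastive model itself is not reproduced here, and the boundary symbols `<` and `>`, the default n-gram sizes, and the function name are assumptions.

```python
def char_ngrams(word, n_min=2, n_max=3):
    """Return all character n-grams of `word` for n in [n_min, n_max].

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from word-internal n-grams.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]
```

Counts of such n-grams can serve as features for telling apart closely related languages, where whole-word vocabularies overlap heavily but spelling patterns differ.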
The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution.