Text Categorization
42 papers with code • 0 benchmarks • 6 datasets
Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural languages. Several types of Text Categorization have been studied, each of which deals with different types of documents and categories, such as topic categorization to detect discussed topics (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment typically in product or movie reviews.
Source: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
Benchmarks
These leaderboards are used to track progress in Text Categorization
Libraries
Use these libraries to find Text Categorization models and implementationsMost implemented papers
A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks
Sarcasm detection is a key task for many natural language processing tasks.
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
Latent Dirichlet Allocation
Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.
Inverse-Category-Frequency based supervised term weighting scheme for text categorization
Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs.
Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks
LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label.
A Sequential Algorithm for Training Text Classifiers
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual.
Massively Multilingual Word Embeddings
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.
pke: an open source python-based keyphrase extraction toolkit
We describe pke, an open source python-based keyphrase extraction toolkit.
Neural Discourse Structure for Text Categorization
We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.
Authorship Attribution Using Text Distortion
A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre.