Text Categorization

42 papers with code • 0 benchmarks • 6 datasets

Text Categorization is the task of automatically assigning predefined categories to documents written in natural language. Several types of Text Categorization have been studied, each dealing with different kinds of documents and categories: topic categorization to detect the topics discussed (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment expressed, typically in product or movie reviews.

Source: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
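As a concrete illustration of topic categorization, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing using only the Python standard library. Naive Bayes is a classic baseline chosen here for brevity; it is not the CNN method of the cited source. The whitespace tokenizer, the toy documents, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Minimal lowercase + whitespace tokenizer (an assumption for illustration)
    return text.lower().split()

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with additive (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # per-category token counts
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, doc):
    """Return the category with the highest log posterior score."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in tokenize(doc):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["the team won the match", "the striker scored a goal",
        "parliament passed the bill", "the minister announced an election"]
labels = ["sports", "sports", "politics", "politics"]
model = train_nb(docs, labels)
print(predict_nb(model, "a late goal won the match"))  # sports
```

Spam detection and sentiment classification fit the same interface; only the label set and training data change.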

Most implemented papers

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

pedrada88/preproc-textclassification WS 2018

In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
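The four preprocessing decisions named above can be sketched as independent toggles, which is roughly how such an evaluation varies them. This is a minimal sketch, not the paper's pipeline: the lemma table `LEMMAS` and the multiword list `MULTIWORDS` are hypothetical toy stand-ins for a real lemmatizer and a multiword-expression lexicon.

```python
import re

# Hypothetical multiword expressions to merge into single tokens (lowercase)
MULTIWORDS = {("new", "york"), ("machine", "learning")}

# Toy lemma table standing in for a real lemmatizer (assumption)
LEMMAS = {"studies": "study", "was": "be", "classifiers": "classifier"}

def preprocess(text, lowercase=True, lemmatize=True, group_multiwords=True):
    # Tokenize into word runs and single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if lemmatize:
        # The toy table is lowercase, so only lowercase tokens can match
        tokens = [LEMMAS.get(t, t) for t in tokens]
    if group_multiwords:
        grouped, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if pair in MULTIWORDS:
                grouped.append("_".join(pair))  # e.g. "new_york"
                i += 2
            else:
                grouped.append(tokens[i])
                i += 1
        tokens = grouped
    return tokens

print(preprocess("Machine Learning studies in New York."))
# ['machine_learning', 'study', 'in', 'new_york', '.']
```

Turning off lowercasing here also prevents multiword grouping, because the toy lexicon is lowercase; interactions like this are one reason preprocessing choices are worth evaluating jointly.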

Latent Dirichlet Allocation

vrjkmr/arxiv-topic 1 Jan 2003

Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.
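LDA is defined by a generative process: each document draws topic proportions from a Dirichlet distribution, and each word first draws a topic from those proportions and then a word from that topic's word distribution. The sketch below samples one document from that process using only the standard library. The vocabulary, the topic-word table `beta`, and `alpha` are toy values chosen for illustration. A Dirichlet sample is obtained by normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, rng):
    # Normalized independent Gamma(a_i, 1) draws follow Dirichlet(alpha)
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, vocab, rng):
    """Sample one document from LDA's generative process."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]  # topic for this word
        w = rng.choices(vocab, weights=beta[z])[0]            # word drawn from topic z
        words.append(w)
    return theta, words

vocab = ["goal", "match", "team", "vote", "bill", "party"]
beta = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # toy topic 0: sports words
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # toy topic 1: politics words
]
rng = random.Random(0)
theta, words = generate_document(8, alpha=[0.5, 0.5], beta=beta, vocab=vocab, rng=rng)
print(theta, words)
```

Fitting LDA reverses this process, inferring `theta` and `beta` from observed words (e.g., by variational inference or Gibbs sampling); the sketch covers only generation.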

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

zveryansky/textvec 13 Dec 2010

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs.
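Inverse category frequency (ICF) is analogous to IDF, but it counts the categories, rather than the documents, in which a term occurs, so terms spread across many categories receive low weight. The sketch below computes a tf.icf weighting from labeled documents. The exact formula used here, icf(t) = log(1 + |C| / cf(t)), is an assumption and may differ from the variants proposed in the paper; the toy documents are illustrative.

```python
import math
from collections import Counter

def icf_weights(docs, labels):
    """icf(t) from cf(t), the number of categories whose documents contain t."""
    categories = set(labels)
    terms_per_category = {c: set() for c in categories}
    for doc, label in zip(docs, labels):
        terms_per_category[label].update(doc.lower().split())
    cf = Counter()
    for terms in terms_per_category.values():
        cf.update(terms)  # each category counts a term at most once
    n_cat = len(categories)
    # Assumed form: icf(t) = log(1 + |C| / cf(t))
    return {t: math.log(1 + n_cat / c) for t, c in cf.items()}

def tf_icf(doc, icf):
    """Raw term frequency times icf; terms unseen in training are dropped."""
    tf = Counter(doc.lower().split())
    return {t: n * icf[t] for t, n in tf.items() if t in icf}

docs = ["the goal", "the vote"]
labels = ["sports", "politics"]
icf = icf_weights(docs, labels)
weights = tf_icf("the goal goal", icf)
print(weights)  # "the" occurs in both categories, so it gets the lower weight
```

Because ICF uses category labels, it is a supervised scheme: the weights depend on the training labels, not just on document statistics.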

Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

iesl/leopard COLING 2020

LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label.

A Sequential Algorithm for Training Text Classifiers

airi-institute/al_toolbox 24 Jul 1994

The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual.
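The sequential approach of this paper (Lewis and Gale) is widely known as uncertainty sampling. The current classifier selects the unlabeled examples it is least certain about, a human labels them, and the classifier is retrained. The sketch below assumes a binary classifier that outputs P(positive), with uncertainty measured as closeness to 0.5. `fit` and `oracle` are hypothetical placeholders for a real training routine and a human annotator.

```python
from typing import Callable, List, Sequence

def most_uncertain(pool: Sequence[str], predict_proba: Callable[[str], float],
                   k: int = 1) -> List[str]:
    """Return the k pool items whose P(positive) is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))[:k]

def uncertainty_sampling(labeled, pool, fit, oracle, rounds, k=1):
    """labeled: list of (text, label) pairs.
    fit(labeled) returns a predict_proba function (hypothetical placeholder).
    oracle(text) returns the annotator's label (hypothetical placeholder)."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        predict_proba = fit(labeled)  # retrain on everything labeled so far
        for x in most_uncertain(pool, predict_proba, k):
            labeled.append((x, oracle(x)))
            pool.remove(x)
    return labeled

# Toy usage: a fixed probability table stands in for a trained classifier
probs = {"a": 0.95, "b": 0.45, "c": 0.2}
labeled = uncertainty_sampling([], probs, fit=lambda data: probs.get,
                               oracle=lambda x: probs[x] > 0.5, rounds=2)
print(labeled)  # [('b', False), ('c', False)]
```

Selecting the most uncertain items concentrates annotation effort near the decision boundary, which is how the approach reduces the labeling cost the snippet refers to.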

Massively Multilingual Word Embeddings

idiap/mhan 5 Feb 2016

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.

pke: an open source python-based keyphrase extraction toolkit

boudinfl/pke COLING 2016

We describe pke, an open source python-based keyphrase extraction toolkit.

Neural Discourse Structure for Text Categorization

jiyfeng/disco4textcat ACL 2017

We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.

Authorship Attribution Using Text Distortion

AuthorshipVerifier/TextDistortion EACL 2017

A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre.
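Text distortion targets this problem by masking topic-bearing words while keeping frequent function words, punctuation, and word shapes, which carry stylistic information. The sketch below approximates two masking variants along the lines described in the paper: one replaces every character of an infrequent word with `*` (digits with `#`), the other replaces each infrequent word with a single symbol. `FREQUENT` is a hypothetical tiny stand-in for the list of the k most frequent words of the language.

```python
import re

# Hypothetical stand-in for the k most frequent words of the language
FREQUENT = {"the", "of", "and", "a", "to", "in", "was", "it"}

def distort_multiple_asterisks(text, frequent=FREQUENT):
    """Keep frequent words; mask each other character with '*' (digits with '#')."""
    def mask(match):
        word = match.group(0)
        if word.lower() in frequent:
            return word
        return "".join("#" if ch.isdigit() else "*" for ch in word)
    return re.sub(r"\w+", mask, text)

def distort_single_asterisk(text, frequent=FREQUENT):
    """Keep frequent words; replace each other word with one '*' (all-digit runs with '#')."""
    def mask(match):
        word = match.group(0)
        if word.lower() in frequent:
            return word
        return "#" if word.isdigit() else "*"
    return re.sub(r"\w+", mask, text)

sentence = "The cat was in the garden in 1999."
print(distort_multiple_asterisks(sentence))  # The *** was in the ****** in ####.
print(distort_single_asterisk(sentence))     # The * was in the * in #.
```

The first variant preserves word lengths, the second hides them; features such as character n-grams are then extracted from the distorted text instead of the original.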