Text Categorization

35 papers with code • 0 benchmarks • 4 datasets

Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural languages. Several types of Text Categorization have been studied, each of which deals with different types of documents and categories, such as topic categorization to detect discussed topics (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment typically in product or movie reviews.

Source: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
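
As a rough illustration of the task described above, the following is a minimal sketch of a bag-of-words Naive Bayes categorizer. It is not the convolutional method of the cited paper; the function names (`train_naive_bayes`, `categorize`) and the toy training documents are assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-category document and word counts from (text, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labeled_docs:
        tokens = text.lower().split()
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def categorize(text, model):
    """Return the category with the highest add-one-smoothed log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in sorted(doc_counts):
        # Log prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip words never seen in training
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy training data, invented for illustration only.
TOY_DOCS = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("parliament passed the budget vote", "politics"),
    ("the minister lost the election", "politics"),
]
MODEL = train_naive_bayes(TOY_DOCS)
```

With this toy model, `categorize("the team scored a goal", MODEL)` returns `"sports"`. Real systems would use far more data and a proper tokenizer.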

Most implemented papers

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

pedrada88/preproc-textclassification WS 2018

In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
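
The preprocessing decisions named above can be sketched as a small function. This is an illustrative assumption, not the paper's pipeline: the `preprocess` name, the regular expression, and the underscore-joining convention are hypothetical, and lemmatizing is left out because it needs an external lexicon.

```python
import re


def preprocess(text, lowercase=True, multiwords=()):
    """Tokenize text, optionally lowercasing and joining listed multiword expressions."""
    if lowercase:
        text = text.lower()
    # Split into runs of word characters and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    grouped, i = [], 0
    while i < len(tokens):
        for phrase in multiwords:
            n = len(phrase)
            # Join the tokens at position i when they spell out a listed phrase.
            if n and tuple(tokens[i:i + n]) == tuple(phrase):
                grouped.append("_".join(phrase))
                i += n
                break
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped
```

For example, `preprocess("New York is big.", multiwords=[("new", "york")])` yields `["new_york", "is", "big", "."]`, while passing `lowercase=False` leaves `"New"` and `"York"` as separate, unmatched tokens.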

Latent Dirichlet Allocation

vrjkmr/arxiv-topic 1 Jan 2003

Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

zveryansky/textvec 13 Dec 2010

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs.
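
To make the idea of an inverse-category-frequency weight concrete, here is a minimal sketch that assumes the simple form icf(t) = log(|C| / cf(t)), where cf(t) is the number of categories in which term t occurs. The paper's exact weighting may differ; the function names and input layout are hypothetical.

```python
import math
from collections import Counter, defaultdict


def inverse_category_frequency(labeled_docs):
    """Map each term to log(|C| / cf), where cf counts the categories containing it."""
    categories_with_term = defaultdict(set)
    all_categories = set()
    for tokens, category in labeled_docs:
        all_categories.add(category)
        for token in tokens:
            categories_with_term[token].add(category)
    return {
        term: math.log(len(all_categories) / len(cats))
        for term, cats in categories_with_term.items()
    }


def tf_icf(tokens, icf):
    """Weight each known term in one document by term frequency times ICF."""
    term_frequency = Counter(tokens)
    return {t: count * icf[t] for t, count in term_frequency.items() if t in icf}
```

Under this assumed formula, a term that appears in every category gets weight zero, while a term confined to a single category gets the largest weight, log(|C|).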

Massively Multilingual Word Embeddings

idiap/mhan 5 Feb 2016

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.

pke: an open source python-based keyphrase extraction toolkit

boudinfl/pke COLING 2016

We describe pke, an open source python-based keyphrase extraction toolkit.

Neural Discourse Structure for Text Categorization

jiyfeng/disco4textcat ACL 2017

We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization.

Authorship Attribution Using Text Distortion

AuthorshipVerifier/TextDistortion EACL 2017

A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre.

Discriminating between Similar Languages using Weighted Subword Features

adbar/vardial-experiments WS 2017

The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.

An Automated Text Categorization Framework based on Hyperparameter Optimization

INGEOTEC/microTC 6 Apr 2017

The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution.