Text Categorization

41 papers with code • 0 benchmarks • 6 datasets

Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural language. Several types of Text Categorization have been studied, each dealing with different types of documents and categories: topic categorization to detect the topics discussed (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment expressed, typically in product or movie reviews.

Source: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
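
To make the task definition concrete, here is a minimal sketch of topic categorization using a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. This is a generic textbook baseline chosen for illustration, not the convolutional method of the cited source; the function names and the four training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def categorize(text, model):
    """Return the category with the highest smoothed log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training set with two pre-defined categories.
training_docs = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the budget bill", "politics"),
    ("the senator won the election vote", "politics"),
]
model = train_naive_bayes(training_docs)
print(categorize("a late goal in the match", model))
print(categorize("the senator passed the bill", model))
```

The same interface covers spam detection or sentiment classification by changing the category labels in the training pairs.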

Most implemented papers

Discriminating between Similar Languages using Weighted Subword Features

adbar/vardial-experiments WS 2017

The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.
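
As a rough illustration of what subword n-gram features look like, the sketch below counts character n-grams in a string. It shows only the feature extraction step; the contrastive weighting used in the paper is not reproduced here, and the function name, padding scheme, and default n-gram range are assumptions.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams for n in [n_min, n_max].

    The text is lowercased and padded with one space on each side so
    that word-initial and word-final n-grams are distinguishable.
    """
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


features = char_ngrams("aba", n_min=2, n_max=2)
print(sorted(features.items()))
```

Closely related languages often share most whole words, so counts over short character sequences like these can pick up spelling differences that word-level features miss.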

An Automated Text Categorization Framework based on Hyperparameter Optimization

INGEOTEC/microTC 6 Apr 2017

The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution.

Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags

levelfour/pumil 22 Apr 2017

Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available.
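
The bag/instance layout described above can be sketched as a small data structure. The example assumes the common MIL convention that a bag is positive if at least one of its instances is positive, which the snippet above does not state; it does not implement the paper's convex formulation for positive and unlabeled bags. The class, function names, and the toy scorer are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Bag:
    instances: List[Sequence[float]]  # feature vectors, no per-instance labels
    label: int                        # +1 or -1, known only for the whole bag


def bag_score(bag: Bag, instance_scorer: Callable[[Sequence[float]], float]) -> float:
    """Score a bag by its highest-scoring instance (assumed MIL convention)."""
    return max(instance_scorer(x) for x in bag.instances)


def predict_bag(bag: Bag, instance_scorer, threshold: float = 0.0) -> int:
    """Predict +1 if any instance scores above the threshold, else -1."""
    return 1 if bag_score(bag, instance_scorer) > threshold else -1


def toy_scorer(x):
    """Hypothetical instance scorer: positive when the first feature exceeds 0.5."""
    return x[0] - 0.5


positive_bag = Bag(instances=[[0.1], [0.9]], label=1)
negative_bag = Bag(instances=[[0.1], [0.2]], label=-1)
print(predict_bag(positive_bag, toy_scorer), predict_bag(negative_bag, toy_scorer))
```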

Authorship Attribution Using the Chaos Game Representation

catalinstoean/FCGR-LR 14 Feb 2018

Validation results for the trained classifiers are competitive with the best methods in prior literature.

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification

y3nk0/Graph-Based-TC WS 2018

Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model in which each document is represented by a graph that encodes relationships between the different terms.
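
A minimal sketch of a graph-of-words construction follows, assuming an undirected graph in which two terms are linked when they co-occur within a sliding window of tokens. The window size, the edge weighting by co-occurrence count, and the function name are assumptions for illustration and may differ from the paper's construction.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph over the terms of one document.

    Each term is linked to the terms that follow it within `window` tokens
    (the term itself counts as the first token of the window). Returns an
    adjacency dict: term -> {neighbor: co-occurrence count}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == term:
                continue  # no self-loops
            graph[term][other] += 1
            graph[other][term] += 1
    return {term: dict(neighbors) for term, neighbors in graph.items()}


tokens = "graph models encode term relationships".split()
gow = graph_of_words(tokens, window=3)
for term, neighbors in gow.items():
    print(term, "->", neighbors)
```

Unlike a bag-of-words vector, which records only how often each term occurs, this structure also keeps which terms appear near each other, so graph measures such as node degree can be used as term weights.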

Topic or Style? Exploring the Most Useful Features for Authorship Attribution

yunitata/coling2018 COLING 2018

Approaches to authorship attribution, the task of identifying the author of a document, are based on analysis of individuals' writing style and/or preferred topics.

Document Informed Neural Autoregressive Topic Models

pgcool/iDocNADE 11 Aug 2018

Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks.

SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors

luisespinosa/seven COLING 2018

For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences.

Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization with Medical Applications

cair/TextUnderstandingTsetlinMachine 12 Sep 2018

The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset.