TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Sentiment Analysis	IMDb	CNN+LSTM	Accuracy	88.9	# 35
Text Classification	Ohsumed	CNN+Lowercased	Accuracy	36.2	# 10
Sentiment Analysis	SST-2 Binary classification	CNN	Accuracy	91.2	# 56

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-the-role-of-text-preprocessing-in-neural/text-classification-on-ohsumed)](https://paperswithcode.com/sota/text-classification-on-ohsumed?p=on-the-role-of-text-preprocessing-in-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-the-role-of-text-preprocessing-in-neural/sentiment-analysis-on-imdb)](https://paperswithcode.com/sota/sentiment-analysis-on-imdb?p=on-the-role-of-text-preprocessing-in-neural)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/on-the-role-of-text-preprocessing-in-neural/sentiment-analysis-on-sst-2-binary)](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary?p=on-the-role-of-text-preprocessing-in-neural)`
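The three badge lines above share one URL pattern and differ only in the benchmark slug. A minimal sketch of that pattern follows, assuming it holds for other benchmarks too. The helper name `pwc_badge` and the constant `PAPER_SLUG` are hypothetical, introduced only for illustration.

```python
# Paper slug as it appears in the badge URLs above.
PAPER_SLUG = "on-the-role-of-text-preprocessing-in-neural"


def pwc_badge(paper_slug, benchmark_slug):
    """Build badge Markdown following the URL pattern seen in the badges above.

    Hypothetical helper: it only formats strings and makes no network calls.
    """
    badge = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    target = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge})]({target})"


if __name__ == "__main__":
    for slug in (
        "text-classification-on-ohsumed",
        "sentiment-analysis-on-imdb",
        "sentiment-analysis-on-sst-2-binary",
    ):
        print(pwc_badge(PAPER_SLUG, slug))
```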

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

WS 2018 · Jose Camacho-Collados, Mohammad Taher Pilehvar

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with a potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.
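To make the four preprocessing decisions named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the regex tokenizer, the toy lemma table `TOY_LEMMAS`, and the toy multiword lexicon `TOY_MULTIWORDS` are assumptions made purely for illustration, and a real system would rely on a proper lemmatizer and a multiword lexicon.

```python
import re

# Hypothetical toy resources, for illustration only.
TOY_LEMMAS = {"movies": "movie", "was": "be", "were": "be", "loved": "love"}
TOY_MULTIWORDS = {("new", "york"): "new_york"}

# Words, or single non-space punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def lemmatize(tokens):
    """Replace a token with its lemma when the toy table knows it."""
    return [TOY_LEMMAS.get(t.lower(), t) for t in tokens]


def group_multiwords(tokens):
    """Merge adjacent token pairs listed in the toy multiword lexicon."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in TOY_MULTIWORDS:
            out.append(TOY_MULTIWORDS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    tokens = tokenize("I loved the movies in New York.")
    print("tokenized: ", tokens)
    print("lowercased:", lowercase(tokens))
    print("lemmatized:", lemmatize(tokens))
    print("multiword: ", group_multiwords(tokens))
```

Each function yields a different input representation for the same sentence, which is the kind of variation the evaluation compares across datasets.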

Code

pedrada88/preproc-textclassification official

changji2069/Scope-Project

changji2069/literature-review

Tasks

Sentiment Analysis

Text Categorization

Text Classification

Word Embeddings

Datasets

SST

IMDb Movie Reviews

SST-2

Ohsumed

Results from the Paper

Ranked #10 on Text Classification on Ohsumed

