TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE (%)	GLOBAL RANK
Text Classification	AG News	ULMFiT (Small data)	Error	6.3	#7
Text Classification	Amazon-2	ULMFiT (Small data)	Error	3.9	#3
Text Classification	Amazon-5	ULMFiT (Small data)	Error	35.9	#2
Text Classification	DBpedia	ULMFiT (Small data)	Error	0.8	#6
Text Classification	Sogou News	ULMFiT (Small data)	Accuracy	97	#3
Text Classification	Yahoo! Answers	ULMFiT (Small data)	Accuracy	74.3	#5
Text Classification	Yelp-2	ULMFiT (Small data)	Accuracy	97.1	#4
Text Classification	Yelp-5	ULMFiT (Small data)	Accuracy	67.6	#7

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-amazon-5)](https://paperswithcode.com/sota/text-classification-on-amazon-5?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-amazon-2)](https://paperswithcode.com/sota/text-classification-on-amazon-2?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-sogou-news)](https://paperswithcode.com/sota/text-classification-on-sogou-news?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-yelp-2)](https://paperswithcode.com/sota/text-classification-on-yelp-2?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-yahoo-answers)](https://paperswithcode.com/sota/text-classification-on-yahoo-answers?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-dbpedia)](https://paperswithcode.com/sota/text-classification-on-dbpedia?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-ag-news)](https://paperswithcode.com/sota/text-classification-on-ag-news?p=sampling-bias-in-deep-active-classification)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sampling-bias-in-deep-active-classification/text-classification-on-yelp-5)](https://paperswithcode.com/sota/text-classification-on-yelp-5?p=sampling-bias-in-deep-active-classification)`

Sampling Bias in Deep Active Classification: An Empirical Study

IJCNLP 2019 · Ameya Prabhu, Charles Dognin, Maneesh Singh

The exploding cost and time needed for data labeling and model training are bottlenecks for training DNN models on large datasets. Identifying smaller, representative data samples with strategies like active learning can help mitigate such bottlenecks. Previous works on active learning in NLP identify the problem of sampling bias in the samples acquired by uncertainty-based querying and develop costly approaches to address it. Using a large empirical study, we demonstrate that active set selection using the posterior entropy of deep models like FastText.zip (FTZ) is robust to sampling biases and to various algorithmic choices (query size and strategies), contrary to what the traditional literature suggests. We also show that an FTZ-based query strategy produces sample sets similar to those from more sophisticated approaches (e.g., ensemble networks). Finally, we show the effectiveness of the selected samples by creating tiny, high-quality datasets and using them for fast and cheap training of large models. Based on the above, we propose a simple baseline for deep active text classification that outperforms the state of the art. We expect the presented work to be useful and informative for dataset compression and for problems involving active, semi-supervised, or online learning scenarios. Code and models are available at: https://github.com/drimpossible/Sampling-Bias-Active-Learning
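To make the idea of uncertainty-based querying with posterior entropy concrete, here is a minimal sketch, not the authors' implementation. It assumes a classifier has already produced class probabilities for every sample in an unlabeled pool; the function names `posterior_entropy` and `select_queries`, the toy probability matrix, and the query size are all hypothetical choices made for illustration.

```python
import numpy as np


def posterior_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    # Clip to avoid log(0) for classes predicted with zero probability.
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=1)


def select_queries(probs, query_size):
    """Return indices of the `query_size` most uncertain pool samples."""
    entropy = posterior_entropy(probs)
    # Sort ascending, reverse to get highest entropy first, keep the top k.
    return np.argsort(entropy)[::-1][:query_size]


# Toy pool of four samples over three classes (illustrative values only).
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])

picked = select_queries(pool_probs, query_size=2)
print(picked.tolist())
```

In an active learning loop, the selected indices would be sent for labeling, added to the training set, and the model retrained before the next round of querying.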

Code

Xtra-Computing/thundersvm official

1,535

drimpossible/Sampling-Bias-Active-L… official

Tasks

Active Learning

Classification

General Classification

text-classification

Text Classification

Datasets

AG News

DBpedia

Yahoo! Answers

Yelp

Results from the Paper

Ranked #2 on Text Classification on Amazon-5

