UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

28 May 2021 · Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels root deeply in the input domain and context, thus having unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
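To make the first stage concrete, the following is a toy sketch of document-level silver-label mining: word sequences that consistently co-occur within a single document are kept as candidate phrase spans, and only maximal sequences survive. This is a hedged simplification of UCPhrase's core-phrase mining, not the paper's exact algorithm (the paper applies additional filters such as stopword handling); the function name and thresholds here are illustrative.

```python
from collections import Counter

def mine_silver_labels(sentences, min_freq=3, max_len=4):
    """Toy silver-label miner (illustrative, not the paper's exact method).

    Counts word n-grams (length 2..max_len) across the sentences of one
    document and keeps those appearing at least min_freq times; then
    retains only maximal n-grams, i.e. drops any frequent n-gram that is
    contained in a longer frequent n-gram.
    """
    counts = Counter()
    for sent in sentences:
        words = sent.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    frequent = {g for g, c in counts.items() if c >= min_freq}

    def contained(g, h):
        # True if g is a strict contiguous sub-sequence of h
        return len(g) < len(h) and any(
            h[i:i + len(g)] == g for i in range(len(h) - len(g) + 1)
        )

    maximal = {g for g in frequent if not any(contained(g, h) for h in frequent)}
    return {" ".join(g) for g in maximal}
```

For example, a document that repeatedly mentions "neural language model" yields that full span as a single silver label rather than its sub-phrases; in UCPhrase, such spans are then paired with transformer attention maps to train the span classifier.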




Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Phrase Ranking | KP20k | Wiki+RoBERTa | P@5K | 100.0 | # 1 |
| Phrase Ranking | KP20k | Wiki+RoBERTa | P@50K | 98.5 | # 1 |
| Phrase Ranking | KP20k | UCPhrase | P@5K | 96.5 | # 2 |
| Phrase Ranking | KP20k | UCPhrase | P@50K | 96.5 | # 2 |
| Phrase Ranking | KP20k | TopMine | P@5K | 81.5 | # 3 |
| Phrase Ranking | KP20k | TopMine | P@50K | 78.0 | # 3 |
| Keyphrase Extraction | KP20k | Wiki+RoBERTa | Recall | 73.0 | # 1 |
| Keyphrase Extraction | KP20k | Wiki+RoBERTa | F1@10 | 19.2 | # 2 |
| Keyphrase Extraction | KP20k | UCPhrase | Recall | 72.9 | # 2 |
| Keyphrase Extraction | KP20k | UCPhrase | F1@10 | 19.7 | # 1 |
| Keyphrase Extraction | KP20k | AutoPhrase | Recall | 62.9 | # 3 |
| Keyphrase Extraction | KP20k | AutoPhrase | F1@10 | 18.2 | # 3 |
| Keyphrase Extraction | KP20k | Spacy | Recall | 59.5 | # 4 |
| Keyphrase Extraction | KP20k | Spacy | F1@10 | 15.3 | # 4 |
| Keyphrase Extraction | KP20k | PKE | Recall | 57.1 | # 5 |
| Keyphrase Extraction | KP20k | PKE | F1@10 | 12.6 | # 7 |
| Keyphrase Extraction | KP20k | TopMine | Recall | 53.3 | # 6 |
| Keyphrase Extraction | KP20k | TopMine | F1@10 | 15.0 | # 5 |
| Keyphrase Extraction | KP20k | StanfordNLP | Recall | 51.7 | # 7 |
| Keyphrase Extraction | KP20k | StanfordNLP | F1@10 | 13.9 | # 6 |
| Phrase Tagging | KP20k | UCPhrase | Precision | 69.9 | # 1 |
| Phrase Tagging | KP20k | UCPhrase | Recall | 78.3 | # 1 |
| Phrase Tagging | KP20k | UCPhrase | F1 | 73.9 | # 1 |
| Phrase Tagging | KP20k | Wiki+RoBERTa | Precision | 58.1 | # 2 |
| Phrase Tagging | KP20k | Wiki+RoBERTa | Recall | 64.2 | # 2 |
| Phrase Tagging | KP20k | Wiki+RoBERTa | F1 | 61.0 | # 2 |
| Phrase Tagging | KP20k | AutoPhrase | Precision | 55.2 | # 3 |
| Phrase Tagging | KP20k | AutoPhrase | Recall | 45.2 | # 3 |
| Phrase Tagging | KP20k | AutoPhrase | F1 | 49.7 | # 3 |
| Phrase Tagging | KP20k | TopMine | Precision | 39.8 | # 4 |
| Phrase Tagging | KP20k | TopMine | Recall | 41.4 | # 4 |
| Phrase Tagging | KP20k | TopMine | F1 | 40.6 | # 4 |
| Phrase Ranking | KPTimes | Wiki+RoBERTa | P@5K | 99.0 | # 1 |
| Phrase Ranking | KPTimes | Wiki+RoBERTa | P@50K | 96.5 | # 1 |
| Phrase Ranking | KPTimes | AutoPhrase | P@5K | 96.5 | # 2 |
| Phrase Ranking | KPTimes | AutoPhrase | P@50K | 95.5 | # 2 |
| Phrase Ranking | KPTimes | UCPhrase | P@5K | 96.5 | # 2 |
| Phrase Ranking | KPTimes | UCPhrase | P@50K | 95.5 | # 2 |
| Phrase Ranking | KPTimes | TopMine | P@5K | 85.5 | # 4 |
| Phrase Ranking | KPTimes | TopMine | P@50K | 71.0 | # 4 |
| Keyphrase Extraction | KPTimes | UCPhrase | Recall | 83.4 | # 1 |
| Keyphrase Extraction | KPTimes | UCPhrase | F1@10 | 10.9 | # 1 |
| Keyphrase Extraction | KPTimes | AutoPhrase | Recall | 77.8 | # 2 |
| Keyphrase Extraction | KPTimes | AutoPhrase | F1@10 | 10.3 | # 2 |
| Keyphrase Extraction | KPTimes | Wiki+RoBERTa | Recall | 64.5 | # 3 |
| Keyphrase Extraction | KPTimes | Wiki+RoBERTa | F1@10 | 9.4 | # 3 |
| Keyphrase Extraction | KPTimes | TopMine | Recall | 63.4 | # 4 |
| Keyphrase Extraction | KPTimes | TopMine | F1@10 | 8.5 | # 4 |
| Phrase Tagging | KPTimes | UCPhrase | Precision | 69.1 | # 1 |
| Phrase Tagging | KPTimes | UCPhrase | Recall | 78.9 | # 1 |
| Phrase Tagging | KPTimes | UCPhrase | F1 | 73.5 | # 1 |
| Phrase Tagging | KPTimes | Wiki+RoBERTa | Precision | 60.9 | # 2 |
| Phrase Tagging | KPTimes | Wiki+RoBERTa | Recall | 65.6 | # 2 |
| Phrase Tagging | KPTimes | Wiki+RoBERTa | F1 | 63.2 | # 2 |
| Phrase Tagging | KPTimes | AutoPhrase | Precision | 44.2 | # 3 |
| Phrase Tagging | KPTimes | AutoPhrase | Recall | 47.7 | # 3 |
| Phrase Tagging | KPTimes | AutoPhrase | F1 | 45.9 | # 3 |
| Phrase Tagging | KPTimes | TopMine | Precision | 32.0 | # 4 |
| Phrase Tagging | KPTimes | TopMine | Recall | 36.3 | # 4 |
| Phrase Tagging | KPTimes | TopMine | F1 | 34.0 | # 4 |
